etc/NEWS.unicode



GNU Emacs NEWS -- history of user-visible changes.

Copyright (C) 2007, 2008 Free Software Foundation, Inc.
Copyright (C) 2007, 2008
  National Institute of Advanced Industrial Science and Technology (AIST)
  Registration Number H14PRO021
See the end of the file for license conditions.

This file is about changes in the Emacs "unicode" branch.


* Changes in Emacs Unicode

** The Emacs character set is now a superset of Unicode.
(It has about four times the code space, which should be plenty.)

The internal encoding used for buffers and strings is now
Unicode-based and called `utf-8-emacs'.  utf-8-emacs is backwards
compatible with the UTF-8 encoding of Unicode.  The `emacs-mule'
coding system can still read and write data in the old internal encoding.
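
As a minimal sketch of this compatibility (an illustration, not taken
from the Emacs sources), the following encodes a Unicode character with
both coding systems and compares the resulting bytes.  The helper name
`demo-utf-8-emacs-compatible-p' is hypothetical.

```elisp
(defun demo-utf-8-emacs-compatible-p (char)
  "Return t if CHAR encodes to the same bytes in utf-8-emacs and utf-8.
Illustrative helper only; not part of Emacs."
  (equal (encode-coding-string (string char) 'utf-8-emacs)
         (encode-coding-string (string char) 'utf-8)))

;; HIRAGANA LETTER A (U+3042) is a Unicode character, so both coding
;; systems are expected to produce the same byte sequence.
(demo-utf-8-emacs-compatible-p #x3042)
```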

Since the internal encoding is also used by default for byte-compiled
files -- i.e. the normal coding system for byte-compiled Lisp files is
now utf-8-emacs -- Lisp containing non-ASCII characters which is
compiled by Emacs 23 can't be read by earlier versions of Emacs.  Files
compiled by Emacs 20, 21, or 22 are loaded correctly as emacs-mule
(whether or not they contain multibyte characters), which makes loading
them somewhat slower than Emacs 23-compiled files.  Thus it may be worth
recompiling existing .elc files which don't need to be shared with older
Emacsen.

** There are assorted new coding systems/aliases -- see M-x list-coding-systems.

** New charset implementation with many new charsets.
See M-x list-character-sets.  New charsets can be defined conveniently
as tables of Unicode code points.

The dimension of a charset is now 0, 1, 2, or 3, and the size of each
dimension is no longer limited to 94 or 96.

A dynamic charset priority list is used to infer the charset of
characters for display.

** New minor mode Auto Composition Mode composes characters automatically
when they are displayed.  This mode is globally on by default.

** Emacs now supports local fonts (fonts installed on the same machine
that Emacs is running on) through the FreeType and Fontconfig libraries.
On X, they are rendered via the Xft library with antialiasing support.
Fontconfig-style font names (e.g. monospace-12) are also accepted.

** New language environments Chinese-GBK, Chinese-GB18030, and TaiViet.

** The following facilities are obsolete:

Minor modes: unify-8859-on-encoding-mode, unify-8859-on-decoding-mode


* Lisp changes in Emacs Unicode

** Character code, representation, and charset changes.

The character code space is now 0x0..0x3FFFFF with no gaps.  Within it,
characters with codes 0x0..0x10FFFF are Unicode characters with the same
code points.  Characters with codes 0x3FFF80..0x3FFFFF are raw 8-bit bytes.

Generic characters no longer exist.

In multibyte buffers and strings, characters are represented by UTF-8
byte sequences.

The concept of a charset has changed.  A single character may belong to
multiple charsets (e.g. a-grave (U+00E0) belongs to the charsets unicode,
iso-8859-1, iso-8859-3, etc.).

*** The new function `characterp' returns t if and only if the argument
is a character.

*** The new function `max-char' returns the maximum character code
(currently it is #x3FFFFF).
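
The two functions above can be combined to probe the edges of the
character code space.  This is a small sketch; the values in the
comments follow from the entries above.

```elisp
;; The largest valid character code (#x3FFFFF per the entry above).
(max-char)

;; Any integer in the range 0..(max-char) is a character.
(characterp ?a)                 ; non-nil
(characterp (max-char))         ; non-nil

;; Integers outside the code space, and non-integers, are not.
(characterp (1+ (max-char)))    ; nil
(characterp -1)                 ; nil
(characterp "a")                ; nil
```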

*** The functions `encode-char' and `decode-char' now accept any
character set.
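
For instance, the a-grave character mentioned above can be converted to
and from its code point in more than one charset.  A minimal sketch:

```elisp
;; Code point of a-grave (U+00E0) within the `iso-8859-1' charset.
(encode-char #xE0 'iso-8859-1)      ; => #xE0

;; The same character is also a member of the `unicode' charset.
(encode-char #xE0 'unicode)         ; => #xE0

;; Map a charset code point back to an Emacs character.
(decode-char 'iso-8859-1 #xE0)      ; => #xE0

;; HIRAGANA LETTER A (U+3042) is not in `iso-8859-1', so there is
;; no code point to return.
(encode-char #x3042 'iso-8859-1)    ; => nil
```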

*** The function `define-charset' now accepts a completely different
form of arguments (old-style arguments still work).

*** The new function `define-charset-alias' defines an alias of a charset.

*** The value of the function `char-charset' depends on the current
priorities of charsets.

*** The new function `charset-priority-list' returns the list of
charsets ordered by priority.

*** The new function `set-charset-priority' sets priorities of charsets.
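
A minimal sketch of how the two priority functions fit together.  The
helper name `demo-with-top-charset' is hypothetical; it raises a charset
to the top of the list, reads the list back, and then restores the
previous ordering so that the session is left unchanged.

```elisp
(defun demo-with-top-charset (charset)
  "Give CHARSET top priority, return the new highest-priority charset.
The previous priority order is restored afterwards.
Illustrative helper only; not part of Emacs."
  (let ((saved (charset-priority-list)))
    (unwind-protect
        (progn
          ;; Charsets passed here move to the front of the list.
          (set-charset-priority charset)
          (car (charset-priority-list)))
      ;; Restore the previous ordering.
      (apply #'set-charset-priority saved))))

(demo-with-top-charset 'iso-8859-1)
```

Since the value of `char-charset' depends on these priorities, code that
relies on a particular answer may want to bind the priority in a similar
save-and-restore fashion.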

*** The new function `unibyte-charset' returns the current unibyte
charset.  The unibyte charset determines how unibyte/multibyte
conversion is done.

*** The new function `set-unibyte-charset' sets the unibyte charset.

*** The new function `unibyte-string' makes a unibyte string from bytes.
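
A short sketch of `unibyte-string' in use.  Each argument is a byte
value, and the result is a unibyte string with one element per byte.

```elisp
(let ((s (unibyte-string ?H ?i #xE0)))
  (list (multibyte-string-p s)   ; nil: the string is unibyte
        (length s)               ; 3: one element per byte
        (aref s 2)))             ; #xE0: the raw byte value
```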

** Code conversion changes

*** The new function `define-coding-system' should be used to define a
coding system instead of `make-coding-system' (which is obsolete now).
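
The following sketch defines a small coding system with
`define-coding-system'.  The name `demo-latin-1' is hypothetical, and
the property keywords shown (`:coding-type', `:mnemonic',
`:charset-list') are assumed to match those accepted by the installed
Emacs; consult the function's documentation string before relying on
them.

```elisp
;; A hypothetical 8-bit coding system that covers only the
;; `iso-8859-1' charset.
(define-coding-system 'demo-latin-1
  "Hypothetical 8-bit coding system covering the iso-8859-1 charset."
  :coding-type 'charset
  :mnemonic ?D
  :charset-list '(iso-8859-1))

;; Encoding a-grave (U+00E0) should yield the single byte #xE0.
(encode-coding-string (string #xE0) 'demo-latin-1)
```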

*** The functions `encode-coding-region' and `decode-coding-region'
have an optional 4th argument to specify where the result of
conversion should go.

*** The functions `encode-coding-string' and `decode-coding-string'
have an optional 4th argument specifying a buffer to store the result
of conversion.
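
As an illustration of the buffer argument, the sketch below decodes a
byte string directly into a temporary buffer.  The helper name
`demo-decode-into-temp-buffer' is hypothetical.

```elisp
(defun demo-decode-into-temp-buffer (bytes coding-system)
  "Decode BYTES with CODING-SYSTEM into a temporary buffer.
Return the buffer contents as a string.
Illustrative helper only; not part of Emacs."
  (with-temp-buffer
    ;; The third argument (NOCOPY) is nil; the fourth names the
    ;; buffer that receives the decoded text.
    (decode-coding-string bytes coding-system nil (current-buffer))
    (buffer-string)))

;; The two bytes #xC3 #xA0 are the UTF-8 encoding of a-grave (U+00E0).
(demo-decode-into-temp-buffer (unibyte-string #xC3 #xA0) 'utf-8)
```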

*** The new macro `with-coding-priority' executes its body with the
specified coding system priority order.

*** The new function `check-coding-systems-region' checks if the text
in the region is encodable by the specified coding systems.

*** The new function `coding-system-aliases' returns a list of aliases
of a coding system.

*** The new function `coding-system-charset-list' returns a list of
charsets supported by a coding system.

*** The new function `coding-system-priority-list' returns a list of
coding systems ordered by their priorities.

*** The new function `set-coding-system-priority' sets priorities of
coding systems.
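
The inquiry and priority facilities listed above can be exercised as in
the following sketch.  The coding systems chosen (utf-8, iso-latin-1)
are only examples.

```elisp
;; Aliases of a coding system.
(coding-system-aliases 'utf-8)

;; Charsets that a coding system supports.
(coding-system-charset-list 'iso-latin-1)

;; Check a string against several coding systems.  HIRAGANA LETTER A
;; (U+3042) cannot be encoded by iso-latin-1, so that coding system
;; is reported in the result; utf-8 is not.
(check-coding-systems-region (string ?a #x3042) nil '(iso-latin-1 utf-8))

;; Temporarily prefer utf-8 while the body runs.
(with-coding-priority '(utf-8)
  (coding-system-priority-list))
```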

** Composition changes

*** New functions and variables `auto-composition-mode' and
`global-auto-composition-mode' toggle the new minor mode Auto
Composition Mode locally and globally.

*** New variable `auto-composition-function' is a function used in
Auto Composition Mode to compose characters.  The default value is the
function `auto-compose-chars'.

** Font Backend changes.

*** New frame parameter `font-backend' specifies a list of
font-backends supported by the frame's graphic device.  On X, they are
currently `x' and `xft'.

*** New function `fontp' checks if the argument is a font-spec
or font-entity.

*** New function `font-spec' creates a new font-spec object.

*** New function `font-get' returns a font property value.

*** New function `font-put' sets a font property value.

*** New function `list-fonts' returns a list of font-entities matching
the given specification.

*** New function `list-families' returns a list of family names of
available fonts.

*** New function `find-font' returns a font-entity best matching
the given specification.

*** New function `font-xlfd-name' returns an XLFD name of a given font
(font-spec, font-entity, or font-object).

*** New function `clear-font-cache' clears all font caches.
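
A minimal sketch of building and inspecting a font-spec.  The family
name "Monospace" is only an example, and the guard is an assumption that
these functions may be unavailable in builds without window-system
support.

```elisp
(when (fboundp 'font-spec)
  (let ((spec (font-spec :family "Monospace" :size 12)))
    ;; Add a property to the existing font-spec.
    (font-put spec :weight 'bold)
    (list (fontp spec)               ; non-nil: SPEC is a font-spec
          (font-get spec :size)      ; the size given above
          (font-get spec :weight)))) ; the weight just stored
```

A font-spec built this way can then be passed to `list-fonts' on a
graphical frame to find matching font-entities.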

** The function `get-char-code-property' now accepts many Unicode base
character properties.  They are `name', `general-category',
`canonical-combining-class', `bidi-class', `decomposition',
`decimal-digit-value', `digit-value', `numeric-value', `mirrored',
`old-name', `iso-10646-comment', `uppercase', `lowercase', and
`titlecase'.
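
A few of these properties can be queried as follows.  This is a sketch;
the values in the comments come from the Unicode Character Database.

```elisp
(get-char-code-property ?A 'name)                 ; "LATIN CAPITAL LETTER A"
(get-char-code-property ?A 'general-category)     ; Lu
(get-char-code-property ?A 'lowercase)            ; ?a
(get-char-code-property ?5 'decimal-digit-value)  ; 5
```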

** The new function `define-char-code-property' defines a character
code property.

** The new function `char-code-property-description' returns the
description string of a character code property.

*** The new variable `find-word-boundary-function-table' is a
char-table of functions to search for a word boundary.

*** The new variable `char-script-table' is a char-table of script names.

*** The new variable `char-width-table' is a char-table of character widths.
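
Both char-tables can be read with `aref', using a character as the
index.  A small sketch:

```elisp
;; Script names are symbols.
(aref char-script-table ?a)       ; latin
(aref char-script-table #x3042)   ; kana (HIRAGANA LETTER A)

;; Widths are in columns; East Asian wide characters occupy two.
(aref char-width-table #x3042)    ; 2
```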

*** The new variable `print-charset-text-property' controls how the
`charset' text property is handled when printing a string.

*** The new variable `printable-chars' is a char-table that defines
whether a character is printable.

*** The new function `robin-define-package' defines a Robin package,
which is an input method system different from Quail.

*** The new function `robin-modify-package' modifies an existing Robin package.

*** The new function `robin-use-package' starts using a Robin package
as an input method.

** The functions `modify-syntax-entry' and `modify-category-entry' now
accept a cons of characters as the first argument, and modify all
entries in that range of characters.
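
For example, a whole range of characters can be given the same syntax
class in one call.  The sketch below works on a fresh syntax table so
that no existing table is modified.

```elisp
(let ((table (make-syntax-table)))
  ;; Give every digit from 0 through 9 symbol-constituent syntax.
  (modify-syntax-entry '(?0 . ?9) "_" table)
  (with-syntax-table table
    (list (char-syntax ?0) (char-syntax ?5) (char-syntax ?9))))
;; => (?_ ?_ ?_)
```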

** The function `set-fontset-font' now accepts a script name as the
second argument, and has an optional 5th argument to control how to
set the font.

** The functions `char-bytes', `chars-in-region', `set-coding-priority',
`make-coding-system', and `char-valid-p' are now obsolete.


* Incompatible Lisp changes

** The behavior of `map-char-table' has changed.  It may call the
specified function with a cons (FROM . TO) as a key if characters in
that range have the same value.
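
Code that calls `map-char-table' therefore has to handle both kinds of
key.  The following is a minimal sketch; the helper name
`demo-char-table-entries' and the char-table purpose `demo-purpose' are
hypothetical.

```elisp
(defun demo-char-table-entries (table)
  "Return an alist of (CHAR . VALUE) for every non-nil entry in TABLE.
Range keys of the form (FROM . TO) are expanded to single characters.
Illustrative helper only; not part of Emacs."
  (let (entries)
    (map-char-table
     (lambda (key value)
       (if (consp key)
           ;; KEY is a range (FROM . TO) whose characters share VALUE.
           (let ((c (car key)))
             (while (<= c (cdr key))
               (push (cons c value) entries)
               (setq c (1+ c))))
         ;; KEY is a single character.
         (push (cons key value) entries)))
     table)
    (nreverse entries)))

(let ((table (make-char-table 'demo-purpose)))
  (set-char-table-range table '(?a . ?c) 'lower)
  (demo-char-table-entries table))
```

Expanding ranges is only sensible for small tables; for large ranges it
is usually better to process the (FROM . TO) pair directly.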

** The value of the function `charset-id' is now always 0.

** The functions `register-char-codings' and `coding-system-spec' are deleted.


----------------------------------------------------------------------
This file is part of GNU Emacs.

GNU Emacs is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GNU Emacs is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GNU Emacs; see the file COPYING.  If not, write to the
Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
Boston, MA 02110-1301, USA.


Local variables:
mode: outline
paragraph-separate: "[ 	]*$"
end:

arch-tag: e21801b9-0724-4cda-8c07-7d60bf3db3fd