Diffstat (limited to 'admin/notes/unicode')
-rw-r--r-- admin/notes/unicode 57
1 files changed, 25 insertions, 32 deletions
diff --git a/admin/notes/unicode b/admin/notes/unicode
index d641e60ff73..4d6aa6e9a9e 100644
--- a/admin/notes/unicode
+++ b/admin/notes/unicode
@@ -11,15 +11,20 @@ Emacs uses the following files from the Unicode Character Database
. UnicodeData.txt
. Blocks.txt
- . BidiMirroring.txt
. BidiBrackets.txt
+ . BidiCharacterTest.txt
+ . BidiMirroring.txt
. IVD_Sequences.txt
. NormalizationTest.txt
. SpecialCasing.txt
- . BidiCharacterTest.txt
First, the first 7 files need to be copied into admin/unidata/, and
-then Emacs should be rebuilt for them to take effect. Rebuilding
+the file https://www.unicode.org/copyright.html should be copied over
+copyright.html in admin/unidata (that file might need trailing
+whitespace removed before it can be committed to the Emacs
+repository).
+
+Then Emacs should be rebuilt for them to take effect. Rebuilding
Emacs updates several derived files elsewhere in the Emacs source
tree, mainly in lisp/international/.
@@ -28,7 +33,10 @@ files, pay attention to any warning or error messages. In particular,
admin/unidata/unidata-gen.el will complain if UnicodeData.txt defines
new bidirectional attributes of characters, because unidata-gen.el,
bidi.c and dispextern.h need to be updated in that case; failure to do
-so will cause aborts in redisplay.
+so will cause aborts in redisplay. unidata-gen.el will also complain
+if the format of the Unicode Copyright notice in copyright.html
+changed in significant ways; in that case, update the regular
+expression in unidata-gen-file used to extract the copyright string.
Next, review the changes in UnicodeData.txt vs the previous version
used by Emacs. Any changes, be it introduction of new scripts or
@@ -40,7 +48,12 @@ and see if any changes in admin/unidata/blocks.awk are required.
The setting of char-width-table around line 1200 of characters.el
should be checked against the latest version of the Unicode file
-EastAsianWidth.txt, and any discrepancies fixed.
+EastAsianWidth.txt, and any discrepancies fixed: double-width
+characters are those marked with W or F in that file. Zero-width
+characters are not taken from EastAsianWidth.txt, they are those whose
+Unicode General Category property is one of Mn, Me, or Cf, and also
+Hangul jungseong and jongseong characters (a.k.a. "Jamo medial vowels"
+and "Jamo final consonants").
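
Reviewer's sketch (not part of the patch): the width rule described in the added lines above can be approximated outside Emacs for spot-checking individual characters. The Python below is a minimal illustration, not the logic in characters.el. The function name approx_char_width and the two Hangul Jamo code point ranges are assumptions made for this example, and Python's unicodedata module reflects whatever Unicode version the interpreter bundles, which may differ from the version being imported into Emacs.

```python
import unicodedata

# Assumed code point ranges for the Hangul Jamo block (not given in the notes):
# jungseong (medial vowels) and jongseong (final consonants).
JUNGSEONG = range(0x1160, 0x11A8)
JONGSEONG = range(0x11A8, 0x1200)


def approx_char_width(ch: str) -> int:
    """Return 0, 1, or 2 following the rule stated in the notes (hypothetical helper)."""
    cp = ord(ch)
    # Zero width: General Category Mn, Me, or Cf ...
    if unicodedata.category(ch) in {"Mn", "Me", "Cf"}:
        return 0
    # ... and also Hangul jungseong and jongseong characters.
    if cp in JUNGSEONG or cp in JONGSEONG:
        return 0
    # Double width: marked W or F in EastAsianWidth.txt.
    if unicodedata.east_asian_width(ch) in {"W", "F"}:
        return 2
    return 1
```

Comparing this against (char-width CH) in a rebuilt Emacs for a handful of characters is one rough way to notice discrepancies; any disagreement still has to be checked against EastAsianWidth.txt itself.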
Any new scripts added by UnicodeData.txt will also need updates to
script-representative-chars defined in fontset.el, and also the list
@@ -230,41 +243,21 @@ nontrivial changes to the build process.
admin/charsets/mapfiles/cns2ucsdkw.txt
- * iso-2022-7bit
-
- This file switches between CJK charsets, which is not encoded in UTF-8.
+ * iso-2022-jp
- etc/HELLO
-
- Each of these files contains just one CJK charset, but Emacs
- currently has no easy way to specify set-charset-priority on a
- per-file basis, so converting any of these files to UTF-8 might
- change the file's appearance when viewed by an Emacs that is
- operating in some other language environment.
+ This contains just one CJK charset, but Emacs currently has no
+ easy way to specify set-charset-priority on a per-file basis, so
+ converting this file to UTF-8 might change the file's appearance
+ when viewed by an Emacs that is operating in some other language
+ environment.
etc/tutorials/TUTORIAL.ja
- lisp/international/ja-dic-cnv.el
- lisp/international/ja-dic-utl.el
- lisp/international/kinsoku.el
- lisp/international/kkc.el
- lisp/international/titdic-cnv.el
- lisp/language/japan-util.el
- lisp/language/japanese.el
- lisp/leim/quail/cyril-jis.el
- lisp/leim/quail/hanja-jis.el
- lisp/leim/quail/japanese.el
- lisp/leim/quail/py-punct.el
- lisp/leim/quail/pypunct-b5.el
-
- This file contains just Chinese characters, and has same problem.
- Also, it contains characters that cannot be encoded in UTF-8.
-
- lisp/international/titdic-cnv.el
* utf-8-emacs
These files contain characters that cannot be encoded in UTF-8.
+ lisp/international/titdic-cnv.el
lisp/language/ethio-util.el
lisp/language/ethiopic.el
lisp/language/ind-util.el