Diffstat (limited to 'admin/notes/unicode')
-rw-r--r-- | admin/notes/unicode | 57 |
1 files changed, 25 insertions, 32 deletions
diff --git a/admin/notes/unicode b/admin/notes/unicode
index d641e60ff73..4d6aa6e9a9e 100644
--- a/admin/notes/unicode
+++ b/admin/notes/unicode
@@ -11,15 +11,20 @@ Emacs uses the following files from the Unicode Character Database
 
   . UnicodeData.txt
   . Blocks.txt
-  . BidiMirroring.txt
   . BidiBrackets.txt
+  . BidiCharacterTest.txt
+  . BidiMirroring.txt
   . IVD_Sequences.txt
   . NormalizationTest.txt
   . SpecialCasing.txt
-  . BidiCharacterTest.txt
 
 First, the first 7 files need to be copied into admin/unidata/, and
-then Emacs should be rebuilt for them to take effect. Rebuilding
+the file https://www.unicode.org/copyright.html should be copied over
+copyright.html in admin/unidata (that file might need trailing
+whitespace removed before it can be committed to the Emacs
+repository).
+
+Then Emacs should be rebuilt for them to take effect. Rebuilding
 Emacs updates several derived files elsewhere in the Emacs source
 tree, mainly in lisp/international/.
 
@@ -28,7 +33,10 @@ files, pay attention to any warning or error messages. In particular,
 admin/unidata/unidata-gen.el will complain if UnicodeData.txt defines
 new bidirectional attributes of characters, because unidata-gen.el,
 bidi.c and dispextern.h need to be updated in that case; failure to do
-so will cause aborts in redisplay.
+so will cause aborts in redisplay. unidata-gen.el will also complain
+if the format of the Unicode Copyright notice in copyright.html
+changed in significant ways; in that case, update the regular
+expression in unidata-gen-file used to extract the copyright string.
 
 Next, review the changes in UnicodeData.txt vs the previous version
 used by Emacs. Any changes, be it introduction of new scripts or
@@ -40,7 +48,12 @@ and see if any changes in admin/unidata/blocks.awk are required.
 
 The setting of char-width-table around line 1200 of characters.el
 should be checked against the latest version of the Unicode file
-EastAsianWidth.txt, and any discrepancies fixed.
+EastAsianWidth.txt, and any discrepancies fixed: double-width
+characters are those marked with W or F in that file. Zero-width
+characters are not taken from EastAsianWidth.txt, they are those whose
+Unicode General Category property is one of Mn, Me, or Cf, and also
+Hangul jungseong and jongseong characters (a.k.a. "Jamo medial vowels"
+and "Jamo final consonants").
 
 Any new scripts added by UnicodeData.txt will also need updates to
 script-representative-chars defined in fontset.el, and also the list
@@ -230,41 +243,21 @@ nontrivial changes to the build process.
 
      admin/charsets/mapfiles/cns2ucsdkw.txt
 
- * iso-2022-7bit
-
-   This file switches between CJK charsets, which is not encoded in UTF-8.
+ * iso-2022-jp
 
-     etc/HELLO
-
-   Each of these files contains just one CJK charset, but Emacs
-   currently has no easy way to specify set-charset-priority on a
-   per-file basis, so converting any of these files to UTF-8 might
-   change the file's appearance when viewed by an Emacs that is
-   operating in some other language environment.
+   This contains just one CJK charset, but Emacs currently has no
+   easy way to specify set-charset-priority on a per-file basis, so
+   converting this file to UTF-8 might change the file's appearance
+   when viewed by an Emacs that is operating in some other language
+   environment.
 
      etc/tutorials/TUTORIAL.ja
-     lisp/international/ja-dic-cnv.el
-     lisp/international/ja-dic-utl.el
-     lisp/international/kinsoku.el
-     lisp/international/kkc.el
-     lisp/international/titdic-cnv.el
-     lisp/language/japan-util.el
-     lisp/language/japanese.el
-     lisp/leim/quail/cyril-jis.el
-     lisp/leim/quail/hanja-jis.el
-     lisp/leim/quail/japanese.el
-     lisp/leim/quail/py-punct.el
-     lisp/leim/quail/pypunct-b5.el
-
-   This file contains just Chinese characters, and has same problem.
-   Also, it contains characters that cannot be encoded in UTF-8.
-
-     lisp/international/titdic-cnv.el
 
  * utf-8-emacs
 
    These files contain characters that cannot be encoded in UTF-8.
 
+     lisp/international/titdic-cnv.el
      lisp/language/ethio-util.el
      lisp/language/ethiopic.el
      lisp/language/ind-util.el