field | value | date |
---|---|---|
author | Eric S. Raymond <esr@snark.thyrsus.com> | 1993-03-22 03:00:23 +0000 |
committer | Eric S. Raymond <esr@snark.thyrsus.com> | 1993-03-22 03:00:23 +0000 |
commit | 6c59ed5680001a63b13f3a3f1a2eac0cc254f760 | |
tree | d80f3e429cfdd41e3db8e016fd7db3b932cbf564 /etc | |
parent | e91a241cc39ced76fc2ff3fa7cea95df9de3dd79 | |
Merged in CHARACTERS
Diffstat (limited to 'etc')
-rw-r--r-- | etc/=TO-DO | 57 |
1 files changed, 57 insertions, 0 deletions
diff --git a/etc/=TO-DO b/etc/=TO-DO
index cc5b398eeca..e5b9a49599b 100644
--- a/etc/=TO-DO
+++ b/etc/=TO-DO
@@ -24,3 +24,60 @@ Things useful to do for GNU Emacs:
   inverse video.

 * VMS code to list a file directory. Make dired work.
+
+Long range:
+
+ Ideas for extending GNU Emacs to deal with arbitrary character sets.
+
+I would like GNU Emacs to be extended to handle all the world's alphabets
+and word signs. I don't expect to have time to do such a thing in the next
+few years, so here are my ideas on the best way to do it.
+
+* Each graphic is represented by a sequence of ordinary 8-bit characters.
+
+* All the characters that make up such a sequence have codes >= 0200.
+
+* The first character of such a sequence is between 0200 and 0237.
+
+* The remaining characters of such a sequence are all 0240 or higher.
+
+* The first character of the sequence determines the number of characters
+in the sequence. Thus, 0200...0207 could start two-character sequences,
+0210...0227 could start three-character sequences, and 0230 could start
+four-character sequences. (Codes 0231...0237 would be reserved.)
+
+* Several common alphabets, and some mathematical symbols, would get
+two-character sequences. (Probably Greek, Russian, Hebrew(?), Arabic(?),
+Korean, and Japanese kana). The remaining alphabets, and some versions of
+Chinese, would get three-character sequences. Other sets of Chinese
+characters would get four-character sequences.
+
+Each country that uses Chinese characters has its own standard character
+set, and it is not easy to correlate them to avoid overlap. So there may
+need to be several sets of Chinese characters. That is why they need so
+much code space.
+
+True support for Hebrew and Arabic requires dealing with the problem of
+writing direction for mixed text; I don't know what to do for that.
+
+* The functions that use syntax table would determine the
+syntax of a sequence from its first character.
+
+* Functions in indent.c for computing widths and columns would
+determine the width of a sequence from its first character.
+So would display routines.
+
+* Only a few other editing routines would need any change. In
+particular, searching and regexp matching might not need any change.
+
+* Most of the work required would be in redisplay. The only case that
+needs to be supported is with X windows, since ordinary terminals
+can't display all these characters anyway.
+
+* There might need to be code to translate files from this format
+to whatever format is typically stored on disk.
+
+
+I would be very unhappy with half-measures, such as support for
+Japanese only.
+
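
The byte-range rules proposed in the added text can be made concrete with a small C sketch. This is only an illustration of the scheme as the TO-DO entry words it, not code from Emacs: the function names `graphic_length` and `graphic_scan` are hypothetical, and the choice to return 0 for reserved or malformed input is an assumption, since the entry does not say how such bytes should be handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes in the sequence introduced by LEAD,
   using the ranges suggested in the TO-DO entry.  Returns 1 for an ordinary
   character (code below 0200) and 0 for a byte that cannot start a sequence:
   a reserved lead byte (0231...0237) or a trailing byte (0240 or higher).
   Returning 0 in those cases is an assumption of this sketch.  */
int
graphic_length (unsigned char lead)
{
  if (lead < 0200)
    return 1;                   /* ordinary single-byte character */
  if (lead <= 0207)
    return 2;                   /* 0200...0207: two-character sequence */
  if (lead <= 0227)
    return 3;                   /* 0210...0227: three-character sequence */
  if (lead == 0230)
    return 4;                   /* 0230: four-character sequence */
  return 0;                     /* reserved lead or stray trailing byte */
}

/* Hypothetical helper: check the sequence starting at P, with N bytes
   available.  Returns its length in bytes, or 0 if the sequence is
   truncated or a trailing byte is below 0240.  */
size_t
graphic_scan (const unsigned char *p, size_t n)
{
  assert (p != NULL);
  if (n == 0)
    return 0;

  int len = graphic_length (p[0]);
  if (len == 0 || (size_t) len > n)
    return 0;

  /* Every byte after the first must be 0240 or higher.  */
  for (int i = 1; i < len; i++)
    if (p[i] < 0240)
      return 0;

  return (size_t) len;
}
```

Because lead bytes (0200...0237) and trailing bytes (0240 and up) occupy disjoint ranges, a scanner that lands in the middle of a sequence can find the next boundary by skipping forward to the first byte below 0240.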