Merged in CHARACTERS

author: Eric S. Raymond <esr@snark.thyrsus.com> 1993-03-22 03:00:23 +0000
committer: Eric S. Raymond <esr@snark.thyrsus.com> 1993-03-22 03:00:23 +0000
commit: 6c59ed5680001a63b13f3a3f1a2eac0cc254f760 (patch)
tree: d80f3e429cfdd41e3db8e016fd7db3b932cbf564 /etc
parent: e91a241cc39ced76fc2ff3fa7cea95df9de3dd79 (diff)
download: emacs-6c59ed5680001a63b13f3a3f1a2eac0cc254f760.tar.gz
1 files changed, 57 insertions, 0 deletions
diff --git a/etc/=TO-DO b/etc/=TO-DO
index cc5b398eeca..e5b9a49599b 100644
--- a/etc/=TO-DO
+++ b/etc/=TO-DO
@@ -24,3 +24,60 @@ Things useful to do for GNU Emacs:
  inverse video.
 
 * VMS code to list a file directory.  Make dired work.
+
+Long range:
+
+   Ideas for extending GNU Emacs to deal with arbitrary character sets.
+
+I would like GNU Emacs to be extended to handle all the world's alphabets
+and word signs.  I don't expect to have time to do such a thing in the next
+few years, so here are my ideas on the best way to do it.
+
+* Each graphic is represented by a sequence of ordinary 8-bit characters.
+
+* All the characters that make up such a sequence have codes >= 0200.
+
+* The first character of such a sequence is between 0200 and 0237.
+
+* The remaining characters of such a sequence are all 0240 or higher.
+
+* The first character of the sequence determines the number of characters
+in the sequence.  Thus, 0200...0207 could start two-character sequences,
+0210...0227 could start three-character sequences, and 0230 could start
+four-character sequences.  (Codes 0231...0237 would be reserved.)
+
+*  Several common  alphabets,  and  some mathematical   symbols,  would get
+two-character sequences.  (Probably Greek,  Russian,  Hebrew(?), Arabic(?),
+Korean, and Japanese kana).  The remaining alphabets, and  some versions of
+Chinese,  would   get  three-character sequences.    Other  sets of Chinese
+characters would get four-character sequences.
+
+Each country that uses Chinese characters has its own standard character
+set, and it is not easy to correlate them to avoid overlap.  So there may
+need to be several sets of Chinese characters.  That is why they need so
+much code space.
+
+True support for Hebrew and Arabic requires dealing with the problem of
+writing direction for mixed text; I don't know what to do for that.
+
+* The functions that use syntax table would determine the
+syntax of a sequence from its first character.
+
+* Functions in indent.c for computing widths and columns would
+determine the width of a sequence from its first character.
+So would display routines.
+
+* Only a few other editing routines would need any change.  In
+particular, searching and regexp matching might not need any change.
+
+* Most of the work required would be in redisplay.  The only case that
+needs to be supported is with X windows, since ordinary terminals
+can't display all these characters anyway.
+
+* There might need to be code to translate files from this format
+to whatever format is typically stored on disk.
+
+
+I would be very unhappy with half-measures, such as support for
+Japanese only.
+
author	Eric S. Raymond <esr@snark.thyrsus.com>	1993-03-22 03:00:23 +0000
committer	Eric S. Raymond <esr@snark.thyrsus.com>	1993-03-22 03:00:23 +0000
commit	6c59ed5680001a63b13f3a3f1a2eac0cc254f760 (patch)
tree	d80f3e429cfdd41e3db8e016fd7db3b932cbf564 /etc
parent	e91a241cc39ced76fc2ff3fa7cea95df9de3dd79 (diff)
download	emacs-6c59ed5680001a63b13f3a3f1a2eac0cc254f760.tar.gz