*** empty log message ***

author: Richard M. Stallman <rms@gnu.org> 1998-05-19 03:45:57 +0000
committer: Richard M. Stallman <rms@gnu.org> 1998-05-19 03:45:57 +0000
commit: a9f0a989a17f47f9d25b7a426b4e82a8ff684ee4
tree: d62b5592064177c684f1509989b223623db3f24c /lispref/nonascii.texi
parent: c6d6572475603083762cb0155ae966de7710bb9c
1 files changed, 541 insertions, 229 deletions
diff --git a/lispref/nonascii.texi b/lispref/nonascii.texi
index f75900d6818..ac7c9ed9c43 100644
--- a/lispref/nonascii.texi
+++ b/lispref/nonascii.texi
@@ -17,15 +17,12 @@ characters and how they are stored in strings and buffers.
 * Selecting a Representation::
 * Character Codes::
 * Character Sets::
-* Scanning Charsets::
 * Chars and Bytes::
+* Splitting Characters::
+* Scanning Charsets::
+* Translation of Characters::
 * Coding Systems::
-* Lisp and Coding System::
-* Default Coding Systems::
-* Specifying Coding Systems::
-* Explicit Encoding::
-* MS-DOS File Types::
-* MS-DOS Subprocesses::
+* Input Methods::
 @end menu
 
 @node Text Representations
@@ -53,19 +50,17 @@ character set by setting the variable @code{nonascii-insert-offset}).
 byte, and as a result, the full range of Emacs character codes can be
 stored.  The first byte of a multibyte character is always in the range
 128 through 159 (octal 0200 through 0237).  These values are called
-@dfn{leading codes}.  The first byte determines which character set the
-character belongs to (@pxref{Character Sets}); in particular, it
-determines how many bytes long the sequence is.  The second and
-subsequent bytes of a multibyte character are always in the range 160
-through 255 (octal 0240 through 0377).
+@dfn{leading codes}.  The second and subsequent bytes of a multibyte
+character are always in the range 160 through 255 (octal 0240 through
+0377).
 
   In a buffer, the buffer-local value of the variable
 @code{enable-multibyte-characters} specifies the representation used.
 The representation for a string is determined based on the string
 contents when the string is constructed.
 
-@tindex enable-multibyte-characters
 @defvar enable-multibyte-characters
+@tindex enable-multibyte-characters
 This variable specifies the current buffer's text representation.
 If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
 it contains unibyte text.
@@ -74,20 +69,21 @@ You cannot set this variable directly; instead, use the function
 @code{set-buffer-multibyte} to change a buffer's representation.
 @end defvar
 
-@tindex default-enable-multibyte-characters
 @defvar default-enable-multibyte-characters
-This variable`s value is entirely equivalent to @code{(default-value
+@tindex default-enable-multibyte-characters
+This variable's value is entirely equivalent to @code{(default-value
 'enable-multibyte-characters)}, and setting this variable changes that
-default value.  Although setting the local binding of
-@code{enable-multibyte-characters} in a specific buffer is dangerous,
-changing the default value is safe, and it is a reasonable thing to do.
+default value.  Setting the local binding of
+@code{enable-multibyte-characters} in a specific buffer is not allowed,
+but changing the default value is supported, and it is a reasonable
+thing to do, because it has no effect on existing buffers.
 
 The @samp{--unibyte} command line option does its job by setting the
 default value to @code{nil} early in startup.
 @end defvar
 
-@tindex multibyte-string-p
 @defun multibyte-string-p string
+@tindex multibyte-string-p
 Return @code{t} if @var{string} contains multibyte characters.
 @end defun
 
@@ -120,11 +116,12 @@ user that cannot be overridden automatically.
 unchanged, and likewise 128 through 159.  It converts the non-@sc{ASCII}
 codes 160 through 255 by adding the value @code{nonascii-insert-offset}
 to each character code.  By setting this variable, you specify which
-character set the unibyte characters correspond to.  For example, if
-@code{nonascii-insert-offset} is 2048, which is @code{(- (make-char
-'latin-iso8859-1 0) 128)}, then the unibyte non-@sc{ASCII} characters
-correspond to Latin 1.  If it is 2688, which is @code{(- (make-char
-'greek-iso8859-7 0) 128)}, then they correspond to Greek letters.
+character set the unibyte characters correspond to (@pxref{Character
+Sets}).  For example, if @code{nonascii-insert-offset} is 2048, which is
+@code{(- (make-char 'latin-iso8859-1) 128)}, then the unibyte
+non-@sc{ASCII} characters correspond to Latin 1.  If it is 2688, which
+is @code{(- (make-char 'greek-iso8859-7) 128)}, then they correspond to
+Greek letters.
 
   Converting multibyte text to unibyte is simpler: it performs
 logical-and of each character code with 255.  If
@@ -133,21 +130,22 @@ the beginning of some character set, this conversion is the inverse of
 the other: converting unibyte text to multibyte and back to unibyte
 reproduces the original unibyte text.
 
-@tindex nonascii-insert-offset
 @defvar nonascii-insert-offset
+@tindex nonascii-insert-offset
 This variable specifies the amount to add to a non-@sc{ASCII} character
 when converting unibyte text to multibyte.  It also applies when
-@code{insert-char} or @code{self-insert-command} inserts a character in
-the unibyte non-@sc{ASCII} range, 128 through 255.
+@code{self-insert-command} inserts a character in the unibyte
+non-@sc{ASCII} range, 128 through 255.  However, the function
+@code{insert-char} does not perform this conversion.
 
 The right value to use to select character set @var{cs} is @code{(-
-(make-char @var{cs} 0) 128)}.  If the value of
+(make-char @var{cs}) 128)}.  If the value of
 @code{nonascii-insert-offset} is zero, then conversion actually uses the
 value for the Latin 1 character set, rather than zero.
 @end defvar
 
-@tindex nonascii-translate-table
-@defvar nonascii-translate-table
+@defvar nonascii-translation-table
+@tindex nonascii-translation-table
 This variable provides a more general alternative to
 @code{nonascii-insert-offset}.  You can use it to specify independently
 how to translate each code in the range of 128 through 255 into a
@@ -155,15 +153,15 @@ multibyte character.  The value should be a vector, or @code{nil}.
 If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
 @end defvar
 
-@tindex string-make-unibyte
 @defun string-make-unibyte string
+@tindex string-make-unibyte
 This function converts the text of @var{string} to unibyte
 representation, if it isn't already, and return the result.  If
 @var{string} is a unibyte string, it is returned unchanged.
 @end defun
 
-@tindex string-make-multibyte
 @defun string-make-multibyte string
+@tindex string-make-multibyte
 This function converts the text of @var{string} to multibyte
 representation, if it isn't already, and return the result.  If
 @var{string} is a multibyte string, it is returned unchanged.
@@ -175,8 +173,8 @@ representation, if it isn't already, and return the result.  If
   Sometimes it is useful to examine an existing buffer or string as
 multibyte when it was unibyte, or vice versa.
 
-@tindex set-buffer-multibyte
 @defun set-buffer-multibyte multibyte
+@tindex set-buffer-multibyte
 Set the representation type of the current buffer.  If @var{multibyte}
 is non-@code{nil}, the buffer becomes multibyte.  If @var{multibyte}
 is @code{nil}, the buffer becomes unibyte.
@@ -193,8 +191,8 @@ representation is in use.  It also adjusts various data in the buffer
 same text as they did before.
 @end defun
 
-@tindex string-as-unibyte
 @defun string-as-unibyte string
+@tindex string-as-unibyte
 This function returns a string with the same bytes as @var{string} but
 treating each byte as a character.  This means that the value may have
 more characters than @var{string} has.
@@ -203,8 +201,8 @@ If @var{string} is unibyte already, then the value is @var{string}
 itself.
 @end defun
 
-@tindex string-as-multibyte
 @defun string-as-multibyte string
+@tindex string-as-multibyte
 This function returns a string with the same bytes as @var{string} but
 treating each multibyte sequence as one character.  This means that the
 value may have fewer characters than @var{string} has.
@@ -253,88 +251,93 @@ example, @code{latin-iso8859-1} is one character set,
 @code{greek-iso8859-7} is another, and @code{ascii} is another.  An
 Emacs character set can hold at most 9025 characters; therefore, in some
 cases, characters that would logically be grouped together are split
-into several character sets.  For example, one set of Chinese characters
-is divided into eight Emacs character sets, @code{chinese-cns11643-1}
-through @code{chinese-cns11643-7}.
+into several character sets.  For example, one set of Chinese
+characters, generally known as Big 5, is divided into two Emacs
+character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.
 
-@tindex charsetp
 @defun charsetp object
+@tindex charsetp
 Return @code{t} if @var{object} is a character set name symbol,
 @code{nil} otherwise.
 @end defun
 
-@tindex charset-list
 @defun charset-list
+@tindex charset-list
 This function returns a list of all defined character set names.
 @end defun
 
-@tindex char-charset
 @defun char-charset character
-This function returns the the name of the character
+@tindex char-charset
+This function returns the name of the character
 set that @var{character} belongs to.
 @end defun
 
-@node Scanning Charsets
-@section Scanning for Character Sets
-
-  Sometimes it is useful to find out which character sets appear in a
-part of a buffer or a string.  One use for this is in determining which
-coding systems (@pxref{Coding Systems}) are capable of representing all
-of the text in question.
-
-@tindex find-charset-region
-@defun find-charset-region beg end &optional unification
-This function returns a list of the character sets
-that appear in the current buffer between positions @var{beg}
-and @var{end}.
-@end defun
-
-@tindex find-charset-string
-@defun find-charset-string string &optional unification
-This function returns a list of the character sets
-that appear in the string @var{string}.
-@end defun
-
 @node Chars and Bytes
 @section Characters and Bytes
 @cindex bytes and characters
 
+@cindex introduction sequence
+@cindex dimension (of character set)
   In multibyte representation, each character occupies one or more
-bytes.  The functions in this section convert between characters and the
-byte values used to represent them.  For most purposes, there is no need
-to be concerned with the number of bytes used to represent a character
+bytes.  Each character set has an @dfn{introduction sequence}, which is
+one or two bytes long.  The introduction sequence is the beginning of
+the byte sequence for any character in the character set.  (The
+@sc{ASCII} character set has a zero-length introduction sequence.)  The
+rest of the character's bytes distinguish it from the other characters
+in the same character set.  Depending on the character set, there are
+either one or two distinguishing bytes; the number of such bytes is
+called the @dfn{dimension} of the character set.
+
+@defun charset-dimension charset
+@tindex charset-dimension
+This function returns the dimension of @var{charset};
+at present, the dimension is always 1 or 2.
+@end defun
+
+  This is the simplest way to determine the byte length of a character
+set's introduction sequence:
+
+@example
+(- (char-bytes (make-char @var{charset}))
+   (charset-dimension @var{charset}))
+@end example
+
+@node Splitting Characters
+@section Splitting Characters
+
+  The functions in this section convert between characters and the byte
+values used to represent them.  For most purposes, there is no need to
+be concerned with the sequence of bytes used to represent a character,
 because Emacs translates automatically when necessary.
 
-@tindex char-bytes
 @defun char-bytes character
+@tindex char-bytes
 This function returns the number of bytes used to represent the
-character @var{character}.  In most cases, this is the same as
-@code{(length (split-char @var{character}))}; the only exception is for
-ASCII characters and the codes used in unibyte text, which use just one
-byte.
+character @var{character}.  This depends only on the character set that
+@var{character} belongs to; it equals the dimension of that character
+set (@pxref{Character Sets}), plus the length of its introduction
+sequence.
 
 @example
 (char-bytes 2248)
      @result{} 2
 (char-bytes 65)
      @result{} 1
-@end example
-
-This function's values are correct for both multibyte and unibyte
-representations, because the non-@sc{ASCII} character codes used in
-those two representations do not overlap.
-
-@example
 (char-bytes 192)
      @result{} 1
 @end example
+
+The reason this function can give correct results for both multibyte and
+unibyte representations is that the non-@sc{ASCII} character codes used
+in those two representations do not overlap.
 @end defun
 
-@tindex split-char
 @defun split-char character
+@tindex split-char
 Return a list containing the name of the character set of
-@var{character}, followed by one or two byte-values which identify
-@var{character} within that character set.
+@var{character}, followed by one or two byte values (integers) which
+identify @var{character} within that character set.  The number of byte
+values is the character set's dimension.
 
 @example
 (split-char 2248)
@@ -352,11 +355,13 @@ the @code{ascii} character set:
 @end example
 @end defun
 
-@tindex make-char
 @defun make-char charset &rest byte-values
-Thus function returns the character in character set @var{charset}
-identified by @var{byte-values}.  This is roughly the opposite of
-split-char.
+@tindex make-char
+This function returns the character in character set @var{charset}
+identified by @var{byte-values}.  This is roughly the inverse of
+@code{split-char}.  Normally, you should specify either one or two
+@var{byte-values}, according to the dimension of @var{charset}.  For
+example,
 
 @example
 (make-char 'latin-iso8859-1 72)
@@ -364,6 +369,105 @@ split-char.
 @end example
 @end defun
 
+@cindex generic characters
+  If you call @code{make-char} with no @var{byte-values}, the result is
+a @dfn{generic character} which stands for @var{charset}.  A generic
+character is an integer, but it is @emph{not} valid for insertion in the
+buffer as a character.  It can be used in @code{char-table-range} to
+refer to the whole character set (@pxref{Char-Tables}).
+@code{char-valid-p} returns @code{nil} for generic characters.
+For example:
+
+@example
+(make-char 'latin-iso8859-1)
+     @result{} 2176
+(char-valid-p 2176)
+     @result{} nil
+(split-char 2176)
+     @result{} (latin-iso8859-1 0)
+@end example
+
+@node Scanning Charsets
+@section Scanning for Character Sets
+
+  Sometimes it is useful to find out which character sets appear in a
+part of a buffer or a string.  One use for this is in determining which
+coding systems (@pxref{Coding Systems}) are capable of representing all
+of the text in question.
+
+@defun find-charset-region beg end &optional translation
+@tindex find-charset-region
+This function returns a list of the character sets that appear in the
+current buffer between positions @var{beg} and @var{end}.
+
+The optional argument @var{translation} specifies a translation table to
+be used in scanning the text (@pxref{Translation of Characters}).  If it
+is non-@code{nil}, then each character in the region is translated
+through this table, and the value returned describes the translated
+characters instead of the characters actually in the buffer.
+@end defun
+
+@defun find-charset-string string &optional translation
+@tindex find-charset-string
+This function returns a list of the character sets
+that appear in the string @var{string}.
+
+The optional argument @var{translation} specifies a
+translation table; see @code{find-charset-region}, above.
+@end defun
+
+@node Translation of Characters
+@section Translation of Characters
+@cindex character translation tables
+@cindex translation tables
+
+  A @dfn{translation table} specifies a mapping of characters
+into characters.  These tables are used in encoding and decoding, and
+for other purposes.  Some coding systems specify their own particular
+translation tables; there are also default translation tables which
+apply to all other coding systems.
+
+@defun make-translation-table translations
+This function returns a translation table based on the arguments
+@var{translations}.  Each argument---each element of
+@var{translations}---should be a list of the form @code{(@var{from}
+. @var{to})}; this says to translate the character @var{from} into
+@var{to}.
+
+You can also map one whole character set into another character set with
+the same dimension.  To do this, you specify a generic character (which
+designates a character set) for @var{from} (@pxref{Splitting Characters}).
+In this case, @var{to} should also be a generic character, for another
+character set of the same dimension.  Then the translation table
+translates each character of @var{from}'s character set into the
+corresponding character of @var{to}'s character set.
+@end defun
+
+  In decoding, the translation table's translations are applied to the
+characters that result from ordinary decoding.  If a coding system has
+property @code{character-translation-table-for-decode}, that specifies
+the translation table to use.  Otherwise, if
+@code{standard-character-translation-table-for-decode} is
+non-@code{nil}, decoding uses that table.
+
+  In encoding, the translation table's translations are applied to the
+characters in the buffer, and the result of translation is actually
+encoded.  If a coding system has property
+@code{character-translation-table-for-encode}, that specifies the
+translation table to use.  Otherwise the variable
+@code{standard-character-translation-table-for-encode} specifies the
+translation table.
+
+@defvar standard-character-translation-table-for-decode
+This is the default translation table for decoding, for
+coding systems that don't specify any other translation table.
+@end defvar
+
+@defvar standard-character-translation-table-for-encode
+This is the default translation table for encoding, for
+coding systems that don't specify any other translation table.
+@end defvar
+
 @node Coding Systems
 @section Coding Systems
 
@@ -373,6 +477,20 @@ subprocess or receives text from a subprocess, it normally performs
 character code conversion and end-of-line conversion as specified
 by a particular @dfn{coding system}.
 
+@menu
+* Coding System Basics::
+* Encoding and I/O::
+* Lisp and Coding Systems::
+* Default Coding Systems::
+* Specifying Coding Systems::
+* Explicit Encoding::
+* Terminal I/O Encoding::
+* MS-DOS File Types::
+@end menu
+
+@node Coding System Basics
+@subsection Basic Concepts of Coding Systems
+
 @cindex character code conversion
   @dfn{Character code conversion} involves conversion between the encoding
 used inside Emacs and some other encoding.  Emacs supports many
@@ -401,129 +519,219 @@ carriage-return.
 conversion unspecified, to be chosen based on the data.  @dfn{Variant
 coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
 @code{latin-1-mac} specify the end-of-line conversion explicitly as
-well.  Each base coding system has three corresponding variants whose
+well.  Most base coding systems have three corresponding variants whose
 names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
 
+  The coding system @code{raw-text} is special in that it prevents
+character code conversion, and causes the buffer visited with that
+coding system to be a unibyte buffer.  It does not specify the
+end-of-line conversion, allowing that to be determined as usual by the
+data, and has the usual three variants which specify the end-of-line
+conversion.  @code{no-conversion} is equivalent to @code{raw-text-unix}:
+it specifies no conversion of either character codes or end-of-line.
+
+  The coding system @code{emacs-mule} specifies that the data is
+represented in the internal Emacs encoding.  This is like
+@code{raw-text} in that no code conversion happens, but different in
+that the result is multibyte data.
+
+@defun coding-system-get coding-system property
+@tindex coding-system-get
+This function returns the specified property of the coding system
+@var{coding-system}.  Most coding system properties exist for internal
+purposes, but one that you might find useful is @code{mime-charset}.
+That property's value is the name used in MIME for the character coding
+which this coding system can read and write.  Examples:
+
+@example
+(coding-system-get 'iso-latin-1 'mime-charset)
+     @result{} iso-8859-1
+(coding-system-get 'iso-2022-cn 'mime-charset)
+     @result{} iso-2022-cn
+(coding-system-get 'cyrillic-koi8 'mime-charset)
+     @result{} koi8-r
+@end example
+
+The value of the @code{mime-charset} property is also defined
+as an alias for the coding system.
+@end defun
+
+@node Encoding and I/O
+@subsection Encoding and I/O
+
+  The principal purpose of coding systems is for use in reading and
+writing files.  The function @code{insert-file-contents} uses
+a coding system for decoding the file data, and @code{write-region}
+uses one to encode the buffer contents.
+
+  You can specify the coding system to use either explicitly
+(@pxref{Specifying Coding Systems}), or implicitly using the defaulting
+mechanism (@pxref{Default Coding Systems}).  But these methods may not
+completely specify what to do.  For example, they may choose a coding
+system such as @code{undecided} which leaves the character code
+conversion to be determined from the data.  In these cases, the I/O
+operation finishes the job of choosing a coding system.  Very often
+you will want to find out afterwards which coding system was chosen.
+
+@defvar buffer-file-coding-system
+@tindex buffer-file-coding-system
+This variable records the coding system that was used for visiting the
+current buffer.  It is used for saving the buffer, and for writing part
+of the buffer with @code{write-region}.  When those operations ask the
+user to specify a different coding system,
+@code{buffer-file-coding-system} is updated to the coding system
+specified.
+@end defvar
+
+@defvar save-buffer-coding-system
+@tindex save-buffer-coding-system
+This variable specifies the coding system for saving the buffer---but it
+is not used for @code{write-region}.  When saving the buffer asks the
+user to specify a different coding system, and
+@code{save-buffer-coding-system} was used, then it is updated to the
+coding system that was specified.
+@end defvar
+
+@defvar last-coding-system-used
+@tindex last-coding-system-used
+I/O operations for files and subprocesses set this variable to the
+coding system name that was used.  The explicit encoding and decoding
+functions (@pxref{Explicit Encoding}) set it too.
+
+@strong{Warning:} Since receiving subprocess output sets this variable,
+it can change whenever Emacs waits; therefore, you should copy the
+value shortly after the function call which stores the value you are
+interested in.
+@end defvar
+
 @node Lisp and Coding Systems
 @subsection Coding Systems in Lisp
 
   Here are Lisp facilities for working with coding systems;
 
-@tindex coding-system-list
 @defun coding-system-list &optional base-only
+@tindex coding-system-list
 This function returns a list of all coding system names (symbols).  If
 @var{base-only} is non-@code{nil}, the value includes only the
 base coding systems.  Otherwise, it includes variant coding systems as well.
 @end defun
 
-@tindex coding-system-p
 @defun coding-system-p object
+@tindex coding-system-p
 This function returns @code{t} if @var{object} is a coding system
 name.
 @end defun
 
-@tindex check-coding-system
 @defun check-coding-system coding-system
+@tindex check-coding-system
 This function checks the validity of @var{coding-system}.
 If that is valid, it returns @var{coding-system}.
 Otherwise it signals an error with condition @code{coding-system-error}.
 @end defun
 
-@tindex find-safe-coding-system
-@defun find-safe-coding-system from to
-Return a list of proper coding systems to encode a text between
-@var{from} and @var{to}.  All coding systems in the list can safely
-encode any multibyte characters in the text.
+@defun coding-system-change-eol-conversion coding-system eol-type
+@tindex coding-system-change-eol-conversion
+This function returns a coding system which is like @var{coding-system}
+except for its eol conversion, which is specified by @var{eol-type}.
+@var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or
+@code{nil}.  If it is @code{nil}, the returned coding system determines
+the end-of-line conversion from the data.
+@end defun
 
-If the text contains no multibyte characters, return a list of a single
-element @code{undecided}.
+@defun coding-system-change-text-conversion eol-coding text-coding
+@tindex coding-system-change-text-conversion
+This function returns a coding system which uses the end-of-line
+conversion of @var{eol-coding}, and the text conversion of
+@var{text-coding}.  If @var{text-coding} is @code{nil}, it returns
+@code{undecided}, or one of its variants according to @var{eol-coding}.
 @end defun
 
+@defun find-coding-systems-region from to
+@tindex find-coding-systems-region
+This function returns a list of coding systems that could be used to
+encode a text between @var{from} and @var{to}.  All coding systems in
+the list can safely encode any multibyte characters in that portion of
+the text.
+
+If the text contains no multibyte characters, the function returns the
+list @code{(undecided)}.
+@end defun
+
+@defun find-coding-systems-string string
+@tindex find-coding-systems-string
+This function returns a list of coding systems that could be used to
+encode the text of @var{string}.  All coding systems in the list can
+safely encode any multibyte characters in @var{string}.  If the text
+contains no multibyte characters, this returns the list
+@code{(undecided)}.
+@end defun
+
+@defun find-coding-systems-for-charsets charsets
+@tindex find-coding-systems-for-charsets
+This function returns a list of coding systems that could be used to
+encode all the character sets in the list @var{charsets}.
+@end defun
+
+@defun detect-coding-region start end &optional highest
 @tindex detect-coding-region
-@defun detect-coding-region start end highest
 This function chooses a plausible coding system for decoding the text
 from @var{start} to @var{end}.  This text should be ``raw bytes''
 (@pxref{Explicit Encoding}).
 
-Normally this function returns is a list of coding systems that could
+Normally this function returns a list of coding systems that could
 handle decoding the text that was scanned.  They are listed in order of
-decreasing priority, based on the priority specified by the user with
-@code{prefer-coding-system}.  But if @var{highest} is non-@code{nil},
-then the return value is just one coding system, the one that is highest
-in priority.
+decreasing priority.  But if @var{highest} is non-@code{nil}, then the
+return value is just one coding system, the one that is highest in
+priority.
+
+If the region contains only @sc{ASCII} characters, the value
+is @code{undecided} or @code{(undecided)}.
 @end defun
 
-@tindex detect-coding-string string highest
-@defun detect-coding-string
+@defun detect-coding-string string highest
+@tindex detect-coding-string
 This function is like @code{detect-coding-region} except that it
 operates on the contents of @var{string} instead of bytes in the buffer.
 @end defun
 
-@defun find-operation-coding-system operation &rest arguments
-This function returns the coding system to use (by default) for
-performing @var{operation} with @var{arguments}.  The value has this
-form:
-
-@example
-(@var{decoding-system} @var{encoding-system})
-@end example
-
-The first element, @var{decoding-system}, is the coding system to use
-for decoding (in case @var{operation} does decoding), and
-@var{encoding-system} is the coding system for encoding (in case
-@var{operation} does encoding).
-
-The argument @var{operation} should be an Emacs I/O primitive:
-@code{insert-file-contents}, @code{write-region}, @code{call-process},
-@code{call-process-region}, @code{start-process}, or
-@code{open-network-stream}.
-
-The remaining arguments should be the same arguments that might be given
-to that I/O primitive.  Depending on which primitive, one of those
-arguments is selected as the @dfn{target}.  For example, if
-@var{operation} does file I/O, whichever argument specifies the file
-name is the target.  For subprocess primitives, the process name is the
-target.  For @code{open-network-stream}, the target is the service name
-or port number.
-
-This function looks up the target in @code{file-coding-system-alist},
-@code{process-coding-system-alist}, or
-@code{network-coding-system-alist}, depending on @var{operation}.
-@xref{Default Coding Systems}.
-@end defun
-
   Here are two functions you can use to let the user specify a coding
 system, with completion.  @xref{Completion}.
 
+@defun read-coding-system prompt &optional default
 @tindex read-coding-system
-@defun read-coding-system prompt default
 This function reads a coding system using the minibuffer, prompting with
 string @var{prompt}, and returns the coding system name as a symbol.  If
 the user enters null input, @var{default} specifies which coding system
 to return.  It should be a symbol or a string.
 @end defun
 
-@tindex read-non-nil-coding-system
 @defun read-non-nil-coding-system prompt
+@tindex read-non-nil-coding-system
 This function reads a coding system using the minibuffer, prompting with
-string @var{prompt},and returns the coding system name as a symbol.  If
+string @var{prompt}, and returns the coding system name as a symbol.  If
 the user tries to enter null input, it asks the user to try again.
 @xref{Coding Systems}.
 @end defun
 
+  @xref{Process Information}, for how to examine or set the coding
+systems used for I/O to a subprocess.
+
 @node Default Coding Systems
-@section Default Coding Systems
+@subsection Default Coding Systems
 
-  These variable specify which coding system to use by default for
-certain files or when running certain subprograms.  The idea of these
-variables is that you set them once and for all to the defaults you
-want, and then do not change them again.  To specify a particular coding
-system for a particular operation in a Lisp program, don't change these
-variables; instead, override them using @code{coding-system-for-read}
-and @code{coding-system-for-write} (@pxref{Specifying Coding Systems}).
+  This section describes variables that specify the default coding
+system for certain files or when running certain subprograms, and the
+function which I/O operations use to access them.
+
+  The idea of these variables is that you set them once and for all to the
+defaults you want, and then do not change them again.  To specify a
+particular coding system for a particular operation in a Lisp program,
+don't change these variables; instead, override them using
+@code{coding-system-for-read} and @code{coding-system-for-write}
+(@pxref{Specifying Coding Systems}).
 
-@tindex file-coding-system-alist
 @defvar file-coding-system-alist
+@tindex file-coding-system-alist
 This variable is an alist that specifies the coding systems to use for
 reading and writing particular files.  Each element has the form
 @code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
@@ -542,8 +750,8 @@ system or a cons cell containing two coding systems.  This value is used
 as described above.
 @end defvar
 
-@tindex process-coding-system-alist
 @defvar process-coding-system-alist
+@tindex process-coding-system-alist
 This variable is an alist specifying which coding systems to use for a
 subprocess, depending on which program is running in the subprocess.  It
 works like @code{file-coding-system-alist}, except that @var{pattern} is
@@ -553,8 +761,21 @@ coding systems used for I/O to the subprocess, but you can specify
 other coding systems later using @code{set-process-coding-system}.
 @end defvar
 
-@tindex network-coding-system-alist
+  @strong{Warning:} Coding systems such as @code{undecided} which
+determine the coding system from the data do not work entirely reliably
+with asynchronous subprocess output.  This is because Emacs processes
+asynchronous subprocess output in batches, as it arrives.  If the coding
+system leaves the character code conversion unspecified, or leaves the
+end-of-line conversion unspecified, Emacs must try to detect the proper
+conversion from one batch at a time, and this does not always work.
+
+  Therefore, with an asynchronous subprocess, if at all possible, use a
+coding system which determines both the character code conversion and
+the end of line conversion---that is, one like @code{latin-1-unix},
+rather than @code{undecided} or @code{latin-1}.
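+
+  A minimal sketch of that advice follows; the process name, buffer
+name and program are placeholders chosen for illustration, not part of
+any Emacs interface:
+
+@example
+(let ((proc (start-process "listing" "*listing*" "ls" "-l")))
+  ;; Decode the output and encode the input with a coding system
+  ;; that fixes both the character code and end-of-line conversion.
+  (set-process-coding-system proc 'latin-1-unix 'latin-1-unix))
+@end example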
+
 @defvar network-coding-system-alist
+@tindex network-coding-system-alist
 This variable is an alist that specifies the coding system to use for
 network streams.  It works much like @code{file-coding-system-alist},
 with the difference that the @var{pattern} in an element may be either a
@@ -563,26 +784,60 @@ is matched against the network service name used to open the network
 stream.
 @end defvar
 
-@tindex default-process-coding-system
 @defvar default-process-coding-system
+@tindex default-process-coding-system
 This variable specifies the coding systems to use for subprocess (and
 network stream) input and output, when nothing else specifies what to
 do.
 
-The value should be a cons cell of the form @code{(@var{output-coding}
-. @var{input-coding})}.  Here @var{output-coding} applies to output to
-the subprocess, and @var{input-coding} applies to input from it.
+The value should be a cons cell of the form @code{(@var{input-coding}
+. @var{output-coding})}.  Here @var{input-coding} applies to input from
+the subprocess, and @var{output-coding} applies to output to it.
 @end defvar
 
+@defun find-operation-coding-system operation &rest arguments
+@tindex find-operation-coding-system
+This function returns the coding system to use (by default) for
+performing @var{operation} with @var{arguments}.  The value has this
+form:
+
+@example
+(@var{decoding-system} . @var{encoding-system})
+@end example
+
+The first element, @var{decoding-system}, is the coding system to use
+for decoding (in case @var{operation} does decoding), and
+@var{encoding-system} is the coding system for encoding (in case
+@var{operation} does encoding).
+
+The argument @var{operation} should be an Emacs I/O primitive:
+@code{insert-file-contents}, @code{write-region}, @code{call-process},
+@code{call-process-region}, @code{start-process}, or
+@code{open-network-stream}.
+
+The remaining arguments should be the same arguments that might be given
+to that I/O primitive.  Depending on which primitive, one of those
+arguments is selected as the @dfn{target}.  For example, if
+@var{operation} does file I/O, whichever argument specifies the file
+name is the target.  For subprocess primitives, the process name is the
+target.  For @code{open-network-stream}, the target is the service name
+or port number.
+
+This function looks up the target in @code{file-coding-system-alist},
+@code{process-coding-system-alist}, or
+@code{network-coding-system-alist}, depending on @var{operation}.
+@xref{Default Coding Systems}.
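+
+As a minimal sketch of how a program might call this function (the
+file name and host name below are made up for illustration):
+
+@example
+;; Look up the default coding systems for reading a file;
+;; the file name is the target.
+(find-operation-coding-system 'insert-file-contents "/tmp/notes.txt")
+
+;; Look up the defaults for a network stream; the target here
+;; is the service name, "smtp".
+(find-operation-coding-system
+ 'open-network-stream "mail" nil "mail.example.org" "smtp")
+@end example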
+@end defun
+
 @node Specifying Coding Systems
-@section Specifying a Coding System for One Operation
+@subsection Specifying a Coding System for One Operation
 
   You can specify the coding system for a specific operation by binding
 the variables @code{coding-system-for-read} and/or
 @code{coding-system-for-write}.
 
-@tindex coding-system-for-read
 @defvar coding-system-for-read
+@tindex coding-system-for-read
 If this variable is non-@code{nil}, it specifies the coding system to
 use for reading a file, or for input from a synchronous subprocess.
 
@@ -605,14 +860,14 @@ of the right way to use the variable:
 @end example
 
 When its value is non-@code{nil}, @code{coding-system-for-read} takes
-precedence all other methods of specifying a coding system to use for
+precedence over all other methods of specifying a coding system to use for
 input, including @code{file-coding-system-alist},
 @code{process-coding-system-alist} and
 @code{network-coding-system-alist}.
 @end defvar
 
-@tindex coding-system-for-write
 @defvar coding-system-for-write
+@tindex coding-system-for-write
 This works much like @code{coding-system-for-read}, except that it
 applies to output rather than input.  It affects writing to files,
 subprocesses, and net connections.
@@ -623,53 +878,16 @@ When a single operation does both input and output, as do
 affect it.
 @end defvar
 
-@tindex last-coding-system-used
-@defvar last-coding-system-used
-All I/O operations that use a coding system set this variable
-to the coding system name that was used.
-@end defvar
-
-@tindex inhibit-eol-conversion
 @defvar inhibit-eol-conversion
+@tindex inhibit-eol-conversion
 When this variable is non-@code{nil}, no end-of-line conversion is done,
 no matter which coding system is specified.  This applies to all the
 Emacs I/O and subprocess primitives, and to the explicit encoding and
 decoding functions (@pxref{Explicit Encoding}).
 @end defvar
 
-@tindex keyboard-coding-system
-@defun keyboard-coding-system
-This function returns the coding system that is in use for decoding
-keyboard input---or @code{nil} if no coding system is to be used.
-@end defun
-
-@tindex set-keyboard-coding-system
-@defun set-keyboard-coding-system coding-system
-This function specifies @var{coding-system} as the coding system to
-use for decoding keyboard input.  If @var{coding-system} is @code{nil},
-that means do not decode keyboard input.
-@end defun
-
-@tindex terminal-coding-system
-@defun terminal-coding-system
-This function returns the coding system that is in use for encoding
-terminal output---or @code{nil} for no encoding.
-@end defun
-
-@tindex set-terminal-coding-system
-@defun set-terminal-coding-system coding-system
-This function specifies @var{coding-system} as the coding system to use
-for encoding terminal output.  If @var{coding-system} is @code{nil},
-that means do not encode terminal output.
-@end defun
-
-  See also the functions @code{process-coding-system} and
-@code{set-process-coding-system}.  @xref{Process Information}.
-
-  See also @code{read-coding-system} in @ref{High-Level Completion}.
-
 @node Explicit Encoding
-@section Explicit Encoding and Decoding
+@subsection Explicit Encoding and Decoding
 @cindex encoding text
 @cindex decoding text
 
@@ -699,39 +917,72 @@ write them with @code{write-region} (@pxref{Writing to Files}), and
 suppress encoding for that @code{write-region} call by binding
 @code{coding-system-for-write} to @code{no-conversion}.
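+
+  A minimal sketch of that last step, assuming the current buffer
+already holds the encoded bytes; the output file name is hypothetical:
+
+@example
+;; Write the buffer contents as they are, with no further encoding.
+(let ((coding-system-for-write 'no-conversion))
+  (write-region (point-min) (point-max) "/tmp/raw-bytes.out"))
+@end example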
 
-@tindex encode-coding-region
 @defun encode-coding-region start end coding-system
+@tindex encode-coding-region
 This function encodes the text from @var{start} to @var{end} according
 to coding system @var{coding-system}.  The encoded text replaces the
 original text in the buffer.  The result of encoding is ``raw bytes,''
 but the buffer remains multibyte if it was multibyte before.
 @end defun
 
-@tindex encode-coding-string
 @defun encode-coding-string string coding-system
+@tindex encode-coding-string
 This function encodes the text in @var{string} according to coding
 system @var{coding-system}.  It returns a new string containing the
 encoded text.  The result of encoding is a unibyte string of ``raw bytes.''
 @end defun
 
-@tindex decode-coding-region
 @defun decode-coding-region start end coding-system
+@tindex decode-coding-region
 This function decodes the text from @var{start} to @var{end} according
 to coding system @var{coding-system}.  The decoded text replaces the
 original text in the buffer.  To make explicit decoding useful, the text
 before decoding ought to be ``raw bytes.''
 @end defun
 
-@tindex decode-coding-string
 @defun decode-coding-string string coding-system
+@tindex decode-coding-string
 This function decodes the text in @var{string} according to coding
 system @var{coding-system}.  It returns a new string containing the
 decoded text.  To make explicit decoding useful, the contents of
 @var{string} ought to be ``raw bytes.''
 @end defun
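+
+  As a rough illustration of how the string functions pair up, this
+sketch encodes a string and decodes the result with the same coding
+system (@code{latin-1} is used only as an example):
+
+@example
+(let* ((text "sample text")
+       ;; Encoding yields a unibyte string of raw bytes.
+       (bytes (encode-coding-string text 'latin-1))
+       ;; Decoding those bytes with the same coding system
+       ;; should give back the original text.
+       (decoded (decode-coding-string bytes 'latin-1)))
+  (string= text decoded))
+@end example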
 
+@node Terminal I/O Encoding
+@subsection Terminal I/O Encoding
+
+  Emacs can decode keyboard input using a coding system, and encode
+terminal output.  This kind of decoding and encoding does not set
+@code{last-coding-system-used}.
+
+@defun keyboard-coding-system
+@tindex keyboard-coding-system
+This function returns the coding system that is in use for decoding
+keyboard input---or @code{nil} if no coding system is to be used.
+@end defun
+
+@defun set-keyboard-coding-system coding-system
+@tindex set-keyboard-coding-system
+This function specifies @var{coding-system} as the coding system to
+use for decoding keyboard input.  If @var{coding-system} is @code{nil},
+that means do not decode keyboard input.
+@end defun
+
+@defun terminal-coding-system
+@tindex terminal-coding-system
+This function returns the coding system that is in use for encoding
+terminal output---or @code{nil} for no encoding.
+@end defun
+
+@defun set-terminal-coding-system coding-system
+@tindex set-terminal-coding-system
+This function specifies @var{coding-system} as the coding system to use
+for encoding terminal output.  If @var{coding-system} is @code{nil},
+that means do not encode terminal output.
+@end defun
+
 @node MS-DOS File Types
-@section MS-DOS File Types
+@subsection MS-DOS File Types
 @cindex DOS file types
 @cindex MS-DOS file types
 @cindex Windows file types
@@ -740,17 +991,24 @@ decoded text.  To make explicit decoding useful, the contents of
 @cindex binary files and text files
 
   Emacs on MS-DOS and on MS-Windows recognizes certain file names as
-text files or binary files.  For a text file, Emacs always uses DOS
-end-of-line conversion.  For a binary file, Emacs does no end-of-line
-conversion and no character code conversion.
+text files or binary files.  By ``binary file'' we mean a file of
+literal byte values that are not necessarily meant to be characters.
+Emacs does no end-of-line conversion and no character code conversion
+for a binary file.  Meanwhile, when you create a new file which is
+marked by its name as a ``text file'', Emacs uses DOS end-of-line
+conversion.
 
 @defvar buffer-file-type
 This variable, automatically buffer-local in each buffer, records the
-file type of the buffer's visited file.  The value is @code{nil} for
-text, @code{t} for binary.  When a buffer does not specify a coding
-system with @code{buffer-file-coding-system}, this variable is used by
-the function @code{find-buffer-file-type-coding-system} to determine
-which coding system to use when writing the contents of the buffer.
+file type of the buffer's visited file.  When a buffer does not specify
+a coding system with @code{buffer-file-coding-system}, this variable is
+used to determine which coding system to use when writing the contents
+of the buffer.  It should be @code{nil} for text, @code{t} for binary.
+If it is @code{t}, the coding system is @code{no-conversion}.
+Otherwise, @code{undecided-dos} is used.
+
+Normally this variable is set by visiting a file; it is set to
+@code{nil} if the file was visited without any actual conversion.
 @end defvar
 
 @defopt file-name-buffer-file-type-alist
@@ -775,26 +1033,80 @@ This variable says how to handle files for which
 @code{file-name-buffer-file-type-alist} says nothing about the type.
 
 If this variable is non-@code{nil}, then these files are treated as
-binary.  Otherwise, nothing special is done for them---the coding system
-is deduced solely from the file contents, in the usual Emacs fashion.
+binary: the coding system @code{no-conversion} is used.  Otherwise,
+nothing special is done for them---the coding system is deduced solely
+from the file contents, in the usual Emacs fashion.
 @end defopt
 
-@node MS-DOS Subprocesses
-@section MS-DOS Subprocesses
-
-  On Microsoft operating systems, these variables provide an alternative
-way to specify the kind of end-of-line conversion to use for input and
-output.  The variable @code{binary-process-input} applies to input sent
-to the subprocess, and @code{binary-process-output} applies to output
-received from it.  A non-@code{nil} value means the data is ``binary,''
-and @code{nil} means the data is text.
-
-@defvar binary-process-input
-If this variable is @code{nil}, convert newlines to @sc{crlf} sequences in
-the input to a synchronous subprocess.
+@node Input Methods
+@section Input Methods
+@cindex input methods
+
+  @dfn{Input methods} provide convenient ways of entering non-@sc{ASCII}
+characters from the keyboard.  Unlike coding systems, which translate
+non-@sc{ASCII} characters to and from encodings meant to be read by
+programs, input methods provide human-friendly commands.  (@xref{Input
+Methods,,, emacs, The GNU Emacs Manual}, for information on how users
+use input methods to enter text.)  How to define input methods is not
+yet documented in this manual, but here we describe how to use them.
+
+  Each input method has a name, which is currently a string;
+in the future, symbols may also be usable as input method names.
+
+@tindex current-input-method
+@defvar current-input-method
+This variable holds the name of the input method now active in the
+current buffer.  (It automatically becomes local in each buffer when set
+in any fashion.)  It is @code{nil} if no input method is active in the
+buffer now.
 @end defvar
 
-@defvar binary-process-output
-If this variable is @code{nil}, convert @sc{crlf} sequences to newlines in
-the output from a synchronous subprocess.
+@tindex default-input-method
+@defvar default-input-method
+This variable holds the default input method for commands that choose an
+input method.  Unlike @code{current-input-method}, this variable is
+normally global.
 @end defvar
+
+@tindex set-input-method
+@defun set-input-method input-method
+This function activates input method @var{input-method} for the current
+buffer.  It also sets @code{default-input-method} to @var{input-method}.
+If @var{input-method} is @code{nil}, this function deactivates any input
+method for the current buffer.
+@end defun
+
+@tindex read-input-method-name
+@defun read-input-method-name prompt &optional default inhibit-null
+This function reads an input method name with the minibuffer, prompting
+with @var{prompt}.  If @var{default} is non-@code{nil}, it is returned
+if the user enters empty input.  However, if
+@var{inhibit-null} is non-@code{nil}, empty input signals an error.
+
+The returned value is a string.
+@end defun
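+
+  For instance, a command might combine these functions roughly as
+follows; the prompt text is an arbitrary choice:
+
+@example
+;; Ask for an input method, offering the default, then activate it
+;; in the current buffer.
+(set-input-method
+ (read-input-method-name "Input method: " default-input-method))
+@end example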
+
+@tindex input-method-alist
+@defvar input-method-alist
+This variable defines all the supported input methods.
+Each element defines one input method, and should have the form:
+
+@example
+(@var{input-method} @var{language-env} @var{activate-func} @var{title} @var{description} @var{args}...)
+@end example
+
+Here @var{input-method} is the input method name, a string;
+@var{language-env} is another string, the name of the language
+environment this input method is recommended for.  (That serves only
+for documentation purposes.)
+
+@var{title} is a string to display in the mode line while this method is
+active.  @var{description} is a string describing this method and what
+it is good for.
+
+@var{activate-func} is a function to call to activate this method.  The
+@var{args}, if any, are passed as arguments to @var{activate-func}.  All
+told, the arguments to @var{activate-func} are @var{input-method} and
+the @var{args}.
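+
+As a sketch based on the element layout above, the following looks up
+one entry and extracts its mode line title.  The name
+@code{"latin-1-prefix"} is only an assumed example of an input method
+that may be defined in your Emacs:
+
+@example
+(let ((entry (assoc "latin-1-prefix" input-method-alist)))
+  ;; The title is the fourth item of the element; the result is
+  ;; nil if no such input method is defined.
+  (and entry (nth 3 entry)))
+@end example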
+@end defvar
+
+