author | Kenichi Handa <handa@m17n.org> | 2005-04-01 00:29:51 +0000 |
---|---|---|
committer | Kenichi Handa <handa@m17n.org> | 2005-04-01 00:29:51 +0000 |
commit | 6fa886202fdca940dd16e9f0b863347c4f565e8a (patch) | |
tree | d007919400bd9acc188ec3ea308b3463f0eb30f1 /lispref/nonascii.texi | |
parent | 9b06ffa3dc182766ec67ee4fe06c2f7141602bc2 (diff) | |
(Coding System Basics): Describe roundtrip
identity of coding systems.
Diffstat (limited to 'lispref/nonascii.texi')
-rw-r--r-- | lispref/nonascii.texi | 22 |
1 files changed, 22 insertions, 0 deletions
diff --git a/lispref/nonascii.texi b/lispref/nonascii.texi
index 70e77e0a837..91a47ea50f9 100644
--- a/lispref/nonascii.texi
+++ b/lispref/nonascii.texi
@@ -628,6 +628,28 @@ characters; for example, there are three coding systems for the Cyrillic
 conversion, but some of them leave the choice unspecified---to be
 chosen heuristically for each file, based on the data.
 
+In general, a coding system doesn't guarantee a roundtrip identity,
+i.e. decoding followed by encoding in the same coding system can
+result in a different byte sequence. But there are several coding
+systems that do guarantee that the result will be the same as what you
+originally decoded. They are:
+
+@quotation
+chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
+greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3
+iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe
+japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text
+@end quotation
+
+Likewise, a coding system doesn't guarantee the other way of roundtrip
+identity, i.e. encoding buffer text into a coding system followed by
+decoding again with the same coding system can produce different
+buffer text. For instance, when you encode Latin-2 characters by
+@code{utf-8} and decode it back by the same coding system, you'll get
+Unicode characters (of charset @code{mule-unicode-0100-24ff}), and when
+you encode Unicode characters by @code{iso-latin-2} and decode it back
+by the same coding system, you'll get Latin-2 characters.
+
 @cindex end of line conversion
 @dfn{End of line conversion} handles three different conventions used
 on various systems for representing end of line in files. The Unix
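
The two roundtrip directions described in the added text can be sketched outside Emacs. The example below is only an illustration and makes several assumptions: it uses Python's built-in codecs (`latin-1`, `utf-8`, `iso8859-2`) as stand-ins for Emacs coding systems, the helper names `decode_encode_roundtrips` and `encode_decode_roundtrips` are hypothetical, and undecodable or unencodable input is replaced rather than rejected. It does not reproduce the Emacs-specific charset change (Latin-2 versus `mule-unicode-0100-24ff`) that the patch mentions.

```python
def decode_encode_roundtrips(data: bytes, codec: str) -> bool:
    """Return True if decoding `data` and re-encoding it yields the same bytes."""
    text = data.decode(codec, errors="replace")
    return text.encode(codec, errors="replace") == data


def encode_decode_roundtrips(text: str, codec: str) -> bool:
    """Return True if encoding `text` and decoding it again yields the same text."""
    data = text.encode(codec, errors="replace")
    return data.decode(codec, errors="replace") == text


# Direction 1: bytes -> text -> bytes.
# latin-1 maps every byte value to exactly one character, so any byte
# sequence survives the trip unchanged.
latin1_bytes_ok = decode_encode_roundtrips(bytes(range(256)), "latin-1")

# The bytes FF FE are not valid UTF-8; they decode to replacement
# characters, which re-encode to a different byte sequence.
utf8_bytes_ok = decode_encode_roundtrips(b"\xff\xfe", "utf-8")

# Direction 2: text -> bytes -> text.
# U+0142 (l with stroke) exists in ISO 8859-2, so it survives.
latin2_text_ok = encode_decode_roundtrips("\u0142", "iso8859-2")

# The same character does not exist in latin-1; it is replaced on
# encoding, so the decoded text differs from the original.
latin1_text_ok = encode_decode_roundtrips("\u0142", "latin-1")

print(latin1_bytes_ok, utf8_bytes_ok, latin2_text_ok, latin1_text_ok)
```

The single-byte codec passes the bytes-to-text-to-bytes check for every input, which matches the kind of guarantee the patch lists for coding systems such as `iso-latin-1`. The other cases show how a coding system without that guarantee can silently change data in either direction.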