# README file for charset mapping files in this directory.
# Copyright (C) 2001, 2002
#   National Institute of Advanced Industrial Science and Technology (AIST)
#   Registration Number H13PRO009
# Copyright (C) 2002 Free Software Foundation, Inc.

# This file is part of GNU Emacs.

# GNU Emacs is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.

# GNU Emacs is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with GNU Emacs; see the file COPYING.  If not, write to
# the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
# Boston, MA 02111-1307, USA.


(1) Format of mapping files

Each line contains a code point and the corresponding Unicode
character code, separated by a space.  Both code points and Unicode
character codes are in hexadecimal, preceded by "0x".  Comments may be
used, starting with "#".  Code ranges may also be used, with
(inclusive) start and end code points separated by "-", followed by the
Unicode character code of the start of the range; the remaining code
points in the range map to consecutive Unicode character codes.

Examples:
0xA0 0x00A0  # no-break space

0x8141-0x8143 0x4E04 # map onto a Unicode range
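
The format above can be read with a few lines of code.  The following
Python sketch is only an illustration, not the reader Emacs itself
uses; the names `parse_map_lines` and `_LINE` are hypothetical.  It
assumes that a range line assigns consecutive Unicode character codes
beginning at the given start value, and that fields may be separated
by any run of spaces or tabs.

```python
import re

# Hypothetical helper, not part of Emacs: matches "0xNN 0xUUUU" or
# "0xNN-0xMM 0xUUUU" after comments have been stripped.
_LINE = re.compile(
    r"^(0x[0-9A-Fa-f]+)(?:-(0x[0-9A-Fa-f]+))?\s+(0x[0-9A-Fa-f]+)$"
)


def parse_map_lines(lines):
    """Return a dict from charset code points to Unicode character codes."""
    mapping = {}
    for raw in lines:
        # Drop everything from "#" onward, then surrounding whitespace.
        text = raw.split("#", 1)[0].strip()
        if not text:
            continue  # blank or comment-only line
        match = _LINE.match(text)
        if match is None:
            raise ValueError(f"unrecognized mapping line: {raw!r}")
        start = int(match.group(1), 16)
        end = int(match.group(2), 16) if match.group(2) else start
        if end < start:
            raise ValueError(f"range end precedes start: {raw!r}")
        unicode_start = int(match.group(3), 16)
        # Assumption: a range maps onto consecutive Unicode codes.
        for offset in range(end - start + 1):
            mapping[start + offset] = unicode_start + offset
    return mapping


if __name__ == "__main__":
    sample = [
        "0xA0 0x00A0  # no-break space",
        "",
        "0x8141-0x8143 0x4E04 # map onto a Unicode range",
    ]
    for code, uni in sorted(parse_map_lines(sample).items()):
        print(f"0x{code:X} -> U+{uni:04X}")
```

To read a real mapping file, pass an open file object instead of the
sample list, for example `parse_map_lines(open("8859-2.map"))`.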


(2) Source of mapping files

Most mapping files are generated from the data files distributed with
glibc (under the sub-directory "localedata/charmaps").  This list
shows the correspondence of the data file, the mapping file, and which
charset uses it.

DATA-FILE			MAP-FILE		CHARSET
=========			========		=======
ISO-8859-2			8859-2.map		iso-8859-2
ISO-8859-3			8859-3.map		iso-8859-3
ISO-8859-4			8859-4.map		iso-8859-4
ISO-8859-5			8859-5.map		iso-8859-5
ISO-8859-6			8859-6.map		iso-8859-6
ISO-8859-7			8859-7.map		iso-8859-7
ISO-8859-8			8859-8.map		iso-8859-8
ISO-8859-9			8859-9.map		iso-8859-9
ISO-8859-10			8859-10.map		iso-8859-10
ISO-8859-13			8859-13.map		iso-8859-13
ISO-8859-14			8859-14.map		iso-8859-14
ISO-8859-15			8859-15.map		iso-8859-15
ISO-8859-16			8859-16.map		iso-8859-16
GB2312				gb2312-1980.map		chinese-gb2312
EUC-KR				ksc5601-1987.map	korean-ksc5601
JIS_C6220-1969-RO and EUC-JP	jisx0201.map		jisx0201	
EUC-JP				jisx0208-1990.map	japanese-jisx0208
EUC-JP				jisx0212-1990.map	japanese-jisx0212
EUC-TW				cns11643-1.map		chinese-cns11643-1
EUC-TW				cns11643-2.map		chinese-cns11643-2
BIG5				big5.map		big5
BIG5				big5-1.map		chinese-big5-1
BIG5				big5-2.map		chinese-big5-2
MACINTOSH			mac-roman.map		mac-roman
VISCII				viscii.map		viscii
VISCII				viscii-lower.map	vietnamese-viscii-lower
VISCII				viscii-upper.map	vietnamese-viscii-upper
KOI8-R				koi8-r.map		koi8-r
IBM866				ibm866.map		alternativnyj
CP1251				windows-1251.map	windows-1251
CP1250				windows-1250.map	windows-1250
GEORGIAN-PS			georgian-ps.map		georgian-ps
KOI8-U				koi8-u.map		koi8-u
KOI8-T				koi8-t.map		koi8-t
EBCDIC-US			ebcdic.us.map		ebcdic-us
EBCDIC-UK			ebcdic.uk.map		ebcdic-uk
CP1252				windows-1252.map	windows-1252

From ICU:
				cp1125.map		cp1125