summaryrefslogtreecommitdiff
path: root/doc/uniconv.texi
blob: 2437a383aaaef847b379e4f3886bf5849b0ee649 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
@node uniconv.h
@chapter Conversions between Unicode and encodings @code{<uniconv.h>}

This include file declares functions for converting between Unicode strings
and @code{char *} strings in locale encoding or in other specified encodings.

@cindex locale encoding
The following function returns the locale encoding.

@deftypefun {const char *} locale_charset ()
Determines the current locale's character encoding, and canonicalizes it
into one of the canonical names listed in @file{config.charset}.
If the canonical name cannot be determined, the result is a non-canonical
name.

The result must not be freed; it is statically allocated.

The result of this function can be used as an argument to the @code{iconv_open}
function in GNU libc, in GNU libiconv, or in the gnulib provided wrapper
around the native @code{iconv_open} function.  It may not work as an argument
to the native @code{iconv_open} function directly.
@end deftypefun

The handling of unconvertible characters during the conversions can be
parametrized through the following enumeration type:

@deftp Type {enum iconv_ilseq_handler}
This type specifies how unconvertible characters in the input are handled.
@end deftp

@deftypevr Constant {enum iconv_ilseq_handler} iconveh_error
This handler causes the function to return with @code{errno} set to
@code{EILSEQ}.
@end deftypevr

@deftypevr Constant {enum iconv_ilseq_handler} iconveh_question_mark
This handler produces one question mark @samp{?} per unconvertible character.
@end deftypevr

@deftypevr Constant {enum iconv_ilseq_handler} iconveh_escape_sequence
This handler produces an escape sequence @code{\u@var{xxxx}} or
@code{\U@var{xxxxxxxx}} for each unconvertible character.
@end deftypevr

@cindex converting
The following functions convert between strings in a specified encoding and
Unicode strings.

@deftypefun {uint8_t *} u8_conv_from_encoding (const@tie{}char@tie{}*@var{fromcode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}, const@tie{}char@tie{}*@var{src}, size_t@tie{}@var{srclen}, size_t@tie{}*@var{offsets}, uint8_t@tie{}*@var{resultbuf}, size_t@tie{}*@var{lengthp})
@deftypefunx {uint16_t *} u16_conv_from_encoding (const@tie{}char@tie{}*@var{fromcode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}, const@tie{}char@tie{}*@var{src}, size_t@tie{}@var{srclen}, size_t@tie{}*@var{offsets}, uint16_t@tie{}*@var{resultbuf}, size_t@tie{}*@var{lengthp})
@deftypefunx {uint32_t *} u32_conv_from_encoding (const@tie{}char@tie{}*@var{fromcode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}, const@tie{}char@tie{}*@var{src}, size_t@tie{}@var{srclen}, size_t@tie{}*@var{offsets}, uint32_t@tie{}*@var{resultbuf}, size_t@tie{}*@var{lengthp})
Converts an entire string, possibly including NUL bytes, from one encoding
to UTF-8 encoding.

Converts a memory region given in encoding @var{fromcode}.  @var{fromcode} is
as for the @code{iconv_open} function.

The input is in the memory region between @var{src} (inclusive) and
@code{@var{src} + @var{srclen}} (exclusive).

If @var{offsets} is not NULL, it should point to an array of @var{srclen}
integers; this array is filled with offsets into the result, i.e@. the
character starting at @code{@var{src}[i]} corresponds to the character starting
at @code{@var{result}[@var{offsets}[i]]}, and other offsets are set to
@code{(size_t)(-1)}.

@code{@var{resultbuf}} and @code{*@var{lengthp}} should be a scratch
buffer and its size, or @code{@var{resultbuf}} can be NULL.

May erase the contents of the memory at @code{@var{resultbuf}}.

If successful: The resulting Unicode string (non-NULL) is returned and
its length stored in @code{*@var{lengthp}}.  The resulting string is
@code{@var{resultbuf}} if no dynamic memory allocation was necessary,
or a freshly allocated memory block otherwise.

In case of error: NULL is returned and @code{errno} is set.
Particular @code{errno} values: @code{EINVAL}, @code{EILSEQ}, @code{ENOMEM}.
@end deftypefun

@deftypefun {char *} u8_conv_to_encoding (const@tie{}char@tie{}*@var{tocode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}, const@tie{}uint8_t@tie{}*@var{src}, size_t@tie{}@var{srclen}, size_t@tie{}*@var{offsets}, char@tie{}*@var{resultbuf}, size_t@tie{}*@var{lengthp})
@deftypefunx {char *} u16_conv_to_encoding (const@tie{}char@tie{}*@var{tocode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}, const@tie{}uint16_t@tie{}*@var{src}, size_t@tie{}@var{srclen}, size_t@tie{}*@var{offsets}, char@tie{}*@var{resultbuf}, size_t@tie{}*@var{lengthp})
@deftypefunx {char *} u32_conv_to_encoding (const@tie{}char@tie{}*@var{tocode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}, const@tie{}uint32_t@tie{}*@var{src}, size_t@tie{}@var{srclen}, size_t@tie{}*@var{offsets}, char@tie{}*@var{resultbuf}, size_t@tie{}*@var{lengthp})
Converts an entire Unicode string, possibly including NUL units, from UTF-8
encoding to a given encoding.

Converts a memory region to encoding @var{tocode}.  @var{tocode} is as for
the @code{iconv_open} function.

The input is in the memory region between @var{src} (inclusive) and
@code{@var{src} + @var{srclen}} (exclusive).

If @var{offsets} is not NULL, it should point to an array of @var{srclen}
integers; this array is filled with offsets into the result, i.e@. the
character starting at @code{@var{src}[i]} corresponds to the character starting
at @code{@var{result}[@var{offsets}[i]]}, and other offsets are set to
@code{(size_t)(-1)}.

@code{@var{resultbuf}} and @code{*@var{lengthp}} should be a scratch
buffer and its size, or @code{@var{resultbuf}} can be NULL.

May erase the contents of the memory at @code{@var{resultbuf}}.

If successful: The resulting Unicode string (non-NULL) is returned and
its length stored in @code{*@var{lengthp}}.  The resulting string is
@code{@var{resultbuf}} if no dynamic memory allocation was necessary,
or a freshly allocated memory block otherwise.

In case of error: NULL is returned and @code{errno} is set.
Particular @code{errno} values: @code{EINVAL}, @code{EILSEQ}, @code{ENOMEM}.
@end deftypefun

The following functions convert between NUL terminated strings in a specified
encoding and NUL terminated Unicode strings.

@deftypefun {uint8_t *} u8_strconv_from_encoding (const@tie{}char@tie{}*@var{string}, const@tie{}char@tie{}*@var{fromcode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler})
@deftypefunx {uint16_t *} u16_strconv_from_encoding (const@tie{}char@tie{}*@var{string}, const@tie{}char@tie{}*@var{fromcode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler})
@deftypefunx {uint32_t *} u32_strconv_from_encoding (const@tie{}char@tie{}*@var{string}, const@tie{}char@tie{}*@var{fromcode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler})
Converts a NUL terminated string from a given encoding.

The result is @code{malloc} allocated, or NULL (with @var{errno} set) in case of error.

Particular @code{errno} values: @code{EILSEQ}, @code{ENOMEM}.
@end deftypefun

@deftypefun {char *} u8_strconv_to_encoding (const@tie{}uint8_t@tie{}*@var{string}, const@tie{}char@tie{}*@var{tocode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler})
@deftypefunx {char *} u16_strconv_to_encoding (const@tie{}uint16_t@tie{}*@var{string}, const@tie{}char@tie{}*@var{tocode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler})
@deftypefunx {char *} u32_strconv_to_encoding (const@tie{}uint32_t@tie{}*@var{string}, const@tie{}char@tie{}*@var{tocode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler})
Converts a NUL terminated string to a given encoding.

The result is @code{malloc} allocated, or NULL (with @code{errno} set) in case of error.

Particular @code{errno} values: @code{EILSEQ}, @code{ENOMEM}.
@end deftypefun

The following functions are shorthands that convert between NUL terminated
strings in locale encoding and NUL terminated Unicode strings.

@deftypefun {uint8_t *} u8_strconv_from_locale (const@tie{}char@tie{}*@var{string})
@deftypefunx {uint16_t *} u16_strconv_from_locale (const@tie{}char@tie{}*@var{string})
@deftypefunx {uint32_t *} u32_strconv_from_locale (const@tie{}char@tie{}*@var{string})
Converts a NUL terminated string from the locale encoding.

The result is @code{malloc} allocated, or NULL (with @code{errno} set) in case of error.

Particular @code{errno} values: @code{ENOMEM}.
@end deftypefun

@deftypefun {char *} u8_strconv_to_locale (const@tie{}uint8_t@tie{}*@var{string})
@deftypefunx {char *} u16_strconv_to_locale (const@tie{}uint16_t@tie{}*@var{string})
@deftypefunx {char *} u32_strconv_to_locale (const@tie{}uint32_t@tie{}*@var{string})
Converts a NUL terminated string to the locale encoding.

The result is @code{malloc} allocated, or NULL (with @code{errno} set) in case of error.

Particular @code{errno} values: @code{ENOMEM}.
@end deftypefun