@node uniconv.h
@chapter Conversions between Unicode and encodings @code{<uniconv.h>}

This include file declares functions for converting between Unicode strings
and @code{char *} strings in locale encoding or in other specified encodings.

@cindex locale encoding
The following function returns the locale encoding.

@deftypefun {const char *} locale_charset ()
Determines the current locale's character encoding, and canonicalizes it
into one of the canonical names listed in @file{config.charset}.
If the canonical name cannot be determined, the result is a non-canonical
name.

The result must not be freed; it is statically allocated.

The result of this function can be used as an argument to the @code{iconv_open}
function in GNU libc, in GNU libiconv, or in the gnulib provided wrapper
around the native @code{iconv_open} function.  It may not work as an argument
to the native @code{iconv_open} function directly.
@end deftypefun
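
As an illustration, the following sketch queries the locale encoding after
activating the locale named by the environment.  The helper name
@code{current_encoding} is hypothetical and not part of the library; the
call to @code{setlocale} is an assumption about how a typical program sets
up its locale.

@example
#include <locale.h>
#include <uniconv.h>

/* Hypothetical helper: returns the name of the locale encoding after
   activating the locale from the environment.  The returned string is
   statically allocated and must not be freed.  */
static const char *
current_encoding (void)
@{
  setlocale (LC_ALL, "");
  return locale_charset ();
@}
@end example

The returned name can then be passed as @var{fromcode} or @var{tocode} to
the conversion functions described in this chapter.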

@cindex converting
The following functions convert between strings in a specified encoding and
Unicode strings.

@deftypefun int u8_conv_from_encoding (const char *@var{fromcode}, enum iconv_ilseq_handler @var{handler}, const char *@var{src}, size_t @var{srclen}, size_t *@var{offsets}, uint8_t **@var{resultp}, size_t *@var{lengthp})
@deftypefunx int u16_conv_from_encoding (const char *@var{fromcode}, enum iconv_ilseq_handler @var{handler}, const char *@var{src}, size_t @var{srclen}, size_t *@var{offsets}, uint16_t **@var{resultp}, size_t *@var{lengthp})
@deftypefunx int u32_conv_from_encoding (const char *@var{fromcode}, enum iconv_ilseq_handler @var{handler}, const char *@var{src}, size_t @var{srclen}, size_t *@var{offsets}, uint32_t **@var{resultp}, size_t *@var{lengthp})
Converts an entire string, possibly including NUL bytes, from one encoding
to UTF-8, UTF-16, or UTF-32 encoding, respectively.

Converts a memory region given in encoding @var{fromcode}.  @var{fromcode} is
as for the @code{iconv_open} function.

The input is in the memory region between @var{src} (inclusive) and
@code{@var{src} + @var{srclen}} (exclusive).

If @var{offsets} is not NULL, it should point to an array of @var{srclen}
integers; this array is filled with offsets into the result, i.e@. the
character starting at @code{@var{src}[i]} corresponds to the character starting
at @code{(*@var{resultp})[@var{offsets}[i]]}, and other offsets are set to
@code{(size_t)(-1)}.

@code{*@var{resultp}} and @code{*@var{lengthp}} should initially be a scratch
buffer and its size, or @code{*@var{resultp}} can initially be NULL.

May erase the contents of the memory at @code{*@var{resultp}}.

Return value: 0 if successful, otherwise -1 and @code{errno} set.
If successful: The resulting string is stored in @code{*@var{resultp}} and
its length in @code{*@var{lengthp}}.  @code{*@var{resultp}} is set to a
freshly allocated memory block, or is unchanged if no dynamic memory allocation
was necessary.

Particular @code{errno} values: @code{EINVAL}, @code{EILSEQ}, @code{ENOMEM}.
@end deftypefun
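
A minimal sketch of a call to @code{u8_conv_from_encoding}, assuming that
the input is in ISO-8859-1 and that no scratch buffer is supplied, so that
the result is freshly allocated.  The helper name @code{latin1_to_utf8} is
hypothetical; @code{iconveh_error} is one of the
@code{enum iconv_ilseq_handler} values.

@example
#include <stdint.h>
#include <stdlib.h>
#include <uniconv.h>

/* Hypothetical helper: converts the SRCLEN bytes at SRC from ISO-8859-1
   to UTF-8.  OFFSETS is NULL or an array of SRCLEN elements.
   Returns a malloc allocated result and stores its length in *LENGTHP,
   or returns NULL with errno set.  */
static uint8_t *
latin1_to_utf8 (const char *src, size_t srclen,
                size_t *offsets, size_t *lengthp)
@{
  uint8_t *result = NULL;   /* no scratch buffer: let the function allocate */

  *lengthp = 0;
  if (u8_conv_from_encoding ("ISO-8859-1", iconveh_error,
                             src, srclen, offsets,
                             &result, lengthp) < 0)
    return NULL;
  return result;
@}
@end example

For the four input bytes @code{"caf\xe9"}, the result consists of the five
bytes @code{"caf\xc3\xa9"}, and the @var{offsets} array is filled with
0, 1, 2, 3.  The caller frees the result with @code{free}.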

@deftypefun int u8_conv_to_encoding (const char *@var{tocode}, enum iconv_ilseq_handler @var{handler}, const uint8_t *@var{src}, size_t @var{srclen}, size_t *@var{offsets}, char **@var{resultp}, size_t *@var{lengthp})
@deftypefunx int u16_conv_to_encoding (const char *@var{tocode}, enum iconv_ilseq_handler @var{handler}, const uint16_t *@var{src}, size_t @var{srclen}, size_t *@var{offsets}, char **@var{resultp}, size_t *@var{lengthp})
@deftypefunx int u32_conv_to_encoding (const char *@var{tocode}, enum iconv_ilseq_handler @var{handler}, const uint32_t *@var{src}, size_t @var{srclen}, size_t *@var{offsets}, char **@var{resultp}, size_t *@var{lengthp})
Converts an entire Unicode string, possibly including NUL units, from UTF-8,
UTF-16, or UTF-32 encoding, respectively, to a given encoding.

Converts a memory region to encoding @var{tocode}.  @var{tocode} is as for
the @code{iconv_open} function.

The input is in the memory region between @var{src} (inclusive) and
@code{@var{src} + @var{srclen}} (exclusive).

If @var{offsets} is not NULL, it should point to an array of @var{srclen}
integers; this array is filled with offsets into the result, i.e@. the
character starting at @code{@var{src}[i]} corresponds to the character starting
at @code{(*@var{resultp})[@var{offsets}[i]]}, and other offsets are set to
@code{(size_t)(-1)}.

@code{*@var{resultp}} and @code{*@var{lengthp}} should initially be a scratch
buffer and its size, or @code{*@var{resultp}} can initially be NULL.

May erase the contents of the memory at @code{*@var{resultp}}.

Return value: 0 if successful, otherwise -1 and @code{errno} set.
If successful: The resulting string is stored in @code{*@var{resultp}} and
its length in @code{*@var{lengthp}}.  @code{*@var{resultp}} is set to a
freshly allocated memory block, or is unchanged if no dynamic memory allocation
was necessary.

Particular @code{errno} values: @code{EINVAL}, @code{EILSEQ}, @code{ENOMEM}.
@end deftypefun
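
The scratch buffer convention can be sketched as follows, assuming a
64-byte stack buffer and ISO-8859-1 as the target encoding.  The helper
name @code{utf8_to_latin1_length} is hypothetical.  Since
@code{*@var{resultp}} is left unchanged when no allocation was necessary,
the caller compares it with the scratch buffer to decide whether to call
@code{free}.

@example
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <uniconv.h>

/* Hypothetical helper: converts the NUL terminated UTF-8 string S
   (without its terminating NUL) to ISO-8859-1, asking for unconvertible
   characters to be replaced with a question mark.
   Returns the number of bytes produced, or (size_t)(-1) on failure.  */
static size_t
utf8_to_latin1_length (const uint8_t *s)
@{
  char scratch[64];              /* offered as scratch space */
  char *result = scratch;
  size_t length = sizeof scratch;

  if (u8_conv_to_encoding ("ISO-8859-1", iconveh_question_mark,
                           s, strlen ((const char *) s), NULL,
                           &result, &length) < 0)
    return (size_t)(-1);

  /* The result is either still the scratch buffer or a freshly
     allocated block; only the latter must be freed.  */
  if (result != scratch)
    free (result);
  return length;
@}
@end example

A real caller would use the converted bytes before releasing them; the
sketch returns only the length to keep the ownership rule in focus.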

The following functions convert between NUL terminated strings in a specified
encoding and NUL terminated Unicode strings.

@deftypefun {uint8_t *} u8_strconv_from_encoding (const char *@var{string}, const char *@var{fromcode}, enum iconv_ilseq_handler @var{handler})
@deftypefunx {uint16_t *} u16_strconv_from_encoding (const char *@var{string}, const char *@var{fromcode}, enum iconv_ilseq_handler @var{handler})
@deftypefunx {uint32_t *} u32_strconv_from_encoding (const char *@var{string}, const char *@var{fromcode}, enum iconv_ilseq_handler @var{handler})
Converts a NUL terminated string from a given encoding.

The result is @code{malloc} allocated, or NULL (with @code{errno} set) in case of error.

Particular @code{errno} values: @code{EILSEQ}, @code{ENOMEM}.
@end deftypefun

@deftypefun {char *} u8_strconv_to_encoding (const uint8_t *@var{string}, const char *@var{tocode}, enum iconv_ilseq_handler @var{handler})
@deftypefunx {char *} u16_strconv_to_encoding (const uint16_t *@var{string}, const char *@var{tocode}, enum iconv_ilseq_handler @var{handler})
@deftypefunx {char *} u32_strconv_to_encoding (const uint32_t *@var{string}, const char *@var{tocode}, enum iconv_ilseq_handler @var{handler})
Converts a NUL terminated string to a given encoding.

The result is @code{malloc} allocated, or NULL (with @code{errno} set) in case of error.

Particular @code{errno} values: @code{EILSEQ}, @code{ENOMEM}.
@end deftypefun
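
The two functions can be combined to recode a NUL terminated string from
one encoding to another by way of UTF-8.  The following is a sketch under
that assumption; the helper name @code{recode} is hypothetical.

@example
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <uniconv.h>

/* Hypothetical helper: converts the NUL terminated string S from
   encoding FROMCODE to encoding TOCODE by way of UTF-8.
   Returns a malloc allocated string, or NULL with errno set.  */
static char *
recode (const char *s, const char *fromcode, const char *tocode)
@{
  uint8_t *utf8 = u8_strconv_from_encoding (s, fromcode, iconveh_error);
  char *out;
  int saved_errno;

  if (utf8 == NULL)
    return NULL;
  out = u8_strconv_to_encoding (utf8, tocode, iconveh_error);
  saved_errno = errno;           /* keep errno across the call to free */
  free (utf8);
  errno = saved_errno;
  return out;
@}
@end example

With @code{iconveh_error} as handler, a character that cannot be converted
makes the call fail with @code{errno} set to @code{EILSEQ}.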

The following functions are shorthands that convert between NUL terminated
strings in locale encoding and NUL terminated Unicode strings.

@deftypefun {uint8_t *} u8_strconv_from_locale (const char *@var{string})
@deftypefunx {uint16_t *} u16_strconv_from_locale (const char *@var{string})
@deftypefunx {uint32_t *} u32_strconv_from_locale (const char *@var{string})
Converts a NUL terminated string from the locale encoding.

The result is @code{malloc} allocated, or NULL (with @code{errno} set) in case of error.

Particular @code{errno} values: @code{ENOMEM}.
@end deftypefun

@deftypefun {char *} u8_strconv_to_locale (const uint8_t *@var{string})
@deftypefunx {char *} u16_strconv_to_locale (const uint16_t *@var{string})
@deftypefunx {char *} u32_strconv_to_locale (const uint32_t *@var{string})
Converts a NUL terminated string to the locale encoding.

The result is @code{malloc} allocated, or NULL (with @code{errno} set) in case of error.

Particular @code{errno} values: @code{ENOMEM}.
@end deftypefun
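
As a usage sketch, the following helper counts the Unicode characters in a
locale-encoded string by converting it to UTF-32.  The helper name
@code{count_characters} is hypothetical; @code{u32_strlen} is declared in
@code{<unistr.h>}.

@example
#include <stdint.h>
#include <stdlib.h>
#include <unistr.h>
#include <uniconv.h>

/* Hypothetical helper: returns the number of Unicode characters in the
   NUL terminated, locale-encoded string S, or (size_t)(-1) on failure.  */
static size_t
count_characters (const char *s)
@{
  uint32_t *u32 = u32_strconv_from_locale (s);
  size_t n;

  if (u32 == NULL)
    return (size_t)(-1);
  n = u32_strlen (u32);          /* one UTF-32 unit per character */
  free (u32);
  return n;
@}
@end example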