Diffstat (limited to 'doc/libunistring.texi')
-rw-r--r-- doc/libunistring.texi | 70
1 files changed, 47 insertions, 23 deletions
diff --git a/doc/libunistring.texi b/doc/libunistring.texi
index cde0360..6c907de 100644
--- a/doc/libunistring.texi
+++ b/doc/libunistring.texi
@@ -20,6 +20,30 @@
@include version.texi
+@c Location of the POSIX specification on the web.
+@set POSIXURL http://www.opengroup.org/onlinepubs/9699919799
+
+@c Macro for referencing a POSIX function.
+@c We don't write it as func(), see section "GNU Manuals" of the
+@c GNU coding standards.
+@ifinfo
+@macro posixfunc{func}
+@code{\func\}
+@end macro
+@end ifinfo
+@ifnotinfo
+@macro posixfunc{func}
+@uref{@value{POSIXURL}/functions/\func\.html,,@code{\func\}}
+@end macro
+@end ifnotinfo
+
+@c Macro for referencing a normal function.
+@c We don't write it as func(), see section "GNU Manuals" of the
+@c GNU coding standards.
+@macro func{func}
+@code{\func\}
+@end macro
+
@ifinfo
@dircategory Software development
@direntry
@@ -356,7 +380,7 @@ The formatting of date and time. This is the @code{LC_TIME} category.
In particular, the @code{LC_CTYPE} category of the current locale determines
the character encoding. This is the encoding of @samp{char *} strings.
We also call it the ``locale encoding''. GNU libunistring has a function,
-@code{locale_charset()}, that returns a standardized (platform independent)
+@func{locale_charset}, that returns a standardized (platform independent)
name for this encoding.
All locale encodings used on glibc systems are essentially ASCII compatible:
@@ -429,33 +453,33 @@ As a consequence:
The @code{<ctype.h>} API is useless in this context; it does not work in
multibyte locales.
@item
-The @code{strlen()} function does not return the number of characters
+The @posixfunc{strlen} function does not return the number of characters
in a string. Nor does it return the number of screen columns occupied
by a string after it is output. It merely returns the number of
@emph{bytes} occupied by a string.
@item
-Truncating a string, for example, with @code{strncpy()}, can have the
+Truncating a string, for example, with @posixfunc{strncpy}, can have the
effect of truncating it in the middle of a multibyte character. Such
a string will, when output, have a garbled character at its end, often
represented by a hollow box.
@item
-@code{strchr()} and @code{strrchr()} do not work with multibyte strings
+@posixfunc{strchr} and @posixfunc{strrchr} do not work with multibyte strings
if the locale encoding is GB18030 and the character to be searched is
a digit.
@item
-@code{strstr()} does not work with multibyte strings if the locale encoding
+@posixfunc{strstr} does not work with multibyte strings if the locale encoding
is different from UTF-8.
@item
-@code{strcspn()}, @code{strpbrk()}, @code{strspn()} cannot work correctly
-in multibyte locales: they assume the second argument is a list of
+@posixfunc{strcspn}, @posixfunc{strpbrk}, @posixfunc{strspn} cannot work
+correctly in multibyte locales: they assume the second argument is a list of
single-byte characters. Even in this simple case, they do not work with
multibyte strings if the locale encoding is GB18030 and one of the
characters to be searched is a digit.
@item
-@code{strsep()} and @code{strtok_r()} do not work with multibyte strings
+@posixfunc{strsep} and @posixfunc{strtok_r} do not work with multibyte strings
unless all of the delimiter characters are ASCII characters < 0x30.
@item
-The @code{strcasecmp()}, @code{strncasecmp()}, and @code{strcasestr()}
+The @posixfunc{strcasecmp}, @posixfunc{strncasecmp}, and @posixfunc{strcasestr}
functions do not work with multibyte strings.
@end itemize
@@ -466,26 +490,26 @@ gnulib has modules @samp{mbchar}, @samp{mbiter}, @samp{mbuiter} that
represent multibyte characters and allow to iterate across a multibyte
string with the same ease as through a unibyte string.
@item
-gnulib has functions @code{mbslen()} and @code{mbswidth()} that can be
-used instead of @code{strlen()} when the number of characters or the
+gnulib has functions @func{mbslen} and @func{mbswidth} that can be
+used instead of @posixfunc{strlen} when the number of characters or the
number of screen columns of a string is requested.
@item
-gnulib has functions @code{mbschr()} and @code{mbsrrchr()} that are
-like @code{strchr()} and @code{strrchr()}, but work in multibyte locales.
+gnulib has functions @func{mbschr} and @func{mbsrrchr} that are
+like @posixfunc{strchr} and @posixfunc{strrchr}, but work in multibyte locales.
@item
-gnulib has a function @code{mbsstr()}, like @code{strstr()}, but works
+gnulib has a function @func{mbsstr}, like @posixfunc{strstr}, but works
in multibyte locales.
@item
-gnulib has functions @code{mbscspn()}, @code{mbspbrk()}, @code{mbsspn()}
-that are like @code{strcspn()}, @code{strpbrk()}, @code{strspn()} , but
+gnulib has functions @func{mbscspn}, @func{mbspbrk}, @func{mbsspn}
+that are like @posixfunc{strcspn}, @posixfunc{strpbrk}, @posixfunc{strspn}, but
work in multibyte locales.
@item
-gnulib has functions @code{mbssep()} and @code{mbstok_r()} that are
-like @code{strsep()} and @code{strtok_r()} but work in multibyte locales.
+gnulib has functions @func{mbssep} and @func{mbstok_r} that are
+like @posixfunc{strsep} and @posixfunc{strtok_r} but work in multibyte locales.
@item
-gnulib has functions @code{mbscasecmp()}, @code{mbsncasecmp()},
-@code{mbspcasecmp()}, and @code{mbscasestr()} that are like
-@code{strcasecmp()}, @code{strncasecmp()}, and @code{strcasestr()}, but
+gnulib has functions @func{mbscasecmp}, @func{mbsncasecmp},
+@func{mbspcasecmp}, and @func{mbscasestr} that are like @posixfunc{strcasecmp},
+@posixfunc{strncasecmp}, and @posixfunc{strcasestr}, but
work in multibyte locales. Still, the function @code{ulc_casecmp} is
preferable to these functions; see below.
@end itemize
@@ -558,11 +582,11 @@ and undocumented. This means, if you want to know any property of a
@code{wchar_t} character, other than the properties defined by
@code{<wctype.h>} --- such as whether it's a dash, currency symbol,
paragraph separator, or similar ---, you have to convert it to
-@code{char *} encoding first, by use of the function @code{wctomb()}.
+@code{char *} encoding first, by use of the function @posixfunc{wctomb}.
@item
When you read a stream of wide characters, through the functions
-@code{fgetwc()} and @code{fgetws()}, and when the input stream/file is
+@posixfunc{fgetwc} and @posixfunc{fgetws}, and when the input stream/file is
not in the expected encoding, you have no way to determine the invalid
byte sequence and do some corrective action. If you use these
functions, your program becomes ``garbage in - more garbage out'' or