author    Bruno Haible <bruno@clisp.org> 2020-11-14 21:13:57 +0100
committer Bruno Haible <bruno@clisp.org> 2020-11-14 21:13:57 +0100
commit    fc6f96660801e1da624753bd17379a68f82aa409
tree      a066750eda0ec4a780cfc61990c4c9abc543c94d
parent    eee98aa8d72595fdb4308ed5d0583516bfc4c0d5
doc: Document char32_t.
* doc/char32_t.texi: New file.
* doc/libunistring.texi: Include it.
* doc/Makefile.am (libunistring_TEXINFOS): Add char32_t.texi.
-rw-r--r-- doc/Makefile.am        2
-rw-r--r-- doc/char32_t.texi     28
-rw-r--r-- doc/libunistring.texi  7
3 files changed, 34 insertions, 3 deletions
diff --git a/doc/Makefile.am b/doc/Makefile.am
index c82b9ec..3f33218 100644
--- a/doc/Makefile.am
+++ b/doc/Makefile.am
@@ -36,7 +36,7 @@ info_TEXINFOS = libunistring.texi
libunistring_TEXINFOS = \
unitypes.texi unistr.texi uniconv.texi unistdio.texi uniname.texi \
unictype.texi uniwidth.texi unigbrk.texi uniwbrk.texi unilbrk.texi \
- uninorm.texi unicase.texi uniregex.texi wchar_t.texi \
+ uninorm.texi unicase.texi uniregex.texi wchar_t.texi char32_t.texi \
gpl.texi lgpl.texi fdl.texi
# The dependencies of stamp-vti generated by automake are incomplete.
diff --git a/doc/char32_t.texi b/doc/char32_t.texi
new file mode 100644
index 0000000..e2afba5
--- /dev/null
+++ b/doc/char32_t.texi
@@ -0,0 +1,28 @@
+@node The char32_t problem
+@appendix The @code{char32_t} problem
+
+@cindex char32_t, type
+@cindex char16_t, type
+In response to the @code{wchar_t} mess described in the previous section, ISO C 11 introduces two new types: @code{char32_t} and @code{char16_t}.
+
+@code{char32_t} is a type like @code{wchar_t}, with the added guarantee that it is at least 32 bits wide (ISO C 11 defines it as @code{uint_least32_t}). So, it is a type that is appropriate for encoding a Unicode character. It is meant to resolve the problems of the 16-bit wide @code{wchar_t} on AIX and Windows platforms, and to allow a saner programming model for wide character strings across all platforms.
+
+@code{char16_t} is a type like @code{wchar_t}, with the added guarantee that it is at least 16 bits wide (ISO C 11 defines it as @code{uint_least16_t}). It is meant to allow porting programs that use the broken wide character string programming model from Windows to all platforms. Of course, no one needs this.
+
+These types are accompanied by a syntax for defining wide string literals with these element types: @code{u"..."} for @code{char16_t} strings and @code{U"..."} for @code{char32_t} strings.
+
+So far, so good. What the ISO C designers forgot was to provide standardized C library functions that operate on these wide character strings. They standardized only the most basic functions, @code{mbrtoc32} and @code{c32rtomb}, which are analogous to @code{mbrtowc} and @code{wcrtomb}, respectively. For the rest, GNU gnulib @url{https://www.gnu.org/software/gnulib/} provides the following functions:
+@itemize @bullet
+@item
+Functions for converting an entire string: @code{mbstoc32s} -- like @code{mbstowcs}, @code{c32stombs} -- like @code{wcstombs}.
+@item
+Functions for testing the properties of a 32-bit wide character: @code{c32isalnum}, @code{c32isalpha}, etc. -- like @code{iswalnum}, @code{iswalpha}, etc.
+@end itemize
+
+Still, this API has two problems:
+@itemize @bullet
+@item
+The @code{char32_t} encoding is locale-dependent and undocumented. This means that if you want to know any property of a @code{char32_t} character other than the properties defined by @code{<wctype.h>} (for example, whether it is a dash, a currency symbol, or a paragraph separator), you first have to convert it to the @code{char *} encoding, using the function @code{c32rtomb}.
+@item
+Even on platforms where @code{wchar_t} is 32 bits wide, the @code{char32_t} encoding may be different from the @code{wchar_t} encoding.
+@end itemize
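The two standardized functions the new appendix mentions, mbrtoc32 and c32rtomb, are enough to move between the locale's char * encoding and char32_t, which is the conversion the first listed problem requires. What follows is a minimal sketch in ISO C 11, not code from libunistring or gnulib: the helper names decode_c32 and encode_c32 are hypothetical, and the sketch assumes a stateless locale encoding (such as UTF-8), so it does not emit a final shift sequence.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stddef.h>
#include <string.h>
#include <uchar.h>    /* char32_t, mbrtoc32, c32rtomb (ISO C 11) */
#include <wchar.h>    /* mbstate_t */

/* Hypothetical helper: decode the multibyte string S (in the current
   locale's encoding) into char32_t units.  Writes at most OUTCAP units
   to OUT and returns the number written, or (size_t) -1 on an invalid
   or truncated sequence.  */
static size_t
decode_c32 (const char *s, char32_t *out, size_t outcap)
{
  mbstate_t state;
  size_t n = 0;
  size_t len = strlen (s);

  memset (&state, 0, sizeof state);
  while (len > 0 && n < outcap)
    {
      char32_t c;
      size_t r = mbrtoc32 (&c, s, len, &state);
      if (r == (size_t) -1 || r == (size_t) -2)
        return (size_t) -1;        /* invalid or incomplete input */
      if (r == (size_t) -3)
        {
          out[n++] = c;            /* pending unit; no input bytes consumed */
          continue;
        }
      if (r == 0)
        break;                     /* decoded a NUL (not reached: len excludes it) */
      out[n++] = c;
      s += r;
      len -= r;
    }
  return n;
}

/* Hypothetical helper: encode the N char32_t units in IN back into the
   locale's multibyte encoding, storing a NUL-terminated string in BUF
   (capacity BUFCAP).  Returns the number of bytes written, excluding
   the NUL, or (size_t) -1 on error.  */
static size_t
encode_c32 (const char32_t *in, size_t n, char *buf, size_t bufcap)
{
  mbstate_t state;
  size_t used = 0;

  assert (bufcap > 0);
  memset (&state, 0, sizeof state);
  for (size_t i = 0; i < n; i++)
    {
      char tmp[MB_LEN_MAX];        /* c32rtomb writes at most MB_CUR_MAX bytes */
      size_t r = c32rtomb (tmp, in[i], &state);
      if (r == (size_t) -1)
        return (size_t) -1;        /* character not representable in this locale */
      if (used + r >= bufcap)
        return (size_t) -1;        /* no room for these bytes plus the NUL */
      memcpy (buf + used, tmp, r);
      used += r;
    }
  buf[used] = '\0';
  return used;
}
```

A program would normally call setlocale (LC_ALL, "") before using these helpers; in the default "C" locale, non-ASCII input may be rejected as an invalid sequence. As the appendix points out, the char32_t values obtained this way are only known to be meaningful in the locale that produced them.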
diff --git a/doc/libunistring.texi b/doc/libunistring.texi
index b26f48f..b54f632 100644
--- a/doc/libunistring.texi
+++ b/doc/libunistring.texi
@@ -98,7 +98,7 @@ This manual is for GNU libunistring.
@ignore
@c This was: @copying but it triggers a makeinfo 4.13 bug
-Copyright (C) 2001-2019 Free Software Foundation, Inc.
+Copyright (C) 2001-2020 Free Software Foundation, Inc.
This manual is free documentation. It is dually licensed under the
GNU FDL and the GNU GPL. This means that you can redistribute this
@@ -129,7 +129,7 @@ A copy of the license is included in @ref{GNU GPL}.
@page
@vskip 0pt plus 1filll
@c @insertcopying
-Copyright (C) 2001-2019 Free Software Foundation, Inc.
+Copyright (C) 2001-2020 Free Software Foundation, Inc.
This manual is free documentation. It is dually licensed under the
GNU FDL and the GNU GPL. This means that you can redistribute this
@@ -178,6 +178,7 @@ A copy of the license is included in @ref{GNU GPL}.
* Using the library:: How to link with the library and use it?
* More functionality:: More advanced functionality
* The wchar_t mess:: Why @code{wchar_t *} strings are useless
+* The char32_t problem:: Why @code{char32_t *} strings are problematic
* Licenses:: Licenses
* Index:: General Index
@@ -927,6 +928,8 @@ For the rendering of Unicode strings outside of the context of a given toolkit
@include wchar_t.texi
+@include char32_t.texi
+
@node Licenses
@appendix Licenses
@cindex Licenses