Add grapheme cluster break functions.

author: Bruno Haible <bruno@clisp.org> 2011-03-29 23:10:57 +0200
committer: Bruno Haible <bruno@clisp.org> 2011-03-29 23:47:50 +0200
commit: 6ec70a06da21e2bdceb9814fe6fde6be46f890cd (patch)
tree: 1eb310cb7adb756f54392eeea8423af19b1e6600 /doc
parent: 820590c2b81686f64c50d22022aeb49ff3c6e3ad (diff)
download: libunistring-6ec70a06da21e2bdceb9814fe6fde6be46f890cd.tar.gz
4 files changed, 34 insertions, 17 deletions
diff --git a/doc/Makefile.am b/doc/Makefile.am
index cd5c514..c470f6c 100644
--- a/doc/Makefile.am
+++ b/doc/Makefile.am
@@ -33,7 +33,7 @@ info_TEXINFOS = libunistring.texi
 # List of texinfo sources @included by libunistring.texi, excluding version.texi.
 libunistring_TEXINFOS = \
   unitypes.texi unistr.texi uniconv.texi unistdio.texi uniname.texi \
-  unictype.texi uniwidth.texi uniwbrk.texi unilbrk.texi unigbrk.texi \
+  unictype.texi uniwidth.texi unigbrk.texi uniwbrk.texi unilbrk.texi \
   uninorm.texi unicase.texi uniregex.texi \
   gpl.texi lgpl.texi fdl.texi
 
diff --git a/doc/libunistring.texi b/doc/libunistring.texi
index 32209ab..a6f9c8f 100644
--- a/doc/libunistring.texi
+++ b/doc/libunistring.texi
@@ -158,9 +158,9 @@ A copy of the license is included in @ref{GNU GPL}.
 * uniname.h::                   Names of Unicode characters
 * unictype.h::                  Unicode character classification and properties
 * uniwidth.h::                  Display width
+* unigbrk.h::                   Grapheme cluster breaking
 * uniwbrk.h::                   Word breaks in strings
 * unilbrk.h::                   Line breaking
-* unigbrk.h::                   Grapheme cluster breaking
 * uninorm.h::                   Normalization forms
 * unicase.h::                   Case mappings
 * uniregex.h::                  Regular expressions
@@ -217,16 +217,16 @@ Properties
 * Properties as objects::
 * Properties as functions::
 
-uniwbrk.h
-
-* Word breaks in a string::
-* Word break property::
-
 unigbrk.h
 
 * Grapheme cluster breaks in a string::
 * Grapheme cluster break property::
 
+uniwbrk.h
+
+* Word breaks in a string::
+* Word break property::
+
 uninorm.h
 
 * Decomposition of characters::
@@ -281,12 +281,12 @@ character names
 character classification and properties
 @item <uniwidth.h>
 string width when using nonproportional fonts
+@item <unigbrk.h>
+grapheme cluster breaks
 @item <uniwbrk.h>
 word breaks
 @item <unilbrk.h>
 line breaking algorithm
-@item <unigbrk.h>
-grapheme cluster breaks
 @item <uninorm.h>
 normalization (composition and decomposition)
 @item <unicase.h>
@@ -763,9 +763,9 @@ NULL is returned and @code{errno} is set.
 @include uniname.texi
 @include unictype.texi
 @include uniwidth.texi
+@include unigbrk.texi
 @include uniwbrk.texi
 @include unilbrk.texi
-@include unigbrk.texi
 @include uninorm.texi
 @include unicase.texi
 @include uniregex.texi
diff --git a/doc/unigbrk.texi b/doc/unigbrk.texi
index db4df6a..196bd9f 100644
--- a/doc/unigbrk.texi
+++ b/doc/unigbrk.texi
@@ -2,11 +2,18 @@
 @chapter Grapheme cluster breaks in strings @code{<unigbrk.h>}
 
 @cindex grapheme cluster breaks
+@cindex grapheme cluster boundaries
 @cindex breaks, grapheme cluster
+@cindex boundaries, between grapheme clusters
 This include file declares functions for determining where in a string
 ``grapheme clusters'' start and end.  A ``grapheme cluster'' is an
 approximation to a user-perceived character, which sometimes
-corresponds to multiple Unicode characters.  The letter @samp{@'e},
+corresponds to multiple Unicode characters.  Editing operations such as
+mouse selection, cursor movement, and backspacing often operate on
+grapheme clusters as units, not on individual characters.
+
+Some grapheme clusters are built from a base character and a combining
+character.  The letter @samp{@'e},
 for example, is most commonly represented in Unicode as a single
 character U+00E8 @sc{LATIN SMALL LETTER E WITH ACUTE}.  It is,
 however, equally valid to use the pair of characters U+0065 @sc{LATIN
@@ -14,6 +21,12 @@ SMALL LETTER E} followed by U+0301 @sc{COMBINING ACUTE ACCENT}.  Since
 the user would perceive this pair of characters as a single character,
 they would be grouped into a single grapheme cluster.
 
+But there are also grapheme clusters that consist of several base characters.
+For example, a Devanagari letter and a Devanagari vowel sign that follows it
+may form a grapheme cluster.  Similarly, some pairs of Thai characters and
+Hangul syllables (formed by two or three Hangul characters) are grapheme
+clusters.
+
 @menu
 * Grapheme cluster breaks in a string::
 * Grapheme cluster break property::
@@ -65,10 +78,11 @@ grapheme cluster break at start of text.
 @node Grapheme cluster break property
 @section Grapheme cluster break property
 
-This is a more low-level API.  The grapheme cluster break property is a property defined
-in Unicode Standard Annex #29, section ``Grapheme Cluster Boundaries, see
-@url{http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries}.@texnl{}  It is
-used for determining the grapheme cluster breaks in a string.
+This is a more low-level API.  The grapheme cluster break property is a
+property defined in Unicode Standard Annex #29, section ``Grapheme Cluster
+Boundaries'', see
+@url{http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries}.@texnl{}
+It is used for determining the grapheme cluster breaks in a string.
 
 The following are the possible values of the grapheme cluster break
 property.  More values may be added in the future.
@@ -87,7 +101,8 @@ property.  More values may be added in the future.
 @deftypevrx Constant int GBP_LVT
 @end deftypevr
 
-The following function looks up the grapheme cluster break property of a character.
+The following function looks up the grapheme cluster break property of a
+character.
 
 @deftypefun int uc_graphemeclusterbreak_property (ucs4_t @var{uc})
 Returns the Grapheme_Cluster_Break property of a Unicode character.
@@ -102,7 +117,7 @@ Returns true if there is an grapheme cluster boundary between Unicode
 characters @var{a} and @var{b}.
 
 There is always a grapheme cluster break at the start or end of text.
-Specify zero for @var{a} or @var{b} to indicate start of text or end
+You can specify zero for @var{a} or @var{b} to indicate start of text or end
 of text, respectively.
 
 This implements the extended (not legacy) grapheme cluster rules
diff --git a/doc/uniwbrk.texi b/doc/uniwbrk.texi
index 6f06b92..08c273c 100644
--- a/doc/uniwbrk.texi
+++ b/doc/uniwbrk.texi
@@ -2,7 +2,9 @@
 @chapter Word breaks in strings @code{<uniwbrk.h>}
 
 @cindex word breaks
+@cindex word boundaries
 @cindex breaks, word
+@cindex boundaries, between words
 This include file declares functions for determining where in a string
 ``words'' start and end.  Here ``words'' are not necessarily the same as
 entities that can be looked up in dictionaries, but rather groups of
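The unigbrk.texi text in this commit describes the pairwise test behind grapheme cluster breaking. Given two adjacent characters a and b, the test reports whether a boundary lies between them, and zero stands for start or end of text. The C sketch below illustrates that convention with a deliberately tiny rule set. It is not libunistring's implementation, and the names toy_is_grapheme_break, toy_is_extend and toy_count_clusters are hypothetical, invented for illustration. Only three cases are modelled. CR LF stays together. A break always follows or precedes CR and LF otherwise. No break occurs before a combining mark in U+0300..U+036F. Hangul syllables, Devanagari vowel signs, Thai, and the full Grapheme_Cluster_Break property are ignored, so real code should call uc_is_grapheme_break from <unigbrk.h> instead. The precomposed é used here is U+00E9; U+00E8 is è, LATIN SMALL LETTER E WITH GRAVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same underlying type as libunistring's ucs4_t. */
typedef uint32_t ucs4_t;

/* Hypothetical, heavily simplified classifier: only the Combining
   Diacritical Marks block U+0300..U+036F counts as "extending".  */
static bool
toy_is_extend (ucs4_t uc)
{
  return uc >= 0x0300 && uc <= 0x036F;
}

/* Toy analogue of a pairwise break test.  Returns true if a grapheme
   cluster boundary lies between A and B.  Zero for A means start of
   text; zero for B means end of text.  */
static bool
toy_is_grapheme_break (ucs4_t a, ucs4_t b)
{
  if (a == 0 || b == 0)
    return true;              /* always a break at start or end of text */
  if (a == 0x000D && b == 0x000A)
    return false;             /* CR followed by LF is one cluster */
  if (a == 0x000D || a == 0x000A || b == 0x000D || b == 0x000A)
    return true;              /* otherwise break around CR and LF */
  if (toy_is_extend (b))
    return false;             /* a combining mark joins the previous char */
  return true;                /* everything else: break */
}

/* Counts clusters in S[0..N-1] by testing every adjacent pair; each
   break before S[i] starts a new cluster.  */
static size_t
toy_count_clusters (const ucs4_t *s, size_t n)
{
  assert (s != NULL || n == 0);
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    {
      ucs4_t prev = (i == 0 ? 0 : s[i - 1]);
      if (toy_is_grapheme_break (prev, s[i]))
        count++;
    }
  return count;
}
```

With these definitions, toy_count_clusters applied to { 0x0065, 0x0301 } and to { 0x00E9 } both return 1, matching the statement that the decomposed pair forms a single grapheme cluster. The pair-by-pair loop is the pattern a caller would follow with the real pairwise function; the "Grapheme cluster breaks in a string" section covers whole-string alternatives.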