Diffstat (limited to 'doc/unigbrk.texi')
-rw-r--r-- doc/unigbrk.texi | 29
1 files changed, 22 insertions, 7 deletions
diff --git a/doc/unigbrk.texi b/doc/unigbrk.texi
index db4df6a..196bd9f 100644
--- a/doc/unigbrk.texi
+++ b/doc/unigbrk.texi
@@ -2,11 +2,18 @@
@chapter Grapheme cluster breaks in strings @code{<unigbrk.h>}
@cindex grapheme cluster breaks
+@cindex grapheme cluster boundaries
@cindex breaks, grapheme cluster
+@cindex boundaries, between grapheme clusters
This include file declares functions for determining where in a string
``grapheme clusters'' start and end. A ``grapheme cluster'' is an
approximation to a user-perceived character, which sometimes
-corresponds to multiple Unicode characters. The letter @samp{@'e},
+corresponds to multiple Unicode characters. Editing operations such as
+mouse selection, cursor movement, and backspacing often operate on
+grapheme clusters as units, not on individual characters.
+
+Some grapheme clusters are built from a base character and a combining
+character. The letter @samp{@'e},
for example, is most commonly represented in Unicode as a single
character U+00E9 @sc{LATIN SMALL LETTER E WITH ACUTE}. It is,
however, equally valid to use the pair of characters U+0065 @sc{LATIN
@@ -14,6 +21,12 @@ SMALL LETTER E} followed by U+0301 @sc{COMBINING ACUTE ACCENT}. Since
the user would perceive this pair of characters as a single character,
they would be grouped into a single grapheme cluster.
+But there are also grapheme clusters that consist of several base characters.
+For example, a Devanagari letter and a Devanagari vowel sign that follows it
+may form a grapheme cluster. Similarly, some pairs of Thai characters and
+Hangul syllables (formed by two or three Hangul characters) are grapheme
+clusters.
+
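The base-plus-combining-character case described above can be illustrated with a small, self-contained sketch. It is not how @code{<unigbrk.h>} is implemented and it does not follow the full rules of Unicode Standard Annex #29: the names @code{toy_is_combining} and @code{toy_count_clusters} are hypothetical, and the only assumption is that characters in the Combining Diacritical Marks block (U+0300..U+036F) attach to the character before them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of <unigbrk.h>.  Deliberate
   simplification: only the Combining Diacritical Marks block
   (U+0300..U+036F) is treated as extending the preceding character.
   The real rules cover many more characters and cases.  */
int
toy_is_combining (uint32_t uc)
{
  return uc >= 0x0300 && uc <= 0x036F;
}

/* Count clusters in an array of N code points under the toy rule.
   A new cluster starts at the first character of the text and at
   every later character that is not a combining mark.  */
size_t
toy_count_clusters (const uint32_t *s, size_t n)
{
  size_t count = 0;
  size_t i;

  for (i = 0; i < n; i++)
    if (i == 0 || !toy_is_combining (s[i]))
      count++;
  return count;
}
```

Under this toy rule, the two-element array holding U+0065 followed by U+0301 counts as one cluster even though it contains two characters. The Devanagari, Thai, and Hangul cases mentioned above are not handled by this sketch.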
@menu
* Grapheme cluster breaks in a string::
* Grapheme cluster break property::
@@ -65,10 +78,11 @@ grapheme cluster break at start of text.
@node Grapheme cluster break property
@section Grapheme cluster break property
-This is a more low-level API. The grapheme cluster break property is a property defined
-in Unicode Standard Annex #29, section ``Grapheme Cluster Boundaries, see
-@url{http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries}.@texnl{} It is
-used for determining the grapheme cluster breaks in a string.
+This is a more low-level API. The grapheme cluster break property is a
+property defined in Unicode Standard Annex #29, section ``Grapheme Cluster
+Boundaries'', see
+@url{http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries}.@texnl{}
+It is used for determining the grapheme cluster breaks in a string.
The following are the possible values of the grapheme cluster break
property. More values may be added in the future.
@@ -87,7 +101,8 @@ property. More values may be added in the future.
@deftypevrx Constant int GBP_LVT
@end deftypevr
-The following function looks up the grapheme cluster break property of a character.
+The following function looks up the grapheme cluster break property of a
+character.
@deftypefun int uc_graphemeclusterbreak_property (ucs4_t @var{uc})
Returns the Grapheme_Cluster_Break property of a Unicode character.
@@ -102,7 +117,7 @@ Returns true if there is a grapheme cluster boundary between Unicode
characters @var{a} and @var{b}.
There is always a grapheme cluster break at the start or end of text.
-Specify zero for @var{a} or @var{b} to indicate start of text or end
+You can specify zero for @var{a} or @var{b} to indicate start of text or end
of text, respectively.
This implements the extended (not legacy) grapheme cluster rules