author Karl Williamson <khw@khw-desktop.(none)> 2009-12-24 22:54:58 -0700
committer Abigail <abigail@abigail.be> 2009-12-25 10:07:41 +0100
commit e1b711dac329baf9cf4ea3e4628e6c713e24b342
tree b12ce1b41c2d6c0582296ddad541efd2ae3f71e2 /utf8.h
parent 27bca3226281a592aed848b7e68ea50f27381dac
Update .pods
Signed-off-by: Abigail <abigail@abigail.be>
Diffstat (limited to 'utf8.h')
-rw-r--r-- utf8.h 8
1 files changed, 4 insertions, 4 deletions
diff --git a/utf8.h b/utf8.h
index 9eed545305..87653360ea 100644
--- a/utf8.h
+++ b/utf8.h
@@ -72,17 +72,17 @@ END_EXTERN_C
Code Points 1st Byte 2nd Byte 3rd Byte 4th Byte
U+0000..U+007F 00..7F
- U+0080..U+07FF C2..DF 80..BF
+ U+0080..U+07FF * C2..DF 80..BF
U+0800..U+0FFF E0 * A0..BF 80..BF
U+1000..U+CFFF E1..EC 80..BF 80..BF
- U+D000..U+D7FF ED * 80..9F 80..BF
+ U+D000..U+D7FF ED 80..9F 80..BF
U+D800..U+DFFF +++++++ utf16 surrogates, not legal utf8 +++++++
U+E000..U+FFFF EE..EF 80..BF 80..BF
U+10000..U+3FFFF F0 * 90..BF 80..BF 80..BF
U+40000..U+FFFFF F1..F3 80..BF 80..BF 80..BF
U+100000..U+10FFFF F4 80..8F 80..BF 80..BF
-Note the gaps before the 2nd Byte entries above marked by '*'. These are
+Note the gaps before several of the byte entries above marked by '*'. These are
caused by legal UTF-8 avoiding non-shortest encodings: it is technically
possible to UTF-8-encode a single code point in different ways, but that is
explicitly forbidden, and the shortest possible encoding should always be used
@@ -101,7 +101,7 @@ explicitly forbidden, and the shortest possible encoding should always be used
00000dddccccccbbbbbbaaaaaa 11110ddd 10cccccc 10bbbbbb 10aaaaaa
As you can see, the continuation bytes all begin with C<10>, and the
-leading bits of the start byte tell how many bytes the are in the
+leading bits of the start byte tell how many bytes there are in the
encoded character.
*/
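
For readers checking the table in the comment above, here is a minimal sketch in C of how its byte ranges could be applied to a single encoded character. The function names utf8_expected_len and utf8_is_strict_char are hypothetical and are not part of utf8.h. The sketch follows the strict table as printed (no overlong forms, no surrogates, nothing above U+10FFFF) and makes no claim about what Perl itself accepts.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes announced by a start byte,
 * or 0 if the byte cannot start a sequence under the strict table. */
static size_t utf8_expected_len(unsigned char b)
{
    if (b <= 0x7F) return 1;               /* 0xxxxxxx */
    if (b >= 0xC2 && b <= 0xDF) return 2;  /* 110xxxxx; C0 and C1 would be overlong */
    if (b >= 0xE0 && b <= 0xEF) return 3;  /* 1110xxxx */
    if (b >= 0xF0 && b <= 0xF4) return 4;  /* 11110xxx, capped at U+10FFFF */
    return 0;                              /* continuation byte or out of range */
}

/* Hypothetical helper: returns 1 if s[0..len) is exactly one character
 * that matches a row of the table, 0 otherwise. */
static int utf8_is_strict_char(const unsigned char *s, size_t len)
{
    size_t n, i;
    unsigned char lo = 0x80, hi = 0xBF;    /* default range for the 2nd byte */

    if (len == 0) return 0;
    n = utf8_expected_len(s[0]);
    if (n == 0 || n != len) return 0;
    if (n == 1) return 1;

    /* Rows whose 2nd byte range is narrower than 80..BF. */
    switch (s[0]) {
    case 0xE0: lo = 0xA0; break;           /* avoid non-shortest 3-byte forms */
    case 0xED: hi = 0x9F; break;           /* exclude U+D800..U+DFFF surrogates */
    case 0xF0: lo = 0x90; break;           /* avoid non-shortest 4-byte forms */
    case 0xF4: hi = 0x8F; break;           /* stop at U+10FFFF */
    default: break;
    }
    if (s[1] < lo || s[1] > hi) return 0;

    /* Remaining bytes are plain continuation bytes, 10xxxxxx. */
    for (i = 2; i < n; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    return 1;
}
```

Under these assumptions, the two-byte sequence C3 A9 is accepted, while E0 80 80 (a non-shortest form) and ED A0 80 (a surrogate) are rejected.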