utf8.pm: Clarify that UTF-8 is Perl's extended form

Spotted by Christian Hansen
author: Karl Williamson <khw@cpan.org> 2017-12-22 12:32:12 -0700
committer: Karl Williamson <khw@cpan.org> 2017-12-22 12:49:41 -0700
commit: 50a85cfe6c852deb0c2f738cb82006623052dc8e (patch)
tree: bcd01fe799affc1ecbb2a0d107e2d0966ff765f1 /lib
parent: 842c2139ee2bb06de824c15923ef1aca9b9c504c (diff)
download: perl-50a85cfe6c852deb0c2f738cb82006623052dc8e.tar.gz
1 file changed, 15 insertions, 17 deletions
diff --git a/lib/utf8.pm b/lib/utf8.pm
index 34930a0554..7bb65773b1 100644
--- a/lib/utf8.pm
+++ b/lib/utf8.pm
@@ -2,7 +2,7 @@ package utf8;
 
 $utf8::hint_bits = 0x00800000;
 
-our $VERSION = '1.20';
+our $VERSION = '1.21';
 
 sub import {
     $^H |= $utf8::hint_bits;
@@ -136,14 +136,12 @@ use L<Encode> instead.
 =item * C<$success = utf8::downgrade($string[, $fail_ok])>
 
 (Since Perl v5.8.0)
-Converts in-place the internal representation of the string from
-UTF-8 to the equivalent octet sequence in the native encoding (Latin-1
-or EBCDIC). The logical character sequence itself is unchanged. If
-I<$string> is already stored as native 8 bit, then this is a no-op.  Can
-be used to
-make sure that the UTF-8 flag is off, e.g. when you want to make sure
-that the substr() or length() function works with the usually faster
-byte algorithm.
+Converts in-place the internal representation of the string from UTF-8 to the
+equivalent octet sequence in the native encoding (Latin-1 or EBCDIC). The
+logical character sequence itself is unchanged. If I<$string> is already
+stored as native 8 bit, then this is a no-op.  Can be used to make sure that
+the UTF-8 flag is off, e.g. when you want to make sure that the substr() or
+length() function works with the usually faster byte algorithm.
 
 Fails if the original UTF-8 sequence cannot be represented in the
 native 8 bit encoding. On failure dies or, if the value of I<$fail_ok> is
@@ -167,8 +165,8 @@ use L<Encode> instead.
 
 (Since Perl v5.8.0)
 Converts in-place the character sequence to the corresponding octet
-sequence in UTF-8. That is, every (possibly wide) character gets
-replaced with a sequence of one or more characters that represent the
+sequence in Perl's extended UTF-8. That is, every (possibly wide) character
+gets replaced with a sequence of one or more characters that represent the
 individual UTF-8 bytes of the character.  The UTF8 flag is turned off.
 Returns nothing.
 
@@ -188,12 +186,12 @@ use L<Encode> instead.
 =item * C<$success = utf8::decode($string)>
 
 (Since Perl v5.8.0)
-Attempts to convert in-place the octet sequence encoded as UTF-8 to the
-corresponding character sequence. That is, it replaces each sequence of
-characters in the string whose ords represent a valid UTF-8 byte
-sequence, with the corresponding single character.  The UTF-8 flag is
-turned on only if the source string contains multiple-byte UTF-8
-characters.  If I<$string> is invalid as UTF-8, returns false;
+Attempts to convert in-place the octet sequence encoded in Perl's extended
+UTF-8 to the corresponding character sequence. That is, it replaces each
+sequence of characters in the string whose ords represent a valid (extended)
+UTF-8 byte sequence, with the corresponding single character.  The UTF-8 flag
+is turned on only if the source string contains multiple-byte UTF-8
+characters.  If I<$string> is invalid as extended UTF-8, returns false;
 otherwise returns true.
 
  my $x = "\xc4\x80"; # $x contains two characters, with ords
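
Note on the change above: the patch describes utf8::encode and utf8::decode as
working with Perl's extended UTF-8 rather than strict UTF-8. The following is a
minimal sketch of what that appears to mean in practice, assuming an ASCII
(non-EBCDIC) platform. The code point 0x110000 is only an illustrative choice
of a value above the Unicode maximum of U+10FFFF, and the variable names are
hypothetical; none of this is part of the commit.

```perl
use strict;
use warnings;
no warnings 'non_unicode';   # we deliberately use a code point above U+10FFFF

# One character whose code point lies beyond the Unicode range.
my $s = chr(0x110000);

# utf8::encode replaces the single wide character, in place, with the
# individual bytes of its (extended) UTF-8 representation.
utf8::encode($s);
my $hex = join ' ', map { sprintf '%02X', ord } split //, $s;
print "encoded bytes: $hex\n";

# utf8::decode reverses this in place; it returns true if the octets were
# valid as extended UTF-8.
my $ok = utf8::decode($s);
printf "decoded ok=%d ord=0x%X\n", ($ok ? 1 : 0), ord $s;
```

A strict UTF-8 decoder would be expected to reject this byte sequence, which is
why the documentation continues to point to Encode for interchange with other
systems.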