author | Karl Williamson <khw@cpan.org> | 2016-08-27 20:08:52 -0600 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2016-08-31 20:32:37 -0600 |
commit | 89d986df51f55257b8cc3f6e4f54eba60f607e48 (patch) | |
tree | 1bcfc07660079fb264bd43052493ad5d1243cd0c /regen/regcharclass.pl | |
parent | 52be253637900db8bbf44d387a06af529972b855 (diff) | |
Make 3 UTF-8 macros API
These may be useful to various module writers. They certainly are
useful for Encode. This commit makes public three API macros, one per
category, that determine whether the input UTF-8 represents:
a) a surrogate code point
b) a non-character code point
c) a code point that is above Unicode's legal maximum.
The macros are machine generated. In making them public, I am now using
the string end location parameter to guard against running off the end
of the input. Previously this parameter was ignored, as their use in
the core could be tightly controlled so that we already knew that the
string was long enough when calling these macros. But this can't be
guaranteed in the public API. An optimizing compiler should be able to
remove redundant length checks.
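
As a rough illustration of the end-pointer guard described above, here is a minimal C sketch of the three checks. It is not the machine-generated macro text. The function names are hypothetical, and it assumes an ASCII platform and input that is otherwise well-formed (Perl's extended UTF-8, where lead bytes above 0xF4 start code points beyond Unicode). Each function takes the start `s` and the end `e` (one past the last byte) and never reads at or beyond `e`.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned char u8;

/* True if b is a UTF-8 continuation byte (10xxxxxx). */
static bool is_cont(u8 b) { return (b & 0xC0) == 0x80; }

/* Hypothetical helper: surrogates U+D800..U+DFFF encode as ED A0..BF 80..BF. */
static bool sketch_is_surrogate(const u8 *s, const u8 *e)
{
    if (e - s < 3) return false;          /* guard: input too short */
    return s[0] == 0xED && s[1] >= 0xA0 && s[1] <= 0xBF && is_cont(s[2]);
}

/* Hypothetical helper: non-characters are U+FDD0..U+FDEF plus the last two
 * code points of every plane (U+FFFE, U+FFFF, U+1FFFE, ... U+10FFFF). */
static bool sketch_is_nonchar(const u8 *s, const u8 *e)
{
    ptrdiff_t len = e - s;

    if (len >= 3 && s[0] == 0xEF) {
        if (s[1] == 0xB7) return s[2] >= 0x90 && s[2] <= 0xAF; /* U+FDD0..U+FDEF */
        if (s[1] == 0xBF) return s[2] == 0xBE || s[2] == 0xBF; /* U+FFFE, U+FFFF */
        return false;
    }
    if (len >= 4 && s[0] >= 0xF0 && s[0] <= 0xF4
        && is_cont(s[1]) && is_cont(s[2]) && is_cont(s[3])) {
        /* Decode the 4-byte sequence and test the low 16 bits. */
        unsigned long cp = ((unsigned long)(s[0] & 0x07) << 18)
                         | ((unsigned long)(s[1] & 0x3F) << 12)
                         | ((unsigned long)(s[2] & 0x3F) << 6)
                         |  (unsigned long)(s[3] & 0x3F);
        return cp >= 0x10000UL && cp <= 0x10FFFFUL
            && (cp & 0xFFFEUL) == 0xFFFEUL;
    }
    return false;
}

/* Hypothetical helper: code points above U+10FFFF start with F4 90..BF,
 * or with any lead byte of F5 and up (an assumption about extended UTF-8). */
static bool sketch_is_above_unicode(const u8 *s, const u8 *e)
{
    if (e - s < 1) return false;          /* guard: empty input */
    if (s[0] >= 0xF5) return true;
    if (s[0] == 0xF4) {
        if (e - s < 2) return false;      /* guard before reading s[1] */
        return s[1] >= 0x90 && s[1] <= 0xBF;
    }
    return false;
}
```

A caller would pass the buffer start and `start + length`. If the string is truncated in the middle of a character, each check returns false and does not read past the end, which is the property the public macros need.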
Diffstat (limited to 'regen/regcharclass.pl')
-rwxr-xr-x | regen/regcharclass.pl | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/regen/regcharclass.pl b/regen/regcharclass.pl
index 9115eafeb6..e22720b508 100755
--- a/regen/regcharclass.pl
+++ b/regen/regcharclass.pl
@@ -1630,11 +1630,11 @@ REPLACEMENT: Unicode REPLACEMENT CHARACTER
 0xFFFD
 
 NONCHAR: Non character code points
-=> UTF8 :fast
+=> UTF8 :safe
 \p{_Perl_Nchar}
 
 SURROGATE: Surrogate characters
-=> UTF8 :fast
+=> UTF8 :safe
 \p{_Perl_Surrogate}
 
 # This program was run with this enabled, and the results copied to utf8.h;