author | Karl Williamson <khw@cpan.org> | 2016-09-12 16:52:41 -0600 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2016-09-17 21:10:50 -0600 |
commit | a82be82b512232b63f28c5865113f7990fb59a3a (patch) | |
tree | 63e6434ecae9cfe89149e4f043dfdcac39a10434 /regen | |
parent | e23e8bc1957a5981b8a507b62471ae38ec06c661 (diff) | |
download | perl-a82be82b512232b63f28c5865113f7990fb59a3a.tar.gz |
Add macro for Unicode Corrigendum #9 strict
This macro follows Unicode Corrigendum #9 to allow non-character code
points. These are still discouraged but not completely forbidden.
Code that isn't intended to operate on arbitrary text from other code
is best served by the original definition, but code that does handle
such text, such as source code control, should switch to this
definition if it wants to be Unicode-strict.
Perl can't adopt C9 wholesale, as it might create security holes in
existing applications that rely on Perl keeping non-chars out.
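
As a rough illustration of the distinction drawn above, here is a minimal sketch at the code point level. The function names are hypothetical and are not part of Perl's API; "original strict" is assumed here to mean rejecting surrogates, non-characters, and anything above U+10FFFF, while "C9 strict" only drops the non-character restriction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not Perl's actual macros. */

/* Surrogates U+D800..U+DFFF are never legal in well-formed UTF-8. */
static bool is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Non-characters: U+FDD0..U+FDEF, plus the last two code points
 * (xxFFFE and xxFFFF) of each of the 17 planes. */
static bool is_noncharacter(uint32_t cp)
{
    return (cp >= 0xFDD0 && cp <= 0xFDEF)
        || (cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE);
}

/* Assumed "original" strict definition: non-characters are rejected. */
static bool is_strict_cp(uint32_t cp)
{
    return cp <= 0x10FFFF && !is_surrogate(cp) && !is_noncharacter(cp);
}

/* Corrigendum #9 strict: non-characters are allowed, surrogates and
 * code points above U+10FFFF are still rejected. */
static bool is_c9_strict_cp(uint32_t cp)
{
    return cp <= 0x10FFFF && !is_surrogate(cp);
}
```

Under these assumptions the two predicates differ only on the 66 non-character code points, for example U+FFFE and U+FDD0.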
Diffstat (limited to 'regen')
-rwxr-xr-x | regen/regcharclass.pl | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/regen/regcharclass.pl b/regen/regcharclass.pl
index abc4942354..f3f8b997f3 100755
--- a/regen/regcharclass.pl
+++ b/regen/regcharclass.pl
@@ -1704,6 +1704,16 @@ SURROGATE: Surrogate code points
 #0xF0000 - 0xFFFFD
 #0x100000 - 0x10FFFD
 
+#C9_STRICT_UTF8_CHAR: Matches legal Unicode UTF-8 variant code points, no surrogates
+#=> UTF8 :no_length_checks only_ascii_platform
+#0x0080 - 0xD7FF
+#0xE000 - 0x10FFFF
+#
+#C9_STRICT_UTF8_CHAR: Matches legal Unicode UTF-8 variant code points including non-character code points, no surrogates
+#=> UTF8 :no_length_checks only_ebcdic_platform
+#0x00A0 - 0xD7FF
+#0xE000 - 0x10FFFF
+
 QUOTEMETA: Meta-characters that \Q should quote
 => high :fast
 \p{_Perl_Quotemeta}
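
The ASCII-platform ranges in the definition above (0x0080 - 0xD7FF and 0xE000 - 0x10FFFF) can be read as a set of byte-level constraints on a UTF-8 sequence. The following is a hand-written sketch of such a check, not the code that `regen/regcharclass.pl` generates. The function name `c9_strict_utf8_len` is hypothetical, and unlike the `:no_length_checks` definition it takes an explicit count of available bytes, which is an assumption made here so the example is safe to run on short buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* True if b is a UTF-8 continuation byte (10xxxxxx). */
static int is_cont(uint8_t b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical sketch: return the byte length of the UTF-8 sequence at s
 * if it encodes a code point in 0x80..0xD7FF or 0xE000..0x10FFFF, else 0.
 * ASCII (invariant) bytes are deliberately not matched. */
static size_t c9_strict_utf8_len(const uint8_t *s, size_t avail)
{
    if (avail < 2 || !is_cont(s[1])) return 0;
    uint8_t b0 = s[0], b1 = s[1];

    if (b0 >= 0xC2 && b0 <= 0xDF) return 2;      /* U+0080..U+07FF */

    if (avail < 3 || !is_cont(s[2])) return 0;
    if (b0 == 0xE0) return b1 >= 0xA0 ? 3 : 0;   /* U+0800..U+0FFF, no overlongs */
    if (b0 == 0xED) return b1 <= 0x9F ? 3 : 0;   /* U+D000..U+D7FF, no surrogates */
    if (b0 >= 0xE1 && b0 <= 0xEF) return 3;      /* rest of the BMP, non-chars included */

    if (avail < 4 || !is_cont(s[3])) return 0;
    if (b0 == 0xF0) return b1 >= 0x90 ? 4 : 0;   /* U+10000..U+3FFFF, no overlongs */
    if (b0 >= 0xF1 && b0 <= 0xF3) return 4;      /* U+40000..U+FFFFF */
    if (b0 == 0xF4) return b1 <= 0x8F ? 4 : 0;   /* U+100000..U+10FFFF */

    return 0;                                    /* ASCII, stray continuation, or > U+10FFFF */
}
```

In this sketch the non-character U+FFFE (bytes EF BF BE) is accepted, while the surrogate U+D800 (bytes ED A0 80) and anything above U+10FFFF are rejected, which matches the two ranges listed for the ASCII platform.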