author Karl Williamson <khw@cpan.org> 2016-09-12 16:52:41 -0600
committer Karl Williamson <khw@cpan.org> 2016-09-17 21:10:50 -0600
commit a82be82b512232b63f28c5865113f7990fb59a3a
tree 63e6434ecae9cfe89149e4f043dfdcac39a10434 /regen
parent e23e8bc1957a5981b8a507b62471ae38ec06c661
Add macro for Unicode Corrigendum #9 strict
This macro follows Unicode Corrigendum #9, which allows noncharacter code points. These are still discouraged, but not completely forbidden. Code that is not intended to operate on arbitrary text produced by other code is best off using the original definition, which excludes noncharacters. Code that does handle such text, for example source code control, should switch to this definition if it wants to be Unicode-strict. Perl can't adopt C9 wholesale, as doing so might create security holes in existing applications that rely on Perl keeping noncharacters out.
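As a rough illustration of the distinction drawn above, here is a minimal C sketch of the two code-point predicates. The function names are hypothetical and are not Perl's API. The noncharacter set used (U+FDD0..U+FDEF plus the last two code points, xxFFFE and xxFFFF, of each of the 17 planes) is the standard Unicode definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Surrogates U+D800..U+DFFF are never legal scalar values. */
static bool is_surrogate(uint32_t cp) {
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* The 66 noncharacters: U+FDD0..U+FDEF, plus xxFFFE and xxFFFF in every
 * plane (the mask test catches both of the last two code points). */
static bool is_noncharacter(uint32_t cp) {
    return (cp >= 0xFDD0 && cp <= 0xFDEF)
        || (cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE);
}

/* Hypothetical: the original strict rule, which rejects surrogates,
 * noncharacters, and anything above U+10FFFF. */
static bool is_strict_cp(uint32_t cp) {
    return cp <= 0x10FFFF && !is_surrogate(cp) && !is_noncharacter(cp);
}

/* Hypothetical: the Corrigendum #9 strict rule, which still rejects
 * surrogates and values above U+10FFFF but accepts noncharacters. */
static bool is_c9_strict_cp(uint32_t cp) {
    return cp <= 0x10FFFF && !is_surrogate(cp);
}
```

For instance, U+FDD0 fails `is_strict_cp` but passes `is_c9_strict_cp`, while the surrogate U+D800 fails both.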
Diffstat (limited to 'regen')
-rwxr-xr-x regen/regcharclass.pl 10
1 files changed, 10 insertions, 0 deletions
diff --git a/regen/regcharclass.pl b/regen/regcharclass.pl
index abc4942354..f3f8b997f3 100755
--- a/regen/regcharclass.pl
+++ b/regen/regcharclass.pl
@@ -1704,6 +1704,16 @@ SURROGATE: Surrogate code points
#0xF0000 - 0xFFFFD
#0x100000 - 0x10FFFD
+#C9_STRICT_UTF8_CHAR: Matches legal Unicode UTF-8 variant code points, no surrogates
+#=> UTF8 :no_length_checks only_ascii_platform
+#0x0080 - 0xD7FF
+#0xE000 - 0x10FFFF
+#
+#C9_STRICT_UTF8_CHAR: Matches legal Unicode UTF-8 variant code points including non-character code points, no surrogates
+#=> UTF8 :no_length_checks only_ebcdic_platform
+#0x00A0 - 0xD7FF
+#0xE000 - 0x10FFFF
+
QUOTEMETA: Meta-characters that \Q should quote
=> high :fast
\p{_Perl_Quotemeta}
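On ASCII platforms, the new C9_STRICT_UTF8_CHAR definition matches the UTF-8 encodings of 0x0080 - 0xD7FF and 0xE000 - 0x10FFFF, with noncharacters included. The hand-written C function below is a minimal sketch of an equivalent matcher, assuming standard UTF-8. It does not cover the UTF-EBCDIC range used on EBCDIC platforms. The name `c9_strict_utf8_char_len` is hypothetical and is not the macro that regen/regcharclass.pl actually generates. Also, unlike a `:no_length_checks` macro, it takes an end pointer so that it never reads past the buffer.

```c
#include <stddef.h>

/* Hypothetical helper: returns the length (2..4) of a well-formed UTF-8
 * sequence at s that encodes a code point in 0x80-0xD7FF or
 * 0xE000-0x10FFFF (noncharacters allowed), or 0 if there is none.
 * Never reads at or beyond e. */
static size_t c9_strict_utf8_char_len(const unsigned char *s,
                                      const unsigned char *e)
{
    if (s >= e)
        return 0;
    unsigned char b0 = s[0];
    size_t avail = (size_t)(e - s);

    if (b0 >= 0xC2 && b0 <= 0xDF) {          /* U+0080..U+07FF */
        return (avail >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    }
    if (b0 >= 0xE0 && b0 <= 0xEF) {          /* U+0800..U+FFFF */
        if (avail < 3)
            return 0;
        unsigned char lo = 0x80, hi = 0xBF;
        if (b0 == 0xE0) lo = 0xA0;           /* reject overlong forms */
        if (b0 == 0xED) hi = 0x9F;           /* reject surrogates U+D800..U+DFFF */
        if (s[1] < lo || s[1] > hi)
            return 0;
        return ((s[2] & 0xC0) == 0x80) ? 3 : 0;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {          /* U+10000..U+10FFFF */
        if (avail < 4)
            return 0;
        unsigned char lo = 0x80, hi = 0xBF;
        if (b0 == 0xF0) lo = 0x90;           /* reject overlong forms */
        if (b0 == 0xF4) hi = 0x8F;           /* reject values above U+10FFFF */
        if (s[1] < lo || s[1] > hi)
            return 0;
        return ((s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) ? 4 : 0;
    }
    /* ASCII (an invariant, not a variant), a stray continuation byte,
     * C0/C1 overlong starters, or F5..FF. */
    return 0;
}
```

For example, the bytes EF BF BF (U+FFFF, a noncharacter) yield 3, while ED A0 80 (the surrogate U+D800) yield 0.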