author Karl Williamson <public@khwilliamson.com> 2012-10-16 10:45:44 -0600
committer Karl Williamson <public@khwilliamson.com> 2012-10-16 21:48:36 -0600
commit 40b1ba4ffc62ae8198d69e8e3b33cf8201c6a18f
tree 6c9dc0617aba9a4baa147b2e78d58ddcff1b5ede /regen
parent b4291290926312792a6bfb115da2883d6c9c433d
regen/regcharclass.pl: Change name of generated macro
This changes the macro isMULTI_CHAR_FOLD() (non-UTF-8 version) from generating just ASCII-range code points to generating the full Latin1 range. However, there are no such non-ASCII values, so the macro expansion is unchanged. The new name makes it clearer in future commits that we aren't excluding things that we should be considering.
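The header comment in regen/regcharclass_multi_char_folds.pl notes that for non-UTF-8 patterns every case spelling of a fold such as 'ss' must be listed ('ss', 'sS', 'Ss', 'SS'). A minimal sketch of that expansion follows. The helper name case_combinations is hypothetical and is not the script's own gen_combinations(); it assumes the input consists only of ASCII letters.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; this is not the
# gen_combinations() routine in regen/regcharclass_multi_char_folds.pl.
# Assumes $string contains only ASCII letters.
sub case_combinations {
    my ($string) = @_;
    my @results = ('');
    for my $char (split //, $string) {
        my @next;
        for my $prefix (@results) {
            # Extend each partial spelling with both cases of this character.
            push @next, $prefix . lc($char), $prefix . uc($char);
        }
        @results = @next;
    }
    return @results;
}

print join(' ', case_combinations('ss')), "\n";   # prints: ss sS Ss SS
```

Because no multi-character fold currently contains a non-ASCII Latin1 character, a list like this is the same whether the request is for ASCII-only or Latin1-only folds, which is why the macro expansion does not change.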
Diffstat (limited to 'regen')
-rwxr-xr-x regen/regcharclass.pl 6
-rw-r--r-- regen/regcharclass_multi_char_folds.pl 22
2 files changed, 16 insertions, 12 deletions
diff --git a/regen/regcharclass.pl b/regen/regcharclass.pl
index 37a86822fb..461192b587 100755
--- a/regen/regcharclass.pl
+++ b/regen/regcharclass.pl
@@ -1253,8 +1253,8 @@ do regen/regcharclass_multi_char_folds.pl
# 1 => All folds
&regcharclass_multi_char_folds::multi_char_folds(1)
-MULTI_CHAR_FOLD: multi-char ascii strings that are folded to by a single character
-=> low : safe
+MULTI_CHAR_FOLD: multi-char strings that are folded to by a single character
+=> LATIN1 :safe
-# 0 => ASCII-only
&regcharclass_multi_char_folds::multi_char_folds(0)
+# 0 => Latin1-only
diff --git a/regen/regcharclass_multi_char_folds.pl b/regen/regcharclass_multi_char_folds.pl
index ce2d781af7..f0fd6b3a89 100644
--- a/regen/regcharclass_multi_char_folds.pl
+++ b/regen/regcharclass_multi_char_folds.pl
@@ -9,15 +9,19 @@ use Unicode::UCD "prop_invmap";
# of the sequences of code points that are multi-character folds in the
# current Unicode version. If the parameter is 1, all such folds are
# returned. If the parameters is 0, only the ones containing exclusively
-# ASCII characters are returned. In the latter case all combinations of ASCII
-# characters that can fold to the base one are returned. Thus for 'ss', it
-# would return in addition, 'Ss', 'sS', and 'SS'. This is because this code
-# is designed to help regcomp.c, and EXACTFish regnodes. For non-UTF-8
-# patterns, the strings are not folded, so we need to check for the upper and
-# lower case versions. For UTF-8 patterns, the strings are folded, so we only
-# need to worry about the fold version. There are no non-ASCII Latin1
-# multi-char folds currently, and none likely to be ever added, so this
-# doesn't worry about that case, except to croak should it happen.
+# Latin1 characters are returned. In the latter case all combinations of
+# Latin1 characters that can fold to the base one are returned. Thus for
+# 'ss', it would return in addition, 'Ss', 'sS', and 'SS'. This is because
+# this code is designed to help regcomp.c, and EXACTFish regnodes. For
+# non-UTF-8 patterns, the strings are not folded, so we need to check for the
+# upper and lower case versions. For UTF-8 patterns, the strings are folded,
+# so we only need to worry about the fold version. There are no non-ASCII
+# Latin1 multi-char folds currently, and none likely to be ever added. Thus
+# the output is the same as if it were just asking for ASCII characters, not
+# full Latin1. Hence, it is suitable for generating things that match
+# EXACTFA. It does check for and croak if there ever were to be an upper
+# Latin1 range multi-character fold.
+#
# This is designed for input to regen/regcharlass.pl.
sub gen_combinations ($;) {