author: Karl Williamson <public@khwilliamson.com> 2011-01-15 13:42:58 -0700
committer: Karl Williamson <public@khwilliamson.com> 2011-01-16 08:18:54 -0700
commit: 11454c594f22abc5945e69a46fc965363dbf326e
tree: 8e51baaf062d5e28410294b9cac63f791c63ced2 /regcomp.h
parent: f424400810b6af341e96230836690da51c37b812
Fix \xa0 matching both [\s] and [\S], et al.
This bug stemmed from Latin1 characters not matching any (non-complemented) character class under /d semantics when the target string is not utf8, but having Unicode semantics when it is. The solution here is to add a special flag. Several tests relied on the broken behavior; specifically, they tested that \xff isn't a printable word character even in utf8. I changed the deparse test to use a non-printable code point instead, and I changed the ones in re_tests to be TODOs; I will change them back using /a when that is added shortly.
Diffstat (limited to 'regcomp.h')
-rw-r--r-- regcomp.h 4
1 files changed, 4 insertions, 0 deletions
diff --git a/regcomp.h b/regcomp.h
index 0dc4374973..96e7ae14f6 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -362,6 +362,10 @@ struct regnode_charclass_class {
/* Matches every code point 0x100 and above*/
#define ANYOF_UNICODE_ALL 0x40
+/* Match all Latin1 characters that aren't ASCII when the target string is not
+ * in utf8. */
+#define ANYOF_NON_UTF8_LATIN1_ALL 0x80
+
#define ANYOF_FLAGS_ALL 0xff
/* Character classes for node->classflags of ANYOF */
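
The hunk above only defines the new flag; the code that consults it lives outside regcomp.h and is not shown here. As a rough illustration of the idea in the commit message, here is a minimal sketch of a toy matcher for [\S] under /d. Only the name and value of ANYOF_NON_UTF8_LATIN1_ALL come from the diff. The struct, the toy_* helpers, and the simplified whitespace set are hypothetical and are not Perl's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Value taken from the diff above. */
#define ANYOF_NON_UTF8_LATIN1_ALL 0x80

/* Hypothetical, simplified stand-in for an ANYOF node:
 * a flags byte plus one bit per code point 0x00..0xFF. */
struct toy_anyof {
    uint8_t flags;
    uint8_t bitmap[32];
};

static void toy_set(struct toy_anyof *n, unsigned cp)
{
    n->bitmap[cp >> 3] |= (uint8_t)(1u << (cp & 7));
}

static bool toy_test(const struct toy_anyof *n, unsigned cp)
{
    return ((n->bitmap[cp >> 3] >> (cp & 7)) & 1u) != 0;
}

/* Assumed matching rule: when the target string is not utf8 and the
 * flag is set, every non-ASCII Latin1 code point matches; otherwise
 * the bitmap decides. */
static bool toy_matches(const struct toy_anyof *n, uint8_t cp,
                        bool target_is_utf8)
{
    if (cp >= 0x80 && !target_is_utf8
        && (n->flags & ANYOF_NON_UTF8_LATIN1_ALL))
        return true;
    return toy_test(n, cp);
}

/* Build a toy [\S]: the bitmap holds the Unicode-rules answer (0x85
 * and 0xA0 count as spaces), and the flag covers the non-utf8 case.
 * The ASCII whitespace set here is a simplification. */
static struct toy_anyof toy_build_not_space(void)
{
    struct toy_anyof n;
    memset(&n, 0, sizeof n);
    for (unsigned cp = 0; cp < 0x100; cp++) {
        bool ascii_space = cp == ' ' || cp == '\t' || cp == '\n'
                        || cp == '\f' || cp == '\r';
        bool latin1_space = cp == 0x85 || cp == 0xA0;
        if (!ascii_space && !latin1_space)
            toy_set(&n, cp);
    }
    n.flags = ANYOF_NON_UTF8_LATIN1_ALL;
    return n;
}

/* Small self-check of the behavior described in the commit message. */
void toy_self_check(void)
{
    struct toy_anyof not_space = toy_build_not_space();
    assert(toy_matches(&not_space, 0xA0, false));  /* non-utf8: \xa0 is \S */
    assert(!toy_matches(&not_space, 0xA0, true));  /* utf8: \xa0 is a space */
    assert(!toy_matches(&not_space, ' ', false));  /* ASCII space never \S */
}
```

In this sketch the bitmap alone cannot give different answers for utf8 and non-utf8 targets, which is why a separate flag is needed: it lets the same compiled node answer differently depending on how the target string is encoded.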