author Yves Orton <demerphq@gmail.com> 2008-11-07 20:20:21 +0000
committer Yves Orton <demerphq@gmail.com> 2008-11-07 20:20:21 +0000
commit da7fcca4b8d6fb4dc88e0305bf9830bf24912ebd
tree d05a14842c3d234ee9e4f5d1f692c20733133eb1 /regcomp.h
parent 463559e728b65f7b60e46efa081b43ff1b4b6fa4
create new unicode props as defined in POSIX spec (optionally use them in the regex engine)
Perlbug #60156 and #49302 (and probably others) come down to the problem that the definitions of \s, \w, \d and the POSIX character classes differ between Unicode strings and non-Unicode strings. This broke the character class logic in the regex engine. The easiest way to make the character class logic sane again is to define new properties that behave the same way for both kinds of string.

This change creates new property classes that can be used instead of the traditional ones; it does not change the previously defined ones. If the define in regcomp.h

    #define PERL_LEGACY_UNICODE_CHARCLASS_MAPPINGS 1

is changed to 0, the new mappings are used instead. Doing so fixes a number of bugs that are recorded as TODO items in the new reg_posixcc.t test file.

p4raw-id: //depot/perl@34769
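A simplified, hypothetical C model of the problem described above: whether a code point counts as \s depends on whether the target string is stored internally as UTF-8. The byte rules recognise only the ASCII characters [ \t\n\f\r], while the Unicode White_Space property also includes U+0085 (NEXT LINE) and U+00A0 (NO-BREAK SPACE). All function names and the MODEL_LEGACY_MAPPINGS macro are illustrative assumptions, not Perl's code; the macro only mimics the role of PERL_LEGACY_UNICODE_CHARCLASS_MAPPINGS, and the "consistent" definition is one possible choice, not necessarily the mapping this commit adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not Perl's implementation.
 * is_utf8 says whether the target string is stored internally as UTF-8. */

/* ASCII whitespace: space, \t, \n, \f, \r. */
static bool ascii_space(unsigned long cp)
{
    return cp == ' ' || cp == '\t' || cp == '\n' || cp == '\f' || cp == '\r';
}

/* Legacy-style model: byte strings use ASCII rules, UTF-8 strings also
 * accept the two Latin-1 White_Space code points U+0085 and U+00A0
 * (the rest of Unicode White_Space is omitted to keep the sketch short). */
static bool legacy_is_space(unsigned long cp, bool is_utf8)
{
    return ascii_space(cp) || (is_utf8 && (cp == 0x85 || cp == 0xA0));
}

/* One possible consistent definition: same answer for both encodings. */
static bool consistent_is_space(unsigned long cp, bool is_utf8)
{
    (void)is_utf8; /* deliberately ignored */
    return ascii_space(cp);
}

/* Hypothetical stand-in for PERL_LEGACY_UNICODE_CHARCLASS_MAPPINGS:
 * 1 selects the legacy model, 0 the consistent one. */
#ifndef MODEL_LEGACY_MAPPINGS
#define MODEL_LEGACY_MAPPINGS 1
#endif

static bool is_space(unsigned long cp, bool is_utf8)
{
#if MODEL_LEGACY_MAPPINGS
    return legacy_is_space(cp, is_utf8);
#else
    return consistent_is_space(cp, is_utf8);
#endif
}

/* True if pred gives the same answer for every Latin-1 code point
 * whether or not the string is stored as UTF-8. */
static bool encoding_independent(bool (*pred)(unsigned long, bool))
{
    for (unsigned long cp = 0; cp <= 0xFF; cp++) {
        if (pred(cp, false) != pred(cp, true))
            return false;
    }
    return true;
}

/* Sanity check of the model's key properties. */
static void check_model(void)
{
    assert(legacy_is_space(0xA0, true) && !legacy_is_space(0xA0, false));
    assert(encoding_independent(consistent_is_space));
}
```

With the default setting, is_space(0xA0, true) is true but is_space(0xA0, false) is false, so the same character gets different answers depending only on the internal storage. Compiling with -DMODEL_LEGACY_MAPPINGS=0 makes both answers agree.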
Diffstat (limited to 'regcomp.h')
-rw-r--r-- regcomp.h 18
1 files changed, 18 insertions, 0 deletions
diff --git a/regcomp.h b/regcomp.h
index 1664871d01..2ac1be11e9 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -18,6 +18,24 @@ typedef OP OP_4tree; /* Will be redefined later. */
/* Be really agressive about optimising patterns with trie sequences? */
#define PERL_ENABLE_EXTENDED_TRIE_OPTIMISATION 1
+/* Use old style unicode mappings for perl and posix character classes
+ *
+ * NOTE: Enabling this essentially breaks character class matching against unicode
+ * strings, so that POSIX char classes match when they shouldn't, and \d matches
+ * way more than 10 characters, and sometimes a charclass and its complement either
+ * both match or neither match.
+ * NOTE: Disabling this will cause various backwards compatibility issues to rear
+ * their head, and tests to fail. However it will make the charclass behaviour
+ * consistent regardless of internal string type, and make character class inversions
+ * consistent. The tests that fail in the regex engine are basically broken tests.
+ *
+ * Personally I think 5.12 should disable this for sure. It's a bit more debatable for
+ * 5.10, so for now I'm leaving it enabled.
+ *
+ * -demerphq
+ */
+#define PERL_LEGACY_UNICODE_CHARCLASS_MAPPINGS 1
+
/* Should the optimiser take positive assertions into account? */
#define PERL_ENABLE_POSITIVE_ASSERTION_STUDY 0
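The NOTE in the patch also says that, under the legacy mappings, a character class and its complement can both match or both fail. The small C model below is hypothetical and only illustrates how that can happen when a class and its complement are evaluated under different rules: here \s is fixed to byte rules while the complement negates the encoding-dependent rules, so U+00A0 in a UTF-8 string matches neither. The model and all function names are illustrative assumptions, not the regex engine's actual logic.

```c
#include <assert.h>
#include <stdbool.h>

/* A character-class test: does code point cp match, given whether the
 * target string is stored as UTF-8? (Hypothetical model, not Perl code.) */
typedef bool (*class_fn)(unsigned long cp, bool is_utf8);

/* \s under byte rules only: the five ASCII whitespace characters. */
static bool space_bytes(unsigned long cp, bool is_utf8)
{
    (void)is_utf8;
    return cp == ' ' || cp == '\t' || cp == '\n' || cp == '\f' || cp == '\r';
}

/* \s under the string's own rules: UTF-8 strings also get U+0085, U+00A0. */
static bool space_by_encoding(unsigned long cp, bool is_utf8)
{
    return space_bytes(cp, false) || (is_utf8 && (cp == 0x85 || cp == 0xA0));
}

/* Consistent complement: negated under the same rules as the class. */
static bool not_space_bytes(unsigned long cp, bool is_utf8)
{
    return !space_bytes(cp, is_utf8);
}

/* Mismatched complement: negates the encoding-dependent rules, while the
 * class it is paired with below uses byte rules. */
static bool not_space_mixed(unsigned long cp, bool is_utf8)
{
    return !space_by_encoding(cp, is_utf8);
}

/* True when, for every Latin-1 code point under both encodings, exactly
 * one of cls and ncls matches. */
static bool complement_is_exact(class_fn cls, class_fn ncls)
{
    for (unsigned long cp = 0; cp <= 0xFF; cp++) {
        for (int utf8 = 0; utf8 <= 1; utf8++) {
            if (cls(cp, utf8) == ncls(cp, utf8))
                return false; /* both match, or neither does */
        }
    }
    return true;
}

static void check_complements(void)
{
    assert(complement_is_exact(space_bytes, not_space_bytes));
    assert(!complement_is_exact(space_bytes, not_space_mixed));
}
```

A check like complement_is_exact expresses the invariant the commit message calls consistent character class inversions: for every code point and either internal encoding, exactly one of a class and its complement matches.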