author David Mitchell <davem@iabyn.com> 2013-07-13 15:23:59 +0100
committer David Mitchell <davem@iabyn.com> 2013-07-28 10:33:37 +0100
commit f1fb9b037a1fe86e0ccdfbf27affa94647af2a37
tree 7b9698a524bd8384e5e717fa048e73df0b38d8c8
parent fefee432646d03d27d3385f943793bbdd7bc1168
enable intuit under anchored \G, and fix a bug
Since 1999, regcomp has had approximately the following comment and code:

    /* XXXX Currently intuiting is not compatible with ANCH_GPOS.
       This should be changed ASAP! */
    if ((r->check_substr || r->check_utf8) && !(r->extflags & RXf_ANCH_GPOS)) {
        r->extflags |= RXf_USE_INTUIT;
        ....

However, it appears that since that time, intuit has gained (at least some) support for anchored \G.

Note also that, until a few commits ago, the RXf_USE_INTUIT flag was only used by *callers* of regexec() to decide whether to call intuit() first. regexec() itself also calls intuit() internally on occasion, and in those cases it directly checks just the check_substr and check_utf8 fields rather than the RXf_USE_INTUIT flag. So in those cases it was already using intuit even in the presence of anchored \G.

So, in the grand perl tradition of "make the change and see if anything in the test suite breaks", that's what I've done in this commit (i.e. removed the RXf_ANCH_GPOS check above). Intuit is now normally called even in the presence of anchored \G. This means that something like

    "aaaa" =~ /\G.*xx/

will now fail quickly in intuit rather than more slowly in regmatch(). Note that I have no actual knowledge of whether intuit is *really* anchored-\G-safe.

As it happens, one thing in the test suite did break, and this was due to the following code, added back in 1997:

    if ( ....
        && (!(RExC_seen & REG_SEEN_GPOS) || (r->extflags & RXf_ANCH_GPOS)))
        r->extflags |= RXf_CHECK_ALL;

It was clearly meant to say that if either of those \G flags were present, the RXf_CHECK_ALL flag (which enables intuit-only matches) should not be set. But the '!' was placed so that it covers only the first condition rather than both. Presumably this had never been spotted before because intuit was skipped under anchored \G.

[Actually this commit broke some other stuff too, not covered by the test suite. See the next commit. Hooray for git rebase -i and history re-writing!]
-rw-r--r-- regcomp.c 6
1 files changed, 2 insertions, 4 deletions
diff --git a/regcomp.c b/regcomp.c
index 90e6c9a5b7..0af3483a92 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -6242,7 +6242,7 @@ reStudy:
&& data.last_start_min == 0 && data.last_end > 0
&& !RExC_seen_zerolen
&& !(RExC_seen & REG_SEEN_VERBARG)
- && (!(RExC_seen & REG_SEEN_GPOS) || (r->extflags & RXf_ANCH_GPOS)))
+ && !((RExC_seen & REG_SEEN_GPOS) || (r->extflags & RXf_ANCH_GPOS)))
r->extflags |= RXf_CHECK_ALL;
scan_commit(pRExC_state, &data,&minlen,0);
@@ -6339,9 +6339,7 @@ reStudy:
r->check_offset_min = r->float_min_offset;
r->check_offset_max = r->float_max_offset;
}
- /* XXXX Currently intuiting is not compatible with ANCH_GPOS.
- This should be changed ASAP! */
- if ((r->check_substr || r->check_utf8) && !(r->extflags & RXf_ANCH_GPOS)) {
+ if ((r->check_substr || r->check_utf8) ) {
r->extflags |= RXf_USE_INTUIT;
if (SvTAIL(r->check_substr ? r->check_substr : r->check_utf8))
r->extflags |= RXf_INTUIT_TAIL;