author | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2013-03-27 11:13:36 +0000 |
---|---|---|
committer | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2013-03-27 11:13:36 +0000 |
commit | f5f97ff8e2ffeabf42f8636de5f5a9aea17b5d72 | |
tree | ca763b27aa287db45fa9b2d7cff6e40017bddb98 | |
parent | 2bd616230afecbea6d658f5c1541942288752fb9 | |
Further changes to backtracking verbs in assertions.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@1302 2f5784b3-3f2a-0410-8824-cb99058d5e15
-rw-r--r-- | ChangeLog | 6
-rw-r--r-- | doc/pcrepattern.3 | 56
-rw-r--r-- | pcre_exec.c | 71
-rw-r--r-- | testdata/testinput1 | 27
-rw-r--r-- | testdata/testinput2 | 9
-rw-r--r-- | testdata/testoutput1 | 36
-rw-r--r-- | testdata/testoutput2 | 30
7 files changed, 181 insertions, 54 deletions
@@ -110,7 +110,7 @@ Version 8.33 xx-xxxx-201x 30. Update RunTest with additional test selector options. -31. The way PCRE handles backtracking verbs has been changed in to ways. +31. The way PCRE handles backtracking verbs has been changed in two ways. (1) Previously, in something like (*COMMIT)(*SKIP), COMMIT would override SKIP. Now, PCRE acts on whichever backtracking verb is reached first by @@ -118,8 +118,8 @@ Version 8.33 xx-xxxx-201x rather obscure rules do not always do the same thing. (2) Previously, backtracking verbs were confined within assertions. This is - no longer the case, except for (*ACCEPT). Again, this sometimes improves - Perl compatibility, and sometimes does not. + no longer the case for positive assertions, except for (*ACCEPT). Again, + this sometimes improves Perl compatibility, and sometimes does not. 32. A number of tests that were in test 2 because Perl did things differently have been moved to test 1, because either Perl or PCRE has changed, and diff --git a/doc/pcrepattern.3 b/doc/pcrepattern.3 index caf8724..b28dabc 100644 --- a/doc/pcrepattern.3 +++ b/doc/pcrepattern.3 @@ -1,4 +1,4 @@ -.TH PCREPATTERN 3 "22 March 2013" "PCRE 8.33" +.TH PCREPATTERN 3 "27 March 2013" "PCRE 8.33" .SH NAME PCRE - Perl-compatible regular expressions .SH "PCRE REGULAR EXPRESSION DETAILS" @@ -2673,13 +2673,13 @@ remarks apply to the PCRE features described in this section. .P The new verbs make use of what was previously invalid syntax: an opening parenthesis followed by an asterisk. They are generally of the form -(*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour, -depending on whether or not a name is present. A name is any sequence of -characters that does not include a closing parenthesis. The maximum length of -name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit libraries. -If the name is empty, that is, if the closing parenthesis immediately follows -the colon, the effect is as if the colon were not there. 
Any number of these -verbs may occur in a pattern. +(*VERB) or (*VERB:NAME). Some may take either form, possibly behaving +differently depending on whether or not a name is present. A name is any +sequence of characters that does not include a closing parenthesis. The maximum +length of name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit +libraries. If the name is empty, that is, if the closing parenthesis +immediately follows the colon, the effect is as if the colon were not there. +Any number of these verbs may occur in a pattern. .P Since these verbs are specifically related to backtracking, most of them can be used only when the pattern is to be matched using one of the traditional @@ -2807,9 +2807,9 @@ indicates which of the two alternatives matched. This is a more efficient way of obtaining this information than putting each alternative in its own capturing parentheses. .P -If a verb with a name is encountered in a positive assertion, its name is -recorded and passed back if it is the last-encountered. This does not happen -for negative assertions. +If a verb with a name is encountered in a positive assertion that is true, the +name is recorded and passed back if it is the last-encountered. This does not +happen for negative assertions or failing positive assertions. .P After a partial match or a failed match, the last encountered name in the entire match process is returned. For example: @@ -2839,14 +2839,16 @@ The following verbs do nothing when they are encountered. Matching continues with what follows, but if there is no subsequent match, causing a backtrack to the verb, a failure is forced. That is, backtracking cannot pass to the left of the verb. However, when one of these verbs appears inside an atomic group or an -assertion, its effect is confined to that group, because once the group has -been matched, there is never any backtracking into it. 
In this situation, -backtracking can "jump back" to the left of the entire atomic group or -assertion. (Remember also, as stated above, that this localization also applies -in subroutine calls.) +assertion that is true, its effect is confined to that group, because once the +group has been matched, there is never any backtracking into it. In this +situation, backtracking can "jump back" to the left of the entire atomic group +or assertion. (Remember also, as stated above, that this localization also +applies in subroutine calls.) .P These verbs differ in exactly what kind of failure occurs when backtracking -reaches them. +reaches them. The behaviour described below is what happens when the verb is +not in a subroutine or an assertion. Subsequent sections cover these special +cases. .sp (*COMMIT) .sp @@ -2942,8 +2944,10 @@ pattern-based if-then-else block: .sp If the COND1 pattern matches, FOO is tried (and possibly further items after the end of the group if FOO succeeds); on failure, the matcher skips to the -second alternative and tries COND2, without backtracking into COND1. -If (*THEN) is not inside an alternation, it acts like (*PRUNE). +second alternative and tries COND2, without backtracking into COND1. If that +succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no +more alternatives, so there is a backtrack to whatever came before the entire +group. If (*THEN) is not inside an alternation, it acts like (*PRUNE). .P The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN). It is like (*MARK:NAME) in that the name is remembered for passing back to the @@ -3039,10 +3043,18 @@ the second repeat of the group acts. further processing. In a negative assertion, (*ACCEPT) causes the assertion to fail without any further processing. .P -The other backtracking verbs are not treated specially if they appear in an -assertion. 
In particular, (*THEN) skips to the next alternative in the +The other backtracking verbs are not treated specially if they appear in a +positive assertion. In particular, (*THEN) skips to the next alternative in the innermost enclosing group that has alternations, whether or not this is within the assertion. +.P +Negative assertions are, however, different, in order to ensure that changing a +positive assertion into a negative assertion changes its result. Backtracking +into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative assertion to be true, +without considering any further alternative branches in the assertion. +Backtracking into (*THEN) causes it to skip to the next enclosing alternative +within the assertion (the normal behaviour), but if the assertion does not have +such an alternative, (*THEN) behaves like (*PRUNE). . . .\" HTML <a name="btsub"></a> @@ -3088,6 +3100,6 @@ Cambridge CB2 3QH, England. .rs .sp .nf -Last updated: 22 March 2013 +Last updated: 27 March 2013 Copyright (c) 1997-2013 University of Cambridge. .fi diff --git a/pcre_exec.c b/pcre_exec.c index 79c14db..5b030a9 100644 --- a/pcre_exec.c +++ b/pcre_exec.c @@ -1608,11 +1608,18 @@ for (;;) do { RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, NULL, RM4); + + /* A match means that the assertion is true; break out of the loop + that matches its alternatives. */ + if (rrc == MATCH_MATCH || rrc == MATCH_ACCEPT) { mstart = md->start_match_ptr; /* In case \K reset it */ break; } + + /* If not matched, restore the previous mark setting. */ + md->mark = save_mark; /* See comment in the code for capturing groups above about handling @@ -1626,17 +1633,19 @@ for (;;) rrc = MATCH_NOMATCH; } - /* Anything other than NOMATCH causes the assertion to fail. This - includes COMMIT, SKIP, and PRUNE. However, this consistent approach does - not always have exactly the same effect as in Perl. */ + /* Anything other than NOMATCH causes the entire assertion to fail, + passing back the return code. 
This includes COMMIT, SKIP, PRUNE and an + uncaptured THEN, which means they take their normal effect. This + consistent approach does not always have exactly the same effect as in + Perl. */ if (rrc != MATCH_NOMATCH) RRETURN(rrc); ecode += GET(ecode, 1); } - while (*ecode == OP_ALT); + while (*ecode == OP_ALT); /* Continue for next alternative */ /* If we have tried all the alternative branches, the assertion has - failed. */ + failed. If not, we broke out after a match. */ if (*ecode == OP_KET) RRETURN(MATCH_NOMATCH); @@ -1670,35 +1679,57 @@ for (;;) do { RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, NULL, RM5); - md->mark = save_mark; + md->mark = save_mark; /* Always restore the mark setting */ - /* A successful match means the assertion has failed. */ - - if (rrc == MATCH_MATCH || rrc == MATCH_ACCEPT) RRETURN(MATCH_NOMATCH); + switch(rrc) + { + case MATCH_MATCH: /* A successful match means */ + case MATCH_ACCEPT: /* the assertion has failed. */ + RRETURN(MATCH_NOMATCH); + + case MATCH_NOMATCH: /* Carry on with next branch */ + break; - /* See comment in the code for capturing groups above about handling - THEN. */ + /* See comment in the code for capturing groups above about handling + THEN. */ - if (rrc == MATCH_THEN) - { + case MATCH_THEN: next = ecode + GET(ecode,1); if (md->start_match_ptr < next && (*ecode == OP_ALT || *next == OP_ALT)) + { rrc = MATCH_NOMATCH; + break; + } + /* Otherwise fall through. */ + + /* COMMIT, SKIP, PRUNE, and an uncaptured THEN cause the whole + assertion to fail to match, without considering any more alternatives. + Failing to match means the assertion is true. This is a consistent + approach, but does not always have the same effect as in Perl. 
*/ + + case MATCH_COMMIT: + case MATCH_SKIP: + case MATCH_SKIP_ARG: + case MATCH_PRUNE: + do ecode += GET(ecode,1); while (*ecode == OP_ALT); + goto NEG_ASSERT_TRUE; /* Break out of alternation loop */ + + /* Anything else is an error */ + + default: + RRETURN(rrc); } - /* No match on a branch means we must carry on and try the next branch. - Anything else, in particular, SKIP, PRUNE, etc. causes a failure in the - enclosing branch. This is a consistent approach, but does not always have - the same effect as in Perl. */ - - if (rrc != MATCH_NOMATCH) RRETURN(rrc); + /* Continue with next branch */ + ecode += GET(ecode,1); } while (*ecode == OP_ALT); /* All branches in the assertion failed to match. */ - + + NEG_ASSERT_TRUE: if (condassert) RRETURN(MATCH_MATCH); /* Condition assertion */ ecode += 1 + LINK_SIZE; /* Continue with current branch */ continue; diff --git a/testdata/testinput1 b/testdata/testinput1 index 4630e8d..50bdf23 100644 --- a/testdata/testinput1 +++ b/testdata/testinput1 @@ -5521,5 +5521,32 @@ AbcdCBefgBhiBqz ac /--------/ + +/(?(?!b(*THEN)a)bn|bnn)/ + bnn + +/(?!b(*SKIP)a)bn|bnn/ + bnn + +/(?(?!b(*SKIP)a)bn|bnn)/ + bnn + +/(?!b(*PRUNE)a)bn|bnn/ + bnn + +/(?(?!b(*PRUNE)a)bn|bnn)/ + bnn + +/(?!b(*COMMIT)a)bn|bnn/ + bnn + +/(?(?!b(*COMMIT)a)bn|bnn)/ + bnn + +/(?=b(*SKIP)a)bn|bnn/ + bnn + +/(?=b(*THEN)a)bn|bnn/ + bnn /-- End of testinput1 --/ diff --git a/testdata/testinput2 b/testdata/testinput2 index 91fcad8..0d05358 100644 --- a/testdata/testinput2 +++ b/testdata/testinput2 @@ -3817,6 +3817,15 @@ backtracking verbs. 
--/ /^(A(*THEN)B|A(*THEN)D)/ AD + +/(?!b(*THEN)a)bn|bnn/ + bnn + +/(?(?=b(*SKIP)a)bn|bnn)/ + bnn + +/(?=b(*THEN)a|)bn|bnn/ + bnn /-------------------------/ diff --git a/testdata/testoutput1 b/testdata/testoutput1 index 28d186e..f9fab09 100644 --- a/testdata/testoutput1 +++ b/testdata/testoutput1 @@ -9079,5 +9079,41 @@ No match No match /--------/ + +/(?(?!b(*THEN)a)bn|bnn)/ + bnn + 0: bn + +/(?!b(*SKIP)a)bn|bnn/ + bnn + 0: bn + +/(?(?!b(*SKIP)a)bn|bnn)/ + bnn + 0: bn + +/(?!b(*PRUNE)a)bn|bnn/ + bnn + 0: bn + +/(?(?!b(*PRUNE)a)bn|bnn)/ + bnn + 0: bn + +/(?!b(*COMMIT)a)bn|bnn/ + bnn + 0: bn + +/(?(?!b(*COMMIT)a)bn|bnn)/ + bnn + 0: bn + +/(?=b(*SKIP)a)bn|bnn/ + bnn +No match + +/(?=b(*THEN)a)bn|bnn/ + bnn + 0: bnn /-- End of testinput1 --/ diff --git a/testdata/testoutput2 b/testdata/testoutput2 index d7fa221..56db934 100644 --- a/testdata/testoutput2 +++ b/testdata/testoutput2 @@ -12520,37 +12520,37 @@ backtracking verbs. --/ /^(?!a(*SKIP)b)/ ac -No match + 0: /^(?!a(*SKIP)b)../ acd -No match + 0: ac /(?!a(*SKIP)b)../ acd - 0: cd + 0: ac /^(?(?!a(*SKIP)b))/ ac -No match + 0: /^(?!a(*PRUNE)b)../ acd -No match + 0: ac /(?!a(*PRUNE)b)../ acd - 0: cd + 0: ac /(?!a(*COMMIT)b)ac|cd/ ac -No match + 0: ac /(?!a(*COMMIT)b)ac|ad/ ac -No match + 0: ac ad -No match + 0: ad /^(?!a(*THEN)b|ac)../ ac @@ -12596,6 +12596,18 @@ No match AD 0: AD 1: AD + +/(?!b(*THEN)a)bn|bnn/ + bnn + 0: bn + +/(?(?=b(*SKIP)a)bn|bnn)/ + bnn +No match + +/(?=b(*THEN)a|)bn|bnn/ + bnn + 0: bn /-------------------------/ |