author    ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15>  2013-03-27 11:13:36 +0000
committer ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15>  2013-03-27 11:13:36 +0000
commit    f5f97ff8e2ffeabf42f8636de5f5a9aea17b5d72
tree      ca763b27aa287db45fa9b2d7cff6e40017bddb98
parent    2bd616230afecbea6d658f5c1541942288752fb9
Further changes to backtracking verbs in assertions.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@1302 2f5784b3-3f2a-0410-8824-cb99058d5e15
-rw-r--r--  ChangeLog              6
-rw-r--r--  doc/pcrepattern.3     56
-rw-r--r--  pcre_exec.c           71
-rw-r--r--  testdata/testinput1   27
-rw-r--r--  testdata/testinput2    9
-rw-r--r--  testdata/testoutput1  36
-rw-r--r--  testdata/testoutput2  30
7 files changed, 181 insertions, 54 deletions
diff --git a/ChangeLog b/ChangeLog
index 3a2b342..dc43cd1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -110,7 +110,7 @@ Version 8.33 xx-xxxx-201x
30. Update RunTest with additional test selector options.
-31. The way PCRE handles backtracking verbs has been changed in to ways.
+31. The way PCRE handles backtracking verbs has been changed in two ways.
(1) Previously, in something like (*COMMIT)(*SKIP), COMMIT would override
SKIP. Now, PCRE acts on whichever backtracking verb is reached first by
@@ -118,8 +118,8 @@ Version 8.33 xx-xxxx-201x
rather obscure rules do not always do the same thing.
(2) Previously, backtracking verbs were confined within assertions. This is
- no longer the case, except for (*ACCEPT). Again, this sometimes improves
- Perl compatibility, and sometimes does not.
+ no longer the case for positive assertions, except for (*ACCEPT). Again,
+ this sometimes improves Perl compatibility, and sometimes does not.
32. A number of tests that were in test 2 because Perl did things differently
have been moved to test 1, because either Perl or PCRE has changed, and
diff --git a/doc/pcrepattern.3 b/doc/pcrepattern.3
index caf8724..b28dabc 100644
--- a/doc/pcrepattern.3
+++ b/doc/pcrepattern.3
@@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "22 March 2013" "PCRE 8.33"
+.TH PCREPATTERN 3 "27 March 2013" "PCRE 8.33"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -2673,13 +2673,13 @@ remarks apply to the PCRE features described in this section.
.P
The new verbs make use of what was previously invalid syntax: an opening
parenthesis followed by an asterisk. They are generally of the form
-(*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,
-depending on whether or not a name is present. A name is any sequence of
-characters that does not include a closing parenthesis. The maximum length of
-name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit libraries.
-If the name is empty, that is, if the closing parenthesis immediately follows
-the colon, the effect is as if the colon were not there. Any number of these
-verbs may occur in a pattern.
+(*VERB) or (*VERB:NAME). Some may take either form, possibly behaving
+differently depending on whether or not a name is present. A name is any
+sequence of characters that does not include a closing parenthesis. The maximum
+length of name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit
+libraries. If the name is empty, that is, if the closing parenthesis
+immediately follows the colon, the effect is as if the colon were not there.
+Any number of these verbs may occur in a pattern.
.P
Since these verbs are specifically related to backtracking, most of them can be
used only when the pattern is to be matched using one of the traditional
@@ -2807,9 +2807,9 @@ indicates which of the two alternatives matched. This is a more efficient way
of obtaining this information than putting each alternative in its own
capturing parentheses.
.P
-If a verb with a name is encountered in a positive assertion, its name is
-recorded and passed back if it is the last-encountered. This does not happen
-for negative assertions.
+If a verb with a name is encountered in a positive assertion that is true, the
+name is recorded and passed back if it is the last-encountered. This does not
+happen for negative assertions or failing positive assertions.
.P
After a partial match or a failed match, the last encountered name in the
entire match process is returned. For example:
@@ -2839,14 +2839,16 @@ The following verbs do nothing when they are encountered. Matching continues
with what follows, but if there is no subsequent match, causing a backtrack to
the verb, a failure is forced. That is, backtracking cannot pass to the left of
the verb. However, when one of these verbs appears inside an atomic group or an
-assertion, its effect is confined to that group, because once the group has
-been matched, there is never any backtracking into it. In this situation,
-backtracking can "jump back" to the left of the entire atomic group or
-assertion. (Remember also, as stated above, that this localization also applies
-in subroutine calls.)
+assertion that is true, its effect is confined to that group, because once the
+group has been matched, there is never any backtracking into it. In this
+situation, backtracking can "jump back" to the left of the entire atomic group
+or assertion. (Remember also, as stated above, that this localization also
+applies in subroutine calls.)
.P
These verbs differ in exactly what kind of failure occurs when backtracking
-reaches them.
+reaches them. The behaviour described below is what happens when the verb is
+not in a subroutine or an assertion. Subsequent sections cover these special
+cases.
.sp
(*COMMIT)
.sp
@@ -2942,8 +2944,10 @@ pattern-based if-then-else block:
.sp
If the COND1 pattern matches, FOO is tried (and possibly further items after
the end of the group if FOO succeeds); on failure, the matcher skips to the
-second alternative and tries COND2, without backtracking into COND1.
-If (*THEN) is not inside an alternation, it acts like (*PRUNE).
+second alternative and tries COND2, without backtracking into COND1. If that
+succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no
+more alternatives, so there is a backtrack to whatever came before the entire
+group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
.P
The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN).
It is like (*MARK:NAME) in that the name is remembered for passing back to the
@@ -3039,10 +3043,18 @@ the second repeat of the group acts.
further processing. In a negative assertion, (*ACCEPT) causes the assertion to
fail without any further processing.
.P
-The other backtracking verbs are not treated specially if they appear in an
-assertion. In particular, (*THEN) skips to the next alternative in the
+The other backtracking verbs are not treated specially if they appear in a
+positive assertion. In particular, (*THEN) skips to the next alternative in the
innermost enclosing group that has alternations, whether or not this is within
the assertion.
+.P
+Negative assertions are, however, different, in order to ensure that changing a
+positive assertion into a negative assertion changes its result. Backtracking
+into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative assertion to be true,
+without considering any further alternative branches in the assertion.
+Backtracking into (*THEN) causes it to skip to the next enclosing alternative
+within the assertion (the normal behaviour), but if the assertion does not have
+such an alternative, (*THEN) behaves like (*PRUNE).
.
.
.\" HTML <a name="btsub"></a>
@@ -3088,6 +3100,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 22 March 2013
+Last updated: 27 March 2013
Copyright (c) 1997-2013 University of Cambridge.
.fi
diff --git a/pcre_exec.c b/pcre_exec.c
index 79c14db..5b030a9 100644
--- a/pcre_exec.c
+++ b/pcre_exec.c
@@ -1608,11 +1608,18 @@ for (;;)
do
{
RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, NULL, RM4);
+
+ /* A match means that the assertion is true; break out of the loop
+ that matches its alternatives. */
+
if (rrc == MATCH_MATCH || rrc == MATCH_ACCEPT)
{
mstart = md->start_match_ptr; /* In case \K reset it */
break;
}
+
+ /* If not matched, restore the previous mark setting. */
+
md->mark = save_mark;
/* See comment in the code for capturing groups above about handling
@@ -1626,17 +1633,19 @@ for (;;)
rrc = MATCH_NOMATCH;
}
- /* Anything other than NOMATCH causes the assertion to fail. This
- includes COMMIT, SKIP, and PRUNE. However, this consistent approach does
- not always have exactly the same effect as in Perl. */
+ /* Anything other than NOMATCH causes the entire assertion to fail,
+ passing back the return code. This includes COMMIT, SKIP, PRUNE and an
+ uncaptured THEN, which means they take their normal effect. This
+ consistent approach does not always have exactly the same effect as in
+ Perl. */
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
ecode += GET(ecode, 1);
}
- while (*ecode == OP_ALT);
+ while (*ecode == OP_ALT); /* Continue for next alternative */
/* If we have tried all the alternative branches, the assertion has
- failed. */
+ failed. If not, we broke out after a match. */
if (*ecode == OP_KET) RRETURN(MATCH_NOMATCH);
@@ -1670,35 +1679,57 @@ for (;;)
do
{
RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, NULL, RM5);
- md->mark = save_mark;
+ md->mark = save_mark; /* Always restore the mark setting */
- /* A successful match means the assertion has failed. */
-
- if (rrc == MATCH_MATCH || rrc == MATCH_ACCEPT) RRETURN(MATCH_NOMATCH);
+ switch(rrc)
+ {
+ case MATCH_MATCH: /* A successful match means */
+ case MATCH_ACCEPT: /* the assertion has failed. */
+ RRETURN(MATCH_NOMATCH);
+
+ case MATCH_NOMATCH: /* Carry on with next branch */
+ break;
- /* See comment in the code for capturing groups above about handling
- THEN. */
+ /* See comment in the code for capturing groups above about handling
+ THEN. */
- if (rrc == MATCH_THEN)
- {
+ case MATCH_THEN:
next = ecode + GET(ecode,1);
if (md->start_match_ptr < next &&
(*ecode == OP_ALT || *next == OP_ALT))
+ {
rrc = MATCH_NOMATCH;
+ break;
+ }
+ /* Otherwise fall through. */
+
+ /* COMMIT, SKIP, PRUNE, and an uncaptured THEN cause the whole
+ assertion to fail to match, without considering any more alternatives.
+ Failing to match means the assertion is true. This is a consistent
+ approach, but does not always have the same effect as in Perl. */
+
+ case MATCH_COMMIT:
+ case MATCH_SKIP:
+ case MATCH_SKIP_ARG:
+ case MATCH_PRUNE:
+ do ecode += GET(ecode,1); while (*ecode == OP_ALT);
+ goto NEG_ASSERT_TRUE; /* Break out of alternation loop */
+
+ /* Anything else is an error */
+
+ default:
+ RRETURN(rrc);
}
- /* No match on a branch means we must carry on and try the next branch.
- Anything else, in particular, SKIP, PRUNE, etc. causes a failure in the
- enclosing branch. This is a consistent approach, but does not always have
- the same effect as in Perl. */
-
- if (rrc != MATCH_NOMATCH) RRETURN(rrc);
+ /* Continue with next branch */
+
ecode += GET(ecode,1);
}
while (*ecode == OP_ALT);
/* All branches in the assertion failed to match. */
-
+
+ NEG_ASSERT_TRUE:
if (condassert) RRETURN(MATCH_MATCH); /* Condition assertion */
ecode += 1 + LINK_SIZE; /* Continue with current branch */
continue;
diff --git a/testdata/testinput1 b/testdata/testinput1
index 4630e8d..50bdf23 100644
--- a/testdata/testinput1
+++ b/testdata/testinput1
@@ -5521,5 +5521,32 @@ AbcdCBefgBhiBqz
ac
/--------/
+
+/(?(?!b(*THEN)a)bn|bnn)/
+ bnn
+
+/(?!b(*SKIP)a)bn|bnn/
+ bnn
+
+/(?(?!b(*SKIP)a)bn|bnn)/
+ bnn
+
+/(?!b(*PRUNE)a)bn|bnn/
+ bnn
+
+/(?(?!b(*PRUNE)a)bn|bnn)/
+ bnn
+
+/(?!b(*COMMIT)a)bn|bnn/
+ bnn
+
+/(?(?!b(*COMMIT)a)bn|bnn)/
+ bnn
+
+/(?=b(*SKIP)a)bn|bnn/
+ bnn
+
+/(?=b(*THEN)a)bn|bnn/
+ bnn
/-- End of testinput1 --/
diff --git a/testdata/testinput2 b/testdata/testinput2
index 91fcad8..0d05358 100644
--- a/testdata/testinput2
+++ b/testdata/testinput2
@@ -3817,6 +3817,15 @@ backtracking verbs. --/
/^(A(*THEN)B|A(*THEN)D)/
AD
+
+/(?!b(*THEN)a)bn|bnn/
+ bnn
+
+/(?(?=b(*SKIP)a)bn|bnn)/
+ bnn
+
+/(?=b(*THEN)a|)bn|bnn/
+ bnn
/-------------------------/
diff --git a/testdata/testoutput1 b/testdata/testoutput1
index 28d186e..f9fab09 100644
--- a/testdata/testoutput1
+++ b/testdata/testoutput1
@@ -9079,5 +9079,41 @@ No match
No match
/--------/
+
+/(?(?!b(*THEN)a)bn|bnn)/
+ bnn
+ 0: bn
+
+/(?!b(*SKIP)a)bn|bnn/
+ bnn
+ 0: bn
+
+/(?(?!b(*SKIP)a)bn|bnn)/
+ bnn
+ 0: bn
+
+/(?!b(*PRUNE)a)bn|bnn/
+ bnn
+ 0: bn
+
+/(?(?!b(*PRUNE)a)bn|bnn)/
+ bnn
+ 0: bn
+
+/(?!b(*COMMIT)a)bn|bnn/
+ bnn
+ 0: bn
+
+/(?(?!b(*COMMIT)a)bn|bnn)/
+ bnn
+ 0: bn
+
+/(?=b(*SKIP)a)bn|bnn/
+ bnn
+No match
+
+/(?=b(*THEN)a)bn|bnn/
+ bnn
+ 0: bnn
/-- End of testinput1 --/
diff --git a/testdata/testoutput2 b/testdata/testoutput2
index d7fa221..56db934 100644
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@@ -12520,37 +12520,37 @@ backtracking verbs. --/
/^(?!a(*SKIP)b)/
ac
-No match
+ 0:
/^(?!a(*SKIP)b)../
acd
-No match
+ 0: ac
/(?!a(*SKIP)b)../
acd
- 0: cd
+ 0: ac
/^(?(?!a(*SKIP)b))/
ac
-No match
+ 0:
/^(?!a(*PRUNE)b)../
acd
-No match
+ 0: ac
/(?!a(*PRUNE)b)../
acd
- 0: cd
+ 0: ac
/(?!a(*COMMIT)b)ac|cd/
ac
-No match
+ 0: ac
/(?!a(*COMMIT)b)ac|ad/
ac
-No match
+ 0: ac
ad
-No match
+ 0: ad
/^(?!a(*THEN)b|ac)../
ac
@@ -12596,6 +12596,18 @@ No match
AD
0: AD
1: AD
+
+/(?!b(*THEN)a)bn|bnn/
+ bnn
+ 0: bn
+
+/(?(?=b(*SKIP)a)bn|bnn)/
+ bnn
+No match
+
+/(?=b(*THEN)a|)bn|bnn/
+ bnn
+ 0: bn
/-------------------------/
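
Note on the pcre_exec.c hunks: the branch-by-branch decision they describe can be summarised with a small model. The sketch below is not PCRE code. The enum, the function names, the example arrays, and the rule "a THEN is caught when the assertion has more than one branch" are all assumptions made for illustration; the real test in pcre_exec.c inspects the opcode stream and md->start_match_ptr. Each array element stands for the result of trying one branch of the assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-branch result codes. The names echo the MATCH_* codes
   in pcre_exec.c, but this enum is illustrative only. */
enum rc { RC_NOMATCH, RC_MATCH, RC_ACCEPT, RC_COMMIT, RC_PRUNE, RC_SKIP, RC_THEN };

/* Simplifying assumption: a THEN is "caught" by the assertion whenever the
   assertion has more than one branch. */
static int then_is_caught(size_t nbranches) { return nbranches > 1; }

/* Positive assertion: RC_MATCH if some branch matches, RC_NOMATCH if all
   branches fail, otherwise the verb's code is passed back so that the verb
   takes its normal effect outside the assertion. */
enum rc pos_assert(const enum rc *branch, size_t n)
{
  assert(n >= 1);
  for (size_t i = 0; i < n; i++) {
    enum rc r = branch[i];
    if (r == RC_MATCH || r == RC_ACCEPT) return RC_MATCH;
    if (r == RC_THEN && then_is_caught(n)) r = RC_NOMATCH;
    if (r != RC_NOMATCH) return r;      /* COMMIT, SKIP, PRUNE, uncaught THEN */
  }
  return RC_NOMATCH;                    /* all branches tried, none matched */
}

/* Negative assertion: returns 1 if the assertion is true, 0 if it is false. */
int neg_assert_true(const enum rc *branch, size_t n)
{
  assert(n >= 1);
  for (size_t i = 0; i < n; i++) {
    switch (branch[i]) {
      case RC_MATCH:
      case RC_ACCEPT:
        return 0;                       /* a matching branch: assertion false */
      case RC_NOMATCH:
        break;                          /* carry on with the next branch */
      case RC_THEN:
        if (then_is_caught(n)) break;   /* behaves like a plain branch failure */
        /* fall through */
      case RC_COMMIT:
      case RC_SKIP:
      case RC_PRUNE:
        return 1;                       /* true, remaining branches are ignored */
    }
  }
  return 1;                             /* no branch matched */
}

/* Example branch outcomes (illustrative). */
const enum rc ex_skip[]       = { RC_SKIP };            /* e.g. a(*SKIP)b tried on "ac" */
const enum rc ex_then[]       = { RC_THEN };            /* a lone branch ending in THEN */
const enum rc ex_then_match[] = { RC_THEN, RC_MATCH };  /* THEN, then a matching branch */
```

Under this model, neg_assert_true(ex_skip, 1) is 1, which is in line with the new testoutput2 result for /^(?!a(*SKIP)b)/ on "ac", while pos_assert(ex_skip, 1) passes RC_SKIP back to the caller, which is in line with "No match" for /(?=b(*SKIP)a)bn|bnn/ on "bnn". With two branches, the THEN in ex_then_match is treated as an ordinary branch failure and the second branch decides the outcome.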