summaryrefslogtreecommitdiff
path: root/src/regexp
Commit message (Collapse)AuthorAgeFilesLines
* regexp/syntax: fix factoring of common prefixes in alternationsPaul Wankadia2016-01-083-7/+20
| | | | | | | | | | | | | In the past, `a.*?c|a.*?b` was factored to `a.*?[bc]`. Thus, given "abc" as its input string, the automaton would consume "ab" and then stop (when unanchored) whereas it should consume all of "abc" as per leftmost semantics. Fixes #13812. Change-Id: I67ac0a353d7793b3d0c9c4aaf22d157621dfe784 Reviewed-on: https://go-review.googlesource.com/18357 Reviewed-by: Russ Cox <rsc@golang.org>
* regexp/syntax: fix handling of \Q...\ERuss Cox2015-12-012-1/+10
| | | | | | | | | | | | | | | It's not a group: must handle the inside as a sequence of literal chars, not a single literal string. That is, \Qab\E+ is the same as ab+, not (ab)+. Fixes #11187. Change-Id: I5406d05ccf7efff3a7f15395bdb0cfb2bd23a8ed Reviewed-on: https://go-review.googlesource.com/17233 Reviewed-by: David Crawshaw <crawshaw@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
* regexp: fix one-pass compilationCaleb Spare2015-11-252-118/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | The one-pass transformation is structured as a search over the input machine for conditions that violate the one-pass requisites. At each iteration, we should fully explore all non-input paths that proceed from the current instruction; this is implemented via recursive check calls. But when we reach instructions that demand input (InstRune*), these should be put onto the search queue. Instead of searching this way, the routine previously (effectively) proceeded through the machine one instruction at a time until finding an Inst{Match,Fail,Rune*}, calling check on each instruction. This caused bug #11905, where the transformation stopped before rewriting all InstAlts as InstAltMatches. Further, the check function unnecessarily recurred on InstRune* instructions. (I believe this helps to mask the above bug.) This change also deletes some unused functions and duplicate test cases. Fixes #11905. Change-Id: I5b0b26efea3d3bd01c7479a518b5ed1b886701cd Reviewed-on: https://go-review.googlesource.com/17195 Reviewed-by: Russ Cox <rsc@golang.org>
* regexp: fix LiteralPrefix for certain onepass progsCaleb Spare2015-11-252-2/+20
| | | | | | | | | | | | | | | | The prefix computation for onepass was incorrectly returning complete=true when it encountered a beginning-of-text empty width match (^) in the middle of an expression. Fix by returning complete only when the prefix is followed by $ and then an accepting state. Fixes #11175. Change-Id: Ie9c4cf5f76c1d2c904a6fb2f016cedb265c19fde Reviewed-on: https://go-review.googlesource.com/16200 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
* regexp: add Copy method to RegexpCaleb Spare2015-11-252-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This helps users who wish to use separate Regexps in each goroutine to avoid lock contention. Previously they had to parse the expression multiple times to achieve this. I used variants of the included benchmark to evaluate this change. I used the arguments -benchtime 20s -cpu 1,2,4,8,16 on a machine with 16 hardware cores. Comparing a single shared Regexp vs. copied Regexps, we can see that lock contention causes huge slowdowns at higher levels of parallelism. The copied version shows the expected linear speedup. name old time/op new time/op delta MatchParallel 366ns ± 0% 370ns ± 0% +1.09% (p=0.000 n=10+8) MatchParallel-2 324ns ±28% 184ns ± 1% -43.37% (p=0.000 n=10+10) MatchParallel-4 352ns ± 5% 93ns ± 1% -73.70% (p=0.000 n=9+10) MatchParallel-8 480ns ± 3% 46ns ± 0% -90.33% (p=0.000 n=9+8) MatchParallel-16 510ns ± 8% 24ns ± 6% -95.36% (p=0.000 n=10+8) I also compared a modified version of Regexp that has no mutex and a single machine (the "RegexpForSingleGoroutine" rsc mentioned in https://github.com/golang/go/issues/8232#issuecomment-66096128). In this next test, I compared using N copied Regexps vs. N separate RegexpForSingleGoroutines. This shows that, even for this relatively simple regex, avoiding the lock entirely would only buy about 10-12% further improvement. name old time/op new time/op delta MatchParallel 370ns ± 0% 322ns ± 0% -12.97% (p=0.000 n=8+8) MatchParallel-2 184ns ± 1% 162ns ± 1% -11.60% (p=0.000 n=10+10) MatchParallel-4 92.7ns ± 1% 81.1ns ± 2% -12.43% (p=0.000 n=10+10) MatchParallel-8 46.4ns ± 0% 41.8ns ±10% -9.78% (p=0.000 n=8+10) MatchParallel-16 23.7ns ± 6% 20.6ns ± 1% -13.14% (p=0.000 n=8+8) Updates #8232. Change-Id: I15201a080c363d1b44104eafed46d8df5e311902 Reviewed-on: https://go-review.googlesource.com/16110 Reviewed-by: Russ Cox <rsc@golang.org>
* regexp/syntax: correctly print `^` BOL and `$` EOLTamir Duberstein2015-11-252-4/+4
| | | | | | | | Fixes #12980. Change-Id: I936db2f57f7c4dc80bb8ec32715c4c6b7bf0d708 Reviewed-on: https://go-review.googlesource.com/16112 Reviewed-by: Russ Cox <rsc@golang.org>
* regexp: fix slice bounds out of range panicsDidier Spezia2015-10-232-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Regular expressions involving a (x){0} term are simplified by removing this term from the expression, just before the expression is compiled. The number of subexpressions is evaluated before the simplification. The number of capture instructions in the compiled expressions is not necessarily in line with the number of subexpressions. When the ReplaceAll(String) methods are used, a number of capture slots (nmatch) is evaluated as 2*(s+1) (s being the number of subexpressions). In some case, it can be higher than the number of capture instructions evaluated at compile time, resulting in a panic when the internal slices of regexp.machine are resized to this value. Fixed by capping the number of capture slots to the number of capture instructions. I must say I do not really see the benefits of setting nmatch lower than re.prog.NumCap using this 2*(s+1) formula, so perhaps this can be further simplified. Fixes #11178 Fixes #11176 Change-Id: I21415e8ef2dd5f2721218e9a679f7f6bfb76ae9b Reviewed-on: https://go-review.googlesource.com/14013 Reviewed-by: Russ Cox <rsc@golang.org>
* regexp: add runnable example to regex.SplitSeth Hoenig2015-09-231-0/+22
| | | | | | | | | | | | The existing comment for regex.Split contains a plain text example, while many of the other regex functions have runnable examples. This change provides a runnable example for Split. Change-Id: I5373f57f532fe843d7d0adcf4b513061ec797047 Reviewed-on: https://go-review.googlesource.com/14737 Reviewed-by: Andrew Gerrand <adg@golang.org> Run-TryBot: Andrew Gerrand <adg@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
* regexp: small correction to test commentMichael Matloob2015-06-141-1/+1
| | | | | | | | s/Backtrace/Backtrack/ Change-Id: I062aab18f23f2bc2110cf7210c2e7264747e02cf Reviewed-on: https://go-review.googlesource.com/11091 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
* regexp: suggest go doc, not godocRob Pike2015-06-011-1/+1
| | | | | | | | In 1.6, go doc is more likely to be available. Change-Id: I970ad1d3317b35273f5c8d830f75713d3570c473 Reviewed-on: https://go-review.googlesource.com/10518 Reviewed-by: Andrew Gerrand <adg@golang.org>
* regexp: trivial change in comments to update code.google.com linkDmitry Savintsev2015-04-272-5/+6
| | | | | | | | | | Replaced code.google.com/p/re2/ with github.com/google/re2/ and updated the file names (re2-exhaustive.txt.bz2 not re2.txt.gz) as well as the re2 make command (make log). Change-Id: I15937b0b8a898d78d45366857ed86421c8d69960 Reviewed-on: https://go-review.googlesource.com/9372 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
* regexp: set b.cap[0] and b.cap[1] only when captures requestedMichael Matloob2015-04-172-13/+30
| | | | | | | | | Fixes #10319 Change-Id: I96015b0e1dff30a72de11fea3837638b5c672891 Reviewed-on: https://go-review.googlesource.com/8501 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
* regexp: skip backtracker for long programsMatthew Brennan2015-04-092-1/+22
| | | | | | | | | | | | This update makes maxBacktrackLen return 0 if len(prog.Inst) > maxBacktrackProg. This prevents an attempt to backtrack against a nil bitstate. Fixes #10319 Change-Id: Icdbeb2392782ccf66f9d0a70ea57af22fb93f01b Reviewed-on: https://go-review.googlesource.com/8473 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
* regexp: fix link to RE2 syntaxBrad Fitzpatrick2015-03-231-1/+1
| | | | | | | | Fixes #10224 Change-Id: I21037379b4667575e51ab0b6b683138c505c3f68 Reviewed-on: https://go-review.googlesource.com/7960 Reviewed-by: David Crawshaw <crawshaw@golang.org>
* regexp: port RE2's bitstate backtracker to the regexp packageMichael Matloob2015-03-232-7/+374
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a port of RE2's bitstate backtracker, which triggers under the same conditions that the RE2 backtracker triggers. However I wasn't sure how to port over some of the optimizations in the RE2 backtracker, and there is a ~2% penalty on benchmarks that don't trigger the backtracker. benchmark old ns/op new ns/op delta BenchmarkLiteral 312 189 -39.42% BenchmarkNotLiteral 4435 3001 -32.33% BenchmarkMatchClass 5758 4378 -23.97% BenchmarkMatchClass_InRange 5385 4084 -24.16% BenchmarkReplaceAll 5291 3505 -33.76% BenchmarkAnchoredLiteralShortNonMatch 190 200 +5.26% BenchmarkAnchoredLiteralLongNonMatch 189 194 +2.65% BenchmarkAnchoredShortMatch 479 304 -36.53% BenchmarkAnchoredLongMatch 478 499 +4.39% BenchmarkOnePassShortA 791 798 +0.88% BenchmarkNotOnePassShortA 3202 1571 -50.94% BenchmarkOnePassShortB 614 633 +3.09% BenchmarkNotOnePassShortB 2685 881 -67.19% BenchmarkOnePassLongPrefix 152 154 +1.32% BenchmarkOnePassLongNotPrefix 505 533 +5.54% BenchmarkMatchEasy0_32 139 171 +23.02% BenchmarkMatchEasy0_1K 653 1797 +175.19% BenchmarkMatchEasy0_32K 12032 13346 +10.92% BenchmarkMatchEasy0_1M 462882 461272 -0.35% BenchmarkMatchEasy0_32M 15015339 15365238 +2.33% BenchmarkMatchEasy1_32 122 168 +37.70% BenchmarkMatchEasy1_1K 3339 2612 -21.77% BenchmarkMatchEasy1_32K 72330 71721 -0.84% BenchmarkMatchEasy1_1M 2545410 2652284 +4.20% BenchmarkMatchEasy1_32M 80072063 82609750 +3.17% BenchmarkMatchMedium_32 2359 1980 -16.07% BenchmarkMatchMedium_1K 75939 58593 -22.84% BenchmarkMatchMedium_32K 2450907 2501106 +2.05% BenchmarkMatchMedium_1M 78707697 80174418 +1.86% BenchmarkMatchMedium_32M 2535146010 2570896441 +1.41% BenchmarkMatchHard_32 4297 2960 -31.11% BenchmarkMatchHard_1K 133592 88997 -33.38% BenchmarkMatchHard_32K 4240445 4336907 +2.27% BenchmarkMatchHard_1M 136187006 139350238 +2.32% BenchmarkMatchHard_32M 4350855890 4478537306 +2.93% benchmark old MB/s new MB/s speedup BenchmarkMatchEasy0_32 228.74 186.11 0.81x BenchmarkMatchEasy0_1K 1565.91 569.64 0.36x BenchmarkMatchEasy0_32K 2723.31 2455.10 0.90x BenchmarkMatchEasy0_1M 2265.32 2273.22 1.00x BenchmarkMatchEasy0_32M 2234.68 2183.79 0.98x BenchmarkMatchEasy1_32 261.08 190.22 0.73x BenchmarkMatchEasy1_1K 306.59 391.91 1.28x BenchmarkMatchEasy1_32K 453.03 456.88 1.01x BenchmarkMatchEasy1_1M 411.95 395.35 0.96x BenchmarkMatchEasy1_32M 419.05 406.18 0.97x BenchmarkMatchMedium_32 13.56 16.16 1.19x BenchmarkMatchMedium_1K 13.48 17.48 1.30x BenchmarkMatchMedium_32K 13.37 13.10 0.98x BenchmarkMatchMedium_1M 13.32 13.08 0.98x BenchmarkMatchMedium_32M 13.24 13.05 0.99x BenchmarkMatchHard_32 7.45 10.81 1.45x BenchmarkMatchHard_1K 7.67 11.51 1.50x BenchmarkMatchHard_32K 7.73 7.56 0.98x BenchmarkMatchHard_1M 7.70 7.52 0.98x BenchmarkMatchHard_32M 7.71 7.49 0.97x Fixes #4154 Change-Id: Iff7fb9507f0872b320d08afc08679751ed1b28bc Reviewed-on: https://go-review.googlesource.com/2153 Reviewed-by: Russ Cox <rsc@golang.org>
* all: use "reports whether" in place of "returns true if(f)"Josh Bleecher Snyder2015-03-181-2/+2
| | | | | | | | Comment changes only. Change-Id: I56848814564c4aa0988b451df18bebdfc88d6d94 Reviewed-on: https://go-review.googlesource.com/7721 Reviewed-by: Rob Pike <r@golang.org>
* regexp: update URLs in testsShenghou Ma2015-01-261-2/+2
| | | | | | Change-Id: I06035d949272157bbb7255563b37ac93cbf07f15 Reviewed-on: https://go-review.googlesource.com/3272 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
* regexp: fix typo in comment: s/onpass/onepass/Michael Matloob2014-12-241-1/+1
| | | | | | Change-Id: Idff57050a34d09e7fa9b77e9b53d61bb5ea2a71c Reviewed-on: https://go-review.googlesource.com/2095 Reviewed-by: Minux Ma <minux@golang.org>
* regexp/syntax: Clarify comment of OpAnyCharNotNL.Robin Eklind2014-11-111-1/+1
| | | | | | | LGTM=iant R=golang-codereviews, iant CC=golang-codereviews https://golang.org/cl/171560043
* regexp: fix TestOnePassCutoffRuss Cox2014-10-201-4/+12
| | | | | | | | | | | | | The stack blowout can no longer happen, but we can still test that too-complex regexps are rejected. Replacement for CL 162770043. LGTM=iant, r R=r, iant CC=bradfitz, golang-codereviews https://golang.org/cl/162860043
* regexp/syntax: fix validity testing of zero repeatsIan Lance Taylor2014-10-201-1/+6
| | | | | | | | | | This is already tested by TestRE2Exhaustive, but the build has not broken because that test is not run when using -test.short. LGTM=rsc R=rsc CC=golang-codereviews https://golang.org/cl/155580043
* regexp: correct doc comment for ReplaceAllLiteralStringIan Lance Taylor2014-10-191-1/+1
| | | | | | | | | Fixes #8959. LGTM=adg R=golang-codereviews, adg CC=golang-codereviews https://golang.org/cl/161790043
* regexp/syntax: regenerate doc.go from re2 syntaxRuss Cox2014-10-061-24/+24
| | | | | | | | | | | Generated using re2/doc/mksyntaxgo. Fixes #8505. LGTM=iant R=r, iant CC=golang-codereviews https://golang.org/cl/155890043
* regexp/syntax: reject large repetitions created by nesting small onesRuss Cox2014-09-302-0/+47
| | | | | | | | | Fixes #7609. LGTM=r R=r CC=golang-codereviews https://golang.org/cl/150270043
* build: move package sources from src/pkg to srcRuss Cox2014-09-0827-0/+14992
Preparation was in CL 134570043. This CL contains only the effect of 'hg mv src/pkg/* src'. For more about the move, see golang.org/s/go14nopkg.