summaryrefslogtreecommitdiff
path: root/NEWS
diff options
context:
space:
mode:
authorPaul Eggert <eggert@cs.ucla.edu>2016-06-02 10:00:48 -0700
committerPaul Eggert <eggert@cs.ucla.edu>2016-06-02 15:26:21 -0700
commita3d42de483a29c1da381f5c3dad87cbbc86a5c70 (patch)
tree211301764c05bac498d6153d7975e85d4959c2be /NEWS
parentc7526623f458084cbe947f0e1b43f328481b2ca3 (diff)
downloadgrep-a3d42de483a29c1da381f5c3dad87cbbc86a5c70.tar.gz
grep: use Aho-Corasick algorithm to search multiple fixed words
Searching multiple fixed words, grep used the Commentz-Walter algorithm, but this was O(m*n) and was very slow in the worst case. For example: - input: yes `printf %040d` | head -10000000 - word1: x0000000000000000000 - word2: x This change instead uses the Aho-Corasick algorithm to search multiple fixed words. It uses a high-quality trie-building function that is already defined for Commentz-Walter in kwset.c. I see 7x speed-up even for a typical case on Fedora 21 with a 3.2GHz i5 by this change. Using best-of-5 trials for the benchmark: find /usr/share/doc/ -type f | LC_ALL=C time -p xargs.sh src/grep -Ff /usr/share/dict/linux.words >/dev/null The results were: real 11.37 user 11.03 sys 0.24 [without the change] real 1.49 user 1.31 sys 0.15 [with the change] * src/kwset.c (struct kwset): Add a new member 'mode'. (kwsalloc): Use it. All callers are changed. (kwsincr): Using Aho-Corasick algorithm, build tries in normal order. (acexec_trans, acexec): Add a new function. (kwsexec): Use it. * src/kwset.h (kwsalloc): Update a prototype. * NEWS (Improvements): Mention it.
Diffstat (limited to 'NEWS')
-rw-r--r--NEWS5
1 files changed, 5 insertions, 0 deletions
diff --git a/NEWS b/NEWS
index cb035dcd..fb6f41e4 100644
--- a/NEWS
+++ b/NEWS
@@ -6,6 +6,11 @@ GNU grep NEWS -*- outline -*-
grep can be much faster now when standard output is /dev/null.
+** Improvements
+
+ grep -F is now typically 7 times faster for a lot of words. The
+ improvement replace Commentz-Walter algorithm which has been used for
+ a long time to Aho-Corasick algorithm.
* Noteworthy changes in release 2.25 (2016-04-21) [stable]