summaryrefslogtreecommitdiff
path: root/src/searchutils.c
diff options
context:
space:
mode:
authorPaul Eggert <eggert@cs.ucla.edu>2016-06-02 10:00:48 -0700
committerPaul Eggert <eggert@cs.ucla.edu>2016-06-02 15:26:21 -0700
commita3d42de483a29c1da381f5c3dad87cbbc86a5c70 (patch)
tree211301764c05bac498d6153d7975e85d4959c2be /src/searchutils.c
parentc7526623f458084cbe947f0e1b43f328481b2ca3 (diff)
downloadgrep-a3d42de483a29c1da381f5c3dad87cbbc86a5c70.tar.gz
grep: use Aho-Corasick algorithm to search multiple fixed words
Searching multiple fixed words, grep used the Commentz-Walter algorithm, but this was O(m*n) and was very slow in the worst case. For example: - input: yes `printf %040d` | head -10000000 - word1: x0000000000000000000 - word2: x This change instead uses the Aho-Corasick algorithm to search multiple fixed words. It uses a high-quality trie-building function that is already defined for Commentz-Walter in kwset.c. I see 7x speed-up even for a typical case on Fedora 21 with a 3.2GHz i5 by this change. Using best-of-5 trials for the benchmark: find /usr/share/doc/ -type f | LC_ALL=C time -p xargs.sh src/grep -Ff /usr/share/dict/linux.words >/dev/null The results were: real 11.37 user 11.03 sys 0.24 [without the change] real 1.49 user 1.31 sys 0.15 [with the change] * src/kwset.c (struct kwset): Add a new member 'mode'. (kwsalloc): Use it. All callers are changed. (kwsincr): Using Aho-Corasick algorithm, build tries in normal order. (acexec_trans, acexec): Add a new function. (kwsexec): Use it. * src/kwset.h (kwsalloc): Update a prototype. * NEWS (Improvements): Mention it.
Diffstat (limited to 'src/searchutils.c')
-rw-r--r--src/searchutils.c4
1 files changed, 2 insertions, 2 deletions
diff --git a/src/searchutils.c b/src/searchutils.c
index 1f21a0e9..d25e5f83 100644
--- a/src/searchutils.c
+++ b/src/searchutils.c
@@ -37,10 +37,10 @@ kwsinit (kwset_t *kwset)
for (i = 0; i < NCHAR; ++i)
trans[i] = toupper (i);
- *kwset = kwsalloc (trans);
+ *kwset = kwsalloc (trans, false);
}
else
- *kwset = kwsalloc (NULL);
+ *kwset = kwsalloc (NULL, false);
if (!*kwset)
xalloc_die ();