summaryrefslogtreecommitdiff
path: root/tests/unibyte-binary
diff options
context:
space:
mode:
authorPaul Eggert <eggert@cs.ucla.edu>2016-06-02 10:00:48 -0700
committerPaul Eggert <eggert@cs.ucla.edu>2016-06-02 15:26:21 -0700
commita3d42de483a29c1da381f5c3dad87cbbc86a5c70 (patch)
tree211301764c05bac498d6153d7975e85d4959c2be /tests/unibyte-binary
parentc7526623f458084cbe947f0e1b43f328481b2ca3 (diff)
downloadgrep-a3d42de483a29c1da381f5c3dad87cbbc86a5c70.tar.gz
grep: use Aho-Corasick algorithm to search multiple fixed words
Searching multiple fixed words, grep used the Commentz-Walter algorithm, but this was O(m*n) and was very slow in the worst case. For example: - input: yes `printf %040d` | head -10000000 - word1: x0000000000000000000 - word2: x This change instead uses the Aho-Corasick algorithm to search multiple fixed words. It uses a high-quality trie-building function that is already defined for Commentz-Walter in kwset.c. I see 7x speed-up even for a typical case on Fedora 21 with a 3.2GHz i5 by this change. Using best-of-5 trials for the benchmark: find /usr/share/doc/ -type f | LC_ALL=C time -p xargs.sh src/grep -Ff /usr/share/dict/linux.words >/dev/null The results were: real 11.37 user 11.03 sys 0.24 [without the change] real 1.49 user 1.31 sys 0.15 [with the change] * src/kwset.c (struct kwset): Add a new member 'mode'. (kwsalloc): Use it. All callers are changed. (kwsincr): Using Aho-Corasick algorithm, build tries in normal order. (acexec_trans, acexec): Add a new function. (kwsexec): Use it. * src/kwset.h (kwsalloc): Update a prototype. * NEWS (Improvements): Mention it.
Diffstat (limited to 'tests/unibyte-binary')
0 files changed, 0 insertions, 0 deletions