author | Paul Eggert <eggert@cs.ucla.edu> | 2017-01-16 12:13:50 -0800 |
---|---|---|
committer | Paul Eggert <eggert@cs.ucla.edu> | 2017-01-17 08:02:05 -0800 |
commit | aa1f107a143900d2d9a9d8c3aa541671140315ba (patch) | |
tree | 6d96789c20d185fc64dd290390ee79a5487db76e /src/pcresearch.c | |
parent | e0cb70cc85aac558106d0c66adff239c97aba563 (diff) | |
download | grep-aa1f107a143900d2d9a9d8c3aa541671140315ba.tar.gz |
Improve -i performance in typical UTF-8 searches
Currently ‘grep -i i’ is slow in a UTF-8 locale, because ‘i’ in
the pattern matches the two-byte character 'ı' (U+0131, LATIN
SMALL LETTER DOTLESS I) in data, and kwset handles only
single-byte character translations, so grep falls back on a slower
DFA-based search for all searches. Improve -i performance in the
typical case by using kwset when data are free of troublesome
characters like 'ı', falling back on the DFA only when data
contain troublesome characters.
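
The dispatch idea described above can be sketched in C as follows. This is a minimal illustration, not grep's code: `has_troublesome_bytes`, `ascii_icase_find`, and `choose_strategy` are hypothetical names, and the sketch assumes that any non-ASCII byte counts as "troublesome", which is coarser than the test the commit describes. In UTF-8, U+0131 is encoded as the two bytes 0xC4 0xB1, so it would be caught by this check.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: treat any byte >= 0x80 as "troublesome".
   Assumption for this sketch only; the real test is narrower.  */
static bool
has_troublesome_bytes (char const *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if ((unsigned char) buf[i] >= 0x80)
      return true;
  return false;
}

/* Fast path: byte-wise ASCII case-insensitive substring search.
   Return the offset of the first match, or -1 if there is none.  */
static ptrdiff_t
ascii_icase_find (char const *buf, size_t len,
                  char const *pat, size_t patlen)
{
  for (size_t i = 0; i + patlen <= len; i++)
    {
      size_t j = 0;
      while (j < patlen
             && (tolower ((unsigned char) buf[i + j])
                 == tolower ((unsigned char) pat[j])))
        j++;
      if (j == patlen)
        return (ptrdiff_t) i;
    }
  return -1;
}

enum strategy { FAST_BYTEWISE, SLOW_FALLBACK };

/* Pick the fast path only when the data are free of troublesome bytes.  */
static enum strategy
choose_strategy (char const *buf, size_t len)
{
  return has_troublesome_bytes (buf, len) ? SLOW_FALLBACK : FAST_BYTEWISE;
}
```

A caller would run `choose_strategy` on each buffer and use the byte-wise search only when it returns `FAST_BYTEWISE`; the slow path (a DFA in grep's case) is left out of the sketch.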
* src/dfasearch.c (GEAcompile):
* src/grep.c (compile_fp_t):
* src/kwsearch.c (Fcompile):
* src/pcresearch.c (Pcompile):
Pattern arg is now char *, not char const *, since Fcompile
now reallocates it sometimes.
* src/grep.c (all_single_byte_after_folding): Remove.
All callers removed.
(fgrep_icase_charlen): New function.
(fgrep_icase_available, try_fgrep_pattern):
Use it, for more-generous semantics.
(fgrep_to_grep_pattern): Now extern.
(main): Do not free keys, since Fexecute may use them.
* src/kwsearch.c (struct kwsearch): New struct.
(Fcompile): Return it. If -i, be more generous about patterns.
(Fexecute): Use it. Fall back on DFA when the data contain
troublesome characters; this should be rare in practice.
* src/kwset.c, src/kwset.h (kwswords): New function.
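
The switch from `char const *` to `char *`, together with the note that `main` no longer frees the keys, suggests an ownership handoff. The following is a rough sketch of that pattern under stated assumptions: `struct matcher`, `compile_owned`, and `matcher_free` are hypothetical names, the appended newline is an arbitrary stand-in for whatever rewriting the real code performs, and nothing here reproduces grep's `Fcompile` or `struct kwsearch`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical compiled-matcher state; not grep's struct kwsearch.  */
struct matcher
{
  char *pattern;   /* owned by the matcher once compilation succeeds */
  size_t size;
};

/* Take ownership of PATTERN, which must come from malloc.  The buffer
   may be reallocated, so the parameter cannot be char const *.
   Return NULL on allocation failure, after freeing PATTERN.  */
static struct matcher *
compile_owned (char *pattern, size_t size)
{
  struct matcher *m = malloc (sizeof *m);
  if (!m)
    {
      free (pattern);
      return NULL;
    }

  /* Illustrative rewrite only: grow the buffer and append a newline.  */
  char *grown = realloc (pattern, size + 1);
  if (!grown)
    {
      free (pattern);
      free (m);
      return NULL;
    }
  grown[size] = '\n';

  m->pattern = grown;
  m->size = size + 1;
  return m;
}

/* Release the matcher and the pattern buffer it owns.  */
static void
matcher_free (struct matcher *m)
{
  if (m)
    {
      free (m->pattern);
      free (m);
    }
}
```

In this sketch the caller must not free or reuse its own pointer after calling `compile_owned`, because `realloc` may have moved the buffer; the matcher keeps using it at search time.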
Diffstat (limited to 'src/pcresearch.c')
-rw-r--r-- | src/pcresearch.c | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 28144b4f..703498c3 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -91,7 +91,7 @@ jit_exec (struct pcre_comp *pc, char const *subject, int search_bytes,
 #endif
 
 void *
-Pcompile (char const *pattern, size_t size, reg_syntax_t ignored)
+Pcompile (char *pattern, size_t size, reg_syntax_t ignored)
 {
 #if !HAVE_LIBPCRE
   die (EXIT_TROUBLE, 0,