author | Paul Eggert <eggert@cs.ucla.edu> | 2014-05-05 20:19:19 -0700 |
---|---|---|
committer | Paul Eggert <eggert@cs.ucla.edu> | 2014-05-05 20:19:59 -0700 |
commit | eb3292b3b205e50d0373f26ff0950ec82f49c14a (patch) | |
tree | bfcb18201f277f03886e83efedfc070693652d45 /src/searchutils.c | |
parent | 17683df11fbea7aa01c9d60f1b45874c9ea5e26a (diff) | |
download | grep-eb3292b3b205e50d0373f26ff0950ec82f49c14a.tar.gz |
grep: fix encoding-error incompatibilities among regex, DFA, KWset
This follows up to http://bugs.gnu.org/17376 and fixes a different
set of incompatibilities, namely between the regex matcher and the
other matchers, when the pattern contains encoding errors.
The GNU regex matcher is not consistent in this area: sometimes
an encoding error matches only itself, and sometimes it
matches part of a multibyte character. There is no documentation
for grep's behavior in this area and users don't seem to care,
and it's simpler to defer to the regex matcher for problematic
cases like these.
* NEWS: Document this.
* src/dfa.c (ctok): Remove. All uses removed.
(parse_bracket_exp, atom): Use BACKREF if a pattern contains
an encoding error, so that the matcher will revert to regex.
* src/dfasearch.c, src/grep.c, src/pcresearch.c, src/searchutils.c:
Don't include dfa.h, since search.h now does that for us.
* src/dfasearch.c (EGexecute):
* src/kwsearch.c (Fexecute): In a UTF-8 locale, there's no need to
worry about matching part of a multibyte character.
* src/grep.c (contains_encoding_error): New static function.
(main): Use it, so that grep -F is consistent with plain fgrep
when the pattern contains an encoding error.
* src/search.h: Include dfa.h, so that kwsearch.c can call using_utf8.
* src/searchutils.c (is_mb_middle): Remove UTF-8-specific code.
Callers now ensure that we are in a non-UTF-8 locale.
The code was clearly wrong, anyway.
* tests/fgrep-infloop, tests/invalid-multibyte-infloop:
* tests/prefix-of-multibyte:
Do not require that grep have a particular behavior for this test.
It's OK to match (exit status 0), not match (exit status 1), or
report an error (exit status 2), since the pattern contains an
encoding error and grep's behavior is not specified for such
patterns. Test only that KWset, DFA, and regex agree.
* tests/prefix-of-multibyte: Add tests for ABCABC and __..._ABCABC___.
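
The new contains_encoding_error function lives in src/grep.c, which this page's diff (limited to src/searchutils.c) does not show. As a rough illustration only, a check of this kind could walk the pattern with the standard mbrlen function and report any byte sequence that is invalid or incomplete in the current locale. The body below is an assumption made for illustration, not the committed code; only the function name comes from the commit message.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Sketch (assumed implementation): return true if the PATLEN bytes at
   PAT contain a sequence that is invalid or incomplete in the current
   locale's multibyte encoding.  */
static bool
contains_encoding_error (const char *pat, size_t patlen)
{
  mbstate_t mbs = { 0 };
  size_t i, charlen;

  assert (pat != NULL || patlen == 0);

  /* mbrlen returns 0 for a null byte, so step over it as one byte.  */
  for (i = 0; i < patlen; i += charlen + (charlen == 0))
    {
      charlen = mbrlen (pat + i, patlen - i, &mbs);
      /* (size_t) -1 is an invalid sequence, (size_t) -2 an incomplete one.  */
      if (charlen == (size_t) -1 || charlen == (size_t) -2)
        return true;
    }
  return false;
}
```

The result depends on the locale selected with setlocale: a lone 0xFF byte is an encoding error in a UTF-8 locale, while a pure ASCII pattern is not an error in any common locale.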
Diffstat (limited to 'src/searchutils.c')
-rw-r--r-- | src/searchutils.c | 11 |
1 files changed, 0 insertions, 11 deletions
diff --git a/src/searchutils.c b/src/searchutils.c
index 76005bfc..ef2cc5a8 100644
--- a/src/searchutils.c
+++ b/src/searchutils.c
@@ -19,7 +19,6 @@
 #include <config.h>
 #include <assert.h>
 #include "search.h"
-#include "dfa.h"
 
 #define NCHAR (UCHAR_MAX + 1)
 
@@ -230,16 +229,6 @@ is_mb_middle (const char **good, const char *buf, const char *end,
   const char *p = *good;
   mbstate_t cur_state;
 
-  if (using_utf8 () && buf - p > MB_CUR_MAX)
-    {
-      for (p = buf; buf - p > MB_CUR_MAX; p--)
-        if (mbclen_cache[to_uchar (*p)] != (size_t) -1)
-          break;
-
-      if (buf - p == MB_CUR_MAX)
-        p = buf;
-    }
-
   memset (&cur_state, 0, sizeof cur_state);
   while (p < buf)
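
One possible reading of "The code was clearly wrong" is that the removed loop could never iterate: it starts with p = buf, so buf - p is 0 and the condition buf - p > MB_CUR_MAX is false on the first check. This is an inference from the diff, not something the commit message spells out. The sketch below mirrors only the control flow of the removed block; the helper name and its parameters are hypothetical, and the loop body (the mbclen_cache test) is replaced by a counter because it is never reached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the control flow of the removed block.
   START plays the role of *good and MB_MAX the role of MB_CUR_MAX.
   Stores the number of loop iterations in *ITERATIONS and returns the
   final value of p.  */
static const char *
removed_block_flow (const char *start, const char *buf, size_t mb_max,
                    size_t *iterations)
{
  const char *p = start;

  assert (start <= buf);
  *iterations = 0;

  if ((size_t) (buf - p) > mb_max)
    {
      /* p starts at buf, so buf - p is 0 and the condition fails at once.  */
      for (p = buf; (size_t) (buf - p) > mb_max; p--)
        ++*iterations;

      if ((size_t) (buf - p) == mb_max)
        p = buf;
    }
  return p;
}
```

With any start more than mb_max bytes before buf, the function reports zero iterations and returns buf itself, which in the original would have caused the following while (p < buf) scan to be skipped.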