| Commit message | Author | Age | Files | Lines |
|
Run "make update-copyright" and then...
* gnulib: Update to latest with copyright year adjusted.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Likewise.
* doc/grep.in.1: Use "-" in copyright year ranges, not \en.
|
* src/dfasearch.c (possible_backrefs_in_pattern): Remove a
duplicate "a", insert a "be" and a comma, and reformat.
|
This fixes some bugs in the previous commit,
and should finish the fix for Bug#33249.
* NEWS: Mention fix for Bug#33249.
* src/dfasearch.c (possible_backrefs_in_pattern, regex_compile)
(GEAcompile): In new code, prefer ptrdiff_t to size_t when either
will do, since ptrdiff_t has better error checking. At some point
we should adjust the old code too.
(possible_backrefs_in_pattern): Rename from
find_backref_in_pattern. New arg BS_SAFE. All uses changed.
Fix false negative if a multibyte character ends in a single
'\\' byte, followed by the two bytes '\\', '1'.
(regex_compile): Simplify.
(GEAcompile): Avoid quadratic behavior when reallocating growing
buffers. Fix a couple of bugs in copying pattern data involving
backreferences. Fix another bug in copying pattern metadata
involving backreferences, by removing the need to copy it.
|
When grep uses regex, it splits a multi-line pattern at each newline
character into fragments, then compiles and executes each fragment
separately, which is slow. With this change, fragments are divided
into groups by whether they include back references.
A fragment with back references constitutes a group by itself, and all
fragments that lack back references together constitute a single group.
This change greatly speeds up the following case:
$ seq -f '%040g' 0 9999 | sed '1s/$/\\(0\\)\\1/' >pat
$ yes 00000000000000000000000000000000000000000x | head -10000 >in
$ time -p env LC_ALL=C src/grep -f pat in
* src/dfasearch.c (find_backref_in_pattern, regex_compile):
New functions.
(GEAcompile): Use the new functions to group fragments
as mentioned above.
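The grouping described above can be sketched roughly as follows. This is an illustration only, not the code in src/dfasearch.c: the names has_backref and group_fragments are hypothetical, the scan ignores multibyte subtleties, and it assumes a fragment "has a back reference" when it contains a backslash followed by a digit 1-9.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if the fragment S[0..LEN-1] contains
   a backslash followed by a digit 1-9.  An escaped byte is skipped, so
   "\\\\1" (escaped backslash, then '1') is not a back reference.  */
static bool
has_backref (char const *s, size_t len)
{
  for (size_t i = 0; i + 1 < len; i++)
    if (s[i] == '\\')
      {
        if ('1' <= s[i + 1] && s[i + 1] <= '9')
          return true;
        i++;
      }
  return false;
}

/* Split the newline-separated PATTERN of length LEN into fragments.
   Fragments without back references are joined, newline-separated,
   into PLAIN (which must have room for LEN + 1 bytes), so they could
   be compiled together.  Return the number of fragments that contain
   back references and would each need a compilation of their own.  */
static size_t
group_fragments (char const *pattern, size_t len, char *plain)
{
  size_t n_backref = 0;
  size_t plain_len = 0;
  bool first_plain = true;
  char const *p = pattern;
  char const *lim = pattern + len;

  for (;;)
    {
      char const *nl = memchr (p, '\n', lim - p);
      size_t flen = (nl ? nl : lim) - p;
      if (has_backref (p, flen))
        n_backref++;
      else
        {
          if (!first_plain)
            plain[plain_len++] = '\n';
          first_plain = false;
          memcpy (plain + plain_len, p, flen);
          plain_len += flen;
        }
      if (!nl)
        break;
      p = nl + 1;
    }
  plain[plain_len] = '\0';
  return n_backref;
}
```

In the benchmark above only the first of the 10000 pattern lines has a back reference, so under this scheme almost all fragments would land in the single combined group.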
|
DFAMUST() must be called after parsing and before the token reordering
introduced in commit 5c7a0371823876cca7a1347fa09ca26bbbff0c98, but both
steps are executed in the compilation phase.
* lib/dfa.c (dfaparse): Change it to a global function.
(dfacomp): If the first argument is NULL, skip parsing.
* lib/dfa.h (dfaparse): Add a prototype.
|
Update Gnulib to latest. Also:
* src/dfasearch.c (EGexecute): Use ptrdiff_t, not size_t,
to match new Gnulib API.
* tests/Makefile.am (TESTS): Add dfa-invalid-utf8.
* tests/dfa-invalid-utf8: New file.
|
* src/searchutils.c (mb_goback): New parameter. All callers changed.
* src/search.h (mb_goback): Update prototype.
* src/kwsearch.c (Fexecute): Use mb_goback's MBCLEN to detect a
word-boundary even more efficiently.
|
* gnulib: Also update submodule for its copyright updates.
|
* gnulib: Update to latest.
* all files: Run "make update-copyright".
* bootstrap: Update from gnulib.
|
Currently ‘grep -i i’ is slow in a UTF-8 locale, because ‘i’ in
the pattern matches the two-byte character 'ı' (U+0131, LATIN
SMALL LETTER DOTLESS I) in data, and kwset handles only
single-byte character translations, so grep falls back on a slower
DFA-based search for all searches. Improve -i performance in the
typical case by using kwset when data are free of troublesome
characters like 'ı', falling back on the DFA only when data
contain troublesome characters.
* src/dfasearch.c (GEAcompile):
* src/grep.c (compile_fp_t):
* src/kwsearch.c (Fcompile):
* src/pcresearch.c (Pcompile):
Pattern arg is now char *, not char const *, since Fcompile
now reallocates it sometimes.
* src/grep.c (all_single_byte_after_folding): Remove.
All callers removed.
(fgrep_icase_charlen): New function.
(fgrep_icase_available, try_fgrep_pattern):
Use it, for more-generous semantics.
(fgrep_to_grep_pattern): Now extern.
(main): Do not free keys, since Fexecute may use them.
* src/kwsearch.c (struct kwsearch): New struct.
(Fcompile): Return it. If -i, be more generous about patterns.
(Fexecute): Use it. Fall back on DFA when the data contain
troublesome characters; this should be rare in practice.
* src/kwset.c, src/kwset.h (kwswords): New function.
|
The code already cannot handle objects with size greater than
SIZE_MAX / 2, so be more honest about it and use ptrdiff_t instead
of size_t. ptrdiff_t arithmetic is signed, which allows for more
checking via -fsanitize=undefined. It also makes the code a tad
smaller on x86-64, since it can test for < 0 rather than for ==
SIZE_MAX.
* src/dfasearch.c (struct dfa_comp.kwset_exact_matches):
(kwsmusts, EGexecute):
* src/kwsearch.c (Fcompile, Fexecute):
* src/kwset.c (struct kwset.kwsexec, kwsincr, memchr_kwset)
(memoff2_kwset, bmexec_trans, bmexec, cwexec, acexec_trans)
(acexec, kwsexec):
* src/kwset.h (struct kwsmatch.index, .offset, .size):
Prefer ptrdiff_t to size_t where either will do.
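As a small illustration of the point about testing for < 0 rather than for == SIZE_MAX, here is a sketch with a hypothetical function find_byte (not part of grep or kwset). It assumes SIZE is nonnegative.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of BYTE in BUF[0..SIZE-1],
   or -1 if it is absent.  Because the result is signed, callers test
   "< 0"; a size_t version would need a sentinel such as (size_t) -1
   and an equality comparison against it.  SIZE must be nonnegative.  */
static ptrdiff_t
find_byte (char const *buf, ptrdiff_t size, char byte)
{
  char const *p = memchr (buf, byte, size);
  return p ? p - buf : -1;
}
```

A caller would then write `if (find_byte (buf, size, '\n') < 0)` to handle the not-found case.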
|
* gnulib: Update to latest.
* all files: Run "make update-copyright".
|
It's not really dfasearch-specific, and grep.c initializes it, so it
seems like the most appropriate "owner".
* src/dfasearch.c (localeinfo): Remove.
* src/grep.c (localeinfo): Add.
* src/search.h (localeinfo): Move to new commented section.
|
* src/dfasearch.c (struct dfa_comp): New struct to hold
previously-global variables.
(dfawarn): Remove static variable.
(kwsmusts): Operate on a dfa_comp parameter instead of global
variables.
(GEAcompile): Allocate and return a dfa_comp struct instead of setting
global variables.
(EGexecute): Operate on a dfa_comp parameter instead of global
variables.
* src/searchutils.c (kwsinit): Replace a static array with a
dynamically-allocated one.
|
To facilitate removing mutable global state from search backends,
compile() functions will return an opaque pointer to backend-specific
data, which must then be passed back into the corresponding execute()
function. This is merely a preparatory step changing function
signatures and call sites, so the pointers passed & returned are
dummies for now and not (yet) actually used.
* src/grep.c (compile_fp_t): Now returns an opaque pointer (the
compiled pattern).
(execute_fp_t): Now passed the pointer returned by a compile_fp_t.
All call sites updated accordingly.
(compiled_pattern): New static variable.
* src/dfasearch.c (GEAcompile): Return a void pointer (dummy NULL).
(EGexecute): Receive a void pointer argument (unused).
* src/kwsearch.c (Fcompile): Return a void pointer (dummy NULL).
(Fexecute): Receive a void pointer argument (unused).
* src/pcresearch.c (Pcompile): Return a void pointer (dummy NULL).
(Pexecute): Receive a void pointer argument (unused).
* src/search.h: Update compile/execute function prototypes.
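The compile/execute handoff described above might look like the following sketch. The signatures are simplified and hypothetical (grep's real compile_fp_t and execute_fp_t take more arguments), and the toy backend does a plain fixed-string search purely for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified function-pointer types.  compile returns an
   opaque pointer; execute receives that same pointer back.  */
typedef void *(*compile_fn) (char const *pattern, size_t size);
typedef ptrdiff_t (*execute_fn) (void *compiled, char const *buf,
                                 size_t size);

/* Backend-specific state that would otherwise live in global variables.  */
struct toy_comp
{
  char *needle;
  size_t len;
};

/* Copy the pattern into freshly allocated state.  Return NULL on
   allocation failure.  */
static void *
toy_compile (char const *pattern, size_t size)
{
  struct toy_comp *tc = malloc (sizeof *tc);
  if (!tc)
    return NULL;
  tc->needle = malloc (size + 1);
  if (!tc->needle)
    {
      free (tc);
      return NULL;
    }
  memcpy (tc->needle, pattern, size);
  tc->needle[size] = '\0';
  tc->len = size;
  return tc;
}

/* Return the offset of the first match in BUF[0..SIZE-1], or -1.  */
static ptrdiff_t
toy_execute (void *compiled, char const *buf, size_t size)
{
  struct toy_comp const *tc = compiled;
  if (size < tc->len)
    return -1;
  for (size_t i = 0; i <= size - tc->len; i++)
    if (memcmp (buf + i, tc->needle, tc->len) == 0)
      return i;
  return -1;
}

/* Release the state returned by toy_compile.  */
static void
toy_free (void *compiled)
{
  struct toy_comp *tc = compiled;
  free (tc->needle);
  free (tc);
}
```

Because the caller only ever sees a void pointer, each backend is free to choose its own state layout, and two compiled patterns can coexist without sharing globals.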
|
* src/dfasearch.c (EGexecute):
* src/grep.c (main):
* src/kwsearch.c (Fexecute):
* src/pcresearch.c (Pcompile):
Prefer localeinfo.multibyte to (MB_CUR_MAX > 1).
|
This improves performance a bit.
* src/dfasearch.c, src/kwsearch.c (wordchar):
Remove; now in searchutils.c.
* src/grep.c (main): Call wordinit if -w.
* src/search.h: Adjust.
* src/searchutils.c: Include verify.h.
(word_start): New static var.
(wordchar): Move here from dfasearch.c and kwsearch.c.
(wordinit, wordchars_count, wordchar_next, wordchar_prev):
New functions.
(mb_prev_wc, mb_next_wc): Remove.
All callers changed to use the new functions instead.
|
* src/dfasearch.c (GEAcompile): Remove use of flag, RE_ICASE covers it.
|
This reverts commit f6603c4e1e04dbb87a7232c4b44acc6afdf65fef,
as the extra performance is not worth the trouble for PCRE users.
Problem reported by Stephane Chazelas in:
http://bugs.gnu.org/22655#103
* NEWS: Document this and the next patch.
* src/dfasearch.c (EGexecute):
* src/grep.c (execute_fp_t):
* src/kwsearch.c (Fexecute):
* src/pcresearch.c (Pexecute):
First arg is now a const pointer again.
* src/grep.c (buf_has_encoding_errors): Now static.
* src/grep.h (buf_has_encoding_errors): Remove decl.
* src/search.h: Adjust decls.
* src/pcresearch.c (reflags): Remove. All uses removed.
(Pcompile, Pexecute): Do not use PCRE_MULTILINE.
|
* src/die.h: New file.
* src/dfasearch.c, src/grep.c, src/pcresearch.c: Include die.h.
* src/dfasearch.c (dfaerror):
* src/grep.c (context_length_arg, add_count, prline, setmatcher, main):
* src/pcresearch.c (jit_exec, Pcompile, Pexecute):
Use 'die' instead of 'error' when exiting.
* src/pcresearch.c: Do not include verify.h.
(die): Remove; now in die.h.
* src/search.h: Do not include error.h here, since this file does
not use anything defined in error.h. Instead, dfasearch.c, which
uses error.h's symbols, now includes error.h directly.
|
This follows up on a suggestion by Norihiro Tanaka (Bug#24262).
* src/dfa.c (struct regex_syntax): New member 'anchor'.
(char_context): Use it.
(dfasyntax): Change signature to specify it, along with the old
FOLD and EOL args, as a single DFAOPTS arg. All uses changed.
* src/dfa.h (DFA_ANCHOR, DFA_CASE_FOLD, DFA_EOL_NUL): New constants
for dfasyntax new last arg.
|
This builds on a suggestion by Norihiro Tanaka (Bug#24009).
* src/dfasearch.c (GEAcompile): Use a fastmap unless -i.
This improves performance 20x for me using the first benchmark
given in Bug#24009.
|
This patch is mostly refactoring, with a bit of performance tweaking.
It is done in preparation for a fix for Bug#24009.
* src/dfasearch.c (patterns): Now of type struct re_pattern_buffer *
instead of an anonymous struct pointer, since there is no longer
any need to keep regs here. All uses changed.
(GEAcompile): Use patlim instead of a hard-to-follow "total".
Use x2nrealloc to avoid potential O(N**2) reallocation algorithm.
Initialize just the pattern members that need clearing.
(EGexecute): Put regs into a static variable, as this code did
before 2001-02-18, as there is no need to have a separate set of
regs for each pattern. Explain the "Q@#%!#" comment better.
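The O(N**2) concern above comes from growing a buffer by a fixed amount on every append, which can copy the whole buffer each time. A minimal sketch of geometric growth, using plain realloc rather than Gnulib's x2nrealloc, is shown below; the helper name append_int and the doubling factor are choices made for this illustration only.

```c
#include <stdint.h>
#include <stdlib.h>

/* Append ITEM to the array *ARR, which holds *N items and has room for
   *ALLOC.  When full, double the capacity so that N appends cost O(N)
   copying in total rather than O(N**2).  Return 0 on success, -1 on
   size overflow or allocation failure (leaving the array unchanged).  */
static int
append_int (int **arr, size_t *n, size_t *alloc, int item)
{
  if (*n == *alloc)
    {
      if (*alloc > SIZE_MAX / sizeof **arr / 2)
        return -1;
      size_t new_alloc = *alloc ? 2 * *alloc : 4;
      int *p = realloc (*arr, new_alloc * sizeof **arr);
      if (!p)
        return -1;
      *arr = p;
      *alloc = new_alloc;
    }
  (*arr)[(*n)++] = item;
  return 0;
}
```

Starting from an empty array (`int *a = NULL; size_t n = 0, alloc = 0;`), a hundred appends trigger only six reallocations.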
|
This follows up on the -iF performance improvement (Bug#23752).
* NEWS: Simplify description of -iF improvement.
* src/dfa.c: Do not include wctype.h.
(lonesome_lower, case_folded_counterparts): Move to localeinfo.c.
(CASE_FOLDED_BUFSIZE): Move to localeinfo.h.
* src/grep.c: Do not include wctype.h.
(lonesome_lower): Remove.
(fgrep_icase_available): Use case_folded_counterparts instead.
Do not call it for the same character twice.
Return false on wcrtomb failures (which should never happen).
(fgrep_to_grep_pattern, main): Simplify. Let fgrep_to_grep’s
caller fiddle with the global variables.
* src/localeinfo.c: Include <wctype.h>.
(lonesome_lower, case_folded_counterparts):
Move here from src/dfa.c. Return int, not unsigned int.
Verify that CASE_FOLDED_BUFSIZE is big enough.
* src/localeinfo.h (CASE_FOLDED_BUFSIZE): Now 32, so that
we don’t expose lonesome_lower’s size.
* src/searchutils.c (kwsinit): Return new kwset instead of
storing it via a pointer. All callers changed. Simplify a bit.
|
In a multibyte locale, if a pattern consists only of single-byte
characters, all their case counterparts are also single-byte characters,
and the pattern has no invalid sequences, grep -iF uses the
fgrep matcher, the same as in a single-byte locale (Bug#23752).
* NEWS: Mention it.
* src/grep.c (lonesome_lower): New constant.
(fgrep_icase_available): New function.
(fgrep_to_grep_pattern): Simplify it.
(main): Use them.
* src/searchutils.c (kwsinit): New arg MB_TRANS; all uses changed.
Try fgrep matcher for case insensitive matching by grep -F in multibyte
locale.
|
This follows up on Zev Weiss’s recent patches to make the DFA code
thread-safe (Bug#24249). It removes the remaining static
variables used by dfa.c. These variables are locale-dependent, so
they would cause problems in multithreaded code where different
threads are in different locales (e.g., via uselocale). I
abstracted most of the variables into a new localeinfo module.
* src/Makefile.am (grep_SOURCES): Add localeinfo.c.
(noinst_HEADERS): Add localeinfo.h.
* src/dfa.c: Include localeinfo.h.
(struct dfa): Remove multibyte member, as it is now part of
localeinfo. New members simple_locale and localeinfo.
Put locale-related members at the end.
(mbrtowc_cache): Remove; now part of dfa->localeinfo.
(charclass_index): Rename back from dfa_charclass_index,
since it's private.
(unibyte_word_constituent): New arg DFA; use its sbctowc member.
(using_utf8, dfa_using_utf8, init_mbrtowc_cache, check_utf8):
Remove; now done by localeinfo members. All uses changed.
(dfasyntax): New localeinfo arg. Move to end to avoid forward decls.
Initialize the entire DFA.
(unibyte_c, check_unibyte_c): Remove; now in simple_locale member.
(using_simple_locale): Now takes bool instead of DFA.
Do the locale check here, rather than in the caller,
as the result is now cached in dfa->simple_locale.
(dfaalloc): Just allocate the DFA. dfasyntax now initializes it.
* src/dfa.h: Add forward decl of struct localeinfo.
Adjust to new dfa.c API.
* src/dfasearch.c (localeinfo): New var, replacing former static
vars like mbrtowc_cache.
* src/localeinfo.c, src/localeinfo.h: New files.
* src/search.h: Include localeinfo.h.
(localeinfo): New decl.
* src/searchutils.c (mbclen_cache, build_mbclen_cache):
Remove. All uses changed to localeinfo.
* tests/Makefile.am (dfa_match_aux_LDADD): Add localeinfo.o.
* tests/dfa-match-aux.c: Include localeinfo.h.
(main): Adjust to changes in DFA API.
|
* src/dfa.c: Replace utf8 and unibyte_c static local variables with
static globals initialized by a new function dfa_init() which must be
called before any other dfa*() functions.
(dfa_using_utf8): Rename using_utf8() to dfa_using_utf8() for
consistency with other exported functions.
* src/dfa.h (dfa_using_utf8): Rename using_utf8() to dfa_using_utf8();
also add _GL_ATTRIBUTE_PURE.
(dfa_init): New function.
* src/grep.c (main), tests/dfa-match-aux.c (main): Call dfa_init().
* src/dfasearch.c (EGexecute): Replace using_utf8 with dfa_using_utf8.
* src/kwsearch.c (Fexecute): Likewise.
* src/pcresearch.c (Pcompile): Likewise.
http://bugs.gnu.org/24259
|
* src/dfa.c: move global variables holding regex syntax configuration
into a new struct (`struct regex_syntax') and add an instance of it to
struct dfa. All references to the globals are replaced with
references to the dfa struct's new member. As a side effect, a
`struct dfa' must be allocated with dfaalloc() and passed to
dfasyntax().
* src/dfa.h (dfasyntax): Add new struct dfa* parameter.
* src/dfasearch.c (GEAcompile): Allocate `dfa' earlier and pass it to
dfasyntax().
* tests/dfa-match-aux.c (main): Pass `dfa' to dfasyntax().
http://bugs.gnu.org/24259
|
Some compilers warn about 'static int const x;' on the grounds
that X should have an initializer. Instead of worrying about
this, rewrite to avoid this sort of thing.
* src/dfa.c (emptyset): New function.
(parse_bracket_exp): Use it instead of 'equal' and a zero constant.
* src/dfasearch.c (struct patterns): Remove tag 'patterns'.
(patterns0): Remove zero constant.
(GEAcompile): Use memset instead of the zero constant.
|
Determining the file name and line number is a little tricky because
of the way the regular expressions are all concatenated onto a newline-
separated list. By the time grep would compile regular expressions,
the <filename,lineno> origin of each regexp was no longer available.
This patch adds a list of filename,first_lineno pairs, one per input
source, by which we can then map the ordinal regexp number to a
filename,lineno pair for the diagnostic.
* src/dfasearch.c (GEAcompile): When diagnosing an invalid regexp
specified via -f FILE, include the "FILENAME:LINENO: " prefix.
Also, when there are two or more lines with compilation failures,
diagnose all of them, rather than stopping after the first.
* src/grep.h (pattern_file_name): Declare it.
* src/grep.c (struct FL_pair): Define type.
(fl_pair, n_fl_pair_slots, n_pattern_files, patfile_lineno):
Define globals.
(fl_add, pattern_file_name): Define functions.
(main): Call fl_add for each type of the following: -e argument,
-f argument, command-line-specified (without -e) regexp.
* tests/filename-lineno.pl: New file.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Improvements): Mention this.
Initially reported by Gunnar Wolf in https://bugs.debian.org/525214
Forwarded to grep's bug list by Santiago Ruano Rincón as
http://debbugs.gnu.org/23965
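The mapping from an ordinal regexp number back to its origin can be sketched as below. This is an illustration under assumptions, not the code in src/grep.c: struct origin and lookup_origin are hypothetical names, and each entry's first_lineno is assumed to be the 1-based position, within the concatenated pattern list, of the first regexp from that source.

```c
#include <stddef.h>

/* Hypothetical record: one per input source (-e argument, -f file, or
   the command-line regexp).  Entries are in increasing first_lineno
   order and the first entry has first_lineno 1.  */
struct origin
{
  char const *filename;
  size_t first_lineno;
};

/* Map ORDINAL, the 1-based position of a regexp in the concatenated
   list, to the source it came from.  Store the line number within
   that source into *LINENO and return the source's name.  Requires
   N >= 1 and ORIGINS[0].first_lineno <= ORDINAL.  */
static char const *
lookup_origin (struct origin const *origins, size_t n, size_t ordinal,
               size_t *lineno)
{
  size_t i = 1;
  while (i < n && origins[i].first_lineno <= ordinal)
    i++;
  *lineno = ordinal - origins[i - 1].first_lineno + 1;
  return origins[i - 1].filename;
}
```

For example, with two regexps from -e arguments followed by a pattern file, the fourth regexp overall maps to line 2 of that file, which is what a "FILENAME:LINENO: " prefix would report.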
|
\n cannot occur inside a multibyte character. So with the DFA superset
an input always matches only a single line.
* src/dfasearch.c (EGexecute): Simplify accordingly.
|
* src/dfa.c (parse_bracket_exp): mark zeroclass const.
* src/dfasearch.c: mark patterns0 const.
http://bugs.gnu.org/23712
|
When searching for multiple fixed words, grep now returns immediately,
without looking for the longest match, when the longest match is not
needed. Without this change, grep tries to find the longest match for
multiple words even when it is not needed.
* src/kwset.c (kwsexec, acexec, cwexec, bmexec): Add a bool argument
for whether the longest match is needed. All callers changed.
* src/kwset.h (kwsexec): Update prototype.
|
* src/dfa.c (struct dfa, dfasyntax, dfaanalyze, dfaexec_main)
(dfaexec_mb, dfaexec_sb, dfaexec_noop, dfaexec, dfacomp):
* src/dfa.h (dfasyntax, dfacomp, dfaexec, dfaanalyze):
* src/dfasearch.c (EGexecute):
Use bool for boolean.
|
* src/dfasearch.c (EGexecute): Clear the newline_anchor bit when
eolbyte is not '\n'.
* tests/z-anchor-newline: New file.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Bug fixes): Describe it.
Originally reported by Ulrich Mueller in
https://bugs.gentoo.org/show_bug.cgi?id=574662
Reported to us by Sergei Trofimovich as http://debbugs.gnu.org/22655
|
* NEWS, doc/grep.texi (Matching Control): Mention this.
* src/dfasearch.c (EGexecute):
* src/pcresearch.c (Pcompile):
Don't get confused by -w if -x is also present.
* src/pcresearch.c (Pcompile): Remove misleading comment about
non-UTF-8 multibyte locales, as PCRE doesn't support them.
Calculate buffer sizes more carefully; the old method
allocated a buffer slightly too big, seemingly due to luck.
* tests/backref-word, tests/pcre: Add tests for this bug.
|
On my platform in the en_US.utf8 locale, this makes 'grep -P "z.*a" k'
220x faster, where k is created by the shell command:
yes 'abcdefg hijklmn opqrstu vwxyz' | head -n 10000000 >k
* src/dfasearch.c (EGexecute):
* src/grep.c (execute_fp_t):
* src/kwsearch.c (Fexecute):
* src/pcresearch.c (Pexecute):
First arg is now char *, not char const *, since Pexecute now
temporarily modifies this argument.
* src/grep.c, src/grep.h (buf_has_encoding_errors): Now extern.
* src/pcresearch.c (Pexecute): Use it. If the input is free of
encoding errors, use a multiline search and the PCRE_NO_UTF8_CHECK
option, as this is typically way faster. This restores an
optimization that was removed with the recent changes for binary
file detection.
|
Run "make update-copyright" and then...
* gnulib: Update to latest.
* tests/init.sh: Update from gnulib.
* bootstrap: Likewise.
|
EGexecute would use "backref" uninitialized.
While that could have no bearing on correctness, it could
impact performance, via an unnecessary use of regexp.
* src/dfasearch.c (EGexecute): Initialize backref.
Reported as http://debbugs.gnu.org/21273
Introduced by commit v2.21-55-gea0ebaa.
|
If we won't use KWset, do not build a "struct dfamust".
Now it is built only when needed.
* src/dfa.c (struct dfa) [musts]: Remove member.
(dfacomp): Don't build dfamust here.
(dfamustfree): New function to free a struct dfamust.
(dfamust): Make it a global function, and make it return a pointer
to a malloc'd struct dfamust.
(dfamusts): Remove it.
* src/dfa.h (struct dfamust) [next]: Remove member.
In the implementation preceding this patch, there was
never more than one of these in a given "struct dfa".
(dfamustfree, dfamust): Add prototypes.
(dfamusts): Remove prototype.
(dfaalloc): Declare with _GL_ATTRIBUTE_MALLOC.
To make that symbol usable there, move the inclusion
of "xalloc.h" from dfa.c to this file, dfa.h.
* src/dfasearch.c (kwsmusts): Adapt to use the new interface.
Update the comments to reflect reality.
This addresses http://bugs.gnu.org/17715
|
Run "make update-copyright". Also, ...
* grep.texi: Update manually, converting each "--" to "-".
|
* src/dfasearch.c (EGexecute): Avoid unnecessary test in a context
where memrchr cannot return a null pointer.
|
* src/dfasearch.c (EGexecute): Without this patch, the code reverts
to KWset when the DFA superset matches multiple lines.
However, if the DFA superset matches multiple lines, it most likely
also matches a single line, and reverting to KWset means dfafast
won't work effectively. Change the code so that it retries the DFA
superset immediately after it matches multiple lines. On my platform
this improves the performance of "LC_ALL=C grep '\(ab\)cd\1d' k" from
3.48 to 2.14 seconds realtime, where k contains the output of
"yes abcdabc | head -50000000".
|
* NEWS: Document this.
* src/dfasearch.c, src/kwsearch.c (WCHAR): Remove.
(wordchar): New static function.
* src/dfasearch.c (EGexecute):
* src/kwsearch.c (Fexecute): Use the new functions, so that the
code works correctly if a multibyte character adjacent to the
match has two or more bytes.
* src/search.h, src/searchutils.c (mb_prev_wc, mb_next_wc):
New functions.
* tests/word-delim-multibyte: Add a test for grep -w (which now
passes), and a test for \> (which still fails). The \< test also
still fails.
|
* src/search.h, src/searchutils.c (mb_goback): Rename from
is_mb_middle. Omit last arg. Return number of bytes to go back,
not just a boolean. All uses changed.
* src/dfasearch.c (EGexecute):
* src/kwsearch.c (Fexecute): Adjust to API change.
* src/kwsearch.c (Fexecute): Eliminate common subexpression.
|
This follows up to http://bugs.gnu.org/17376 and fixes a different
set of incompatibilities, namely between the regex matcher and the
other matchers, when the pattern contains encoding errors.
The GNU regex matcher is not consistent in this area: sometimes
an encoding error matches only itself, and sometimes it
matches part of a multibyte character. There is no documentation
for grep's behavior in this area and users don't seem to care,
and it's simpler to defer to the regex matcher for problematic
cases like these.
* NEWS: Document this.
* src/dfa.c (ctok): Remove. All uses removed.
(parse_bracket_exp, atom): Use BACKREF if a pattern contains
an encoding error, so that the matcher will revert to regex.
* src/dfasearch.c, src/grep.c, src/pcresearch.c, src/searchutils.c:
Don't include dfa.h, since search.h now does that for us.
* src/dfasearch.c (EGexecute):
* src/kwsearch.c (Fexecute): In a UTF-8 locale, there's no need to
worry about matching part of a multibyte character.
* src/grep.c (contains_encoding_error): New static function.
(main): Use it, so that grep -F is consistent with plain fgrep
when the pattern contains an encoding error.
* src/search.h: Include dfa.h, so that kwsearch.c can call using_utf8.
* src/searchutils.c (is_mb_middle): Remove UTF-8-specific code.
Callers now ensure that we are in a non-UTF-8 locale.
The code was clearly wrong, anyway.
* tests/fgrep-infloop, tests/invalid-multibyte-infloop:
* tests/prefix-of-multibyte:
Do not require that grep have a particular behavior for this test.
It's OK to match (exit status 0), not match (exit status 1), or
report an error (exit status 2), since the pattern contains an
encoding error and grep's behavior is not specified for such
patterns. Test only that KWset, DFA, and regex agree.
* tests/prefix-of-multibyte: Add tests for ABCABC and __..._ABCABC___.
|
* src/dfasearch.c (EGexecute): Change if-then-else to !if-else-then.
|
* src/dfasearch.c (EGexecute): Do it.
|
* src/dfa.c, src/dfa.h (dfasuperset): Arg is now const pointer.
Now pure.
* src/dfasearch.c (EGexecute): Coalesce some duplicate code.
Don't worry about memrchr returning NULL when that's impossible.
|