| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This improves runtime checking for integer overflow when compiling
with gcc -fsanitize=undefined and the like. It also avoids
the need for some integer casts, which can be error-prone.
* bootstrap.conf (gnulib_modules): Add idx.
* src/dfasearch.c (struct dfa_comp, kwsmusts):
(possible_backrefs_in_pattern, regex_compile, GEAcompile)
(EGexecute):
* src/grep.c (struct patloc, patlocs_allocated, patlocs_used)
(n_patterns, update_patterns, pattern_file_name, poison_len)
(asan_poison, fwrite_errno, compile_fp_t, execute_fp_t)
(buf_has_encoding_errors, buf_has_nulls, file_must_have_nulls)
(bufalloc, pagesize, all_zeros, fillbuf, nlscan)
(print_line_head, print_line_middle, print_line_tail, grepbuf)
(grep, contains_encoding_error, fgrep_icase_available)
(fgrep_icase_charlen, fgrep_to_grep_pattern, try_fgrep_pattern)
(main):
* src/kwsearch.c (struct kwsearch, Fcompile, Fexecute):
* src/kwset.c (struct trie, struct kwset, kwsalloc, kwsincr)
(kwswords, treefails, memchr_kwset, acexec_trans, kwsexec)
(treedelta, kwsprep, bm_delta2_search, bmexec_trans, bmexec)
(acexec):
* src/kwset.h (struct kwsmatch):
* src/pcresearch.c (Pcompile, Pexecute):
* src/search.h (mb_clen):
* src/searchutils.c (kwsinit, mb_goback, wordchars_count)
(wordchars_size, wordchar_next, wordchar_prev):
Prefer idx_t to size_t or ptrdiff_t for nonnegative sizes,
and prefer ptrdiff_t to size_t for sizes plus error values.
* src/grep.c (uword_size): New constant, used for signed
size calculations.
(totalnl, add_count, totalcc, print_offset, print_line_head, grep):
Prefer intmax_t to uintmax_t for wide integer calculations.
(fgrep_icase_charlen): Prefer ptrdiff_t to int for size offsets.
* src/grep.h: Include idx.h.
* src/search.h (imbrlen): New function, like mbrlen except
with idx_t and ptrdiff_t.
|
| |
|
|
|
|
|
|
|
|
|
| |
* src/kwsearch.c (Fexecute):
Assume C99 to put declarations nearer uses.
* src/kwset.c (bmexec): Omit unnecessary test.
* src/kwset.h (struct kwsmatch): Make OFFSET and SIZE individual
elements, not arrays of size 1 (a revenant of an earlier API).
All uses changed.
|
|
|
|
|
|
|
|
| |
Run "make update-copyright" and then...
* gnulib: Update to latest with copyright year adjusted.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Likewise.
* doc/grep.in.1: Use "-" in copyright year ranges, not \en.
|
|
|
|
| |
* gnulib: Also update submodule for its copyright updates.
|
|
|
|
|
|
| |
* gnulib: Update to latest.
* all files: Run "make update-copyright".
* bootstrap: Update from gnulib.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code was not being used, and complicated maintenance.
We can bring it back from the repository if it turns out
to be useful later.
* src/kwset.c (struct kwset.reverse): Remove. All uses of
FOO->reverse replaced by (FOO->kwsexec == bmexec).
(kwsalloc): Remove 'reverse' arg, as callers outside this
module do not care about algorithm choice. All callers changed.
(kwsprep): When deciding whether to use Boyer-Moore, do not worry
about being called twice on the same kwset, as that is not allowed.
(cwexec): Remove; it was never called. All uses removed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently ‘grep -i i’ is slow in a UTF-8 locale, because ‘i’ in
the pattern matches the two-byte character 'ı' (U+0131, LATIN
SMALL LETTER DOTLESS I) in data, and kwset handles only
single-byte character translations, so grep falls back on a slower
DFA-based search for all searches. Improve -i performance in the
typical case by using kwset when data are free of troublesome
characters like 'ı', falling back on the DFA only when data
contain troublesome characters.
* src/dfasearch.c (GEAcompile):
* src/grep.c (compile_fp_t):
* src/kwsearch.c (Fcompile):
* src/pcresearch.c (Pcompile):
Pattern arg is now char *, not char const *, since Fcompile
now reallocates it sometimes.
* src/grep.c (all_single_byte_after_folding): Remove.
All callers removed.
(fgrep_icase_charlen): New function.
(fgrep_icase_available, try_fgrep_pattern):
Use it, for more-generous semantics.
(fgrep_to_grep_pattern): Now extern.
(main): Do not free keys, since Fexecute may use them.
* src/kwsearch.c (struct kwsearch): New struct.
(Fcompile): Return it. If -i, be more generous about patterns.
(Fexecute): Use it. Fall back on DFA when the data contain
troublesome characters; this should be rare in practice.
* src/kwset.c, src/kwset.h (kwswords): New function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code already cannot handle objects with size greater than
SIZE_MAX / 2, so be more honest about it and use ptrdiff_t instead
of size_t. ptrdiff_t arithmetic is signed, which allows for more
checking via -fsanitize=undefined. It also makes the code a tad
smaller on x86-64, since it can test for < 0 rather than for ==
SIZE_MAX.
* src/dfasearch.c (struct dfa_comp.kwset_exact_matches):
(kwsmusts, EGexecute):
* src/kwsearch.c (Fcompile, Fexecute):
* src/kwset.c (struct kwset.kwsexec, kwsincr, memchr_kwset)
(memoff2_kwset, bmexec_trans, bmexec, cwexec, acexec_trans)
(acexec, kwsexec):
* src/kwset.h (struct kwsmatch.index, .offset, .size):
Prefer ptrdiff_t to size_t where either will do.
|
|
|
|
|
|
|
| |
Remove kwset.h comments that are obsolete and seemingly not
maintained anyway; people can look in kwset.c instead.
Update comments to reflect current behavior better.
Cite Faro & Lecroq 2013. Use GNU style for end-of-sentence.
|
|
|
|
|
| |
* gnulib: Update to latest.
* all files: Run "make update-copyright".
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* NEWS: Don't claim 7x, as the value seems to be system-dependent.
* src/kwset.c (struct kwset.kwsexec, bmexec, acexec_trans, acexec):
* src/kwset.c, src/kwset.h (kwsalloc, kwsexec):
Don't put 'const' into the declaration when that is irrelevant to
the API. More generally, don't bother with 'const' when it's only
a local so it is reasonably obvious to a reader that it is 'const'
anyway. It would be overkill to add 'const' to all locals that
never change.
* src/kwset.c (U): Avoid unnecessary parens.
(treefails, memoff2_kwset, bmexec_trans, bmexec, cwexec, acexec_trans):
Prefer SIZE_MAX to (size_t) -1.
(bmexec_trans, cwexec, acexec_trans):
Remove attributes for static functions that no longer seem needed.
(memoff2_kwset): Rename from memchr2_kwset, since it returns
an offset, not a pointer. All uses changed.
(cwexec, acexec_trans) [lint]: Remove initialization that is no
longer needed; at least, GCC 6.1 x86-64 does not need it.
(acexec_trans): Clarify code by using nesting rather than 'continue'.
|
|
|
|
|
|
|
|
|
| |
Searching multiple fixed words, grep immediately returns without longest
match if not needed. Without this change, grep tries longest match for
multiple words even if not needed.
* src/kwset.c (kwsexec, acexec, cwexec, bmexec): Add a bool argument
for whether longest match is needed. All callers changed.
* src/kwset.h (kwsexec): Update prototype.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Searching multiple fixed words, grep used the Commentz-Walter
algorithm, but this was O(m*n) and was very slow in the worst case.
For example:
- input: yes `printf %040d` | head -10000000
- word1: x0000000000000000000
- word2: x
This change instead uses the Aho-Corasick algorithm to search multiple
fixed words. It uses a high-quality trie-building function that is
already defined for Commentz-Walter in kwset.c.
I see 7x speed-up even for a typical case on Fedora 21 with a 3.2GHz i5
by this change. Using best-of-5 trials for the benchmark:
find /usr/share/doc/ -type f |
LC_ALL=C time -p xargs.sh src/grep -Ff /usr/share/dict/linux.words >/dev/null
The results were:
real 11.37 user 11.03 sys 0.24 [without the change]
real 1.49 user 1.31 sys 0.15 [with the change]
* src/kwset.c (struct kwset): Add a new member 'mode'.
(kwsalloc): Use it.
All callers are changed.
(kwsincr): Using Aho-Corasick algorithm, build tries in normal order.
(acexec_trans, acexec): Add a new function.
(kwsexec): Use it.
* src/kwset.h (kwsalloc): Update a prototype.
* NEWS (Improvements): Mention it.
|
|
|
|
|
|
|
|
| |
Run "make update-copyright" and then...
* gnulib: Update to latest.
* tests/init.sh: Update from gnulib.
* bootstrap: Likewise.
|
|
|
|
|
| |
Run "make update-copyright". Also, ...
* grep.texi: Update manually, converting each "--" to "-".
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* src/kwset.c: Include kwset.h first, to check its prereqs.
Include xalloc.h, for xmalloc.
(kwsalloc): Use xmalloc, not malloc, so that the caller need not
worry about memory allocation failure.
(kwsalloc, kwsincr, kwsprep): Do not worry about obstack_alloc
returning NULL, as that's not possible.
(kwsalloc, kwsincr, kwsprep, bmexec, cwexec, kwsexec, kwsfree):
Omit unnecessary conversion between struct kwset * and kwset_t.
(kwsincr, kwsprep): Return void since memory-allocation failure is
not possible now. All uses changed.
* src/kwset.h: Include <stddef.h>, for size_t, so that this
include file doesn't require other files to be included first.
|
|
|
|
| |
Do that by running "make update-copyright".
|
|
|
|
| |
Run "make update-copyright".
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These problems mostly occur because the code attempts to stuff
sizes into int or into unsigned int; this doesn't work on most
64-bit hosts and the errors can lead to core dumps.
* NEWS: Document this.
* src/dfa.c (token): Typedef to ptrdiff_t, since the enum's
range could be as small as -128 .. 127 on practical hosts.
(position.index): Now size_t, not unsigned int.
(leaf_set.elems): Now size_t *, not unsigned int *.
(dfa_state.hash, struct mb_char_classes.nchars, .nch_classes)
(.nranges, .nequivs, .ncoll_elems, struct dfa.cindex, .calloc, .tindex)
(.talloc, .depth, .nleaves, .nregexps, .nmultibyte_prop, .nmbcsets):
(.mbcsets_alloc): Now size_t, not int.
(dfa_state.first_end): Now token, not int.
(state_num): New type.
(struct mb_char_classes.cset): Now ptrdiff_t, not int.
(struct dfa.utf8_anychar_classes): Now token[5], not int[5].
(struct dfa.sindex, .salloc, .tralloc): Now state_num, not int.
(struct dfa.trans, .realtrans, .fails): Now state_num **, not int **.
(struct dfa.newlines): Now state_num *, not int *.
(prtok): Don't assume 'token' is no wider than int.
(lexleft, parens, depth): Now size_t, not int.
(charclass_index, nsubtoks)
(parse_bracket_exp, addtok, copytoks, closure, insert, merge, delete)
(state_index, epsclosure, state_separate_contexts)
(dfaanalyze, dfastate, build_state, realloc_trans_if_necessary)
(transit_state_singlebyte, match_anychar, match_mb_charset)
(check_matching_with_multibyte_ops, transit_state_consume_1char)
(transit_state, dfaexec, free_mbdata, dfaoptimize, dfafree)
(freelist, enlist, addlists, inboth, dfamust):
Don't assume indexes fit in 'int'.
(lex): Avoid overflow in string-to-{hi,lo} conversions.
(dfaanalyze): Redo indexing so that it works with size_t values,
which cannot go negative.
* src/dfa.h (dfaexec): Count argument is now size_t *, not int *.
(dfastate): State numbers are now ptrdiff_t, not int.
* src/dfasearch.c: Include "intprops.h", for TYPE_MAXIMUM.
(kwset_exact_matches): Now size_t, not int.
(EGexecute): Don't assume indexes fit in 'int'.
Check for overflow before converting a ptrdiff_t to a regoff_t,
as regoff_t is narrower than ptrdiff_t in 64-bit glibc (contra POSIX).
Check for memory exhaustion in re_search rather than treating
it merely as failure to match; use xalloc_die () to report any error.
* src/kwset.c (struct trie.accepting): Now size_t, not unsigned int.
(struct kwset.words): Now ptrdiff_t, not int.
* src/kwset.h (struct kwsmatch.index): Now size_t, not int.
|
|
|
|
| |
Run "make update-copyright".
|
|
|
|
| |
Run "make update-copyright", so "make syntax-check" works in 2011.
|
|
|
|
|
|
|
|
|
| |
* src/kwset.c (kwsexec): Correct comments. This function has been
returning an offset, not a pointer, for 9 years.
Do not test for kwsmatch == NULL. All callers pass non-NULL.
(cwexec): Likewise.
* src/kwset.h (kwsexec): Mark the 4th parameter, kwsmatch, as non-NULL.
Include "arg-nonnull.h".
|
|
|
|
|
| |
* src/kwset.h: Declare kwset using an incomplete struct type.
* src/system.h (ptr_t): Remove.
|
|
|
|
|
|
|
|
|
|
|
| |
* lib/savedir.h (PARAMS): Remove definitions manually.
Remove the remaining ones via this command:
git grep -l define.PARAMS |xargs perl -ni -e '/define PARAMS/ or print'
* src/dfa.h (PARAMS): Remove definitions.
* src/system.h (PARAMS): Likewise.
Remove most uses with this:
git grep -lw PARAMS |xargs perl -pi -e 's/\bPARAMS *\((.*)\);/$1;/'
Remove the remainder manually.
|
|
|
|
|
|
| |
Use this command:
git ls-files |grep -vE '^(\..*|COPYING|gnulib)$' |xargs \
env UPDATE_COPYRIGHT_USE_INTERVALS=1 build-aux/update-copyright
|
| |
|
| |
|
|
|
|
|
| |
* 78 files: Update FSF's civic address, zip code,
and citizen relocation code.
|
|
|
|
|
| |
* src/kwset.h (kwsincr, kwsprep): Change return type to 'const char *'.
* src/kwset.c (kwsincr, kwsprep): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we know that the input is always terminated by a
newline before the matching algorithms see it, clean up the
matching algorithms so that they no longer need to modify the
input by inserting a sentinel newline, and no longer worry
about running off the end of the buffer due to a missing sentinel.
* src/grep.c (nlscan, prpending, prtext, grepbuf): Do not
worry about running off the end of the input buffer, since
it's now guaranteed to end in the sentinel newline.
* src/search.c (EGexecute, Pexecute): Likewise.
* src/dfa.c (prtok, dfasyntax, dfaparse, copy, merge, state_index,
epsclosure, dfaexec, dfacomp):
Change many instances of "T *" to "T const *", to catch
any inadvertent programming errors made during this conversion.
* src/dfa.h (dfacomp, dfaexec, dfaparse): Likewise.
* src/grep.c (struct stats.parent, long_options, grepdir,
compile, execute, fillbuf, lastnl, lastout, nlscan, prline,
prpending, prtext, grepbuf, grep, grepfile, grepdir): Likewise.
* src/grep.h (struct matcher.compile, struct matcher.execute):
Likewise.
* src/kwset.c (struct kwset.trans, kwsalloc, kwsincr, treefails,
treedelta, hasevery, treenext, bmexec, cwexec, kwsexec): Likewise.
* src/kwset.h (kwsalloc, kwsincr, kwsexec): Likewise.
* src/search.c (kwsmusts, Gcompile, Ecompile, EGexecute, Pcompile,
Pexecute): Likewise.
* src/dfa.c (dfaexec):
Use size_t, not char *, to avoid worrisome casts to convert
char const * to char *.
* src/dfa.h (dfaexec): Likewise.
* src/grep.c (execute): Likewise.
* src/grep.h (execute): Likewise.
* src/kwset.c (bmexec, cwexec, kwsexec): Likewise.
* src/kwset.h (struct kwsmatch.offset, kwsalloc, kwsincr,
kwsexec): Likewise.
* src/search.c (EGexecute, Fexecute, Pexecute): Likewise.
* src/dfa.h (_PTR_T): Depend on defined __STDC__, not __STDC__.
(PARAMS): Depend on PROTOTYPES, not __STDC__.
* src/dfa.c (dfasyntax): Last arg is unsigned char, not int.
* src/dfa.h (dfasyntax): Likewise.
* src/dfa.h (struct dfa): Remove member newlines; no longer needed.
* src/dfa.c (build_state, dfaexec, dfafree): Do not worry
about special newline state.
* src/search.c (matchers): Move definition to end of file, so
that we don't need forward decls.
(lastexact): Remove.
(kwset_exact_matches): New var; subsumes old lastexact var.
All uses changed.
* src/dfa.c (index): Remove macro.
(REALLOC_IF_NECESSARY): Skip unnecessary test.
(tstbit, setbit, clrbit): Declare arg to be unsigned, to help compiler.
(copyset, zeroset, equal): Use C builtin primitives, to help compiler.
(dfaexec): Do not modify input string.
Remove newline parameter; no longer needed.
(comsubs): Use strchr, not index.
* src/grep.h (matchers): Use fixed name size, not pointer (as
there's no need for the extra flexibility). All uses changed.
* src/kwset.h (struct kwsmatch.offset): Renamed from beg, with
change of type to size_t. All uses changed.
* src/grep.c (reset): No longer need kludge for dfaexec. Simplify.
(reset, grepbuf): Adjust to new interface for 'execute'.
(install_matcher): List is now terminated by null compile,
not null name.
Do not invoke setrlimit if that wouldn't change the limit.
* src/dfa.c (xcalloc, xmalloc, xrealloc, prtok, tstbit, setbit,
clrbit, copyset, zeroset, notset, equal, charclass_index,
looking_at, lex, addtok, atom, nsubtoks, copytoks, closure,
branch, regexp, copy, insert, merge, delete, state_index,
build_state, build_state_zero, icatalloc, icpyalloc, istrstr,
ifree, freelist, enlist, comsubs, addlists, inboth):
Remove forward decls; no longer needed.
* src/grep.c (ck_atoi, usage, error, setmatcher,
install_matcher, prepend_args, prepend_default_options,
page_alloc, reset, fillbuf, grepbuf, prtext, prpending, prline,
print_offset_sep, nlscan, grep, grepfile): Likewise.
* src/kwset.c (enqueue, treefails, treedelta, hasevery,
treenext, bmexec, cwexec): Likewise.
* src/search.c (Gcompile, Ecompile, EGexecute, Fcompile, Fexecute,
Pcompile, Pexecute, kwsinit): Likewise.
* src/search.c (Pcompile): Do not assume newly allocated
storage is zeroed.
|
|
|