| Commit message | Author | Age | Files | Lines |
| |
* src/pcresearch.c (struct pcre_comp, jit_exec, Pexecute):
Prefer signed to unsigned types when either will do.
(jit_exec): Use INT_MULTIPLY_WRAPV instead of doing it by hand.
(Pexecute): Omit line length limit test that is no longer
needed with PCRE2.
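
As a rough illustration of the jit_exec change, the sketch below contrasts
a hand-written overflow test with a checked multiply. It uses the GCC/Clang
builtin __builtin_mul_overflow as a stand-in for Gnulib's
INT_MULTIPLY_WRAPV (from intprops.h), which also returns true on overflow.
The helper names mul_by_hand and mul_checked are hypothetical and do not
appear in src/pcresearch.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: detect overflow by hand before multiplying.
   Only handles A >= 0 and B > 0.  Returns true on overflow.  */
static bool
mul_by_hand (intmax_t a, intmax_t b, intmax_t *result)
{
  assert (0 <= a && 0 < b);
  if (INTMAX_MAX / b < a)
    return true;
  *result = a * b;
  return false;
}

/* The same check delegated to the compiler (GCC 5+ or Clang builtin).
   Returns true on overflow, for any signs of A and B.  */
static bool
mul_checked (intmax_t a, intmax_t b, intmax_t *result)
{
  return __builtin_mul_overflow (a, b, result);
}
```

The by-hand version needs a separate case for each sign combination,
which is where mistakes tend to creep in; the checked form covers all of
them in one call.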
| |
* src/pcresearch.c (bad_utf8_from_pcre2): New function. Fix bug
where PCRE2_ERROR_UTF8_ERR1 was not treated as an encoding error.
Improve performance when PCRE2_MATCH_INVALID_UTF is defined.
(Pexecute): Use it.
| |
* src/pcresearch.c (Pcompile): Improve comments re
pcre2_get_error_message buffer.
| |
* src/pcresearch.c (jit_exec): Remove arbitrary INT_MAX limit on JIT
stack size.
| |
Mostly a bug-for-bug translation of the original code to the PCRE2 API.
The code could still do with some optimizations, but it should be good as a
starting point.
The API changes the sign of some types, so some ugly casts
were needed; some of the changes are just to make sure all variables
fit into the newer types better.
Includes backward compatibility and could be made to build all the way
back to 10.00, but assumes a recent enough version and has been tested with
10.23 (from CentOS 7, the oldest).
Performance seems equivalent, and it also seems functionally complete.
* m4/pcre.m4 (gl_FUNC_PCRE): Check for PCRE2, not the original PCRE.
* src/pcresearch.c (struct pcre_comp, jit_exec)
(Pcompile, Pexecute):
Use PCRE2, not the original PCRE.
* tests/filename-lineno.pl: Adjust to match PCRE2 diagnostics.
| |
* tests/pcre-context: Initialize ‘fail’ earlier.
| |
Included in the original bug #20957, but corrupted somehow in
transit as the required NUL characters are missing.
Add a simpler version of the test case that uses plain characters
and match the -z data and output to show the equivalence.
Note the output is still not correct as it is missing the expected
LF characters, but a full fix will have to wait until PCRE2.
Fixes Bug#51735.
| |
Problem reported by Carlo Marcelo Arenas Belón (Bug#51710).
* src/pcresearch.c (jit_exec): Don’t attempt to grow the JIT stack
over INT_MAX - 8 * 1024.
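
A minimal sketch of such a cap, assuming the stack size is kept in an int
and is grown by doubling (the doubling policy, the constant name
JIT_STACK_LIMIT, and the helper name are assumptions for illustration, not
taken from jit_exec):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The limit named in the commit message.  */
enum { JIT_STACK_LIMIT = INT_MAX - 8 * 1024 };

/* Hypothetical helper: double *SIZE unless the result would exceed
   JIT_STACK_LIMIT.  Returns false, leaving *SIZE unchanged, when the
   stack cannot grow any further.  */
static bool
double_jit_stack_size (int *size)
{
  assert (0 < *size);
  if (JIT_STACK_LIMIT / 2 < *size)
    return false;
  *size *= 2;
  return true;
}
```

Testing against half the limit before doubling keeps the arithmetic itself
from overflowing int.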
| |
* doc/grep.texi (Basic vs Extended, Performance):
Document limitations of interval expressions (Bug#44538).
| |
* src/system.h: Update decls to match current Gnulib.
| |
This improves runtime checking for integer overflow when compiling
with gcc -fsanitize=undefined and the like. It also avoids
the need for some integer casts, which can be error-prone.
* bootstrap.conf (gnulib_modules): Add idx.
* src/dfasearch.c (struct dfa_comp, kwsmusts)
(possible_backrefs_in_pattern, regex_compile, GEAcompile)
(EGexecute):
* src/grep.c (struct patloc, patlocs_allocated, patlocs_used)
(n_patterns, update_patterns, pattern_file_name, poison_len)
(asan_poison, fwrite_errno, compile_fp_t, execute_fp_t)
(buf_has_encoding_errors, buf_has_nulls, file_must_have_nulls)
(bufalloc, pagesize, all_zeros, fillbuf, nlscan)
(print_line_head, print_line_middle, print_line_tail, grepbuf)
(grep, contains_encoding_error, fgrep_icase_available)
(fgrep_icase_charlen, fgrep_to_grep_pattern, try_fgrep_pattern)
(main):
* src/kwsearch.c (struct kwsearch, Fcompile, Fexecute):
* src/kwset.c (struct trie, struct kwset, kwsalloc, kwsincr)
(kwswords, treefails, memchr_kwset, acexec_trans, kwsexec)
(treedelta, kwsprep, bm_delta2_search, bmexec_trans, bmexec)
(acexec):
* src/kwset.h (struct kwsmatch):
* src/pcresearch.c (Pcompile, Pexecute):
* src/search.h (mb_clen):
* src/searchutils.c (kwsinit, mb_goback, wordchars_count)
(wordchars_size, wordchar_next, wordchar_prev):
Prefer idx_t to size_t or ptrdiff_t for nonnegative sizes,
and prefer ptrdiff_t to size_t for sizes plus error values.
* src/grep.c (uword_size): New constant, used for signed
size calculations.
(totalnl, add_count, totalcc, print_offset, print_line_head, grep):
Prefer intmax_t to uintmax_t for wide integer calculations.
(fgrep_icase_charlen): Prefer ptrdiff_t to int for size offsets.
* src/grep.h: Include idx.h.
* src/search.h (imbrlen): New function, like mbrlen except
with idx_t and ptrdiff_t.
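
To illustrate why a signed size type helps with runtime checking: size_t
arithmetic wraps around silently and is well defined, so
-fsanitize=undefined has nothing to report, whereas overflow in a signed
type such as idx_t is undefined behavior that the sanitizer can trap. The
sketch below repeats Gnulib's definition of idx_t so that it stands alone;
the helper grow_size is hypothetical, and it uses the GCC/Clang builtin
__builtin_add_overflow to check explicitly rather than rely on a sanitizer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Gnulib's idx.h defines idx_t as ptrdiff_t; repeated here so the
   sketch is self-contained.  */
typedef ptrdiff_t idx_t;

/* Hypothetical helper: grow a nonnegative size by INCR.
   Returns false, leaving *SIZE alone, if the sum would overflow.  */
static bool
grow_size (idx_t *size, idx_t incr)
{
  assert (0 <= *size && 0 <= incr);
  idx_t sum;
  if (__builtin_add_overflow (*size, incr, &sum))
    return false;
  *size = sum;
  return true;
}
```

Because idx_t is signed, a negative value can also serve as an obvious
"invalid" marker, with no cast to or from an unsigned type.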
| |
* src/searchutils.c (mb_goback): When scanning backward through
UTF-8, check the length implied by the putative byte 1 before
bothering to invoke mb_clen. This length check also lets us use
mbrlen directly rather than calling mb_clen, which would
eventually defer to mbrlen anyway.
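
The cheap length check can be sketched as follows. This is an
illustration of the idea only, not the code in mb_goback: the helper names
are hypothetical, and the sketch stops at the implied-length test without
going on to validate the character with mbrlen.

```c
#include <assert.h>
#include <stddef.h>

/* Length implied by a UTF-8 leading byte, or 0 if B cannot start
   a character (continuation byte or invalid leading byte).  */
static int
utf8_implied_len (unsigned char b)
{
  if (b < 0x80)
    return 1;
  if (0xC2 <= b && b < 0xE0)
    return 2;
  if (0xE0 <= b && b < 0xF0)
    return 3;
  if (0xF0 <= b && b < 0xF5)
    return 4;
  return 0;
}

/* Hypothetical helper: return the start of the character that appears
   to contain *CUR, looking back no further than BUF.  If no plausible
   leading byte reaches CUR, return CUR itself.  */
static char const *
utf8_char_start (char const *buf, char const *cur)
{
  assert (buf <= cur);
  char const *p = cur;
  /* Step back over at most 3 continuation bytes (10xxxxxx).  */
  while (buf < p && cur - p < 3 && ((unsigned char) *p & 0xC0) == 0x80)
    p--;
  /* P is the putative byte 1.  Accept it only if the length it
     implies extends as far as CUR.  */
  if (cur - p < utf8_implied_len ((unsigned char) *p))
    return p;
  return cur;
}
```

A candidate that fails this test can be rejected without any call into
the multibyte conversion functions.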
| |
* src/searchutils.c (mb_goback): Set *MBCLEN only in
non-UTF-8 encodings, since that’s the only time it’s needed,
and this lets us see more clearly that the UTF-8 clen value
is not useful to the caller.
| |
* src/searchutils.c (wordchar_prev): Tweak performance by using a
value already in a local variable rather than consulting a table.
| |
* src/searchutils.c (mb_goback): Improve the comment to better
describe this confusing function. And remove an unnecessary
test of cur vs end.
| |
* src/kwset.c (struct kwset.maxd): Remove. All uses removed.
| |
This helps move the code away from unsigned types.
* src/grep.c (buf_has_encoding_errors, contains_encoding_error):
* src/searchutils.c (mb_goback):
Compare to MB_LEN_MAX, not to (size_t) -2. This is a bit safer
anyway, as grep relies on MB_LEN_MAX limits elsewhere.
* src/search.h (mb_clen): Compare to -2 before converting to size_t.
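
A small sketch of the MB_LEN_MAX comparison, assuming a caller that
already holds the size_t result of mbrlen or mbrtowc (the helper name
mb_result_is_error is hypothetical):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Both error values are far above any valid character length.  */
static_assert (MB_LEN_MAX < (size_t) -2, "error values exceed MB_LEN_MAX");

/* Hypothetical helper: classify an mbrlen/mbrtowc return value.
   (size_t) -1 (invalid sequence) and (size_t) -2 (incomplete sequence)
   both exceed MB_LEN_MAX, so one comparison catches either.  */
static bool
mb_result_is_error (size_t len)
{
  return MB_LEN_MAX < len;
}
```

The test reads the same whether the length is later stored in a signed or
an unsigned variable, which is what makes it convenient while moving code
away from size_t.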
| |
* tests/mb-non-UTF8-perf-Fw: Use head -n 10000000 rather than the
work-alike sed command. This provides a 4x speedup and saves 0.5s.
* tests/null-byte: Likewise.
| |
Inspired by bug#50129 even though this is a different bug.
* src/grep.c (main): For ‘-f -’, use clearerr (stdin) after
reading, so that ‘grep -f - -f -’ reads stdin twice even
when stdin is a tty. Also, for ‘-f FILE’, report any
I/O error when closing FILE.
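
The clearerr step can be sketched like this, assuming a stdio
implementation where the end-of-file indicator is sticky, so that further
reads keep failing until it is cleared. The helper drain_and_reset is
hypothetical and is not the code in main.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: consume FP up to end of file and return the
   number of bytes read, then clear the EOF and error indicators so
   that a later read attempt consults the underlying file again.  */
static long
drain_and_reset (FILE *fp)
{
  assert (fp);
  long n = 0;
  while (getc (fp) != EOF)
    n++;
  clearerr (fp);
  return n;
}
```

On a terminal, end of file is only a signal typed by the user, so after
clearerr a second pass over stdin can wait for fresh input.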
| |
* tests/mb-non-UTF8-perf-Fw: Prefer ‘sed 10q’ to ‘head -10’,
which doesn’t conform to POSIX.
| |
Problem reported by Alex Murray (bug#50093).
* src/grep.c (hash_pattern): Use a nonzero initial value.
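
One property of a nonzero starting value can be shown with a djb2-style
multiply-and-add hash. This is an illustration under assumptions: the
function hash_bytes and the constants 33 and 5381 are chosen for the
example and are not claimed to match hash_pattern, nor is this claimed to
be the exact cause of bug#50093.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical djb2-style hash over N bytes of S, starting from INIT.  */
static size_t
hash_bytes (char const *s, size_t n, size_t init)
{
  assert (s || n == 0);
  size_t h = init;
  for (size_t i = 0; i < n; i++)
    h = h * 33 + (unsigned char) s[i];
  return h;
}
```

With INIT equal to 0, any run of leading zero bytes leaves the hash at 0,
so inputs that differ only in such a prefix collide. Starting from a
nonzero value such as 5381 makes each byte, including a zero byte, change
the state.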
| |
* doc/grep.texi (General Output Control, Basic vs Extended):
No need to complicate the portability advice by talking about 7th
edition grep, since it’s no longer a practical porting target.
Instead, mention only Solaris 10 grep, the last practical holdout
of somewhat-traditional grep.
| |
* NEWS: Mention this (see bug#49996).
* doc/Makefile.am (egrep.1 fgrep.1): Remove. All uses removed.
* doc/grep.in.1, doc/grep.texi (grep Programs):
Remove documentation for egrep, fgrep.
* doc/grep.texi (Usage): Add FAQ for egrep and fgrep.
* src/Makefile.am (shell_does_substrings): Substitute for ${0##*/},
not for ${0%/*} (which was not being used anyway).
* src/egrep.sh: Issue an obsolescence warning.
* tests/fedora: Use "grep -F" instead of "fgrep" in diagnostics,
as this tests "grep -F" not "fgrep".
| |
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
| |
* NEWS: Record release date.
| |
...so we can continue to use seq, but use the wrapper when needed.
* tests/init.cfg (seq): Some systems lack seq.
Provide a replacement.
* tests/hash-collision-perf: Use seq once again.
* tests/long-pattern-perf: Likewise. And remove a comment about seq.
| |
* src/dfasearch.c (EGexecute): Remove a label and goto.
This also makes the machine code a bit shorter, on x86-64 gcc.
| |
* src/grep.c (fillbuf): Simplify movement of saved data.
| |
* src/grep.c (ALIGN_TO): When converting pointers to unsigned
integers, convert to uintptr_t not size_t, as size_t in theory
might be too narrow.
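
A sketch of pointer alignment through uintptr_t, assuming a power-of-two
alignment. The helper align_up is hypothetical and is written differently
from grep's ALIGN_TO macro; it only shows the choice of integer type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round P up to the next multiple of ALIGNMENT,
   which must be a power of two.  The pointer is converted to
   uintptr_t, the unsigned type meant to hold a converted object
   pointer; size_t is only required to hold object sizes.  */
static void *
align_up (void *p, size_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  uintptr_t u = (uintptr_t) p;
  uintptr_t aligned = (u + alignment - 1) & ~(uintptr_t) (alignment - 1);
  return (char *) p + (aligned - u);
}
```

The result is formed by offsetting the original pointer rather than by
converting the integer back, so the integer type is used only to compute
the distance to the next boundary.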
| |
Portability problem reported by Dagobert Michelsen in:
https://lists.gnu.org/r/grep-devel/2021-08/msg00004.html
* tests/hash-collision-perf, tests/long-pattern-perf:
Don’t assume seq is installed; use awk instead.
| |
* src/grep.c (usage): Document --group-separator
and --no-group-separator.
| |
* doc/grep.in.1:
Add copy of docs for --group-separator from doc/grep.texi.
Add copy of docs for --no-group-separator from doc/grep.texi.
| |
* doc/grep.in.1 (-H): Mention that this is a GNU extension.
| |
* doc/grep.texi (The Backslash Character and Special Expressions)
(Usage): Improve doc (Bug#48948).
| |
* doc/grep.texi (-L): Remove erroneous sentence about stopping early.
With -L, grep cannot stop scanning early.
(-l): Tweak existing wording.
* doc/grep.in.1: Remove the -L sentence here, too.
(-l): Copy the sentence from grep.texi, to clarify: it's only per-file
scanning that stops upon match. Reported by Robert Bruntz
in http://debbugs.gnu.org/46179
| |
* configure.ac (GNULIB_TEST_WARN_CFLAGS): Add
-Woverlength-strings to avoid clang warnings.
| |
* doc/grep.texi (Fundamental Structure)
(Back-references and Subexpressions, Basic vs Extended):
Further clarifications.