path: root/src
Commit message [Author, Age, Files, Lines]
* grep: use ximalloc, not xcalloc [Paul Eggert, 2021-11-14, 1 file, -1/+3]
  * src/pcresearch.c (Pcompile): Use ximalloc, not xcalloc, and explicitly initialize the two slots that should be null. This is more likely to catch future errors if we use valgrind.
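The valgrind rationale can be sketched as follows. With calloc every byte is zeroed, so a field that should never be read before assignment silently reads as 0. With malloc plus explicit stores, only the intended slots are initialized, and valgrind can report a stray read of anything else. In this sketch plain malloc stands in for Gnulib's ximalloc, and the struct and field names are hypothetical, not grep's.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a compiled-pattern struct.  */
struct comp
{
  void *jit_stack;   /* must start out null */
  void *match_data;  /* must start out null */
  int flags;         /* set later by the caller */
};

/* Allocate without zero-filling, then initialize only the slots that
   must be null.  FLAGS is deliberately left unset: under valgrind, a
   read of it before assignment is reported instead of silently
   yielding 0, as it would after calloc.  */
static struct comp *
comp_new (void)
{
  struct comp *pc = malloc (sizeof *pc);
  if (!pc)
    return NULL;
  pc->jit_stack = NULL;
  pc->match_data = NULL;
  return pc;
}
```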
* grep: improve memory exhaustion checking with -P [Paul Eggert, 2021-11-14, 1 file, -19/+31]
  * src/pcresearch.c (struct pcre_comp): New member gcontext.
  (private_malloc, private_free): New functions.
  (jit_exec): It is OK to call pcre2_jit_stack_free (NULL), so simplify. Use gcontext for allocation. Check for pcre2_jit_stack_create failure, since sljit bypasses private_malloc. Redo to avoid two ‘continue’s.
  (Pcompile): Create and use gcontext.
* grep: simplify JIT setup [Paul Eggert, 2021-11-14, 1 file, -5/+3]
  * src/pcresearch.c (Pcompile): Simplify since ‘die’ cannot return.
* grep: use PCRE2_EXTRA_MATCH_LINE [Paul Eggert, 2021-11-14, 1 file, -24/+30]
  * src/pcresearch.c (Pcompile): If available, use PCRE2_EXTRA_MATCH_LINE instead of doing it by hand. Simplify construction of substitute regular expression.
* grep: prefer signed integers [Paul Eggert, 2021-11-14, 1 file, -13/+11]
  * src/pcresearch.c (struct pcre_comp, jit_exec, Pexecute): Prefer signed to unsigned types when either will do.
  (jit_exec): Use INT_MULTIPLY_WRAPV instead of doing it by hand.
  (Pexecute): Omit line length limit test that is no longer needed with PCRE2.
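Gnulib's INT_MULTIPLY_WRAPV (a, b, r) stores the possibly wrapped product in *r and returns true if the true product overflowed. The sketch below shows an overflow-checked size doubling of the kind used when growing a stack. It assumes GCC or Clang and uses the builtin __builtin_mul_overflow, which follows the same return convention, as a stand-in for the Gnulib macro. The function names and the MAX cap are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Double *SIZE in place unless the product overflows ptrdiff_t or
   exceeds MAX.  Return true on success; on failure *SIZE is unchanged.  */
static bool
double_size (ptrdiff_t *size, ptrdiff_t max)
{
  ptrdiff_t doubled;
  if (__builtin_mul_overflow (*size, 2, &doubled) || max < doubled)
    return false;
  *size = doubled;
  return true;
}

/* Same, limited only by the range of ptrdiff_t.  */
static bool
double_size_unbounded (ptrdiff_t *size)
{
  return double_size (size, PTRDIFF_MAX);
}
```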
* grep: speed up, fix bad-UTF8 check with -P [Paul Eggert, 2021-11-14, 1 file, -2/+14]
  * src/pcresearch.c (bad_utf8_from_pcre2): New function. Fix bug where PCRE2_ERROR_UTF8_ERR1 was not treated as an encoding error. Improve performance when PCRE2_MATCH_INVALID_UTF is defined.
  (Pexecute): Use it.
* grep: improve pcre2_get_error_message comments [Paul Eggert, 2021-11-14, 1 file, -2/+3]
  * src/pcresearch.c (Pcompile): Improve comments re pcre2_get_error_message buffer.
* grep: Don’t limit jitstack_max to INT_MAX [Paul Eggert, 2021-11-14, 1 file, -1/+7]
  * src/pcresearch.c (jit_exec): Remove arbitrary INT_MAX limit on JIT stack size.
* maint: minor rewording and reindenting [Paul Eggert, 2021-11-14, 1 file, -22/+22]
* grep: migrate to pcre2 [Carlo Marcelo Arenas Belón, 2021-11-14, 1 file, -129/+120]
  Mostly a bug-by-bug translation of the original code to the PCRE2 API. The code could still do with some optimizations but should be good as a starting point.
  The API changes the signedness of some types, so some ugly casts were needed; some of the changes just make sure all variables fit the newer types better.
  Includes backward compatibility and could be made to build all the way back to 10.00, but it assumes a recent enough version and has been tested with 10.23 (from CentOS 7, the oldest).
  Performance seems equivalent, and it also seems functionally complete.
  * m4/pcre.m4 (gl_FUNC_PCRE): Check for PCRE2, not the original PCRE.
  * src/pcresearch.c (struct pcre_comp, jit_exec)
  (Pcompile, Pexecute): Use PCRE2, not the original PCRE.
  * tests/filename-lineno.pl: Adjust to match PCRE2 diagnostics.
* grep: work around PCRE bug [Paul Eggert, 2021-11-09, 1 file, -1/+4]
  Problem reported by Carlo Marcelo Arenas Belón (Bug#51710).
  * src/pcresearch.c (jit_exec): Don’t attempt to grow the JIT stack over INT_MAX - 8 * 1024.
* build: update gnulib submodule to latest [Paul Eggert, 2021-08-27, 1 file, -2/+2]
  * src/system.h: Update decls to match current Gnulib.
* grep: prefer signed to unsigned integers [Paul Eggert, 2021-08-25, 9 files, -277/+293]
  This improves runtime checking for integer overflow when compiling with gcc -fsanitize=undefined and the like. It also avoids the need for some integer casts, which can be error-prone.
  * bootstrap.conf (gnulib_modules): Add idx.
  * src/dfasearch.c (struct dfa_comp, kwsmusts):
  (possible_backrefs_in_pattern, regex_compile, GEAcompile)
  (EGexecute):
  * src/grep.c (struct patloc, patlocs_allocated, patlocs_used)
  (n_patterns, update_patterns, pattern_file_name, poison_len)
  (asan_poison, fwrite_errno, compile_fp_t, execute_fp_t)
  (buf_has_encoding_errors, buf_has_nulls, file_must_have_nulls)
  (bufalloc, pagesize, all_zeros, fillbuf, nlscan)
  (print_line_head, print_line_middle, print_line_tail, grepbuf)
  (grep, contains_encoding_error, fgrep_icase_available)
  (fgrep_icase_charlen, fgrep_to_grep_pattern, try_fgrep_pattern)
  (main):
  * src/kwsearch.c (struct kwsearch, Fcompile, Fexecute):
  * src/kwset.c (struct trie, struct kwset, kwsalloc, kwsincr)
  (kwswords, treefails, memchr_kwset, acexec_trans, kwsexec)
  (treedelta, kwsprep, bm_delta2_search, bmexec_trans, bmexec)
  (acexec):
  * src/kwset.h (struct kwsmatch):
  * src/pcresearch.c (Pcompile, Pexecute):
  * src/search.h (mb_clen):
  * src/searchutils.c (kwsinit, mb_goback, wordchars_count)
  (wordchars_size, wordchar_next, wordchar_prev):
  Prefer idx_t to size_t or ptrdiff_t for nonnegative sizes, and prefer ptrdiff_t to size_t for sizes plus error values.
  * src/grep.c (uword_size): New constant, used for signed size calculations.
  (totalnl, add_count, totalcc, print_offset, print_line_head, grep):
  Prefer intmax_t to uintmax_t for wide integer calculations.
  (fgrep_icase_charlen): Prefer ptrdiff_t to int for size offsets.
  * src/grep.h: Include idx.h.
  * src/search.h (imbrlen): New function, like mbrlen except with idx_t and ptrdiff_t.
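A small sketch shows why signed sizes help. Subtracting a larger size from a smaller one yields an honest negative value with a signed type, and a caller can test that with `< 0`. The same expression in size_t silently wraps to a huge positive number, which no sanitizer flags because unsigned wraparound is well defined. The idx_t typedef below is an assumption meant to mirror Gnulib's idx.h, which defines idx_t as a signed type. The function names are hypothetical.

```c
#include <stddef.h>

/* Assumption: like Gnulib's idx.h, make idx_t a signed type.  */
typedef ptrdiff_t idx_t;

/* Bytes still free in a buffer of SIZE bytes of which USED are taken.
   A negative result signals a caller bug and is easy to test for.  */
static idx_t
room_signed (idx_t size, idx_t used)
{
  return size - used;
}

/* The same computation in size_t: an over-full buffer wraps around
   to a huge value that looks like plenty of room.  */
static size_t
room_unsigned (size_t size, size_t used)
{
  return size - used;
}
```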
* grep: scan back thru UTF-8 a bit faster [Paul Eggert, 2021-08-24, 1 file, -6/+13]
  * src/searchutils.c (mb_goback): When scanning backward through UTF-8, check the length implied by the putative byte 1 before bothering to invoke mb_clen. This length check also lets us use mbrlen directly rather than calling mb_clen, which would eventually defer to mbrlen anyway.
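The idea of checking the length implied by a putative first byte can be sketched in plain C. The sketch assumes the text is UTF-8. It steps back over at most three continuation bytes (bit pattern 10xxxxxx), then accepts the byte found there as the start of the current character only if its implied length reaches the current position. The function names are hypothetical. This is not grep's mb_goback, which also handles other multibyte encodings and reports more to its caller.

```c
#include <stddef.h>

/* Length of a UTF-8 character implied by its first byte B,
   or 0 if B cannot start a character.  */
static int
utf8_lead_len (unsigned char b)
{
  if (b < 0x80)
    return 1;
  if (b < 0xC2)
    return 0;   /* continuation byte, or overlong 2-byte lead */
  if (b < 0xE0)
    return 2;
  if (b < 0xF0)
    return 3;
  if (b < 0xF5)
    return 4;
  return 0;
}

/* Return the index of the first byte of the character containing
   BUF[CUR].  If BUF[CUR] is not a continuation byte, or no plausible
   first byte covers it, treat CUR itself as a boundary.  */
static ptrdiff_t
utf8_goback (unsigned char const *buf, ptrdiff_t cur)
{
  if ((buf[cur] & 0xC0) != 0x80)
    return cur;
  for (ptrdiff_t back = 1; back <= 3 && back <= cur; back++)
    {
      unsigned char b = buf[cur - back];
      if ((b & 0xC0) != 0x80)
        return back < utf8_lead_len (b) ? cur - back : cur;
    }
  return cur;
}
```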
* grep: tweak mb_goback performance [Paul Eggert, 2021-08-24, 1 file, -5/+11]
  * src/searchutils.c (mb_goback): Set *MBCLEN only in non-UTF-8 encodings, since that’s the only time it’s needed, and this lets us see more clearly that the UTF-8 clen value is not useful to the caller.
* grep: tweak wordchar_prev performance [Paul Eggert, 2021-08-24, 1 file, -2/+1]
  * src/searchutils.c (wordchar_prev): Tweak performance by using a value already in a local variable rather than consulting a table.
* grep: tweak mb_goback and comment it better [Paul Eggert, 2021-08-24, 1 file, -13/+30]
  * src/searchutils.c (mb_goback): Improve the comment to better describe this confusing function. And remove an unnecessary test of cur vs end.
* grep: omit unused maxd member [Paul Eggert, 2021-08-24, 1 file, -4/+0]
  * src/kwset.c (struct kwset.maxd): Remove. All uses removed.
* grep: avoid some size_t casts [Paul Eggert, 2021-08-24, 3 files, -10/+10]
  This helps move the code away from unsigned types.
  * src/grep.c (buf_has_encoding_errors, contains_encoding_error):
  * src/searchutils.c (mb_goback): Compare to MB_LEN_MAX, not to (size_t) -2. This is a bit safer anyway, as grep relies on MB_LEN_MAX limits elsewhere.
  * src/search.h (mb_clen): Compare to -2 before converting to size_t.
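mbrlen returns (size_t) -1 for an invalid sequence and (size_t) -2 for an incomplete one, and both are enormous when viewed as size_t. A single comparison against MB_LEN_MAX therefore rejects every error value at once, with no cast of a negative constant. The sketch below illustrates this comparison; the helper names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* True if N, a value returned by mbrlen or mbrtowc, is a real
   character length (0 for a null character) rather than an error
   code such as (size_t) -1 or (size_t) -2.  */
static bool
mblen_ok (size_t n)
{
  return n <= MB_LEN_MAX;
}

/* Return what mbrlen reports for the character at the start of S,
   which has LEN bytes (0 for a null character), or -1 if the
   character is invalid or truncated.  */
static ptrdiff_t
char_len (char const *s, size_t len)
{
  mbstate_t st;
  memset (&st, 0, sizeof st);
  size_t n = mbrlen (s, len, &st);
  return mblen_ok (n) ? (ptrdiff_t) n : -1;
}
```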
* grep: avoid sticky problem with ‘-f - -f -’ [Paul Eggert, 2021-08-21, 1 file, -6/+11]
  Inspired by bug#50129 even though this is a different bug.
  * src/grep.c (main): For ‘-f -’, use clearerr (stdin) after reading, so that ‘grep -f - -f -’ reads stdin twice even when stdin is a tty. Also, for ‘-f FILE’, report any I/O error when closing FILE.
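The stdio end-of-file indicator is sticky. Once a read on a stream hits EOF, later reads on that stream keep returning EOF until the indicator is cleared, which is why a second ‘-f -’ on a terminal would otherwise read nothing. The sketch below shows the read-then-clear pattern. It uses a temporary file in place of stdin, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Count the bytes remaining in F, then clear F's end-of-file and error
   indicators so that a later read of the same stream is attempted
   afresh.  Return the count, or -1 if a read error occurred.  */
static long
drain_and_clear (FILE *f)
{
  long count = 0;
  while (getc (f) != EOF)
    count++;
  bool failed = ferror (f) != 0;
  clearerr (f);
  return failed ? -1 : count;
}
```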
* grep: djb2 correction [Paul Eggert, 2021-08-18, 1 file, -1/+9]
  Problem reported by Alex Murray (bug#50093).
  * src/grep.c (hash_pattern): Use a nonzero initial value.
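For reference, the classic djb2 string hash starts from 5381 and computes h = h * 33 + c for each byte c. The sketch below takes the seed as a parameter to show one effect of a zero start: a prefix of null bytes leaves the hash unchanged, so "\0a" and "a" collide. This is an illustrative function, not grep's hash_pattern.

```c
#include <stddef.h>

/* djb2-style hash of the LEN bytes at S, starting from SEED.
   The classic djb2 seed is 5381.  */
static size_t
djb2_seeded (char const *s, size_t len, size_t seed)
{
  size_t h = seed;
  for (size_t i = 0; i < len; i++)
    h = h * 33 + (unsigned char) s[i];
  return h;
}
```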
* egrep, fgrep: now obsolete [Paul Eggert, 2021-08-16, 2 files, -2/+4]
  * NEWS: Mention this (see bug#49996).
  * doc/Makefile.am (egrep.1 fgrep.1): Remove. All uses removed.
  * doc/grep.in.1, doc/grep.texi (grep Programs): Remove documentation for egrep, fgrep.
  * doc/grep.texi (Usage): Add FAQ for egrep and fgrep.
  * src/Makefile.am (shell_does_substrings): Substitute for ${0##*/}, not for ${0%/\*} (which was not being used anyway).
  * src/egrep.sh: Issue an obsolescence warning.
  * tests/fedora: Use "grep -F" instead of "fgrep" in diagnostics, as this tests "grep -F" not "fgrep".
* doc: update cites and authors [Paul Eggert, 2021-08-14, 1 file, -15/+3]
* grep: simplify EGexecute [Paul Eggert, 2021-08-09, 1 file, -2/+1]
  * src/dfasearch.c (EGexecute): Remove a label and goto. This also makes the machine code a bit shorter, on x86-64 gcc.
* grep: simplify data movement slightly [Paul Eggert, 2021-08-09, 1 file, -11/+5]
  * src/grep.c (fillbuf): Simplify movement of saved data.
* grep: pointer-integer cast nit [Paul Eggert, 2021-08-09, 1 file, -2/+2]
  * src/grep.c (ALIGN_TO): When converting pointers to unsigned integers, convert to uintptr_t not size_t, as size_t in theory might be too narrow.
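Where uintptr_t exists, the C standard guarantees that a pointer converted to it and back compares equal to the original, so it can hold a pointer's value without loss. size_t carries no such guarantee. The sketch below is a round-up-to-alignment helper in that style. It assumes ALIGNMENT is positive, and the name is hypothetical; grep's ALIGN_TO is a macro and may differ.

```c
#include <stdint.h>

/* Return the first address at or after P that is a multiple of
   ALIGNMENT (which must be positive).  The remainder is computed in
   uintptr_t, which can hold any object pointer's value.  */
static char *
align_up (char *p, uintptr_t alignment)
{
  uintptr_t rem = (uintptr_t) p % alignment;
  return rem ? p + (alignment - rem) : p;
}
```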
* doc: usage: --group-separator/--no-group-separator [Kevin Locke, 2021-08-06, 1 file, -0/+2]
  * src/grep.c (usage): Document --group-separator and --no-group-separator.
* maint: run "make update-copyright" [Paul Eggert, 2021-01-01, 12 files, -12/+12]
* maint: add parentheses to avoid new clang-10 warning [Jim Meyering, 2020-12-30, 1 file, -1/+1]
  * src/dfasearch.c (regex_compile): Parenthesize arith-OR vs ternary, to placate clang-10.
* grep: use of --unix-byte-offsets (-u) now elicits a warning [Jim Meyering, 2020-12-25, 1 file, -2/+2]
  * NEWS (Change in behavior): Mention this.
  * src/grep.c (main): Warn about each use of obsolete --unix-byte-offsets (-u).
  * doc/grep.in.1 (-u): Remove its documentation.
* grep: avoid performance regression with many patterns [Jim Meyering, 2020-11-26, 1 file, -2/+3]
  * src/grep.c (hash_pattern): Switch from PJW to DJB2, to avoid an O(N) to O(N^2) performance regression due to hash collisions with patterns from e.g., seq 500000|tr 0-9 A-J
  Reported by Frank Heckenbach in https://bugs.gnu.org/44754
  * NEWS (Bug fixes): Mention it.
  * tests/hash-collision-perf: New file.
  * tests/Makefile.am (TESTS): Add it.
* build: update gnulib to latest for warning fixes [Jim Meyering, 2020-11-26, 1 file, -1/+1]
  * gnulib: Update submodule to latest.
  * src/grep.c (printf_errno): Reflect gnulib's renaming: change _GL_ATTRIBUTE_FORMAT_PRINTF to _GL_ATTRIBUTE_FORMAT_PRINTF_STANDARD
* grep: remove GREP_OPTIONS [Paul Eggert, 2020-11-03, 1 file, -67/+2]
  * NEWS: Mention this.
  * doc/grep.in.1: Remove GREP_OPTIONS documentation.
  * doc/grep.texi (Environment Variables): Move GREP_OPTIONS stuff into a “no longer implemented” paragraph.
  * src/grep.c (prepend_args, prepend_default_options): Remove.
  (main): Do not look at GREP_OPTIONS.
  * tests/Makefile.am (TESTS_ENVIRONMENTS):
  * tests/init.cfg (vars_): Remove GREP_OPTIONS.
* grep: use RE_NO_SUB when calling regex solely to check syntax [Norihiro Tanaka, 2020-11-01, 1 file, -4/+12]
  * src/dfasearch.c (regex_compile): New parameter. All callers changed.
  (GEAcompile): Move setting syntax for regex into regex_compile() function. This addresses a performance problem exposed by extreme regular expressions, as described in https://bugs.gnu.org/43862 .
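RE_NO_SUB is a syntax bit of the GNU regex API. The closest portable analogue is POSIX regcomp with REG_NOSUB, which tells the library that no subexpression offsets will be requested. The sketch below shows a syntax-only check in that style. The helper name is hypothetical, and grep's own code goes through the GNU interface rather than this sketch.

```c
#include <regex.h>
#include <stdbool.h>

/* Return true if PATTERN compiles under CFLAGS.  REG_NOSUB says no
   match offsets will be asked for, so the library may skip work
   needed only to report subexpressions.  */
static bool
regex_syntax_ok (char const *pattern, int cflags)
{
  regex_t re;
  if (regcomp (&re, pattern, cflags | REG_NOSUB) != 0)
    return false;
  regfree (&re);
  return true;
}
```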
* grep: -P: report input filename upon PCRE execution failure [Jim Meyering, 2020-10-11, 3 files, -6/+12]
  Without this, it could be tedious to determine which input file evokes a PCRE-execution-time failure.
  * src/pcresearch.c (Pexecute): When failing, include the error-provoking file name in the diagnostic.
  * src/grep.c (input_filename): Make extern, since used above.
  * src/search.h (input_filename): Declare.
  * tests/filename-lineno.pl: Test for this.
  ($no_pcre): Factor out.
  * NEWS (Bug fixes): Mention this.
* grep: minor kwset cleanups [Paul Eggert, 2020-10-11, 3 files, -36/+20]
  * src/kwsearch.c (Fexecute): Assume C99 to put declarations nearer uses.
  * src/kwset.c (bmexec): Omit unnecessary test.
  * src/kwset.h (struct kwsmatch): Make OFFSET and SIZE individual elements, not arrays of size 1 (a revenant of an earlier API). All uses changed.
* grep: remove unused code [Norihiro Tanaka, 2020-10-11, 1 file, -47/+0]
  * src/kwsearch.c (Fcompile, Fexecute): Remove unused code. These are no longer used after commit 016e590a8198009bce0e1078f6d4c7e037e2df3c.
* grep: pacify Sun C 5.15 [Paul Eggert, 2020-09-23, 1 file, -1/+1]
  This suppresses a false alarm '"grep.c", line 720: warning: initializer will be sign-extended: -1'.
  * src/grep.c (uword_max): New static constant.
  (initialize_unibyte_mask): Use it.
* grep: fix more Turkish-eyes bugs [Paul Eggert, 2020-09-23, 2 files, -53/+86]
  Fix more bugs recently uncovered by Norihiro Tanaka (Bug#43577).
  * NEWS: Mention new bug report.
  * src/grep.c (ok_fold): New static var.
  (setup_ok_fold): New function.
  (fgrep_icase_charlen): Reject single-byte characters if they match some multibyte characters when ignoring case. This part of the patch is partly derived from <https://bugs.gnu.org/43577#14>, which means it is:
  Co-authored-by: Norihiro Tanaka <noritnk@kcn.ne.jp>
  (main): Call setup_ok_fold if ok_fold might be needed.
  * src/searchutils.c (kwsinit): With the grep.c changes, this code can now revert to classic 7th Edition Unix style; aborting would be wrong.
  * tests/turkish-eyes: Add tests for these bugs.
* grep: fix recently-introduced performance glitch [Paul Eggert, 2020-09-23, 1 file, -1/+0]
  * src/grep.c (main): Do not double-increment update_patterns. update_patterns increments n_patterns now; do not increment it again, as the incorrect count would hurt performance heuristics later.
* grep: avoid unnecessary regex compilation [Norihiro Tanaka, 2020-09-22, 5 files, -23/+28]
  Grep resorts to using the regex engine when the precision of either -o or --color is required, or when the pattern is not supported by our DFA engine (e.g., backref). Otherwise, grep would perform regex compilation solely to check the syntax. This change makes grep skip that compilation in the common case for which it is unnecessary. The compilation we are avoiding is quite costly, consuming O(N^2) RSS for N regular expressions.
  * src/dfasearch.c (GEAcompile): Add new argument, and avoid unneeded compilation of regex.
  * src/grep.c (compile_fp_t): Update prototype.
  (main): Update caller.
  * src/kwsearch.c (Fcompile): Update caller and add new argument.
  * src/pcresearch.c (Pcompile): Add new argument.
  * src/search.h (GEAcompile, Fcompile, Pcompile): Update prototype.
* * src/dfasearch.c (struct dfa_comp): Fix out-of-date comment. [Paul Eggert, 2020-09-18, 1 file, -1/+1]
* grep: "grep '\)'" reports an error again [Paul Eggert, 2020-09-18, 1 file, -0/+6]
  * src/grep.c (try_fgrep_pattern): With -G, pass \) through to GEAcompile so that it can complain. This fixes an unexpected change in behavior from grep 3.4 and earlier.
  * tests/filename-lineno.pl: Add tests for this sort of thing.
* grep: tweak by using mempcpy [Paul Eggert, 2020-09-18, 1 file, -4/+2]
  * src/grep.c (try_fgrep_pattern): Tweak previous change by using mempcpy.
* grep: make echo .|grep '\.' match once again [Jim Meyering, 2020-09-18, 1 file, -0/+3]
  The same applied for many other backslash-escaped bytes, not just metacharacters. The switch to rawmemchr in v3.4-almost-10-g9393b97 made some parts of the code require the usually-guaranteed newline sentinel at the end of each pattern. Before, some consumers used a (correct) pattern length and did not care that try_fgrep_pattern could transform a pattern (with sentinel) like "\\.\n" to "..\n", thus violating that assumption.
  * src/grep.c (try_fgrep_pattern): Preserve the invariant that each regexp is newline-terminated.
  * tests/backslash-dot: New file. Test for this.
  * tests/Makefile.am (TESTS): Add it.
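The invariant can be illustrated with a hypothetical in-place unescaper for a newline-terminated pattern. Dropping each backslash shortens the pattern, so the terminating '\n' must be written again at the new end. Otherwise a scan that stops only at '\n' would find the stale terminator beyond the new end. This is a sketch under those assumptions, not grep's try_fgrep_pattern.

```c
#include <stddef.h>

/* P holds a pattern of LEN bytes (LEN >= 1) whose last byte is the
   '\n' sentinel.  Remove the backslash from each backslash-escaped
   byte, in place, then restore the sentinel at the new end.  Return
   the new length, which still counts the '\n'.  */
static ptrdiff_t
unescape_pattern (char *p, ptrdiff_t len)
{
  ptrdiff_t body = len - 1;   /* bytes before the sentinel */
  ptrdiff_t out = 0;
  for (ptrdiff_t i = 0; i < body; i++)
    {
      if (p[i] == '\\' && i + 1 < body)
        i++;                  /* keep the escaped byte, drop the backslash */
      p[out++] = p[i];
    }
  p[out++] = '\n';            /* preserve the newline-terminator invariant */
  return out;
}
```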
* grep: be more consistent about diagnostic format [Paul Eggert, 2020-09-18, 1 file, -6/+3]
  * NEWS: Mention this.
  * bootstrap.conf (gnulib_modules): Remove 'quote'.
  * src/grep.c: Do not include quote.h.
  (grep, grepdirent, grepdesc): Put the three unusual diagnostics into the same "grep: FOO: message" form that grep uses elsewhere.
  * tests/binary-file-matches, tests/in-eq-out-infloop: Adjust tests to match new diagnostic format.
* maint: avoid syntax-check failure [Jim Meyering, 2020-09-17, 1 file, -1/+1]
  * src/grep.c (grep): Lower-case the "B" in "Binary file... matches" diagnostic that we now emit to stderr. This avoids the following when running "make syntax-check":
    maint.mk: found capitalized error message
    make: *** [maint.mk:469: sc_error_message_uppercase] Error 1
* Send "Binary file FOO matches" to stderr [Paul Eggert, 2020-09-17, 1 file, -6/+2]
  * NEWS, doc/grep.texi: Mention this change (Bug#29668).
  * src/grep.c (grep): Send "Binary file FOO matches" to stderr instead of stdout.
  * tests/encoding-error, tests/invalid-multibyte-infloop:
  * tests/null-byte, tests/pcre-count, tests/surrogate-pair:
  * tests/symlink, tests/unibyte-binary: Adjust tests to match new behavior. In all cases this simplifies the tests, which is a good sign.
* Suppress "Binary file FOO matches" if -I [Paul Eggert, 2020-09-17, 1 file, -2/+3]
  Problem reported by Jason Franklin (Bug#33552).
  * NEWS: Mention this.
  * src/grep.c (grep): Do not output "Binary file FOO matches" if -I.
  * tests/encoding-error: Add test for this bug.
* grep: fix logic for growing PCRE JIT stack [Paul Eggert, 2020-09-09, 1 file, -6/+8]
  * src/pcresearch.c (jit_exec) [PCRE_EXTRA_MATCH_LIMIT_RECURSION]: When growing the match_limit_recursion limit, do not use the old value if ! (flags & PCRE_EXTRA_MATCH_LIMIT_RECURSION), as it is uninitialized in that case.
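The bug pattern is general: when a flag bit signals that a struct field is valid, the field must not be read unless that bit is set. The sketch below uses a hypothetical struct loosely modeled on PCRE1's pcre_extra, which pairs a flags word with fields such as match_limit_recursion. The flag name, field names, and doubling policy are illustrative only.

```c
#include <limits.h>
#include <stdbool.h>

#define LIMIT_SET 0x1ul   /* hypothetical flag: LIMIT is valid */

struct extra
{
  unsigned long flags;
  unsigned long limit;    /* meaningful only if flags & LIMIT_SET */
};

/* Grow E's limit.  Double it if it was already set; otherwise start
   from INITIAL rather than reading the uninitialized field.  Return
   false if doubling would overflow.  */
static bool
grow_limit (struct extra *e, unsigned long initial)
{
  if (! (e->flags & LIMIT_SET))
    {
      e->limit = initial;
      e->flags |= LIMIT_SET;
      return true;
    }
  if (ULONG_MAX / 2 < e->limit)
    return false;
  e->limit *= 2;
  return true;
}
```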