path: root/src
Columns: Commit message, Author, Age, Files, Lines
* pcre: work around a PCRE2_MATCH_INVALID_UTF bug
  Carlo Marcelo Arenas Belón, 2023-04-30 (1 file, -12/+17)
  PCRE2 has a bug when using PCRE2_MATCH_INVALID_UTF: it would sometimes fail to match patterns using negative classes like \W and \D.
  * NEWS (Bug fixes): Mention it.
  * src/pcresearch.c: Restrict impact of the bug. Do not use the problematic flag with broken versions of PCRE2. Also, generate locale tables only for single-byte locales, as the PCRE2 documentation recommends this.
  * tests/Makefile.am (TESTS): Add the file name.
  * tests/pcre-utf8-bug224: New file, to test for this.
* grep: make -P survive JIT compilation failure
  Paul Eggert, 2023-04-13 (1 file, -3/+3)
  * src/pcresearch.c (Pcompile): Ignore failure returns from pcre2_jit_compile.
* grep: improve PCRE2 version output
  Paul Eggert, 2023-04-10 (3 files, -9/+11)
  * src/grep.c: No need to include pcre2.h.
  (main) [HAVE_LIBPCRE]: Call Pprint_version instead of doing it ourselves.
  * src/pcresearch.c (Pprint_version): New function. It also checks belatedly for buffer overflow, and says "grep -P uses PCRE2" instead of "Built with PCRE".
  * tests/version-pcre: Adjust test to match.
* grep: --version: print pcre version info
  Jim Meyering, 2023-04-09 (1 file, -0/+11)
  PCRE is integral to the functioning of grep's -P option, so it is in our interest to make it easy to see which version of PCRE grep uses.
  * src/grep.c [HAVE_LIBPCRE]: Include <pcre2.h>.
  [HAVE_LIBPCRE] (main): Print pcre version info.
  * tests/version-pcre: New test for this.
  * tests/Makefile.am (TESTS): Add the file name.
  * NEWS (Changes in behavior): Mention it.
* grep: fix -P [\d] by fixing \w only if PCRE2 10.43
  Paul Eggert, 2023-04-02 (1 file, -86/+11)
  Our prepass-based fixes for the -P \d bug have caused repeated further bugs. Avoid the need for a prepass by using PCRE2_UCP only if PCRE2_EXTRA_ASCII_BSD is also supported. Since the -P \w bug was present from grep 2.5 through 3.8, it’s OK if we wait a little longer to fix it.
  * NEWS: Mention this.
  * src/pcresearch.c (pcre_pattern_expand_backslash_d): Remove. Remove its use.
  (Pcompile): Use PCRE2_UCP only if PCRE2_EXTRA_ASCII_BSD.
  * tests/pcre-ascii-digits, tests/pcre-utf8-w: Skip tests on older PCRE2 implementations.
* grep: -P (--perl-regexp) \D once again works like [^0-9]
  Jim Meyering, 2023-03-19 (1 file, -3/+11)
  * NEWS: Mention \D, too.
  * doc/grep.texi: Likewise.
  * src/pcresearch.c (pcre_pattern_expand_backslash_d): Handle \D. Also, ifdef-out this new function and its call site when not needed.
  * tests/pcre-ascii-digits: Test \D, too. Tighten one test by using returns_ 1. Add comments and tests that work only with 10.43 and newer.
  Paul Eggert raised the issue of \D in https://bugs.gnu.org/62267#8
* grep: forward port to PCRE2 10.43
  Paul Eggert, 2023-03-19 (2 files, -84/+88)
  * doc/grep.texi: Document this.
  * src/grep.c: Move recent changes into pcresearch.c.
  (P_MATCHER_INDEX): Remove.
  (pcre_pattern_expand_backslash_d): Move from here ...
  * src/pcresearch.c: ... to here.
  (PCRE2_EXTRA_ASCII_BSD): Default to 0.
  (Pcompile): Use PCRE2_EXTRA_ASCII_BSD if available, and expand \d to [0-9] otherwise.
* grep: -P (--perl-regexp) \d: match only ASCII digits
  Jim Meyering, 2023-03-18 (1 file, -1/+81)
  Prior to grep-3.9, the PCRE matcher had always treated \d just like [0-9]. grep-3.9's fix for \w and \b mistakenly relaxed \d to also match multibyte digits.
  * src/grep.c (P_MATCHER_INDEX): Define enum.
  (pcre_pattern_expand_backslash_d): New function.
  (main): Call it for -P.
  * NEWS (Bug fixes): Mention it.
  * doc/grep.texi: Document it: with -P, \d matches only ASCII digits. Provide a PCRE documentation URL and an example of how to use (?s) with -z.
  * tests/pcre-ascii-digits: New test.
  * tests/Makefile.am (TESTS): Add that file name.
  Reported as https://bugs.gnu.org/62267
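To show roughly what a \d-expanding prepass involves, here is a hypothetical, much-simplified sketch in C. It is not grep's pcre_pattern_expand_backslash_d. The name expand_backslash_d is invented, the sketch rewrites \d and \D only outside bracket expressions, and it ignores \Q...\E quoting, (?x) comments, and a literal ']' at the start of a bracket. Corner cases like these are what made the prepass approach fragile.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not grep's code: rewrite \d as [0-9] and \D as
   [^0-9] outside bracket expressions.  Other escapes, and anything inside
   [...], are copied unchanged.  The caller frees the result.  */
static char *
expand_backslash_d (char const *pat)
{
  size_t len = strlen (pat);
  /* Worst case: each 2-byte "\D" grows to the 6-byte "[^0-9]".  */
  char *out = malloc (3 * len + 1);
  if (!out)
    return NULL;
  char *o = out;
  int in_bracket = 0;
  for (char const *p = pat; *p; p++)
    {
      if (*p == '\\' && p[1])
        {
          if (!in_bracket && (p[1] == 'd' || p[1] == 'D'))
            {
              char const *rep = p[1] == 'd' ? "[0-9]" : "[^0-9]";
              size_t n = strlen (rep);
              memcpy (o, rep, n);
              o += n;
            }
          else
            {
              /* Copy the escape pair verbatim, e.g. "\\" or "\w".  */
              *o++ = p[0];
              *o++ = p[1];
            }
          p++;  /* Skip the escaped character.  */
        }
      else
        {
          /* Simplified bracket tracking: no support for a leading ']'.  */
          if (*p == '[' && !in_bracket)
            in_bracket = 1;
          else if (*p == ']' && in_bracket)
            in_bracket = 0;
          *o++ = *p;
        }
    }
  *o = '\0';
  return out;
}
```

With this sketch, "a\\d+" becomes "a[0-9]+", while "[\\d]" comes back unchanged. A real prepass would still have to decide how to handle \d inside brackets.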
* maint: prefer https: to git:
  Jim Meyering, 2023-02-04 (1 file, -1/+1)
  The idea is to defend against some adversary-in-the-middle attacks. Also prefer git.savannah.gnu.org over its shorter alias, git.sv.gnu.org, to avoid a warning, e.g., from git clone. Also, drop any final ".git" suffix on the resulting URIs. Inspired by Paul Eggert's nearly identical changes to coreutils. Induced by running these commands:
    git grep -l 'git clone git:'|xargs perl -pi -e \
      's{(git clone) git://(\S+)/([^/]+)\b}{$1 https://$2/git/$3}'
    git grep -l git.sv.gn \
      |xargs perl -pi -e 's{git\.sv\.gnu}{git\.savannah\.gnu}'
    perl -pi -e \
      's{(url =) git://(\S+)/([^/.]+)(\.git)?\b}{$1 https://$2/git/$3}'\
      .gitmodules
  * .gitmodules: As above.
  * HACKING: Likewise.
  * README-hacking: Likewise.
  * src/grep.c (main): Likewise.
* maint: stop including getprogname.h
  Paul Eggert, 2023-01-21 (1 file, -1/+0)
  It’s obsolete in bleeding-edge Gnulib.
  * src/grep.c, tests/get-mb-cur-max.c: Don’t include getprogname.h. Instead, rely on stdlib.h to declare getprogname.
* grep: fix rawmemrchr etc. comments
  Paul Eggert, 2023-01-15 (1 file, -2/+2)
  * src/grep.c: Fix comments.
* grep: diagnose no UTF-8 support (Bug#60708)
  Paul Eggert, 2023-01-12 (1 file, -3/+5)
  * src/pcresearch.c (Pcompile): Issue a diagnostic and exit instead of misbehaving if libpcre2 does not support the requested locale.
* pcre: use UTF only when available in the library
  Carlo Marcelo Arenas Belón, 2023-01-11 (1 file, -1/+3)
  Before this change, if grep was linked with a PCRE library built without Unicode support, any invocation of grep in a UTF locale would fail with:
    grep: this version of PCRE2 does not have Unicode support
  * src/pcresearch.c: Check whether Unicode was compiled in.
  * tests/pcre-utf8-w: Add check to skip test.
  * tests/pcre-utf8: Update check.
* pcre: use UCP in UTF mode
  Carlo Marcelo Arenas Belón, 2023-01-07 (1 file, -1/+1)
  This fixes a serious bug affecting word-boundary and word-constituent regular expressions when the desired match involves non-ASCII UTF-8 characters.
  * src/pcresearch.c: Set PCRE2_UCP together with PCRE2_UTF.
  * tests/pcre-utf8-w: New file.
  * tests/Makefile.am (TESTS): Add it.
  * NEWS (Bug fixes): Mention this.
  * THANKS.in: Add Gro-Tsen and Karl Pettersson.
  Reported by Gro-Tsen https://twitter.com/gro_tsen/status/1610972356972875777 via Karl Pettersson in https://github.com/PCRE2Project/pcre2/issues/185
  This bug was present from grep-2.5, when --perl-regexp (-P) support was added.
* maint: update copyright dates
  Jim Meyering, 2023-01-01 (12 files, -12/+12)
* maint: src/dfasearch.c: remove unnecessary re_set_syntax call
  Jim Meyering, 2022-12-10 (1 file, -2/+0)
  * src/dfasearch.c (GEAcompile): Don't call "re_set_syntax (syntax_bits)" just before regex_compile; that function does the same thing already.
* grep: bug: backref in last of multiple patterns
  Paul Eggert, 2022-12-05 (1 file, -13/+12)
  * NEWS: Mention this.
  * src/dfasearch.c (GEAcompile): Trim trailing newline from the last pattern, even if it has back-references and follows a pattern that lacks back-references.
  * tests/backref: Add test for this bug.
* maint: prefer stdckdint.h to intprops.h
  Paul Eggert, 2022-10-11 (3 files, -6/+9)
  Prefer the standard C23 ckd_* macros to Gnulib’s *_WRAPV macros.
  * bootstrap.conf (gnulib_modules): Add stdckdint.
  * src/grep.c, src/kwset.c, src/pcresearch.c: Include stdckdint.h, and prefer ckd_* to *_WRAPV. Include intprops.h only if needed.
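The ckd_* macros report overflow instead of silently wrapping. Here is a minimal sketch of that style. It assumes a compiler that either ships C23 <stdckdint.h> or provides the GCC/Clang __builtin_*_overflow functions used as a fallback. buffer_size is a hypothetical helper, not code from grep.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Use C23 <stdckdint.h> when available; otherwise fall back to the
   GCC/Clang builtins, which have the same "true means overflow" result.  */
#if defined __has_include
# if __has_include(<stdckdint.h>)
#  include <stdckdint.h>
# endif
#endif
#ifndef ckd_mul
# define ckd_add(r, a, b) __builtin_add_overflow (a, b, r)
# define ckd_mul(r, a, b) __builtin_mul_overflow (a, b, r)
#endif

/* Hypothetical helper: store COUNT * SIZE + HEADER in *RESULT and return
   true, or return false if any step would overflow size_t.  */
static bool
buffer_size (size_t count, size_t size, size_t header, size_t *result)
{
  size_t bytes;
  if (ckd_mul (&bytes, count, size))
    return false;
  if (ckd_add (&bytes, bytes, header))
    return false;
  *result = bytes;
  return true;
}
```

Each ckd_* call takes a pointer to the result, and its return value says whether the exact mathematical result fit. The caller can therefore diagnose overflow instead of computing a wrapped size.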
* maint: add missing include
  Paul Eggert, 2022-10-11 (1 file, -0/+2)
  * src/pcresearch.c: Include intprops.h.
* maint: prefer C23 style for static_assert
  Paul Eggert, 2022-10-11 (1 file, -1/+1)
  * bootstrap.conf (gnulib_modules): Add assert-h, for static_assert.
  * src/dfasearch.c (regex_compile): Prefer static_assert to verify.
* Assume C23-like bool
  Paul Eggert, 2022-09-10 (3 files, -3/+0)
  Gnulib’s stdbool module now provides C23-like semantics, so there’s no longer any need to include stdbool.h.
  * src/die.h, src/grep.h, src/kwset.h: Don’t include stdbool.h.
* build: add parentheses to placate clang-14
  Jim Meyering, 2022-06-29 (1 file, -1/+1)
  * src/dfasearch.c (regex_compile): Parenthesize to avoid this warning:
    dfasearch.c:154:43: error: operator '?:' has lower precedence than '|'; '|' will be evaluated first [-Werror,-Wbitwise-conditional-parentheses]
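The clang warning guards against a classic C precedence trap. The flags below are hypothetical and are not the actual dfasearch.c expression, which this log does not show. They illustrate why the parentheses matter: in an unparenthesized `base | cond ? A : B`, the whole OR becomes the condition.

```c
enum { BASE_FLAGS = 0x10, FLAG_ICASE = 0x1, FLAG_NONE = 0x0 };

/* How "BASE_FLAGS | icase ? FLAG_ICASE : FLAG_NONE" actually parses:
   '|' binds tighter than '?:'.  The parentheses are written out here so
   the compiler does not warn.  Since BASE_FLAGS is nonzero, this always
   yields FLAG_ICASE and drops BASE_FLAGS entirely.  */
static int
as_parsed (int icase)
{
  return (BASE_FLAGS | icase) ? FLAG_ICASE : FLAG_NONE;
}

/* What was presumably intended: OR the conditional flag into the base.  */
static int
as_intended (int icase)
{
  return BASE_FLAGS | (icase ? FLAG_ICASE : FLAG_NONE);
}
```

Adding the parentheses both silences -Wbitwise-conditional-parentheses and makes the intended grouping explicit to readers.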
* grep: fix regex compilation memory leaks
  Paul Eggert, 2022-06-24 (1 file, -8/+16)
  Problem reported by Jim Meyering in: https://lists.gnu.org/r/grep-devel/2022-06/msg00012.html
  * src/dfasearch.c (regex_compile): Fix memory leaks when SYNTAX_ONLY.
* grep: don’t diagnose "grep '\-c'"
  Paul Eggert, 2022-06-06 (1 file, -1/+9)
  * src/grep.c (main): Skip past leading backslash of a pattern that begins with "\-". Inspired by a remark by Bruno Haible in: https://lists.gnu.org/r/bug-gnulib/2022-06/msg00022.html
* grep: sanity-check GREP_COLOR
  Paul Eggert, 2022-05-31 (1 file, -1/+6)
  This patch closes a longstanding security issue with GREP_COLOR that I just noticed, where if the attacker has control over GREP_COLOR's settings the attacker can trash the victim's terminal or have 'grep' generate misleading output. For example, without the patch the shell command:
    GREP_COLOR="$(printf '31m\33[2J\33[31')" grep --color=always PATTERN
  mucks with the screen, leaving behind only the trailing part of the last matching line. With the patch, this GREP_COLOR is ignored.
  * src/grep.c (main): Sanity-check GREP_COLOR contents the same way GREP_COLORS values are checked, to not trash the user's terminal. This follows up the recent fix to Bug#55641.
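The fix checks GREP_COLOR the same way GREP_COLORS values are checked, and this log does not spell out that check. As a hedged illustration, the sketch below accepts only a nonempty run of digits and semicolons, such as "01;31", which is the shape of an SGR parameter list. sgr_value_is_safe is a hypothetical name. Rejecting every other byte keeps ESC and 'm' from smuggling extra terminal control sequences into grep's output.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: true iff VAL is a nonempty string made only of
   ASCII digits and ';', e.g. "01;31".  Anything else (ESC, 'm', '[')
   could inject arbitrary terminal control sequences.  */
static bool
sgr_value_is_safe (char const *val)
{
  if (!val || !*val)
    return false;
  for (char const *p = val; *p; p++)
    if (! (*p == ';' || ('0' <= *p && *p <= '9')))
      return false;
  return true;
}
```

Under this rule the attack string from the commit message, "31m\33[2J\33[31", is rejected at its first 'm', so that GREP_COLOR would simply be ignored.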
* grep: deprecate GREP_COLOR
  Paul Eggert, 2022-05-29 (1 file, -2/+7)
  This is to avoid confusion such as that reported by Cholden in: https://bugs.gnu.org/55641
  * src/grep.c (main): Warn if GREP_COLOR has an effect.
* grep: warn about ‘(+x)’ etc.
  Paul Eggert, 2022-05-24 (1 file, -0/+2)
  These expressions are not portable and don’t always work as expected, so warn about them. For example, “grep -E '(+)'” doesn’t act like “grep '\(\+\)'”.
  * src/dfasearch.c (GEAcompile): Warn about a repetition op at the start of a regular expression or subexpression, except for ‘*’ in BREs which is portable.
* grep: warn about stray backslashes
  Paul Eggert, 2022-05-23 (1 file, -3/+4)
  This papers over a problem reported by Benno Schulenberg and Tomasz Dziendzielski <https://bugs.gnu.org/39678> involving regular expressions like \a that have unspecified behavior.
  * src/dfasearch.c (dfawarn): Just output a warning. Don’t exit, as DFA_CONFUSING_BRACKETS_ERROR now does that for us, and we need the ability to warn without exiting to diagnose \a etc.
  (GEAcompile): Use new dfa options DFA_CONFUSING_BRACKETS_ERROR and DFA_STRAY_BACKSLASH_WARN.
* grep: assume POSIX.1-2017 for [:space:]
  Paul Eggert, 2022-05-21 (1 file, -6/+2)
  * src/dfasearch.c (dfawarn): Always call dfaerror now, regardless of POSIXLY_CORRECT.
  * tests/warn-char-classes: Omit test of POSIX.1-2008 behavior, since POSIX.1-2017 allows the GNU behavior.
* grep: Remove recent PCRE2 bug workarounds
  Paul Eggert, 2022-03-22 (1 file, -7/+0)
  * src/pcresearch.c (Pcompile): Remove recent workaround for PCRE2 bugs; apparently it’s not needed. This reverts back to where things were before today. Suggested by Carlo Arenas in: https://lists.gnu.org/r/grep-devel/2022-03/msg00006.html
* grep: work around another potential PCRE2 bug
  Paul Eggert, 2022-03-22 (1 file, -6/+7)
  Potential problem reported by René Scharfe in: https://lore.kernel.org/git/99b0adb6-26ba-293c-3a8f-679f59e7cb4d@web.de/T
  * src/pcresearch.c (Pcompile): Mimic git grep’s workarounds for PCRE2 bugs more closely; this is more conservative.
* grep: work around PCRE2 bug 2642
  Paul Eggert, 2022-03-22 (1 file, -0/+6)
  Problem reported by Carlo Arenas in: https://lists.gnu.org/r/grep-devel/2022-03/msg00004.html
  * src/pcresearch.c (Pcompile) [PCRE2_MATCH_INVALID_UTF]: In PCRE2 10.35 and earlier, disable start optimization if doing a caseless UTF-8 search.
* grep: very long lines no longer evoke unwarranted "memory exhausted"
  Jim Meyering, 2022-03-20 (1 file, -1/+1)
  When calling xpalloc (NULL, &n, incr_min, alloc_max, 1) with nontrivial ALLOC_MAX, this must hold: N + INCR_MIN <= ALLOC_MAX. With a very long line, it did not, and grep would mistakenly fail with a report of "memory exhausted".
  * src/grep.c (fillbuf): When using nontrivial ALLOC_MAX, ensure it is at least N+INCR_MIN.
  * tests/fillbuf-long-line: New file, to test for this.
  * tests/Makefile.am (TESTS): Add its name.
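The invariant above is easy to state as a standalone adjustment. Here is a minimal sketch that raises ALLOC_MAX until N + INCR_MIN <= ALLOC_MAX holds before an xpalloc-style call. adjust_alloc_max is a hypothetical name, not grep's fillbuf code, and the sketch assumes N and INCR_MIN are nonnegative, as sizes are.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return an ALLOC_MAX satisfying
   N + INCR_MIN <= ALLOC_MAX, raising ALLOC_MAX only when necessary.
   Returns -1 if N + INCR_MIN itself would overflow ptrdiff_t.
   Assumes N >= 0 and INCR_MIN >= 0.  */
static ptrdiff_t
adjust_alloc_max (ptrdiff_t n, ptrdiff_t incr_min, ptrdiff_t alloc_max)
{
  if (incr_min > PTRDIFF_MAX - n)
    return -1;
  ptrdiff_t needed = n + incr_min;
  return alloc_max < needed ? needed : alloc_max;
}
```

A cap smaller than the current size plus the minimum increment could never be honored. Raising it to exactly N + INCR_MIN lets the allocation proceed, while a separate overflow check still catches sizes that truly cannot be represented.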
* grep: Remove comment
  Ulrich Eckhardt, 2022-02-15 (1 file, -1/+0)
  The comment was introduced in 500f07fee50ab16a70fe2946b85318020c7f4017 and relates to absent cleanup code at the end of main(), not the code following it. It relates to fallible flushing of stdout and related error handling, but even then it doesn't explain much.
  Copyright-paperwork-exempt: yes
* maint: make update-copyright
  Jim Meyering, 2022-01-01 (12 files, -12/+12)
* grep: -s does not suppress “binary file matches”
  Paul Eggert, 2021-11-20 (1 file, -1/+1)
  * src/grep.c (grep): Implement this.
  * tests/binary-file-matches: Add regression test.
* grep: port to PCRE2 10.20
  Paul Eggert, 2021-11-14 (1 file, -1/+4)
  * src/pcresearch.c (PCRE2_SIZE_MAX): Default to SIZE_MAX.
* grep: fix minor -P memory leak
  Paul Eggert, 2021-11-14 (1 file, -0/+1)
  * src/pcresearch.c (Pcompile): Free ccontext when no longer needed.
* grep: use ximalloc, not xcalloc
  Paul Eggert, 2021-11-14 (1 file, -1/+3)
  * src/pcresearch.c (Pcompile): Use ximalloc, not xcalloc, and explicitly initialize the two slots that should be null. This is more likely to catch future errors if we use valgrind.
* grep: improve memory exhaustion checking with -P
  Paul Eggert, 2021-11-14 (1 file, -19/+31)
  * src/pcresearch.c (struct pcre_comp): New member gcontext.
  (private_malloc, private_free): New functions.
  (jit_exec): It is OK to call pcre2_jit_stack_free (NULL), so simplify. Use gcontext for allocation. Check for pcre2_jit_stack_create failure, since sljit bypasses private_malloc. Redo to avoid two ‘continue’s.
  (Pcompile): Create and use gcontext.
* grep: simplify JIT setup
  Paul Eggert, 2021-11-14 (1 file, -5/+3)
  * src/pcresearch.c (Pcompile): Simplify since ‘die’ cannot return.
* grep: use PCRE2_EXTRA_MATCH_LINE
  Paul Eggert, 2021-11-14 (1 file, -24/+30)
  * src/pcresearch.c (Pcompile): If available, use PCRE2_EXTRA_MATCH_LINE instead of doing it by hand. Simplify construction of substitute regular expression.
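PCRE2_EXTRA_MATCH_LINE, which is set through pcre2_set_compile_extra_options, asks PCRE2 itself to anchor a match to a whole line. This log does not show the by-hand alternative that the commit retires. The sketch below shows one typical approach: wrap the pattern in a non-capturing group between ^ and $. wrap_for_line_match is a hypothetical name, and this should not be read as grep's exact former code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of matching whole lines "by hand": wrap PATTERN as
   ^(?:PATTERN)$.  The non-capturing group keeps an alternation such as
   "a|b" from escaping the anchors.  The caller frees the result.  */
static char *
wrap_for_line_match (char const *pattern)
{
  static char const prefix[] = "^(?:";
  static char const suffix[] = ")$";
  size_t plen = strlen (pattern);
  char *re = malloc (sizeof prefix - 1 + plen + sizeof suffix);
  if (!re)
    return NULL;
  char *p = re;
  memcpy (p, prefix, sizeof prefix - 1);
  p += sizeof prefix - 1;
  memcpy (p, pattern, plen);
  p += plen;
  memcpy (p, suffix, sizeof suffix);  /* Copies the terminating NUL too.  */
  return re;
}
```

Without the group, "^a|b$" would match any line that starts with "a" or ends with "b", which is why the wrapper needs "(?:" and ")". Letting PCRE2 do the anchoring avoids building a substitute pattern at all.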
* grep: prefer signed integers
  Paul Eggert, 2021-11-14 (1 file, -13/+11)
  * src/pcresearch.c (struct pcre_comp, jit_exec, Pexecute): Prefer signed to unsigned types when either will do.
  (jit_exec): Use INT_MULTIPLY_WRAPV instead of doing it by hand.
  (Pexecute): Omit line length limit test that is no longer needed with PCRE2.
* grep: speed up, fix bad-UTF8 check with -P
  Paul Eggert, 2021-11-14 (1 file, -2/+14)
  * src/pcresearch.c (bad_utf8_from_pcre2): New function. Fix bug where PCRE2_ERROR_UTF8_ERR1 was not treated as an encoding error. Improve performance when PCRE2_MATCH_INVALID_UTF is defined.
  (Pexecute): Use it.
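PCRE2's UTF-8 validity errors form a contiguous block of negative error codes, so an inclusive range test classifies them. The sketch below uses local stand-in constants that assume the pcre2.h values PCRE2_ERROR_UTF8_ERR21 = -23 and PCRE2_ERROR_UTF8_ERR1 = -3. Check your own pcre2.h before relying on them. is_bad_utf8_error is a hypothetical name, not grep's bad_utf8_from_pcre2.

```c
#include <stdbool.h>

/* Local stand-ins, assumed to mirror pcre2.h:
   PCRE2_ERROR_UTF8_ERR21 == -23 and PCRE2_ERROR_UTF8_ERR1 == -3.  */
enum { UTF8_ERR21 = -23, UTF8_ERR1 = -3 };

/* Hypothetical check: true iff E is one of the UTF-8 encoding errors.
   Both ends are inclusive, so that ERR1 itself counts as an encoding
   error (the commit message says it previously did not).  */
static bool
is_bad_utf8_error (int e)
{
  return UTF8_ERR21 <= e && e <= UTF8_ERR1;
}
```

With the range written this way, the ordinary "no match" (-1) and "partial match" (-2) results fall outside it and are not mistaken for encoding errors.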
* grep: improve pcre2_get_error_message comments
  Paul Eggert, 2021-11-14 (1 file, -2/+3)
  * src/pcresearch.c (Pcompile): Improve comments re pcre2_get_error_message buffer.
* grep: Don’t limit jitstack_max to INT_MAX
  Paul Eggert, 2021-11-14 (1 file, -1/+7)
  * src/pcresearch.c (jit_exec): Remove arbitrary INT_MAX limit on JIT stack size.
* maint: minor rewording and reindenting
  Paul Eggert, 2021-11-14 (1 file, -22/+22)
* grep: migrate to pcre2
  Carlo Marcelo Arenas Belón, 2021-11-14 (1 file, -129/+120)
  Mostly a bug-for-bug translation of the original code to the PCRE2 API. The code could still do with some optimizations but should be good as a starting point.
  The API changes the sign of some types, so some ugly casts were needed; some of the changes just make sure all variables fit better into the newer types.
  Includes backward compatibility and could be made to build all the way back to 10.00, but assumes a recent enough version and has been tested with 10.23 (from CentOS 7, the oldest).
  Performance seems equivalent, and it also seems functionally complete.
  * m4/pcre.m4 (gl_FUNC_PCRE): Check for PCRE2, not the original PCRE.
  * src/pcresearch.c (struct pcre_comp, jit_exec)
  (Pcompile, Pexecute): Use PCRE2, not the original PCRE.
  * tests/filename-lineno.pl: Adjust to match PCRE2 diagnostics.
* grep: work around PCRE bug
  Paul Eggert, 2021-11-09 (1 file, -1/+4)
  Problem reported by Carlo Marcelo Arenas Belón (Bug#51710).
  * src/pcresearch.c (jit_exec): Don’t attempt to grow the JIT stack over INT_MAX - 8 * 1024.
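This log gives only the limit, not the code. The following is a hedged sketch of a capped growth rule in that spirit: double the JIT stack after a stack-limit failure, but never beyond INT_MAX - 8 * 1024. next_jit_stack_size is hypothetical, and grep's actual jit_exec loop may differ in detail.

```c
#include <limits.h>

/* Hypothetical growth rule: given the current positive JIT stack size
   CUR, return the next size to try, doubling but never exceeding
   INT_MAX - 8 * 1024.  Return 0 once the cap is reached, meaning the
   caller should stop retrying.  */
static int
next_jit_stack_size (int cur)
{
  int const cap = INT_MAX - 8 * 1024;
  if (cur >= cap)
    return 0;
  /* Compare against cap / 2 so that 2 * cur cannot overflow int.  */
  return cur <= cap / 2 ? 2 * cur : cap;
}
```

Checking against cap / 2 before doubling avoids signed overflow, which is undefined behavior in C. The final step clamps to the cap instead of skipping past it.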
* build: update gnulib submodule to latest
  Paul Eggert, 2021-08-27 (1 file, -2/+2)
  * src/system.h: Update decls to match current Gnulib.