path: root/src/pcresearch.c
Commit message | Author | Age | Files | Lines
* pcre: work around a PCRE2_MATCH_INVALID_UTF bug | Carlo Marcelo Arenas Belón | 2023-04-30 | 1 | -12/+17
  PCRE2 has a bug when using PCRE2_MATCH_INVALID_UTF: it would sometimes fail to match patterns using negative classes like \W and \D.
  * NEWS (Bug fixes): Mention it.
  * src/pcre2search.c: Restrict impact of the bug. Do not use the problematic flag with broken versions of PCRE2. Also, generate locale tables only for single-byte locales, as the PCRE2 documentation recommends this.
  * tests/Makefile.am (TESTS): Add the file name
  * tests/pcre-utf8-bug224: New file, to test for this.
* grep: make -P survive JIT compilation failure | Paul Eggert | 2023-04-13 | 1 | -3/+3
  * src/pcresearch.c (Pcompile): Ignore failure returns from pcre2_jit_compile.
* grep: improve PCRE2 version output | Paul Eggert | 2023-04-10 | 1 | -0/+9
  * src/grep.c: No need to include pcre2.h. (main) [HAVE_LIBPCRE]: Call Pprint_version instead of doing it ourselves.
  * src/pcresearch.c (Pprint_version): New function. It also checks belatedly for buffer overflow, and says "grep -P uses PCRE2" instead of "Built with PCRE".
  * tests/version-pcre: Adjust test to match.
* grep: fix -P [\d] by fixing \w only if PCRE2 10.43 | Paul Eggert | 2023-04-02 | 1 | -86/+11
  Our prepass-based fixes for the -P \d bug have caused repeated further bugs. Avoid the need for a prepass, by using PCRE2_UCP only if PCRE2_EXTRA_ASCII_BSD is also supported. Since the -P \w bug was present from grep 2.5 through 3.8 it’s OK if we wait a little longer to fix it.
  * NEWS: Mention this.
  * src/pcresearch.c (pcre_pattern_expand_backslash_d): Remove. Remove its use. (Pcompile): Use PCRE2_UCP only if PCRE2_EXTRA_ASCII_BSD.
  * tests/pcre-ascii-digits, tests/pcre-utf8-w: Skip tests on older PCRE2 implementations.
* grep: -P (--perl-regexp) \D once again works like [^0-9] | Jim Meyering | 2023-03-19 | 1 | -3/+11
  * NEWS: Mention \D, too.
  * doc/grep.texi: Likewise
  * src/pcresearch.c (pcre_pattern_expand_backslash_d): Handle \D. Also, ifdef-out this new function and its call site when not needed.
  * tests/pcre-ascii-digits: Test \D, too. Tighten one test by using returns_ 1. Add comments and tests that work only with 10.43 and newer.
  Paul Eggert raised the issue of \D in https://bugs.gnu.org/62267#8
* grep: forward port to PCRE2 10.43 | Paul Eggert | 2023-03-19 | 1 | -3/+87
  * doc/grep.texi: Document this.
  * src/grep.c: Move recent changes into pcresearch.c. (P_MATCHER_INDEX): Remove. (pcre_pattern_expand_backslash_d): Move from here ...
  * src/pcresearch.c: ... to here. (PCRE2_EXTRA_ASCII_BSD): Default to 0. (Pcompile): Use PCRE2_EXTRA_ASCII_BSD if available, and expand \d to [0-9] otherwise.
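A naive version of the \d → [0-9] fallback can be sketched as a textual prepass. This is a hypothetical illustration, not grep's pcre_pattern_expand_backslash_d. The name expand_backslash_d is invented here. The sketch also maps \D to [^0-9], as the later \D commit does, and it deliberately ignores bracket expressions and \Q...\E quoting. The last test shows one way such a prepass goes wrong: [\d] naively becomes [[0-9]], which is a different regular expression.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical naive prepass: rewrite \d as [0-9] and \D as [^0-9].
   Any other escape pair, including an escaped backslash, is copied
   verbatim, so the regex \\d stays a literal backslash then 'd'.
   Bracket expressions and \Q...\E quoting are NOT recognized.
   Returns a malloc'd string, or NULL if allocation fails.  */
char *
expand_backslash_d (char const *pat)
{
  size_t n = strlen (pat);
  /* Worst case: every 2-byte "\D" grows to the 6-byte "[^0-9]".  */
  char *out = malloc (3 * n + 1);
  if (!out)
    return NULL;
  char *p = out;
  for (size_t i = 0; i < n; )
    {
      if (pat[i] == '\\' && i + 1 < n)
        {
          char next = pat[i + 1];
          char const *rep = (next == 'd' ? "[0-9]"
                             : next == 'D' ? "[^0-9]" : NULL);
          if (rep)
            {
              size_t len = strlen (rep);
              memcpy (p, rep, len);
              p += len;
            }
          else
            {
              *p++ = pat[i];
              *p++ = next;
            }
          i += 2;
        }
      else
        *p++ = pat[i++];
    }
  *p = '\0';
  return out;
}
```

Tracking whether the scan is inside [...] would fix the last case, but it would also add the kind of special-casing that made the prepass a recurring source of bugs.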
* grep: diagnose no UTF-8 support (Bug#60708) | Paul Eggert | 2023-01-12 | 1 | -3/+5
  * src/pcresearch.c (Pcompile): Issue a diagnostic and exit instead of misbehaving if libpcre2 does not support the requested locale.
* pcre: use UTF only when available in the library | Carlo Marcelo Arenas Belón | 2023-01-11 | 1 | -1/+3
  Before this change, if linked with a PCRE library without unicode any invocations of grep when using a UTF locale will error with:
    grep: this version of PCRE2 does not have Unicode support
  * src/pcresearch.c: Check whether Unicode was compiled in.
  * tests/pcre-utf8-w: Add check to skip test.
  * tests/pcre-utf8: Update check.
* pcre: use UCP in UTF mode | Carlo Marcelo Arenas Belón | 2023-01-07 | 1 | -1/+1
  This fixes a serious bug affecting word-boundary and word-constituent regular expressions when the desired match involves non-ASCII UTF8 characters.
  * src/pcresearch.c: Set PCRE2_UCP together with PCRE2_UTF
  * tests/pcre-utf8-w: New file.
  * tests/Makefile.am (TESTS): Add it.
  * NEWS (Bug fixes): Mention this.
  * THANKS.in: Add Gro-Tsen and Karl Pettersson.
  Reported by Gro-Tsen https://twitter.com/gro_tsen/status/1610972356972875777 via Karl Pettersson in https://github.com/PCRE2Project/pcre2/issues/185
  This bug was present from grep-2.5, when --perl-regexp (-P) support was added.
* maint: update copyright dates | Jim Meyering | 2023-01-01 | 1 | -1/+1
* maint: prefer stdckdint.h to intprops.h | Paul Eggert | 2022-10-11 | 1 | -2/+3
  Prefer the standard C23 ckd_* macros to Gnulib’s *_WRAPV macros.
  * bootstrap.conf (gnulib_modules): Add stdckdint.
  * src/grep.c, src/kwset.c, src/pcresearch.c: Include stdckdint.h, and prefer ckd_* to *_WRAPV. Include intprops.h only if needed.
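The following is a minimal sketch of the checked-arithmetic style this commit adopts. ckd_mul (&r, a, b) stores a * b in r and returns true if the exact result did not fit. Gnulib's older INT_MULTIPLY_WRAPV (a, b, &r) has the same meaning, with the result pointer last. The sketch includes <stdckdint.h> when the compiler provides it and otherwise falls back to the GCC/Clang builtin __builtin_mul_overflow. That fallback is an assumption about the toolchain; grep itself gets a substitute from Gnulib's stdckdint module. checked_size_product is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Use C23 <stdckdint.h> when available; otherwise (assumption: GCC or
   Clang) define ckd_mul in terms of the builtin it wraps.  */
#if defined __has_include
# if __has_include (<stdckdint.h>)
#  include <stdckdint.h>
# endif
#endif
#ifndef ckd_mul
# define ckd_mul(r, a, b) __builtin_mul_overflow (a, b, r)
#endif

/* Hypothetical helper: store A * B in *RESULT and return true,
   or return false if the product does not fit in ptrdiff_t.  */
bool
checked_size_product (ptrdiff_t a, ptrdiff_t b, ptrdiff_t *result)
{
  return !ckd_mul (result, a, b);
}
```

For example, a buffer-growing routine would call checked_size_product (n, size, &bytes) and report memory exhaustion on false, rather than allocating a size that has silently wrapped around.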
* maint: add missing include | Paul Eggert | 2022-10-11 | 1 | -0/+2
  * src/pcresearch.c: Include intprops.h.
* grep: Remove recent PCRE2 bug workarounds | Paul Eggert | 2022-03-22 | 1 | -7/+0
  * src/pcresearch.c (Pcompile): Remove recent workaround for PCRE2 bugs; apparently it’s not needed. This reverts back to where things were before today.
  Suggested by Carlo Arenas in: https://lists.gnu.org/r/grep-devel/2022-03/msg00006.html
* grep: work around another potential PCRE2 bug | Paul Eggert | 2022-03-22 | 1 | -6/+7
  Potential problem reported by René Scharfe in: https://lore.kernel.org/git/99b0adb6-26ba-293c-3a8f-679f59e7cb4d@web.de/T
  * src/pcresearch.c (Pcompile): Mimic git grep’s workarounds for PCRE2 bugs more closely; this is more conservative.
* grep: work around PCRE2 bug 2642 | Paul Eggert | 2022-03-22 | 1 | -0/+6
  Problem reported by Carlo Arenas in: https://lists.gnu.org/r/grep-devel/2022-03/msg00004.html
  * src/pcresearch.c (Pcompile) [PCRE2_MATCH_INVALID_UTF]: In PCRE2 10.35 and earlier, disable start optimization if doing a caseless UTF-8 search.
* maint: make update-copyright | Jim Meyering | 2022-01-01 | 1 | -1/+1
* grep: port to PCRE2 10.20 | Paul Eggert | 2021-11-14 | 1 | -1/+4
  * src/pcresearch.c (PCRE2_SIZE_MAX): Default to SIZE_MAX.
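The fallback amounts to a guarded default, sketched below. It relies on PCRE2_SIZE being size_t. It presumably targets pcre2.h versions, such as 10.20, that do not define PCRE2_SIZE_MAX. In real code the fragment would follow #include <pcre2.h>; that include is omitted here so the fragment compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* In real code this follows #include <pcre2.h>.  PCRE2_SIZE is size_t,
   so SIZE_MAX is the matching default when the header lacks the macro.  */
#ifndef PCRE2_SIZE_MAX
# define PCRE2_SIZE_MAX SIZE_MAX
#endif
```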
* grep: fix minor -P memory leak | Paul Eggert | 2021-11-14 | 1 | -0/+1
  * src/pcresearch.c (Pcompile): Free ccontext when no longer needed.
* grep: use ximalloc, not xcalloc | Paul Eggert | 2021-11-14 | 1 | -1/+3
  * src/pcresearch.c (Pcompile): Use ximalloc, not xcalloc, and explicitly initialize the two slots that should be null. This is more likely to catch future errors if we use valgrind.
* grep: improve memory exhaustion checking with -P | Paul Eggert | 2021-11-14 | 1 | -19/+31
  * src/pcresearch.c (struct pcre_comp): New member gcontext. (private_malloc, private_free): New functions. (jit_exec): It is OK to call pcre2_jit_stack_free (NULL), so simplify. Use gcontext for allocation. Check for pcre2_jit_stack_create failure, since sljit bypasses private_malloc. Redo to avoid two ‘continue’s. (Pcompile): Create and use gcontext.
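The new private_malloc and private_free hooks are the kind of callbacks PCRE2 accepts through pcre2_general_context_create (malloc_fn, free_fn, memory_data). These hooks let the application decide what happens when PCRE2 runs out of memory. The sketch below shows callbacks with those signatures. Three details are assumptions of this sketch rather than grep's exact code: the die-on-failure policy, the message text, and the PTRDIFF_MAX bound. PCRE2_SIZE is written as size_t so the sketch builds without <pcre2.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Matches PCRE2's malloc hook type, void *(*) (PCRE2_SIZE, void *),
   with PCRE2_SIZE spelled as size_t.  Never returns NULL: it exits
   instead, so PCRE2 never sees an allocation failure.  */
void *
private_malloc (size_t size, void *memory_data)
{
  (void) memory_data;  /* unused: PCRE2 passes back what was registered */
  /* Refuse sizes too large for signed size arithmetic, and request at
     least 1 byte so that malloc (0) cannot return a NULL that would
     look like failure.  */
  void *p = size <= (size_t) PTRDIFF_MAX ? malloc (size ? size : 1) : NULL;
  if (!p)
    {
      fputs ("memory exhausted\n", stderr);
      exit (EXIT_FAILURE);
    }
  return p;
}

/* Matches PCRE2's free hook type, void (*) (void *, void *).  */
void
private_free (void *ptr, void *memory_data)
{
  (void) memory_data;
  free (ptr);
}

/* With PCRE2 available (not compiled here):
     pcre2_general_context *gcontext
       = pcre2_general_context_create (private_malloc, private_free, NULL);
   gcontext is then passed to the other PCRE2 *_create calls.  */
```

As the commit notes, sljit allocates JIT stacks itself rather than through these hooks. A NULL return from pcre2_jit_stack_create therefore still has to be checked separately.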
* grep: simplify JIT setup | Paul Eggert | 2021-11-14 | 1 | -5/+3
  * src/pcresearch.c (Pcompile): Simplify since ‘die’ cannot return.
* grep: use PCRE2_EXTRA_MATCH_LINE | Paul Eggert | 2021-11-14 | 1 | -24/+30
  * src/pcresearch.c (Pcompile): If available, use PCRE2_EXTRA_MATCH_LINE instead of doing it by hand. Simplify construction of substitute regular expression.
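Doing -x (and -w) by hand means compiling a substitute regular expression that wraps the user's pattern. The sketch below shows that construction. wrap_pattern is a hypothetical helper. The wrapper strings are this sketch's assumption about how such a wrapper is typically written, not a quotation of grep's source: ^(?:...)$ for whole lines and (?<!\w)(?:...)(?!\w) for whole words. PCRE2_EXTRA_MATCH_LINE asks PCRE2 to apply the line anchoring itself, so a hand-built line wrapper is needed only when that option is unavailable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the substitute regular expression that
   emulates -x (MATCH_LINES) or -w (MATCH_WORDS) by wrapping PATTERN.
   The non-capturing (?:...) keeps alternations such as "a|b" inside
   the anchors and leaves capture-group numbering unchanged.
   Returns a malloc'd string, or NULL if allocation fails.  */
char *
wrap_pattern (char const *pattern, bool match_lines, bool match_words)
{
  static char const xprefix[] = "^(?:", xsuffix[] = ")$";
  static char const wprefix[] = "(?<!\\w)(?:", wsuffix[] = ")(?!\\w)";
  assert (pattern);
  char const *prefix = match_lines ? xprefix : match_words ? wprefix : "";
  char const *suffix = match_lines ? xsuffix : match_words ? wsuffix : "";
  size_t plen = strlen (prefix), len = strlen (pattern), slen = strlen (suffix);
  char *result = malloc (plen + len + slen + 1);
  if (!result)
    return NULL;
  memcpy (result, prefix, plen);
  memcpy (result + plen, pattern, len);
  memcpy (result + plen + len, suffix, slen + 1);  /* includes the NUL */
  return result;
}
```

Without the grouping, wrapping "a|b" as ^a|b$ would anchor only one alternative at each end, which is why the group is needed.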
* grep: prefer signed integers | Paul Eggert | 2021-11-14 | 1 | -13/+11
  * src/pcresearch.c (struct pcre_comp, jit_exec, Pexecute): Prefer signed to unsigned types when either will do. (jit_exec): Use INT_MULTIPLY_WRAPV instead of doing it by hand. (Pexecute): Omit line length limit test that is no longer needed with PCRE2.
* grep: speed up, fix bad-UTF8 check with -P | Paul Eggert | 2021-11-14 | 1 | -2/+14
  * src/pcresearch.c (bad_utf8_from_pcre2): New function. Fix bug where PCRE2_ERROR_UTF8_ERR1 was not treated as an encoding error. Improve performance when PCRE2_MATCH_INVALID_UTF is defined. (Pexecute): Use it.
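PCRE2 reports UTF-8 encoding errors as a contiguous block of negative error codes, so a helper like bad_utf8_from_pcre2 reduces to a range test. The sketch below is one plausible shape of such a test, not grep's code. The numeric values are copied here as an assumption: -3 for PCRE2_ERROR_UTF8_ERR1 through -23 for PCRE2_ERROR_UTF8_ERR21. Real code should take them from <pcre2.h>. Because the codes are negative, ERR1 is the upper end of the range. An off-by-one at that end silently drops ERR1, which is one way to get the symptom the commit describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed copies of the pcre2.h values; include <pcre2.h> in real code.  */
enum
{
  SKETCH_PCRE2_ERROR_UTF8_ERR21 = -23,
  SKETCH_PCRE2_ERROR_UTF8_ERR1 = -3
};

/* Return true if E is one of PCRE2's UTF-8 encoding errors.  Both
   bounds are inclusive; ERR21 is the most negative code.  */
bool
bad_utf8_error (int e)
{
  return (SKETCH_PCRE2_ERROR_UTF8_ERR21 <= e
          && e <= SKETCH_PCRE2_ERROR_UTF8_ERR1);
}
```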
* grep: improve pcre2_get_error_message comments | Paul Eggert | 2021-11-14 | 1 | -2/+3
  * src/pcresearch.c (Pcompile): Improve comments re pcre2_get_error_message buffer.
* grep: Don’t limit jitstack_max to INT_MAX | Paul Eggert | 2021-11-14 | 1 | -1/+7
  * src/pcresearch.c (jit_exec): Remove arbitrary INT_MAX limit on JIT stack size.
* maint: minor rewording and reindenting | Paul Eggert | 2021-11-14 | 1 | -22/+22
* grep: migrate to pcre2 | Carlo Marcelo Arenas Belón | 2021-11-14 | 1 | -129/+120
  Mostly a bug by bug translation of the original code to the PCRE2 API. Code still could do with some optimizations but should be good as a starting point.
  The API changes the sign of some types and therefore some ugly casts were needed, some of the changes are just to make sure all variables fit into the newer types better.
  Includes backward compatibility and could be made to build all the way to 10.00, but assumes a recent enough version and has been tested with 10.23 (from CentOS 7, the oldest).
  Performance seems equivalent, and it also seems functionally complete.
  * m4/pcre.m4 (gl_FUNC_PCRE): Check for PCRE2, not the original PCRE.
  * src/pcresearch.c (struct pcre_comp, jit_exec) (Pcompile, Pexecute): Use PCRE2, not the original PCRE.
  * tests/filename-lineno.pl: Adjust to match PCRE2 diagnostics.
* grep: work around PCRE bug | Paul Eggert | 2021-11-09 | 1 | -1/+4
  Problem reported by Carlo Marcelo Arenas Belón (Bug#51710).
  * src/pcresearch.c (jit_exec): Don’t attempt to grow the JIT stack over INT_MAX - 8 * 1024.
* grep: prefer signed to unsigned integers | Paul Eggert | 2021-08-25 | 1 | -3/+3
  This improves runtime checking for integer overflow when compiling with gcc -fsanitize=undefined and the like. It also avoids the need for some integer casts, which can be error-prone.
  * bootstrap.conf (gnulib_modules): Add idx.
  * src/dfasearch.c (struct dfa_comp, kwsmusts): (possible_backrefs_in_pattern, regex_compile, GEAcompile) (EGexecute):
  * src/grep.c (struct patloc, patlocs_allocated, patlocs_used) (n_patterns, update_patterns, pattern_file_name, poison_len) (asan_poison, fwrite_errno, compile_fp_t, execute_fp_t) (buf_has_encoding_errors, buf_has_nulls, file_must_have_nulls) (bufalloc, pagesize, all_zeros, fillbuf, nlscan) (print_line_head, print_line_middle, print_line_tail, grepbuf) (grep, contains_encoding_error, fgrep_icase_available) (fgrep_icase_charlen, fgrep_to_grep_pattern, try_fgrep_pattern) (main):
  * src/kwsearch.c (struct kwsearch, Fcompile, Fexecute):
  * src/kwset.c (struct trie, struct kwset, kwsalloc, kwsincr) (kwswords, treefails, memchr_kwset, acexec_trans, kwsexec) (treedelta, kwsprep, bm_delta2_search, bmexec_trans, bmexec) (acexec):
  * src/kwset.h (struct kwsmatch):
  * src/pcresearch.c (Pcompile, Pexecute):
  * src/search.h (mb_clen):
  * src/searchutils.c (kwsinit, mb_goback, wordchars_count) (wordchars_size, wordchar_next, wordchar_prev): Prefer idx_t to size_t or ptrdiff_t for nonnegative sizes, and prefer ptrdiff_t to size_t for sizes plus error values.
  * src/grep.c (uword_size): New constant, used for signed size calculations. (totalnl, add_count, totalcc, print_offset, print_line_head, grep): Prefer intmax_t to uintmax_t for wide integer calculations. (fgrep_icase_charlen): Prefer ptrdiff_t to int for size offsets.
  * src/grep.h: Include idx.h.
  * src/search.h (imbrlen): New function, like mbrlen except with idx_t and ptrdiff_t.
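A small example shows the benefit claimed for signed sizes. Gnulib's idx_t is a signed type (ptrdiff_t); the sketch defines its own sketch_idx_t instead of including idx.h, and both helper functions are hypothetical. When an unsigned subtraction should come out negative, it instead wraps to a huge, well-defined value, which -fsanitize=undefined does not report. The signed version yields -2, which a caller can test for, and genuine signed overflow is reported by the sanitizer at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors Gnulib's idx.h, where idx_t is ptrdiff_t.  */
typedef ptrdiff_t sketch_idx_t;

/* Hypothetical helper with unsigned sizes: if END < START the result
   silently wraps modulo SIZE_MAX + 1.  */
size_t
remaining_unsigned (size_t start, size_t end)
{
  return end - start;
}

/* Same helper with signed sizes: END < START gives a negative result
   that callers can check.  */
sketch_idx_t
remaining_signed (sketch_idx_t start, sketch_idx_t end)
{
  return end - start;
}
```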
* maint: run "make update-copyright" | Paul Eggert | 2021-01-01 | 1 | -1/+1
* grep: -P: report input filename upon PCRE execution failure | Jim Meyering | 2020-10-11 | 1 | -5/+9
  Without this, it could be tedious to determine which input file evokes a PCRE-execution-time failure.
  * src/pcresearch.c (Pexecute): When failing, include the error-provoking file name in the diagnostic.
  * src/grep.c (input_filename): Make extern, since used above.
  * src/search.h (input_filename): Declare.
  * tests/filename-lineno.pl: Test for this. ($no_pcre): Factor out.
  * NEWS (Bug fixes): Mention this.
* grep: avoid unnecessary regex compilation | Norihiro Tanaka | 2020-09-22 | 1 | -1/+1
  Grep resorts to using the regex engine when the precision of either -o or --color is required, or when the pattern is not supported by our DFA engine (e.g., backref). Otherwise, grep would perform regex compilation solely to check the syntax. This change makes grep skip that compilation in the common case for which it is unnecessary. The compilation we are avoiding is quite costly, consuming O(N^2) RSS for N regular expressions.
  * src/dfasearch.c (GEAcompile): Add new argument, and avoid unneeded compilation of regex.
  * src/grep.c (compile_fp_t): Update prototype. (main): Update caller.
  * src/kwsearch.c (Fcompile): Update caller and add new argument.
  * src/pcresearch.c (Pcompile): Add new argument.
  * src/search.h (GEAcompile, Fcompile, Pcompile): Update prototype.
* grep: fix logic for growing PCRE JIT stack | Paul Eggert | 2020-09-09 | 1 | -6/+8
  * src/pcresearch.c (jit_exec) [PCRE_EXTRA_MATCH_LIMIT_RECURSION]: When growing the match_limit_recursion limit, do not use the old value if ! (flags & PCRE_EXTRA_MATCH_LIMIT_RECURSION), as it is uninitialized in that case.
* grep: fix PCRE JIT test when JIT not available | Paul Eggert | 2020-09-09 | 1 | -0/+3
  Problem reported by Thomas Deutschmann (Bug#29446#23).
  * src/pcresearch.c (Pexecute): Diagnose PCRE_ERROR_RECURSIONLIMIT.
  * tests/pcre-jitstack: Treat recursion limit overflow like stack overflow.
* Prefer rawmemchr to memchr when it’s easy | Paul Eggert | 2020-09-07 | 1 | -7/+11
  * bootstrap.conf (gnulib_modules): Add rawmemchr.
  * src/dfasearch.c (GEAcompile, EGexecute):
  * src/grep.c (update_patterns, prpending, prtext):
  * src/kwsearch.c (Fcompile, Fexecute):
  * src/pcresearch.c (Pcompile, Pexecute): Simplify (and presumably speed up a little) by using rawmemchr with a sentinel, instead of using memchr.
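The following is a minimal sketch of the sentinel technique. Once a newline is stored just past the data, the inner scan can stop at the first match without also testing for the end of the buffer. rawmemchr is a glibc function, which Gnulib supplies on other systems. To stay portable, the sketch uses a hypothetical equivalent, scan_to_sentinel, and a hypothetical caller, count_newlines. The caller must reserve one writable byte past the data for the sentinel.

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-in for rawmemchr: return a pointer to the first byte
   equal to C, with no length limit.  C must occur in the buffer.  */
void *
scan_to_sentinel (void const *s, int c)
{
  unsigned char const *p = s;
  while (*p != (unsigned char) c)
    p++;
  return (void *) p;
}

/* Count the '\n' bytes in BUF[0..SIZE).  BUF[SIZE] must be writable:
   it receives a '\n' sentinel, so every scan is guaranteed to stop.  */
ptrdiff_t
count_newlines (char *buf, ptrdiff_t size)
{
  assert (0 <= size);
  buf[size] = '\n';  /* sentinel */
  ptrdiff_t n = 0;
  for (char *p = buf; ; p++)
    {
      p = scan_to_sentinel (p, '\n');
      if (p == buf + size)  /* reached the sentinel, not real data */
        return n;
      n++;
    }
}
```

The end-of-data test now runs once per newline found rather than once per byte scanned. That is where the "presumably speed up a little" in the commit message would come from.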
* maint: update all copyright year number ranges | Jim Meyering | 2020-01-01 | 1 | -1/+1
  Run "make update-copyright" and then...
  * gnulib: Update to latest with copyright year adjusted.
  * tests/init.sh: Sync with gnulib to pick up copyright year.
  * bootstrap: Likewise.
  * doc/grep.in.1: Use "-" in copyright year ranges, not \en.
* grep: simplify pcresearch.c ifdefs | Paul Eggert | 2019-01-20 | 1 | -34/+20
  This fixes a warning if PCRE is not used (Bug#34054).
  * configure.ac (USE_PCRE): New conditional.
  * src/Makefile.am (grep_SOURCES) [!USE_PCRE]: Omit pcresearch.c.
  * src/grep.c (matchers) [!HAVE_LIBPCRE]: Omit perl matcher. (setmatcher) [!HAVE_LIBPCRE]: If helpful, mention --disable-perl-regexp in diagnostic.
  * src/pcresearch.c: Simplify by assuming HAVE_LIBPCRE.
* maint: update all copyright dates via "make update-copyright" | Jim Meyering | 2019-01-01 | 1 | -1/+1
  * gnulib: Also update submodule for its copyright updates.
* maint: update gnulib and copyright dates for 2018 | Jim Meyering | 2018-01-06 | 1 | -1/+1
  * gnulib: Update to latest.
  * all files: Run "make update-copyright".
  * bootstrap: Update from gnulib.
* grep: port better to Adélie GNU/Linux 64-bit ppc | Paul Eggert | 2017-11-25 | 1 | -1/+23
  Problem reported by A. Wilcox (Bug#29446).
  * src/pcresearch.c (PCRE_EXTRA_MATCH_LIMIT_RECURSION) (PCRE_STUDY_EXTRA_NEEDED): Default to 0. (jit_exec): If we run up against the recursion limit, double it (if possible) and try again. (Pcompile): Also specify PCRE_STUDY_EXTRA_NEEDED so that pc->extra is not null.
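The retry strategy in jit_exec, doubling the limit whenever a match attempt hits it, can be sketched independently of PCRE. Everything in the sketch is hypothetical:

- attempt_fn stands in for a call to the PCRE matcher with a given recursion limit or JIT stack size.
- needs_at_least is a toy attempt that succeeds once the limit reaches a required value.
- max_limit plays the role of a cap, such as the INT_MAX - 8 * 1024 bound mentioned in the 2021-11-09 commit.

Comparing the limit against max_limit / 2 before doubling keeps the doubling itself from overflowing.

```c
#include <assert.h>
#include <stdbool.h>

/* Result of one match attempt under a given limit.  */
enum attempt_result { MATCHED, LIMIT_EXCEEDED };

/* Hypothetical stand-in for running the matcher with LIMIT.  */
typedef enum attempt_result (*attempt_fn) (long limit, void *ctx);

/* Toy attempt: succeeds once LIMIT reaches the value *CTX points to.  */
enum attempt_result
needs_at_least (long limit, void *ctx)
{
  return limit >= *(long const *) ctx ? MATCHED : LIMIT_EXCEEDED;
}

/* Run ATTEMPT, doubling *LIMIT after each LIMIT_EXCEEDED as long as the
   doubled value stays within MAX_LIMIT.  Return true on a match;
   *LIMIT is left at the last value tried.  */
bool
match_with_growing_limit (attempt_fn attempt, void *ctx,
                          long *limit, long max_limit)
{
  assert (0 < *limit && *limit <= max_limit);
  for (;;)
    {
      if (attempt (*limit, ctx) == MATCHED)
        return true;
      /* Doubling is possible only if 2 * *LIMIT <= MAX_LIMIT; testing
         against MAX_LIMIT / 2 avoids computing an overflowing product.  */
      if (*limit > max_limit / 2)
        return false;
      *limit *= 2;
    }
}
```

The 2020-09-09 commit above points out a related subtlety in the real code: the "old value" being doubled must actually have been initialized before it is used.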
* Do not assume PCRE 8.20 or later | Paul Eggert | 2017-02-07 | 1 | -3/+2
  Problem reported by Zube (Bug#25647)
  * NEWS: Document this.
  * src/pcresearch.c (struct pcre_comp.jit_stack): Declare only if PCRE_STUDY_JIT_COMPILE.
* Improve -i performance in typical UTF-8 searches | Paul Eggert | 2017-01-17 | 1 | -1/+1
  Currently ‘grep -i i’ is slow in a UTF-8 locale, because ‘i’ in the pattern matches the two-byte character 'ı' (U+0131, LATIN SMALL LETTER DOTLESS I) in data, and kwset handles only single-byte character translations, so grep falls back on a slower DFA-based search for all searches. Improve -i performance in the typical case by using kwset when data are free of troublesome characters like 'ı', falling back on the DFA only when data contain troublesome characters.
  * src/dfasearch.c (GEAcompile):
  * src/grep.c (compile_fp_t):
  * src/kwsearch.c (Fcompile):
  * src/pcresearch.c (Pcompile): Pattern arg is now char *, not char const *, since Fcompile now reallocates it sometimes.
  * src/grep.c (all_single_byte_after_folding): Remove. All callers removed. (fgrep_icase_charlen): New function. (fgrep_icase_available, try_fgrep_pattern): Use it, for more-generous semantics. (fgrep_to_grep_pattern): Now extern. (main): Do not free keys, since Fexecute may use them.
  * src/kwsearch.c (struct kwsearch): New struct. (Fcompile): Return it. If -i, be more generous about patterns. (Fexecute): Use it. Fall back on DFA when the data contain troublesome characters; this should be rare in practice.
  * src/kwset.c, src/kwset.h (kwswords): New function.
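The fast-path decision can be pictured as a scan of the data for the one troublesome character the message names. This sketch is hypothetical in three respects: the names has_troublesome_char and choose_path, and the hard-coded UTF-8 bytes for U+0131 'ı' (0xC4 0xB1). A real implementation would need the full set of characters that case-fold onto the pattern's characters. The kwset and DFA searches themselves are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* UTF-8 encoding of U+0131 LATIN SMALL LETTER DOTLESS I.  */
static char const dotless_i[] = "\xc4\xb1";

/* Hypothetical check: return true if BUF[0..SIZE) contains dotless i.  */
bool
has_troublesome_char (char const *buf, size_t size)
{
  for (size_t i = 0; i + 1 < size; i++)
    if (buf[i] == dotless_i[0] && buf[i + 1] == dotless_i[1])
      return true;
  return false;
}

enum search_path { KWSET_PATH, DFA_PATH };

/* Hypothetical dispatch for a 'grep -i i' style search: use the fast
   single-byte kwset search unless the data need the slower DFA.  */
enum search_path
choose_path (char const *buf, size_t size)
{
  return has_troublesome_char (buf, size) ? DFA_PATH : KWSET_PATH;
}
```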
* maint: update gnulib and copyright dates for 2017 | Jim Meyering | 2017-01-01 | 1 | -1/+1
  * gnulib: Update to latest.
  * all files: Run "make update-copyright".
* pcresearch: thread safety | Zev Weiss | 2016-12-25 | 1 | -41/+47
  * src/pcresearch.c (pcre_comp): New struct to hold previously-global state. (jit_exec): Operate on a pcre_comp parameter instead of global state. (Pcompile): Allocate and return a pcre_comp instead of setting global variables. (Pexecute): Operate on a pcre_comp parameter instead of global state.
* grep: prepare search backends for thread-safety | Zev Weiss | 2016-12-25 | 1 | -2/+4
  To facilitate removing mutable global state from search backends, compile() functions will return an opaque pointer to backend-specific data, which must then be passed back into the corresponding execute() function. This is merely a preparatory step changing function signatures and call sites, so the pointers passed & returned are dummies for now and not (yet) actually used.
  * src/grep.c (compile_fp_t): Now returns an opaque pointer (the compiled pattern). (execute_fp_t): Now passed the pointer returned by a compile_fp_t. All call sites updated accordingly. (compiled_pattern): New static variable.
  * src/dfasearch.c (GEAcompile): Return a void pointer (dummy NULL). (EGexecute): Receive a void pointer argument (unused).
  * src/kwsearch.c (Fcompile): Return a void pointer (dummy NULL). (Fexecute): Receive a void pointer argument (unused).
  * src/pcresearch.c (Pcompile): Return a void pointer (dummy NULL). (Pexecute): Receive a void pointer argument (unused).
  * src/search.h: Update compile/execute function prototypes.
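The following is a minimal sketch of the opaque-pointer interface these commits introduce. The typedefs are simplified, hypothetical versions of compile_fp_t and execute_fp_t; grep's real ones take more arguments. The toy substring backend stands in for Pcompile and Pexecute. Each compiled pattern carries its own state, so two patterns can be compiled and executed independently, with no globals to clobber.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical signatures: compile returns an opaque
   pointer that execute later receives back.  */
typedef void *(*compile_fp_t) (char const *pattern);
typedef ptrdiff_t (*execute_fp_t) (void *compiled, char const *buf);

/* Per-pattern state of a toy substring backend.  */
struct substr_comp
{
  char *pattern;
};

void *
substr_compile (char const *pattern)
{
  size_t len = strlen (pattern);
  struct substr_comp *sc = malloc (sizeof *sc);
  char *copy = malloc (len + 1);
  if (!sc || !copy)
    {
      free (sc);
      free (copy);
      return NULL;
    }
  memcpy (copy, pattern, len + 1);
  sc->pattern = copy;
  return sc;
}

/* Return the offset of the first match in BUF, or -1 if none.  */
ptrdiff_t
substr_execute (void *compiled, char const *buf)
{
  struct substr_comp const *sc = compiled;
  assert (sc);
  char const *hit = strstr (buf, sc->pattern);
  return hit ? hit - buf : -1;
}

void
substr_free (void *compiled)
{
  struct substr_comp *sc = compiled;
  if (sc)
    {
      free (sc->pattern);
      free (sc);
    }
}
```

Returning -1 for "no match" fits the later preference, noted in the 2021-08-25 commit, for ptrdiff_t when a size can also carry an error value.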
* grep: standardize on localeinfo.multibyte | Paul Eggert | 2016-12-23 | 1 | -1/+1
  * src/dfasearch.c (EGexecute):
  * src/grep.c (main):
  * src/kwsearch.c (Fexecute):
  * src/pcresearch.c (Pcompile): Prefer localeinfo.multibyte to (MB_CUR_MAX > 1).
* grep: simplify matcher configuration | Paul Eggert | 2016-12-20 | 1 | -1/+1
  * src/grep.c (matcher, compile): Remove static vars. (compile_fp_t): Now takes a 3rd syntax argument. (Gcompile, Ecompile, Acompile, GAcompile, PAcompile): Remove. (struct matcher): Now nameless, since it is used only once. Make 'name' a bit shorter. New member 'syntax'. (matchers): Initialize it, and change removed functions to GEAcompile. (F_MATCHER_INDEX, G_MATCHER_INDEX): New constants. (setmatcher): New arg MATCHER, and return new matcher index. Avoid unnecessary call to strcmp. (main): Keep matcher as a local int, not a global pointer.
  * src/kwsearch.c (Fcompile):
  * src/pcresearch.c (Pcompile): Ignore the 3rd syntax argument.
* grep: further -P performance fix | Paul Eggert | 2016-11-19 | 1 | -3/+5
  Problem reported by Stephane Chazelas in: http://bugs.gnu.org/22655#103
  * src/pcresearch.c (Pexecute): Set the subject to the start of each line as it is found.
* grep: -P no longer uses PCRE_MULTILINE | Paul Eggert | 2016-11-19 | 1 | -91/+10
  This reverts commit f6603c4e1e04dbb87a7232c4b44acc6afdf65fef, as the extra performance is not worth the trouble for PCRE users.
  Problem reported by Stephane Chazelas in: http://bugs.gnu.org/22655#103
  * NEWS: Document this and the next patch.
  * src/dfasearch.c (EGexecute):
  * src/grep.c (execute_fp_t):
  * src/kwsearch.c (Fexecute):
  * src/pcresearch.c (Pexecute): First arg is now a const pointer again.
  * src/grep.c (buf_has_encoding_errors): Now static.
  * src/grep.h (buf_has_encoding_errors): Remove decl.
  * src/search.h: Adjust decls.
  * src/pcresearch.c (reflags): Remove. All uses removed. (Pcompile, Pexecute): Do not use PCRE_MULTILINE.