path: root/NEWS
Commit message | Author | Age | Files | Lines
* grep: fix performance with multiple patterns | Paul Eggert | 2016-12-20 | 1 | -0/+11
  Problem reported by Jaroslav Skarvada (Bug#22357).
  * NEWS: Document this and other recent performance fixes.
  * src/grep.c (E_MATCHER_INDEX): New constant.
  (all_single_byte_after_folding): New function, split out from fgrep_icase_available.
  (fgrep_icase_available): Use it.
  (try_fgrep_pattern): New function, which also uses it.
  (main): With two or more patterns, use try_fgrep_pattern to fix performance regression. The number "two" here is just a heuristic.
* grep: work around proc lseek glitch | Paul Eggert | 2016-12-12 | 1 | -0/+6
  Problem reported by Andreas Schwab (Bug#25180).
  * NEWS: Document this.
  * src/grep.c (finalize_input): Ignore EINVAL lseek failures.
  * tests/Makefile.am (TESTS): Add proc.
  * tests/proc: New file.
* maint: clarify early-exit news for 2.27 | Paul Eggert | 2016-12-07 | 1 | -1/+2
  * NEWS: Mention early-exit options to avoid confusion. See:
  http://lists.gnu.org/archive/html/grep-devel/2016-12/msg00007.html
* maint: post-release administrivia | Jim Meyering | 2016-12-06 | 1 | -0/+3
  * NEWS: Add header line for next release.
  * .prev-version: Record previous version.
  * cfg.mk (old_NEWS_hash): Auto-update.
* version 2.27 (tag: v2.27) | Jim Meyering | 2016-12-06 | 1 | -1/+1
  * NEWS: Record release date.
* build: update gnulib submodule to latest | Paul Eggert | 2016-11-28 | 1 | -2/+1
* grep: avoid false matches in non-UTF8 multibyte locales | Jim Meyering | 2016-11-27 | 1 | -0/+7
  * gnulib: Update to latest, for the dfa.c fix.
  * NEWS (Bug fixes): Mention it.
  * tests/false-match-mb-non-utf8: New file, with tests for this. Based on tests from Stephane Chazelas.
  * tests/Makefile.am (TESTS): Add it.
  Introduced by commit v2.18-54-g3ef4c8e, a change that made grep use its DFA matcher more aggressively. The malfunction arises only with the DFA matcher, not with regex.
  Reported by Stephane Chazelas in https://bugs.gnu.org/24975
* tests: check for unibyte French range bug | Paul Eggert | 2016-11-20 | 1 | -0/+3
  Problem reported by Stephane Chazelas (Bug#24973). This bug was fixed in Gnulib.
  * NEWS: Document the fix.
  * tests/init.cfg (require_ru_RU_koi8_r): Remove.
  * tests/unibyte-bracket-expr: Add a test for the bug. Call get-mb-cur-max directly instead of bothering with require_ru_RU_koi8_r.
* grep: -P no longer uses PCRE_MULTILINE | Paul Eggert | 2016-11-19 | 1 | -3/+5
  This reverts commit f6603c4e1e04dbb87a7232c4b44acc6afdf65fef, as the extra performance is not worth the trouble for PCRE users.
  Problem reported by Stephane Chazelas in: http://bugs.gnu.org/22655#103
  * NEWS: Document this and the next patch.
  * src/dfasearch.c (EGexecute):
  * src/grep.c (execute_fp_t):
  * src/kwsearch.c (Fexecute):
  * src/pcresearch.c (Pexecute): First arg is now a const pointer again.
  * src/grep.c (buf_has_encoding_errors): Now static.
  * src/grep.h (buf_has_encoding_errors): Remove decl.
  * src/search.h: Adjust decls.
  * src/pcresearch.c (reflags): Remove. All uses removed.
  (Pcompile, Pexecute): Do not use PCRE_MULTILINE.
* grep: fix -zxP bug | Paul Eggert | 2016-11-19 | 1 | -3/+3
  * NEWS: Document this.
  * src/pcresearch.c (Pcompile): Search a line at a time if -x is used, since -x uses ^ and $.
  * tests/pcre: Test this.
* grep: -T now adjusts number widths for worst case | Paul Eggert | 2016-11-19 | 1 | -0/+4
  * NEWS, doc/grep.texi (Output Line Prefix Control): Document this (Bug#24451).
  * src/grep.c (offset_width): New static var.
  (print_offset): Use it instead of arg. All callers changed.
  (grep): Set it.
  * tests/initial-tab: Test this.
* grep: -T no longer outputs BS | Paul Eggert | 2016-11-19 | 1 | -0/+4
  * NEWS: Document this (Bug#24451).
  * src/grep.c (print_line_head): Do not attempt to backspace output.
  * tests/initial-tab: New test.
  * tests/Makefile.am (TESTS): Add it.
* grep -f /dev/null -L PAT FILE outputs FILE | Paul Eggert | 2016-11-19 | 1 | -0/+2
  * NEWS: Document this.
  * src/grep.c (main): Do not exit right away with -L.
  * tests/skip-read: Test for the fix.
* grep: treat -f /dev/null like -m0 | Paul Eggert | 2016-11-19 | 1 | -0/+5
  * NEWS: Document this.
  * src/grep.c (main): With -f /dev/null, don't bother to read the input. This is what FreeBSD grep does.
  * tests/Makefile.am (TESTS): Add skip-read.
  * tests/skip-read: New file.
* grep: scale back /dev/null speedup | Paul Eggert | 2016-11-19 | 1 | -0/+6
  The performance improvement when output is /dev/null (commit af6af288eac28951b5eee1eaaf373e22b2193b7b dated 2016-05-01) breaks scripts that run "PROGRAM | grep PATTERN >/dev/null" where PROGRAM dies when writing into a broken pipe. Suppress the improvement if standard input is not seekable.
  Problem reported by Gary Johnson (Bug#24941).
  * NEWS: Document this.
  * src/grep.c (seek_failed): New static var.
  (seek_data_failed): Move decl earlier, to be next to seek_failed.
  (file_must_have_nulls): Skip useless syscalls if seek_failed. Lessen source-code nesting.
  (reset): Set seek_failed and seek_data_failed. Try lseek even on non-regular files.
  (grep): New arg INEOF. All callers changed. Do not clear seek_data_failed here, since 'reset' now does this.
  (finalize_input): New static function.
  (grepdesc): Use it.
  (main): Do not exit on first match merely because output is /dev/null.
  * tests/grep-dev-null-out: Adjust to new behavior.
* grep: -Pz no longer rejects ^, $ | Paul Eggert | 2016-11-19 | 1 | -0/+4
  Problem reported by Stephane Chazelas (Bug#22655).
  * NEWS: Document this.
  * doc/grep.texi (grep Programs): Warn about -Pz.
  * src/pcresearch.c (reflags): New static var.
  (multibyte_locale): Remove static var; now local to Pcompile.
  (Pcompile): Check for (? and (* too. Set reflags instead of dying when problematic operators are found.
  (Pexecute): Use reflags to decide whether searches should be multiline.
  * tests/pcre: Test new behavior.
* doc: grep builds on HP-UX once again | Jim Meyering | 2016-10-26 | 1 | -0/+4
  * NEWS (Bug fixes): Mention the HP-UX fix.
* maint: post-release administrivia | Jim Meyering | 2016-10-02 | 1 | -0/+3
  * NEWS: Add header line for next release.
  * .prev-version: Record previous version.
  * cfg.mk (old_NEWS_hash): Auto-update.
* version 2.26 (tag: v2.26) | Jim Meyering | 2016-10-02 | 1 | -1/+1
  * NEWS: Record release date.
* grep: add news entry for fix to bug#24233 | Norihiro Tanaka | 2016-09-18 | 1 | -0/+3
  * NEWS (Bug fixes): Add an entry describing bug#24233. The bug was fixed by commit v2.25-77-gad468bb, by chance.
* grep: encoding errors suppress just their line | Paul Eggert | 2016-09-08 | 1 | -0/+5
  From a suggestion by Marcello Perathoner (Bug#22838).
  * NEWS, doc/grep.texi (File and Directory Selection): Document this.
  * src/grep.c (print_line_head): Do not suppress later output lines merely because an earlier output line would have had an encoding error.
  * tests/encoding-error: Test for the new behavior.
* grep: update NEWS | Paul Eggert | 2016-09-01 | 1 | -0/+3
  * NEWS: Describe previous change.
* dfa: document previous change | Paul Eggert | 2016-09-01 | 1 | -2/+2
  * NEWS: Adjust to match previous change.
* grep: avoid code duplication with -iF | Paul Eggert | 2016-09-01 | 1 | -6/+3
  This follows up on the -iF performance improvement (Bug#23752).
  * NEWS: Simplify description of -iF improvement.
  * src/dfa.c: Do not include wctype.h.
  (lonesome_lower, case_folded_counterparts): Move to localeinfo.c.
  (CASE_FOLDED_BUFSIZE): Move to localeinfo.h.
  * src/grep.c: Do not include wctype.h.
  (lonesome_lower): Remove.
  (fgrep_icase_available): Use case_folded_counterparts instead. Do not call it for the same character twice. Return false on wcrtomb failures (which should never happen).
  (fgrep_to_grep_pattern, main): Simplify. Let fgrep_to_grep’s caller fiddle with the global variables.
  * src/localeinfo.c: Include <wctype.h>.
  (lonesome_lower, case_folded_counterparts): Move here from src/dfa.c. Return int, not unsigned int. Verify that CASE_FOLDED_BUFSIZE is big enough.
  * src/localeinfo.h (CASE_FOLDED_BUFSIZE): Now 32, so that we don’t expose lonesome_lower’s size.
  * src/searchutils.c (kwsinit): Return new kwset instead of storing it via a pointer. All callers changed. Simplify a bit.
* grep: speed up -iF in multibyte locales | Norihiro Tanaka | 2016-09-01 | 1 | -0/+6
  In a multibyte locale, if a pattern consists only of single-byte characters, all of their case counterparts are also single-byte characters, and the pattern has no invalid sequences, grep -iF uses the fgrep matcher, the same as in a single-byte locale (Bug#23752).
  * NEWS: Mention it.
  * src/grep.c (lonesome_lower): New constant.
  (fgrep_icase_available): New function.
  (fgrep_to_grep_pattern): Simplify it.
  (main): Use them.
  * src/searchutils.c (kwsinit): New arg MB_TRANS; all uses changed. Try fgrep matcher for case insensitive matching by grep -F in multibyte locale.
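The eligibility test described in this commit can be sketched roughly as follows. This is an illustrative Python approximation, not grep's C code: the function name `folds_to_single_bytes` is hypothetical, and `str.lower`/`str.upper` only approximate the case counterparts grep considers (one-way mappings such as the Kelvin sign folding to "k" are not covered here).

```python
def folds_to_single_bytes(pattern, encoding="utf-8"):
    """Return True if every character of PATTERN, and each of its
    lower/upper case variants, encodes to exactly one byte.

    Hypothetical helper for illustration; grep's real check works on
    bytes and wide characters in the current locale.
    """
    for ch in pattern:
        # Check the character itself plus its case variants.
        for variant in {ch, ch.lower(), ch.upper()}:
            try:
                if len(variant.encode(encoding)) != 1:
                    return False
            except UnicodeEncodeError:
                # The variant does not exist in this encoding at all.
                return False
    return True
```

When the check succeeds, a case-insensitive fixed-string search can treat the pattern as plain bytes, which is what lets the faster fixed-string matcher be used.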
* dfa: minor refactoring and doc fixes | Paul Eggert | 2016-08-16 | 1 | -3/+3
  * NEWS: Improve description of recent change.
  * src/dfa.c: Improve commentary. Indent new code (and some long-existing howlers) more in GNU style.
  (dfa_state): Reorder members to make struct smaller on x86. mb_trindex member is now state_num, not size_t, so that -1 is more natural; all uses changed.
  (struct dfa): Similarly for mb_trcount member.
  (state_index): Compute values for new state components before allocating the state, to make the code easier to understand.
  (state_index, dfastate): Prefer A & ~B to other forms like (A & B) != A.
  (dfastate, build_state, transit_state): In new code, prefer i++ to ++i in for-loop control.
  (build_state, transit_state): In new code, prefer < to >.
  (transit_state): Add to *PP in one assignment, rather than in a loop. Prefer !x to x == NULL. Use xmalloc instead of xnmalloc, since the size is a constant. Do the size calculation as a signed integer constant expression, so that the compiler diagnoses any overflow.
  (transit_state, free_mbdata): Tune by looping from -1 to N - 1, rather than from 0 to N - 1 with a separate instance for -1.
  (dfaexec_main): Rewrite to avoid side effects in if-part.
  (free_mbdata): Simplify.
* dfa: improve leading "." with non-UTF8 multibyte | Norihiro Tanaka | 2016-08-16 | 1 | -0/+3
  In non-UTF8 multibyte locales, matching the dot expression is very slow, as the next state is calculated on demand. This change caches the result for the typical case (Bug#21486).

  Compare the run times of this command before and after this change, on an i5-4570 CPU @ 3.20GHz using rawhide (~fedora 22) and compiled with gcc 5.1.1 20150618:

    yes "$(printf 'a%38db\n' 0)" | head -1000000 >in
    env LC_ALL=ja_JP.eucJP time -p \
      src/grep .......................................... in

    Before: 19.10
    After : 0.55

  * NEWS: Document this.
  * src/dfa.c (struct dfa_state): New members curr_dependent, mb_trindex.
  (MAX_TRCOUNT): New constant.
  (struct dfa): New members mb_trans, mb_trcount.
  (state_index): Initialize new members of struct dfa_state and calculate dependency on context of next character for positions for dot.
  (dfastate): Calculate follows positions for dot if enabled.
  (realloc_trans_if_necessary): Allocate transition tables.
  (build_state): Use new constant and reset transition tables.
  (transit_state): Use cache for transition from a state with the dot expression.
  (free_mbdata): Deallocate transition tables.
* grep: print "filename:lineno:" in invalid-regex diagnostic | Jim Meyering | 2016-07-25 | 1 | -2/+3
  Determining the file name and line number is a little tricky because of the way the regular expressions are all concatenated onto a newline-separated list. By the time grep would compile regular expressions, the <filename,lineno> origin of each regexp was no longer available. This patch adds a list of filename,first_lineno pairs, one per input source, by which we can then map the ordinal regexp number to a filename,lineno pair for the diagnostic.
  * src/dfasearch.c (GEAcompile): When diagnosing an invalid regexp specified via -f FILE, include the "FILENAME:LINENO: " prefix. Also, when there are two or more lines with compilation failures, diagnose all of them, rather than stopping after the first.
  * src/grep.h (pattern_file_name): Declare it.
  * src/grep.c (struct FL_pair): Define type.
  (fl_pair, n_fl_pair_slots, n_pattern_files, patfile_lineno): Define globals.
  (fl_add, pattern_file_name): Define functions.
  (main): Call fl_add for each type of the following: -e argument, -f argument, command-line-specified (without -e) regexp.
  * tests/filename-lineno.pl: New file.
  * tests/Makefile.am (TESTS): Add it.
  * NEWS (Improvements): Mention this.
  Initially reported by Gunnar Wolf in https://bugs.debian.org/525214
  Forwarded to grep's bug list by Santiago Ruano Rincón as http://debbugs.gnu.org/23965
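The bookkeeping described in this commit message (one filename,first_lineno pair per pattern source, used to turn a global pattern number back into a location) can be sketched as below. This is a Python illustration under assumptions, not grep's implementation: the class `PatternOrigins` and its method names are hypothetical, and patterns are numbered from 1.

```python
from bisect import bisect_right


class PatternOrigins:
    """Hypothetical index mapping a global pattern ordinal to its source."""

    def __init__(self):
        self._names = []   # one name per pattern source, in order added
        self._first = []   # global ordinal of each source's first pattern
        self._count = 0    # total number of patterns recorded so far

    def add(self, name, n_patterns):
        """Record that source NAME contributed N_PATTERNS pattern lines."""
        self._names.append(name)
        self._first.append(self._count + 1)
        self._count += n_patterns

    def locate(self, ordinal):
        """Map a 1-based global ordinal to (source name, line number)."""
        if not 1 <= ordinal <= self._count:
            raise IndexError(ordinal)
        # Last source whose first ordinal is <= the requested ordinal.
        i = bisect_right(self._first, ordinal) - 1
        return self._names[i], ordinal - self._first[i] + 1
```

For example, after `add("pats1.txt", 2)` and `add("pats2.txt", 3)`, the fourth pattern overall resolves to line 2 of the second source. Storing only the first ordinal per source keeps the table small no matter how many patterns each file holds.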
* grep: minor cleanups for -F Aho-Corasick | Paul Eggert | 2016-06-02 | 1 | -3/+4
  * NEWS: Don't claim 7x, as the value seems to be system-dependent.
  * src/kwset.c (struct kwset.kwsexec, bmexec, acexec_trans, acexec):
  * src/kwset.c, src/kwset.h (kwsalloc, kwsexec): Don't put 'const' into the declaration when that is irrelevant to the API. More generally, don't bother with 'const' when it's only a local so it is reasonably obvious to a reader that it is 'const' anyway. It would be overkill to add 'const' to all locals that never change.
  * src/kwset.c (U): Avoid unnecessary parens.
  (treefails, memoff2_kwset, bmexec_trans, bmexec, cwexec, acexec_trans): Prefer SIZE_MAX to (size_t) -1.
  (bmexec_trans, cwexec, acexec_trans): Remove attributes for static functions that no longer seem needed.
  (memoff2_kwset): Rename from memchr2_kwset, since it returns an offset, not a pointer. All uses changed.
  (cwexec, acexec_trans) [lint]: Remove initialization that is no longer needed; at least, GCC 6.1 x86-64 does not need it.
  (acexec_trans): Clarify code by using nesting rather than 'continue'.
* grep: use Aho-Corasick algorithm to search multiple fixed words | Paul Eggert | 2016-06-02 | 1 | -0/+5
  Searching multiple fixed words, grep used the Commentz-Walter algorithm, but this was O(m*n) and was very slow in the worst case. For example:

    - input: yes `printf %040d` | head -10000000
    - word1: x0000000000000000000
    - word2: x

  This change instead uses the Aho-Corasick algorithm to search multiple fixed words. It uses a high-quality trie-building function that is already defined for Commentz-Walter in kwset.c.

  I see 7x speed-up even for a typical case on Fedora 21 with a 3.2GHz i5 by this change. Using best-of-5 trials for the benchmark:

    find /usr/share/doc/ -type f |
      LC_ALL=C time -p xargs.sh src/grep -Ff /usr/share/dict/linux.words >/dev/null

  The results were:

    real 11.37 user 11.03 sys 0.24 [without the change]
    real 1.49 user 1.31 sys 0.15 [with the change]

  * src/kwset.c (struct kwset): Add a new member 'mode'.
  (kwsalloc): Use it. All callers are changed.
  (kwsincr): Using Aho-Corasick algorithm, build tries in normal order.
  (acexec_trans, acexec): New functions.
  (kwsexec): Use them.
  * src/kwset.h (kwsalloc): Update a prototype.
  * NEWS (Improvements): Mention it.
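For readers unfamiliar with the algorithm named in this commit, the following is a minimal textbook Aho-Corasick sketch in Python. It is not a transcription of kwset.c: the dictionary-based trie and the function names `build_automaton` and `find_all` are assumptions chosen for brevity. The point it illustrates is that the text is scanned once, following failure links on a mismatch instead of restarting the comparison.

```python
from collections import deque


def build_automaton(words):
    """Build goto, failure and output tables for WORDS (textbook form)."""
    goto = [{}]   # goto[state][char] -> next state; state 0 is the root
    fail = [0]    # failure link for each state
    out = [[]]    # words recognized on entering each state
    for w in words:
        s = 0
        for ch in w:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(w)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state.
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def find_all(text, words):
    """Return (start_index, word) for every occurrence of WORDS in TEXT."""
    goto, fail, out = build_automaton(words)
    s = 0
    hits = []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for w in out[s]:
            hits.append((i - len(w) + 1, w))
    return hits
```

For instance, `find_all("ushers", ["he", "she", "his", "hers"])` reports "she" at index 1 and both "he" and "hers" at index 2, all found in a single left-to-right pass.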
* maint: avoid NEWS syntax-check failure | Jim Meyering | 2016-05-02 | 1 | -2/+4
  * NEWS: Move the mention of the /dev/null speed-up from the block for 2.25 into the current, in-preparation block.
* grep: /dev/null output speedup | Paul Eggert | 2016-05-01 | 1 | -0/+2
  This sped up 'seq 10000000000 | grep . >/dev/null' by a factor of 380,000 on my platform (Fedora 23, x86-64, AMD Phenom II X4 910e, en_US.UTF-8 locale).
  * NEWS: Document this.
  * src/grep.c (grepbuf): exit_on_match no longer implies that -q was specified, so when a match is found, exit with exit_failure if an error was also found.
  (grepdesc): Omit unnecessary S_ISREG and st_ino checks. out_stat.st_ino is zero if stdout is not a regular file, and this cannot possibly equal st->st_ino.
  (main): Omit duplicate initialization of exit_failure. Do not bother with isatty unless -q is not used and stdout is a character special file and --color=auto and TERM says colorization is possible. Most importantly, set exit_on_match if the output is /dev/null.
  * tests/grep-dev-null-out: New test.
  * tests/Makefile.am (TESTS): Add it.
  * tests/status: Do not require grep to actually read all the input files when the output is /dev/null and a matching line has been found.
* maint: post-release administrivia | Jim Meyering | 2016-04-21 | 1 | -0/+3
  * NEWS: Add header line for next release.
  * .prev-version: Record previous version.
  * cfg.mk (old_NEWS_hash): Auto-update.
* version 2.25 (tag: v2.25) | Jim Meyering | 2016-04-21 | 1 | -1/+1
  * NEWS: Record release date.
* grep: in C locale, all bytes are valid characters | Paul Eggert | 2016-04-10 | 1 | -0/+6
  This works around glibc bug 19932: https://sourceware.org/bugzilla/show_bug.cgi?id=19932
  The actual bug fix was the update to the current version of Gnulib.
  grep problem reported by Björn Jacke in: http://bugs.gnu.org/23234
  * NEWS: Mention this.
  * doc/grep.texi (File and Directory Selection): Crossref to LC_* section. Suggest why -a or LC_ALL=C might be useful.
  (Environment Variables): Mention 'locale -a'. Say that LC_CTYPE also specifies encoding, and that every byte is a valid character in the C or POSIX locale.
  * tests/c-locale: New test.
  * tests/Makefile.am (TESTS): Add it.
* grep: -Pz no longer misdiagnoses [^a] | Paul Eggert | 2016-03-23 | 1 | -0/+3
  Problem reported by Michael Jess.
  * NEWS: Document this.
  * src/pcresearch.c (Pcompile): Do not diagnose [^ when [ is unescaped.
  * tests/pcre: Test for the bug.
* maint: move new 'Improvements' blurb into proper section | Jim Meyering | 2016-03-22 | 1 | -2/+3
  * NEWS (Improvements): Move this new section from within the block for the already-released 2.24 into the proper "next-release" block. Also, retain the 2-blank-line separator between blocks.
* grep: -oz now outputs null bytes, not newlines | Paul Eggert | 2016-03-18 | 1 | -0/+4
  * NEWS: Document this.
  * doc/grep.texi (Other Options): Clarify that -z affects output as well as input data.
  * src/grep.c (print_line_middle): Output eolbyte, not newline, if -o.
  * tests/null-byte: Test -o too.
  * tests/pcre-context: Adjust test to match new behavior.
* grep: use errno consistently in write diagnostics | Paul Eggert | 2016-03-17 | 1 | -0/+6
  Feature request and initial version reported by Assaf Gordon in: http://bugs.gnu.org/23031
  * NEWS: Document this.
  * src/grep.c: Include <stdarg.h>.
  (stdout_errno): New static var.
  (write_error_seen): Remove; superseded by stdout_errno. All uses changed.
  (putchar_errno, fputs_errno, printf_errno, fwrite_errno)
  (fflush_errno): New static functions.
  (print_filename, print_sep, print_offset, print_line_head)
  (print_line_middle, print_line_tail, prline, prtext, grep)
  (grepdesc): Use them.
  * tests/write-error-msg: New file.
  * tests/Makefile.am (TESTS): Add it.
* maint: post-release administrivia | Jim Meyering | 2016-03-10 | 1 | -0/+3
  * NEWS: Add header line for next release.
  * .prev-version: Record previous version.
  * cfg.mk (old_NEWS_hash): Auto-update.
* version 2.24 (tag: v2.24) | Jim Meyering | 2016-03-10 | 1 | -1/+1
  * NEWS: Record release date.
* grep: -Pz is incompatible with ^ and $ | Paul Eggert | 2016-02-21 | 1 | -0/+7
  Problem reported by Sergei Trofimovich in: http://bugs.gnu.org/22655
  * NEWS: Document this.
  * src/pcresearch.c (Pcompile): Warn with -Pz and anchors.
  * tests/pcre: Test new behavior.
* grep -z: avoid erroneous match with regexp anchor and \n in text | Jim Meyering | 2016-02-20 | 1 | -0/+13
  * src/dfasearch.c (EGexecute): Clear the newline_anchor bit when eolbyte is not '\n'.
  * tests/z-anchor-newline: New file.
  * tests/Makefile.am (TESTS): Add it.
  * NEWS (Bug fixes): Describe it.
  Originally reported by Ulrich Mueller in https://bugs.gentoo.org/show_bug.cgi?id=574662
  Reported to us by Sergei Trofimovich as http://debbugs.gnu.org/22655
* maint: post-release administrivia | Jim Meyering | 2016-02-04 | 1 | -0/+3
  * NEWS: Add header line for next release.
  * .prev-version: Record previous version.
  * cfg.mk (old_NEWS_hash): Auto-update.
* version 2.23 (tag: v2.23) | Jim Meyering | 2016-02-04 | 1 | -1/+1
  * NEWS: Record release date.
* maint: fix typo in NEWS: s/a/an/ | Jim Meyering | 2016-01-23 | 1 | -1/+1
* grep: -x now supersedes -w more consistently | Paul Eggert | 2016-01-15 | 1 | -1/+4
  * NEWS, doc/grep.texi (Matching Control): Mention this.
  * src/dfasearch.c (EGexecute):
  * src/pcresearch.c (Pcompile): Don't get confused by -w if -x is also present.
  * src/pcresearch.c (Pcompile): Remove misleading comment about non-UTF-8 multibyte locales, as PCRE doesn't support them. Calculate buffer sizes more carefully; the old method allocated a buffer slightly too big, seemingly due to luck.
  * tests/backref-word, tests/pcre: Add tests for this bug.
* doc: mention unibyte encoding fix | Paul Eggert | 2016-01-07 | 1 | -0/+5
  * NEWS: Document recent fix for encoding errors in unibyte locales.
* maint: update copyright year, bootstrap, init.sh | Jim Meyering | 2016-01-01 | 1 | -1/+1
  Run "make update-copyright" and then...
  * gnulib: Update to latest.
  * tests/init.sh: Update from gnulib.
  * bootstrap: Likewise.
* doc: clarify text vs binary match output | Paul Eggert | 2015-12-31 | 1 | -5/+9
  * NEWS:
  * doc/grep.texi (File and Directory Selection): Make it clearer that grep can now output matching text before reporting a binary match.
  Problem reported by Norihiro Tanaka in: http://bugs.gnu.org/20526#83