Problem reported by Jaroslav Skarvada (Bug#22357).
* NEWS: Document this and other recent performance fixes.
* src/grep.c (E_MATCHER_INDEX): New constant.
(all_single_byte_after_folding):
New function, split out from fgrep_icase_available.
(fgrep_icase_available): Use it.
(try_fgrep_pattern): New function, which also uses it.
(main): With two or more patterns, use try_fgrep_pattern to fix
performance regression. The number "two" here is just a heuristic.
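The commit message does not say what makes a pattern list eligible for the fgrep matcher. A minimal sketch of the idea follows. It assumes a list qualifies when it contains no byte that is special in a POSIX extended regular expression. The function names are hypothetical, and the real try_fgrep_pattern is more careful, for example about backslash escapes, -i and multibyte characters:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the newline-separated pattern list KEYS (LEN bytes)
   contains no byte that is special in a POSIX extended regular
   expression.  Backslashes are rejected outright instead of being
   interpreted, which keeps the check conservative.  */
static bool
patterns_are_fixed_strings (const char *keys, size_t len)
{
  static const char special[] = "\\^$.[]*+?{}()|";
  assert (keys || len == 0);
  for (size_t i = 0; i < len; i++)
    if (memchr (special, keys[i], sizeof special - 1))
      return false;
  return true;
}

/* Apply the "two or more patterns" heuristic from the commit message
   before trying the conversion.  */
static bool
worth_trying_fgrep (const char *keys, size_t len)
{
  size_t npatterns = 1;
  for (size_t i = 0; i < len; i++)
    npatterns += keys[i] == '\n';
  return 2 <= npatterns && patterns_are_fixed_strings (keys, len);
}
```

Because every backslash is rejected, the sketch may miss a convertible pattern such as a\.b. It never turns a real regular expression into a fixed string, though.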
Problem reported by Andreas Schwab (Bug#25180).
* NEWS: Document this.
* src/grep.c (finalize_input): Ignore EINVAL lseek failures.
* tests/Makefile.am (TESTS): Add proc.
* tests/proc: New file.
* NEWS: Mention early-exit options to avoid confusion. See:
http://lists.gnu.org/archive/html/grep-devel/2016-12/msg00007.html
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
* NEWS: Record release date.
* gnulib: Update to latest, for the dfa.c fix.
* NEWS (Bug fixes): Mention it.
* tests/false-match-mb-non-utf8: New file, with tests for this.
Based on tests from Stephane Chazelas.
* tests/Makefile.am (TESTS): Add it.
Introduced by commit v2.18-54-g3ef4c8e, a change that made grep use
its DFA matcher more aggressively. The malfunction arises only with
the DFA matcher, not with regex.
Reported by Stephane Chazelas in https://bugs.gnu.org/24975
Problem reported by Stephane Chazelas (Bug#24973).
This bug was fixed in Gnulib.
* NEWS: Document the fix.
* tests/init.cfg (require_ru_RU_koi8_r): Remove.
* tests/unibyte-bracket-expr: Add a test for the bug.
Call get-mb-cur-max directly instead of bothering with
require_ru_RU_koi8_r.
This reverts commit f6603c4e1e04dbb87a7232c4b44acc6afdf65fef,
as the extra performance is not worth the trouble for PCRE users.
Problem reported by Stephane Chazelas in:
http://bugs.gnu.org/22655#103
* NEWS: Document this and the next patch.
* src/dfasearch.c (EGexecute):
* src/grep.c (execute_fp_t):
* src/kwsearch.c (Fexecute):
* src/pcresearch.c (Pexecute):
First arg is now a const pointer again.
* src/grep.c (buf_has_encoding_errors): Now static.
* src/grep.h (buf_has_encoding_errors): Remove decl.
* src/search.h: Adjust decls.
* src/pcresearch.c (reflags): Remove. All uses removed.
(Pcompile, Pexecute): Do not use PCRE_MULTILINE.
* NEWS: Document this.
* src/pcresearch.c (Pcompile): Search a line at a time if -x is
used, since -x uses ^ and $.
* tests/pcre: Test this.
* NEWS, doc/grep.texi (Output Line Prefix Control):
Document this (Bug#24451).
* src/grep.c (offset_width): New static var.
(print_offset): Use it instead of arg. All callers changed.
(grep): Set it.
* tests/initial-tab: Test this.
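The sketch below illustrates what setting offset_width can involve. It assumes the width is derived from an upper bound on the offsets or line numbers to be printed, such as the input file's size. The commit does not say which bound grep uses, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Number of decimal digits needed to print MAX, the largest offset or
   line number expected for this input.  */
static int
digits_for (uintmax_t max)
{
  int width = 0;
  do
    width++;
  while ((max /= 10) != 0);
  return width;
}

/* Format POS right-aligned in a field of WIDTH characters, the way a
   "%*ju" conversion does; a WIDTH of 0 means no padding.  Return the
   number of characters that snprintf reports.  */
static int
format_offset (char *buf, size_t size, uintmax_t pos, int width)
{
  assert (0 <= width);
  return snprintf (buf, size, "%*ju", width, pos);
}
```

Computing the width once per input file, rather than passing it to each print_offset call, matches the commit's move from an argument to a static variable.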
* NEWS: Document this (Bug#24451).
* src/grep.c (print_line_head): Do not attempt to backspace output.
* tests/initial-tab: New test.
* tests/Makefile.am (TESTS): Add it.
* NEWS: Document this.
* src/grep.c (main): Do not exit right away with -L.
* tests/skip-read: Test for the fix.
* NEWS: Document this.
* src/grep.c (main): With -f /dev/null, don't bother to read the
input. This is what FreeBSD grep does.
* tests/Makefile.am (TESTS): Add skip-read.
* tests/skip-read: New file.
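The shortcut amounts to a predicate evaluated before any input is read. The minimal sketch below uses a hypothetical options struct. It also encodes the later -L fix from the commit above, and grep's actual condition in main may differ in detail:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the options that matter for the shortcut.  */
struct search_opts
{
  size_t npatterns;      /* number of patterns; 0 for -f /dev/null */
  bool invert;           /* -v: an empty pattern list then selects every line */
  bool list_nonmatching; /* -L: every file must be listed, so keep going */
};

/* Return true if no line can ever be selected and no per-file output
   is required, so grep could exit with status 1 without reading.  */
static bool
can_skip_reading (struct search_opts const *o)
{
  assert (o);
  return o->npatterns == 0 && !o->invert && !o->list_nonmatching;
}
```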
The performance improvement when output is /dev/null (commit
af6af288eac28951b5eee1eaaf373e22b2193b7b dated 2016-05-01)
breaks scripts that run "PROGRAM | grep PATTERN >/dev/null"
where PROGRAM dies when writing into a broken pipe.
Suppress the improvement if standard input is not seekable.
Problem reported by Gary Johnson (Bug#24941).
* NEWS: Document this.
* src/grep.c (seek_failed): New static var.
(seek_data_failed): Move decl earlier, to be next to seek_failed.
(file_must_have_nulls): Skip useless syscalls if seek_failed.
Lessen source-code nesting.
(reset): Set seek_failed and seek_data_failed.
Try lseek even on non-regular files.
(grep): New arg INEOF. All callers changed.
Do not clear seek_data_failed here, since 'reset' now does this.
(finalize_input): New static function.
(grepdesc): Use it.
(main): Do not exit on first match merely because output is
/dev/null.
* tests/grep-dev-null-out: Adjust to new behavior.
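The pipeline problem arises because grep leaves input unread: once grep exits, PROGRAM's next write fails with SIGPIPE or EPIPE. The minimal sketch below consumes the rest of standard input instead, seeking when possible and otherwise reading and discarding. The helper is hypothetical and does not reproduce finalize_input, which, per the Bug#25180 commit near the top of this log, must also tolerate EINVAL from lseek:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: consume whatever input remains on FD, so that a
   writer on the other end of a pipe is not killed by SIGPIPE when this
   process exits.  Seek to the end if FD is seekable; otherwise (for
   example, ESPIPE on a pipe) read and discard until end of file.
   Return true on success, false with errno set on a read error.  */
static bool
drain_rest_of_input (int fd)
{
  assert (0 <= fd);
  if (0 <= lseek (fd, 0, SEEK_END))
    return true;

  char buf[4096];
  for (;;)
    {
      ssize_t n = read (fd, buf, sizeof buf);
      if (n == 0)
        return true;          /* end of file: nothing left to drain */
      if (n < 0 && errno != EINTR)
        return false;         /* genuine read error */
    }
}
```

Reading until end of file keeps the writer alive, but it costs the speedup. That is why the commit enables the early exit only when standard input is seekable.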
Problem reported by Stephane Chazelas (Bug#22655).
* NEWS: Document this.
* doc/grep.texi (grep Programs): Warn about -Pz.
* src/pcresearch.c (reflags): New static var.
(multibyte_locale): Remove static var; now local to Pcompile.
(Pcompile): Check for (? and (* too. Set reflags instead of
dying when problematic operators are found.
(Pexecute): Use reflags to decide whether searches should
be multiline.
* tests/pcre: Test new behavior.
* NEWS (Bug fixes): Mention the HP-UX fix.
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
* NEWS: Record release date.
* NEWS (Bug fixes): Add an entry describing bug#24233.
The bug was fixed by commit v2.25-77-gad468bb, by chance.
From a suggestion by Marcello Perathoner (Bug#22838).
* NEWS, doc/grep.texi (File and Directory Selection): Document this.
* src/grep.c (print_line_head): Do not suppress later output lines
merely because an earlier output line would have had an encoding error.
* tests/encoding-error: Test for the new behavior.
* NEWS: Describe previous change.
* NEWS: Adjust to match previous change.
This follows up on the -iF performance improvement (Bug#23752).
* NEWS: Simplify description of -iF improvement.
* src/dfa.c: Do not include wctype.h.
(lonesome_lower, case_folded_counterparts): Move to localeinfo.c.
(CASE_FOLDED_BUFSIZE): Move to localeinfo.h.
* src/grep.c: Do not include wctype.h.
(lonesome_lower): Remove.
(fgrep_icase_available): Use case_folded_counterparts instead.
Do not call it for the same character twice.
Return false on wcrtomb failures (which should never happen).
(fgrep_to_grep_pattern, main): Simplify. Let fgrep_to_grep’s
caller fiddle with the global variables.
* src/localeinfo.c: Include <wctype.h>.
(lonesome_lower, case_folded_counterparts):
Move here from src/dfa.c. Return int, not unsigned int.
Verify that CASE_FOLDED_BUFSIZE is big enough.
* src/localeinfo.h (CASE_FOLDED_BUFSIZE): Now 32, so that
we don’t expose lonesome_lower’s size.
* src/searchutils.c (kwsinit): Return new kwset instead of
storing it via a pointer. All callers changed. Simplify a bit.
In a multibyte locale, if a pattern consists only of single-byte
characters, all of their case counterparts are also single-byte
characters, and the pattern has no invalid sequences, then grep -iF uses
the fgrep matcher, just as in a single-byte locale (Bug#23752).
* NEWS: Mention it.
* src/grep.c (lonesome_lower): New constant.
(fgrep_icase_available): New function.
(fgrep_to_grep_pattern): Simplify it.
(main): Use them.
* src/searchutils.c (kwsinit): New arg MB_TRANS; all uses changed.
Try the fgrep matcher for case-insensitive matching by grep -F in a
multibyte locale.
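The eligibility rule above can be sketched with the standard wide-character functions. This is a simplified, hypothetical check that consults only towupper and towlower. It therefore misses case counterparts that neither function returns, such as U+212A KELVIN SIGN, whose lowercase is 'k'. Catching those is the job of grep's lonesome_lower table, so treat the sketch as an illustration rather than a drop-in replacement:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Return true if WC and its towupper and towlower counterparts each
   encode as exactly one byte in the current locale.  */
static bool
folds_to_single_bytes (wint_t wc)
{
  wint_t variants[] = { wc, towupper (wc), towlower (wc) };
  for (size_t i = 0; i < sizeof variants / sizeof *variants; i++)
    {
      char buf[MB_LEN_MAX];
      mbstate_t st;
      memset (&st, 0, sizeof st);
      if (wcrtomb (buf, (wchar_t) variants[i], &st) != 1)
        return false;
    }
  return true;
}

/* Return true if PAT (LEN bytes) has no encoding errors and each of its
   characters, together with its case variants, is a single byte.  */
static bool
pattern_folds_to_single_bytes (const char *pat, size_t len)
{
  mbstate_t st;
  memset (&st, 0, sizeof st);
  for (size_t i = 0; i < len; )
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, pat + i, len - i, &st);
      if (n == (size_t) -1 || n == (size_t) -2)
        return false;           /* invalid or truncated sequence */
      if (n == 0)
        n = 1;                  /* a null byte is a one-byte character */
      if (n != 1 || !folds_to_single_bytes (wc))
        return false;
      i += n;
    }
  assert (len == 0 || pat);
  return true;
}
```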
* NEWS: Improve description of recent change.
* src/dfa.c: Improve commentary. Indent new code (and some
long-existing howlers) more in GNU style.
(dfa_state): Reorder members to make struct smaller on x86.
mb_trindex member is now state_num, not size_t, so that -1 is more
natural; all uses changed.
(struct dfa): Similarly for mb_trcount member.
(state_index): Compute values for new state components before
allocating the state, to make the code easier to understand.
(state_index, dfastate): Prefer A & ~B to other forms like (A & B)
!= A.
(dfastate, build_state, transit_state): In new code, prefer i++ to
++i in for-loop control.
(build_state, transit_state): In new code, prefer < to >.
(transit_state): Add to *PP in one assignment, rather than in a
loop. Prefer !x to x == NULL. Use xmalloc instead of xnmalloc,
since the size is a constant. Do the size calculation as a signed
integer constant expression, so that the compiler diagnoses any
overflow.
(transit_state, free_mbdata): Tune by looping from -1 to N - 1,
rather than from 0 to N - 1 with a separate instance for -1.
(dfaexec_main): Rewrite to avoid side effects in if-part.
(free_mbdata): Simplify.
In non-UTF-8 multibyte locales, matching the dot expression is very
slow, because the next state is calculated on demand. This change
caches the result for the typical case (Bug#21486).
Compare the run times of this command before and after this change,
on an i5-4570 CPU @ 3.20GHz using Rawhide (~Fedora 22), compiled
with gcc 5.1.1 20150618:
yes "$(printf 'a%38db\n' 0)" | head -1000000 >in
env LC_ALL=ja_JP.eucJP time -p \
src/grep .......................................... in
Before: 19.10
After : 0.55
* NEWS: Document this.
* src/dfa.c (struct dfa_state): New members curr_dependent, mb_trindex.
(MAX_TRCOUNT): New constant.
(struct dfa): New members mb_trans, mb_trcount.
(state_index): Initialize new members of struct dfa_state, and calculate
whether the positions for dot depend on the context of the next character.
(dfastate): Calculate follows positions for dot if enabled.
(realloc_trans_if_necessary): Allocate transition tables.
(build_state): Use new constant and reset transition tables.
(transit_state): Use cache for transition from a state with the dot
expression.
(free_mbdata): Deallocate transition tables.
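The fix is an instance of memoizing DFA transitions: compute the next state once, store it, and reuse it. Below is a minimal byte-indexed sketch of that idea with hypothetical names and a toy DFA for the pattern "a.". The commit's cache (the mb_trans tables) applies to transitions from states containing the dot expression in non-UTF-8 multibyte locales, which this toy does not model:

```c
#include <assert.h>
#include <stddef.h>

enum { NSTATES = 3, NBYTES = 256, UNKNOWN = -1 };

/* Toy slow path for the pattern "a." (an 'a' followed by any byte):
   state 0 = start, 1 = just saw 'a', 2 = matched (absorbing).  */
static int
toy_compute (int state, unsigned char byte)
{
  if (state == 0)
    return byte == 'a' ? 1 : 0;
  return 2;  /* from state 1 the dot accepts any byte; state 2 stays put */
}

/* Lazily filled transition table: each entry starts out UNKNOWN and is
   computed by the slow path at most once.  */
struct lazy_dfa
{
  int trans[NSTATES][NBYTES];
  int (*compute) (int, unsigned char);
  size_t slow_calls;            /* how often the slow path ran */
};

static void
lazy_dfa_init (struct lazy_dfa *d, int (*compute) (int, unsigned char))
{
  for (int s = 0; s < NSTATES; s++)
    for (int c = 0; c < NBYTES; c++)
      d->trans[s][c] = UNKNOWN;
  d->compute = compute;
  d->slow_calls = 0;
}

static int
lazy_dfa_next (struct lazy_dfa *d, int state, unsigned char byte)
{
  assert (0 <= state && state < NSTATES);
  int *slot = &d->trans[state][byte];
  if (*slot == UNKNOWN)
    {
      *slot = d->compute (state, byte);
      d->slow_calls++;
    }
  return *slot;
}

/* Run the DFA over the null-terminated TEXT and return the final state.  */
static int
lazy_dfa_run (struct lazy_dfa *d, const char *text)
{
  int state = 0;
  for (size_t i = 0; text[i]; i++)
    state = lazy_dfa_next (d, state, (unsigned char) text[i]);
  return state;
}
```

A second run over the same input hits only cached entries. The slow path's cost is therefore paid once per distinct (state, byte) pair, not once per input byte.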
Determining the file name and line number is a little tricky because
of the way the regular expressions are all concatenated onto a newline-
separated list. By the time grep would compile regular expressions,
the <filename,lineno> origin of each regexp was no longer available.
This patch adds a list of filename,first_lineno pairs, one per input
source, by which we can then map the ordinal regexp number to a
filename,lineno pair for the diagnostic.
* src/dfasearch.c (GEAcompile): When diagnosing an invalid regexp
specified via -f FILE, include the "FILENAME:LINENO: " prefix.
Also, when there are two or more lines with compilation failures,
diagnose all of them, rather than stopping after the first.
* src/grep.h (pattern_file_name): Declare it.
* src/grep.c: (struct FL_pair): Define type.
(fl_pair, n_fl_pair_slots, n_pattern_files, patfile_lineno):
Define globals.
(fl_add, pattern_file_name): Define functions.
(main): Call fl_add for each type of the following: -e argument,
-f argument, command-line-specified (without -e) regexp.
* tests/filename-lineno.pl: New file.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Improvements): Mention this.
Initially reported by Gunnar Wolf in https://bugs.debian.org/525214
Forwarded to grep's bug list by Santiago Ruano Rincón as
http://debbugs.gnu.org/23965
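The mapping from an ordinal regexp number back to a FILENAME:LINENO prefix can be sketched as a search over the per-source pairs. This is a minimal sketch with a hypothetical struct and function; grep's own struct FL_pair and pattern_file_name may store and search differently. It assumes each pair records the ordinal of its source's first pattern:

```c
#include <assert.h>
#include <stddef.h>

/* One entry per pattern source: the 1-based ordinal of the source's
   first pattern in the concatenated list, and the source's name.  */
struct fl_pair_sketch
{
  size_t first_ordinal;
  const char *filename;
};

/* Map ORDINAL to its source name and store its line number within that
   source in *LINENO.  PAIRS must be sorted by first_ordinal, and
   pairs[0].first_ordinal must be 1.  */
static const char *
pattern_source (const struct fl_pair_sketch *pairs, size_t npairs,
                size_t ordinal, size_t *lineno)
{
  assert (0 < npairs && pairs[0].first_ordinal == 1 && 1 <= ordinal);

  /* Binary search for the last pair whose first_ordinal <= ORDINAL.
     Invariant: pairs[lo].first_ordinal <= ORDINAL, and HI is either
     NPAIRS or a pair starting after ORDINAL.  */
  size_t lo = 0, hi = npairs;
  while (hi - lo > 1)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (pairs[mid].first_ordinal <= ordinal)
        lo = mid;
      else
        hi = mid;
    }
  *lineno = ordinal - pairs[lo].first_ordinal + 1;
  return pairs[lo].filename;
}
```

The search picks the last pair whose first ordinal does not exceed the target. As a result, an empty pattern file, whose pair shares its first ordinal with the next source, is skipped correctly.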
* NEWS: Don't claim 7x, as the value seems to be system-dependent.
* src/kwset.c (struct kwset.kwsexec, bmexec, acexec_trans, acexec):
* src/kwset.c, src/kwset.h (kwsalloc, kwsexec):
Don't put 'const' into the declaration when that is irrelevant to
the API. More generally, don't bother with 'const' on a local
variable when it is reasonably obvious to a reader that the variable
never changes. It would be overkill to add 'const' to all locals that
never change.
* src/kwset.c (U): Avoid unnecessary parens.
(treefails, memoff2_kwset, bmexec_trans, bmexec, cwexec, acexec_trans):
Prefer SIZE_MAX to (size_t) -1.
(bmexec_trans, cwexec, acexec_trans):
Remove attributes for static functions that no longer seem needed.
(memoff2_kwset): Rename from memchr2_kwset, since it returns
an offset, not a pointer. All uses changed.
(cwexec, acexec_trans) [lint]: Remove initialization that is no
longer needed; at least, GCC 6.1 x86-64 does not need it.
(acexec_trans): Clarify code by using nesting rather than 'continue'.
When searching for multiple fixed words, grep used the Commentz-Walter
algorithm, but this was O(m*n) and was very slow in the worst case.
For example:
- input: yes `printf %040d` | head -10000000
- word1: x0000000000000000000
- word2: x
This change instead uses the Aho-Corasick algorithm to search multiple
fixed words. It uses a high-quality trie-building function that is
already defined for Commentz-Walter in kwset.c.
With this change I see a 7x speed-up even for a typical case on
Fedora 21 with a 3.2GHz i5. Using best-of-5 trials for the benchmark:
find /usr/share/doc/ -type f |
LC_ALL=C time -p xargs.sh src/grep -Ff /usr/share/dict/linux.words >/dev/null
The results were:
real 11.37 user 11.03 sys 0.24 [without the change]
real 1.49 user 1.31 sys 0.15 [with the change]
* src/kwset.c (struct kwset): Add a new member 'mode'.
(kwsalloc): Use it.
All callers are changed.
(kwsincr): When using the Aho-Corasick algorithm, build tries in normal order.
(acexec_trans, acexec): New functions.
(kwsexec): Use it.
* src/kwset.h (kwsalloc): Update a prototype.
* NEWS (Improvements): Mention it.
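A compact, textbook Aho-Corasick automaton shows why the worst case goes away. Once the failure links are built, the scan makes exactly one table lookup per input byte, however the keywords overlap. This is a hypothetical standalone sketch with a dense 256-way goto table and a fixed node limit. It is not kwset.c's data structure, which reuses the Commentz-Walter trie builder:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { AC_MAX_NODES = 512, AC_ALPHABET = 256 };

struct ac
{
  int go[AC_MAX_NODES][AC_ALPHABET]; /* goto function; -1 = no edge yet */
  int fail[AC_MAX_NODES];            /* failure links */
  int out[AC_MAX_NODES];             /* nonzero if some keyword ends here */
  int nnodes;
};

static void
ac_init (struct ac *a)
{
  memset (a->go[0], -1, sizeof a->go[0]);
  a->fail[0] = a->out[0] = 0;
  a->nnodes = 1;
}

/* Add keyword WORD (LEN bytes) to the trie; return -1 if out of nodes.  */
static int
ac_add (struct ac *a, const char *word, size_t len)
{
  int s = 0;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = word[i];
      if (a->go[s][c] < 0)
        {
          if (a->nnodes == AC_MAX_NODES)
            return -1;
          int t = a->nnodes++;
          memset (a->go[t], -1, sizeof a->go[t]);
          a->fail[t] = a->out[t] = 0;
          a->go[s][c] = t;
        }
      s = a->go[s][c];
    }
  a->out[s] = 1;
  return 0;
}

/* Compute failure links breadth-first and complete the goto function,
   so that every state has an edge on every byte.  */
static void
ac_build (struct ac *a)
{
  int queue[AC_MAX_NODES];
  int head = 0, tail = 0;
  for (int c = 0; c < AC_ALPHABET; c++)
    {
      int t = a->go[0][c];
      if (t < 0)
        a->go[0][c] = 0;
      else
        queue[tail++] = t;      /* depth-1 nodes fail to the root */
    }
  while (head < tail)
    {
      int s = queue[head++];
      a->out[s] |= a->out[a->fail[s]]; /* inherit shorter keywords */
      for (int c = 0; c < AC_ALPHABET; c++)
        {
          int t = a->go[s][c];
          if (t < 0)
            a->go[s][c] = a->go[a->fail[s]][c];
          else
            {
              a->fail[t] = a->go[a->fail[s]][c];
              queue[tail++] = t;
            }
        }
    }
}

/* Return the offset just past the end of the first keyword occurrence
   in TEXT (LEN bytes), or -1 if there is none.  */
static long
ac_first_match_end (const struct ac *a, const char *text, size_t len)
{
  int s = 0;
  for (size_t i = 0; i < len; i++)
    {
      s = a->go[s][(unsigned char) text[i]];
      if (a->out[s])
        return (long) (i + 1);
    }
  return -1;
}
```

The sketch reports where the first occurrence ends. A grep-style matcher would also need the start of the leftmost match, which this sketch does not compute.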
* NEWS: Move the mention of the /dev/null speed-up from the
block for 2.25 into the current, in-preparation block.
This sped up 'seq 10000000000 | grep . >/dev/null' by a factor of
380,000 on my platform (Fedora 23, x86-64, AMD Phenom II X4 910e,
en_US.UTF-8 locale).
* NEWS: Document this.
* src/grep.c (grepbuf): exit_on_match no longer implies that -q
was specified, so when a match is found, exit with exit_failure if
an error was also found.
(grepdesc): Omit unnecessary S_ISREG and st_ino checks.
out_stat.st_ino is zero if stdout is not a regular file,
and this cannot possibly equal st->st_ino.
(main): Omit duplicate initialization of exit_failure. Do not
bother with isatty unless -q is not used and stdout is a character
special file and --color=auto and TERM says colorization is
possible. Most importantly, set exit_on_match if the output is
/dev/null.
* tests/grep-dev-null-out: New test.
* tests/Makefile.am (TESTS): Add it.
* tests/status: Do not require grep to actually read all the input
files when the output is /dev/null and a matching line has been
found.
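Setting exit_on_match requires recognizing that standard output is /dev/null. The minimal POSIX sketch below does this by comparing device and inode numbers; the helper name is hypothetical, and grep's own check may differ in detail. A later commit higher in this log (Bug#24941) stopped treating /dev/null output alone as a reason to exit on the first match:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return true if FD is a character device with the same device and
   inode numbers as "/dev/null".  */
static bool
fd_is_dev_null (int fd)
{
  struct stat fd_st, null_st;
  assert (0 <= fd);
  return (fstat (fd, &fd_st) == 0
          && S_ISCHR (fd_st.st_mode)
          && stat ("/dev/null", &null_st) == 0
          && fd_st.st_ino == null_st.st_ino
          && fd_st.st_dev == null_st.st_dev);
}
```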
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
* NEWS: Record release date.
This works around glibc bug 19932:
https://sourceware.org/bugzilla/show_bug.cgi?id=19932
The actual bug fix was the update to the current version of Gnulib.
grep problem reported by Björn Jacke in: http://bugs.gnu.org/23234
* NEWS: Mention this.
* doc/grep.texi (File and Directory Selection): Crossref to LC_*
section. Suggest why -a or LC_ALL=C might be useful.
(Environment Variables): Mention 'locale -a'.
Say that LC_CTYPE also specifies encoding, and that every
byte is a valid character in the C or POSIX locale.
* tests/c-locale: New test.
* tests/Makefile.am (TESTS): Add it.
Problem reported by Michael Jess.
* NEWS: Document this.
* src/pcresearch.c (Pcompile): Do not diagnose [^ when [ is unescaped.
* tests/pcre: Test for the bug.
* NEWS (Improvements): Move this new section from within the block
for the already-released 2.24 into the proper "next-release" block.
Also, retain the 2-blank-line separator between blocks.
* NEWS: Document this.
* doc/grep.texi (Other Options): Clarify that -z affects output
as well as input data.
* src/grep.c (print_line_middle): Output eolbyte, not newline, if -o.
* tests/null-byte: Test -o too.
* tests/pcre-context: Adjust test to match new behavior.
Feature request and initial version reported by Assaf Gordon in:
http://bugs.gnu.org/23031
* NEWS: Document this.
* src/grep.c: Include <stdarg.h>.
(stdout_errno): New static var.
(write_error_seen): Remove; superseded by stdout_errno.
All uses changed.
(putchar_errno, fputs_errno, printf_errno, fwrite_errno)
(fflush_errno): New static functions.
(print_filename, print_sep, print_offset, print_line_head)
(print_line_middle, print_line_tail, prline, prtext, grep)
(grepdesc): Use them.
* tests/write-error-msg: New file.
* tests/Makefile.am (TESTS): Add it.
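The pattern behind the new *_errno functions is to record the first output failure and report it once, rather than checking every call site separately. Below is a minimal sketch of two such wrappers. The names are hypothetical, and a FILE * parameter is added for testability, whereas the commit's stdout_errno suggests grep's versions write to standard output:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Errno value from the first failed output call, or 0 if none failed.  */
static int out_errno;

static void
note_output_failure (void)
{
  if (!out_errno)
    out_errno = errno ? errno : EIO; /* some libcs may leave errno unset */
}

/* Like fputs, but remember the first failure instead of ignoring it.  */
static bool
fputs_errno_sketch (const char *s, FILE *fp)
{
  assert (s && fp);
  if (fputs (s, fp) < 0)
    {
      note_output_failure ();
      return false;
    }
  return true;
}

/* Like fflush; buffered data may fail to be written only at this point.  */
static bool
fflush_errno_sketch (FILE *fp)
{
  if (fflush (fp) != 0)
    {
      note_output_failure ();
      return false;
    }
  return true;
}
```

At exit, a nonzero out_errno would be turned into a single diagnostic, for example with strerror (out_errno).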
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
* NEWS: Record release date.
Problem reported by Sergei Trofimovich in: http://bugs.gnu.org/22655
* NEWS: Document this.
* src/pcresearch.c (Pcompile): Warn with -Pz and anchors.
* tests/pcre: Test new behavior.
* src/dfasearch.c (EGexecute): Clear the newline_anchor bit when
eolbyte is not '\n'.
* tests/z-anchor-newline: New file.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Bug fixes): Describe it.
Originally reported by Ulrich Mueller in
https://bugs.gentoo.org/show_bug.cgi?id=574662
Reported to us by Sergei Trofimovich as http://debbugs.gnu.org/22655
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
* NEWS: Record release date.
* NEWS, doc/grep.texi (Matching Control): Mention this.
* src/dfasearch.c (EGexecute):
* src/pcresearch.c (Pcompile):
Don't get confused by -w if -x is also present.
* src/pcresearch.c (Pcompile): Remove misleading comment about
non-UTF-8 multibyte locales, as PCRE doesn't support them.
Calculate buffer sizes more carefully; the old method
allocated a buffer slightly too big, seemingly due to luck.
* tests/backref-word, tests/pcre: Add tests for this bug.
* NEWS: Document recent fix for encoding errors in unibyte locales.
Run "make update-copyright" and then...
* gnulib: Update to latest.
* tests/init.sh: Update from gnulib.
* bootstrap: Likewise.
* NEWS:
* doc/grep.texi (File and Directory Selection):
Make it clearer that grep can now output matching text before
reporting a binary match. Problem reported by Norihiro Tanaka in:
http://bugs.gnu.org/20526#83