delta/grep.git - git.savannah.gnu.org: git/grep.git

	Commit message	Author	Age	Files	Lines
*	grep: with -E, unmatched ')' matches itself	Paul Eggert	2014-06-27	2	-1/+4
	Problem reported by Nathan Weeks in: http://bugs.gnu.org/17856 * src/grep.c (Ecompile): Also specify RE_UNMATCHED_RIGHT_PAREN_ORD. * doc/grep.texi (Fundamental Structure), NEWS: Document this. * tests/ere.tests: Add a couple of tests for this. * tests/spencer1.tests: Fix exit status.
*	grep: fix --max-count=N (-m N) to stop reading after Nth match	Jim Meyering	2014-05-30	2	-0/+16
	With --max-count=N (-m N), grep is supposed to stop reading input after it has found the Nth match. However, a recent context-related change made it so grep would always read to end of file. * src/grep.c (prtext): Don't let a negative "out_after" value make "pending" line count negative. * tests/max-count-overread: New test, for this. * tests/Makefile.am (TESTS): Add it. * NEWS (Bug fixes): Mention it. * THANKS: Add names of two recent bug reporters. This bug was introduced by commit v2.18-139-g5122195. Reported by Marc Aldorasi in http://bugs.gnu.org/17640.
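The intended -m N behavior described above can be illustrated with a minimal Python sketch. It is not grep's prtext logic; the helper name first_n_matches and the substring test are assumptions made for illustration. The point is only that the reader stops consuming input once the Nth match is found.

```python
from typing import Iterator, List


def first_n_matches(lines: Iterator[str], needle: str, max_count: int) -> List[str]:
    """Collect at most max_count matching lines, then stop consuming input."""
    matches: List[str] = []
    if max_count <= 0:
        return matches
    for line in lines:
        if needle in line:
            matches.append(line)
            if len(matches) == max_count:
                break  # stop reading: the rest of the input stays unread
    return matches


# Hypothetical input: the stream is an iterator, so unread lines remain in it.
stream = iter(["a1", "b", "a2", "a3", "c"])
found = first_n_matches(stream, "a", 2)
remaining = list(stream)
```

After the call, the lines following the second match are still available in the stream, which is the property the new tests/max-count-overread test is described as checking for grep itself.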
*	dfa: fix bug with regex containing multiple begin/end-line constraints	Norihiro Tanaka	2014-05-28	2	-0/+29
	grep -E 'a(b$|c$)' would mistakenly match "aa". * src/dfa.c (dfamust): When resetting 'is' in OR, also reset 'begline' and 'endline' of 'must'. * NEWS (Bug fixes): Mention it. This bug was introduced via commit v2.18-85-g2c94326. Reported by Péter Radics in <http://bugs.gnu.org/17617>.
*	grep: --exclude-dir=FOO/ now ignores the trailing slash	Paul Eggert	2014-05-24	1	-0/+4
	Problem reported by Khaled Ziyaeen; see: http://bugs.gnu.org/17481 * NEWS, doc/grep.texi (File and Directory Selection): Document this. * src/grep.c (main): Implement this. * tests/include-exclude: Test this.
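A minimal Python sketch of the idea behind this change, assuming the option argument is simply normalized by dropping trailing slashes before it is matched against a directory name. The function name is_excluded_dir is hypothetical, and fnmatchcase stands in for grep's own glob matching.

```python
from fnmatch import fnmatchcase


def is_excluded_dir(dirname: str, exclude_pattern: str) -> bool:
    """Return True if dirname matches the pattern, ignoring trailing slashes."""
    # "FOO/" and "FOO//" are treated like "FOO"; a bare "/" is left alone.
    normalized = exclude_pattern.rstrip("/") or exclude_pattern
    return fnmatchcase(dirname, normalized)
```

With this normalization, is_excluded_dir("FOO", "FOO/") and is_excluded_dir("FOO", "FOO") give the same answer.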
*	tests: add test case for newline-count fix	Norihiro Tanaka	2014-05-17	2	-0/+29
	* tests/count-newline: New test. * tests/Makefile.am (TESTS): Add it.
*	tests: port mb-non-UTF8-performance to RHEL 6.5	Paul Eggert	2014-05-15	1	-1/+5
	* tests/mb-non-UTF8-performance (timeout): Use an integer, as 'timeout 1.234' doesn't work in EUC locales.
*	dfa: fix bug with \< etc in multibyte locales	Paul Eggert	2014-05-10	1	-3/+1
	Problem reported by Stephane Chazelas in: http://bugs.gnu.org/16867 * NEWS: Document the fix. * src/dfa.c (dfaoptimize): Remove any superset if changing from UTF-8 to unibyte, and if the pattern has no backreferences. (dfassbuild): In multibyte locales, treat \< \> \b \B as backreferences in the DFA, since the DFA relies on unibyte tests to check them. (dfacomp): Optimize after building the superset, so that dfassbuild can depend on d->multibyte. A downside is that dfaoptimize must remove supersets that are likely slower than the DFA after optimization, but that's been done in the above-described change. * tests/Makefile.am (XFAIL_TESTS): Remove word-delim-multibyte, since the test works now.
*	tests: add test case for -C 0 change	Paul Eggert	2014-05-10	2	-0/+28
	* tests/context-0: New test. * tests/Makefile.am (TESTS): Add it.
*	grep: fix -w match next to a multibyte letter	Paul Eggert	2014-05-05	1	-1/+9
	* NEWS: Document this. * src/dfasearch.c, src/kwsearch.c (WCHAR): Remove. (wordchar): New static function. * src/dfasearch.c (EGexecute): * src/kwsearch.c (Fexecute): Use the new functions, so that the code works correctly if a multibyte character adjacent to the match has two or more bytes. * src/search.h, src/searchutils.c (mb_prev_wc, mb_next_wc): New functions. * tests/word-delim-multibyte: Add a test for grep -w (which now passes), and a test for \> (which still fails). The \< test also still fails.
*	grep: fix encoding-error incompatibilities among regex, DFA, KWset	Paul Eggert	2014-05-05	3	-17/+44
	This follows up to http://bugs.gnu.org/17376 and fixes a different set of incompatibilities, namely between the regex matcher and the other matchers, when the pattern contains encoding errors. The GNU regex matcher is not consistent in this area: sometimes an encoding error matches only itself, and sometimes it matches part of a multibyte character. There is no documentation for grep's behavior in this area and users don't seem to care, and it's simpler to defer to the regex matcher for problematic cases like these. * NEWS: Document this. * src/dfa.c (ctok): Remove. All uses removed. (parse_bracket_exp, atom): Use BACKREF if a pattern contains an encoding error, so that the matcher will revert to regex. * src/dfasearch.c, src/grep.c, src/pcresearch.c, src/searchutils.c: Don't include dfa.h, since search.h now does that for us. * src/dfasearch.c (EGexecute): * src/kwsearch.c (Fexecute): In a UTF-8 locale, there's no need to worry about matching part of a multibyte character. * src/grep.c (contains_encoding_error): New static function. (main): Use it, so that grep -F is consistent with plain fgrep when the pattern contains an encoding error. * src/search.h: Include dfa.h, so that kwsearch.c can call using_utf8. * src/searchutils.c (is_mb_middle): Remove UTF-8-specific code. Callers now ensure that we are in a non-UTF-8 locale. The code was clearly wrong, anyway. * tests/fgrep-infloop, tests/invalid-multibyte-infloop: * tests/prefix-of-multibyte: Do not require that grep have a particular behavior for this test. It's OK to match (exit status 0), not match (exit status 1), or report an error (exit status 2), since the pattern contains an encoding error and grep's behavior is not specified for such patterns. Test only that KWset, DFA, and regex agree. * tests/prefix-of-multibyte: Add tests for ABCABC and __..._ABCABC___.
*	tests: improve coverage for prefix-of-multibyte	Paul Eggert	2014-05-04	1	-4/+9
	* tests/prefix-of-multibyte: Also test the regex version.
*	grep: make KWset and DFA agree about invalid sequences in patterns	Norihiro Tanaka	2014-05-04	2	-9/+13
	See: http://bugs.gnu.org/17376 * src/dfa.c (dfambcache): Don't cache invalid sequences, because they can't be represented by wide characters. (dfambcache, mbs_to_wchar): Return WEOF for invalid sequences. (ctok): New global variable. (parse_bracket_exp, atom, match_anychar, match_mb_charset): Don't allow WEOF. (lex): Set 'ctok'. * src/kwsearch.c (Fexecute): * src/searchutils.c (is_mb_middle): Don't check here. * tests/invalid-multibyte-infloop: Adjust to fixed behavior. * tests/prefix-of-multibyte: Add test cases for this bug.
*	misc: fix doc and test bugs re grep -z	Paul Eggert	2014-04-24	1	-2/+2
	Problem reported by Stephane Chazelas in: http://bugs.gnu.org/16871 * doc/grep.texi (Usage): Remove incorrect example with -P. * tests/pcre: Improve test so that it actually tests whether \s matches a newline.
*	tests: use consistent spelling for locale name, en_US.UTF-8	Jim Meyering	2014-04-23	1	-1/+1
	* tests/pcre-infloop: Spell locale name, en_US.UTF-8, consistently, converting this one use from "en_US.utf8", which would provoke a test failure on OS/X.
*	grep: -P now rejects invalid input sequences in UTF-8 locales	Paul Eggert	2014-04-21	2	-4/+3
	See <http://bugs.gnu.org/17245> and <http://bugs.exim.org/1468>. * NEWS: Document this. * src/pcresearch.c (Pexecute): Do not use PCRE_NO_UTF8_CHECK, as this leads to undefined behavior when the input is not UTF-8. * tests/pcre-infloop, tests/pcre-invalid-utf8-input: Exit status is now 2, not 1, when grep -P is given invalid UTF-8 data in a UTF-8 locale.
*	dfa: fix bug that caused NUL to be mishandled in patterns	Paul Eggert	2014-04-20	2	-0/+53
	This bug was introduced in the early-2012 patches that fixed some context-handling bugs. Bisecting found commit d8951d3f4e1bbd564809aa8e713d8333bda2f802 (2012-02-05 18:00:43 +0100), but it appears the underlying problem was introduced in commit 8b47c4cf6556933f59226c234b0fe984f6c77dc7 (2012-01-03 11:22:09 +0100). * NEWS: Mention bug fix. * src/dfa.c (char_context): Consider NUL to be a newline only if -z. * tests/Makefile.am (TESTS): Add null-byte. * tests/null-byte: New file.
*	tests: detect an infloop-inducing bug in grep -P (pcre-8.35)	Jim Meyering	2014-04-14	2	-0/+34
	* tests/pcre-infloop: New test. * tests/Makefile.am (TESTS): Add it.
*	grep: cleanup for empty-string fix	Paul Eggert	2014-04-11	1	-7/+31
	* NEWS: Document it. * src/dfasearch.c (GEAcompile): * src/kwsearch.c (Fcompile): Use C99-style decls to simplify. Avoid duplicate code. * tests/empty-line: Add some more tests like this.
*	grep: no match for the empty string included in multiple patterns	Norihiro Tanaka	2014-04-11	2	-0/+18
	* src/dfasearch.c (GEAcompile): Fix it. * src/kwsearch.c (Fcompile): Fix it.
*	tests: placate "make syntax-check" re compare arg ordering	Jim Meyering	2014-03-28	1	-1/+1
	* tests/euc-mb: Reverse order of arguments to compare. Be consistent in ordering compare arguments: expected followed by actual.
*	grep: perform the kwset-helping DFA match in narrower range	Norihiro Tanaka	2014-03-27	1	-2/+9
	When kwsexec gives us the offset of a potential match, we compute line begin/end and then run the DFA matcher to see if there really is a match on that line. When the beginning of the line, BEG, is not on a multibyte character boundary, advance BEG until it is on such a boundary, before running the DFA search. * src/dfasearch.c (EGexecute): As above. Add a comment. * tests/euc-mb: Add a test case that exercises this code. This addresses http://debbugs.gnu.org/17095.
*	tests: avoid false-positive failure on some AMD CPUs	Jim Meyering	2014-03-16	1	-2/+4
	* tests/mb-non-UTF8-performance: Avoid false-positive failure when run on certain AMD processors.
*	tests: make a performance-measuring test less system-sensitive	Jim Meyering	2014-03-10	2	-3/+29
	Andreas Schwab reported in http://debbugs.gnu.org/16941 that this test would time out and fail on m68k-suse-linux. Rather than testing absolute duration with a limit tuned to today's hardware, compare performance of grep with LC_ALL=C against that same command using LC_ALL=ja_JP.eucJP. * tests/init.cfg (require_hi_res_time_): New function. * tests/mb-non-UTF8-performance: Rewrite to use it: record absolute duration D of the first (normally much faster) command, and set a timeout of 8*D for the command running in an affected locale.
*	fgrep: fix case-fold incompatibility with plain 'grep'	Paul Eggert	2014-03-07	1	-0/+4
	fgrep converted to lowercase, whereas the regex code converted to uppercase. The resulting behaviors don't agree in offbeat cases like Greek sigmas and Turkish Is. Fix this by changing fgrep to agree with the regex code. * src/kwsearch.c (Fcompile, Fexecute): * src/searchutils.c (kwsinit, mbtoupper): Convert to uppercase, not to lowercase, for compatibility with plain 'grep'. * src/search.h, src/searchutils.c (mbtoupper): Rename from mbtolower, since it now converts to uppercase. All uses changed. * tests/case-fold-titlecase: Add tests for this.
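The disagreement between lowercase-based and uppercase-based folding can be seen in a small Python sketch. It uses Python's locale-independent str.upper and str.lower as stand-ins for the C library's wide-character case conversion, so it illustrates the idea only and is not a model of grep's code; both function names are hypothetical.

```python
def matches_via_upper(d: str, p: str) -> bool:
    """Data char d matches pattern char p if their uppercase forms agree."""
    return d.upper() == p.upper()


def matches_via_lower(d: str, p: str) -> bool:
    """Same comparison, but folding to lowercase instead."""
    return d.lower() == p.lower()


FINAL_SIGMA = "\u03c2"  # Greek small letter final sigma
SMALL_SIGMA = "\u03c3"  # Greek small letter sigma
DOTLESS_I = "\u0131"    # Latin small letter dotless i (Turkish)

# Both sigmas uppercase to U+03A3, but they are distinct lowercase letters.
sigma_upper = matches_via_upper(FINAL_SIGMA, SMALL_SIGMA)
sigma_lower = matches_via_lower(FINAL_SIGMA, SMALL_SIGMA)

# Dotless i uppercases to "I", the same as plain "i", but lowercases to itself.
i_upper = matches_via_upper(DOTLESS_I, "i")
i_lower = matches_via_lower(DOTLESS_I, "i")
```

In both pairs the uppercase comparison reports a match and the lowercase comparison does not, which is the kind of mismatch the commit removes by making fgrep fold the same way as the regex code.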
*	grep: fix case-fold mismatches between DFA and regex	Paul Eggert	2014-03-07	1	-11/+156
	The DFA code and the regex code didn't use the same semantics for case-folding. The regex code says that the data char d matches the pattern char p if uc (d) == uc (p). POSIX is unclear in this area; the simplest fix for now is to change the DFA code to agree with the regex code. See <http://bugs.gnu.org/16919>. * src/dfa.c (static_assert): New macro, if not already defined. (setbit_case_fold_c): Assume MB_CUR_MAX is 1 and that case_fold is nonzero; all callers changed. (setbit_case_fold_c, parse_bracket_exp, lex, atom): Case-fold like the regex code does. (lonesome_lower): New constant. (case_folded_counterparts): New function. (parse_bracket_exp): Prefer plain setbit when case-folding is not needed. * src/dfa.h (CASE_FOLDED_BUFSIZE): New constant. (case_folded_counterparts): New function decl. * src/main.c (trivial_case_ignore): Case-fold like the regex code does. (main): Try to improve comment re trivial_case_ignore. * tests/case-fold-titlecase: Add lots more test cases.
*	grep: fix bugs with -i and titlecase	Paul Eggert	2014-02-28	2	-0/+42
	* NEWS: Document this. * src/dfa.c (setbit_wc): Simplify. (setbit_c): Remove; no longer used. (setbit_case_fold_c, parse_bracket_exp, atom): Don't mishandle titlecase. For 'atom', this removes the need for the refactoring of Bug#16729. (lex): Use the slower approach only for letters that have a differing case. * tests/case-fold-titlecase: New file. * tests/Makefile.am (TESTS): Add it.
*	grep: fix multiple bugs with bracket expressions	Paul Eggert	2014-02-27	2	-0/+34
	* NEWS: Document this. * src/dfa.c (using_simple_locale): New function. (parse_bracket_exp): Handle bracket expressions like [a-[.z.]] correctly. Don't assume that dfaexec handles expressions like [^a-z] correctly, as they can match multiple characters in some locales. * tests/posix-bracket: New file. * tests/Makefile.am (TESTS): Add it.
*	align grep -Pw with grep -w	Stephane Chazelas	2014-02-25	2	-0/+32
	For the -w option, with -P, we used to look for the pattern surrounded by word boundaries. That's different from what grep -w does and what the documentation describes. Now align with grep -w and the documentation by using PCRE look-behind and look-ahead operators to match the pattern if it is not surrounded by word constituents. * src/pcresearch.c (Pcompile): Use (?<!\w)(?:...)(?!\w) rather than \b(?:...)\b. * NEWS (Bug fixes): Mention it. * tests/pcre-w: New file. * tests/Makefile.am (TESTS): Add it. This complements the fix for http://debbugs.gnu.org/16865
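The difference between the two wrappers shows up when the pattern itself starts or ends with a non-word character. The sketch below uses Python's re module as a stand-in for PCRE (both support \b, look-behind and look-ahead); the helper names and the sample pattern @foo are hypothetical.

```python
import re


def wrap_with_boundaries(pattern: str) -> str:
    """Old approach: surround the pattern with word-boundary assertions."""
    return r"\b(?:" + pattern + r")\b"


def wrap_with_lookarounds(pattern: str) -> str:
    """New approach: require that no word constituent is adjacent."""
    return r"(?<!\w)(?:" + pattern + r")(?!\w)"


# "@foo" preceded by a space: a whole-word match is expected.
old_spaced = re.search(wrap_with_boundaries("@foo"), "x @foo")
new_spaced = re.search(wrap_with_lookarounds("@foo"), "x @foo")

# "@foo" glued to a preceding letter: no whole-word match is expected.
old_glued = re.search(wrap_with_boundaries("@foo"), "x@foo")
new_glued = re.search(wrap_with_lookarounds("@foo"), "x@foo")
```

The \b form gets both cases backwards here, because a word boundary needs a word character on exactly one side; the look-around form only asks that the neighbors not be word constituents.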
*	grep -P: fix it so backreferences now work with -w and -x	Stephane Chazelas	2014-02-24	2	-0/+29
	To implement -w and -x, we bracket the search term with parentheses. However, that set of parentheses had the default semantics of "capturing", i.e., creating a backreferenceable matched quantity. Instead, use (?:...), to create a non-capturing group. * src/pcresearch.c (Pcompile): Use (?:...) rather than (...). * NEWS (Bug fixes): Mention it. * tests/pcre-wx-backref: New file. * tests/Makefile.am (TESTS): Add it. This addresses http://debbugs.gnu.org/16865
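Why the capturing wrapper breaks backreferences can be sketched with Python's re module standing in for PCRE. The -x style anchors, the helper names and the sample pattern are assumptions for illustration: an extra capturing group shifts the numbering, so the user's \2 ends up pointing at a different group.

```python
import re


def wrap_line_capturing(pattern: str) -> str:
    """Old wrapper: the added parentheses become group 1."""
    return "^(" + pattern + ")$"


def wrap_line_noncapturing(pattern: str) -> str:
    """New wrapper: (?:...) groups without taking a group number."""
    return "^(?:" + pattern + ")$"


# The user means: "a", then "b", then whatever group 2 ("b") matched.
user_pattern = r"(a)(b)\2"

shifted = re.search(wrap_line_capturing(user_pattern), "abb")
intact = re.search(wrap_line_noncapturing(user_pattern), "abb")
```

With the capturing wrapper, \2 refers to (a) rather than (b), so the line "abb" no longer matches; with the non-capturing wrapper the user's numbering is preserved.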
*	tests: test for the non-UTF8 multi-byte performance regression	Jim Meyering	2014-02-20	3	-0/+49
	Test for the just-fixed performance regression. With a 100-200x differential, it is reasonable to expect that a very slow system will be able to complete the designated task in a few seconds, while with the bug, even a very fast system would exceed the timeout. * tests/mb-non-UTF8-performance: New file. * tests/Makefile.am (TESTS): Add it. * tests/init.cfg (require_JP_EUC_locale_): New function.
*	tests: test [^^-^] in unibyte locales	Paul Eggert	2014-02-20	3	-0/+44
	This is a bug in the current dfa.c, which was reintroduced by the recent reversion from RRI. * tests/unibyte-negated-circumflex: New file. * tests/Makefile.am (TESTS): Add it. * tests/init.cfg (require_unibyte_locale): New function.
*	maint: remove vestiges of support for long-disabled --mmap option	Jim Meyering	2014-01-27	2	-21/+0
	This option was disabled in March of 2010, and began to elicit a warning in January of 2012. Its time has come. * doc/grep.in.1: Remove mention. * doc/grep.texi: Likewise. * src/main.c (GROUP_SEPARATOR_OPTION, usage, MMAP_OPTION) (long_options, main): Remove all traces. * tests/Makefile.am (check_PROGRAMS): Remove mention of ignore-mmap. * tests/ignore-mmap: Remove file. * NEWS (Maintenance): Mention it.
*	dfasearch: skip kwset optimization when multi-byte+case-insensitive	Norihiro Tanaka	2014-01-26	1	-2/+4
	Now that DFA searching works with multi-byte locales, the only remaining reason to case-convert the searched input is the kwset optimization. But multi-byte case-conversion is so expensive that it's not worthwhile even to attempt that optimization. * src/dfasearch.c (kwsmusts): Skip this function in ignore-case mode when the locale is multi-byte. (EGexecute): Now that this code need not handle multi-byte case-ignoring matches, remove the expensive copy/case-conversion code. With no case-converted buffer, there is no longer any need to call mb_case_map_apply, so remove it and associated code. (kwsincr_case): Remove function. Now, every use of this function is equivalent to a use of kwsincr. Replace all uses. * tests/turkish-eyes: Test all of -E, -F and -G.
*	tests: remove superfluous uses of printf	Pádraig Brady	2014-01-10	1	-2/+2
	* tests/turkish-eyes: Remove unnecessary uses of printf.
*	grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales	Jim Meyering	2014-01-09	2	-0/+45
	These days, nearly everyone uses a multibyte locale, and grep is often used with the --ignore-case (-i) option, but that option imposes a very high cost in order to handle some unusual cases in just a few multibyte locales. This change gets most of the performance of using LC_ALL=C without eliminating the ability to search for multibyte strings.
	With the following example, I see an 11x speed-up with a 2.3GHz i7:
	Generate a 10M-line file, with each line consisting of 40 'j's:
	  yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > k
	Time searching it for the simple/nonexistent string "foobar", first with this patch (best-of-5 trials):
	  LC_ALL=en_US.UTF-8 env time src/grep -i foobar k
	  1.10 real 1.03 user 0.07 sys
	Back out that commit (temporarily), recompile, and rerun the experiment:
	  git log -1 -p|patch -R -p1; make
	  LC_ALL=en_US.UTF-8 env time src/grep -i foobar k
	  12.50 real 12.41 user 0.08 sys
	The trick is to realize that for some search strings, it is easy to convert to an equivalent one that is handled much more efficiently. E.g., convert this command:
	  grep -i foobar k
	to this:
	  grep '[fF][oO][oO][bB][aA][rR]' k
	That allows the matcher to search in buffer mode, rather than having to extract/case-convert/search each line separately. Currently, we perform this conversion only when search strings contain neither '\' nor '['. See the comments for more detail.
	* src/main.c (trivial_case_ignore): New function. (main): When possible, transform the regexp so we can drop the -i. * tests/turkish-eyes: New file. * tests/Makefile.am (TESTS): Use it. * NEWS (Improvements): Mention it.
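The pattern rewrite described above can be sketched in a few lines of Python. This is an ASCII-only illustration, not the C function trivial_case_ignore from src/main.c: the name trivial_case_ignore_sketch is hypothetical, and declining non-ASCII patterns is a simplifying assumption made here so the sketch never produces a wrong answer for multibyte letters.

```python
from typing import Optional


def trivial_case_ignore_sketch(pattern: str) -> Optional[str]:
    """Rewrite a pattern so it matches case-insensitively without -i.

    Returns None when the rewrite is not attempted.
    """
    # As in the commit message, skip patterns containing '\' or '['.
    if "\\" in pattern or "[" in pattern:
        return None
    # Assumption of this sketch: handle ASCII only, decline anything else.
    if not pattern.isascii():
        return None
    pieces = []
    for ch in pattern:
        if ch.isalpha():
            # Each letter becomes a two-character bracket expression.
            pieces.append("[" + ch.lower() + ch.upper() + "]")
        else:
            pieces.append(ch)
    return "".join(pieces)


rewritten = trivial_case_ignore_sketch("foobar")
```

For the commit's own example, the sketch turns foobar into [fF][oO][oO][bB][aA][rR], and it declines patterns such as a\.b or [abc].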
*	tests: port Solaris 10 /bin/sh patch back to GNU/Linux	Paul Eggert	2014-01-07	4	-4/+5
	Problem reported by Jim Meyering. * tests/bre, tests/ere, tests/spencer1-locale: Prefer re_shell, not re_shell_. * tests/init.sh (re_shell): New var, which is exported instead of re_shell_.
*	Port to Solaris 10 /bin/sh.	Paul Eggert	2014-01-07	4	-3/+4
	Problem reported by Dagobert Michelsen in <http://bugs.gnu.org/16380>. * tests/bre, tests/ere, tests/spencer1-locale: Prefer re_shell_ to SHELL, if re_shell_ is set. * tests/init.sh (re_shell_): Export if it's used.
*	maint: update copyright dates for 2014	Jim Meyering	2014-01-01	39	-39/+39
	Do that by running "make update-copyright".
*	pcre: use PCRE_NO_UTF8_CHECK properly	Jim Meyering	2013-12-31	1	-7/+4
	In order to obtain the behavior we want, i.e., to disable error-on-invalid-UTF-in-input, apply this PCRE option in pcre_exec, not when compiling. * src/pcresearch.c (Pexecute): Use PCRE_NO_UTF8_CHECK here, ... (Pcompile): ...rather than here. * tests/pcre-invalid-utf8-input: Adjust test case to test for this.
*	pcre: tell grep -P to relax its stance on invalid multibyte chars	Santiago Ruano Rincón	2013-12-21	1	-0/+6
	Do not exit-2 for invalid UTF-8 characters. Just prior to this change, this command would match no lines and fail like this:
	  $ printf 'j\x82\nj\n'|LC_ALL=en_US.UTF-8 grep -P j|cat -A; echo $?
	  grep: invalid UTF-8 byte sequence in input
	  2
	After this change, the same command matches both lines, and succeeds:
	  jM-^B$
	  j$
	  0
	* src/pcresearch.c (Pcompile): Use PCRE_NO_UTF8_CHECK, too, and add a comment. * tests/pcre-utf8: Add a test and a comment. This change did not work with Debian unstable pcre-8.31-2 or with some 8.33 and 8.34-based versions, but does work with Fedora 20's 8.33 and with a built-from-latest source library. Based on a patch by Santiago Ruano Rincón. See http://bugs.gnu.org/15758/
*	tests: avoid FP failure due to exhausted memory	Jim Meyering	2013-12-21	1	-0/+3
	* tests/long-line-vs-2GiB-read: Don't declare the test "failed" when running out of memory. In that case, skip it.
*	grep: handle lines longer than INT_MAX on more systems	Jim Meyering	2013-12-18	2	-0/+23
	When trying to exercise some long-line-handling code, I ran these commands:
	  $ dd bs=1 seek=2G of=big < /dev/null; grep -l x big; echo $?
	  grep: big: Invalid argument
	  2
	grep should not have issued that diagnostic, and it should have exited with status 1, not 2. What happened? grep read the 2GiB of NULs, doubled its buffer size, copied the 2GiB into the new 4GiB buffer, and proceeded to call "read" with a byte-count argument of 2^32. On at least Darwin 12.5.0, that makes read fail with EINVAL. The solution is to use gnulib's safe_read wrapper.
	* src/main.c: Include "safe-read.h" (fillbuf): Use safe_read, rather than bare read. The latter cannot handle a read size of 2^32 on some systems. * bootstrap.conf (gnulib_modules): Add safe-read. * tests/long-line-vs-2GiB-read: New file. * tests/Makefile.am (TESTS): Add it. * NEWS (Bug fixes): Mention it.
*	tests: port to non-GNU sed	Jim Meyering	2013-11-25	1	-1/+1
	* tests/multibyte-white-space (utf8_space_characters): The generation of test inputs relied on GNU sed's interpretation of \<, but that is not portable, and caused spurious test failures. Adjust the sed regexp to work on all versions. Reported by Karl Dubost in http://bugs.gnu.org/15953.
*	grep: fix regression with -P vs. invalid UTF-8 input	Jim Meyering	2013-11-02	2	-0/+26
	* src/pcresearch.c (Pexecute): Don't abort upon unexpected PCRE-specific error code. Explicitly handle PCRE_ERROR_BADUTF8, and change the default to print a diagnostic including the unhandled integer PCRE error code and exit with status 2. * tests/pcre-invalid-utf8-input: New file. * tests/Makefile.am (TESTS): Add it. * NEWS (Bug fixes): Mention it. * THANKS: Update. Reported by Dave Reisner in http://bugs.gnu.org/15758.
*	grep: fix regression involving \s and \S	Jim Meyering	2013-11-02	2	-0/+37
	Commit v2.14-40-g01ec90b made \s and \S work with multi-byte characters, but it made it so any use like \s*, \s+, \s?, \s{3} would malfunction in a multi-byte locale. * src/dfa.c (lex): Also reset laststart. * tests/backslash-s-and-repetition-operators: New file. * tests/Makefile.am (TESTS): Add it. * NEWS (Bug fixes): Mention it. * THANKS: Update. Reported by Mirraz Mirraz in http://bugs.gnu.org/15773.
*	tests: port more tests to bourne shells with hex-challenged printf	Jim Meyering	2013-10-24	3	-3/+3
	* tests/pcre-utf8: Convert the hex \xHH literals for the euro symbol to octal \OOO. * tests/turkish-I: Likewise for "I with dot". * tests/turkish-I-without-dot: Likewise for another Turkish I: U+0131.
*	tests: port to bourne shells whose printf doesn't grok hex	Jim Meyering	2013-10-23	5	-13/+24
	Use octal escapes, not hex, in printf(1) format strings, and in one case, use $AWK's printf so we can continue to use the table of hex values. * tests/char-class-multibyte: Use printf octal escapes, not hex, for portability to shells like dash and Solaris 10's /bin/sh. * tests/backslash-s-vs-invalid-multitype: Likewise. * tests/surrogate-pair: Likewise. * tests/unibyte-bracket-expr: Count in decimal and convert to octal. * tests/multibyte-white-space (hex_printf): New function. Use it in place of printf so we can retain the table of hex digits without hitting the limitation of some bourne shells. Reported by Paul Eggert in http://bugs.gnu.org/15690#11
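Converting a byte sequence to the three-digit octal escapes that a plain printf(1) format string accepts is mechanical. The Python sketch below shows the arithmetic; the helper name to_octal_escapes is hypothetical and is not part of the test suite, which does this work in shell and awk.

```python
def to_octal_escapes(data: bytes) -> str:
    """Render each byte as a backslash followed by three octal digits."""
    return "".join("\\%03o" % byte for byte in data)


# UTF-8 encodings of the euro sign (U+20AC) and dotless i (U+0131).
euro = to_octal_escapes("\u20ac".encode("utf-8"))
dotless_i = to_octal_escapes("\u0131".encode("utf-8"))
```

The euro sign's UTF-8 bytes E2 82 AC come out as \342\202\254, and dotless i (C4 B1) as \304\261, forms that can be pasted into a printf format string on shells whose printf does not understand \xHH.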
*	tests: extend the multibyte-white-space test	Jim Meyering	2013-10-19	1	-19/+36
	* tests/multibyte-white-space (utf8_space_characters): Add more single-byte whitespace characters. Align RHS hex values and make the sed substitution less rigid, to accommodate. Also, ensure that grep '\S' exits with status 1.
*	tests: add a test for better coverage of some tricky code	Jim Meyering	2013-10-09	1	-0/+1
	* tests/spencer1.tests: Add a non-range bracket expression representing the same regexp, to cover the alternate code path, the one that does not require a regcomp/exec call to interpret the regexp.
*	tests: ensure neither \s nor \S matches an invalid multibyte character	Jim Meyering	2013-10-01	2	-0/+27
	* tests/backslash-S-vs-invalid-multitype: New file. Prompted by the bug report from Roman at http://savannah.gnu.org/bugs/?40009 * tests/Makefile.am (TESTS): Add it.