delta/grep.git - git.savannah.gnu.org: git/grep.git

	Commit message	Author	Age	Files	Lines
*	maint: update copyright dates	Jim Meyering	2023-01-01	1	-1/+1
*	maint: make update-copyright	Jim Meyering	2022-01-01	1	-1/+1
*	maint: run "make update-copyright"	Paul Eggert	2021-01-01	1	-1/+1
*	grep: fix more Turkish-eyes bugs	Paul Eggert	2020-09-23	1	-3/+15
	Fix more bugs recently uncovered by Norihiro Tanaka (Bug#43577).
	* NEWS: Mention new bug report.
	* src/grep.c (ok_fold): New static var.
	(setup_ok_fold): New function.
	(fgrep_icase_charlen): Reject single-byte characters if they match some multibyte characters when ignoring case. This part of the patch is partly derived from <https://bugs.gnu.org/43577#14>, which means it is:
	Co-authored-by: Norihiro Tanaka <noritnk@kcn.ne.jp>
	(main): Call setup_ok_fold if ok_fold might be needed.
	* src/searchutils.c (kwsinit): With the grep.c changes, this code can now revert to classic 7th Edition Unix style; aborting would be wrong.
	* tests/turkish-eyes: Add tests for these bugs.
*	maint: update all copyright year number ranges	Jim Meyering	2020-01-01	1	-1/+1
	Run "make update-copyright" and then...
	* gnulib: Update to latest with copyright year adjusted.
	* tests/init.sh: Sync with gnulib to pick up copyright year.
	* bootstrap: Likewise.
	* doc/grep.in.1: Use "-" in copyright year ranges, not \en.
*	maint: update all copyright dates via "make update-copyright"	Jim Meyering	2019-01-01	1	-1/+1
	* gnulib: Also update submodule for its copyright updates.
*	maint: update URLs	Paul Eggert	2018-04-21	1	-1/+1
	Mostly this is just changing http: to https:. In one or two places it removes no-longer-useful URLs.
*	maint: update gnulib and copyright dates for 2018	Jim Meyering	2018-01-06	1	-1/+1
	* gnulib: Update to latest.
	* all files: Run "make update-copyright".
	* bootstrap: Update from gnulib.
*	maint: update gnulib and copyright dates for 2017	Jim Meyering	2017-01-01	1	-1/+1
	* gnulib: Update to latest.
	* all files: Run "make update-copyright".
*	maint: update copyright year, bootstrap, init.sh	Jim Meyering	2016-01-01	1	-1/+1
	Run "make update-copyright" and then...
	* gnulib: Update to latest.
	* tests/init.sh: Update from gnulib.
	* bootstrap: Likewise.
*	maint: update copyright year ranges to include 2015	Jim Meyering	2015-01-01	1	-1/+1
	Run "make update-copyright". Also, ...
	* grep.texi: Update manually, converting each "--" to "-".
*	dfasearch: skip kwset optimization when multi-byte+case-insensitive	Norihiro Tanaka	2014-01-26	1	-2/+4
	Now that DFA searching works with multi-byte locales, the only remaining reason to case-convert the searched input is the kwset optimization. But multi-byte case-conversion is so expensive that it's not worthwhile even to attempt that optimization.
	* src/dfasearch.c (kwsmusts): Skip this function in ignore-case mode when the locale is multi-byte.
	(EGexecute): Now that this code need not handle multi-byte case-ignoring matches, remove the expensive copy/case-conversion code. With no case-converted buffer, there is no longer any need to call mb_case_map_apply, so remove it and associated code.
	(kwsincr_case): Remove function. Now, every use of this function is equivalent to a use of kwsincr. Replace all uses.
	* tests/turkish-eyes: Test all of -E, -F and -G.
*	tests: remove superfluous uses of printf	Pádraig Brady	2014-01-10	1	-2/+2
	* tests/turkish-eyes: Remove unnecessary uses of printf.
*	grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales	Jim Meyering	2014-01-09	1	-0/+44
	These days, nearly everyone uses a multibyte locale, and grep is often used with the --ignore-case (-i) option, but that option imposes a very high cost in order to handle some unusual cases in just a few multibyte locales. This change gets most of the performance of using LC_ALL=C without eliminating the ability to search for multibyte strings.

	With the following example, I see an 11x speed-up with a 2.3GHz i7. Generate a 10M-line file, with each line consisting of 40 'j's:

	    yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > k

	Time searching it for the simple/nonexistent string "foobar", first with this patch (best-of-5 trials):

	    LC_ALL=en_US.UTF-8 env time src/grep -i foobar k
	    1.10 real 1.03 user 0.07 sys

	Back out that commit (temporarily), recompile, and rerun the experiment:

	    git log -1 -p|patch -R -p1; make
	    LC_ALL=en_US.UTF-8 env time src/grep -i foobar k
	    12.50 real 12.41 user 0.08 sys

	The trick is to realize that for some search strings, it is easy to convert to an equivalent one that is handled much more efficiently. E.g., convert this command:

	    grep -i foobar k

	to this:

	    grep '[fF][oO][oO][bB][aA][rR]' k

	That allows the matcher to search in buffer mode, rather than having to extract/case-convert/search each line separately. Currently, we perform this conversion only when search strings contain neither '\' nor '['. See the comments for more detail.

	* src/main.c (trivial_case_ignore): New function.
	(main): When possible, transform the regexp so we can drop the -i.
	* tests/turkish-eyes: New file.
	* tests/Makefile.am (TESTS): Use it.
	* NEWS (Improvements): Mention it.
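
The keyword rewrite described in the 2014-01-09 commit message can be illustrated with a minimal C sketch. This is not the code of trivial_case_ignore in src/main.c: the function name fold_to_brackets is hypothetical, and the sketch assumes ASCII-only input, rejecting any byte >= 0x80 in addition to the '\' and '[' cases that the commit message names. Handling multibyte characters would presumably need wide-character case mapping, which is left out here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an illustration, not grep's implementation).
   Return a newly allocated pattern in which each ASCII letter of
   KEYWORD is replaced by a two-character bracket expression, e.g.
   "f" becomes "[fF]".  Other bytes are copied unchanged.
   Return NULL if KEYWORD contains '\\' or '[' (the cases the commit
   message excludes), contains a non-ASCII byte (an assumption of this
   sketch), or if memory allocation fails.  The caller frees the result.  */
char *
fold_to_brackets (const char *keyword)
{
  assert (keyword != NULL);

  size_t len = strlen (keyword);
  /* Each input byte expands to at most 4 output bytes: "[xX]".  */
  char *out = malloc (4 * len + 1);
  if (!out)
    return NULL;

  char *p = out;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) keyword[i];

      if (c == '\\' || c == '[' || c >= 0x80)
        {
          free (out);
          return NULL;
        }

      int is_lower = c >= 'a' && c <= 'z';
      int is_upper = c >= 'A' && c <= 'Z';
      if (is_lower || is_upper)
        {
          /* In ASCII, the two cases of a letter differ only in bit 0x20.  */
          *p++ = '[';
          *p++ = (char) (c | 0x20);   /* lowercase form */
          *p++ = (char) (c & ~0x20);  /* uppercase form */
          *p++ = ']';
        }
      else
        *p++ = (char) c;
    }
  *p = '\0';
  return out;
}
```

With this sketch, fold_to_brackets ("foobar") yields the pattern shown in the commit message, [fF][oO][oO][bB][aA][rR], which can be matched without -i. A NULL result means the caller should fall back to the ordinary case-insensitive search path.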