summaryrefslogtreecommitdiff
path: root/tests/turkish-eyes
Commit message (Collapse)AuthorAgeFilesLines
* maint: update copyright datesJim Meyering2023-01-011-1/+1
|
* maint: make update-copyrightJim Meyering2022-01-011-1/+1
|
* maint: run "make update-copyright"Paul Eggert2021-01-011-1/+1
|
* grep: fix more Turkish-eyes bugsPaul Eggert2020-09-231-3/+15
| | | | | | | | | | | | | | | | | Fix more bugs recently uncovered by Norihiro Tanaka (Bug#43577). * NEWS: Mention new bug report. * src/grep.c (ok_fold): New static var. (setup_ok_fold): New function. (fgrep_icase_charlen): Reject single-byte characters if they match some multibyte characters when ignoring case. This part of the patch is partly derived from <https://bugs.gnu.org/43577#14>, which means it is: Co-authored-by: Norihiro Tanaka <noritnk@kcn.ne.jp> (main): Call setup_ok_fold if ok_fold might be needed. * src/searchutils.c (kwsinit): With the grep.c changes, this code can now revert to classic 7th Edition Unix style; aborting would be wrong. * tests/turkish-eyes: Add tests for these bugs.
* maint: update all copyright year number rangesJim Meyering2020-01-011-1/+1
| | | | | | | | Run "make update-copyright" and then... * gnulib: Update to latest with copyright year adjusted. * tests/init.sh: Sync with gnulib to pick up copyright year. * bootstrap: Likewise. * doc/grep.in.1: Use "-" in copyright year ranges, not \en.
* maint: update all copyright dates via "make update-copyright"Jim Meyering2019-01-011-1/+1
| | | | * gnulib: Also update submodule for its copyright updates.
* maint: update URLsPaul Eggert2018-04-211-1/+1
| | | | | Mostly this is just changing http: to https:. In one or two places it removes no-longer-useful URLs.
* maint: update gnulib and copyright dates for 2018Jim Meyering2018-01-061-1/+1
| | | | | | * gnulib: Update to latest. * all files: Run "make update-copyright". * bootstrap: Update from gnulib.
* maint: update gnulib and copyright dates for 2017Jim Meyering2017-01-011-1/+1
| | | | | * gnulib: Update to latest. * all files: Run "make update-copyright".
* maint: update copyright year, bootstrap, init.shJim Meyering2016-01-011-1/+1
| | | | | | | | Run "make update-copyright" and then... * gnulib: Update to latest. * tests/init.sh: Update from gnulib. * bootstrap: Likewise.
* maint: update copyright year ranges to include 2015Jim Meyering2015-01-011-1/+1
| | | | | Run "make update-copyright". Also, ... * grep.texi: Update manually, converting each "--" to "-".
* dfasearch: skip kwset optimization when multi-byte+case-insensitiveNorihiro Tanaka2014-01-261-2/+4
| | | | | | | | | | | | | | | | | Now that DFA searching works with multi-byte locales, the only remaining reason to case-convert the searched input is the kwset optimization. But multi-byte case-conversion is so expensive that it's not worthwhile even to attempt that optimization. * src/dfasearch.c (kwsmusts): Skip this function in ignore-case mode when the locale is multi-byte. (EGexecute): Now that this code need not handle multi-byte case-ignoring matches, remove the expensive copy/case-conversion code. With no case-converted buffer, there is no longer any need to call mb_case_map_apply, so remove it and associated code. (kwsincr_case): Remove function. Now, every use of this function is equivalent to a use of kwsincr. Replace all uses. * tests/turkish-eyes: Test all of -E, -F and -G.
* tests: remove superfluous uses of printfPádraig Brady2014-01-101-2/+2
| | | | * tests/turkish-eyes: Remove unnecessary uses of printf.
* grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte localesJim Meyering2014-01-091-0/+44
These days, nearly everyone uses a multibyte locale, and grep is often used with the --ignore-case (-i) option, but that option imposes a very high cost in order to handle some unusual cases in just a few multibyte locales. This change gets most of the performance of using LC_ALL=C without eliminating the ability to search for multibyte strings. With the following example, I see an 11x speed-up with a 2.3GHz i7: Generate a 10M-line file, with each line consisting of 40 'j's: yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > k Time searching it for the simple/noexistent string "foobar", first with this patch (best-of-5 trials): LC_ALL=en_US.UTF-8 env time src/grep -i foobar k 1.10 real 1.03 user 0.07 sys Back out that commit (temporarily), recompile, and rerun the experiment: git log -1 -p|patch -R -p1; make LC_ALL=en_US.UTF-8 env time src/grep -i foobar k 12.50 real 12.41 user 0.08 sys The trick is to realize that for some search strings, it is easy to convert to an equivalent one that is handled much more efficiently. E.g., convert this command: grep -i foobar k to this: grep '[fF][oO][oO][bB][aA][rR]' k That allows the matcher to search in buffer mode, rather than having to extract/case-convert/search each line separately. Currently, we perform this conversion only when search strings contain neither '\' nor '['. See the comments for more detail. * src/main.c (trivial_case_ignore): New function. (main): When possible, transform the regexp so we can drop the -i. * tests/turkish-eyes: New file. * tests/Makefile.am (TESTS): Use it. * NEWS (Improvements): Mention it.