delta/grep.git - git.savannah.gnu.org: git/grep.git

author	Jim Meyering <meyering@redhat.com>	2012-06-01 21:18:00 +0200
committer	Jim Meyering <meyering@redhat.com>	2012-06-02 11:06:08 +0200
commit	7aa698d36b5b2eeb8e90e7a327eb7ebe46d59e87 (patch)
tree	cadef59a85a9ffe401994e8addeb1a995394c279 /bootstrap
parent	2665746b756bd372ba856e165388dc98032362fd (diff)

grep: fix how -i works with a match containing the Turkish I-with-dot

Fix a long-standing problem in the way grep's -i interacts with data whose byte count changes when we convert it to lower case. For example, the UTF-8 Turkish I-with-dot (İ) occupies two bytes, but its lower case analog, i, occupies just one byte. The code converts both search string and the haystack data to lower case, and then searches for the modified string in the modified buffer. The trouble arose when using a lowercase buffer <offset,length> pair to manipulate the original (longer) buffer.

The solution is to change mbtolower to return additional information: a malloc'd mapping vector. With that, the caller maps the lowercase-relative <offset,length> to numbers that refer to the original buffer. This mapping is used only when lengths actually differ, so the cost in general should be small.

* src/searchutils.c (mbtolower): Add the new map parameter.
* src/search.h (mb_case_map_apply): New function.
* src/kwsearch.c (Fexecute): Update mbtolower caller, and upon success, apply the new map.
* src/dfasearch.c (EGexecute): Likewise.
* tests/Makefile.am (XFAIL_TESTS): Remove turkish-I from this list; that test is no longer expected to fail.
* NEWS (Bug fixes): Mention it.

Reported by Ilya Basin in http://thread.gmane.org/gmane.comp.gnu.grep.bugs/3413 and later by Strahinja Kustudic in http://savannah.gnu.org/bugs/?36567
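
The offset mapping described above can be illustrated with a small, simplified sketch. This is not the code from this commit: the names sketch_tolower and sketch_map_apply are hypothetical, the conversion handles only ASCII and the two-byte UTF-8 sequence C4 B0 (U+0130) rather than going through the locale's multibyte functions, and the map layout (one entry per lowercase byte, holding the number of extra bytes the original character occupied) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lower-casing routine that also returns a
   mapping vector.  Handles only ASCII and the UTF-8 sequence C4 B0
   (U+0130, I-with-dot), which it lower-cases to the single byte 'i'.
   Returns a malloc'd, NUL-terminated buffer and stores its length in
   *OUT_LEN.  Stores in *MAP a malloc'd vector with one entry per output
   byte: the number of extra bytes the source character occupied.
   Returns NULL on allocation failure.  */
static unsigned char *
sketch_tolower (const unsigned char *buf, size_t n,
                size_t *out_len, unsigned char **map)
{
  unsigned char *out = malloc (n + 1);  /* output is never longer than input */
  unsigned char *m = malloc (n + 1);
  size_t i = 0, o = 0;

  assert (out_len != NULL && map != NULL);
  if (!out || !m)
    {
      free (out);
      free (m);
      return NULL;
    }

  while (i < n)
    {
      if (i + 1 < n && buf[i] == 0xC4 && buf[i + 1] == 0xB0)
        {
          out[o] = 'i';   /* two input bytes become one output byte */
          m[o] = 1;       /* so the original was one byte longer here */
          i += 2;
        }
      else
        {
          unsigned char c = buf[i];
          out[o] = ('A' <= c && c <= 'Z') ? (unsigned char) (c - 'A' + 'a') : c;
          m[o] = 0;       /* same width in both buffers */
          i += 1;
        }
      o++;
    }

  out[o] = '\0';
  *out_len = o;
  *map = m;
  return out;
}

/* Convert a lowercase-relative <offset,length> pair, in place, into a
   pair that refers to the original buffer, by adding back the extra
   bytes recorded in MAP before and inside the match.  */
static void
sketch_map_apply (const unsigned char *map, size_t *off, size_t *len)
{
  size_t off_incr = 0, len_incr = 0, k;

  for (k = 0; k < *off; k++)
    off_incr += map[k];
  for (k = *off; k < *off + *len; k++)
    len_incr += map[k];

  *off += off_incr;
  *len += len_incr;
}
```

With the six-byte input C4 B0 'x' C4 B0 'y', the lowercase buffer is the four bytes "ixiy". A match for "iy" there is <2,2>; applying the map yields <3,3>, which covers the bytes C4 B0 'y' in the original buffer. A caller would skip the mapping step entirely when the two buffers have the same length, in line with the commit message's note that the map is needed only when lengths differ.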

Diffstat (limited to 'bootstrap')

0 files changed, 0 insertions, 0 deletions
