author     Jim Meyering <meyering@redhat.com>  2012-06-01 21:18:00 +0200
committer  Jim Meyering <meyering@redhat.com>  2012-06-02 11:06:08 +0200
commit     7aa698d36b5b2eeb8e90e7a327eb7ebe46d59e87
tree       cadef59a85a9ffe401994e8addeb1a995394c279 /bootstrap
parent     2665746b756bd372ba856e165388dc98032362fd
download   grep-7aa698d36b5b2eeb8e90e7a327eb7ebe46d59e87.tar.gz
grep: fix how -i works with a match containing the Turkish I-with-dot

Fix a long-standing problem in the way grep's -i interacts with data
whose byte count changes when we convert it to lower case. For example,
the UTF-8 Turkish I-with-dot (İ) occupies two bytes, but its lower case
analog, i, occupies just one byte. The code converts both search string
and the haystack data to lower case, and then searches for the modified
string in the modified buffer. The trouble arose when using a lowercase
buffer <offset,length> pair to manipulate the original (longer) buffer.

The solution is to change mbtolower to return additional information:
a malloc'd mapping vector. With that, the caller maps the
lowercase-relative <offset,length> to numbers that refer to the original
buffer. This mapping is used only when lengths actually differ, so the
cost in general should be small.

* src/searchutils.c (mbtolower): Add the new map parameter.
* src/search.h (mb_case_map_apply): New function.
* src/kwsearch.c (Fexecute): Update mbtolower caller, and upon
success, apply the new map.
* src/dfasearch.c (EGexecute): Likewise.
* tests/Makefile.am (XFAIL_TESTS): Remove turkish-I from this list;
that test is no longer expected to fail.
* NEWS (Bug fixes): Mention it.

Reported by Ilya Basin in
http://thread.gmane.org/gmane.comp.gnu.grep.bugs/3413
and later by Strahinja Kustudic in
http://savannah.gnu.org/bugs/?36567
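
The offset mapping described in the commit message can be illustrated with a
small sketch. This is not the code from this commit: the names sketch_tolower
and sketch_map_apply are hypothetical, the map layout (one original-buffer
offset per lowercase byte) is an assumption chosen for clarity, and the
lowercasing handles only ASCII letters and the UTF-8 bytes of U+0130
(0xC4 0xB0). Unlike the commit, the sketch always builds the map rather than
only when lengths differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a lowercasing routine that also
   returns a mapping vector.  map[j] is the offset in SRC of the character
   that produced lowercase byte j; map[*out_len] is N (one past the end).
   Returns a malloc'd NUL-terminated buffer, or NULL on allocation failure. */
char *
sketch_tolower (const char *src, size_t n, size_t *out_len, size_t **out_map)
{
  char *dst = malloc (n + 1);
  size_t *map = malloc ((n + 1) * sizeof *map);
  size_t i = 0, j = 0;

  if (!dst || !map)
    {
      free (dst);
      free (map);
      return NULL;
    }

  while (i < n)
    {
      unsigned char c = (unsigned char) src[i];
      map[j] = i;
      if (c == 0xC4 && i + 1 < n && (unsigned char) src[i + 1] == 0xB0)
        {
          /* Two-byte I-with-dot shrinks to a one-byte 'i'.  */
          dst[j++] = 'i';
          i += 2;
        }
      else
        {
          dst[j++] = (char) ((c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c);
          i++;
        }
    }
  map[j] = n;
  dst[j] = '\0';
  *out_len = j;
  *out_map = map;
  return dst;
}

/* Convert a lowercase-relative <offset,length> pair, in place, into a pair
   that refers to the original buffer.  */
void
sketch_map_apply (const size_t *map, size_t low_len, size_t *off, size_t *len)
{
  assert (*off <= low_len && *len <= low_len - *off);
  size_t start = map[*off];
  size_t end = map[*off + *len];
  *off = start;
  *len = end - start;
}
```

For the four-byte input 'a', 0xC4, 0xB0, 'b', the lowercase buffer is the
three bytes "aib". A match on "b" at lowercase <2,1> maps back to <3,1> in the
original, and a match on "i" at lowercase <1,1> maps back to <1,2>, covering
both bytes of the İ.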
Diffstat (limited to 'bootstrap')
0 files changed, 0 insertions, 0 deletions