| field | value | date |
|---|---|---|
| author | Jim Meyering <meyering@redhat.com> | 2012-06-01 21:18:00 +0200 |
| committer | Jim Meyering <meyering@redhat.com> | 2012-06-02 11:06:08 +0200 |
| commit | 7aa698d36b5b2eeb8e90e7a327eb7ebe46d59e87 | |
| tree | cadef59a85a9ffe401994e8addeb1a995394c279 /bootstrap | |
| parent | 2665746b756bd372ba856e165388dc98032362fd | |
grep: fix how -i works with a match containing the Turkish I-with-dot
Fix a long-standing problem in the way grep's -i interacts with
data whose byte count changes when we convert it to lower case.
For example, the UTF-8 Turkish I-with-dot (İ) occupies two bytes,
but its lower case analog, i, occupies just one byte. The code
converts both the search string and the haystack data to lower case,
and then searches for the modified string in the modified buffer.
The trouble arose when an <offset,length> pair computed relative to
the lowercase buffer was used to manipulate the original (longer) buffer.
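To make the failure concrete, here is a minimal C sketch built on a hypothetical toy lowercaser, `toy_lower`, that handles only ASCII letters and the UTF-8 encoding of İ (bytes 0xC4 0xB0). It is not grep's mbtolower and ignores locales entirely; it exists only to reproduce the byte-count change.

```c
#include <stddef.h>

/* Hypothetical toy lowercaser, for illustration only (NOT grep's mbtolower).
   Lowercases ASCII letters and turns the two-byte UTF-8 sequence for
   U+0130 (İ, bytes 0xC4 0xB0) into the one-byte 'i', so the output can
   be shorter than the input.  DST must have room for N bytes.
   Returns the number of bytes written to DST.  */
static size_t
toy_lower (const char *src, size_t n, char *dst)
{
  size_t i = 0, j = 0;
  while (i < n)
    {
      unsigned char c = (unsigned char) src[i];
      if (c == 0xC4 && i + 1 < n && (unsigned char) src[i + 1] == 0xB0)
        {
          dst[j++] = 'i';   /* İ: 2 bytes in, 1 byte out */
          i += 2;
        }
      else
        {
          /* ASCII A-Z -> a-z; every other byte is copied unchanged.  */
          dst[j++] = (char) (c >= 'A' && c <= 'Z' ? c - 'A' + 'a' : c);
          i++;
        }
    }
  return j;
}
```

Lowercasing the 4-byte input "x\xC4\xB0y" (xİy) yields the 3-byte "xiy". A search for y in the lowercase copy reports offset 2, but byte 2 of the original is 0xB0, the second byte of İ, so applying that offset to the original buffer lands in the middle of a character.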
The solution is to change mbtolower to return additional information:
a malloc'd mapping vector. With it, the caller converts an
<offset,length> pair relative to the lowercase buffer into one that
refers to the original buffer. The mapping is used only when lengths
actually differ, so in general the cost should be small.
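One possible shape for such a mapping vector is sketched below in C. Everything here is an assumption for illustration: the function name `toy_lower_map`, the one-`signed char`-per-output-byte encoding, and the toy case rules (ASCII plus İ) are not taken from grep's mbtolower.

```c
#include <stdlib.h>

/* Hypothetical sketch, not grep's code.  Lowercases SRC[0..N) into DST
   (room for N bytes; these toy rules never lengthen the text) using
   ASCII A-Z -> a-z and U+0130 İ (0xC4 0xB0) -> 'i'.
   If any character changed byte length, *MAP_P receives a malloc'd
   vector with one entry per output byte: how many more bytes the
   original character occupied than the lowercase byte that replaced it
   (here always 0 or 1).  If nothing changed, *MAP_P is NULL so callers
   can skip the offset adjustment.  Returns the output length.  */
static size_t
toy_lower_map (const char *src, size_t n, char *dst, signed char **map_p)
{
  signed char *map = malloc (n ? n : 1);
  size_t i = 0, j = 0;
  int changed = 0;
  if (map == NULL)
    abort ();   /* a real program would report the allocation failure */
  while (i < n)
    {
      unsigned char c = (unsigned char) src[i];
      if (c == 0xC4 && i + 1 < n && (unsigned char) src[i + 1] == 0xB0)
        {
          map[j] = 1;       /* 2 original bytes became 1 */
          dst[j++] = 'i';
          i += 2;
          changed = 1;
        }
      else
        {
          map[j] = 0;       /* same length as the original byte */
          dst[j++] = (char) (c >= 'A' && c <= 'Z' ? c - 'A' + 'a' : c);
          i++;
        }
    }
  if (!changed)
    {
      free (map);
      map = NULL;
    }
  *map_p = map;
  return j;
}
```

For "x\xC4\xB0y" the map is {0, 1, 0}: only the output byte i stands for a longer original character. For pure ASCII input such as "AbC" no length changes, so the map comes back NULL and a caller can skip the adjustment step entirely.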
* src/searchutils.c (mbtolower): Add the new map parameter.
* src/search.h (mb_case_map_apply): New function.
* src/kwsearch.c (Fexecute): Update mbtolower caller, and upon
success, apply the new map.
* src/dfasearch.c (EGexecute): Likewise.
* tests/Makefile.am (XFAIL_TESTS): Remove turkish-I from this list;
that test is no longer expected to fail.
* NEWS (Bug fixes): Mention it.
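The "apply the new map" step can be sketched as a small adjustment function. It is modeled loosely on the role the ChangeLog gives mb_case_map_apply, but the name `toy_map_apply`, its signature, and the map encoding (one signed entry per lowercase byte, giving how many more bytes the corresponding original character occupied) are assumptions, not grep's actual definitions.

```c
#include <stddef.h>

/* Hypothetical helper in the spirit of mb_case_map_apply; the real
   function's signature and map encoding may differ.  Converts a match
   <*OFF, *LEN> relative to the lowercase buffer into one relative to
   the original buffer.  A NULL map means no lengths changed.  */
static void
toy_map_apply (const signed char *map, size_t *off, size_t *len)
{
  ptrdiff_t off_incr = 0, len_incr = 0;
  size_t k;
  if (map == NULL)
    return;
  for (k = 0; k < *off; k++)
    off_incr += map[k];        /* extra original bytes before the match */
  for (k = *off; k < *off + *len; k++)
    len_incr += map[k];        /* extra original bytes inside the match */
  *off += off_incr;
  *len += len_incr;
}
```

With the map {0, 1, 0} for xİy lowercased to xiy, a lowercase match of y at <2,1> becomes <3,1>, and a match of iy at <1,2> becomes <1,3>, which covers both bytes of İ. Summing the per-byte deltas costs time linear in the match end, which seems acceptable given that the map is used only when lengths actually differ.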
Reported by Ilya Basin in
http://thread.gmane.org/gmane.comp.gnu.grep.bugs/3413 and later
by Strahinja Kustudic in http://savannah.gnu.org/bugs/?36567