author | Jim Meyering <meyering@redhat.com> | 2012-06-16 12:32:35 +0200 |
---|---|---|
committer | Jim Meyering <meyering@redhat.com> | 2012-06-16 18:18:00 +0200 |
commit | 2be0c6591ea5d56ee7fbc364228f24b85f569a5c (patch) | |
tree | 094cd910c8362e8fb429cfced49d96459f47b784 /NEWS | |
parent | 074842d3e3054714a495252e582886f0e4ace4e4 (diff) | |
download | grep-2be0c6591ea5d56ee7fbc364228f24b85f569a5c.tar.gz |
grep -i: work also when converting to lower-case inflates byte count
Commit v2.12-16-g7aa698d addressed the case in which the lower-case
representation of an input character occupies fewer bytes than the original.
However, even with commit v2.12-20-g074842d, grep -i would still
misbehave when converting a character to lower-case increased its
byte count. The map-manipulation code assumed that the case conversion
could only shrink the byte count. With the consideration that it may
also inflate it, the deltas recorded in the map array must be signed,
and we must account for the one-to-two-or-more mapping when the
original-to-lower-case conversion causes the byte count to increase.
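
The bookkeeping described above can be sketched roughly as follows. This is an illustration only, not the code in src/searchutils.c: the names len_delta_t, case_pair, utf8_len, build_delta_map and lower_to_orig_offset are hypothetical, the lower-case mappings are supplied by the caller rather than taken from a locale, and the sign convention (lower-case length minus original length, subtracted when mapping back) is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical map-entry type: signed, so growth and shrinkage both fit.  */
typedef signed char len_delta_t;

/* A code point and its lower-case counterpart (supplied by the caller;
   this sketch does not consult any locale).  */
struct case_pair
{
  unsigned long orig_cp;
  unsigned long lower_cp;
};

/* Number of bytes needed to encode CP in UTF-8 (CP <= 0x10FFFF).  */
static size_t
utf8_len (unsigned long cp)
{
  return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

/* Write one map slot per byte of the lower-cased text.  The first byte of
   each lower-cased character records lower_len - orig_len; any remaining
   bytes of that character record zero.  Return the number of slots used.  */
static size_t
build_delta_map (struct case_pair const *pairs, size_t n_pairs,
                 len_delta_t *map)
{
  size_t k = 0;
  assert (pairs && map);
  for (size_t i = 0; i < n_pairs; i++)
    {
      size_t orig_len = utf8_len (pairs[i].orig_cp);
      size_t lower_len = utf8_len (pairs[i].lower_cp);
      map[k++] = (len_delta_t) ((int) lower_len - (int) orig_len);
      for (size_t j = 1; j < lower_len; j++)
        map[k++] = 0;
    }
  return k;
}

/* Convert a byte offset in the lower-cased text to the matching offset in
   the original text by subtracting the deltas accumulated before it.  */
static size_t
lower_to_orig_offset (len_delta_t const *map, size_t lower_off)
{
  long delta = 0;
  assert (map);
  for (size_t k = 0; k < lower_off; k++)
    delta += map[k];
  return (size_t) ((long) lower_off - delta);
}
```

For the text "aIb" lower-cased under Turkish rules to "aıb" (U+0049 to U+0131), the sketch yields the map {0, +1, 0, 0}, and byte offset 3 in the lower-cased text (the "b") maps back to offset 2 in the original. For "aİb" (U+0130 to U+0069) the map is {0, -1, 0}, and lower-cased offset 2 maps back to original offset 3.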
* src/searchutils.c (mbtolower): When a lower-case character occupies
more than one byte, set its remaining map slots to zero. Change the
type of the map to be signed, and compute the change in character
byte count as new_length - old_length.
* src/search.h: Include <stdint.h>, for decl of intmax_t.
(mb_case_map_apply): Adjust for signed increments:
each map entry is now signed.
(mb_len_map_t): Define type. Thanks to Paul Eggert for noticing
in review that using a bare "char" as the base type would be wrong on
systems for which it is an unsigned type (as with gcc's -funsigned-char).
* src/kwsearch.c (Fcompile, Fexecute): Likewise.
* src/dfasearch.c (kwsincr_case, EGexecute): Likewise.
* tests/turkish-I-without-dot: New test. Thanks to Paolo Bonzini
for the tip that in the tr_TR.utf8 locale, mapping "I" to lower case
increases the character's byte count.
* tests/Makefile.am (TESTS): Add it.
* tests/init.cfg (require_tr_utf8_locale_): New function.
* NEWS (Bug fixes): Expand the existing entry.
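
To illustrate why the signedness of the map's base type matters, here is a small sketch. It is not taken from the grep sources, and the helper names sum_signed, sum_unsigned and plain_char_is_signed are hypothetical. It sums the same deltas once through an explicitly signed type and once through an unsigned one, the latter standing in for a plain "char" on a system where char is unsigned.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Sum N deltas stored as explicitly signed bytes.  */
static long
sum_signed (signed char const *map, size_t n)
{
  long sum = 0;
  assert (map);
  for (size_t i = 0; i < n; i++)
    sum += map[i];
  return sum;
}

/* Sum N bytes read as unsigned, which is how a plain "char" behaves
   where char is unsigned (e.g., under gcc -funsigned-char).  */
static long
sum_unsigned (unsigned char const *map, size_t n)
{
  long sum = 0;
  assert (map);
  for (size_t i = 0; i < n; i++)
    sum += map[i];
  return sum;
}

/* Nonzero if plain "char" is a signed type in this build.  */
static int
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}
```

A delta of -1 stored in an unsigned byte reads back as UCHAR_MAX, so an accumulated offset adjustment would be far too large instead of slightly negative. Declaring the element type as "signed char" avoids depending on the result of plain_char_is_signed.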
Diffstat (limited to 'NEWS')
-rw-r--r-- | NEWS | 3 |
1 files changed, 3 insertions, 0 deletions
@@ -7,6 +7,9 @@ GNU grep NEWS -*- outline -*-
   grep -i, in a multi-byte locale, when matching a line containing a character
   like the UTF-8 Turkish I-with-dot (U+0130) (whose lower-case representation
   occupies fewer bytes), would print an incomplete output line.
+  Similarly, with a matched line containing a character (e.g., the Latin
+  capital I in a Turkish UTF-8 locale), where the lower-case representation
+  occupies more bytes, grep could print garbage.
   [bug introduced in grep-2.6]
 
   --include and --exclude can again be combined, and again apply to