author | Paolo Bonzini <bonzini@gnu.org> | 2008-11-23 17:28:46 +0100 |
---|---|---|
committer | Paolo Bonzini <bonzini@gnu.org> | 2010-03-17 07:56:45 +0100 |
commit | 70e236167c3973fc428d2b5b297218fde9b68e73 (patch) | |
tree | a22ee0eb6c9d4aaac49a239a98d4f04baa3f2864 /tests | |
parent | c32c042126a633d1bda23317410667cdd3a527f5 (diff) | |
download | grep-70e236167c3973fc428d2b5b297218fde9b68e73.tar.gz |
dfa: rewrite handling of multibyte case_fold lexing

Let dfacomp do the folding to lowercase of multibyte input strings,
and remove it from grep.c. Input strings to kwset.c are still folded
outside kwset.c, so we still need to do mbtolower in search.c.

* NEWS: Document bugfixes.
* .x-sc_cast_of_argument_to_free: Remove.
* src/dfa.c (wctok, addtok_wc): New.
(cur_mb_index, update_mb_len_index): Remove.
(FETCH): Do not call it.
(parse_bracket_exp_mb) [GREP]: Disable case-folding of ranges and
characters.
(addtok): Extract part to...
(addtok_mb): ... this new function.
(lex): Call fetch_wc in the main loop for MB_CUR_MAX > 1. Return WCHAR
for normal characters if MB_CUR_MAX > 1.
(atom): Handle WCHAR instead of treating multibyte characters specially.
Do case folding of multibyte characters here.
(dfacomp): Remove case_fold special casing.
* src/dfa.h (WCHAR): New.
* src/grep.c (mb_icase_keys): Remove.
(main): Do not call it.
* src/search.c (kwsinit): Init transition table only for MB_CUR_MAX == 1.
(mbtolower): New.
(kwsincr_case): New.
(kwsmusts): Call it instead of kwsincr.
(check_multibyte_string): Remove.
(check_multibyte_string_no_icase): Rename to check_multibyte_string.
(GEAcompile, EGexecute, Fcompile): Use mbtolower instead of the old
check_multibyte_string.
* tests/Makefile.am (TESTS): Add case-fold-backslash-w.
* tests/foad1.sh: Enable fixed tests.
* tests/case-fold-backslash-w: New.
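
The message says that strings handed to kwset.c are still folded to lowercase by an mbtolower helper in search.c. The body of that helper is not part of this tests-only view, so the following is only a minimal sketch of how such folding could be written with the standard C functions mbrtowc, towlower and wcrtomb. The name fold_mb_lower and its interface are assumptions made for illustration, not grep's actual code.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not grep's mbtolower): return a newly allocated,
   NUL-terminated lowercase copy of the N-byte multibyte string S in the
   current locale, or NULL if allocation fails.  */
char *
fold_mb_lower (const char *s, size_t n)
{
  assert (s != NULL);

  mbstate_t in, out;
  memset (&in, 0, sizeof in);
  memset (&out, 0, sizeof out);

  /* A folded character may need more bytes than the original one, so
     reserve MB_CUR_MAX bytes per input byte plus the terminator.  */
  char *buf = malloc (n * MB_CUR_MAX + 1);
  if (buf == NULL)
    return NULL;

  size_t i = 0, o = 0;
  while (i < n)
    {
      wchar_t wc;
      size_t len = mbrtowc (&wc, s + i, n - i, &in);
      if (len == (size_t) -1 || len == (size_t) -2 || len == 0)
        {
          /* Invalid sequence, truncated sequence, or NUL byte:
             copy one byte unchanged and restart decoding.  */
          buf[o++] = s[i++];
          memset (&in, 0, sizeof in);
          continue;
        }

      size_t w = wcrtomb (buf + o, (wchar_t) towlower ((wint_t) wc), &out);
      if (w == (size_t) -1)
        {
          /* The folded form cannot be encoded: keep the original bytes.  */
          memcpy (buf + o, s + i, len);
          w = len;
          memset (&out, 0, sizeof out);
        }
      i += len;
      o += w;
    }
  buf[o] = '\0';
  return buf;
}
```

A caller would first select the locale with setlocale (LC_ALL, ""), pass the key and its length, and free the returned buffer once the folded key has been added to the keyword set.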
Diffstat (limited to 'tests')
-rw-r--r-- | tests/Makefile.am | 1 |
-rwxr-xr-x | tests/case-fold-backslash-w | 14 |
-rwxr-xr-x | tests/foad1.sh | 10 |
3 files changed, 19 insertions, 6 deletions
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 6a56dcff..e27d1d5d 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -17,6 +17,7 @@ TESTS = \
 	backref.sh \
 	bre.sh \
+	case-fold-backslash-w \
 	case-fold-char-class \
 	case-fold-char-range \
 	case-fold-char-type \
diff --git a/tests/case-fold-backslash-w b/tests/case-fold-backslash-w
new file mode 100755
index 00000000..6ae70463
--- /dev/null
+++ b/tests/case-fold-backslash-w
@@ -0,0 +1,14 @@
+#!/bin/sh
+# test that \W works on case-insensitive matches. It used to become \w.
+# Derived from https://savannah.gnu.org/bugs/?28162
+: ${srcdir=.}
+. "$srcdir/init.sh"; path_prepend_ ../src
+
+if echo foo bar | LANG=C.ASCII grep '^foo\W'; then
+  echo foo bar | LANG=C.ASCII grep -i '^foo\W' || fail_ ASCII insensitive
+else
+  echo foo bar | LANG=C grep '^foo\W' || fail_ LANG=C sensitive
+  echo foo bar | LANG=C grep -i '^foo\W' || fail_ LANG=C insensitive
+fi
+echo foo bar | LANG=en_US.UTF-8 grep '^foo\W' || fail_ UTF-8 sensitive
+echo foo bar | LANG=en_US.UTF-8 grep -i '^foo\W' || fail_ UTF-8 insensitive
diff --git a/tests/foad1.sh b/tests/foad1.sh
index 7c16d00e..68acc777 100755
--- a/tests/foad1.sh
+++ b/tests/foad1.sh
@@ -42,9 +42,8 @@ grep_test ()
 # "-o" with "-i" should output an exact copy of the matching input text.
 grep_test "WordA/wordB/WORDC/" "Word/word/WORD/" "word" -o -i
 
-# Comment out cases that are known to fail. These should be uncommented after the 2.5.4 release. TAA.
-#grep_test "WordA/wordB/WORDC/" "Word/word/WORD/" "Word" -o -i
-#grep_test "WordA/wordB/WORDC/" "Word/word/WORD/" "WORD" -o -i
+grep_test "WordA/wordB/WORDC/" "Word/word/WORD/" "Word" -o -i
+grep_test "WordA/wordB/WORDC/" "Word/word/WORD/" "WORD" -o -i
 
 # Should display the line number (-n), octet offset (-b), or file name
 # (-H) of every match, not just of the first match on each input line.
@@ -82,9 +81,8 @@ CE="[m[K"
 # "--color" with "-i" should output an exact copy of the matching input text.
 grep_test "WordA/wordb/WORDC/" "${CB}Word${CE}A/${CB}word${CE}b/${CB}WORD${CE}C/" "word" --color=always -i
 
-# Comment out cases that are known to fail. These should be uncommented after the 2.5.4 release. TAA.
-#grep_test "WordA/wordb/WORDC/" "${CB}Word${CE}A/${CB}word${CE}b/${CB}WORD${CE}C/" "Word" --color=always -i
-#grep_test "WordA/wordb/WORDC/" "${CB}Word${CE}A/${CB}word${CE}b/${CB}WORD${CE}C/" "WORD" --color=always -i
+grep_test "WordA/wordb/WORDC/" "${CB}Word${CE}A/${CB}word${CE}b/${CB}WORD${CE}C/" "Word" --color=always -i
+grep_test "WordA/wordb/WORDC/" "${CB}Word${CE}A/${CB}word${CE}b/${CB}WORD${CE}C/" "WORD" --color=always -i
 
 # End of a previous match should not match a "start of ..." expression.
 grep_test "word_word/" "${CB}word_${CE}word/" "^word_*" --color=always
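
The comment in tests/case-fold-backslash-w notes that \W "used to become \w" under -i. One plausible way for that to happen is lowercasing the raw pattern text before it is parsed, so that the letter after the backslash is folded along with everything else. The sketch below only illustrates that failure mode on ASCII patterns. Both function names are hypothetical, and neither is the fix applied by this commit, which per the message moves the folding into the dfa.c lexer and parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical pre-pass: fold every byte of the pattern in place.
   An escape such as \W is silently turned into \w.  */
void
fold_naive (char *pat)
{
  assert (pat != NULL);
  for (; *pat != '\0'; pat++)
    *pat = (char) tolower ((unsigned char) *pat);
}

/* Hypothetical variant: fold only bytes that are not escaped by a
   preceding backslash, so \W keeps its meaning.  */
void
fold_skip_escapes (char *pat)
{
  assert (pat != NULL);
  for (; *pat != '\0'; pat++)
    {
      if (*pat == '\\' && pat[1] != '\0')
        {
          pat++;                /* leave the escaped byte untouched */
          continue;
        }
      *pat = (char) tolower ((unsigned char) *pat);
    }
}
```

Applied to the test's pattern with an uppercase letter, such as `^Foo\W`, the first function changes the escape to `\w`, while the second folds only `Foo`. Folding after tokenization, as the commit message describes, avoids the problem without the pattern text having to be rescanned for escapes.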