author Jim Meyering <meyering@fb.com> 2016-11-27 15:31:35 -0800
committer Jim Meyering <meyering@fb.com> 2016-11-27 15:51:18 -0800
commit fce643886981ab14c1d4c8fd8f0f4d33f57c5ef9
tree d05eac727848819b87b0ff946540eca2966e431f /tests
parent 9dc7483239001d4cd1fcdbacd632c7e93187c871
grep: avoid false matches in non-UTF8 multibyte locales
* gnulib: Update to latest, for the dfa.c fix.
* NEWS (Bug fixes): Mention it.
* tests/false-match-mb-non-utf8: New file, with tests for this.
Based on tests from Stephane Chazelas.
* tests/Makefile.am (TESTS): Add it.
Introduced by commit v2.18-54-g3ef4c8e, a change that made grep
use its DFA matcher more aggressively. The malfunction arises
only with the DFA matcher, not with regex.
Reported by Stephane Chazelas in https://bugs.gnu.org/24975
Diffstat (limited to 'tests')
-rw-r--r-- tests/Makefile.am 1
-rwxr-xr-x tests/false-match-mb-non-utf8 38
2 files changed, 39 insertions, 0 deletions
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 56e860f0..442e85a8 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -94,6 +94,7 @@ TESTS = \
equiv-classes \
ere \
euc-mb \
+ false-match-mb-non-utf8 \
fedora \
fgrep-infloop \
file \
diff --git a/tests/false-match-mb-non-utf8 b/tests/false-match-mb-non-utf8
new file mode 100755
index 00000000..6dfd10a5
--- /dev/null
+++ b/tests/false-match-mb-non-utf8
@@ -0,0 +1,38 @@
+#! /bin/sh
+# Test for false matches in grep 2.19..2.26 in multibyte, non-UTF8 locales
+#
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+
+# Add "." to PATH for the use of get-mb-cur-max.
+path_prepend_ .
+
+fail=0
+
+loc=zh_CN.gb18030
+test "$(get-mb-cur-max $loc)" = 4 || skip_ "no support for the $loc locale"
+
+# This must not match: the input is a single character, \uC9 followed
+# by a newline. But it just so happens that that character is made up
+# of four bytes, the last of which is the digit, 7, and grep's DFA
+# matcher would mistakenly report that ".*7" matches that input line.
+printf '\2010\2077\n' > in || framework_failure_
+LC_ALL=$loc returns_ 1 grep -E '.*7' in || fail=1
+
+LC_ALL=$loc returns_ 1 grep -E '.{0,1}7' in || fail=1
+
+LC_ALL=$loc returns_ 1 grep -E '.?7' in || fail=1
+
+# Similar for the \ue9 code point, which ends in an "m" byte.
+loc=zh_HK.big5hkscs
+test "$(get-mb-cur-max $loc)" = 2 || skip_ "no support for the $loc locale"
+
+printf '\210m\n' > in || framework_failure_
+LC_ALL=$loc returns_ 1 grep '.*m' in || fail=1
+
+Exit $fail
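
The comment in the new test says the GB18030 input character is four bytes long and ends in the byte for the digit 7. The sketch below, which is not part of the commit, writes the same input and dumps its bytes so that claim can be checked by hand. The names `tmp`, `hex` and `count` are hypothetical, and it assumes a POSIX shell with `mktemp`, `od`, `tr` and `grep` available.

```shell
#!/bin/sh
# Illustrative sketch only; not part of the commit or of grep's test suite.
# "tmp", "hex" and "count" are hypothetical names chosen for this example.

tmp=$(mktemp) || exit 1

# Same input as the test: bytes 0x81 0x30 0x87 0x37, then a newline.
# In printf's format string, \201 and \207 are octal escapes; the "0"
# and "7" that follow them are ordinary ASCII digits.
printf '\2010\2077\n' > "$tmp"

# Hex dump of the file with the separating whitespace removed.
hex=$(od -An -tx1 "$tmp" | tr -d ' \n')
echo "bytes: $hex"

# In the C locale every byte is its own character, so the final 0x37
# byte really is a "7" and a match is correct there.  Only when the four
# bytes are decoded as one GB18030 character must ".*7" fail to match.
count=$(LC_ALL=C grep -c '7' "$tmp")
echo "lines matching 7 in the C locale: $count"

rm -f "$tmp"
```

The dump should end in `37 0a`, the trailing "7" byte and the newline. The contrast with the C locale shows why the test expects exit status 1 only under `LC_ALL=zh_CN.gb18030`: whether the final byte counts as a character of its own depends entirely on the locale's encoding.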