summaryrefslogtreecommitdiff
path: root/tests
diff options
context:
space:
mode:
authorCarlo Marcelo Arenas Belón <carenas@gmail.com>2023-01-06 19:34:56 -0800
committerJim Meyering <meyering@fb.com>2023-01-07 18:24:51 -0800
commit5e3b760f65f13856e5717e5b9d935f5b4a615be3 (patch)
tree84050b2bd44f892b69288bc1b12fbcdd23826d71 /tests
parent45e1158a4bb44e507239274535290db61dd27577 (diff)
downloadgrep-5e3b760f65f13856e5717e5b9d935f5b4a615be3.tar.gz
pcre: use UCP in UTF mode
This fixes a serious bug affecting word-boundary and word-constituent regular expressions when the desired match involves non-ASCII UTF8 characters. * src/pcresearch.c: Set PCRE2_UCP together with PCRE2_UTF * tests/pcre-utf8-w: New file. * tests/Makefile.am (TESTS): Add it. * NEWS (Bug fixes): Mention this. * THANKS.in: Add Gro-Tsen and Karl Petterson. Reported by Gro-Tsen https://twitter.com/gro_tsen/status/1610972356972875777 via Karl Pettersson in https://github.com/PCRE2Project/pcre2/issues/185 This bug was present from grep-2.5, when --perl-regexp (-P) support was added.
Diffstat (limited to 'tests')
-rw-r--r--tests/Makefile.am1
-rwxr-xr-xtests/pcre-utf8-w28
2 files changed, 29 insertions, 0 deletions
diff --git a/tests/Makefile.am b/tests/Makefile.am
index e0b0503c..a47cf5c0 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -147,6 +147,7 @@ TESTS = \
pcre-jitstack \
pcre-o \
pcre-utf8 \
+ pcre-utf8-w \
pcre-w \
pcre-wx-backref \
pcre-z \
diff --git a/tests/pcre-utf8-w b/tests/pcre-utf8-w
new file mode 100755
index 00000000..4cd7db69
--- /dev/null
+++ b/tests/pcre-utf8-w
@@ -0,0 +1,28 @@
+#!/bin/sh
+# Ensure non-ASCII UTF-8 characters are correctly identified as word-consituent
+#
+# Copyright (C) 2023 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_en_utf8_locale_
+LC_ALL=en_US.UTF-8
+export LC_ALL
+require_pcre_
+
+fail=0
+
+echo 'Perú'> in || framework_failure_
+
+echo 'ú' > exp || framework_failure_
+grep -Po '.\b' in > out || fail=1
+compare exp out || fail=1
+
+echo 'rú' > exp || framework_failure_
+grep -Po 'r\w' in > out || fail=1
+compare exp out || fail=1
+
+Exit $fail