author | Carlo Marcelo Arenas Belón <carenas@gmail.com> | 2023-01-06 19:34:56 -0800 |
---|---|---|
committer | Jim Meyering <meyering@fb.com> | 2023-01-07 18:24:51 -0800 |
commit | 5e3b760f65f13856e5717e5b9d935f5b4a615be3 (patch) | |
tree | 84050b2bd44f892b69288bc1b12fbcdd23826d71 /tests | |
parent | 45e1158a4bb44e507239274535290db61dd27577 (diff) | |
download | grep-5e3b760f65f13856e5717e5b9d935f5b4a615be3.tar.gz |
pcre: use UCP in UTF mode
This fixes a serious bug affecting word-boundary and word-constituent regular
expressions when the desired match involves non-ASCII UTF-8 characters.
* src/pcresearch.c: Set PCRE2_UCP together with PCRE2_UTF.
* tests/pcre-utf8-w: New file.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Bug fixes): Mention this.
* THANKS.in: Add Gro-Tsen and Karl Pettersson.
Reported by Gro-Tsen https://twitter.com/gro_tsen/status/1610972356972875777
via Karl Pettersson in https://github.com/PCRE2Project/pcre2/issues/185
This bug was present from grep-2.5, when --perl-regexp (-P) support was added.
Diffstat (limited to 'tests')
-rw-r--r-- | tests/Makefile.am | 1 |
-rwxr-xr-x | tests/pcre-utf8-w | 28 |
2 files changed, 29 insertions, 0 deletions
diff --git a/tests/Makefile.am b/tests/Makefile.am
index e0b0503c..a47cf5c0 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -147,6 +147,7 @@ TESTS = \
   pcre-jitstack \
   pcre-o \
   pcre-utf8 \
+  pcre-utf8-w \
   pcre-w \
   pcre-wx-backref \
   pcre-z \
diff --git a/tests/pcre-utf8-w b/tests/pcre-utf8-w
new file mode 100755
index 00000000..4cd7db69
--- /dev/null
+++ b/tests/pcre-utf8-w
@@ -0,0 +1,28 @@
+#!/bin/sh
+# Ensure non-ASCII UTF-8 characters are correctly identified as word-consituent
+#
+# Copyright (C) 2023 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_en_utf8_locale_
+LC_ALL=en_US.UTF-8
+export LC_ALL
+require_pcre_
+
+fail=0
+
+echo 'Perú'> in || framework_failure_
+
+echo 'ú' > exp || framework_failure_
+grep -Po '.\b' in > out || fail=1
+compare exp out || fail=1
+
+echo 'rú' > exp || framework_failure_
+grep -Po 'r\w' in > out || fail=1
+compare exp out || fail=1
+
+Exit $fail
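The two checks in tests/pcre-utf8-w can also be tried by hand, outside the test framework. The following is a minimal sketch and not part of the commit. It assumes the installed grep was built with PCRE2 support and that an en_US.UTF-8 locale is available; the names `sample`, `status`, and `pcre_match` are made up for illustration. The values quoted in the comments are the ones the test writes to `exp`; a grep without this fix, or a system without the locale, may print something else.

```shell
#!/bin/sh
# Hand-run version of the two checks in tests/pcre-utf8-w (illustrative only).
# Assumptions: grep supports -P, and the en_US.UTF-8 locale is installed.

sample='Perú'

# Print what "grep -Po" extracts from $sample for the pattern given in $1.
pcre_match() {
  printf '%s\n' "$sample" | LC_ALL=en_US.UTF-8 grep -Po "$1"
}

# Probe first: a grep built without PCRE support rejects -P.
if echo a | grep -P a >/dev/null 2>&1; then
  status=run
  # tests/pcre-utf8-w expects "ú": the word boundary follows the whole word.
  pcre_match '.\b' || echo '(no match)'
  # tests/pcre-utf8-w expects "rú": ú counts as a word constituent for \w.
  pcre_match 'r\w' || echo '(no match)'
else
  status=skip
  echo 'skip: this grep does not support -P'
fi
```

Unlike the real test, this sketch does not compare the output against expected files or set an exit status; it only prints what grep extracts so the result can be inspected.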