author Carlo Marcelo Arenas Belón <carenas@gmail.com> 2023-01-06 19:34:56 -0800
committer Jim Meyering <meyering@fb.com> 2023-01-07 18:24:51 -0800
commit 5e3b760f65f13856e5717e5b9d935f5b4a615be3
tree 84050b2bd44f892b69288bc1b12fbcdd23826d71 /src
parent 45e1158a4bb44e507239274535290db61dd27577
pcre: use UCP in UTF mode
This fixes a serious bug affecting word-boundary and word-constituent
regular expressions when the desired match involves non-ASCII UTF8
characters.

* src/pcresearch.c: Set PCRE2_UCP together with PCRE2_UTF
* tests/pcre-utf8-w: New file.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Bug fixes): Mention this.
* THANKS.in: Add Gro-Tsen and Karl Petterson.

Reported by Gro-Tsen https://twitter.com/gro_tsen/status/1610972356972875777
via Karl Pettersson in https://github.com/PCRE2Project/pcre2/issues/185
This bug was present from grep-2.5, when --perl-regexp (-P) support
was added.
Diffstat (limited to 'src')
-rw-r--r-- src/pcresearch.c | 2
1 files changed, 1 insertions, 1 deletions
diff --git a/src/pcresearch.c b/src/pcresearch.c
index a107f4d0..45b67eed 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -149,7 +149,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
     {
       if (! localeinfo.using_utf8)
         die (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8 locales"));
-      flags |= PCRE2_UTF;
+      flags |= (PCRE2_UTF | PCRE2_UCP);
 #if 0
       /* Do not match individual code units but only UTF-8. */
       flags |= PCRE2_NEVER_BACKSLASH_C;