author | Paul Eggert <eggert@cs.ucla.edu> | 2023-03-19 01:50:00 -0700 |
---|---|---|
committer | Jim Meyering <meyering@meta.com> | 2023-03-19 08:43:01 -0700 |
commit | 99330c2b1dc8b619dff8a5a6a35f524d382508c8 (patch) | |
tree | dba7c6cf4aa7081208994e4ce5c5f2d4c36329de /doc | |
parent | 373b4434ebc15f447ca6f96007ed6181c9a2a496 (diff) | |
download | grep-99330c2b1dc8b619dff8a5a6a35f524d382508c8.tar.gz |
grep: forward port to PCRE2 10.43
* doc/grep.texi: Document this.
* src/grep.c: Move recent changes into pcresearch.c.
(P_MATCHER_INDEX): Remove.
(pcre_pattern_expand_backslash_d): Move from here ...
* src/pcresearch.c: ... to here.
(PCRE2_EXTRA_ASCII_BSD): Default to 0.
(Pcompile): Use PCRE2_EXTRA_ASCII_BSD if available,
and expand \d to [0-9] otherwise.
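
The fallback named in the Pcompile entry (rewrite \d as [0-9] when
PCRE2_EXTRA_ASCII_BSD is unavailable) can be illustrated with a minimal
sketch. This is not the code in src/pcresearch.c: the function name
expand_backslash_d and its interface are hypothetical, and the logic is
deliberately simplified, assuming only that an unescaped \d should become
[0-9] outside a bracket expression and 0-9 inside one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not grep's actual implementation: return a
   malloc'd copy of PATTERN with each unescaped \d rewritten as [0-9],
   or as 0-9 when it appears inside a bracket expression.  Return NULL
   on allocation failure.  Simplified: it ignores \Q...\E quoting, a
   leading ']' in a bracket expression, and POSIX classes such as
   [[:alpha:]].  */
char *
expand_backslash_d (char const *pattern)
{
  assert (pattern != NULL);
  size_t len = strlen (pattern);

  /* Each 2-byte "\d" grows by at most 3 bytes, and there can be at
     most len / 2 of them; add 1 for the terminating NUL.  */
  char *out = malloc (len + 3 * (len / 2) + 1);
  if (!out)
    return NULL;

  bool in_bracket = false;
  char *p = out;
  for (size_t i = 0; i < len; i++)
    {
      char c = pattern[i];
      if (c == '\\' && i + 1 < len)
        {
          char next = pattern[++i];
          if (next == 'd')
            {
              char const *repl = in_bracket ? "0-9" : "[0-9]";
              size_t n = strlen (repl);
              memcpy (p, repl, n);
              p += n;
            }
          else
            {
              /* Copy any other escape verbatim, so that an escaped
                 backslash followed by 'd' is left untouched.  */
              *p++ = c;
              *p++ = next;
            }
          continue;
        }
      if (c == '[' && !in_bracket)
        in_bracket = true;
      else if (c == ']' && in_bracket)
        in_bracket = false;
      *p++ = c;
    }
  *p = '\0';
  return out;
}
```

Under these assumptions the pattern \d+ becomes [0-9]+ and [\d_] becomes
[0-9_], while \\d (an escaped backslash followed by d) is copied unchanged.
The caller frees the returned string.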
Diffstat (limited to 'doc')
-rw-r--r-- | doc/grep.texi | 18 |
1 files changed, 11 insertions, 7 deletions
diff --git a/doc/grep.texi b/doc/grep.texi
index b17c4dac..8a0aef51 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1144,13 +1144,17 @@ combined with the @option{-z} (@option{--null-data}) option, and note that
 For documentation, refer to @url{https://www.pcre.org/}, with these caveats:
 @itemize
 @item
-@samp{\d} always matches only the ten ASCII digits, regardless of locale or
-in-regexp directives like @samp{(?aD)}.
-Use @samp{\p@{Nd@}} if you require to match non-ASCII digits.
-Once pcre2 support for @samp{(?aD)} is widespread enough,
-we expect to make that the default, so it will be overridable.
-@c Using pcre2 git commit pcre2-10.40-112-g6277357, this demonstrates how
-@c we'll prefix with (?aD) to make \d's ASCII-only behavior the default:
+@samp{\d} matches only the ten ASCII digits, regardless of locale.
+Use @samp{\p@{Nd@}} to also match non-ASCII digits.
+
+When @command{grep} is built with PCRE2 10.42 and earlier, @samp{\d}
+ignores in-regexp directives like @samp{(?aD)} and matches only ASCII
+digits regardless of these directives. However, later versions of
+PCRE2 likely will fix this, and the plan is for @command{grep} to
+respect those directives if possible.
+@c Using PCRE2 git commit pcre2-10.40-112-g6277357, this demonstrates
+@c the equivalent of how grep could use PCRE2_EXTRA_ASCII_BSD to make \d's
+@c ASCII-only behavior the default:
 @c $ LC_ALL=en_US.UTF-8 ./pcre2grep -u '(?aD)^\d+' <<< '٠١٢٣٤٥٦٧٨٩'
 @c [Exit 1]
 @c $ LC_ALL=en_US.UTF-8 ./pcre2grep -u '^\d+' <<< '٠١٢٣٤٥٦٧٨٩'