author: Paul Eggert <eggert@cs.ucla.edu> 2023-03-19 01:50:00 -0700
committer: Jim Meyering <meyering@meta.com> 2023-03-19 08:43:01 -0700
commit: 99330c2b1dc8b619dff8a5a6a35f524d382508c8 (patch)
tree: dba7c6cf4aa7081208994e4ce5c5f2d4c36329de /doc
parent: 373b4434ebc15f447ca6f96007ed6181c9a2a496 (diff)
grep: forward port to PCRE2 10.43
* doc/grep.texi: Document this.
* src/grep.c: Move recent changes into pcresearch.c.
(P_MATCHER_INDEX): Remove.
(pcre_pattern_expand_backslash_d): Move from here ...
* src/pcresearch.c: ... to here.
(PCRE2_EXTRA_ASCII_BSD): Default to 0.
(Pcompile): Use PCRE2_EXTRA_ASCII_BSD if available, and expand
\d to [0-9] otherwise.
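
The fallback named in the commit message (expanding \d to [0-9] when PCRE2_EXTRA_ASCII_BSD is unavailable) can be illustrated with a minimal sketch. This is not grep's C code: the function name expand_backslash_d and its handling of bracket expressions are assumptions made for illustration, and constructs such as \Q...\E quoting or a literal ] at the start of a bracket expression are deliberately ignored.

```python
def expand_backslash_d(pattern: str) -> str:
    """Rewrite \\d as an explicit ASCII digit range.

    Hypothetical helper, not grep's implementation. Assumes the pattern
    uses no \\Q...\\E quoting and no leading ']' inside brackets.
    """
    out = []
    in_bracket = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # A backslash escapes the next character; only \d is rewritten.
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "d":
                # Inside [...] only the range is needed, not new brackets.
                out.append("0-9" if in_bracket else "[0-9]")
            else:
                out.append(ch + nxt)
            i += 2
            continue
        # Copy POSIX classes such as [:alpha:] whole, so that their
        # closing ']' is not mistaken for the end of the bracket.
        if in_bracket and pattern.startswith("[:", i):
            end = pattern.find(":]", i + 2)
            if end != -1:
                out.append(pattern[i:end + 2])
                i = end + 2
                continue
        if ch == "[" and not in_bracket:
            in_bracket = True
        elif ch == "]" and in_bracket:
            in_bracket = False
        out.append(ch)
        i += 1
    return "".join(out)
```

The sketch tracks whether it is inside a bracket expression because emitting [0-9] there would nest brackets and change the meaning of the pattern; an escaped backslash (\\d) is left untouched since the d that follows is a literal.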
Diffstat (limited to 'doc')
-rw-r--r-- doc/grep.texi 18
1 files changed, 11 insertions, 7 deletions
diff --git a/doc/grep.texi b/doc/grep.texi
index b17c4dac..8a0aef51 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1144,13 +1144,17 @@ combined with the @option{-z} (@option{--null-data}) option, and note that
For documentation, refer to @url{https://www.pcre.org/}, with these caveats:
@itemize
@item
-@samp{\d} always matches only the ten ASCII digits, regardless of locale or
-in-regexp directives like @samp{(?aD)}.
-Use @samp{\p@{Nd@}} if you require to match non-ASCII digits.
-Once pcre2 support for @samp{(?aD)} is widespread enough,
-we expect to make that the default, so it will be overridable.
-@c Using pcre2 git commit pcre2-10.40-112-g6277357, this demonstrates how
-@c we'll prefix with (?aD) to make \d's ASCII-only behavior the default:
+@samp{\d} matches only the ten ASCII digits, regardless of locale.
+Use @samp{\p@{Nd@}} to also match non-ASCII digits.
+
+When @command{grep} is built with PCRE2 10.42 and earlier, @samp{\d}
+ignores in-regexp directives like @samp{(?aD)} and matches only ASCII
+digits regardless of these directives. However, later versions of
+PCRE2 likely will fix this, and the plan is for @command{grep} to
+respect those directives if possible.
+@c Using PCRE2 git commit pcre2-10.40-112-g6277357, this demonstrates
+@c the equivalent of how grep could use PCRE2_EXTRA_ASCII_BSD to make \d's
+@c ASCII-only behavior the default:
@c $ LC_ALL=en_US.UTF-8 ./pcre2grep -u '(?aD)^\d+' <<< '٠١٢٣٤٥٦٧٨٩'
@c [Exit 1]
@c $ LC_ALL=en_US.UTF-8 ./pcre2grep -u '^\d+' <<< '٠١٢٣٤٥٦٧٨٩'