author: Paul Eggert <eggert@cs.ucla.edu> 2021-08-27 18:20:58 -0700
committer: Paul Eggert <eggert@cs.ucla.edu> 2021-08-27 18:21:24 -0700
commit: f0d97db2a2104c5fd558178713054f3f267623b2
tree: 5ae27353bf81682c734bc9820515331e3ff529b3 /doc
parent: fd72f5d2c2a9a6a220e98af1c0230f1ae6e0a8d2
doc: document interval expression limitations

* doc/grep.texi (Basic vs Extended, Performance): Document limitations of interval expressions (Bug#44538).
Diffstat (limited to 'doc')
-rw-r--r-- doc/grep.texi 15
1 files changed, 14 insertions, 1 deletions
diff --git a/doc/grep.texi b/doc/grep.texi
index b92ecb77..e5b9fd8a 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1526,7 +1526,7 @@ before an interval expression's closing @samp{@}}, and an unmatched
@code{\)} is invalid.
Portable scripts should avoid the following constructs, as
-POSIX says they produce undefined results:
+POSIX says they produce unspecified results:
@itemize @bullet
@item
@@ -1541,6 +1541,8 @@ Empty alternatives (as in, e.g, @samp{a|}).
Repetition operators that immediately follow empty expressions,
unescaped @samp{$}, or other repetition operators.
@item
+Interval expressions containing repetition counts greater than 255.
+@item
A backslash escaping an ordinary character (e.g., @samp{\S}),
unless it is a back-reference.
@item
@@ -1965,6 +1967,17 @@ bracket expressions like @samp{[a-z]} and @samp{[[=a=]b]}, can be
surprisingly inefficient due to difficulties in fast portable access to
concepts like multi-character collating elements.
+@cindex interval expressions
+Interval expressions may be implemented internally via repetition.
+For example, @samp{^(a|bc)@{2,4@}$} might be implemented as
+@samp{^(a|bc)(a|bc)((a|bc)(a|bc)?)?$}. A large repetition count may
+exhaust memory or greatly slow matching. Even small counts can cause
+problems if cascaded; for example, @samp{grep -E
+".*@{10,@}@{10,@}@{10,@}@{10,@}@{10,@}"} is likely to overflow a
+stack. Fortunately, regular expressions like these are typically
+artificial, and cascaded repetitions do not conform to POSIX so cannot
+be used in portable programs anyway.
+
@cindex back-references
A back-reference such as @samp{\1} can hurt performance significantly
in some cases, since back-references cannot in general be implemented
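
Reviewer note: the expansion described in the new Performance paragraph can be sketched in a few lines. This is only an illustration, not a description of how grep's matcher works internally. It uses Python's re module as a stand-in engine, the helper name expand_interval is hypothetical, and it assumes the repeated atom is self-delimiting (a single character or a parenthesized group) with both bounds finite.

```python
import itertools
import re


def expand_interval(atom: str, lo: int, hi: int) -> str:
    """Rewrite atom{lo,hi} using only concatenation and '?'.

    Hypothetical helper for illustration; assumes `atom` is a single
    character or a parenthesized group, and 0 <= lo <= hi.
    """
    if not 0 <= lo <= hi:
        raise ValueError("need 0 <= lo <= hi")
    required = atom * lo          # lo mandatory copies
    optional = ""
    extra = hi - lo               # copies that may or may not appear
    if extra > 0:
        optional = atom + "?"     # innermost optional copy
        for _ in range(extra - 1):
            # Wrap so that a later optional copy can appear only
            # after an earlier one.
            optional = "(" + atom + optional + ")?"
    return required + optional


interval = "^(a|bc){2,4}$"
expanded = "^" + expand_interval("(a|bc)", 2, 4) + "$"

# Compare the two patterns on every string over {a, b, c} up to length 8.
disagreements = [
    s
    for n in range(9)
    for s in map("".join, itertools.product("abc", repeat=n))
    if bool(re.search(interval, s)) != bool(re.search(expanded, s))
]

print(expanded)
print("disagreements:", disagreements)
```

The first printed line should be the same string as the expanded example in the patch, and the list of disagreements should be empty. The expanded pattern grows with the upper bound, which is one way to see why a large count can be costly. Under this naive scheme, five cascaded intervals with a minimum count of 10 each would copy the innermost atom at least 10^5 = 100000 times.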