author | Paul Eggert <eggert@cs.ucla.edu> | 2021-08-27 18:20:58 -0700 |
---|---|---|
committer | Paul Eggert <eggert@cs.ucla.edu> | 2021-08-27 18:21:24 -0700 |
commit | f0d97db2a2104c5fd558178713054f3f267623b2 | |
tree | 5ae27353bf81682c734bc9820515331e3ff529b3 /doc | |
parent | fd72f5d2c2a9a6a220e98af1c0230f1ae6e0a8d2 | |
doc: document interval expression limitations
* doc/grep.texi (Basic vs Extended, Performance):
Document limitations of interval expressions (Bug#44538).
Diffstat (limited to 'doc')
-rw-r--r-- | doc/grep.texi | 15 |
1 files changed, 14 insertions, 1 deletions
diff --git a/doc/grep.texi b/doc/grep.texi
index b92ecb77..e5b9fd8a 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1526,7 +1526,7 @@ before an interval expression's closing @samp{@}}, and an unmatched
 @code{\)} is invalid.
 
 Portable scripts should avoid the following constructs, as
-POSIX says they produce undefined results:
+POSIX says they produce unspecified results:
 
 @itemize @bullet
 @item
@@ -1541,6 +1541,8 @@ Empty alternatives (as in, e.g, @samp{a|}).
 Repetition operators that immediately follow empty expressions,
 unescaped @samp{$}, or other repetition operators.
 @item
+Interval expressions containing repetition counts greater than 255.
+@item
 A backslash escaping an ordinary character (e.g., @samp{\S}), unless it
 is a back-reference.
 @item
@@ -1965,6 +1967,17 @@ bracket expressions like @samp{[a-z]} and @samp{[[=a=]b]}, can be
 surprisingly inefficient due to difficulties in fast portable access to
 concepts like multi-character collating elements.
 
+@cindex interval expressions
+Interval expressions may be implemented internally via repetition.
+For example, @samp{^(a|bc)@{2,4@}$} might be implemented as
+@samp{^(a|bc)(a|bc)((a|bc)(a|bc)?)?$}. A large repetition count may
+exhaust memory or greatly slow matching. Even small counts can cause
+problems if cascaded; for example, @samp{grep -E
+".*@{10,@}@{10,@}@{10,@}@{10,@}@{10,@}"} is likely to overflow a
+stack. Fortunately, regular expressions like these are typically
+artificial, and cascaded repetitions do not conform to POSIX so cannot
+be used in portable programs anyway.
+
 @cindex back-references
 A back-reference such as @samp{\1} can hurt performance significantly
 in some cases, since back-references cannot in general be implemented
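The rewrite described in the new Performance paragraph, where `^(a|bc){2,4}$` becomes `^(a|bc)(a|bc)((a|bc)(a|bc)?)?$`, can be illustrated with a small Python sketch. It is a hypothetical textual expansion, not grep's implementation. The names `expand_interval`, `_optional_copies` and `cascade` are invented here, and Python's `re` module stands in for a POSIX ERE matcher. The sketch also assumes each cascaded stage is wrapped in parentheses, because the bare stacked form `.*{10,}{10,}` does not conform to POSIX.

```python
import re
from typing import Optional


def _optional_copies(atom: str, k: int) -> str:
    """Return a pattern matching 0..k further copies of atom.

    Copies are nested so each extra copy is allowed only after the
    previous one; k=2 yields "(" + atom + atom + "?)?".
    """
    if k == 0:
        return ""
    if k == 1:
        return atom + "?"
    return "(" + atom + _optional_copies(atom, k - 1) + ")?"


def expand_interval(atom: str, m: int, n: Optional[int]) -> str:
    """Hypothetical rewrite of atom{m,n} using only concatenation, '?' and '*'.

    atom must already be a single unit (one character or a parenthesized
    group). n=None stands for the unbounded form atom{m,}.
    """
    if m < 0 or (n is not None and n < m):
        raise ValueError("invalid interval")
    required = atom * m  # m mandatory copies
    if n is None:
        # atom{m,} becomes m mandatory copies followed by atom*.
        return required + atom + "*"
    return required + _optional_copies(atom, n - m)


def cascade(atom: str, levels: int, m: int) -> str:
    """Expand (...(atom){m,}...){m,} with `levels` nested intervals.

    Assumption of this sketch: each stage is parenthesized before the next
    interval is applied, since stacked repetition operators are not POSIX.
    """
    pattern = atom
    for _ in range(levels):
        pattern = expand_interval("(" + pattern + ")", m, None)
    return pattern


if __name__ == "__main__":
    expanded = expand_interval("(a|bc)", 2, 4)
    print(expanded)  # (a|bc)(a|bc)((a|bc)(a|bc)?)?
    # The expansion accepts exactly the strings the interval form accepts.
    for s in ["", "a", "aa", "abc", "bcbcbc", "aaaa", "aaaaa"]:
        same = bool(re.fullmatch(r"(a|bc){2,4}", s)) == bool(re.fullmatch(expanded, s))
        assert same, s
    # Each cascaded {10,} multiplies the pattern length by roughly 11.
    for k in range(1, 6):
        print(k, len(cascade(".*", k, 10)))
```

Under this naive scheme the pattern text alone grows from 45 characters at one level to 692,517 at five, which is one way to see why even small cascaded counts can exhaust memory or stack space.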