From 7090135ad270c767d3e15413175810c20148ac4a Mon Sep 17 00:00:00 2001
From: Karl Heuer <kwzh@gnu.org>
Date: Mon, 5 Jun 1995 12:23:13 +0000
Subject: *** empty log message ***

---
 lispref/searching.texi | 157 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 120 insertions(+), 37 deletions(-)

(limited to 'lispref/searching.texi')

diff --git a/lispref/searching.texi b/lispref/searching.texi
index ec082152aad..7919804d35c 100644
--- a/lispref/searching.texi
+++ b/lispref/searching.texi
@@ -17,6 +17,7 @@ portions of it.
 * String Search::         Search for an exact match.
 * Regular Expressions::   Describing classes of strings.
 * Regexp Search::         Searching for a match for a regexp.
+* POSIX Regexps::         Searching POSIX-style for the longest match.
 * Search and Replace::	  Internals of @code{query-replace}.
 * Match Data::            Finding out which part of the text matched
                             various parts of a regexp, after regexp search.
@@ -226,12 +227,12 @@ The next alternative is for @samp{a*} to match only two @samp{a}s.
 With this choice, the rest of the regexp matches successfully.@refill
 
 Nested repetition operators can be extremely slow if they specify
-backtracking loops.  For example, @samp{\(x+y*\)*a} could take hours to
-match the sequence @samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz}.  The
-slowness is because Emacs must try each imaginable way of grouping the
-35 @samp{x}'s before concluding that none of them can work.  To make
-sure your regular expressions run fast, check nested repetitions
-carefully.
+backtracking loops.  For example, it could take hours for the regular
+expression @samp{\(x+y*\)*a} to try to match the sequence
+@samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz}.  The slowness is because
+Emacs must try each imaginable way of grouping the 35 @samp{x}'s before
+concluding that none of them can work.  To make sure your regular
+expressions run fast, check nested repetitions carefully.
 
 @item +
 @cindex @samp{+} in regexp
@@ -715,6 +716,48 @@ comes back" twice.
 @end example
 @end defun
 
+@node POSIX Regexps
+@section POSIX Regular Expression Searching
+
+  The usual regular expression functions do backtracking when necessary
+to handle the @samp{\|} and repetition constructs, but they continue
+this only until they find @emph{some} match.  Then they succeed and
+report the first match found.
+
+  This section describes alternative search functions that perform the
+full backtracking specified by the POSIX standard for regular expression
+matching.  They continue backtracking until they have tried all
+possibilities and found all matches, so they can report the longest
+match, as required by POSIX.  This is much slower, so use these
+functions only when you really need the longest match.
+
+  In Emacs versions prior to 19.29, these functions did not exist, and
+the functions described above implemented full POSIX backtracking.
+
+@defun posix-search-forward regexp &optional limit noerror repeat
+This is like @code{re-search-forward} except that it performs the full
+backtracking specified by the POSIX standard for regular expression
+matching.
+@end defun
+
+@defun posix-search-backward regexp &optional limit noerror repeat
+This is like @code{re-search-backward} except that it performs the full
+backtracking specified by the POSIX standard for regular expression
+matching.
+@end defun
+
+@defun posix-looking-at regexp
+This is like @code{looking-at} except that it performs the full
+backtracking specified by the POSIX standard for regular expression
+matching.
+@end defun
+
+@defun posix-string-match regexp string &optional start
+This is like @code{string-match} except that it performs the full
+backtracking specified by the POSIX standard for regular expression
+matching.
+@end defun
+
 @ignore
 @deffn Command delete-matching-lines regexp
 This function is identical to @code{delete-non-matching-lines}, save
@@ -909,34 +952,56 @@ match data around it, to prevent it from being overwritten.
 @node Simple Match Data
 @subsection Simple Match Data Access
 
-  This section explains how to use the match data to find the starting
-point or ending point of the text that was matched by a particular
-search, or by a particular parenthetical subexpression of a regular
-expression.
+  This section explains how to use the match data to find out what was
+matched by the last search or match operation.
+
+  You can ask about the entire matching text, or about a particular
+parenthetical subexpression of a regular expression.  The @var{count}
+argument in the functions below specifies which.  If @var{count} is
+zero, you are asking about the entire match.  If @var{count} is
+positive, it specifies which subexpression you want.
+
+  Recall that the subexpressions of a regular expression are those
+expressions grouped with escaped parentheses, @samp{\(@dots{}\)}.  The
+@var{count}th subexpression is found by counting occurrences of
+@samp{\(} from the beginning of the whole regular expression.  The first
+subexpression is numbered 1, the second 2, and so on.  Only regular
+expressions can have subexpressions---after a simple string search, the
+only information available is about the entire match.
+
+@defun match-string count &optional in-string
+This function returns, as a string, the text matched in the last search
+or match operation.  It returns the entire text if @var{count} is zero,
+or just the portion corresponding to the @var{count}th parenthetical
+subexpression, if @var{count} is positive.  If @var{count} is out of
+range, the value is @code{nil}.
+
+If the last such operation was done against a string with
+@code{string-match}, then you should pass the same string as the
+argument @var{in-string}.  Otherwise, after a buffer search or match,
+you should omit @var{in-string} or pass @code{nil} for it; but you
+should make sure that the current buffer when you call
+@code{match-string} is the one in which you did the searching or
+matching.
+@end defun
 
 @defun match-beginning count
 This function returns the position of the start of text matched by the
 last regular expression searched for, or a subexpression of it.
 
 If @var{count} is zero, then the value is the position of the start of
-the text matched by the whole regexp.  Otherwise, @var{count}, specifies
-a subexpression in the regular expresion.  The value of the function is
-the starting position of the match for that subexpression.
-
-Subexpressions of a regular expression are those expressions grouped
-with escaped parentheses, @samp{\(@dots{}\)}.  The @var{count}th
-subexpression is found by counting occurrences of @samp{\(} from the
-beginning of the whole regular expression.  The first subexpression is
-numbered 1, the second 2, and so on.
-
-The value is @code{nil} for a subexpression inside a
-@samp{\|} alternative that wasn't used in the match.
+the entire match.  Otherwise, @var{count} specifies a subexpression in
+the regular expression, and the value of the function is the starting
+position of the match for that subexpression.
+
+The value is @code{nil} for a subexpression inside a @samp{\|}
+alternative that wasn't used in the match.
 @end defun
 
 @defun match-end count
-This function returns the position of the end of the text that matched
-the last regular expression searched for, or a subexpression of it.
-This function is otherwise similar to @code{match-beginning}.
+This function is like @code{match-beginning} except that it returns the
+position of the end of the match, rather than the position of the
+beginning.
 @end defun
 
   Here is an example of using the match data, with a comment showing the
@@ -950,6 +1015,15 @@ positions within the text:
      @result{} 4
 @end group
 
+@group
+(match-string 0 "The quick fox jumped quickly.")
+     @result{} "quick"
+(match-string 1 "The quick fox jumped quickly.")
+     @result{} "qu"
+(match-string 2 "The quick fox jumped quickly.")
+     @result{} "ick"
+@end group
+
 @group
 (match-beginning 1)       ; @r{The beginning of the match}
      @result{} 4                 ;   @r{with @samp{qu} is at index 4.}
@@ -1004,11 +1078,15 @@ character of the buffer counts as 1.)
 @var{replacement}.
 
 @cindex case in replacements
-@defun replace-match replacement &optional fixedcase literal
-This function replaces the buffer text matched by the last search, with
-@var{replacement}.  It applies only to buffers; you can't use
-@code{replace-match} to replace a substring found with
-@code{string-match}.
+@defun replace-match replacement &optional fixedcase literal string
+This function replaces the text in the buffer (or in @var{string}) that
+was matched by the last search.  It replaces that text with
+@var{replacement}.
+
+If @var{string} is @code{nil}, @code{replace-match} does the replacement
+by editing the buffer; it leaves point at the end of the replacement
+text, and returns @code{t}.  If @var{string} is a string, it does the
+replacement by constructing and returning a new string.
 
 If @var{fixedcase} is non-@code{nil}, then the case of the replacement
 text is not changed; otherwise, the replacement text is converted to a
@@ -1044,9 +1122,6 @@ Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
 @cindex @samp{\} in replacement
 @samp{\\} stands for a single @samp{\} in the replacement text.
 @end table
-
-@code{replace-match} leaves point at the end of the replacement text,
-and returns @code{t}.
 @end defun
 
 @node Entire Match Data
@@ -1239,19 +1314,27 @@ default value is @code{"^\014"} (i.e., @code{"^^L"} or @code{"^\C-l"});
 this matches a line that starts with a formfeed character.
 @end defvar
 
+  The following two regular expressions should @emph{not} assume the
+match always starts at the beginning of a line; they should not use
+@samp{^} to anchor the match.  Most often, the paragraph commands do
+check for a match only at the beginning of a line, which means that
+@samp{^} would be superfluous.  When there is a left margin, they accept
+matches that start after the left margin.  In that case, a @samp{^}
+would be incorrect.
+
 @defvar paragraph-separate
 This is the regular expression for recognizing the beginning of a line
 that separates paragraphs.  (If you change this, you may have to
 change @code{paragraph-start} also.)  The default value is
-@w{@code{"^[@ \t\f]*$"}}, which matches a line that consists entirely of
-spaces, tabs, and form feeds.
+@w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
+spaces, tabs, and form feeds (after its left margin).
 @end defvar
 
 @defvar paragraph-start
 This is the regular expression for recognizing the beginning of a line
 that starts @emph{or} separates paragraphs.  The default value is
-@w{@code{"^[@ \t\n\f]"}}, which matches a line starting with a space, tab,
-newline, or form feed.
+@w{@code{"[@ \t\n\f]"}}, which matches a line starting with a space, tab,
+newline, or form feed (after its left margin).
 @end defvar
 
 @defvar sentence-end
-- 
cgit v1.2.1