Diffstat (limited to 'lispref/searching.texi')
-rw-r--r-- lispref/searching.texi | 285
1 files changed, 193 insertions, 92 deletions
diff --git a/lispref/searching.texi b/lispref/searching.texi
index 0f465edc011..68593e4bbef 100644
--- a/lispref/searching.texi
+++ b/lispref/searching.texi
@@ -34,9 +34,9 @@ portions of it.
These are the primitive functions for searching through the text in a
buffer. They are meant for use in programs, but you may call them
-interactively. If you do so, they prompt for the search string;
-@var{limit} and @var{noerror} are set to @code{nil}, and @var{repeat}
-is set to 1.
+interactively. If you do so, they prompt for the search string; the
+arguments @var{limit} and @var{noerror} are @code{nil}, and @var{repeat}
+is 1.
These search functions convert the search string to multibyte if the
buffer is multibyte; they convert the search string to unibyte if the
@@ -167,6 +167,7 @@ regexps; the following section says how to search for them.
@menu
* Syntax of Regexps:: Rules for writing regular expressions.
+* Regexp Functions:: Functions for operating on regular expressions.
* Regexp Example:: Illustrates regular expression syntax.
@end menu
@@ -182,21 +183,33 @@ special characters will be defined in the future. Any other character
appearing in a regular expression is ordinary, unless a @samp{\}
precedes it.
-For example, @samp{f} is not a special character, so it is ordinary, and
+ For example, @samp{f} is not a special character, so it is ordinary, and
therefore @samp{f} is a regular expression that matches the string
@samp{f} and no other string. (It does @emph{not} match the string
-@samp{ff}.) Likewise, @samp{o} is a regular expression that matches
-only @samp{o}.@refill
+@samp{fg}, but it does match a @emph{part} of that string.) Likewise,
+@samp{o} is a regular expression that matches only @samp{o}.@refill
-Any two regular expressions @var{a} and @var{b} can be concatenated. The
+ Any two regular expressions @var{a} and @var{b} can be concatenated. The
result is a regular expression that matches a string if @var{a} matches
some amount of the beginning of that string and @var{b} matches the rest of
the string.@refill
-As a simple example, we can concatenate the regular expressions @samp{f}
+ As a simple example, we can concatenate the regular expressions @samp{f}
and @samp{o} to get the regular expression @samp{fo}, which matches only
the string @samp{fo}. Still trivial. To do something more powerful, you
-need to use one of the special characters. Here is a list of them:
+need to use one of the special regular expression constructs.
+
+@menu
+* Regexp Special:: Special characters in regular expressions.
+* Char Classes:: Character classes used in regular expressions.
+* Regexp Backslash:: Backslash-sequences in regular expressions.
+@end menu
+
+@node Regexp Special
+@subsubsection Special Characters in Regular Expressions
+
+ Here is a list of the characters that are special in a regular
+expression.
@need 800
@table @asis
@@ -266,23 +279,10 @@ matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
You can also include character ranges in a character alternative, by
writing the starting and ending characters with a @samp{-} between them.
-Thus, @samp{[a-z]} matches any lower-case @sc{ASCII} letter. Ranges may be
+Thus, @samp{[a-z]} matches any lower-case @sc{ascii} letter. Ranges may be
intermixed freely with individual characters, as in @samp{[a-z$%.]},
-which matches any lower case @sc{ASCII} letter or @samp{$}, @samp{%} or
+which matches any lower case @sc{ascii} letter or @samp{$}, @samp{%} or
period.
-
-You cannot always match all non-@sc{ASCII} characters with the regular
-expression @samp{[\200-\377]}. This works when searching a unibyte
-buffer or string (@pxref{Text Representations}), but not in a multibyte
-buffer or string, because many non-@sc{ASCII} characters have codes
-above octal 0377. However, the regular expression @samp{[^\000-\177]}
-does match all non-@sc{ASCII} characters, in both multibyte and unibyte
-representations, because only the @sc{ASCII} characters are excluded.
-
-The beginning and end of a range must be in the same character set
-(@pxref{Character Sets}). Thus, @samp{[a-\x8e0]} is invalid because
-@samp{a} is in the @sc{ASCII} character set but the character 0x8e0
-(@samp{a} with grave accent) is in the Emacs character set for Latin-1.
Note that the usual regexp special characters are not special inside a
character alternative. A completely different set of characters is
@@ -297,6 +297,27 @@ matches both @samp{]} and @samp{-}.
To include @samp{^} in a character alternative, put it anywhere but at
the beginning.
+The beginning and end of a range must be in the same character set
+(@pxref{Character Sets}). Thus, @samp{[a-\x8e0]} is invalid because
+@samp{a} is in the @sc{ascii} character set but the character 0x8e0
+(@samp{a} with grave accent) is in the Emacs character set for Latin-1.
+
+You cannot always match all non-@sc{ascii} characters with the regular
+expression @samp{[\200-\377]}. This works when searching a unibyte
+buffer or string (@pxref{Text Representations}), but not in a multibyte
+buffer or string, because many non-@sc{ascii} characters have codes
+above octal 0377. However, the regular expression @samp{[^\000-\177]}
+does match all non-@sc{ascii} characters (see below regarding @samp{^}),
+in both multibyte and unibyte representations, because only the
+@sc{ascii} characters are excluded.
+
+Starting in Emacs 21, a character alternative can also specify named
+character classes (@pxref{Char Classes}). This is a POSIX feature whose
+syntax is @samp{[:@var{class}:]}. Using a character class is equivalent
+to mentioning each of the characters in that class; but the latter is
+not feasible in practice, since some classes include thousands of
+different characters.
+
@item @samp{[^ @dots{} ]}
@cindex @samp{^} in regexp
@samp{[^} begins a @dfn{complemented character alternative}, which matches any
@@ -321,14 +342,21 @@ the beginning of a line.
When matching a string instead of a buffer, @samp{^} matches at the
beginning of the string or after a newline character @samp{\n}.
+For historical compatibility reasons, @samp{^} can be used only at the
+beginning of the regular expression, or after @samp{\(} or @samp{\|}.
+
@item @samp{$}
@cindex @samp{$} in regexp
+@cindex end of line in regexp
is similar to @samp{^} but matches only at the end of a line. Thus,
@samp{x+$} matches a string of one @samp{x} or more at the end of a line.
When matching a string instead of a buffer, @samp{$} matches at the end
of the string or before a newline character @samp{\n}.
+For historical compatibility reasons, @samp{$} can be used only at the
+end of the regular expression, or before @samp{\)} or @samp{\|}.
+
@item @samp{\}
@cindex @samp{\} in regexp
has two functions: it quotes the special characters (including
@@ -354,11 +382,66 @@ ordinary since there is no preceding expression on which the @samp{*}
can act. It is poor practice to depend on this behavior; quote the
special character anyway, regardless of where it appears.@refill
-For the most part, @samp{\} followed by any character matches only that
-character. However, there are several exceptions: two-character
-sequences starting with @samp{\} which have special meanings. (The
-second character in such a sequence is always ordinary when used on its
-own.) Here is a table of @samp{\} constructs.
+@node Char Classes
+@subsubsection Character Classes
+@cindex character classes in regexp
+
+ Here is a table of the classes you can use in a character alternative,
+in Emacs 21, and what they mean:
+
+@table @samp
+@item [:ascii:]
+This matches any ASCII (unibyte) character.
+@item [:alnum:]
+This matches any letter or digit. (At present, for multibyte
+characters, it matches anything that has word syntax.)
+@item [:alpha:]
+This matches any letter. (At present, for multibyte characters, it
+matches anything that has word syntax.)
+@item [:blank:]
+This matches space and tab only.
+@item [:cntrl:]
+This matches any ASCII control character.
+@item [:digit:]
+This matches @samp{0} through @samp{9}. Thus, @samp{[-+[:digit:]]}
+matches any digit, as well as @samp{+} and @samp{-}.
+@item [:graph:]
+This matches graphic characters---everything except ASCII control characters,
+space, and DEL.
+@item [:lower:]
+This matches any lower-case letter, as determined by
+the current case table (@pxref{Case Tables}).
+@item [:nonascii:]
+This matches any non-ASCII (multibyte) character.
+@item [:print:]
+This matches printing characters---everything except ASCII control
+characters and DEL.
+@item [:punct:]
+This matches any punctuation character. (At present, for multibyte
+characters, it matches anything that has non-word syntax.)
+@item [:space:]
+This matches any character that has whitespace syntax
+(@pxref{Syntax Class Table}).
+@item [:upper:]
+This matches any upper-case letter, as determined by
+the current case table (@pxref{Case Tables}).
+@item [:word:]
+This matches any character that has word syntax (@pxref{Syntax Class
+Table}).
+@item [:xdigit:]
+This matches the hexadecimal digits: @samp{0} through @samp{9}, @samp{a}
+through @samp{f} and @samp{A} through @samp{F}.
+@end table
+
+@node Regexp Backslash
+@subsubsection Backslash Constructs in Regular Expressions
+
+ For the most part, @samp{\} followed by any character matches only
+that character. However, there are several exceptions: certain
+two-character sequences starting with @samp{\} that have special
+meanings. (The character after the @samp{\} in such a sequence is
+always ordinary when used on its own.) Here is a table of the special
+@samp{\} constructs.
@table @samp
@item \|
@@ -376,7 +459,9 @@ but no other string.@refill
surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
@samp{\|}.@refill
-Full backtracking capability exists to handle multiple uses of @samp{\|}.
+Full backtracking capability exists to handle multiple uses of
+@samp{\|}, if you use the POSIX regular expression functions
+(@pxref{POSIX Regexps}).
@item \( @dots{} \)
@cindex @samp{(} in regexp
@@ -505,62 +590,6 @@ as @samp{[]]}), and so is a string that ends with a single @samp{\}. If
an invalid regular expression is passed to any of the search functions,
an @code{invalid-regexp} error is signaled.
-@defun regexp-quote string
-This function returns a regular expression string that matches exactly
-@var{string} and nothing else. This allows you to request an exact
-string match when calling a function that wants a regular expression.
-
-@example
-@group
-(regexp-quote "^The cat$")
- @result{} "\\^The cat\\$"
-@end group
-@end example
-
-One use of @code{regexp-quote} is to combine an exact string match with
-context described as a regular expression. For example, this searches
-for the string that is the value of @var{string}, surrounded by
-whitespace:
-
-@example
-@group
-(re-search-forward
- (concat "\\s-" (regexp-quote string) "\\s-"))
-@end group
-@end example
-@end defun
-
-@defun regexp-opt strings &optional paren
-@tindex regexp-opt
-This function returns an efficient regular expression that will match
-any of the strings @var{strings}. This is useful when you need to make
-matching or searching as fast as possible---for example, for Font Lock
-mode.
-
-If the optional argument @var{paren} is non-@code{nil}, then the
-returned regular expression is always enclosed by at least one
-parentheses-grouping construct.
-
-This simplified definition of @code{regexp-opt} produces a
-regular expression which is equivalent to the actual value
-(but not as efficient):
-
-@example
-(defun regexp-opt (strings paren)
- (let ((open-paren (if paren "\\(" ""))
- (close-paren (if paren "\\)" "")))
- (concat open-paren
- (mapconcat 'regexp-quote strings "\\|")
- close-paren)))
-@end example
-@end defun
-
-@defun regexp-opt-depth regexp
-@tindex regexp-opt-depth
-This function returns the total number of grouping constructs
-(parenthesized expressions) in @var{regexp}.
-@end defun
-
@node Regexp Example
@comment node-name, next, previous, up
@subsection Complex Regexp Example
@@ -624,6 +653,72 @@ Finally, the last part of the pattern matches any additional whitespace
beyond the minimum needed to end a sentence.
@end table
+@node Regexp Functions
+@subsection Regular Expression Functions
+
+ These functions operate on regular expressions.
+
+@defun regexp-quote string
+This function returns a regular expression whose only exact match is
+@var{string}. Using this regular expression in @code{looking-at} will
+succeed only if the next characters in the buffer are @var{string};
+using it in a search function will succeed if the text being searched
+contains @var{string}.
+
+This allows you to request an exact string match or search when calling
+a function that wants a regular expression.
+
+@example
+@group
+(regexp-quote "^The cat$")
+ @result{} "\\^The cat\\$"
+@end group
+@end example
+
+One use of @code{regexp-quote} is to combine an exact string match with
+context described as a regular expression. For example, this searches
+for the string that is the value of @var{string}, surrounded by
+whitespace:
+
+@example
+@group
+(re-search-forward
+ (concat "\\s-" (regexp-quote string) "\\s-"))
+@end group
+@end example
+@end defun
+
+@defun regexp-opt strings &optional paren
+@tindex regexp-opt
+This function returns an efficient regular expression that will match
+any of the strings @var{strings}. This is useful when you need to make
+matching or searching as fast as possible---for example, for Font Lock
+mode.
+
+If the optional argument @var{paren} is non-@code{nil}, then the
+returned regular expression is always enclosed by at least one
+parentheses-grouping construct.
+
+This simplified definition of @code{regexp-opt} produces a
+regular expression which is equivalent to the actual value
+(but not as efficient):
+
+@example
+(defun regexp-opt (strings paren)
+ (let ((open-paren (if paren "\\(" ""))
+ (close-paren (if paren "\\)" "")))
+ (concat open-paren
+ (mapconcat 'regexp-quote strings "\\|")
+ close-paren)))
+@end example
+@end defun
+
+@defun regexp-opt-depth regexp
+@tindex regexp-opt-depth
+This function returns the total number of grouping constructs
+(parenthesized expressions) in @var{regexp}.
+@end defun
+
@node Regexp Search
@section Regular Expression Searching
@cindex regular expression searching
@@ -908,10 +1003,19 @@ The argument @var{replacements} specifies what to replace occurrences
with. If it is a string, that string is used. It can also be a list of
strings, to be used in cyclic order.
+If @var{replacements} is a cons cell, @code{(@var{function}
+. @var{data})}, this means to call @var{function} after each match to
+get the replacement text. This function is called with two arguments:
+@var{data}, and the number of replacements already made.
+
If @var{repeat-count} is non-@code{nil}, it should be an integer. Then
it specifies how many times to use each of the strings in the
@var{replacements} list before advancing cyclically to the next one.
+If @var{from-string} contains upper-case letters, then
+@code{perform-replace} binds @code{case-fold-search} to @code{nil}, and
+it uses the @var{replacements} without altering their case.
+
Normally, the keymap @code{query-replace-map} defines the possible user
responses for queries. The argument @var{map}, if non-@code{nil}, is a
keymap to use instead of @code{query-replace-map}.
@@ -1009,7 +1113,7 @@ match data around it, to prevent it from being overwritten.
@end menu
@node Replacing Match
-@subsection Replacing the Text That Matched
+@subsection Replacing the Text that Matched
This function replaces the text matched by the last search with
@var{replacement}.
@@ -1039,9 +1143,6 @@ If the original text contains just one word, and that word is a capital
letter, @code{replace-match} considers this a capitalized first word
rather than all upper case.
-If @code{case-replace} is @code{nil}, then case conversion is not done,
-regardless of the value of @var{fixed-case}. @xref{Searching and Case}.
-
If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
exactly as it is, the only alterations being case changes as needed.
If it is @code{nil} (the default), then the character @samp{\} is treated
@@ -1361,8 +1462,8 @@ preserve case. If the variable is @code{nil}, that means to use the
replacement text verbatim. A non-@code{nil} value means to convert the
case of the replacement text according to the text being replaced.
-The function @code{replace-match} is where this variable actually has
-its effect. @xref{Replacing Match}.
+This variable is used by passing it as an argument to the function
+@code{replace-match}. @xref{Replacing Match}.
@end defopt
@defopt case-fold-search