Diffstat (limited to 'lispref/searching.texi')
-rw-r--r-- | lispref/searching.texi | 285 |
1 files changed, 193 insertions, 92 deletions
diff --git a/lispref/searching.texi b/lispref/searching.texi
index 0f465edc011..68593e4bbef 100644
--- a/lispref/searching.texi
+++ b/lispref/searching.texi
@@ -34,9 +34,9 @@ portions of it.
 
 These are the primitive functions for searching through the text in a
 buffer. They are meant for use in programs, but you may call them
-interactively. If you do so, they prompt for the search string;
-@var{limit} and @var{noerror} are set to @code{nil}, and @var{repeat}
-is set to 1.
+interactively. If you do so, they prompt for the search string; the
+arguments @var{limit} and @var{noerror} are @code{nil}, and @var{repeat}
+is 1.
 
 These search functions convert the search string to multibyte if the
 buffer is multibyte; they convert the search string to unibyte if the
@@ -167,6 +167,7 @@ regexps; the following section says how to search for them.
 
 @menu
 * Syntax of Regexps:: Rules for writing regular expressions.
+* Regexp Functions:: Functions for operating on regular expressions.
 * Regexp Example:: Illustrates regular expression syntax.
 @end menu
 
@@ -182,21 +183,33 @@ special characters will be defined in the future. Any other character
 appearing in a regular expression is ordinary, unless a @samp{\}
 precedes it.
 
-For example, @samp{f} is not a special character, so it is ordinary, and
+ For example, @samp{f} is not a special character, so it is ordinary, and
 therefore @samp{f} is a regular expression that matches the string
 @samp{f} and no other string. (It does @emph{not} match the string
-@samp{ff}.) Likewise, @samp{o} is a regular expression that matches
-only @samp{o}.@refill
+@samp{fg}, but it does match a @emph{part} of that string.) Likewise,
+@samp{o} is a regular expression that matches only @samp{o}.@refill
 
-Any two regular expressions @var{a} and @var{b} can be concatenated. The
+ Any two regular expressions @var{a} and @var{b} can be concatenated. The
 result is a regular expression that matches a string if @var{a} matches
 some amount of the beginning of that string and @var{b} matches the rest of
 the string.@refill
 
-As a simple example, we can concatenate the regular expressions @samp{f}
+ As a simple example, we can concatenate the regular expressions @samp{f}
 and @samp{o} to get the regular expression @samp{fo}, which matches only
 the string @samp{fo}. Still trivial. To do something more powerful, you
-need to use one of the special characters. Here is a list of them:
+need to use one of the special regular expression constructs.
+
+@menu
+* Regexp Special:: Special characters in regular expressions.
+* Char Classes:: Character classes used in regular expressions.
+* Regexp Backslash:: Backslash-sequences in regular expressions.
+@end menu
+
+@node Regexp Special
+@subsubsection Special Characters in Regular Expressions
+
+ Here is a list of the characters that are special in a regular
+expression.
 
 @need 800
 @table @asis
@@ -266,23 +279,10 @@ matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
 
 You can also include character ranges in a character alternative, by
 writing the starting and ending characters with a @samp{-} between them.
-Thus, @samp{[a-z]} matches any lower-case @sc{ASCII} letter. Ranges may be
+Thus, @samp{[a-z]} matches any lower-case @sc{ascii} letter. Ranges may be
 intermixed freely with individual characters, as in @samp{[a-z$%.]},
-which matches any lower case @sc{ASCII} letter or @samp{$}, @samp{%} or
+which matches any lower case @sc{ascii} letter or @samp{$}, @samp{%} or
 period.
-
-You cannot always match all non-@sc{ASCII} characters with the regular
-expression @samp{[\200-\377]}. This works when searching a unibyte
-buffer or string (@pxref{Text Representations}), but not in a multibyte
-buffer or string, because many non-@sc{ASCII} characters have codes
-above octal 0377. However, the regular expression @samp{[^\000-\177]}
-does match all non-@sc{ASCII} characters, in both multibyte and unibyte
-representations, because only the @sc{ASCII} characters are excluded.
-
-The beginning and end of a range must be in the same character set
-(@pxref{Character Sets}). Thus, @samp{[a-\x8e0]} is invalid because
-@samp{a} is in the @sc{ASCII} character set but the character 0x8e0
-(@samp{a} with grave accent) is in the Emacs character set for Latin-1.
 
 Note that the usual regexp special characters are not special inside a
 character alternative. A completely different set of characters is
@@ -297,6 +297,27 @@ matches both @samp{]} and @samp{-}.
 To include @samp{^} in a character alternative, put it anywhere but at
 the beginning.
 
+The beginning and end of a range must be in the same character set
+(@pxref{Character Sets}). Thus, @samp{[a-\x8e0]} is invalid because
+@samp{a} is in the @sc{ascii} character set but the character 0x8e0
+(@samp{a} with grave accent) is in the Emacs character set for Latin-1.
+
+You cannot always match all non-@sc{ascii} characters with the regular
+expression @samp{[\200-\377]}. This works when searching a unibyte
+buffer or string (@pxref{Text Representations}), but not in a multibyte
+buffer or string, because many non-@sc{ascii} characters have codes
+above octal 0377. However, the regular expression @samp{[^\000-\177]}
+does match all non-@sc{ascii} characters (see below regarding @samp{^}),
+in both multibyte and unibyte representations, because only the
+@sc{ascii} characters are excluded.
+
+Starting in Emacs 21, a character alternative can also specify named
+character classes (@pxref{Char Classes}). This is a POSIX feature whose
+syntax is @samp{[:@var{class}:]}. Using a character class is equivalent
+to mentioning each of the characters in that class; but the latter is
+not feasible in practice, since some classes include thousands of
+different characters.
+
 @item @samp{[^ @dots{} ]}
 @cindex @samp{^} in regexp
 @samp{[^} begins a @dfn{complemented character alternative}, which matches any
@@ -321,14 +342,21 @@ the beginning of a line.
 When matching a string instead of a buffer, @samp{^} matches at the
 beginning of the string or after a newline character @samp{\n}.
 
+For historical compatibility reasons, @samp{^} can be used only at the
+beginning of the regular expression, or after @samp{\(} or @samp{\|}.
+
 @item @samp{$}
 @cindex @samp{$} in regexp
+@cindex end of line in regexp
 is similar to @samp{^} but matches only at the end of a line. Thus,
 @samp{x+$} matches a string of one @samp{x} or more at the end of a line.
 
 When matching a string instead of a buffer, @samp{$} matches at the end
 of the string or before a newline character @samp{\n}.
 
+For historical compatibility reasons, @samp{$} can be used only at the
+end of the regular expression, or before @samp{\)} or @samp{\|}.
+
 @item @samp{\}
 @cindex @samp{\} in regexp
 has two functions: it quotes the special characters (including
@@ -354,11 +382,66 @@ ordinary since there is no preceding expression on which the @samp{*}
 can act. It is poor practice to depend on this behavior; quote the
 special character anyway, regardless of where it appears.@refill
 
-For the most part, @samp{\} followed by any character matches only that
-character. However, there are several exceptions: two-character
-sequences starting with @samp{\} which have special meanings. (The
-second character in such a sequence is always ordinary when used on its
-own.) Here is a table of @samp{\} constructs.
+@node Char Classes
+@subsubsection Character Classes
+@cindex character classes in regexp
+
+ Here is a table of the classes you can use in a character alternative,
+in Emacs 21, and what they mean:
+
+@table @samp
+@item [:ascii:]
+This matches any ASCII (unibyte) character.
+@item [:alnum:]
+This matches any letter or digit. (At present, for multibyte
+characters, it matches anything that has word syntax.)
+@item [:alpha:]
+This matches any letter. (At present, for multibyte characters, it
+matches anything that has word syntax.)
+@item [:blank:]
+This matches space and tab only.
+@item [:cntrl:]
+This matches any ASCII control character.
+@item [:digit:]
+This matches @samp{0} through @samp{9}. Thus, @samp{[-+[:digit:]]}
+matches any digit, as well as @samp{+} and @samp{-}.
+@item [:graph:]
+This matches graphic characters---everything except ASCII control characters,
+space, and DEL.
+@item [:lower:]
+This matches any lower-case letter, as determined by
+the current case table (@pxref{Case Tables}).
+@item [:nonascii:]
+This matches any non-ASCII (multibyte) character.
+@item [:print:]
+This matches printing characters---everything except ASCII control
+characters and DEL.
+@item [:punct:]
+This matches any punctuation character. (At present, for multibyte
+characters, it matches anything that has non-word syntax.)
+@item [:space:]
+This matches any character that has whitespace syntax
+(@pxref{Syntax Class Table}).
+@item [:upper:]
+This matches any upper-case letter, as determined by
+the current case table (@pxref{Case Tables}).
+@item [:word:]
+This matches any character that has word syntax (@pxref{Syntax Class
+Table}).
+@item [:xdigit:]
+This matches the hexadecimal digits: @samp{0} through @samp{9}, @samp{a}
+through @samp{f} and @samp{A} through @samp{F}.
+@end table
+
+@node Regexp Backslash
+@subsubsection Backslash Constructs in Regular Expressions
+
+ For the most part, @samp{\} followed by any character matches only
+that character. However, there are several exceptions: certain
+two-character sequences starting with @samp{\} that have special
+meanings. (The character after the @samp{\} in such a sequence is
+always ordinary when used on its own.) Here is a table of the special
+@samp{\} constructs.
 
 @table @samp
 @item \|
@@ -376,7 +459,9 @@ but no other string.@refill
 surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
 @samp{\|}.@refill
 
-Full backtracking capability exists to handle multiple uses of @samp{\|}.
+Full backtracking capability exists to handle multiple uses of
+@samp{\|}, if you use the POSIX regular expression functions
+(@pxref{POSIX Regexps}).
 
 @item \( @dots{} \)
 @cindex @samp{(} in regexp
@@ -505,62 +590,6 @@ as @samp{[]]}), and so is a string that ends with a single @samp{\}. If
 an invalid regular expression is passed to any of the search functions,
 an @code{invalid-regexp} error is signaled.
 
-@defun regexp-quote string
-This function returns a regular expression string that matches exactly
-@var{string} and nothing else. This allows you to request an exact
-string match when calling a function that wants a regular expression.
-
-@example
-@group
-(regexp-quote "^The cat$")
-     @result{} "\\^The cat\\$"
-@end group
-@end example
-
-One use of @code{regexp-quote} is to combine an exact string match with
-context described as a regular expression. For example, this searches
-for the string that is the value of @var{string}, surrounded by
-whitespace:
-
-@example
-@group
-(re-search-forward
- (concat "\\s-" (regexp-quote string) "\\s-"))
-@end group
-@end example
-@end defun
-
-@defun regexp-opt strings &optional paren
-@tindex regexp-opt
-This function returns an efficient regular expression that will match
-any of the strings @var{strings}. This is useful when you need to make
-matching or searching as fast as possible---for example, for Font Lock
-mode.
-
-If the optional argument @var{paren} is non-@code{nil}, then the
-returned regular expression is always enclosed by at least one
-parentheses-grouping construct.
-
-This simplified definition of @code{regexp-opt} produces a
-regular expression which is equivalent to the actual value
-(but not as efficient):
-
-@example
-(defun regexp-opt (strings paren)
-  (let ((open-paren (if paren "\\(" ""))
-        (close-paren (if paren "\\)" "")))
-    (concat open-paren
-            (mapconcat 'regexp-quote strings "\\|")
-            close-paren)))
-@end example
-@end defun
-
-@defun regexp-opt-depth regexp
-@tindex regexp-opt-depth
-This function returns the total number of grouping constructs
-(parenthesized expressions) in @var{regexp}.
-@end defun
-
 @node Regexp Example
 @comment node-name, next, previous, up
 @subsection Complex Regexp Example
@@ -624,6 +653,72 @@ Finally, the last part of the pattern matches any additional whitespace
 beyond the minimum needed to end a sentence.
 @end table
 
+@node Regexp Functions
+@subsection Regular Expression Functions
+
+ These functions operate on regular expressions.
+
+@defun regexp-quote string
+This function returns a regular expression whose only exact match is
+@var{string}. Using this regular expression in @code{looking-at} will
+succeed only if the next characters in the buffer are @var{string};
+using it in a search function will succeed if the text being searched
+contains @var{string}.
+
+This allows you to request an exact string match or search when calling
+a function that wants a regular expression.
+
+@example
+@group
+(regexp-quote "^The cat$")
+     @result{} "\\^The cat\\$"
+@end group
+@end example
+
+One use of @code{regexp-quote} is to combine an exact string match with
+context described as a regular expression. For example, this searches
+for the string that is the value of @var{string}, surrounded by
+whitespace:
+
+@example
+@group
+(re-search-forward
+ (concat "\\s-" (regexp-quote string) "\\s-"))
+@end group
+@end example
+@end defun
+
+@defun regexp-opt strings &optional paren
+@tindex regexp-opt
+This function returns an efficient regular expression that will match
+any of the strings @var{strings}. This is useful when you need to make
+matching or searching as fast as possible---for example, for Font Lock
+mode.
+
+If the optional argument @var{paren} is non-@code{nil}, then the
+returned regular expression is always enclosed by at least one
+parentheses-grouping construct.
+
+This simplified definition of @code{regexp-opt} produces a
+regular expression which is equivalent to the actual value
+(but not as efficient):
+
+@example
+(defun regexp-opt (strings paren)
+  (let ((open-paren (if paren "\\(" ""))
+        (close-paren (if paren "\\)" "")))
+    (concat open-paren
+            (mapconcat 'regexp-quote strings "\\|")
+            close-paren)))
+@end example
+@end defun
+
+@defun regexp-opt-depth regexp
+@tindex regexp-opt-depth
+This function returns the total number of grouping constructs
+(parenthesized expressions) in @var{regexp}.
+@end defun
+
 @node Regexp Search
 @section Regular Expression Searching
 @cindex regular expression searching
@@ -908,10 +1003,19 @@ The argument @var{replacements} specifies what to replace occurrences
 with. If it is a string, that string is used. It can also be a list of
 strings, to be used in cyclic order.
 
+If @var{replacements} is a cons cell, @var{(@var{function}
+. @var{data})}, this means to call @var{function} after each match to
+get the replacement text. This function is called with two arguments:
+@var{data}, and the number of replacements already made.
+
 If @var{repeat-count} is non-@code{nil}, it should be an integer. Then
 it specifies how many times to use each of the strings in the
 @var{replacements} list before advancing cyclicly to the next one.
 
+If @var{from-string} contains upper-case letters, then
+@code{perform-replace} binds @code{case-fold-search} to @code{nil}, and
+it uses the @code{replacements} without altering the case of them.
+
 Normally, the keymap @code{query-replace-map} defines the possible user
 responses for queries. The argument @var{map}, if non-@code{nil}, is a
 keymap to use instead of @code{query-replace-map}.
@@ -1009,7 +1113,7 @@ match data around it, to prevent it from being overwritten.
 @end menu
 
 @node Replacing Match
-@subsection Replacing the Text That Matched
+@subsection Replacing the Text that Matched
 
 This function replaces the text matched by the last search with
 @var{replacement}.
@@ -1039,9 +1143,6 @@ If the original text contains just one word, and that word is a capital
 letter, @code{replace-match} considers this a capitalized first word
 rather than all upper case.
 
-If @code{case-replace} is @code{nil}, then case conversion is not done,
-regardless of the value of @var{fixed-case}. @xref{Searching and Case}.
-
 If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
 exactly as it is, the only alterations being case changes as needed.
 If it is @code{nil} (the default), then the character @samp{\} is treated
@@ -1361,8 +1462,8 @@ preserve case. If the variable is @code{nil}, that means to use the
 replacement text verbatim. A non-@code{nil} value means to convert the
 case of the replacement text according to the text being replaced.
 
-The function @code{replace-match} is where this variable actually has
-its effect. @xref{Replacing Match}.
+This variable is used by passing it as an argument to the function
+@code{replace-match}. @xref{Replacing Match}.
 @end defopt
 
 @defopt case-fold-search