author Paul Eggert <eggert@cs.ucla.edu> 2019-04-02 15:00:59 -0700
committer Paul Eggert <eggert@cs.ucla.edu> 2019-04-02 15:01:34 -0700
commit f9ff60e0d7288e30cdbd1e43225059f1374441f1
tree 0e7e37a750e55adc0f959ca372369f4aa81cd3c2 /doc
parent bb669166ba6b33cd1a927c772c87ee2240a10f89
Improve regexp advice again, and unchain ranges

* doc/lispref/searching.texi (Regexp Special): Mention char classes
earlier, in a more-logical place. Advise sticking to ASCII letters
and digits in ranges. Reword negative advice to make it clearer
that it’s negative.
* lisp/files.el (make-auto-save-file-name):
* lisp/gnus/message.el (message-mailer-swallows-blank-line):
* lisp/gnus/nndoc.el (nndoc-lanl-gov-announce-type-p)
(nndoc-generate-lanl-gov-head):
* lisp/org/org-eshell.el (org-eshell-open):
* lisp/org/org.el (org-deadline-time-hour-regexp)
(org-scheduled-time-hour-regexp):
* lisp/progmodes/bat-mode.el (bat-font-lock-keywords):
* lisp/progmodes/bug-reference.el (bug-reference-bug-regexp):
* lisp/textmodes/less-css-mode.el (less-css-font-lock-keywords):
* lisp/vc/vc-cvs.el (vc-cvs-valid-symbolic-tag-name-p):
* lisp/vc/vc-svn.el (vc-svn-valid-symbolic-tag-name-p):
Avoid attempts to chain ranges, as this can be confusing.
For example, instead of [0-9-_.], use [0-9_.-].
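
As a rough illustration of the recommended form (not part of the commit itself), the following Emacs Lisp sketch checks the unchained alternative [0-9_.-] against two made-up sample strings. The variable name `unchained' and the test inputs are assumptions chosen only for this example.

```elisp
;; Illustrative sketch only; the sample strings are hypothetical.
;; With `-' placed last, it is a literal hyphen rather than part of a range.
(let ((unchained "\\`[0-9_.-]+\\'"))
  (list
   ;; Digits, `_', `.', and `-' are all members of the set,
   ;; so the whole string matches starting at position 0.
   (string-match-p unchained "1.2-3_4")
   ;; A letter is not in the set, so the anchored match fails.
   (string-match-p unchained "1.2a")))
;; => (0 nil)
```

Putting the hyphen by itself at the end leaves no doubt that it is a literal character, which is the point of the rewording in the affected files.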
Diffstat (limited to 'doc')
-rw-r--r-- doc/lispref/searching.texi 52
1 files changed, 32 insertions, 20 deletions
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 72ee9233a3c..8775254dd07 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -395,9 +395,18 @@ or @samp{$}, @samp{%} or period. However, the ending character of one
range should not be the starting point of another one; for example,
@samp{[a-m-z]} should be avoided.
+A character alternative can also specify named character classes
+(@pxref{Char Classes}). This is a POSIX feature. For example,
+@samp{[[:ascii:]]} matches any @acronym{ASCII} character.
+Using a character class is equivalent to mentioning each of the
+characters in that class; but the latter is not feasible in practice,
+since some classes include thousands of different characters.
+A character class should not appear as the lower or upper bound
+of a range.
+
The usual regexp special characters are not special inside a
character alternative. A completely different set of characters is
-special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
+special: @samp{]}, @samp{-} and @samp{^}.
To include @samp{]} in a character alternative, put it at the
beginning. To include @samp{^}, put it anywhere but at the beginning.
To include @samp{-}, put it at the end. Thus, @samp{[]^-]} matches
@@ -430,33 +439,36 @@ matches only @samp{/} rather than the likely-intended four characters.
@end enumerate
Some kinds of character alternatives are not the best style even
-though they are standardized by POSIX and are portable. They include:
+though they have a well-defined meaning in Emacs. They include:
@enumerate
@item
-A character alternative can include duplicates. For example,
-@samp{[XYa-yYb-zX]} is less clear than @samp{[XYa-z]}.
+Although a range's bound can be almost any character, it is better
+style to stay within natural sequences of ASCII letters and digits
+because most people have not memorized character code tables.
+For example, @samp{[.-9]} is less clear than @samp{[./0-9]},
+and @samp{[`-~]} is less clear than @samp{[`a-z@{|@}~]}.
+Unicode character escapes can help here; for example, for most programmers
+@samp{[ก-ฺ฿-๛]} is less clear than @samp{[\u0E01-\u0E3A\u0E3F-\u0E5B]}.
@item
-A range can denote just one, two, or three characters. For example,
-@samp{[(-(]} is less clear than @samp{[(]}, @samp{[*-+]} is less clear
-than @samp{[*+]}, and @samp{[*-,]} is less clear than @samp{[*+,]}.
+Although a character alternative can include duplicates, it is better
+style to avoid them. For example, @samp{[XYa-yYb-zX]} is less clear
+than @samp{[XYa-z]}.
@item
-A @samp{-} also appear at the beginning of a character alternative, or
-as the upper bound of a range. For example, although @samp{[-a-z]} is
-valid, @samp{[a-z-]} is better style; and although @samp{[!--/]} is
-valid, @samp{[!-,/-]} is clearer.
-@end enumerate
+Although a range can denote just one, two, or three characters, it
+is simpler to list the characters. For example,
+@samp{[a-a0]} is less clear than @samp{[a0]}, @samp{[i-j]} is less clear
+than @samp{[ij]}, and @samp{[i-k]} is less clear than @samp{[ijk]}.
-A character alternative can also specify named character classes
-(@pxref{Char Classes}). This is a POSIX feature. For example,
-@samp{[[:ascii:]]} matches any @acronym{ASCII} character.
-Using a character class is equivalent to mentioning each of the
-characters in that class; but the latter is not feasible in practice,
-since some classes include thousands of different characters.
-A character class should not appear as the lower or upper bound
-of a range.
+@item
+Although a @samp{-} can appear at the beginning of a character
+alternative or as the upper bound of a range, it is better style to
+put @samp{-} by itself at the end of a character alternative. For
+example, although @samp{[-a-z]} is valid, @samp{[a-z-]} is better
+style; and although @samp{[*--]} is valid, @samp{[*+,-]} is clearer.
+@end enumerate
@item @samp{[^ @dots{} ]}
@cindex @samp{^} in regexp