author Paul Eggert <eggert@cs.ucla.edu> 2014-01-17 14:32:10 -0800
committer Paul Eggert <eggert@cs.ucla.edu> 2014-01-17 14:32:41 -0800
commit 1078b64302bbf5c0a46635772808ff7f75171dbc (patch)
tree 6ef621097785597b6a0fad25fe53672bad16973e /doc
parent 45284e38cfb07343ab50d20b116375c8a1d64196 (diff)
grep: DFA now uses rational ranges in unibyte locales
Problem reported by Aharon Robbins in <http://bugs.gnu.org/16481>.
* NEWS:
* doc/grep.texi (Environment Variables)
(Character Classes and Bracket Expressions): Document this.
* src/dfa.c (parse_bracket_exp): Treat unibyte locales like multibyte.
Diffstat (limited to 'doc')
-rw-r--r-- doc/grep.texi 19
1 files changed, 9 insertions, 10 deletions
diff --git a/doc/grep.texi b/doc/grep.texi
index 473a1816..42fb9a23 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -960,8 +960,8 @@ They are omitted (i.e., false) by default and become true when specified.
@cindex national language support
@cindex NLS
These variables specify the locale for the @code{LC_COLLATE} category,
-which determines the collating sequence
-used to interpret range expressions like @samp{[a-z]}.
+which might affect how range expressions like @samp{[a-z]} are
+interpreted.
@item LC_ALL
@itemx LC_CTYPE
@@ -1223,14 +1223,13 @@ For example, the regular expression
Within a bracket expression, a @dfn{range expression} consists of two
characters separated by a hyphen.
It matches any single character that
-sorts between the two characters, inclusive, using the locale's
-collating sequence and character set.
-For example, in the default C
-locale, @samp{[a-d]} is equivalent to @samp{[abcd]}.
-Many locales sort
-characters in dictionary order, and in these locales @samp{[a-d]} is
-typically not equivalent to @samp{[abcd]};
-it might be equivalent to @samp{[aBbCcDd]}, for example.
+sorts between the two characters, inclusive.
+In the default C locale, the sorting sequence is the native character
+order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
+In other locales, the sorting sequence is not specified, and
+@samp{[a-d]} might be equivalent to @samp{[abcd]} or to
+@samp{[aBbCcDd]}, or it might fail to match any character, or the set of
+characters that it matches might even be erratic.
To obtain the traditional interpretation
of bracket expressions, you can use the @samp{C} locale by setting the
@env{LC_ALL} environment variable to the value @samp{C}.
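
The two interpretations of @samp{[a-d]} mentioned in the revised text can be illustrated with a small Python sketch. This is only a model of the idea, not how grep or dfa.c computes ranges: the helper range_members and the two ordering strings are hypothetical names introduced here, and the "dictionary" order AaBbCc... is an assumed example of a locale collating sequence.

```python
import string


def range_members(lo, hi, order):
    """Return the characters of `order` that sort between lo and hi, inclusive."""
    start, end = order.index(lo), order.index(hi)
    return order[start:end + 1]


# Native character order for ASCII letters, as in the C locale:
# all uppercase letters come before all lowercase letters.
NATIVE_ORDER = string.ascii_uppercase + string.ascii_lowercase

# Assumed dictionary-style order (hypothetical locale): AaBbCc...Zz.
DICTIONARY_ORDER = "".join(
    upper + lower
    for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase)
)

c_locale_set = range_members("a", "d", NATIVE_ORDER)
dictionary_set = range_members("a", "d", DICTIONARY_ORDER)

print(c_locale_set)
print(dictionary_set)
```

Under these assumptions the first set corresponds to @samp{[abcd]} and the second to @samp{[aBbCcDd]}, which is why the documentation recommends setting @env{LC_ALL} to @samp{C} when the traditional interpretation is required.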