author | Paul Eggert <eggert@cs.ucla.edu> | 2014-01-17 14:32:10 -0800 |
---|---|---|
committer | Paul Eggert <eggert@cs.ucla.edu> | 2014-01-17 14:32:41 -0800 |
commit | 1078b64302bbf5c0a46635772808ff7f75171dbc (patch) | |
tree | 6ef621097785597b6a0fad25fe53672bad16973e /doc | |
parent | 45284e38cfb07343ab50d20b116375c8a1d64196 (diff) | |
download | grep-1078b64302bbf5c0a46635772808ff7f75171dbc.tar.gz |
grep: DFA now uses rational ranges in unibyte locales
Problem reported by Aharon Robbins in <http://bugs.gnu.org/16481>.
* NEWS:
* doc/grep.texi (Environment Variables)
(Character Classes and Bracket Expressions):
Document this.
* src/dfa.c (parse_bracket_exp): Treat unibyte locales like multibyte.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/grep.texi | 19 |
1 files changed, 9 insertions, 10 deletions
diff --git a/doc/grep.texi b/doc/grep.texi
index 473a1816..42fb9a23 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -960,8 +960,8 @@ They are omitted (i.e., false) by default and become true when specified.
 @cindex national language support
 @cindex NLS
 These variables specify the locale for the @code{LC_COLLATE} category,
-which determines the collating sequence
-used to interpret range expressions like @samp{[a-z]}.
+which might affect how range expressions like @samp{[a-z]} are
+interpreted.
 
 @item LC_ALL
 @itemx LC_CTYPE
@@ -1223,14 +1223,13 @@ For example, the regular expression
 Within a bracket expression, a @dfn{range expression} consists of two
 characters separated by a hyphen.
 It matches any single character that
-sorts between the two characters, inclusive, using the locale's
-collating sequence and character set.
-For example, in the default C
-locale, @samp{[a-d]} is equivalent to @samp{[abcd]}.
-Many locales sort
-characters in dictionary order, and in these locales @samp{[a-d]} is
-typically not equivalent to @samp{[abcd]};
-it might be equivalent to @samp{[aBbCcDd]}, for example.
+sorts between the two characters, inclusive.
+In the default C locale, the sorting sequence is the native character
+order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
+In other locales, the sorting sequence is not specified, and
+@samp{[a-d]} might be equivalent to @samp{[abcd]} or to
+@samp{[aBbCcDd]}, or it might fail to match any character, or the set of
+characters that it matches might even be erratic.
 To obtain the traditional interpretation
 of bracket expressions, you can use the @samp{C} locale by setting the
 @env{LC_ALL} environment variable to the value @samp{C}.
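
The revised documentation says that only the C locale guarantees that @samp{[a-d]} means @samp{[abcd]}. A minimal sketch of how one might check this from a shell follows; the sample input and the variable names `sample` and `out` are illustrative assumptions, not part of the commit. It only exercises the C locale, since the text states that behavior in other locales is unspecified.

```shell
# Hypothetical sample input: one character per line.
sample=$(printf 'a\nB\nb\nD\ne')

# Force the C locale so that the range [a-d] uses native character
# order, i.e. it is equivalent to [abcd].
out=$(printf '%s\n' "$sample" | LC_ALL=C grep '[a-d]')

# Only the lowercase letters inside the range are printed.
printf '%s\n' "$out"
```

Running the same pipeline without `LC_ALL=C` may select a different set of lines, depending on the locale and on the grep version in use.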