author | Alain Magloire <alainm@rcsm.ee.mcgill.ca> | 2001-02-08 15:03:50 +0000 |
---|---|---|
committer | Alain Magloire <alainm@rcsm.ee.mcgill.ca> | 2001-02-08 15:03:50 +0000 |
commit | 67d86c7220648435fdfe63dbfcb5494180d1085c (patch) | |
tree | 88422478f83ec2ce9c22b36d345523242979bfab | |
parent | 5d6c5528ceaf61a916d667af1a54562c9caa7789 (diff) | |
download | grep-67d86c7220648435fdfe63dbfcb5494180d1085c.tar.gz |
Doc improvements from P.E.
-rw-r--r-- | ChangeLog | 10 |
-rw-r--r-- | NEWS | 7 |
-rw-r--r-- | doc/grep.1 | 93 |
-rw-r--r-- | doc/grep.texi | 77 |
4 files changed, 138 insertions, 49 deletions
@@ -1,4 +1,10 @@
-2000-03-26 Paul Eggert <eggert@twinsun.com>
+2000-04-06 Paul Eggert
+
+ * doc/grep.1, doc/grep.texi, NEWS: Improve the explanation of
+ locale-dependent behavior of range expressions. Mention
+ LC_COLLATE, since this affects range expressions.
+
+2000-03-26 Paul Eggert
 
  * Makefile.am (ACINCLUDE_INPUTS): Add decl.m4, inttypes_h.m4,
  uintmax_t.m4, ulonglong.m4, xstrtoumax.m4.
@@ -27,7 +33,7 @@
  New files, taken unchanged from textutils, fileutils, sh-utils
  and/or tar.
 
-2000-03-23 Paul Eggert <eggert@twinsun.com>
+2000-03-23 Paul Eggert
 
  * src/search.c (Pcompile): Add support for NUL bytes in Perl
  regular expressions.
@@ -1,3 +1,10 @@
+ - Bracket regular expressions like [a-z] are now locale-dependent,
+ as POSIX.2 requires. For example, many locales sort characters in
+ dictionary order, and in these locales the regular expression
+ [a-d] is not equivalent to [abcd]; it might be equivalent to
+ [aBbCcDd], for example. To obtain the traditional interpretation
+ of bracket expressions, you can use the C locale by setting the
+ LC_ALL environment variable to the value "C".
 
  - The new -P or --perl-regexp option tells grep to interpert
  the pattern as a Perl regular expression.
@@ -12,7 +12,7 @@
 .de Id
 .ds Dt \\$4
 ..
-.Id $Id: grep.1,v 1.13 2001/02/08 05:33:57 alainm Exp $
+.Id $Id: grep.1,v 1.14 2001/02/08 15:03:50 alainm Exp $
 .TH GREP 1 \*(Dt "GNU Project"
 .SH NAME
 grep, egrep, fgrep \- print lines matching a pattern
@@ -395,11 +395,13 @@ a single character. Most characters, including all letters and digits,
 are regular expressions that match themselves. Any metacharacter with
 special meaning may be quoted by preceding it with a backslash.
 .PP
-A list of characters enclosed by
+A
+.I "bracket expression"
+is a list of characters enclosed by
 .B [
 and
-.B ]
-matches any single
+.BR ] .
+It matches any single
 character in that list; if the first character of the list
 is the caret
 .B ^
@@ -408,10 +410,32 @@ then it matches any character
 in the list.
 For example, the regular expression
 .B [0123456789]
-matches any single digit. A range of characters
-may be specified by giving the first and last characters, separated
-by a hyphen.
-Finally, certain named classes of characters are predefined.
+matches any single digit.
+.PP
+Within a bracket expression, a
+.I "range expression"
+consists of two characters separated by a hyphen.
+It matches any single character that sorts between the two characters,
+inclusive, using the locale's collating sequence and character set.
+For example, in the default C locale,
+.B [a\-d]
+is equivalent to
+.BR [abcd] .
+Many locales sort characters in dictionary order, and in these locales
+.B [a\-d]
+is typically not equivalent to
+.BR [abcd] ;
+it might be equivalent to
+.BR [aBbCcDd] ,
+for example.
+To obtain the traditional interpretation of bracket expressions,
+you can use the C locale by setting the
+.B LC_ALL
+environment variable to the value
+.BR C .
+.PP
+Finally, certain named classes of characters are predefined within
+bracket expressions, as follows.
 Their names are self explanatory, and they are
 .BR [:alnum:] ,
 .BR [:alpha:] ,
@@ -428,8 +452,8 @@ and
 For example,
 .B [[:alnum:]]
 means
-.BR [0-9A-Za-z] ,
-except the latter form depends upon the \s-1POSIX\s0 locale and the
+.BR [0\-9A\-Za\-z] ,
+except the latter form depends upon the C locale and the
 \s-1ASCII\s0 character encoding, whereas
 the former is independent of locale and character set.
 (Note that the brackets in these class names are part of the symbolic
@@ -576,6 +600,29 @@ instead of reporting a syntax error in the regular expression.
 \s-1POSIX.2\s0 allows this behavior as an extension, but portable scripts
 should avoid it.
 .SH "ENVIRONMENT VARIABLES"
+Grep's behavior is affected by the following environment variables.
+.PP
+A locale
+.BI LC_ foo
+is specified by examining the three environment variables
+.BR LC_ALL ,
+.BR LC_\fIfoo\fP ,
+.BR LANG ,
+in that order.
+The first of these variables that is set specifies the locale.
+For example, if
+.B LC_ALL
+is not set, but
+.B LC_MESSAGES
+is set to
+.BR pt_BR ,
+then Brazilian Portuguese is used for the
+.B LC_MESSAGES
+locale.
+The C locale is used if none of these environment variables are set,
+or if the locale catalog is not installed, or if
+.B grep
+was not compiled with national language support (\s-1NLS\s0).
 .TP
 .B GREP_OPTIONS
 This variable specifies default options to be placed in front of any
@@ -593,28 +640,26 @@ Option specifications are separated by whitespace.
 A backslash escapes the next character,
 so it can be used to specify an option containing whitespace or a backslash.
 .TP
-\fBLC_ALL\fP, \fBLC_MESSAGES\fP, \fBLANG\fP
+\fBLC_ALL\fP, \fBLC_COLLATE\fP, \fBLANG\fP
 These variables specify the
-.B LC_MESSAGES
-locale, which determines the language that
-.B grep
-uses for messages.
-The locale is determined by the first of these variables that is set.
-American English is used if none of these environment variables are set,
-or if the message catalog is not installed, or if
-.B grep
-was not compiled with national language support (\s-1NLS\s0).
+.B LC_COLLATE
+locale, which determines the collating sequence used to interpret
+range expressions like
+.BR [a\-z] .
 .TP
 \fBLC_ALL\fP, \fBLC_CTYPE\fP, \fBLANG\fP
 These variables specify the
 .B LC_CTYPE
 locale, which determines the type of characters, e.g., which
 characters are whitespace.
-The locale is determined by the first of these variables that is set.
-The \s-1POSIX\s0 locale is used if none of these environment variables
-are set, or if the locale catalog is not installed, or if
+.TP
+\fBLC_ALL\fP, \fBLC_MESSAGES\fP, \fBLANG\fP
+These variables specify the
+.B LC_MESSAGES
+locale, which determines the language that
 .B grep
-was not compiled with national language support (\s-1NLS\s0).
+uses for messages.
+The default C locale uses American English messages.
 .TP
 .B POSIXLY_CORRECT
 If set,
diff --git a/doc/grep.texi b/doc/grep.texi
index caeba681..9595bdca 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -501,9 +501,20 @@ matching engine is used. @xref{Grep Programs}.
 @section Environment Variables
 
 Grep's behavior is affected by the following environment variables.
+
+A locale @code{LC_@var{foo}} is specified by examining the three
+environment variables @env{LC_ALL}, @env{LC_@var{foo}}, and @env{LANG},
+in that order. The first of these variables that is set specifies the
+locale. For example, if @env{LC_ALL} is not set, but @env{LC_MESSAGES}
+is set to @samp{pt_BR}, then Brazilian Portuguese is used for the
+@code{LC_MESSAGES} locale. The C locale is used if none of these
+environment variables are set, or if the locale catalog is not
+installed, or if @command{grep} was not compiled with national language
+support (@sc{nls}).
+
 @cindex environment variables
 
-@table @code
+@table @env
 @item GREP_OPTIONS
 @vindex GREP_OPTIONS
@@ -518,22 +529,17 @@ whitespace. A backslash escapes the next character, so it can be used to
 specify an option containing whitespace or a backslash.
 
 @item LC_ALL
-@itemx LC_MESSAGES
+@itemx LC_COLLATE
 @itemx LANG
 @vindex LC_ALL
-@vindex LC_MESSAGES
+@vindex LC_COLLATE
 @vindex LANG
-@cindex language of messages
-@cindex message language
+@cindex character type
 @cindex national language support
 @cindex NLS
-@cindex translation of message language
-These variables specify the @code{LC_MESSAGES} locale, which determines
-the language that @command{grep} uses for messages. The locale is determined
-by the first of these variables that is set. American English is used
-if none of these environment variables are set, or if the message
-catalog is not installed, or if @command{grep} was not compiled with national
-language support (@sc{nls}).
+These variables specify the @code{LC_COLLATE} locale, which determines
+the collating sequence used to interpret range expressions like
+@samp{[a-z]}.
 
 @item LC_ALL
 @itemx LC_CTYPE
@@ -545,11 +551,22 @@ language support (@sc{nls}).
 @cindex national language support
 @cindex NLS
 These variables specify the @code{LC_CTYPE} locale, which determines the
-type of characters, e.g., which characters are whitespace. The locale is
-determined by the first of these variables that is set. The @sc{posix}
-locale is used if none of these environment variables are set, or if the
-locale catalog is not installed, or if @command{grep} was not compiled with
-national language support (@sc{nls}).
+type of characters, e.g., which characters are whitespace.
+
+@item LC_ALL
+@itemx LC_MESSAGES
+@itemx LANG
+@vindex LC_ALL
+@vindex LC_MESSAGES
+@vindex LANG
+@cindex language of messages
+@cindex message language
+@cindex national language support
+@cindex NLS
+@cindex translation of message language
+These variables specify the @code{LC_MESSAGES} locale, which determines
+the language that @command{grep} uses for messages. The default C
+locale uses American English messages.
 
 @item POSIXLY_CORRECT
 @vindex POSIXLY_CORRECT
@@ -649,17 +666,31 @@ The fundamental building blocks are the regular expressions that match
 a single character. Most characters, including all letters and digits,
 are regular expressions that match themselves. Any metacharacter with
 special meaning may be quoted by preceding it with a backslash.
-A list of characters enclosed by @samp{[} and @samp{]} matches any
+
+@cindex bracket expression
+A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
+@samp{]}. It matches any
 single character in that list; if the first character of the list is the
 caret @samp{^}, then it matches any character @strong{not} in the list.
 For example, the regular expression
 @samp{[0123456789]} matches any single digit.
 
-A range of characters may be specified by giving the first
-and last characters, separated by a hyphen.
-Finally, certain named classes of characters are predefined, as follows.
+@cindex range expression
+Within a bracket expression, a @dfn{range expression} consists of two
+characters separated by a hyphen. It matches any single character that
+sorts between the two characters, inclusive, using the locale's
+collating sequence and character set. For example, in the default C
+locale, @samp{[a-d]} is equivalent to @samp{[abcd]}. Many locales sort
+characters in dictionary order, and in these locales @samp{[a-d]} is
+typically not equivalent to @samp{[abcd]}; it might be equivalent to
+@samp{[aBbCcDd]}, for example. To obtain the traditional interpretation
+of bracket expressions, you can use the C locale by setting the
+@env{LC_ALL} environment variable to the value @samp{C}.
+
+Finally, certain named classes of characters are predefined within
+bracket expressions, as follows.
 Their interpretation depends on the @code{LC_CTYPE} locale; the
-interpretation below is that of the @sc{posix} locale, which is the default
+interpretation below is that of the C locale, which is the default
 if no @code{LC_CTYPE} locale is specified.
 
 @cindex classes of characters
@@ -743,7 +774,7 @@ Hexadecimal digits:
 @end table
 
 For example, @samp{[[:alnum:]]} means @samp{[0-9A-Za-z]}, except the latter
-depends upon the @sc{posix} locale and the @sc{ascii} character
+depends upon the C locale and the @sc{ascii} character
 encoding, whereas the former is independent of locale and character set.
 (Note that the brackets in these class names are part of the symbolic
 names, and must be included in addition to
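
The two rules this commit documents can be illustrated with a small Python sketch. It is not grep's implementation. The first rule is the lookup order LC_ALL, then LC_foo, then LANG. The second is that a range expression matches whatever sorts between its endpoints in the locale's collating sequence. The function names `resolve_locale` and `range_members` and the two toy collating strings are hypothetical, chosen only for illustration. Treating an empty variable as unset is an assumption borrowed from POSIX, since the text above only says "set". The fallback to the C locale when a catalog is missing or NLS is disabled is not modeled.

```python
def resolve_locale(category, env):
    """Return the locale name selected for `category` (e.g. "LC_COLLATE").

    Checks LC_ALL, then the category's own variable, then LANG, and
    falls back to "C" when none is set. Empty values count as unset
    (an assumption; the documentation only says "set").
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def range_members(first, last, collation):
    """Return the characters sorting between `first` and `last`, inclusive.

    `collation` is a string listing characters in collating order. It
    stands in for a real locale's collating sequence.
    """
    lo = collation.index(first)
    hi = collation.index(last)
    return collation[lo:hi + 1]


# Toy collating sequences, hypothetical and limited to eight letters.
C_ORDER = "ABCDabcd"     # C locale with ASCII: code-point order
DICT_ORDER = "AaBbCcDd"  # one possible dictionary order

if __name__ == "__main__":
    env = {"LC_MESSAGES": "pt_BR"}
    print(resolve_locale("LC_MESSAGES", env))   # pt_BR
    print(resolve_locale("LC_COLLATE", env))    # C
    print(range_members("a", "d", C_ORDER))     # abcd
    print(range_members("a", "d", DICT_ORDER))  # aBbCcDd
```

With the toy dictionary order, `[a-d]` picks up `B`, `C`, and `D` but not `A`, which matches the `[aBbCcDd]` example in the documentation text. Passing `{"LC_ALL": "C"}` as the environment makes every category resolve to `C`, which is the workaround the NEWS entry recommends.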