author | Paolo Bonzini <bonzini@gnu.org> | 2008-09-29 09:31:27 +0200 |
---|---|---|
committer | Paolo Bonzini <bonzini@gnu.org> | 2008-09-29 10:24:09 +0200 |
commit | 41e0cf2dac2364bc5c46c516143eccb59408e4ff (patch) | |
tree | f2feedc338d4566395b9cd040f58364d106a2124 /doc | |
parent | 07df2df3c2de5cc55525f5a38788e0c05fb08cce (diff) | |
download | sed-41e0cf2dac2364bc5c46c516143eccb59408e4ff.tar.gz |
add `z' extension
2008-09-29 Paolo Bonzini <bonzini@gnu.org>
* BUGS: Document s/.*// behavior with invalid multibyte sequences.
* NEWS: Document `z' extension.
* doc/sed-in.texi: Document both things.
* sed/compile.c (compile_program): Recognize `z'.
* sed/execute.c (execute_program): Execute `z'.
* testsuite/Makefile.am: Add badenc test.
* testsuite/Makefile.tests: Add badenc test.
* testsuite/badenc.good: New.
* testsuite/badenc.inp: New.
* testsuite/badenc.sed: New.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/sed-in.texi | 34 |
-rw-r--r-- | doc/sed.texi | 34 |
2 files changed, 68 insertions, 0 deletions
diff --git a/doc/sed-in.texi b/doc/sed-in.texi
index cee2505..c8bb21d 100644
--- a/doc/sed-in.texi
+++ b/doc/sed-in.texi
@@ -1489,6 +1489,18 @@ This command enables all @value{SSEDEXT} even if
 Write to the given filename the portion of the pattern space up to
 the first newline. Everything said under the @code{w} command about
 file handling holds here too.
+
+@item z
+@findex z (Zap) command
+@cindex @value{SSEDEXT}, emptying pattern space
+@cindex Emptying pattern space
+This command empties the content of pattern space. It is
+usually the same as @samp{s/.*//}, but is more efficient
+and works in the presence of invalid multibyte sequences
+in the input stream. @sc{posix} mandates that such sequences
+are @emph{not} matched by @samp{.}, so that there is no portable
+way to clear @command{sed}'s buffers in the middle of the
+script in most multibyte locales (including UTF-8 locales).
 @end table
 
 @node Escapes
@@ -2738,6 +2750,10 @@ in a read-only directory, and will break hard or symbolic links when
 @option{-i} is used on such a file.
 
 @item @code{0a} does not work (gives an error)
+@cindex @code{0} address
+@cindex @acronym{GNU} extensions, @code{0} address
+@cindex Non-bugs, @code{0} address
+
 There is no line 0. 0 is a special address that is only used to treat
 addresses like @code{0,/@var{RE}/} as active when the script starts: if
 you write @code{1,/abc/d} and the first line includes the word @samp{abc},
@@ -2748,6 +2764,8 @@ is obtained with @code{0,/abc/d}.
 
 @ifclear PERL
 @item @code{[a-z]} is case insensitive
+@cindex Non-bugs, localization-related
+
 You are encountering problems with locales. POSIX mandates that @code{[a-z]}
 uses the current locale's collation order -- in C parlance, that means using
 @code{strcoll(3)} instead of @code{strcmp(3)}. Some locales have a
@@ -2764,6 +2782,22 @@ locales, or @samp{ij} in Dutch locales.
 
 To work around these problems, which may cause bugs in shell scripts, set
 the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
+
+@item @code{s/.*//} does not clear pattern space
+@cindex Non-bugs, localization-related
+@cindex @value{SSEDEXT}, emptying pattern space
+@cindex Emptying pattern space
+
+This happens if your input stream includes invalid multibyte
+sequences. @sc{posix} mandates that such sequences
+are @emph{not} matched by @samp{.}, so that @samp{s/.*//} will not clear
+pattern space as you would expect. In fact, there is no way to clear
+sed's buffers in the middle of the script in most multibyte locales
+(including UTF-8 locales). For this reason, @value{SSED} provides a `z'
+command (for `zap') as an extension.
+
+To work around these problems, which may cause bugs in shell scripts, set
+the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
 @end ifclear
 
 @end table
diff --git a/doc/sed.texi b/doc/sed.texi
index 5b74a51..72408e5 100644
--- a/doc/sed.texi
+++ b/doc/sed.texi
@@ -1490,6 +1490,18 @@ This command enables all @value{SSEDEXT} even if
 Write to the given filename the portion of the pattern space up to
 the first newline. Everything said under the @code{w} command about
 file handling holds here too.
+
+@item z
+@findex z (Zap) command
+@cindex @value{SSEDEXT}, emptying pattern space
+@cindex Emptying pattern space
+This command empties the content of pattern space. It is
+usually the same as @samp{s/.*//}, but is more efficient
+and works in the presence of invalid multibyte sequences
+in the input stream. @sc{posix} mandates that such sequences
+are @emph{not} matched by @samp{.}, so that there is no portable
+way to clear @command{sed}'s buffers in the middle of the
+script in most multibyte locales (including UTF-8 locales).
 @end table
 
 @node Escapes
@@ -2905,6 +2917,10 @@ in a read-only directory, and will break hard or symbolic links when
 @option{-i} is used on such a file.
 
 @item @code{0a} does not work (gives an error)
+@cindex @code{0} address
+@cindex @acronym{GNU} extensions, @code{0} address
+@cindex Non-bugs, @code{0} address
+
 There is no line 0. 0 is a special address that is only used to treat
 addresses like @code{0,/@var{RE}/} as active when the script starts: if
 you write @code{1,/abc/d} and the first line includes the word @samp{abc},
@@ -2915,6 +2931,8 @@ is obtained with @code{0,/abc/d}.
 
 @ifclear PERL
 @item @code{[a-z]} is case insensitive
+@cindex Non-bugs, localization-related
+
 You are encountering problems with locales. POSIX mandates that @code{[a-z]}
 uses the current locale's collation order -- in C parlance, that means using
 @code{strcoll(3)} instead of @code{strcmp(3)}. Some locales have a
@@ -2931,6 +2949,22 @@ locales, or @samp{ij} in Dutch locales.
 
 To work around these problems, which may cause bugs in shell scripts, set
 the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
+
+@item @code{s/.*//} does not clear pattern space
+@cindex Non-bugs, localization-related
+@cindex @value{SSEDEXT}, emptying pattern space
+@cindex Emptying pattern space
+
+This happens if your input stream includes invalid multibyte
+sequences. @sc{posix} mandates that such sequences
+are @emph{not} matched by @samp{.}, so that @samp{s/.*//} will not clear
+pattern space as you would expect. In fact, there is no way to clear
+sed's buffers in the middle of the script in most multibyte locales
+(including UTF-8 locales). For this reason, @value{SSED} provides a `z'
+command (for `zap') as an extension.
+
+To work around these problems, which may cause bugs in shell scripts, set
+the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
 @end ifclear
 
 @end table
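
The behavior documented by this patch can be checked with a small shell sketch. It is illustrative only and not part of the commit: it assumes a GNU sed build that includes the `z' extension, and the helper `make_input` and the variable names are hypothetical. The input is the byte `a`, one byte that is invalid in UTF-8 (octal 377), the byte `b`, and a newline. In a UTF-8 locale the substitution may leave bytes behind, as the documentation describes; in the C locale both commands should clear the line.

```shell
#!/bin/sh
# Hypothetical test input: 'a', one invalid UTF-8 byte (octal 377), 'b', newline.
make_input() { printf 'a\377b\n'; }

# Count the bytes sed prints after a substitution meant to clear pattern space.
# In a UTF-8 locale the invalid byte may survive, because '.' does not match it.
subst_bytes=$(make_input | sed 's/.*//' | wc -c | tr -d ' ')

# Count the bytes sed prints after the z (zap) extension.
# Only the newline that sed appends when printing should remain.
zap_bytes=$(make_input | sed 'z' | wc -c | tr -d ' ')

echo "bytes after s/.*//: $subst_bytes"
echo "bytes after z:      $zap_bytes"
```

Counting output bytes rather than printing them keeps the comparison independent of how a terminal renders the invalid byte. The first count depends on the active locale, so run the script under the locale you want to check.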