author: Paolo Bonzini <bonzini@gnu.org> 2008-09-29 09:31:27 +0200
committer: Paolo Bonzini <bonzini@gnu.org> 2008-09-29 10:24:09 +0200
commit: 41e0cf2dac2364bc5c46c516143eccb59408e4ff
tree: f2feedc338d4566395b9cd040f58364d106a2124 /doc
parent: 07df2df3c2de5cc55525f5a38788e0c05fb08cce
add `z' extension
2008-09-29  Paolo Bonzini  <bonzini@gnu.org>

    * BUGS: Document s/.*.// behavior with invalid multibyte sequences.
    * NEWS: Document `z' extension.
    * doc/sed-in.texi: Document both things.
    * sed/compile.c (compile_program): Recognize `z'.
    * sed/execute.c (execute_program): Execute `z'.
    * testsuite/Makefile.am: Add badenc test.
    * testsuite/Makefile.tests: Add badenc test.
    * testsuite/badenc.good: New.
    * testsuite/badenc.inp: New.
    * testsuite/badenc.sed: New.
Diffstat (limited to 'doc')
-rw-r--r-- doc/sed-in.texi 34
-rw-r--r-- doc/sed.texi 34
2 files changed, 68 insertions, 0 deletions
diff --git a/doc/sed-in.texi b/doc/sed-in.texi
index cee2505..c8bb21d 100644
--- a/doc/sed-in.texi
+++ b/doc/sed-in.texi
@@ -1489,6 +1489,18 @@ This command enables all @value{SSEDEXT} even if
Write to the given filename the portion of the pattern space up to
the first newline. Everything said under the @code{w} command about
file handling holds here too.
+
+@item z
+@findex z (Zap) command
+@cindex @value{SSEDEXT}, emptying pattern space
+@cindex Emptying pattern space
+This command empties the content of pattern space. It is
+usually the same as @samp{s/.*//}, but is more efficient
+and works in the presence of invalid multibyte sequences
+in the input stream. @sc{posix} mandates that such sequences
+are @emph{not} matched by @samp{.}, so that there is no portable
+way to clear @command{sed}'s buffers in the middle of the
+script in most multibyte locales (including UTF-8 locales).
@end table
@node Escapes
@@ -2738,6 +2750,10 @@ in a read-only directory, and will break hard or symbolic links when
@option{-i} is used on such a file.
@item @code{0a} does not work (gives an error)
+@cindex @code{0} address
+@cindex @acronym{GNU} extensions, @code{0} address
+@cindex Non-bugs, @code{0} address
+
There is no line 0. 0 is a special address that is only used to treat
addresses like @code{0,/@var{RE}/} as active when the script starts: if
you write @code{1,/abc/d} and the first line includes the word @samp{abc},
@@ -2748,6 +2764,8 @@ is obtained with @code{0,/abc/d}.
@ifclear PERL
@item @code{[a-z]} is case insensitive
+@cindex Non-bugs, localization-related
+
You are encountering problems with locales. POSIX mandates that @code{[a-z]}
uses the current locale's collation order -- in C parlance, that means using
@code{strcoll(3)} instead of @code{strcmp(3)}. Some locales have a
@@ -2764,6 +2782,22 @@ locales, or @samp{ij} in Dutch locales.
To work around these problems, which may cause bugs in shell scripts, set
the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
+
+@item @code{s/.*//} does not clear pattern space
+@cindex Non-bugs, localization-related
+@cindex @value{SSEDEXT}, emptying pattern space
+@cindex Emptying pattern space
+
+This happens if your input stream includes invalid multibyte
+sequences. @sc{posix} mandates that such sequences
+are @emph{not} matched by @samp{.}, so that @samp{s/.*//} will not clear
+pattern space as you would expect. In fact, there is no way to clear
+sed's buffers in the middle of the script in most multibyte locales
+(including UTF-8 locales). For this reason, @value{SSED} provides a `z'
+command (for `zap') as an extension.
+
+To work around these problems, which may cause bugs in shell scripts, set
+the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
@end ifclear
@end table
diff --git a/doc/sed.texi b/doc/sed.texi
index 5b74a51..72408e5 100644
--- a/doc/sed.texi
+++ b/doc/sed.texi
@@ -1490,6 +1490,18 @@ This command enables all @value{SSEDEXT} even if
Write to the given filename the portion of the pattern space up to
the first newline. Everything said under the @code{w} command about
file handling holds here too.
+
+@item z
+@findex z (Zap) command
+@cindex @value{SSEDEXT}, emptying pattern space
+@cindex Emptying pattern space
+This command empties the content of pattern space. It is
+usually the same as @samp{s/.*//}, but is more efficient
+and works in the presence of invalid multibyte sequences
+in the input stream. @sc{posix} mandates that such sequences
+are @emph{not} matched by @samp{.}, so that there is no portable
+way to clear @command{sed}'s buffers in the middle of the
+script in most multibyte locales (including UTF-8 locales).
@end table
@node Escapes
@@ -2905,6 +2917,10 @@ in a read-only directory, and will break hard or symbolic links when
@option{-i} is used on such a file.
@item @code{0a} does not work (gives an error)
+@cindex @code{0} address
+@cindex @acronym{GNU} extensions, @code{0} address
+@cindex Non-bugs, @code{0} address
+
There is no line 0. 0 is a special address that is only used to treat
addresses like @code{0,/@var{RE}/} as active when the script starts: if
you write @code{1,/abc/d} and the first line includes the word @samp{abc},
@@ -2915,6 +2931,8 @@ is obtained with @code{0,/abc/d}.
@ifclear PERL
@item @code{[a-z]} is case insensitive
+@cindex Non-bugs, localization-related
+
You are encountering problems with locales. POSIX mandates that @code{[a-z]}
uses the current locale's collation order -- in C parlance, that means using
@code{strcoll(3)} instead of @code{strcmp(3)}. Some locales have a
@@ -2931,6 +2949,22 @@ locales, or @samp{ij} in Dutch locales.
To work around these problems, which may cause bugs in shell scripts, set
the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
+
+@item @code{s/.*//} does not clear pattern space
+@cindex Non-bugs, localization-related
+@cindex @value{SSEDEXT}, emptying pattern space
+@cindex Emptying pattern space
+
+This happens if your input stream includes invalid multibyte
+sequences. @sc{posix} mandates that such sequences
+are @emph{not} matched by @samp{.}, so that @samp{s/.*//} will not clear
+pattern space as you would expect. In fact, there is no way to clear
+sed's buffers in the middle of the script in most multibyte locales
+(including UTF-8 locales). For this reason, @value{SSED} provides a `z'
+command (for `zap') as an extension.
+
+To work around these problems, which may cause bugs in shell scripts, set
+the @env{LC_COLLATE} and @env{LC_CTYPE} environment variables to @samp{C}.
@end ifclear
@end table
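
The behavior documented by this change can be tried out with a short shell sketch. It assumes that GNU sed with the `z' extension is installed as `sed'; the function names and the sample input are made up for this illustration and are not taken from the badenc test. The result of the `s/.*//' comparison depends on the locale and the sed version, so no output is claimed for it.

```shell
#!/bin/sh
# Hypothetical sample input: one line containing the byte 0xFF,
# which is never valid in UTF-8. \377 is printf's octal escape.
bad_input() {
    printf 'abc\377def\n'
}

# `z` (GNU extension) empties pattern space whatever the encoding.
# The trailing s command only makes the empty result visible.
zap_demo() {
    bad_input | sed 'z; s/^$/[empty]/'
}

# For comparison: in a multibyte locale `.` may refuse to match the
# invalid byte, so part of the line can survive. Dumped with od so
# that any leftover bytes are visible; the output varies by setup.
subst_demo() {
    bad_input | sed 's/.*//' | od -c
}

zap_demo    # prints "[empty]" with GNU sed
```

Running `subst_demo' under a UTF-8 locale and again with LC_ALL=C is one way to see whether the installed sed leaves bytes behind.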