author Dan Book <grinnz@grinnz.com> 2020-03-10 20:31:27 -0400
committer Nicolas R <nicolas@atoomic.org> 2020-03-12 17:39:56 -0600
commit 57fb4502d8032d9a870b52b241e19dc65af57776
tree 8f3b02acd649c4fc40b440ee04daf4448c5b60b1
parent 6311900a664b5de7fc4b60c1935639bb1d0af7a8
Rework PerlIO documentation

- Add Layers section in the description, giving details of how layers work
  and encompassing the list of built-in layers
- Add various information relevant to modern usage of each layer
- Consistently refer to layers with a leading colon
- Redo :utf8 and :bytes layer descriptions
- Remove references to using the :utf8 layer for UTF-8 translation
- Add :scalar layer
- Move description of default layers that was oddly in "Querying" section
  to the end of "Defaults" section
- Correct default layers to specify that PERLIO=:stdio will always result
  in default layers of :stdio
- Update all examples to be strict-safe and check for open/binmode failure
- Capitalize references to Perl consistently
-rw-r--r-- lib/PerlIO.pm 272
1 files changed, 169 insertions, 103 deletions
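
For readers skimming the patch, here is a minimal sketch of the style the reworked examples move to: strict-safe lexical handles, failure checks on every open/binmode, and C<:encoding(UTF-8)> on a C<:raw> base rather than bare C<:utf8>. It is an illustration, not part of the commit. The scratch file name is hypothetical, and the exact layer names printed by C<PerlIO::get_layers> depend on the platform and Perl build.

```perl
use strict;
use warnings;

# Hypothetical scratch file name, used only for this sketch.
my $path = "perlio-layers-demo.txt";

# Write UTF-8 encoded text on a known binary base, checking for failure.
open(my $out, ">:raw:encoding(UTF-8)", $path) or die "open failed: $!";
print {$out} "caf\x{e9}\n" or die "print failed: $!";
close($out) or die "close failed: $!";

# Read the raw bytes back: binmode with no layers makes the handle binary.
open(my $raw, "<", $path) or die "open failed: $!";
binmode($raw) or die "binmode failed: $!";
my $bytes = do { local $/; <$raw> };
close($raw) or die "close failed: $!";

# U+00E9 takes two bytes in UTF-8, so the file holds 6 bytes in total.
printf "%d bytes\n", length($bytes);

# Decode on input; get_layers reports layer names without leading colons.
open(my $dec, "<:raw:encoding(UTF-8)", $path) or die "open failed: $!";
my @layers = PerlIO::get_layers($dec);
my $text = <$dec>;
close($dec) or die "close failed: $!";
printf "%d characters, layers: %s\n", length($text), "@layers";

unlink($path) or die "unlink failed: $!";
```

The byte count and character count differ because the encoding layer translates between UTF-8 bytes on disk and Unicode characters in the Perl string; the C<:raw> prefix keeps the result independent of any platform default such as C<:crlf>.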
diff --git a/lib/PerlIO.pm b/lib/PerlIO.pm
index 7658ce497b..7f7db64a54 100644
--- a/lib/PerlIO.pm
+++ b/lib/PerlIO.pm
@@ -35,14 +35,19 @@ PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space
=head1 SYNOPSIS
- open($fh, "<:crlf", "my.txt"); # support platform-native and
- # CRLF text files
+ # support platform-native and CRLF text files
+ open(my $fh, "<:crlf", "my.txt") or die "open failed: $!";
- open($fh, "<", "his.jpg"); # portably open a binary file for reading
- binmode($fh);
+ # append UTF-8 encoded text
+ open(my $fh, ">>:encoding(UTF-8)", "some.log")
+ or die "open failed: $!";
+
+ # portably open a binary file for reading
+ open(my $fh, "<", "his.jpg") or die "open failed: $!";
+ binmode($fh) or die "binmode failed: $!";
Shell:
- PERLIO=perlio perl ....
+ PERLIO=:perlio perl ....
=head1 DESCRIPTION
@@ -51,13 +56,52 @@ C<binmode> layer specification then C code performs the equivalent of:
use PerlIO 'foo';
-The perl code in PerlIO.pm then attempts to locate a layer by doing
+The Perl code in PerlIO.pm then attempts to locate a layer by doing
require PerlIO::foo;
Otherwise the C<PerlIO> package is a place holder for additional
PerlIO related functions.
+=head2 Layers
+
+Generally speaking, PerlIO layers (previously sometimes referred to as
+"disciplines") are an ordered stack applied to a filehandle (specified as
+a space- or colon-separated list, conventionally written with a leading
+colon). Each layer performs some operation on any input or output, except
+when bypassed such as with C<sysread> or C<syswrite>. Read operations go
+through the stack in the order they are set (left to right), and write
+operations in the reverse order.
+
+There are also layers which actually just set flags on lower layers, or
+layers that modify the current stack but don't persist on the stack
+themselves; these are referred to as pseudo-layers.
+
+When opening a handle, it will be opened with any layers specified
+explicitly in the open() call (or the platform defaults, if specified as
+a colon with no following layers).
+
+If layers are not explicitly specified, the handle will be opened with the
+layers specified by the L<${^OPEN}|perlvar/"${^OPEN}"> variable (usually
+set by using the L<open> pragma for a lexical scope, or the C<-C>
+command-line switch or C<PERL_UNICODE> environment variable for the main
+program scope).
+
+If layers are not specified in the open() call or C<${^OPEN}> variable,
+the handle will be opened with the default layer stack configured for that
+architecture; see L</"Defaults and how to override them">.
+
+Some layers will automatically insert required lower level layers if not
+present; for example C<:perlio> will insert C<:unix> below itself for low
+level IO, and C<:encoding> will insert the platform defaults for buffered
+IO.
+
+The C<binmode> function can be called on an opened handle to push
+additional layers onto the stack, which may also modify the existing
+layers. C<binmode> called with no layers will remove or unset any
+existing layers which transform the byte stream, making the handle
+suitable for binary data.
+
The following layers are currently defined:
=over 4
@@ -67,17 +111,21 @@ The following layers are currently defined:
Lowest level layer which provides basic PerlIO operations in terms of
UNIX/POSIX numeric file descriptor calls
(open(), read(), write(), lseek(), close()).
+It is used even on non-Unix architectures, and most other layers operate on
+top of it.
=item :stdio
Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc. Note
that as this is "real" stdio it will ignore any layers beneath it and
go straight to the operating system via the C library as usual.
+This layer implements both low level IO and buffering, but is rarely used
+on modern architectures.
=item :perlio
A from scratch implementation of buffering for PerlIO. Provides fast
-access to the buffer for C<sv_gets> which implements perl's readline/E<lt>E<gt>
+access to the buffer for C<sv_gets> which implements Perl's readline/E<lt>E<gt>
and in general attempts to minimize data copying.
C<:perlio> will insert a C<:unix> layer below itself to do low level IO.
@@ -92,81 +140,98 @@ refuse to be pushed on top of itself.
It currently does I<not> mimic MS-DOS as far as treating of Control-Z
as being an end-of-file marker.
-Based on the C<:perlio> layer.
-
-=item :utf8
-
-Declares that the stream accepts perl's I<internal> encoding of
-characters. (Which really is UTF-8 on ASCII machines, but is
-UTF-EBCDIC on EBCDIC machines.) This allows any character perl can
-represent to be read from or written to the stream. The UTF-X encoding
-is chosen to render simple text parts (i.e. non-accented letters,
-digits and common punctuation) human readable in the encoded file.
-
-(B<CAUTION>: This layer does not validate byte sequences. For reading input,
-you should instead use C<:encoding(UTF-8)> instead of bare C<:utf8>.)
-
-Here is how to write your native data out using UTF-8 (or UTF-EBCDIC)
-and then read it back in.
+On DOS/Windows like architectures where this layer is part of the defaults,
+it also acts like the C<:perlio> layer, and removing the CRLF translation
+(such as with C<:raw>) will only unset the CRLF translation flag. Since
+Perl 5.14, you can also apply another C<:crlf> layer later, such as when
+the CRLF translation must occur after an encoding layer. On other
+architectures, it is a mundane CRLF translation layer and can be added and
+removed normally.
- open(F, ">:utf8", "data.utf");
- print F $out;
- close(F);
+ # translate CRLF after encoding on Perl 5.14 or newer
+ binmode $fh, ":raw:encoding(UTF-16LE):crlf"
+ or die "binmode failed: $!";
- open(F, "<:utf8", "data.utf");
- $in = <F>;
- close(F);
+=item :utf8
+Pseudo-layer that declares that the stream accepts Perl's I<internal>
+upgraded encoding of characters, which is approximately UTF-8 on ASCII
+machines, but UTF-EBCDIC on EBCDIC machines. This allows any character
+Perl can represent to be read from or written to the stream.
+
+This layer (which actually sets a flag on the preceding layer, and is
+implicitly set by any C<:encoding> layer) does not translate or validate
+byte sequences. It instead indicates that the byte stream will have been
+arranged by other layers to be provided in Perl's internal upgraded
+encoding, which Perl code (and correctly written XS code) will interpret
+as decoded Unicode characters.
+
+B<CAUTION>: Do not use this layer to translate from UTF-8 bytes, as
+invalid UTF-8 or binary data will result in malformed Perl strings. It is
+unlikely to produce invalid UTF-8 when used for output, though it will
+instead produce UTF-EBCDIC on EBCDIC systems. The C<:encoding(UTF-8)>
+layer (hyphen is significant) is preferred as it will ensure translation
+between valid UTF-8 bytes and valid Unicode characters.
=item :bytes
-This is the inverse of the C<:utf8> layer. It turns off the flag
+This is the inverse of the C<:utf8> pseudo-layer. It turns off the flag
on the layer below so that data read from it is considered to
-be "octets" i.e. characters in the range 0..255 only. Likewise
-on output perl will warn if a "wide" character is written
-to a such a stream.
+be Perl's internal downgraded encoding, thus interpreted as the native
+single-byte encoding of Latin-1 or EBCDIC. Likewise on output Perl will
+warn if a "wide" character (a codepoint not in the range 0..255) is
+written to a such a stream.
+
+This is very dangerous to push on a handle using an C<:encoding> layer,
+as such a layer assumes to be working with Perl's internal upgraded
+encoding, so you will likely get a mangled result. Instead use C<:raw> or
+C<:pop> to remove encoding layers.
=item :raw
-The C<:raw> layer is I<defined> as being identical to calling
+The C<:raw> pseudo-layer is I<defined> as being identical to calling
C<binmode($fh)> - the stream is made suitable for passing binary data,
-i.e. each byte is passed as-is. The stream will still be
-buffered.
+i.e. each byte is passed as-is. The stream will still be buffered
+(but this was not always true before Perl 5.14).
-In Perl 5.6 and some books the C<:raw> layer (previously sometimes also
-referred to as a "discipline") is documented as the inverse of the
-C<:crlf> layer. That is no longer the case - other layers which would
-alter the binary nature of the stream are also disabled. If you want UNIX
-line endings on a platform that normally does CRLF translation, but still
-want UTF-8 or encoding defaults, the appropriate thing to do is to add
-C<:perlio> to the PERLIO environment variable.
+In Perl 5.6 and some books the C<:raw> layer is documented as the inverse
+of the C<:crlf> layer. That is no longer the case - other layers which
+would alter the binary nature of the stream are also disabled. If you
+want UNIX line endings on a platform that normally does CRLF translation,
+but still want UTF-8 or encoding defaults, the appropriate thing to do is
+to add C<:perlio> to the PERLIO environment variable, or open the handle
+explicitly with that layer, to replace the platform default of C<:crlf>.
The implementation of C<:raw> is as a pseudo-layer which when "pushed"
-pops itself and then any layers which do not declare themselves as suitable
-for binary data. (Undoing :utf8 and :crlf are implemented by clearing
-flags rather than popping layers but that is an implementation detail.)
+pops itself and then any layers which would modify the binary data stream.
+(Undoing C<:utf8> and C<:crlf> may be implemented by clearing flags
+rather than popping layers but that is an implementation detail.)
As a consequence of the fact that C<:raw> normally pops layers,
it usually only makes sense to have it as the only or first element in
a layer specification. When used as the first element it provides
a known base on which to build e.g.
- open($fh,":raw:utf8",...)
+ open(my $fh,">:raw:encoding(UTF-8)",...)
+ or die "open failed: $!";
-will construct a "binary" stream, but then enable UTF-8 translation.
+will construct a "binary" stream regardless of the platform defaults,
+but then enable UTF-8 translation.
=item :pop
-A pseudo layer that removes the top-most layer. Gives perl code a
+A pseudo-layer that removes the top-most layer. Gives Perl code a
way to manipulate the layer stack. Note that C<:pop> only works on
-real layers and will not undo the effects of pseudo layers like
-C<:utf8>. An example of a possible use might be:
+real layers and will not undo the effects of pseudo-layers or flags
+like C<:utf8>. An example of a possible use might be:
- open($fh,...)
+ open(my $fh,...) or die "open failed: $!";
...
- binmode($fh,":encoding(...)"); # next chunk is encoded
+ binmode($fh,":encoding(...)") or die "binmode failed: $!";
+ # next chunk is encoded
...
- binmode($fh,":pop"); # back to un-encoded
+ binmode($fh,":pop") or die "binmode failed: $!";
+ # back to un-encoded
A more elegant (and safer) interface is needed.
@@ -174,25 +239,24 @@ A more elegant (and safer) interface is needed.
On Win32 platforms this I<experimental> layer uses the native "handle" IO
rather than the unix-like numeric file descriptor layer. Known to be
-buggy as of perl 5.8.2.
+buggy as of Perl 5.8.2.
=back
=head2 Custom Layers
It is possible to write custom layers in addition to the above builtin
-ones, both in C/XS and Perl. Two such layers (and one example written
-in Perl using the latter) come with the Perl distribution.
+ones, both in C/XS and Perl, as a module named C<< PerlIO::<layer name> >>.
+Some custom layers come with the Perl distribution.
=over 4
=item :encoding
-Use C<:encoding(ENCODING)> either in open() or binmode() to install
-a layer that transparently does character set and encoding transformations,
-for example from Shift-JIS to Unicode. Note that under C<stdio>
-an C<:encoding> also enables C<:utf8>. See L<PerlIO::encoding>
-for more information.
+Use C<:encoding(ENCODING)> to transparently do character set and encoding
+transformations, for example from Shift-JIS to Unicode. Note that an
+C<:encoding> also enables C<:utf8>. See L<PerlIO::encoding> for more
+information.
=item :mmap
@@ -207,64 +271,81 @@ layer. Writes also behave like the C<:perlio> layer, as C<mmap()> for write
needs extra house-keeping (to extend the file) which negates any advantage.
The C<:mmap> layer will not exist if the platform does not support C<mmap()>.
+See L<PerlIO::mmap> for more information.
=item :via
-Use C<:via(MODULE)> either in open() or binmode() to install a layer
-that does whatever transformation (for example compression /
-decompression, encryption / decryption) to the filehandle.
+C<:via(MODULE)> allows a transformation to be applied by an arbitrary Perl
+module, for example compression / decompression, encryption / decryption.
See L<PerlIO::via> for more information.
+=item :scalar
+
+A layer implementing "in memory" files using scalar variables,
+automatically used in place of the platform defaults for IO when opening
+such a handle. As such, the scalar is expected to act like a file, only
+containing or storing bytes. See L<PerlIO::scalar> for more information.
+
=back
=head2 Alternatives to raw
To get a binary stream an alternate method is to use:
- open($fh,"whatever")
- binmode($fh);
+ open(my $fh,"<","whatever") or die "open failed: $!";
+ binmode($fh) or die "binmode failed: $!";
-this has the advantage of being backward compatible with how such things have
-had to be coded on some platforms for years.
+This has the advantage of being backward compatible with older versions
+of Perl that did not use PerlIO or where C<:raw> was buggy (as it was
+before Perl 5.14).
To get an unbuffered stream specify an unbuffered layer (e.g. C<:unix>)
in the open call:
- open($fh,"<:unix",$path)
+ open(my $fh,"<:unix",$path) or die "open failed: $!";
=head2 Defaults and how to override them
If the platform is MS-DOS like and normally does CRLF to "\n"
-translation for text files then the default layers are :
+translation for text files then the default layers are:
- unix crlf
-
-(The low level "unix" layer may be replaced by a platform specific low
-level layer.)
+ :unix:crlf
Otherwise if C<Configure> found out how to do "fast" IO using the system's
-stdio, then the default layers are:
+stdio (not common on modern architectures), then the default layers are:
- unix stdio
+ :stdio
Otherwise the default layers are
- unix perlio
-
-These defaults may change once perlio has been better tested and tuned.
+ :unix:perlio
-The default can be overridden by setting the environment variable
-PERLIO to a space separated list of layers (C<unix> or platform low
-level layer is always pushed first).
+Note that the "default stack" depends on the operating system and on the
+Perl version, and both the compile-time and runtime configurations of
+Perl. The default can be overridden by setting the environment variable
+PERLIO to a space or colon separated list of layers, however this cannot
+be used to set layers that require loading modules like C<:encoding>.
This can be used to see the effect of/bugs in the various layers e.g.
cd .../perl/t
- PERLIO=stdio ./perl harness
- PERLIO=perlio ./perl harness
+ PERLIO=:stdio ./perl harness
+ PERLIO=:perlio ./perl harness
For the various values of PERLIO see L<perlrun/PERLIO>.
+The following table summarizes the default layers on UNIX-like and
+DOS-like platforms and depending on the setting of C<$ENV{PERLIO}>:
+
+ PERLIO UNIX-like DOS-like
+ ------ --------- --------
+ unset / "" :unix:perlio / :stdio [1] :unix:crlf
+ :stdio :stdio :stdio
+ :perlio :unix:perlio :unix:perlio
+
+ # [1] ":stdio" if Configure found out how to do "fast stdio" (depends
+ # on the stdio implementation) and in Perl 5.8, else ":unix:perlio"
+
=head2 Querying the layers of filehandles
The following returns the B<names> of the PerlIO layers on a filehandle.
@@ -272,21 +353,7 @@ The following returns the B<names> of the PerlIO layers on a filehandle.
my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".
The layers are returned in the order an open() or binmode() call would
-use them. Note that the "default stack" depends on the operating
-system and on the Perl version, and both the compile-time and
-runtime configurations of Perl.
-
-The following table summarizes the default layers on UNIX-like and
-DOS-like platforms and depending on the setting of C<$ENV{PERLIO}>:
-
- PERLIO UNIX-like DOS-like
- ------ --------- --------
- unset / "" unix perlio / stdio [1] unix crlf
- stdio unix perlio / stdio [1] stdio
- perlio unix perlio unix perlio
-
- # [1] "stdio" if Configure found out how to do "fast stdio" (depends
- # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio"
+use them, and without colons.
By default the layers from the input side of the filehandle are
returned; to get the output side, use the optional C<output> argument:
@@ -294,8 +361,7 @@ returned; to get the output side, use the optional C<output> argument:
my @layers = PerlIO::get_layers($fh, output => 1);
(Usually the layers are identical on either side of a filehandle but
-for example with sockets there may be differences, or if you have
-been using the C<open> pragma.)
+for example with sockets there may be differences.)
There is no set_layers(), nor does get_layers() return a tied array
mirroring the stack, or anything fancy like that. This is not
@@ -306,7 +372,7 @@ You are supposed to use open() and binmode() to manipulate the stack.
B<Implementation details follow, please close your eyes.>
The arguments to layers are by default returned in parentheses after
-the name of the layer, and certain layers (like C<utf8>) are not real
+the name of the layer, and certain layers (like C<:utf8>) are not real
layers but instead flags on real layers; to get all of these returned
separately, use the optional C<details> argument: