author | Dan Book <grinnz@grinnz.com> | 2020-03-10 20:31:27 -0400 |
---|---|---|
committer | Nicolas R <nicolas@atoomic.org> | 2020-03-12 17:39:56 -0600 |
commit | 57fb4502d8032d9a870b52b241e19dc65af57776 | |
tree | 8f3b02acd649c4fc40b440ee04daf4448c5b60b1 | |
parent | 6311900a664b5de7fc4b60c1935639bb1d0af7a8 | |
Rework PerlIO documentation
- Add Layers section in the description, giving details of how layers work and encompassing the list of built-in layers
- Add various information relevant to modern usage of each layer
- Consistently refer to layers with a leading colon
- Redo :utf8 and :bytes layer descriptions
- Remove references to using the :utf8 layer for UTF-8 translation
- Add :scalar layer
- Move description of default layers that was oddly in "Querying" section to the end of "Defaults" section
- Correct default layers to specify that PERLIO=:stdio will always result in default layers of :stdio
- Update all examples to be strict-safe and check for open/binmode failure
- Capitalize references to Perl consistently
-rw-r--r-- | lib/PerlIO.pm | 272 |
1 files changed, 169 insertions, 103 deletions
diff --git a/lib/PerlIO.pm b/lib/PerlIO.pm
index 7658ce497b..7f7db64a54 100644
--- a/lib/PerlIO.pm
+++ b/lib/PerlIO.pm
@@ -35,14 +35,19 @@ PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space
 
 =head1 SYNOPSIS
 
-  open($fh, "<:crlf", "my.txt"); # support platform-native and
-                                 # CRLF text files
+  # support platform-native and CRLF text files
+  open(my $fh, "<:crlf", "my.txt") or die "open failed: $!";
 
-  open($fh, "<", "his.jpg"); # portably open a binary file for reading
-  binmode($fh);
+  # append UTF-8 encoded text
+  open(my $fh, ">>:encoding(UTF-8)", "some.log")
+    or die "open failed: $!";
+
+  # portably open a binary file for reading
+  open(my $fh, "<", "his.jpg") or die "open failed: $!";
+  binmode($fh) or die "binmode failed: $!";
 
   Shell:
-    PERLIO=perlio perl ....
+    PERLIO=:perlio perl ....
 
 =head1 DESCRIPTION
 
@@ -51,13 +56,52 @@ C<binmode> layer specification then C code performs the equivalent of:
 
   use PerlIO 'foo';
 
-The perl code in PerlIO.pm then attempts to locate a layer by doing
+The Perl code in PerlIO.pm then attempts to locate a layer by doing
 
   require PerlIO::foo;
 
 Otherwise the C<PerlIO> package is a place holder for additional
 PerlIO related functions.
 
+=head2 Layers
+
+Generally speaking, PerlIO layers (previously sometimes referred to as
+"disciplines") are an ordered stack applied to a filehandle (specified as
+a space- or colon-separated list, conventionally written with a leading
+colon). Each layer performs some operation on any input or output, except
+when bypassed such as with C<sysread> or C<syswrite>. Read operations go
+through the stack in the order they are set (left to right), and write
+operations in the reverse order.
+
+There are also layers which actually just set flags on lower layers, or
+layers that modify the current stack but don't persist on the stack
+themselves; these are referred to as pseudo-layers.
+
+When opening a handle, it will be opened with any layers specified
+explicitly in the open() call (or the platform defaults, if specified as
+a colon with no following layers).
+
+If layers are not explicitly specified, the handle will be opened with the
+layers specified by the L<${^OPEN}|perlvar/"${^OPEN}"> variable (usually
+set by using the L<open> pragma for a lexical scope, or the C<-C>
+command-line switch or C<PERL_UNICODE> environment variable for the main
+program scope).
+
+If layers are not specified in the open() call or C<${^OPEN}> variable,
+the handle will be opened with the default layer stack configured for that
+architecture; see L</"Defaults and how to override them">.
+
+Some layers will automatically insert required lower level layers if not
+present; for example C<:perlio> will insert C<:unix> below itself for low
+level IO, and C<:encoding> will insert the platform defaults for buffered
+IO.
+
+The C<binmode> function can be called on an opened handle to push
+additional layers onto the stack, which may also modify the existing
+layers. C<binmode> called with no layers will remove or unset any
+existing layers which transform the byte stream, making the handle
+suitable for binary data.
+
 The following layers are currently defined:
 
 =over 4
@@ -67,17 +111,21 @@ The following layers are currently defined:
 Lowest level layer which provides basic PerlIO operations in terms of
 UNIX/POSIX numeric file descriptor calls
 (open(), read(), write(), lseek(), close()).
+It is used even on non-Unix architectures, and most other layers operate on
+top of it.
 
 =item :stdio
 
 Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc. Note
 that as this is "real" stdio it will ignore any layers beneath it and
 go straight to the operating system via the C library as usual.
+This layer implements both low level IO and buffering, but is rarely used
+on modern architectures.
 
 =item :perlio
 
 A from scratch implementation of buffering for PerlIO. Provides fast
-access to the buffer for C<sv_gets> which implements perl's readline/E<lt>E<gt>
+access to the buffer for C<sv_gets> which implements Perl's readline/E<lt>E<gt>
 and in general attempts to minimize data copying.
 
 C<:perlio> will insert a C<:unix> layer below itself to do low level IO.
@@ -92,81 +140,98 @@ refuse to be pushed on top of itself.
 It currently does I<not> mimic MS-DOS as far as treating of Control-Z
 as being an end-of-file marker.
 
-Based on the C<:perlio> layer.
-
-=item :utf8
-
-Declares that the stream accepts perl's I<internal> encoding of
-characters. (Which really is UTF-8 on ASCII machines, but is
-UTF-EBCDIC on EBCDIC machines.) This allows any character perl can
-represent to be read from or written to the stream. The UTF-X encoding
-is chosen to render simple text parts (i.e. non-accented letters,
-digits and common punctuation) human readable in the encoded file.
-
-(B<CAUTION>: This layer does not validate byte sequences. For reading input,
-you should instead use C<:encoding(UTF-8)> instead of bare C<:utf8>.)
-
-Here is how to write your native data out using UTF-8 (or UTF-EBCDIC)
-and then read it back in.
+On DOS/Windows like architectures where this layer is part of the defaults,
+it also acts like the C<:perlio> layer, and removing the CRLF translation
+(such as with C<:raw>) will only unset the CRLF translation flag. Since
+Perl 5.14, you can also apply another C<:crlf> layer later, such as when
+the CRLF translation must occur after an encoding layer. On other
+architectures, it is a mundane CRLF translation layer and can be added and
+removed normally.
 
-    open(F, ">:utf8", "data.utf");
-    print F $out;
-    close(F);
+    # translate CRLF after encoding on Perl 5.14 or newer
+    binmode $fh, ":raw:encoding(UTF-16LE):crlf"
+      or die "binmode failed: $!";
 
-    open(F, "<:utf8", "data.utf");
-    $in = <F>;
-    close(F);
+=item :utf8
 
+Pseudo-layer that declares that the stream accepts Perl's I<internal>
+upgraded encoding of characters, which is approximately UTF-8 on ASCII
+machines, but UTF-EBCDIC on EBCDIC machines. This allows any character
+Perl can represent to be read from or written to the stream.
+
+This layer (which actually sets a flag on the preceding layer, and is
+implicitly set by any C<:encoding> layer) does not translate or validate
+byte sequences. It instead indicates that the byte stream will have been
+arranged by other layers to be provided in Perl's internal upgraded
+encoding, which Perl code (and correctly written XS code) will interpret
+as decoded Unicode characters.
+
+B<CAUTION>: Do not use this layer to translate from UTF-8 bytes, as
+invalid UTF-8 or binary data will result in malformed Perl strings. It is
+unlikely to produce invalid UTF-8 when used for output, though it will
+instead produce UTF-EBCDIC on EBCDIC systems. The C<:encoding(UTF-8)>
+layer (hyphen is significant) is preferred as it will ensure translation
+between valid UTF-8 bytes and valid Unicode characters.
 
 =item :bytes
 
-This is the inverse of the C<:utf8> layer. It turns off the flag
+This is the inverse of the C<:utf8> pseudo-layer. It turns off the flag
 on the layer below so that data read from it is considered to
-be "octets" i.e. characters in the range 0..255 only. Likewise
-on output perl will warn if a "wide" character is written
-to a such a stream.
+be Perl's internal downgraded encoding, thus interpreted as the native
+single-byte encoding of Latin-1 or EBCDIC. Likewise on output Perl will
+warn if a "wide" character (a codepoint not in the range 0..255) is
+written to a such a stream.
+
+This is very dangerous to push on a handle using an C<:encoding> layer,
+as such a layer assumes to be working with Perl's internal upgraded
+encoding, so you will likely get a mangled result. Instead use C<:raw> or
+C<:pop> to remove encoding layers.
 
 =item :raw
 
-The C<:raw> layer is I<defined> as being identical to calling
+The C<:raw> pseudo-layer is I<defined> as being identical to calling
 C<binmode($fh)> - the stream is made suitable for passing binary data,
-i.e. each byte is passed as-is. The stream will still be
-buffered.
+i.e. each byte is passed as-is. The stream will still be buffered
+(but this was not always true before Perl 5.14).
 
-In Perl 5.6 and some books the C<:raw> layer (previously sometimes also
-referred to as a "discipline") is documented as the inverse of the
-C<:crlf> layer. That is no longer the case - other layers which would
-alter the binary nature of the stream are also disabled. If you want UNIX
-line endings on a platform that normally does CRLF translation, but still
-want UTF-8 or encoding defaults, the appropriate thing to do is to add
-C<:perlio> to the PERLIO environment variable.
+In Perl 5.6 and some books the C<:raw> layer is documented as the inverse
+of the C<:crlf> layer. That is no longer the case - other layers which
+would alter the binary nature of the stream are also disabled. If you
+want UNIX line endings on a platform that normally does CRLF translation,
+but still want UTF-8 or encoding defaults, the appropriate thing to do is
+to add C<:perlio> to the PERLIO environment variable, or open the handle
+explicitly with that layer, to replace the platform default of C<:crlf>.
 
 The implementation of C<:raw> is as a pseudo-layer which when "pushed"
-pops itself and then any layers which do not declare themselves as suitable
-for binary data. (Undoing :utf8 and :crlf are implemented by clearing
-flags rather than popping layers but that is an implementation detail.)
+pops itself and then any layers which would modify the binary data stream.
+(Undoing C<:utf8> and C<:crlf> may be implemented by clearing flags
+rather than popping layers but that is an implementation detail.)
 
 As a consequence of the fact that C<:raw> normally pops layers, it usually
 only makes sense to have it as the only or first element in
 a layer specification. When used as the first element it provides
 a known base on which to build e.g.
 
-    open($fh,":raw:utf8",...)
+    open(my $fh,">:raw:encoding(UTF-8)",...)
+      or die "open failed: $!";
 
-will construct a "binary" stream, but then enable UTF-8 translation.
+will construct a "binary" stream regardless of the platform defaults,
+but then enable UTF-8 translation.
 
 =item :pop
 
-A pseudo layer that removes the top-most layer. Gives perl code a
+A pseudo-layer that removes the top-most layer. Gives Perl code a
 way to manipulate the layer stack. Note that C<:pop> only works on
-real layers and will not undo the effects of pseudo layers like
-C<:utf8>. An example of a possible use might be:
+real layers and will not undo the effects of pseudo-layers or flags
+like C<:utf8>. An example of a possible use might be:
 
-    open($fh,...)
+    open(my $fh,...) or die "open failed: $!";
     ...
-    binmode($fh,":encoding(...)"); # next chunk is encoded
+    binmode($fh,":encoding(...)") or die "binmode failed: $!";
+    # next chunk is encoded
     ...
-    binmode($fh,":pop"); # back to un-encoded
+    binmode($fh,":pop") or die "binmode failed: $!";
+    # back to un-encoded
 
 A more elegant (and safer) interface is needed.
 
@@ -174,25 +239,24 @@ A more elegant (and safer) interface is needed.
 
 On Win32 platforms this I<experimental> layer uses the native "handle" IO
 rather than the unix-like numeric file descriptor layer. Known to be
-buggy as of perl 5.8.2.
+buggy as of Perl 5.8.2.
 
 =back
 
 =head2 Custom Layers
 
 It is possible to write custom layers in addition to the above builtin
-ones, both in C/XS and Perl. Two such layers (and one example written
-in Perl using the latter) come with the Perl distribution.
+ones, both in C/XS and Perl, as a module named C<< PerlIO::<layer name> >>.
+Some custom layers come with the Perl distribution.
 
 =over 4
 
 =item :encoding
 
-Use C<:encoding(ENCODING)> either in open() or binmode() to install
-a layer that transparently does character set and encoding transformations,
-for example from Shift-JIS to Unicode. Note that under C<stdio>
-an C<:encoding> also enables C<:utf8>. See L<PerlIO::encoding>
-for more information.
+Use C<:encoding(ENCODING)> to transparently do character set and encoding
+transformations, for example from Shift-JIS to Unicode. Note that an
+C<:encoding> also enables C<:utf8>. See L<PerlIO::encoding> for more
+information.
 
 =item :mmap
 
@@ -207,64 +271,81 @@ layer. Writes also behave like the C<:perlio> layer, as C<mmap()> for write
 needs extra house-keeping (to extend the file) which negates any advantage.
 
 The C<:mmap> layer will not exist if the platform does not support C<mmap()>.
+See L<PerlIO::mmap> for more information.
 
 =item :via
 
-Use C<:via(MODULE)> either in open() or binmode() to install a layer
-that does whatever transformation (for example compression /
-decompression, encryption / decryption) to the filehandle.
+C<:via(MODULE)> allows a transformation to be applied by an arbitrary Perl
+module, for example compression / decompression, encryption / decryption.
 See L<PerlIO::via> for more information.
 
+=item :scalar
+
+A layer implementing "in memory" files using scalar variables,
+automatically used in place of the platform defaults for IO when opening
+such a handle. As such, the scalar is expected to act like a file, only
+containing or storing bytes. See L<PerlIO::scalar> for more information.
+
 =back
 
 =head2 Alternatives to raw
 
 To get a binary stream an alternate method is to use:
 
-    open($fh,"whatever")
-    binmode($fh);
+    open(my $fh,"<","whatever") or die "open failed: $!";
+    binmode($fh) or die "binmode failed: $!";
 
-this has the advantage of being backward compatible with how such things have
-had to be coded on some platforms for years.
+This has the advantage of being backward compatible with older versions
+of Perl that did not use PerlIO or where C<:raw> was buggy (as it was
+before Perl 5.14).
 
 To get an unbuffered stream specify an unbuffered layer (e.g. C<:unix>)
 in the open call:
 
-    open($fh,"<:unix",$path)
+    open(my $fh,"<:unix",$path) or die "open failed: $!";
 
 =head2 Defaults and how to override them
 
 If the platform is MS-DOS like and normally does CRLF to "\n"
-translation for text files then the default layers are :
+translation for text files then the default layers are:
 
-  unix crlf
-
-(The low level "unix" layer may be replaced by a platform specific low
-level layer.)
+  :unix:crlf
 
 Otherwise if C<Configure> found out how to do "fast" IO using the system's
-stdio, then the default layers are:
+stdio (not common on modern architectures), then the default layers are:
 
-  unix stdio
+  :stdio
 
 Otherwise the default layers are
 
-  unix perlio
-
-These defaults may change once perlio has been better tested and tuned.
+  :unix:perlio
 
-The default can be overridden by setting the environment variable
-PERLIO to a space separated list of layers (C<unix> or platform low
-level layer is always pushed first).
+Note that the "default stack" depends on the operating system and on the
+Perl version, and both the compile-time and runtime configurations of
+Perl. The default can be overridden by setting the environment variable
+PERLIO to a space or colon separated list of layers, however this cannot
+be used to set layers that require loading modules like C<:encoding>.
 
 This can be used to see the effect of/bugs in the various layers e.g.
 
   cd .../perl/t
-  PERLIO=stdio ./perl harness
-  PERLIO=perlio ./perl harness
+  PERLIO=:stdio ./perl harness
+  PERLIO=:perlio ./perl harness
 
 For the various values of PERLIO see L<perlrun/PERLIO>.
 
+The following table summarizes the default layers on UNIX-like and
+DOS-like platforms and depending on the setting of C<$ENV{PERLIO}>:
+
+ PERLIO      UNIX-like                  DOS-like
+ ------      ---------                  --------
+ unset / ""  :unix:perlio / :stdio [1]  :unix:crlf
+ :stdio      :stdio                     :stdio
+ :perlio     :unix:perlio               :unix:perlio
+
+ # [1] ":stdio" if Configure found out how to do "fast stdio" (depends
+ # on the stdio implementation) and in Perl 5.8, else ":unix:perlio"
+
 =head2 Querying the layers of filehandles
 
 The following returns the B<names> of the PerlIO layers on a filehandle.
@@ -272,21 +353,7 @@ The following returns the B<names> of the PerlIO layers on a filehandle.
    my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".
 
 The layers are returned in the order an open() or binmode() call would
-use them. Note that the "default stack" depends on the operating
-system and on the Perl version, and both the compile-time and
-runtime configurations of Perl.
-
-The following table summarizes the default layers on UNIX-like and
-DOS-like platforms and depending on the setting of C<$ENV{PERLIO}>:
-
- PERLIO      UNIX-like                  DOS-like
- ------      ---------                  --------
- unset / ""  unix perlio / stdio [1]    unix crlf
- stdio       unix perlio / stdio [1]    stdio
- perlio      unix perlio                unix perlio
-
- # [1] "stdio" if Configure found out how to do "fast stdio" (depends
- # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio"
+use them, and without colons.
 
 By default the layers from the input side of the filehandle are
 returned; to get the output side, use the optional C<output> argument:
@@ -294,8 +361,7 @@ returned; to get the output side, use the optional C<output> argument:
    my @layers = PerlIO::get_layers($fh, output => 1);
 
 (Usually the layers are identical on either side of a filehandle but
-for example with sockets there may be differences, or if you have
-been using the C<open> pragma.)
+for example with sockets there may be differences.)
 
 There is no set_layers(), nor does get_layers() return a tied array
 mirroring the stack, or anything fancy like that. This is not
@@ -306,7 +372,7 @@ You are supposed to use open() and binmode() to manipulate the stack.
 B<Implementation details follow, please close your eyes.>
 
 The arguments to layers are by default returned in parentheses after
-the name of the layer, and certain layers (like C<utf8>) are not real
+the name of the layer, and certain layers (like C<:utf8>) are not real
 layers but instead flags on real layers; to get all of these returned
 separately, use the optional C<details> argument:
 
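
A minimal sketch of the conventions the reworked documentation describes: strict-safe lexical handles, checked C<open> calls, C<:encoding(UTF-8)> instead of bare C<:utf8>, an in-memory handle, and C<PerlIO::get_layers>. It assumes a Perl built with PerlIO and with the Encode modules available. The variable names and sample text are illustrative only and do not appear in the patch.

```perl
use strict;
use warnings;

# Write encoded text to an in-memory file. Passing a scalar reference to
# open() selects the :scalar layer; :encoding(UTF-8) is pushed on top.
my $buffer = '';
open(my $out, ">:encoding(UTF-8)", \$buffer) or die "open failed: $!";
print {$out} "caf\x{e9}\n" or die "print failed: $!";
close($out) or die "close failed: $!";

# The buffer holds bytes, not characters: U+00E9 is two bytes in UTF-8.
printf "%d bytes written\n", length $buffer;

# Read the same bytes back through a decoding layer.
open(my $in, "<:encoding(UTF-8)", \$buffer) or die "open failed: $!";
my $line = <$in>;
printf "%d characters read\n", length $line;

# Layer names come back without leading colons, lowest layer first.
my @layers = PerlIO::get_layers($in);
print "layers: @layers\n";
close($in) or die "close failed: $!";
```

The byte and character counts differ (6 versus 5) because the encoding layer translates between Perl characters and UTF-8 bytes in both directions. The exact layer names printed may vary between Perl versions.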