Diffstat (limited to 'doc/gawktexi.in')
-rw-r--r-- | doc/gawktexi.in | 2945 |
1 files changed, 1479 insertions, 1466 deletions
diff --git a/doc/gawktexi.in b/doc/gawktexi.in index 8034a6b6..121a066e 100644 --- a/doc/gawktexi.in +++ b/doc/gawktexi.in @@ -46,10 +46,11 @@ @c applies to and all the info about who's publishing this edition @c These apply across the board. -@set UPDATE-MONTH September, 2014 +@set UPDATE-MONTH February, 2015 @set VERSION 4.1 @set PATCHLEVEL 2 +@set GAWKINETTITLE TCP/IP Internetworking with @command{gawk} @ifset FOR_PRINT @set TITLE Effective awk Programming @end ifset @@ -192,9 +193,9 @@ @ifclear FOR_PRINT @set FN file name -@set FFN File Name +@set FFN File name @set DF data file -@set DDF Data File +@set DDF Data file @set PVERSION version @end ifclear @ifset FOR_PRINT @@ -293,7 +294,7 @@ Fax: +1-617-542-2652 Email: <email>gnu@@gnu.org</email> URL: <ulink url="http://www.gnu.org">http://www.gnu.org/</ulink></literallayout> -<literallayout class="normal">Copyright © 1989, 1991, 1992, 1993, 1996–2005, 2007, 2009–2014 +<literallayout class="normal">Copyright © 1989, 1991, 1992, 1993, 1996–2005, 2007, 2009–2015 Free Software Foundation, Inc. All Rights Reserved.</literallayout> @end docbook @@ -467,7 +468,7 @@ particular records in a file and perform operations upon them. @command{gawk}. * Internationalization:: Getting @command{gawk} to speak your language. -* Debugger:: The @code{gawk} debugger. +* Debugger:: The @command{gawk} debugger. * Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with @command{gawk}. * Dynamic Extensions:: Adding new built-in functions to @@ -627,6 +628,7 @@ particular records in a file and perform operations upon them. * Special Caveats:: Things to watch out for. * Close Files And Pipes:: Closing Input and Output Files and Pipes. +* Nonfatal:: Enabling Nonfatal Output. * Output Summary:: Output summary. * Output Exercises:: Exercises. * Values:: Constants, Variables, and Regular @@ -950,7 +952,7 @@ particular records in a file and perform operations upon them. * Internal File Ops:: The code for internal file operations. 
* Using Internal File Ops:: How to use an external extension. * Extension Samples:: The sample extensions that ship with - @code{gawk}. + @command{gawk}. * Extension Sample File Functions:: The file functions sample. * Extension Sample Fnmatch:: An interface to @code{fnmatch()}. * Extension Sample Fork:: An interface to @code{fork()} and @@ -1295,7 +1297,7 @@ October 2014 <affiliation><jobtitle>Nof Ayalon</jobtitle></affiliation> <affiliation><jobtitle>Israel</jobtitle></affiliation> </author> - <date>December 2014</date> + <date>February 2015</date> </prefaceinfo> @end docbook @@ -1463,7 +1465,7 @@ In May 1997, J@"urgen Kahrs felt the need for network access from @command{awk}, and with a little help from me, set about adding features to do this for @command{gawk}. At that time, he also wrote the bulk of -@cite{TCP/IP Internetworking with @command{gawk}} +@cite{@value{GAWKINETTITLE}} (a separate document, available as part of the @command{gawk} distribution). His code finally became part of the main @command{gawk} distribution with @command{gawk} @value{PVERSION} 3.1. @@ -1486,7 +1488,7 @@ is often referred to as ``new @command{awk}.'' By analogy, the original version of @command{awk} is referred to as ``old @command{awk}.'' -Today, on most systems, when you run the @command{awk} utility +On most current systems, when you run the @command{awk} utility you get some version of new @command{awk}.@footnote{Only Solaris systems still use an old @command{awk} for the default @command{awk} utility. A more modern @command{awk} lives in @@ -1717,15 +1719,39 @@ and how to compile and use it on different non-POSIX systems. It also describes how to report bugs in @command{gawk} and where to get other freely available @command{awk} implementations. -@end itemize @ifset FOR_PRINT -@itemize @value{MINUS} @item @ref{Copying}, presents the license that covers the @command{gawk} source code. 
+@end ifset + +@ifclear FOR_PRINT +@item +@ref{Notes}, +describes how to disable @command{gawk}'s extensions, as +well as how to contribute new code to @command{gawk}, +and some possible future directions for @command{gawk} development. + +@item +@ref{Basic Concepts}, +provides some very cursory background material for those who +are completely unfamiliar with computer programming. + +The @ref{Glossary}, defines most, if not all, of the significant terms used +throughout the @value{DOCUMENT}. If you find terms that you aren't familiar with, +try looking them up here. + +@item +@ref{Copying}, and +@ref{GNU Free Documentation License}, +present the licenses that cover the @command{gawk} source code +and this @value{DOCUMENT}, respectively. +@end ifclear +@end itemize @end itemize +@ifset FOR_PRINT The version of this @value{DOCUMENT} distributed with @command{gawk} contains additional appendices and other end material. To save space, we have omitted them from the @@ -1763,32 +1789,6 @@ Some of the chapters have exercise sections; these have also been omitted from the print edition but are available online. @end ifset -@ifclear FOR_PRINT -@itemize @value{MINUS} -@item -@ref{Notes}, -describes how to disable @command{gawk}'s extensions, as -well as how to contribute new code to @command{gawk}, -and some possible future directions for @command{gawk} development. - -@item -@ref{Basic Concepts}, -provides some very cursory background material for those who -are completely unfamiliar with computer programming. - -The @ref{Glossary}, defines most, if not all, of the significant terms used -throughout the @value{DOCUMENT}. If you find terms that you aren't familiar with, -try looking them up here. - -@item -@ref{Copying}, and -@ref{GNU Free Documentation License}, -present the licenses that cover the @command{gawk} source code -and this @value{DOCUMENT}, respectively. 
-@end itemize -@end ifclear -@end itemize - @c FULLXREF OFF @node Conventions @@ -1830,15 +1830,23 @@ $ @kbd{echo hello on stderr 1>&2} @end example @ifnotinfo -In the text, command names appear in @code{this font}, while code segments +In the text, almost anything related to programming, such as +command names, +variable and function names, and string, numeric and regexp constants +appear in @code{this font}. Code fragments appear in the same font and quoted, @samp{like this}. +Things that are replaced by the user or programmer +appear in @var{this font}. Options look like this: @option{-f}. +@value{FFN}s are indicated like this: @file{/path/to/ourfile}. +@ifclear FOR_PRINT Some things are emphasized @emph{like this}, and if a point needs to be made -strongly, it is done @strong{like this}. The first occurrence of +strongly, it is done @strong{like this}. +@end ifclear +The first occurrence of a new term is usually its @dfn{definition} and appears in the same font as the previous occurrence of ``definition'' in this sentence. -Finally, @value{FN}s are indicated like this: @file{/path/to/ourfile}. @end ifnotinfo Characters that you type at the keyboard look @kbd{like this}. In particular, @@ -2251,14 +2259,14 @@ which they raised and educated me. Finally, I also must acknowledge my gratitude to G-d, for the many opportunities He has sent my way, as well as for the gifts He has given me with which to take advantage of those opportunities. -@iftex +@ifnotdocbook @sp 2 @noindent Arnold Robbins @* Nof Ayalon @* Israel @* -December 2014 -@end iftex +February 2015 +@end ifnotdocbook @ifnotinfo @part @value{PART1}The @command{awk} Language @@ -2564,9 +2572,7 @@ for programs that are provided on the @command{awk} command line. (Also, placing the program in a file allows us to use a literal single quote in the program text, instead of the magic @samp{\47}.) 
-@c STARTOFRANGE sq1x @cindex single quote (@code{'}) in @command{gawk} command lines -@c STARTOFRANGE qs2x @cindex @code{'} (single quote) in @command{gawk} command lines If you want to clearly identify an @command{awk} program file as such, you can add the extension @file{.awk} to the @value{FN}. This doesn't @@ -2884,8 +2890,6 @@ $ @kbd{awk "BEGIN @{ print \"Here is a single quote <'>\" @}"} @end example @noindent -@c ENDOFRANGE sq1x -@c ENDOFRANGE qs2x This option is also painful, because double quotes, backslashes, and dollar signs are very common in more advanced @command{awk} programs. @@ -3221,8 +3225,13 @@ no actions run. After processing all the rules that match the line (and perhaps there are none), @command{awk} reads the next line. (However, -@pxref{Next Statement}, +@DBPXREF{Next Statement} +@ifdocbook +and @DBREF{Nextfile Statement}.) +@end ifdocbook +@ifnotdocbook and also @pxref{Nextfile Statement}.) +@end ifnotdocbook This continues until the program reaches the end of the file. For example, the following @command{awk} program contains two rules: @@ -3487,7 +3496,7 @@ performing bit manipulation, for runtime string translation (internationalizatio determining the type of a variable, and array sorting. -As we develop our presentation of the @command{awk} language, we introduce +As we develop our presentation of the @command{awk} language, we will introduce most of the variables and many of the functions. They are described systematically in @DBREF{Built-in Variables} and in @ref{Built-in}. @@ -3541,7 +3550,7 @@ and Perl.} @c FIXME: Review this chapter for summary of builtin functions called. @itemize @value{BULLET} @item -Programs in @command{awk} consist of @var{pattern}-@var{action} pairs. +Programs in @command{awk} consist of @var{pattern}--@var{action} pairs. @item An @var{action} without a @var{pattern} always runs. The default @@ -3570,7 +3579,7 @@ part of a larger shell script (or MS-Windows batch file). 
You may use backslash continuation to continue a source line. Lines are automatically continued after a comma, open brace, question mark, colon, -@samp{||}, @samp{&&}, @code{do} and @code{else}. +@samp{||}, @samp{&&}, @code{do}, and @code{else}. @end itemize @node Invoking Gawk @@ -3645,20 +3654,16 @@ warning that the program is empty. @node Options @section Command-Line Options -@c STARTOFRANGE ocl @cindex options, command-line -@c STARTOFRANGE clo @cindex command line, options -@c STARTOFRANGE gnulo @cindex GNU long options -@c STARTOFRANGE longo @cindex options, long Options begin with a dash and consist of a single character. GNU-style long options consist of two dashes and a keyword. The keyword can be abbreviated, as long as the abbreviation allows the option -to be uniquely identified. If the option takes an argument, then the -keyword is either immediately followed by an equals sign (@samp{=}) and the +to be uniquely identified. If the option takes an argument, either the +keyword is immediately followed by an equals sign (@samp{=}) and the argument's value, or the keyword and the argument's value are separated by whitespace. If a particular option with a value is given more than once, it is the @@ -3685,7 +3690,7 @@ Set the @code{FS} variable to @var{fs} @cindex @option{-f} option @cindex @option{--file} option @cindex @command{awk} programs, location of -Read @command{awk} program source from @var{source-file} +Read the @command{awk} program source from @var{source-file} instead of in the first nonoption argument. This option may be given multiple times; the @command{awk} program consists of the concatenation of the contents of @@ -3740,8 +3745,6 @@ by the user that could start with @samp{-}. It is also useful for passing options on to the @command{awk} program; see @ref{Getopt Function}. 
@end table -@c ENDOFRANGE gnulo -@c ENDOFRANGE longo The following list describes @command{gawk}-specific options: @@ -3753,14 +3756,14 @@ The following list describes @command{gawk}-specific options: @cindex @option{--characters-as-bytes} option Cause @command{gawk} to treat all input data as single-byte characters. In addition, all output written with @code{print} or @code{printf} -are treated as single-byte characters. +is treated as single-byte characters. Normally, @command{gawk} follows the POSIX standard and attempts to process its input data according to the current locale (@pxref{Locales}). This can often involve converting multibyte characters into wide characters (internally), and can lead to problems or confusion if the input data does not contain valid -multibyte characters. This option is an easy way to tell @command{gawk}: -``hands off my data!''. +multibyte characters. This option is an easy way to tell @command{gawk}, +``Hands off my data!'' @item @option{-c} @itemx @option{--traditional} @@ -3817,7 +3820,7 @@ Enable debugging of @command{awk} programs By default, the debugger reads commands interactively from the keyboard (standard input). The optional @var{file} argument allows you to specify a file with a list -of commands for the debugger to execute non-interactively. +of commands for the debugger to execute noninteractively. No space is allowed between the @option{-D} and @var{file}, if @var{file} is supplied. @@ -3877,7 +3880,7 @@ with @samp{#!} scripts (@pxref{Executable Scripts}), like so: @cindex portable object files, generating @cindex files, portable object, generating Analyze the source program and -generate a GNU @command{gettext} Portable Object Template file on standard +generate a GNU @command{gettext} portable object template file on standard output for all string constants that have been marked for translation. @xref{Internationalization}, for information about this option. @@ -3889,7 +3892,7 @@ for information about this option. 
@cindex GNU long options, printing list of @cindex options, printing list of @cindex printing, list of options -Print a ``usage'' message summarizing the short and long style options +Print a ``usage'' message summarizing the short- and long-style options that @command{gawk} accepts and then exit. @item @option{-i} @var{source-file} @@ -3899,7 +3902,7 @@ that @command{gawk} accepts and then exit. @cindex @command{awk} programs, location of Read an @command{awk} source library from @var{source-file}. This option is completely equivalent to using the @code{@@include} directive inside -your program. This option is very similar to the @option{-f} option, +your program. It is very similar to the @option{-f} option, but there are two important differences. First, when @option{-i} is used, the program source is not loaded if it has been previously loaded, whereas with @option{-f}, @command{gawk} always loads the file. @@ -3984,7 +3987,7 @@ when parsing numeric input data (@pxref{Locales}). @cindex @option{-o} option @cindex @option{--pretty-print} option Enable pretty-printing of @command{awk} programs. -By default, output program is created in a file named @file{awkprof.out} +By default, the output program is created in a file named @file{awkprof.out} (@pxref{Profiling}). The optional @var{file} argument allows you to specify a different @value{FN} for the output. @@ -4028,7 +4031,7 @@ in the left margin, and function call counts for each function. Operate in strict POSIX mode. This disables all @command{gawk} extensions (just like @option{--traditional}) and disables all extensions not allowed by POSIX. -@xref{Common Extensions}, for a summary of the extensions +@DBXREF{Common Extensions} for a summary of the extensions in @command{gawk} that are disabled by this option. Also, the following additional @@ -4149,7 +4152,7 @@ source of data.) 
Because it is clumsy using the standard @command{awk} mechanisms to mix source file and command-line @command{awk} programs, @command{gawk} provides the @option{-e} option. This does not require you to -pre-empt the standard input for your source code; it allows you to easily +preempt the standard input for your source code; it allows you to easily mix command-line and library source code (@pxref{AWKPATH Variable}). As with @option{-f}, the @option{-e} and @option{-i} options may also be used multiple times on the command line. @@ -4195,8 +4198,6 @@ setenv POSIXLY_CORRECT true Having @env{POSIXLY_CORRECT} set is not recommended for daily use, but it is good for testing the portability of your programs to other environments. -@c ENDOFRANGE ocl -@c ENDOFRANGE clo @node Other Arguments @section Other Command-Line Arguments @@ -4339,7 +4340,7 @@ file, unless the file is in the current directory. But with @command{gawk}, if the @value{FN} supplied to the @option{-f} or @option{-i} options does not contain a directory separator @samp{/}, then @command{gawk} searches a list of -directories (called the @dfn{search path}), one by one, looking for a +directories (called the @dfn{search path}) one by one, looking for a file with the specified name. The search path is a string consisting of directory names @@ -4380,9 +4381,9 @@ as an entry in the path or write a null entry in the path. Different past versions of @command{gawk} would also look explicitly in the current directory, either before or after the path search. As of -@value{PVERSION} 4.1.2, this no longer happens, and if you wish to look +@value{PVERSION} 4.1.2, this no longer happens; if you wish to look in the current directory, you must include @file{.} either as a separate -entry, or as a null entry in the search path. +entry or as a null entry in the search path. @end quotation The default value for @env{AWKPATH} is @@ -4460,6 +4461,8 @@ wait for input before returning with an error. 
Controls the number of times @command{gawk} attempts to retry a two-way TCP/IP (socket) connection before giving up. @xref{TCP/IP Networking}. +Note that when nonfatal I/O is enabled (@pxref{Nonfatal}), +@command{gawk} only tries to open a TCP/IP socket once. @item POSIXLY_CORRECT Causes @command{gawk} to switch to POSIX-compatibility @@ -4498,7 +4501,7 @@ If this variable exists, @command{gawk} includes the @value{FN} and line number within the @command{gawk} source code from which warning and/or fatal messages are generated. Its purpose is to help isolate the source of a -message, as there are multiple places which produce the +message, as there are multiple places that produce the same warning or error message. @item GAWK_NO_DFA @@ -4514,16 +4517,16 @@ This specifies the amount by which @command{gawk} should grow its internal evaluation stack, when needed. @item INT_CHAIN_MAX -The intended maximum number of items @command{gawk} will maintain on a +This specifies intended maximum number of items @command{gawk} will maintain on a hash chain for managing arrays indexed by integers. @item STR_CHAIN_MAX -The intended maximum number of items @command{gawk} will maintain on a +This specifies intended maximum number of items @command{gawk} will maintain on a hash chain for managing arrays indexed by strings. @item TIDYMEM If this variable exists, @command{gawk} uses the @code{mtrace()} library -calls from GNU LIBC to help track down possible memory leaks. +calls from the GNU C library to help track down possible memory leaks. @end table @node Exit Status @@ -4560,7 +4563,7 @@ The @code{@@include} keyword can be used to read external @command{awk} source files. This gives you the ability to split large @command{awk} source files into smaller, more manageable pieces, and also lets you reuse common @command{awk} code from various @command{awk} scripts. 
In other words, you can group -together @command{awk} functions, used to carry out specific tasks, +together @command{awk} functions used to carry out specific tasks into external files. These files can be used just like function libraries, using the @code{@@include} keyword in conjunction with the @env{AWKPATH} environment variable. Note that source files may also be included @@ -4595,7 +4598,7 @@ $ @kbd{gawk -f test2} @print{} This is script test2. @end example -@code{gawk} runs the @file{test2} script, which includes @file{test1} +@command{gawk} runs the @file{test2} script, which includes @file{test1} using the @code{@@include} keyword. So, to include external @command{awk} source files, you just use @code{@@include} followed by the name of the file to be included, @@ -4650,11 +4653,12 @@ of the @env{AWKPATH} variable in command-line file searches This is very helpful in constructing @command{gawk} function libraries. If you have a large script with useful, general-purpose @command{awk} functions, you can break it down into library files and put those files -in a special directory. You can then include those ``libraries,'' using -either the full pathnames of the files, or by setting the @env{AWKPATH} +in a special directory. You can then include those ``libraries,'' +either by using the full pathnames of the files, or by setting the @env{AWKPATH} environment variable accordingly and then using @code{@@include} with -just the file part of the full pathname. Of course, you can have more -than one directory to keep library files; the more complex the working +just the file part of the full pathname. Of course, +you can keep library files in more than one directory; +the more complex the working environment is, the more directories you may need to organize the files to be included. @@ -4667,8 +4671,8 @@ In particular, @code{@@include} is very useful for writing CGI scripts to be run from web pages. 
As mentioned in @ref{AWKPATH Variable}, the current directory is always -searched first for source files, before searching in @env{AWKPATH}, -and this also applies to files named with @code{@@include}. +searched first for source files, before searching in @env{AWKPATH}; +this also applies to files named with @code{@@include}. @node Loading Shared Libraries @section Loading Dynamic Extensions into Your Program @@ -4722,8 +4726,8 @@ It also describes the @code{ordchr} extension. @cindex features, deprecated @cindex obsolete features This @value{SECTION} describes features and/or command-line options from -previous releases of @command{gawk} that are either not available in the -current version or that are still supported but deprecated (meaning that +previous releases of @command{gawk} that either are not available in the +current version or are still supported but deprecated (meaning that they will @emph{not} be in the next release). The process-related special files @file{/dev/pid}, @file{/dev/ppid}, @@ -4803,7 +4807,7 @@ This seems to have been a long-undocumented feature in Unix @command{awk}. Similarly, you may use @code{print} or @code{printf} statements in the @var{init} and @var{increment} parts of a @code{for} loop. This is another -long-undocumented ``feature'' of Unix @code{awk}. +long-undocumented ``feature'' of Unix @command{awk}. @end ignore @@ -4820,7 +4824,7 @@ to run @command{awk}. @item The three standard options for all versions of @command{awk} are -@option{-f}, @option{-F} and @option{-v}. @command{gawk} supplies these +@option{-f}, @option{-F}, and @option{-v}. @command{gawk} supplies these and many others, as well as corresponding GNU-style long options. @item @@ -4857,13 +4861,12 @@ and @option{-f} command-line options. @item @command{gawk} allows you to load additional functions written in C or C++ using the @code{@@load} statement and/or the @option{-l} option. -(This advanced feature is described later on in @ref{Dynamic Extensions}.) 
+(This advanced feature is described later, in @ref{Dynamic Extensions}.) @end itemize @node Regexp @chapter Regular Expressions @cindex regexp -@c STARTOFRANGE regexp @cindex regular expressions A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a @@ -5070,7 +5073,7 @@ Horizontal TAB, @kbd{Ctrl-i}, ASCII code 9 (HT). @cindex @code{\} (backslash), @code{\v} escape sequence @cindex backslash (@code{\}), @code{\v} escape sequence @item \v -Vertical tab, @kbd{Ctrl-k}, ASCII code 11 (VT). +Vertical TAB, @kbd{Ctrl-k}, ASCII code 11 (VT). @cindex @code{\} (backslash), @code{\}@var{nnn} escape sequence @cindex backslash (@code{\}), @code{\}@var{nnn} escape sequence @@ -5096,13 +5099,12 @@ letters or numbers. @value{COMMONEXT} @quotation CAUTION In ISO C, the escape sequence continues until the first nonhexadecimal digit is seen. -@c FIXME: Add exact version here. For many years, @command{gawk} would continue incorporating hexadecimal digits into the value until a non-hexadecimal digit or the end of the string was encountered. However, using more than two hexadecimal digits produced undefined results. -As of @value{PVERSION} @strong{FIXME:} 4.3.0, only two digits +As of @value{PVERSION} 4.2, only two digits are processed. @end quotation @@ -5145,7 +5147,7 @@ characters @samp{a+b}. @cindex @code{\} (backslash), in escape sequences @cindex portability For complete portability, do not use a backslash before any character not -shown in the previous list and that is not an operator. +shown in the previous list or that is not an operator. @c 11/2014: Moved so as to not stack sidebars @sidebar Backslash Before Regular Characters @@ -5224,7 +5226,6 @@ escape sequences literally when used in regexp constants. 
Thus, @node Regexp Operators @section Regular Expression Operators -@c STARTOFRANGE regexpo @cindex regular expressions, operators @cindex metacharacters in regular expressions @@ -5242,7 +5243,7 @@ are recognized and converted into corresponding real characters as the very first step in processing regexps. Here is a list of metacharacters. All characters that are not escape -sequences and that are not listed in the following stand for themselves: +sequences and that are not listed here stand for themselves: @c Use @asis so the docbook comes out ok. Sigh. @table @asis @@ -5365,7 +5366,7 @@ just @samp{p} if no @samp{h}s are present. There are two subtle points to understand about how @samp{*} works. First, the @samp{*} applies only to the single preceding regular expression component (e.g., in @samp{ph*}, it applies just to the @samp{h}). -To cause @samp{*} to apply to a larger sub-expression, use parentheses: +To cause @samp{*} to apply to a larger subexpression, use parentheses: @samp{(ph)*} matches @samp{ph}, @samp{phph}, @samp{phphph}, and so on. Second, @samp{*} finds as many repetitions as possible. If the text @@ -5404,10 +5405,10 @@ is repeated at least @var{n} times: Matches @samp{whhhy}, but not @samp{why} or @samp{whhhhy}. @item wh@{3,5@}y -Matches @samp{whhhy}, @samp{whhhhy}, or @samp{whhhhhy}, only. +Matches @samp{whhhy}, @samp{whhhhy}, or @samp{whhhhhy} only. @item wh@{2,@}y -Matches @samp{whhy} or @samp{whhhy}, and so on. +Matches @samp{whhy}, @samp{whhhy}, and so on. @end table @cindex POSIX @command{awk}, interval expressions in @@ -5456,11 +5457,9 @@ usage as a syntax error. If @command{gawk} is in compatibility mode (@pxref{Options}), interval expressions are not available in regular expressions. 
-@c ENDOFRANGE regexpo @node Bracket Expressions @section Using Bracket Expressions -@c STARTOFRANGE charlist @cindex bracket expressions @cindex bracket expressions, range expressions @cindex range expressions (regexps) @@ -5536,7 +5535,7 @@ POSIX standard. (a space is printable but not visible, whereas an @samp{a} is both) @item @code{[:lower:]} @tab Lowercase alphabetic characters @item @code{[:print:]} @tab Printable characters (characters that are not control characters) -@item @code{[:punct:]} @tab Punctuation characters (characters that are not letters, digits +@item @code{[:punct:]} @tab Punctuation characters (characters that are not letters, digits, control characters, or space characters) @item @code{[:space:]} @tab Space characters (such as space, TAB, and formfeed, to name a few) @item @code{[:upper:]} @tab Uppercase alphabetic characters @@ -5556,11 +5555,11 @@ and numeric characters in your character set. @c Date: Tue, 01 Jul 2014 07:39:51 +0200 @c From: Hermann Peifer <peifer@gmx.eu> Some utilities that match regular expressions provide a nonstandard -@code{[:ascii:]} character class; @command{awk} does not. However, you -can simulate such a construct using @code{[\x00-\x7F]}. This matches +@samp{[:ascii:]} character class; @command{awk} does not. However, you +can simulate such a construct using @samp{[\x00-\x7F]}. This matches all values numerically between zero and 127, which is the defined range of the ASCII character set. Use a complemented character list -(@code{[^\x00-\x7F]}) to match any single-byte characters that are not +(@samp{[^\x00-\x7F]}) to match any single-byte characters that are not in the ASCII range. @cindex bracket expressions, collating elements @@ -5589,8 +5588,8 @@ Locale-specific names for a list of characters that are equal. The name is enclosed between @samp{[=} and @samp{=]}. 
For example, the name @samp{e} might be used to represent all of -``e,'' ``@`e,'' and ``@'e.'' In this case, @samp{[[=e=]]} is a regexp -that matches any of @samp{e}, @samp{@'e}, or @samp{@`e}. +``e,'' ``@^e,'' ``@`e,'' and ``@'e.'' In this case, @samp{[[=e=]]} is a regexp +that matches any of @samp{e}, @samp{@^e}, @samp{@'e}, or @samp{@`e}. @end table These features are very valuable in non-English-speaking locales. @@ -5604,7 +5603,6 @@ expression matching currently recognize only POSIX character classes; they do not recognize collating symbols or equivalence classes. @end quotation @c maybe one day ... -@c ENDOFRANGE charlist @node Leftmost Longest @section How Much Text Matches? @@ -5620,7 +5618,7 @@ echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}' This example uses the @code{sub()} function to make a change to the input record. (@code{sub()} replaces the first instance of any text matched by the first argument with the string provided as the second argument; -@pxref{String Functions}). Here, the regexp @code{/a+/} indicates ``one +@pxref{String Functions}.) Here, the regexp @code{/a+/} indicates ``one or more @samp{a} characters,'' and the replacement text is @samp{<A>}. The input contains four @samp{a} characters. @@ -5648,9 +5646,7 @@ and also @pxref{Field Separators}). @node Computed Regexps @section Using Dynamic Regexps -@c STARTOFRANGE dregexp @cindex regular expressions, computed -@c STARTOFRANGE regexpd @cindex regular expressions, dynamic @cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator @@ -5676,14 +5672,14 @@ and tests whether the input record matches this regexp. @quotation NOTE When using the @samp{~} and @samp{!~} -operators, there is a difference between a regexp constant +operators, be aware that there is a difference between a regexp constant enclosed in slashes and a string constant enclosed in double quotes. 
If you are going to use a string constant, you have to understand that the string is, in essence, scanned @emph{twice}: the first time when @command{awk} reads your program, and the second time when it goes to match the string on the lefthand side of the operator with the pattern on the right. This is true of any string-valued expression (such as -@code{digits_regexp}, shown previously), not just string constants. +@code{digits_regexp}, shown in the previous example), not just string constants. @end quotation @cindex regexp constants, slashes vs.@: quotes @@ -5757,17 +5753,13 @@ $ @kbd{awk '$0 ~ /[ \t\n]/'} @command{gawk} does not have this problem, and it isn't likely to occur often in practice, but it's worth noting for future reference. @end sidebar -@c ENDOFRANGE dregexp -@c ENDOFRANGE regexpd @node GNU Regexp Operators @section @command{gawk}-Specific Regexp Operators @c This section adapted (long ago) from the regex-0.12 manual -@c STARTOFRANGE regexpg @cindex regular expressions, operators, @command{gawk} -@c STARTOFRANGE gregexp @cindex @command{gawk}, regular expressions, operators @cindex operators, GNU-specific @cindex regular expressions, operators, for words @@ -5843,7 +5835,7 @@ matches either @samp{ball} or @samp{balls}, as a separate word. @item \B Matches the empty string that occurs between two word-constituent characters. For example, -@code{/\Brat\B/} matches @samp{crate} but it does not match @samp{dirty rat}. +@code{/\Brat\B/} matches @samp{crate}, but it does not match @samp{dirty rat}. @samp{\B} is essentially the opposite of @samp{\y}. @end table @@ -5862,14 +5854,14 @@ The operators are: @cindex backslash (@code{\}), @code{\`} operator (@command{gawk}) @cindex @code{\} (backslash), @code{\`} operator (@command{gawk}) Matches the empty string at the -beginning of a buffer (string). 
+beginning of a buffer (string) @c @cindex operators, @code{\'} (@command{gawk}) @cindex backslash (@code{\}), @code{\'} operator (@command{gawk}) @cindex @code{\} (backslash), @code{\'} operator (@command{gawk}) @item \' Matches the empty string at the -end of a buffer (string). +end of a buffer (string) @end table @cindex @code{^} (caret), regexp operator @@ -5932,15 +5924,11 @@ Allow interval expressions in regexps, if @option{--traditional} has been provided. Otherwise, interval expressions are available by default. @end table -@c ENDOFRANGE gregexp -@c ENDOFRANGE regexpg @node Case-sensitivity @section Case Sensitivity in Matching -@c STARTOFRANGE regexpcs @cindex regular expressions, case sensitivity -@c STARTOFRANGE csregexp @cindex case sensitivity, regexps and Case is normally significant in regular expressions, both when matching ordinary characters (i.e., not metacharacters) and inside bracket @@ -6032,8 +6020,6 @@ the right thing.} The value of @code{IGNORECASE} has no effect if @command{gawk} is in compatibility mode (@pxref{Options}). Case is always significant in compatibility mode. -@c ENDOFRANGE csregexp -@c ENDOFRANGE regexpcs @node Regexp Summary @section Summary @@ -6080,12 +6066,10 @@ versions, use @code{tolower()} or @code{toupper()}. @end itemize -@c ENDOFRANGE regexp @node Reading Files @chapter Reading Input Files -@c STARTOFRANGE infir @cindex reading input files @cindex input files, reading @cindex input files @@ -6110,7 +6094,7 @@ This makes it more convenient for programs to work on the parts of a record. @cindex @code{getline} command On rare occasions, you may need to use the @code{getline} command. -The @code{getline} command is valuable, both because it +The @code{getline} command is valuable both because it can do explicit input from any number of files, and because the files used with it do not have to be named on the @command{awk} command line (@pxref{Getline}). 
@@ -6136,9 +6120,7 @@ used with it do not have to be named on the @command{awk} command line @node Records @section How Input Is Split into Records -@c STARTOFRANGE inspl @cindex input, splitting into records -@c STARTOFRANGE recspl @cindex records, splitting input into @cindex @code{NR} variable @cindex @code{FNR} variable @@ -6163,8 +6145,8 @@ never automatically reset to zero. Records are separated by a character called the @dfn{record separator}. By default, the record separator is the newline character. This is why records are, by default, single lines. -A different character can be used for the record separator by -assigning the character to the predefined variable @code{RS}. +To use a different character for the record separator, +simply assign that character to the predefined variable @code{RS}. @cindex newlines, as record separators @cindex @code{RS} variable @@ -6187,8 +6169,8 @@ awk 'BEGIN @{ RS = "u" @} @noindent changes the value of @code{RS} to @samp{u}, before reading any input. -This is a string whose first character is the letter ``u''; as a result, records -are separated by the letter ``u.'' Then the input file is read, and the second +The new value is a string whose first character is the letter ``u''; as a result, records +are separated by the letter ``u''. Then the input file is read, and the second rule in the @command{awk} program (the action with no pattern) prints each record. Because each @code{print} statement adds a newline at the end of its output, this @command{awk} program copies the input @@ -6249,8 +6231,8 @@ Bill 555-1675 bill.drowning@@hotmail.com A @end example @noindent -It contains no @samp{u} so there is no reason to split the record, -unlike the others which have one or more occurrences of the @samp{u}. +It contains no @samp{u}, so there is no reason to split the record, +unlike the others, which each have one or more occurrences of the @samp{u}. 
In fact, this record is treated as part of the previous record; the newline separating them in the output is the original newline in the @value{DF}, not the one added by @@ -6345,7 +6327,7 @@ contains the same single character. However, when @code{RS} is a regular expression, @code{RT} contains the actual input text that matched the regular expression. -If the input file ended without any text that matches @code{RS}, +If the input file ends without any text matching @code{RS}, @command{gawk} sets @code{RT} to the null string. The following example illustrates both of these features. @@ -6438,8 +6420,6 @@ character as a record separator. However, this is a special case: whole files. If you are using @command{gawk}, see @DBREF{Extension Sample Readfile} for another option. @end sidebar -@c ENDOFRANGE inspl -@c ENDOFRANGE recspl @node Fields @section Examining Fields @@ -6447,7 +6427,6 @@ Readfile} for another option. @cindex examining fields @cindex fields @cindex accessing fields -@c STARTOFRANGE fiex @cindex fields, examining @cindex POSIX @command{awk}, field separators and @cindex field separators, POSIX and @@ -6472,11 +6451,11 @@ simple @command{awk} programs so powerful. @cindex @code{$} (dollar sign), @code{$} field operator @cindex dollar sign (@code{$}), @code{$} field operator @cindex field operators@comma{} dollar sign as -You use a dollar-sign (@samp{$}) +You use a dollar sign (@samp{$}) to refer to a field in an @command{awk} program, followed by the number of the field you want. Thus, @code{$1} refers to the first field, @code{$2} to the second, and so on. -(Unlike the Unix shells, the field numbers are not limited to single digits. +(Unlike in the Unix shells, the field numbers are not limited to single digits. @code{$127} is the 127th field in the record.) 
For example, suppose the following is a line of input: @@ -6502,7 +6481,7 @@ If you try to reference a field beyond the last one (such as @code{$8} when the record has only seven fields), you get the empty string. (If used in a numeric operation, you get zero.) -The use of @code{$0}, which looks like a reference to the ``zero-th'' field, is +The use of @code{$0}, which looks like a reference to the ``zeroth'' field, is a special case: it represents the whole input record. Use it when you are not interested in specific fields. Here are some more examples: @@ -6528,7 +6507,6 @@ $ @kbd{awk '/li/ @{ print $1, $NF @}' mail-list} @print{} Julie F @print{} Samuel A @end example -@c ENDOFRANGE fiex @node Nonconstant Fields @section Nonconstant Field Numbers @@ -6558,13 +6536,13 @@ awk '@{ print $(2*2) @}' mail-list @end example @command{awk} evaluates the expression @samp{(2*2)} and uses -its value as the number of the field to print. The @samp{*} sign +its value as the number of the field to print. The @samp{*} represents multiplication, so the expression @samp{2*2} evaluates to four. The parentheses are used so that the multiplication is done before the @samp{$} operation; they are necessary whenever there is a binary operator@footnote{A @dfn{binary operator}, such as @samp{*} for multiplication, is one that takes two operands. The distinction -is required, because @command{awk} also has unary (one-operand) +is required because @command{awk} also has unary (one-operand) and ternary (three-operand) operators.} in the field-number expression. This example, then, prints the type of relationship (the fourth field) for every line of the file @@ -6589,7 +6567,6 @@ evaluating @code{NF} and using its value as a field number. 
@node Changing Fields @section Changing the Contents of a Field -@c STARTOFRANGE ficon @cindex fields, changing contents of The contents of a field, as seen by @command{awk}, can be changed within an @command{awk} program; this changes what @command{awk} perceives as the @@ -6745,7 +6722,7 @@ rebuild @code{$0} when @code{NF} is decremented. Finally, there are times when it is convenient to force @command{awk} to rebuild the entire record, using the current -value of the fields and @code{OFS}. To do this, use the +values of the fields and @code{OFS}. To do this, use the seemingly innocuous assignment: @example @@ -6769,7 +6746,7 @@ such as @code{sub()} and @code{gsub()} It is important to remember that @code{$0} is the @emph{full} record, exactly as it was read from the input. This includes any leading or trailing whitespace, and the exact whitespace (or other -characters) that separate the fields. +characters) that separates the fields. It is a common error to try to change the field separators in a record simply by setting @code{FS} and @code{OFS}, and then @@ -6781,7 +6758,6 @@ itself. Instead, you must force the record to be rebuilt, typically with a statement such as @samp{$1 = $1}, as described earlier. @end sidebar -@c ENDOFRANGE ficon @node Field Separators @section Specifying How Fields Are Separated @@ -6797,9 +6773,7 @@ with a statement such as @samp{$1 = $1}, as described earlier. @cindex @code{FS} variable @cindex fields, separating -@c STARTOFRANGE fisepr @cindex field separators -@c STARTOFRANGE fisepg @cindex fields, separating The @dfn{field separator}, which is either a single character or a regular expression, controls the way @command{awk} splits an input record into fields. @@ -6865,7 +6839,7 @@ John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139 @end example @noindent -The same program would extract @samp{@bullet{}LXIX}, instead of +The same program would extract @samp{@bullet{}LXIX} instead of @samp{@bullet{}29@bullet{}Oak@bullet{}St.}. 
If you were expecting the program to print the address, you would be surprised. The moral is to choose your data layout and @@ -6899,9 +6873,7 @@ rules. @node Regexp Field Splitting @subsection Using Regular Expressions to Separate Fields -@c STARTOFRANGE regexpfs @cindex regular expressions, as field separators -@c STARTOFRANGE fsregexp @cindex field separators, regular expressions as The previous @value{SUBSECTION} discussed the use of single characters or simple strings as the @@ -7005,8 +6977,6 @@ $ @kbd{echo 'xxAA xxBxx C' |} @print{} -->xxBxx<-- @print{} -->C<-- @end example -@c ENDOFRANGE regexpfs -@c ENDOFRANGE fsregexp @node Single Character Fields @subsection Making Each Character a Separate Field @@ -7130,7 +7100,7 @@ choosing your field and record separators. @cindex Unix @command{awk}, password files@comma{} field separators and Perhaps the most common use of a single character as the field separator occurs when processing the Unix system password file. On many Unix -systems, each user has a separate entry in the system password file, one +systems, each user has a separate entry in the system password file, with one line per user. The information in these lines is separated by colons. The first field is the user's login name and the second is the user's encrypted or shadow password. (A shadow password is indicated by the @@ -7171,7 +7141,7 @@ When you do this, @code{$1} is the same as @code{$0}. According to the POSIX standard, @command{awk} is supposed to behave as if each record is split into fields at the time it is read. In particular, this means that if you change the value of @code{FS} -after a record is read, the value of the fields (i.e., how they were split) +after a record is read, the values of the fields (i.e., how they were split) should reflect the old value of @code{FS}, not the new one. @cindex dark corner, field separators @@ -7184,10 +7154,7 @@ using the @emph{current} value of @code{FS}! 
@value{DARKCORNER} This behavior can be difficult to diagnose. The following example illustrates the difference -between the two methods. -(The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.'' -Its behavior is also defined by the POSIX standard.} -command prints just the first line of @file{/etc/passwd}.) +between the two methods: @example sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}' @@ -7207,6 +7174,10 @@ prints the full first line of the file, something like: @example root:x:0:0:Root:/: @end example + +(The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.'' +Its behavior is also defined by the POSIX standard.} +command prints just the first line of @file{/etc/passwd}.) @end sidebar @node Field Splitting Summary @@ -7267,8 +7238,6 @@ do it for you (e.g., @samp{FS = "[c]"}). In this case, @code{IGNORECASE} will take effect. @end sidebar -@c ENDOFRANGE fisepr -@c ENDOFRANGE fisepg @node Constant Size @section Reading Fixed-Width Data @@ -7306,7 +7275,7 @@ variable @code{FIELDWIDTHS}. Each number specifies the width of the field, @emph{including} columns between fields. If you want to ignore the columns between fields, you can specify the width as a separate field that is subsequently ignored. -It is a fatal error to supply a field width that is not a positive number. +It is a fatal error to supply a field width that has a negative value. The following data is the output of the Unix @command{w} utility. It is useful to illustrate the use of @code{FIELDWIDTHS}: @@ -7383,7 +7352,7 @@ In order to tell which kind of field splitting is in effect, use @code{PROCINFO["FS"]} (@pxref{Auto-set}). The value is @code{"FS"} if regular field splitting is being used, -or it is @code{"FIELDWIDTHS"} if fixed-width field splitting is being used: +or @code{"FIELDWIDTHS"} if fixed-width field splitting is being used: @example if (PROCINFO["FS"] == "FS") @@ -7419,14 +7388,14 @@ what they are, and not by what they are not. 
The most notorious such case is so-called @dfn{comma-separated values} (CSV) data. Many spreadsheet programs, for example, can export their data into text files, where each record is -terminated with a newline, and fields are separated by commas. If only -commas separated the data, there wouldn't be an issue. The problem comes when +terminated with a newline, and fields are separated by commas. If +commas only separated the data, there wouldn't be an issue. The problem comes when one of the fields contains an @emph{embedded} comma. In such cases, most programs embed the field in double quotes.@footnote{The CSV format lacked a formal standard definition for many years. @uref{http://www.ietf.org/rfc/rfc4180.txt, RFC 4180} standardizes the most common practices.} -So we might have data like this: +So, we might have data like this: @example @c file eg/misc/addresses.csv @@ -7512,8 +7481,8 @@ of cases, and the @command{gawk} developers are satisfied with that. @end quotation As written, the regexp used for @code{FPAT} requires that each field -have a least one character. A straightforward modification -(changing changed the first @samp{+} to @samp{*}) allows fields to be empty: +contain at least one character. A straightforward modification +(changing the first @samp{+} to @samp{*}) allows fields to be empty: @example FPAT = "([^,]*)|(\"[^\"]+\")" @@ -7523,20 +7492,17 @@ Finally, the @code{patsplit()} function makes the same functionality available for splitting regular strings (@pxref{String Functions}). To recap, @command{gawk} provides three independent methods -to split input records into fields. @command{gawk} uses whichever -mechanism was last chosen based on which of the three -variables---@code{FS}, @code{FIELDWIDTHS}, and @code{FPAT}---was +to split input records into fields. +The mechanism used is based on which of the three +variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was last assigned to. 
@node Multiple Line @section Multiple-Line Records @cindex multiple-line records -@c STARTOFRANGE recm @cindex records, multiline -@c STARTOFRANGE imr @cindex input, multiline records -@c STARTOFRANGE frm @cindex files, reading, multiline records @cindex input, files, See input files In some databases, a single line cannot conveniently hold all the @@ -7571,7 +7537,7 @@ at the end of the record and one or more blank lines after the record. In addition, a regular expression always matches the longest possible sequence when there is a choice (@pxref{Leftmost Longest}). -So the next record doesn't start until +So, the next record doesn't start until the first nonblank line that follows---no matter how many blank lines appear in a row, they are considered one record separator. @@ -7586,10 +7552,10 @@ In the second case, this special processing is not done. @cindex field separator, in multiline records @cindex @code{FS}, in multiline records Now that the input is separated into records, the second step is to -separate the fields in the record. One way to do this is to divide each +separate the fields in the records. One way to do this is to divide each of the lines into fields in the normal manner. This happens by default as the result of a special feature. When @code{RS} is set to the empty -string, @emph{and} @code{FS} is set to a single character, +string @emph{and} @code{FS} is set to a single character, the newline character @emph{always} acts as a field separator. This is in addition to whatever field separations result from @code{FS}.@footnote{When @code{FS} is the null string (@code{""}) @@ -7604,7 +7570,7 @@ want the newline character to separate fields, because there is no way to prevent it. However, you can work around this by using the @code{split()} function to break up the record manually (@pxref{String Functions}). 
-If you have a single character field separator, you can work around +If you have a single-character field separator, you can work around the special feature in a different way, by making @code{FS} into a regexp for that single character. For example, if the field separator is a percent character, instead of @@ -7612,10 +7578,10 @@ separator is a percent character, instead of Another way to separate fields is to put each field on a separate line: to do this, just set the -variable @code{FS} to the string @code{"\n"}. (This single -character separator matches a single newline.) +variable @code{FS} to the string @code{"\n"}. +(This single-character separator matches a single newline.) A practical example of a @value{DF} organized this way might be a mailing -list, where each entry is separated by blank lines. Consider a mailing +list, where blank lines separate the entries. Consider a mailing list in a file named @file{addresses}, which looks like this: @example @@ -7703,20 +7669,15 @@ If not in compatibility mode (@pxref{Options}), @command{gawk} sets @code{RT} to the input text that matched the value specified by @code{RS}. But if the input file ended without any text that matches @code{RS}, then @command{gawk} sets @code{RT} to the null string. -@c ENDOFRANGE recm -@c ENDOFRANGE imr -@c ENDOFRANGE frm @node Getline @section Explicit Input with @code{getline} -@c STARTOFRANGE getl @cindex @code{getline} command, explicit input with -@c STARTOFRANGE inex @cindex input, explicit So far we have been getting our input data from @command{awk}'s main input stream---either the standard input (usually your keyboard, sometimes -the output from another program) or from the +the output from another program) or the files specified on the command line. The @command{awk} language has a special built-in command called @code{getline} that can be used to read input under your explicit control. 
@@ -7900,7 +7861,7 @@ free @end example The @code{getline} command used in this way sets only the variables -@code{NR}, @code{FNR}, and @code{RT} (and of course, @var{var}). +@code{NR}, @code{FNR}, and @code{RT} (and, of course, @var{var}). The record is not split into fields, so the values of the fields (including @code{$0}) and the value of @code{NF} do not change. @@ -7915,7 +7876,7 @@ the value of @code{NF} do not change. @cindex left angle bracket (@code{<}), @code{<} operator (I/O) @cindex operators, input/output Use @samp{getline < @var{file}} to read the next record from @var{file}. -Here @var{file} is a string-valued expression that +Here, @var{file} is a string-valued expression that specifies the @value{FN}. @samp{< @var{file}} is called a @dfn{redirection} because it directs input to come from a different place. For example, the following @@ -8093,7 +8054,7 @@ of a construct like @samp{@w{"echo "} "date" | getline}. Most versions, including the current version, treat it at as @samp{@w{("echo "} "date") | getline}. (This is also how BWK @command{awk} behaves.) -Some versions changed and treated it as +Some versions instead treat it as @samp{@w{"echo "} ("date" | getline)}. (This is how @command{mawk} behaves.) In short, @emph{always} use explicit parentheses, and then you won't @@ -8141,7 +8102,7 @@ program to be portable to other @command{awk} implementations. @cindex operators, input/output @cindex differences in @command{awk} and @command{gawk}, input/output operators -Input into @code{getline} from a pipe is a one-way operation. +Reading input into @code{getline} from a pipe is a one-way operation. The command that is started with @samp{@var{command} | getline} only sends data @emph{to} your @command{awk} program. @@ -8151,7 +8112,7 @@ for processing and then read the results back. communications are possible. This is done with the @samp{|&} operator. 
Typically, you write data to the coprocess first and then -read results back, as shown in the following: +read the results back, as shown in the following: @example print "@var{some query}" |& "db_server" @@ -8234,7 +8195,7 @@ also @pxref{Auto-set}.) @item Using @code{FILENAME} with @code{getline} (@samp{getline < FILENAME}) -is likely to be a source for +is likely to be a source of confusion. @command{awk} opens a separate input stream from the current input file. However, by not using a variable, @code{$0} and @code{NF} are still updated. If you're doing this, it's @@ -8242,9 +8203,15 @@ probably by accident, and you should reconsider what it is you're trying to accomplish. @item -@DBREF{Getline Summary} presents a table summarizing the +@ifdocbook +The next section +@end ifdocbook +@ifnotdocbook +@ref{Getline Summary}, +@end ifnotdocbook +presents a table summarizing the @code{getline} variants and which variables they can affect. -It is worth noting that those variants which do not use redirection +It is worth noting that those variants that do not use redirection can cause @code{FILENAME} to be updated if they cause @command{awk} to start reading a new input file. @@ -8253,7 +8220,7 @@ can cause @code{FILENAME} to be updated if they cause If the variable being assigned is an expression with side effects, different versions of @command{awk} behave differently upon encountering end-of-file. Some versions don't evaluate the expression; many versions -(including @command{gawk}) do. Here is an example, due to Duncan Moore: +(including @command{gawk}) do. Here is an example, courtesy of Duncan Moore: @ignore Date: Sun, 01 Apr 2012 11:49:33 +0100 @@ -8270,7 +8237,7 @@ BEGIN @{ @noindent Here, the side effect is the @samp{++c}. Is @code{c} incremented if -end of file is encountered, before the element in @code{a} is assigned? +end-of-file is encountered before the element in @code{a} is assigned? 
@command{gawk} treats @code{getline} like a function call, and evaluates the expression @samp{a[++c]} before attempting to read from @file{f}. @@ -8302,9 +8269,6 @@ Note: for each variant, @command{gawk} sets the @code{RT} predefined variable. @item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} and @code{RT} @tab @command{gawk} @end multitable @end float -@c ENDOFRANGE getl -@c ENDOFRANGE inex -@c ENDOFRANGE infir @node Read Timeout @section Reading Input with a Timeout @@ -8315,8 +8279,8 @@ This @value{SECTION} describes a feature that is specific to @command{gawk}. You may specify a timeout in milliseconds for reading input from the keyboard, a pipe, or two-way communication, including TCP/IP sockets. This can be done -on a per input, command, or connection basis, by setting a special element -in the @code{PROCINFO} array (@pxref{Auto-set}): +on a per-input, per-command, or per-connection basis, by setting a special +element in the @code{PROCINFO} array (@pxref{Auto-set}): @example PROCINFO["input_name", "READ_TIMEOUT"] = @var{timeout in milliseconds} @@ -8347,7 +8311,7 @@ while ((getline < "/dev/stdin") > 0) @end example @command{gawk} terminates the read operation if input does not -arrive after waiting for the timeout period, returns failure +arrive after waiting for the timeout period, returns failure, and sets @code{ERRNO} to an appropriate string value. A negative or zero value for the timeout is the same as specifying no timeout at all. @@ -8397,7 +8361,7 @@ If the @code{PROCINFO} element is not present and the @command{gawk} uses its value to initialize the timeout value. The exclusive use of the environment variable to specify timeout has the disadvantage of not being able to control it -on a per command or connection basis. +on a per-command or per-connection basis. 
@command{gawk} considers a timeout event to be an error even though the attempt to read from the underlying device may @@ -8463,7 +8427,7 @@ The possibilities are as follows: @item After splitting the input into records, @command{awk} further splits -the record into individual fields, named @code{$1}, @code{$2}, and so +the records into individual fields, named @code{$1}, @code{$2}, and so on. @code{$0} is the whole record, and @code{NF} indicates how many fields there are. The default way to split fields is between whitespace characters. @@ -8479,12 +8443,12 @@ thing. Decrementing @code{NF} throws away fields and rebuilds the record. @item Field splitting is more complicated than record splitting: -@multitable @columnfractions .40 .45 .15 +@multitable @columnfractions .40 .40 .20 @headitem Field separator value @tab Fields are split @dots{} @tab @command{awk} / @command{gawk} @item @code{FS == " "} @tab On runs of whitespace @tab @command{awk} @item @code{FS == @var{any single character}} @tab On that character @tab @command{awk} @item @code{FS == @var{regexp}} @tab On text matching the regexp @tab @command{awk} -@item @code{FS == ""} @tab Each individual character is a separate field @tab @command{gawk} +@item @code{FS == ""} @tab Such that each individual character is a separate field @tab @command{gawk} @item @code{FIELDWIDTHS == @var{list of columns}} @tab Based on character position @tab @command{gawk} @item @code{FPAT == @var{regexp}} @tab On the text surrounding text matching the regexp @tab @command{gawk} @end multitable @@ -8501,11 +8465,11 @@ This can also be done using command-line variable assignment. Use @code{PROCINFO["FS"]} to see how fields are being split. @item -Use @code{getline} in its various forms to read additional records, +Use @code{getline} in its various forms to read additional records from the default input stream, from a file, or from a pipe or coprocess. 
@item -Use @code{PROCINFO[@var{file}, "READ_TIMEOUT"]} to cause reads to timeout +Use @code{PROCINFO[@var{file}, "READ_TIMEOUT"]} to cause reads to time out for @var{file}. @item @@ -8539,7 +8503,6 @@ That can be fixed by making one simple change. What is it? @node Printing @chapter Printing Output -@c STARTOFRANGE prnt @cindex printing @cindex output, printing, See printing One of the most common programming actions is to @dfn{print}, or output, @@ -8555,7 +8518,6 @@ columns, whether to use exponential notation or not, and so on. For printing with specifications, you need the @code{printf} statement (@pxref{Printf}). -@c STARTOFRANGE prnts @cindex @code{print} statement @cindex @code{printf} statement Besides basic and formatted printing, this @value{CHAPTER} @@ -8576,6 +8538,7 @@ and discusses the @code{close()} built-in function. @command{gawk} allows access to inherited file descriptors. * Close Files And Pipes:: Closing Input and Output Files and Pipes. +* Nonfatal:: Enabling Nonfatal Output. * Output Summary:: Output summary. * Output Exercises:: Exercises. @end menu @@ -8616,7 +8579,7 @@ space is printed between any two items. Note that the @code{print} statement is a statement and not an expression---you can't use it in the pattern part of a -@var{pattern}-@var{action} statement, for example. +pattern--action statement, for example. @node Print Examples @section @code{print} Statement Examples @@ -8735,7 +8698,6 @@ You can continue either a @code{print} or @code{printf} statement simply by putting a newline after any comma (@pxref{Statements/Lines}). @end quotation -@c ENDOFRANGE prnts @node Output Separators @section Output Separators @@ -8808,7 +8770,7 @@ runs together on a single line. 
@cindex numeric, output format @cindex formats@comma{} numeric output When printing numeric values with the @code{print} statement, -@command{awk} internally converts the number to a string of characters +@command{awk} internally converts each number to a string of characters and prints that string. @command{awk} uses the @code{sprintf()} function to do this conversion (@pxref{String Functions}). @@ -8848,7 +8810,6 @@ if @code{OFMT} contains anything but a floating-point conversion specification. @node Printf @section Using @code{printf} Statements for Fancier Printing -@c STARTOFRANGE printfs @cindex @code{printf} statement @cindex output, formatted @cindex formatting output @@ -8880,7 +8841,7 @@ printf @var{format}, @var{item1}, @var{item2}, @dots{} @noindent As for @code{print}, the entire list of arguments may optionally be enclosed in parentheses. Here too, the parentheses are necessary if any -of the item expressions use the @samp{>} relational operator; otherwise, +of the item expressions uses the @samp{>} relational operator; otherwise, it can be confused with an output redirection (@pxref{Redirection}). @cindex format specifiers @@ -8911,7 +8872,7 @@ $ @kbd{awk 'BEGIN @{} @end example @noindent -Here, neither the @samp{+} nor the @samp{OUCH!} appear in +Here, neither the @samp{+} nor the @samp{OUCH!} appears in the output message. @node Control Letters @@ -8958,8 +8919,8 @@ The two control letters are equivalent. (The @samp{%i} specification is for compatibility with ISO C.) @item @code{%e}, @code{%E} -Print a number in scientific (exponential) notation; -for example: +Print a number in scientific (exponential) notation. +For example: @example printf "%4.3e\n", 1950 @@ -8996,7 +8957,7 @@ The special ``not a number'' value formats as @samp{-nan} or @samp{nan} (@pxref{Math Definitions}). 
@item @code{%F} -Like @samp{%f} but the infinity and ``not a number'' values are spelled +Like @samp{%f}, but the infinity and ``not a number'' values are spelled using uppercase letters. The @samp{%F} format is a POSIX extension to ISO C; not all systems @@ -9046,7 +9007,6 @@ values or do something else entirely. @node Format Modifiers @subsection Modifiers for @code{printf} Formats -@c STARTOFRANGE pfm @cindex @code{printf} statement, modifiers @cindex modifiers@comma{} in format specifiers A format specification can also include @dfn{modifiers} that can control @@ -9057,12 +9017,12 @@ represent spaces in the output. Here are the possible modifiers, in the order in which they may appear: -@table @code +@table @asis @cindex differences in @command{awk} and @command{gawk}, @code{print}/@code{printf} statements @cindex @code{printf} statement, positional specifiers @c the code{} does NOT start a secondary @cindex positional specifiers, @code{printf} statement -@item @var{N}$ +@item @code{@var{N}$} An integer constant followed by a @samp{$} is a @dfn{positional specifier}. Normally, format specifications are applied to arguments in the order given in the format string. With a positional specifier, the format @@ -9085,7 +9045,7 @@ messages at runtime. which describes how and why to use positional specifiers. For now, we ignore them. -@item - (Minus) +@item @code{-} (Minus) The minus sign, used before the width modifier (see later on in this list), says to left-justify @@ -9103,13 +9063,13 @@ prints @samp{foo@bullet{}}. For numeric conversions, prefix positive values with a space and negative values with a minus sign. -@item + +@item @code{+} The plus sign, used before the width modifier (see later on in this list), says to always supply a sign for numeric conversions, even if the data to format is positive. The @samp{+} overrides the space modifier. -@item # +@item @code{#} Use an ``alternative form'' for certain control letters. For @samp{%o}, supply a leading zero. 
For @samp{%x} and @samp{%X}, supply a leading @samp{0x} or @samp{0X} for @@ -9118,14 +9078,14 @@ For @samp{%e}, @samp{%E}, @samp{%f}, and @samp{%F}, the result always contains a decimal point. For @samp{%g} and @samp{%G}, trailing zeros are not removed from the result. -@item 0 +@item @code{0} A leading @samp{0} (zero) acts as a flag indicating that output should be padded with zeros instead of spaces. This applies only to the numeric output formats. This flag only has an effect when the field width is wider than the value to print. -@item ' +@item @code{'} A single quote or apostrophe character is a POSIX extension to ISO C. It indicates that the integer part of a floating-point value, or the entire part of an integer decimal value, should have a thousands-separator @@ -9178,7 +9138,7 @@ prints @samp{foobar}. Preceding the @var{width} with a minus sign causes the output to be padded with spaces on the right, instead of on the left. -@item .@var{prec} +@item @code{.@var{prec}} A period followed by an integer constant specifies the precision to use when printing. The meaning of the precision varies by control letter: @@ -9241,7 +9201,7 @@ printf "%" w "." p "s\n", s @end example @noindent -This is not particularly easy to read but it does work. +This is not particularly easy to read, but it does work. @c @cindex lint checks @cindex troubleshooting, fatal errors, @code{printf} format strings @@ -9252,7 +9212,6 @@ format strings. These are not valid in @command{awk}. Most @command{awk} implementations silently ignore them. If @option{--lint} is provided on the command line (@pxref{Options}), @command{gawk} warns about their use. If @option{--posix} is supplied, their use is a fatal error. -@c ENDOFRANGE pfm @node Printf Examples @subsection Examples Using @code{printf} @@ -9288,7 +9247,7 @@ $ @kbd{awk '@{ printf "%-10s %s\n", $1, $2 @}' mail-list} @end example In this case, the phone numbers had to be printed as strings because -the numbers are separated by a dash. 
Printing the phone numbers as +the numbers are separated by dashes. Printing the phone numbers as numbers would have produced just the first three digits: @samp{555}. This would have been pretty confusing. @@ -9333,14 +9292,11 @@ awk 'BEGIN @{ format = "%-10s %s\n" @{ printf format, $1, $2 @}' mail-list @end example -@c ENDOFRANGE printfs @node Redirection @section Redirecting Output of @code{print} and @code{printf} -@c STARTOFRANGE outre @cindex output redirection -@c STARTOFRANGE reout @cindex redirection of output @cindex @option{--sandbox} option, output redirection with @code{print}, @code{printf} So far, the output from @code{print} and @code{printf} has gone @@ -9351,7 +9307,7 @@ This is called @dfn{redirection}. @quotation NOTE When @option{--sandbox} is specified (@pxref{Options}), -redirecting output to files, pipes and coprocesses is disabled. +redirecting output to files, pipes, and coprocesses is disabled. @end quotation A redirection appears after the @code{print} or @code{printf} statement. @@ -9404,7 +9360,7 @@ Each output file contains one name or number per line. @cindex @code{>} (right angle bracket), @code{>>} operator (I/O) @cindex right angle bracket (@code{>}), @code{>>} operator (I/O) @item print @var{items} >> @var{output-file} -This redirection prints the items into the pre-existing output file +This redirection prints the items into the preexisting output file named @var{output-file}. The difference between this and the single-@samp{>} redirection is that the old contents (if any) of @var{output-file} are not erased. Instead, the @command{awk} output is @@ -9443,7 +9399,7 @@ The unsorted list is written with an ordinary redirection, while the sorted list is written by piping through the @command{sort} utility. The next example uses redirection to mail a message to the mailing -list @samp{bug-system}. This might be useful when trouble is encountered +list @code{bug-system}. 
This might be useful when trouble is encountered in an @command{awk} script run periodically for system maintenance: @example @@ -9474,15 +9430,23 @@ This redirection prints the items to the input of @var{command}. The difference between this and the single-@samp{|} redirection is that the output from @var{command} can be read with @code{getline}. -Thus @var{command} is a @dfn{coprocess}, which works together with, -but subsidiary to, the @command{awk} program. +Thus, @var{command} is a @dfn{coprocess}, which works together with +but is subsidiary to the @command{awk} program. This feature is a @command{gawk} extension, and is not available in POSIX @command{awk}. -@DBXREF{Getline/Coprocess} +@ifnotdocbook +@xref{Getline/Coprocess}, for a brief discussion. -@DBXREF{Two-way I/O} +@xref{Two-way I/O}, for a more complete discussion. +@end ifnotdocbook +@ifdocbook +@DBXREF{Getline/Coprocess} +for a brief discussion and +@DBREF{Two-way I/O} +for a more complete discussion. +@end ifdocbook @end table Redirecting output using @samp{>}, @samp{>>}, @samp{|}, or @samp{|&} @@ -9507,7 +9471,7 @@ This is indeed how redirections must be used from the shell. But in @command{awk}, it isn't necessary. In this kind of case, a program should use @samp{>} for all the @code{print} statements, because the output file is only opened once. (It happens that if you mix @samp{>} and @samp{>>} -that output is produced in the expected order. However, mixing the operators +output is produced in the expected order. However, mixing the operators for the same file is definitely poor style, and is confusing to readers of your program.) @@ -9557,11 +9521,9 @@ It then sends the list to the shell for execution. @DBXREF{Shell Quoting} for a function that can help in generating command lines to be fed to the shell. 
@end sidebar -@c ENDOFRANGE outre -@c ENDOFRANGE reout @node Special FD -@section Special Files for Standard Pre-Opened Data Streams +@section Special Files for Standard Preopened Data Streams @cindex standard input @cindex input, standard @cindex standard output @@ -9574,7 +9536,7 @@ command lines to be fed to the shell. Running programs conventionally have three input and output streams already available to them for reading and writing. These are known as the @dfn{standard input}, @dfn{standard output}, and @dfn{standard -error output}. These open streams (and any other open file or pipe) +error output}. These open streams (and any other open files or pipes) are often referred to by the technical term @dfn{file descriptors}. These streams are, by default, connected to your keyboard and screen, but @@ -9612,7 +9574,7 @@ that is connected to your keyboard and screen. It represents the ``terminal,''@footnote{The ``tty'' in @file{/dev/tty} stands for ``Teletype,'' a serial terminal.} which on modern systems is a keyboard and screen, not a serial console.) -This generally has the same effect but not always: although the +This generally has the same effect, but not always: although the standard error stream is usually the screen, it can be redirected; when that happens, writing to the screen is not correct. In fact, if @command{awk} is run from a background job, it may not have a @@ -9657,7 +9619,7 @@ print "Serious error detected!" > "/dev/stderr" @cindex troubleshooting, quotes with file names Note the use of quotes around the @value{FN}. -Like any other redirection, the value must be a string. +Like with any other redirection, the value must be a string. It is a common error to omit the quotes, which leads to confusing results. @@ -9668,7 +9630,6 @@ invoked with the @option{--traditional} option (@pxref{Options}). 
@node Special Files @section Special @value{FFN}s in @command{gawk} -@c STARTOFRANGE gfn @cindex @command{gawk}, file names in Besides access to standard input, standard output, and standard error, @@ -9684,7 +9645,7 @@ TCP/IP networking. @end menu @node Other Inherited Files -@subsection Accessing Other Open Files With @command{gawk} +@subsection Accessing Other Open Files with @command{gawk} Besides the @code{/dev/stdin}, @code{/dev/stdout}, and @code{/dev/stderr} special @value{FN}s mentioned earlier, @command{gawk} provides syntax @@ -9741,7 +9702,7 @@ special @value{FN}s that @command{gawk} provides: @cindex compatibility mode (@command{gawk}), file names @cindex file names, in compatibility mode @item -Recognition of the @value{FN}s for the three standard pre-opened +Recognition of the @value{FN}s for the three standard preopened files is disabled only in POSIX mode. @item @@ -9754,23 +9715,18 @@ compatibility mode (either @option{--traditional} or @option{--posix}; interprets these special @value{FN}s. For example, using @samp{/dev/fd/4} for output actually writes on file descriptor 4, and not on a new -file descriptor that is @code{dup()}'ed from file descriptor 4. Most of +file descriptor that is @code{dup()}ed from file descriptor 4. Most of the time this does not matter; however, it is important to @emph{not} close any of the files related to file descriptors 0, 1, and 2. Doing so results in unpredictable behavior. @end itemize -@c ENDOFRANGE gfn @node Close Files And Pipes @section Closing Input and Output Redirections @cindex files, output, See output files -@c STARTOFRANGE ifc @cindex input files, closing -@c STARTOFRANGE ofc @cindex output, files@comma{} closing -@c STARTOFRANGE pc @cindex pipe, closing -@c STARTOFRANGE cc @cindex coprocesses, closing @cindex @code{getline} command, coprocesses@comma{} using from @@ -9976,18 +9932,79 @@ This value is zero if the close succeeds, or @minus{}1 if it fails. 
The POSIX standard is very vague; it says that @code{close()} -returns zero on success and nonzero otherwise. In general, +returns zero on success and a nonzero value otherwise. In general, different implementations vary in what they report when closing -pipes; thus the return value cannot be used portably. +pipes; thus, the return value cannot be used portably. @value{DARKCORNER} In POSIX mode (@pxref{Options}), @command{gawk} just returns zero when closing a pipe. @end sidebar -@c ENDOFRANGE ifc -@c ENDOFRANGE ofc -@c ENDOFRANGE pc -@c ENDOFRANGE cc + +@node Nonfatal +@section Enabling Nonfatal Output + +This @value{SECTION} describes a @command{gawk}-specific feature. + +In standard @command{awk}, output with @code{print} or @code{printf} +to a nonexistent file, or some other I/O error (such as filling up the +disk) is a fatal error. + +@example +$ @kbd{gawk 'BEGIN @{ print "hi" > "/no/such/file" @}'} +@error{} gawk: cmd. line:1: fatal: can't redirect to `/no/such/file' (No such file or directory) +@end example + +@command{gawk} makes it possible to detect that an error has +occurred, allowing you to possibly recover from the error, or +at least print an error message of your choosing before exiting. +You can do this in one of two ways: + +@itemize @bullet +@item +For all output files, by assigning any value to @code{PROCINFO["NONFATAL"]}. + +@item +On a per-file basis, by assigning any value to +@code{PROCINFO[@var{filename}, "NONFATAL"]}. +Here, @var{filename} is the name of the file to which +you wish output to be nonfatal. +@end itemize + +Once you have enabled nonfatal output, you must check @code{ERRNO} +after every relevant @code{print} or @code{printf} statement to +see if something went wrong. It is also a good idea to initialize +@code{ERRNO} to zero before attempting the output. 
For example: + +@example +$ @kbd{gawk '} +> @kbd{BEGIN @{} +> @kbd{ PROCINFO["NONFATAL"] = 1} +> @kbd{ ERRNO = 0} +> @kbd{ print "hi" > "/no/such/file"} +> @kbd{ if (ERRNO) @{} +> @kbd{ print("Output failed:", ERRNO) > "/dev/stderr"} +> @kbd{ exit 1} +> @kbd{ @}} +> @kbd{@}'} +@error{} Output failed: No such file or directory +@end example + +Here, @command{gawk} did not produce a fatal error; instead +it let the @command{awk} program code detect the problem and handle it. + +This mechanism works also for standard output and standard error. +For standard output, you may use @code{PROCINFO["-", "NONFATAL"]} +or @code{PROCINFO["/dev/stdout", "NONFATAL"]}. For standard error, use +@code{PROCINFO["/dev/stderr", "NONFATAL"]}. + +When attempting to open a TCP/IP socket (@pxref{TCP/IP Networking}), +@command{gawk} tries multiple times. The @env{GAWK_SOCK_RETRIES} +environment variable (@pxref{Other Environment Variables}) allows you to +override @command{gawk}'s builtin default number of attempts. However, +once nonfatal I/O is enabled for a given socket, @command{gawk} only +retries once, relying on @command{awk}-level code to notice that there +was a problem. @node Output Summary @section Summary @@ -10001,8 +10018,8 @@ for numeric values for the @code{print} statement. @item The @code{printf} statement provides finer-grained control over output, -with format control letters for different data types and various flags -that modify the behavior of the format control letters. +with format-control letters for different data types and various flags +that modify the behavior of the format-control letters. @item Output from both @code{print} and @code{printf} may be redirected to @@ -10017,6 +10034,12 @@ Use @code{close()} to close open file, pipe, and coprocess redirections. For coprocesses, it is possible to close only one direction of the communications. +@item +Normally errors with @code{print} or @code{printf} are fatal. 
+@command{gawk} lets you make output errors be nonfatal either for +all files or on a per-file basis. You must then check for errors +after every relevant output statement. + @end itemize @c EXCLUDE START @@ -10051,11 +10074,9 @@ BEGIN @{ print "Serious error detected!" > /dev/stderr @} @end enumerate @c EXCLUDE END -@c ENDOFRANGE prnt @node Expressions @chapter Expressions -@c STARTOFRANGE exps @cindex expressions Expressions are the basic building blocks of @command{awk} patterns @@ -10066,7 +10087,7 @@ can assign a new value to a variable or a field by using an assignment operator. An expression can serve as a pattern or action statement on its own. Most other kinds of statements contain one or more expressions that specify the data on which to -operate. As in other languages, expressions in @command{awk} include +operate. As in other languages, expressions in @command{awk} can include variables, array references, constants, and function calls, as well as combinations of these with various operators. @@ -10085,7 +10106,7 @@ combinations of these with various operators. Expressions are built up from values and the operations performed upon them. This @value{SECTION} describes the elementary objects -which provide the values used in expressions. +that provide the values used in expressions. @menu * Constants:: String, numeric and regexp constants. @@ -10098,7 +10119,6 @@ which provide the values used in expressions. @node Constants @subsection Constant Expressions -@c STARTOFRANGE cnst @cindex constants, types of The simplest type of expression is the @dfn{constant}, which always has @@ -10136,7 +10156,7 @@ have the same value: @end example @cindex string constants -A string constant consists of a sequence of characters enclosed in +A @dfn{string constant} consists of a sequence of characters enclosed in double quotation marks. For example: @example @@ -10148,7 +10168,7 @@ double quotation marks. 
For example: @cindex strings, length limitations represents the string whose contents are @samp{parrot}. Strings in @command{gawk} can be of any length, and they can contain any of the possible -eight-bit ASCII characters including ASCII @sc{nul} (character code zero). +eight-bit ASCII characters, including ASCII @sc{nul} (character code zero). Other @command{awk} implementations may have difficulty with some character codes. @@ -10163,15 +10183,15 @@ In @command{awk}, all numbers are in decimal (i.e., base 10). Many other programming languages allow you to specify numbers in other bases, often octal (base 8) and hexadecimal (base 16). In octal, the numbers go 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, and so on. -Just as @samp{11}, in decimal, is 1 times 10 plus 1, so -@samp{11}, in octal, is 1 times 8, plus 1. This equals 9 in decimal. +Just as @samp{11} in decimal is 1 times 10 plus 1, so +@samp{11} in octal is 1 times 8 plus 1. This equals 9 in decimal. In hexadecimal, there are 16 digits. Because the everyday decimal number system only has ten digits (@samp{0}--@samp{9}), the letters @samp{a} through @samp{f} are used to represent the rest. (Case in the letters is usually irrelevant; hexadecimal @samp{a} and @samp{A} have the same value.) -Thus, @samp{11}, in -hexadecimal, is 1 times 16 plus 1, which equals 17 in decimal. +Thus, @samp{11} in +hexadecimal is 1 times 16 plus 1, which equals 17 in decimal. Just by looking at plain @samp{11}, you can't tell what base it's in. So, in C, C++, and other languages derived from C, @@ -10182,13 +10202,13 @@ and hexadecimal numbers start with a leading @samp{0x} or @samp{0X}: @table @code @item 11 -Decimal value 11. +Decimal value 11 @item 011 -Octal 11, decimal value 9. +Octal 11, decimal value 9 @item 0x11 -Hexadecimal 11, decimal value 17. 
+Hexadecimal 11, decimal value 17 @end table This example shows the difference: @@ -10216,11 +10236,11 @@ you can use the @code{strtonum()} function (@pxref{String Functions}) to convert the data into a number. Most of the time, you will want to use octal or hexadecimal constants -when working with the built-in bit manipulation functions; +when working with the built-in bit-manipulation functions; see @DBREF{Bitwise Functions} for more information. -Unlike some early C implementations, @samp{8} and @samp{9} are not +Unlike in some early C implementations, @samp{8} and @samp{9} are not valid in octal constants. For example, @command{gawk} treats @samp{018} as decimal 18: @@ -10255,19 +10275,17 @@ $ @kbd{gawk 'BEGIN @{ printf "0x11 is <%s>\n", 0x11 @}'} @node Regexp Constants @subsubsection Regular Expression Constants -@c STARTOFRANGE rec @cindex regexp constants @cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator @cindex @code{!} (exclamation point), @code{!~} operator @cindex exclamation point (@code{!}), @code{!~} operator -A regexp constant is a regular expression description enclosed in +A @dfn{regexp constant} is a regular expression description enclosed in slashes, such as @code{@w{/^beginning and end$/}}. Most regexps used in @command{awk} programs are constant, but the @samp{~} and @samp{!~} matching operators can also match computed or dynamic regexps (which are typically just ordinary strings or variables that contain a regexp, -but could be a more complex expression). -@c ENDOFRANGE cnst +but could be more complex expressions). @node Using Constant Regexps @subsection Using Regular Expression Constants @@ -10347,7 +10365,7 @@ the third argument of @code{split()} to be a regexp constant, but some older implementations do not. 
@value{DARKCORNER} Because some built-in functions accept regexp constants as arguments, -it can be confusing when attempting to use regexp constants as arguments +confusion can arise when attempting to use regexp constants as arguments to user-defined functions (@pxref{User-defined}). For example: @example @@ -10373,19 +10391,18 @@ function mysub(pat, repl, str, global) In this example, the programmer wants to pass a regexp constant to the user-defined function @code{mysub()}, which in turn passes it on to either @code{sub()} or @code{gsub()}. However, what really happens is that -the @code{pat} parameter is either one or zero, depending upon whether +the @code{pat} parameter is assigned a value of either one or zero, depending upon whether or not @code{$0} matches @code{/hi/}. @command{gawk} issues a warning when it sees a regexp constant used as a parameter to a user-defined function, because passing a truth value in this way is probably not what was intended. -@c ENDOFRANGE rec @node Variables @subsection Variables @cindex variables, user-defined @cindex user-defined, variables -Variables are ways of storing values at one point in your program for +@dfn{Variables} are ways of storing values at one point in your program for use later in another part of your program. They can be manipulated entirely within the program text, and they can also be assigned values on the @command{awk} command line. @@ -10413,17 +10430,17 @@ are distinct variables. A variable name is a valid expression by itself; it represents the variable's current value. Variables are given new values with @dfn{assignment operators}, @dfn{increment operators}, and -@dfn{decrement operators}. -@xref{Assignment Ops}. +@dfn{decrement operators} +(@pxref{Assignment Ops}). In addition, the @code{sub()} and @code{gsub()} functions can change a variable's value, and the @code{match()}, @code{split()}, and @code{patsplit()} functions can change the contents of their -array parameters. 
@xref{String Functions}. +array parameters (@pxref{String Functions}). @cindex variables, built-in @cindex variables, initializing A few variables have special built-in meanings, such as @code{FS} (the -field separator), and @code{NF} (the number of fields in the current input +field separator) and @code{NF} (the number of fields in the current input record). @DBXREF{Built-in Variables} for a list of the predefined variables. These predefined variables can be used and assigned just like all other variables, but their values are also used or changed automatically by @@ -10651,7 +10668,7 @@ point, so the default behavior was restored to use a period as the decimal point character. You can use the @option{--use-lc-numeric} option (@pxref{Options}) to force @command{gawk} to use the locale's decimal point character. (@command{gawk} also uses the locale's decimal -point character when in POSIX mode, either via @option{--posix}, or the +point character when in POSIX mode, either via @option{--posix} or the @env{POSIXLY_CORRECT} environment variable, as shown previously.) @ref{table-locale-affects} describes the cases in which the locale's decimal @@ -10669,7 +10686,7 @@ features have not been described yet. @end multitable @end float -Finally, modern day formal standards and IEEE standard floating-point +Finally, modern-day formal standards and the IEEE standard floating-point representation can have an unusual but important effect on the way @command{gawk} converts some special string values to numbers. The details are presented in @ref{POSIX Floating Point Problems}. @@ -10677,7 +10694,7 @@ are presented in @ref{POSIX Floating Point Problems}. @node All Operators @section Operators: Doing Something with Values -This @value{SECTION} introduces the @dfn{operators} which make use +This @value{SECTION} introduces the @dfn{operators} that make use of the values provided by constants and variables. 
@menu @@ -10855,7 +10872,7 @@ print "something meaningful" > file name @noindent This produces a syntax error with some versions of Unix @command{awk}.@footnote{It happens that BWK -@command{awk}, @command{gawk} and @command{mawk} all ``get it right,'' +@command{awk}, @command{gawk}, and @command{mawk} all ``get it right,'' but you should not rely on this.} It is necessary to use the following: @@ -10944,11 +10961,8 @@ you're never quite sure what you'll get. @node Assignment Ops @subsection Assignment Expressions -@c STARTOFRANGE asop @cindex assignment operators -@c STARTOFRANGE opas @cindex operators, assignment -@c STARTOFRANGE exas @cindex expressions, assignment @cindex @code{=} (equals sign), @code{=} operator @cindex equals sign (@code{=}), @code{=} operator @@ -11108,7 +11122,7 @@ and @ifdocbook @DBREF{Numeric Functions} @end ifdocbook -for more information). +for more information.) This example illustrates an important fact about assignment operators: the lefthand expression is only evaluated @emph{once}. @@ -11144,17 +11158,17 @@ to a number. @caption{Arithmetic assignment operators} @multitable @columnfractions .30 .70 @headitem Operator @tab Effect -@item @var{lvalue} @code{+=} @var{increment} @tab Add @var{increment} to the value of @var{lvalue} -@item @var{lvalue} @code{-=} @var{decrement} @tab Subtract @var{decrement} from the value of @var{lvalue} -@item @var{lvalue} @code{*=} @var{coefficient} @tab Multiply the value of @var{lvalue} by @var{coefficient} -@item @var{lvalue} @code{/=} @var{divisor} @tab Divide the value of @var{lvalue} by @var{divisor} -@item @var{lvalue} @code{%=} @var{modulus} @tab Set @var{lvalue} to its remainder by @var{modulus} +@item @var{lvalue} @code{+=} @var{increment} @tab Add @var{increment} to the value of @var{lvalue}. +@item @var{lvalue} @code{-=} @var{decrement} @tab Subtract @var{decrement} from the value of @var{lvalue}. 
+@item @var{lvalue} @code{*=} @var{coefficient} @tab Multiply the value of @var{lvalue} by @var{coefficient}. +@item @var{lvalue} @code{/=} @var{divisor} @tab Divide the value of @var{lvalue} by @var{divisor}. +@item @var{lvalue} @code{%=} @var{modulus} @tab Set @var{lvalue} to its remainder by @var{modulus}. @cindex common extensions, @code{**=} operator @cindex extensions, common@comma{} @code{**=} operator @cindex @command{awk} language, POSIX version @cindex POSIX @command{awk} -@item @var{lvalue} @code{^=} @var{power} @tab -@item @var{lvalue} @code{**=} @var{power} @tab Raise @var{lvalue} to the power @var{power} @value{COMMONEXT} +@item @var{lvalue} @code{^=} @var{power} @tab Raise @var{lvalue} to the power @var{power}. +@item @var{lvalue} @code{**=} @var{power} @tab Raise @var{lvalue} to the power @var{power}. @value{COMMONEXT} @end multitable @end float @@ -11202,16 +11216,11 @@ awk '/[=]=/' /dev/null @command{gawk} does not have this problem; BWK @command{awk} and @command{mawk} also do not. @end sidebar -@c ENDOFRANGE exas -@c ENDOFRANGE opas -@c ENDOFRANGE asop @node Increment Ops @subsection Increment and Decrement Operators -@c STARTOFRANGE inop @cindex increment operators -@c STARTOFRANGE opde @cindex operators, decrement/increment @dfn{Increment} and @dfn{decrement operators} increase or decrease the value of a variable by one. An assignment operator can do the same thing, so @@ -11235,6 +11244,7 @@ has the value four, but it changes the value of @code{foo} to five. In other words, the operator returns the old value of the variable, but with the side effect of incrementing it. +@c FIXME: Use @sup here for superscript The post-increment @samp{foo++} is nearly the same as writing @samp{(foo += 1) - 1}. It is not perfectly equivalent because all numbers in @command{awk} are floating point---in floating point, @samp{foo + 1 - 1} does @@ -11259,7 +11269,6 @@ just like variables. 
(Use @samp{$(i++)} when you want to do a field reference and a variable increment at the same time. The parentheses are necessary because of the precedence of the field reference operator @samp{$}.) -@c STARTOFRANGE deop @cindex decrement operators The decrement operator @samp{--} works just like @samp{++}, except that it subtracts one instead of adding it. As with @samp{++}, it can be used before @@ -11299,8 +11308,8 @@ like @samp{@var{lvalue}++}, but instead of adding, it subtracts.) @cindex evaluation order @cindex Marx, Groucho @quotation -@i{Doctor, doctor! It hurts when I do this!@* -So don't do that!} +@i{Doctor, it hurts when I do this!@* +Then don't do that!} @author Groucho Marx @end quotation @@ -11324,7 +11333,7 @@ print b @cindex side effects In other words, when do the various side effects prescribed by the postfix operators (@samp{b++}) take effect? -When side effects happen is @dfn{implementation defined}. +When side effects happen is @dfn{implementation-defined}. In other words, it is up to the particular version of @command{awk}. The result for the first example may be 12 or 13, and for the second, it may be 22 or 23. @@ -11335,15 +11344,12 @@ You should avoid such things in your own programs. @c You'll sleep better at night and be able to look at yourself @c in the mirror in the morning. @end sidebar -@c ENDOFRANGE inop -@c ENDOFRANGE opde -@c ENDOFRANGE deop @node Truth Values and Conditions @section Truth Values and Conditions -In certain contexts, expression values also serve as ``truth values''; (i.e., -they determine what should happen next as the program runs). This +In certain contexts, expression values also serve as ``truth values''; i.e., +they determine what should happen next as the program runs. This @value{SECTION} describes how @command{awk} defines ``true'' and ``false'' and how values are compared. @@ -11401,20 +11407,19 @@ the string constant @code{"0"} is actually true, because it is non-null. @i{The Guide is definitive. 
Reality is frequently inaccurate.} @author Douglas Adams, @cite{The Hitchhiker's Guide to the Galaxy} @end quotation +@c 2/2015: Antonio Colombo points out that this is really from +@c The Restaurant at the End of the Universe. But I'm going to +@c leave it alone. -@c STARTOFRANGE comex @cindex comparison expressions -@c STARTOFRANGE excom @cindex expressions, comparison @cindex expressions, matching, See comparison expressions @cindex matching, expressions, See comparison expressions @cindex relational operators, See comparison operators @cindex operators, relational, See operators@comma{} comparison -@c STARTOFRANGE varting @cindex variable typing -@c STARTOFRANGE vartypc @cindex variables, types of, comparison expressions and -Unlike other programming languages, @command{awk} variables do not have a +Unlike in other programming languages, in @command{awk} variables do not have a fixed type. Instead, they can be either a number or a string, depending upon the value that is assigned to them. We look now at how variables are typed, and how @command{awk} @@ -11443,20 +11448,20 @@ Variable typing follows these rules: @itemize @value{BULLET} @item -A numeric constant or the result of a numeric operation has the @var{numeric} +A numeric constant or the result of a numeric operation has the @dfn{numeric} attribute. @item -A string constant or the result of a string operation has the @var{string} +A string constant or the result of a string operation has the @dfn{string} attribute. @item Fields, @code{getline} input, @code{FILENAME}, @code{ARGV} elements, @code{ENVIRON} elements, and the elements of an array created by @code{match()}, @code{split()}, and @code{patsplit()} that are numeric -strings have the @var{strnum} attribute. Otherwise, they have -the @var{string} attribute. Uninitialized variables also have the -@var{strnum} attribute. +strings have the @dfn{strnum} attribute. Otherwise, they have +the @dfn{string} attribute. 
Uninitialized variables also have the +@dfn{strnum} attribute. @item Attributes propagate across assignments but are not changed by @@ -11600,13 +11605,13 @@ constant, then a string comparison is performed. Otherwise, a numeric comparison is performed. This point bears additional emphasis: All user input is made of characters, -and so is first and foremost of @var{string} type; input strings -that look numeric are additionally given the @var{strnum} attribute. +and so is first and foremost of string type; input strings +that look numeric are additionally given the strnum attribute. Thus, the six-character input string @w{@samp{ +3.14}} receives the -@var{strnum} attribute. In contrast, the eight characters +strnum attribute. In contrast, the eight characters @w{@code{" +3.14"}} appearing in program text comprise a string constant. The following examples print @samp{1} when the comparison between -the two different constants is true, @samp{0} otherwise: +the two different constants is true, and @samp{0} otherwise: @c 22.9.2014: Tested with mawk and BWK awk, got same results. @example @@ -11736,7 +11741,7 @@ $ @kbd{echo 1e2 3 | awk '@{ print ($1 < $2) ? "true" : "false" @}'} @noindent the result is @samp{false} because both @code{$1} and @code{$2} are user input. They are numeric strings---therefore both have -the @var{strnum} attribute, dictating a numeric comparison. +the strnum attribute, dictating a numeric comparison. The purpose of the comparison rules and the use of numeric strings is to attempt to produce the behavior that is ``least surprising,'' while still ``doing the right thing.'' @@ -11795,7 +11800,7 @@ characters sort, as defined by the locale (for more discussion, @pxref{Locales}). 
This order is usually very different from the results obtained when doing straight character-by-character comparison.@footnote{Technically, string comparison is supposed -to behave the same way as if the strings are compared with the C +to behave the same way as if the strings were compared with the C @code{strcoll()} function.} Because this behavior differs considerably from existing practice, @@ -11812,19 +11817,13 @@ $ @kbd{gawk --posix 'BEGIN @{ printf("ABC < abc = %s\n",} @print{} ABC < abc = FALSE @end example -@c ENDOFRANGE comex -@c ENDOFRANGE excom -@c ENDOFRANGE vartypc -@c ENDOFRANGE varting @node Boolean Ops @subsection Boolean Expressions @cindex and Boolean-logic operator @cindex or Boolean-logic operator @cindex not Boolean-logic operator -@c STARTOFRANGE exbo @cindex expressions, Boolean -@c STARTOFRANGE boex @cindex Boolean expressions @cindex operators, Boolean, See Boolean expressions @cindex Boolean operators, See Boolean expressions @@ -11908,7 +11907,7 @@ BEGIN @{ if (! ("HOME" in ENVIRON)) @cindex vertical bar (@code{|}), @code{||} operator The @samp{&&} and @samp{||} operators are called @dfn{short-circuit} operators because of the way they work. Evaluation of the full expression -is ``short-circuited'' if the result can be determined part way through +is ``short-circuited'' if the result can be determined partway through its evaluation. @cindex line continuations @@ -11970,8 +11969,6 @@ next record, and start processing the rules over again at the top. The reason it's there is to avoid printing the bracketing @samp{START} and @samp{END} lines. @end quotation -@c ENDOFRANGE exbo -@c ENDOFRANGE boex @node Conditional Exp @subsection Conditional Expressions @@ -11982,8 +11979,8 @@ The reason it's there is to avoid printing the bracketing A @dfn{conditional expression} is a special kind of expression that has three operands. It allows you to use one expression's value to select one of two other expressions. 
-The conditional expression is the same as in the C language, -as shown here: +The conditional expression in @command{awk} is the same as in the C +language, as shown here: @example @var{selector} ? @var{if-true-exp} : @var{if-false-exp} @@ -11992,8 +11989,8 @@ as shown here: @noindent There are three subexpressions. The first, @var{selector}, is always computed first. If it is ``true'' (not zero or not null), then -@var{if-true-exp} is computed next and its value becomes the value of -the whole expression. Otherwise, @var{if-false-exp} is computed next +@var{if-true-exp} is computed next, and its value becomes the value of +the whole expression. Otherwise, @var{if-false-exp} is computed next, and its value becomes the value of the whole expression. For example, the following expression produces the absolute value of @code{x}: @@ -12041,7 +12038,7 @@ ask for it by name at any point in the program. For example, the function @code{sqrt()} computes the square root of a number. @cindex functions, built-in -A fixed set of functions are @dfn{built-in}, which means they are +A fixed set of functions are @dfn{built in}, which means they are available in every @command{awk} program. The @code{sqrt()} function is one of these. @DBXREF{Built-in} for a list of built-in functions and their descriptions. In addition, you can define @@ -12150,9 +12147,7 @@ $ @kbd{awk -f matchit.awk} @node Precedence @section Operator Precedence (How Operators Nest) -@c STARTOFRANGE prec @cindex precedence -@c STARTOFRANGE oppr @cindex operators, precedence @dfn{Operator precedence} determines how operators are grouped when @@ -12217,7 +12212,7 @@ Increment, decrement. @cindex @code{*} (asterisk), @code{**} operator @cindex asterisk (@code{*}), @code{**} operator @item @code{^ **} -Exponentiation. These operators group right-to-left. +Exponentiation. These operators group right to left. 
@cindex @code{+} (plus sign), @code{+} operator @cindex plus sign (@code{+}), @code{+} operator @@ -12283,7 +12278,7 @@ statements belong to the statement level, not to expressions. The redirection does not produce an expression that could be the operand of another operator. As a result, it does not make sense to use a redirection operator near another operator of lower precedence without -parentheses. Such combinations (e.g., @samp{print foo > a ? b : c}), +parentheses. Such combinations (e.g., @samp{print foo > a ? b : c}) result in syntax errors. The correct way to write this statement is @samp{print foo > (a ? b : c)}. @@ -12301,17 +12296,17 @@ Array membership. @cindex @code{&} (ampersand), @code{&&} operator @cindex ampersand (@code{&}), @code{&&} operator @item @code{&&} -Logical ``and''. +Logical ``and.'' @cindex @code{|} (vertical bar), @code{||} operator @cindex vertical bar (@code{|}), @code{||} operator @item @code{||} -Logical ``or''. +Logical ``or.'' @cindex @code{?} (question mark), @code{?:} operator @cindex question mark (@code{?}), @code{?:} operator @item @code{?:} -Conditional. This operator groups right-to-left. +Conditional. This operator groups right to left. @cindex @code{+} (plus sign), @code{+=} operator @cindex plus sign (@code{+}), @code{+=} operator @@ -12328,7 +12323,7 @@ Conditional. This operator groups right-to-left. @cindex @code{^} (caret), @code{^=} operator @cindex caret (@code{^}), @code{^=} operator @item @code{= += -= *= /= %= ^= **=} -Assignment. These operators group right-to-left. +Assignment. These operators group right to left. @end table @cindex POSIX @command{awk}, @code{**} operator and @@ -12337,8 +12332,6 @@ Assignment. These operators group right-to-left. The @samp{|&}, @samp{**}, and @samp{**=} operators are not specified by POSIX. For maximum portability, do not use them. 
@end quotation -@c ENDOFRANGE prec -@c ENDOFRANGE oppr @node Locales @section Where You Are Makes a Difference @@ -12404,8 +12397,8 @@ Locales can influence the conversions. @item @command{awk} provides the usual arithmetic operators (addition, subtraction, multiplication, division, modulus), and unary plus and minus. -It also provides comparison operators, boolean operators, array membership -testing, and regexp +It also provides comparison operators, Boolean operators, an array membership +testing operator, and regexp matching operators. String concatenation is accomplished by placing two expressions next to each other; there is no explicit operator. The three-operand @samp{?:} operator provides an ``if-else'' test within @@ -12416,7 +12409,7 @@ Assignment operators provide convenient shorthands for common arithmetic operations. @item -In @command{awk}, a value is considered to be true if it is non-zero +In @command{awk}, a value is considered to be true if it is nonzero @emph{or} non-null. Otherwise, the value is false. @item @@ -12425,7 +12418,7 @@ lifetime. The type determines how it behaves in comparisons (string or numeric). @item -Function calls return a value which may be used as part of a larger +Function calls return a value that may be used as part of a larger expression. Expressions used to pass parameter values are fully evaluated before the function is called. @command{awk} provides built-in and user-defined functions; this is described in @@ -12442,11 +12435,9 @@ program, and occasionally the format for data read as input. @end itemize -@c ENDOFRANGE exps @node Patterns and Actions @chapter Patterns, Actions, and Variables -@c STARTOFRANGE pat @cindex patterns As you have already seen, each @command{awk} statement consists of @@ -12454,7 +12445,7 @@ a pattern with an associated action. This @value{CHAPTER} describes how you build patterns and actions, what kinds of things you can do within actions, and @command{awk}'s predefined variables. 
-The pattern-action rules and the statements available for use +The pattern--action rules and the statements available for use within actions form the core of @command{awk} programming. In a sense, everything covered up to here has been the foundation @@ -12589,6 +12580,7 @@ $ @kbd{awk '$1 ~ /li/ @{ print $2 @}' mail-list} @cindex regexp constants, as patterns @cindex patterns, regexp constants as +A regexp constant as a pattern is also a special case of an expression pattern. The expression @code{/li/} has the value one if @samp{li} appears in the current input record. Thus, as a pattern, @code{/li/} matches any record containing @samp{li}. @@ -12645,7 +12637,7 @@ patterns. Likewise, the special patterns @code{BEGIN}, @code{END}, which never match any input record, are not expressions and cannot appear inside Boolean patterns. -The precedence of the different operators which can appear in +The precedence of the different operators that can appear in patterns is described in @ref{Precedence}. @node Ranges @@ -12671,7 +12663,7 @@ prints every record in @file{myfile} between @samp{on}/@samp{off} pairs, inclusi A range pattern starts out by matching @var{begpat} against every input record. When a record matches @var{begpat}, the range pattern is -@dfn{turned on} and the range pattern matches this record as well. As long as +@dfn{turned on}, and the range pattern matches this record as well. As long as the range pattern stays turned on, it automatically matches every input record read. The range pattern also matches @var{endpat} against every input record; when this succeeds, the range pattern is @dfn{turned off} again @@ -12742,9 +12734,7 @@ a range pattern. @value{DARKCORNER} @node BEGIN/END @subsection The @code{BEGIN} and @code{END} Special Patterns -@c STARTOFRANGE beg @cindex @code{BEGIN} pattern -@c STARTOFRANGE end @cindex @code{END} pattern All the patterns described so far are for matching input records. 
The @code{BEGIN} and @code{END} special patterns are different. @@ -12817,7 +12807,7 @@ using library functions. for a number of useful library functions. If an @command{awk} program has only @code{BEGIN} rules and no -other rules, then the program exits after the @code{BEGIN} rule is +other rules, then the program exits after the @code{BEGIN} rules are run.@footnote{The original version of @command{awk} kept reading and ignoring input until the end of the file was seen.} However, if an @code{END} rule exists, then the input is read, even if there are @@ -12845,7 +12835,7 @@ Another way is simply to assign a value to @code{$0}. @cindex @code{print} statement, @code{BEGIN}/@code{END} patterns and @cindex @code{BEGIN} pattern, @code{print} statement and @cindex @code{END} pattern, @code{print} statement and -The second point is similar to the first but from the other direction. +The second point is similar to the first, but from the other direction. Traditionally, due largely to implementation issues, @code{$0} and @code{NF} were @emph{undefined} inside an @code{END} rule. The POSIX standard specifies that @code{NF} is available in an @code{END} @@ -12882,8 +12872,6 @@ are not valid in an @code{END} rule, because all the input has been read. @ifdocbook @DBREF{Nextfile Statement}.) @end ifdocbook -@c ENDOFRANGE beg -@c ENDOFRANGE end @node BEGINFILE/ENDFILE @subsection The @code{BEGINFILE} and @code{ENDFILE} Special Patterns @@ -12936,7 +12924,7 @@ fatal error. @item If you have written extensions that modify the record handling (by -inserting an ``input parser,'' @pxref{Input Parsers}), you can invoke +inserting an ``input parser''; @pxref{Input Parsers}), you can invoke them at this point, before @command{gawk} has started processing the file. (This is a @emph{very} advanced feature, currently used only by the @uref{http://gawkextlib.sourceforge.net, @code{gawkextlib} project}.) @@ -12947,8 +12935,8 @@ the last record in an input file. 
For the last input file, it will be called before any @code{END} rules. The @code{ENDFILE} rule is executed even for empty input files. -Normally, when an error occurs when reading input in the normal input -processing loop, the error is fatal. However, if an @code{ENDFILE} +Normally, when an error occurs when reading input in the normal +input-processing loop, the error is fatal. However, if an @code{ENDFILE} rule is present, the error becomes non-fatal, and instead @code{ERRNO} is set. This makes it possible to catch and process I/O errors at the level of the @command{awk} program. @@ -12957,7 +12945,7 @@ level of the @command{awk} program. The @code{next} statement (@pxref{Next Statement}) is not allowed inside either a @code{BEGINFILE} or an @code{ENDFILE} rule. The @code{nextfile} statement is allowed only inside a -@code{BEGINFILE} rule, but not inside an @code{ENDFILE} rule. +@code{BEGINFILE} rule, not inside an @code{ENDFILE} rule. @cindex @code{getline} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and The @code{getline} statement (@pxref{Getline}) is restricted inside @@ -13004,7 +12992,6 @@ awk '@{ print $1 @}' mail-list @noindent prints the first field of every record. -@c ENDOFRANGE pat @node Using Shell Variables @section Using Shell Variables in Programs @@ -13034,11 +13021,11 @@ awk "/$pattern/ "'@{ nmatches++ @} @noindent The @command{awk} program consists of two pieces of quoted text that are concatenated together to form the program. -The first part is double quoted, which allows substitution of +The first part is double-quoted, which allows substitution of the @code{pattern} shell variable inside the quotes. -The second part is single quoted. +The second part is single-quoted. -Variable substitution via quoting works, but can be potentially +Variable substitution via quoting works, but can potentially be messy. 
It requires a good understanding of the shell's quoting rules (@pxref{Quoting}), and it's often difficult to correctly @@ -13153,11 +13140,8 @@ For deleting array elements. @node Statements @section Control Statements in Actions -@c STARTOFRANGE csta @cindex control statements -@c STARTOFRANGE acs @cindex statements, control, in actions -@c STARTOFRANGE accs @cindex actions, control statements in @dfn{Control statements}, such as @code{if}, @code{while}, and so on, @@ -13300,13 +13284,13 @@ The body of this loop is a compound statement enclosed in braces, containing two statements. The loop works in the following manner: first, the value of @code{i} is set to one. Then, the @code{while} statement tests whether @code{i} is less than or equal to -three. This is true when @code{i} equals one, so the @code{i}-th +three. This is true when @code{i} equals one, so the @code{i}th field is printed. Then the @samp{i++} increments the value of @code{i} and the loop repeats. The loop terminates when @code{i} reaches four. A newline is not required between the condition and the body; however, using one makes the program clearer unless the body is a -compound statement or else is very simple. The newline after the open-brace +compound statement or else is very simple. The newline after the open brace that begins the compound statement is not required either, but the program is harder to read without it. @@ -13336,9 +13320,9 @@ while (@var{condition}) @end example @noindent -This statement does not execute @var{body} even once if the @var{condition} -is false to begin with. -The following is an example of a @code{do} statement: +This statement does not execute the @var{body} even once if the +@var{condition} is false to begin with. The following is an example of +a @code{do} statement: @example @{ @@ -13405,7 +13389,7 @@ their assignments as separate statements preceding the @code{for} loop.) The same is true of the @var{increment} part. 
Incrementing additional variables requires separate statements at the end of the loop. The C compound expression, using C's comma operator, is useful in -this context but it is not supported in @command{awk}. +this context, but it is not supported in @command{awk}. Most often, @var{increment} is an increment expression, as in the previous example. But this is not required; it can be any expression @@ -13496,7 +13480,7 @@ default: Control flow in the @code{switch} statement works as it does in C. Once a match to a given case is made, the case statement bodies execute until a @code{break}, -@code{continue}, @code{next}, @code{nextfile} or @code{exit} is encountered, +@code{continue}, @code{next}, @code{nextfile}, or @code{exit} is encountered, or the end of the @code{switch} statement itself. For example: @example @@ -13548,12 +13532,12 @@ numbers: # find smallest divisor of num @{ num = $1 - for (div = 2; div * div <= num; div++) @{ - if (num % div == 0) + for (divisor = 2; divisor * divisor <= num; divisor++) @{ + if (num % divisor == 0) break @} - if (num % div == 0) - printf "Smallest divisor of %d is %d\n", num, div + if (num % divisor == 0) + printf "Smallest divisor of %d is %d\n", num, divisor else printf "%d is prime\n", num @} @@ -13574,12 +13558,12 @@ an @code{if}: # find smallest divisor of num @{ num = $1 - for (div = 2; ; div++) @{ - if (num % div == 0) @{ - printf "Smallest divisor of %d is %d\n", num, div + for (divisor = 2; ; divisor++) @{ + if (num % divisor == 0) @{ + printf "Smallest divisor of %d is %d\n", num, divisor break @} - if (div * div > num) @{ + if (divisor * divisor > num) @{ printf "%d is prime\n", num break @} @@ -13670,7 +13654,12 @@ body of a loop. Historical versions of @command{awk} treated a @code{continue} statement outside a loop the same way they treated a @code{break} statement outside a loop: as if it were a @code{next} statement +@ifset FOR_PRINT +(discussed in the following section). 
+@end ifset +@ifclear FOR_PRINT (@pxref{Next Statement}). +@end ifclear @value{DARKCORNER} Recent versions of BWK @command{awk} no longer work this way, nor does @command{gawk}. @@ -13798,7 +13787,7 @@ See @uref{http://austingroupbugs.net/view.php?id=607, the Austin Group website}. @cindex @code{nextfile} statement, user-defined functions and @cindex Brian Kernighan's @command{awk} @cindex @command{mawk} utility -The current version of BWK @command{awk}, and @command{mawk} +The current version of BWK @command{awk} and @command{mawk} also support @code{nextfile}. However, they don't allow the @code{nextfile} statement inside function bodies (@pxref{User-defined}). @command{gawk} does; a @code{nextfile} inside a function body reads the @@ -13836,7 +13825,7 @@ any @code{ENDFILE} rules; they do not execute. In such a case, if you don't want the @code{END} rule to do its job, set a variable -to nonzero before the @code{exit} statement and check that variable in +to a nonzero value before the @code{exit} statement and check that variable in the @code{END} rule. @DBXREF{Assert Function} for an example that does this. @@ -13875,15 +13864,10 @@ Negative values, and values of 127 or greater, may not produce consistent results across different operating systems. @end quotation -@c ENDOFRANGE csta -@c ENDOFRANGE acs -@c ENDOFRANGE accs @node Built-in Variables @section Predefined Variables -@c STARTOFRANGE bvar @cindex predefined variables -@c STARTOFRANGE varb @cindex variables, predefined Most @command{awk} variables are available to use for your own @@ -13909,10 +13893,8 @@ their areas of activity. 
@end menu @node User-modified -@subsection Built-In Variables That Control @command{awk} -@c STARTOFRANGE bvaru +@subsection Built-in Variables That Control @command{awk} @cindex predefined variables, user-modifiable -@c STARTOFRANGE nmbv @cindex user-modifiable variables The following is an alphabetical list of variables that you can change to @@ -13940,7 +13922,7 @@ respectively, should use binary I/O. A string value of @code{"rw"} or @code{"wr"} indicates that all files should use binary I/O. Any other string value is treated the same as @code{"rw"}, but causes @command{gawk} to generate a warning message. @code{BINMODE} is described in more -detail in @ref{PC Using}. @command{mawk} (@pxref{Other Versions}), +detail in @ref{PC Using}. @command{mawk} (@pxref{Other Versions}) also supports this variable, but only using numeric values. @cindex @code{CONVFMT} variable @@ -13948,7 +13930,7 @@ also supports this variable, but only using numeric values. @cindex numbers, converting, to strings @cindex strings, converting, numbers to @item @code{CONVFMT} -This string controls conversion of numbers to +A string that controls the conversion of numbers to strings (@pxref{Conversion}). It works by being passed, in effect, as the first argument to the @code{sprintf()} function @@ -14023,12 +14005,13 @@ is to simply say @samp{FS = FS}, perhaps with an explanatory comment. @cindex regular expressions, case sensitivity @item IGNORECASE # If @code{IGNORECASE} is nonzero or non-null, then all string comparisons -and all regular expression matching are case independent. Thus, regexp -matching with @samp{~} and @samp{!~}, as well as the @code{gensub()}, -@code{gsub()}, @code{index()}, @code{match()}, @code{patsplit()}, -@code{split()}, and @code{sub()} -functions, record termination with @code{RS}, and field splitting with -@code{FS} and @code{FPAT}, all ignore case when doing their particular regexp operations. +and all regular expression matching are case-independent. 
+This applies to +regexp matching with @samp{~} and @samp{!~}, +the @code{gensub()}, @code{gsub()}, @code{index()}, @code{match()}, +@code{patsplit()}, @code{split()}, and @code{sub()} functions, +record termination with @code{RS}, and field splitting with +@code{FS} and @code{FPAT}. However, the value of @code{IGNORECASE} does @emph{not} affect array subscripting and it does not affect field splitting when using a single-character field separator. @@ -14049,7 +14032,7 @@ Any other true value prints nonfatal warnings. Assigning a false value to @code{LINT} turns off the lint warnings. This variable is a @command{gawk} extension. It is not special -in other @command{awk} implementations. Unlike the other special variables, +in other @command{awk} implementations. Unlike with the other special variables, changing @code{LINT} does affect the production of lint warnings, even if @command{gawk} is in compatibility mode. Much as the @option{--lint} and @option{--traditional} options independently @@ -14061,7 +14044,7 @@ of @command{awk} being executed. @cindex numbers, converting, to strings @cindex strings, converting, numbers to @item OFMT -Controls conversion of numbers to +A string that controls conversion of numbers to strings (@pxref{Conversion}) for printing with the @code{print} statement. It works by being passed as the first argument to the @code{sprintf()} function @@ -14076,7 +14059,7 @@ strings in general expressions; this is now done by @code{CONVFMT}. @cindex separators, field @cindex field separators @item OFS -This is the output field separator (@pxref{Output Separators}). It is +The output field separator (@pxref{Output Separators}). It is output between the fields printed by a @code{print} statement. Its default value is @w{@code{" "}}, a string consisting of a single space. 
@@ -14094,7 +14077,7 @@ The working precision of arbitrary-precision floating-point numbers, @cindex @code{ROUNDMODE} variable @item ROUNDMODE # The rounding mode to use for arbitrary-precision arithmetic on -numbers, by default @code{"N"} (@samp{roundTiesToEven} in +numbers, by default @code{"N"} (@code{roundTiesToEven} in the IEEE 754 standard; @pxref{Setting the rounding mode}). @cindex @code{RS} variable @@ -14123,7 +14106,7 @@ just the first character of @code{RS}'s value is used. @item @code{SUBSEP} The subscript separator. It has the default value of @code{"\034"} and is used to separate the parts of the indices of a -multidimensional array. Thus, the expression @code{@w{foo["A", "B"]}} +multidimensional array. Thus, the expression @samp{@w{foo["A", "B"]}} really accesses @code{foo["A\034B"]} (@pxref{Multidimensional}). @@ -14139,17 +14122,11 @@ marked string constants in the source text, as well as for the (@pxref{Internationalization}). The default value of @code{TEXTDOMAIN} is @code{"messages"}. @end table -@c ENDOFRANGE bvar -@c ENDOFRANGE varb -@c ENDOFRANGE bvaru -@c ENDOFRANGE nmbv @node Auto-set -@subsection Built-In Variables That Convey Information +@subsection Built-in Variables That Convey Information -@c STARTOFRANGE bvconi @cindex predefined variables, conveying information -@c STARTOFRANGE vbconi @cindex variables, predefined conveying information The following is an alphabetical list of variables that @command{awk} sets automatically on certain occasions in order to provide @@ -14305,12 +14282,12 @@ input file. @item @code{NF} The number of fields in the current input record. @code{NF} is set each time a new record is read, when a new field is -created or when @code{$0} changes (@pxref{Fields}). +created, or when @code{$0} changes (@pxref{Fields}). Unlike most of the variables described in this @value{SUBSECTION}, assigning a value to @code{NF} has the potential to affect @command{awk}'s internal workings. 
In particular, assignments -to @code{NF} can be used to create or remove fields from the +to @code{NF} can be used to create fields in or remove fields from the current record. @xref{Changing Fields}. @cindex @code{FUNCTAB} array @@ -14360,7 +14337,7 @@ or @code{"FPAT"} if field matching with @code{FPAT} is in effect. @item PROCINFO["identifiers"] @cindex program identifiers A subarray, indexed by the names of all identifiers used in the text of -the AWK program. An @dfn{identifier} is simply the name of a variable +the @command{awk} program. An @dfn{identifier} is simply the name of a variable (be it scalar or array), built-in function, user-defined function, or extension function. For each identifier, the value of the element is one of the following: @@ -14380,7 +14357,7 @@ The identifier is an extension function loaded via The identifier is a scalar. @item "untyped" -The identifier is untyped (could be used as a scalar or array, +The identifier is untyped (could be used as a scalar or an array; @command{gawk} doesn't know yet). @item "user" @@ -14501,7 +14478,7 @@ is the length of the matched string, or @minus{}1 if no match is found. @cindex @code{RSTART} variable @item @code{RSTART} -The start-index in characters of the substring that is matched by the +The start index in characters of the substring that is matched by the @code{match()} function (@pxref{String Functions}). @code{RSTART} is set by invoking the @code{match()} function. Its value @@ -14568,11 +14545,9 @@ function multiply(variable, amount) @quotation NOTE In order to avoid severe time-travel paradoxes,@footnote{Not to mention difficult implementation issues.} neither @code{FUNCTAB} nor @code{SYMTAB} -are available as elements within the @code{SYMTAB} array. +is available as an element within the @code{SYMTAB} array. 
@end quotation @end table -@c ENDOFRANGE bvconi -@c ENDOFRANGE vbconi @sidebar Changing @code{NR} and @code{FNR} @cindex @code{NR} variable, changing @@ -14744,7 +14719,7 @@ When designing your program, you should choose options that don't conflict with @command{gawk}'s, because it will process any options that it accepts before passing the rest of the command line on to your program. Using @samp{#!} with the @option{-E} option may help -(@DBXREF{Executable Scripts} +(@DBPXREF{Executable Scripts} and @ifnotdocbook @DBPXREF{Options}). @@ -14758,15 +14733,15 @@ and @itemize @value{BULLET} @item -Pattern-action pairs make up the basic elements of an @command{awk} +Pattern--action pairs make up the basic elements of an @command{awk} program. Patterns are either normal expressions, range expressions, -regexp constants, one of the special keywords @code{BEGIN}, @code{END}, -@code{BEGINFILE}, @code{ENDFILE}, or empty. The action executes if +or regexp constants; one of the special keywords @code{BEGIN}, @code{END}, +@code{BEGINFILE}, or @code{ENDFILE}; or empty. The action executes if the current record matches the pattern. Empty (missing) patterns match all records. @item -I/O from @code{BEGIN} and @code{END} rules have certain constraints. +I/O from @code{BEGIN} and @code{END} rules has certain constraints. This is also true, only more so, for @code{BEGINFILE} and @code{ENDFILE} rules. The latter two give you ``hooks'' into @command{gawk}'s file processing, allowing you to recover from a file that otherwise would @@ -14796,12 +14771,12 @@ iteration of a loop (or get out of a @code{switch}). @item @code{next} and @code{nextfile} let you read the next record and start -over at the top of your program, or skip to the next input file and +over at the top of your program or skip to the next input file and start over, respectively. @item The @code{exit} statement terminates your program. 
When executed -from an action (or function body) it transfers control to the +from an action (or function body), it transfers control to the @code{END} statements. From an @code{END} statement body, it exits immediately. You may pass an optional numeric value to be used as @command{awk}'s exit status. @@ -14819,7 +14794,6 @@ control how @command{awk} will process the provided @value{DF}s. @node Arrays @chapter Arrays in @command{awk} -@c STARTOFRANGE arrs @cindex arrays An @dfn{array} is a table of values called @dfn{elements}. The @@ -14894,7 +14868,7 @@ In most other languages, arrays must be @dfn{declared} before use, including a specification of how many elements or components they contain. In such languages, the declaration causes a contiguous block of memory to be allocated for that -many elements. Usually, an index in the array must be a positive integer. +many elements. Usually, an index in the array must be a nonnegative integer. For example, the index zero specifies the first element in the array, which is actually stored at the beginning of the block of memory. Index one specifies the second element, which is stored in memory right after the @@ -14905,15 +14879,17 @@ the declaration. indices---e.g., @samp{15 .. 27}---but the size of the array is still fixed when the array is declared.) -A contiguous array of four elements might look like the following example, -conceptually, if the element values are 8, @code{"foo"}, -@code{""}, and 30 +@c 1/2015: Do not put the numeric values into @code. Array element +@c values are no different than scalar variable values. +A contiguous array of four elements might look like @ifnotdocbook -as shown in @ref{figure-array-elements}: +@ref{figure-array-elements}, @end ifnotdocbook @ifdocbook -as shown in @inlineraw{docbook, <xref linkend="figure-array-elements"/>}: +@inlineraw{docbook, <xref linkend="figure-array-elements"/>}, @end ifdocbook +conceptually, if the element values are eight, @code{"foo"}, +@code{""}, and 30. 
@ifnotdocbook @float Figure,figure-array-elements @@ -14938,12 +14914,10 @@ as shown in @inlineraw{docbook, <xref linkend="figure-array-elements"/>}: @noindent Only the values are stored; the indices are implicit from the order of -the values. Here, 8 is the value at index zero, because 8 appears in the +the values. Here, eight is the value at index zero, because eight appears in the position with zero elements before it. -@c STARTOFRANGE arrin @cindex arrays, indexing -@c STARTOFRANGE inarr @cindex indexing arrays @cindex associative arrays @cindex arrays, associative @@ -14952,19 +14926,21 @@ that each array is a collection of pairs---an index and its corresponding array element value: @ifnotdocbook -@example -@r{Index} 3 @r{Value} 30 -@r{Index} 1 @r{Value} "foo" -@r{Index} 0 @r{Value} 8 -@r{Index} 2 @r{Value} "" -@end example +@c extra empty column to indent it right +@multitable @columnfractions .1 .1 .1 +@headitem @tab Index @tab Value +@item @tab @code{3} @tab @code{30} +@item @tab @code{1} @tab @code{"foo"} +@item @tab @code{0} @tab @code{8} +@item @tab @code{2} @tab @code{""} +@end multitable @end ifnotdocbook @docbook <informaltable> <tgroup cols="2"> -<colspec colname="1" align="center"/> -<colspec colname="2" align="center"/> +<colspec colname="1" align="left"/> +<colspec colname="2" align="left"/> <thead> <row> <entry>Index</entry> @@ -15010,20 +14986,22 @@ at any time. For example, suppose a tenth element is added to the array whose value is @w{@code{"number ten"}}. 
The result is: @ifnotdocbook -@example -@r{Index} 10 @r{Value} "number ten" -@r{Index} 3 @r{Value} 30 -@r{Index} 1 @r{Value} "foo" -@r{Index} 0 @r{Value} 8 -@r{Index} 2 @r{Value} "" -@end example +@c extra empty column to indent it right +@multitable @columnfractions .1 .1 .2 +@headitem @tab Index @tab Value +@item @tab @code{10} @tab @code{"number ten"} +@item @tab @code{3} @tab @code{30} +@item @tab @code{1} @tab @code{"foo"} +@item @tab @code{0} @tab @code{8} +@item @tab @code{2} @tab @code{""} +@end multitable @end ifnotdocbook @docbook <informaltable> <tgroup cols="2"> -<colspec colname="1" align="center"/> -<colspec colname="2" align="center"/> +<colspec colname="1" align="left"/> +<colspec colname="2" align="left"/> <thead> <row> <entry>Index</entry> @@ -15070,24 +15048,25 @@ Now the array is @dfn{sparse}, which just means some indices are missing. It has elements 0--3 and 10, but doesn't have elements 4, 5, 6, 7, 8, or 9. Another consequence of associative arrays is that the indices don't -have to be positive integers. Any number, or even a string, can be +have to be nonnegative integers. Any number, or even a string, can be an index. For example, the following is an array that translates words from English to French: @ifnotdocbook -@example -@r{Index} "dog" @r{Value} "chien" -@r{Index} "cat" @r{Value} "chat" -@r{Index} "one" @r{Value} "un" -@r{Index} 1 @r{Value} "un" -@end example +@multitable @columnfractions .1 .1 .1 +@headitem @tab Index @tab Value +@item @tab @code{"dog"} @tab @code{"chien"} +@item @tab @code{"cat"} @tab @code{"chat"} +@item @tab @code{"one"} @tab @code{"un"} +@item @tab @code{1} @tab @code{"un"} +@end multitable @end ifnotdocbook @docbook <informaltable> <tgroup cols="2"> -<colspec colname="1" align="center"/> -<colspec colname="2" align="center"/> +<colspec colname="1" align="left"/> +<colspec colname="2" align="left"/> <thead> <row> <entry>Index</entry> @@ -15129,7 +15108,7 @@ numbers and strings as indices. 
There are some subtleties to how numbers work when used as array subscripts; this is discussed in more detail in @ref{Numeric Array Subscripts}.) -Here, the number @code{1} isn't double quoted, because @command{awk} +Here, the number @code{1} isn't double-quoted, because @command{awk} automatically converts it to a string. @cindex @command{gawk}, @code{IGNORECASE} variable in @@ -15146,8 +15125,6 @@ that array's indices are consecutive integers starting at one. @command{awk}'s arrays are efficient---the time to access an element is independent of the number of elements in the array. -@c ENDOFRANGE arrin -@c ENDOFRANGE inarr @node Reference to Elements @subsection Referring to an Array Element @@ -15156,7 +15133,7 @@ is independent of the number of elements in the array. @cindex elements of arrays The principal way to use an array is to refer to one of its elements. -An array reference is an expression as follows: +An @dfn{array reference} is an expression as follows: @example @var{array}[@var{index-expression}] @@ -15166,8 +15143,11 @@ An array reference is an expression as follows: Here, @var{array} is the name of an array. The expression @var{index-expression} is the index of the desired element of the array. +@c 1/2015: Having the 4.3 in @samp is a little iffy. It's essentially +@c an expression though, so leave be. It's too early in the discussion +@c to mention that it's really a string. The value of the array reference is the current value of that array -element. For example, @code{foo[4.3]} is an expression for the element +element. For example, @code{foo[4.3]} is an expression referencing the element of array @code{foo} at index @samp{4.3}. @cindex arrays, unassigned elements @@ -15259,7 +15239,7 @@ assign to that element of the array. The following program takes a list of lines, each beginning with a line number, and prints them out in order of line number.
The line numbers -are not in order when they are first read---instead they +are not in order when they are first read---instead, they are scrambled. This program sorts the lines by making an array using the line numbers as subscripts. The program then prints out the lines in sorted order of their numbers. It is a very simple program and gets @@ -15331,7 +15311,7 @@ END @{ In programs that use arrays, it is often necessary to use a loop that executes once for each element of an array. In other languages, where -arrays are contiguous and indices are limited to positive integers, +arrays are contiguous and indices are limited to nonnegative integers, this is easy: all the valid indices can be found by counting from the lowest index up to the highest. This technique won't do the job in @command{awk}, because any number or string can be an array index. @@ -15353,7 +15333,7 @@ program has previously used, with the variable @var{var} set to that index. The following program uses this form of the @code{for} statement. The first rule scans the input records and notes which words appear (at least once) in the input, by storing a one into the array @code{used} with -the word as index. The second rule scans the elements of @code{used} to +the word as the index. The second rule scans the elements of @code{used} to find all the distinct words that appear in the input. It prints each word that is more than 10 characters long and also prints the number of such words. @@ -15450,7 +15430,7 @@ and will vary from one version of @command{awk} to the next. Often, though, you may wish to do something simple, such as ``traverse the array by comparing the indices in ascending order,'' or ``traverse the array by comparing the values in descending order.'' -@command{gawk} provides two mechanisms which give you this control. 
+@command{gawk} provides two mechanisms that give you this control: @itemize @value{BULLET} @item @@ -15507,21 +15487,26 @@ across different environments.} which @command{gawk} uses internally to perform the sorting. @item "@@ind_str_desc" -String indices ordered from high to low. +Like @code{"@@ind_str_asc"}, but the +string indices are ordered from high to low. @item "@@ind_num_desc" -Numeric indices ordered from high to low. +Like @code{"@@ind_num_asc"}, but the +numeric indices are ordered from high to low. @item "@@val_type_desc" -Element values, based on type, ordered from high to low. +Like @code{"@@val_type_asc"}, but the +element values, based on type, are ordered from high to low. Subarrays, if present, come out first. @item "@@val_str_desc" -Element values, treated as strings, ordered from high to low. +Like @code{"@@val_str_asc"}, but the +element values, treated as strings, are ordered from high to low. Subarrays, if present, come out first. @item "@@val_num_desc" -Element values, treated as numbers, ordered from high to low. +Like @code{"@@val_num_asc"}, but the +element values, treated as numbers, are ordered from high to low. Subarrays, if present, come out first. @end table @@ -15744,7 +15729,7 @@ for (i in frequencies) @noindent This example removes all the elements from the array @code{frequencies}. Once an element is deleted, a subsequent @code{for} statement to scan the array -does not report that element and the @code{in} operator to check for +does not report that element and using the @code{in} operator to check for the presence of that element returns zero (i.e., false): @example @@ -16004,7 +15989,7 @@ a[1][2] = 2 This simulates a true two-dimensional array. Each subarray element can contain another subarray as a value, which in turn can hold other arrays as well. In this way, you can create arrays of three or more dimensions. 
-The indices can be any @command{awk} expression, including scalars +The indices can be any @command{awk} expressions, including scalars separated by commas (i.e., a regular @command{awk} simulated multidimensional subscript). So the following is valid in @command{gawk}: @@ -16016,7 +16001,7 @@ a[1][3][1, "name"] = "barney" Each subarray and the main array can be of different length. In fact, the elements of an array or its subarray do not all have to have the same type. This means that the main array and any of its subarrays can be -non-rectangular, or jagged in structure. You can assign a scalar value to +nonrectangular, or jagged in structure. You can assign a scalar value to the index @code{4} of the main array @code{a}, even though @code{a[1]} is itself an array and not a scalar: @@ -16040,7 +16025,8 @@ a[4][5][6][7] = "An element in a four-dimensional array" @noindent This removes the scalar value from index @code{4} and then inserts a -subarray of subarray of subarray containing a scalar. You can also +three-level nested subarray +containing a scalar. You can also delete an entire subarray or subarray of subarrays: @example @@ -16051,7 +16037,7 @@ a[4][5] = "An element in subarray a[4]" But recall that you can not delete the main array @code{a} and then use it as a scalar. -The built-in functions which take array arguments can also be used +The built-in functions that take array arguments can also be used with subarrays. For example, the following code fragment uses @code{length()} (@pxref{String Functions}) to determine the number of elements in the main array @code{a} and @@ -16081,7 +16067,7 @@ can be nested to scan all the elements of an array of arrays if it is rectangular in structure. 
In order to print the contents (scalar values) of a two-dimensional array of arrays (i.e., in which each first-level element is itself an -array, not necessarily of the same length) +array, not necessarily of the same length), you could use the following code: @example @@ -16181,9 +16167,9 @@ versions of @command{awk}. @item Standard @command{awk} simulates multidimensional arrays by separating -subscript values with a comma. The values are concatenated into a +subscript values with commas. The values are concatenated into a single string, separated by the value of @code{SUBSEP}. The fact -that such a subscript was created in this way is not retained; thus +that such a subscript was created in this way is not retained; thus, changing @code{SUBSEP} may have unexpected consequences. You can use @samp{(@var{sub1}, @var{sub2}, @dots{}) in @var{array}} to see if such a multidimensional subscript exists in @var{array}. @@ -16192,7 +16178,7 @@ a multidimensional subscript exists in @var{array}. @command{gawk} provides true arrays of arrays. You use a separate set of square brackets for each dimension in such an array: @code{data[row][col]}, for example. Array elements may thus be either -scalar values (number or string) or another array. +scalar values (number or string) or other arrays. @item Use the @code{isarray()} built-in function to determine if an array @@ -16200,14 +16186,11 @@ element is itself a subarray. @end itemize -@c ENDOFRANGE arrs @node Functions @chapter Functions -@c STARTOFRANGE funcbi @cindex functions, built-in -@c STARTOFRANGE bifunc @cindex built-in functions This @value{CHAPTER} describes @command{awk}'s built-in functions, which fall into three categories: numeric, string, and I/O. @@ -16220,6 +16203,9 @@ Besides the built-in functions, @command{awk} has provisions for writing new functions that the rest of a program can use. The second half of this @value{CHAPTER} describes these @dfn{user-defined} functions. 
+Finally, we explore indirect function calls, a @command{gawk}-specific +extension that lets you determine at runtime what function is to +be called. @menu * Built-in:: Summarizes the built-in functions. @@ -16229,7 +16215,7 @@ The second half of this @value{CHAPTER} describes these @end menu @node Built-in -@section Built-In Functions +@section Built-in Functions @dfn{Built-in} functions are always available for your @command{awk} program to call. This @value{SECTION} defines all @@ -16252,7 +16238,7 @@ but are summarized here for your convenience. @end menu @node Calling Built-in -@subsection Calling Built-In Functions +@subsection Calling Built-in Functions To call one of @command{awk}'s built-in functions, write the name of the function followed @@ -16303,7 +16289,7 @@ j = atan2(++i, i *= 2) @end example If the order of evaluation is left to right, then @code{i} first becomes -6, and then 12, and @code{atan2()} is called with the two arguments 6 +six, and then 12, and @code{atan2()} is called with the two arguments six and 12. But if the order of evaluation is right to left, @code{i} first becomes 10, then 11, and @code{atan2()} is called with the two arguments 11 and 10. @@ -16384,7 +16370,7 @@ In fact, @command{gawk} uses the BSD @code{random()} function, which is considerably better than @code{rand()}, to produce random numbers.} Often random integers are needed instead. Following is a user-defined function -that can be used to obtain a random non-negative integer less than @var{n}: +that can be used to obtain a random nonnegative integer less than @var{n}: @example function randint(n) @@ -16447,7 +16433,7 @@ for generating random numbers to the value @var{x}. Each seed value leads to a particular sequence of random numbers.@footnote{Computer-generated random numbers really are not truly -random. They are technically known as ``pseudorandom.'' This means +random. They are technically known as @dfn{pseudorandom}. 
This means that although the numbers in a sequence appear to be random, you can in fact generate the same sequence of random numbers over and over again.} Thus, if the seed is set to the same value a second time, @@ -16479,7 +16465,7 @@ implementations. The functions in this @value{SECTION} look at or change the text of one or more strings. -@code{gawk} understands locales (@pxref{Locales}), and does all +@command{gawk} understands locales (@pxref{Locales}) and does all string processing in terms of @emph{characters}, not @emph{bytes}. This distinction is particularly important to understand for locales where one character may be represented by multiple bytes. Thus, for @@ -16568,7 +16554,7 @@ a[2] = "de" a[3] = "sac" @end example -The @code{asorti()} function works similarly to @code{asort()}, however, +The @code{asorti()} function works similarly to @code{asort()}; however, the @emph{indices} are sorted, instead of the values. Thus, in the previous example, starting with the same initial set of indices and values in @code{a}, calling @samp{asorti(a)} would yield: @@ -16683,7 +16669,7 @@ If @var{find} is not found, @code{index()} returns zero. With BWK @command{awk} and @command{gawk}, it is a fatal error to use a regexp constant for @var{find}. Other implementations allow it, simply treating the regexp -constant as an expression meaning @samp{$0 ~ /regexp/}. @value{DARKCORNER}. +constant as an expression meaning @samp{$0 ~ /regexp/}. 
@value{DARKCORNER} @item @code{length(}[@var{string}]@code{)} @cindexawkfunc{length} @@ -16766,7 +16752,7 @@ If @option{--posix} is supplied, using an array argument is a fatal error @cindex string, regular expression match @cindex match regexp in string Search @var{string} for the -longest, leftmost substring matched by the regular expression, +longest, leftmost substring matched by the regular expression @var{regexp} and return the character position (index) at which that substring begins (one, if it starts at the beginning of @var{string}). If no match is found, return zero. @@ -16778,7 +16764,7 @@ In the latter case, the string is treated as a regexp to be matched. discussion of the difference between the two forms, and the implications for writing your program correctly. -The order of the first two arguments is backwards from most other string +The order of the first two arguments is the opposite of most other string functions that work with regular expressions, such as @code{sub()} and @code{gsub()}. It might help to remember that for @code{match()}, the order is the same as for the @samp{~} operator: @@ -16867,7 +16853,7 @@ $ @kbd{echo foooobazbarrrrr |} @end example There may not be subscripts for the start and index for every parenthesized -subexpression, because they may not all have matched text; thus they +subexpression, because they may not all have matched text; thus, they should be tested for with the @code{in} operator (@pxref{Reference to Elements}). @@ -16914,13 +16900,13 @@ a regexp describing where to split @var{string} (much as @code{FS} can be a regexp describing where to split input records). If @var{fieldsep} is omitted, the value of @code{FS} is used. @code{split()} returns the number of elements created. 
-@var{seps} is a @command{gawk} extension with @code{@var{seps}[@var{i}]} +@var{seps} is a @command{gawk} extension, with @code{@var{seps}[@var{i}]} being the separator string between @code{@var{array}[@var{i}]} and @code{@var{array}[@var{i}+1]}. If @var{fieldsep} is a single -space then any leading whitespace goes into @code{@var{seps}[0]} and +space, then any leading whitespace goes into @code{@var{seps}[0]} and any trailing -whitespace goes into @code{@var{seps}[@var{n}]} where @var{n} is the +whitespace goes into @code{@var{seps}[@var{n}]}, where @var{n} is the return value of @code{split()} (i.e., the number of elements in @var{array}). @@ -16933,7 +16919,7 @@ split("cul-de-sac", a, "-", seps) @noindent @cindex strings splitting, example -splits the string @samp{cul-de-sac} into three fields using @samp{-} as the +splits the string @code{"cul-de-sac"} into three fields using @samp{-} as the separator. It sets the contents of the array @code{a} as follows: @example @@ -16958,19 +16944,18 @@ As with input field-splitting, when the value of @var{fieldsep} is the elements of @var{array} but not in @var{seps}, and the elements are separated by runs of whitespace. -Also, as with input field-splitting, if @var{fieldsep} is the null string, each +Also, as with input field splitting, if @var{fieldsep} is the null string, each individual character in the string is split into its own array element. @value{COMMONEXT} Note, however, that @code{RS} has no effect on the way @code{split()} -works. Even though @samp{RS = ""} causes newline to also be an input +works. Even though @samp{RS = ""} causes the newline character to also be an input field separator, this does not affect how @code{split()} splits strings. @cindex dark corner, @code{split()} function Modern implementations of @command{awk}, including @command{gawk}, allow -the third argument to be a regexp constant (@code{/abc/}) as well as a -string. 
-@value{DARKCORNER} +the third argument to be a regexp constant (@w{@code{/}@dots{}@code{/}}) +as well as a string. @value{DARKCORNER} The POSIX standard allows this as well. @DBXREF{Computed Regexps} for a discussion of the difference between using a string constant or a regexp constant, @@ -17107,7 +17092,7 @@ an @samp{&}: @cindex @code{sub()} function, arguments of @cindex @code{gsub()} function, arguments of As mentioned, the third argument to @code{sub()} must -be a variable, field or array element. +be a variable, field, or array element. Some versions of @command{awk} allow the third argument to be an expression that is not an lvalue. In such a case, @code{sub()} still searches for the pattern and returns zero or one, but the result of @@ -17266,8 +17251,8 @@ example, @code{"a\qb"} is treated as @code{"aqb"}. At the runtime level, the various functions handle sequences of @samp{\} and @samp{&} differently. The situation is (sadly) somewhat complex. -Historically, the @code{sub()} and @code{gsub()} functions treated the two -character sequence @samp{\&} specially; this sequence was replaced in +Historically, the @code{sub()} and @code{gsub()} functions treated the +two-character sequence @samp{\&} specially; this sequence was replaced in the generated text with a single @samp{&}. Any other @samp{\} within the @var{replacement} string that did not precede an @samp{&} was passed through unchanged. This is illustrated in @ref{table-sub-escapes}. @@ -17325,7 +17310,7 @@ _bigskip} @end float @noindent -This table shows both the lexical-level processing, where +This table shows the lexical-level processing, where an odd number of backslashes becomes an even number at the runtime level, as well as the runtime processing done by @code{sub()}. (For the sake of simplicity, the rest of the following tables only show the @@ -17346,7 +17331,7 @@ This is shown in @ref{table-sub-proposed}. 
@float Table,table-sub-proposed -@caption{GNU @command{awk} rules for @code{sub()} and backslash} +@caption{@command{gawk} rules for @code{sub()} and backslash} @tex \vbox{\bigskip % We need more characters for escape and tab ... @@ -17391,7 +17376,7 @@ _bigskip} @end float In a nutshell, at the runtime level, there are now three special sequences -of characters (@samp{\\\&}, @samp{\\&} and @samp{\&}) whereas historically +of characters (@samp{\\\&}, @samp{\\&}, and @samp{\&}) whereas historically there was only one. However, as in the historical case, any @samp{\} that is not part of one of these three sequences is not special and appears in the output literally. @@ -17457,7 +17442,7 @@ The only case where the difference is noticeable is the last one: @samp{\\\\} is seen as @samp{\\} and produces @samp{\} instead of @samp{\\}. Starting with @value{PVERSION} 3.1.4, @command{gawk} followed the POSIX rules -when @option{--posix} is specified (@pxref{Options}). Otherwise, +when @option{--posix} was specified (@pxref{Options}). Otherwise, it continued to follow the proposed rules, as that had been its behavior for many years. @@ -17525,7 +17510,7 @@ _bigskip} @end ifnottex @end float -Because of the complexity of the lexical and runtime level processing +Because of the complexity of the lexical- and runtime-level processing and the special cases for @code{sub()} and @code{gsub()}, we recommend the use of @command{gawk} and @code{gensub()} when you have to do substitutions. @@ -17551,6 +17536,7 @@ for more information. When closing a coprocess, it is occasionally useful to first close one end of the two-way pipe and then to close the other. This is done by providing a second argument to @code{close()}. This second argument +(@var{how}) should be one of the two string values @code{"to"} or @code{"from"}, indicating which end of the pipe to close. Case in the string does not matter. @@ -17577,7 +17563,7 @@ every little bit of information as soon as it is ready. 
However, sometimes it is necessary to force a program to @dfn{flush} its buffers (i.e., write the information to its destination, even if a buffer is not full). This is the purpose of the @code{fflush()} function---@command{gawk} also -buffers its output and the @code{fflush()} function forces +buffers its output, and the @code{fflush()} function forces @command{gawk} to flush its buffers. @cindex extensions, common@comma{} @code{fflush()} function @@ -17598,7 +17584,7 @@ would flush only the standard output if there was no argument, and flush all output files and pipes if the argument was the null string. This was changed in order to be compatible with Brian Kernighan's @command{awk}, in the hope that standardizing this -feature in POSIX would then be easier (which indeed helped). +feature in POSIX would then be easier (which indeed proved to be the case). With @command{gawk}, you can use @samp{fflush("/dev/stdout")} if you wish to flush @@ -17609,7 +17595,7 @@ only the standard output. @c @cindex warnings, automatic @cindex troubleshooting, @code{fflush()} function @code{fflush()} returns zero if the buffer is successfully flushed; -otherwise, it returns non-zero. (@command{gawk} returns @minus{}1.) +otherwise, it returns a nonzero value. (@command{gawk} returns @minus{}1.) In the case where all buffers are flushed, the return value is zero only if all buffers were flushed successfully. Otherwise, it is @minus{}1, and @command{gawk} warns about the problem @var{filename}. @@ -17622,8 +17608,8 @@ In such a case, @code{fflush()} returns @minus{}1, as well. 
@sidebar Interactive Versus Noninteractive Buffering @cindex buffering, interactive vs.@: noninteractive -As a side point, buffering issues can be even more confusing, depending -upon whether your program is @dfn{interactive} (i.e., communicating +As a side point, buffering issues can be even more confusing if +your program is @dfn{interactive} (i.e., communicating with a user sitting at a keyboard).@footnote{A program is interactive if the standard output is connected to a terminal device. On modern systems, this means your keyboard and screen.} @@ -17666,7 +17652,7 @@ it is all buffered and sent down the pipe to @command{cat} in one shot. @cindexawkfunc{system} @cindex invoke shell command @cindex interacting with other programs -Execute the operating-system +Execute the operating system command @var{command} and then return to the @command{awk} program. Return @var{command}'s exit status. @@ -17770,18 +17756,14 @@ you would see the latter (undesirable) output. @subsection Time Functions @cindex time functions -@c STARTOFRANGE tst @cindex timestamps -@c STARTOFRANGE logftst @cindex log files, timestamps in -@c STARTOFRANGE filogtst @cindex files, log@comma{} timestamps in -@c STARTOFRANGE gawtst @cindex @command{gawk}, timestamps @cindex POSIX @command{awk}, timestamps and -@code{awk} programs are commonly used to process log files +@command{awk} programs are commonly used to process log files containing timestamp information, indicating when a -particular log record was written. Many programs log their timestamp +particular log record was written. Many programs log their timestamps in the form returned by the @code{time()} system call, which is the number of seconds since a particular epoch. On POSIX-compliant systems, it is the number of seconds since @@ -17808,6 +17790,7 @@ which is sufficient to represent times through 2038-01-19 03:14:07 UTC. 
Many systems support a wider range of timestamps, including negative timestamps that represent times before the epoch. +@c FIXME: Use @sup here for superscript @cindex @command{date} utility, GNU @cindex time, retrieving @@ -17842,7 +17825,7 @@ The values of these numbers need not be within the ranges specified; for example, an hour of @minus{}1 means 1 hour before midnight. The origin-zero Gregorian calendar is assumed, with year 0 preceding year 1 and year @minus{}1 preceding year 0. -The time is assumed to be in the local timezone. +The time is assumed to be in the local time zone. If the daylight-savings flag is positive, the time is assumed to be daylight savings time; if zero, the time is assumed to be standard time; and if negative (the default), @code{mktime()} attempts to determine @@ -17854,7 +17837,6 @@ is out of range, @code{mktime()} returns @minus{}1. @cindex @command{gawk}, @code{PROCINFO} array in @cindex @code{PROCINFO} array @item @code{strftime(}[@var{format} [@code{,} @var{timestamp} [@code{,} @var{utc-flag}] ] ]@code{)} -@c STARTOFRANGE strf @cindexgawkfunc{strftime} @cindex format time string Format the time specified by @var{timestamp} @@ -18003,12 +17985,12 @@ Equivalent to specifying @samp{%H:%M:%S}. The weekday as a decimal number (1--7). Monday is day one. @item %U -The week number of the year (the first Sunday as the first day of week one) +The week number of the year (with the first Sunday as the first day of week one) as a decimal number (00--53). @c @cindex ISO 8601 @item %V -The week number of the year (the first Monday as the first +The week number of the year (with the first Monday as the first day of week one) as a decimal number (01--53). The method for determining the week number is as specified by ISO 8601. (To wit: if the week containing January 1 has four or more days in the @@ -18019,7 +18001,7 @@ and the next week is week one.) The weekday as a decimal number (0--6). Sunday is day zero. 
@item %W -The week number of the year (the first Monday as the first day of week one) +The week number of the year (with the first Monday as the first day of week one) as a decimal number (00--53). @item %x @@ -18039,8 +18021,8 @@ The full year as a decimal number (e.g., 2015). @c @cindex RFC 822 @c @cindex RFC 1036 @item %z -The timezone offset in a +HHMM format (e.g., the format necessary to -produce RFC 822/RFC 1036 date headers). +The time zone offset in a @samp{+@var{HHMM}} format (e.g., the format +necessary to produce RFC 822/RFC 1036 date headers). @item %Z The time zone name or abbreviation; no characters if @@ -18103,7 +18085,6 @@ The time as a decimal timestamp in seconds since the epoch. The date in VMS format (e.g., @samp{20-JUN-1991}). @end ignore @end table -@c ENDOFRANGE strf Additionally, the alternative representations are recognized but their normal representations are used. @@ -18154,23 +18135,14 @@ gawk 'BEGIN @{ exit exitval @}' "$@@" @end example -@c ENDOFRANGE tst -@c ENDOFRANGE logftst -@c ENDOFRANGE filogtst -@c ENDOFRANGE gawtst @node Bitwise Functions @subsection Bit-Manipulation Functions @cindex bit-manipulation functions -@c STARTOFRANGE bit @cindex bitwise, operations -@c STARTOFRANGE and @cindex AND bitwise operation -@c STARTOFRANGE oro @cindex OR bitwise operation -@c STARTOFRANGE xor @cindex XOR bitwise operation -@c STARTOFRANGE opbit @cindex operations, bitwise @quotation @i{I can explain it for you, but I can't understand it for you.} @@ -18190,7 +18162,7 @@ The operations are described in @ref{table-bitwise-ops}. 
@ifnottex @ifnotdocbook @display - Bit Operator + Bit operator | AND | OR | XOR |---+---+---+---+---+--- Operands | 0 | 1 | 0 | 1 | 0 | 1 @@ -18248,7 +18220,7 @@ Operands | 0 | 1 | 0 | 1 | 0 | 1 <tbody> <row> <entry colsep="0"></entry> -<entry spanname="optitle"><emphasis role="bold">Bit Operator</emphasis></entry> +<entry spanname="optitle"><emphasis role="bold">Bit operator</emphasis></entry> </row> <row rowsep="1"> @@ -18312,10 +18284,9 @@ of a given value. Finally, two other common operations are to shift the bits left or right. For example, if you have a bit string @samp{10111001} and you shift it right by three bits, you end up with @samp{00010111}.@footnote{This example -shows that 0's come in on the left side. For @command{gawk}, this is +shows that zeros come in on the left side. For @command{gawk}, this is always true, but in some languages, it's possible to have the left side -fill with 1's.} -@c Purposely decided to use 0's and 1's here. 2/2001. +fill with ones.} If you start over again with @samp{10111001} and shift it left by three bits, you end up with @samp{11001000}. The following list describes @command{gawk}'s built-in functions that implement the bitwise operations. @@ -18369,7 +18340,7 @@ that illustrates the use of these functions: @example @group @c file eg/lib/bits2str.awk -# bits2str --- turn a byte into readable 1's and 0's +# bits2str --- turn a byte into readable ones and zeros function bits2str(bits, data, mask) @{ @@ -18443,15 +18414,16 @@ $ @kbd{gawk -f testbits.awk} @cindex converting, numbers to strings @cindex number as string of bits The @code{bits2str()} function turns a binary number into a string. -The number @code{1} represents a binary value where the rightmost bit -is set to 1. Using this mask, +Initializing @code{mask} to one creates +a binary value where the rightmost bit +is set to one. Using this mask, the function repeatedly checks the rightmost bit. 
ANDing the mask with the value indicates whether the -rightmost bit is 1 or not. If so, a @code{"1"} is concatenated onto the front +rightmost bit is one or not. If so, a @code{"1"} is concatenated onto the front of the string. Otherwise, a @code{"0"} is added. The value is then shifted right by one bit and the loop continues -until there are no more 1 bits. +until there are no more one bits. If the initial value is zero, it returns a simple @code{"0"}. Otherwise, at the end, it pads the value with zeros to represent multiples @@ -18462,11 +18434,6 @@ decimal and octal values for the same numbers (@pxref{Nondecimal-numbers}), and then demonstrates the results of the @code{compl()}, @code{lshift()}, and @code{rshift()} functions. -@c ENDOFRANGE bit -@c ENDOFRANGE and -@c ENDOFRANGE oro -@c ENDOFRANGE xor -@c ENDOFRANGE opbit @node Type Functions @subsection Getting Type Information @@ -18480,7 +18447,7 @@ that traverses every element of an array of arrays @cindexgawkfunc{isarray} @cindex scalar or array @item isarray(@var{x}) -Return a true value if @var{x} is an array. Otherwise return false. +Return a true value if @var{x} is an array. Otherwise, return false. @end table @code{isarray()} is meant for use in two circumstances. The first is when @@ -18541,20 +18508,16 @@ The default value for @var{category} is @code{"LC_MESSAGES"}. Return the plural form used for @var{number} of the translation of @var{string1} and @var{string2} in text domain @var{domain} for locale category @var{category}. @var{string1} is the -English singular variant of a message, and @var{string2} the English plural +English singular variant of a message, and @var{string2} is the English plural variant of the same message. The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. The default value for @var{category} is @code{"LC_MESSAGES"}. 
@end table -@c ENDOFRANGE funcbi -@c ENDOFRANGE bifunc @node User-defined @section User-Defined Functions -@c STARTOFRANGE udfunc @cindex user-defined functions -@c STARTOFRANGE funcud @cindex functions, user-defined Complicated @command{awk} programs can often be simplified by defining your own functions. User-defined functions can be called just like @@ -18574,12 +18537,11 @@ them (i.e., to tell @command{awk} what they should do). @subsection Function Definition Syntax @quotation -@i{It's entirely fair to say that the @command{awk} syntax for local +@i{It's entirely fair to say that the awk syntax for local variable definitions is appallingly awful.} @author Brian Kernighan @end quotation -@c STARTOFRANGE fdef @cindex functions, defining Definitions of functions can appear anywhere between the rules of an @command{awk} program. Thus, the general form of an @command{awk} program is @@ -18617,14 +18579,23 @@ the call. A function cannot have two parameters with the same name, nor may it have a parameter with the same name as the function itself. -In addition, according to the POSIX standard, function parameters + +@quotation CAUTION +According to the POSIX standard, function parameters cannot have the same name as one of the special predefined variables -(@pxref{Built-in Variables}). Not all versions of @command{awk} enforce -this restriction. +(@pxref{Built-in Variables}), nor may a function parameter have the +same name as another function. + +Not all versions of @command{awk} enforce +these restrictions. +@command{gawk} always enforces the first restriction. +With @option{--posix} (@pxref{Options}), +it also enforces the second restriction. +@end quotation Local variables act like the empty string if referenced where a string value is required, and like zero if referenced where a numeric value -is required. This is the same as regular variables that have never been +is required. 
This is the same as the behavior of regular variables that have never been assigned a value. (There is more to understand about local variables; @pxref{Dynamic Typing}.) @@ -18658,7 +18629,7 @@ During execution of the function body, the arguments and local variable values hide, or @dfn{shadow}, any variables of the same names used in the rest of the program. The shadowed variables are not accessible in the function definition, because there is no way to name them while their -names have been taken away for the local variables. All other variables +names have been taken away for the arguments and local variables. All other variables used in the @command{awk} program can be referenced or set normally in the function's body. @@ -18725,7 +18696,7 @@ function myprint(num) @end example @noindent -To illustrate, here is an @command{awk} rule that uses our @code{myprint} +To illustrate, here is an @command{awk} rule that uses our @code{myprint()} function: @example @@ -18766,13 +18737,13 @@ in an array and start over with a new list of elements (@pxref{Delete}). Instead of having to repeat this loop everywhere that you need to clear out -an array, your program can just call @code{delarray}. +an array, your program can just call @code{delarray()}. (This guarantees portability. The use of @samp{delete @var{array}} to delete the contents of an entire array is a relatively recent@footnote{Late in 2012.} addition to the POSIX standard.) The following is an example of a recursive function. It takes a string -as an input parameter and returns the string in backwards order. +as an input parameter and returns the string in reverse order. Recursive functions must always have a test that stops the recursion. In this case, the recursion terminates when the input string is already empty: @@ -18826,12 +18797,10 @@ You might think that @code{ctime()} could use @code{PROCINFO["strftime"]} for its format string. 
That would be a mistake, because @code{ctime()} is supposed to return the time formatted in a standard fashion, and user-level code could have changed @code{PROCINFO["strftime"]}. -@c ENDOFRANGE fdef @node Function Caveats @subsection Calling User-Defined Functions -@c STARTOFRANGE fudc @cindex functions, user-defined, calling @dfn{Calling a function} means causing the function to run and do its job. A function call is an expression and its value is the value returned by @@ -18871,7 +18840,7 @@ an error. @cindex local variables, in a function @cindex variables, local to a function -Unlike many languages, +Unlike in many languages, there is no way to make a variable local to a @code{@{} @dots{} @code{@}} block in @command{awk}, but you can make a variable local to a function. It is good practice to do so whenever a variable is needed only in that @@ -18880,7 +18849,7 @@ function. To make a variable local to a function, simply declare the variable as an argument after the actual function arguments (@pxref{Definition Syntax}). -Look at the following example where variable +Look at the following example, where variable @code{i} is a global variable used by both functions @code{foo()} and @code{bar()}: @@ -18921,7 +18890,7 @@ foo's i=3 top's i=3 @end example -If you want @code{i} to be local to both @code{foo()} and @code{bar()} do as +If you want @code{i} to be local to both @code{foo()} and @code{bar()}, do as follows (the extra space before @code{i} is a coding convention to indicate that @code{i} is a local variable, not an argument): @@ -19009,7 +18978,7 @@ declare explicitly whether the arguments are passed @dfn{by value} or @dfn{by reference}. Instead, the passing convention is determined at runtime when -the function is called according to the following rule: +the function is called, according to the following rule: if the argument is an array variable, then it is passed by reference. Otherwise, the argument is passed by value. 
@@ -19086,7 +19055,7 @@ prints @samp{a[1] = 1, a[2] = two, a[3] = 3}, because @cindex undefined functions @cindex functions, undefined Some @command{awk} implementations allow you to call a function that -has not been defined. They only report a problem at runtime when the +has not been defined. They only report a problem at runtime, when the program actually tries to call the function. For example: @example @@ -19123,7 +19092,6 @@ or the @code{nextfile} statement @end ifnotdocbook inside a user-defined function. @command{gawk} does not have this limitation. -@c ENDOFRANGE fudc @node Return Statement @subsection The @code{return} Statement @@ -19146,15 +19114,15 @@ makes the returned value undefined, and therefore, unpredictable. In practice, though, all versions of @command{awk} simply return the null string, which acts like zero if used in a numeric context. -A @code{return} statement with no value expression is assumed at the end of -every function definition. So if control reaches the end of the function -body, then technically, the function returns an unpredictable value. +A @code{return} statement without an @var{expression} is assumed at the end of +every function definition. So, if control reaches the end of the function +body, then technically the function returns an unpredictable value. In practice, it returns the empty string. @command{awk} does @emph{not} warn you if you use the return value of such a function. Sometimes, you want to write a function for what it does, not for what it returns. Such a function corresponds to a @code{void} function -in C, C++ or Java, or to a @code{procedure} in Ada. Thus, it may be appropriate to not +in C, C++, or Java, or to a @code{procedure} in Ada. Thus, it may be appropriate to not return any value; simply bear in mind that you should not be using the return value of such a function. @@ -19251,7 +19219,6 @@ does report the second error. Usually, such things aren't a big issue, but it's worth being aware of them. 
-@c ENDOFRANGE udfunc @node Indirect Calls @section Indirect Function Calls @@ -19274,13 +19241,15 @@ function calls, you can specify the name of the function to call as a string variable, and then call the function. Let's look at an example. Suppose you have a file with your test scores for the classes you -are taking. The first field is the class name. The following fields +are taking, and +you wish to get the sum and the average of +your test scores. +The first field is the class name. The following fields are the functions to call to process the data, up to a ``marker'' field @samp{data:}. Following the marker, to the end of the record, are the various numeric test scores. -Here is the initial file; you wish to get the sum and the average of -your test scores: +Here is the initial file: @example @c file eg/data/class_data1 @@ -19363,9 +19332,9 @@ function sum(first, last, ret, i) @c endfile @end example -These two functions expect to work on fields; thus the parameters +These two functions expect to work on fields; thus, the parameters @code{first} and @code{last} indicate where in the fields to start and end. -Otherwise they perform the expected computations and are not unusual: +Otherwise, they perform the expected computations and are not unusual: @example @c file eg/prog/indirectcall.awk @@ -19424,8 +19393,8 @@ The ability to use indirect function calls is more powerful than you may think at first. The C and C++ languages provide ``function pointers,'' which are a mechanism for calling a function chosen at runtime. One of the most well-known uses of this ability is the C @code{qsort()} function, which sorts -an array using the famous ``quick sort'' algorithm -(see @uref{http://en.wikipedia.org/wiki/Quick_sort, the Wikipedia article} +an array using the famous ``quicksort'' algorithm +(see @uref{http://en.wikipedia.org/wiki/Quicksort, the Wikipedia article} for more information). To use this function, you supply a pointer to a comparison function. 
This mechanism allows you to sort arbitrary data in an arbitrary fashion. @@ -19444,11 +19413,11 @@ We can do something similar using @command{gawk}, like this: # January 2009 @c endfile - @end ignore @c file eg/lib/quicksort.awk -# quicksort --- C.A.R. Hoare's quick sort algorithm. See Wikipedia -# or almost any algorithms or computer science text + +# quicksort --- C.A.R. Hoare's quicksort algorithm. See Wikipedia +# or almost any algorithms or computer science text. @c endfile @ignore @c file eg/lib/quicksort.awk @@ -19486,7 +19455,7 @@ function quicksort_swap(data, i, j, temp) The @code{quicksort()} function receives the @code{data} array, the starting and ending indices to sort (@code{left} and @code{right}), and the name of a function that -performs a ``less than'' comparison. It then implements the quick sort algorithm. +performs a ``less than'' comparison. It then implements the quicksort algorithm. To make use of the sorting function, we return to our previous example. The first thing to do is write some comparison functions: @@ -19597,67 +19566,7 @@ $ @kbd{gawk -f quicksort.awk -f indirectcall.awk class_data2} @end example Another example where indirect functions calls are useful can be found in -processing arrays. @DBREF{Walking Arrays} presented a simple function -for ``walking'' an array of arrays. That function simply printed the -name and value of each scalar array element. However, it is easy to -generalize that function, by passing in the name of a function to call -when walking an array. 
The modified function looks like this: - -@example -@c file eg/lib/processarray.awk -function process_array(arr, name, process, do_arrays, i, new_name) -@{ - for (i in arr) @{ - new_name = (name "[" i "]") - if (isarray(arr[i])) @{ - if (do_arrays) - @@process(new_name, arr[i]) - process_array(arr[i], new_name, process, do_arrays) - @} else - @@process(new_name, arr[i]) - @} -@} -@c endfile -@end example - -The arguments are as follows: - -@table @code -@item arr -The array. - -@item name -The name of the array (a string). - -@item process -The name of the function to call. - -@item do_arrays -If this is true, the function can handle elements that are subarrays. -@end table - -If subarrays are to be processed, that is done before walking them further. - -When run with the following scaffolding, the function produces the same -results as does the earlier @code{walk_array()} function: - -@example -BEGIN @{ - a[1] = 1 - a[2][1] = 21 - a[2][2] = 22 - a[3] = 3 - a[4][1][1] = 411 - a[4][2] = 42 - - process_array(a, "a", "do_print", 0) -@} - -function do_print(name, element) -@{ - printf "%s = %s\n", name, element -@} -@end example +processing arrays. This is described in @ref{Walking Arrays}. Remember that you must supply a leading @samp{@@} in front of an indirect function call. @@ -19677,7 +19586,7 @@ for (i = 1; i <= n; i++) @end example @noindent -@code{gawk} looks up the actual function to call only once. +@command{gawk} looks up the actual function to call only once. @node Functions Summary @section Summary @@ -19744,7 +19653,6 @@ program. This is equivalent to function pointers in C and C++. 
@end itemize -@c ENDOFRANGE funcud @ifnotinfo @part @value{PART2}Problem Solving with @command{awk} @@ -19766,18 +19674,15 @@ It contains the following chapters: @node Library Functions @chapter A Library of @command{awk} Functions -@c STARTOFRANGE libf @cindex libraries of @command{awk} functions -@c STARTOFRANGE flib @cindex functions, library -@c STARTOFRANGE fudlib @cindex functions, user-defined, library of @DBREF{User-defined} describes how to write your own @command{awk} functions. Writing functions is important, because it allows you to encapsulate algorithms and program tasks in a single place. It simplifies programming, making program development more -manageable, and making programs more readable. +manageable and making programs more readable. @cindex Kernighan, Brian @cindex Plauger, P.J.@: @@ -19906,7 +19811,7 @@ often use variable names like these for their own purposes. The example programs shown in this @value{CHAPTER} all start the names of their private variables with an underscore (@samp{_}). Users generally don't use leading underscores in their variable names, so this convention immediately -decreases the chances that the variable name will be accidentally shared +decreases the chances that the variable names will be accidentally shared with the user's program. @cindex @code{_} (underscore), in names of private variables @@ -19924,8 +19829,8 @@ show how our own @command{awk} programming style has evolved and to provide some basis for this discussion.} As a final note on variable naming, if a function makes global variables -available for use by a main program, it is a good convention to start that -variable's name with a capital letter---for +available for use by a main program, it is a good convention to start those +variables' names with a capital letter---for example, @code{getopt()}'s @code{Opterr} and @code{Optind} variables (@pxref{Getopt Function}). 
The leading capital letter indicates that it is global, while the fact that @@ -19936,7 +19841,7 @@ not one of @command{awk}'s predefined variables, such as @code{FS}. It is also important that @emph{all} variables in library functions that do not need to save state are, in fact, declared local.@footnote{@command{gawk}'s @option{--dump-variables} command-line -option is useful for verifying this.} If this is not done, the variable +option is useful for verifying this.} If this is not done, the variables could accidentally be used in the user's program, leading to bugs that are very difficult to track down: @@ -20093,13 +19998,9 @@ be tested with @command{gawk} and the results compared to the built-in @node Assert Function @subsection Assertions -@c STARTOFRANGE asse @cindex assertions -@c STARTOFRANGE assef @cindex @code{assert()} function (C library) -@c STARTOFRANGE libfass @cindex libraries of @command{awk} functions, assertions -@c STARTOFRANGE flibass @cindex functions, library, assertions @cindex @command{awk} programs, lengthy, assertions When writing large programs, it is often useful to know @@ -20138,7 +20039,7 @@ Following is the function: @example @c file eg/lib/assert.awk -# assert --- assert that a condition is true. Otherwise exit. +# assert --- assert that a condition is true. Otherwise, exit. @c endfile @ignore @@ -20174,7 +20075,7 @@ is false, it prints a message to standard error, using the @code{string} parameter to describe the failed condition. It then sets the variable @code{_assert_exit} to one and executes the @code{exit} statement. The @code{exit} statement jumps to the @code{END} rule. If the @code{END} -rules finds @code{_assert_exit} to be true, it exits immediately. +rule finds @code{_assert_exit} to be true, it exits immediately. The purpose of the test in the @code{END} rule is to keep any other @code{END} rules from running. 
When an assertion fails, the @@ -20215,10 +20116,6 @@ most likely causing the program to hang as it waits for input. There is a simple workaround to this: make sure that such a @code{BEGIN} rule always ends with an @code{exit} statement. -@c ENDOFRANGE asse -@c ENDOFRANGE assef -@c ENDOFRANGE flibass -@c ENDOFRANGE libfass @node Round Function @subsection Rounding Numbers @@ -20470,7 +20367,7 @@ all the strings in an array into one long string. The following function, the application programs (@pxref{Sample Programs}). -Good function design is important; this function needs to be general but it +Good function design is important; this function needs to be general, but it should also have a reasonable default behavior. It is called with an array as well as the beginning and ending indices of the elements in the array to be merged. This assumes that the array indices are numeric---a reasonable @@ -20618,7 +20515,7 @@ allowed the user to supply an optional timestamp value to use instead of the current time. @node Readfile Function -@subsection Reading a Whole File At Once +@subsection Reading a Whole File at Once Often, it is convenient to have the entire contents of a file available in memory as a single string. A straightforward but naive way to @@ -20675,13 +20572,13 @@ function readfile(file, tmp, save_rs) It works by setting @code{RS} to @samp{^$}, a regular expression that will never match if the file has contents. @command{gawk} reads data from -the file into @code{tmp} attempting to match @code{RS}. The match fails +the file into @code{tmp}, attempting to match @code{RS}. The match fails after each read, but fails quickly, such that @command{gawk} fills @code{tmp} with the entire contents of the file. (@DBXREF{Records} for information on @code{RT} and @code{RS}.) In the case that @code{file} is empty, the return value is the null -string. Thus calling code may use something like: +string. 
Thus, calling code may use something like: @example contents = readfile("/some/path") @@ -20692,7 +20589,7 @@ if (length(contents) == 0) This tests the result to see if it is empty or not. An equivalent test would be @samp{contents == ""}. -@xref{Extension Sample Readfile}, for an extension function that +@DBXREF{Extension Sample Readfile} for an extension function that also reads an entire file into memory. @node Shell Quoting @@ -20776,11 +20673,8 @@ function shell_quote(s, # parameter @node Data File Management @section @value{DDF} Management -@c STARTOFRANGE dataf @cindex files, managing -@c STARTOFRANGE libfdataf @cindex libraries of @command{awk} functions, managing, data files -@c STARTOFRANGE flibdataf @cindex functions, library, managing data files This @value{SECTION} presents functions that are useful for managing command-line @value{DF}s. @@ -20802,8 +20696,8 @@ The @code{BEGIN} and @code{END} rules are each executed exactly once, at the beginning and end of your @command{awk} program, respectively (@pxref{BEGIN/END}). We (the @command{gawk} authors) once had a user who mistakenly thought that the -@code{BEGIN} rule is executed at the beginning of each @value{DF} and the -@code{END} rule is executed at the end of each @value{DF}. +@code{BEGIN} rules were executed at the beginning of each @value{DF} and the +@code{END} rules were executed at the end of each @value{DF}. When informed that this was not the case, the user requested that we add new special @@ -20843,7 +20737,7 @@ END @{ endfile(FILENAME) @} This file must be loaded before the user's ``main'' program, so that the rule it supplies is executed first. -This rule relies on @command{awk}'s @code{FILENAME} variable that +This rule relies on @command{awk}'s @code{FILENAME} variable, which automatically changes for each new @value{DF}. The current @value{FN} is saved in a private variable, @code{_oldfilename}. 
If @code{FILENAME} does not equal @code{_oldfilename}, then a new @value{DF} is being processed and @@ -20859,7 +20753,7 @@ first @value{DF}. The program also supplies an @code{END} rule to do the final processing for the last file. Because this @code{END} rule comes before any @code{END} rules supplied in the ``main'' program, @code{endfile()} is called first. Once -again the value of multiple @code{BEGIN} and @code{END} rules should be clear. +again, the value of multiple @code{BEGIN} and @code{END} rules should be clear. @cindex @code{beginfile()} user-defined function @cindex @code{endfile()} user-defined function @@ -20902,7 +20796,7 @@ how it simplifies writing the main program. You are probably wondering, if @code{beginfile()} and @code{endfile()} functions can do the job, why does @command{gawk} have -@code{BEGINFILE} and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})? +@code{BEGINFILE} and @code{ENDFILE} patterns? Good question. Normally, if @command{awk} cannot open a file, this causes an immediate fatal error. In this case, there is no way for a @@ -20911,13 +20805,14 @@ calling it relies on the file being open and at the first record. Thus, the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch files that cannot be processed. @code{ENDFILE} exists for symmetry, and because it provides an easy way to do per-file cleanup processing. +For more information, refer to @ref{BEGINFILE/ENDFILE}. @end sidebar @node Rewind Function @subsection Rereading the Current File @cindex files, reading -Another request for a new built-in function was for a @code{rewind()} +Another request for a new built-in function was for a function that would make it possible to reread the current file. The requesting user didn't want to have to use @code{getline} (@pxref{Getline}) @@ -20926,7 +20821,7 @@ inside a loop. 
However, as long as you are not in the @code{END} rule, it is quite easy to arrange to immediately close the current input file and then start over with it from the top. -For lack of a better name, we'll call it @code{rewind()}: +For lack of a better name, we'll call the function @code{rewind()}: @cindex @code{rewind()} user-defined function @example @@ -21019,16 +20914,16 @@ See also @ref{ARGC and ARGV}. Because @command{awk} variable names only allow the English letters, the regular expression check purposely does not use character classes such as @samp{[:alpha:]} and @samp{[:alnum:]} -(@pxref{Bracket Expressions}) +(@pxref{Bracket Expressions}). @node Empty Files -@subsection Checking for Zero-length Files +@subsection Checking for Zero-Length Files All known @command{awk} implementations silently skip over zero-length files. This is a by-product of @command{awk}'s implicit read-a-record-and-match-against-the-rules loop: when @command{awk} tries to read a record from an empty file, it immediately receives an -end of file indication, closes the file, and proceeds on to the next +end-of-file indication, closes the file, and proceeds on to the next command-line @value{DF}, @emph{without} executing any user-level @command{awk} program code. @@ -21093,7 +20988,7 @@ Occasionally, you might not want @command{awk} to process command-line variable assignments (@pxref{Assignment Options}). In particular, if you have a @value{FN} that contains an @samp{=} character, -@command{awk} treats the @value{FN} as an assignment, and does not process it. +@command{awk} treats the @value{FN} as an assignment and does not process it. Some users have suggested an additional command-line option for @command{gawk} to disable command-line assignments. However, some simple programming with @@ -21143,22 +21038,14 @@ The use of @code{No_command_assign} allows you to disable command-line assignments at invocation time, by giving the variable a true value. 
When not set, it is initially zero (i.e., false), so the command-line arguments are left alone. -@c ENDOFRANGE dataf -@c ENDOFRANGE flibdataf -@c ENDOFRANGE libfdataf @node Getopt Function @section Processing Command-Line Options -@c STARTOFRANGE libfclo @cindex libraries of @command{awk} functions, command-line options -@c STARTOFRANGE flibclo @cindex functions, library, command-line options -@c STARTOFRANGE clop @cindex command-line options, processing -@c STARTOFRANGE oclp @cindex options, command-line, processing -@c STARTOFRANGE clibf @cindex functions, library, C library @cindex arguments, processing Most utilities on POSIX-compatible systems take options on @@ -21463,8 +21350,8 @@ BEGIN @{ @c endfile @end example -The rest of the @code{BEGIN} rule is a simple test program. Here is the -result of two sample runs of the test program: +The rest of the @code{BEGIN} rule is a simple test program. Here are the +results of two sample runs of the test program: @example $ @kbd{awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x} @@ -21510,27 +21397,19 @@ further options Several of the sample programs presented in @ref{Sample Programs}, use @code{getopt()} to process their arguments. -@c ENDOFRANGE libfclo -@c ENDOFRANGE flibclo -@c ENDOFRANGE clop -@c ENDOFRANGE oclp @node Passwd Functions @section Reading the User Database -@c STARTOFRANGE libfudata @cindex libraries of @command{awk} functions, user database, reading -@c STARTOFRANGE flibudata @cindex functions, library, user database@comma{} reading -@c STARTOFRANGE udatar @cindex user database@comma{} reading -@c STARTOFRANGE dataur @cindex database, users@comma{} reading @cindex @code{PROCINFO} array The @code{PROCINFO} array (@pxref{Built-in Variables}) provides access to the current user's real and effective user and group ID -numbers, and if available, the user's supplementary group set. +numbers, and, if available, the user's supplementary group set. 
However, because these are numbers, they do not provide very useful information to the average user. There needs to be some way to find the user information associated with the user and group ID numbers. This @@ -21550,7 +21429,7 @@ kept. Instead, it provides the @code{<pwd.h>} header file and several C language subroutines for obtaining user information. The primary function is @code{getpwent()}, for ``get password entry.'' The ``password'' comes from the original user database file, -@file{/etc/passwd}, which stores user information, along with the +@file{/etc/passwd}, which stores user information along with the encrypted passwords (hence the name). @cindex @command{pwcat} program @@ -21649,7 +21528,7 @@ The user's encrypted password. This may not be available on some systems. @item User-ID The user's numeric user ID number. -(On some systems, it's a C @code{long}, and not an @code{int}. Thus +(On some systems, it's a C @code{long}, and not an @code{int}. Thus, we cast it to @code{long} for all cases.) @item Group-ID @@ -21776,7 +21655,7 @@ The code that checks for using @code{FPAT}, using @code{using_fpat} and @code{PROCINFO["FS"]}, is similar. The main part of the function uses a loop to read database lines, split -the line into fields, and then store the line into each array as necessary. +the lines into fields, and then store the lines into each array as necessary. When the loop is done, @code{@w{_pw_init()}} cleans up by closing the pipeline, setting @code{@w{_pw_inited}} to one, and restoring @code{FS} (and @code{FIELDWIDTHS} or @code{FPAT} @@ -21871,21 +21750,13 @@ and such a change would clutter up the code. The @command{id} program in @DBREF{Id Program} uses these functions. 
-@c ENDOFRANGE libfudata -@c ENDOFRANGE flibudata -@c ENDOFRANGE udatar -@c ENDOFRANGE dataur @node Group Functions @section Reading the Group Database -@c STARTOFRANGE libfgdata @cindex libraries of @command{awk} functions, group database, reading -@c STARTOFRANGE flibgdata @cindex functions, library, group database@comma{} reading -@c STARTOFRANGE gdatar @cindex group database, reading -@c STARTOFRANGE datagr @cindex database, group, reading @cindex @code{PROCINFO} array, and group membership @cindex @code{getgrent()} function (C library) @@ -22001,7 +21872,7 @@ it is usually empty or set to @samp{*}. @item Group ID Number The group's numeric group ID number; the association of name to number must be unique within the file. -(On some systems it's a C @code{long}, and not an @code{int}. Thus +(On some systems it's a C @code{long}, and not an @code{int}. Thus, we cast it to @code{long} for all cases.) @item Group Member List @@ -22115,32 +21986,32 @@ The @code{@w{_gr_init()}} function first saves @code{FS}, @code{$0}, and then sets @code{FS} and @code{RS} to the correct values for scanning the group information. It also takes care to note whether @code{FIELDWIDTHS} or @code{FPAT} -is being used, and to restore the appropriate field splitting mechanism. +is being used, and to restore the appropriate field-splitting mechanism. -The group information is stored is several associative arrays. +The group information is stored in several associative arrays. The arrays are indexed by group name (@code{@w{_gr_byname}}), by group ID number (@code{@w{_gr_bygid}}), and by position in the database (@code{@w{_gr_bycount}}). There is an additional array indexed by username (@code{@w{_gr_groupsbyuser}}), which is a space-separated list of groups to which each user belongs. -Unlike the user database, it is possible to have multiple records in the +Unlike in the user database, it is possible to have multiple records in the database for the same group. 
This is common when a group has a large number of members. A pair of such entries might look like the following: @example -tvpeople:*:101:johny,jay,arsenio +tvpeople:*:101:johnny,jay,arsenio tvpeople:*:101:david,conan,tom,joan @end example For this reason, @code{_gr_init()} looks to see if a group name or -group ID number is already seen. If it is, the usernames are -simply concatenated onto the previous list of users.@footnote{There is actually a +group ID number is already seen. If so, the usernames are +simply concatenated onto the previous list of users.@footnote{There is a subtle problem with the code just presented. Suppose that the first time there were no names. This code adds the names with a leading comma. It also doesn't check that there is a @code{$4}.} Finally, @code{_gr_init()} closes the pipeline to @command{grcat}, restores -@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT} if necessary), @code{RS}, and @code{$0}, +@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT}, if necessary), @code{RS}, and @code{$0}, initializes @code{_gr_count} to zero (it is used later), and makes @code{_gr_inited} nonzero. @@ -22208,7 +22079,6 @@ function getgrent() @} @c endfile @end example -@c ENDOFRANGE clibf @cindex @code{endgrent()} function (C library) The @code{endgrent()} function resets @code{_gr_count} to zero so that @code{getgrent()} can @@ -22241,12 +22111,12 @@ uses these functions. @DBREF{Arrays of Arrays} described how @command{gawk} provides arrays of arrays. In particular, any element of -an array may be either a scalar, or another array. The +an array may be either a scalar or another array. The @code{isarray()} function (@pxref{Type Functions}) lets you distinguish an array from a scalar. The following function, @code{walk_array()}, recursively traverses -an array, printing each element's indices and value. +an array, printing the element indices and values. 
You call it with the array and a string representing the name of the array: @@ -22297,10 +22167,66 @@ $ @kbd{gawk -f walk_array.awk} @print{} a[4][2] = 42 @end example -@c ENDOFRANGE libfgdata -@c ENDOFRANGE flibgdata -@c ENDOFRANGE gdatar -@c ENDOFRANGE libf +The function just presented simply prints the +name and value of each scalar array element. However, it is easy to +generalize it, by passing in the name of a function to call +when walking an array. The modified function looks like this: + +@example +@c file eg/lib/processarray.awk +function process_array(arr, name, process, do_arrays, i, new_name) +@{ + for (i in arr) @{ + new_name = (name "[" i "]") + if (isarray(arr[i])) @{ + if (do_arrays) + @@process(new_name, arr[i]) + process_array(arr[i], new_name, process, do_arrays) + @} else + @@process(new_name, arr[i]) + @} +@} +@c endfile +@end example + +The arguments are as follows: + +@table @code +@item arr +The array. + +@item name +The name of the array (a string). + +@item process +The name of the function to call. + +@item do_arrays +If this is true, the function can handle elements that are subarrays. +@end table + +If subarrays are to be processed, that is done before walking them further. 
+ +When run with the following scaffolding, the function produces the same +results as does the earlier version of @code{walk_array()}: + +@example +BEGIN @{ + a[1] = 1 + a[2][1] = 21 + a[2][2] = 22 + a[3] = 3 + a[4][1][1] = 411 + a[4][2] = 42 + + process_array(a, "a", "do_print", 0) +@} + +function do_print(name, element) +@{ + printf "%s = %s\n", name, element +@} +@end example @node Library Functions Summary @section Summary @@ -22322,24 +22248,24 @@ The functions presented here fit into the following categories: @c nested list @table @asis @item General problems -Number-to-string conversion, assertions, rounding, random number +Number-to-string conversion, testing assertions, rounding, random number generation, converting characters to numbers, joining strings, getting easily usable time-of-day information, and reading a whole file in -one shot. +one shot @item Managing @value{DF}s Noting @value{DF} boundaries, rereading the current file, checking for readable files, checking for zero-length files, and treating assignments -as @value{FN}s. +as @value{FN}s @item Processing command-line options -An @command{awk} version of the standard C @code{getopt()} function. +An @command{awk} version of the standard C @code{getopt()} function @item Reading the user and group databases -Two sets of routines that parallel the C library versions. +Two sets of routines that parallel the C library versions @item Traversing arrays of arrays -A simple function to traverse an array of arrays to any depth. +Two functions that traverse an array of arrays to any depth @end table @c end nested list @@ -22414,13 +22340,9 @@ output identical to that of the original version. @end enumerate @c EXCLUDE END -@c ENDOFRANGE flib -@c ENDOFRANGE fudlib -@c ENDOFRANGE datagr @node Sample Programs @chapter Practical @command{awk} Programs -@c STARTOFRANGE awkpex @cindex @command{awk} programs, examples of @c FULLXREF ON @@ -22438,10 +22360,10 @@ in this @value{CHAPTER}. 
The second presents @command{awk} versions of several common POSIX utilities. These are programs that you are hopefully already familiar with, -and therefore, whose problems are understood. +and therefore whose problems are understood. By reimplementing these programs in @command{awk}, you can focus on the @command{awk}-related aspects of solving -the programming problem. +the programming problems. The third is a grab bag of interesting programs. These solve a number of different data-manipulation and management @@ -22490,7 +22412,6 @@ cut.awk -- -c1-8 myfiles > results @node Clones @section Reinventing Wheels for Fun and Profit -@c STARTOFRANGE posimawk @cindex POSIX, programs@comma{} implementing in @command{awk} This @value{SECTION} presents a number of POSIX utilities implemented in @@ -22502,7 +22423,7 @@ It should be noted that these programs are not necessarily intended to replace the installed versions on your system. Nor may all of these programs be fully compliant with the most recent POSIX standard. This is not a problem; their -purpose is to illustrate @command{awk} language programming for ``real world'' +purpose is to illustrate @command{awk} language programming for ``real-world'' tasks. The programs are presented in alphabetical order. @@ -22521,11 +22442,8 @@ The programs are presented in alphabetical order. @subsection Cutting Out Fields and Columns @cindex @command{cut} utility -@c STARTOFRANGE cut @cindex @command{cut} utility -@c STARTOFRANGE ficut @cindex fields, cutting -@c STARTOFRANGE colcut @cindex columns, cutting The @command{cut} utility selects, or ``cuts,'' characters or fields from its standard input and sends them to its standard output. @@ -22534,7 +22452,7 @@ but you may supply a command-line option to change the field @dfn{delimiter} (i.e., the field-separator character). @command{cut}'s definition of fields is less general than @command{awk}'s. 
-A common use of @command{cut} might be to pull out just the login name of +A common use of @command{cut} might be to pull out just the login names of logged-on users from the output of @command{who}. For example, the following pipeline generates a sorted, unique list of the logged-on users: @@ -22833,21 +22751,14 @@ other @command{awk} implementations to use @code{substr()} it is also extremely painful. The @code{FIELDWIDTHS} variable supplies an elegant solution to the problem of picking the input line apart by characters. -@c ENDOFRANGE cut -@c ENDOFRANGE ficut -@c ENDOFRANGE colcut @node Egrep Program @subsection Searching for Regular Expressions in Files -@c STARTOFRANGE regexps @cindex regular expressions, searching for -@c STARTOFRANGE sfregexp @cindex searching, files for regular expressions -@c STARTOFRANGE fsregexp @cindex files, searching for regular expressions -@c STARTOFRANGE egrep @cindex @command{egrep} utility The @command{egrep} utility searches files for patterns. It uses regular expressions that are almost identical to those available in @command{awk} @@ -23050,7 +22961,7 @@ successful or unsuccessful match. If the line does not match, the @code{next} statement just moves on to the next record. A number of additional tests are made, but they are only done if we -are not counting lines. First, if the user only wants exit status +are not counting lines. First, if the user only wants the exit status (@code{no_print} is true), then it is enough to know that @emph{one} line in this file matched, and we can skip on to the next file with @code{nextfile}. Similarly, if we are only printing @value{FN}s, we can @@ -23091,7 +23002,7 @@ if necessary: @end example The @code{END} rule takes care of producing the correct exit status. 
If -there are no matches, the exit status is one; otherwise it is zero: +there are no matches, the exit status is one; otherwise, it is zero: @example @c file eg/prog/egrep.awk @@ -23115,17 +23026,12 @@ function usage() @c endfile @end example -@c ENDOFRANGE regexps -@c ENDOFRANGE sfregexp -@c ENDOFRANGE fsregexp -@c ENDOFRANGE egrep @node Id Program @subsection Printing Out User Information @cindex printing, user information @cindex users, information about, printing -@c STARTOFRANGE id @cindex @command{id} utility The @command{id} utility lists a user's real and effective user ID numbers, real and effective group ID numbers, and the user's group set, if any. @@ -23148,7 +23054,8 @@ Here is a simple version of @command{id} written in @command{awk}. It uses the user database library functions (@pxref{Passwd Functions}) and the group database library functions -(@pxref{Group Functions}): +(@pxref{Group Functions}) +from @ref{Library Functions}. The program is fairly straightforward. All the work is done in the @code{BEGIN} rule. The user and group ID numbers are obtained from @@ -23254,16 +23161,13 @@ code that is used repeatedly, making the whole program shorter and cleaner. In particular, moving the check for the empty string into this function saves several lines of code. -@c ENDOFRANGE id @node Split Program @subsection Splitting a Large File into Pieces @c FIXME: One day, update to current POSIX version of split -@c STARTOFRANGE filspl @cindex files, splitting -@c STARTOFRANGE split @cindex @code{split} utility The @command{split} program splits large text files into smaller pieces. Usage is as follows:@footnote{This is the traditional usage. The @@ -23278,8 +23182,8 @@ By default, the output files are named @file{xaa}, @file{xab}, and so on. Each file has 1,000 lines in it, with the likely exception of the last file. 
To change the number of lines in each file, supply a number on the command line -preceded with a minus (e.g., @samp{-500} for files with 500 lines in them -instead of 1,000). To change the name of the output files to something like +preceded with a minus sign (e.g., @samp{-500} for files with 500 lines in them +instead of 1,000). To change the names of the output files to something like @file{myfileaa}, @file{myfileab}, and so on, supply an additional argument that specifies the @value{FN} prefix. @@ -23398,15 +23302,12 @@ You might want to consider how to eliminate the use of way as to solve the EBCDIC issue as well. @end ifset -@c ENDOFRANGE filspl -@c ENDOFRANGE split @node Tee Program @subsection Duplicating Output into Multiple Files @cindex files, multiple@comma{} duplicating output into @cindex output, duplicating into files -@c STARTOFRANGE tee @cindex @code{tee} utility The @code{tee} program is known as a ``pipe fitting.'' @code{tee} copies its standard input to its standard output and also duplicates it to the @@ -23519,18 +23420,14 @@ END @{ @} @c endfile @end example -@c ENDOFRANGE tee @node Uniq Program @subsection Printing Nonduplicated Lines of Text @c FIXME: One day, update to current POSIX version of uniq -@c STARTOFRANGE prunt @cindex printing, unduplicated lines of text -@c STARTOFRANGE tpul @cindex text@comma{} printing, unduplicated lines of -@c STARTOFRANGE uniq @cindex @command{uniq} utility The @command{uniq} utility reads sorted lines of data on its standard input, and by default removes duplicate lines. In other words, it only @@ -23799,26 +23696,17 @@ suggestion. 
@end ifset -@c ENDOFRANGE prunt -@c ENDOFRANGE tpul -@c ENDOFRANGE uniq @node Wc Program @subsection Counting Things @c FIXME: One day, update to current POSIX version of wc -@c STARTOFRANGE count @cindex counting -@c STARTOFRANGE infco @cindex input files, counting elements in -@c STARTOFRANGE woco @cindex words, counting -@c STARTOFRANGE chco @cindex characters, counting -@c STARTOFRANGE lico @cindex lines, counting -@c STARTOFRANGE wc @cindex @command{wc} utility The @command{wc} (word count) utility counts lines, words, and characters in one or more input files. Its usage is as follows: @@ -23988,13 +23876,6 @@ END @{ @} @c endfile @end example -@c ENDOFRANGE count -@c ENDOFRANGE infco -@c ENDOFRANGE lico -@c ENDOFRANGE woco -@c ENDOFRANGE chco -@c ENDOFRANGE wc -@c ENDOFRANGE posimawk @node Miscellaneous Programs @section A Grab Bag of @command{awk} Programs @@ -24125,9 +24006,7 @@ Aharon Robbins <arnold@skeeve.com> wrote: @author Erik Quanstrom @end quotation -@c STARTOFRANGE tialarm @cindex time, alarm clock example program -@c STARTOFRANGE alaex @cindex alarm clock example program The following program is a simple ``alarm clock'' program. You give it a time of day and an optional message. At the specified time, @@ -24143,7 +24022,7 @@ checking and setting of defaults: the delay, the count, and the message to print. If the user supplied a message without the ASCII BEL character (known as the ``alert'' character, @code{"\a"}), then it is added to the message. (On many systems, printing the ASCII BEL generates an -audible alert. Thus when the alarm goes off, the system calls attention +audible alert. Thus, when the alarm goes off, the system calls attention to itself in case the user is not looking at the computer.) 
Just for a change, this program uses a @code{switch} statement (@pxref{Switch Statement}), but the processing could be done with a series of @@ -24279,15 +24158,11 @@ seconds are necessary: @} @c endfile @end example -@c ENDOFRANGE tialarm -@c ENDOFRANGE alaex @node Translate Program @subsection Transliterating Characters -@c STARTOFRANGE chtra @cindex characters, transliterating -@c STARTOFRANGE tr @cindex @command{tr} utility The system @command{tr} utility transliterates characters. For example, it is often used to map uppercase letters into lowercase for further processing: @@ -24316,7 +24191,7 @@ to @command{gawk}. @c at least theoretically The following program was written to prove that character transliteration could be done with a user-level -function. This program is not as complete as the system @command{tr} utility +function. This program is not as complete as the system @command{tr} utility, but it does most of the job. The @command{translate} program was written long before @command{gawk} @@ -24328,13 +24203,13 @@ takes three arguments: @table @code @item from -A list of characters from which to translate. +A list of characters from which to translate @item to -A list of characters to which to translate. +A list of characters to which to translate @item target -The string on which to do the translation. +The string on which to do the translation @end table Associative arrays make the translation part fairly easy. @code{t_ar} holds @@ -24343,7 +24218,7 @@ loop goes through @code{from}, one character at a time. For each character in @code{from}, if the character appears in @code{target}, it is replaced with the corresponding @code{to} character. -The @code{translate()} function calls @code{stranslate()} using @code{$0} +The @code{translate()} function calls @code{stranslate()}, using @code{$0} as the target. 
The main program sets two global variables, @code{FROM} and @code{TO}, from the command line, and then changes @code{ARGV} so that @command{awk} reads from the standard input. @@ -24365,7 +24240,7 @@ Finally, the processing rule simply calls @code{translate()} for each record: @c endfile @end ignore @c file eg/prog/translate.awk -# Bugs: does not handle things like: tr A-Z a-z, it has +# Bugs: does not handle things like tr A-Z a-z; it has # to be spelled out. However, if `to' is shorter than `from', # the last character in `to' is used for the rest of `from'. @@ -24435,17 +24310,13 @@ such as @samp{a-z}, as allowed by the @command{tr} utility. Look at the code for @file{cut.awk} (@pxref{Cut Program}) for inspiration. -@c ENDOFRANGE chtra -@c ENDOFRANGE tr @node Labels Program @subsection Printing Mailing Labels -@c STARTOFRANGE prml @cindex printing, mailing labels -@c STARTOFRANGE mlprint @cindex mailing labels@comma{} printing -Here is a ``real world''@footnote{``Real world'' is defined as +Here is a ``real-world''@footnote{``Real world'' is defined as ``a program actually used to get something done.''} program. This script reads lists of names and @@ -24454,7 +24325,7 @@ on it, two across and 10 down. The addresses are guaranteed to be no more than five lines of data. Each address is separated from the next by a blank line. -The basic idea is to read 20 labels worth of data. Each line of each label +The basic idea is to read 20 labels' worth of data. Each line of each label is stored in the @code{line} array. The single rule takes care of filling the @code{line} array and printing the page when 20 labels have been read. @@ -24477,12 +24348,12 @@ of lines on the page Most of the work is done in the @code{printpage()} function. The label lines are stored sequentially in the @code{line} array. 
But they -have to print horizontally; @code{line[1]} next to @code{line[6]}, +have to print horizontally: @code{line[1]} next to @code{line[6]}, @code{line[2]} next to @code{line[7]}, and so on. Two loops accomplish this. The outer loop, controlled by @code{i}, steps through every 10 lines of data; this is each row of labels. The inner loop, controlled by @code{j}, goes through the lines within the row. -As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}-th line in +As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}th line in the row, and @samp{i+j+5} is the entry next to it. The output ends up looking something like this: @@ -24507,7 +24378,6 @@ that there are two blank lines at the top and two blank lines at the bottom. The @code{END} rule arranges to flush the final page of labels; there may not have been an even multiple of 20 labels in the data: -@c STARTOFRANGE labels @cindex @code{labels.awk} program @example @c file eg/prog/labels.awk @@ -24572,14 +24442,10 @@ END @{ @} @c endfile @end example -@c ENDOFRANGE prml -@c ENDOFRANGE mlprint -@c ENDOFRANGE labels @node Word Sorting @subsection Generating Word-Usage Counts -@c STARTOFRANGE worus @cindex words, usage counts@comma{} generating When working with large amounts of text, it can be interesting to know @@ -24605,8 +24471,8 @@ END @{ @} @end example -The program relies on @command{awk}'s default field splitting -mechanism to break each line up into ``words,'' and uses an +The program relies on @command{awk}'s default field-splitting +mechanism to break each line up into ``words'' and uses an associative array named @code{freq}, indexed by each word, to count the number of times the word occurs. In the @code{END} rule, it prints the counts. @@ -24641,7 +24507,6 @@ to remove punctuation characters. Finally, we solve the third problem by using the system @command{sort} utility to process the output of the @command{awk} script. 
Here is the new version of the program: -@c STARTOFRANGE wordfreq @cindex @code{wordfreq.awk} program @example @c file eg/prog/wordfreq.awk @@ -24706,16 +24571,13 @@ This way of sorting must be used on systems that do not have true pipes at the command-line (or batch-file) level. See the general operating system documentation for more information on how to use the @command{sort} program. -@c ENDOFRANGE worus -@c ENDOFRANGE wordfreq @node History Sorting @subsection Removing Duplicates from Unsorted Text -@c STARTOFRANGE lidu @cindex lines, duplicate@comma{} removing The @command{uniq} program -(@pxref{Uniq Program}), +(@pxref{Uniq Program}) removes duplicate lines from @emph{sorted} data. Suppose, however, you need to remove duplicate lines from a @value{DF} but @@ -24737,7 +24599,6 @@ Each element of @code{lines} is a unique command, and the indices of The @code{END} rule simply prints out the lines, in order: @cindex Rakitzis, Byron -@c STARTOFRANGE histsort @cindex @code{histsort.awk} program @example @c file eg/prog/histsort.awk @@ -24780,15 +24641,11 @@ print data[lines[i]], lines[i] @noindent This works because @code{data[$0]} is incremented each time a line is seen. -@c ENDOFRANGE lidu -@c ENDOFRANGE histsort @node Extract Program @subsection Extracting Programs from Texinfo Source Files -@c STARTOFRANGE texse @cindex Texinfo, extracting programs from source files -@c STARTOFRANGE fitex @cindex files, Texinfo@comma{} extracting programs from @ifnotinfo Both this chapter and the previous chapter @@ -24807,7 +24664,7 @@ Texinfo input file into separate files. @cindex Texinfo This @value{DOCUMENT} is written in @uref{http://www.gnu.org/software/texinfo/, Texinfo}, -the GNU project's document formatting language. +the GNU Project's document formatting language. A single Texinfo source file can be used to produce both printed documentation, with @TeX{}, and online documentation. 
@ifnotinfo @@ -24866,7 +24723,7 @@ The Texinfo file looks something like this: @example @dots{} -This program has a @@code@{BEGIN@} rule, +This program has a @@code@{BEGIN@} rule that prints a nice message: @@example @@ -24892,11 +24749,10 @@ The first rule handles calling @code{system()}, checking that a command is given (@code{NF} is at least three) and also checking that the command exits with a zero exit status, signifying OK: -@c STARTOFRANGE extract @cindex @code{extract.awk} program @example @c file eg/prog/extract.awk -# extract.awk --- extract files and run programs from texinfo files +# extract.awk --- extract files and run programs from Texinfo files @c endfile @ignore @c file eg/prog/extract.awk @@ -24937,12 +24793,12 @@ The second rule handles moving data into files. It verifies that a @value{FN} is given in the directive. If the file named is not the current file, then the current file is closed. Keeping the current file open until a new file is encountered allows the use of the @samp{>} -redirection for printing the contents, keeping open file management +redirection for printing the contents, keeping open-file management simple. The @code{for} loop does the work. It reads lines using @code{getline} (@pxref{Getline}). -For an unexpected end of file, it calls the @code{@w{unexpected_eof()}} +For an unexpected end-of-file, it calls the @code{@w{unexpected_eof()}} function. If the line is an ``endfile'' line, then it breaks out of the loop. If the line is an @samp{@@group} or @samp{@@end group} line, then it @@ -25038,16 +24894,13 @@ END @{ @} @c endfile @end example -@c ENDOFRANGE texse -@c ENDOFRANGE fitex -@c ENDOFRANGE extract @node Simple Sed @subsection A Simple Stream Editor @cindex @command{sed} utility @cindex stream editors -The @command{sed} utility is a stream editor, a program that reads a +The @command{sed} utility is a @dfn{stream editor}, a program that reads a stream of data, makes changes to it, and passes it on. 
It is often used to make global changes to a large file or to a stream of data generated by a pipeline of commands. @@ -25070,7 +24923,6 @@ additional arguments are treated as @value{DF} names to process. If none are provided, the standard input is used: @cindex Brennan, Michael -@c STARTOFRANGE awksed @cindex @command{awksed.awk} program @c @cindex simple stream editor @c @cindex stream editor, simple @@ -25147,14 +24999,11 @@ The @code{usage()} function prints an error message and exits. Finally, the single rule handles the printing scheme outlined earlier, using @code{print} or @code{printf} as appropriate, depending upon the value of @code{RT}. -@c ENDOFRANGE awksed @node Igawk Program @subsection An Easy Way to Use Library Functions -@c STARTOFRANGE libfex @cindex libraries of @command{awk} functions, example program for using -@c STARTOFRANGE flibex @cindex functions, library, example program for using In @ref{Include Files}, we saw how @command{gawk} provides a built-in file-inclusion capability. However, this is a @command{gawk} extension. @@ -25196,7 +25045,7 @@ includes don't accidentally include a library function twice. @command{igawk} should behave just like @command{gawk} externally. This means it should accept all of @command{gawk}'s command-line arguments, including the ability to have multiple source files specified via -@option{-f}, and the ability to mix command-line and library source files. +@option{-f} and the ability to mix command-line and library source files. The program is written using the POSIX Shell (@command{sh}) command language.@footnote{Fully explaining the @command{sh} language is beyond @@ -25235,7 +25084,7 @@ Run the expanded program with @command{gawk} and any other original command-line arguments that the user supplied (such as the @value{DF} names). 
@end enumerate -This program uses shell variables extensively: for storing command-line arguments, +This program uses shell variables extensively: for storing command-line arguments and the text of the @command{awk} program that will expand the user's program, for the user's original program, and for the expanded program. Doing so removes some potential problems that might arise were we to use temporary files instead, @@ -25293,7 +25142,6 @@ program. The program is as follows: -@c STARTOFRANGE igawk @cindex @code{igawk.sh} program @example @c file eg/prog/igawk.sh @@ -25553,22 +25401,7 @@ Save the results of this processing in the shell variable The last step is to call @command{gawk} with the expanded program, along with the original -options and command-line arguments that the user supplied. - -@c this causes more problems than it solves, so leave it out. -@ignore -The special file @file{/dev/null} is passed as a @value{DF} to @command{gawk} -to handle an interesting case. Suppose that the user's program only has -a @code{BEGIN} rule and there are no @value{DF}s to read. -The program should exit without reading any @value{DF}s. -However, suppose that an included library file defines an @code{END} -rule of its own. In this case, @command{gawk} will hang, reading standard -input. In order to avoid this, @file{/dev/null} is explicitly added to the -command line. Reading from @file{/dev/null} always returns an immediate -end of file indication. - -@c Hmm. Add /dev/null if $# is 0? Still messes up ARGV. Sigh. 
-@end ignore +options and command-line arguments that the user supplied: @example @c file eg/prog/igawk.sh @@ -25618,10 +25451,6 @@ features to a program; they can often be layered on top.@footnote{@command{gawk} does @code{@@include} processing itself in order to support the use of @command{awk} programs as Web CGI scripts.} -@c ENDOFRANGE libfex -@c ENDOFRANGE flibex -@c ENDOFRANGE awkpex -@c ENDOFRANGE igawk @node Anagram Program @subsection Finding Anagrams from a Dictionary @@ -25638,19 +25467,18 @@ the same letters Column 2, Problem C, of Jon Bentley's @cite{Programming Pearls}, Second Edition, presents an elegant algorithm. The idea is to give words that are anagrams a common signature, sort all the words together by their -signature, and then print them. Dr.@: Bentley observes that taking the -letters in each word and sorting them produces that common signature. +signatures, and then print them. Dr.@: Bentley observes that taking the +letters in each word and sorting them produces those common signatures. The following program uses arrays of arrays to bring together words with the same signature and array sorting to print the words in sorted order: -@c STARTOFRANGE anagram @cindex @code{anagram.awk} program @example @c file eg/prog/anagram.awk -# anagram.awk --- An implementation of the anagram finding algorithm -# from Jon Bentley's "Programming Pearls", 2nd edition. +# anagram.awk --- An implementation of the anagram-finding algorithm +# from Jon Bentley's "Programming Pearls," 2nd edition. # Addison Wesley, 2000, ISBN 0-201-65788-0. # Column 2, Problem C, section 2.8, pp 18-20. 
@c endfile @@ -25698,7 +25526,7 @@ sorts the letters, and then joins them back together: @example @c file eg/prog/anagram.awk -# word2key --- split word apart into letters, sort, joining back together +# word2key --- split word apart into letters, sort, and join back together function word2key(word, a, i, n, result) @{ @@ -25754,7 +25582,6 @@ babery yabber @dots{} @end example -@c ENDOFRANGE anagram @node Signature Program @subsection And Now for Something Completely Different @@ -25894,12 +25721,13 @@ characters. The ability to use @code{split()} with the empty string as the separator can considerably simplify such tasks. @item -The library functions from @ref{Library Functions}, proved their -usefulness for a number of real (if small) programs. +The examples here demonstrate the usefulness of the library +functions from @DBREF{Library Functions} +for a number of real (if small) programs. @item Besides reinventing POSIX wheels, other programs solved a selection of -interesting problems, such as finding duplicates words in text, printing +interesting problems, such as finding duplicate words in text, printing mailing labels, and finding anagrams. @end itemize @@ -26074,9 +25902,7 @@ It contains the following chapters: @node Advanced Features @chapter Advanced Features of @command{gawk} -@c STARTOFRANGE gawadv @cindex @command{gawk}, features, advanced -@c STARTOFRANGE advgaw @cindex advanced features, @command{gawk} @ignore Contributed by: Peter Langston <pud!psl@bellcore.bellcore.com> @@ -26097,18 +25923,18 @@ a violent psychopath who knows where you live.} This @value{CHAPTER} discusses advanced features in @command{gawk}. It's a bit of a ``grab bag'' of items that are otherwise unrelated to each other. -First, a command-line option allows @command{gawk} to recognize +First, we look at a command-line option that allows @command{gawk} to recognize nondecimal numbers in input data, not just in @command{awk} programs. 
Then, @command{gawk}'s special features for sorting arrays are presented. Next, two-way I/O, discussed briefly in earlier parts of this @value{DOCUMENT}, is described in full detail, along with the basics -of TCP/IP networking. Finally, @command{gawk} +of TCP/IP networking. Finally, we see how @command{gawk} can @dfn{profile} an @command{awk} program, making it possible to tune it for performance. @c FULLXREF ON -A number of advanced features require separate @value{CHAPTER}s of their +Additional advanced features are discussed in separate @value{CHAPTER}s of their own: @itemize @value{BULLET} @@ -26202,7 +26028,8 @@ This option may disappear in a future version of @command{gawk}. @node Array Sorting @section Controlling Array Traversal and Array Sorting -@command{gawk} lets you control the order in which a @samp{for (i in array)} +@command{gawk} lets you control the order in which a +@samp{for (@var{indx} in @var{array})} loop traverses an array. In addition, two built-in functions, @code{asort()} and @code{asorti()}, @@ -26218,7 +26045,7 @@ to order the elements during sorting. @node Controlling Array Traversal @subsection Controlling Array Traversal -By default, the order in which a @samp{for (i in array)} loop +By default, the order in which a @samp{for (@var{indx} in @var{array})} loop scans an array is not defined; it is generally based upon the internal implementation of arrays inside @command{awk}. @@ -26247,23 +26074,23 @@ function comp_func(i1, v1, i2, v2) @} @end example -Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2} +Here, @code{i1} and @code{i2} are the indices, and @code{v1} and @code{v2} are the corresponding values of the two elements being compared. -Either @var{v1} or @var{v2}, or both, can be arrays if the array being +Either @code{v1} or @code{v2}, or both, can be arrays if the array being traversed contains subarrays as values. (@DBXREF{Arrays of Arrays} for more information about subarrays.) 
The three possible return values are interpreted as follows: @table @code @item comp_func(i1, v1, i2, v2) < 0 -Index @var{i1} comes before index @var{i2} during loop traversal. +Index @code{i1} comes before index @code{i2} during loop traversal. @item comp_func(i1, v1, i2, v2) == 0 -Indices @var{i1} and @var{i2} -come together but the relative order with respect to each other is undefined. +Indices @code{i1} and @code{i2} +come together, but the relative order with respect to each other is undefined. @item comp_func(i1, v1, i2, v2) > 0 -Index @var{i1} comes after index @var{i2} during loop traversal. +Index @code{i1} comes after index @code{i2} during loop traversal. @end table Our first comparison function can be used to scan an array in @@ -26424,7 +26251,7 @@ As already mentioned, the order of the indices is arbitrary if two elements compare equal. This is usually not a problem, but letting the tied elements come out in arbitrary order can be an issue, especially when comparing item values. The partial ordering of the equal elements -may change the next time the array is traversed, if other elements are added or +may change the next time the array is traversed, if other elements are added to or removed from the array. One way to resolve ties when comparing elements with otherwise equal values is to include the indices in the comparison rules. Note that doing this may make the loop traversal less efficient, @@ -26467,7 +26294,7 @@ equivalent or distinct. Another point to keep in mind is that in the case of subarrays, the element values can themselves be arrays; a production comparison function should use the @code{isarray()} function -(@pxref{Type Functions}), +(@pxref{Type Functions}) to check for this, and choose a defined sorting order for subarrays. All sorting based on @code{PROCINFO["sorted_in"]} @@ -26475,7 +26302,7 @@ is disabled in POSIX mode, because the @code{PROCINFO} array is not special in that case. 
As a side note, sorting the array indices before traversing -the array has been reported to add 15% to 20% overhead to the +the array has been reported to add a 15% to 20% overhead to the execution time of @command{awk} programs. For this reason, sorted array traversal is not the default. @@ -26534,7 +26361,7 @@ However, the @code{source} array is not affected. Often, what's needed is to sort on the values of the @emph{indices} instead of the values of the elements. To do that, use the @code{asorti()} function. The interface and behavior are identical to -that of @code{asort()}, except that the index values are used for sorting, +that of @code{asort()}, except that the index values are used for sorting and become the values of the result array: @example @@ -26569,8 +26396,8 @@ it chooses}, taking into account just the indices, just the values, or both. This is extremely powerful. Once the array is sorted, @code{asort()} takes the @emph{values} in -their final order, and uses them to fill in the result array, whereas -@code{asorti()} takes the @emph{indices} in their final order, and uses +their final order and uses them to fill in the result array, whereas +@code{asorti()} takes the @emph{indices} in their final order and uses them to fill in the result array. @cindex reference counting, sorting arrays @@ -26786,7 +26613,6 @@ using regular pipes. @section Using @command{gawk} for Network Programming @cindex advanced features, network programming @cindex networks, programming -@c STARTOFRANGE tcpip @cindex TCP/IP @cindex @code{/inet/@dots{}} special files (@command{gawk}) @cindex files, @code{/inet/@dots{}} (@command{gawk}) @@ -26868,7 +26694,7 @@ service name. @cindex @command{gawk}, @code{ERRNO} variable in @cindex @code{ERRNO} variable @quotation NOTE -Failure in opening a two-way socket will result in a non-fatal error +Failure in opening a two-way socket will result in a nonfatal error being returned to the calling code. 
The value of @code{ERRNO} indicates the error (@pxref{Auto-set}). @end quotation @@ -26885,31 +26711,28 @@ BEGIN @{ @end example This program reads the current date and time from the local system's -TCP @samp{daytime} server. +TCP @code{daytime} server. It then prints the results and closes the connection. Because this topic is extensive, the use of @command{gawk} for TCP/IP programming is documented separately. @ifinfo See -@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}}, +@inforef{Top, , General Introduction, gawkinet, @value{GAWKINETTITLE}}, @end ifinfo @ifnotinfo See @uref{http://www.gnu.org/software/gawk/manual/gawkinet/, -@cite{TCP/IP Internetworking with @command{gawk}}}, +@cite{@value{GAWKINETTITLE}}}, which comes as part of the @command{gawk} distribution, @end ifnotinfo for a much more complete introduction and discussion, as well as extensive examples. -@c ENDOFRANGE tcpip @node Profiling @section Profiling Your @command{awk} Programs -@c STARTOFRANGE awkp @cindex @command{awk} programs, profiling -@c STARTOFRANGE proawk @cindex profiling @command{awk} programs @cindex @code{awkprof.out} file @cindex files, @code{awkprof.out} @@ -26976,9 +26799,9 @@ junk @end example Here is the @file{awkprof.out} that results from running the -@command{gawk} profiler on this program and data. (This example also +@command{gawk} profiler on this program and data (this example also illustrates that @command{awk} programmers sometimes get up very early -in the morning to work.) +in the morning to work): @cindex @code{BEGIN} pattern, and profiling @cindex @code{END} pattern, and profiling @@ -27038,8 +26861,8 @@ They are as follows: @item The program is printed in the order @code{BEGIN} rules, @code{BEGINFILE} rules, -pattern/action rules, -@code{ENDFILE} rules, @code{END} rules and functions, listed +pattern--action rules, +@code{ENDFILE} rules, @code{END} rules, and functions, listed alphabetically. 
Multiple @code{BEGIN} and @code{END} rules retain their separate identities, as do @@ -27047,7 +26870,7 @@ multiple @code{BEGINFILE} and @code{ENDFILE} rules. @cindex patterns, counts, in a profile @item -Pattern-action rules have two counts. +Pattern--action rules have two counts. The first count, to the left of the rule, shows how many times the rule's pattern was @emph{tested}. The second count, to the right of the rule's opening left brace @@ -27114,13 +26937,13 @@ the target of a redirection isn't a scalar, it gets parenthesized. @command{gawk} supplies leading comments in front of the @code{BEGIN} and @code{END} rules, the @code{BEGINFILE} and @code{ENDFILE} rules, -the pattern/action rules, and the functions. +the pattern--action rules, and the functions. @end itemize The profiled version of your program may not look exactly like what you typed when you wrote it. This is because @command{gawk} creates the -profiled version by ``pretty printing'' its internal representation of +profiled version by ``pretty-printing'' its internal representation of the program. The advantage to this is that @command{gawk} can produce a standard representation. Also, things such as: @@ -27203,16 +27026,16 @@ If you use the @code{HUP} signal instead of the @code{USR1} signal, @cindex @code{SIGQUIT} signal (MS-Windows) @cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows) When @command{gawk} runs on MS-Windows systems, it uses the -@code{INT} and @code{QUIT} signals for producing the profile and, in +@code{INT} and @code{QUIT} signals for producing the profile, and in the case of the @code{INT} signal, @command{gawk} exits. This is because these systems don't support the @command{kill} command, so the only signals you can deliver to a program are those generated by the keyboard. The @code{INT} signal is generated by the -@kbd{Ctrl-@key{C}} or @kbd{Ctrl-@key{BREAK}} key, while the -@code{QUIT} signal is generated by the @kbd{Ctrl-@key{\}} key. 
+@kbd{Ctrl-c} or @kbd{Ctrl-BREAK} key, while the +@code{QUIT} signal is generated by the @kbd{Ctrl-\} key. Finally, @command{gawk} also accepts another option, @option{--pretty-print}. -When called this way, @command{gawk} ``pretty prints'' the program into +When called this way, @command{gawk} ``pretty-prints'' the program into @file{awkprof.out}, without any execution counts. @quotation NOTE @@ -27236,9 +27059,6 @@ that the profiling output does. This makes it easy to pretty-print your code once development is completed, and then use the result as the final version of your program. -@c ENDOFRANGE awkp -@c ENDOFRANGE proawk - @node Advanced Features Summary @section Summary @@ -27269,7 +27089,7 @@ optionally, close off one side of the two-way communications. @item By using special @value{FN}s with the @samp{|&} operator, you can open a -TCP/IP (or UDP/IP) connection to remote hosts in the Internet. @command{gawk} +TCP/IP (or UDP/IP) connection to remote hosts on the Internet. @command{gawk} supports both IPv4 and IPv6. @item @@ -27279,13 +27099,11 @@ you tune them more easily. Sending the @code{USR1} signal while profiling cause @command{gawk} to dump the profile and keep going, including a function call stack. @item -You can also just ``pretty print'' the program. This currently also runs +You can also just ``pretty-print'' the program. This currently also runs the program, but that will change in the next major release. @end itemize -@c ENDOFRANGE advgaw -@c ENDOFRANGE gawadv @node Internationalization @chapter Internationalization with @command{gawk} @@ -27298,7 +27116,6 @@ countries, they were able to sell more systems. As a result, internationalization and localization of programs and software systems became a common practice. 
-@c STARTOFRANGE inloc @cindex internationalization, localization @cindex @command{gawk}, internationalization and, See internationalization @cindex internationalization, localization, @command{gawk} and @@ -27331,7 +27148,7 @@ a requirement. @cindex localization @dfn{Internationalization} means writing (or modifying) a program once, in such a way that it can use multiple languages without requiring -further source-code changes. +further source code changes. @dfn{Localization} means providing the data necessary for an internationalized program to work in a particular language. Most typically, these terms refer to features such as the language @@ -27343,11 +27160,10 @@ monetary values are printed and read. @section GNU @command{gettext} @cindex internationalizing a program -@c STARTOFRANGE gettex @cindex @command{gettext} library @command{gawk} uses GNU @command{gettext} to provide its internationalization features. -The facilities in GNU @command{gettext} focus on messages; strings printed +The facilities in GNU @command{gettext} focus on messages: strings printed by a program, either directly or via formatting with @code{printf} or @code{sprintf()}.@footnote{For some operating systems, the @command{gawk} port doesn't support GNU @command{gettext}. @@ -27395,7 +27211,6 @@ lookup of the translations. @cindex @code{.po} files @cindex files, @code{.po} -@c STARTOFRANGE portobfi @cindex portable object files @cindex files, portable object @item @@ -27407,7 +27222,6 @@ For example, there might be a @file{fr.po} for a French translation. @cindex @code{.gmo} files @cindex files, @code{.gmo} @cindex message object files -@c STARTOFRANGE portmsgfi @cindex files, message object @item Each language's @file{.po} file is converted into a binary @@ -27535,14 +27349,12 @@ before or after the day in a date, local month abbreviations, and so on. @item LC_ALL All of the above. (Not too useful in the context of @command{gettext}.) 
@end table -@c ENDOFRANGE gettex @node Programmer i18n @section Internationalizing @command{awk} Programs -@c STARTOFRANGE inap @cindex @command{awk} programs, internationalizing -@command{gawk} provides the following variables and functions for +@command{gawk} provides the following variables for internationalization: @table @code @@ -27558,7 +27370,12 @@ value is @code{"messages"}. String constants marked with a leading underscore are candidates for translation at runtime. String constants without a leading underscore are not translated. +@end table + +@command{gawk} provides the following functions for +internationalization: +@table @code @cindexgawkfunc{dcgettext} @item @code{dcgettext(@var{string}} [@code{,} @var{domain} [@code{,} @var{category}]]@code{)} Return the translation of @var{string} in @@ -27615,15 +27432,7 @@ If @var{directory} is the null string (@code{""}), then given @var{domain}. @end table -To use these facilities in your @command{awk} program, follow the steps -outlined in -@ifnotinfo -the previous @value{SECTION}, -@end ifnotinfo -@ifinfo -@ref{Explaining gettext}, -@end ifinfo -like so: +To use these facilities in your @command{awk} program, follow these steps: @enumerate @cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and @@ -27772,8 +27581,6 @@ to provide you translations that you can also then distribute. @DBXREF{I18N Example} for the full list of steps to go through to create and test translations for @command{guide}. -@c ENDOFRANGE portobfi -@c ENDOFRANGE portmsgfi @node Printf Ordering @subsection Rearranging @code{printf} Arguments @@ -27908,7 +27715,7 @@ the null string (@code{""}) as its value, leaving the original string constant a the result. 
@item -By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()} +By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()}, and @code{bindtextdomain()}, the @command{awk} program can be made to run, but all the messages are output in the original language. For example: @@ -27949,7 +27756,6 @@ However, because the positional specifications are primarily for use in @emph{translated} format strings, and because non-GNU @command{awk}s never retrieve the translated string, this should not be a problem in practice. @end itemize -@c ENDOFRANGE inap @node I18N Example @section A Simple Internationalization Example @@ -28093,15 +27899,15 @@ using the GNU @command{gettext} package. (GNU @command{gettext} is described in complete detail in @ifinfo -@inforef{Top, , GNU @command{gettext} utilities, gettext, GNU gettext tools}.) +@inforef{Top, , GNU @command{gettext} utilities, gettext, GNU @command{gettext} utilities}.) @end ifinfo @ifnotinfo @uref{http://www.gnu.org/software/gettext/manual/, -@cite{GNU gettext tools}}.) +@cite{GNU @command{gettext} utilities}}.) @end ifnotinfo As of this writing, the latest version of GNU @command{gettext} is -@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.19.3.tar.gz, -@value{PVERSION} 0.19.3}. +@uref{ftp://ftp.gnu.org/gnu/gettext/gettext-0.19.4.tar.gz, +@value{PVERSION} 0.19.4}. If a translation of @command{gawk}'s messages exists, then @command{gawk} produces usage messages, warnings, @@ -28113,7 +27919,7 @@ and fatal errors in the local language. @itemize @value{BULLET} @item Internationalization means writing a program such that it can use multiple -languages without requiring source-code changes. Localization means +languages without requiring source code changes. Localization means providing the data necessary for an internationalized program to work in a particular language. @@ -28130,9 +27936,9 @@ file, and the @file{.po} files are compiled into @file{.gmo} files for use at runtime. 
@item -You can use position specifications with @code{sprintf()} and +You can use positional specifications with @code{sprintf()} and @code{printf} to rearrange the placement of argument values in formatted -strings and output. This is useful for the translations of format +strings and output. This is useful for the translation of format control strings. @item @@ -28145,7 +27951,6 @@ a number of translations for its messages. @end itemize -@c ENDOFRANGE inloc @node Debugger @chapter Debugging @command{awk} Programs @@ -28189,8 +27994,7 @@ the discussion of debugging in @command{gawk}. @subsection Debugging in General (If you have used debuggers in other languages, you may want to skip -ahead to the next section on the specific features of the @command{gawk} -debugger.) +ahead to @ref{Awk Debugging}.) Of course, a debugging program cannot remove bugs for you, because it has no way of knowing what you or your users consider a ``bug'' versus a @@ -28281,10 +28085,10 @@ and usually find the errant code quite quickly. @end table @node Awk Debugging -@subsection Awk Debugging +@subsection @command{awk} Debugging Debugging an @command{awk} program has some specific aspects that are -not shared with other programming languages. +not shared with programs written in other languages. First of all, the fact that @command{awk} programs usually take input line by line from a file or files and operate on those lines using specific @@ -28300,7 +28104,7 @@ to look at the individual primitive instructions carried out by the higher-level @command{awk} commands. @node Sample Debugging Session -@section Sample Debugging Session +@section Sample @command{gawk} Debugging Session @cindex sample debugging session In order to illustrate the use of @command{gawk} as a debugger, let's look at a sample @@ -28319,8 +28123,8 @@ as our example. 
@cindex debugger, how to start Starting the debugger is almost exactly like running @command{gawk} normally, -except you have to pass an additional option @option{--debug}, or the -corresponding short option @option{-D}. The file(s) containing the +except you have to pass an additional option, @option{--debug}, or the +corresponding short option, @option{-D}. The file(s) containing the program and any supporting code are given on the command line as arguments to one or more @option{-f} options. (@command{gawk} is not designed to debug command-line programs, only programs contained in files.) @@ -28333,7 +28137,7 @@ $ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk -1 inputfile} @noindent where both @file{getopt.awk} and @file{uniq.awk} are in @env{$AWKPATH}. (Experienced users of GDB or similar debuggers should note that -this syntax is slightly different from what they are used to. +this syntax is slightly different from what you are used to. With the @command{gawk} debugger, you give the arguments for running the program in the command line to the debugger rather than as part of the @code{run} command at the debugger prompt.) @@ -28487,10 +28291,10 @@ gawk> @kbd{n} @end example This tells us that @command{gawk} is now ready to execute line 66, which -decides whether to give the lines the special ``field skipping'' treatment +decides whether to give the lines the special ``field-skipping'' treatment indicated by the @option{-1} command-line option. (Notice that we skipped -from where we were before at line 63 to here, because the condition in line 63 -@samp{if (fcount == 0 && charcount == 0)} was false.) +from where we were before, at line 63, to here, because the condition +in line 63, @samp{if (fcount == 0 && charcount == 0)}, was false.) Continuing to step, we now get to the splitting of the current and last records: @@ -28564,7 +28368,7 @@ gawk> @kbd{n} Well, here we are at our error (sorry to spoil the suspense). 
What we had in mind was to join the fields starting from the second one to make -the virtual record to compare, and if the first field was numbered zero, +the virtual record to compare, and if the first field were numbered zero, this would work. Let's look at what we've got: @example @@ -28573,7 +28377,7 @@ gawk> @kbd{p cline clast} @print{} clast = "awk is a wonderful program!" @end example -Hey, those look pretty familiar! They're just our original, unaltered, +Hey, those look pretty familiar! They're just our original, unaltered input records. A little thinking (the human brain is still the best debugging tool), and we realize that we were off by one! @@ -28623,11 +28427,11 @@ Miscellaneous @end itemize Each of these are discussed in the following subsections. -In the following descriptions, commands which may be abbreviated +In the following descriptions, commands that may be abbreviated show the abbreviation on a second description line. A debugger command name may also be truncated if that partial name is unambiguous. The debugger has the built-in capability to -automatically repeat the previous command just by hitting @key{Enter}. +automatically repeat the previous command just by hitting @kbd{Enter}. This works for the commands @code{list}, @code{next}, @code{nexti}, @code{step}, @code{stepi}, and @code{continue} executed without any argument. @@ -28677,7 +28481,7 @@ Set a breakpoint at entry to (the first instruction of) function @var{function}. @end table -Each breakpoint is assigned a number which can be used to delete it from +Each breakpoint is assigned a number that can be used to delete it from the breakpoint list using the @code{delete} command. With a breakpoint, you may also supply a condition. This is an @@ -28729,7 +28533,7 @@ watchpoint is made unconditional). 
@cindex breakpoint, delete by number @item @code{delete} [@var{n1 n2} @dots{}] [@var{n}--@var{m}] @itemx @code{d} [@var{n1 n2} @dots{}] [@var{n}--@var{m}] -Delete specified breakpoints or a range of breakpoints. Deletes +Delete specified breakpoints or a range of breakpoints. Delete all defined breakpoints if no argument is supplied. @cindex debugger commands, @code{disable} @@ -28738,7 +28542,7 @@ all defined breakpoints if no argument is supplied. @cindex breakpoint, how to disable or enable @item @code{disable} [@var{n1 n2} @dots{} | @var{n}--@var{m}] Disable specified breakpoints or a range of breakpoints. Without -any argument, disables all breakpoints. +any argument, disable all breakpoints. @cindex debugger commands, @code{e} (@code{enable}) @cindex debugger commands, @code{enable} @@ -28748,18 +28552,18 @@ any argument, disables all breakpoints. @item @code{enable} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}] @itemx @code{e} [@code{del} | @code{once}] [@var{n1 n2} @dots{}] [@var{n}--@var{m}] Enable specified breakpoints or a range of breakpoints. Without -any argument, enables all breakpoints. -Optionally, you can specify how to enable the breakpoint: +any argument, enable all breakpoints. +Optionally, you can specify how to enable the breakpoints: @c nested table @table @code @item del -Enable the breakpoint(s) temporarily, then delete it when -the program stops at the breakpoint. +Enable the breakpoints temporarily, then delete each one when +the program stops at it. @item once -Enable the breakpoint(s) temporarily, then disable it when -the program stops at the breakpoint. +Enable the breakpoints temporarily, then disable each one when +the program stops at it. @end table @cindex debugger commands, @code{ignore} @@ -28827,7 +28631,7 @@ gawk> @item @code{continue} [@var{count}] @itemx @code{c} [@var{count}] Resume program execution. 
If continued from a breakpoint and @var{count} is -specified, ignores the breakpoint at that location the next @var{count} times +specified, ignore the breakpoint at that location the next @var{count} times before stopping. @cindex debugger commands, @code{finish} @@ -28881,7 +28685,7 @@ automatic display variables, and debugger options. @item @code{step} [@var{count}] @itemx @code{s} [@var{count}] Continue execution until control reaches a different source line in the -current stack frame. @code{step} steps inside any function called within +current stack frame, stepping inside any function called within the line. If the argument @var{count} is supplied, steps that many times before stopping, unless it encounters a breakpoint or watchpoint. @@ -28994,7 +28798,7 @@ or field. String values must be enclosed between double quotes (@code{"}@dots{}@code{"}). You can also set special @command{awk} variables, such as @code{FS}, -@code{NF}, @code{NR}, and son on. +@code{NF}, @code{NR}, and so on. @cindex debugger commands, @code{w} (@code{watch}) @cindex debugger commands, @code{watch} @@ -29006,7 +28810,7 @@ You can also set special @command{awk} variables, such as @code{FS}, Add variable @var{var} (or field @code{$@var{n}}) to the watch list. The debugger then stops whenever the value of the variable or field changes. Each watched item is assigned a -number which can be used to delete it from the watch list using the +number that can be used to delete it from the watch list using the @code{unwatch} command. With a watchpoint, you may also supply a condition. This is an @@ -29034,11 +28838,11 @@ watch list. @node Execution Stack @subsection Working with the Stack -Whenever you run a program which contains any function calls, +Whenever you run a program that contains any function calls, @command{gawk} maintains a stack of all of the function calls leading up to where the program is right now. 
You can see how you got to where you are, and also move around in the stack to see what the state of things was in the -functions which called the one you are in. The commands for doing this are: +functions that called the one you are in. The commands for doing this are: @table @asis @cindex debugger commands, @code{bt} (@code{backtrace}) @@ -29073,8 +28877,8 @@ Then select and print the frame. @item @code{frame} [@var{n}] @itemx @code{f} [@var{n}] Select and print stack frame @var{n}. Frame 0 is the currently executing, -or @dfn{innermost}, frame (function call), frame 1 is the frame that -called the innermost one. The highest numbered frame is the one for the +or @dfn{innermost}, frame (function call); frame 1 is the frame that +called the innermost one. The highest-numbered frame is the one for the main program. The printed information consists of the frame number, function and argument names, source file, and the source line. @@ -29090,7 +28894,7 @@ Then select and print the frame. Besides looking at the values of variables, there is often a need to get other sorts of information about the state of your program and of the -debugging environment itself. The @command{gawk} debugger has one command which +debugging environment itself. The @command{gawk} debugger has one command that provides this information, appropriately called @code{info}. @code{info} is used with one of a number of arguments that tell it exactly what you want to know: @@ -29178,12 +28982,12 @@ The available options are: @table @asis @item @code{history_size} @cindex debugger history size -The maximum number of lines to keep in the history file @file{./.gawk_history}. -The default is 100. +Set the maximum number of lines to keep in the history file +@file{./.gawk_history}. The default is 100. @item @code{listsize} @cindex debugger default list amount -The number of lines that @code{list} prints. The default is 15. +Specify the number of lines that @code{list} prints. The default is 15. 
@item @code{outfile} @cindex redirect @command{gawk} output, in debugger @@ -29193,7 +28997,7 @@ standard output. @item @code{prompt} @cindex debugger prompt -The debugger prompt. The default is @samp{@w{gawk> }}. +Change the debugger prompt. The default is @samp{@w{gawk> }}. @item @code{save_history} [@code{on} | @code{off}] @cindex debugger history file @@ -29204,7 +29008,7 @@ The default is @code{on}. @cindex save debugger options Save current options to file @file{./.gawkrc} upon exit. The default is @code{on}. -Options are read back in to the next session upon startup. +Options are read back into the next session upon startup. @item @code{trace} [@code{on} | @code{off}] @cindex instruction tracing, in debugger @@ -29227,7 +29031,7 @@ command in the file. Also, the list of commands may include additional @code{source} commands; however, the @command{gawk} debugger will not source the same file more than once in order to avoid infinite recursion. -In addition to, or instead of the @code{source} command, you can use +In addition to, or instead of, the @code{source} command, you can use the @option{-D @var{file}} or @option{--debug=@var{file}} command-line options to execute commands from a file non-interactively (@pxref{Options}). @@ -29236,16 +29040,16 @@ options to execute commands from a file non-interactively @node Miscellaneous Debugger Commands @subsection Miscellaneous Commands -There are a few more commands which do not fit into the +There are a few more commands that do not fit into the previous categories, as follows: @table @asis @cindex debugger commands, @code{dump} @cindex @code{dump} debugger command @item @code{dump} [@var{filename}] -Dump bytecode of the program to standard output or to the file +Dump byte code of the program to standard output or to the file named in @var{filename}. 
This prints a representation of the internal -instructions which @command{gawk} executes to implement the @command{awk} +instructions that @command{gawk} executes to implement the @command{awk} commands in a program. This can be very enlightening, as the following partial dump of Davide Brini's obfuscated code (@pxref{Signature Program}) demonstrates: @@ -29342,7 +29146,7 @@ Print lines centered around line number @var{n} in source file @var{filename}. This command may change the current source file. @item @var{function} -Print lines centered around beginning of the +Print lines centered around the beginning of the function @var{function}. This command may change the current source file. @end table @@ -29354,16 +29158,16 @@ function @var{function}. This command may change the current source file. @item @code{quit} @itemx @code{q} Exit the debugger. Debugging is great fun, but sometimes we all have -to tend to other obligations in life, and sometimes we find the bug, +to tend to other obligations in life, and sometimes we find the bug and are free to go on to the next one! As we saw earlier, if you are -running a program, the debugger warns you if you accidentally type +running a program, the debugger warns you when you type @samp{q} or @samp{quit}, to make sure you really want to quit. @cindex debugger commands, @code{trace} @cindex @code{trace} debugger command @item @code{trace} [@code{on} | @code{off}] -Turn on or off a continuous printing of instructions which are about to -be executed, along with printing the @command{awk} line which they +Turn on or off continuous printing of the instructions that are about to +be executed, along with the @command{awk} lines they implement. The default is @code{off}. 
It is to be hoped that most of the ``opcodes'' in these instructions are @@ -29379,7 +29183,7 @@ fairly self-explanatory, and using @code{stepi} and @code{nexti} while If @command{gawk} is compiled with @uref{http://cnswww.cns.cwru.edu/php/chet/readline/readline.html, -the @code{readline} library}, you can take advantage of that library's +the GNU Readline library}, you can take advantage of that library's command completion and history expansion features. The following types of completion are available: @@ -29416,7 +29220,7 @@ and We hope you find the @command{gawk} debugger useful and enjoyable to work with, but as with any program, especially in its early releases, it still has -some limitations. A few which are worth being aware of are: +some limitations. A few that it's worth being aware of are: @itemize @value{BULLET} @item @@ -29432,13 +29236,13 @@ If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands} (or if you are already familiar with @command{gawk} internals), you will realize that much of the internal manipulation of data in @command{gawk}, as in many interpreters, is done on a stack. -@code{Op_push}, @code{Op_pop}, and the like, are the ``bread and butter'' of +@code{Op_push}, @code{Op_pop}, and the like are the ``bread and butter'' of most @command{gawk} code. Unfortunately, as of now, the @command{gawk} debugger does not allow you to examine the stack's contents. That is, the intermediate results of expression evaluation are on the -stack, but cannot be printed. Rather, only variables which are defined +stack, but cannot be printed. Rather, only variables that are defined in the program can be printed. Of course, a workaround for this is to use more explicit variables at the debugging stage and then change back to obscure, perhaps more optimal code later. 
@@ -29452,12 +29256,12 @@ programmer, you are expected to know the meaning of @item The @command{gawk} debugger is designed to be used by running a program (with all its parameters) on the command line, as described in @ref{Debugger Invocation}. -There is no way (as of now) to attach or ``break in'' to a running program. -This seems reasonable for a language which is used mainly for quickly +There is no way (as of now) to attach or ``break into'' a running program. +This seems reasonable for a language that is used mainly for quickly executing, short programs. @item -The @command{gawk} debugger only accepts source supplied with the @option{-f} option. +The @command{gawk} debugger only accepts source code supplied with the @option{-f} option. @end itemize @ignore @@ -29471,8 +29275,8 @@ be added, and of course feel free to try to add them yourself! @itemize @value{BULLET} @item Programs rarely work correctly the first time. Finding bugs -is @dfn{debugging} and a program that helps you find bugs is a -@dfn{debugger}. @command{gawk} has a built-in debugger that works very +is called debugging, and a program that helps you find bugs is a +debugger. @command{gawk} has a built-in debugger that works very similarly to the GNU Debugger, GDB. @item @@ -29492,7 +29296,7 @@ breakpoints, execution, viewing and changing data, working with the stack, getting information, and other tasks. @item -If the @code{readline} library is available when @command{gawk} is +If the GNU Readline library is available when @command{gawk} is compiled, it is used by the debugger to provide command-line history and editing. @@ -29556,7 +29360,7 @@ paper and pencil (and/or a calculator). In theory, numbers can have an arbitrary number of digits on either side (or both sides) of the decimal point, and the results of a computation are always exact. 
-Some modern system can do decimal arithmetic in hardware, but usually you +Some modern systems can do decimal arithmetic in hardware, but usually you need a special software library to provide access to these instructions. There are also libraries that do decimal arithmetic entirely in software. @@ -29574,8 +29378,8 @@ The disadvantage is that their range is limited. @cindex integers, unsigned In computers, integer values come in two flavors: @dfn{signed} and @dfn{unsigned}. Signed values may be negative or positive, whereas -unsigned values are always positive (i.e., greater than or equal -to zero). +unsigned values are always greater than or equal +to zero. In computer systems, integer arithmetic is exact, but the possible range of values is limited. Integer arithmetic is generally faster than @@ -29612,8 +29416,35 @@ signed. The possible ranges of values are shown in @ref{table-numeric-ranges}. @item 32-bit unsigned integer @tab 0 @tab 4,294,967,295 @item 64-bit signed integer @tab @minus{}9,223,372,036,854,775,808 @tab 9,223,372,036,854,775,807 @item 64-bit unsigned integer @tab 0 @tab 18,446,744,073,709,551,615 -@item Single-precision floating point (approximate) @tab @code{1.175494e-38} @tab @code{3.402823e+38} -@item Double-precision floating point (approximate) @tab @code{2.225074e-308} @tab @code{1.797693e+308} +@iftex +@item Single-precision floating point (approximate) @tab @math{1.175494^{-38}} @tab @math{3.402823^{38}} +@item Double-precision floating point (approximate) @tab @math{2.225074^{-308}} @tab @math{1.797693^{308}} +@end iftex +@ifnottex +@ifnotdocbook +@item Single-precision floating point (approximate) @tab 1.175494e-38 @tab 3.402823e38 +@item Double-precision floating point (approximate) @tab 2.225074e-308 @tab 1.797693e308 +@end ifnotdocbook +@end ifnottex +@ifdocbook +@item Single-precision floating point (approximate) @tab +@c FIXME: Use @sup here for superscript +@docbook +1.175494<superscript>-38</superscript> +@end docbook +@tab 
+@docbook +3.402823<superscript>38</superscript> +@end docbook +@item Double-precision floating point (approximate) @tab +@docbook +2.225074<superscript>-308</superscript> +@end docbook +@tab +@docbook +1.797693<superscript>308</superscript> +@end docbook +@end ifdocbook @end multitable @end float @@ -29622,7 +29453,7 @@ signed. The possible ranges of values are shown in @ref{table-numeric-ranges}. The rest of this @value{CHAPTER} uses a number of terms. Here are some informal definitions that should help you work your way through the material -here. +here: @table @dfn @item Accuracy @@ -29643,7 +29474,7 @@ A special value representing infinity. Operations involving another number and infinity produce infinity. @item NaN -``Not A Number.''@footnote{Thanks to Michael Brennan for this description, +``Not a number.''@footnote{Thanks to Michael Brennan for this description, which we have paraphrased, and for the examples.} A special value that results from attempting a calculation that has no answer as a real number. In such a case, programs can either receive a floating-point exception, @@ -29686,8 +29517,8 @@ formula: @end display @noindent -Here, @var{prec} denotes the binary precision -(measured in bits) and @var{dps} (short for decimal places) +Here, @emph{prec} denotes the binary precision +(measured in bits) and @emph{dps} (short for decimal places) is the decimal digits. @item Rounding mode @@ -29695,7 +29526,7 @@ How numbers are rounded up or down when necessary. More details are provided later. @item Significand -A floating-point value consists the significand multiplied by 10 +A floating-point value consists of the significand multiplied by 10 to the power of the exponent. For example, in @code{1.2345e67}, the significand is @code{1.2345}. @@ -29719,7 +29550,7 @@ to allow greater precisions and larger exponent ranges. (@command{awk} uses only the 64-bit double-precision format.) 
@ref{table-ieee-formats} lists the precision and exponent -field values for the basic IEEE 754 binary formats: +field values for the basic IEEE 754 binary formats. @float Table,table-ieee-formats @caption{Basic IEEE format values} @@ -29749,7 +29580,7 @@ is available like so: @example $ @kbd{gawk --version} @print{} GNU Awk 4.1.2, API: 1.1 (GNU MPFR 3.1.0-p3, GNU MP 5.0.2) -@print{} Copyright (C) 1989, 1991-2014 Free Software Foundation. +@print{} Copyright (C) 1989, 1991-2015 Free Software Foundation. @dots{} @end example @@ -29783,12 +29614,12 @@ for more information. @author Teen Talk Barbie, July 1992 @end quotation -This @value{SECTION} provides a high level overview of the issues +This @value{SECTION} provides a high-level overview of the issues involved when doing lots of floating-point arithmetic.@footnote{There is a very nice @uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic} by David Goldberg, ``What Every -Computer Scientist Should Know About Floating-point Arithmetic,'' -@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48. This is +Computer Scientist Should Know About Floating-Point Arithmetic,'' +@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03): 5-48. This is worth reading if you are interested in the details, but it does require a background in computer science.} The discussion applies to both hardware and arbitrary-precision @@ -29857,7 +29688,7 @@ $ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425} Often the error is so small you do not even notice it, and if you do, you can always specify how much precision you would like in your output. -Usually this is a format string like @code{"%.15g"}, which when +Usually this is a format string like @code{"%.15g"}, which, when used in the previous example, produces an output identical to the input. @node Comparing FP Values @@ -29896,7 +29727,7 @@ else The loss of accuracy during a single computation with floating-point numbers usually isn't enough to worry about. 
However, if you compute a -value which is the result of a sequence of floating-point operations, +value that is the result of a sequence of floating-point operations, the error can accumulate and greatly affect the computation itself. Here is an attempt to compute the value of @value{PI} using one of its many series representations: @@ -29949,7 +29780,7 @@ no easy answers. The standard rules of algebra often do not apply when using floating-point arithmetic. Among other things, the distributive and associative laws do not hold completely, and order of operation may be important -for your computation. Rounding error, cumulative precision loss +for your computation. Rounding error, cumulative precision loss, and underflow are often troublesome. When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @@ -29989,7 +29820,8 @@ by our earlier attempt to compute the value of @value{PI}. Extra precision can greatly enhance the stability and the accuracy of your computation in such cases. -Repeated addition is not necessarily equivalent to multiplication +Additionally, you should understand that +repeated addition is not necessarily equivalent to multiplication in floating-point arithmetic. In the example in @ref{Errors accumulate}: @@ -30052,7 +29884,7 @@ to emulate an IEEE 754 binary format. @float Table,table-predefined-precision-strings @caption{Predefined precision strings for @code{PREC}} @multitable {@code{"double"}} {12345678901234567890123456789012345} -@headitem @code{PREC} @tab IEEE 754 Binary Format +@headitem @code{PREC} @tab IEEE 754 binary format @item @code{"half"} @tab 16-bit half-precision @item @code{"single"} @tab Basic 32-bit single precision @item @code{"double"} @tab Basic 64-bit double precision @@ -30084,7 +29916,6 @@ than the default and cannot use a command-line assignment to @code{PREC}, you should either specify the constant as a string, or as a rational number, whenever possible. 
The following example illustrates the differences among various ways to print a floating-point constant: -@end quotation @example $ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'} @@ -30096,22 +29927,23 @@ $ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'} $ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'} @print{} 0.1000000000000000000000000 @end example +@end quotation @node Setting the rounding mode @subsection Setting the Rounding Mode The @code{ROUNDMODE} variable provides -program level control over the rounding mode. +program-level control over the rounding mode. The correspondence between @code{ROUNDMODE} and the IEEE rounding modes is shown in @ref{table-gawk-rounding-modes}. @float Table,table-gawk-rounding-modes @caption{@command{gawk} rounding modes} @multitable @columnfractions .45 .30 .25 -@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE} +@headitem Rounding mode @tab IEEE name @tab @code{ROUNDMODE} @item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"} -@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"} -@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"} +@item Round toward positive infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"} +@item Round toward negative infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"} @item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"} @item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"} @end multitable @@ -30172,8 +30004,8 @@ distributes upward and downward rounds of exact halves, which might cause any accumulating round-off error to cancel itself out. This is the default rounding mode for IEEE 754 computing functions and operators. -The other rounding modes are rarely used. 
Round toward positive infinity -(@code{roundTowardPositive}) and round toward negative infinity +The other rounding modes are rarely used. Rounding toward positive infinity +(@code{roundTowardPositive}) and toward negative infinity (@code{roundTowardNegative}) are often used to implement interval arithmetic, where you adjust the rounding mode to calculate upper and lower bounds for the range of output. The @code{roundTowardZero} mode can @@ -30215,6 +30047,7 @@ the following computes @end docbook the result of which is beyond the limits of ordinary hardware double-precision floating-point values: +@c FIXME: Use @sup here for superscript @example $ @kbd{gawk -M 'BEGIN @{} @@ -30230,17 +30063,17 @@ If instead you were to compute the same value using arbitrary-precision floating-point values, the precision needed for correct output (using the formula @iftex -@math{prec = 3.322 @cdot dps}), +@math{prec = 3.322 @cdot dps}) would be @math{3.322 @cdot 183231}, @end iftex @ifnottex @ifnotdocbook -@samp{prec = 3.322 * dps}), +@samp{prec = 3.322 * dps}) would be 3.322 x 183231, @end ifnotdocbook @end ifnottex @docbook -<emphasis>prec</emphasis> = 3.322 ⋅ <emphasis>dps</emphasis>), +<emphasis>prec</emphasis> = 3.322 ⋅ <emphasis>dps</emphasis>) would be <emphasis>prec</emphasis> = 3.322 ⋅ 183231, @c @end docbook @@ -30278,7 +30111,7 @@ interface to process arbitrary-precision integers or mixed-mode numbers as needed by an operation or function. In such a case, the precision is set to the minimum value necessary for exact conversion, and the working precision is not used for this purpose. If this is not what you need or -want, you can employ a subterfuge, and convert the integer to floating +want, you can employ a subterfuge and convert the integer to floating point first, like this: @example @@ -30403,7 +30236,7 @@ When asked about the algorithm used, Katie replied: @quotation It's not that well known but it's not that obscure either. 
It's Euler's modification to Newton's method for calculating pi. -Take a look at lines (23) - (25) here: @uref{http://mathworld.wolfram.com/PiFormulas.htm}. +Take a look at lines (23) - (25) here: @uref{http://mathworld.wolfram.com/PiFormulas.html}. The algorithm I wrote simply expands the multiply by 2 and works from the innermost expression outwards. I used this to program HP calculators @@ -30415,7 +30248,7 @@ word sizes. See @node POSIX Floating Point Problems @section Standards Versus Existing Practice -Historically, @command{awk} has converted any non-numeric looking string +Historically, @command{awk} has converted any nonnumeric-looking string to the numeric value zero, when required. Furthermore, the original definition of the language and the original POSIX standards specified that @command{awk} only understands decimal numbers (base 10), and not octal @@ -30432,8 +30265,8 @@ notation (e.g., @code{0xDEADBEEF}). (Note: data values, @emph{not} source code constants.) @item -Support for the special IEEE 754 floating-point values ``Not A Number'' -(NaN), positive Infinity (``inf''), and negative Infinity (``@minus{}inf''). +Support for the special IEEE 754 floating-point values ``not a number'' +(NaN), positive infinity (``inf''), and negative infinity (``@minus{}inf''). In particular, the format for these values is as specified by the ISO 1999 C standard, which ignores case and can allow implementation-dependent additional characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}. @@ -30453,22 +30286,22 @@ Allowing completely alphabetic strings to have valid numeric values is also a very severe departure from historical practice. 
@end itemize -The second problem is that the @code{gawk} maintainer feels that this -interpretation of the standard, which requires a certain amount of +The second problem is that the @command{gawk} maintainer feels that this +interpretation of the standard, which required a certain amount of ``language lawyering'' to arrive at in the first place, was not even -intended by the standard developers. In other words, ``we see how you +intended by the standard developers. In other words, ``We see how you got where you are, but we don't think that that's where you want to be.'' Recognizing these issues, but attempting to provide compatibility with the earlier versions of the standard, the 2008 POSIX standard added explicit wording to allow, but not require, that @command{awk} support hexadecimal floating-point values and -special values for ``Not A Number'' and infinity. +special values for ``not a number'' and infinity. Although the @command{gawk} maintainer continues to feel that providing those features is inadvisable, nevertheless, on systems that support IEEE floating point, it seems -reasonable to provide @emph{some} way to support NaN and Infinity values. +reasonable to provide @emph{some} way to support NaN and infinity values. The solution implemented in @command{gawk} is as follows: @itemize @value{BULLET} @@ -30488,7 +30321,7 @@ $ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'} @end example @item -Without @option{--posix}, @command{gawk} interprets the four strings +Without @option{--posix}, @command{gawk} interprets the four string values @samp{+inf}, @samp{-inf}, @samp{+nan}, @@ -30510,7 +30343,7 @@ $ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'} @end example @command{gawk} ignores case in the four special values. -Thus @samp{+nan} and @samp{+NaN} are the same. +Thus, @samp{+nan} and @samp{+NaN} are the same. @end itemize @node Floating point summary @@ -30523,9 +30356,9 @@ values. 
Standard @command{awk} uses double-precision floating-point values. @item -In the early 1990s, Barbie mistakenly said ``Math class is tough!'' +In the early 1990s Barbie mistakenly said, ``Math class is tough!'' Although math isn't tough, floating-point arithmetic isn't the same -as pencil and paper math, and care must be taken: +as pencil-and-paper math, and care must be taken: @c nested list @itemize @value{MINUS} @@ -30558,11 +30391,11 @@ arithmetic. Use @code{PREC} to set the precision in bits, and @item With @option{-M}, @command{gawk} performs arbitrary-precision integer arithmetic using the GMP library. -This is faster and more space efficient than using MPFR for +This is faster and more space-efficient than using MPFR for the same calculations. @item -There are several ``dark corners'' with respect to floating-point +There are several areas with respect to floating-point numbers where @command{gawk} disagrees with the POSIX standard. It pays to be aware of them. @@ -30570,7 +30403,7 @@ It pays to be aware of them. Overall, there is no need to be unduly suspicious about the results from floating-point arithmetic. The lesson to remember is that floating-point arithmetic is always more complex than arithmetic using pencil and -paper. In order to take advantage of the power of computer floating point, +paper. In order to take advantage of the power of floating-point arithmetic, you need to know its limitations and work within them. For most casual use of floating-point arithmetic, you will often get the expected result if you simply round the display of your final results to the correct number @@ -30612,7 +30445,7 @@ When @option{--sandbox} is specified, extensions are disabled * Finding Extensions:: How @command{gawk} finds compiled extensions. * Extension Example:: Example C code for an extension. * Extension Samples:: The sample extensions that ship with - @code{gawk}. + @command{gawk}. * gawkextlib:: The @code{gawkextlib} project. 
* Extension summary:: Extension summary. * Extension Exercises:: Exercises. @@ -30631,7 +30464,7 @@ Extensions are useful because they allow you (of course) to extend @command{gawk}'s functionality. For example, they can provide access to system calls (such as @code{chdir()} to change directory) and to other C library routines that could be of use. As with most software, -``the sky is the limit;'' if you can imagine something that you might +``the sky is the limit''; if you can imagine something that you might want to do and can write in C or C++, you can write an extension to do it! Extensions are written in C or C++, using the @dfn{application programming @@ -30639,7 +30472,7 @@ interface} (API) defined for this purpose by the @command{gawk} developers. The rest of this @value{CHAPTER} explains the facilities that the API provides and how to use them, and presents a small example extension. In addition, it documents -the sample extensions included in the @command{gawk} distribution, +the sample extensions included in the @command{gawk} distribution and describes the @code{gawkextlib} project. @ifclear FOR_PRINT @xref{Extension Design}, for a discussion of the extension mechanism @@ -30792,7 +30625,7 @@ Some other bits and pieces: @itemize @value{BULLET} @item The API provides access to @command{gawk}'s @code{do_@var{xxx}} values, -reflecting command-line options, like @code{do_lint}, @code{do_profiling} +reflecting command-line options, like @code{do_lint}, @code{do_profiling}, and so on (@pxref{Extension API Variables}). These are informational: an extension cannot affect their values inside @command{gawk}. In addition, attempting to assign to them @@ -30836,7 +30669,7 @@ This (rather large) @value{SECTION} describes the API in detail. 
@node Extension API Functions Introduction @subsection Introduction -Access to facilities within @command{gawk} are made available +Access to facilities within @command{gawk} is achieved by calling through function pointers passed into your extension. API function pointers are provided for the following kinds of operations: @@ -30864,7 +30697,7 @@ Output wrappers Two-way processors @end itemize -All of these are discussed in detail, later in this @value{CHAPTER}. +All of these are discussed in detail later in this @value{CHAPTER}. @item Printing fatal, warning, and ``lint'' warning messages. @@ -30902,7 +30735,7 @@ Creating a new array Clearing an array @item -Flattening an array for easy C style looping over all its indices and elements +Flattening an array for easy C-style looping over all its indices and elements @end itemize @end itemize @@ -30914,8 +30747,9 @@ The following types, macros, and/or functions are referenced in @file{gawkapi.h}. For correct use, you must therefore include the corresponding standard header file @emph{before} including @file{gawkapi.h}: +@c FIXME: Make this is a float at some point. @multitable {@code{memset()}, @code{memcpy()}} {@code{<sys/types.h>}} -@headitem C Entity @tab Header File +@headitem C entity @tab Header file @item @code{EOF} @tab @code{<stdio.h>} @item Values for @code{errno} @tab @code{<errno.h>} @item @code{FILE} @tab @code{<stdio.h>} @@ -30941,7 +30775,7 @@ Doing so, however, is poor coding practice. Although the API only uses ISO C 90 features, there is an exception; the ``constructor'' functions use the @code{inline} keyword. If your compiler does not support this keyword, you should either place -@samp{-Dinline=''} on your command line, or use the GNU Autotools and include a +@samp{-Dinline=''} on your command line or use the GNU Autotools and include a @file{config.h} file in your extensions. 
@item @@ -30949,7 +30783,7 @@ All pointers filled in by @command{gawk} point to memory managed by @command{gawk} and should be treated by the extension as read-only. Memory for @emph{all} strings passed into @command{gawk} from the extension @emph{must} come from calling one of -@code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()}, +@code{gawk_malloc()}, @code{gawk_calloc()}, or @code{gawk_realloc()}, and is managed by @command{gawk} from then on. @item @@ -30963,7 +30797,7 @@ characters are allowed. By intent, strings are maintained using the current multibyte encoding (as defined by @env{LC_@var{xxx}} environment variables) and not using wide characters. This matches how @command{gawk} stores strings internally -and also how characters are likely to be input and output from files. +and also how characters are likely to be input into and output from files. @end quotation @item @@ -31008,6 +30842,8 @@ general-purpose use. Additional, more specialized, data structures are introduced in subsequent @value{SECTION}s, together with the functions that use them. +The general-purpose types and structures are as follows: + @table @code @item typedef void *awk_ext_id_t; A value of this type is received from @command{gawk} when an extension is loaded. @@ -31024,7 +30860,7 @@ while allowing @command{gawk} to use them as it needs to. @itemx @ @ @ @ awk_false = 0, @itemx @ @ @ @ awk_true @itemx @} awk_bool_t; -A simple boolean type. +A simple Boolean type. @item typedef struct awk_string @{ @itemx @ @ @ @ char *str;@ @ @ @ @ @ /* data */ @@ -31070,7 +30906,7 @@ The @code{val_type} member indicates what kind of value the @itemx #define array_cookie@ @ @ u.a @itemx #define scalar_cookie@ @ u.scl @itemx #define value_cookie@ @ @ u.vc -These macros make accessing the fields of the @code{awk_value_t} more +Using these macros makes accessing the fields of the @code{awk_value_t} more readable. 
@item typedef void *awk_scalar_t; @@ -31093,7 +30929,7 @@ indicates what is in the @code{union}. Representing numbers is easy---the API uses a C @code{double}. Strings require more work. Because @command{gawk} allows embedded @sc{nul} bytes in string values, a string must be represented as a pair containing a -data-pointer and length. This is the @code{awk_string_t} type. +data pointer and length. This is the @code{awk_string_t} type. Identifiers (i.e., the names of global variables) can be associated with either scalar values or with arrays. In addition, @command{gawk} @@ -31106,12 +30942,12 @@ of the @code{union} as if they were fields in a @code{struct}; this is a common coding practice in C. Such code is easier to write and to read, but it remains @emph{your} responsibility to make sure that the @code{val_type} member correctly reflects the type of the value in -the @code{awk_value_t}. +the @code{awk_value_t} struct. Conceptually, the first three members of the @code{union} (number, string, and array) are all that is needed for working with @command{awk} values. However, because the API provides routines for accessing and changing -the value of global scalar variables only by using the variable's name, +the value of a global scalar variable only by using the variable's name, there is a performance penalty: @command{gawk} must find the variable each time it is accessed and changed. This turns out to be a real issue, not just a theoretical one. @@ -31129,7 +30965,9 @@ See also the entry for ``Cookie'' in the @ref{Glossary}. object for that variable, and then use the cookie for getting the variable's value or for changing the variable's value. -This is the @code{awk_scalar_t} type and @code{scalar_cookie} macro. +The @code{awk_scalar_t} type holds a scalar cookie, and the +@code{scalar_cookie} macro provides access to the value of that type +in the @code{awk_value_t} struct. 
Given a scalar cookie, @command{gawk} can directly retrieve or modify the value, as required, without having to find it first. @@ -31138,8 +30976,8 @@ If you know that you wish to use the same numeric or string @emph{value} for one or more variables, you can create the value once, retaining a @dfn{value cookie} for it, and then pass in that value cookie whenever you wish to set the value of a -variable. This saves both storage space within the running @command{gawk} -process as well as the time needed to create the value. +variable. This saves storage space within the running @command{gawk} +process and reduces the time needed to create the value. @node Memory Allocation Functions @subsection Memory Allocation Functions and Convenience Macros @@ -31167,13 +31005,13 @@ be passed to @command{gawk}. @item void gawk_free(void *ptr); Call the correct version of @code{free()} to release storage that was -allocated with @code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()}. +allocated with @code{gawk_malloc()}, @code{gawk_calloc()}, or @code{gawk_realloc()}. @end table The API has to provide these functions because it is possible for an extension to be compiled and linked against a different version of the C library than was used for the @command{gawk} -executable.@footnote{This is more common on MS-Windows systems, but +executable.@footnote{This is more common on MS-Windows systems, but it can happen on Unix-like systems as well.} If @command{gawk} were to use its version of @code{free()} when the memory came from an unrelated version of @code{malloc()}, unexpected behavior would @@ -31183,7 +31021,7 @@ Two convenience macros may be used for allocating storage from @code{gawk_malloc()} and @code{gawk_realloc()}. If the allocation fails, they cause @command{gawk} to exit with a fatal error message. They should be used as if they were -procedure calls that do not return a value. 
+procedure calls that do not return a value: @table @code @item #define emalloc(pointer, type, size, message) @dots{} @@ -31220,7 +31058,7 @@ make_malloced_string(message, strlen(message), & result); @end example @item #define erealloc(pointer, type, size, message) @dots{} -This is like @code{emalloc()}, but it calls @code{gawk_realloc()}, +This is like @code{emalloc()}, but it calls @code{gawk_realloc()} instead of @code{gawk_malloc()}. The arguments are the same as for the @code{emalloc()} macro. @end table @@ -31235,28 +31073,28 @@ the way that extension code would use them: @table @code @item static inline awk_value_t * -@itemx make_const_string(const char *string, size_t length, awk_value_t *result) +@itemx make_const_string(const char *string, size_t length, awk_value_t *result); This function creates a string value in the @code{awk_value_t} variable pointed to by @code{result}. It expects @code{string} to be a C string constant (or other string data), and automatically creates a @emph{copy} of the data for storage in @code{result}. It returns @code{result}. @item static inline awk_value_t * -@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result) +@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result); This function creates a string value in the @code{awk_value_t} variable pointed to by @code{result}. It expects @code{string} to be a @samp{char *} -value pointing to data previously obtained from @code{gawk_malloc()}, @code{gawk_calloc()} or @code{gawk_realloc()}. The idea here +value pointing to data previously obtained from @code{gawk_malloc()}, @code{gawk_calloc()}, or @code{gawk_realloc()}. The idea here is that the data is passed directly to @command{gawk}, which assumes responsibility for it. It returns @code{result}. 
@item static inline awk_value_t * -@itemx make_null_string(awk_value_t *result) +@itemx make_null_string(awk_value_t *result); This specialized function creates a null string (the ``undefined'' value) in the @code{awk_value_t} variable pointed to by @code{result}. It returns @code{result}. @item static inline awk_value_t * -@itemx make_number(double num, awk_value_t *result) +@itemx make_number(double num, awk_value_t *result); This function simply creates a numeric value in the @code{awk_value_t} variable pointed to by @code{result}. @end table @@ -31296,7 +31134,7 @@ The fields are: @table @code @item const char *name; The name of the new function. -@command{awk} level code calls the function by this name. +@command{awk}-level code calls the function by this name. This is a regular C string. Function names must obey the rules for @command{awk} @@ -31310,7 +31148,7 @@ This is a pointer to the C function that provides the extension's functionality. The function must fill in @code{*result} with either a number or a string. @command{gawk} takes ownership of any string memory. -As mentioned earlier, string memory @strong{must} come from one of +As mentioned earlier, string memory @emph{must} come from one of @code{gawk_malloc()}, @code{gawk_calloc()}, or @code{gawk_realloc()}. The @code{num_actual_args} argument tells the C function how many @@ -31362,20 +31200,20 @@ The @code{exit_status} parameter is the exit status value that @command{gawk} intends to pass to the @code{exit()} system call. @item arg0 -A pointer to private data which @command{gawk} saves in order to pass to +A pointer to private data that @command{gawk} saves in order to pass to the function pointed to by @code{funcp}. @end table @end table -Exit callback functions are called in last-in-first-out (LIFO) +Exit callback functions are called in last-in, first-out (LIFO) order---that is, in the reverse order in which they are registered with @command{gawk}. 
@node Extension Version String @subsubsection Registering An Extension Version String -You can register a version string which indicates the name and -version of your extension, with @command{gawk}, as follows: +You can register a version string that indicates the name and +version of your extension with @command{gawk}, as follows: @table @code @item void register_ext_version(const char *version); @@ -31397,7 +31235,7 @@ of @code{RS} to find the end of the record, and then uses @code{FS} Additionally, it sets the value of @code{RT} (@pxref{Built-in Variables}). If you want, you can provide your own custom input parser. An input -parser's job is to return a record to the @command{gawk} record processing +parser's job is to return a record to the @command{gawk} record-processing code, along with indicators for the value and length of the data to be used for @code{RT}, if any. @@ -31415,9 +31253,9 @@ It should not change any state (variable values, etc.) within @command{gawk}. @item awk_bool_t @var{XXX}_take_control_of(awk_input_buf_t *iobuf); When @command{gawk} decides to hand control of the file over to the input parser, it calls this function. This function in turn must fill -in certain fields in the @code{awk_input_buf_t} structure, and ensure +in certain fields in the @code{awk_input_buf_t} structure and ensure that certain conditions are true. It should then return true. If an -error of some kind occurs, it should not fill in any fields, and should +error of some kind occurs, it should not fill in any fields and should return false; then @command{gawk} will not use the input parser. The details are presented shortly. @end table @@ -31510,7 +31348,7 @@ in the @code{struct stat}, or any combination of these factors. Once @code{@var{XXX}_can_take_file()} has returned true, and @command{gawk} has decided to use your input parser, it calls -@code{@var{XXX}_take_control_of()}. That function then fills one of +@code{@var{XXX}_take_control_of()}. 
That function then fills either the @code{get_record} field or the @code{read_func} field in the @code{awk_input_buf_t}. It must also ensure that @code{fd} is @emph{not} set to @code{INVALID_HANDLE}. The following list describes the fields that @@ -31532,21 +31370,21 @@ records. Said function is the core of the input parser. Its behavior is described in the text following this list. @item ssize_t (*read_func)(); -This function pointer should point to function that has the +This function pointer should point to a function that has the same behavior as the standard POSIX @code{read()} system call. It is an alternative to the @code{get_record} pointer. Its behavior is also described in the text following this list. @item void (*close_func)(struct awk_input *iobuf); This function pointer should point to a function that does -the ``tear down.'' It should release any resources allocated by +the ``teardown.'' It should release any resources allocated by @code{@var{XXX}_take_control_of()}. It may also close the file. If it does so, it should set the @code{fd} field to @code{INVALID_HANDLE}. If @code{fd} is still not @code{INVALID_HANDLE} after the call to this function, @command{gawk} calls the regular @code{close()} system call. -Having a ``tear down'' function is optional. If your input parser does +Having a ``teardown'' function is optional. If your input parser does not need it, do not set this field. Then, @command{gawk} calls the regular @code{close()} system call on the file descriptor, so it should be valid. @@ -31557,7 +31395,7 @@ input records. The parameters are as follows: @table @code @item char **out -This is a pointer to a @code{char *} variable which is set to point +This is a pointer to a @code{char *} variable that is set to point to the record. @command{gawk} makes its own copy of the data, so the extension must manage this storage. 
@@ -31576,7 +31414,7 @@ If the concept of a ``record terminator'' makes sense, then @code{*rt_start} should be set to point to the data to be used for @code{RT}, and @code{*rt_len} should be set to the length of the data. Otherwise, @code{*rt_len} should be set to zero. -@code{gawk} makes its own copy of this data, so the +@command{gawk} makes its own copy of this data, so the extension must manage this storage. @end table @@ -31610,19 +31448,19 @@ set this field explicitly. You must choose one method or the other: either a function that returns a record, or one that returns raw data. In particular, if you supply a function to get a record, @command{gawk} will -call it, and never call the raw read function. +call it, and will never call the raw read function. @end quotation @command{gawk} ships with a sample extension that reads directories, -returning records for each entry in the directory (@pxref{Extension +returning records for each entry in a directory (@pxref{Extension Sample Readdir}). You may wish to use that code as a guide for writing your own input parser. When writing an input parser, you should think about (and document) how it is expected to interact with @command{awk} code. You may want -it to always be called, and take effect as appropriate (as the +it to always be called, and to take effect as appropriate (as the @code{readdir} extension does). Or you may want it to take effect -based upon the value of an @code{awk} variable, as the XML extension +based upon the value of an @command{awk} variable, as the XML extension from the @code{gawkextlib} project does (@pxref{gawkextlib}). In the latter case, code in a @code{BEGINFILE} section can look at @code{FILENAME} and @code{ERRNO} to decide whether or @@ -31730,7 +31568,7 @@ a pointer to any private data associated with the file. These pointers should be set to point to functions that perform the equivalent function as the @code{<stdio.h>} functions do, if appropriate. 
@command{gawk} uses these function pointers for all output. -@command{gawk} initializes the pointers to point to internal, ``pass through'' +@command{gawk} initializes the pointers to point to internal ``pass-through'' functions that just call the regular @code{<stdio.h>} functions, so an extension only needs to redefine those functions that are appropriate for what it does. @@ -31741,7 +31579,7 @@ upon the @code{name} and @code{mode} fields, and any additional state (such as @command{awk} variable values) that is appropriate. When @command{gawk} calls @code{@var{XXX}_take_control_of()}, that function should fill -in the other fields, as appropriate, except for @code{fp}, which it should just +in the other fields as appropriate, except for @code{fp}, which it should just use normally. You register your output wrapper with the following function: @@ -31781,14 +31619,14 @@ The fields are as follows: The name of the two-way processor. @item awk_bool_t (*can_take_two_way)(const char *name); -This function returns true if it wants to take over two-way I/O for this @value{FN}. +The function pointed to by this field should return true if it wants to take over two-way I/O for this @value{FN}. It should not change any state (variable values, etc.) within @command{gawk}. @item awk_bool_t (*take_control_of)(const char *name, @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_input_buf_t *inbuf, @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_output_buf_t *outbuf); -This function should fill in the @code{awk_input_buf_t} and +The function pointed to by this field should fill in the @code{awk_input_buf_t} and @code{awk_outut_buf_t} structures pointed to by @code{inbuf} and @code{outbuf}, respectively. These structures were described earlier. @@ -31817,7 +31655,7 @@ Register the two-way processor pointed to by @code{two_way_processor} with You can print different kinds of warning messages from your extension, as described here. 
Note that for these functions, -you must pass in the extension id received from @command{gawk} +you must pass in the extension ID received from @command{gawk} when the extension was loaded:@footnote{Because the API uses only ISO C 90 features, it cannot make use of the ISO C 99 variadic macro feature to hide that parameter. More's the pity.} @@ -31870,7 +31708,7 @@ matches what you requested, the function returns true and fills in the @code{awk_value_t} result. Otherwise, the function returns false, and the @code{val_type} member indicates the type of the actual value. You may then -print an error message, or reissue the request for the actual +print an error message or reissue the request for the actual value type, as appropriate. This behavior is summarized in @ref{table-value-types-returned}. @@ -31903,32 +31741,32 @@ value type, as appropriate. This behavior is summarized in <entry><para><emphasis role="bold">String</emphasis></para></entry> <entry><para>String</para></entry> <entry><para>String</para></entry> - <entry><para>false</para></entry> - <entry><para>false</para></entry> + <entry><para>False</para></entry> + <entry><para>False</para></entry> </row> <row> <entry></entry> <entry><para><emphasis role="bold">Number</emphasis></para></entry> <entry><para>Number if can be converted, else false</para></entry> <entry><para>Number</para></entry> - <entry><para>false</para></entry> - <entry><para>false</para></entry> + <entry><para>False</para></entry> + <entry><para>False</para></entry> </row> <row> <entry><para><emphasis role="bold">Type</emphasis></para></entry> <entry><para><emphasis role="bold">Array</emphasis></para></entry> - <entry><para>false</para></entry> - <entry><para>false</para></entry> + <entry><para>False</para></entry> + <entry><para>False</para></entry> <entry><para>Array</para></entry> - <entry><para>false</para></entry> + <entry><para>False</para></entry> </row> <row> <entry><para><emphasis role="bold">Requested</emphasis></para></entry> 
<entry><para><emphasis role="bold">Scalar</emphasis></para></entry> <entry><para>Scalar</para></entry> <entry><para>Scalar</para></entry> - <entry><para>false</para></entry> - <entry><para>false</para></entry> + <entry><para>False</para></entry> + <entry><para>False</para></entry> </row> <row> <entry></entry> @@ -31940,11 +31778,11 @@ value type, as appropriate. This behavior is summarized in </row> <row> <entry></entry> - <entry><para><emphasis role="bold">Value Cookie</emphasis></para></entry> - <entry><para>false</para></entry> - <entry><para>false</para></entry> - <entry><para>false</para> - </entry><entry><para>false</para></entry> + <entry><para><emphasis role="bold">Value cookie</emphasis></para></entry> + <entry><para>False</para></entry> + <entry><para>False</para></entry> + <entry><para>False</para> + </entry><entry><para>False</para></entry> </row> </tbody> </tgroup> @@ -31962,12 +31800,12 @@ value type, as appropriate. This behavior is summarized in @end tex @multitable @columnfractions .166 .166 .198 .15 .15 .166 @headitem @tab @tab String @tab Number @tab Array @tab Undefined -@item @tab @b{String} @tab String @tab String @tab false @tab false -@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab false @tab false -@item @b{Type} @tab @b{Array} @tab false @tab false @tab Array @tab false -@item @b{Requested} @tab @b{Scalar} @tab Scalar @tab Scalar @tab false @tab false +@item @tab @b{String} @tab String @tab String @tab False @tab False +@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab False @tab False +@item @b{Type} @tab @b{Array} @tab False @tab False @tab Array @tab False +@item @b{Requested} @tab @b{Scalar} @tab Scalar @tab Scalar @tab False @tab False @item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined -@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false +@item @tab @b{Value cookie} @tab False @tab False @tab False @tab False @end multitable 
@end ifnotdocbook @end ifnotplaintext @@ -31978,21 +31816,21 @@ value type, as appropriate. This behavior is summarized in +------------+------------+-----------+-----------+ | String | Number | Array | Undefined | +-----------+-----------+------------+------------+-----------+-----------+ -| | String | String | String | false | false | +| | String | String | String | False | False | | |-----------+------------+------------+-----------+-----------+ -| | Number | Number if | Number | false | false | +| | Number | Number if | Number | False | False | | | | can be | | | | | | | converted, | | | | | | | else false | | | | | |-----------+------------+------------+-----------+-----------+ -| Type | Array | false | false | Array | false | +| Type | Array | False | False | Array | False | | Requested |-----------+------------+------------+-----------+-----------+ -| | Scalar | Scalar | Scalar | false | false | +| | Scalar | Scalar | Scalar | False | False | | |-----------+------------+------------+-----------+-----------+ | | Undefined | String | Number | Array | Undefined | | |-----------+------------+------------+-----------+-----------+ -| | Value | false | false | false | false | -| | Cookie | | | | | +| | Value | False | False | False | False | +| | cookie | | | | | +-----------+-----------+------------+------------+-----------+-----------+ @end example @end ifplaintext @@ -32009,16 +31847,16 @@ passed to your extension function. They are: @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted, @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result); Fill in the @code{awk_value_t} structure pointed to by @code{result} -with the @code{count}'th argument. Return true if the actual -type matches @code{wanted}, false otherwise. In the latter +with the @code{count}th argument. Return true if the actual +type matches @code{wanted}, and false otherwise. 
In the latter case, @code{result@w{->}val_type} indicates the actual type -(@pxref{table-value-types-returned}). Counts are zero based---the first +(@pxref{table-value-types-returned}). Counts are zero-based---the first argument is numbered zero, the second one, and so on. @code{wanted} indicates the type of value expected. @item awk_bool_t set_argument(size_t count, awk_array_t array); Convert a parameter that was undefined into an array; this provides -call-by-reference for arrays. Return false if @code{count} is too big, +call by reference for arrays. Return false if @code{count} is too big, or if the argument's type is not undefined. @DBXREF{Array Manipulation} for more information on creating arrays. @end table @@ -32042,8 +31880,9 @@ allows you to create and release cached values. The following routines provide the ability to access and update global @command{awk}-level variables by name. In compiler terminology, identifiers of different kinds are termed @dfn{symbols}, thus the ``sym'' -in the routines' names. The data structure which stores information +in the routines' names. The data structure that stores information about symbols is termed a @dfn{symbol table}. +The functions are as follows: @table @code @item awk_bool_t sym_lookup(const char *name, @@ -32052,14 +31891,14 @@ about symbols is termed a @dfn{symbol table}. Fill in the @code{awk_value_t} structure pointed to by @code{result} with the value of the variable named by the string @code{name}, which is a regular C string. @code{wanted} indicates the type of value expected. -Return true if the actual type matches @code{wanted}, false otherwise. +Return true if the actual type matches @code{wanted}, and false otherwise. In the latter case, @code{result->val_type} indicates the actual type (@pxref{table-value-types-returned}). @item awk_bool_t sym_update(const char *name, awk_value_t *value); Update the variable named by the string @code{name}, which is a regular C string. 
The variable is added to @command{gawk}'s symbol table -if it is not there. Return true if everything worked, false otherwise. +if it is not there. Return true if everything worked, and false otherwise. Changing types (scalar to array or vice versa) of an existing variable is @emph{not} allowed, nor may this routine be used to update an array. @@ -32084,7 +31923,7 @@ populate it. A @dfn{scalar cookie} is an opaque handle that provides access to a global variable or array. It is an optimization that avoids looking up variables in @command{gawk}'s symbol table every time -access is needed. This was discussed earlier in @ref{General Data Types}. +access is needed. This was discussed earlier, in @ref{General Data Types}. The following functions let you work with scalar cookies: @@ -32200,7 +32039,7 @@ and carefully check the return values from the API functions. @subsubsection Creating and Using Cached Values The routines in this section allow you to create and release -cached values. As with scalar cookies, in theory, cached values +cached values. Like scalar cookies, in theory, cached values are not necessary. You can create numbers and strings using the functions in @ref{Constructor Functions}. You can then assign those values to variables using @code{sym_update()} @@ -32278,7 +32117,7 @@ Using value cookies in this way saves considerable storage, as all of @code{VAR1} through @code{VAR100} share the same value. You might be wondering, ``Is this sharing problematic? -What happens if @command{awk} code assigns a new value to @code{VAR1}, +What happens if @command{awk} code assigns a new value to @code{VAR1}; are all the others changed too?'' That's a great question. The answer is that no, it's not a problem. @@ -32382,7 +32221,7 @@ modify them. @node Array Functions @subsubsection Array Functions -The following functions relate to individual array elements. 
+The following functions relate to individual array elements: @table @code @item awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count); @@ -32401,13 +32240,13 @@ Return false if @code{wanted} does not match the actual type or if @code{index} is not in the array (@pxref{table-value-types-returned}). The value for @code{index} can be numeric, in which case @command{gawk} -converts it to a string. Using non-integral values is possible, but +converts it to a string. Using nonintegral values is possible, but requires that you understand how such values are converted to strings -(@pxref{Conversion}); thus using integral values is safest. +(@pxref{Conversion}); thus, using integral values is safest. -As with @emph{all} strings passed into @code{gawk} from an extension, +As with @emph{all} strings passed into @command{gawk} from an extension, the string value of @code{index} must come from @code{gawk_malloc()}, -@code{gawk_calloc()} or @code{gawk_realloc()}, and +@code{gawk_calloc()}, or @code{gawk_realloc()}, and @command{gawk} releases the storage. @item awk_bool_t set_array_element(awk_array_t a_cookie, @@ -32463,7 +32302,7 @@ flatten an array and work with it. @item awk_bool_t release_flattened_array(awk_array_t a_cookie, @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_flat_array_t *data); When done with a flattened array, release the storage using this function. -You must pass in both the original array cookie, and the address of +You must pass in both the original array cookie and the address of the created @code{awk_flat_array_t} structure. The function returns true upon success, false otherwise. @end table @@ -32473,7 +32312,7 @@ The function returns true upon success, false otherwise. To @dfn{flatten} an array is to create a structure that represents the full array in a fashion that makes it easy -for C code to traverse the entire array. Test code +for C code to traverse the entire array. 
Some of the code in @file{extension/testext.c} does this, and also serves as a nice example showing how to use the APIs. @@ -32530,9 +32369,9 @@ dump_array_and_delete(int nargs, awk_value_t *result) @end example The function then proceeds in steps, as follows. First, retrieve -the name of the array, passed as the first argument. Then -retrieve the array itself. If either operation fails, print -error messages and return: +the name of the array, passed as the first argument, followed by +the array itself. If either operation fails, print an +error message and return: @example /* get argument named array as flat array and print it */ @@ -32568,7 +32407,7 @@ and print it: @end example The third step is to actually flatten the array, and then -to double check that the count in the @code{awk_flat_array_t} +to double-check that the count in the @code{awk_flat_array_t} is the same as the count just retrieved: @example @@ -32589,7 +32428,7 @@ is the same as the count just retrieved: The fourth step is to retrieve the index of the element to be deleted, which was passed as the second argument. Remember that argument counts passed to @code{get_argument()} -are zero-based, thus the second argument is numbered one: +are zero-based, and thus the second argument is numbered one: @example if (! get_argument(1, AWK_STRING, & value3)) @{ @@ -32604,7 +32443,7 @@ element values. In addition, upon finding the element with the index that is supposed to be deleted, the function sets the @code{AWK_ELEMENT_DELETE} bit in the @code{flags} field of the element. When the array is released, @command{gawk} -traverses the flattened array, and deletes any elements which +traverses the flattened array, and deletes any elements that have this flag bit set: @example @@ -32892,10 +32731,10 @@ The API versions are available at compile time as constants: @table @code @item GAWK_API_MAJOR_VERSION -The major version of the API. 
+The major version of the API @item GAWK_API_MINOR_VERSION -The minor version of the API. +The minor version of the API @end table The minor version increases when new functions are added to the API. Such @@ -32913,14 +32752,14 @@ constant integers: @table @code @item api->major_version -The major version of the running @command{gawk}. +The major version of the running @command{gawk} @item api->minor_version -The minor version of the running @command{gawk}. +The minor version of the running @command{gawk} @end table It is up to the extension to decide if there are API incompatibilities. -Typically a check like this is enough: +Typically, a check like this is enough: @example if (api->major_version != GAWK_API_MAJOR_VERSION @@ -32934,7 +32773,7 @@ if (api->major_version != GAWK_API_MAJOR_VERSION @end example Such code is included in the boilerplate @code{dl_load_func()} macro -provided in @file{gawkapi.h} (discussed later, in +provided in @file{gawkapi.h} (discussed in @ref{Extension API Boilerplate}). @node Extension API Informational Variables @@ -32981,7 +32820,7 @@ as described here. The boilerplate needed is also provided in comments in the @file{gawkapi.h} header file: @example -/* Boiler plate code: */ +/* Boilerplate code: */ int plugin_is_GPL_compatible; static gawk_api_t *const api; @@ -33040,7 +32879,7 @@ to @code{NULL}, or to point to a string giving the name and version of your extension. @item static awk_ext_func_t func_table[] = @{ @dots{} @}; -This is an array of one or more @code{awk_ext_func_t} structures +This is an array of one or more @code{awk_ext_func_t} structures, as described earlier (@pxref{Extension Functions}). It can then be looped over for multiple calls to @code{add_ext_func()}. @@ -33171,7 +33010,7 @@ the @code{stat()} fails. It fills in the following elements: @table @code @item "name" -The name of the file that was @code{stat()}'ed. +The name of the file that was @code{stat()}ed. 
@item "dev" @itemx "ino" @@ -33227,7 +33066,7 @@ interprocess communications). The file is a directory. @item "fifo" -The file is a named-pipe (also known as a FIFO). +The file is a named pipe (also known as a FIFO). @item "file" The file is just a regular file. @@ -33250,7 +33089,7 @@ For some other systems, @dfn{a priori} knowledge is used to provide a value. Where no value can be determined, it defaults to 512. @end table -Several additional elements may be present depending upon the operating +Several additional elements may be present, depending upon the operating system and the type of the file. You can test for them in your @command{awk} program by using the @code{in} operator (@pxref{Reference to Elements}): @@ -33280,7 +33119,7 @@ edited slightly for presentation. See @file{extension/filefuncs.c} in the @command{gawk} distribution for the complete version.} The file includes a number of standard header files, and then includes -the @file{gawkapi.h} header file which provides the API definitions. +the @file{gawkapi.h} header file, which provides the API definitions. Those are followed by the necessary variable declarations to make use of the API macros and boilerplate code (@pxref{Extension API Boilerplate}): @@ -33321,9 +33160,9 @@ int plugin_is_GPL_compatible; @cindex programming conventions, @command{gawk} extensions By convention, for an @command{awk} function @code{foo()}, the C function that implements it is called @code{do_foo()}. The function should have -two arguments: the first is an @code{int} usually called @code{nargs}, +two arguments. The first is an @code{int}, usually called @code{nargs}, that represents the number of actual arguments for the function. 
-The second is a pointer to an @code{awk_value_t}, usually named +The second is a pointer to an @code{awk_value_t} structure, usually named @code{result}: @example @@ -33369,7 +33208,7 @@ Finally, the function returns the return value to the @command{awk} level: The @code{stat()} extension is more involved. First comes a function that turns a numeric mode into a printable representation -(e.g., 644 becomes @samp{-rw-r--r--}). This is omitted here for brevity: +(e.g., octal @code{0644} becomes @samp{-rw-r--r--}). This is omitted here for brevity: @example /* format_mode --- turn a stat mode field into something readable */ @@ -33425,9 +33264,9 @@ array_set_numeric(awk_array_t array, const char *sub, double num) The following function does most of the work to fill in the @code{awk_array_t} result array with values obtained -from a valid @code{struct stat}. It is done in a separate function +from a valid @code{struct stat}. This work is done in a separate function to support the @code{stat()} function for @command{gawk} and also -to support the @code{fts()} extension which is included in +to support the @code{fts()} extension, which is included in the same file but whose code is not shown here (@pxref{Extension Sample File Functions}). @@ -33548,8 +33387,8 @@ the @code{stat()} system call instead of the @code{lstat()} system call. This is done by using a function pointer: @code{statfunc}. @code{statfunc} is initialized to point to @code{lstat()} (instead of @code{stat()}) to get the file information, in case the file is a -symbolic link. However, if there were three arguments, @code{statfunc} -is set point to @code{stat()}, instead. +symbolic link. However, if the third argument is included, @code{statfunc} +is set to point to @code{stat()}, instead. Here is the @code{do_stat()} function, which starts with variable declarations and argument checking: @@ -33605,7 +33444,7 @@ Next, it gets the information for the file. 
If the called function /* always empty out the array */ clear_array(array); - /* stat the file, if error, set ERRNO and return */ + /* stat the file; if error, set ERRNO and return */ ret = statfunc(name, & sbuf); if (ret < 0) @{ update_ERRNO_int(errno); @@ -33627,7 +33466,9 @@ Finally, it's necessary to provide the ``glue'' that loads the new function(s) into @command{gawk}. The @code{filefuncs} extension also provides an @code{fts()} -function, which we omit here. For its sake there is an initialization +function, which we omit here +(@pxref{Extension Sample File Functions}). +For its sake, there is an initialization function: @example @@ -33752,9 +33593,9 @@ $ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk} @section The Sample Extensions in the @command{gawk} Distribution @cindex extensions distributed with @command{gawk} -This @value{SECTION} provides brief overviews of the sample extensions +This @value{SECTION} provides a brief overview of the sample extensions that come in the @command{gawk} distribution. Some of them are intended -for production use (e.g., the @code{filefuncs}, @code{readdir} and +for production use (e.g., the @code{filefuncs}, @code{readdir}, and @code{inplace} extensions). Others mainly provide example code that shows how to use the extension API. @@ -33790,14 +33631,14 @@ This is how you load the extension. @item @code{result = chdir("/some/directory")} The @code{chdir()} function is a direct hook to the @code{chdir()} system call to change the current directory. It returns zero -upon success or less than zero upon error. In the latter case, it updates -@code{ERRNO}. +upon success or a value less than zero upon error. +In the latter case, it updates @code{ERRNO}. @cindex @code{stat()} extension function @item @code{result = stat("/some/path", statdata} [@code{, follow}]@code{)} The @code{stat()} function provides a hook into the @code{stat()} system call. -It returns zero upon success or less than zero upon error. 
+It returns zero upon success or a value less than zero upon error. In the latter case, it updates @code{ERRNO}. By default, it uses the @code{lstat()} system call. However, if passed @@ -33824,10 +33665,10 @@ array with information retrieved from the filesystem, as follows: @item @code{"major"} @tab @code{st_major} @tab Device files @item @code{"minor"} @tab @code{st_minor} @tab Device files @item @code{"blksize"} @tab @code{st_blksize} @tab All -@item @code{"pmode"} @tab A human-readable version of the mode value, such as printed by -@command{ls}. For example, @code{"-rwxr-xr-x"} @tab All +@item @code{"pmode"} @tab A human-readable version of the mode value, like that printed by +@command{ls} (for example, @code{"-rwxr-xr-x"}) @tab All @item @code{"linkval"} @tab The value of the symbolic link @tab Symbolic links -@item @code{"type"} @tab The type of the file as a string. One of +@item @code{"type"} @tab The type of the file as a string---one of @code{"file"}, @code{"blockdev"}, @code{"chardev"}, @@ -33837,15 +33678,15 @@ array with information retrieved from the filesystem, as follows: @code{"symlink"}, @code{"door"}, or -@code{"unknown"}. -Not all systems support all file types. @tab All +@code{"unknown"} +(not all systems support all file types) @tab All @end multitable @cindex @code{fts()} extension function @item @code{flags = or(FTS_PHYSICAL, ...)} @itemx @code{result = fts(pathlist, flags, filedata)} Walk the file trees provided in @code{pathlist} and fill in the -@code{filedata} array as described next. @code{flags} is the bitwise +@code{filedata} array, as described next. @code{flags} is the bitwise OR of several predefined values, also described in a moment. Return zero if there were no errors, otherwise return @minus{}1. @end table @@ -33901,7 +33742,8 @@ During a traversal, do not cross onto a different mounted filesystem. @end table @item filedata -The @code{filedata} array is first cleared. 
Then, @code{fts()} creates +The @code{filedata} array holds the results. +@code{fts()} first clears it. Then it creates an element in @code{filedata} for every element in @code{pathlist}. The index is the name of the directory or file given in @code{pathlist}. The element for this index is itself an array. There are two cases: @@ -33943,7 +33785,7 @@ for a file: @code{"path"}, @code{"stat"}, and @code{"error"}. @end table The @code{fts()} function returns zero if there were no errors. -Otherwise it returns @minus{}1. +Otherwise, it returns @minus{}1. @quotation NOTE The @code{fts()} extension does not exactly mimic the @@ -33985,14 +33827,14 @@ The arguments to @code{fnmatch()} are: @table @code @item pattern -The @value{FN} wildcard to match. +The @value{FN} wildcard to match @item string -The @value{FN} string. +The @value{FN} string @item flag Either zero, or the bitwise OR of one or more of the -flags in the @code{FNM} array. +flags in the @code{FNM} array @end table The flags are as follows: @@ -34029,14 +33871,14 @@ This is how you load the extension. @cindex @code{fork()} extension function @item pid = fork() This function creates a new process. The return value is zero in the -child and the process-ID number of the child in the parent, or @minus{}1 +child and the process ID number of the child in the parent, or @minus{}1 upon error. In the latter case, @code{ERRNO} indicates the problem. In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are updated to reflect the correct values. @cindex @code{waitpid()} extension function @item ret = waitpid(pid) -This function takes a numeric argument, which is the process-ID to +This function takes a numeric argument, which is the process ID to wait for. The return value is that of the @code{waitpid()} system call. 
@@ -34064,8 +33906,8 @@ else @subsection Enabling In-Place File Editing @cindex @code{inplace} extension -The @code{inplace} extension emulates GNU @command{sed}'s @option{-i} option -which performs ``in place'' editing of each input file. +The @code{inplace} extension emulates GNU @command{sed}'s @option{-i} option, +which performs ``in-place'' editing of each input file. It uses the bundled @file{inplace.awk} include file to invoke the extension properly: @@ -34161,14 +34003,14 @@ they are read, with each entry returned as a record. The record consists of three fields. The first two are the inode number and the @value{FN}, separated by a forward slash character. On systems where the directory entry contains the file type, the record -has a third field (also separated by a slash) which is a single letter +has a third field (also separated by a slash), which is a single letter indicating the type of the file. The letters and their corresponding file types are shown in @ref{table-readdir-file-types}. @float Table,table-readdir-file-types @caption{File types returned by the @code{readdir} extension} @multitable @columnfractions .1 .9 -@headitem Letter @tab File Type +@headitem Letter @tab File type @item @code{b} @tab Block device @item @code{c} @tab Character device @item @code{d} @tab Directory @@ -34196,7 +34038,7 @@ Here is an example: @@load "readdir" @dots{} BEGIN @{ FS = "/" @} -@{ print "file name is", $2 @} +@{ print "@value{FN} is", $2 @} @end example @node Extension Sample Revout @@ -34217,8 +34059,7 @@ BEGIN @{ @} @end example -The output from this program is: -@samp{cinap t'nod}. +The output from this program is @samp{cinap t'nod}. @node Extension Sample Rev2way @subsection Two-Way I/O Example @@ -34273,7 +34114,7 @@ success, or zero upon failure. @code{reada()} is the inverse of @code{writea()}; it reads the file named as its first argument, filling in the array named as the second argument. It clears the array first. 
-Here too, the return value is one on success and zero upon failure. +Here too, the return value is one on success, or zero upon failure. @end table The array created by @code{reada()} is identical to that written by @@ -34361,7 +34202,7 @@ it tries to use @code{GetSystemTimeAsFileTime()}. Attempt to sleep for @var{seconds} seconds. If @var{seconds} is negative, or the attempt to sleep fails, return @minus{}1 and set @code{ERRNO}. Otherwise, return zero after sleeping for the indicated amount of time. -Note that @var{seconds} may be a floating-point (non-integral) value. +Note that @var{seconds} may be a floating-point (nonintegral) value. Implementation details: depending on platform availability, this function tries to use @code{nanosleep()} or @code{select()} to implement the delay. @end table @@ -34388,10 +34229,13 @@ project provides a number of @command{gawk} extensions, including one for processing XML files. This is the evolution of the original @command{xgawk} (XML @command{gawk}) project. -As of this writing, there are six extensions: +As of this writing, there are seven extensions: @itemize @value{BULLET} @item +@code{errno} extension + +@item GD graphics library extension @item @@ -34402,7 +34246,7 @@ PostgreSQL extension @item MPFR library extension -(this provides access to a number of MPFR functions which @command{gawk}'s +(this provides access to a number of MPFR functions that @command{gawk}'s native MPFR support does not) @item @@ -34456,7 +34300,7 @@ make install @ii{Install the extensions} If you have installed @command{gawk} in the standard way, then you will likely not need the @option{--with-gawk} option when configuring -@code{gawkextlib}. You may also need to use the @command{sudo} utility +@code{gawkextlib}. You may need to use the @command{sudo} utility to install both @command{gawk} and @code{gawkextlib}, depending upon how your system works. @@ -34481,7 +34325,7 @@ named @code{plugin_is_GPL_compatible}. 
@item Communication between @command{gawk} and an extension is two-way. -@command{gawk} passes a @code{struct} to the extension which contains +@command{gawk} passes a @code{struct} to the extension that contains various data fields and function pointers. The extension can then call into @command{gawk} via the supplied function pointers to accomplish certain tasks. @@ -34494,7 +34338,7 @@ By convention, implementation functions are named @code{do_@var{XXXX}()} for some @command{awk}-level function @code{@var{XXXX}()}. @item -The API is defined in a header file named @file{gawkpi.h}. You must include +The API is defined in a header file named @file{gawkapi.h}. You must include a number of standard header files @emph{before} including it in your source file. @item @@ -34539,7 +34383,7 @@ getting the count of elements in an array; creating a new array; clearing an array; and -flattening an array for easy C style looping over all its indices and elements) +flattening an array for easy C-style looping over all its indices and elements) @end itemize @item @@ -34547,7 +34391,7 @@ The API defines a number of standard data types for representing @command{awk} values, array elements, and arrays. @item -The API provide convenience functions for constructing values. +The API provides convenience functions for constructing values. It also provides memory management functions to ensure compatibility between memory allocated by @command{gawk} and memory allocated by an extension. @@ -34573,8 +34417,8 @@ file make this easier to do. @item The @command{gawk} distribution includes a number of small but useful -sample extensions. The @code{gawkextlib} project includes several more, -larger, extensions. If you wish to write an extension and contribute it +sample extensions. The @code{gawkextlib} project includes several more +(larger) extensions. 
If you wish to write an extension and contribute it to the community of @command{gawk} users, the @code{gawkextlib} project is the place to do so. @@ -34691,9 +34535,7 @@ online documentation}. @node V7/SVR3.1 @appendixsec Major Changes Between V7 and SVR3.1 -@c STARTOFRANGE gawkv @cindex @command{awk}, versions of -@c STARTOFRANGE gawkv1 @cindex @command{awk}, versions of, changes between V7 and SVR3.1 The @command{awk} language evolved considerably between the release of @@ -34704,83 +34546,82 @@ cross-references to further details: @itemize @value{BULLET} @item The requirement for @samp{;} to separate rules on a line -(@pxref{Statements/Lines}). +(@pxref{Statements/Lines}) @item User-defined functions and the @code{return} statement -(@pxref{User-defined}). +(@pxref{User-defined}) @item The @code{delete} statement (@pxref{Delete}). @item The @code{do}-@code{while} statement -(@pxref{Do Statement}). +(@pxref{Do Statement}) @item The built-in functions @code{atan2()}, @code{cos()}, @code{sin()}, @code{rand()}, and -@code{srand()} (@pxref{Numeric Functions}). +@code{srand()} (@pxref{Numeric Functions}) @item The built-in functions @code{gsub()}, @code{sub()}, and @code{match()} -(@pxref{String Functions}). +(@pxref{String Functions}) @item The built-in functions @code{close()} and @code{system()} -(@pxref{I/O Functions}). +(@pxref{I/O Functions}) @item The @code{ARGC}, @code{ARGV}, @code{FNR}, @code{RLENGTH}, @code{RSTART}, -and @code{SUBSEP} predefined variables (@pxref{Built-in Variables}). +and @code{SUBSEP} predefined variables (@pxref{Built-in Variables}) @item -Assignable @code{$0} (@pxref{Changing Fields}). +Assignable @code{$0} (@pxref{Changing Fields}) @item The conditional expression using the ternary operator @samp{?:} -(@pxref{Conditional Exp}). +(@pxref{Conditional Exp}) @item -The expression @samp{@var{index-variable} in @var{array}} outside of @code{for} -statements (@pxref{Reference to Elements}). 
+The expression @samp{@var{indx} in @var{array}} outside of @code{for} +statements (@pxref{Reference to Elements}) @item The exponentiation operator @samp{^} (@pxref{Arithmetic Ops}) and its assignment operator -form @samp{^=} (@pxref{Assignment Ops}). +form @samp{^=} (@pxref{Assignment Ops}) @item C-compatible operator precedence, which breaks some old @command{awk} -programs (@pxref{Precedence}). +programs (@pxref{Precedence}) @item Regexps as the value of @code{FS} (@pxref{Field Separators}) and as the third argument to the @code{split()} function (@pxref{String Functions}), rather than using only the first character -of @code{FS}. +of @code{FS} @item Dynamic regexps as operands of the @samp{~} and @samp{!~} operators -(@pxref{Computed Regexps}). +(@pxref{Computed Regexps}) @item The escape sequences @samp{\b}, @samp{\f}, and @samp{\r} -(@pxref{Escape Sequences}). +(@pxref{Escape Sequences}) @item Redirection of input for the @code{getline} function -(@pxref{Getline}). +(@pxref{Getline}) @item Multiple @code{BEGIN} and @code{END} rules -(@pxref{BEGIN/END}). +(@pxref{BEGIN/END}) @item Multidimensional arrays -(@pxref{Multidimensional}). +(@pxref{Multidimensional}) @end itemize -@c ENDOFRANGE gawkv1 @node SVR4 @appendixsec Changes Between SVR3.1 and SVR4 @@ -34791,54 +34632,54 @@ The System V Release 4 (1989) version of Unix @command{awk} added these features @itemize @value{BULLET} @item -The @code{ENVIRON} array (@pxref{Built-in Variables}). +The @code{ENVIRON} array (@pxref{Built-in Variables}) @c gawk and MKS awk @item Multiple @option{-f} options on the command line -(@pxref{Options}). +(@pxref{Options}) @c MKS awk @item The @option{-v} option for assigning variables before program execution begins -(@pxref{Options}). +(@pxref{Options}) @c GNU, Bell Laboratories & MKS together @item -The @option{--} signal for terminating command-line options. 
+The @option{--} signal for terminating command-line options @item The @samp{\a}, @samp{\v}, and @samp{\x} escape sequences -(@pxref{Escape Sequences}). +(@pxref{Escape Sequences}) @c GNU, for ANSI C compat @item A defined return value for the @code{srand()} built-in function -(@pxref{Numeric Functions}). +(@pxref{Numeric Functions}) @item The @code{toupper()} and @code{tolower()} built-in string functions for case translation -(@pxref{String Functions}). +(@pxref{String Functions}) @item A cleaner specification for the @samp{%c} format-control letter in the @code{printf} function -(@pxref{Control Letters}). +(@pxref{Control Letters}) @item The ability to dynamically pass the field width and precision (@code{"%*.*d"}) in the argument list of @code{printf} and @code{sprintf()} -(@pxref{Control Letters}). +(@pxref{Control Letters}) @item The use of regexp constants, such as @code{/foo/}, as expressions, where they are equivalent to using the matching operator, as in @samp{$0 ~ /foo/} -(@pxref{Using Constant Regexps}). +(@pxref{Using Constant Regexps}) @item Processing of escape sequences inside command-line variable assignments -(@pxref{Assignment Options}). +(@pxref{Assignment Options}) @end itemize @node POSIX @@ -34852,23 +34693,23 @@ introduced the following changes into the language: @itemize @value{BULLET} @item The use of @option{-W} for implementation-specific options -(@pxref{Options}). +(@pxref{Options}) @item The use of @code{CONVFMT} for controlling the conversion of numbers -to strings (@pxref{Conversion}). +to strings (@pxref{Conversion}) @item The concept of a numeric string and tighter comparison rules to go -with it (@pxref{Typing and Comparison}). +with it (@pxref{Typing and Comparison}) @item The use of predefined variables as function parameter names is forbidden -(@pxref{Definition Syntax}). +(@pxref{Definition Syntax}) @item More complete documentation of many of the previously undocumented -features of the language. 
+features of the language @end itemize In 2012, a number of extensions that had been commonly available for @@ -34877,15 +34718,15 @@ many years were finally added to POSIX. They are: @itemize @value{BULLET} @item The @code{fflush()} built-in function for flushing buffered output -(@pxref{I/O Functions}). +(@pxref{I/O Functions}) @item The @code{nextfile} statement -(@pxref{Nextfile Statement}). +(@pxref{Nextfile Statement}) @item The ability to delete all of an array at once with @samp{delete @var{array}} -(@pxref{Delete}). +(@pxref{Delete}) @end itemize @@ -34895,7 +34736,6 @@ not permitted by the POSIX standard. The 2008 POSIX standard can be found online at @url{http://www.opengroup.org/onlinepubs/9699919799/}. -@c ENDOFRANGE gawkv @node BTL @appendixsec Extensions in Brian Kernighan's @command{awk} @@ -34916,22 +34756,22 @@ originally appeared in his version of @command{awk}: The @samp{**} and @samp{**=} operators (@pxref{Arithmetic Ops} and -@ref{Assignment Ops}). +@ref{Assignment Ops}) @item The use of @code{func} as an abbreviation for @code{function} -(@pxref{Definition Syntax}). +(@pxref{Definition Syntax}) @item The @code{fflush()} built-in function for flushing buffered output -(@pxref{I/O Functions}). +(@pxref{I/O Functions}) @ignore @item The @code{SYMTAB} array, that allows access to @command{awk}'s internal symbol table. This feature was never documented for his @command{awk}, largely because it is somewhat shakily implemented. For instance, you cannot access arrays -or array elements through it. +or array elements through it @end ignore @end itemize @@ -34941,11 +34781,8 @@ available in his @command{awk}. 
@node POSIX/GNU @appendixsec Extensions in @command{gawk} Not in POSIX @command{awk} -@c STARTOFRANGE fripls @cindex compatibility mode (@command{gawk}), extensions -@c STARTOFRANGE exgnot @cindex extensions, in @command{gawk}, not in POSIX @command{awk} -@c STARTOFRANGE posnot @cindex POSIX, @command{gawk} extensions not included in The GNU implementation, @command{gawk}, adds a large number of features. They can all be disabled with either the @option{--traditional} or @@ -34964,7 +34801,7 @@ Additional predefined variables: @itemize @value{MINUS} @item The -@code{ARGIND} +@code{ARGIND}, @code{BINMODE}, @code{ERRNO}, @code{FIELDWIDTHS}, @@ -34976,7 +34813,7 @@ The and @code{TEXTDOMAIN} variables -(@pxref{Built-in Variables}). +(@pxref{Built-in Variables}) @end itemize @item @@ -34984,15 +34821,15 @@ Special files in I/O redirections: @itemize @value{MINUS} @item -The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr} and +The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr}, and @file{/dev/fd/@var{N}} special @value{FN}s -(@pxref{Special Files}). +(@pxref{Special Files}) @item The @file{/inet}, @file{/inet4}, and @samp{/inet6} special files for TCP/IP networking using @samp{|&} to specify which version of the IP protocol to use -(@pxref{TCP/IP Networking}). +(@pxref{TCP/IP Networking}) @end itemize @item @@ -35001,37 +34838,41 @@ Changes and/or additions to the language: @itemize @value{MINUS} @item The @samp{\x} escape sequence -(@pxref{Escape Sequences}). +(@pxref{Escape Sequences}) @item Full support for both POSIX and GNU regexps -(@pxref{Regexp}). +(@pxref{Regexp}) @item The ability for @code{FS} and for the third argument to @code{split()} to be null strings -(@pxref{Single Character Fields}). +(@pxref{Single Character Fields}) @item The ability for @code{RS} to be a regexp -(@pxref{Records}). +(@pxref{Records}) @item The ability to use octal and hexadecimal constants in @command{awk} program source code -(@pxref{Nondecimal-numbers}). 
+(@pxref{Nondecimal-numbers}) @item The @samp{|&} operator for two-way I/O to a coprocess -(@pxref{Two-way I/O}). +(@pxref{Two-way I/O}) @item Indirect function calls -(@pxref{Indirect Calls}). +(@pxref{Indirect Calls}) @item Directories on the command line produce a warning and are skipped -(@pxref{Command-line directories}). +(@pxref{Command-line directories}) + +@item +Output with @code{print} and @code{printf} need not be fatal +(@pxref{Nonfatal}) @end itemize @item @@ -35040,11 +34881,11 @@ New keywords: @itemize @value{MINUS} @item The @code{BEGINFILE} and @code{ENDFILE} special patterns -(@pxref{BEGINFILE/ENDFILE}). +(@pxref{BEGINFILE/ENDFILE}) @item The @code{switch} statement -(@pxref{Switch Statement}). +(@pxref{Switch Statement}) @end itemize @item @@ -35054,30 +34895,30 @@ Changes to standard @command{awk} functions: @item The optional second argument to @code{close()} that allows closing one end of a two-way pipe to a coprocess -(@pxref{Two-way I/O}). +(@pxref{Two-way I/O}) @item -POSIX compliance for @code{gsub()} and @code{sub()} with @option{--posix}. +POSIX compliance for @code{gsub()} and @code{sub()} with @option{--posix} @item The @code{length()} function accepts an array argument and returns the number of elements in the array -(@pxref{String Functions}). +(@pxref{String Functions}) @item The optional third argument to the @code{match()} function for capturing text-matching subexpressions within a regexp -(@pxref{String Functions}). +(@pxref{String Functions}) @item Positional specifiers in @code{printf} formats for making translations easier -(@pxref{Printf Ordering}). +(@pxref{Printf Ordering}) @item The @code{split()} function's additional optional fourth -argument which is an array to hold the text of the field separators -(@pxref{String Functions}). 
+argument, which is an array to hold the text of the field separators +(@pxref{String Functions}) @end itemize @item @@ -35087,16 +34928,16 @@ Additional functions only in @command{gawk}: @item The @code{gensub()}, @code{patsplit()}, and @code{strtonum()} functions for more powerful text manipulation -(@pxref{String Functions}). +(@pxref{String Functions}) @item The @code{asort()} and @code{asorti()} functions for sorting arrays -(@pxref{Array Sorting}). +(@pxref{Array Sorting}) @item The @code{mktime()}, @code{systime()}, and @code{strftime()} functions for working with timestamps -(@pxref{Time Functions}). +(@pxref{Time Functions}) @item The @@ -35108,17 +34949,22 @@ The and @code{xor()} functions for bit manipulation -(@pxref{Bitwise Functions}). +(@pxref{Bitwise Functions}) @c In 4.1, and(), or() and xor() grew the ability to take > 2 arguments @item The @code{isarray()} function to check if a variable is an array or not -(@pxref{Type Functions}). +(@pxref{Type Functions}) @item -The @code{bindtextdomain()}, @code{dcgettext()} and @code{dcngettext()} +The @code{bindtextdomain()}, @code{dcgettext()}, and @code{dcngettext()} functions for internationalization -(@pxref{Programmer i18n}). +(@pxref{Programmer i18n}) + +@item +The @code{div()} function for doing integer +division and remainder +(@pxref{Numeric Functions}) @end itemize @item @@ -35128,12 +34974,12 @@ Changes and/or additions in the command-line options: @item The @env{AWKPATH} environment variable for specifying a path search for the @option{-f} command-line option -(@pxref{Options}). +(@pxref{Options}) @item The @env{AWKLIBPATH} environment variable for specifying a path search for the @option{-l} command-line option -(@pxref{Options}). +(@pxref{Options}) @item The @@ -35162,7 +35008,7 @@ The and @option{-V} short options. 
Also, the -ability to use GNU-style long-named options that start with @option{--} +ability to use GNU-style long-named options that start with @option{--}, and the @option{--assign}, @option{--bignum}, @@ -35242,7 +35088,7 @@ GCC for VAX and Alpha has not been tested for a while. @end itemize @item -Support for the following obsolete systems was removed from the code +Support for the following obsolete system was removed from the code for @command{gawk} @value{PVERSION} 4.1: @c nested table @@ -35252,16 +35098,19 @@ Ultrix @end itemize @item -@c FIXME: Verify the version here. -Support for MirBSD was removed at @command{gawk} @value{PVERSION} 4.2. +Support for the following systems was removed from the code +for @command{gawk} @value{PVERSION} 4.2: + +@c nested table +@itemize @value{MINUS} +@item +MirBSD +@end itemize @end itemize @c XXX ADD MORE STUFF HERE -@c ENDOFRANGE fripls -@c ENDOFRANGE exgnot -@c ENDOFRANGE posnot @c This does not need to be in the formal book. @ifclear FOR_PRINT @@ -35870,6 +35719,44 @@ with a minimum of two The dynamic extension interface was completely redone (@pxref{Dynamic Extensions}). +@item +Support for Ultrix was removed. + +@end itemize + +Version 4.2 introduced the following changes: + +@itemize @bullet +@item +Changes to @code{ENVIRON} are reflected into @command{gawk}'s +environment and that of programs that it runs. +@xref{Auto-set}. + +@item +The @option{--pretty-print} option no longer runs the @command{awk} +program too. +@xref{Options}. + +@item +The @command{igawk} program and its manual page are no longer +installed when @command{gawk} is built. +@xref{Igawk Program}. + +@item +The @code{div()} function. +@xref{Numeric Functions}. + +@item +The maximum number of hexdecimal digits in @samp{\x} escapes +is now two. +@xref{Escape Sequences}. + +@item +Nonfatal output with @code{print} and @code{printf}. +@xref{Nonfatal}. + +@item +Support for MirBSD was removed. 
@end itemize @c XXX ADD MORE STUFF HERE @@ -35885,9 +35772,9 @@ by @command{gawk}, Brian Kernighan's @command{awk}, and @command{mawk}, the three most widely used freely available versions of @command{awk} (@pxref{Other Versions}). -@multitable {@file{/dev/stderr} special file} {BWK Awk} {Mawk} {GNU Awk} {Now standard} -@headitem Feature @tab BWK Awk @tab Mawk @tab GNU Awk @tab Now standard -@item @samp{\x} Escape sequence @tab X @tab X @tab X @tab +@multitable {@file{/dev/stderr} special file} {BWK @command{awk}} {@command{mawk}} {@command{gawk}} {Now standard} +@headitem Feature @tab BWK @command{awk} @tab @command{mawk} @tab @command{gawk} @tab Now standard +@item @samp{\x} escape sequence @tab X @tab X @tab X @tab @item @code{FS} as null string @tab X @tab X @tab X @tab @item @file{/dev/stdin} special file @tab X @tab X @tab X @tab @item @file{/dev/stdout} special file @tab X @tab X @tab X @tab @@ -35918,7 +35805,7 @@ in the machine's native character set. Thus, on ASCII-based systems, @samp{[a-z]} matched all the lowercase letters, and only the lowercase letters, as the numeric values for the letters from @samp{a} through @samp{z} were contiguous. (On an EBCDIC system, the range @samp{[a-z]} -includes additional, non-alphabetic characters as well.) +includes additional nonalphabetic characters as well.) Almost all introductory Unix literature explained range expressions as working in this fashion, and in particular, would teach that the @@ -35943,7 +35830,7 @@ What does that mean? In many locales, @samp{A} and @samp{a} are both less than @samp{B}. In other words, these locales sort characters in dictionary order, and @samp{[a-dx-z]} is typically not equivalent to @samp{[abcdxyz]}; -instead it might be equivalent to @samp{[ABCXYabcdxyz]}, for example. +instead, it might be equivalent to @samp{[ABCXYabcdxyz]}, for example. This point needs to be emphasized: much literature teaches that you should use @samp{[a-z]} to match a lowercase character. 
But on systems with @@ -35972,23 +35859,23 @@ is perfectly valid in ASCII, but is not valid in many Unicode locales, such as @code{en_US.UTF-8}. Early versions of @command{gawk} used regexp matching code that was not -locale aware, so ranges had their traditional interpretation. +locale-aware, so ranges had their traditional interpretation. When @command{gawk} switched to using locale-aware regexp matchers, the problems began; especially as both GNU/Linux and commercial Unix vendors started implementing non-ASCII locales, @emph{and making them the default}. Perhaps the most frequently asked question became something -like ``why does @samp{[A-Z]} match lowercase letters?!?'' +like, ``Why does @samp{[A-Z]} match lowercase letters?!?'' @cindex Berry, Karl This situation existed for close to 10 years, if not more, and the @command{gawk} maintainer grew weary of trying to explain that -@command{gawk} was being nicely standards compliant, and that the issue +@command{gawk} was being nicely standards-compliant, and that the issue was in the user's locale. During the development of @value{PVERSION} 4.0, he modified @command{gawk} to always treat ranges in the original, pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).@footnote{And thus was born the Campaign for Rational Range Interpretation (or -RRI). A number of GNU tools have either implemented this change, +RRI). A number of GNU tools have already implemented this change, or will soon. Thanks to Karl Berry for coining the phrase ``Rational Range Interpretation.''} @@ -36002,9 +35889,10 @@ and By using this lovely technical term, the standard gives license to implementors to implement ranges in whatever way they choose. -The @command{gawk} maintainer chose to apply the pre-POSIX meaning in all -cases: the default regexp matching; with @option{--traditional} and with -@option{--posix}; in all cases, @command{gawk} remains POSIX compliant. 
+The @command{gawk} maintainer chose to apply the pre-POSIX meaning +both with the default regexp matching and when @option{--traditional} or +@option{--posix} are used. +In all cases @command{gawk} remains POSIX-compliant. @node Contributors @appendixsec Major Contributors to @command{gawk} @@ -36050,7 +35938,7 @@ to around 90 pages. Richard Stallman helped finish the implementation and the initial draft of this @value{DOCUMENT}. -He is also the founder of the FSF and the GNU project. +He is also the founder of the FSF and the GNU Project. @item @cindex Woods, John @@ -36214,28 +36102,28 @@ John Haque made the following contributions: @itemize @value{MINUS} @item The modifications to convert @command{gawk} -into a byte-code interpreter, including the debugger. +into a byte-code interpreter, including the debugger @item -The addition of true arrays of arrays. +The addition of true arrays of arrays @item -The additional modifications for support of arbitrary-precision arithmetic. +The additional modifications for support of arbitrary-precision arithmetic @item The initial text of -@ref{Arbitrary Precision Arithmetic}. +@ref{Arbitrary Precision Arithmetic} @item The work to merge the three versions of @command{gawk} -into one, for the 4.1 release. +into one, for the 4.1 release @item -Improved array internals for arrays indexed by integers. +Improved array internals for arrays indexed by integers @item -The improved array sorting features were driven by John together -with Pat Rankin. +The improved array sorting features were also driven by John, together +with Pat Rankin @end itemize @cindex Papadopoulos, Panos @@ -36276,10 +36164,10 @@ helping David Trueman, and as the primary maintainer since around 1994. @itemize @value{BULLET} @item The @command{awk} language has evolved over time. The first release -was with V7 Unix circa 1978. In 1987, for System V Release 3.1, +was with V7 Unix, circa 1978. 
In 1987, for System V Release 3.1, major additions, including user-defined functions, were made to the language. Additional changes were made for System V Release 4, in 1989. -Since then, further minor changes happen under the auspices of the +Since then, further minor changes have happened under the auspices of the POSIX standard. @item @@ -36295,7 +36183,7 @@ options. The interaction of POSIX locales and regexp matching in @command{gawk} has been confusing over the years. Today, @command{gawk} implements Rational Range Interpretation, where ranges of the form @samp{[a-z]} match @emph{only} the characters numerically between -@samp{a} through @samp{z} in the machine's native character set. Usually this is ASCII +@samp{a} through @samp{z} in the machine's native character set. Usually this is ASCII, but it can be EBCDIC on IBM S/390 systems. @item @@ -36310,9 +36198,7 @@ the appropriate credit where credit is due. @c last two commas are part of see also @cindex operating systems, See Also GNU/Linux@comma{} PC operating systems@comma{} Unix -@c STARTOFRANGE gligawk @cindex @command{gawk}, installing -@c STARTOFRANGE ingawk @cindex installing @command{gawk} This appendix provides instructions for installing @command{gawk} on the various platforms that are supported by the developers. The primary @@ -36382,7 +36268,7 @@ will be less busy, and you can usually find one closer to your site. @command{gawk} is distributed as several @code{tar} files compressed with different compression programs: @command{gzip}, @command{bzip2}, and @command{xz}. For simplicity, the rest of these instructions assume -you are using the one compressed with the GNU Zip program, @code{gzip}. +you are using the one compressed with the GNU Gzip program (@command{gzip}). Once you have the distribution (e.g., @file{gawk-@value{VERSION}.@value{PATCHLEVEL}.tar.gz}), @@ -36422,7 +36308,6 @@ a local expert. 
@node Distribution contents @appendixsubsec Contents of the @command{gawk} Distribution -@c STARTOFRANGE gawdis @cindex @command{gawk}, distribution The @command{gawk} distribution has a number of C source files, @@ -36434,12 +36319,12 @@ operating systems: @table @asis @item Various @samp{.c}, @samp{.y}, and @samp{.h} files -The actual @command{gawk} source code. +These files contain the actual @command{gawk} source code. @end table @table @file @item ABOUT-NLS -Information about GNU @command{gettext} and translations. +A file containing information about GNU @command{gettext} and translations. @item AUTHORS A file with some information about the authorship of @command{gawk}. @@ -36469,7 +36354,7 @@ An older list of changes to @command{gawk}. The GNU General Public License. @item POSIX.STD -A description of behaviors in the POSIX standard for @command{awk} which +A description of behaviors in the POSIX standard for @command{awk} that are left undefined, or where @command{gawk} may not comply fully, as well as a list of things that the POSIX standard should describe but does not. @@ -36520,10 +36405,10 @@ The generated Info file for this @value{DOCUMENT}. @item doc/gawkinet.texi The Texinfo source file for @ifinfo -@inforef{Top, , General Introduction, gawkinet, TCP/IP Internetworking with @command{gawk}}. +@inforef{Top, , General Introduction, gawkinet, @value{GAWKINETTITLE}}. @end ifinfo @ifnotinfo -@cite{TCP/IP Internetworking with @command{gawk}}. +@cite{@value{GAWKINETTITLE}}. @end ifnotinfo It should be processed with @TeX{} (via @command{texi2dvi} or @command{texi2pdf}) @@ -36532,7 +36417,7 @@ with @command{makeinfo} to produce an Info or HTML file. @item doc/gawkinet.info The generated Info file for -@cite{TCP/IP Internetworking with @command{gawk}}. +@cite{@value{GAWKINETTITLE}}. 
@item doc/igawk.1 The @command{troff} source for a manual page describing the @command{igawk} @@ -36621,7 +36506,6 @@ directory to run your version of @command{gawk} against the test suite. If @command{gawk} successfully passes @samp{make check}, then you can be confident of a successful port. @end table -@c ENDOFRANGE gawdis @node Unix Installation @appendixsec Compiling and Installing @command{gawk} on Unix-Like Systems @@ -36772,7 +36656,7 @@ can be configured and compiled. @cindex @option{--disable-lint} configuration option @cindex configuration option, @code{--disable-lint} @item --disable-lint -Disable all lint checking within @code{gawk}. The +Disable all lint checking within @command{gawk}. The @option{--lint} and @option{--lint-old} options (@pxref{Options}) are accepted, but silently do nothing. @@ -36780,14 +36664,17 @@ Similarly, setting the @code{LINT} variable (@pxref{User-modified}) has no effect on the running @command{awk} program. -When used with GCC's automatic dead-code-elimination, this option +When used with the GNU Compiler Collection's (GCC's) +automatic dead-code-elimination, this option cuts almost 23K bytes off the size of the @command{gawk} executable on GNU/Linux x86_64 systems. Results on other systems and with other compilers are likely to vary. Using this option may bring you some slight performance improvement. +@quotation CAUTION Using this option will cause some of the tests in the test suite to fail. This option may be removed at a later date. +@end quotation @cindex @option{--disable-nls} configuration option @cindex configuration option, @code{--disable-nls} @@ -36884,10 +36771,10 @@ running MS-DOS, any version of MS-Windows, or OS/2. running MS-DOS and any version of MS-Windows. @end ifset In this @value{SECTION}, the term ``Windows32'' -refers to any of Microsoft Windows-95/98/ME/NT/2000/XP/Vista/7/8. +refers to any of Microsoft Windows 95/98/ME/NT/2000/XP/Vista/7/8. 
The limitations of MS-DOS (and MS-DOS shells under the other operating -systems) has meant that various ``DOS extenders'' are often used with +systems) have meant that various ``DOS extenders'' are often used with programs such as @command{gawk}. The varying capabilities of Microsoft Windows 3.1 and Windows32 can add to the confusion. For an overview of the considerations, refer to @file{README_d/README.pc} in @@ -37086,9 +36973,7 @@ multibyte functionality is not available. @node PC Using @appendixsubsubsec Using @command{gawk} on PC Operating Systems -@c STARTOFRANGE opgawx @cindex operating systems, PC, @command{gawk} on -@c STARTOFRANGE pcgawon @cindex PC operating systems, @command{gawk} on Under MS-DOS and MS-Windows, the Cygwin and MinGW environments support @@ -37148,7 +37033,7 @@ Under MS-Windows, OS/2 and MS-DOS, Under MS-Windows and MS-DOS, @end ifset @command{gawk} (and many other text programs) silently -translate end-of-line @samp{\r\n} to @samp{\n} on input and @samp{\n} +translates end-of-line @samp{\r\n} to @samp{\n} on input and @samp{\n} to @samp{\r\n} on output. A special @code{BINMODE} variable @value{COMMONEXT} allows control over these translations and is interpreted as follows: @@ -37182,7 +37067,7 @@ Setting @code{BINMODE} for standard input or standard output is accomplished by using an appropriate @samp{-v BINMODE=@var{N}} option on the command line. @code{BINMODE} is set at the time a file or pipe is opened and cannot be -changed mid-stream. +changed midstream. The name @code{BINMODE} was chosen to match @command{mawk} (@pxref{Other Versions}). @@ -37238,8 +37123,8 @@ moved into the @code{BEGIN} rule. @command{gawk} can be built and used ``out of the box'' under MS-Windows if you are using the @uref{http://www.cygwin.com, Cygwin environment}. 
-This environment provides an excellent simulation of GNU/Linux, using the -GNU tools, such as Bash, the GNU Compiler Collection (GCC), GNU Make, +This environment provides an excellent simulation of GNU/Linux, using +Bash, GCC, GNU Make, and other GNU programs. Compilation and installation for Cygwin is the same as for a Unix system: @@ -37258,7 +37143,7 @@ and then the @samp{make} proceeds as usual. @appendixsubsubsec Using @command{gawk} In The MSYS Environment In the MSYS environment under MS-Windows, @command{gawk} automatically -uses binary mode for reading and writing files. Thus there is no +uses binary mode for reading and writing files. Thus, there is no need to use the @code{BINMODE} variable. This can cause problems with other Unix-like components that have @@ -37322,7 +37207,7 @@ With ODS-5 volumes and extended parsing enabled, the case of the target parameter may need to be exact. @command{gawk} has been tested under VAX/VMS 7.3 and Alpha/VMS 7.3-1 -using Compaq C V6.4, and Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3. +using Compaq C V6.4, and under Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3. The most recent builds used HP C V7.3 on Alpha VMS 8.3 and both Alpha and IA64 VMS 8.4 used HP C 7.3.@footnote{The IA64 architecture is also known as ``Itanium.''} @@ -37370,7 +37255,7 @@ For VAX: /name=(as_is,short) @end example -Compile time macros need to be defined before the first VMS-supplied +Compile-time macros need to be defined before the first VMS-supplied header file is included, as follows: @example @@ -37417,7 +37302,7 @@ If your @command{gawk} was installed by a PCSI kit into the @file{GNV$GNU:[vms_help]gawk.hlp}. The PCSI kit also installs a @file{GNV$GNU:[vms_bin]gawk_verb.cld} file -which can be used to add @command{gawk} and @command{awk} as DCL commands. +that can be used to add @command{gawk} and @command{awk} as DCL commands. 
For just the current process you can use: @@ -37426,7 +37311,7 @@ $ @kbd{set command gnv$gnu:[vms_bin]gawk_verb.cld} @end example Or the system manager can use @file{GNV$GNU:[vms_bin]gawk_verb.cld} to -add the @command{gawk} and @command{awk} to the system wide @samp{DCLTABLES}. +add the @command{gawk} and @command{awk} to the system-wide @samp{DCLTABLES}. The DCL syntax is documented in the @file{gawk.hlp} file. @@ -37492,14 +37377,14 @@ The @code{exit} value is a Unix-style value and is encoded into a VMS exit status value when the program exits. The VMS severity bits will be set based on the @code{exit} value. -A failure is indicated by 1 and VMS sets the @code{ERROR} status. -A fatal error is indicated by 2 and VMS sets the @code{FATAL} status. +A failure is indicated by 1, and VMS sets the @code{ERROR} status. +A fatal error is indicated by 2, and VMS sets the @code{FATAL} status. All other values will have the @code{SUCCESS} status. The exit value is encoded to comply with VMS coding standards and will have the @code{C_FACILITY_NO} of @code{0x350000} with the constant @code{0xA000} added to the number shifted over by 3 bits to make room for the severity codes. -To extract the actual @command{gawk} exit code from the VMS status use: +To extract the actual @command{gawk} exit code from the VMS status, use: @example unix_status = (vms_status .and. &x7f8) / 8 @@ -37518,7 +37403,7 @@ VAX/VMS floating point uses unbiased rounding. @xref{Round Function}. VMS reports time values in GMT unless one of the @code{SYS$TIMEZONE_RULE} or @code{TZ} logical names is set. Older versions of VMS, such as VAX/VMS -7.3 do not set these logical names. +7.3, do not set these logical names. @c @cindex directory search @c @cindex path, search @@ -37536,7 +37421,7 @@ translation and not a multitranslation @code{RMS} searchlist. The VMS GNV package provides a build environment similar to POSIX with ports of a collection of open source tools. 
The @command{gawk} found in the GNV -base kit is an older port. Currently the GNV project is being reorganized +base kit is an older port. Currently, the GNV project is being reorganized to supply individual PCSI packages for each component. See @w{@uref{https://sourceforge.net/p/gnv/wiki/InstallingGNVPackages/}.} @@ -37596,8 +37481,6 @@ $ @kbd{gawk :== $sys$common:[syshlp.examples.tcpip.snmp]gawk.exe} This is apparently @value{PVERSION} 2.15.6, which is extremely old. We recommend compiling and using the current version. -@c ENDOFRANGE opgawx -@c ENDOFRANGE pcgawon @node Bugs @appendixsec Reporting Problems and Bugs @@ -37608,12 +37491,10 @@ recommend compiling and using the current version. @end quotation @c the radio show, not the book. :-) -@c STARTOFRANGE dbugg @cindex debugging @command{gawk}, bug reports -@c STARTOFRANGE tblgawb @cindex troubleshooting, @command{gawk}, bug reports If you have problems with @command{gawk} or think that you have found a bug, -report it to the developers; we cannot promise to do anything +report it to the developers; we cannot promise to do anything, but we might well want to fix it. Before reporting a bug, make sure you have really found a genuine bug. @@ -37623,7 +37504,7 @@ to do something or not, report that too; it's a bug in the documentation! Before reporting a bug or trying to fix it yourself, try to isolate it to the smallest possible @command{awk} program and input @value{DF} that -reproduces the problem. Then send us the program and @value{DF}, +reproduce the problem. Then send us the program and @value{DF}, some idea of what kind of Unix system you're using, the compiler you used to compile @command{gawk}, and the exact results @command{gawk} gave you. Also say what you expected to occur; this helps @@ -37638,7 +37519,7 @@ You can get this information with the command @samp{gawk --version}. Once you have a precise problem description, send email to @EMAIL{bug-gawk@@gnu.org,bug-gawk at gnu dot org}. 
-The @command{gawk} maintainers subscribe to this address and +The @command{gawk} maintainers subscribe to this address, and thus they will receive your bug report. Although you can send mail to the maintainers directly, the bug reporting address is preferred because the @@ -37665,8 +37546,8 @@ bug reporting system, you should also send a copy to This is for two reasons. First, although some distributions forward bug reports ``upstream'' to the GNU mailing list, many don't, so there is a good chance that the @command{gawk} maintainers won't even see the bug report! Second, -mail to the GNU list is archived, and having everything at the GNU project -keeps things self-contained and not dependant on other organizations. +mail to the GNU list is archived, and having everything at the GNU Project +keeps things self-contained and not dependent on other organizations. @end quotation Non-bug suggestions are always welcome as well. If you have questions @@ -37675,7 +37556,7 @@ features, ask on the bug list; we will try to help you out if we can. If you find bugs in one of the non-Unix ports of @command{gawk}, send an email to the bug list, with a copy to the -person who maintains that port. They are named in the following list, +person who maintains that port. The maintainers are named in the following list, as well as in the @file{README} file in the @command{gawk} distribution. Information in the @file{README} file should be considered authoritative if it conflicts with this @value{DOCUMENT}. @@ -37690,29 +37571,26 @@ The people maintaining the various @command{gawk} ports are: @cindex Robbins, Arnold @cindex Zaretskii, Eli @multitable {MS-Windows with MinGW} {123456789012345678901234567890123456789001234567890} -@item Unix and POSIX systems @tab Arnold Robbins, @EMAIL{arnold@@skeeve.com,arnold at skeeve dot com}. 
+@item Unix and POSIX systems @tab Arnold Robbins, @EMAIL{arnold@@skeeve.com,arnold at skeeve dot com} -@item MS-DOS with DJGPP @tab Scott Deifik, @EMAIL{scottd.mail@@sbcglobal.net,scottd dot mail at sbcglobal dot net}. +@item MS-DOS with DJGPP @tab Scott Deifik, @EMAIL{scottd.mail@@sbcglobal.net,scottd dot mail at sbcglobal dot net} -@item MS-Windows with MinGW @tab Eli Zaretskii, @EMAIL{eliz@@gnu.org,eliz at gnu dot org}. +@item MS-Windows with MinGW @tab Eli Zaretskii, @EMAIL{eliz@@gnu.org,eliz at gnu dot org} @c Leave this in the print version on purpose. @c OS/2 is not mentioned anywhere else in the print version though. -@item OS/2 @tab Andreas Buening, @EMAIL{andreas.buening@@nexgo.de,andreas dot buening at nexgo dot de}. +@item OS/2 @tab Andreas Buening, @EMAIL{andreas.buening@@nexgo.de,andreas dot buening at nexgo dot de} -@item VMS @tab John Malmberg, @EMAIL{wb8tyw@@qsl.net,wb8tyw at qsl.net}. +@item VMS @tab John Malmberg, @EMAIL{wb8tyw@@qsl.net,wb8tyw at qsl.net} -@item z/OS (OS/390) @tab Dave Pitts, @EMAIL{dpitts@@cozx.com,dpitts at cozx dot com}. +@item z/OS (OS/390) @tab Dave Pitts, @EMAIL{dpitts@@cozx.com,dpitts at cozx dot com} @end multitable If your bug is also reproducible under Unix, send a copy of your report to the @EMAIL{bug-gawk@@gnu.org,bug-gawk at gnu dot org} email list as well. -@c ENDOFRANGE dbugg -@c ENDOFRANGE tblgawb @node Other Versions @appendixsec Other Freely Available @command{awk} Implementations -@c STARTOFRANGE awkim @cindex @command{awk}, implementations @ignore From: emory!amc.com!brennan (Michael Brennan) @@ -37724,7 +37602,7 @@ Date: Wed, 4 Sep 1996 08:11:48 -0700 (PDT) @cindex Brennan, Michael @ifnotdocbook @quotation -@i{It's kind of fun to put comments like this in your awk code.}@* +@i{It's kind of fun to put comments like this in your awk code:}@* @ @ @ @ @ @ @code{// Do C++ comments work? answer: yes! 
of course} @author Michael Brennan @end quotation @@ -37765,14 +37643,14 @@ It is available in several archive formats: @end table @cindex @command{git} utility -You can also retrieve it from Git Hub: +You can also retrieve it from GitHub: @example git clone git://github.com/onetrueawk/awk bwkawk @end example @noindent -This command creates a copy of the @uref{http://www.git-scm.com, Git} +This command creates a copy of the @uref{http://git-scm.com, Git} repository in a directory named @file{bwkawk}. If you leave that argument off the @command{git} command line, the repository copy is created in a directory named @file{awk}. @@ -37825,7 +37703,7 @@ for a list of extensions in @command{mawk} that are not in POSIX @command{awk}. @item @command{awka} Written by Andrew Sumner, @command{awka} translates @command{awk} programs into C, compiles them, -and links them with a library of functions that provides the core +and links them with a library of functions that provide the core @command{awk} functionality. It also has a number of extensions. @@ -37837,7 +37715,7 @@ To get @command{awka}, go to @url{http://sourceforge.net/projects/awka}. @c andrewsumner@@yahoo.net The project seems to be frozen; no new code changes have been made -since approximately 2003. +since approximately 2001. @cindex Beebe, Nelson H.F.@: @cindex @command{pawk} (profiling version of Brian Kernighan's @command{awk}) @@ -37846,17 +37724,17 @@ since approximately 2003. Nelson H.F.@: Beebe at the University of Utah has modified BWK @command{awk} to provide timing and profiling information. It is different from @command{gawk} with the @option{--profile} option -(@pxref{Profiling}), +(@pxref{Profiling}) in that it uses CPU-based profiling, not line-count profiling. You may find it at either @uref{ftp://ftp.math.utah.edu/pub/pawk/pawk-20030606.tar.gz} or @uref{http://www.math.utah.edu/pub/pawk/pawk-20030606.tar.gz}. 
-@item Busybox Awk -@cindex Busybox Awk -@cindex source code, Busybox Awk -Busybox is a GPL-licensed program providing small versions of many +@item BusyBox @command{awk} +@cindex BusyBox Awk +@cindex source code, BusyBox Awk +BusyBox is a GPL-licensed program providing small versions of many applications within a single executable. It is aimed at embedded systems. It includes a full implementation of POSIX @command{awk}. When building it, be careful not to do @samp{make install} as it will overwrite @@ -37868,7 +37746,7 @@ information, see the @uref{http://busybox.net, project's home page}. @cindex source code, Solaris @command{awk} @item The OpenSolaris POSIX @command{awk} The versions of @command{awk} in @file{/usr/xpg4/bin} and -@file{/usr/xpg6/bin} on Solaris are more-or-less POSIX-compliant. +@file{/usr/xpg6/bin} on Solaris are more or less POSIX-compliant. They are based on the @command{awk} from Mortice Kern Systems for PCs. We were able to make this code compile and work under GNU/Linux with 1--2 hours of work. Making it more generally portable (using @@ -37909,9 +37787,9 @@ features to Python. See @uref{https://github.com/alecthomas/pawk} for more information. (This is not related to Nelson Beebe's modified version of BWK @command{awk}, described earlier.) -@item @w{QSE Awk} -@cindex QSE Awk -@cindex source code, QSE Awk +@item @w{QSE @command{awk}} +@cindex QSE @command{awk} +@cindex source code, QSE @command{awk} This is an embeddable @command{awk} interpreter. For more information, see @uref{http://code.google.com/p/qse/} and @uref{http://awk.info/?tools/qse}. @@ -37930,17 +37808,16 @@ since approximately 2008. @item Other versions See also the ``Versions and implementations'' section of the @uref{http://en.wikipedia.org/wiki/Awk_language#Versions_and_implementations, -Wikipedia article} for information on additional versions. +Wikipedia article} on @command{awk} for information on additional versions. 
@end table -@c ENDOFRANGE awkim @node Installation summary @appendixsec Summary @itemize @value{BULLET} @item -The @command{gawk} distribution is available from GNU project's main +The @command{gawk} distribution is available from the GNU Project's main distribution site, @code{ftp.gnu.org}. The canonical build recipe is: @example @@ -37952,34 +37829,30 @@ cd gawk-@value{VERSION}.@value{PATCHLEVEL} @item @command{gawk} may be built on non-POSIX systems as well. The currently -supported systems are MS-Windows using DJGPP, MSYS, MinGW and Cygwin, +supported systems are MS-Windows using DJGPP, MSYS, MinGW, and Cygwin, @ifclear FOR_PRINT OS/2 using EMX, @end ifclear and both Vax/VMS and OpenVMS. -Instructions for each system are included in this @value{CHAPTER}. +Instructions for each system are included in this @value{APPENDIX}. @item Bug reports should be sent via email to @email{bug-gawk@@gnu.org}. -Bug reports should be in English, and should include the version of @command{gawk}, -how it was compiled, and a short program and @value{DF} which demonstrate +Bug reports should be in English and should include the version of @command{gawk}, +how it was compiled, and a short program and @value{DF} that demonstrate the problem. @item There are a number of other freely available @command{awk} -implementations. Many are POSIX compliant; others are less so. +implementations. Many are POSIX-compliant; others are less so. @end itemize -@c ENDOFRANGE gligawk -@c ENDOFRANGE ingawk @ifclear FOR_PRINT @node Notes @appendix Implementation Notes -@c STARTOFRANGE gawii @cindex @command{gawk}, implementation issues -@c STARTOFRANGE impis @cindex implementation issues, @command{gawk} This appendix contains information mainly of interest to implementers and @@ -38055,7 +37928,7 @@ However, if you want to modify @command{gawk} and contribute back your changes, you will probably wish to work with the development version. 
To do so, you will need to access the @command{gawk} source code repository. The code is maintained using the -@uref{http://git-scm.com/, Git distributed version control system}. +@uref{http://git-scm.com, Git distributed version control system}. You will need to install it if your system doesn't have it. Once you have done so, use the command: @@ -38084,11 +37957,8 @@ that has a Git plug-in for working with Git repositories. @node Adding Code @appendixsubsec Adding New Features -@c STARTOFRANGE adfgaw @cindex adding, features to @command{gawk} -@c STARTOFRANGE fadgaw @cindex features, adding to @command{gawk} -@c STARTOFRANGE gawadf @cindex @command{gawk}, features, adding You are free to add any new features you like to @command{gawk}. However, if you want your changes to be incorporated into the @command{gawk} @@ -38123,7 +37993,7 @@ for information on getting the latest version of @command{gawk}.) @item @ifnotinfo -Follow the @uref{http://www.gnu.org/prep/standards/, @cite{GNU Coding Standards}}. +Follow the @cite{GNU Coding Standards}. @end ifnotinfo @ifinfo See @inforef{Top, , Version, standards, GNU Coding Standards}. @@ -38132,7 +38002,7 @@ This document describes how GNU software should be written. If you haven't read it, please do so, preferably @emph{before} starting to modify @command{gawk}. (The @cite{GNU Coding Standards} are available from the GNU Project's -@uref{http://www.gnu.org/prep/standards_toc.html, website}. +@uref{http://www.gnu.org/prep/standards/, website}. Texinfo, Info, and DVI versions are also available.) @cindex @command{gawk}, coding style in @@ -38255,9 +38125,6 @@ Although this sounds like a lot of work, please remember that while you may write the new code, I have to maintain it and support it. If it isn't possible for me to do that with a minimum of extra work, then I probably will not. 
-@c ENDOFRANGE adfgaw -@c ENDOFRANGE gawadf -@c ENDOFRANGE fadgaw @node New Ports @appendixsubsec Porting @command{gawk} to a New Operating System @@ -38391,7 +38258,6 @@ coding style and brace layout that suits your taste. @node Derived Files @appendixsubsec Why Generated Files Are Kept In Git -@c STARTOFRANGE gawkgit @cindex Git, use of for @command{gawk} source code @c From emails written March 22, 2012, to the gawk developers list. @@ -38580,7 +38446,6 @@ wget http://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-@var{branchname}.ta @noindent to retrieve a snapshot of the given branch. -@c ENDOFRANGE gawkgit @node Future Extensions @appendixsec Probable Future Extensions @@ -38961,13 +38826,10 @@ of @command{gawk}, but it @emph{will} be removed in the next major release. @end itemize -@c ENDOFRANGE impis -@c ENDOFRANGE gawii @node Basic Concepts @appendix Basic Programming Concepts @cindex programming, concepts -@c STARTOFRANGE procon @cindex programming, concepts This @value{APPENDIX} attempts to define some of the basic concepts @@ -39205,7 +39067,6 @@ standard for C. This standard became an ISO standard in 1990. In 1999, a revised ISO C standard was approved and released. Where it makes sense, POSIX @command{awk} is compatible with 1999 ISO C. -@c ENDOFRANGE procon @node Glossary @unnumbered Glossary @@ -39256,6 +39117,21 @@ languages. These standards often become international standards as well. See also ``ISO.'' +@item Argument +An argument can be two different things. It can be an option or a +@value{FN} passed to a command while invoking it from the command line, or +it can be something passed to a @dfn{function} inside a program, e.g. +inside @command{awk}. + +In the latter case, an argument can be passed to a function in two ways. 
+Either it is given to the called function by value, i.e., a copy of the +value of the variable is made available to the called function, but the +original variable cannot be modified by the function itself; or it is +given by reference, i.e., a pointer to the interested variable is passed to +the function, which can then directly modify it. In @command{awk} +scalars are passed by value, and arrays are passed by reference. +See ``Pass By Value/Reference.'' + @item Array A grouping of multiple values under the same name. Most languages just provide sequential arrays. @@ -39297,6 +39173,25 @@ The GNU version of the standard shell @end ifinfo See also ``Bourne Shell.'' +@item Binary +Base-two notation, where the digits are @code{0}--@code{1}. Since +electronic circuitry works ``naturally'' in base 2 (just think of Off/On), +everything inside a computer is calculated using base 2. Each digit +represents the presence (or absence) of a power of 2 and is called a +@dfn{bit}. So, for example, the base-two number @code{10101} is +the same as decimal 21, ((1 x 16) + (1 x 4) + (1 x 1)). + +Since base-two numbers quickly become +very long to read and write, they are usually grouped by 3 (i.e., they are +read as octal numbers), or by 4 (i.e., they are read as hexadecimal +numbers). There is no direct way to insert base 2 numbers in a C program. +If need arises, such numbers are usually inserted as octal or hexadecimal +numbers. The number of base-two digits that fit into registers used for +representing integer numbers in computers is a rough indication of the +computing power of the computer itself. Most computers nowadays use 64 +bits for representing integer numbers in their registers, but 32-bit, +16-bit and 8-bit registers have been widely used in the past. +@xref{Nondecimal-numbers}. @item Bit Short for ``Binary Digit.'' All values in computer memory ultimately reduce to binary digits: values @@ -39328,6 +39223,19 @@ The characters @samp{@{} and @samp{@}}. 
Braces are used in @command{awk} for delimiting actions, compound statements, and function bodies. +@item Bracket Expression +Inside a @dfn{regular expression}, an expression included in square +brackets, meant to designate a single character as belonging to a +specified character class. A bracket expression can contain a list of one +or more characters, like @samp{[abc]}, a range of characters, like +@samp{[A-Z]}, or a name, delimited by @samp{:}, that designates a known set +of characters, like @samp{[:digit:]}. The form of bracket expression +enclosed between @samp{:} is independent of the underlying representation +of the character themselves, which could utilize the ASCII, ECBDIC, or +Unicode codesets, depending on the architecture of the computer system, and on +localization. +See also ``Regular Expression.'' + @item Built-in Function The @command{awk} language provides built-in functions that perform various numerical, I/O-related, and string computations. Examples are @@ -39381,9 +39289,25 @@ points out similarities between @command{awk} and C when appropriate. In general, @command{gawk} attempts to be as similar to the 1990 version of ISO C as makes sense. +@item C Shell +The C Shell (@command{csh} or its improved version, @command{tcsh}) is a Unix shell that was +created by Bill Joy in the late 1970s. The C shell was differentiated from +other shells by its interactive features and overall style, which +looks more like C. The C Shell is not backward compatible with the Bourne +Shell, so special attention is required when converting scripts +written for other Unix shells to the C shell, especially with regard to the management of +shell variables. +See also ``Bourne Shell.'' + @item C++ A popular object-oriented programming language derived from C. 
+@item Character Class +See ``Bracket Expression.'' + +@item Character List +See ``Bracket Expression.'' + @cindex ASCII @cindex ISO 8859-1 @cindex ISO Latin-1 @@ -39407,7 +39331,7 @@ A preprocessor for @command{pic} that reads descriptions of molecules and produces @command{pic} input for drawing them. It was written in @command{awk} by Brian Kernighan and Jon Bentley, and is available from -@uref{http://netlib.sandia.gov/netlib/typesetting/chem.gz}. +@uref{http://netlib.org/typesetting/chem}. @item Comparison Expression A relation that is either true or false, such as @samp{a < b}. @@ -39423,11 +39347,23 @@ machine-executable object code. The object code is then executed directly by the computer. See also ``Interpreter.'' +@item Complemented Bracket Expression +The negation of a @dfn{bracket expression}. All that is @emph{not} +described by a given bracket expression. The symbol @samp{^} precedes +the negated bracket expression. E.g.: @samp{[[^:digit:]} +designates whatever character is not a digit. @samp{[^bad]} +designates whatever character is not one of the letters @samp{b}, @samp{a}, +or @samp{d}. +See ``Bracket Expression.'' + @item Compound Statement A series of @command{awk} statements, enclosed in curly braces. Compound statements may be nested. (@xref{Statements}.) +@item Computed Regexps +See ``Dynamic Regular Expressions.'' + @item Concatenation Concatenating two strings means sticking them together, one after another, producing a new string. For example, the string @samp{foo} concatenated with @@ -39442,6 +39378,13 @@ expression is the value of @var{expr2}; otherwise the value is @var{expr3}. In either case, only one of @var{expr2} and @var{expr3} is evaluated. (@xref{Conditional Exp}.) +@item Control Statement +A control statement is an instruction to perform a given operation or a set +of operations inside an @command{awk} program, if a given condition is +true. 
Control statements are: @code{if}, @code{for}, @code{while}, and +@code{do} +(@pxref{Statements}). + @cindex McIlroy, Doug @cindex cookie @item Cookie @@ -39596,6 +39539,11 @@ Format strings control the appearance of output in the are controlled by the format strings contained in the predefined variables @code{CONVFMT} and @code{OFMT}. (@xref{Control Letters}.) +@item Fortran +Shorthand for FORmula TRANslator, one of the first programming languages +available for scientific calculations. It was created by John Backus, +and has been available since 1957. It is still in use today. + @item Free Documentation License This document describes the terms under which this @value{DOCUMENT} is published and may be copied. (@xref{GNU Free Documentation License}.) @@ -39613,10 +39561,21 @@ Emacs editor. GNU Emacs is the most widely used version of Emacs today. See ``Free Software Foundation.'' @item Function -A specialized group of statements used to encapsulate general -or program-specific tasks. @command{awk} has a number of built-in -functions, and also allows you to define your own. -(@xref{Functions}.) +A part of an @command{awk} program that can be invoked from every point of +the program, to perform a task. @command{awk} has several built-in +functions. +Users can define their own functions in every part of the program. +Function can be recursive, i.e., they may invoke themselves. +@xref{Functions}. +In @command{gawk} it is also possible to have functions shared +among different programs, and included where required using the +@code{@@include} directive +(@pxref{Include Files}). +In @command{gawk} the name of the function that should be invoked +can be generated at run time, i.e., dynamically. +The @command{gawk} extension API provides constructor functions +(@pxref{Constructor Functions}). + @item @command{gawk} The GNU implementation of @command{awk}. @@ -39740,6 +39699,12 @@ meaning. Keywords are reserved and may not be used as variable names. and @code{while}. 
+@item Korn Shell +The Korn Shell (@command{ksh}) is a Unix shell which was developed by David Korn at Bell +Laboratories in the early 1980s. The Korn Shell is backward-compatible with the Bourne +shell and includes many features of the C shell. +See also ``Bourne Shell.'' + @cindex LGPL (Lesser General Public License) @cindex Lesser General Public License (LGPL) @cindex GNU Lesser General Public License @@ -39779,6 +39744,14 @@ Characters used within a regexp that do not stand for themselves. Instead, they denote regular expression operations, such as repetition, grouping, or alternation. +@item Nesting +Nesting is where information is organized in layers, or where objects +contain other similar objects. +In @command{gawk} the @code{@@include} +directive can be nested. The ``natural'' nesting of arithmetic and +logical operations can be changed using parentheses +(@pxref{Precedence}). + @item No-op An operation that does nothing. @@ -39799,6 +39772,11 @@ Octal numbers are written in C using a leading @samp{0}, to indicate their base. Thus, @code{013} is 11 ((1 x 8) + 3). @xref{Nondecimal-numbers}. +@item Output Record +A single chunk of data that is written out by @command{awk}. Usually, an +@command{awk} output record consists of one or more lines of text. +@xref{Records}. + @item Pattern Patterns tell @command{awk} which input records are interesting to which rules. @@ -39813,6 +39791,9 @@ An acronym describing what is possibly the most frequent source of computer usage problems. (Problem Exists Between Keyboard And Chair.) +@item Plug-in +See ``Extensions.'' + @item POSIX The name for a series of standards that specify a Portable Operating System interface. The ``IX'' denotes @@ -39837,6 +39818,9 @@ A sequence of consecutive lines from the input file(s). A pattern can specify ranges of input lines for @command{awk} to process or it can specify single lines. (@xref{Pattern Overview}.) 
+@item Record +See ``Input record'' and ``Output record.'' + @item Recursion When a function calls itself, either directly or indirectly. If this is clear, stop, and proceed to the next entry. @@ -39854,6 +39838,15 @@ operators. (@xref{Getline}, and @ref{Redirection}.) +@item Reference Counts +An internal mechanism in @command{gawk} to minimize the amount of memory +needed to store the value of string variables. If the value assumed by +a variable is used in more than one place, only one copy of the value +itself is kept, and the associated reference count is increased when the +same value is used by an additional variable, and decresed when the related +variable is no longer in use. When the reference count goes to zero, +the memory space used to store the value of the variable is freed. + @item Regexp See ``Regular Expression.'' @@ -39871,6 +39864,15 @@ slashes, such as @code{/foo/}. This regular expression is chosen when you write the @command{awk} program and cannot be changed during its execution. (@xref{Regexp Usage}.) +@item Regular Expression Operators +See ``Metacharacters.'' + +@item Rounding +Rounding the result of an arithmetic operation can be tricky. +More than one way of rounding exists, and in @command{gawk} +it is possible to choose which method should be used in a program. +@xref{Setting the rounding mode}. + @item Rule A segment of an @command{awk} program that specifies how to process single input records. A rule consists of a @dfn{pattern} and an @dfn{action}. @@ -39930,6 +39932,12 @@ A @value{FN} interpreted internally by @command{gawk}, instead of being handed directly to the underlying operating system---for example, @file{/dev/stderr}. (@xref{Special Files}.) +@item Statement +An expression inside an @command{awk} program in the action part +of a pattern--action rule, or inside an +@command{awk} function. A statement can be a variable assignment, +an array operation, a loop, etc. 
+ @item Stream Editor A program that reads records from an input stream and processes them one or more at a time. This is in contrast with batch programs, which may @@ -39980,9 +39988,14 @@ This is standard time in Greenwich, England, which is used as a reference time for day and date calculations. See also ``Epoch'' and ``GMT.'' +@item Variable +A name for a value. In @command{awk}, variables may be either scalars +or arrays. + @item Whitespace A sequence of space, TAB, or newline characters occurring inside an input record or a string. + @end table @end ifclear