author Arnold D. Robbins <arnold@skeeve.com> 2010-07-16 12:41:09 +0300
committer Arnold D. Robbins <arnold@skeeve.com> 2010-07-16 12:41:09 +0300
commit 8c042f99cc7465c86351d21331a129111b75345d
tree 9656e653be0e42e5469cec77635c20356de152c2 /doc/gawk.texi
parent 8ceb5f934787eb7be5fb452fb39179df66119954
Move to gawk-3.0.0.
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi 20460
1 files changed, 20460 insertions, 0 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
new file mode 100644
index 00000000..6227ac32
--- /dev/null
+++ b/doc/gawk.texi
@@ -0,0 +1,20460 @@
+\input texinfo @c -*-texinfo-*-
+@c %**start of header (This is for running Texinfo on a region.)
+@setfilename gawk.info
+@settitle AWK Language Programming
+@c %**end of header (This is for running Texinfo on a region.)
+
+@ignore
+@ifinfo
+@format
+START-INFO-DIR-ENTRY
+* Gawk: (gawk.info). A Text Scanning and Processing Language.
+END-INFO-DIR-ENTRY
+@end format
+@end ifinfo
+@end ignore
+
+@c @set xref-automatic-section-title
+@c @set DRAFT
+
+@c The following information should be updated here only!
+@c This sets the edition of the document, the version of gawk it
+@c applies to, and when the document was updated.
+@set TITLE AWK Language Programming
+@set EDITION 1.0
+@set VERSION 3.0
+@set UPDATE-MONTH January 1996
+@iftex
+@set DOCUMENT book
+@end iftex
+@ifinfo
+@set DOCUMENT Info file
+@end ifinfo
+
+@ignore
+Some comments on the layout for TeX.
+1. Use the texinfo.tex from the gawk distribution. It contains fixes that
+ are needed to get the footings for draft mode to not appear.
+2. I have done A LOT of work to make this look good. There are `@page' commands
+ and use of `@group ... @end group' in a number of places. If you muck
+ with anything, it's your responsibility not to break the layout.
+@end ignore
+
+@c merge the function and variable indexes into the concept index
+@ifinfo
+@synindex fn cp
+@synindex vr cp
+@end ifinfo
+@iftex
+@syncodeindex fn cp
+@syncodeindex vr cp
+@end iftex
+
+@c If "finalout" is commented out, the printed output will show
+@c black boxes that mark lines that are too long. Thus, it is
+@c unwise to comment it out when running a master in case there are
+@c overfulls which are deemed okay.
+
+@ifclear DRAFT
+@iftex
+@finalout
+@end iftex
+@end ifclear
+
+@smallbook
+@iftex
+@cropmarks
+@end iftex
+
+@ifinfo
+This file documents @code{awk}, a program that you can use to select
+particular records in a file and perform operations upon them.
+
+This is Edition @value{EDITION} of @cite{@value{TITLE}},
+for the @value{VERSION} version of the GNU implementation of AWK.
+
+Copyright (C) 1989, 1991 - 1996 Free Software Foundation, Inc.
+
+Permission is granted to make and distribute verbatim copies of
+this manual provided the copyright notice and this permission notice
+are preserved on all copies.
+
+@ignore
+Permission is granted to process this file through TeX and print the
+results, provided the printed document carries copying permission
+notice identical to this one except for the removal of this paragraph
+(this paragraph not being relevant to the printed manual).
+
+@end ignore
+Permission is granted to copy and distribute modified versions of this
+manual under the conditions for verbatim copying, provided that the entire
+resulting derived work is distributed under the terms of a permission
+notice identical to this one.
+
+Permission is granted to copy and distribute translations of this manual
+into another language, under the above conditions for modified versions,
+except that this permission notice may be stated in a translation approved
+by the Foundation.
+@end ifinfo
+
+@setchapternewpage odd
+
+@titlepage
+@title @value{TITLE}
+@subtitle A User's Guide for GNU AWK
+@subtitle Edition @value{EDITION}
+@subtitle @value{UPDATE-MONTH}
+@author Arnold D. Robbins
+@sp
+@author Based on @cite{The GAWK Manual},
+@author by Robbins, Close, Rubin, and Stallman
+
+@c Include the Distribution inside the titlepage environment so
+@c that headings are turned off. Headings on and off do not work.
+
+@page
+@vskip 0pt plus 1filll
+@ifset LEGALJUNK
+The programs and applications presented in this book have been
+included for their instructional value. They have been tested with care,
+but are not guaranteed for any particular purpose. The publisher does not
+offer any warranties or representations, nor does it accept any
+liabilities with respect to the programs or applications.
+So there.
+@sp 2
+UNIX is a registered trademark of X/Open, Ltd. @*
+Microsoft, MS, and MS-DOS are registered trademarks, and Windows is a
+trademark of Microsoft Corporation in the United States and other
+countries. @*
+Atari, 520ST, 1040ST, TT, STE, Mega, and Falcon are registered trademarks
+or trademarks of Atari Corporation. @*
+DEC, Digital, OpenVMS, ULTRIX, and VMS, are trademarks of Digital Equipment
+Corporation. @*
+@end ifset
+``To boldly go where no man has gone before'' is a
+Registered Trademark of Paramount Pictures Corporation. @*
+@c sorry, i couldn't resist
+@sp 3
+Copyright @copyright{} 1989, 1991 - 1996 Free Software Foundation, Inc.
+@sp 2
+
+This is Edition @value{EDITION} of @cite{@value{TITLE}}, @*
+for the @value{VERSION} (or later) version of the GNU implementation of AWK.
+
+@sp 2
+Published by the Free Software Foundation @*
+59 Temple Place --- Suite 330 @*
+Boston, MA 02111-1307 USA @*
+Phone: +1-617-542-5942 @*
+Fax (including Japan): +1-617-542-2652 @*
+Printed copies are available for $25 each. @*
+@c this ISBN can change! Check with the FSF office...
+@c This one is correct for gawk 3.0 and edition 1.0
+ISBN 1-882114-26-4 @*
+
+Permission is granted to make and distribute verbatim copies of
+this manual provided the copyright notice and this permission notice
+are preserved on all copies.
+
+Permission is granted to copy and distribute modified versions of this
+manual under the conditions for verbatim copying, provided that the entire
+resulting derived work is distributed under the terms of a permission
+notice identical to this one.
+
+Permission is granted to copy and distribute translations of this manual
+into another language, under the above conditions for modified versions,
+except that this permission notice may be stated in a translation approved
+by the Foundation.
+@sp 2
+Cover art by Etienne Suvasa.
+@end titlepage
+
+@c Thanks to Bob Chassell for directions on doing dedications.
+@iftex
+@headings off
+@page
+@w{ }
+@sp 9
+@center @i{To Miriam, for making me complete.}
+@sp
+@center @i{To Chana, for the joy you bring us.}
+@sp
+@center @i{To Rivka, for the exponential increase.}
+@page
+@w{ }
+@page
+@headings on
+@end iftex
+
+@iftex
+@headings off
+@evenheading @thispage@ @ @ @b{@thistitle} @| @|
+@oddheading @| @| @b{@thischapter}@ @ @ @thispage
+@ifset DRAFT
+@evenfooting @today{} @| @emph{DRAFT!} @| Please Do Not Redistribute
+@oddfooting Please Do Not Redistribute @| @emph{DRAFT!} @| @today{}
+@end ifset
+@end iftex
+
+@ifinfo
+@node Top, Preface, (dir), (dir)
+@top General Introduction
+@c Preface or Licensing nodes should come right after the Top
+@c node, in `unnumbered' sections, then the chapter, `What is gawk'.
+
+This file documents @code{awk}, a program that you can use to select
+particular records in a file and perform operations upon them.
+
+This is Edition @value{EDITION} of @cite{@value{TITLE}}, @*
+for the @value{VERSION} version of the GNU implementation @*
+of AWK.
+
+@end ifinfo
+
+@menu
+* Preface:: What this @value{DOCUMENT} is about; brief
+ history and acknowledgements.
+* What Is Awk:: What is the @code{awk} language; using this
+ @value{DOCUMENT}.
+* Getting Started:: A basic introduction to using @code{awk}. How
+ to run an @code{awk} program. Command line
+ syntax.
+* One-liners:: Short, sample @code{awk} programs.
+* Regexp:: All about matching things using regular
+ expressions.
+* Reading Files:: How to read files and manipulate fields.
+* Printing:: How to print using @code{awk}. Describes the
+ @code{print} and @code{printf} statements.
+ Also describes redirection of output.
+* Expressions:: Expressions are the basic building blocks of
+ statements.
+* Patterns and Actions:: Overviews of patterns and actions.
+* Statements:: The various control statements are described
+ in detail.
+* Built-in Variables:: Built-in Variables
+* Arrays:: The description and use of arrays. Also
+ includes array-oriented control statements.
+* Built-in:: The built-in functions are summarized here.
+* User-defined:: User-defined functions are described in
+ detail.
+* Invoking Gawk:: How to run @code{gawk}.
+* Library Functions:: A Library of @code{awk} Functions.
+* Sample Programs:: Many @code{awk} programs with complete
+ explanations.
+* Language History:: The evolution of the @code{awk} language.
+* Gawk Summary:: @code{gawk} Options and Language Summary.
+* Installation:: Installing @code{gawk} under various operating
+ systems.
+* Notes:: Something about the implementation of
+ @code{gawk}.
+* Glossary:: An explanation of some unfamiliar terms.
+* Copying:: Your right to copy and distribute @code{gawk}.
+* Index:: Concept and Variable Index.
+
+* History:: The history of @code{gawk} and @code{awk}.
+* Manual History:: Brief history of the GNU project and this
+ @value{DOCUMENT}.
+* Acknowledgements:: Acknowledgements.
+* This Manual:: Using this @value{DOCUMENT}. Includes sample
+ input files that you can use.
+* Conventions:: Typographical Conventions.
+* Sample Data Files:: Sample data files for use in the @code{awk}
+ programs illustrated in this @value{DOCUMENT}.
+* Names:: What name to use to find @code{awk}.
+* Running gawk:: How to run @code{gawk} programs; includes
+ command line syntax.
+* One-shot:: Running a short throw-away @code{awk} program.
+* Read Terminal:: Using no input files (input from terminal
+ instead).
+* Long:: Putting permanent @code{awk} programs in
+ files.
+* Executable Scripts:: Making self-contained @code{awk} programs.
+* Comments:: Adding documentation to @code{gawk} programs.
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one-line example with two rules.
+* More Complex:: A more complex example.
+* Statements/Lines:: Subdividing or combining statements into
+ lines.
+* Other Features:: Other Features of @code{awk}.
+* When:: When to use @code{gawk} and when to use other
+ things.
+* Regexp Usage:: How to Use Regular Expressions.
+* Escape Sequences:: How to write non-printing characters.
+* Regexp Operators:: Regular Expression Operators.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
+* Leftmost Longest:: How much text matches.
+* Computed Regexps:: Using Dynamic Regexps.
+* Records:: Controlling how data is split into records.
+* Fields:: An introduction to fields.
+* Non-Constant Fields:: Non-constant Field Numbers.
+* Changing Fields:: Changing the Contents of a Field.
+* Field Separators:: The field separator and how to change it.
+* Basic Field Splitting:: How fields are split with single characters or
+ simple strings.
+* Regexp Field Splitting:: Using regexps as the field separator.
+* Single Character Fields:: Making each character a separate field.
+* Command Line Field Separator:: Setting @code{FS} from the command line.
+* Field Splitting Summary:: Some final points and a summary table.
+* Constant Size:: Reading constant width data.
+* Multiple Line:: Reading multi-line records.
+* Getline:: Reading files under explicit program control
+ using the @code{getline} function.
+* Getline Intro:: Introduction to the @code{getline} function.
+* Plain Getline:: Using @code{getline} with no arguments.
+* Getline/Variable:: Using @code{getline} into a variable.
+* Getline/File:: Using @code{getline} from a file.
+* Getline/Variable/File:: Using @code{getline} into a variable from a
+ file.
+* Getline/Pipe:: Using @code{getline} from a pipe.
+* Getline/Variable/Pipe:: Using @code{getline} into a variable from a
+ pipe.
+* Getline Summary:: Summary Of @code{getline} Variants.
+* Print:: The @code{print} statement.
+* Print Examples:: Simple examples of @code{print} statements.
+* Output Separators:: The output separators and how to change them.
+* OFMT:: Controlling Numeric Output With @code{print}.
+* Printf:: The @code{printf} statement.
+* Basic Printf:: Syntax of the @code{printf} statement.
+* Control Letters:: Format-control letters.
+* Format Modifiers:: Format-specification modifiers.
+* Printf Examples:: Several examples.
+* Redirection:: How to redirect output to multiple files and
+ pipes.
+* Special Files:: File name interpretation in @code{gawk}.
+ @code{gawk} allows access to inherited file
+ descriptors.
+* Close Files And Pipes:: Closing Input and Output Files and Pipes.
+* Constants:: String, numeric, and regexp constants.
+* Scalar Constants:: Numeric and string constants.
+* Regexp Constants:: Regular Expression constants.
+* Using Constant Regexps:: When and how to use a regexp constant.
+* Variables:: Variables give names to values for later use.
+* Using Variables:: Using variables in your programs.
+* Assignment Options:: Setting variables on the command line and a
+ summary of command line syntax. This is an
+ advanced method of input.
+* Conversion:: The conversion of strings to numbers and vice
+ versa.
+* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-},
+ etc.)
+* Concatenation:: Concatenating strings.
+* Assignment Ops:: Changing the value of a variable or a field.
+* Increment Ops:: Incrementing the numeric value of a variable.
+* Truth Values:: What is ``true'' and what is ``false''.
+* Typing and Comparison:: How variables acquire types, and how this
+ affects comparison of numbers and strings with
+ @samp{<}, etc.
+* Boolean Ops:: Combining comparison expressions using boolean
+ operators @samp{||} (``or''), @samp{&&}
+ (``and'') and @samp{!} (``not'').
+* Conditional Exp:: Conditional expressions select between two
+ subexpressions under control of a third
+ subexpression.
+* Function Calls:: A function call is an expression.
+* Precedence:: How various operators nest.
+* Pattern Overview:: What goes into a pattern.
+* Kinds of Patterns:: A list of all kinds of patterns.
+* Regexp Patterns:: Using regexps as patterns.
+* Expression Patterns:: Any expression can be used as a pattern.
+* Ranges:: Pairs of patterns specify record ranges.
+* BEGIN/END:: Specifying initialization and cleanup rules.
+* Using BEGIN/END:: How and why to use BEGIN/END rules.
+* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
+* Empty:: The empty pattern, which matches every record.
+* Action Overview:: What goes into an action.
+* If Statement:: Conditionally execute some @code{awk}
+ statements.
+* While Statement:: Loop until some condition is satisfied.
+* Do Statement:: Do specified action while looping until some
+ condition is satisfied.
+* For Statement:: Another looping statement, that provides
+ initialization and increment clauses.
+* Break Statement:: Immediately exit the innermost enclosing loop.
+* Continue Statement:: Skip to the end of the innermost enclosing
+ loop.
+* Next Statement:: Stop processing the current input record.
+* Nextfile Statement:: Stop processing the current file.
+* Exit Statement:: Stop execution of @code{awk}.
+* User-modified:: Built-in variables that you change to control
+ @code{awk}.
+* Auto-set:: Built-in variables where @code{awk} gives you
+ information.
+* ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}.
+* Array Intro:: Introduction to Arrays
+* Reference to Elements:: How to examine one element of an array.
+* Assigning Elements:: How to change an element of an array.
+* Array Example:: Basic Example of an Array
+* Scanning an Array:: A variation of the @code{for} statement. It
+ loops through the indices of an array's
+ existing elements.
+* Delete:: The @code{delete} statement removes an element
+ from an array.
+* Numeric Array Subscripts:: How to use numbers as subscripts in
+ @code{awk}.
+* Uninitialized Subscripts:: Using Uninitialized variables as subscripts.
+* Multi-dimensional:: Emulating multi-dimensional arrays in
+ @code{awk}.
+* Multi-scanning:: Scanning multi-dimensional arrays.
+* Calling Built-in:: How to call built-in functions.
+* Numeric Functions:: Functions that work with numbers, including
+ @code{int}, @code{sin} and @code{rand}.
+* String Functions:: Functions for string manipulation, such as
+ @code{split}, @code{match}, and
+ @code{sprintf}.
+* I/O Functions:: Functions for files and shell commands.
+* Time Functions:: Functions for dealing with time stamps.
+* Definition Syntax:: How to write definitions and what they mean.
+* Function Example:: An example function definition and what it
+ does.
+* Function Caveats:: Things to watch out for.
+* Return Statement:: Specifying the value a function returns.
+* Options:: Command line options and their meanings.
+* Other Arguments:: Input file names and variable assignments.
+* AWKPATH Variable:: Searching directories for @code{awk} programs.
+* Obsolete:: Obsolete Options and/or features.
+* Undocumented:: Undocumented Options and Features.
+* Known Bugs:: Known Bugs in @code{gawk}.
+* Portability Notes:: What to do if you don't have @code{gawk}.
+* Nextfile Function:: Two implementations of a @code{nextfile}
+ function.
+* Assert Function:: A function for assertions in @code{awk}
+ programs.
+* Ordinal Functions:: Functions for using characters as numbers and
+ vice versa.
+* Join Function:: A function to join an array into a string.
+* Mktime Function:: A function to turn a date into a timestamp.
+* Gettimeofday Function:: A function to get formatted times.
+* Filetrans Function:: A function for handling data file transitions.
+* Getopt Function:: A function for processing command line
+ arguments.
+* Passwd Functions:: Functions for getting user information.
+* Group Functions:: Functions for getting group information.
+* Library Names:: How to best name private global variables in
+ library functions.
+* Clones:: Clones of common utilities.
+* Cut Program:: The @code{cut} utility.
+* Egrep Program:: The @code{egrep} utility.
+* Id Program:: The @code{id} utility.
+* Split Program:: The @code{split} utility.
+* Tee Program:: The @code{tee} utility.
+* Uniq Program:: The @code{uniq} utility.
+* Wc Program:: The @code{wc} utility.
+* Miscellaneous Programs:: Some interesting @code{awk} programs.
+* Dupword Program:: Finding duplicated words in a document.
+* Alarm Program:: An alarm clock.
+* Translate Program:: A program similar to the @code{tr} utility.
+* Labels Program:: Printing mailing labels.
+* Word Sorting:: A program to produce a word usage count.
+* History Sorting:: Eliminating duplicate entries from a history
+ file.
+* Extract Program:: Pulling out programs from Texinfo source
+ files.
+* Simple Sed:: A Simple Stream Editor.
+* Igawk Program:: A wrapper for @code{awk} that includes files.
+* V7/SVR3.1:: The major changes between V7 and System V
+ Release 3.1.
+* SVR4:: Minor changes between System V Releases 3.1
+ and 4.
+* POSIX:: New features from the POSIX standard.
+* BTL:: New features from the AT&T Bell Laboratories
+ version of @code{awk}.
+* POSIX/GNU:: The extensions in @code{gawk} not in POSIX
+ @code{awk}.
+* Command Line Summary:: Recapitulation of the command line.
+* Language Summary:: A terse review of the language.
+* Variables/Fields:: Variables, fields, and arrays.
+* Fields Summary:: Input field splitting.
+* Built-in Summary:: @code{awk}'s built-in variables.
+* Arrays Summary:: Using arrays.
+* Data Type Summary:: Values in @code{awk} are numbers or strings.
+* Rules Summary:: Patterns and Actions, and their component
+ parts.
+* Pattern Summary:: Quick overview of patterns.
+* Regexp Summary:: Quick overview of regular expressions.
+* Actions Summary:: Quick overview of actions.
+* Operator Summary:: @code{awk} operators.
+* Control Flow Summary:: The control statements.
+* I/O Summary:: The I/O statements.
+* Printf Summary:: A summary of @code{printf}.
+* Special File Summary:: Special file names interpreted internally.
+* Built-in Functions Summary:: Built-in numeric and string functions.
+* Time Functions Summary:: Built-in time functions.
+* String Constants Summary:: Escape sequences in strings.
+* Functions Summary:: Defining and calling functions.
+* Historical Features:: Some undocumented but supported ``features''.
+* Gawk Distribution:: What is in the @code{gawk} distribution.
+* Getting:: How to get the distribution.
+* Extracting:: How to extract the distribution.
+* Distribution contents:: What is in the distribution.
+* Unix Installation:: Installing @code{gawk} under various versions
+ of Unix.
+* Quick Installation:: Compiling @code{gawk} under Unix.
+* Configuration Philosophy:: How it's all supposed to work.
+* VMS Installation:: Installing @code{gawk} on VMS.
+* VMS Compilation:: How to compile @code{gawk} under VMS.
+* VMS Installation Details:: How to install @code{gawk} under VMS.
+* VMS Running:: How to run @code{gawk} under VMS.
+* VMS POSIX:: Alternate instructions for VMS POSIX.
+* PC Installation:: Installing and Compiling @code{gawk} on MS-DOS
+ and OS/2
+* Atari Installation:: Installing @code{gawk} on the Atari ST.
+* Atari Compiling:: Compiling @code{gawk} on Atari
+* Atari Using:: Running @code{gawk} on Atari
+* Amiga Installation:: Installing @code{gawk} on an Amiga.
+* Bugs:: Reporting Problems and Bugs.
+* Other Versions:: Other freely available @code{awk}
+ implementations.
+* Compatibility Mode:: How to disable certain @code{gawk} extensions.
+* Additions:: Making Additions To @code{gawk}.
+* Adding Code:: Adding code to the main body of @code{gawk}.
+* New Ports:: Porting @code{gawk} to a new operating system.
+* Future Extensions:: New features that may be implemented one day.
+* Improvements:: Suggestions for improvements by volunteers.
+
+@end menu
+
+@c dedication for Info file
+@ifinfo
+@center To Miriam, for making me complete.
+@sp 1
+@center To Chana, for the joy you bring us.
+@sp 1
+@center To Rivka, for the exponential increase.
+@end ifinfo
+
+@node Preface, What Is Awk, Top, Top
+@unnumbered Preface
+
+@c I saw a comment somewhere that the preface should describe the book itself,
+@c and the introduction should describe what the book covers.
+
+This @value{DOCUMENT} teaches you about the @code{awk} language and
+how you can use it effectively. You should already be familiar with basic
+system commands, such as @code{cat} and @code{ls},@footnote{These commands
+are available on POSIX-compliant systems, as well as on traditional Unix-based
+systems. If you are using some other operating system, you still need to
+be familiar with the ideas of I/O redirection and pipes.} and basic shell
+facilities, such as Input/Output (I/O) redirection and pipes.
+
+Implementations of the @code{awk} language are available for many different
+computing environments. This @value{DOCUMENT}, while describing the @code{awk} language
+in general, also describes a particular implementation of @code{awk} called
+@code{gawk} (which stands for ``GNU Awk''). @code{gawk} runs on a broad range
+of Unix systems, from 80386 PC-based computers up through large-scale
+systems, such as Crays. @code{gawk} has also been ported to MS-DOS and
+OS/2 PCs, Atari and Amiga micro-computers, and VMS.
+
+@menu
+* History:: The history of @code{gawk} and @code{awk}.
+* Manual History:: Brief history of the GNU project and this
+ @value{DOCUMENT}.
+* Acknowledgements:: Acknowledgements.
+@end menu
+
+@node History, Manual History, Preface, Preface
+@unnumberedsec History of @code{awk} and @code{gawk}
+
+@cindex acronym
+@cindex history of @code{awk}
+@cindex Aho, Alfred
+@cindex Weinberger, Peter
+@cindex Kernighan, Brian
+@cindex old @code{awk}
+@cindex new @code{awk}
+The name @code{awk} comes from the initials of its designers: Alfred V.@:
+Aho, Peter J.@: Weinberger, and Brian W.@: Kernighan. The original version of
+@code{awk} was written in 1977 at AT&T Bell Laboratories.
+In 1985 a new version made the programming
+language more powerful, introducing user-defined functions, multiple input
+streams, and computed regular expressions.
+This new version became generally available with Unix System V Release 3.1.
+The version in System V Release 4 added some new features and also cleaned
+up the behavior in some of the ``dark corners'' of the language.
+The specification for @code{awk} in the POSIX Command Language
+and Utilities standard further clarified the language based on feedback
+from both the @code{gawk} designers and the original Bell Labs @code{awk}
+designers.
+
+The GNU implementation, @code{gawk}, was written in 1986 by Paul Rubin
+and Jay Fenlason, with advice from Richard Stallman. John Woods
+contributed parts of the code as well. In 1988 and 1989, David Trueman, with
+help from Arnold Robbins, thoroughly reworked @code{gawk} for compatibility
+with the newer @code{awk}. Current development focuses on bug fixes,
+performance improvements, standards compliance, and, occasionally, new features.
+
+@node Manual History, Acknowledgements, History, Preface
+@unnumberedsec The GNU Project and This Book
+
+@cindex Free Software Foundation
+The Free Software Foundation (FSF) is a non-profit organization dedicated
+to the production and distribution of freely distributable software.
+It was founded by Richard M.@: Stallman, the author of the original
+Emacs editor. GNU Emacs is the most widely used version of Emacs today.
+
+@cindex GNU Project
+The GNU project is an on-going effort on the part of the Free Software
+Foundation to create a complete, freely distributable, POSIX compliant
+computing environment. (GNU stands for ``GNU's not Unix''.)
+The FSF uses the ``GNU General Public License'' (or GPL) to ensure that
+source code for their software is always available to the end user. A
+copy of the GPL is included for your reference
+(@pxref{Copying, ,GNU GENERAL PUBLIC LICENSE}).
+The GPL applies to the C language source code for @code{gawk}.
+
+As of this writing (1995), the only major component of the
+GNU environment still uncompleted is the operating system kernel, and
+work proceeds apace on that. A shell, an editor (Emacs), highly portable
+optimizing C, C++, and Objective-C compilers, a symbolic debugger, and dozens
+of large and small utilities (such as @code{gawk}),
+have all been completed and are freely available.
+
+@cindex Linux
+@cindex NetBSD
+@cindex FreeBSD
+Until the GNU operating system is released, the FSF recommends the use
+of Linux, a freely distributable, Unix-like operating system for 80386
+and other systems. There are many books on Linux. One that is freely available
+is @cite{Linux Installation and Getting Started}, by Matt Welsh.
+Many Linux distributions are available, often in computer stores or
+bundled on CD-ROM with books about Linux. Also, the FSF provides a Linux
+distribution (``Debian''); contact them for more information.
+@xref{Getting, ,Getting the @code{gawk} Distribution}, for the FSF's contact
+information.
+(There are two other freely available, Unix-like operating systems for
+80386 and other systems, NetBSD and FreeBSD. Both are based on the
+4.4-Lite Berkeley Software Distribution, and both use recent versions
+of @code{gawk} for their versions of @code{awk}.)
+
+@iftex
+This @value{DOCUMENT} you are reading now is actually free. The
+information in it is freely available to anyone, the machine readable
+source code for the @value{DOCUMENT} comes with @code{gawk}, and anyone
+may take this @value{DOCUMENT} to a copying machine and make as many
+copies of it as they like. (Take a moment to check the copying
+permissions on the Copyright page.)
+
+If you paid money for this @value{DOCUMENT}, what you actually paid for
+was the @value{DOCUMENT}'s nice printing and binding, and the
+publisher's associated costs to produce it. We have made an effort to
+keep these costs reasonable; most people would prefer a bound book to
+over 300 pages of photo-copied text that would then have to be held in
+a loose-leaf binder (not to mention the time and labor involved in
+doing the copying). The same is true of producing this
+@value{DOCUMENT} from the machine readable source; the retail price is
+only slightly more than the cost of printing it
+on a laser printer.
+@end iftex
+
+This @value{DOCUMENT} itself has gone through several previous,
+preliminary editions. I started working on a preliminary draft of
+@cite{The GAWK Manual}, by Diane Close, Paul Rubin, and Richard
+Stallman, in the fall of 1988.
+It was around 90 pages long, and barely described the original, ``old''
+version of @code{awk}. After substantial revision, the first version of
+@cite{The GAWK Manual} to be released was Edition 0.11 Beta in
+October of 1989. The manual then underwent more substantial revision
+for Edition 0.13 of December 1991.
+David Trueman, Pat Rankin, and Michal Jaegermann contributed sections
+of the manual for Edition 0.13.
+That edition was published by the
+FSF as a bound book early in 1992. Since then there have been several
+minor revisions, notably Edition 0.14 of November 1992 that was published
+by the FSF in January of 1993, and Edition 0.16 of August 1993.
+
+Edition 1.0 of @cite{@value{TITLE}} represents a significant re-working
+of @cite{The GAWK Manual}, with much additional material.
+The FSF and I agree that I am now the primary author.
+I also felt that it needed a more descriptive title.
+
+@cite{@value{TITLE}} will undoubtedly continue to evolve.
+An electronic version
+comes with the @code{gawk} distribution from the FSF.
+If you find an error in this @value{DOCUMENT}, please report it!
+@xref{Bugs, ,Reporting Problems and Bugs}, for information on submitting
+problem reports electronically, or write to me in care of the FSF.
+
+@node Acknowledgements, , Manual History, Preface
+@unnumberedsec Acknowledgements
+
+I would like to acknowledge Richard M.@: Stallman, for his vision of a
+better world, and for his courage in founding the FSF and starting the
+GNU project.
+
+The initial draft of @cite{The GAWK Manual} had the following acknowledgements:
+
+@quotation
+Many people need to be thanked for their assistance in producing this
+manual. Jay Fenlason contributed many ideas and sample programs. Richard
+Mlynarik and Robert Chassell gave helpful comments on drafts of this
+manual. The paper @cite{A Supplemental Document for @code{awk}} by John W.@:
+Pierce of the Chemistry Department at UC San Diego, pinpointed several
+issues relevant both to @code{awk} implementation and to this manual, that
+would otherwise have escaped us.
+@end quotation
+
+The following people provided many helpful comments on Edition 0.13 of
+@cite{The GAWK Manual}: Rick Adams, Michael Brennan, Rich Burridge, Diane Close,
+Christopher (``Topher'') Eliot, Michael Lijewski, Pat Rankin, Miriam Robbins,
+and Michal Jaegermann.
+
+The following people provided many helpful comments for Edition 1.0 of
+@cite{@value{TITLE}}: Karl Berry, Michael Brennan, Darrel
+Hankerson, Michal Jaegermann, Michael Lijewski, and Miriam Robbins.
+Pat Rankin, Michal Jaegermann, Darrel Hankerson and Scott Deifik
+updated their respective sections for Edition 1.0.
+
+Robert J.@: Chassell provided much valuable advice on
+the use of Texinfo. He also deserves special thanks for
+convincing me @emph{not} to title this @value{DOCUMENT}
+@cite{How To Gawk Politely}.
+Karl Berry helped significantly with the @TeX{} part of Texinfo.
+
+@cindex Trueman, David
+David Trueman deserves special credit; he has done a yeoman job
+of evolving @code{gawk} so that it performs well and is free of bugs.
+Although he is no longer involved with @code{gawk},
+working with him on this project was a significant pleasure.
+
+@cindex Deifik, Scott
+@cindex Hankerson, Darrel
+@cindex Rommel, Kai Uwe
+@cindex Rankin, Pat
+@cindex Jaegermann, Michal
+Scott Deifik, Darrel Hankerson, Kai Uwe Rommel, Pat Rankin, and Michal
+Jaegermann (in no particular order) are long-time members of the
+@code{gawk} ``crack portability team.'' Without their hard work and
+help, @code{gawk} would not be nearly the fine program it is today. It
+has been and continues to be a pleasure working with this team of fine
+people.
+
+@cindex Friedl, Jeffrey
+Jeffrey Friedl provided invaluable help in tracking down a number
+of last-minute problems with regular expressions in @code{gawk} 3.0.
+
+@cindex Kernighan, Brian
+David and I would like to thank Brian Kernighan of Bell Labs for
+invaluable assistance during the testing and debugging of @code{gawk}, and for
+help in clarifying numerous points about the language. We could not have
+done nearly as good a job on either @code{gawk} or its documentation without
+his help.
+
+@cindex Hughes, Phil
+I would like to thank Marshall and Elaine Hartholz of Seattle, and Dr.@:
+Bert and Rita Schreiber of Detroit for large amounts of quiet vacation
+time in their homes, which allowed me to make significant progress on
+this @value{DOCUMENT} and on @code{gawk} itself. Phil Hughes of SSC
+contributed in a very important way by loaning me his laptop Linux
+system, not once, but twice, allowing me to do a lot of work while
+away from home.
+
+@cindex Robbins, Miriam
+Finally, I must thank my wonderful wife, Miriam, for her patience through
+the many versions of this project, for her proof-reading,
+and for sharing me with the computer.
+I would like to thank my parents for their love, and for the grace with
+which they raised and educated me.
+I also must acknowledge my gratitude to G-d, for the many opportunities
+He has sent my way, as well as for the gifts He has given me with which to
+take advantage of those opportunities.
+@sp 2
+@noindent
+Arnold Robbins @*
+Atlanta, Georgia @*
+January, 1996
+
+@ignore
+Stuff still not covered anywhere:
+BASICS:
+ Integer vs. floating point
+ Hex vs. octal vs. decimal
+ Interpreter vs compiler
+ input/output
+@end ignore
+
+@node What Is Awk, Getting Started, Preface, Top
+@chapter Introduction
+
+If you are like many computer users, you would frequently like to make
+changes in various text files wherever certain patterns appear, or
+extract data from parts of certain lines while discarding the rest. To
+write a program to do this in a language such as C or Pascal is a
+time-consuming inconvenience that may take many lines of code. The job
+may be easier with @code{awk}.
+
+The @code{awk} utility interprets a special-purpose programming language
+that makes it possible to handle simple data-reformatting jobs
+with just a few lines of code.
+
+The GNU implementation of @code{awk} is called @code{gawk}; it is fully
+upward compatible with the System V Release 4 version of
+@code{awk}. @code{gawk} is also upward compatible with the POSIX
+specification of the @code{awk} language. This means that all
+properly written @code{awk} programs should work with @code{gawk}.
+Thus, we usually don't distinguish between @code{gawk} and other @code{awk}
+implementations.
+
+@cindex uses of @code{awk}
+Using @code{awk} you can:
+
+@itemize @bullet
+@item
+manage small, personal databases
+
+@item
+generate reports
+
+@item
+validate data
+
+@item
+produce indexes, and perform other document preparation tasks
+
+@item
+even experiment with algorithms that can be adapted later to other computer
+languages
+@end itemize
+
+@menu
+* This Manual:: Using this @value{DOCUMENT}. Includes sample
+ input files that you can use.
+* Conventions:: Typographical Conventions.
+* Sample Data Files:: Sample data files for use in the @code{awk}
+ programs illustrated in this @value{DOCUMENT}.
+@end menu
+
+@node This Manual, Conventions, What Is Awk, What Is Awk
+@section Using This Book
+@cindex book, using this
+@cindex using this book
+@cindex language, @code{awk}
+@cindex program, @code{awk}
+@ignore
+@cindex @code{awk} language
+@cindex @code{awk} program
+@end ignore
+
+The term @code{awk} refers to a particular program, and to the language you
+use to tell this program what to do. When we need to be careful, we call
+the program ``the @code{awk} utility'' and the language ``the @code{awk}
+language.'' The term @code{gawk} refers to a version of @code{awk} developed
+as part of the GNU project. The purpose of this @value{DOCUMENT} is to explain
+both the @code{awk} language and how to run the @code{awk} utility.
+
+The main purpose of the @value{DOCUMENT} is to explain the features
+of @code{awk}, as defined in the POSIX standard. It does so in the context
+of one particular implementation, @code{gawk}. While doing so, it will also
+attempt to describe important differences between @code{gawk} and other
+@code{awk} implementations. Finally, any @code{gawk} features that
+are not in the POSIX standard for @code{awk} will be noted.
+
+@iftex
+This @value{DOCUMENT} has the difficult task of being both tutorial and reference.
+If you are a novice, feel free to skip over details that seem too complex.
+You should also ignore the many cross references; they are for the
+expert user, and for the on-line Info version of the document.
+@end iftex
+
+The term @dfn{@code{awk} program} refers to a program written by you in
+the @code{awk} programming language.
+
+@xref{Getting Started, ,Getting Started with @code{awk}}, for the bare
+essentials you need to know to start using @code{awk}.
+
+Some useful ``one-liners'' are included to give you a feel for the
+@code{awk} language (@pxref{One-liners, ,Useful One Line Programs}).
+
+Many sample @code{awk} programs have been provided for you
+(@pxref{Library Functions, ,A Library of @code{awk} Functions}; also
+@pxref{Sample Programs, ,Practical @code{awk} Programs}).
+
+The entire @code{awk} language is summarized for quick reference in
+@ref{Gawk Summary, ,@code{gawk} Summary}. Look there if you just need
+to refresh your memory about a particular feature.
+
+If you find terms that you aren't familiar with, try looking them
+up in the glossary (@pxref{Glossary}).
+
+Most of the time, complete @code{awk} programs are used as examples, but in
+some of the more advanced sections, only the part of the @code{awk} program
+that illustrates the concept being described is shown.
+
+While this @value{DOCUMENT} is aimed principally at people who have not been
+exposed
+to @code{awk}, there is a lot of information here that even the @code{awk}
+expert should find useful. In particular, the description of POSIX
+@code{awk}, and the example programs in
+@ref{Library Functions, ,A Library of @code{awk} Functions}, and
+@ref{Sample Programs, ,Practical @code{awk} Programs},
+should be of interest.
+
+@c fakenode --- for prepinfo
+@unnumberedsubsec Dark Corners
+
+@cindex d.c., see ``dark corner''
+@cindex dark corner
+Until the POSIX standard (and @cite{The GAWK Manual}),
+many features of @code{awk} were either poorly documented, or not
+documented at all. Descriptions of such features
+(often called ``dark corners'') are noted in this @value{DOCUMENT} with
+``(d.c.)''.
+They also appear in the index under the heading ``dark corner.''
+
+@node Conventions, Sample Data Files, This Manual, What Is Awk
+@section Typographical Conventions
+
+This @value{DOCUMENT} is written using Texinfo, the GNU documentation formatting language.
+A single Texinfo source file is used to produce both the printed and on-line
+versions of the documentation.
+@iftex
+Because of this, the typographical conventions
+are slightly different from those in other books you may have read.
+@end iftex
+@ifinfo
+This section briefly documents the typographical conventions used in Texinfo.
+@end ifinfo
+
+Examples you would type at the command line are preceded by the common
+shell primary and secondary prompts, @samp{$} and @samp{>}.
+Output from the command is preceded by the glyph ``@print{}''.
+This typically represents the command's standard output.
+Error messages, and other output on the command's standard error, are preceded
+by the glyph ``@error{}''. For example:
+
+@example
+$ echo hi on stdout
+@print{} hi on stdout
+$ echo hello on stderr 1>&2
+@error{} hello on stderr
+@end example
+
+@iftex
+In the text, command names appear in @code{this font}, while code segments
+appear in the same font and quoted, @samp{like this}. Some things will
+be emphasized @emph{like this}, and if a point needs to be made
+strongly, it will be done @strong{like this}. The first occurrence of
+a new term is usually its @dfn{definition}, and appears in the same
+font as the previous occurrence of ``definition'' in this sentence.
+File names are indicated like this: @file{/path/to/ourfile}.
+@end iftex
+
+Characters that you type at the keyboard look @kbd{like this}. In particular,
+there are special characters called ``control characters.'' These are
+characters that you type by holding down both the @kbd{CONTROL} key and
+another key, at the same time. For example, a @kbd{Control-d} is typed
+by first pressing and holding the @kbd{CONTROL} key, next
+pressing the @kbd{d} key, and finally releasing both keys.
+
+@node Sample Data Files, , Conventions, What Is Awk
+@section Data Files for the Examples
+
+@cindex input file, sample
+@cindex sample input file
+@cindex @file{BBS-list} file
+Many of the examples in this @value{DOCUMENT} take their input from two sample
+data files. The first, called @file{BBS-list}, represents a list of
+computer bulletin board systems together with information about those systems.
+The second data file, called @file{inventory-shipped}, contains
+information about shipments on a monthly basis. In both files,
+each line is considered to be one @dfn{record}.
+
+In the file @file{BBS-list}, each record contains the name of a computer
+bulletin board, its phone number, the board's baud rate(s), and a code for
+the number of hours it is operational. An @samp{A} in the last column
+means the board operates 24 hours a day. A @samp{B} in the last
+column means the board operates only during evening and weekend hours. A
+@samp{C} means the board operates only on weekends.
+
+@c 2e: Update the baud rates to reflect today's faster modems
+@example
+@c system mkdir eg
+@c system mkdir eg/lib
+@c system mkdir eg/data
+@c system mkdir eg/prog
+@c system mkdir eg/misc
+@c file eg/data/BBS-list
+aardvark 555-5553 1200/300 B
+alpo-net 555-3412 2400/1200/300 A
+barfly 555-7685 1200/300 A
+bites 555-1675 2400/1200/300 A
+camelot 555-0542 300 C
+core 555-2912 1200/300 C
+fooey 555-1234 2400/1200/300 B
+foot 555-6699 1200/300 B
+macfoo 555-6480 1200/300 A
+sdace 555-3430 2400/1200/300 A
+sabafoo 555-2127 1200/300 C
+@c endfile
+@end example
+
+@cindex @file{inventory-shipped} file
+The second data file, called @file{inventory-shipped}, represents
+information about shipments during the year.
+Each record contains the month of the year, the number
+of green crates shipped, the number of red boxes shipped, the number of
+orange bags shipped, and the number of blue packages shipped,
+respectively. There are 16 entries, covering the 12 months of one year
+and four months of the next year.
+
+@example
+@c file eg/data/inventory-shipped
+Jan 13 25 15 115
+Feb 15 32 24 226
+Mar 15 24 34 228
+Apr 31 52 63 420
+May 16 34 29 208
+Jun 31 42 75 492
+Jul 24 34 67 436
+Aug 15 34 47 316
+Sep 13 55 37 277
+Oct 29 54 68 525
+Nov 20 87 82 577
+Dec 17 35 61 401
+
+Jan 21 36 64 620
+Feb 26 58 80 652
+Mar 24 75 70 495
+Apr 21 70 74 514
+@c endfile
+@end example
+
+@ifinfo
+If you are reading this in GNU Emacs using Info, you can copy the regions
+of text showing these sample files into your own test files. This way you
+can try out the examples shown in the remainder of this document. You do
+this by using the command @kbd{M-x write-region} to copy text from the Info
+file into a file for use with @code{awk}
+(@xref{Misc File Ops, , Miscellaneous File Operations, emacs, GNU Emacs Manual},
+for more information). Using this information, create your own
+@file{BBS-list} and @file{inventory-shipped} files, and practice what you
+learn in this @value{DOCUMENT}.
+
+If you are using the stand-alone version of Info,
+see @ref{Extract Program, ,Extracting Programs from Texinfo Source Files},
+for an @code{awk} program that will extract these data files from
+@file{gawk.texi}, the Texinfo source file for this Info file.
+@end ifinfo
+
+@node Getting Started, One-liners, What Is Awk, Top
+@chapter Getting Started with @code{awk}
+@cindex script, definition of
+@cindex rule, definition of
+@cindex program, definition of
+@cindex basic function of @code{awk}
+
+The basic function of @code{awk} is to search files for lines (or other
+units of text) that contain certain patterns. When a line matches one
+of the patterns, @code{awk} performs specified actions on that line.
+@code{awk} keeps processing input lines in this way until it reaches the
+end of the input files.
+
+@cindex data-driven languages
+@cindex procedural languages
+@cindex language, data-driven
+@cindex language, procedural
+Programs in @code{awk} are different from programs in most other languages,
+because @code{awk} programs are @dfn{data-driven}; that is, you describe
+the data you wish to work with, and then what to do when you find it.
+Most other languages are @dfn{procedural}; you have to describe, in great
+detail, every step the program is to take. When working with procedural
+languages, it is usually much
+harder to clearly describe the data your program will process.
+For this reason, @code{awk} programs are often refreshingly easy to both
+write and read.
+
+@cindex program, definition of
+@cindex rule, definition of
+When you run @code{awk}, you specify an @code{awk} @dfn{program} that
+tells @code{awk} what to do. The program consists of a series of
+@dfn{rules}. (It may also contain @dfn{function definitions},
+an advanced feature which we will ignore for now.
+@xref{User-defined, ,User-defined Functions}.) Each rule specifies one
+pattern to search for, and one action to perform when that pattern is found.
+
+Syntactically, a rule consists of a pattern followed by an action. The
+action is enclosed in curly braces to separate it from the pattern.
+Rules are usually separated by newlines. Therefore, an @code{awk}
+program looks like this:
+
+@example
+@var{pattern} @{ @var{action} @}
+@var{pattern} @{ @var{action} @}
+@dots{}
+@end example
+
+@menu
+* Names:: What name to use to find @code{awk}.
+* Running gawk:: How to run @code{gawk} programs; includes
+ command line syntax.
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one-line example with two rules.
+* More Complex:: A more complex example.
+* Statements/Lines:: Subdividing or combining statements into
+ lines.
+* Other Features:: Other Features of @code{awk}.
+* When:: When to use @code{gawk} and when to use other
+ things.
+@end menu
+
+@node Names, Running gawk, Getting Started, Getting Started
+@section A Rose By Any Other Name
+
+@cindex old @code{awk} vs. new @code{awk}
+@cindex new @code{awk} vs. old @code{awk}
+The @code{awk} language has evolved over the years. Full details are
+provided in @ref{Language History, ,The Evolution of the @code{awk} Language}.
+The language described in this @value{DOCUMENT}
+is often referred to as ``new @code{awk}.''
+
+Because of this, many systems have multiple
+versions of @code{awk}.
+Some systems have an @code{awk} utility that implements the
+original version of the @code{awk} language, and a @code{nawk} utility
+for the new version. Others have an @code{oawk} for the ``old @code{awk}''
+language, and plain @code{awk} for the new one. Still others only
+have one version, usually the new one.@footnote{Often, these systems
+use @code{gawk} for their @code{awk} implementation!}
+
+All in all, this makes it difficult for you to know which version of
+@code{awk} you should run when writing your programs. The best advice
+we can give here is to check your local documentation. Look for @code{awk},
+@code{oawk}, and @code{nawk}, as well as for @code{gawk}. Chances are, you
+will have some version of new @code{awk} on your system, and that is what
+you should use when running your programs. (Of course, if you're reading
+this @value{DOCUMENT}, chances are good that you have @code{gawk}!)
+
+Throughout this @value{DOCUMENT}, whenever we refer to a language feature
+that should be available in any complete implementation of POSIX @code{awk},
+we simply use the term @code{awk}. When referring to a feature that is
+specific to the GNU implementation, we use the term @code{gawk}.
+
+@node Running gawk, Very Simple, Names, Getting Started
+@section How to Run @code{awk} Programs
+
+@cindex command line formats
+@cindex running @code{awk} programs
+There are several ways to run an @code{awk} program. If the program is
+short, it is easiest to include it in the command that runs @code{awk},
+like this:
+
+@example
+awk '@var{program}' @var{input-file1} @var{input-file2} @dots{}
+@end example
+
+@noindent
+where @var{program} consists of a series of patterns and actions, as
+described earlier.
+(The reason for the single quotes is described below, in
+@ref{One-shot, ,One-shot Throw-away @code{awk} Programs}.)
+
+When the program is long, it is usually more convenient to put it in a file
+and run it with a command like this:
+
+@example
+awk -f @var{program-file} @var{input-file1} @var{input-file2} @dots{}
+@end example
+
+@menu
+* One-shot:: Running a short throw-away @code{awk} program.
+* Read Terminal:: Using no input files (input from terminal
+ instead).
+* Long:: Putting permanent @code{awk} programs in
+ files.
+* Executable Scripts:: Making self-contained @code{awk} programs.
+* Comments:: Adding documentation to @code{gawk} programs.
+@end menu
+
+@node One-shot, Read Terminal, Running gawk, Running gawk
+@subsection One-shot Throw-away @code{awk} Programs
+
+Once you are familiar with @code{awk}, you will often type in simple
+programs the moment you want to use them. Then you can write the
+program as the first argument of the @code{awk} command, like this:
+
+@example
+awk '@var{program}' @var{input-file1} @var{input-file2} @dots{}
+@end example
+
+@noindent
+where @var{program} consists of a series of @var{patterns} and
+@var{actions}, as described earlier.
+
+@cindex single quotes, why needed
+This command format instructs the @dfn{shell}, or command interpreter,
+to start @code{awk} and use the @var{program} to process records in the
+input file(s). There are single quotes around @var{program} so that
+the shell doesn't interpret any @code{awk} characters as special shell
+characters. They also cause the shell to treat all of @var{program} as
+a single argument for @code{awk} and allow @var{program} to be more
+than one line long.
+
+This format is also useful for running short or medium-sized @code{awk}
+programs from shell scripts, because it avoids the need for a separate
+file for the @code{awk} program. A self-contained shell script is more
+reliable since there are no other files to misplace.
+
+@ref{One-liners, , Useful One Line Programs}, presents several short,
+self-contained programs.
+
+@iftex
+@page
+@end iftex
+As an interesting side point, the command
+
+@example
+awk '/foo/' @var{files} @dots{}
+@end example
+
+@noindent
+is essentially the same as
+
+@cindex @code{egrep}
+@example
+egrep foo @var{files} @dots{}
+@end example
+
+@node Read Terminal, Long, One-shot, Running gawk
+@subsection Running @code{awk} without Input Files
+
+@cindex standard input
+@cindex input, standard
+You can also run @code{awk} without any input files. If you type the
+command line:
+
+@example
+awk '@var{program}'
+@end example
+
+@noindent
+then @code{awk} applies the @var{program} to the @dfn{standard input},
+which usually means whatever you type on the terminal. This continues
+until you indicate end-of-file by typing @kbd{Control-d}.
+(On other operating systems, the end-of-file character may be different.
+For example, on OS/2 and MS-DOS, it is @kbd{Control-z}.)
+
+For example, the following program prints a friendly piece of advice
+(from Douglas Adams' @cite{The Hitchhiker's Guide to the Galaxy}),
+to keep you from worrying about the complexities of computer programming
+(@samp{BEGIN} is a feature we haven't discussed yet).
+
+@example
+$ awk "BEGIN @{ print \"Don't Panic!\" @}"
+@print{} Don't Panic!
+@end example
+
+@cindex quoting, shell
+@cindex shell quoting
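+This program does not read any input. The program text contains a single
+quote (the apostrophe in @samp{Don't}), so it cannot simply be enclosed in
+single quotes; it is enclosed in double quotes instead. The @samp{\} before
+each of the inner double quotes is then necessary so that the shell passes
+them through to @code{awk}, rather than treating them as the end of the
+quoted argument.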
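+
+As a sketch of another approach (assuming a Bourne-style shell such as
+@code{sh} or Bash), the program can stay inside single quotes if the
+apostrophe is written as @samp{'\''}: the first quote ends the quoted
+text, @samp{\'} supplies a literal single quote, and the last quote
+resumes the quoted text.
+
+@example
+$ awk 'BEGIN @{ print "Don'\''t Panic!" @}'
+@print{} Don't Panic!
+@end example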
+
+This next simple @code{awk} program
+emulates the @code{cat} utility; it copies whatever you type at the
+keyboard to its standard output. (Why this works is explained shortly.)
+
+@example
+$ awk '@{ print @}'
+Now is the time for all good men
+@print{} Now is the time for all good men
+to come to the aid of their country.
+@print{} to come to the aid of their country.
+Four score and seven years ago, ...
+@print{} Four score and seven years ago, ...
+What, me worry?
+@print{} What, me worry?
+@kbd{Control-d}
+@end example
+
+@node Long, Executable Scripts, Read Terminal, Running gawk
+@subsection Running Long Programs
+
+@cindex running long programs
+@cindex @code{-f} option
+@cindex program file
+@cindex file, @code{awk} program
+Sometimes your @code{awk} programs can be very long. In this case it is
+more convenient to put the program into a separate file. To tell
+@code{awk} to use that file for its program, you type:
+
+@example
+awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{}
+@end example
+
+The @samp{-f} instructs the @code{awk} utility to get the @code{awk} program
+from the file @var{source-file}. Any file name can be used for
+@var{source-file}. For example, you could put the program:
+
+@example
+BEGIN @{ print "Don't Panic!" @}
+@end example
+
+@noindent
+into the file @file{advice}. Then this command:
+
+@example
+awk -f advice
+@end example
+
+@noindent
+does the same thing as this one:
+
+@example
+awk "BEGIN @{ print \"Don't Panic!\" @}"
+@end example
+
+@cindex quoting, shell
+@cindex shell quoting
+@noindent
+which was explained earlier (@pxref{Read Terminal, ,Running @code{awk} without Input Files}).
+Note that you don't usually need single quotes around the file name that you
+specify with @samp{-f}, because most file names don't contain any of the shell's
+special characters. Notice that in @file{advice}, the @code{awk}
+program did not have single quotes around it. The quotes are only needed
+for programs that are provided on the @code{awk} command line.
+
+If you want to identify your @code{awk} program files clearly as such,
+you can add the extension @file{.awk} to the file name. This doesn't
+affect the execution of the @code{awk} program, but it does make
+``housekeeping'' easier.
+
+@node Executable Scripts, Comments, Long, Running gawk
+@subsection Executable @code{awk} Programs
+@cindex executable scripts
+@cindex scripts, executable
+@cindex self contained programs
+@cindex program, self contained
+@cindex @code{#!} (executable scripts)
+
+Once you have learned @code{awk}, you may want to write self-contained
+@code{awk} scripts, using the @samp{#!} script mechanism. You can do
+this on many Unix systems@footnote{The @samp{#!} mechanism works on
+Linux systems,
+Unix systems derived from Berkeley Unix, System V Release 4, and some System
+V Release 3 systems.} (and someday on the GNU system).
+
+For example, you could update the file @file{advice} to look like this:
+
+@example
+#! /bin/awk -f
+
+BEGIN @{ print "Don't Panic!" @}
+@end example
+
+@noindent
+After making this file executable (with the @code{chmod} utility), you
+can simply type @samp{advice}
+at the shell, and the system will arrange to run @code{awk}@footnote{The
+line beginning with @samp{#!} lists the full file name of an interpreter
+to be run, and an optional initial command line argument to pass to that
+interpreter. The operating system then runs the interpreter with the given
+argument and the full argument list of the executed program. The first argument
+in the list is the full file name of the @code{awk} program. The rest of the
+argument list will either be options to @code{awk}, or data files,
+or both.} as if you had typed @samp{awk -f advice}.
+
+@example
+$ advice
+@print{} Don't Panic!
+@end example
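+
+The step of making the file executable is not shown above. A minimal
+sketch, assuming that @code{awk} is installed as @file{/bin/awk} and that
+@file{advice} is in the current directory, might look like this (typing
+@samp{./advice} avoids depending on the shell's search path):
+
+@example
+$ chmod +x advice
+$ ./advice
+@print{} Don't Panic!
+@end example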
+
+@noindent
+Self-contained @code{awk} scripts are useful when you want to write a
+program which users can invoke without their having to know that the program is
+written in @code{awk}.
+
+@cindex shell scripts
+@cindex scripts, shell
+Some older systems do not support the @samp{#!} mechanism. You can get a
+similar effect using a regular shell script. It would look something
+like this:
+
+@example
+: The colon ensures execution by the standard shell.
+awk '@var{program}' "$@@"
+@end example
+
+Using this technique, it is @emph{vital} to enclose the @var{program} in
+single quotes to protect it from interpretation by the shell. If you
+omit the quotes, only a shell wizard can predict the results.
+
+The @code{"$@@"} causes the shell to forward all the command line
+arguments to the @code{awk} program, without interpretation. The first
+line, which starts with a colon, is used so that this shell script will
+work even if invoked by a user who uses the C shell. (Not all older systems
+obey this convention, but many do.)
+@c 2e:
+@c Someday: (See @cite{The Bourne Again Shell}, by ??.)
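+
+As an illustration only, suppose such a script is stored in a file named
+@file{bbsnames} (a hypothetical name) and contains the two lines below.
+The @samp{$1} in the program refers to the first field of each line, a
+feature that is covered later.
+
+@example
+: The colon ensures execution by the standard shell.
+awk '/foo/ @{ print $1 @}' "$@@"
+@end example
+
+@noindent
+Running it with the sample file @file{BBS-list} as its argument should
+then print the first field of each line that contains @samp{foo}:
+
+@example
+$ sh bbsnames BBS-list
+@print{} fooey
+@print{} foot
+@print{} macfoo
+@print{} sabafoo
+@end example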
+
+@node Comments, , Executable Scripts, Running gawk
+@subsection Comments in @code{awk} Programs
+@cindex @code{#} (comment)
+@cindex comments
+@cindex use of comments
+@cindex documenting @code{awk} programs
+@cindex programs, documenting
+
+A @dfn{comment} is some text that is included in a program for the sake
+of human readers; it is not really part of the program. Comments
+can explain what the program does, and how it works. Nearly all
+programming languages have provisions for comments, because programs are
+typically hard to understand without them.
+
+In the @code{awk} language, a comment starts with the sharp sign
+character, @samp{#}, and continues to the end of the line.
+The @samp{#} does not have to be the first character on the line. The
+@code{awk} language ignores the rest of a line following a sharp sign.
+For example, we could have put the following into @file{advice}:
+
+@example
+# This program prints a nice friendly message. It helps
+# keep novice users from being afraid of the computer.
+BEGIN @{ print "Don't Panic!" @}
+@end example
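+
+Since the @samp{#} need not be the first character on the line, a comment
+can also follow code on the same line. As a small sketch of that style:
+
+@example
+BEGIN @{ print "Don't Panic!" @}    # print the advice, then exit
+@end example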
+
+You can put comment lines into keyboard-composed throw-away @code{awk}
+programs also, but this usually isn't very useful; the purpose of a
+comment is to help you or another person understand the program at
+a later time.
+
+@node Very Simple, Two Rules, Running gawk, Getting Started
+@section A Very Simple Example
+
+The following command runs a simple @code{awk} program that searches the
+input file @file{BBS-list} for the string of characters: @samp{foo}. (A
+string of characters is usually called a @dfn{string}.
+The term @dfn{string} is perhaps based on similar usage in English, such
+as ``a string of pearls,'' or, ``a string of cars in a train.'')
+
+@example
+awk '/foo/ @{ print $0 @}' BBS-list
+@end example
+
+@noindent
+When lines containing @samp{foo} are found, they are printed, because
+@w{@samp{print $0}} means print the current line. (Just @samp{print} by
+itself means the same thing, so we could have written that
+instead.)
+
+You will notice that slashes, @samp{/}, surround the string @samp{foo}
+in the @code{awk} program. The slashes indicate that @samp{foo}
+is a pattern to search for. This type of pattern is called a
+@dfn{regular expression}, and is covered in more detail later
+(@pxref{Regexp, ,Regular Expressions}).
+The pattern is allowed to match parts of words.
+There are single quotes around the @code{awk} program so that the shell won't
+interpret any of it as special shell characters.
+
+Here is what this program prints:
+
+@example
+@group
+$ awk '/foo/ @{ print $0 @}' BBS-list
+@print{} fooey 555-1234 2400/1200/300 B
+@print{} foot 555-6699 1200/300 B
+@print{} macfoo 555-6480 1200/300 A
+@print{} sabafoo 555-2127 1200/300 C
+@end group
+@end example
+
+@cindex action, default
+@cindex pattern, default
+@cindex default action
+@cindex default pattern
+In an @code{awk} rule, either the pattern or the action can be omitted,
+but not both. If the pattern is omitted, then the action is performed
+for @emph{every} input line. If the action is omitted, the default
+action is to print all lines that match the pattern.
+
+@cindex empty action
+@cindex action, empty
+Thus, we could leave out the action (the @code{print} statement and the curly
+braces) in the above example, and the result would be the same: all
+lines matching the pattern @samp{foo} would be printed. By comparison,
+omitting the @code{print} statement but retaining the curly braces makes an
+empty action that does nothing; then no lines would be printed.
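+
+To make the comparison concrete, here is a sketch of both variations,
+run against the sample file @file{BBS-list}. The first command omits the
+action entirely and so prints the matching lines; the second keeps the
+braces but leaves them empty, and so prints nothing:
+
+@example
+$ awk '/foo/' BBS-list
+@print{} fooey 555-1234 2400/1200/300 B
+@print{} foot 555-6699 1200/300 B
+@print{} macfoo 555-6480 1200/300 A
+@print{} sabafoo 555-2127 1200/300 C
+$ awk '/foo/ @{ @}' BBS-list
+@end example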
+
+@node Two Rules, More Complex, Very Simple, Getting Started
+@section An Example with Two Rules
+@cindex how @code{awk} works
+
+The @code{awk} utility reads the input files one line at a
+time. For each line, @code{awk} tries the patterns of each of the rules.
+If several patterns match then several actions are run, in the order in
+which they appear in the @code{awk} program. If no patterns match, then
+no actions are run.
+
+After processing all the rules (perhaps none) that match the line,
+@code{awk} reads the next line (however,
+@pxref{Next Statement, ,The @code{next} Statement},
+and also @pxref{Nextfile Statement, ,The @code{nextfile} Statement}).
+This continues until the end of the file is reached.
+
+For example, the @code{awk} program:
+
+@example
+/12/ @{ print $0 @}
+/21/ @{ print $0 @}
+@end example
+
+@noindent
+contains two rules. The first rule has the string @samp{12} as the
+pattern and @samp{print $0} as the action. The second rule has the
+string @samp{21} as the pattern and also has @samp{print $0} as the
+action. Each rule's action is enclosed in its own pair of braces.
+
+This @code{awk} program prints every line that contains the string
+@samp{12} @emph{or} the string @samp{21}. If a line contains both
+strings, it is printed twice, once by each rule.
+
+This is what happens if we run this program on our two sample data files,
+@file{BBS-list} and @file{inventory-shipped}, as shown here:
+
+@example
+$ awk '/12/ @{ print $0 @}
+> /21/ @{ print $0 @}' BBS-list inventory-shipped
+@print{} aardvark 555-5553 1200/300 B
+@print{} alpo-net 555-3412 2400/1200/300 A
+@print{} barfly 555-7685 1200/300 A
+@print{} bites 555-1675 2400/1200/300 A
+@print{} core 555-2912 1200/300 C
+@print{} fooey 555-1234 2400/1200/300 B
+@print{} foot 555-6699 1200/300 B
+@print{} macfoo 555-6480 1200/300 A
+@print{} sdace 555-3430 2400/1200/300 A
+@print{} sabafoo 555-2127 1200/300 C
+@print{} sabafoo 555-2127 1200/300 C
+@print{} Jan 21 36 64 620
+@print{} Apr 21 70 74 514
+@end example
+
+@noindent
+Note how the line in @file{BBS-list} beginning with @samp{sabafoo}
+was printed twice, once for each rule.
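+
+If each matching line should be printed only once, the two patterns can
+be combined into a single rule. The following is a sketch that uses the
+@samp{||} (``or'') operator, a feature that has not been covered yet:
+
+@example
+awk '/12/ || /21/ @{ print $0 @}' BBS-list inventory-shipped
+@end example
+
+@noindent
+With this version, the @samp{sabafoo} line should appear once rather
+than twice.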
+
+@node More Complex, Statements/Lines, Two Rules, Getting Started
+@section A More Complex Example
+
+@ignore
+We have to use ls -lg here to get portable output across Unix systems.
+The POSIX ls matches this behavior too. Sigh.
+@end ignore
+Here is an example to give you an idea of what typical @code{awk}
+programs do. This example shows how @code{awk} can be used to
+summarize, select, and rearrange the output of another utility. It uses
+features that haven't been covered yet, so don't worry if you don't
+understand all the details.
+
+@example
+ls -lg | awk '$6 == "Nov" @{ sum += $5 @}
+ END @{ print sum @}'
+@end example
+
+@cindex @code{csh}, backslash continuation
+@cindex backslash continuation in @code{csh}
+This command prints the total number of bytes in all the files in the
+current directory that were last modified in November (of any year).
+(In the C shell you would need to type a semicolon and then a backslash
+at the end of the first line; in a POSIX-compliant shell, such as the
+Bourne shell or Bash, the GNU Bourne-Again shell, you can type the example
+as shown.)
+@ignore
+FIXME: how can users tell what shell they are running? Need a footnote
+or something, but getting into this is a distraction.
+@end ignore
+
+The @w{@samp{ls -lg}} part of this example is a system command that gives
+you a listing of the files in a directory, including file size and the date
+the file was last modified. Its output looks like this:
+
+@example
+-rw-r--r-- 1 arnold user 1933 Nov 7 13:05 Makefile
+-rw-r--r-- 1 arnold user 10809 Nov 7 13:03 gawk.h
+-rw-r--r-- 1 arnold user 983 Apr 13 12:14 gawk.tab.h
+-rw-r--r-- 1 arnold user 31869 Jun 15 12:20 gawk.y
+-rw-r--r-- 1 arnold user 22414 Nov 7 13:03 gawk1.c
+-rw-r--r-- 1 arnold user 37455 Nov 7 13:03 gawk2.c
+-rw-r--r-- 1 arnold user 27511 Dec 9 13:07 gawk3.c
+-rw-r--r-- 1 arnold user 7989 Nov 7 13:03 gawk4.c
+@end example
+
+@noindent
+The first field contains read-write permissions, the second field contains
+the number of links to the file, and the third field identifies the owner of
+the file. The fourth field identifies the group of the file.
+The fifth field contains the size of the file in bytes. The
+sixth, seventh and eighth fields contain the month, day, and time,
+respectively, that the file was last modified. Finally, the ninth field
+contains the name of the file.
+
+@cindex automatic initialization
+@cindex initialization, automatic
+The @samp{$6 == "Nov"} in our @code{awk} program is an expression that
+tests whether the sixth field of the output from @w{@samp{ls -lg}}
+is equal to the string @samp{Nov}. Each time a line has the string
+@samp{Nov} for its sixth field, the action @samp{sum += $5} is
+performed. This adds the fifth field (the file size) to the variable
+@code{sum}. As a result, when @code{awk} has finished reading all the
+input lines, @code{sum} is the sum of the sizes of files whose
+lines matched the pattern. (This works because @code{awk} variables
+are automatically initialized to zero.)
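+
+As a minimal sketch of this automatic initialization (the variable
+name @code{n} is only an illustration), a variable that has never been
+assigned acts as zero in arithmetic and as the empty string in
+concatenation:
+
+@example
+$ awk 'BEGIN @{ print n + 0; print "[" n "]" @}'
+@print{} 0
+@print{} []
+@end example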
+
+After the last line of output from @code{ls} has been processed, the
+@code{END} rule is executed, and the value of @code{sum} is
+printed. In this example, the value of @code{sum} would be 80600.
+
+These more advanced @code{awk} techniques are covered in later sections
+(@pxref{Action Overview, ,Overview of Actions}). Before you can move on to more
+advanced @code{awk} programming, you have to know how @code{awk} interprets
+your input and displays your output. By manipulating fields and using
+@code{print} statements, you can produce some very useful and
+impressive-looking reports.
+
+@node Statements/Lines, Other Features, More Complex, Getting Started
+@section @code{awk} Statements Versus Lines
+@cindex line break
+@cindex newline
+
+Most often, each line in an @code{awk} program is a separate statement or
+separate rule, like this:
+
+@example
+awk '/12/ @{ print $0 @}
+ /21/ @{ print $0 @}' BBS-list inventory-shipped
+@end example
+
+However, @code{gawk} will ignore newlines after any of the following:
+
+@example
+, @{ ? : || && do else
+@end example
+
+@noindent
+A newline at any other point is considered the end of the statement.
+(Splitting lines after @samp{?} and @samp{:} is a minor @code{gawk}
+extension. The @samp{?} and @samp{:} referred to here are the operators
+of the three-operand conditional expression described in
+@ref{Conditional Exp, ,Conditional Expressions}.)
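+
+For instance, a pattern can be split after @samp{&&} without using a
+backslash. This is only a small sketch; the two input lines are made up
+for illustration and are supplied by @code{printf}:
+
+@example
+$ printf 'a 1\nb 2\n' | awk '$1 == "b" &&
+> $2 == 2 @{ print "matched: " $0 @}'
+@print{} matched: b 2
+@end example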
+
+@cindex backslash continuation
+@cindex continuation of lines
+@cindex line continuation
+If you would like to split a single statement into two lines at a point
+where a newline would terminate it, you can @dfn{continue} it by ending the
+first line with a backslash character, @samp{\}. The backslash must be
+the final character on the line to be recognized as a continuation
+character. This is allowed absolutely anywhere in the statement, even
+in the middle of a string or regular expression. For example:
+
+@example
+awk '/This regular expression is too long, so continue it\
+ on the next line/ @{ print $1 @}'
+@end example
+
+@noindent
+@cindex portability issues
+We have generally not used backslash continuation in the sample programs
+in this @value{DOCUMENT}. Since in @code{gawk} there is no limit on the
+length of a line, it is never strictly necessary; it just makes programs
+more readable. For this same reason, as well as for clarity, we have
+kept most statements short in the sample programs presented throughout
+the @value{DOCUMENT}. Backslash continuation is most useful when your
+@code{awk} program is in a separate source file, instead of typed in on
+the command line. You should also note that many @code{awk}
+implementations are more particular about where you may use backslash
+continuation. For example, they may not allow you to split a string
+constant using backslash continuation. Thus, for maximal portability of
+your @code{awk} programs, it is best not to split your lines in the
+middle of a regular expression or a string.
+
+@cindex @code{csh}, backslash continuation
+@cindex backslash continuation in @code{csh}
+@strong{Caution: backslash continuation does not work as described above
+with the C shell.} Continuation with backslash works for @code{awk}
+programs in files, and also for one-shot programs @emph{provided} you
+are using a POSIX-compliant shell, such as the Bourne shell or Bash, the
+GNU Bourne-Again shell. But the C shell (@code{csh}) behaves
+differently! There, you must use two backslashes in a row, followed by
+a newline. Note also that when using the C shell, @emph{every} newline
+in your @code{awk} program must be escaped with a backslash. To illustrate:
+
+@example
+% awk 'BEGIN @{ \
+? print \\
+? "hello, world" \
+? @}'
+@print{} hello, world
+@end example
+
+@noindent
+Here, the @samp{%} and @samp{?} are the C shell's primary and secondary
+prompts, analogous to the standard shell's @samp{$} and @samp{>}.
+
+@code{awk} is a line-oriented language. Each rule's action has to
+begin on the same line as the pattern. To have the pattern and action
+on separate lines, you @emph{must} use backslash continuation---there
+is no other way.
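+
+A minimal sketch of such a rule, assuming a POSIX-compliant shell (so
+that the backslash inside the single quotes reaches @code{awk}
+unchanged) and a made-up input line:
+
+@example
+$ echo 'foo bar' | awk '/foo/ \
+> @{ print $2 @}'
+@print{} bar
+@end example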
+
+@cindex multiple statements on one line
+When @code{awk} statements within one rule are short, you might want to put
+more than one of them on a line. You do this by separating the statements
+with a semicolon, @samp{;}.
+
+This also applies to the rules themselves.
+Thus, the previous program could have been written:
+
+@example
+/12/ @{ print $0 @} ; /21/ @{ print $0 @}
+@end example
+
+@noindent
+@strong{Note:} the requirement that rules on the same line must be
+separated with a semicolon was not in the original @code{awk}
+language; it was added for consistency with the treatment of statements
+within an action.
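+
+The following sketch combines both uses of the semicolon on a single
+line: two statements inside one action, and two rules separated from
+each other. The counter @code{n} and the two input lines are assumptions
+made only for illustration:
+
+@example
+$ printf '12\n21\n' | awk '/12/ @{ n++; print n, $0 @} ; /21/ @{ print $0 @}'
+@print{} 1 12
+@print{} 21
+@end example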
+
+@node Other Features, When, Statements/Lines, Getting Started
+@section Other Features of @code{awk}
+
+The @code{awk} language provides a number of predefined, or built-in, variables, which
+your programs can use to get information from @code{awk}. There are other
+variables your program can set to control how @code{awk} processes your
+data.
+
+In addition, @code{awk} provides a number of built-in functions for doing
+common computational and string-related operations.
+
+As we develop our presentation of the @code{awk} language, we introduce
+most of the variables and many of the functions. They are defined
+systematically in @ref{Built-in Variables}, and
+@ref{Built-in, ,Built-in Functions}.
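+
+As a brief preview (a sketch only; the input line is made up), this
+one-liner uses the built-in variables @code{NR} and @code{NF} together
+with the built-in functions @code{length} and @code{toupper}:
+
+@example
+$ echo 'hello world' | awk '@{ print NR, NF, length($1), toupper($2) @}'
+@print{} 1 2 5 WORLD
+@end example
+
+@noindent
+Here @code{NR} is the number of the current record, @code{NF} is the
+number of fields in it, @code{length($1)} is the length of the first
+field, and @code{toupper($2)} is the second field in upper-case.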
+
+@node When, , Other Features, Getting Started
+@section When to Use @code{awk}
+
+@cindex when to use @code{awk}
+@cindex applications of @code{awk}
+You might wonder how @code{awk} might be useful for you. Using
+utility programs, advanced patterns, field separators, arithmetic
+statements, and other selection criteria, you can produce much more
+complex output. The @code{awk} language is very useful for producing
+reports from large amounts of raw data, such as summarizing information
+from the output of other utility programs like @code{ls}.
+(@xref{More Complex, ,A More Complex Example}.)
+
+Programs written with @code{awk} are usually much smaller than they would
+be in other languages. This makes @code{awk} programs easy to compose and
+use. Often, @code{awk} programs can be quickly composed at your terminal,
+used once, and thrown away. Since @code{awk} programs are interpreted, you
+can avoid the (usually lengthy) compilation part of the typical
+edit-compile-test-debug cycle of software development.
+
+Complex programs have been written in @code{awk}, including a complete
+retargetable assembler for eight-bit microprocessors (@pxref{Glossary}, for
+more information) and a microcode assembler for a special purpose Prolog
+computer. However, @code{awk}'s capabilities are strained by tasks of
+such complexity.
+
+If you find yourself writing @code{awk} scripts of more than, say, a few
+hundred lines, you might consider using a different programming
+language. Emacs Lisp is a good choice if you need sophisticated string
+or pattern matching capabilities. The shell is also good at string and
+pattern matching; in addition, it allows powerful use of the system
+utilities. More conventional languages, such as C, C++, and Lisp, offer
+better facilities for system programming and for managing the complexity
+of large programs. Programs in these languages may require more lines
+of source code than the equivalent @code{awk} programs, but they are
+easier to maintain and usually run more efficiently.
+
+@node One-liners, Regexp, Getting Started, Top
+@chapter Useful One Line Programs
+
+@cindex one-liners
+Many useful @code{awk} programs are short, just a line or two. Here is a
+collection of useful, short programs to get you started. Some of these
+programs contain constructs that haven't been covered yet. The description
+of the program will give you a good idea of what is going on, but please
+read the rest of the @value{DOCUMENT} to become an @code{awk} expert!
+
+Most of the examples use a data file named @file{data}. This is just a
+placeholder; if you were to use these programs yourself, you would substitute
+your own file names for @file{data}.
+
+@ifinfo
+Since you are reading this in Info, each line of the example code is
+enclosed in quotes, to represent text that you would type literally.
+The examples themselves represent shell commands that use single quotes
+to keep the shell from interpreting the contents of the program.
+When reading the examples, focus on the text between the open and close
+quotes.
+@end ifinfo
+
+@table @code
+@item awk '@{ if (length($0) > max) max = length($0) @}
+@itemx @ @ @ @ @ END @{ print max @}' data
+This program prints the length of the longest input line.
+
+@item awk 'length($0) > 80' data
+This program prints every line that is longer than 80 characters. The sole
+rule has a relational expression as its pattern, and has no action (so the
+default action, printing the record, is used).
+
+@item expand@ data@ |@ awk@ '@{ if (x < length()) x = length() @}
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ END @{ print "maximum line length is " x @}'
+This program prints the length of the longest line in @file{data}. The input
+is processed by the @code{expand} program to change tabs into spaces,
+so the widths compared are actually the right-margin columns.
+
+@item awk 'NF > 0' data
+This program prints every line that has at least one field. This is an
+easy way to delete blank lines from a file (or rather, to create a new
+file similar to the old file but from which the blank lines have been
+deleted).
+
+@c Karl Berry points out that new users probably don't want to see
+@c multiple ways to do things, just the `best' way. He's probably
+@c right. At some point it might be worth adding something about there
+@c often being multiple ways to do things in awk, but for now we'll
+@c just take this one out.
+@ignore
+@item awk '@{ if (NF > 0) print @}' data
+This program also prints every line that has at least one field. Here we
+allow the rule to match every line, and then decide in the action whether
+to print.
+@end ignore
+
+@item awk@ 'BEGIN@ @{@ for (i = 1; i <= 7; i++)
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ print int(101 * rand()) @}'
+This program prints seven random numbers from zero to 100, inclusive.
+
+@item ls -lg @var{files} | awk '@{ x += $5 @} ; END @{ print "total bytes: " x @}'
+This program prints the total number of bytes used by @var{files}.
+
+@item ls -lg @var{files} | awk '@{ x += $5 @}
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ END @{ print "total K-bytes: " (x + 1023)/1024 @}'
+This program prints the total number of kilobytes used by @var{files}.
+
+@item awk -F: '@{ print $1 @}' /etc/passwd | sort
+This program prints a sorted list of the login names of all users.
+
+@item awk 'END @{ print NR @}' data
+This program counts lines in a file.
+
+@item awk 'NR % 2' data
+This program prints the odd-numbered lines in the data file.
+If you were to use the expression @samp{NR % 2 == 0} instead,
+it would print the even-numbered lines.
+@end table
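+
+To see which lines a pattern such as @samp{NR % 2} selects, you can try
+it on a small input. In this sketch the four input lines are made up and
+are supplied by @code{printf} instead of a file; the pattern is true
+(non-zero) whenever @code{NR} is odd:
+
+@example
+$ printf 'one\ntwo\nthree\nfour\n' | awk 'NR % 2'
+@print{} one
+@print{} three
+@end example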
+
+@node Regexp, Reading Files, One-liners, Top
+@chapter Regular Expressions
+@cindex pattern, regular expressions
+@cindex regexp
+@cindex regular expression
+@cindex regular expressions as patterns
+
+A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a
+set of strings.
+Because regular expressions are such a fundamental part of @code{awk}
+programming, their format and use deserve a separate chapter.
+
+A regular expression enclosed in slashes (@samp{/})
+is an @code{awk} pattern that matches every input record whose text
+belongs to that set.
+
+The simplest regular expression is a sequence of letters, numbers, or
+both. Such a regexp matches any string that contains that sequence.
+Thus, the regexp @samp{foo} matches any string containing @samp{foo}.
+Therefore, the pattern @code{/foo/} matches any input record containing
+the three characters @samp{foo}, @emph{anywhere} in the record. Other
+kinds of regexps let you specify more complicated classes of strings.
+
+@iftex
+Initially, the examples will be simple. As we explain more about how
+regular expressions work, we will present more complicated examples.
+@end iftex
+
+@menu
+* Regexp Usage:: How to Use Regular Expressions.
+* Escape Sequences:: How to write non-printing characters.
+* Regexp Operators:: Regular Expression Operators.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
+* Leftmost Longest:: How much text matches.
+* Computed Regexps:: Using Dynamic Regexps.
+@end menu
+
+@node Regexp Usage, Escape Sequences, Regexp, Regexp
+@section How to Use Regular Expressions
+
+A regular expression can be used as a pattern by enclosing it in
+slashes. Then the regular expression is tested against the
+entire text of each record. (Normally, it only needs
+to match some part of the text in order to succeed.) For example, this
+prints the second field of each record that contains the three
+characters @samp{foo} anywhere in it:
+
+@example
+@group
+$ awk '/foo/ @{ print $2 @}' BBS-list
+@print{} 555-1234
+@print{} 555-6699
+@print{} 555-6480
+@print{} 555-2127
+@end group
+@end example
+
+@cindex regexp matching operators
+@cindex string-matching operators
+@cindex operators, string-matching
+@cindex operators, regexp matching
+@cindex regexp match/non-match operators
+@cindex @code{~} operator
+@cindex @code{!~} operator
+Regular expressions can also be used in matching expressions. These
+expressions allow you to specify the string to match against; it need
+not be the entire current input record. The two operators, @samp{~}
+and @samp{!~}, perform regular expression comparisons. Expressions
+using these operators can be used as patterns or in @code{if},
+@code{while}, @code{for}, and @code{do} statements.
+@ifinfo
+@c adding this xref in TeX screws up the formatting too much
+(@xref{Statements, ,Control Statements in Actions}.)
+@end ifinfo
+
+@table @code
+@item @var{exp} ~ /@var{regexp}/
+This is true if the expression @var{exp} (taken as a string)
+is matched by @var{regexp}. The following example matches, or selects,
+all input records with the upper-case letter @samp{J} somewhere in the
+first field:
+
+@example
+@group
+$ awk '$1 ~ /J/' inventory-shipped
+@print{} Jan 13 25 15 115
+@print{} Jun 31 42 75 492
+@print{} Jul 24 34 67 436
+@print{} Jan 21 36 64 620
+@end group
+@end example
+
+So does this:
+
+@example
+awk '@{ if ($1 ~ /J/) print @}' inventory-shipped
+@end example
+
+@item @var{exp} !~ /@var{regexp}/
+This is true if the expression @var{exp} (taken as a character string)
+is @emph{not} matched by @var{regexp}. The following example matches,
+or selects, all input records whose first field @emph{does not} contain
+the upper-case letter @samp{J}:
+
+@example
+@group
+$ awk '$1 !~ /J/' inventory-shipped
+@print{} Feb 15 32 24 226
+@print{} Mar 15 24 34 228
+@print{} Apr 31 52 63 420
+@print{} May 16 34 29 208
+@dots{}
+@end group
+@end example
+@end table
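+
+The left-hand side of a matching expression does not have to be a
+field. As a small sketch, using a made-up string constant and no input
+file at all:
+
+@example
+$ awk 'BEGIN @{ if ("foobar" ~ /oba/) print "match"
+>              if ("foobar" !~ /xyz/) print "no xyz" @}'
+@print{} match
+@print{} no xyz
+@end example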
+
+@cindex regexp constant
+When a regexp is written enclosed in slashes, like @code{/foo/}, we call it
+a @dfn{regexp constant}, much like @code{5.27} is a numeric constant, and
+@code{"foo"} is a string constant.
+
+@node Escape Sequences, Regexp Operators, Regexp Usage, Regexp
+@section Escape Sequences
+
+@cindex escape sequence notation
+Some characters cannot be included literally in string constants
+(@code{"foo"}) or regexp constants (@code{/foo/}). You represent them
+instead with @dfn{escape sequences}, which are character sequences
+beginning with a backslash (@samp{\}).
+
+One use of an escape sequence is to include a double-quote character in
+a string constant. Since a plain double-quote would end the string, you
+must use @samp{\"} to represent an actual double-quote character as a
+part of the string. For example:
+
+@example
+$ awk 'BEGIN @{ print "He said \"hi!\" to her." @}'
+@print{} He said "hi!" to her.
+@end example
+
+The backslash character itself is another character that cannot be
+included normally; you write @samp{\\} to put one backslash in the
+string or regexp. Thus, the string whose contents are the two characters
+@samp{"} and @samp{\} must be written @code{"\"\\"}.
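+
+As a quick check of that last string (a sketch, assuming a
+POSIX-compliant shell, where single quotes pass the backslashes through
+to @code{awk} unchanged):
+
+@example
+$ awk 'BEGIN @{ print "\"\\" @}'
+@print{} "\
+@end example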
+
+Another use of backslash is to represent unprintable characters
+such as tab or newline. While there is nothing to stop you from entering most
+unprintable characters directly in a string constant or regexp constant,
+they may look ugly.
+
+Here is a table of all the escape sequences used in @code{awk}, and
+what they represent. Unless noted otherwise, all of these escape
+sequences apply to both string constants and regexp constants.
+
+@iftex
+@page
+@end iftex
+@c @cartouche
+@table @code
+@item \\
+A literal backslash, @samp{\}.
+
+@cindex @code{awk} language, V.4 version
+@item \a
+The ``alert'' character, @kbd{Control-g}, ASCII code 7 (BEL).
+
+@item \b
+Backspace, @kbd{Control-h}, ASCII code 8 (BS).
+
+@item \f
+Formfeed, @kbd{Control-l}, ASCII code 12 (FF).
+
+@item \n
+Newline, @kbd{Control-j}, ASCII code 10 (LF).
+
+@item \r
+Carriage return, @kbd{Control-m}, ASCII code 13 (CR).
+
+@item \t
+Horizontal tab, @kbd{Control-i}, ASCII code 9 (HT).
+
+@cindex @code{awk} language, V.4 version
+@item \v
+Vertical tab, @kbd{Control-k}, ASCII code 11 (VT).
+
+@item \@var{nnn}
+The octal value @var{nnn}, where @var{nnn} are one to three digits
+between @samp{0} and @samp{7}. For example, the code for the ASCII ESC
+(escape) character is @samp{\033}.
+
+@cindex @code{awk} language, V.4 version
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+@item \x@var{hh}@dots{}
+The hexadecimal value @var{hh}, where @var{hh} are hexadecimal
+digits (@samp{0} through @samp{9} and either @samp{A} through @samp{F} or
+@samp{a} through @samp{f}). Like the same construct in ANSI C, the escape
+sequence continues until the first non-hexadecimal digit is seen. However,
+using more than two hexadecimal digits produces undefined results. (The
+@samp{\x} escape sequence is not allowed in POSIX @code{awk}.)
+
+@item \/
+A literal slash (necessary for regexp constants only).
+You use this when you wish to write a regexp
+constant that contains a slash. Since the regexp is delimited by
+slashes, you need to escape the slash that is part of the pattern,
+in order to tell @code{awk} to keep processing the rest of the regexp.
+
+@item \"
+A literal double-quote (necessary for string constants only).
+You use this when you wish to write a string
+constant that contains a double-quote. Since the string is delimited by
+double-quotes, you need to escape the quote that is part of the string,
+in order to tell @code{awk} to keep processing the rest of the string.
+@end table
+@c @end cartouche
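+
+The following sketch exercises two entries from the table, assuming
+an ASCII character set and a POSIX-compliant shell. The octal escapes
+@samp{\101} and @samp{\102} stand for @samp{A} and @samp{B}, and
+@samp{\/} puts a literal slash inside a regexp constant (the input
+string @samp{/usr/bin} is made up for illustration):
+
+@example
+$ awk 'BEGIN @{ print "\101\102" @}'
+@print{} AB
+$ echo '/usr/bin' | awk '/\/usr\// @{ print "found" @}'
+@print{} found
+@end example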
+
+In @code{gawk}, there are additional two-character sequences that begin
+with backslash that have special meaning in regexps.
+@xref{GNU Regexp Operators, ,Additional Regexp Operators Only in @code{gawk}}.
+
+In a string constant,
+what happens if you place a backslash before something that is not one of
+the characters listed above? POSIX @code{awk} purposely leaves this case
+undefined. There are two choices.
+
+@itemize @bullet
+@item
+Strip the backslash out. This is what Unix @code{awk} and @code{gawk} both do.
+For example, @code{"a\qc"} is the same as @code{"aqc"}.
+
+@item
+Leave the backslash alone. Some other @code{awk} implementations do this.
+In such implementations, @code{"a\qc"} is the same as if you had typed
+@code{"a\\qc"}.
+@end itemize
+
+In a regexp, a backslash before any character that is not in the above table,
+and not listed in
+@ref{GNU Regexp Operators, ,Additional Regexp Operators Only in @code{gawk}},
+means that the next character should be taken literally, even if it would
+normally be a regexp operator. E.g., @code{/a\+b/} matches the three
+characters @samp{a+b}.
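+
+A small sketch of this, with two made-up input lines supplied by
+@code{printf}; only the line that contains a literal plus sign is
+printed:
+
+@example
+$ printf 'a+b\naab\n' | awk '/a\+b/'
+@print{} a+b
+@end example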
+
+@cindex portability issues
+For complete portability, do not use a backslash before any character not
+listed in the table above.
+
+Another interesting question arises. Suppose you use an octal or hexadecimal
+escape to represent a regexp metacharacter
+(@pxref{Regexp Operators, , Regular Expression Operators}).
+Does @code{awk} treat the character as a literal character, or as a regexp
+operator?
+
+@cindex dark corner
+It turns out that historically, such characters were taken literally (d.c.).
+However, the POSIX standard indicates that they should be treated
+as real metacharacters, and this is what @code{gawk} does.
+In compatibility mode (@pxref{Options, ,Command Line Options}), though,
+@code{gawk} treats the characters represented by octal and hexadecimal
+escape sequences literally when used in regexp constants. Thus, in
+compatibility mode, @code{/a\52b/} is equivalent to @code{/a\*b/}.
+
+To summarize:
+
+@enumerate 1
+@item
+The escape sequences in the table above are always processed first,
+for both string constants and regexp constants. This happens very early,
+as soon as @code{awk} reads your program.
+
+@item
+@code{gawk} processes both regexp constants and dynamic regexps
+(@pxref{Computed Regexps, ,Using Dynamic Regexps}),
+for the special operators listed in
+@ref{GNU Regexp Operators, ,Additional Regexp Operators Only in @code{gawk}}.
+
+@item
+A backslash before any other character means to treat that character
+literally.
+@end enumerate
+
+@node Regexp Operators, GNU Regexp Operators, Escape Sequences, Regexp
+@section Regular Expression Operators
+@cindex metacharacters
+@cindex regular expression metacharacters
+@cindex regexp operators
+
+You can combine regular expressions with the following characters,
+called @dfn{regular expression operators}, or @dfn{metacharacters}, to
+increase the power and versatility of regular expressions.
+
+The escape sequences described
+@iftex
+above
+@end iftex
+in @ref{Escape Sequences},
+are valid inside a regexp. They are introduced by a @samp{\}. They
+are recognized and converted into the corresponding real characters as
+the very first step in processing regexps.
+
+Here is a table of metacharacters. All characters that are not escape
+sequences and that are not listed in the table stand for themselves.
+
+@iftex
+@page
+@end iftex
+@table @code
+@item \
+This is used to suppress the special meaning of a character when
+matching. For example:
+
+@example
+\$
+@end example
+
+@noindent
+matches the character @samp{$}.
+
+@cindex anchors in regexps
+@cindex regexp, anchors
+@item ^
+This matches the beginning of a string. For example:
+
+@example
+^@@chapter
+@end example
+
+@noindent
+matches the @samp{@@chapter} at the beginning of a string, and can be used
+to identify chapter beginnings in Texinfo source files.
+The @samp{^} is known as an @dfn{anchor}, since it anchors the pattern to
+matching only at the beginning of the string.
+
+It is important to realize that @samp{^} does not match the beginning of
+a line embedded in a string. In this example the condition is not true:
+
+@example
+if ("line1\nLINE 2" ~ /^L/) @dots{}
+@end example
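+
+You can confirm this by running the test directly. The following is
+only a sketch; it wraps the same comparison in a @code{BEGIN} rule and
+adds an @code{else} branch so that the outcome is visible:
+
+@example
+$ awk 'BEGIN @{ if ("line1\nLINE 2" ~ /^L/) print "match"
+>              else print "no match" @}'
+@print{} no match
+@end example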
+
+@item $
+This is similar to @samp{^}, but it matches only at the end of a string.
+For example:
+
+@example
+p$
+@end example
+
+@noindent
+matches a record that ends with a @samp{p}. The @samp{$} is also an anchor,
+and also does not match the end of a line embedded in a string. In this
+example the condition is not true:
+
+@example
+if ("line1\nLINE 2" ~ /1$/) @dots{}
+@end example
+
+@item .
+The period, or dot, matches any single character,
+@emph{including} the newline character. For example:
+
+@example
+.P
+@end example
+
+@noindent
+matches any single character followed by a @samp{P} in a string. Using
+concatenation we can make a regular expression like @samp{U.A}, which
+matches any three-character sequence that begins with @samp{U} and ends
+with @samp{A}.
+
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+In strict POSIX mode (@pxref{Options, ,Command Line Options}),
+@samp{.} does not match the @sc{nul}
+character, which is a character with all bits equal to zero.
+Otherwise, @sc{nul} is just another character. Other versions of @code{awk}
+may not be able to match the @sc{nul} character.
+
+@ignore
+2e: Add stuff that character list is the POSIX terminology. In other
+ literature known as character set or character class.
+@end ignore
+
+@cindex character list
+@item [@dots{}]
+This is called a @dfn{character list}. It matches any @emph{one} of the
+characters that are enclosed in the square brackets. For example:
+
+@example
+[MVX]
+@end example
+
+@noindent
+matches any one of the characters @samp{M}, @samp{V}, or @samp{X} in a
+string.
+
+Ranges of characters are indicated by using a hyphen between the beginning
+and ending characters, and enclosing the whole thing in brackets. For
+example:
+
+@example
+[0-9]
+@end example
+
+@noindent
+matches any digit.
+Multiple ranges are allowed. E.g., the list @code{@w{[A-Za-z0-9]}} is a
+common way to express the idea of ``all alphanumeric characters.''
+
+To include one of the characters @samp{\}, @samp{]}, @samp{-} or @samp{^} in a
+character list, put a @samp{\} in front of it. For example:
+
+@example
+[d\]]
+@end example
+
+@noindent
+matches either @samp{d}, or @samp{]}.
+
+@cindex @code{egrep}
+This treatment of @samp{\} in character lists
+is compatible with other @code{awk}
+implementations, and is also mandated by POSIX.
+The regular expressions in @code{awk} are a superset
+of the POSIX specification for Extended Regular Expressions (EREs).
+POSIX EREs are based on the regular expressions accepted by the
+traditional @code{egrep} utility.
+
+@cindex character classes
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+@dfn{Character classes} are a new feature introduced in the POSIX standard.
+A character class is a special notation for describing
+lists of characters that have a specific attribute, but where the
+actual characters themselves can vary from country to country and/or
+from character set to character set. For example, the notion of what
+is an alphabetic character differs in the USA and in France.
+
+A character class is only valid in a regexp @emph{inside} the
+brackets of a character list. Character classes consist of @samp{[:},
+a keyword denoting the class, and @samp{:]}. Here are the character
+classes defined by the POSIX standard.
+
+@table @code
+@item [:alnum:]
+Alphanumeric characters.
+
+@item [:alpha:]
+Alphabetic characters.
+
+@item [:blank:]
+Space and tab characters.
+
+@item [:cntrl:]
+Control characters.
+
+@item [:digit:]
+Numeric characters.
+
+@item [:graph:]
+Characters that are printable and are also visible.
+(A space is printable, but not visible, while an @samp{a} is both.)
+
+@item [:lower:]
+Lower-case alphabetic characters.
+
+@item [:print:]
+Printable characters (characters that are not control characters).
+
+@item [:punct:]
+Punctuation characters (characters that are not letters, digits,
+control characters, or space characters).
+
+@item [:space:]
+Space characters (such as space, tab, and formfeed, to name a few).
+
+@item [:upper:]
+Upper-case alphabetic characters.
+
+@item [:xdigit:]
+Characters that are hexadecimal digits.
+@end table
+
+For example, before the POSIX standard, to match alphanumeric
+characters, you had to write @code{/[A-Za-z0-9]/}. If your
+character set had other alphabetic characters in it, this would not
+match them. With the POSIX character classes, you can write
+@code{/[[:alnum:]]/}, and this will match @emph{all} the alphabetic
+and numeric characters in your character set.
+
+@cindex collating elements
+Two additional special sequences can appear in character lists.
+These apply to non-ASCII character sets, which can have single symbols
+(called @dfn{collating elements}) that are represented with more than one
+character, as well as several characters that are equivalent for
+@dfn{collating}, or sorting, purposes. (E.g., in French, a plain ``e''
+and a grave-accented
+@iftex
+``@`e''
+@end iftex
+@ifinfo
+``e''
+@end ifinfo
+are equivalent.)
+
+@table @asis
+@cindex collating symbols
+@item Collating Symbols
+A @dfn{collating symbol} is a multi-character collating element enclosed in
+@samp{[.} and @samp{.]}. For example, if @samp{ch} is a collating element,
+then @code{[[.ch.]]} is a regexp that matches this collating element, while
+@code{[ch]} is a regexp that matches either @samp{c} or @samp{h}.
+
+@cindex equivalence classes
+@item Equivalence Classes
+An @dfn{equivalence class} is a list of equivalent characters enclosed in
+@samp{[=} and @samp{=]}.
+@iftex
+Thus, @code{[[=e@`e=]]} is a regexp that matches either @samp{e} or @samp{@`e}.
+@end iftex
+@ifinfo
+Because Info files use plain ASCII characters, it is not possible to present
+a realistic equivalence class example here.
+@end ifinfo
+@end table
+
+These features are very valuable in non-English-speaking locales.
+
+@strong{Caution:} The library functions that @code{gawk} uses for regular
+expression matching currently only recognize POSIX character classes;
+they do not recognize collating symbols or equivalence classes.
+@c maybe one day ...
+
+@cindex complemented character list
+@cindex character list, complemented
+@item [^ @dots{}]
+This is a @dfn{complemented character list}. The first character after
+the @samp{[} @emph{must} be a @samp{^}. It matches any characters
+@emph{except} those in the square brackets, or newline. For example:
+
+@example
+[^0-9]
+@end example
+
+@noindent
+matches any character that is not a digit.
+
+@item |
+This is the @dfn{alternation operator}, and it is used to specify
+alternatives. For example:
+
+@example
+^P|[0-9]
+@end example
+
+@noindent
+matches any string that matches either @samp{^P} or @samp{[0-9]}. This
+means it matches any string that starts with @samp{P} or contains a digit.
+
+The alternation applies to the largest possible regexps on either side.
+In other words, @samp{|} has the lowest precedence of all the regular
+expression operators.
+
+@item (@dots{})
+Parentheses are used for grouping in regular expressions as in
+arithmetic. They can be used to concatenate regular expressions
+containing the alternation operator, @samp{|}. For example,
+@samp{@@(samp|code)\@{[^@}]+\@}} matches both @samp{@@code@{foo@}} and
+@samp{@@samp@{bar@}}. (These are Texinfo formatting control sequences.)
+
+@item *
+This symbol means that the preceding regular expression is to be
+repeated as many times as necessary to find a match. For example:
+
+@example
+ph*
+@end example
+
+@noindent
+applies the @samp{*} symbol to the preceding @samp{h} and looks for matches
+of one @samp{p} followed by any number of @samp{h}s. This will also match
+just @samp{p} if no @samp{h}s are present.
+
+The @samp{*} repeats the @emph{smallest} possible preceding expression.
+(Use parentheses if you wish to repeat a larger expression.) It finds
+as many repetitions as possible. For example:
+
+@example
+awk '/\(c[ad][ad]*r x\)/ @{ print @}' sample
+@end example
+
+@noindent
+prints every record in @file{sample} containing a string of the form
+@samp{(car x)}, @samp{(cdr x)}, @samp{(cadr x)}, and so on.
+Notice the escaping of the parentheses by preceding them
+with backslashes.
+
+@item +
+This symbol is similar to @samp{*}, but the preceding expression must be
+matched at least once. This means that:
+
+@example
+wh+y
+@end example
+
+@noindent
+would match @samp{why} and @samp{whhy} but not @samp{wy}, whereas
+@samp{wh*y} would match all three of these strings. This is a simpler
+way of writing the last @samp{*} example:
+
+@example
+awk '/\(c[ad]+r x\)/ @{ print @}' sample
+@end example
+
+@item ?
+This symbol is similar to @samp{*}, but the preceding expression can be
+matched either once or not at all. For example:
+
+@example
+fe?d
+@end example
+
+@noindent
+will match @samp{fed} and @samp{fd}, but nothing else.
+
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+@cindex interval expressions
+@item @{@var{n}@}
+@itemx @{@var{n},@}
+@itemx @{@var{n},@var{m}@}
+One or two numbers inside braces denote an @dfn{interval expression}.
+If there is one number in the braces, the preceding regexp is repeated
+@var{n} times.
+If there are two numbers separated by a comma, the preceding regexp is
+repeated @var{n} to @var{m} times.
+If there is one number followed by a comma, then the preceding regexp
+is repeated at least @var{n} times.
+
+@table @code
+@item wh@{3@}y
+matches @samp{whhhy} but not @samp{why} or @samp{whhhhy}.
+
+@item wh@{3,5@}y
+matches @samp{whhhy} or @samp{whhhhy} or @samp{whhhhhy}, only.
+
+@item wh@{2,@}y
+matches @samp{whhy} or @samp{whhhy}, and so on.
+@end table
+
+Interval expressions were not traditionally available in @code{awk}.
+They were added as part of the POSIX standard, to make @code{awk}
+and @code{egrep} consistent with each other.
+
+However, since old programs may use @samp{@{} and @samp{@}} in regexp
+constants, by default @code{gawk} does @emph{not} match interval expressions
+in regexps. If either @samp{--posix} or @samp{--re-interval} is specified
+(@pxref{Options, , Command Line Options}), then interval expressions
+are allowed in regexps.
+@end table
+
+@cindex precedence, regexp operators
+@cindex regexp operators, precedence of
+In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators,
+as well as the braces @samp{@{} and @samp{@}},
+have
+the highest precedence, followed by concatenation, and finally by @samp{|}.
+As in arithmetic, parentheses can change how operators are grouped.
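+
+As a rough illustration of these precedence rules, the following shell
+sketch runs a few patterns through @code{awk}. The sample words are
+invented for this illustration, and the sketch assumes a POSIX-compliant
+@code{awk} is installed as @file{awk}.

```shell
# Alternation has the lowest precedence: /^P|[0-9]/ means (^P) or ([0-9]),
# so a line matches if it starts with "P" or contains a digit anywhere.
alt=$(printf 'Pat\nx9\nabc\n' | awk '/^P|[0-9]/')

# Repetition binds tighter than concatenation: in /wh+y/ the "+" applies
# only to the "h", so "wy" (no "h" at all) is rejected.
plus=$(printf 'wy\nwhy\nwhhy\n' | awk '/^wh+y$/')

# Parentheses make "+" repeat the whole group "ab" instead of just "b".
group=$(printf 'abab\nabb\n' | awk '/^(ab)+$/')

printf '%s\n' "$alt" "$plus" "$group"
```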
+
+If @code{gawk} is in compatibility mode
+(@pxref{Options, ,Command Line Options}),
+character classes and interval expressions are not available in
+regular expressions.
+
+The next
+@ifinfo
+node
+@end ifinfo
+@iftex
+section
+@end iftex
+discusses the GNU-specific regexp operators, and provides
+more detail concerning how command line options affect the way @code{gawk}
+interprets the characters in regular expressions.
+
+@node GNU Regexp Operators, Case-sensitivity, Regexp Operators, Regexp
+@section Additional Regexp Operators Only in @code{gawk}
+
+@c This section adapted from the regex-0.12 manual
+
+@cindex regexp operators, GNU specific
+GNU software that deals with regular expressions provides a number of
+additional regexp operators. These operators are described in this
+section, and are specific to @code{gawk}; they are not available in other
+@code{awk} implementations.
+
+@cindex word, regexp definition of
+Most of the additional operators are for dealing with word matching.
+For our purposes, a @dfn{word} is a sequence of one or more letters, digits,
+or underscores (@samp{_}).
+
+@table @code
+@cindex @code{\w} regexp operator
+@item \w
+This operator matches any word-constituent character, i.e.@: any
+letter, digit, or underscore. Think of it as a short-hand for
+@c @w{@code{[A-Za-z0-9_]}} or
+@w{@code{[[:alnum:]_]}}.
+
+@cindex @code{\W} regexp operator
+@item \W
+This operator matches any character that is not word-constituent.
+Think of it as a short-hand for
+@c @w{@code{[^A-Za-z0-9_]}} or
+@w{@code{[^[:alnum:]_]}}.
+
+@cindex @code{\<} regexp operator
+@item \<
+This operator matches the empty string at the beginning of a word.
+For example, @code{/\<away/} matches @samp{away}, but not
+@samp{stowaway}.
+
+@cindex @code{\>} regexp operator
+@item \>
+This operator matches the empty string at the end of a word.
+For example, @code{/stow\>/} matches @samp{stow}, but not @samp{stowaway}.
+
+@cindex @code{\y} regexp operator
+@cindex word boundaries, matching
+@item \y
+This operator matches the empty string at either the beginning or the
+end of a word (the word boundar@strong{y}). For example, @samp{\yballs?\y}
+matches either @samp{ball} or @samp{balls} as a separate word.
+
+@cindex @code{\B} regexp operator
+@item \B
+This operator matches the empty string within a word. In other words,
+@samp{\B} matches the empty string that occurs between two
+word-constituent characters. For example,
+@code{/\Brat\B/} matches @samp{crate}, but it does not match @samp{dirty rat}.
+@samp{\B} is essentially the opposite of @samp{\y}.
+@end table
+
+There are two other operators that work on buffers. In Emacs, a
+@dfn{buffer} is, naturally, an Emacs buffer. For other programs, the
+regexp library routines that @code{gawk} uses consider the entire
+string to be matched as the buffer.
+
+For @code{awk}, since @samp{^} and @samp{$} always work in terms
+of the beginning and end of strings, these operators don't add any
+new capabilities. They are provided for compatibility with other GNU
+software.
+
+@cindex buffer matching operators
+@table @code
+@cindex @code{\`} regexp operator
+@item \`
+This operator matches the empty string at the
+beginning of the buffer.
+
+@cindex @code{\'} regexp operator
+@item \'
+This operator matches the empty string at the
+end of the buffer.
+@end table
+
+In other GNU software, the word boundary operator is @samp{\b}. However,
+that conflicts with the @code{awk} language's definition of @samp{\b}
+as backspace, so @code{gawk} uses a different letter.
+
+An alternative method would have been to require two backslashes in the
+GNU operators, but this was deemed to be too confusing, and the current
+method of using @samp{\y} for the GNU @samp{\b} appears to be the
+lesser of two evils.
+
+@c NOTE!!! Keep this in sync with the same table in the summary appendix!
+@cindex regexp, effect of command line options
+The various command line options
+(@pxref{Options, ,Command Line Options})
+control how @code{gawk} interprets characters in regexps.
+
+@table @asis
+@item No options
+In the default case, @code{gawk} provides all the facilities of
+POSIX regexps and the GNU regexp operators described
+@iftex
+above.
+@end iftex
+@ifinfo
+in @ref{Regexp Operators, ,Regular Expression Operators}.
+@end ifinfo
+However, interval expressions are not supported.
+
+@item @code{--posix}
+Only POSIX regexps are supported; the GNU operators are not special
+(e.g., @samp{\w} matches a literal @samp{w}). Interval expressions
+are allowed.
+
+@item @code{--traditional}
+Traditional Unix @code{awk} regexps are matched. The GNU operators
+are not special, interval expressions are not available, and neither
+are the POSIX character classes (@code{[[:alnum:]]} and so on).
+Characters described by octal and hexadecimal escape sequences are
+treated literally, even if they represent regexp metacharacters.
+
+@item @code{--re-interval}
+Allow interval expressions in regexps, even if @samp{--traditional}
+has been provided.
+@end table
+
+@node Case-sensitivity, Leftmost Longest, GNU Regexp Operators, Regexp
+@section Case-sensitivity in Matching
+
+@cindex case sensitivity
+@cindex ignoring case
+Case is normally significant in regular expressions, both when matching
+ordinary characters (i.e.@: not metacharacters), and inside character
+sets. Thus a @samp{w} in a regular expression matches only a lower-case
+@samp{w} and not an upper-case @samp{W}.
+
+The simplest way to do a case-independent match is to use a character
+list: @samp{[Ww]}. However, this can be cumbersome if you need to use it
+often, and it can make the regular expressions harder to
+read. There are two alternatives that you might prefer.
+
+One way to do a case-insensitive match at a particular point in the
+program is to convert the data to a single case, using the
+@code{tolower} or @code{toupper} built-in string functions (which we
+haven't discussed yet;
+@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+For example:
+
+@example
+tolower($1) ~ /foo/ @{ @dots{} @}
+@end example
+
+@noindent
+converts the first field to lower-case before matching against it.
+This will work in any POSIX-compliant implementation of @code{awk}.
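+
+A minimal sketch of this approach follows, using made-up input lines.
+Only the value being tested is converted, so the record itself is
+printed with its original capitalization.

```shell
# Match "foo" in the first field regardless of case.  tolower() returns a
# converted copy, so $0 keeps the capitalization found in the input.
out=$(printf 'FooBar 1\nbaz 2\nmacFOO 3\n' | awk 'tolower($1) ~ /foo/ { print $0 }')
printf '%s\n' "$out"
```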
+
+@cindex differences between @code{gawk} and @code{awk}
+@cindex @code{~} operator
+@cindex @code{!~} operator
+@vindex IGNORECASE
+Another method, specific to @code{gawk}, is to set the variable
+@code{IGNORECASE} to a non-zero value (@pxref{Built-in Variables}).
+When @code{IGNORECASE} is not zero, @emph{all} regexp and string
+operations ignore case. Changing the value of
+@code{IGNORECASE} dynamically controls the case sensitivity of your
+program as it runs. Case is significant by default because
+@code{IGNORECASE} (like most variables) is initialized to zero.
+
+@example
+x = "aB"
+if (x ~ /ab/) @dots{} # this test will fail
+
+IGNORECASE = 1
+if (x ~ /ab/) @dots{} # now it will succeed
+@end example
+
+In general, you cannot use @code{IGNORECASE} to make certain rules
+case-insensitive and other rules case-sensitive, because there is no way
+to set @code{IGNORECASE} just for the pattern of a particular rule.
+@ignore
+This isn't quite true. Consider:
+
+ IGNORECASE=1 && /foObAr/ { .... }
+ IGNORECASE=0 || /foobar/ { .... }
+
+But that's pretty bad style and I don't want to get into it at this
+late date.
+@end ignore
+To do this, you must use character lists or @code{tolower}. However, one
+thing you can do only with @code{IGNORECASE} is turn case-sensitivity on
+or off dynamically for all the rules at once.
+
+@code{IGNORECASE} can be set on the command line, or in a @code{BEGIN} rule
+(@pxref{Other Arguments, ,Other Command Line Arguments}; also
+@pxref{Using BEGIN/END, ,Startup and Cleanup Actions}).
+Setting @code{IGNORECASE} from the command line is a way to make
+a program case-insensitive without having to edit it.
+
+Prior to version 3.0 of @code{gawk}, the value of @code{IGNORECASE}
+only affected regexp operations. It did not affect string comparison
+with @samp{==}, @samp{!=}, and so on.
+Beginning with version 3.0, both regexp and string comparison
+operations are affected by @code{IGNORECASE}.
+
+@cindex ISO 8859-1
+@cindex ISO Latin-1
+Beginning with version 3.0 of @code{gawk}, the equivalences between upper-case
+and lower-case characters are based on the ISO-8859-1 (ISO Latin-1)
+character set. This character set is a superset of the traditional 128
+ASCII characters, and it also provides a number of characters suitable
+for use with European languages.
+@ignore
+A pure ASCII character set can be used instead if @code{gawk} is compiled
+with @samp{-DUSE_PURE_ASCII}.
+@end ignore
+
+The value of @code{IGNORECASE} has no effect if @code{gawk} is in
+compatibility mode (@pxref{Options, ,Command Line Options}).
+Case is always significant in compatibility mode.
+
+@node Leftmost Longest, Computed Regexps, Case-sensitivity, Regexp
+@section How Much Text Matches?
+
+@cindex leftmost longest match
+@cindex matching, leftmost longest
+Consider the following example:
+
+@example
+echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'
+@end example
+
+This example uses the @code{sub} function (which we haven't discussed yet,
+@pxref{String Functions, ,Built-in Functions for String Manipulation})
+to make a change to the input record. Here, the regexp @code{/a+/}
+indicates ``one or more @samp{a} characters,'' and the replacement
+text is @samp{<A>}.
+
+The input contains four @samp{a} characters. What will the output be?
+In other words, how many is ``one or more''---will @code{awk} match two,
+three, or all four @samp{a} characters?
+
+The answer is, @code{awk} (and POSIX) regular expressions always match
+the leftmost, @emph{longest} sequence of input characters that can
+match. Thus, in this example, all four @samp{a} characters are
+replaced with @samp{<A>}.
+
+@example
+$ echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}'
+@print{} <A>bcd
+@end example
+
+For simple match/no-match tests, this is not so important. But it is
+very important when doing text matching and substitutions with the
+@code{match}, @code{sub}, @code{gsub}, and @code{gensub} functions.
+@ifinfo
+@xref{String Functions, ,Built-in Functions for String Manipulation},
+for more information on these functions.
+@end ifinfo
+Understanding this principle is also important for regexp-based record
+and field splitting (@pxref{Records, ,How Input is Split into Records},
+and also @pxref{Field Separators, ,Specifying How Fields are Separated}).
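+
+The @code{match} function makes the leftmost-longest rule easy to
+observe, since it reports where the match starts (@code{RSTART}) and
+how long it is (@code{RLENGTH}). The following sketch reuses the input
+from the example above and adds a second, invented input string to show
+that ``leftmost'' is decided before ``longest.''

```shell
# /a+/ could match 1, 2, 3, or 4 "a"s starting at position 1;
# awk picks the longest candidate, so RLENGTH is 4.
out=$(echo aaaabcd | awk '{ if (match($0, /a+/)) print RSTART, RLENGTH }')
echo "$out"

# "Leftmost" takes priority over "longest": the single "a" at position 2
# wins over the longer run "aaa" later in the string.
out2=$(echo xabaaa | awk '{ if (match($0, /a+/)) print RSTART, RLENGTH }')
echo "$out2"
```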
+
+@node Computed Regexps, , Leftmost Longest, Regexp
+@section Using Dynamic Regexps
+
+@cindex computed regular expressions
+@cindex regular expressions, computed
+@cindex dynamic regular expressions
+@cindex regexp, dynamic
+@cindex @code{~} operator
+@cindex @code{!~} operator
+The right hand side of a @samp{~} or @samp{!~} operator need not be a
+regexp constant (i.e.@: a string of characters between slashes). It may
+be any expression. The expression is evaluated, and converted if
+necessary to a string; the contents of the string are used as the
+regexp. A regexp that is computed in this way is called a @dfn{dynamic
+regexp}. For example:
+
+@example
+BEGIN @{ identifier_regexp = "[A-Za-z_][A-Za-z_0-9]*" @}
+$0 ~ identifier_regexp @{ print @}
+@end example
+
+@noindent
+sets @code{identifier_regexp} to a regexp that describes @code{awk}
+variable names, and tests if the input record matches this regexp.
+
+@strong{Caution:} When using the @samp{~} and @samp{!~}
+operators, there is a difference between a regexp constant
+enclosed in slashes, and a string constant enclosed in double quotes.
+If you are going to use a string constant, you have to understand that
+the string is in essence scanned @emph{twice}; the first time when
+@code{awk} reads your program, and the second time when it goes to
+match the string on the left-hand side of the operator with the pattern
+on the right. This is true of any string valued expression (such as
+@code{identifier_regexp} above), not just string constants.
+
+@cindex regexp constants, difference between slashes and quotes
+What difference does it make if the string is
+scanned twice? The answer has to do with escape sequences, and particularly
+with backslashes. To get a backslash into a regular expression inside a
+string, you have to type two backslashes.
+
+For example, @code{/\*/} is a regexp constant for a literal @samp{*}.
+Only one backslash is needed. To do the same thing with a string,
+you would have to type @code{"\\*"}. The first backslash escapes the
+second one, so that the string actually contains the
+two characters @samp{\} and @samp{*}.
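+
+To see the two levels of scanning at work, the following sketch matches
+a literal @samp{*} both ways. The input lines are invented for the
+illustration; both forms should select the same line.

```shell
# Regexp constant: one backslash is enough to make "*" literal.
const=$(printf 'a*b\nab\n' | awk '$0 ~ /\*/')

# String constant: awk's string scanner turns "\\*" into the two
# characters \* before the regexp matcher ever sees them.
str=$(printf 'a*b\nab\n' | awk '$0 ~ "\\*"')

printf '%s\n' "$const" "$str"
```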
+
+@cindex common mistakes
+@cindex mistakes, common
+@cindex errors, common
+Given that you can use both regexp and string constants to describe
+regular expressions, which should you use? The answer is ``regexp
+constants,'' for several reasons.
+
+@enumerate 1
+@item
+String constants are more complicated to write, and
+more difficult to read. Using regexp constants makes your programs
+less error-prone. Not understanding the difference between the two
+kinds of constants is a common source of errors.
+
+@item
+It is also more efficient to use regexp constants: @code{awk} can note
+that you have supplied a regexp and store it internally in a form that
+makes pattern matching more efficient. When using a string constant,
+@code{awk} must first convert the string into this internal form, and
+then perform the pattern matching.
+
+@item
+Using regexp constants is better style; it shows clearly that you
+intend a regexp match.
+@end enumerate
+
+@node Reading Files, Printing, Regexp, Top
+@chapter Reading Input Files
+
+@cindex reading files
+@cindex input
+@cindex standard input
+@vindex FILENAME
+In the typical @code{awk} program, all input is read either from the
+standard input (by default the keyboard, but often a pipe from another
+command) or from files whose names you specify on the @code{awk} command
+line. If you specify input files, @code{awk} reads them in order, reading
+all the data from one before going on to the next. The name of the current
+input file can be found in the built-in variable @code{FILENAME}
+(@pxref{Built-in Variables}).
+
+The input is read in units called @dfn{records}, and processed by the
+rules of your program one record at a time.
+By default, each record is one line. Each
+record is automatically split into chunks called @dfn{fields}.
+This makes it more convenient for programs to work on the parts of a record.
+
+On rare occasions you will need to use the @code{getline} command.
+The @code{getline} command is valuable, both because it
+can do explicit input from any number of files, and because the files
+used with it do not have to be named on the @code{awk} command line
+(@pxref{Getline, ,Explicit Input with @code{getline}}).
+
+@menu
+* Records:: Controlling how data is split into records.
+* Fields:: An introduction to fields.
+* Non-Constant Fields:: Non-constant Field Numbers.
+* Changing Fields:: Changing the Contents of a Field.
+* Field Separators:: The field separator and how to change it.
+* Constant Size:: Reading constant width data.
+* Multiple Line:: Reading multi-line records.
+* Getline:: Reading files under explicit program control
+ using the @code{getline} function.
+@end menu
+
+@node Records, Fields, Reading Files, Reading Files
+@section How Input is Split into Records
+
+@cindex record separator, @code{RS}
+@cindex changing the record separator
+@cindex record, definition of
+@vindex RS
+The @code{awk} utility divides the input for your @code{awk}
+program into records and fields.
+Records are separated by a character called the @dfn{record separator}.
+By default, the record separator is the newline character.
+This is why records are, by default, single lines.
+You can use a different character for the record separator by
+assigning the character to the built-in variable @code{RS}.
+
+You can change the value of @code{RS} in the @code{awk} program,
+like any other variable, with the
+assignment operator, @samp{=} (@pxref{Assignment Ops, ,Assignment Expressions}).
+The new record-separator character should be enclosed in quotation marks,
+which indicate
+a string constant. Often the right time to do this is at the beginning
+of execution, before any input has been processed, so that the very
+first record will be read with the proper separator. To do this, use
+the special @code{BEGIN} pattern
+(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). For
+example:
+
+@example
+awk 'BEGIN @{ RS = "/" @} ; @{ print $0 @}' BBS-list
+@end example
+
+@noindent
+changes the value of @code{RS} to @code{"/"}, before reading any input.
+This is a string whose first character is a slash; as a result, records
+are separated by slashes. Then the input file is read, and the second
+rule in the @code{awk} program (the action with no pattern) prints each
+record. Since each @code{print} statement adds a newline at the end of
+its output, the effect of this @code{awk} program is to copy the input
+with each slash changed to a newline. Here are the results of running
+the program on @file{BBS-list}:
+
+@example
+@group
+$ awk 'BEGIN @{ RS = "/" @} ; @{ print $0 @}' BBS-list
+@print{} aardvark 555-5553 1200
+@print{} 300 B
+@print{} alpo-net 555-3412 2400
+@print{} 1200
+@print{} 300 A
+@print{} barfly 555-7685 1200
+@print{} 300 A
+@print{} bites 555-1675 2400
+@print{} 1200
+@print{} 300 A
+@print{} camelot 555-0542 300 C
+@print{} core 555-2912 1200
+@print{} 300 C
+@print{} fooey 555-1234 2400
+@print{} 1200
+@print{} 300 B
+@print{} foot 555-6699 1200
+@print{} 300 B
+@print{} macfoo 555-6480 1200
+@print{} 300 A
+@print{} sdace 555-3430 2400
+@print{} 1200
+@print{} 300 A
+@print{} sabafoo 555-2127 1200
+@print{} 300 C
+@print{}
+@end group
+@end example
+
+@noindent
+Note that the entry for the @samp{camelot} BBS is not split.
+In the original data file
+(@pxref{Sample Data Files, , Data Files for the Examples}),
+the line looks like this:
+
+@example
+camelot 555-0542 300 C
+@end example
+
+@noindent
+It only has one baud rate; there are no slashes in the record.
+
+Another way to change the record separator is on the command line,
+using the variable-assignment feature
+(@pxref{Other Arguments, ,Other Command Line Arguments}).
+
+@example
+awk '@{ print $0 @}' RS="/" BBS-list
+@end example
+
+@noindent
+This sets @code{RS} to @samp{/} before processing @file{BBS-list}.
+
+Using an unusual character such as @samp{/} for the record separator
+produces correct behavior in the vast majority of cases. However,
+the following (extreme) pipeline prints a surprising @samp{1}. There
+is one field, consisting of a newline. The value of the built-in
+variable @code{NF} is the number of fields in the current record.
+
+@example
+$ echo | awk 'BEGIN @{ RS = "a" @} ; @{ print NF @}'
+@print{} 1
+@end example
+
+@cindex dark corner
+@noindent
+Reaching the end of an input file terminates the current input record,
+even if the last character in the file is not the character in @code{RS}
+(d.c.).
+
+@cindex empty string
+The empty string, @code{""} (a string of no characters), has a special meaning
+as the value of @code{RS}: it means that records are separated
+by one or more blank lines, and nothing else.
+@xref{Multiple Line, ,Multiple-Line Records}, for more details.
+
+If you change the value of @code{RS} in the middle of an @code{awk} run,
+the new value is used to delimit subsequent records, but the record
+currently being processed (and records already processed) are not
+affected.
+
+@vindex RT
+@cindex record terminator, @code{RT}
+@cindex terminator, record
+@cindex differences between @code{gawk} and @code{awk}
+After the end of the record has been determined, @code{gawk}
+sets the variable @code{RT} to the text in the input that matched
+@code{RS}.
+
+@cindex regular expressions as record separators
+The value of @code{RS} is in fact not limited to a one-character
+string. It can be any regular expression
+(@pxref{Regexp, ,Regular Expressions}).
+In general, each record
+ends at the next string that matches the regular expression; the next
+record starts at the end of the matching string. This general rule is
+actually at work in the usual case, where @code{RS} contains just a
+newline: a record ends at the beginning of the next matching string (the
+next newline in the input) and the following record starts just after
+the end of this string (at the first character of the following line).
+The newline, since it matches @code{RS}, is not part of either record.
+
+When @code{RS} is a single character, @code{RT} will
+contain the same single character. However, when @code{RS} is a
+regular expression, then @code{RT} becomes more useful; it contains
+the actual input text that matched the regular expression.
+
+The following example illustrates both of these features.
+It sets @code{RS} equal to a regular expression that
+matches either a newline, or a series of one or more upper-case letters
+with optional leading and/or trailing white space
+(@pxref{Regexp, , Regular Expressions}).
+
+@example
+$ echo record 1 AAAA record 2 BBBB record 3 |
+> gawk 'BEGIN @{ RS = "\n|( *[[:upper:]]+ *)" @}
+> @{ print "Record =", $0, "and RT =", RT @}'
+@print{} Record = record 1 and RT = AAAA
+@print{} Record = record 2 and RT = BBBB
+@print{} Record = record 3 and RT =
+@print{}
+@end example
+
+@noindent
+The final line of output has an extra blank line. This is because the
+value of @code{RT} is a newline, and then the @code{print} statement
+supplies its own terminating newline.
+
+@xref{Simple Sed, ,A Simple Stream Editor}, for a more useful example
+of @code{RS} as a regexp and @code{RT}.
+
+@cindex differences between @code{gawk} and @code{awk}
+The use of @code{RS} as a regular expression and the @code{RT}
+variable are @code{gawk} extensions; they are not available in
+compatibility mode
+(@pxref{Options, ,Command Line Options}).
+In compatibility mode, only the first character of the value of
+@code{RS} is used to determine the end of the record.
+
+@cindex number of records, @code{NR}, @code{FNR}
+@vindex NR
+@vindex FNR
+The @code{awk} utility keeps track of the number of records that have
+been read so far from the current input file. This value is stored in a
+built-in variable called @code{FNR}. It is reset to zero when a new
+file is started. Another built-in variable, @code{NR}, is the total
+number of input records read so far from all data files. It starts at zero
+but is never automatically reset to zero.
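+
+The difference between @code{FNR} and @code{NR} only shows up when
+there is more than one input file. The sketch below assumes the
+@code{mktemp} utility is available for creating two scratch files; the
+file contents are arbitrary.

```shell
# Create two small scratch files and print FNR and NR for every record.
f1=$(mktemp); f2=$(mktemp)
printf 'a\nb\n' > "$f1"
printf 'c\nd\n' > "$f2"

# FNR restarts at 1 for the second file; NR keeps counting.
out=$(awk '{ print $0, FNR, NR }' "$f1" "$f2")
printf '%s\n' "$out"
rm -f "$f1" "$f2"
```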
+
+@node Fields, Non-Constant Fields, Records, Reading Files
+@section Examining Fields
+
+@cindex examining fields
+@cindex fields
+@cindex accessing fields
+When @code{awk} reads an input record, the record is
+automatically separated or @dfn{parsed} by the interpreter into chunks
+called @dfn{fields}. By default, fields are separated by whitespace,
+like words in a line.
+Whitespace in @code{awk} means any string of one or more spaces and/or
+tabs; other characters such as newline, formfeed, and so on, that are
+considered whitespace by other languages are @emph{not} considered
+whitespace by @code{awk}.
+
+The purpose of fields is to make it more convenient for you to refer to
+these pieces of the record. You don't have to use them---you can
+operate on the whole record if you wish---but fields are what make
+simple @code{awk} programs so powerful.
+
+@cindex @code{$} (field operator)
+@cindex field operator @code{$}
+To refer to a field in an @code{awk} program, you use a dollar-sign,
+@samp{$}, followed by the number of the field you want. Thus, @code{$1}
+refers to the first field, @code{$2} to the second, and so on. For
+example, suppose the following is a line of input:
+
+@example
+This seems like a pretty nice example.
+@end example
+
+@noindent
+Here the first field, or @code{$1}, is @samp{This}; the second field, or
+@code{$2}, is @samp{seems}; and so on. Note that the last field,
+@code{$7}, is @samp{example.}. Because there is no space between the
+@samp{e} and the @samp{.}, the period is considered part of the seventh
+field.
+
+@vindex NF
+@cindex number of fields, @code{NF}
+@code{NF} is a built-in variable whose value
+is the number of fields in the current record.
+@code{awk} updates the value of @code{NF} automatically, each time
+a record is read.
+
+No matter how many fields there are, the last field in a record can be
+represented by @code{$NF}. So, in the example above, @code{$NF} would
+be the same as @code{$7}, which is @samp{example.}. Why this works is
+explained below (@pxref{Non-Constant Fields, ,Non-constant Field Numbers}).
+If you try to reference a field beyond the last one, such as @code{$8}
+when the record has only seven fields, you get the empty string.
+@c the empty string acts like 0 in some contexts, but I don't want to
+@c get into that here....
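+
+The sample line above can be fed to @code{awk} to check these claims.
+This is only a sketch; the square brackets are added to make the empty
+value of the nonexistent eighth field visible.

```shell
line='This seems like a pretty nice example.'

# NF is the field count; $NF is the last field; $8 does not exist,
# so it is the empty string (shown here between brackets).
out=$(echo "$line" | awk '{ print NF; print $NF; print "[" $8 "]" }')
printf '%s\n' "$out"
```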
+
+@code{$0}, which looks like a reference to the ``zeroth'' field, is
+a special case: it represents the whole input record. @code{$0} is
+used when you are not interested in fields.
+
+Here are some more examples:
+
+@example
+@group
+$ awk '$1 ~ /foo/ @{ print $0 @}' BBS-list
+@print{} fooey 555-1234 2400/1200/300 B
+@print{} foot 555-6699 1200/300 B
+@print{} macfoo 555-6480 1200/300 A
+@print{} sabafoo 555-2127 1200/300 C
+@end group
+@end example
+
+@noindent
+This example prints each record in the file @file{BBS-list} whose first
+field contains the string @samp{foo}. The operator @samp{~} is called a
+@dfn{matching operator}
+(@pxref{Regexp Usage, , How to Use Regular Expressions});
+it tests whether a string (here, the field @code{$1}) matches a given regular
+expression.
+
+By contrast, the following example
+looks for @samp{foo} in @emph{the entire record} and prints the first
+field and the last field for each input record containing a
+match.
+
+@example
+@group
+$ awk '/foo/ @{ print $1, $NF @}' BBS-list
+@print{} fooey B
+@print{} foot B
+@print{} macfoo A
+@print{} sabafoo C
+@end group
+@end example
+
+@node Non-Constant Fields, Changing Fields, Fields, Reading Files
+@section Non-constant Field Numbers
+
+The number of a field does not need to be a constant. Any expression in
+the @code{awk} language can be used after a @samp{$} to refer to a
+field. The value of the expression specifies the field number. If the
+value is a string, rather than a number, it is converted to a number.
+Consider this example:
+
+@example
+awk '@{ print $NR @}'
+@end example
+
+@noindent
+Recall that @code{NR} is the number of records read so far: one in the
+first record, two in the second, etc. So this example prints the first
+field of the first record, the second field of the second record, and so
+on. For the twentieth record, field number 20 is printed; most likely,
+the record has fewer than 20 fields, so this prints a blank line.
+
+Here is another example of using expressions as field numbers:
+
+@example
+awk '@{ print $(2*2) @}' BBS-list
+@end example
+
+@code{awk} must evaluate the expression @samp{(2*2)} and use
+its value as the number of the field to print. The @samp{*} sign
+represents multiplication, so the expression @samp{2*2} evaluates to four.
+The parentheses are used so that the multiplication is done before the
+@samp{$} operation; they are necessary whenever there is a binary
+operator in the field-number expression. This example, then, prints the
+hours of operation (the fourth field) for every line of the file
+@file{BBS-list}. (All of the @code{awk} operators are listed, in
+order of decreasing precedence, in
+@ref{Precedence, , Operator Precedence (How Operators Nest)}.)
+
+If the field number you compute is zero, you get the entire record.
+Thus, @code{$(2-2)} has the same value as @code{$0}. Negative field
+numbers are not allowed; trying to reference one will usually terminate
+your running @code{awk} program. (The POSIX standard does not define
+what happens when you reference a negative field number. @code{gawk}
+will notice this and terminate your program. Other @code{awk}
+implementations may behave differently.)
+
+As mentioned in @ref{Fields, ,Examining Fields},
+the number of fields in the current record is stored in the built-in
+variable @code{NF} (also @pxref{Built-in Variables}). The expression
+@code{$NF} is not a special feature: it is the direct consequence of
+evaluating @code{NF} and using its value as a field number.
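+
+Because any expression may follow the @samp{$}, the next-to-last field
+can be reached in the same way. The following is only a small sketch
+with made-up input; it assumes that each record has at least two fields:
+
+@example
+@group
+$ echo a b c d | awk '@{ print $(NF-1), $NF @}'
+@print{} c d
+@end group
+@end example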
+
+@node Changing Fields, Field Separators, Non-Constant Fields, Reading Files
+@section Changing the Contents of a Field
+
+@cindex field, changing contents of
+@cindex changing contents of a field
+@cindex assignment to fields
+You can change the contents of a field as seen by @code{awk} within an
+@code{awk} program; this changes what @code{awk} perceives as the
+current input record. (The actual input is untouched; @code{awk} @emph{never}
+modifies the input file.)
+
+Consider this example and its output:
+
+@example
+@group
+$ awk '@{ $3 = $2 - 10; print $2, $3 @}' inventory-shipped
+@print{} 13 3
+@print{} 15 5
+@print{} 15 5
+@dots{}
+@end group
+@end example
+
+@noindent
+The @samp{-} sign represents subtraction, so this program reassigns
+field three, @code{$3}, to be the value of field two minus ten,
+@samp{$2 - 10}. (@xref{Arithmetic Ops, ,Arithmetic Operators}.)
+Then field two, and the new value for field three, are printed.
+
+In order for this to work, the text in field @code{$2} must make sense
+as a number; the string of characters must be converted to a number in
+order for the computer to do arithmetic on it. The number resulting
+from the subtraction is converted back to a string of characters which
+then becomes field three.
+@xref{Conversion, ,Conversion of Strings and Numbers}.
+
+When you change the value of a field (as perceived by @code{awk}), the
+text of the input record is recalculated to contain the new field where
+the old one was. Therefore, @code{$0} changes to reflect the altered
+field. For example, this program
+prints a copy of the input file, with 10 subtracted from the second
+field of each line:
+
+@example
+@group
+$ awk '@{ $2 = $2 - 10; print $0 @}' inventory-shipped
+@print{} Jan 3 25 15 115
+@print{} Feb 5 32 24 226
+@print{} Mar 5 24 34 228
+@dots{}
+@end group
+@end example
+
+You can also assign contents to fields that are out of range. For
+example:
+
+@example
+$ awk '@{ $6 = ($5 + $4 + $3 + $2)
+> print $6 @}' inventory-shipped
+@print{} 168
+@print{} 297
+@print{} 301
+@dots{}
+@end example
+
+@noindent
+We've just created @code{$6}, whose value is the sum of fields
+@code{$2}, @code{$3}, @code{$4}, and @code{$5}. The @samp{+} sign
+represents addition. For the file @file{inventory-shipped}, @code{$6}
+represents the total number of parcels shipped for a particular month.
+
+Creating a new field changes @code{awk}'s internal copy of the current
+input record---the value of @code{$0}. Thus, if you do @samp{print $0}
+after adding a field, the record printed includes the new field, with
+the appropriate number of field separators between it and the previously
+existing fields.
+
+This recomputation affects, and is affected by,
+@code{NF} (the number of fields; @pxref{Fields, ,Examining Fields}).
+It is also affected by a feature that has not been discussed yet,
+the @dfn{output field separator}, @code{OFS},
+which is used to separate the fields (@pxref{Output Separators}).
+For example, the value of @code{NF} is set to the number of the highest
+field you create.
+
+Note, however, that merely @emph{referencing} an out-of-range field
+does @emph{not} change the value of either @code{$0} or @code{NF}.
+Referencing an out-of-range field only produces an empty string. For
+example:
+
+@example
+if ($(NF+1) != "")
+    print "can't happen"
+else
+    print "everything is normal"
+@end example
+
+@noindent
+should print @samp{everything is normal}, because @code{NF+1} is certain
+to be out of range. (@xref{If Statement, ,The @code{if}-@code{else} Statement},
+for more information about @code{awk}'s @code{if-else} statements.
+@xref{Typing and Comparison, ,Variable Typing and Comparison Expressions}, for more information
+about the @samp{!=} operator.)
+
+It is important to note that making an assignment to an existing field
+will change the
+value of @code{$0}, but will not change the value of @code{NF},
+even when you assign the empty string to a field. For example:
+
+@example
+@group
+$ echo a b c d | awk '@{ OFS = ":"; $2 = ""
+> print $0; print NF @}'
+@print{} a::c:d
+@print{} 4
+@end group
+@end example
+
+@noindent
+The field is still there; it just has an empty value. You can tell
+because there are two colons in a row.
+
+This example shows what happens if you create a new field.
+
+@example
+$ echo a b c d | awk '@{ OFS = ":"; $2 = ""; $6 = "new"
+> print $0; print NF @}'
+@print{} a::c:d::new
+@print{} 6
+@end example
+
+@noindent
+The intervening field, @code{$5}, is created with an empty value
+(indicated by the second pair of adjacent colons),
+and @code{NF} is updated with the value six.
+
+@node Field Separators, Constant Size, Changing Fields, Reading Files
+@section Specifying How Fields are Separated
+
+This section is rather long; it describes one of the most fundamental
+operations in @code{awk}.
+
+@menu
+* Basic Field Splitting:: How fields are split with single characters
+ or simple strings.
+* Regexp Field Splitting:: Using regexps as the field separator.
+* Single Character Fields:: Making each character a separate field.
+* Command Line Field Separator:: Setting @code{FS} from the command line.
+* Field Splitting Summary:: Some final points and a summary table.
+@end menu
+
+@node Basic Field Splitting, Regexp Field Splitting, Field Separators, Field Separators
+@subsection The Basics of Field Separating
+@vindex FS
+@cindex fields, separating
+@cindex field separator, @code{FS}
+
+The @dfn{field separator}, which is either a single character or a regular
+expression, controls the way @code{awk} splits an input record into fields.
+@code{awk} scans the input record for character sequences that
+match the separator; the fields themselves are the text between the matches.
+
+In the examples below, we use the bullet symbol ``@bullet{}'' to represent
+spaces in the output.
+
+If the field separator is @samp{oo}, then the following line:
+
+@example
+moo goo gai pan
+@end example
+
+@noindent
+would be split into three fields: @samp{m}, @samp{@bullet{}g} and
+@samp{@bullet{}gai@bullet{}pan}.
+Note the leading spaces in the values of the second and third fields.
+
+@cindex common mistakes
+@cindex mistakes, common
+@cindex errors, common
+The field separator is represented by the built-in variable @code{FS}.
+Shell programmers take note! @code{awk} does @emph{not} use the name @code{IFS},
+which is used by the POSIX-compatible shells (such as the Bourne shell,
+@code{sh}, or the GNU Bourne-Again Shell, Bash).
+
+You can change the value of @code{FS} in the @code{awk} program with the
+assignment operator, @samp{=} (@pxref{Assignment Ops, ,Assignment Expressions}).
+Often the right time to do this is at the beginning of execution,
+before any input has been processed, so that the very first record
+will be read with the proper separator. To do this, use the special
+@code{BEGIN} pattern
+(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}).
+For example, here we set the value of @code{FS} to the string
+@code{","}:
+
+@example
+awk 'BEGIN @{ FS = "," @} ; @{ print $2 @}'
+@end example
+
+@noindent
+Given the input line,
+
+@example
+John Q. Smith, 29 Oak St., Walamazoo, MI 42139
+@end example
+
+@noindent
+this @code{awk} program extracts and prints the string
+@samp{@bullet{}29@bullet{}Oak@bullet{}St.}.
+
+@cindex field separator, choice of
+@cindex regular expressions as field separators
+Sometimes your input data will contain separator characters that don't
+separate fields the way you thought they would. For instance, the
+person's name in the example we just used might have a title or
+suffix attached, such as @samp{John Q. Smith, LXIX}. From input
+containing such a name:
+
+@example
+John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
+@end example
+
+@noindent
+@c careful of an overfull hbox here!
+the above program would extract @samp{@bullet{}LXIX}, instead of
+@samp{@bullet{}29@bullet{}Oak@bullet{}St.}.
+If you were expecting the program to print the
+address, you would be surprised. The moral is: choose your data layout and
+separator characters carefully to prevent such problems.
+
+@iftex
+As you know, normally,
+@end iftex
+@ifinfo
+Normally,
+@end ifinfo
+fields are separated by whitespace sequences
+(spaces and tabs), not by single spaces: two spaces in a row do not
+delimit an empty field. The default value of the field separator @code{FS}
+is a string containing a single space, @w{@code{" "}}. If this value were
+interpreted in the usual way, each space character would separate
+fields, so two spaces in a row would make an empty field between them.
+The reason this does not happen is that a single space as the value of
+@code{FS} is a special case: it is taken to specify the default manner
+of delimiting fields.
+
+If @code{FS} is any other single character, such as @code{","}, then
+each occurrence of that character separates two fields. Two consecutive
+occurrences delimit an empty field. If the character occurs at the
+beginning or the end of the line, that too delimits an empty field. The
+space character is the only single character which does not follow these
+rules.
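+
+A quick way to see these rules at work is to count the fields in a line
+that has both doubled and trailing separators. This is just an
+illustrative sketch; the input is made up for the purpose:
+
+@example
+@group
+$ echo 'a,,b,' | awk 'BEGIN @{ FS = "," @} @{ print NF @}'
+@print{} 4
+@end group
+@end example
+
+@noindent
+The four fields are @samp{a}, an empty field, @samp{b}, and a final
+empty field.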
+
+@node Regexp Field Splitting, Single Character Fields, Basic Field Splitting, Field Separators
+@subsection Using Regular Expressions to Separate Fields
+
+The previous
+@iftex
+subsection
+@end iftex
+@ifinfo
+node
+@end ifinfo
+discussed the use of single characters or simple strings as the
+value of @code{FS}.
+More generally, the value of @code{FS} may be a string containing any
+regular expression. In this case, each match in the record for the regular
+expression separates fields. For example, the assignment:
+
+@example
+FS = ", \t"
+@end example
+
+@noindent
+makes every area of an input line that consists of a comma followed by a
+space and a tab into a field separator. (@samp{\t}
+is an @dfn{escape sequence} that stands for a tab;
+@pxref{Escape Sequences},
+for the complete list of similar escape sequences.)
+
+For a less trivial example of a regular expression, suppose you want
+single spaces to separate fields the way single commas were used above.
+You can set @code{FS} to @w{@code{"[@ ]"}} (left bracket, space, right
+bracket). This regular expression matches a single space and nothing else
+(@pxref{Regexp, ,Regular Expressions}).
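+
+As a small sketch of the difference this makes (the input is invented,
+and has exactly two spaces between @samp{a} and @samp{b}), each space
+now counts as its own separator, so an empty field appears between the
+two letters:
+
+@example
+@group
+$ echo 'a  b' | awk 'BEGIN @{ FS = "[ ]" @} @{ print NF @}'
+@print{} 3
+@end group
+@end example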
+
+There is an important difference between the two cases of @samp{FS = @w{" "}}
+(a single space) and @samp{FS = @w{"[ \t]+"}} (left bracket, space, backslash,
+``t'', right bracket, plus sign, which is a regular expression
+matching one or more spaces or tabs). For both values of @code{FS}, fields
+are separated by runs of spaces and/or tabs. However, when the value of
+@code{FS} is @w{@code{" "}}, @code{awk} will first strip leading and trailing
+whitespace from the record, and then decide where the fields are.
+
+For example, the following pipeline prints @samp{b}:
+
+@example
+$ echo ' a b c d ' | awk '@{ print $2 @}'
+@print{} b
+@end example
+
+@noindent
+However, this pipeline prints @samp{a} (note the extra spaces around
+each letter):
+
+@example
+$ echo ' a b c d ' | awk 'BEGIN @{ FS = "[ \t]+" @}
+> @{ print $2 @}'
+@print{} a
+@end example
+
+@noindent
+@cindex null string
+@cindex empty string
+In this case, the first field is @dfn{null}, or empty.
+
+The stripping of leading and trailing whitespace also comes into
+play whenever @code{$0} is recomputed. For instance, study this pipeline:
+
+@example
+$ echo ' a b c d' | awk '@{ print; $2 = $2; print @}'
+@print{}  a b c d
+@print{} a b c d
+@end example
+
+@noindent
+The first @code{print} statement prints the record as it was read,
+with leading whitespace intact. The assignment to @code{$2} rebuilds
+@code{$0} by concatenating @code{$1} through @code{$NF} together,
+separated by the value of @code{OFS}. Since the leading whitespace
+was ignored when finding @code{$1}, it is not part of the new @code{$0}.
+Finally, the last @code{print} statement prints the new @code{$0}.
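+
+The same rebuilding can be put to use deliberately. As a brief sketch
+(the input here is invented for illustration), assigning a field to
+itself after setting @code{OFS} rewrites the record with the new
+output separator:
+
+@example
+@group
+$ echo 'a b c' | awk 'BEGIN @{ OFS = "-" @} @{ $1 = $1; print @}'
+@print{} a-b-c
+@end group
+@end example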
+
+@node Single Character Fields, Command Line Field Separator, Regexp Field Splitting, Field Separators
+@subsection Making Each Character a Separate Field
+
+@cindex differences between @code{gawk} and @code{awk}
+@cindex single character fields
+There are times when you may want to examine each character
+of a record separately. In @code{gawk}, this is easy to do: you
+simply assign the null string (@code{""}) to @code{FS}. In this case,
+each individual character in the record will become a separate field.
+Here is an example:
+@c extra verbiage due to page boundaries
+
+@example
+echo a b | gawk 'BEGIN @{ FS = "" @}
+                 @{
+                     for (i = 1; i <= NF; i = i + 1)
+                         print "Field", i, "is", $i
+                 @}'
+@end example
+
+@noindent
+The output from this is:
+
+@example
+Field 1 is a
+Field 2 is
+Field 3 is b
+@end example
+
+@cindex dark corner
+Traditionally, the behavior for @code{FS} equal to @code{""} was not defined.
+In this case, Unix @code{awk} would simply treat the entire record
+as only having one field (d.c.). In compatibility mode
+(@pxref{Options, ,Command Line Options}),
+if @code{FS} is the null string, then @code{gawk} will also
+behave this way.
+
+@node Command Line Field Separator, Field Splitting Summary, Single Character Fields, Field Separators
+@subsection Setting @code{FS} from the Command Line
+@cindex @code{-F} option
+@cindex field separator, on command line
+@cindex command line, setting @code{FS} on
+
+@code{FS} can be set on the command line. You use the @samp{-F} option to
+do so. For example:
+
+@example
+awk -F, '@var{program}' @var{input-files}
+@end example
+
+@noindent
+sets @code{FS} to be the @samp{,} character. Notice that the option uses
+a capital @samp{F}. Contrast this with @samp{-f}, which specifies a file
+containing an @code{awk} program. Case is significant in command line options:
+the @samp{-F} and @samp{-f} options have nothing to do with each other.
+You can use both options at the same time to set the @code{FS} variable
+@emph{and} get an @code{awk} program from a file.
+
+The value used for the argument to @samp{-F} is processed in exactly the
+same way as assignments to the built-in variable @code{FS}. This means that
+if the field separator contains special characters, they must be escaped
+appropriately. For example, to use a @samp{\} as the field separator, you
+would have to type:
+
+@example
+# same as FS = "\\"
+awk -F\\\\ '@dots{}' files @dots{}
+@end example
+
+@noindent
+Since @samp{\} is used for quoting in the shell, @code{awk} will see
+@samp{-F\\}. Then @code{awk} processes the @samp{\\} for escape
+characters (@pxref{Escape Sequences}), finally yielding
+a single @samp{\} to be used for the field separator.
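+
+Quoting the argument with single quotes keeps the shell from removing
+any backslashes, so that only @code{awk}'s own escape processing
+remains and two backslashes suffice. The following is a sketch rather
+than a recommendation; the input is made up, and @code{printf} is used
+here only because some versions of @code{echo} treat backslashes
+specially:
+
+@example
+@group
+$ printf '%s\n' 'a\b' | awk -F'\\' '@{ print $2 @}'
+@print{} b
+@end group
+@end example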
+
+@cindex historical features
+As a special case, in compatibility mode
+(@pxref{Options, ,Command Line Options}), if the
+argument to @samp{-F} is @samp{t}, then @code{FS} is set to the tab
+character. This is because if you type @samp{-F\t} at the shell,
+without any quotes, the @samp{\} gets deleted, so @code{awk} figures that you
+really want your fields to be separated with tabs, and not @samp{t}s.
+Use @samp{-v FS="t"} on the command line if you really do want to separate
+your fields with @samp{t}s
+(@pxref{Options, ,Command Line Options}).
+
+For example, let's use an @code{awk} program file called @file{baud.awk}
+that contains the pattern @code{/300/}, and the action @samp{print $1}.
+Here is the program:
+
+@example
+/300/ @{ print $1 @}
+@end example
+
+Let's also set @code{FS} to be the @samp{-} character, and run the
+program on the file @file{BBS-list}. The following command prints a
+list of the names of the bulletin boards that operate at 300 baud and
+the first three digits of their phone numbers:
+
+@c tweaked to make the tex output look better in @smallbook
+@example
+@group
+$ awk -F- -f baud.awk BBS-list
+@print{} aardvark 555
+@print{} alpo
+@print{} barfly 555
+@dots{}
+@end group
+@ignore
+@print{} bites 555
+@print{} camelot 555
+@print{} core 555
+@print{} fooey 555
+@print{} foot 555
+@print{} macfoo 555
+@print{} sdace 555
+@print{} sabafoo 555
+@end ignore
+@end example
+
+@noindent
+Note the second line of output. In the original file
+(@pxref{Sample Data Files, ,Data Files for the Examples}),
+the second line looked like this:
+
+@example
+alpo-net 555-3412 2400/1200/300 A
+@end example
+
+The @samp{-} as part of the system's name was used as the field
+separator, instead of the @samp{-} in the phone number that was
+originally intended. This demonstrates why you have to be careful in
+choosing your field and record separators.
+
+On many Unix systems, each user has a separate entry in the system password
+file, one line per user. The information in these lines is separated
+by colons. The first field is the user's login name, and the second is
+the user's encrypted password. A password file entry might look like this:
+
+@example
+arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
+@end example
+
+The following program searches the system password file, and prints
+the entries for users who have no password:
+
+@example
+awk -F: '$2 == ""' /etc/passwd
+@end example
+
+@node Field Splitting Summary, , Command Line Field Separator, Field Separators
+@subsection Field Splitting Summary
+
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+According to the POSIX standard, @code{awk} is supposed to behave
+as if each record is split into fields at the time that it is read.
+In particular, this means that you can change the value of @code{FS}
+after a record is read, and the value of the fields (i.e.@: how they were split)
+should reflect the old value of @code{FS}, not the new one.
+
+@cindex dark corner
+@cindex @code{sed} utility
+@cindex stream editor
+However, many implementations of @code{awk} do not work this way. Instead,
+they defer splitting the fields until a field is actually
+referenced. The fields will be split
+using the @emph{current} value of @code{FS}! (d.c.)
+This behavior can be difficult
+to diagnose. The following example illustrates the difference
+between the two methods.
+(The @code{sed}@footnote{The @code{sed} utility is a ``stream editor.''
+Its behavior is also defined by the POSIX standard.}
+command prints just the first line of @file{/etc/passwd}.)
+
+@example
+sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}'
+@end example
+
+@noindent
+will usually print
+
+@example
+root
+@end example
+
+@noindent
+on an incorrect implementation of @code{awk}, while @code{gawk}
+will print something like
+
+@example
+root:nSijPlPhZZwgE:0:0:Root:/:
+@end example
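+
+If you do need to change @code{FS} in the middle of a rule and have it
+apply to the current record, one common idiom is to assign @code{$0} to
+itself, which causes the record to be split again using the current
+value of @code{FS}. This is offered only as a sketch, with invented
+input:
+
+@example
+@group
+$ echo 'a:b:c' | awk '@{ FS = ":"; $0 = $0; print $2 @}'
+@print{} b
+@end group
+@end example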
+
+The following table summarizes how fields are split, based on the
+value of @code{FS}. (@samp{==} means ``is equal to.'')
+
+@c @cartouche
+@table @code
+@item FS == " "
+Fields are separated by runs of whitespace. Leading and trailing
+whitespace are ignored. This is the default.
+
+@item FS == @var{any other single character}
+Fields are separated by each occurrence of the character. Multiple
+successive occurrences delimit empty fields, as do leading and
+trailing occurrences.
+The character can even be a regexp metacharacter; it does not need
+to be escaped.
+
+@item FS == @var{regexp}
+Fields are separated by occurrences of characters that match @var{regexp}.
+Leading and trailing matches of @var{regexp} delimit empty fields.
+
+@item FS == ""
+Each individual character in the record becomes a separate field.
+@end table
+@c @end cartouche
+
+@node Constant Size, Multiple Line, Field Separators, Reading Files
+@section Reading Fixed-width Data
+
+(This section discusses an advanced, experimental feature. If you are
+a novice @code{awk} user, you may wish to skip it on the first reading.)
+
+@code{gawk} version 2.13 introduced a new facility for dealing with
+fixed-width fields with no distinctive field separator. Data of this
+nature arises, for example, in the input for old FORTRAN programs where
+numbers are run together; or in the output of programs that did not
+anticipate the use of their output as input for other programs.
+
+An example of the latter is a table where all the columns are lined up by
+the use of a variable number of spaces and @emph{empty fields are just
+spaces}. Clearly, @code{awk}'s normal field splitting based on @code{FS}
+will not work well in this case. Although a portable @code{awk} program
+can use a series of @code{substr} calls on @code{$0}
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}),
+this is awkward and inefficient for a large number of fields.
+
+The splitting of an input record into fixed-width fields is specified by
+assigning a string containing space-separated numbers to the built-in
+variable @code{FIELDWIDTHS}. Each number specifies the width of the field
+@emph{including} columns between fields. If you want to ignore the columns
+between fields, you can specify the width as a separate field that is
+subsequently ignored.
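+
+Before turning to realistic data, here is a minimal sketch of the
+mechanism. The input string and the widths are made up for
+illustration; the eight characters are cut into fields of three, two,
+and three columns:
+
+@example
+@group
+$ echo abcdefgh | gawk 'BEGIN @{ FIELDWIDTHS = "3 2 3" @}
+>                       @{ print $1; print $2; print $3 @}'
+@print{} abc
+@print{} de
+@print{} fgh
+@end group
+@end example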
+
+The following data is the output of the Unix @code{w} utility. It is useful
+to illustrate the use of @code{FIELDWIDTHS}.
+
+@example
+@group
+ 10:06pm  up 21 days, 14:04,  23 users
+User     tty       login@  idle   JCPU   PCPU  what
+hzuo     ttyV0     8:58pm            9      5  vi p24.tex
+hzang    ttyV3     6:37pm    50                -csh
+eklye    ttyV5     9:53pm            7      1  em thes.tex
+dportein ttyV6     8:17pm  1:47                -csh
+gierd    ttyD3    10:00pm     1                elm
+dave     ttyD4     9:47pm            4      4  w
+brent    ttyp0    26Jun91  4:46  26:46   4:41  bash
+dave     ttyq4    26Jun9115days     46     46  wnewmail
+@end group
+@end example
+
+The following program takes the above input, converts the idle time to
+a number of seconds, and prints out the first two fields and the calculated
+idle time. (This program uses a number of @code{awk} features that
+haven't been introduced yet.)
+
+@example
+@group
+BEGIN  @{ FIELDWIDTHS = "9 6 10 6 7 7 35" @}
+NR > 2 @{
+    idle = $4
+    sub(/^ */, "", idle)   # strip leading spaces
+    if (idle == "")
+        idle = 0
+    if (idle ~ /:/) @{
+        split(idle, t, ":")
+        idle = t[1] * 60 + t[2]
+    @}
+    if (idle ~ /days/)
+        idle *= 24 * 60 * 60
+
+    print $1, $2, idle
+@}
+@end group
+@end example
+
+Here is the result of running the program on the data:
+
+@example
+hzuo      ttyV0  0
+hzang     ttyV3  50
+eklye     ttyV5  0
+dportein  ttyV6  107
+gierd     ttyD3  1
+dave      ttyD4  0
+brent     ttyp0  286
+dave      ttyq4  1296000
+@end example
+
+Another (possibly more practical) example of fixed-width input data
+would be the input from a deck of balloting cards. In some parts of
+the United States, voters mark their choices by punching holes in computer
+cards. These cards are then processed to count the votes for any particular
+candidate or on any particular issue. Since a voter may choose not to
+vote on some issue, any column on the card may be empty. An @code{awk}
+program for processing such data could use the @code{FIELDWIDTHS} feature
+to simplify reading the data. (Of course, getting @code{gawk} to run on
+a system with card readers is another story!)
+
+@ignore
+Exercise: Write a ballot card reading program
+@end ignore
+
+Assigning a value to @code{FS} causes @code{gawk} to return to using
+@code{FS} for field splitting. Use @samp{FS = FS} to make this happen,
+without having to know the current value of @code{FS}.
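+
+A short sketch of this, using invented input: @code{FIELDWIDTHS} is
+set first, but the later assignment to @code{FS} switches @code{gawk}
+back to ordinary whitespace splitting, so the first field is the whole
+first word and not just its first two characters:
+
+@example
+@group
+$ echo 'abc def' | gawk 'BEGIN @{ FIELDWIDTHS = "2 2"; FS = FS @}
+>                        @{ print $1 @}'
+@print{} abc
+@end group
+@end example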
+
+This feature is still experimental, and may evolve over time.
+Note that in particular, @code{gawk} does not attempt to verify
+the sanity of the values used in the value of @code{FIELDWIDTHS}.
+
+@node Multiple Line, Getline, Constant Size, Reading Files
+@section Multiple-Line Records
+
+@cindex multiple line records
+@cindex input, multiple line records
+@cindex reading files, multiple line records
+@cindex records, multiple line
+In some databases, a single line cannot conveniently hold all the
+information in one entry. In such cases, you can use multi-line
+records.
+
+The first step in doing this is to choose your data format: when records
+are not defined as single lines, how do you want to define them?
+What should separate records?
+
+One technique is to use an unusual character or string to separate
+records. For example, you could use the formfeed character (written
+@samp{\f} in @code{awk}, as in C) to separate them, making each record
+a page of the file. To do this, just set the variable @code{RS} to
+@code{"\f"} (a string containing the formfeed character). Any
+other character could equally well be used, as long as it won't be part
+of the data in a record.
+
+Another technique is to have blank lines separate records. By a special
+dispensation, an empty string as the value of @code{RS} indicates that
+records are separated by one or more blank lines. If you set @code{RS}
+to the empty string, a record always ends at the first blank line
+encountered. And the next record doesn't start until the first non-blank
+line that follows---no matter how many blank lines appear in a row, they
+are considered one record-separator.
+
+@cindex leftmost longest match
+@cindex matching, leftmost longest
+You can achieve the same effect as @samp{RS = ""} by assigning the
+string @code{"\n\n+"} to @code{RS}. This regexp matches the newline
+at the end of the record, and one or more blank lines after the record.
+In addition, a regular expression always matches the longest possible
+sequence when there is a choice
+(@pxref{Leftmost Longest, ,How Much Text Matches?}).
+So the next record doesn't start until
+the first non-blank line that follows---no matter how many blank lines
+appear in a row, they are considered one record-separator.
+
+@cindex dark corner
+There is an important difference between @samp{RS = ""} and
+@samp{RS = "\n\n+"}. In the first case, leading newlines in the input
+data file are ignored, and if a file ends without extra blank lines
+after the last record, the final newline is removed from the record.
+In the second case, this special processing is not done (d.c.).
+
+Now that the input is separated into records, the second step is to
+separate the fields in the record. One way to do this is to divide each
+of the lines into fields in the normal manner. This happens by default
+as the result of a special feature: when @code{RS} is set to the empty
+string, the newline character @emph{always} acts as a field separator.
+This is in addition to whatever field separations result from @code{FS}.
+
+The original motivation for this special exception was probably to provide
+useful behavior in the default case (i.e.@: @code{FS} is equal
+to @w{@code{" "}}). This feature can be a problem if you really don't
+want the newline character to separate fields, since there is no way to
+prevent it. However, you can work around this by using the @code{split}
+function to break up the record manually
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+
+Another way to separate fields is to
+put each field on a separate line: to do this, just set the
+variable @code{FS} to the string @code{"\n"}. (This simple regular
+expression matches a single newline.)
+
+A practical example of a data file organized this way might be a mailing
+list, where each entry is separated by blank lines. If we have a mailing
+list in a file named @file{addresses}, that looks like this:
+
+@example
+Jane Doe
+123 Main Street
+Anywhere, SE 12345-6789
+
+John Smith
+456 Tree-lined Avenue
+Smallville, MW 98765-4321
+
+@dots{}
+@end example
+
+@noindent
+A simple program to process this file would look like this:
+
+@example
+@group
+# addrs.awk --- simple mailing list program
+
+# Records are separated by blank lines.
+# Each line is one field.
+BEGIN @{ RS = "" ; FS = "\n" @}
+
+@{
+ print "Name is:", $1
+ print "Address is:", $2
+ print "City and State are:", $3
+ print ""
+@}
+@end group
+@end example
+
+Running the program produces the following output:
+
+@example
+@group
+$ awk -f addrs.awk addresses
+@print{} Name is: Jane Doe
+@print{} Address is: 123 Main Street
+@print{} City and State are: Anywhere, SE 12345-6789
+@print{}
+@end group
+@group
+@print{} Name is: John Smith
+@print{} Address is: 456 Tree-lined Avenue
+@print{} City and State are: Smallville, MW 98765-4321
+@print{}
+@dots{}
+@end group
+@end example
+
+@xref{Labels Program, ,Printing Mailing Labels}, for a more realistic
+program that deals with address lists.
+
+The following table summarizes how records are split, based on the
+value of @code{RS}. (@samp{==} means ``is equal to.'')
+
+@c @cartouche
+@table @code
+@item RS == "\n"
+Records are separated by the newline character (@samp{\n}). In effect,
+every line in the data file is a separate record, including blank lines.
+This is the default.
+
+@item RS == @var{any single character}
+Records are separated by each occurrence of the character. Multiple
+successive occurrences delimit empty records.
+
+@item RS == ""
+Records are separated by runs of blank lines. The newline character
+always serves as a field separator, in addition to whatever value
+@code{FS} may have. Leading and trailing newlines in a file are ignored.
+
+@item RS == @var{regexp}
+Records are separated by occurrences of characters that match @var{regexp}.
+Leading and trailing matches of @var{regexp} delimit empty records.
+@end table
+@c @end cartouche
+
+@vindex RT
+In all cases, @code{gawk} sets @code{RT} to the input text that matched the
+value specified by @code{RS}.
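+
+As an illustrative sketch (the input and the regexp are invented for
+the purpose), printing @code{RT} in brackets after each record shows
+the text that ended it. The last record is ended by the end of the
+input and not by a match, so its @code{RT} is empty:
+
+@example
+@group
+$ printf 'a1b22c' |
+> gawk 'BEGIN @{ RS = "[0-9]+" @} @{ print $0 "[" RT "]" @}'
+@print{} a[1]
+@print{} b[22]
+@print{} c[]
+@end group
+@end example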
+
+@node Getline, , Multiple Line, Reading Files
+@section Explicit Input with @code{getline}
+
+@findex getline
+@cindex input, explicit
+@cindex explicit input
+@cindex input, @code{getline} command
+@cindex reading files, @code{getline} command
+So far we have been getting our input data from @code{awk}'s main
+input stream---either the standard input (usually your terminal, sometimes
+the output from another program) or from the
+files specified on the command line. The @code{awk} language has a
+special built-in command called @code{getline} that
+can be used to read input under your explicit control.
+
+@menu
+* Getline Intro:: Introduction to the @code{getline} function.
+* Plain Getline:: Using @code{getline} with no arguments.
+* Getline/Variable:: Using @code{getline} into a variable.
+* Getline/File:: Using @code{getline} from a file.
+* Getline/Variable/File:: Using @code{getline} into a variable from a
+ file.
+* Getline/Pipe:: Using @code{getline} from a pipe.
+* Getline/Variable/Pipe:: Using @code{getline} into a variable from a
+ pipe.
+* Getline Summary:: Summary of @code{getline} Variants.
+@end menu
+
+@node Getline Intro, Plain Getline, Getline, Getline
+@subsection Introduction to @code{getline}
+
+This command is used in several different ways, and should @emph{not} be
+used by beginners. It is covered here because this is the chapter on input.
+The examples that follow the explanation of the @code{getline} command
+include material that has not been covered yet. Therefore, come back
+and study the @code{getline} command @emph{after} you have reviewed the
+rest of this @value{DOCUMENT} and have a good knowledge of how @code{awk} works.
+
+@vindex ERRNO
+@cindex differences between @code{gawk} and @code{awk}
+@cindex @code{getline}, return values
+@code{getline} returns one if it finds a record, and zero if the end of the
+file is encountered. If there is some error in getting a record, such
+as a file that cannot be opened, then @code{getline} returns @minus{}1.
+In this case, @code{gawk} sets the variable @code{ERRNO} to a string
+describing the error that occurred.
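+
+The sketch below shows one way the @minus{}1 return value might be
+tested. It uses the form of @code{getline} that reads from a file into
+a variable, which is described later in this section. The file name is
+hypothetical and is assumed not to exist; in an @code{awk} other than
+@code{gawk}, @code{ERRNO} is simply empty:
+
+@example
+@group
+awk 'BEGIN @{
+    # hypothetical file name, assumed not to exist
+    file = "/nonexistent/file"
+    if ((getline line < file) < 0)
+        print "cannot read " file ": " ERRNO
+@}'
+@end group
+@end example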
+
+In the following examples, @var{command} stands for a string value that
+represents a shell command.
+
+@node Plain Getline, Getline/Variable, Getline Intro, Getline
+@subsection Using @code{getline} with No Arguments
+
+The @code{getline} command can be used without arguments to read input
+from the current input file. All it does in this case is read the next
+input record and split it up into fields. This is useful if you've
+finished processing the current record, but you want to do some special
+processing @emph{right now} on the next record. Here's an
+example:
+
+@example
+@group
+awk '@{
+    if ((t = index($0, "/*")) != 0) @{
+        # value will be "" if t is 1
+        tmp = substr($0, 1, t - 1)
+        u = index(substr($0, t + 2), "*/")
+        while (u == 0) @{
+            if (getline <= 0) @{
+                m = "unexpected EOF or error"
+                m = (m ": " ERRNO)
+                print m > "/dev/stderr"
+                exit
+            @}
+            t = -1
+            u = index($0, "*/")
+        @}
+@end group
+@group
+        # substr expression will be "" if */
+        # occurred at end of line
+        $0 = tmp substr($0, t + u + 3)
+    @}
+    print $0
+@}'
+@end group
+@end example
+
+This @code{awk} program deletes all C-style comments, @samp{/* @dots{}
+*/}, from the input. By replacing the @samp{print $0} with other
+statements, you could perform more complicated processing on the
+decommented input, like searching for matches of a regular
+expression. This program has a subtle problem---it does not work if one
+comment ends and another begins on the same line.
+
+@ignore
+Exercise,
+write a program that does handle multiple comments on the line.
+@end ignore
+
+This form of the @code{getline} command sets @code{NF} (the number of
+fields; @pxref{Fields, ,Examining Fields}), @code{NR} (the number of
+records read so far; @pxref{Records, ,How Input is Split into Records}),
+@code{FNR} (the number of records read from this input file), and the
+value of @code{$0}.
+
+@cindex dark corner
+@strong{Note:} the new value of @code{$0} is used in testing
+the patterns of any subsequent rules. The original value
+of @code{$0} that triggered the rule which executed @code{getline}
+is lost (d.c.).
+By contrast, the @code{next} statement reads a new record
+but immediately begins processing it normally, starting with the first
+rule in the program. @xref{Next Statement, ,The @code{next} Statement}.
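+
+A small sketch of this behavior, using two made-up input records: the
+first rule calls @code{getline}, and the second rule is then tested
+against the record that @code{getline} just read, not against the
+record that triggered the first rule.
+

```shell
out=$(printf 'a b\nc d e\n' | awk '
NR == 1 { getline; print NR, NF, $0 }   # getline replaces $0 with record 2
/c/     { print "second rule saw: " $0 }
')
echo "$out"
```

+
+The first line of output shows that @code{NR}, @code{NF}, and @code{$0}
+all describe the second record after the call.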
+
+@node Getline/Variable, Getline/File, Plain Getline, Getline
+@subsection Using @code{getline} Into a Variable
+
+You can use @samp{getline @var{var}} to read the next record from
+@code{awk}'s input into the variable @var{var}. No other processing is
+done.
+
+For example, suppose the next line is a comment, or a special string,
+and you want to read it, without triggering
+any rules. This form of @code{getline} allows you to read that line
+and store it in a variable so that the main
+read-a-line-and-check-each-rule loop of @code{awk} never sees it.
+
+The following example swaps every two lines of input. For example, given:
+
+@example
+wan
+tew
+free
+phore
+@end example
+
+@noindent
+it outputs:
+
+@example
+tew
+wan
+phore
+free
+@end example
+
+@noindent
+Here's the program:
+
+@example
+@group
+awk '@{
+    if ((getline tmp) > 0) @{
+        print tmp
+        print $0
+    @} else
+        print $0
+@}'
+@end group
+@end example
+
+The @code{getline} command used in this way sets only the variables
+@code{NR} and @code{FNR} (and of course, @var{var}). The record is not
+split into fields, so the values of the fields (including @code{$0}) and
+the value of @code{NF} do not change.
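+
+The following sketch, with made-up input and an illustrative variable
+name @code{nxt}, shows which values change: @code{NR} advances, while
+@code{NF} and @code{$0} still describe the first record.
+

```shell
out=$(printf 'a b\nc d e\n' |
      awk 'NR == 1 { getline nxt; print NR, NF, $0, "|", nxt }')
echo "$out"
```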
+
+@node Getline/File, Getline/Variable/File, Getline/Variable, Getline
+@subsection Using @code{getline} from a File
+
+@cindex input redirection
+@cindex redirection of input
+Use @samp{getline < @var{file}} to read
+the next record from the file
+@var{file}. Here @var{file} is a string-valued expression that
+specifies the file name. @samp{< @var{file}} is called a @dfn{redirection}
+since it directs input to come from a different place.
+
+For example, the following
+program reads its input record from the file @file{secondary.input} when it
+encounters a first field with a value equal to 10 in the current input
+file.
+
+@example
+@group
+awk '@{
+    if ($1 == 10) @{
+        getline < "secondary.input"
+        print
+    @} else
+        print
+@}'
+@end group
+@end example
+
+Since the main input stream is not used, the values of @code{NR} and
+@code{FNR} are not changed. But the record read is split into fields in
+the normal manner, so the values of @code{$0} and other fields are
+changed. So is the value of @code{NF}.
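+
+As a rough illustration (the scratch file and the variable @code{f} are
+assumptions made for this sketch), reading from a file replaces
+@code{$0} and @code{NF} but leaves @code{NR} at the count of records
+read from the main input:
+

```shell
tmp=$(mktemp)
printf 'x y z\n' > "$tmp"           # one record with three fields

out=$(printf 'a b\n' | awk -v f="$tmp" '{ getline < f; print NR, NF, $0 }')
rm -f "$tmp"
echo "$out"
```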
+
+@node Getline/Variable/File, Getline/Pipe, Getline/File, Getline
+@subsection Using @code{getline} Into a Variable from a File
+
+Use @samp{getline @var{var} < @var{file}} to read the next record from
+the file @var{file} and put it in the variable @var{var}. As above, @var{file}
+is a string-valued expression that specifies the file from which to read.
+
+In this version of @code{getline}, none of the built-in variables are
+changed, and the record is not split into fields. The only variable
+changed is @var{var}.
+
+For example, the following program copies all the input files to the
+output, except for records that say @w{@samp{@@include @var{filename}}}.
+Such a record is replaced by the contents of the file
+@var{filename}.
+
+@example
+@group
+awk '@{
+    if (NF == 2 && $1 == "@@include") @{
+        while ((getline line < $2) > 0)
+            print line
+        close($2)
+    @} else
+        print
+@}'
+@end group
+@end example
+
+Note here how the name of the extra input file is not built into
+the program; it is taken directly from the data, from the second field on
+the @samp{@@include} line.
+
+The @code{close} function is called to ensure that if two identical
+@samp{@@include} lines appear in the input, the entire specified file is
+included twice.
+@xref{Close Files And Pipes, ,Closing Input and Output Files and Pipes}.
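+
+The effect of @code{close} can be seen in a short sketch. The scratch
+file and the variables @code{a}, @code{b}, and @code{c} are
+illustrative only. Without the @code{close}, the third read would
+continue where the second one stopped instead of starting over.
+

```shell
tmp=$(mktemp)
printf 'first\nsecond\n' > "$tmp"

out=$(awk -v f="$tmp" 'BEGIN {
    getline a < f      # first record
    getline b < f      # file still open: second record
    close(f)
    getline c < f      # reopened after close: first record again
    print a, b, c
}')
rm -f "$tmp"
echo "$out"
```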
+
+One deficiency of this program is that it does not process nested
+@samp{@@include} statements
+(@samp{@@include} statements in included files)
+the way a true macro preprocessor would.
+@xref{Igawk Program, ,An Easy Way to Use Library Functions}, for a program
+that does handle nested @samp{@@include} statements.
+
+@node Getline/Pipe, Getline/Variable/Pipe, Getline/Variable/File, Getline
+@subsection Using @code{getline} from a Pipe
+
+@cindex input pipeline
+@cindex pipeline, input
+You can pipe the output of a command into @code{getline}, using
+@samp{@var{command} | getline}. In
+this case, the string @var{command} is run as a shell command and its output
+is piped into @code{awk} to be used as input. This form of @code{getline}
+reads one record at a time from the pipe.
+
+For example, the following program copies its input to its output, except for
+lines that begin with @samp{@@execute}, which are replaced by the output
+produced by running the rest of the line as a shell command:
+
+@example
+@group
+awk '@{
+    if ($1 == "@@execute") @{
+        tmp = substr($0, 10)
+        while ((tmp | getline) > 0)
+            print
+        close(tmp)
+    @} else
+        print
+@}'
+@end group
+@end example
+
+@noindent
+The @code{close} function is called to ensure that if two identical
+@samp{@@execute} lines appear in the input, the command is run for
+each one.
+@xref{Close Files And Pipes, ,Closing Input and Output Files and Pipes}.
+@c Exercise!!
+@c This example is unrealistic, since you could just use system
+
+Given the input:
+
+@example
+@group
+foo
+bar
+baz
+@@execute who
+bletch
+@end group
+@end example
+
+@noindent
+the program might produce:
+
+@example
+@group
+foo
+bar
+baz
+arnold ttyv0 Jul 13 14:22
+miriam ttyp0 Jul 13 14:23 (murphy:0)
+bill ttyp1 Jul 13 14:23 (murphy:0)
+bletch
+@end group
+@end example
+
+@noindent
+Notice that this program ran the command @code{who} and printed the result.
+(If you try this program yourself, you will of course get different results,
+showing you who is logged in on your system.)
+
+This variation of @code{getline} splits the record into fields, sets the
+value of @code{NF} and recomputes the value of @code{$0}. The values of
+@code{NR} and @code{FNR} are not changed.
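+
+A minimal sketch of these effects, run from a @code{BEGIN} rule so that
+no main input has been read (the command string and the variable
+@code{cmd} are chosen only for illustration):
+

```shell
out=$(awk 'BEGIN {
    cmd = "echo x y z"
    cmd | getline            # fills $0 and the fields from the pipe
    close(cmd)
    print NR, NF, $0, $2
}')
echo "$out"
```

+
+@code{NR} stays at zero, while @code{NF}, @code{$0}, and the individual
+fields reflect the line produced by the command.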
+
+@node Getline/Variable/Pipe, Getline Summary, Getline/Pipe, Getline
+@subsection Using @code{getline} Into a Variable from a Pipe
+
+When you use @samp{@var{command} | getline @var{var}}, the
+output of the command @var{command} is sent through a pipe to
+@code{getline} and into the variable @var{var}. For example, the
+following program reads the current date and time into the variable
+@code{current_time}, using the @code{date} utility, and then
+prints it.
+
+@example
+@group
+awk 'BEGIN @{
+    "date" | getline current_time
+    close("date")
+    print "Report printed on " current_time
+@}'
+@end group
+@end example
+
+In this version of @code{getline}, none of the built-in variables are
+changed, and the record is not split into fields.
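+
+When a command produces more than one line, this form is typically used
+in a loop. The sketch below assumes a two-line command built from
+@code{echo}; the names @code{cmd}, @code{line}, and @code{n} are
+illustrative.
+

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)   # one record per iteration
        n++
    close(cmd)
    print n, line, NR, NF              # NR and NF are untouched
}')
echo "$out"
```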
+
+@node Getline Summary, , Getline/Variable/Pipe, Getline
+@subsection Summary of @code{getline} Variants
+
+With all the forms of @code{getline}, even though @code{$0} and @code{NF}
+may be updated, the record is not tested against all the patterns
+in the @code{awk} program, in the way that would happen if the record
+were read normally by the main processing loop of @code{awk}. However,
+the new record is tested against any subsequent rules.
+
+@cindex differences between @code{gawk} and @code{awk}
+@cindex limitations
+@cindex implementation limits
+Many @code{awk} implementations limit the number of pipelines an @code{awk}
+program may have open to just one! In @code{gawk}, there is no such limit.
+You can open as many pipelines as the underlying operating system will
+permit.
+
+The following table summarizes the six variants of @code{getline},
+listing which built-in variables are set by each one.
+
+@iftex
+@page
+@end iftex
+@c @cartouche
+@table @code
+@item getline
+sets @code{$0}, @code{NF}, @code{FNR}, and @code{NR}.
+
+@item getline @var{var}
+sets @var{var}, @code{FNR}, and @code{NR}.
+
+@item getline < @var{file}
+sets @code{$0} and @code{NF}.
+
+@item getline @var{var} < @var{file}
+sets @var{var}.
+
+@item @var{command} | getline
+sets @code{$0} and @code{NF}.
+
+@item @var{command} | getline @var{var}
+sets @var{var}.
+@end table
+@c @end cartouche
+
+@node Printing, Expressions, Reading Files, Top
+@chapter Printing Output
+
+@cindex printing
+@cindex output
+One of the most common actions is to @dfn{print}, or output,
+some or all of the input. You use the @code{print} statement
+for simple output. You use the @code{printf} statement
+for fancier formatting. Both are described in this chapter.
+
+@menu
+* Print:: The @code{print} statement.
+* Print Examples:: Simple examples of @code{print} statements.
+* Output Separators:: The output separators and how to change them.
+* OFMT:: Controlling Numeric Output With @code{print}.
+* Printf:: The @code{printf} statement.
+* Redirection:: How to redirect output to multiple files and
+ pipes.
+* Special Files:: File name interpretation in @code{gawk}.
+ @code{gawk} allows access to inherited file
+ descriptors.
+* Close Files And Pipes:: Closing Input and Output Files and Pipes.
+@end menu
+
+@node Print, Print Examples, Printing, Printing
+@section The @code{print} Statement
+@cindex @code{print} statement
+
+The @code{print} statement does output with simple, standardized
+formatting. You specify only the strings or numbers to be printed, in a
+list separated by commas. They are output, separated by single spaces,
+followed by a newline. The statement looks like this:
+
+@example
+print @var{item1}, @var{item2}, @dots{}
+@end example
+
+@noindent
+The entire list of items may optionally be enclosed in parentheses. The
+parentheses are necessary if any of the item expressions uses the @samp{>}
+relational operator; otherwise it could be confused with a redirection
+(@pxref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}).
+
+The items to be printed can be constant strings or numbers, fields of the
+current record (such as @code{$1}), variables, or any @code{awk}
+expressions.
+Numeric values are converted to strings, and then printed.
+
+The @code{print} statement is completely general for
+computing @emph{what} values to print. However, with two exceptions,
+you cannot specify @emph{how} to print them---how many
+columns, whether to use exponential notation or not, and so on.
+(For the exceptions, @pxref{Output Separators}, and
+@ref{OFMT, ,Controlling Numeric Output with @code{print}}.)
+For that, you need the @code{printf} statement
+(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}).
+
+The simple statement @samp{print} with no items is equivalent to
+@samp{print $0}: it prints the entire current record. To print a blank
+line, use @samp{print ""}, where @code{""} is the empty string.
+
+To print a fixed piece of text, use a string constant such as
+@w{@code{"Don't Panic"}} as one item. If you forget to use the
+double-quote characters, your text will be taken as an @code{awk}
+expression, and you will probably get an error. Keep in mind that a
+space is printed between any two items.
+
+Each @code{print} statement makes at least one line of output. But it
+isn't limited to one line. If an item value is a string that contains a
+newline, the newline is output along with the rest of the string. A
+single @code{print} can make any number of lines this way.
+
+@node Print Examples, Output Separators, Print, Printing
+@section Examples of @code{print} Statements
+
+Here is an example of printing a string that contains embedded newlines
+(the @samp{\n} is an escape sequence, used to represent the newline
+character; see @ref{Escape Sequences}):
+
+@example
+@group
+$ awk 'BEGIN @{ print "line one\nline two\nline three" @}'
+@print{} line one
+@print{} line two
+@print{} line three
+@end group
+@end example
+
+Here is an example that prints the first two fields of each input record,
+with a space between them:
+
+@example
+@group
+$ awk '@{ print $1, $2 @}' inventory-shipped
+@print{} Jan 13
+@print{} Feb 15
+@print{} Mar 15
+@dots{}
+@end group
+@end example
+
+@cindex common mistakes
+@cindex mistakes, common
+@cindex errors, common
+A common mistake in using the @code{print} statement is to omit the comma
+between two items. This often has the effect of making the items run
+together in the output, with no space. The reason for this is that
+juxtaposing two string expressions in @code{awk} means to concatenate
+them. Here is the same program, without the comma:
+
+@example
+@group
+$ awk '@{ print $1 $2 @}' inventory-shipped
+@print{} Jan13
+@print{} Feb15
+@print{} Mar15
+@dots{}
+@end group
+@end example
+
+To someone unfamiliar with the file @file{inventory-shipped}, neither
+example's output makes much sense. A heading line at the beginning
+would make it clearer. Let's add some headings to our table of months
+(@code{$1}) and green crates shipped (@code{$2}). We do this using the
+@code{BEGIN} pattern
+(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns})
+to force the headings to be printed only once:
+
+@example
+awk 'BEGIN @{ print "Month Crates"
+ print "----- ------" @}
+ @{ print $1, $2 @}' inventory-shipped
+@end example
+
+@noindent
+Did you already guess what happens? When run, the program prints
+the following:
+
+@example
+@group
+Month Crates
+----- ------
+Jan 13
+Feb 15
+Mar 15
+@dots{}
+@end group
+@end example
+
+@noindent
+The headings and the table data don't line up! We can fix this by printing
+some spaces between the two fields:
+
+@example
+awk 'BEGIN @{ print "Month Crates"
+ print "----- ------" @}
+ @{ print $1, " ", $2 @}' inventory-shipped
+@end example
+
+You can imagine that this way of lining up columns can get pretty
+complicated when you have many columns to fix. Counting spaces for two
+or three columns can be simple, but more than this and you can get
+lost quite easily. This is why the @code{printf} statement was
+created (@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing});
+one of its specialties is lining up columns of data.
+
+@cindex line continuation
+As a side point,
+you can continue either a @code{print} or @code{printf} statement simply
+by putting a newline after any comma
+(@pxref{Statements/Lines, ,@code{awk} Statements Versus Lines}).
+
+@node Output Separators, OFMT, Print Examples, Printing
+@section Output Separators
+
+@cindex output field separator, @code{OFS}
+@cindex output record separator, @code{ORS}
+@vindex OFS
+@vindex ORS
+As mentioned previously, a @code{print} statement contains a list
+of items, separated by commas. In the output, the items are normally
+separated by single spaces. This need not be the case; a
+single space is only the default. You can specify any string of
+characters to use as the @dfn{output field separator} by setting the
+built-in variable @code{OFS}. The initial value of this variable
+is the string @w{@code{" "}}, that is, a single space.
+
+The output from an entire @code{print} statement is called an
+@dfn{output record}. Each @code{print} statement outputs one output
+record and then outputs a string called the @dfn{output record separator}.
+The built-in variable @code{ORS} specifies this string. The initial
+value of @code{ORS} is the string @code{"\n"}, i.e.@: a newline
+character; thus, normally each @code{print} statement makes a separate line.
+
+You can change how output fields and records are separated by assigning
+new values to the variables @code{OFS} and/or @code{ORS}. The usual
+place to do this is in the @code{BEGIN} rule
+(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}), so
+that it happens before any input is processed. You may also do this
+with assignments on the command line, before the names of your input
+files, or using the @samp{-v} command line option
+(@pxref{Options, ,Command Line Options}).
+
+@ignore
+Exercise,
+Rewrite the
+@example
+awk 'BEGIN @{ print "Month Crates"
+ print "----- ------" @}
+ @{ print $1, " ", $2 @}' inventory-shipped
+@end example
+program by using a new value of @code{OFS}.
+@end ignore
+
+The following example prints the first and second fields of each input
+record separated by a semicolon, with a blank line added after each
+line:
+
+@example
+@group
+$ awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @}
+> @{ print $1, $2 @}' BBS-list
+@print{} aardvark;555-5553
+@print{}
+@print{} alpo-net;555-3412
+@print{}
+@print{} barfly;555-7685
+@dots{}
+@end group
+@end example
+
+If the value of @code{ORS} does not contain a newline, all your output
+will be run together on a single line, unless you output newlines some
+other way.
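+
+As a sketch of the @samp{-v} alternative mentioned above, the following
+sets both separators on the command line, using made-up input. Because
+@code{ORS} contains no newline here, both records end up on one line.
+

```shell
out=$(printf 'a b\nc d\n' | awk -v OFS=';' -v ORS='|' '{ print $1, $2 }')
echo "$out"
```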
+
+@node OFMT, Printf, Output Separators, Printing
+@section Controlling Numeric Output with @code{print}
+@vindex OFMT
+@cindex numeric output format
+@cindex format, numeric output
+@cindex output format specifier, @code{OFMT}
+When you use the @code{print} statement to print numeric values,
+@code{awk} internally converts the number to a string of characters,
+and prints that string. @code{awk} uses the @code{sprintf} function
+to do this conversion
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+For now, it suffices to say that the @code{sprintf}
+function accepts a @dfn{format specification} that tells it how to format
+numbers (or strings), and that there are a number of different ways in which
+numbers can be formatted. The different format specifications are discussed
+more fully in
+@ref{Control Letters, , Format-Control Letters}.
+
+The built-in variable @code{OFMT} contains the default format specification
+that @code{print} uses with @code{sprintf} when it wants to convert a
+number to a string for printing.
+The default value of @code{OFMT} is @code{"%.6g"}.
+By supplying different format specifications
+as the value of @code{OFMT}, you can change how @code{print} will print
+your numbers. As a brief example:
+
+@example
+@group
+$ awk 'BEGIN @{
+> OFMT = "%.0f" # print numbers as integers (rounds)
+> print 17.23 @}'
+@print{} 17
+@end group
+@end example
+
+@noindent
+@cindex dark corner
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+According to the POSIX standard, @code{awk}'s behavior will be undefined
+if @code{OFMT} contains anything but a floating point conversion specification
+(d.c.).
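+
+For comparison, here is a second small sketch that asks for two digits
+after the decimal point; the values printed are arbitrary.
+

```shell
out=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159, 2.5 }')
echo "$out"
```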
+
+@node Printf, Redirection, OFMT, Printing
+@section Using @code{printf} Statements for Fancier Printing
+@cindex formatted output
+@cindex output, formatted
+
+If you want more precise control over the output format than
+@code{print} gives you, use @code{printf}. With @code{printf} you can
+specify the width to use for each item, and you can specify various
+formatting choices for numbers (such as what radix to use, whether to
+print an exponent, whether to print a sign, and how many digits to print
+after the decimal point). You do this by supplying a string, called
+the @dfn{format string}, which controls how and where to print the other
+arguments.
+
+@menu
+* Basic Printf:: Syntax of the @code{printf} statement.
+* Control Letters:: Format-control letters.
+* Format Modifiers:: Format-specification modifiers.
+* Printf Examples:: Several examples.
+@end menu
+
+@node Basic Printf, Control Letters, Printf, Printf
+@subsection Introduction to the @code{printf} Statement
+
+@cindex @code{printf} statement, syntax of
+The @code{printf} statement looks like this:
+
+@example
+printf @var{format}, @var{item1}, @var{item2}, @dots{}
+@end example
+
+@noindent
+The entire list of arguments may optionally be enclosed in parentheses. The
+parentheses are necessary if any of the item expressions use the @samp{>}
+relational operator; otherwise it could be confused with a redirection
+(@pxref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}).
+
+@cindex format string
+The difference between @code{printf} and @code{print} is the @var{format}
+argument. This is an expression whose value is taken as a string; it
+specifies how to output each of the other arguments. It is called
+the @dfn{format string}.
+
+The format string is very similar to that in the ANSI C library function
+@code{printf}. Most of @var{format} is text to be output verbatim.
+Scattered among this text are @dfn{format specifiers}, one per item.
+Each format specifier says to output the next item in the argument list
+at that place in the format.
+
+The @code{printf} statement does not automatically append a newline to its
+output. It outputs only what the format string specifies. So if you want
+a newline, you must include one in the format string. The output separator
+variables @code{OFS} and @code{ORS} have no effect on @code{printf}
+statements. For example:
+
+@example
+@group
+BEGIN @{
+ ORS = "\nOUCH!\n"; OFS = "!"
+ msg = "Don't Panic!"; printf "%s\n", msg
+@}
+@end group
+@end example
+
+This program still prints the familiar @samp{Don't Panic!} message.
+
+@node Control Letters, Format Modifiers, Basic Printf, Printf
+@subsection Format-Control Letters
+@cindex @code{printf}, format-control characters
+@cindex format specifier
+
+A format specifier starts with the character @samp{%} and ends with a
+@dfn{format-control letter}; it tells the @code{printf} statement how
+to output one item. (If you actually want to output a @samp{%}, write
+@samp{%%}.) The format-control letter specifies what kind of value to
+print. The rest of the format specifier is made up of optional
+@dfn{modifiers} which are parameters to use, such as the field width.
+
+Here is a list of the format-control letters:
+
+@table @code
+@item c
+This prints a number as an ASCII character. Thus, @samp{printf "%c",
+65} outputs the letter @samp{A}. The output for a string value is
+the first character of the string.
+
+@iftex
+@page
+@end iftex
+@item d
+@itemx i
+These are equivalent. They both print a decimal integer.
+The @samp{%i} specification is for compatibility with ANSI C.
+
+@item e
+@itemx E
+This prints a number in scientific (exponential) notation.
+For example,
+
+@example
+printf "%4.3e\n", 1950
+@end example
+
+@noindent
+prints @samp{1.950e+03}, with a total of four significant figures of
+which three follow the decimal point. The @samp{4.3} are modifiers,
+discussed below. @samp{%E} uses @samp{E} instead of @samp{e} in the output.
+
+@item f
+This prints a number in floating point notation.
+For example,
+
+@example
+printf "%4.3f", 1950
+@end example
+
+@noindent
+prints @samp{1950.000}, with three digits following the decimal point.
+The @samp{4.3} are modifiers, discussed below; here the @samp{4} is a
+minimum field width, which the eight-character result already exceeds.
+
+@item g
+@itemx G
+This prints a number in either scientific notation or floating point
+notation, whichever uses fewer characters. If the result is printed in
+scientific notation, @samp{%G} uses @samp{E} instead of @samp{e}.
+
+@item o
+This prints an unsigned octal integer.
+(In octal, or base-eight notation, the digits run from @samp{0} to @samp{7};
+the decimal number eight is represented as @samp{10} in octal.)
+
+@item s
+This prints a string.
+
+@item x
+@itemx X
+This prints an unsigned hexadecimal integer.
+(In hexadecimal, or base-16 notation, the digits are @samp{0} through @samp{9}
+and @samp{a} through @samp{f}. The hexadecimal digit @samp{f} represents
+the decimal number 15.) @samp{%X} uses the letters @samp{A} through @samp{F}
+instead of @samp{a} through @samp{f}.
+
+@item %
+This isn't really a format-control letter, but it does have a meaning
+when used after a @samp{%}: the sequence @samp{%%} outputs one
+@samp{%}. It does not consume an argument, and it ignores any
+modifiers.
+@end table
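+
+The letters described above can be tried together in one statement.
+This is only a sketch with arbitrary values; it assumes an ASCII-based
+character set for @samp{%c}.
+

```shell
out=$(awk 'BEGIN {
    printf "%c %d %o %x %X %.3e %s%%\n", 65, 42, 8, 255, 255, 1950, "100"
}')
echo "$out"
```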
+
+@cindex dark corner
+When using the integer format-control letters for values that are outside
+the range of a C @code{long} integer, @code{gawk} will switch to the
+@samp{%g} format specifier. Other versions of @code{awk} may print
+invalid values, or do something else entirely (d.c.).
+
+@node Format Modifiers, Printf Examples, Control Letters, Printf
+@subsection Modifiers for @code{printf} Formats
+
+@cindex @code{printf}, modifiers
+@cindex modifiers (in format specifiers)
+A format specification can also include @dfn{modifiers} that can control
+how much of the item's value is printed and how much space it gets. The
+modifiers come between the @samp{%} and the format-control letter.
+In the examples below, we use the bullet symbol ``@bullet{}'' to represent
+spaces in the output. Here are the possible modifiers, in the order in
+which they may appear:
+
+@table @code
+@item -
+The minus sign, used before the width modifier (see below),
+says to left-justify
+the argument within its specified width. Normally the argument
+is printed right-justified in the specified width. Thus,
+
+@example
+printf "%-4s", "foo"
+@end example
+
+@noindent
+prints @samp{foo@bullet{}}.
+
+@item @var{space}
+For numeric conversions, prefix positive values with a space, and
+negative values with a minus sign.
+
+@item +
+The plus sign, used before the width modifier (see below),
+says to always supply a sign for numeric conversions, even if the data
+to be formatted is positive. The @samp{+} overrides the space modifier.
+
+@item #
+Use an ``alternate form'' for certain control letters.
+For @samp{%o}, supply a leading zero.
+For @samp{%x} and @samp{%X}, supply a leading @samp{0x} or @samp{0X} for
+a non-zero result.
+For @samp{%e}, @samp{%E}, and @samp{%f}, the result will always contain a
+decimal point.
+For @samp{%g} and @samp{%G}, trailing zeros are not removed from the result.
+
+@cindex dark corner
+@item 0
+A leading @samp{0} (zero) acts as a flag that indicates that output should be
+padded with zeros instead of spaces.
+This applies even to non-numeric output formats (d.c.).
+This flag only has an effect when the field width is wider than the
+value to be printed.
+
+@item @var{width}
+This is a number specifying the desired minimum width of a field. Inserting any
+number between the @samp{%} sign and the format control character forces the
+field to be expanded to this width. The default way to do this is to
+pad with spaces on the left. For example,
+
+@example
+printf "%4s", "foo"
+@end example
+
+@noindent
+prints @samp{@bullet{}foo}.
+
+The value of @var{width} is a minimum width, not a maximum. If the item
+value requires more than @var{width} characters, it can be as wide as
+necessary. Thus,
+
+@example
+printf "%4s", "foobar"
+@end example
+
+@noindent
+prints @samp{foobar}.
+
+Preceding the @var{width} with a minus sign causes the output to be
+padded with spaces on the right, instead of on the left.
+
+@item .@var{prec}
+This is a number that specifies the precision to use when printing.
+For the @samp{e}, @samp{E}, and @samp{f} formats, this specifies the
+number of digits you want printed to the right of the decimal point.
+For the @samp{g}, and @samp{G} formats, it specifies the maximum number
+of significant digits. For the @samp{d}, @samp{o}, @samp{i}, @samp{u},
+@samp{x}, and @samp{X} formats, it specifies the minimum number of
+digits to print. For a string, it specifies the maximum number of
+characters from the string that should be printed. Thus,
+
+@example
+printf "%.4s", "foobar"
+@end example
+
+@noindent
+prints @samp{foob}.
+@end table
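+
+The following sketch exercises several of these modifiers at once.
+Square brackets are added around each item so that the padding is
+visible; the values are arbitrary.
+

```shell
out=$(awk 'BEGIN {
    printf "[%-4s][%4s][%.4s][%+d][% d][%05d][%#o][%#x]\n",
           "foo", "foo", "foobar", 7, 7, 42, 8, 255
}')
echo "$out"
```

+
+Reading the output from left to right: left justification, right
+justification, truncation by precision, forced sign, space for a
+positive value, zero padding, and the two alternate forms.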
+
+The C library @code{printf}'s dynamic @var{width} and @var{prec}
+capability (for example, @code{"%*.*s"}) is supported. Instead of
+supplying explicit @var{width} and/or @var{prec} values in the format
+string, you pass them in the argument list. For example:
+
+@example
+w = 5
+p = 3
+s = "abcdefg"
+printf "%*.*s\n", w, p, s
+@end example
+
+@noindent
+is exactly equivalent to
+
+@example
+s = "abcdefg"
+printf "%5.3s\n", s
+@end example
+
+@noindent
+Both programs output @samp{@w{@bullet{}@bullet{}abc}}.
+
+Earlier versions of @code{awk} did not support this capability.
+If you must use such a version, you may simulate this feature by using
+concatenation to build up the format string, like so:
+
+@example
+w = 5
+p = 3
+s = "abcdefg"
+printf "%" w "." p "s\n", s
+@end example
+
+@noindent
+This is not particularly easy to read, but it does work.
+
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+C programmers may be used to supplying additional @samp{l} and @samp{h}
+flags in @code{printf} format strings. These are not valid in @code{awk}.
+Most @code{awk} implementations silently ignore these flags.
+If @samp{--lint} is provided on the command line
+(@pxref{Options, ,Command Line Options}),
+@code{gawk} will warn about their use. If @samp{--posix} is supplied,
+their use is a fatal error.
+
+@node Printf Examples, , Format Modifiers, Printf
+@subsection Examples Using @code{printf}
+
+Here is how to use @code{printf} to make an aligned table:
+
+@example
+awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list
+@end example
+
+@noindent
+prints the names of bulletin boards (@code{$1}) of the file
+@file{BBS-list} as a string of 10 characters, left justified. It also
+prints the phone numbers (@code{$2}) afterward on the line. This
+produces an aligned two-column table of names and phone numbers:
+
+@example
+@group
+$ awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list
+@print{} aardvark   555-5553
+@print{} alpo-net   555-3412
+@print{} barfly     555-7685
+@print{} bites      555-1675
+@print{} camelot    555-0542
+@print{} core       555-2912
+@print{} fooey      555-1234
+@print{} foot       555-6699
+@print{} macfoo     555-6480
+@print{} sdace      555-3430
+@print{} sabafoo    555-2127
+@end group
+@end example
+
+Did you notice that we did not specify that the phone numbers be printed
+as numbers? They had to be printed as strings because the numbers are
+separated by a dash.
+If we had tried to print the phone numbers as numbers, all we would have
+gotten would have been the first three digits, @samp{555}.
+This would have been pretty confusing.
+
+We did not specify a width for the phone numbers because they are the
+last things on their lines. We don't need to put spaces after them.
+
+We could make our table look even nicer by adding headings to the tops
+of the columns. To do this, we use the @code{BEGIN} pattern
+(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns})
+to force the header to be printed only once, at the beginning of
+the @code{awk} program:
+
+@example
+@group
+awk 'BEGIN @{ print "Name       Number"
+             print "----       ------" @}
+     @{ printf "%-10s %s\n", $1, $2 @}' BBS-list
+@end group
+@end example
+
+Did you notice that we mixed @code{print} and @code{printf} statements in
+the above example? We could have used just @code{printf} statements to get
+the same results:
+
+@example
+@group
+awk 'BEGIN @{ printf "%-10s %s\n", "Name", "Number"
+             printf "%-10s %s\n", "----", "------" @}
+     @{ printf "%-10s %s\n", $1, $2 @}' BBS-list
+@end group
+@end example
+
+@noindent
+By printing each column heading with the same format specification
+used for the elements of the column, we have made sure that the headings
+are aligned just like the columns.
+
+The fact that the same format specification is used three times can be
+emphasized by storing it in a variable, like this:
+
+@example
+@group
+awk 'BEGIN @{ format = "%-10s %s\n"
+             printf format, "Name", "Number"
+             printf format, "----", "------" @}
+     @{ printf format, $1, $2 @}' BBS-list
+@end group
+@end example
+
+@c !!! exercise
+See if you can use the @code{printf} statement to line up the headings and
+table data for our @file{inventory-shipped} example covered earlier in the
+section on the @code{print} statement
+(@pxref{Print, ,The @code{print} Statement}).
+
+@node Redirection, Special Files, Printf, Printing
+@section Redirecting Output of @code{print} and @code{printf}
+
+@cindex output redirection
+@cindex redirection of output
+So far we have been dealing only with output that prints to the standard
+output, usually your terminal. Both @code{print} and @code{printf} can
+also send their output to other places.
+This is called @dfn{redirection}.
+
+A redirection appears after the @code{print} or @code{printf} statement.
+Redirections in @code{awk} are written just like redirections in shell
+commands, except that they are written inside the @code{awk} program.
+
+There are three forms of output redirection: output to a file,
+output appended to a file, and output through a pipe to another
+command.
+They are all shown for
+the @code{print} statement, but they work identically for
+@code{printf}.
+
+@table @code
+@item print @var{items} > @var{output-file}
+This type of redirection prints the items into the output file
+@var{output-file}. The file name @var{output-file} can be any
+expression. Its value is changed to a string and then used as a
+file name (@pxref{Expressions}).
+
+When this type of redirection is used, the @var{output-file} is erased
+before the first output is written to it. Subsequent writes
+to the same @var{output-file} do not
+erase @var{output-file}, but append to it. If @var{output-file} does
+not exist, then it is created.
+
+For example, here is how an @code{awk} program can write a list of
+BBS names to a file @file{name-list} and a list of phone numbers to a
+file @file{phone-list}. Each output file contains one name or number
+per line.
+
+@example
+@group
+$ awk '@{ print $2 > "phone-list"
+> print $1 > "name-list" @}' BBS-list
+@end group
+@group
+$ cat phone-list
+@print{} 555-5553
+@print{} 555-3412
+@dots{}
+@end group
+@group
+$ cat name-list
+@print{} aardvark
+@print{} alpo-net
+@dots{}
+@end group
+@end example
+
+@item print @var{items} >> @var{output-file}
+This type of redirection prints the items into the output file
+@var{output-file}. The difference between this and the
+single-@samp{>} redirection is that the old contents (if any) of
+@var{output-file} are not erased. Instead, the @code{awk} output is
+appended to the file.
+If @var{output-file} does not exist, then it is created.
+
+@cindex pipes for output
+@cindex output, piping
+@item print @var{items} | @var{command}
+It is also possible to send output to another program through a pipe
+instead of into a
+file. This type of redirection opens a pipe to @var{command} and writes
+the values of @var{items} through this pipe, to another process created
+to execute @var{command}.
+
+The redirection argument @var{command} is actually an @code{awk}
+expression. Its value is converted to a string, whose contents give the
+shell command to be run.
+
+For example, this produces two files, one unsorted list of BBS names
+and one list sorted in reverse alphabetical order:
+
+@example
+awk '@{ print $1 > "names.unsorted"
+       command = "sort -r > names.sorted"
+       print $1 | command @}' BBS-list
+@end example
+
+Here the unsorted list is written with an ordinary redirection while
+the sorted list is written by piping through the @code{sort} utility.
+
+This example uses redirection to mail a message to a mailing
+list @samp{bug-system}. This might be useful when trouble is encountered
+in an @code{awk} script run periodically for system maintenance.
+
+@example
+report = "mail bug-system"
+print "Awk script failed:", $0 | report
+m = ("at record number " FNR " of " FILENAME)
+print m | report
+close(report)
+@end example
+
+The message is built using string concatenation and saved in the variable
+@code{m}. It is then sent down the pipeline to the @code{mail} program.
+
+We call the @code{close} function here because it's a good idea to close
+the pipe as soon as all the intended output has been sent to it.
+@xref{Close Files And Pipes, ,Closing Input and Output Files and Pipes},
+for more information
+on this. This example also illustrates the use of a variable to represent
+a @var{file} or @var{command}: it is not necessary to always
+use a string constant. Using a variable is generally a good idea,
+since @code{awk} requires you to spell the string value identically
+every time.
+@end table
+
+Redirecting output using @samp{>}, @samp{>>}, or @samp{|} asks the system
+to open a file or pipe only if the particular @var{file} or @var{command}
+you've specified has not already been written to by your program, or if
+it has been closed since it was last written to.
+
+@cindex differences between @code{gawk} and @code{awk}
+@cindex limitations
+@cindex implementation limits
+Many @code{awk} implementations limit the number of pipelines an @code{awk}
+program may have open to just one! In @code{gawk}, there is no such limit.
+You can open as many pipelines as the underlying operating system will
+permit.
+
+@node Special Files, Close Files And Pipes , Redirection, Printing
+@section Special File Names in @code{gawk}
+@cindex standard input
+@cindex standard output
+@cindex standard error output
+@cindex file descriptors
+
+Running programs conventionally have three input and output streams
+already available to them for reading and writing. These are known as
+the @dfn{standard input}, @dfn{standard output}, and @dfn{standard error
+output}. These streams are, by default, connected to your terminal, but
+they are often redirected with the shell, via the @samp{<}, @samp{<<},
+@samp{>}, @samp{>>}, @samp{>&} and @samp{|} operators. Standard error
+is typically used for writing error messages; the reason we have two separate
+streams, standard output and standard error, is so that they can be
+redirected separately.
+
+@cindex differences between @code{gawk} and @code{awk}
+In other implementations of @code{awk}, the only way to write an error
+message to standard error in an @code{awk} program is as follows:
+
+@example
+print "Serious error detected!" | "cat 1>&2"
+@end example
+
+@noindent
+This works by opening a pipeline to a shell command which can access the
+standard error stream which it inherits from the @code{awk} process.
+This is far from elegant, and is also inefficient, since it requires a
+separate process. So people writing @code{awk} programs often
+neglect to do this. Instead, they send the error messages to the
+terminal, like this:
+
+@example
+@group
+print "Serious error detected!" > "/dev/tty"
+@end group
+@end example
+
+@noindent
+This usually has the same effect, but not always: although the
+standard error stream is usually the terminal, it can be redirected, and
+when that happens, writing to the terminal is not correct. In fact, if
+@code{awk} is run from a background job, it may not have a terminal at all.
+Then opening @file{/dev/tty} will fail.
+
+@code{gawk} provides special file names for accessing the three standard
+streams. When you redirect input or output in @code{gawk}, if the file name
+matches one of these special names, then @code{gawk} directly uses the
+stream it stands for.
+
+@cindex @file{/dev/stdin}
+@cindex @file{/dev/stdout}
+@cindex @file{/dev/stderr}
+@cindex @file{/dev/fd}
+@c @cartouche
+@table @file
+@item /dev/stdin
+The standard input (file descriptor 0).
+
+@item /dev/stdout
+The standard output (file descriptor 1).
+
+@item /dev/stderr
+The standard error output (file descriptor 2).
+
+@item /dev/fd/@var{N}
+The file associated with file descriptor @var{N}. Such a file must have
+been opened by the program initiating the @code{awk} execution (typically
+the shell). Unless you take special pains in the shell from which
+you invoke @code{gawk}, only descriptors 0, 1 and 2 are available.
+@end table
+@c @end cartouche
+
+The file names @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
+are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2},
+respectively, but they are more self-explanatory.
+
+The proper way to write an error message in a @code{gawk} program
+is to use @file{/dev/stderr}, like this:
+
+@example
+print "Serious error detected!" > "/dev/stderr"
+@end example
+
+@code{gawk} also provides special file names that give access to information
+about the running @code{gawk} process. Each of these ``files'' provides
+a single record of information. To read them more than once, you must
+first close them with the @code{close} function
+(@pxref{Close Files And Pipes, ,Closing Input and Output Files and Pipes}).
+The filenames are:
+
+@cindex process information
+@cindex @file{/dev/pid}
+@cindex @file{/dev/pgrpid}
+@cindex @file{/dev/ppid}
+@cindex @file{/dev/user}
+@c @cartouche
+@table @file
+@item /dev/pid
+Reading this file returns the process ID of the current process,
+in decimal, terminated with a newline.
+
+@item /dev/ppid
+Reading this file returns the parent process ID of the current process,
+in decimal, terminated with a newline.
+
+@item /dev/pgrpid
+Reading this file returns the process group ID of the current process,
+in decimal, terminated with a newline.
+
+@item /dev/user
+Reading this file returns a single record terminated with a newline.
+The fields are separated with spaces. The fields represent the
+following information:
+
+@table @code
+@item $1
+The return value of the @code{getuid} system call
+(the real user ID number).
+
+@item $2
+The return value of the @code{geteuid} system call
+(the effective user ID number).
+
+@item $3
+The return value of the @code{getgid} system call
+(the real group ID number).
+
+@item $4
+The return value of the @code{getegid} system call
+(the effective group ID number).
+@end table
+
+If there are any additional fields, they are the group IDs returned by
+the @code{getgroups} system call.
+(Multiple groups may not be supported on all systems.)
+@end table
+@c @end cartouche
+
+These special file names may be used on the command line as data
+files, as well as for I/O redirections within an @code{awk} program.
+They may not be used as source files with the @samp{-f} option.
+
+Recognition of these special file names is disabled if @code{gawk} is in
+compatibility mode (@pxref{Options, ,Command Line Options}).
+
+@strong{Caution}: Unless your system actually has a @file{/dev/fd} directory
+(or any of the other above listed special files),
+the interpretation of these file names is done by @code{gawk} itself.
+For example, using @samp{/dev/fd/4} for output will actually write on
+file descriptor 4, and not on a new file descriptor that was @code{dup}'ed
+from file descriptor 4. Most of the time this does not matter; however, it
+is important to @emph{not} close any of the files related to file descriptors
+0, 1, and 2. If you do close one of these files, unpredictable behavior
+will result.
+
+The special files that provide process-related information may disappear
+in a future version of @code{gawk}.
+@xref{Future Extensions, ,Probable Future Extensions}.
+
+@node Close Files And Pipes, , Special Files, Printing
+@section Closing Input and Output Files and Pipes
+@cindex closing input files and pipes
+@cindex closing output files and pipes
+@findex close
+
+If the same file name or the same shell command is used with
+@code{getline}
+(@pxref{Getline, ,Explicit Input with @code{getline}})
+more than once during the execution of an @code{awk}
+program, the file is opened (or the command is executed) only the first time.
+At that time, the first record of input is read from that file or command.
+The next time the same file or command is used in @code{getline}, another
+record is read from it, and so on.
+
+Similarly, when a file or pipe is opened for output, the file name or command
+associated with
+it is remembered by @code{awk} and subsequent writes to the same file or
+command are appended to the previous writes. The file or pipe stays
+open until @code{awk} exits.
+
+This implies that if you want to start reading the same file again from
+the beginning, or if you want to rerun a shell command (rather than
+reading more output from the command), you must take special steps.
+What you must do is use the @code{close} function, as follows:
+
+@example
+close(@var{filename})
+@end example
+
+@noindent
+or
+
+@example
+close(@var{command})
+@end example
+
+The argument @var{filename} or @var{command} can be any expression. Its
+value must @emph{exactly} match the string that was used to open the file or
+start the command (spaces and other ``irrelevant'' characters
+included). For example, if you open a pipe with this:
+
+@example
+"sort -r names" | getline foo
+@end example
+
+@noindent
+then you must close it with this:
+
+@example
+close("sort -r names")
+@end example
+
+Once this function call is executed, the next @code{getline} from that
+file or command, or the next @code{print} or @code{printf} to that
+file or command, will reopen the file or rerun the command.
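+
+As an illustration, the following sketch reads @file{BBS-list} twice
+within one program. The variable names @code{n}, @code{line} and
+@code{first} are invented for this example. Without the call to
+@code{close}, the second @code{getline} would find the file already at
+its end instead of returning the first record again:
+
+@example
+@group
+awk 'BEGIN @{
+    # first pass: count the records
+    while ((getline line < "BBS-list") > 0)
+        n++
+    close("BBS-list")
+    # the file is reopened here, so this reads record one
+    getline first < "BBS-list"
+    print n " records; the first is: " first
+@}'
+@end group
+@end example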
+
+Because the expression that you use to close a file or pipeline must
+exactly match the expression used to open the file or run the command,
+it is good practice to use a variable to store the file name or command.
+The previous example would become
+
+@example
+sortcom = "sort -r names"
+sortcom | getline foo
+@dots{}
+close(sortcom)
+@end example
+
+@noindent
+This helps avoid hard-to-find typographical errors in your @code{awk}
+programs.
+
+Here are some reasons why you might need to close an output file:
+
+@itemize @bullet
+@item
+To write a file and read it back later on in the same @code{awk}
+program. Close the file when you are finished writing it; then
+you can start reading it with @code{getline}.
+
+@item
+To write numerous files, successively, in the same @code{awk}
+program. If you don't close the files, eventually you may exceed a
+system limit on the number of open files in one process. So close
+each one when you are finished writing it.
+
+@item
+To make a command finish. When you redirect output through a pipe,
+the command reading the pipe normally continues to try to read input
+as long as the pipe is open. Often this means the command cannot
+really do its work until the pipe is closed. For example, if you
+redirect output to the @code{mail} program, the message is not
+actually sent until the pipe is closed.
+
+@item
+To run the same program a second time, with the same arguments.
+This is not the same thing as giving more input to the first run!
+
+For example, suppose you pipe output to the @code{mail} program. If you
+output several lines redirected to this pipe without closing it, they make
+a single message of several lines. By contrast, if you close the pipe
+after each line of output, then each line makes a separate message.
+@end itemize
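+
+As a sketch of the ``numerous files'' case, this program splits
+@file{BBS-list} into one file per record, named after the first field
+with a made-up @file{.entry} suffix. Each file is closed as soon as its
+record has been written, so only one output file is open at any time.
+
+@example
+awk '@{ out = $1 ".entry"
+       print $0 > out
+       close(out) @}' BBS-list
+@end example
+
+@noindent
+Because a closed file is erased when it is reopened with @samp{>}, this
+sketch assumes that no name appears twice in the input; with repeated
+names, @samp{>>} would append instead.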
+
+@vindex ERRNO
+@cindex differences between @code{gawk} and @code{awk}
+@code{close} returns a value of zero if the close succeeded.
+Otherwise, the value will be non-zero.
+In this case, @code{gawk} sets the variable @code{ERRNO} to a string
+describing the error that occurred.
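+
+A cautious program can test that return value. In this fragment, which
+is only a sketch, the file name @file{name-list} is borrowed from the
+earlier redirection example:
+
+@example
+if (close("name-list") != 0)
+    print "close of name-list failed: " ERRNO > "/dev/stderr"
+@end example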
+
+@cindex differences between @code{gawk} and @code{awk}
+@cindex portability issues
+If you use more files than the system allows you to have open,
+@code{gawk} will attempt to multiplex the available open files among
+your data files. @code{gawk}'s ability to do this depends upon the
+facilities of your operating system: it may not always work. It is
+therefore both good practice and good portability advice to always
+use @code{close} on your files when you are done with them.
+
+@node Expressions, Patterns and Actions, Printing, Top
+@chapter Expressions
+@cindex expression
+
+Expressions are the basic building blocks of @code{awk} patterns
+and actions. An expression evaluates to a value, which you can print, test,
+store in a variable or pass to a function. Additionally, an expression
+can assign a new value to a variable or a field, with an assignment operator.
+
+An expression can serve as a pattern or action statement on its own.
+Most other kinds of
+statements contain one or more expressions which specify data on which to
+operate. As in other languages, expressions in @code{awk} include
+variables, array references, constants, and function calls, as well as
+combinations of these with various operators.
+
+@menu
+* Constants:: String, numeric, and regexp constants.
+* Using Constant Regexps:: When and how to use a regexp constant.
+* Variables:: Variables give names to values for later use.
+* Conversion:: The conversion of strings to numbers and vice
+ versa.
+* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-},
+ etc.)
+* Concatenation:: Concatenating strings.
+* Assignment Ops:: Changing the value of a variable or a field.
+* Increment Ops:: Incrementing the numeric value of a variable.
+* Truth Values:: What is ``true'' and what is ``false''.
+* Typing and Comparison:: How variables acquire types, and how this
+ affects comparison of numbers and strings with
+ @samp{<}, etc.
+* Boolean Ops:: Combining comparison expressions using boolean
+ operators @samp{||} (``or''), @samp{&&}
+ (``and'') and @samp{!} (``not'').
+* Conditional Exp:: Conditional expressions select between two
+ subexpressions under control of a third
+ subexpression.
+* Function Calls:: A function call is an expression.
+* Precedence:: How various operators nest.
+@end menu
+
+@node Constants, Using Constant Regexps, Expressions, Expressions
+@section Constant Expressions
+@cindex constants, types of
+@cindex string constants
+
+The simplest type of expression is the @dfn{constant}, which always has
+the same value. There are three types of constants: numeric constants,
+string constants, and regular expression constants.
+
+@menu
+* Scalar Constants:: Numeric and string constants.
+* Regexp Constants:: Regular Expression constants.
+@end menu
+
+@node Scalar Constants, Regexp Constants, Constants, Constants
+@subsection Numeric and String Constants
+
+@cindex numeric constant
+@cindex numeric value
+A @dfn{numeric constant} stands for a number. This number can be an
+integer, a decimal fraction, or a number in scientific (exponential)
+notation.@footnote{The internal representation uses double-precision
+floating point numbers. If you don't know what that means, then don't
+worry about it.} Here are some examples of numeric constants, which all
+have the same value:
+
+@example
+105
+1.05e+2
+1050e-1
+@end example
+
+A string constant consists of a sequence of characters enclosed in
+double-quote marks. For example:
+
+@example
+"parrot"
+@end example
+
+@noindent
+@cindex differences between @code{gawk} and @code{awk}
+represents the string whose contents are @samp{parrot}. Strings in
+@code{gawk} can be of any length and they can contain any of the possible
+eight-bit ASCII characters including ASCII NUL (character code zero).
+Other @code{awk}
+implementations may have difficulty with some character codes.
+
+@node Regexp Constants, , Scalar Constants, Constants
+@subsection Regular Expression Constants
+
+@cindex @code{~} operator
+@cindex @code{!~} operator
+A regexp constant is a regular expression description enclosed in
+slashes, such as @code{@w{/^beginning and end$/}}. Most regexps used in
+@code{awk} programs are constant, but the @samp{~} and @samp{!~}
+matching operators can also match computed or ``dynamic'' regexps
+(which are just ordinary strings or variables that contain a regexp).
+
+@node Using Constant Regexps, Variables, Constants, Expressions
+@section Using Regular Expression Constants
+
+When used on the right hand side of the @samp{~} or @samp{!~}
+operators, a regexp constant merely stands for the regexp that is to be
+matched.
+
+@cindex dark corner
+Regexp constants (such as @code{/foo/}) may be used like simple expressions.
+When a
+regexp constant appears by itself, it has the same meaning as if it appeared
+in a pattern, i.e.@: @samp{($0 ~ /foo/)} (d.c.)
+(@pxref{Expression Patterns, ,Expressions as Patterns}).
+This means that the two code segments,
+
+@example
+if ($0 ~ /barfly/ || $0 ~ /camelot/)
+ print "found"
+@end example
+
+@noindent
+and
+
+@example
+if (/barfly/ || /camelot/)
+ print "found"
+@end example
+
+@noindent
+are exactly equivalent.
+
+One rather bizarre consequence of this rule is that the following
+boolean expression is valid, but does not do what the user probably
+intended:
+
+@example
+# note that /foo/ is on the left of the ~
+if (/foo/ ~ $1) print "found foo"
+@end example
+
+@noindent
+This code is ``obviously'' testing @code{$1} for a match against the regexp
+@code{/foo/}. But in fact, the expression @samp{/foo/ ~ $1} actually means
+@samp{($0 ~ /foo/) ~ $1}. In other words, first match the input record
+against the regexp @code{/foo/}. The result will be either zero or one,
+depending upon the success or failure of the match. Then match that result
+against the first field in the record.
+
+Since it is unlikely that you would ever really wish to make this kind of
+test, @code{gawk} will issue a warning when it sees this construct in
+a program.
+
+Another consequence of this rule is that the assignment statement
+
+@example
+matches = /foo/
+@end example
+
+@noindent
+will assign either zero or one to the variable @code{matches}, depending
+upon the contents of the current input record.
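+
+One place where this behavior can be put to use, shown here only as a
+sketch, is counting the records that match. The variable names
+@code{matches} and @code{count} are arbitrary:
+
+@example
+awk '@{ matches = /foo/
+       count += matches @}
+     END @{ print count + 0, "records contain foo" @}' BBS-list
+@end example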
+
+This feature of the language was never well documented until the
+POSIX specification.
+
+@cindex differences between @code{gawk} and @code{awk}
+@cindex dark corner
+Constant regular expressions are also used as the first argument for
+the @code{gensub}, @code{sub} and @code{gsub} functions, and as the
+second argument of the @code{match} function
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+Modern implementations of @code{awk}, including @code{gawk}, allow
+the third argument of @code{split} to be a regexp constant, while some
+older implementations do not (d.c.).
+
+This can lead to confusion when attempting to use regexp constants
+as arguments to user defined functions
+(@pxref{User-defined, , User-defined Functions}).
+For example:
+
+@example
+function mysub(pat, repl, str, global)
+@{
+    if (global)
+        gsub(pat, repl, str)
+    else
+        sub(pat, repl, str)
+    return str
+@}
+
+@{
+ @dots{}
+ text = "hi! hi yourself!"
+ mysub(/hi/, "howdy", text, 1)
+ @dots{}
+@}
+@end example
+
+In this example, the programmer wishes to pass a regexp constant to the
+user-defined function @code{mysub}, which will in turn pass it on to
+either @code{sub} or @code{gsub}. However, what really happens is that
+the @code{pat} parameter will be either one or zero, depending upon whether
+or not @code{$0} matches @code{/hi/}.
+
+As it is unlikely that you would ever really wish to pass a truth value
+in this way, @code{gawk} will issue a warning when it sees a regexp
+constant used as a parameter to a user-defined function.
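+
+One way to get the intended effect, offered as a sketch rather than the
+only possible fix, is to pass the regexp as a string. The parameter
+@code{pat} then holds the text @samp{hi}, and @code{sub} or @code{gsub}
+treats it as a dynamic regexp. Since @code{mysub} works on its own copy
+of the string, the caller also has to use the returned value:
+
+@example
+text = "hi! hi yourself!"
+text = mysub("hi", "howdy", text, 1)
+# text is now "howdy! howdy yourself!"
+@end example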
+
+@node Variables, Conversion, Using Constant Regexps, Expressions
+@section Variables
+
+Variables are ways of storing values at one point in your program for
+use later in another part of your program. You can manipulate them
+entirely within your program text, and you can also assign values to
+them on the @code{awk} command line.
+
+@menu
+* Using Variables:: Using variables in your programs.
+* Assignment Options:: Setting variables on the command line and a
+ summary of command line syntax. This is an
+ advanced method of input.
+@end menu
+
+@node Using Variables, Assignment Options, Variables, Variables
+@subsection Using Variables in a Program
+
+@cindex variables, user-defined
+@cindex user-defined variables
+Variables let you give names to values and refer to them later. You have
+already seen variables in many of the examples. The name of a variable
+must be a sequence of letters, digits and underscores, but it may not begin
+with a digit. Case is significant in variable names; @code{a} and @code{A}
+are distinct variables.
+
+A variable name is a valid expression by itself; it represents the
+variable's current value. Variables are given new values with
+@dfn{assignment operators}, @dfn{increment operators} and
+@dfn{decrement operators}.
+@xref{Assignment Ops, ,Assignment Expressions}.
+
+A few variables have special built-in meanings, such as @code{FS}, the
+field separator, and @code{NF}, the number of fields in the current
+input record. @xref{Built-in Variables}, for a list of them. These
+built-in variables can be used and assigned just like all other
+variables, but their values are also used or changed automatically by
+@code{awk}. All built-in variable names are entirely upper-case.
+
+Variables in @code{awk} can be assigned either numeric or string
+values. By default, variables are initialized to the empty string, which
+is zero if converted to a number. There is no need to
+``initialize'' each variable explicitly in @code{awk},
+the way you would in C and in most other traditional languages.
+
+@node Assignment Options, , Using Variables, Variables
+@subsection Assigning Variables on the Command Line
+
+You can set any @code{awk} variable by including a @dfn{variable assignment}
+among the arguments on the command line when you invoke @code{awk}
+(@pxref{Other Arguments, ,Other Command Line Arguments}). Such an assignment has
+this form:
+
+@example
+@var{variable}=@var{text}
+@end example
+
+@noindent
+With it, you can set a variable either at the beginning of the
+@code{awk} run or in between input files.
+
+If you precede the assignment with the @samp{-v} option, like this:
+
+@example
+-v @var{variable}=@var{text}
+@end example
+
+@noindent
+then the variable is set at the very beginning, before even the
+@code{BEGIN} rules are run. The @samp{-v} option and its assignment
+must precede all the file name arguments, as well as the program text.
+(@xref{Options, ,Command Line Options}, for more information about
+the @samp{-v} option.)
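+
+As a small illustration (the variable name @code{n} is arbitrary), a
+value given with @samp{-v} is already visible inside a @code{BEGIN}
+rule:
+
+@example
+@group
+$ awk -v n=2 'BEGIN @{ print "printing field", n @}
+>             @{ print $n @}' BBS-list
+@print{} printing field 2
+@print{} 555-5553
+@print{} 555-3412
+@dots{}
+@end group
+@end example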
+
+Otherwise, the variable assignment is performed at a time determined by
+its position among the input file arguments: after the processing of the
+preceding input file argument. For example:
+
+@example
+awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list
+@end example
+
+@noindent
+prints the value of field number @code{n} for all input records. Before
+the first file is read, the command line sets the variable @code{n}
+equal to four. This causes the fourth field to be printed in lines from
+the file @file{inventory-shipped}. After the first file has finished,
+but before the second file is started, @code{n} is set to two, so that the
+second field is printed in lines from @file{BBS-list}.
+
+@example
+@group
+$ awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list
+@print{} 15
+@print{} 24
+@dots{}
+@print{} 555-5553
+@print{} 555-3412
+@dots{}
+@end group
+@end example
+
+Command line arguments are made available for explicit examination by
+the @code{awk} program in an array named @code{ARGV}
+(@pxref{ARGC and ARGV, ,Using @code{ARGC} and @code{ARGV}}).
+
+@cindex dark corner
+@code{awk} processes the values of command line assignments for escape
+sequences (d.c.) (@pxref{Escape Sequences}).
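+
+For instance, in this sketch the made-up variable @code{sep} receives a
+single tab character, rather than a backslash followed by a @samp{t}:
+
+@example
+awk '@{ print $1 sep $2 @}' sep='\t' BBS-list
+@end example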
+
+@node Conversion, Arithmetic Ops, Variables, Expressions
+@section Conversion of Strings and Numbers
+
+@cindex conversion of strings and numbers
+Strings are converted to numbers, and numbers to strings, if the context
+of the @code{awk} program demands it. For example, if the value of
+either @code{foo} or @code{bar} in the expression @samp{foo + bar}
+happens to be a string, it is converted to a number before the addition
+is performed. If numeric values appear in string concatenation, they
+are converted to strings. Consider this:
+
+@example
+two = 2; three = 3
+print (two three) + 4
+@end example
+
+@noindent
+This prints the (numeric) value 27. The numeric values of
+the variables @code{two} and @code{three} are converted to strings and
+concatenated together, and the resulting string is converted back to the
+number 23, to which four is then added.
+
+@cindex null string
+@cindex empty string
+@cindex type conversion
+If, for some reason, you need to force a number to be converted to a
+string, concatenate the empty string, @code{""}, with that number.
+To force a string to be converted to a number, add zero to that string.
+
+A string is converted to a number by interpreting any numeric prefix
+of the string as numerals:
+@code{"2.5"} converts to 2.5, @code{"1e3"} converts to 1000, and @code{"25fix"}
+has a numeric value of 25.
+Strings that can't be interpreted as valid numbers are converted to
+zero.
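+
+The following one-line program, given purely as an illustration with
+arbitrary variable names, forces a string to a number by adding zero
+and a number to a string by concatenating the empty string:
+
+@example
+$ awk 'BEGIN @{ n = "25fix" + 0; s = 1e3 ""; print n, s @}'
+@print{} 25 1000
+@end example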
+
+@vindex CONVFMT
+The exact manner in which numbers are converted into strings is controlled
+by the @code{awk} built-in variable @code{CONVFMT} (@pxref{Built-in Variables}).
+Numbers are converted using the @code{sprintf} function
+(@pxref{String Functions, ,Built-in Functions for String Manipulation})
+with @code{CONVFMT} as the format
+specifier.
+
+@code{CONVFMT}'s default value is @code{"%.6g"}, which prints a value with
+at most six significant digits. For some applications you will want to
+change it to specify more precision. Double precision on most modern
+machines gives you 16 or 17 decimal digits of precision.
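+
+A quick way to see the effect, sketched here with arbitrary variable
+names, is to convert the same number to a string before and after
+changing @code{CONVFMT}:
+
+@example
+@group
+$ awk 'BEGIN @{ x = 3.14159265358979
+>              a = x ""
+>              CONVFMT = "%.12g"
+>              b = x ""
+>              print a; print b @}'
+@print{} 3.14159
+@print{} 3.14159265359
+@end group
+@end example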
+
+Strange results can happen if you set @code{CONVFMT} to a string that doesn't
+tell @code{sprintf} how to format floating point numbers in a useful way.
+For example, if you forget the @samp{%} in the format, all numbers will be
+converted to the same constant string.
+
+@cindex dark corner
+As a special case, if a number is an integer, then the result of converting
+it to a string is @emph{always} an integer, no matter what the value of
+@code{CONVFMT} may be. Given the following code fragment:
+
+@example
+CONVFMT = "%2.2f"
+a = 12
+b = a ""
+@end example
+
+@noindent
+@code{b} has the value @code{"12"}, not @code{"12.00"} (d.c.).
+
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+@vindex OFMT
+Prior to the POSIX standard, @code{awk} used the value
+of @code{OFMT} for converting numbers to strings. @code{OFMT}
+specifies the output format to use when printing numbers with @code{print}.
+@code{CONVFMT} was introduced in order to separate the semantics of
+conversion from the semantics of printing. Both @code{CONVFMT} and
+@code{OFMT} have the same default value: @code{"%.6g"}. In the vast majority
+of cases, old @code{awk} programs will not change their behavior.
+However, this use of @code{OFMT} is something to keep in mind if you must
+port your program to other implementations of @code{awk}; we recommend
+that instead of changing your programs, you just port @code{gawk} itself!
+@xref{Print, ,The @code{print} Statement},
+for more information on the @code{print} statement.
+
+@node Arithmetic Ops, Concatenation, Conversion, Expressions
+@section Arithmetic Operators
+@cindex arithmetic operators
+@cindex operators, arithmetic
+@cindex addition
+@cindex subtraction
+@cindex multiplication
+@cindex division
+@cindex remainder
+@cindex quotient
+@cindex exponentiation
+
+The @code{awk} language uses the common arithmetic operators when
+evaluating expressions. All of these arithmetic operators follow normal
+precedence rules, and work as you would expect them to.
+
+Here is a file @file{grades} containing a list of student names and
+three test scores per student (it's a small class):
+
+@example
+Pat 100 97 58
+Sandy 84 72 93
+Chris 72 92 89
+@end example
+
+@noindent
+This program takes the file @file{grades}, and prints the average
+of the scores.
+
+@example
+$ awk '@{ sum = $2 + $3 + $4 ; avg = sum / 3
+> print $1, avg @}' grades
+@print{} Pat 85
+@print{} Sandy 83
+@print{} Chris 84.3333
+@end example
+
+This table lists the arithmetic operators in @code{awk}, in order from
+highest precedence to lowest:
+
+@c sigh. this seems necessary
+@iftex
+@page
+@end iftex
+@c @cartouche
+@table @code
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+@item @var{x} ^ @var{y}
+@itemx @var{x} ** @var{y}
+Exponentiation: @var{x} raised to the @var{y} power. @samp{2 ^ 3} has
+the value eight. The character sequence @samp{**} is equivalent to
+@samp{^}. (The POSIX standard only specifies the use of @samp{^}
+for exponentiation.)
+
+@item - @var{x}
+Negation.
+
+@item + @var{x}
+Unary plus. The expression is converted to a number.
+
+@item @var{x} * @var{y}
+Multiplication.
+
+@item @var{x} / @var{y}
+Division. Since all numbers in @code{awk} are
+real numbers, the result is not rounded to an integer: @samp{3 / 4}
+has the value 0.75.
+
+@item @var{x} % @var{y}
+@cindex differences between @code{gawk} and @code{awk}
+Remainder. The quotient is rounded toward zero to an integer,
+multiplied by @var{y} and this result is subtracted from @var{x}.
+This operation is sometimes known as ``trunc-mod.'' The following
+relation always holds:
+
+@example
+b * int(a / b) + (a % b) == a
+@end example
+
+One possibly undesirable effect of this definition of remainder is that
+@code{@var{x} % @var{y}} is negative if @var{x} is negative. Thus,
+
+@example
+-17 % 8 = -1
+@end example
+
+In other @code{awk} implementations, the signedness of the remainder
+may be machine dependent.
+@c !!! what does posix say?
+
+@item @var{x} + @var{y}
+Addition.
+
+@item @var{x} - @var{y}
+Subtraction.
+@end table
+@c @end cartouche
+
+For maximum portability, do not use the @samp{**} operator.
+
+Unary plus and minus have the same precedence,
+the multiplication operators all have the same precedence, and
+addition and subtraction have the same precedence.
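+
+As a quick check of the remainder relation given above, the following
+sketch evaluates both sides with @code{gawk}. The variable names @code{a}
+and @code{b} and the values @code{-17} and @code{8} are chosen only for
+illustration:
+
+@example
+$ awk 'BEGIN @{ a = -17; b = 8; print a % b, b * int(a / b) + (a % b) @}'
+@print{} -1 -17
+@end example
+
+@noindent
+The first value is the remainder itself; the second shows that adding the
+remainder to @samp{b * int(a / b)} gives back the original @code{a}.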
+
+@node Concatenation, Assignment Ops, Arithmetic Ops, Expressions
+@section String Concatenation
+
+@cindex string operators
+@cindex operators, string
+@cindex concatenation
+There is only one string operation: concatenation. It does not have a
+specific operator to represent it. Instead, concatenation is performed by
+writing expressions next to one another, with no operator. For example:
+
+@example
+@group
+$ awk '@{ print "Field number one: " $1 @}' BBS-list
+@print{} Field number one: aardvark
+@print{} Field number one: alpo-net
+@dots{}
+@end group
+@end example
+
+Without the space in the string constant after the @samp{:}, the line
+would run together. For example:
+
+@example
+@group
+$ awk '@{ print "Field number one:" $1 @}' BBS-list
+@print{} Field number one:aardvark
+@print{} Field number one:alpo-net
+@dots{}
+@end group
+@end example
+
+Since string concatenation does not have an explicit operator, it is
+often necessary to ensure that it happens where you want it to by
+using parentheses to enclose
+the items to be concatenated. For example, the
+following code fragment does not concatenate @code{file} and @code{name}
+as you might expect:
+
+@example
+file = "file"
+name = "name"
+print "something meaningful" > file name
+@end example
+
+@noindent
+It is necessary to use the following:
+
+@example
+print "something meaningful" > (file name)
+@end example
+
+We recommend that you use parentheses around concatenation in all but the
+most common contexts (such as on the right-hand side of @samp{=}).
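+
+As a small illustration of why the parentheses matter (the strings and the
+variable name @code{x} are made up for this example), arithmetic operators
+bind more tightly than concatenation, so in the first command below the
+addition is done before the pieces are joined:
+
+@example
+$ awk 'BEGIN @{ print "total: " 1 + 2 @}'
+@print{} total: 3
+$ awk 'BEGIN @{ x = ("total: " 1) + 2; print x @}'
+@print{} 2
+@end example
+
+@noindent
+In the second command the parentheses force the concatenation to happen
+first. The resulting string, @code{"total: 1"}, does not begin with a digit,
+so it counts as zero in the addition.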
+
+@node Assignment Ops, Increment Ops, Concatenation, Expressions
+@section Assignment Expressions
+@cindex assignment operators
+@cindex operators, assignment
+@cindex expression, assignment
+
+An @dfn{assignment} is an expression that stores a new value into a
+variable. For example, let's assign the value one to the variable
+@code{z}:
+
+@example
+z = 1
+@end example
+
+After this expression is executed, the variable @code{z} has the value one.
+Whatever old value @code{z} had before the assignment is forgotten.
+
+Assignments can store string values also. For example, this would store
+the value @code{"this food is good"} in the variable @code{message}:
+
+@example
+thing = "food"
+predicate = "good"
+message = "this " thing " is " predicate
+@end example
+
+@noindent
+(This also illustrates string concatenation.)
+
+The @samp{=} sign is called an @dfn{assignment operator}. It is the
+simplest assignment operator because the value of the right-hand
+operand is stored unchanged.
+
+@cindex side effect
+Most operators (addition, concatenation, and so on) have no effect
+except to compute a value. If you ignore the value, you might as well
+not use the operator. An assignment operator is different; it does
+produce a value, but even if you ignore the value, the assignment still
+makes itself felt through the alteration of the variable. We call this
+a @dfn{side effect}.
+
+@cindex lvalue
+@cindex rvalue
+The left-hand operand of an assignment need not be a variable
+(@pxref{Variables}); it can also be a field
+(@pxref{Changing Fields, ,Changing the Contents of a Field}) or
+an array element (@pxref{Arrays, ,Arrays in @code{awk}}).
+These are all called @dfn{lvalues},
+which means they can appear on the left-hand side of an assignment operator.
+The right-hand operand may be any expression; it produces the new value
+which the assignment stores in the specified variable, field or array
+element. (Such values are called @dfn{rvalues}.)
+
+@cindex types of variables
+It is important to note that variables do @emph{not} have permanent types.
+The type of a variable is simply the type of whatever value it happens
+to hold at the moment. In the following program fragment, the variable
+@code{foo} has a numeric value at first, and a string value later on:
+
+@example
+foo = 1
+print foo
+foo = "bar"
+print foo
+@end example
+
+@noindent
+When the second assignment gives @code{foo} a string value, the fact that
+it previously had a numeric value is forgotten.
+
+String values that do not begin with a digit have a numeric value of
+zero. After executing this code, the value of @code{foo} is five:
+
+@example
+foo = "a string"
+foo = foo + 5
+@end example
+
+@noindent
+(Note that using a variable as a number and then later as a string can
+be confusing and is poor programming style. The above examples illustrate how
+@code{awk} works, @emph{not} how you should write your own programs!)
+
+An assignment is an expression, so it has a value: the same value that
+is assigned. Thus, @samp{z = 1} as an expression has the value one.
+One consequence of this is that you can write multiple assignments together:
+
+@example
+x = y = z = 0
+@end example
+
+@noindent
+stores the value zero in all three variables. It does this because the
+value of @samp{z = 0}, which is zero, is stored into @code{y}, and then
+the value of @samp{y = z = 0}, which is zero, is stored into @code{x}.
+
+You can use an assignment anywhere an expression is called for. For
+example, it is valid to write @samp{x != (y = 1)} to set @code{y} to one
+and then test whether @code{x} differs from one. But this style tends to make
+programs hard to read; except in a one-shot program, you should
+not use such nesting of assignments.
+
+Aside from @samp{=}, there are several other assignment operators that
+do arithmetic with the old value of the variable. For example, the
+operator @samp{+=} computes a new value by adding the right-hand value
+to the old value of the variable. Thus, the following assignment adds
+five to the value of @code{foo}:
+
+@example
+foo += 5
+@end example
+
+@noindent
+This is equivalent to the following:
+
+@example
+foo = foo + 5
+@end example
+
+@noindent
+Use whichever one makes the meaning of your program clearer.
+
+There are situations where using @samp{+=} (or any assignment operator)
+is @emph{not} the same as simply repeating the left-hand operand in the
+right-hand expression. For example:
+
+@cindex Rankin, Pat
+@example
+@group
+# Thanks to Pat Rankin for this example
+BEGIN @{
+ foo[rand()] += 5
+ for (x in foo)
+ print x, foo[x]
+
+ bar[rand()] = bar[rand()] + 5
+ for (x in bar)
+ print x, bar[x]
+@}
+@end group
+@end example
+
+@noindent
+The indices of @code{bar} are practically guaranteed to be different, because
+@code{rand} returns a different value each time it is called.
+(Arrays and the @code{rand} function haven't been covered yet.
+@xref{Arrays, ,Arrays in @code{awk}},
+and see @ref{Numeric Functions, ,Numeric Built-in Functions}, for more information).
+This example illustrates an important fact about the assignment
+operators: the left-hand expression is only evaluated @emph{once}.
+
+It is also up to the implementation as to which expression is evaluated
+first, the left-hand one or the right-hand one.
+Consider this example:
+
+@example
+i = 1
+a[i += 2] = i + 1
+@end example
+
+@noindent
+The value of @code{a[3]} could be either two or four.
+
+Here is a table of the arithmetic assignment operators. In each
+case, the right-hand operand is an expression whose value is converted
+to a number.
+
+@c @cartouche
+@table @code
+@item @var{lvalue} += @var{increment}
+Adds @var{increment} to the value of @var{lvalue} to make the new value
+of @var{lvalue}.
+
+@item @var{lvalue} -= @var{decrement}
+Subtracts @var{decrement} from the value of @var{lvalue}.
+
+@item @var{lvalue} *= @var{coefficient}
+Multiplies the value of @var{lvalue} by @var{coefficient}.
+
+@item @var{lvalue} /= @var{divisor}
+Divides the value of @var{lvalue} by @var{divisor}.
+
+@item @var{lvalue} %= @var{modulus}
+Sets @var{lvalue} to its remainder by @var{modulus}.
+
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+@item @var{lvalue} ^= @var{power}
+@itemx @var{lvalue} **= @var{power}
+Raises @var{lvalue} to the power @var{power}.
+(Only the @samp{^=} operator is specified by POSIX.)
+@end table
+@c @end cartouche
+
+For maximum portability, do not use the @samp{**=} operator.
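+
+As a brief sketch of an arithmetic assignment operator in use, the following
+command totals every score in the @file{grades} file shown earlier
+(@pxref{Arithmetic Ops, ,Arithmetic Operators}) and prints the class average.
+The variable name @code{total} is chosen only for this illustration:
+
+@example
+$ awk '@{ total += $2 + $3 + $4 @}
+> END @{ print total / (3 * NR) @}' grades
+@print{} 84.1111
+@end example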
+
+@node Increment Ops, Truth Values, Assignment Ops, Expressions
+@section Increment and Decrement Operators
+
+@cindex increment operators
+@cindex operators, increment
+@dfn{Increment} and @dfn{decrement operators} increase or decrease the value of
+a variable by one. You could do the same thing with an assignment operator, so
+the increment operators add no power to the @code{awk} language; but they
+are convenient abbreviations for very common operations.
+
+The operator to add one is written @samp{++}. It can be used to increment
+a variable either before or after taking its value.
+
+To pre-increment a variable @var{v}, write @samp{++@var{v}}. This adds
+one to the value of @var{v} and that new value is also the value of this
+expression. The assignment expression @samp{@var{v} += 1} is completely
+equivalent.
+
+Writing the @samp{++} after the variable specifies post-increment. This
+increments the variable value just the same; the difference is that the
+value of the increment expression itself is the variable's @emph{old}
+value. Thus, if @code{foo} has the value four, then the expression @samp{foo++}
+has the value four, but it changes the value of @code{foo} to five.
+
+The post-increment @samp{foo++} is nearly equivalent to writing @samp{(foo
++= 1) - 1}. It is not perfectly equivalent because all numbers in
+@code{awk} are floating point: in floating point, @samp{foo + 1 - 1} does
+not necessarily equal @code{foo}. But the difference is minute as
+long as you stick to numbers that are fairly small (less than 10e12).
+
+Any lvalue can be incremented. Fields and array elements are incremented
+just like variables. (Use @samp{$(i++)} when you wish to do a field reference
+and a variable increment at the same time. The parentheses are necessary
+because of the precedence of the field reference operator, @samp{$}.)
+
+@cindex decrement operators
+@cindex operators, decrement
+The decrement operator @samp{--} works just like @samp{++} except that
+it subtracts one instead of adding. Like @samp{++}, it can be used before
+the lvalue to pre-decrement or after it to post-decrement.
+
+Here is a summary of increment and decrement expressions.
+
+@c @cartouche
+@table @code
+@item ++@var{lvalue}
+This expression increments @var{lvalue} and the new value becomes the
+value of the expression.
+
+@item @var{lvalue}++
+This expression increments @var{lvalue}, but
+the value of the expression is the @emph{old} value of @var{lvalue}.
+
+@item --@var{lvalue}
+Like @samp{++@var{lvalue}}, but instead of adding, it subtracts. It
+decrements @var{lvalue} and delivers the value that results.
+
+@item @var{lvalue}--
+Like @samp{@var{lvalue}++}, but instead of adding, it subtracts. It
+decrements @var{lvalue}. The value of the expression is the @emph{old}
+value of @var{lvalue}.
+@end table
+@c @end cartouche
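+
+The following sketch (with made-up variable names) shows the difference
+between the post-increment and pre-increment forms. Each increment is
+stored in a separate variable first, so that the output does not depend on
+the order in which @code{print} evaluates its arguments:
+
+@example
+$ awk 'BEGIN @{ foo = 4; x = foo++; print x, foo
+>              y = ++foo; print y, foo @}'
+@print{} 4 5
+@print{} 6 6
+@end example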
+
+@node Truth Values, Typing and Comparison, Increment Ops, Expressions
+@section True and False in @code{awk}
+@cindex truth values
+@cindex logical true
+@cindex logical false
+
+Many programming languages have a special representation for the concepts
+of ``true'' and ``false.'' Such languages usually use the special
+constants @code{true} and @code{false}, or perhaps their upper-case
+equivalents.
+
+@cindex null string
+@cindex empty string
+@code{awk} is different. It borrows a very simple concept of true and
+false from C. In @code{awk}, any non-zero numeric value, @emph{or} any
+non-empty string value is true. Any other value (zero or the null
+string, @code{""}) is false. The following program will print @samp{A strange
+truth value} three times:
+
+@example
+BEGIN @{
+ if (3.1415927)
+ print "A strange truth value"
+ if ("Four Score And Seven Years Ago")
+ print "A strange truth value"
+ if (j = 57)
+ print "A strange truth value"
+@}
+@end example
+
+@cindex dark corner
+There is a surprising consequence of the ``non-zero or non-null'' rule:
+The string constant @code{"0"} is actually true, since it is non-null (d.c.).
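+
+Here is a short sketch that makes these rules concrete. The variable names
+are invented for the example, and it uses the conditional expression, which
+is described later (@pxref{Conditional Exp, ,Conditional Expressions}):
+
+@example
+$ awk 'BEGIN @{ a = 0   ? "true" : "false"
+>              b = "0" ? "true" : "false"
+>              c = ""  ? "true" : "false"
+>              print a, b, c @}'
+@print{} false true false
+@end example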
+
+@node Typing and Comparison, Boolean Ops, Truth Values, Expressions
+@section Variable Typing and Comparison Expressions
+@cindex comparison expressions
+@cindex expression, comparison
+@cindex expression, matching
+@cindex relational operators
+@cindex operators, relational
+@cindex regexp match/non-match operators
+@cindex variable typing
+@cindex types of variables
+
+@c 2e: consider splitting this section into subsections
+
+Unlike many other programming languages, @code{awk} variables do not have a
+fixed type. Instead, they can be either a number or a string, depending
+upon the value that is assigned to them.
+
+@cindex numeric string
+The 1992 POSIX standard introduced
+the concept of a @dfn{numeric string}, which is simply a string that looks
+like a number, for example, @code{@w{" +2"}}. This concept is used
+for determining the type of a variable.
+
+The type of the variable is important, since the types of two variables
+determine how they are compared.
+
+In @code{gawk}, variable typing follows these rules.
+
+@enumerate 1
+@item
+A numeric literal or the result of a numeric operation has the @var{numeric}
+attribute.
+
+@item
+A string literal or the result of a string operation has the @var{string}
+attribute.
+
+@item
+Fields, @code{getline} input, @code{FILENAME}, @code{ARGV} elements,
+@code{ENVIRON} elements and the
+elements of an array created by @code{split} that are numeric strings
+have the @var{strnum} attribute. Otherwise, they have the @var{string}
+attribute.
+Uninitialized variables also have the @var{strnum} attribute.
+
+@item
+Attributes propagate across assignments, but are not changed by
+any use.
+@c (Although a use may cause the entity to acquire an additional
+@c value such that it has both a numeric and string value -- this leaves the
+@c attribute unchanged.)
+@c This is important but not relevant
+@end enumerate
+
+The last rule is particularly important. In the following program,
+@code{a} has numeric type, even though it is later used in a string
+operation.
+
+@example
+BEGIN @{
+ a = 12.345
+ b = a " is a cute number"
+ print b
+@}
+@end example
+
+When two operands are compared, either string comparison or numeric comparison
+may be used, depending on the attributes of the operands, according to the
+following, symmetric, matrix:
+
+@c thanks to Karl Berry, kb@cs.umb.edu, for major help with TeX tables
+@tex
+\centerline{
+\vbox{\bigskip % space above the table (about 1 linespace)
+% Because we have vertical rules, we can't let TeX insert interline space
+% in its usual way.
+\offinterlineskip
+%
+% Define the table template. & separates columns, and \cr ends the
+% template (and each row). # is replaced by the text of that entry on
+% each row. The template for the first column breaks down like this:
+% \strut -- a way to make each line have the height and depth
+% of a normal line of type, since we turned off interline spacing.
+% \hfil -- infinite glue; has the effect of right-justifying in this case.
+% # -- replaced by the text (for instance, `STRNUM', in the last row).
+% \quad -- about the width of an `M'. Just separates the columns.
+%
+% The second column (\vrule#) is what generates the vertical rule that
+% spans table rows.
+%
+% The doubled && before the next entry means `repeat the following
+% template as many times as necessary on each line' -- in our case, twice.
+%
+% The template itself, \quad#\hfil, left-justifies with a little space before.
+%
+\halign{\strut\hfil#\quad&\vrule#&&\quad#\hfil\cr
+ &&STRING &NUMERIC &STRNUM\cr
+% The \omit tells TeX to skip inserting the template for this column on
+% this particular row. In this case, we only want a little extra space
+% to separate the heading row from the rule below it. the depth 2pt --
+% `\vrule depth 2pt' is that little space.
+\omit &depth 2pt\cr
+% This is the horizontal rule below the heading. Since it has nothing to
+% do with the columns of the table, we use \noalign to get it in there.
+\noalign{\hrule}
+% Like above, this time a little more space.
+\omit &depth 4pt\cr
+% The remaining rows have nothing special about them.
+STRING &&string &string &string\cr
+NUMERIC &&string &numeric &numeric\cr
+STRNUM &&string &numeric &numeric\cr
+}}}
+@end tex
+@ifinfo
+@display
+ +----------------------------------------------
+ | STRING NUMERIC STRNUM
+--------+----------------------------------------------
+ |
+STRING | string string string
+ |
+NUMERIC | string numeric numeric
+ |
+STRNUM | string numeric numeric
+--------+----------------------------------------------
+@end display
+@end ifinfo
+
+The basic idea is that user input that looks numeric, and @emph{only}
+user input, should be treated as numeric, even though it is actually
+made of characters, and is therefore also a string.
+
+@dfn{Comparison expressions} compare strings or numbers for
+relationships such as equality. They are written using @dfn{relational
+operators}, which are a superset of those in C. Here is a table of
+them:
+
+@cindex relational operators
+@cindex operators, relational
+@cindex @code{<} operator
+@cindex @code{<=} operator
+@cindex @code{>} operator
+@cindex @code{>=} operator
+@cindex @code{==} operator
+@cindex @code{!=} operator
+@cindex @code{~} operator
+@cindex @code{!~} operator
+@cindex @code{in} operator
+@c @cartouche
+@table @code
+@item @var{x} < @var{y}
+True if @var{x} is less than @var{y}.
+
+@item @var{x} <= @var{y}
+True if @var{x} is less than or equal to @var{y}.
+
+@item @var{x} > @var{y}
+True if @var{x} is greater than @var{y}.
+
+@item @var{x} >= @var{y}
+True if @var{x} is greater than or equal to @var{y}.
+
+@item @var{x} == @var{y}
+True if @var{x} is equal to @var{y}.
+
+@item @var{x} != @var{y}
+True if @var{x} is not equal to @var{y}.
+
+@item @var{x} ~ @var{y}
+True if the string @var{x} matches the regexp denoted by @var{y}.
+
+@item @var{x} !~ @var{y}
+True if the string @var{x} does not match the regexp denoted by @var{y}.
+
+@item @var{subscript} in @var{array}
+True if the array @var{array} has an element with the subscript @var{subscript}.
+@end table
+@c @end cartouche
+
+Comparison expressions have the value one if true and zero if false.
+
+When comparing operands of mixed types, numeric operands are converted
+to strings using the value of @code{CONVFMT}
+(@pxref{Conversion, ,Conversion of Strings and Numbers}).
+
+Strings are compared
+by comparing the first character of each, then the second character of each,
+and so on. Thus @code{"10"} is less than @code{"9"}. If there are two
+strings where one is a prefix of the other, the shorter string is less than
+the longer one. Thus @code{"abc"} is less than @code{"abcd"}.
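+
+The following sketch contrasts the two kinds of comparison (the variable
+names @code{n} and @code{s} are invented for the illustration). The fields
+come from user input and look numeric, so they are compared as numbers,
+while the two string constants are compared character by character:
+
+@example
+$ echo 10 9 | awk '@{ n = ($1 < $2); s = ("10" < "9"); print n, s @}'
+@print{} 0 1
+@end example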
+
+@cindex common mistakes
+@cindex mistakes, common
+@cindex errors, common
+It is very easy to accidentally mistype the @samp{==} operator, and
+leave off one of the @samp{=}s. The result is still valid @code{awk}
+code, but the program will not do what you mean:
+
+@example
+if (a = b) # oops! should be a == b
+ @dots{}
+else
+ @dots{}
+@end example
+
+@noindent
+Unless @code{b} happens to be zero or the null string, the @code{if}
+part of the test will always succeed. Because the operators are
+so similar, this kind of error is very difficult to spot when
+scanning the source code.
+
+Here are some sample expressions, how @code{gawk} compares them, and what
+the result of the comparison is.
+
+@table @code
+@item 1.5 <= 2.0
+numeric comparison (true)
+
+@item "abc" >= "xyz"
+string comparison (false)
+
+@item 1.5 != " +2"
+string comparison (true)
+
+@item "1e2" < "3"
+string comparison (true)
+
+@item a = 2; b = "2"
+@itemx a == b
+string comparison (true)
+
+@item a = 2; b = " +2"
+@itemx a == b
+string comparison (false)
+@end table
+
+In this example,
+
+@example
+@group
+$ echo 1e2 3 | awk '@{ print ($1 < $2) ? "true" : "false" @}'
+@print{} false
+@end group
+@end example
+
+@noindent
+the result is @samp{false} since both @code{$1} and @code{$2} are numeric
+strings and thus both have the @var{strnum} attribute,
+dictating a numeric comparison.
+
+The purpose of the comparison rules and the use of numeric strings is
+to attempt to produce the behavior that is ``least surprising,'' while
+still ``doing the right thing.''
+
+@cindex comparisons, string vs. regexp
+@cindex string comparison vs. regexp comparison
+@cindex regexp comparison vs. string comparison
+String comparisons and regular expression comparisons are very different.
+For example,
+
+@example
+x == "foo"
+@end example
+
+@noindent
+has the value of one, or is true, if the variable @code{x}
+is precisely @samp{foo}. By contrast,
+
+@example
+x ~ /foo/
+@end example
+
+@noindent
+has the value one if @code{x} contains @samp{foo}, such as
+@code{"Oh, what a fool am I!"}.
+
+The right-hand operand of the @samp{~} and @samp{!~} operators may be
+either a regexp constant (@code{/@dots{}/}), or an ordinary
+expression, in which case the value of the expression as a string is used as a
+dynamic regexp (@pxref{Regexp Usage, ,How to Use Regular Expressions}; also
+@pxref{Computed Regexps, ,Using Dynamic Regexps}).
+
+@cindex regexp as expression
+In recent implementations of @code{awk}, a constant regular
+expression in slashes by itself is also an expression. The regexp
+@code{/@var{regexp}/} is an abbreviation for this comparison expression:
+
+@example
+$0 ~ /@var{regexp}/
+@end example
+
+One special place where @code{/foo/} is @emph{not} an abbreviation for
+@samp{$0 ~ /foo/} is when it is the right-hand operand of @samp{~} or
+@samp{!~}!
+@xref{Using Constant Regexps, ,Using Regular Expression Constants},
+where this is discussed in more detail.
+
+@c This paragraph has been here since day 1, and has always bothered
+@c me, especially since the expression doesn't really make a lot of
+@c sense. So, just take it out.
+@ignore
+In some contexts it may be necessary to write parentheses around the
+regexp to avoid confusing the @code{gawk} parser. For example,
+@samp{(/x/ - /y/) > threshold} is not allowed, but @samp{((/x/) - (/y/))
+> threshold} parses properly.
+@end ignore
+
+@node Boolean Ops, Conditional Exp, Typing and Comparison, Expressions
+@section Boolean Expressions
+@cindex expression, boolean
+@cindex boolean expressions
+@cindex operators, boolean
+@cindex boolean operators
+@cindex logical operations
+@cindex operations, logical
+@cindex short-circuit operators
+@cindex operators, short-circuit
+@cindex and operator
+@cindex or operator
+@cindex not operator
+@cindex @code{&&} operator
+@cindex @code{||} operator
+@cindex @code{!} operator
+
+A @dfn{boolean expression} is a combination of comparison expressions or
+matching expressions, using the boolean operators ``or''
+(@samp{||}), ``and'' (@samp{&&}), and ``not'' (@samp{!}), along with
+parentheses to control nesting. The truth value of the boolean expression is
+computed by combining the truth values of the component expressions.
+Boolean expressions are also referred to as @dfn{logical expressions}.
+The terms are equivalent.
+
+Boolean expressions can be used wherever comparison and matching
+expressions can be used. They can be used in @code{if}, @code{while},
+@code{do} and @code{for} statements
+(@pxref{Statements, ,Control Statements in Actions}).
+They have numeric values (one if true, zero if false), which come into play
+if the result of the boolean expression is stored in a variable, or
+used in arithmetic.
+
+In addition, every boolean expression is also a valid pattern, so
+you can use one as a pattern to control the execution of rules.
+
+Here are descriptions of the three boolean operators, with examples.
+
+@c @cartouche
+@table @code
+@item @var{boolean1} && @var{boolean2}
+True if both @var{boolean1} and @var{boolean2} are true. For example,
+the following statement prints the current input record if it contains
+both @samp{2400} and @samp{foo}.
+
+@example
+if ($0 ~ /2400/ && $0 ~ /foo/) print
+@end example
+
+The subexpression @var{boolean2} is evaluated only if @var{boolean1}
+is true. This can make a difference when @var{boolean2} contains
+expressions that have side effects: in the case of @samp{$0 ~ /foo/ &&
+($2 == bar++)}, the variable @code{bar} is not incremented if there is
+no @samp{foo} in the record.
+
+@item @var{boolean1} || @var{boolean2}
+True if at least one of @var{boolean1} or @var{boolean2} is true.
+For example, the following statement prints all records in the input
+that contain @emph{either} @samp{2400} or
+@samp{foo}, or both.
+
+@example
+if ($0 ~ /2400/ || $0 ~ /foo/) print
+@end example
+
+The subexpression @var{boolean2} is evaluated only if @var{boolean1}
+is false. This can make a difference when @var{boolean2} contains
+expressions that have side effects.
+
+@item ! @var{boolean}
+True if @var{boolean} is false. For example, the following program prints
+all records in the input file @file{BBS-list} that do @emph{not} contain the
+string @samp{foo}.
+
+@c A better example would be `if (! (subscript in array)) ...' but we
+@c haven't done anything with arrays or `in' yet. Sigh.
+@example
+awk '@{ if (! ($0 ~ /foo/)) print @}' BBS-list
+@end example
+@end table
+@c @end cartouche
+
+The @samp{&&} and @samp{||} operators are called @dfn{short-circuit}
+operators because of the way they work. Evaluation of the full expression
+is ``short-circuited'' if the result can be determined part way through
+its evaluation.
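+
+A minimal sketch of short-circuit evaluation follows; the variable names
+are made up for the example. Neither increment is ever executed, because in
+both cases the left-hand operand alone decides the result:
+
+@example
+$ awk 'BEGIN @{ count = 0
+>              a = (0 && count++)
+>              b = (1 || count++)
+>              print a, b, count @}'
+@print{} 0 1 0
+@end example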
+
+@cindex line continuation
+You can continue a statement that uses @samp{&&} or @samp{||} simply
+by putting a newline after them. But you cannot put a newline in front
+of either of these operators without using backslash continuation
+(@pxref{Statements/Lines, ,@code{awk} Statements Versus Lines}).
+
+The actual value of an expression using the @samp{!} operator will be
+either one or zero, depending upon the truth value of the expression it
+is applied to.
+
+The @samp{!} operator is often useful for changing the sense of a flag
+variable from false to true and back again. For example, the following
+program is one way to print lines in between special bracketing lines:
+
+@example
+$1 == "START" @{ interested = ! interested @}
+interested == 1 @{ print @}
+$1 == "END" @{ interested = ! interested @}
+@end example
+
+@noindent
+The variable @code{interested}, like all @code{awk} variables, starts
+out initialized to zero, which is also false. When a line is seen whose
+first field is @samp{START}, the value of @code{interested} is toggled
+to true, using @samp{!}. The next rule prints lines as long as
+@code{interested} is true. When a line is seen whose first field is
+@samp{END}, @code{interested} is toggled back to false.
+@ignore
+We should discuss using `next' in the two rules that toggle the
+variable, to avoid printing the bracketing lines, but that's more
+distraction than really needed.
+@end ignore
+
+@node Conditional Exp, Function Calls, Boolean Ops, Expressions
+@section Conditional Expressions
+@cindex conditional expression
+@cindex expression, conditional
+
+A @dfn{conditional expression} is a special kind of expression with
+three operands. It allows you to use one expression's value to select
+one of two other expressions.
+
+The conditional expression is the same as in the C language:
+
+@example
+@var{selector} ? @var{if-true-exp} : @var{if-false-exp}
+@end example
+
+@noindent
+There are three subexpressions. The first, @var{selector}, is always
+computed first. If it is ``true'' (not zero and not null) then
+@var{if-true-exp} is computed next and its value becomes the value of
+the whole expression. Otherwise, @var{if-false-exp} is computed next
+and its value becomes the value of the whole expression.
+
+For example, this expression produces the absolute value of @code{x}:
+
+@example
+x > 0 ? x : -x
+@end example
+
+Each time the conditional expression is computed, exactly one of
+@var{if-true-exp} and @var{if-false-exp} is computed; the other is ignored.
+This is important when the expressions contain side effects. For example,
+this conditional expression examines element @code{i} of either array
+@code{a} or array @code{b}, and increments @code{i}.
+
+@example
+x == y ? a[i++] : b[i++]
+@end example
+
+@noindent
+This is guaranteed to increment @code{i} exactly once, because each time
+only one of the two increment expressions is executed,
+and the other is not.
+@xref{Arrays, ,Arrays in @code{awk}},
+for more information about arrays.
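+
+As a further sketch, this command uses a conditional expression to pick the
+higher of the first two test scores for each student in the @file{grades}
+file shown earlier (@pxref{Arithmetic Ops, ,Arithmetic Operators}). The
+variable name @code{best} is chosen only for this illustration:
+
+@example
+$ awk '@{ best = ($2 > $3) ? $2 : $3; print $1, best @}' grades
+@print{} Pat 100
+@print{} Sandy 84
+@print{} Chris 92
+@end example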
+
+@cindex differences between @code{gawk} and @code{awk}
+@cindex line continuation
+As a minor @code{gawk} extension,
+you can continue a statement that uses @samp{?:} simply
+by putting a newline after either character.
+However, you cannot put a newline in front
+of either character without using backslash continuation
+(@pxref{Statements/Lines, ,@code{awk} Statements Versus Lines}).
+
+@node Function Calls, Precedence, Conditional Exp, Expressions
+@section Function Calls
+@cindex function call
+@cindex calling a function
+
+A @dfn{function} is a name for a particular calculation. Because it has
+a name, you can ask for it by name at any point in the program. For
+example, the function @code{sqrt} computes the square root of a number.
+
+A fixed set of functions are @dfn{built-in}, which means they are
+available in every @code{awk} program. The @code{sqrt} function is one
+of these. @xref{Built-in, ,Built-in Functions}, for a list of built-in
+functions and their descriptions. In addition, you can define your own
+functions for use in your program.
+@xref{User-defined, ,User-defined Functions}, for how to do this.
+
+@cindex arguments in function call
+The way to use a function is with a @dfn{function call} expression,
+which consists of the function name followed immediately by a list of
+@dfn{arguments} in parentheses. The arguments are expressions which
+provide the raw materials for the function's calculations.
+When there is more than one argument, they are separated by commas. If
+there are no arguments, write just @samp{()} after the function name.
+Here are some examples:
+
+@example
+sqrt(x^2 + y^2) @i{one argument}
+atan2(y, x) @i{two arguments}
+rand() @i{no arguments}
+@end example
+
+@strong{Do not put any space between the function name and the
+open-parenthesis!} A user-defined function name looks just like the name of
+a variable, and space would make the expression look like concatenation
+of a variable with an expression inside parentheses. Space before the
+parenthesis is harmless with built-in functions, but to avoid mistakes
+with user-defined functions, it is best not to get into the habit of
+using space.
+
+Each function expects a particular number of arguments. For example, the
+@code{sqrt} function must be called with a single argument, the number
+to take the square root of:
+
+@example
+sqrt(@var{argument})
+@end example
+
+Some of the built-in functions allow you to omit the final argument.
+If you do so, they use a reasonable default.
+@xref{Built-in, ,Built-in Functions}, for full details. If arguments
+are omitted in calls to user-defined functions, then those arguments are
+treated as local variables, initialized to the empty string
+(@pxref{User-defined, ,User-defined Functions}).
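+
+As a small illustration (the function @code{show} and its parameter names
+are invented for this sketch), the following program calls a two-parameter
+function with only one argument. The omitted parameter @code{b} behaves
+like a local variable whose value is the empty string:
+
+@example
+@group
+$ awk 'function show(a, b) @{ print "a=<" a ">", "b=<" b ">" @}
+> BEGIN @{ show("x") @}'
+@print{} a=<x> b=<>
+@end group
+@end example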
+
+Like every other expression, the function call has a value, which is
+computed by the function based on the arguments you give it. In this
+example, the value of @samp{sqrt(@var{argument})} is the square root of
+@var{argument}. A function can also have side effects, such as assigning
+values to certain variables or doing I/O.
+
+Here is a command to read numbers, one number per line, and print the
+square root of each one:
+
+@example
+@group
+$ awk '@{ print "The square root of", $1, "is", sqrt($1) @}'
+1
+@print{} The square root of 1 is 1
+3
+@print{} The square root of 3 is 1.73205
+5
+@print{} The square root of 5 is 2.23607
+@kbd{Control-d}
+@end group
+@end example
+
+@node Precedence, , Function Calls, Expressions
+@section Operator Precedence (How Operators Nest)
+@cindex precedence
+@cindex operator precedence
+
+@dfn{Operator precedence} determines how operators are grouped, when
+different operators appear close by in one expression. For example,
+@samp{*} has higher precedence than @samp{+}; thus, @samp{a + b * c}
+means to multiply @code{b} and @code{c}, and then add @code{a} to the
+product (i.e.@: @samp{a + (b * c)}).
+
+You can overrule the precedence of the operators by using parentheses.
+You can think of the precedence rules as saying where the
+parentheses are assumed to be if you do not write parentheses yourself. In
+fact, it is wise to always use parentheses whenever you have an unusual
+combination of operators, because other people who read the program may
+not remember what the precedence is in this case. You might forget,
+too; then you could make a mistake. Explicit parentheses will help prevent
+any such mistake.
+
+When operators of equal precedence are used together, the leftmost
+operator groups first, except for the assignment, conditional and
+exponentiation operators, which group in the opposite order.
+Thus, @samp{a - b + c} groups as @samp{(a - b) + c}, and
+@samp{a = b = c} groups as @samp{a = (b = c)}.
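+
+A quick check of both grouping directions, using arbitrary sample numbers
+chosen only for illustration:
+
+@example
+@group
+$ awk 'BEGIN @{ print 10 - 4 + 3, 2 ^ 3 ^ 2 @}'
+@print{} 9 512
+@end group
+@end example
+
+@noindent
+The subtraction groups as @samp{(10 - 4) + 3}, while the exponentiation
+groups as @samp{2 ^ (3 ^ 2)}, that is, two raised to the ninth power.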
+
+The precedence of prefix unary operators does not matter as long as only
+unary operators are involved, because there is only one way to interpret
+them---innermost first. Thus, @samp{$++i} means @samp{$(++i)} and
+@samp{++$x} means @samp{++($x)}. However, when another operator follows
+the operand, then the precedence of the unary operators can matter.
+Thus, @samp{$x^2} means @samp{($x)^2}, but @samp{-x^2} means
+@samp{-(x^2)}, because @samp{-} has lower precedence than @samp{^}
+while @samp{$} has higher precedence.
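+
+The difference can be seen in a short sketch (the input record and the
+value of @code{x} are arbitrary, chosen only for this illustration):
+
+@example
+@group
+$ echo 7 8 9 | awk '@{ x = 3; print $x^2, -x^2 @}'
+@print{} 81 -9
+@end group
+@end example
+
+@noindent
+Here @samp{$x^2} squares the third field, while @samp{-x^2} negates the
+square of @code{x}.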
+
+Here is a table of @code{awk}'s operators, in order from highest
+precedence to lowest:
+
+@c use @code in the items, looks better in TeX w/o all the quotes
+@table @code
+@item (@dots{})
+Grouping.
+
+@item $
+Field.
+
+@item ++ --
+Increment, decrement.
+
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+@item ^ **
+Exponentiation. These operators group right-to-left.
+(The @samp{**} operator is not specified by POSIX.)
+
+@item + - !
+Unary plus, minus, logical ``not''.
+
+@item * / %
+Multiplication, division, modulus.
+
+@item + -
+Addition, subtraction.
+
+@item @r{Concatenation}
+No special token is used to indicate concatenation.
+The operands are simply written side by side.
+
+@item < <= == !=
+@itemx > >= >> |
+Relational, and redirection.
+The relational operators and the redirections have the same precedence
+level. Characters such as @samp{>} serve both as relationals and as
+redirections; the context distinguishes between the two meanings.
+
+Note that the I/O redirection operators in @code{print} and @code{printf}
+statements belong to the statement level, not to expressions. The
+redirection does not produce an expression which could be the operand of
+another operator. As a result, it does not make sense to use a
+redirection operator near another operator of lower precedence, without
+parentheses. Such combinations, for example @samp{print foo > a ? b : c},
+result in syntax errors.
+The correct way to write this statement is @samp{print foo > (a ? b : c)}.
+
+@item ~ !~
+Matching, non-matching.
+
+@item in
+Array membership.
+
+@item &&
+Logical ``and''.
+
+@item ||
+Logical ``or''.
+
+@item ?:
+Conditional. This operator groups right-to-left.
+
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+@item = += -= *=
+@itemx /= %= ^= **=
+Assignment. These operators group right-to-left.
+(The @samp{**=} operator is not specified by POSIX.)
+@end table
+
+@node Patterns and Actions, Statements, Expressions, Top
+@chapter Patterns and Actions
+@cindex pattern, definition of
+
+As you have already seen, each @code{awk} rule consists of
+a pattern with an associated action. This chapter describes how
+you build patterns and actions.
+
+@menu
+* Pattern Overview:: What goes into a pattern.
+* Action Overview:: What goes into an action.
+@end menu
+
+@node Pattern Overview, Action Overview, Patterns and Actions, Patterns and Actions
+@section Pattern Elements
+
+Patterns in @code{awk} control the execution of rules: a rule is
+executed when its pattern matches the current input record. This
+section explains how to write patterns.
+
+@menu
+* Kinds of Patterns:: A list of all kinds of patterns.
+* Regexp Patterns:: Using regexps as patterns.
+* Expression Patterns:: Any expression can be used as a pattern.
+* Ranges:: Pairs of patterns specify record ranges.
+* BEGIN/END:: Specifying initialization and cleanup rules.
+* Empty:: The empty pattern, which matches every record.
+@end menu
+
+@node Kinds of Patterns, Regexp Patterns, Pattern Overview, Pattern Overview
+@subsection Kinds of Patterns
+@cindex patterns, types of
+
+Here is a summary of the types of patterns supported in @code{awk}.
+
+@table @code
+@item /@var{regular expression}/
+A regular expression as a pattern. It matches when the text of the
+input record fits the regular expression.
+(@xref{Regexp, ,Regular Expressions}.)
+
+@item @var{expression}
+A single expression. It matches when its value
+is non-zero (if a number) or non-null (if a string).
+(@xref{Expression Patterns, ,Expressions as Patterns}.)
+
+@item @var{pat1}, @var{pat2}
+A pair of patterns separated by a comma, specifying a range of records.
+The range includes both the initial record that matches @var{pat1}, and
+the final record that matches @var{pat2}.
+(@xref{Ranges, ,Specifying Record Ranges with Patterns}.)
+
+@item BEGIN
+@itemx END
+Special patterns for you to supply start-up or clean-up actions for your
+@code{awk} program.
+(@xref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}.)
+
+@item @var{empty}
+The empty pattern matches every input record.
+(@xref{Empty, ,The Empty Pattern}.)
+@end table
+
+@node Regexp Patterns, Expression Patterns, Kinds of Patterns, Pattern Overview
+@subsection Regular Expressions as Patterns
+
+We have been using regular expressions as patterns since our early examples.
+This kind of pattern is simply a regexp constant in the pattern part of
+a rule. Its meaning is @samp{$0 ~ /@var{pattern}/}.
+The pattern matches when the input record matches the regexp.
+For example:
+
+@example
+/foo|bar|baz/ @{ buzzwords++ @}
+END @{ print buzzwords, "buzzwords seen" @}
+@end example
+
+@node Expression Patterns, Ranges, Regexp Patterns, Pattern Overview
+@subsection Expressions as Patterns
+
+Any @code{awk} expression is valid as an @code{awk} pattern.
+Then the pattern matches if the expression's value is non-zero (if a
+number) or non-null (if a string).
+
+The expression is reevaluated each time the rule is tested against a new
+input record. If the expression uses fields such as @code{$1}, the
+value depends directly on the new input record's text; otherwise, it
+depends only on what has happened so far in the execution of the
+@code{awk} program, but that may still be useful.
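+
+As a sketch of a pattern that does not look at the record's text at all
+(the variable name @code{count} and the three-line input are made up for
+this illustration), the following command prints only the first two
+records it reads:
+
+@example
+@group
+$ printf '%s\n' a b c | awk 'count++ < 2'
+@print{} a
+@print{} b
+@end group
+@end example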
+
+A very common kind of expression used as a pattern is the comparison
+expression, using the comparison operators described in
+@ref{Typing and Comparison, ,Variable Typing and Comparison Expressions}.
+
+Regexp matching and non-matching are also very common expressions.
+The left operand of the @samp{~} and @samp{!~} operators is a string.
+The right operand is either a constant regular expression enclosed in
+slashes (@code{/@var{regexp}/}), or any expression, whose string value
+is used as a dynamic regular expression
+(@pxref{Computed Regexps, , Using Dynamic Regexps}).
+
+The following example prints the second field of each input record
+whose first field is precisely @samp{foo}.
+
+@example
+$ awk '$1 == "foo" @{ print $2 @}' BBS-list
+@end example
+
+@noindent
+(There is no output, since there is no BBS site named ``foo''.)
+Contrast this with the following regular expression match, which would
+accept any record with a first field that contains @samp{foo}:
+
+@example
+@group
+$ awk '$1 ~ /foo/ @{ print $2 @}' BBS-list
+@print{} 555-1234
+@print{} 555-6699
+@print{} 555-6480
+@print{} 555-2127
+@end group
+@end example
+
+Boolean expressions are also commonly used as patterns.
+Whether the pattern
+matches an input record depends on whether its subexpressions match.
+
+For example, the following command prints all records in
+@file{BBS-list} that contain both @samp{2400} and @samp{foo}.
+
+@example
+$ awk '/2400/ && /foo/' BBS-list
+@print{} fooey 555-1234 2400/1200/300 B
+@end example
+
+The following command prints all records in
+@file{BBS-list} that contain @emph{either} @samp{2400} or @samp{foo}, or
+both.
+
+@example
+@group
+$ awk '/2400/ || /foo/' BBS-list
+@print{} alpo-net 555-3412 2400/1200/300 A
+@print{} bites 555-1675 2400/1200/300 A
+@print{} fooey 555-1234 2400/1200/300 B
+@print{} foot 555-6699 1200/300 B
+@print{} macfoo 555-6480 1200/300 A
+@print{} sdace 555-3430 2400/1200/300 A
+@print{} sabafoo 555-2127 1200/300 C
+@end group
+@end example
+
+The following command prints all records in
+@file{BBS-list} that do @emph{not} contain the string @samp{foo}.
+
+@example
+@group
+$ awk '! /foo/' BBS-list
+@print{} aardvark 555-5553 1200/300 B
+@print{} alpo-net 555-3412 2400/1200/300 A
+@print{} barfly 555-7685 1200/300 A
+@print{} bites 555-1675 2400/1200/300 A
+@print{} camelot 555-0542 300 C
+@print{} core 555-2912 1200/300 C
+@print{} sdace 555-3430 2400/1200/300 A
+@end group
+@end example
+
+The subexpressions of a boolean operator in a pattern can be constant regular
+expressions, comparisons, or any other @code{awk} expressions. Range
+patterns are not expressions, so they cannot appear inside boolean
+patterns. Likewise, the special patterns @code{BEGIN} and @code{END},
+which never match any input record, are not expressions and cannot
+appear inside boolean patterns.
+
+A regexp constant as a pattern is also a special case of an expression
+pattern. @code{/foo/} as an expression has the value one if @samp{foo}
+appears in the current input record; thus, as a pattern, @code{/foo/}
+matches any record containing @samp{foo}.
+
+@node Ranges, BEGIN/END, Expression Patterns, Pattern Overview
+@subsection Specifying Record Ranges with Patterns
+
+@cindex range pattern
+@cindex pattern, range
+@cindex matching ranges of lines
+A @dfn{range pattern} is made of two patterns separated by a comma, of
+the form @samp{@var{begpat}, @var{endpat}}. It matches ranges of
+consecutive input records. The first pattern, @var{begpat}, controls
+where the range begins, and the second one, @var{endpat}, controls where
+it ends. For example,
+
+@example
+awk '$1 == "on", $1 == "off"'
+@end example
+
+@noindent
+prints every record between @samp{on}/@samp{off} pairs, inclusive.
+
+A range pattern starts out by matching @var{begpat}
+against every input record; when a record matches @var{begpat}, the
+range pattern becomes @dfn{turned on}. The range pattern matches this
+record. As long as it stays turned on, it automatically matches every
+input record read. It also matches @var{endpat} against
+every input record; when that succeeds, the range pattern is turned
+off again for the following record. Then it goes back to checking
+@var{begpat} against each record.
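+
+To see this behavior with a small, made-up input (five one-word records
+supplied here by the shell's @code{printf} utility):
+
+@example
+@group
+$ printf '%s\n' x on y off z | awk '$1 == "on", $1 == "off"'
+@print{} on
+@print{} y
+@print{} off
+@end group
+@end example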
+
+The record that turns on the range pattern and the one that turns it
+off both match the range pattern. If you don't want to operate on
+these records, you can write @code{if} statements in the rule's action
+to distinguish them from the records you are interested in.
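+
+One possible sketch of this technique, assuming hypothetical marker lines
+consisting of just @samp{START} and @samp{END} that bracket the text of
+interest:
+
+@example
+@group
+/^START$/, /^END$/ @{
+    if ($0 != "START" && $0 != "END")
+        print
+@}
+@end group
+@end example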
+
+It is possible for a pattern to be turned both on and off by the same
+record, if the record satisfies both conditions. Then the action is
+executed for just that record.
+
+For example, suppose you have text between two identical markers (say
+the @samp{%} symbol) that you wish to ignore. You might try to
+combine a range pattern that describes the delimited text with the
+@code{next} statement
+(not discussed yet, @pxref{Next Statement, , The @code{next} Statement}),
+which causes @code{awk} to skip any further processing of the current
+record and start over again with the next input record. Such a program
+would look like this:
+
+@example
+/^%$/,/^%$/ @{ next @}
+ @{ print @}
+@end example
+
+@noindent
+@cindex skipping lines between markers
+This program fails because the range pattern is both turned on and turned off
+by the first line with just a @samp{%} on it. To accomplish this task, you
+must write the program this way, using a flag:
+
+@example
+/^%$/ @{ skip = ! skip; next @}
+skip == 1 @{ next @} # skip lines with `skip' set
+ @{ print @} # print all the other lines
+@end example
+
+Note that in a range pattern, the @samp{,} has the lowest precedence
+(is evaluated last) of all the operators. Thus, for example, the
+following program attempts to combine a range pattern with another,
+simpler test.
+
+@example
+echo Yes | awk '/1/,/2/ || /Yes/'
+@end example
+
+The author of this program intended it to mean @samp{(/1/,/2/) || /Yes/}.
+However, @code{awk} interprets this as @samp{/1/, (/2/ || /Yes/)}.
+This cannot be changed or worked around; range patterns do not combine
+with other patterns.
+
+@node BEGIN/END, Empty, Ranges, Pattern Overview
+@subsection The @code{BEGIN} and @code{END} Special Patterns
+
+@cindex @code{BEGIN} special pattern
+@cindex pattern, @code{BEGIN}
+@cindex @code{END} special pattern
+@cindex pattern, @code{END}
+@code{BEGIN} and @code{END} are special patterns. They are not used to
+match input records. Rather, they supply start-up or
+clean-up actions for your @code{awk} script.
+
+@menu
+* Using BEGIN/END:: How and why to use BEGIN/END rules.
+* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
+@end menu
+
+@node Using BEGIN/END, I/O And BEGIN/END, BEGIN/END, BEGIN/END
+@subsubsection Startup and Cleanup Actions
+
+A @code{BEGIN} rule is executed, once, before the first input record
+has been read. An @code{END} rule is executed, once, after all the
+input has been read. For example:
+
+@example
+@group
+$ awk '
+> BEGIN @{ print "Analysis of \"foo\"" @}
+> /foo/ @{ ++n @}
+> END @{ print "\"foo\" appears " n " times." @}' BBS-list
+@print{} Analysis of "foo"
+@print{} "foo" appears 4 times.
+@end group
+@end example
+
+This program finds the number of records in the input file @file{BBS-list}
+that contain the string @samp{foo}. The @code{BEGIN} rule prints a title
+for the report. There is no need to use the @code{BEGIN} rule to
+initialize the counter @code{n} to zero, as @code{awk} does this
+automatically (@pxref{Variables}).
+
+The second rule increments the variable @code{n} every time a
+record containing the pattern @samp{foo} is read. The @code{END} rule
+prints the value of @code{n} at the end of the run.
+
+The special patterns @code{BEGIN} and @code{END} cannot be used in ranges
+or with boolean operators (indeed, they cannot be used with any operators).
+
+An @code{awk} program may have multiple @code{BEGIN} and/or @code{END}
+rules. They are executed in the order they appear, all the @code{BEGIN}
+rules at start-up and all the @code{END} rules at termination.
+@code{BEGIN} and @code{END} rules may be intermixed with other rules.
+This feature was added in the 1987 version of @code{awk}, and is included
+in the POSIX standard. The original (1978) version of @code{awk}
+required you to put the @code{BEGIN} rule at the beginning of the
+program, and the @code{END} rule at the end, and only allowed one of
+each. This is no longer required, but it is a good idea in terms of
+program organization and readability.
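+
+A brief sketch of the ordering, with rules deliberately intermixed (the
+messages are arbitrary, and @file{/dev/null} merely supplies empty input):
+
+@example
+@group
+$ awk 'BEGIN @{ print "first" @}
+> END @{ print "last" @}
+> BEGIN @{ print "second" @}' /dev/null
+@print{} first
+@print{} second
+@print{} last
+@end group
+@end example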
+
+Multiple @code{BEGIN} and @code{END} rules are useful for writing
+library functions, since each library file can have its own @code{BEGIN} and/or
+@code{END} rule to do its own initialization and/or cleanup. Note that
+the order in which library functions are named on the command line
+controls the order in which their @code{BEGIN} and @code{END} rules are
+executed. Therefore you have to be careful to write such rules in
+library files so that the order in which they are executed doesn't matter.
+@xref{Options, ,Command Line Options}, for more information on
+using library functions.
+@xref{Library Functions, ,A Library of @code{awk} Functions},
+for a number of useful library functions.
+
+@cindex dark corner
+If an @code{awk} program only has a @code{BEGIN} rule, and no other
+rules, then the program exits after the @code{BEGIN} rule has been run.
+(The original version of @code{awk} used to keep reading and ignoring input
+until end of file was seen.) However, if an @code{END} rule exists,
+then the input will be read, even if there are no other rules in
+the program. This is necessary in case the @code{END} rule checks the
+@code{FNR} and @code{NR} variables (d.c.).
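+
+For instance, in this sketch (with a made-up three-line input) the program
+consists of a single @code{END} rule, yet all of the input is still read,
+so @code{NR} holds the number of records:
+
+@example
+@group
+$ printf '%s\n' a b c | awk 'END @{ print NR @}'
+@print{} 3
+@end group
+@end example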
+
+@code{BEGIN} and @code{END} rules must have actions; there is no default
+action for these rules since there is no current record when they run.
+
+@node I/O And BEGIN/END, , Using BEGIN/END, BEGIN/END
+@subsubsection Input/Output from @code{BEGIN} and @code{END} Rules
+
+@cindex I/O from @code{BEGIN} and @code{END}
+There are several (sometimes subtle) issues involved when doing I/O
+from a @code{BEGIN} or @code{END} rule.
+
+The first has to do with the value of @code{$0} in a @code{BEGIN}
+rule. Since @code{BEGIN} rules are executed before any input is read,
+there simply is no input record, and therefore no fields, when
+executing @code{BEGIN} rules. References to @code{$0} and the fields
+yield a null string or zero, depending upon the context. One way
+to give @code{$0} a real value is to execute a @code{getline} command
+without a variable (@pxref{Getline, ,Explicit Input with @code{getline}}).
+Another way is to simply assign a value to it.
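+
+A minimal sketch of the second approach (the string assigned here is
+arbitrary); once @code{$0} has a value, the fields and @code{NF} are
+available as usual:
+
+@example
+@group
+$ awk 'BEGIN @{ $0 = "a b c"; print NF, $2 @}'
+@print{} 3 b
+@end group
+@end example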
+
+@cindex differences between @code{gawk} and @code{awk}
+The second point is similar to the first, but from the other direction.
+Inside an @code{END} rule, what is the value of @code{$0} and @code{NF}?
+Traditionally, due largely to implementation issues, @code{$0} and
+@code{NF} were @emph{undefined} inside an @code{END} rule.
+The POSIX standard specified that @code{NF} was available in an @code{END}
+rule, containing the number of fields from the last input record.
+Due most probably to an oversight, the standard does not say that @code{$0}
+is also preserved, although logically one would think that it should be.
+In fact, @code{gawk} does preserve the value of @code{$0} for use in
+@code{END} rules. Be aware, however, that Unix @code{awk}, and possibly
+other implementations, do not.
+
+The third point follows from the first two. What is the meaning of
+@samp{print} inside a @code{BEGIN} or @code{END} rule? The meaning is
+the same as always, @samp{print $0}. If @code{$0} is the null string,
+then this prints an empty line. Many long-time @code{awk} programmers
+use @samp{print} in @code{BEGIN} and @code{END} rules, to mean
+@samp{@w{print ""}}, relying on @code{$0} being null. While you might
+generally get away with this in @code{BEGIN} rules, in @code{gawk} at
+least, it is a very bad idea in @code{END} rules. It is also poor
+style, since if you want an empty line in the output, you
+should say so explicitly in your program.
+
+@node Empty, , BEGIN/END, Pattern Overview
+@subsection The Empty Pattern
+
+@cindex empty pattern
+@cindex pattern, empty
+An empty (i.e.@: non-existent) pattern is considered to match @emph{every}
+input record. For example, the program:
+
+@example
+awk '@{ print $1 @}' BBS-list
+@end example
+
+@noindent
+prints the first field of every record.
+
+@node Action Overview, , Pattern Overview, Patterns and Actions
+@section Overview of Actions
+@cindex action, definition of
+@cindex curly braces
+@cindex action, curly braces
+@cindex action, separating statements
+
+An @code{awk} program or script consists of a series of
+rules and function definitions, interspersed. (Functions are
+described later. @xref{User-defined, ,User-defined Functions}.)
+
+A rule contains a pattern and an action, either of which (but not
+both) may be
+omitted. The purpose of the @dfn{action} is to tell @code{awk} what to do
+once a match for the pattern is found. Thus, in outline, an @code{awk}
+program generally looks like this:
+
+@example
+@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
+@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
+@dots{}
+function @var{name}(@var{args}) @{ @dots{} @}
+@dots{}
+@end example
+
+An action consists of one or more @code{awk} @dfn{statements}, enclosed
+in curly braces (@samp{@{} and @samp{@}}). Each statement specifies one
+thing to be done. The statements are separated by newlines or
+semicolons.
+
+The curly braces around an action must be used even if the action
+contains only one statement, or even if it contains no statements at
+all. However, if you omit the action entirely, omit the curly braces as
+well. An omitted action is equivalent to @samp{@{ print $0 @}}.
+
+@example
+/foo/ @{ @} # match foo, do nothing - empty action
+/foo/ # match foo, print the record - omitted action
+@end example
+
+Here are the kinds of statements supported in @code{awk}:
+
+@itemize @bullet
+@item
+Expressions, which can call functions or assign values to variables
+(@pxref{Expressions}). Executing
+this kind of statement simply computes the value of the expression.
+This is useful when the expression has side effects
+(@pxref{Assignment Ops, ,Assignment Expressions}).
+
+@item
+Control statements, which specify the control flow of @code{awk}
+programs. The @code{awk} language gives you C-like constructs
+(@code{if}, @code{for}, @code{while}, and @code{do}) as well as a few
+special ones (@pxref{Statements, ,Control Statements in Actions}).
+
+@item
+Compound statements, which consist of one or more statements enclosed in
+curly braces. A compound statement is used in order to put several
+statements together in the body of an @code{if}, @code{while}, @code{do}
+or @code{for} statement.
+
+@item
+Input statements, using the @code{getline} command
+(@pxref{Getline, ,Explicit Input with @code{getline}}), the @code{next}
+statement (@pxref{Next Statement, ,The @code{next} Statement}),
+and the @code{nextfile} statement
+(@pxref{Nextfile Statement, ,The @code{nextfile} Statement}).
+
+@item
+Output statements, @code{print} and @code{printf}.
+@xref{Printing, ,Printing Output}.
+
+@item
+Deletion statements, for deleting array elements.
+@xref{Delete, ,The @code{delete} Statement}.
+@end itemize
+
+@iftex
+The next chapter covers control statements in detail.
+@end iftex
+
+@node Statements, Built-in Variables, Patterns and Actions, Top
+@chapter Control Statements in Actions
+@cindex control statement
+
+@dfn{Control statements} such as @code{if}, @code{while}, and so on
+control the flow of execution in @code{awk} programs. Most of the
+control statements in @code{awk} are patterned on similar statements in
+C.
+
+All the control statements start with special keywords such as @code{if}
+and @code{while}, to distinguish them from simple expressions.
+
+@cindex compound statement
+@cindex statement, compound
+Many control statements contain other statements; for example, the
+@code{if} statement contains another statement which may or may not be
+executed. The contained statement is called the @dfn{body}. If you
+want to include more than one statement in the body, group them into a
+single @dfn{compound statement} with curly braces, separating them with
+newlines or semicolons.
+
+@menu
+* If Statement:: Conditionally execute some @code{awk}
+ statements.
+* While Statement:: Loop until some condition is satisfied.
+* Do Statement:: Do specified action while looping until some
+ condition is satisfied.
+* For Statement:: Another looping statement, that provides
+ initialization and increment clauses.
+* Break Statement:: Immediately exit the innermost enclosing loop.
+* Continue Statement:: Skip to the end of the innermost enclosing
+ loop.
+* Next Statement:: Stop processing the current input record.
+* Nextfile Statement:: Stop processing the current file.
+* Exit Statement:: Stop execution of @code{awk}.
+@end menu
+
+@node If Statement, While Statement, Statements, Statements
+@section The @code{if}-@code{else} Statement
+
+@cindex @code{if}-@code{else} statement
+The @code{if}-@code{else} statement is @code{awk}'s decision-making
+statement. It looks like this:
+
+@example
+if (@var{condition}) @var{then-body} @r{[}else @var{else-body}@r{]}
+@end example
+
+@noindent
+The @var{condition} is an expression that controls what the rest of the
+statement will do. If @var{condition} is true, @var{then-body} is
+executed; otherwise, @var{else-body} is executed.
+The @code{else} part of the statement is
+optional. The condition is considered false if its value is zero or
+the null string, and true otherwise.
+
+Here is an example:
+
+@example
+if (x % 2 == 0)
+ print "x is even"
+else
+ print "x is odd"
+@end example
+
+In this example, if the expression @samp{x % 2 == 0} is true (that is,
+the value of @code{x} is evenly divisible by two), then the first @code{print}
+statement is executed; otherwise, the second @code{print} statement is
+executed.
+
+If the @code{else} appears on the same line as @var{then-body}, and
+@var{then-body} is not a compound statement (i.e.@: not surrounded by
+curly braces), then a semicolon must separate @var{then-body} from
+@code{else}. To illustrate this, let's rewrite the previous example:
+
+@example
+if (x % 2 == 0) print "x is even"; else
+ print "x is odd"
+@end example
+
+@noindent
+If you forget the @samp{;}, @code{awk} won't be able to interpret the
+statement, and you will get a syntax error.
+
+We would not actually write this example this way, because a human
+reader might fail to see the @code{else} if it were not the first thing
+on its line.
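+
+For comparison, here is a sketch of the same statement with
+@var{then-body} written as a compound statement. Because of the curly
+braces, no semicolon is needed before the @code{else}:
+
+@example
+if (x % 2 == 0) @{ print "x is even" @} else
+    print "x is odd"
+@end example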
+
+@node While Statement, Do Statement, If Statement, Statements
+@section The @code{while} Statement
+@cindex @code{while} statement
+@cindex loop
+@cindex body of a loop
+
+In programming, a @dfn{loop} means a part of a program that can
+be executed two or more times in succession.
+
+The @code{while} statement is the simplest looping statement in
+@code{awk}. It repeatedly executes a statement as long as a condition is
+true. It looks like this:
+
+@example
+while (@var{condition})
+ @var{body}
+@end example
+
+@noindent
+Here @var{body} is a statement that we call the @dfn{body} of the loop,
+and @var{condition} is an expression that controls how long the loop
+keeps running.
+
+The first thing the @code{while} statement does is test @var{condition}.
+If @var{condition} is true, it executes the statement @var{body}.
+@ifinfo
+(The @var{condition} is true when the value
+is not zero and not a null string.)
+@end ifinfo
+After @var{body} has been executed,
+@var{condition} is tested again, and if it is still true, @var{body} is
+executed again. This process repeats until @var{condition} is no longer
+true. If @var{condition} is initially false, the body of the loop is
+never executed, and @code{awk} continues with the statement following
+the loop.
+
+This example prints the first three fields of each record, one per line.
+
+@example
+awk '@{ i = 1
+ while (i <= 3) @{
+ print $i
+ i++
+ @}
+@}' inventory-shipped
+@end example
+
+@noindent
+Here the body of the loop is a compound statement enclosed in braces,
+containing two statements.
+
+The loop works like this: first, the value of @code{i} is set to one.
+Then, the @code{while} tests whether @code{i} is less than or equal to
+three. This is true when @code{i} equals one, so the @code{i}-th
+field is printed. Then the @samp{i++} increments the value of @code{i}
+and the loop repeats. The loop terminates when @code{i} reaches four.
+
+As you can see, a newline is not required between the condition and the
+body; but using one makes the program clearer unless the body is a
+compound statement or is very simple. The newline after the open-brace
+that begins the compound statement is not required either, but the
+program would be harder to read without it.
+
+@node Do Statement, For Statement, While Statement, Statements
+@section The @code{do}-@code{while} Statement
+
+The @code{do} loop is a variation of the @code{while} looping statement.
+The @code{do} loop executes the @var{body} once, and then repeats @var{body}
+as long as @var{condition} is true. It looks like this:
+
+@example
+do
+ @var{body}
+while (@var{condition})
+@end example
+
+Even if @var{condition} is false at the start, @var{body} is executed at
+least once (and only once, unless executing @var{body} makes
+@var{condition} true). Contrast this with the corresponding
+@code{while} statement:
+
+@example
+while (@var{condition})
+ @var{body}
+@end example
+
+@noindent
+This statement does not execute @var{body} even once if @var{condition}
+is false to begin with.
+
+Here is an example of a @code{do} statement:
+
+@example
+awk '@{ i = 1
+ do @{
+ print $0
+ i++
+ @} while (i <= 10)
+@}'
+@end example
+
+@noindent
+This program prints each input record ten times. It isn't a very
+realistic example, since in this case an ordinary @code{while} would do
+just as well. But this reflects actual experience; there is only
+occasionally a real use for a @code{do} statement.
+
+@node For Statement, Break Statement, Do Statement, Statements
+@section The @code{for} Statement
+@cindex @code{for} statement
+
+The @code{for} statement makes it more convenient to count iterations of a
+loop. The general form of the @code{for} statement looks like this:
+
+@example
+for (@var{initialization}; @var{condition}; @var{increment})
+ @var{body}
+@end example
+
+@noindent
+The @var{initialization}, @var{condition} and @var{increment} parts are
+arbitrary @code{awk} expressions, and @var{body} stands for any
+@code{awk} statement.
+
+The @code{for} statement starts by executing @var{initialization}.
+Then, as long
+as @var{condition} is true, it repeatedly executes @var{body} and then
+@var{increment}. Typically @var{initialization} sets a variable to
+either zero or one, @var{increment} adds one to it, and @var{condition}
+compares it against the desired number of iterations.
+
+Here is an example of a @code{for} statement:
+
+@example
+@group
+awk '@{ for (i = 1; i <= 3; i++)
+ print $i
+@}' inventory-shipped
+@end group
+@end example
+
+@noindent
+This prints the first three fields of each input record, one field per
+line.
+
+You cannot set more than one variable in the
+@var{initialization} part unless you use a multiple assignment statement
+such as @samp{x = y = 0}, which is possible only if all the initial values
+are equal. (But you can initialize additional variables by writing
+their assignments as separate statements preceding the @code{for} loop.)
+
+The same is true of the @var{increment} part; to increment additional
+variables, you must write separate statements at the end of the loop.
+The C compound expression, using C's comma operator, would be useful in
+this context, but it is not supported in @code{awk}.
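+
+As a sketch of the workaround just described (the names @code{i} and
+@code{j} are chosen only for illustration), a second variable can be
+initialized before the loop and stepped at the end of the body:
+

```shell
out=$(awk 'BEGIN {
    j = 10                      # second variable: initialized before the loop
    for (i = 1; i <= 3; i++) {
        print i, j
        j--                     # second variable: stepped at the end of the body
    }
}')
printf '%s\n' "$out"
```

+
+@noindent
+Note that a @code{continue} inside such a body would skip the
+hand-written @samp{j--} step, while @samp{i++} would still be executed.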
+
+Most often, @var{increment} is an increment expression, as in the
+example above. But this is not required; it can be any expression
+whatever. For example, this statement prints all the powers of two
+between one and 100:
+
+@example
+for (i = 1; i <= 100; i *= 2)
+ print i
+@end example
+
+Any of the three expressions in the parentheses following the @code{for} may
+be omitted if there is nothing to be done there. Thus, @w{@samp{for (; x
+> 0;)}} is equivalent to @w{@samp{while (x > 0)}}. If the
+@var{condition} is omitted, it is treated as true, effectively
+yielding an @dfn{infinite loop} (i.e.@: a loop that never
+terminates unless something in its body ends it).
+
+In most cases, a @code{for} loop is an abbreviation for a @code{while}
+loop, as shown here:
+
+@example
+@var{initialization}
+while (@var{condition}) @{
+ @var{body}
+ @var{increment}
+@}
+@end example
+
+@noindent
+The only exception is when the @code{continue} statement
+(@pxref{Continue Statement, ,The @code{continue} Statement}) is used
+inside the loop; changing a @code{for} statement to a @code{while}
+statement in this way can change the effect of the @code{continue}
+statement inside the loop.
+
+There is an alternate version of the @code{for} loop, for iterating over
+all the indices of an array:
+
+@example
+for (i in array)
+ @var{do something with} array[i]
+@end example
+
+@noindent
+@xref{Scanning an Array, ,Scanning All Elements of an Array},
+for more information on this version of the @code{for} loop.
+
+The @code{awk} language has a @code{for} statement in addition to a
+@code{while} statement because often a @code{for} loop is both less work to
+type and more natural to think of. Counting the number of iterations is
+very common in loops. It can be easier to think of this counting as part
+of looping rather than as something to do inside the loop.
+
+The next section has more complicated examples of @code{for} loops.
+
+@node Break Statement, Continue Statement, For Statement, Statements
+@section The @code{break} Statement
+@cindex @code{break} statement
+@cindex loops, exiting
+
+The @code{break} statement jumps out of the innermost @code{for},
+@code{while}, or @code{do} loop that encloses it. The
+following example finds the smallest divisor of any integer, and also
+identifies prime numbers:
+
+@example
+awk '# find smallest divisor of num
+ @{ num = $1
+ for (div = 2; div*div <= num; div++)
+ if (num % div == 0)
+ break
+ if (num % div == 0)
+ printf "Smallest divisor of %d is %d\n", num, div
+ else
+ printf "%d is prime\n", num
+ @}'
+@end example
+
+When the remainder is zero in the first @code{if} statement, @code{awk}
+immediately @dfn{breaks out} of the containing @code{for} loop. This means
+that @code{awk} proceeds immediately to the statement following the loop
+and continues processing. (This is very different from the @code{exit}
+statement, which stops the entire @code{awk} program.
+@xref{Exit Statement, ,The @code{exit} Statement}.)
+
+Here is another program equivalent to the previous one. It illustrates how
+the @var{condition} of a @code{for} or @code{while} could just as well be
+replaced with a @code{break} inside an @code{if}:
+
+@example
+@group
+awk '# find smallest divisor of num
+ @{ num = $1
+ for (div = 2; ; div++) @{
+ if (num % div == 0) @{
+ printf "Smallest divisor of %d is %d\n", num, div
+ break
+ @}
+ if (div*div > num) @{
+ printf "%d is prime\n", num
+ break
+ @}
+ @}
+@}'
+@end group
+@end example
+
+@cindex @code{break}, outside of loops
+@cindex historical features
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+@cindex dark corner
+As described above, the @code{break} statement has no meaning when
+used outside the body of a loop. However, although it was never documented,
+historical implementations of @code{awk} have treated the @code{break}
+statement outside of a loop as if it were a @code{next} statement
+(@pxref{Next Statement, ,The @code{next} Statement}).
+Recent versions of Unix @code{awk} no longer allow this usage.
+@code{gawk} will support this use of @code{break} only if @samp{--traditional}
+has been specified on the command line
+(@pxref{Options, ,Command Line Options}).
+Otherwise, it will be treated as an error, since the POSIX standard
+specifies that @code{break} should only be used inside the body of a
+loop (d.c.).
+
+@node Continue Statement, Next Statement, Break Statement, Statements
+@section The @code{continue} Statement
+
+@cindex @code{continue} statement
+The @code{continue} statement, like @code{break}, is used only inside
+@code{for}, @code{while}, and @code{do} loops. It skips
+over the rest of the loop body, causing the next cycle around the loop
+to begin immediately. Contrast this with @code{break}, which jumps out
+of the loop altogether.
+
+@c The point of this program was to illustrate the use of continue with
+@c a while loop. But Karl Berry points out that that is done adequately
+@c below, and that this example is very un-awk-like. So for now, we'll
+@c omit it.
+@ignore
+In Texinfo source files, text that the author wishes to ignore can be
+enclosed between lines that start with @samp{@@ignore} and end with
+@samp{@@end ignore}. Here is a program that strips out lines between
+@samp{@@ignore} and @samp{@@end ignore} pairs.
+
+@example
+BEGIN @{
+ while (getline > 0) @{
+ if (/^@@ignore/)
+ ignoring = 1
+ else if (/^@@end[ \t]+ignore/) @{
+ ignoring = 0
+ continue
+ @}
+ if (ignoring)
+ continue
+ print
+ @}
+@}
+@end example
+
+When an @samp{@@ignore} is seen, the @code{ignoring} flag is set to one (true).
+When @samp{@@end ignore} is seen, the flag is reset to zero (false). As long
+as the flag is true, the input record is not printed, because the
+@code{continue} restarts the @code{while} loop, skipping over the @code{print}
+statement.
+
+@c Exercise!!!
+@c How could this program be written to make better use of the awk language?
+@end ignore
+
+The @code{continue} statement in a @code{for} loop directs @code{awk} to
+skip the rest of the body of the loop, and resume execution with the
+increment-expression of the @code{for} statement. The following program
+illustrates this fact:
+
+@example
+awk 'BEGIN @{
+ for (x = 0; x <= 20; x++) @{
+ if (x == 5)
+ continue
+ printf "%d ", x
+ @}
+ print ""
+@}'
+@end example
+
+@noindent
+This program prints all the numbers from zero to 20, except for five, for
+which the @code{printf} is skipped. Since the increment @samp{x++}
+is not skipped, @code{x} does not remain stuck at five. Contrast the
+@code{for} loop above with this @code{while} loop:
+
+@example
+awk 'BEGIN @{
+ x = 0
+ while (x <= 20) @{
+ if (x == 5)
+ continue
+ printf "%d ", x
+ x++
+ @}
+ print ""
+@}'
+@end example
+
+@noindent
+This program loops forever once @code{x} gets to five.
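+
+One way to repair the @code{while} version, offered here only as a
+sketch rather than as the recommended style, is to step @code{x} by
+hand before the @code{continue}:
+

```shell
out=$(awk 'BEGIN {
    x = 0
    while (x <= 20) {
        if (x == 5) {
            x++             # step the counter by hand, then skip the printf
            continue
        }
        printf "%d ", x
        x++
    }
    print ""
}')
printf '%s\n' "$out"
```

+
+@noindent
+This variant terminates and prints the same numbers as the @code{for}
+loop, at the cost of writing the increment twice.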
+
+@cindex @code{continue}, outside of loops
+@cindex historical features
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+@cindex dark corner
+As described above, the @code{continue} statement has no meaning when
+used outside the body of a loop. However, although it was never documented,
+historical implementations of @code{awk} have treated the @code{continue}
+statement outside of a loop as if it were a @code{next} statement
+(@pxref{Next Statement, ,The @code{next} Statement}).
+Recent versions of Unix @code{awk} no longer allow this usage.
+@code{gawk} will support this use of @code{continue} only if
+@samp{--traditional} has been specified on the command line
+(@pxref{Options, ,Command Line Options}).
+Otherwise, it will be treated as an error, since the POSIX standard
+specifies that @code{continue} should only be used inside the body of a
+loop (d.c.).
+
+@node Next Statement, Nextfile Statement, Continue Statement, Statements
+@section The @code{next} Statement
+@cindex @code{next} statement
+
+The @code{next} statement forces @code{awk} to immediately stop processing
+the current record and go on to the next record. This means that no
+further rules are executed for the current record. The rest of the
+current rule's action is not executed either.
+
+Contrast this with the effect of the @code{getline} function
+(@pxref{Getline, ,Explicit Input with @code{getline}}). That too causes
+@code{awk} to read the next record immediately, but it does not alter the
+flow of control in any way. So the rest of the current action executes
+with a new input record.
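+
+The following sketch contrasts the two on the same four-line input.
+The input data, the messages, and the shell variable names are
+assumptions made for the illustration:
+

```shell
input='a
b
c
d'

# next: record b is abandoned; the rest of the action and all later rules are skipped.
with_next=$(printf '%s\n' "$input" | awk '
    $0 == "b" { next; print "never reached" }
    { print "second rule sees " $0 }')

# getline: record b is replaced by c, and the same action keeps running.
with_getline=$(printf '%s\n' "$input" | awk '
    $0 == "b" { getline; print "same action now holds " $0 }
    { print "second rule sees " $0 }')

printf '%s\n---\n%s\n' "$with_next" "$with_getline"
```

+
+@noindent
+With @code{next}, the second rule never sees @samp{b}.  With
+@code{getline}, the first action continues with @samp{c} in @code{$0},
+and the second rule then sees @samp{c} as well.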
+
+At the highest level, @code{awk} program execution is a loop that reads
+an input record and then tests each rule's pattern against it. If you
+think of this loop as a @code{for} statement whose body contains the
+rules, then the @code{next} statement is analogous to a @code{continue}
+statement: it skips to the end of the body of this implicit loop, and
+executes the increment (which reads another record).
+
+For example, if your @code{awk} program works only on records with four
+fields, and you don't want it to fail when given bad input, you might
+use this rule near the beginning of the program:
+
+@example
+@group
+NF != 4 @{
+ printf("%s:%d: skipped: NF != 4\n", FILENAME, FNR) > "/dev/stderr"
+ next
+@}
+@end group
+@end example
+
+@noindent
+so that the following rules will not see the bad record. The error
+message is redirected to the standard error output stream, as error
+messages should be. @xref{Special Files, ,Special File Names in @code{gawk}}.
+
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+According to the POSIX standard, the behavior is undefined if
+the @code{next} statement is used in a @code{BEGIN} or @code{END} rule.
+@code{gawk} will treat it as a syntax error.
+Although POSIX permits it,
+some other @code{awk} implementations don't allow the @code{next}
+statement inside function bodies
+(@pxref{User-defined, ,User-defined Functions}).
+Just like any other @code{next} statement, a @code{next} inside a
+function body reads the next record and starts processing it with the
+first rule in the program.
+
+If the @code{next} statement causes the end of the input to be reached,
+then the code in any @code{END} rules will be executed.
+@xref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}.
+
+@node Nextfile Statement, Exit Statement, Next Statement, Statements
+@section The @code{nextfile} Statement
+@cindex @code{nextfile} statement
+@cindex differences between @code{gawk} and @code{awk}
+
+@code{gawk} provides the @code{nextfile} statement,
+which is similar to the @code{next} statement.
+However, instead of abandoning processing of the current record, the
+@code{nextfile} statement instructs @code{gawk} to stop processing the
+current data file.
+
+Upon execution of the @code{nextfile} statement, @code{FILENAME} is
+updated to the name of the next data file listed on the command line,
+@code{FNR} is reset to one, @code{ARGIND} is incremented, and processing
+starts over with the first rule in the program. @xref{Built-in Variables}.
+
+If the @code{nextfile} statement causes the end of the input to be reached,
+then the code in any @code{END} rules will be executed.
+@xref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}.
+
+The @code{nextfile} statement is a @code{gawk} extension; it is not
+(currently) available in any other @code{awk} implementation.
+@xref{Nextfile Function, ,Implementing @code{nextfile} as a Function},
+for a user-defined function you can use to simulate the @code{nextfile}
+statement.
+
+The @code{nextfile} statement is useful when you have many data
+files to process, but do not need to process every record in
+every file.
+Normally, in order to move on to
+the next data file, you would have to continue scanning the unwanted
+records. The @code{nextfile} statement accomplishes this much more
+efficiently.
+
+@cindex @code{next file} statement
+@strong{Caution:} Versions of @code{gawk} prior to 3.0 used two
+words (@samp{next file}) for the @code{nextfile} statement. This was
+changed in 3.0 to one word, since the treatment of @samp{file} was
+inconsistent. When it appeared after @code{next}, it was a keyword.
+Otherwise, it was a regular identifier. The old usage is still
+accepted. However, @code{gawk} will generate a warning message, and
+support for @code{next file} will be discontinued in a
+future version of @code{gawk}.
+
+@node Exit Statement, , Nextfile Statement, Statements
+@section The @code{exit} Statement
+
+@cindex @code{exit} statement
+The @code{exit} statement causes @code{awk} to immediately stop
+executing the current rule and to stop processing input; any remaining input
+is ignored. It looks like this:
+
+@example
+exit @r{[}@var{return code}@r{]}
+@end example
+
+If an @code{exit} statement is executed from a @code{BEGIN} rule, the
+program stops processing everything immediately. No input records are
+read. However, if an @code{END} rule is present, it is executed
+(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}).
+
+If @code{exit} is used as part of an @code{END} rule, it causes
+the program to stop immediately.
+
+An @code{exit} statement that is not part
+of a @code{BEGIN} or @code{END} rule stops the execution of any further
+automatic rules for the current record, skips reading any remaining input
+records, and executes
+the @code{END} rule if there is one.
+
+If you do not want the @code{END} rule to do its job in this case, you
+can set a variable to non-zero before the @code{exit} statement, and check
+that variable in the @code{END} rule.
+@xref{Assert Function, ,Assertions},
+for an example that does this.
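+
+A minimal sketch of this technique follows.  The flag name
+@code{failed}, the sample input, and the shell variables are
+assumptions made for the illustration:
+

```shell
out=$(printf '1\n2\nbad\n4\n' | awk '
    $0 == "bad" { failed = 1; exit 1 }   # set the flag, then leave the main loop
    { sum += $0 }
    END {
        if (failed)
            exit 1                       # skip the normal END work
        print "sum: " sum
    }')
status=$?
printf 'output: "%s", status: %s\n' "$out" "$status"
```

+
+@noindent
+Because the flag is set, the @code{END} rule prints nothing, and the
+exit status seen by the shell is one.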
+
+@cindex dark corner
+If an argument is supplied to @code{exit}, its value is used as the exit
+status code for the @code{awk} process. If no argument is supplied,
+@code{exit} returns status zero (success). In the case where an argument
+is supplied to a first @code{exit} statement, and then @code{exit} is
+called a second time with no argument, the previously supplied exit value
+is used (d.c.).
+
+For example, let's say you've discovered an error condition you really
+don't know how to handle. Conventionally, programs report this by
+exiting with a non-zero status. Your @code{awk} program can do this
+using an @code{exit} statement with a non-zero argument. Here is an
+example:
+
+@example
+@group
+BEGIN @{
+ if (("date" | getline date_now) <= 0) @{
+ print "Can't get system date" > "/dev/stderr"
+ exit 1
+ @}
+ print "current date is", date_now
+ close("date")
+@}
+@end group
+@end example
+
+@node Built-in Variables, Arrays, Statements, Top
+@chapter Built-in Variables
+@cindex built-in variables
+
+Most @code{awk} variables are available for you to use for your own
+purposes; they never change except when your program assigns values to
+them, and never affect anything except when your program examines them.
+However, a few variables in @code{awk} have special built-in meanings.
+Some of them @code{awk} examines automatically, so that they enable you
+to tell @code{awk} how to do certain things. Others are set
+automatically by @code{awk}, so that they carry information from the
+internal workings of @code{awk} to your program.
+
+This chapter documents all the built-in variables of @code{gawk}. Most
+of them are also documented in the chapters describing their areas of
+activity.
+
+@menu
+* User-modified:: Built-in variables that you change to control
+ @code{awk}.
+* Auto-set:: Built-in variables where @code{awk} gives you
+ information.
+* ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}.
+@end menu
+
+@node User-modified, Auto-set, Built-in Variables, Built-in Variables
+@section Built-in Variables that Control @code{awk}
+@cindex built-in variables, user modifiable
+
+This is an alphabetical list of the variables which you can change to
+control how @code{awk} does certain things. Those variables that are
+specific to @code{gawk} are marked with an asterisk, @samp{*}.
+
+@table @code
+@vindex CONVFMT
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+@item CONVFMT
+This string controls conversion of numbers to
+strings (@pxref{Conversion, ,Conversion of Strings and Numbers}).
+It works by being passed, in effect, as the first argument to the
+@code{sprintf} function
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+Its default value is @code{"%.6g"}.
+@code{CONVFMT} was introduced by the POSIX standard.
+
+@vindex FIELDWIDTHS
+@item FIELDWIDTHS *
+This is a space-separated list of column widths that tells @code{gawk}
+how to split input with fixed, columnar boundaries. It is an
+experimental feature. Assigning to @code{FIELDWIDTHS}
+overrides the use of @code{FS} for field splitting.
+@xref{Constant Size, ,Reading Fixed-width Data}, for more information.
+
+If @code{gawk} is in compatibility mode
+(@pxref{Options, ,Command Line Options}), then @code{FIELDWIDTHS}
+has no special meaning, and field splitting operations are done based
+exclusively on the value of @code{FS}.
+
+@vindex FS
+@item FS
+@code{FS} is the input field separator
+(@pxref{Field Separators, ,Specifying How Fields are Separated}).
+The value is a single-character string or a multi-character regular
+expression that matches the separations between fields in an input
+record. If the value is the null string (@code{""}), then each
+character in the record becomes a separate field.
+
+The default value is @w{@code{" "}}, a string consisting of a single
+space. As a special exception, this value means that any
+sequence of spaces and tabs is a single separator. It also causes
+spaces and tabs at the beginning and end of a record to be ignored.
+
+You can set the value of @code{FS} on the command line using the
+@samp{-F} option:
+
+@example
+awk -F, '@var{program}' @var{input-files}
+@end example
+
+If @code{gawk} is using @code{FIELDWIDTHS} for field-splitting,
+assigning a value to @code{FS} will cause @code{gawk} to return to
+the normal, @code{FS}-based, field splitting. An easy way to do this
+is to simply say @samp{FS = FS}, perhaps with an explanatory comment.
+
+@vindex IGNORECASE
+@item IGNORECASE *
+If @code{IGNORECASE} is non-zero or non-null, then all string comparisons
+and all regular expression matching are case-independent. Thus, regexp
+matching with @samp{~} and @samp{!~}, and the @code{gensub},
+@code{gsub}, @code{index}, @code{match}, @code{split} and @code{sub}
+functions, record termination with @code{RS}, and field splitting with
+@code{FS} all ignore case when doing their particular regexp operations.
+@xref{Case-sensitivity, ,Case-sensitivity in Matching}.
+
+If @code{gawk} is in compatibility mode
+(@pxref{Options, ,Command Line Options}),
+then @code{IGNORECASE} has no special meaning, and string
+and regexp operations are always case-sensitive.
+
+@vindex OFMT
+@item OFMT
+This string controls conversion of numbers to
+strings (@pxref{Conversion, ,Conversion of Strings and Numbers}) for
+printing with the @code{print} statement. It works by being passed, in
+effect, as the first argument to the @code{sprintf} function
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+Its default value is @code{"%.6g"}. Earlier versions of @code{awk}
+also used @code{OFMT} to specify the format for converting numbers to
+strings in general expressions; this is now done by @code{CONVFMT}.
+
+@vindex OFS
+@item OFS
+This is the output field separator (@pxref{Output Separators}). It is
+output between the fields output by a @code{print} statement. Its
+default value is @w{@code{" "}}, a string consisting of a single space.
+
+@vindex ORS
+@item ORS
+This is the output record separator. It is output at the end of every
+@code{print} statement. Its default value is @code{"\n"}.
+(@xref{Output Separators}.)
+
+@vindex RS
+@item RS
+This is @code{awk}'s input record separator. Its default value is a string
+containing a single newline character, which means that an input record
+consists of a single line of text.
+It can also be the null string, in which case records are separated by
+runs of blank lines, or a regexp, in which case records are separated by
+matches of the regexp in the input text.
+(@xref{Records, ,How Input is Split into Records}.)
+
+@vindex SUBSEP
+@item SUBSEP
+@code{SUBSEP} is the subscript separator. It has the default value of
+@code{"\034"}, and is used to separate the parts of the indices of a
+multi-dimensional array. Thus, the expression @code{@w{foo["A", "B"]}}
+really accesses @code{foo["A\034B"]}
+(@pxref{Multi-dimensional, ,Multi-dimensional Arrays}).
+@end table
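+
+The following sketch exercises several of the variables in this list
+using only features found in POSIX @code{awk}.  The sample data and the
+names @code{joined}, @code{formats}, @code{parts}, @code{foo} and
+@code{piece} are assumptions made for the illustration:
+

```shell
# FS (set with -F), OFS and ORS: assigning to a field rebuilds $0 with OFS.
joined=$(printf 'a,b,c\n' | awk -F, 'BEGIN { OFS = "-"; ORS = "!\n" } { $1 = $1; print }')

# CONVFMT governs number-to-string conversion; OFMT governs print.
formats=$(awk 'BEGIN {
    CONVFMT = "%.2f"; OFMT = "%.4f"
    x = 3.14159265
    s = x ""        # concatenation converts x using CONVFMT
    print s
    print x         # print converts x using OFMT
}')

# SUBSEP joins the parts of a multi-dimensional subscript.
parts=$(awk 'BEGIN {
    foo["A", "B"] = 1
    for (k in foo) {
        n = split(k, piece, SUBSEP)
        print n, piece[1], piece[2]
    }
}')

printf '%s\n%s\n%s\n' "$joined" "$formats" "$parts"
```

+
+@noindent
+The first command prints @samp{a-b-c!}, the second prints @samp{3.14}
+and then @samp{3.1416}, and the third prints @samp{2 A B}.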
+
+@node Auto-set, ARGC and ARGV, User-modified, Built-in Variables
+@section Built-in Variables that Convey Information
+@cindex built-in variables, convey information
+
+This is an alphabetical list of the variables that are set
+automatically by @code{awk} on certain occasions in order to provide
+information to your program. Those variables that are specific to
+@code{gawk} are marked with an asterisk, @samp{*}.
+
+@table @code
+@vindex ARGC
+@vindex ARGV
+@item ARGC
+@itemx ARGV
+The command-line arguments available to @code{awk} programs are stored in
+an array called @code{ARGV}. @code{ARGC} is the number of command-line
+arguments present. @xref{Other Arguments, ,Other Command Line Arguments}.
+Unlike most @code{awk} arrays,
+@code{ARGV} is indexed from zero to @code{ARGC} @minus{} 1. For example:
+
+@example
+@group
+$ awk 'BEGIN @{
+> for (i = 0; i < ARGC; i++)
+> print ARGV[i]
+> @}' inventory-shipped BBS-list
+@print{} awk
+@print{} inventory-shipped
+@print{} BBS-list
+@end group
+@end example
+
+@noindent
+In this example, @code{ARGV[0]} contains @code{"awk"}, @code{ARGV[1]}
+contains @code{"inventory-shipped"}, and @code{ARGV[2]} contains
+@code{"BBS-list"}. The value of @code{ARGC} is three, one more than the
+index of the last element in @code{ARGV}, since the elements are numbered
+from zero.
+
+The names @code{ARGC} and @code{ARGV}, as well as the convention of indexing
+the array from zero to @code{ARGC} @minus{} 1, are derived from the C language's
+method of accessing command line arguments.
+@xref{ARGC and ARGV, , Using @code{ARGC} and @code{ARGV}}, for information
+about how @code{awk} uses these variables.
+
+@vindex ARGIND
+@item ARGIND *
+The index in @code{ARGV} of the current file being processed.
+Every time @code{gawk} opens a new data file for processing, it sets
+@code{ARGIND} to the index in @code{ARGV} of the file name.
+When @code{gawk} is processing the input files, it is always
+true that @samp{FILENAME == ARGV[ARGIND]}.
+
+This variable is useful in file processing; it allows you to tell how far
+along you are in the list of data files, and to distinguish between
+successive instances of the same filename on the command line.
+
+While you can change the value of @code{ARGIND} within your @code{awk}
+program, @code{gawk} will automatically set it to a new value when the
+next file is opened.
+
+This variable is a @code{gawk} extension. In other @code{awk} implementations,
+or if @code{gawk} is in compatibility mode
+(@pxref{Options, ,Command Line Options}),
+it is not special.
+
+@vindex ENVIRON
+@item ENVIRON
+An associative array that contains the values of the environment. The array
+indices are the environment variable names; the values are the values of
+the particular environment variables. For example,
+@code{ENVIRON["HOME"]} might be @file{/home/arnold}. Changing this array
+does not affect the environment passed on to any programs that
+@code{awk} may spawn via redirection or the @code{system} function.
+(In a future version of @code{gawk}, it may do so.)
+
+Some operating systems may not have environment variables.
+On such systems, the @code{ENVIRON} array is empty (except for
+@w{@code{ENVIRON["AWKPATH"]}}).
+
+@vindex ERRNO
+@item ERRNO *
+If a system error occurs while doing a redirection for @code{getline},
+during a read for @code{getline}, or during a @code{close} operation,
+then @code{ERRNO} will contain a string describing the error.
+
+This variable is a @code{gawk} extension. In other @code{awk} implementations,
+or if @code{gawk} is in compatibility mode
+(@pxref{Options, ,Command Line Options}),
+it is not special.
+
+@cindex dark corner
+@vindex FILENAME
+@item FILENAME
+This is the name of the file that @code{awk} is currently reading.
+When no data files are listed on the command line, @code{awk} reads
+from the standard input, and @code{FILENAME} is set to @code{"-"}.
+@code{FILENAME} is changed each time a new file is read
+(@pxref{Reading Files, ,Reading Input Files}).
+Inside a @code{BEGIN} rule, the value of @code{FILENAME} is
+@code{""}, since there are no input files being processed
+yet.@footnote{Some early implementations of Unix @code{awk} initialized
+@code{FILENAME} to @code{"-"}, even if there were data files to be
+processed. This behavior was incorrect, and should not be relied
+upon in your programs.} (d.c.)
+
+@vindex FNR
+@item FNR
+@code{FNR} is the current record number in the current file. @code{FNR} is
+incremented each time a new record is read
+(@pxref{Getline, ,Explicit Input with @code{getline}}). It is reinitialized
+to zero each time a new input file is started.
+
+@vindex NF
+@item NF
+@code{NF} is the number of fields in the current input record.
+@code{NF} is set each time a new record is read, when a new field is
+created, or when @code{$0} changes (@pxref{Fields, ,Examining Fields}).
+
+@vindex NR
+@item NR
+This is the number of input records @code{awk} has processed since
+the beginning of the program's execution
+(@pxref{Records, ,How Input is Split into Records}).
+@code{NR} is set each time a new record is read.
+
+@vindex RLENGTH
+@item RLENGTH
+@code{RLENGTH} is the length of the substring matched by the
+@code{match} function
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+@code{RLENGTH} is set by invoking the @code{match} function. Its value
+is the length of the matched string, or @minus{}1 if no match was found.
+
+@vindex RSTART
+@item RSTART
+@code{RSTART} is the start-index in characters of the substring matched by the
+@code{match} function
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+@code{RSTART} is set by invoking the @code{match} function. Its value
+is the position of the string where the matched substring starts, or zero
+if no match was found.
+
+@vindex RT
+@item RT *
+@code{RT} is set each time a record is read. It contains the input text
+that matched the text denoted by @code{RS}, the record separator.
+
+This variable is a @code{gawk} extension. In other @code{awk} implementations,
+or if @code{gawk} is in compatibility mode
+(@pxref{Options, ,Command Line Options}),
+it is not special.
+@end table
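+
+As a brief sketch of how some of these variables behave (the strings
+and the shell variable names are assumptions made for the
+illustration), the following commands show @code{RSTART} and
+@code{RLENGTH} after a successful and a failed @code{match}, and
+@code{NF} before and after a new field is created:
+

```shell
matched=$(awk 'BEGIN {
    if (match("foobar", /o+/))
        print RSTART, RLENGTH    # match found
    match("foobar", /z/)         # no match
    print RSTART, RLENGTH
}')

# Assigning to a field beyond NF creates it and updates NF.
fields=$(printf 'a b c\n' | awk '{ print NF; $5 = "e"; print NF }')

printf '%s\n%s\n' "$matched" "$fields"
```

+
+@noindent
+The first command prints @samp{2 2} and then @samp{0 -1}; the second
+prints @samp{3} and then @samp{5}.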
+
+@cindex dark corner
+A side note about @code{NR} and @code{FNR}:
+@code{awk} simply increments both of these variables
+each time it reads a record, instead of setting them to the absolute
+value of the number of records read. This means that your program can
+change these variables, and their new values will be incremented for
+each record (d.c.). For example:
+
+@example
+@group
+$ echo '1
+> 2
+> 3
+> 4' | awk 'NR == 2 @{ NR = 17 @}
+> @{ print NR @}'
+@print{} 1
+@print{} 17
+@print{} 18
+@print{} 19
+@end group
+@end example
+
+@noindent
+Before @code{FNR} was added to the @code{awk} language
+(@pxref{V7/SVR3.1, ,Major Changes between V7 and SVR3.1}),
+many @code{awk} programs used this feature to track the number of
+records in a file by resetting @code{NR} to zero when @code{FILENAME}
+changed.
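+
+With @code{FNR} available, no such resetting is needed.  The following
+sketch prints both counters for two small files.  The file names
+@file{one} and @file{two} are made up for the illustration and are
+created in a temporary directory:
+

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one"
printf 'c\n'    > "$tmp/two"

# Run inside the directory so FILENAME shows the short names.
out=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR }' one two)
printf '%s\n' "$out"
rm -r "$tmp"
```

+
+@noindent
+@code{NR} keeps counting across both files, while @code{FNR} starts
+again at one for the first record of @file{two}.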
+
+@node ARGC and ARGV, , Auto-set, Built-in Variables
+@section Using @code{ARGC} and @code{ARGV}
+
+In @ref{Auto-set, , Built-in Variables that Convey Information},
+you saw this program describing the information contained in @code{ARGC}
+and @code{ARGV}:
+
+@example
+@group
+$ awk 'BEGIN @{
+> for (i = 0; i < ARGC; i++)
+> print ARGV[i]
+> @}' inventory-shipped BBS-list
+@print{} awk
+@print{} inventory-shipped
+@print{} BBS-list
+@end group
+@end example
+
+@noindent
+In this example, @code{ARGV[0]} contains @code{"awk"}, @code{ARGV[1]}
+contains @code{"inventory-shipped"}, and @code{ARGV[2]} contains
+@code{"BBS-list"}.
+
+Notice that the @code{awk} program is not entered in @code{ARGV}. The
+other special command line options, with their arguments, are also not
+entered. But variable assignments on the command line @emph{are}
+treated as arguments, and do show up in the @code{ARGV} array.
+
+Your program can alter @code{ARGC} and the elements of @code{ARGV}.
+Each time @code{awk} reaches the end of an input file, it uses the next
+element of @code{ARGV} as the name of the next input file. By storing a
+different string there, your program can change which files are read.
+You can use @code{"-"} to represent the standard input. By storing
+additional elements and incrementing @code{ARGC} you can cause
+additional files to be read.
+
+If you decrease the value of @code{ARGC}, that eliminates input files
+from the end of the list. By recording the old value of @code{ARGC}
+elsewhere, your program can treat the eliminated arguments as
+something other than file names.
+
+To eliminate a file from the middle of the list, store the null string
+(@code{""}) into @code{ARGV} in place of the file's name. As a
+special feature, @code{awk} ignores file names that have been
+replaced with the null string.
+You may also use the @code{delete} statement to remove elements from
+@code{ARGV} (@pxref{Delete, ,The @code{delete} Statement}).
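+
+The following sketch combines two of these techniques: it removes one
+file by storing the null string, and it appends another by storing a
+new element and incrementing @code{ARGC}.  The file names @file{f1},
+@file{f2} and @file{f3} are assumptions made for the illustration and
+are created in a temporary directory:
+

```shell
tmp=$(mktemp -d)
printf 'one\n'   > "$tmp/f1"
printf 'two\n'   > "$tmp/f2"
printf 'three\n' > "$tmp/f3"

out=$(cd "$tmp" && awk 'BEGIN {
    ARGV[1] = ""          # drop f1 from the list of input files
    ARGV[ARGC] = "f3"     # append a file that was not on the command line
    ARGC++
}
{ print FILENAME ": " $0 }' f1 f2)
printf '%s\n' "$out"
rm -r "$tmp"
```

+
+@noindent
+Only @file{f2} and @file{f3} are read, in that order.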
+
+All of these actions are typically done from the @code{BEGIN} rule,
+before actual processing of the input begins.
+@xref{Split Program, ,Splitting a Large File Into Pieces}, and see
+@ref{Tee Program, ,Duplicating Output Into Multiple Files}, for an example
+of each way of removing elements from @code{ARGV}.
+
+The following fragment processes @code{ARGV} in order to examine, and
+then remove, command line options.
+
+@example
+@group
+BEGIN @{
+ for (i = 1; i < ARGC; i++) @{
+ if (ARGV[i] == "-v")
+ verbose = 1
+ else if (ARGV[i] == "-d")
+ debug = 1
+@end group
+@group
+ else if (ARGV[i] ~ /^-./) @{
+ e = sprintf("%s: unrecognized option -- %c",
+ ARGV[0], substr(ARGV[i], 2, 1))
+ print e > "/dev/stderr"
+ @} else
+ break
+ delete ARGV[i]
+ @}
+@}
+@end group
+@end example
+
+@node Arrays, Built-in, Built-in Variables, Top
+@chapter Arrays in @code{awk}
+
+An @dfn{array} is a table of values, called @dfn{elements}. The
+elements of an array are distinguished by their indices. @dfn{Indices}
+may be either numbers or strings. @code{awk} maintains a single set
+of names that may be used for naming variables, arrays and functions
+(@pxref{User-defined, ,User-defined Functions}).
+Thus, you cannot have a variable and an array with the same name in the
+same @code{awk} program.
+
+@menu
+* Array Intro:: Introduction to Arrays
+* Reference to Elements:: How to examine one element of an array.
+* Assigning Elements:: How to change an element of an array.
+* Array Example:: Basic Example of an Array
+* Scanning an Array:: A variation of the @code{for} statement. It
+ loops through the indices of an array's
+ existing elements.
+* Delete:: The @code{delete} statement removes an element
+ from an array.
+* Numeric Array Subscripts:: How to use numbers as subscripts in
+ @code{awk}.
+* Uninitialized Subscripts:: Using Uninitialized variables as subscripts.
+* Multi-dimensional:: Emulating multi-dimensional arrays in
+ @code{awk}.
+* Multi-scanning:: Scanning multi-dimensional arrays.
+@end menu
+
+@node Array Intro, Reference to Elements, Arrays, Arrays
+@section Introduction to Arrays
+
+@cindex arrays
+The @code{awk} language provides one-dimensional @dfn{arrays} for storing groups
+of related strings or numbers.
+
+Every @code{awk} array must have a name. Array names have the same
+syntax as variable names; any valid variable name would also be a valid
+array name. But you cannot use one name in both ways (as an array and
+as a variable) in one @code{awk} program.
+
+Arrays in @code{awk} superficially resemble arrays in other programming
+languages; but there are fundamental differences. In @code{awk}, you
+don't need to specify the size of an array before you start to use it.
+Additionally, any number or string in @code{awk} may be used as an
+array index, not just consecutive integers.
+
+In most other languages, you have to @dfn{declare} an array and specify
+how many elements or components it contains. In such languages, the
+declaration causes a contiguous block of memory to be allocated for that
+many elements. An index in the array usually must be a non-negative integer; for
+example, the index zero specifies the first element in the array, which is
+actually stored at the beginning of the block of memory. Index one
+specifies the second element, which is stored in memory right after the
+first element, and so on. It is impossible to add more elements to the
+array, because it has room for only as many elements as you declared.
+(Some languages allow arbitrary starting and ending indices,
+e.g., @samp{15 .. 27}, but the size of the array is still fixed when
+the array is declared.)
+
+A contiguous array of four elements might look like this,
+conceptually, if the element values are eight, @code{"foo"},
+@code{""} and 30:
+
+@iftex
+@c from Karl Berry, much thanks for the help.
+@tex
+\bigskip % space above the table (about 1 linespace)
+\offinterlineskip
+\newdimen\width \width = 1.5cm
+\newdimen\hwidth \hwidth = 4\width \advance\hwidth by 2pt % 5 * 0.4pt
+\centerline{\vbox{
+\halign{\strut\hfil\ignorespaces#&&\vrule#&\hbox to\width{\hfil#\unskip\hfil}\cr
+\noalign{\hrule width\hwidth}
+ &&{\tt 8} &&{\tt "foo"} &&{\tt ""} &&{\tt 30} &&\quad value\cr
+\noalign{\hrule width\hwidth}
+\noalign{\smallskip}
+ &\omit&0&\omit &1 &\omit&2 &\omit&3 &\omit&\quad index\cr
+}
+}}
+@end tex
+@end iftex
+@ifinfo
+@example
++---------+---------+--------+---------+
+|    8    |  "foo"  |   ""   |   30    |    @r{value}
++---------+---------+--------+---------+
+     0         1        2         3        @r{index}
+@end example
+@end ifinfo
+
+@noindent
+Only the values are stored; the indices are implicit from the order of
+the values. Eight is the value at index zero, because eight appears in the
+position with zero elements before it.
+
+@cindex arrays, definition of
+@cindex associative arrays
+@cindex arrays, associative
+Arrays in @code{awk} are different: they are @dfn{associative}. This means
+that each array is a collection of pairs: an index, and its corresponding
+array element value:
+
+@example
+@r{Element} 4 @r{Value} 30
+@r{Element} 2 @r{Value} "foo"
+@r{Element} 1 @r{Value} 8
+@r{Element} 3 @r{Value} ""
+@end example
+
+@noindent
+We have shown the pairs in jumbled order because their order is irrelevant.
+
+One advantage of associative arrays is that new pairs can be added
+at any time. For example, suppose we add to the above array a tenth element
+whose value is @w{@code{"number ten"}}. The result is this:
+
+@example
+@r{Element} 10 @r{Value} "number ten"
+@r{Element} 4 @r{Value} 30
+@r{Element} 2 @r{Value} "foo"
+@r{Element} 1 @r{Value} 8
+@r{Element} 3 @r{Value} ""
+@end example
+
+@noindent
+@cindex sparse arrays
+@cindex arrays, sparse
+Now the array is @dfn{sparse}, which just means some indices are missing:
+it has elements 1--4 and 10, but doesn't have elements 5, 6, 7, 8, or 9.
+@c ok, I should spell out the above, but ...
+
+Another consequence of associative arrays is that the indices don't
+have to be positive integers. Any number, or even a string, can be
+an index. For example, here is an array which translates words from
+English into French:
+
+@example
+@r{Element} "dog" @r{Value} "chien"
+@r{Element} "cat" @r{Value} "chat"
+@r{Element} "one" @r{Value} "un"
+@r{Element} 1 @r{Value} "un"
+@end example
+
+@noindent
+Here we decided to translate the number one in both spelled-out and
+numeric form---thus illustrating that a single array can have both
+numbers and strings as indices.
+(In fact, array subscripts are always strings; this is discussed
+in more detail in
+@ref{Numeric Array Subscripts, ,Using Numbers to Subscript Arrays}.)
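+
+The translation table above can be written down directly. The following is
+only a sketch; the array name @code{french} is made up for the illustration.

```shell
out=$(awk 'BEGIN {
    # String and numeric subscripts can share one array.
    french["dog"] = "chien"
    french["cat"] = "chat"
    french["one"] = "un"
    french[1]     = "un"
    print french["dog"], french["one"], french[1]
}')
printf '%s\n' "$out"
```
+
+This prints @samp{chien un un}.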
+
+When @code{awk} creates an array for you, e.g., with the @code{split}
+built-in function,
+that array's indices are consecutive integers starting at one.
+(@xref{String Functions, ,Built-in Functions for String Manipulation}.)
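+
+A short sketch of this behavior, using a made-up array name @code{parts}:

```shell
out=$(awk 'BEGIN {
    # split returns the number of pieces and stores them in parts[1..n].
    n = split("a b c", parts)
    print n, parts[1], parts[n], (0 in parts)
}')
printf '%s\n' "$out"
```
+
+This prints @samp{3 a c 0}: there are three elements, numbered from one,
+and no element with index zero.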
+
+@node Reference to Elements, Assigning Elements, Array Intro, Arrays
+@section Referring to an Array Element
+@cindex array reference
+@cindex element of array
+@cindex reference to array
+
+The principal way of using an array is to refer to one of its elements.
+An array reference is an expression which looks like this:
+
+@example
+@var{array}[@var{index}]
+@end example
+
+@noindent
+Here, @var{array} is the name of an array. The expression @var{index} is
+the index of the element of the array that you want.
+
+The value of the array reference is the current value of that array
+element. For example, @code{foo[4.3]} is an expression for the element
+of array @code{foo} at index @samp{4.3}.
+
+If you refer to an array element that has no recorded value, the value
+of the reference is @code{""}, the null string. This includes elements
+to which you have not assigned any value, and elements that have been
+deleted (@pxref{Delete, ,The @code{delete} Statement}). Such a reference
+automatically creates that array element, with the null string as its value.
+(In some cases, this is unfortunate, because it might waste memory inside
+@code{awk}.)
+
+@cindex arrays, presence of elements
+@cindex arrays, the @code{in} operator
+You can find out if an element exists in an array at a certain index with
+the expression:
+
+@example
+@var{index} in @var{array}
+@end example
+
+@noindent
+This expression tests whether or not the particular index exists,
+without the side effect of creating that element if it is not present.
+The expression has the value one (true) if @code{@var{array}[@var{index}]}
+exists, and zero (false) if it does not exist.
+
+For example, to test whether the array @code{frequencies} contains the
+index @samp{2}, you could write this statement:
+
+@example
+if (2 in frequencies)
+    print "Subscript 2 is present."
+@end example
+
+Note that this is @emph{not} a test of whether or not the array
+@code{frequencies} contains an element whose @emph{value} is two.
+(There is no way to do that except to scan all the elements.) Also, this
+@emph{does not} create @code{frequencies[2]}, while the following
+(incorrect) alternative would do so:
+
+@example
+if (frequencies[2] != "")
+    print "Subscript 2 is present."
+@end example
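+
+The side effect of the second form can be observed directly. This sketch
+runs both kinds of test against an array that starts out empty:

```shell
out=$(awk 'BEGIN {
    if (2 in frequencies)            # pure test: creates nothing
        print "present before"
    if (frequencies[2] != "")        # reference: creates frequencies[2]
        print "non-empty"
    if (2 in frequencies)            # now true, as a side effect
        print "present after"
}')
printf '%s\n' "$out"
```
+
+Only @samp{present after} is printed: the comparison in the middle created
+the element, with the null string as its value.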
+
+@node Assigning Elements, Array Example, Reference to Elements, Arrays
+@section Assigning Array Elements
+@cindex array assignment
+@cindex element assignment
+
+Array elements are lvalues: they can be assigned values just like
+@code{awk} variables:
+
+@example
+@var{array}[@var{subscript}] = @var{value}
+@end example
+
+@noindent
+Here @var{array} is the name of your array. The expression
+@var{subscript} is the index of the element of the array that you want
+to assign a value. The expression @var{value} is the value you are
+assigning to that element of the array.
+
+@node Array Example, Scanning an Array, Assigning Elements, Arrays
+@section Basic Array Example
+
+The following program takes a list of lines, each beginning with a line
+number, and prints them out in order of line number. The line numbers are
+not in order, however, when they are first read: they are scrambled. This
+program sorts the lines by making an array using the line numbers as
+subscripts. It then prints out the lines in sorted order of their numbers.
+It is a very simple program, and gets confused if it encounters repeated
+numbers, gaps, or lines that don't begin with a number.
+
+@example
+@c file eg/misc/arraymax.awk
+@{
+    if ($1 > max)
+        max = $1
+    arr[$1] = $0
+@}
+
+END @{
+    for (x = 1; x <= max; x++)
+        print arr[x]
+@}
+@c endfile
+@end example
+
+The first rule keeps track of the largest line number seen so far;
+it also stores each line into the array @code{arr}, at an index that
+is the line's number.
+
+The second rule runs after all the input has been read, to print out
+all the lines.
+
+When this program is run with the following input:
+
+@example
+@group
+@c file eg/misc/arraymax.data
+5 I am the Five man
+2 Who are you? The new number two!
+4 . . . And four on the floor
+1 Who is number one?
+3 I three you.
+@c endfile
+@end group
+@end example
+
+@noindent
+its output is this:
+
+@example
+1 Who is number one?
+2 Who are you? The new number two!
+3 I three you.
+4 . . . And four on the floor
+5 I am the Five man
+@end example
+
+If a line number is repeated, the last line with a given number overrides
+the others.
+
+Gaps in the line numbers can be handled with an easy improvement to the
+program's @code{END} rule:
+
+@example
+END @{
+    for (x = 1; x <= max; x++)
+        if (x in arr)
+            print arr[x]
+@}
+@end example
+
+@node Scanning an Array, Delete, Array Example, Arrays
+@section Scanning All Elements of an Array
+@cindex @code{for (x in @dots{})}
+@cindex arrays, special @code{for} statement
+@cindex scanning an array
+
+In programs that use arrays, you often need a loop that executes
+once for each element of an array. In other languages, where arrays are
+contiguous and indices are limited to positive integers, this is
+easy: you can
+find all the valid indices by counting from the lowest index
+up to the highest. This
+technique won't do the job in @code{awk}, since any number or string
+can be an array index. So @code{awk} has a special kind of @code{for}
+statement for scanning an array:
+
+@example
+for (@var{var} in @var{array})
+ @var{body}
+@end example
+
+@noindent
+This loop executes @var{body} once for each index in @var{array} that your
+program has previously used, with the
+variable @var{var} set to that index.
+
+Here is a program that uses this form of the @code{for} statement. The
+first rule scans the input records and notes which words appear (at
+least once) in the input, by storing a one into the array @code{used} with
+the word as index. The second rule scans the elements of @code{used} to
+find all the distinct words that appear in the input. It prints each
+word that is more than 10 characters long, and also prints the number of
+such words. @xref{String Functions, ,Built-in Functions for String Manipulation}, for more information
+on the built-in function @code{length}.
+
+@example
+# Record a 1 for each word that is used at least once.
+@{
+    for (i = 1; i <= NF; i++)
+        used[$i] = 1
+@}
+
+# Find number of distinct words more than 10 characters long.
+END @{
+    for (x in used)
+        if (length(x) > 10) @{
+            ++num_long_words
+            print x
+        @}
+    print num_long_words, "words longer than 10 characters"
+@}
+@end example
+
+@noindent
+@xref{Word Sorting, ,Generating Word Usage Counts},
+for a more detailed example of this type.
+
+The order in which elements of the array are accessed by this statement
+is determined by the internal arrangement of the array elements within
+@code{awk} and cannot be controlled or changed. This can lead to
+problems if new elements are added to @var{array} by statements in
+the loop body; you cannot predict whether or not the @code{for} loop will
+reach them. Similarly, changing @var{var} inside the loop may produce
+strange results. It is best to avoid such things.
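+
+One way to stay clear of these problems is to leave both the array and
+@var{var} alone inside the loop body, and to compute only results that do
+not depend on the order of the scan. The following sketch counts distinct
+words; the array name @code{seen} and the sample input are made up.

```shell
out=$(printf 'a b a\nc b\n' | awk '
{ for (i = 1; i <= NF; i++) seen[$i] = 1 }
END {
    # Counting does not depend on the order in which indices are visited.
    for (word in seen)
        n++
    print n
}')
printf '%s\n' "$out"
```
+
+This prints @samp{3}, whatever order the indices are visited in.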
+
+@node Delete, Numeric Array Subscripts, Scanning an Array, Arrays
+@section The @code{delete} Statement
+@cindex @code{delete} statement
+@cindex deleting elements of arrays
+@cindex removing elements of arrays
+@cindex arrays, deleting an element
+
+You can remove an individual element of an array using the @code{delete}
+statement:
+
+@example
+delete @var{array}[@var{index}]
+@end example
+
+Once you have deleted an array element, you can no longer obtain any
+value the element once had. It is as if you had never referred
+to it and had never given it any value.
+
+Here is an example of deleting elements in an array:
+
+@example
+for (i in frequencies)
+    delete frequencies[i]
+@end example
+
+@noindent
+This example removes all the elements from the array @code{frequencies}.
+
+If you delete an element, a subsequent @code{for} statement to scan the array
+will not report that element, and the @code{in} operator to check for
+the presence of that element will return zero (i.e.@: false):
+
+@example
+delete foo[4]
+if (4 in foo)
+    print "This will never be printed"
+@end example
+
+It is important to note that deleting an element is @emph{not} the
+same as assigning it a null value (the empty string, @code{""}).
+
+@example
+foo[4] = ""
+if (4 in foo)
+    print "This is printed, even though foo[4] is empty"
+@end example
+
+It is not an error to delete an element that does not exist.
+
+@cindex arrays, deleting entire contents
+@cindex deleting entire arrays
+@cindex differences between @code{gawk} and @code{awk}
+You can delete all the elements of an array with a single statement,
+by leaving off the subscript in the @code{delete} statement.
+
+@example
+delete @var{array}
+@end example
+
+This ability is a @code{gawk} extension; it is not available in
+compatibility mode (@pxref{Options, ,Command Line Options}).
+
+Using this version of the @code{delete} statement is about three times
+more efficient than the equivalent loop that deletes each element one
+at a time.
+
+@cindex portability issues
+The following statement provides a portable, but non-obvious way to clear
+out an array.
+
+@cindex Brennan, Michael
+@example
+@group
+# thanks to Michael Brennan for pointing this out
+split("", array)
+@end group
+@end example
+
+The @code{split} function
+(@pxref{String Functions, ,Built-in Functions for String Manipulation})
+clears out the target array first. This call asks it to split
+apart the null string. Since there is no data to split out, the
+function simply clears the array and then returns.
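+
+As a quick check of this idiom, the following sketch fills a small array,
+clears it with @code{split}, and counts what is left. The array name
+@code{a} and its contents are arbitrary.

```shell
out=$(awk 'BEGIN {
    a["x"] = 1
    a["y"] = 2
    split("", a)          # portable way to empty the array
    n = 0
    for (k in a)
        n++
    print n
}')
printf '%s\n' "$out"
```
+
+This prints @samp{0}, since no elements remain to be scanned.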
+
+@node Numeric Array Subscripts, Uninitialized Subscripts, Delete, Arrays
+@section Using Numbers to Subscript Arrays
+
+An important aspect of arrays to remember is that @emph{array subscripts
+are always strings}. If you use a numeric value as a subscript,
+it will be converted to a string value before it is used for subscripting
+(@pxref{Conversion, ,Conversion of Strings and Numbers}).
+
+@cindex conversions, during subscripting
+@cindex numbers, used as subscripts
+@vindex CONVFMT
+This means that the value of the built-in variable @code{CONVFMT} can potentially
+affect how your program accesses elements of an array. For example:
+
+@example
+xyz = 12.153
+data[xyz] = 1
+CONVFMT = "%2.2f"
+@group
+if (xyz in data)
+    printf "%s is in data\n", xyz
+else
+    printf "%s is not in data\n", xyz
+@end group
+@end example
+
+@noindent
+This prints @samp{12.15 is not in data}. The first statement gives
+@code{xyz} a numeric value. Assigning to
+@code{data[xyz]} subscripts @code{data} with the string value @code{"12.153"}
+(using the default conversion value of @code{CONVFMT}, @code{"%.6g"}),
+and assigns one to @code{data["12.153"]}. The program then changes
+the value of @code{CONVFMT}. The test @samp{(xyz in data)} generates a new
+string value from @code{xyz}, this time @code{"12.15"}, since the value of
+@code{CONVFMT} only allows two digits after the decimal point. This test fails,
+since @code{"12.15"} is a different string from @code{"12.153"}.
+
+According to the rules for conversions
+(@pxref{Conversion, ,Conversion of Strings and Numbers}), integer
+values are always converted to strings as integers, no matter what the
+value of @code{CONVFMT} may happen to be. So the usual case of:
+
+@example
+for (i = 1; i <= maxsub; i++)
+    @i{do something with} array[i]
+@end example
+
+@noindent
+will work, no matter what the value of @code{CONVFMT}.
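+
+The integer case can be checked with a small sketch. The variable name
+@code{n} and the stored value are arbitrary.

```shell
out=$(awk 'BEGIN {
    n = 12
    data[n] = "stored"    # subscript is the string "12"
    CONVFMT = "%2.2f"     # would turn 12.153 into "12.15"
    print (n in data)     # integers still convert to "12"
}')
printf '%s\n' "$out"
```
+
+This prints @samp{1}: changing @code{CONVFMT} did not change the string
+used for the integer subscript.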
+
+Like many things in @code{awk}, the majority of the time things work
+as you would expect them to work. But it is useful to have a precise
+knowledge of the actual rules, since sometimes they can have a subtle
+effect on your programs.
+
+@node Uninitialized Subscripts, Multi-dimensional, Numeric Array Subscripts, Arrays
+@section Using Uninitialized Variables as Subscripts
+
+@cindex uninitialized variables, as array subscripts
+@cindex array subscripts, uninitialized variables
+Suppose you want to print your input data in reverse order.
+A reasonable attempt at a program to do so (with some test
+data) might look like this:
+
+@example
+$ echo 'line 1
+> line 2
+> line 3' | awk '@{ l[lines] = $0; ++lines @}
+> END @{
+>     for (i = lines-1; i >= 0; --i)
+>         print l[i]
+> @}'
+@print{} line 3
+@print{} line 2
+@end example
+
+Unfortunately, the very first line of input data did not come out in the
+output!
+
+At first glance, this program should have worked. The variable @code{lines}
+is uninitialized, and uninitialized variables have the numeric value zero.
+So, the value of @code{l[0]} should have been printed.
+
+The issue here is that subscripts for @code{awk} arrays are @strong{always}
+strings. And uninitialized variables, when used as strings, have the
+value @code{""}, not zero. Thus, @samp{line 1} ended up stored in
+@code{l[""]}.
+
+The following version of the program works correctly:
+
+@example
+@{ l[lines++] = $0 @}
+END @{
+    for (i = lines - 1; i >= 0; --i)
+        print l[i]
+@}
+@end example
+
+Here, the @samp{++} forces @code{lines} to be numeric, thus making
+the ``old value'' numeric zero, which is then converted to @code{"0"}
+as the array subscript.
+
+@cindex null string, as array subscript
+@cindex dark corner
+As we have just seen, even though it is somewhat unusual, the null string
+(@code{""}) is a valid array subscript (d.c.). If @samp{--lint} is provided
+on the command line (@pxref{Options, ,Command Line Options}),
+@code{gawk} will warn about the use of the null string as a subscript.
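+
+To see where the first line went in the faulty program, the following
+sketch repeats its first rule on two made-up input lines and then tests for
+the null-string subscript:

```shell
out=$(printf 'line 1\nline 2\n' | awk '
{ l[lines] = $0; ++lines }
END {
    # The first record was stored under "", not under 0.
    print ("" in l), (0 in l), l[""]
}')
printf '%s\n' "$out"
```
+
+This prints @samp{1 0 line 1}.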
+
+@node Multi-dimensional, Multi-scanning, Uninitialized Subscripts, Arrays
+@section Multi-dimensional Arrays
+
+@cindex subscripts in arrays
+@cindex arrays, multi-dimensional subscripts
+@cindex multi-dimensional subscripts
+A multi-dimensional array is an array in which an element is identified
+by a sequence of indices, instead of a single index. For example, a
+two-dimensional array requires two indices. The usual way (in most
+languages, including @code{awk}) to refer to an element of a
+two-dimensional array named @code{grid} is with
+@code{grid[@var{x},@var{y}]}.
+
+@vindex SUBSEP
+Multi-dimensional arrays are supported in @code{awk} through
+concatenation of indices into one string. What happens is that
+@code{awk} converts the indices into strings
+(@pxref{Conversion, ,Conversion of Strings and Numbers}) and
+concatenates them together, with a separator between them. This creates
+a single string that describes the values of the separate indices. The
+combined string is used as a single index into an ordinary,
+one-dimensional array. The separator used is the value of the built-in
+variable @code{SUBSEP}.
+
+For example, suppose we evaluate the expression @samp{foo[5,12] = "value"}
+when the value of @code{SUBSEP} is @code{"@@"}. The numbers five and 12 are
+converted to strings and
+concatenated with an @samp{@@} between them, yielding @code{"5@@12"}; thus,
+the array element @code{foo["5@@12"]} is set to @code{"value"}.
+
+Once the element's value is stored, @code{awk} has no record of whether
+it was stored with a single index or a sequence of indices. The two
+expressions @samp{foo[5,12]} and @w{@samp{foo[5 SUBSEP 12]}} are always
+equivalent.
+
+The default value of @code{SUBSEP} is the string @code{"\034"},
+which contains a non-printing character that is unlikely to appear in an
+@code{awk} program or in most input data.
+
+The usefulness of choosing an unlikely character comes from the fact
+that index values that contain a string matching @code{SUBSEP} lead to
+combined strings that are ambiguous. Suppose that @code{SUBSEP} were
+@code{"@@"}; then @w{@samp{foo["a@@b", "c"]}} and @w{@samp{foo["a",
+"b@@c"]}} would be indistinguishable because both would actually be
+stored as @samp{foo["a@@b@@c"]}.
+
+You can test whether a particular index-sequence exists in a
+``multi-dimensional'' array with the same operator @samp{in} used for single
+dimensional arrays. Instead of a single index as the left-hand operand,
+write the whole sequence of indices, separated by commas, in
+parentheses:
+
+@example
+(@var{subscript1}, @var{subscript2}, @dots{}) in @var{array}
+@end example
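+
+The pieces described above fit together as in the following sketch, which
+sets @code{SUBSEP} to @code{"@@"} purely for readability:

```shell
out=$(awk 'BEGIN {
    SUBSEP = "@"
    foo[5, 12] = "value"
    print foo["5@12"]         # same element, addressed by the combined string
    print ((5, 12) in foo)    # membership test with a sequence of indices
}')
printf '%s\n' "$out"
```
+
+This prints @samp{value} and then @samp{1}.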
+
+The following example treats its input as a two-dimensional array of
+fields; it rotates this array 90 degrees clockwise and prints the
+result. It assumes that all lines have the same number of
+elements.
+
+@example
+@group
+awk '@{
+    if (max_nf < NF)
+        max_nf = NF
+    max_nr = NR
+    for (x = 1; x <= NF; x++)
+        vector[x, NR] = $x
+@}
+@end group
+
+@group
+END @{
+    for (x = 1; x <= max_nf; x++) @{
+        for (y = max_nr; y >= 1; --y)
+            printf("%s ", vector[x, y])
+        printf("\n")
+    @}
+@}'
+@end group
+@end example
+
+@noindent
+When given the input:
+
+@example
+@group
+1 2 3 4 5 6
+2 3 4 5 6 1
+3 4 5 6 1 2
+4 5 6 1 2 3
+@end group
+@end example
+
+@noindent
+it produces:
+
+@example
+@group
+4 3 2 1
+5 4 3 2
+6 5 4 3
+1 6 5 4
+2 1 6 5
+3 2 1 6
+@end group
+@end example
+
+@node Multi-scanning, , Multi-dimensional, Arrays
+@section Scanning Multi-dimensional Arrays
+
+There is no special @code{for} statement for scanning a
+``multi-dimensional'' array; there cannot be one, because in truth there
+are no multi-dimensional arrays or elements; there is only a
+multi-dimensional @emph{way of accessing} an array.
+
+However, if your program has an array that is always accessed as
+multi-dimensional, you can get the effect of scanning it by combining
+the scanning @code{for} statement
+(@pxref{Scanning an Array, ,Scanning All Elements of an Array}) with the
+@code{split} built-in function
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+It works like this:
+
+@example
+for (combined in array) @{
+    split(combined, separate, SUBSEP)
+    @dots{}
+@}
+@end example
+
+@noindent
+This sets @code{combined} to
+each concatenated, combined index in the array, and splits it
+into the individual indices by breaking it apart where the value of
+@code{SUBSEP} appears. The split-out indices become the elements of
+the array @code{separate}.
+
+Thus, suppose you have previously stored a value in @code{array[1, "foo"]};
+then an element with index @code{"1\034foo"} exists in
+@code{array}. (Recall that the default value of @code{SUBSEP} is
+the character with code 034.) Sooner or later the @code{for} statement
+will find that index and do an iteration with @code{combined} set to
+@code{"1\034foo"}. Then the @code{split} function is called as
+follows:
+
+@example
+split("1\034foo", separate, "\034")
+@end example
+
+@noindent
+The result of this is to set @code{separate[1]} to @code{"1"} and
+@code{separate[2]} to @code{"foo"}. Presto, the original sequence of
+separate indices has been recovered.
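+
+Put together as a complete program, the walk-through above might look like
+this sketch (the stored value @code{"v"} is arbitrary):

```shell
out=$(awk 'BEGIN {
    array[1, "foo"] = "v"
    for (combined in array) {
        # Recover the separate indices from the combined subscript.
        n = split(combined, separate, SUBSEP)
        print n, separate[1], separate[2]
    }
}')
printf '%s\n' "$out"
```
+
+This prints @samp{2 1 foo}.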
+
+@node Built-in, User-defined, Arrays, Top
+@chapter Built-in Functions
+
+@c 2e: USE TEXINFO-2 FUNCTION DEFINITION STUFF!!!!!!!!!!!!!
+@cindex built-in functions
+@dfn{Built-in} functions are functions that are always available for
+your @code{awk} program to call. This chapter defines all the built-in
+functions in @code{awk}; some of them are mentioned in other sections,
+but they are summarized here for your convenience. (You can also define
+new functions yourself. @xref{User-defined, ,User-defined Functions}.)
+
+@menu
+* Calling Built-in:: How to call built-in functions.
+* Numeric Functions:: Functions that work with numbers, including
+ @code{int}, @code{sin} and @code{rand}.
+* String Functions:: Functions for string manipulation, such as
+ @code{split}, @code{match}, and
+ @code{sprintf}.
+* I/O Functions:: Functions for files and shell commands.
+* Time Functions:: Functions for dealing with time stamps.
+@end menu
+
+@node Calling Built-in, Numeric Functions, Built-in, Built-in
+@section Calling Built-in Functions
+
+To call a built-in function, write the name of the function followed
+by arguments in parentheses. For example, @samp{atan2(y + z, 1)}
+is a call to the function @code{atan2}, with two arguments.
+
+Whitespace is ignored between the built-in function name and the
+open-parenthesis, but we recommend that you avoid using whitespace
+there. User-defined functions do not permit whitespace in this way, and
+you will find it easier to avoid mistakes by following a simple
+convention which always works: no whitespace after a function name.
+
+@cindex differences between @code{gawk} and @code{awk}
+Each built-in function accepts a certain number of arguments.
+In some cases, arguments can be omitted. The defaults for omitted
+arguments vary from function to function and are described under the
+individual functions. In some @code{awk} implementations, extra
+arguments given to built-in functions are ignored. However, in @code{gawk},
+it is a fatal error to give extra arguments to a built-in function.
+
+When a function is called, expressions that create the function's actual
+parameters are evaluated completely before the function call is performed.
+For example, in the code fragment:
+
+@example
+i = 4
+j = sqrt(i++)
+@end example
+
+@noindent
+the variable @code{i} is set to five before @code{sqrt} is called
+with a value of four for its actual parameter.
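+
+The fragment can be run as is; this sketch simply prints both variables
+afterwards:

```shell
out=$(awk 'BEGIN {
    i = 4
    j = sqrt(i++)    # the argument has the value 4; i becomes 5
    print i, j
}')
printf '%s\n' "$out"
```
+
+This prints @samp{5 2}.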
+
+@cindex evaluation, order of
+@cindex order of evaluation
+The order of evaluation of the expressions used for the function's
+parameters is undefined. Thus, you should not write programs that
+assume that parameters are evaluated from left to right or from
+right to left. For example,
+
+@example
+i = 5
+j = atan2(i++, i *= 2)
+@end example
+
+If the order of evaluation is left to right, then @code{i} first becomes
+six, and then 12, and @code{atan2} is called with the two arguments five
+and 12 (the value of @samp{i++} is the value @code{i} had before the
+increment). But if the order of evaluation is right to left, @code{i}
+first becomes 10, and then 11, and @code{atan2} is called with the
+two arguments 10 and 10.
+
+@node Numeric Functions, String Functions, Calling Built-in, Built-in
+@section Numeric Built-in Functions
+
+Here is a full list of built-in functions that work with numbers.
+Optional parameters are enclosed in square brackets (``['' and ``]'').
+
+@table @code
+@item int(@var{x})
+@findex int
+This produces the nearest integer to @var{x} that lies between @var{x} and
+zero; that is, @var{x} is truncated toward zero.
+
+For example, @code{int(3)} is three, @code{int(3.9)} is three, @code{int(-3.9)}
+is @minus{}3, and @code{int(-3)} is @minus{}3 as well.
+
+@item sqrt(@var{x})
+@findex sqrt
+This gives you the positive square root of @var{x}. It reports an error
+if @var{x} is negative. For example, @code{sqrt(4)} is two.
+
+@item exp(@var{x})
+@findex exp
+This gives you the exponential of @var{x} (@code{e ^ @var{x}}), or reports
+an error if @var{x} is out of range. The range of values @var{x} can have
+depends on your machine's floating point representation.
+
+@item log(@var{x})
+@findex log
+This gives you the natural logarithm of @var{x}, if @var{x} is positive;
+otherwise, it reports an error.
+
+@item sin(@var{x})
+@findex sin
+This gives you the sine of @var{x}, with @var{x} in radians.
+
+@item cos(@var{x})
+@findex cos
+This gives you the cosine of @var{x}, with @var{x} in radians.
+
+@item atan2(@var{y}, @var{x})
+@findex atan2
+This gives you the arctangent of @code{@var{y} / @var{x}} in radians.
+
+@item rand()
+@findex rand
+This gives you a random number. The values of @code{rand} are
+uniformly-distributed between zero and one.
+The value may be zero, but it is never one.
+
+Often you want random integers instead. Here is a user-defined function
+you can use to obtain a random non-negative integer less than @var{n}:
+
+@example
+function randint(n) @{
+    return int(n * rand())
+@}
+@end example
+
+@noindent
+The multiplication produces a random real number greater than or equal to
+zero and less than @code{n}. We then make it an integer (using @code{int})
+between zero and @code{n} @minus{} 1, inclusive.
+
+Here is an example where a similar function is used to produce
+random integers between one and @var{n}. This program
+prints a new random number for each input record.
+
+@example
+@group
+awk '
+# Function to roll a simulated die.
+function roll(n) @{ return 1 + int(rand() * n) @}
+@end group
+
+@group
+# Roll 3 six-sided dice and
+# print total number of points.
+@{
+    printf("%d points\n",
+           roll(6)+roll(6)+roll(6))
+@}'
+@end group
+@end example
+
+@cindex seed for random numbers
+@cindex random numbers, seed of
+@comment MAWK uses a different seed each time.
+@strong{Caution:} In most @code{awk} implementations, including @code{gawk},
+@code{rand} starts generating numbers from the same
+starting number, or @dfn{seed}, each time you run @code{awk}. Thus,
+a program will generate the same results each time you run it.
+The numbers are random within one @code{awk} run, but predictable
+from run to run. This is convenient for debugging, but if you want
+a program to do different things each time it is used, you must change
+the seed to a value that will be different in each run. To do this,
+use @code{srand}.
+
+@item srand(@r{[}@var{x}@r{]})
+@findex srand
+The function @code{srand} sets the starting point, or seed,
+for generating random numbers to the value @var{x}.
+
+Each seed value leads to a particular sequence of random
+numbers.@footnote{Computer generated random numbers really are not truly
+random. They are technically known as ``pseudo-random.'' This means
+that while the numbers in a sequence appear to be random, you can in
+fact generate the same sequence of random numbers over and over again.}
+Thus, if you set the seed to the same value a second time, you will get
+the same sequence of random numbers again.
+
+If you omit the argument @var{x}, as in @code{srand()}, then the current
+date and time of day are used for a seed. This is the way to get random
+numbers that differ from one run to the next.
+
+The return value of @code{srand} is the previous seed. This makes it
+easy to keep track of the seeds for use in consistently reproducing
+sequences of random numbers.
+@end table
+
+@node String Functions, I/O Functions, Numeric Functions, Built-in
+@section Built-in Functions for String Manipulation
+
+The functions in this section look at or change the text of one or more
+strings.
+Optional parameters are enclosed in square brackets (``['' and ``]'').
+
+@table @code
+@item index(@var{in}, @var{find})
+@findex index
+This searches the string @var{in} for the first occurrence of the string
+@var{find}, and returns the position in characters where that occurrence
+begins in the string @var{in}. For example:
+
+@example
+$ awk 'BEGIN @{ print index("peanut", "an") @}'
+@print{} 3
+@end example
+
+@noindent
+If @var{find} is not found, @code{index} returns zero.
+(Remember that string indices in @code{awk} start at one.)
+
+@item length(@r{[}@var{string}@r{]})
+@findex length
+This gives you the number of characters in @var{string}. If
+@var{string} is a number, the length of the digit string representing
+that number is returned. For example, @code{length("abcde")} is five. By
+contrast, @code{length(15 * 35)} works out to three. How? Well, 15 * 35 =
+525, and 525 is then converted to the string @code{"525"}, which has
+three characters.
+
+If no argument is supplied, @code{length} returns the length of @code{$0}.
+
+@cindex historical features
+@cindex portability issues
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+In older versions of @code{awk}, you could call the @code{length} function
+without any parentheses. Doing so is marked as ``deprecated'' in the
+POSIX standard. This means that while you can do this in your
+programs, it is a feature that can eventually be removed from a future
+version of the standard. Therefore, for maximal portability of your
+@code{awk} programs, you should always supply the parentheses.
+
+@item match(@var{string}, @var{regexp})
+@findex match
+The @code{match} function searches the string, @var{string}, for the
+longest, leftmost substring matched by the regular expression,
+@var{regexp}. It returns the character position, or @dfn{index}, of
+where that substring begins (one, if it starts at the beginning of
+@var{string}). If no match is found, it returns zero.
+
+@vindex RSTART
+@vindex RLENGTH
+The @code{match} function sets the built-in variable @code{RSTART} to
+the index. It also sets the built-in variable @code{RLENGTH} to the
+length in characters of the matched substring. If no match is found,
+@code{RSTART} is set to zero, and @code{RLENGTH} to @minus{}1.
+
+For example:
+
+@example
+@group
+@c file eg/misc/findpat.sh
+awk '@{
+ if ($1 == "FIND")
+ regex = $2
+ else @{
+ where = match($0, regex)
+ if (where != 0)
+ print "Match of", regex, "found at", \
+ where, "in", $0
+ @}
+@}'
+@c endfile
+@end group
+@end example
+
+@noindent
+This program looks for lines that match the regular expression stored in
+the variable @code{regex}. This regular expression can be changed. If the
+first word on a line is @samp{FIND}, @code{regex} is changed to be the
+second word on that line. Therefore, given:
+
+@example
+@c file eg/misc/findpat.data
+FIND ru+n
+My program runs
+but not very quickly
+FIND Melvin
+JF+KM
+This line is property of Reality Engineering Co.
+Melvin was here.
+@c endfile
+@end example
+
+@noindent
+@code{awk} prints:
+
+@example
+Match of ru+n found at 12 in My program runs
+Match of Melvin found at 1 in Melvin was here.
+@end example
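+
+The values of @code{RSTART} and @code{RLENGTH} are often passed straight
+to @code{substr} (described below) to extract the text that matched.
+A minimal sketch, using one line of the sample data above as input:
+
+@example
+$ echo "My program runs" |
+> awk '@{ if (match($0, /ru+n/))
+>          print RSTART, RLENGTH, substr($0, RSTART, RLENGTH) @}'
+@print{} 12 3 run
+@end example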
+
+@item split(@var{string}, @var{array} @r{[}, @var{fieldsep}@r{]})
+@findex split
+This divides @var{string} into pieces separated by @var{fieldsep},
+and stores the pieces in @var{array}. The first piece is stored in
+@code{@var{array}[1]}, the second piece in @code{@var{array}[2]}, and so
+forth. The string value of the third argument, @var{fieldsep}, is
+a regexp describing where to split @var{string} (much as @code{FS} can
+be a regexp describing where to split input records). If
+the @var{fieldsep} is omitted, the value of @code{FS} is used.
+@code{split} returns the number of elements created.
+
+The @code{split} function splits strings into pieces in a
+manner similar to the way input lines are split into fields. For example:
+
+@example
+split("cul-de-sac", a, "-")
+@end example
+
+@noindent
+splits the string @samp{cul-de-sac} into three fields using @samp{-} as the
+separator. It sets the contents of the array @code{a} as follows:
+
+@example
+a[1] = "cul"
+a[2] = "de"
+a[3] = "sac"
+@end example
+
+@noindent
+The value returned by this call to @code{split} is three.
+
+As with input field-splitting, when the value of @var{fieldsep} is
+@w{@code{" "}}, leading and trailing whitespace is ignored, and the elements
+are separated by runs of whitespace.
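+
+For instance, in this sketch (the input string and the array name
+@code{parts} are made up for illustration), the leading, trailing and
+repeated blanks do not produce any empty elements:
+
+@example
+$ awk 'BEGIN @{
+>     n = split("  a  b   c ", parts, " ")
+>     print n, parts[1], parts[3]
+> @}'
+@print{} 3 a c
+@end example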
+
+@cindex differences between @code{gawk} and @code{awk}
+Also as with input field-splitting, if @var{fieldsep} is the null string, each
+individual character in the string is split into its own array element.
+(This is a @code{gawk}-specific extension.)
+
+@cindex dark corner
+Recent implementations of @code{awk}, including @code{gawk}, allow
+the third argument to be a regexp constant (@code{/abc/}), as well as a
+string (d.c.). The POSIX standard allows this as well.
+
+Before splitting the string, @code{split} deletes any previously existing
+elements in the array @var{array} (d.c.).
+
+@item sprintf(@var{format}, @var{expression1},@dots{})
+@findex sprintf
+This returns (without printing) the string that @code{printf} would
+have printed out with the same arguments
+(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}).
+For example:
+
+@example
+sprintf("pi = %.2f (approx.)", 22/7)
+@end example
+
+@noindent
+returns the string @w{@code{"pi = 3.14 (approx.)"}}.
+
+@ignore
+2e: For sub, gsub, and gensub, either here or in the "how much matches"
+ section, we need some explanation that it is possible to match the
+ null string when using closures like *. E.g.,
+
+ $ echo abc | awk '{ gsub(/m*/, "X"); print }'
+ @print{} XaXbXc
+
+ Although this makes a certain amount of sense, it can be very
+ surprising.
+@end ignore
+
+@item sub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]})
+@findex sub
+The @code{sub} function alters the value of @var{target}.
+It searches this value, which is treated as a string, for the
+leftmost longest substring matched by the regular expression, @var{regexp},
+extending this match as far as possible. Then the entire string is
+changed by replacing the matched text with @var{replacement}.
+The modified string becomes the new value of @var{target}.
+
+This function is peculiar because @var{target} is not simply
+used to compute a value, and not just any expression will do: it
+must be a variable, field or array element, so that @code{sub} can
+store a modified value there. If this argument is omitted, then the
+default is to use and alter @code{$0}.
+
+For example:
+
+@example
+str = "water, water, everywhere"
+sub(/at/, "ith", str)
+@end example
+
+@noindent
+sets @code{str} to @w{@code{"wither, water, everywhere"}}, by replacing the
+leftmost, longest occurrence of @samp{at} with @samp{ith}.
+
+The @code{sub} function returns the number of substitutions made (either
+one or zero).
+
+If the special character @samp{&} appears in @var{replacement}, it
+stands for the precise substring that was matched by @var{regexp}. (If
+the regexp can match more than one string, then this precise substring
+may vary.) For example:
+
+@example
+awk '@{ sub(/candidate/, "& and his wife"); print @}'
+@end example
+
+@noindent
+changes the first occurrence of @samp{candidate} to @samp{candidate
+and his wife} on each input line.
+
+Here is another example:
+
+@example
+awk 'BEGIN @{
+ str = "daabaaa"
+ sub(/a+/, "c&c", str)
+ print str
+@}'
+@print{} dcaacbaaa
+@end example
+
+@noindent
+This shows how @samp{&} can represent a non-constant string, and also
+illustrates the ``leftmost, longest'' rule in regexp matching
+(@pxref{Leftmost Longest, ,How Much Text Matches?}).
+
+The effect of this special character (@samp{&}) can be turned off by putting a
+backslash before it in the string. As usual, to insert one backslash in
+the string, you must write two backslashes. Therefore, write @samp{\\&}
+in a string constant to include a literal @samp{&} in the replacement.
+For example, here is how to replace the first @samp{|} on each line with
+an @samp{&}:
+
+@example
+awk '@{ sub(/\|/, "\\&"); print @}'
+@end example
+
+@strong{Note:} As mentioned above, the third argument to @code{sub} must
+be a variable, field or array reference.
+Some versions of @code{awk} allow the third argument to
+be an expression which is not an lvalue. In such a case, @code{sub}
+would still search for the pattern and return zero or one, but the result of
+the substitution (if any) would be thrown away because there is no place
+to put it. Such versions of @code{awk} accept expressions like
+this:
+
+@example
+sub(/USA/, "United States", "the USA and Canada")
+@end example
+
+@noindent
+This is considered erroneous in @code{gawk}.
+
+@item gsub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]})
+@findex gsub
+This is similar to the @code{sub} function, except @code{gsub} replaces
+@emph{all} of the longest, leftmost, @emph{non-overlapping} matching
+substrings it can find. The @samp{g} in @code{gsub} stands for
+``global,'' which means replace everywhere. For example:
+
+@example
+awk '@{ gsub(/Britain/, "United Kingdom"); print @}'
+@end example
+
+@noindent
+replaces all occurrences of the string @samp{Britain} with @samp{United
+Kingdom} for all input records.
+
+The @code{gsub} function returns the number of substitutions made. If
+the variable to be searched and altered, @var{target}, is
+omitted, then the entire input record, @code{$0}, is used.
+
+As in @code{sub}, the characters @samp{&} and @samp{\} are special,
+and the third argument must be an lvalue.
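+
+Because @code{gsub} returns a count, it can also serve as a simple way
+to count matches. A brief sketch, with made-up input:
+
+@example
+$ echo "a b a b a" | awk '@{ n = gsub(/a/, "X"); print n, $0 @}'
+@print{} 3 X b X b X
+@end example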
+@end table
+
+@table @code
+@item gensub(@var{regexp}, @var{replacement}, @var{how} @r{[}, @var{target}@r{]})
+@findex gensub
+@code{gensub} is a general substitution function. Like @code{sub} and
+@code{gsub}, it searches the target string @var{target} for matches of
+the regular expression @var{regexp}. Unlike @code{sub} and
+@code{gsub}, the modified string is returned as the result of the
+function, and the original target string is @emph{not} changed. If
+@var{how} is a string beginning with @samp{g} or @samp{G}, then it
+replaces all matches of @var{regexp} with @var{replacement}.
+Otherwise, @var{how} is a number indicating which match of @var{regexp}
+to replace. If no @var{target} is supplied, @code{$0} is used instead.
+
+@code{gensub} provides an additional feature that is not available
+in @code{sub} or @code{gsub}: the ability to specify components of
+a regexp in the replacement text. This is done by using parentheses
+in the regexp to mark the components, and then specifying @samp{\@var{n}}
+in the replacement text, where @var{n} is a digit from one to nine.
+For example:
+
+@example
+@group
+$ gawk '
+> BEGIN @{
+> a = "abc def"
+> b = gensub(/(.+) (.+)/, "\\2 \\1", "g", a)
+> print b
+> @}'
+@print{} def abc
+@end group
+@end example
+
+@noindent
+As described above for @code{sub}, you must type two backslashes in order
+to get one into the string.
+
+In the replacement text, the sequence @samp{\0} represents the entire
+matched text, as does the character @samp{&}.
+
+This example shows how you can use the third argument to control
+which match of the regexp should be changed.
+
+@example
+$ echo a b c a b c |
+> gawk '@{ print gensub(/a/, "AA", 2) @}'
+@print{} a b c AA b c
+@end example
+
+In this case, @code{$0} is used as the default target string.
+@code{gensub} returns the new string as its result, which is
+passed directly to @code{print} for printing.
+
+If the @var{how} argument is a string that does not begin with @samp{g} or
+@samp{G}, or if it is a number that is less than zero, only one
+substitution is performed.
+
+@cindex differences between @code{gawk} and @code{awk}
+@code{gensub} is a @code{gawk} extension; it is not available
+in compatibility mode (@pxref{Options, ,Command Line Options}).
+
+@item substr(@var{string}, @var{start} @r{[}, @var{length}@r{]})
+@findex substr
+This returns a @var{length}-character-long substring of @var{string},
+starting at character number @var{start}. The first character of a
+string is character number one. For example,
+@code{substr("washington", 5, 3)} returns @code{"ing"}.
+
+If @var{length} is not present, this function returns the whole suffix of
+@var{string} that begins at character number @var{start}. For example,
+@code{substr("washington", 5)} returns @code{"ington"}. The whole
+suffix is also returned
+if @var{length} is greater than the number of characters remaining
+in the string, counting from character number @var{start}.
+
+@cindex case conversion
+@cindex conversion of case
+@item tolower(@var{string})
+@findex tolower
+This returns a copy of @var{string}, with each upper-case character
+in the string replaced with its corresponding lower-case character.
+Non-alphabetic characters are left unchanged. For example,
+@code{tolower("MiXeD cAsE 123")} returns @code{"mixed case 123"}.
+
+@item toupper(@var{string})
+@findex toupper
+This returns a copy of @var{string}, with each lower-case character
+in the string replaced with its corresponding upper-case character.
+Non-alphabetic characters are left unchanged. For example,
+@code{toupper("MiXeD cAsE 123")} returns @code{"MIXED CASE 123"}.
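+
+One common use of these functions, sketched here with a made-up input
+word, is to fold case before a comparison so that the test ignores
+capitalization:
+
+@example
+$ echo "YeS" | awk '@{ if (tolower($1) == "yes") print "affirmative" @}'
+@print{} affirmative
+@end example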
+@end table
+
+@c fakenode --- for prepinfo
+@subheading More About @samp{\} and @samp{&} with @code{sub}, @code{gsub} and @code{gensub}
+
+@cindex escape processing, @code{sub} et al.
+When using @code{sub}, @code{gsub} or @code{gensub}, and trying to get literal
+backslashes and ampersands into the replacement text, you need to remember
+that there are several levels of @dfn{escape processing} going on.
+
+First, there is the @dfn{lexical} level, which is when @code{awk} reads
+your program, and builds an internal copy of your program that can
+be executed.
+
+Then there is the run-time level, when @code{awk} actually scans the
+replacement string to determine what to generate.
+
+At both levels, @code{awk} looks for a defined set of characters that
+can come after a backslash. At the lexical level, it looks for the
+escape sequences listed in @ref{Escape Sequences}.
+Thus, for every @samp{\} that @code{awk} will process at the run-time
+level, you type two @samp{\}s at the lexical level.
+When a character that is not valid for an escape sequence follows the
+@samp{\}, Unix @code{awk} and @code{gawk} both simply remove the initial
+@samp{\}, and put the following character into the string. Thus, for
+example, @code{"a\qb"} is treated as @code{"aqb"}.
+
+At the run-time level, the various functions handle sequences of
+@samp{\} and @samp{&} differently. The situation is (sadly) somewhat complex.
+
+Historically, the @code{sub} and @code{gsub} functions treated the two
+character sequence @samp{\&} specially; this sequence was replaced in
+the generated text with a single @samp{&}. Any other @samp{\} within
+the @var{replacement} string that did not precede an @samp{&} was passed
+through unchanged. To illustrate with a table:
+
+@c Thanks to Karl Berry for help with the TeX stuff.
+@tex
+\vbox{\bigskip
+% This table has lots of &'s and \'s, so unspecialize them.
+\catcode`\& = \other \catcode`\\ = \other
+% But then we need character for escape and tab.
+@catcode`! = 4
+@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
+ You type!@code{sub} sees!@code{sub} generates@cr
+@hrulefill!@hrulefill!@hrulefill@cr
+ @code{\&}! @code{&}!the matched text@cr
+ @code{\\&}! @code{\&}!a literal @samp{&}@cr
+ @code{\\\&}! @code{\&}!a literal @samp{&}@cr
+@code{\\\\&}! @code{\\&}!a literal @samp{\&}@cr
+@code{\\\\\&}! @code{\\&}!a literal @samp{\&}@cr
+@code{\\\\\\&}! @code{\\\&}!a literal @samp{\\&}@cr
+ @code{\\q}! @code{\q}!a literal @samp{\q}@cr
+}
+@bigskip}
+@end tex
+@ifinfo
+@display
+ You type @code{sub} sees @code{sub} generates
+ -------- ---------- ---------------
+ @code{\&} @code{&} the matched text
+ @code{\\&} @code{\&} a literal @samp{&}
+ @code{\\\&} @code{\&} a literal @samp{&}
+ @code{\\\\&} @code{\\&} a literal @samp{\&}
+ @code{\\\\\&} @code{\\&} a literal @samp{\&}
+@code{\\\\\\&} @code{\\\&} a literal @samp{\\&}
+ @code{\\q} @code{\q} a literal @samp{\q}
+@end display
+@end ifinfo
+
+@noindent
+This table shows both the lexical level processing, where
+an odd number of backslashes has the same effect as the next lower even
+number, and the run-time processing done by @code{sub}.
+(For the sake of simplicity, the rest of the tables below only show the
+case of even numbers of @samp{\}s entered at the lexical level.)
+
+The problem with the historical approach is that there is no way to get
+a literal @samp{\} followed by the matched text.
+
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+The 1992 POSIX standard attempted to fix this problem. The standard
+says that @code{sub} and @code{gsub} look for either a @samp{\} or an @samp{&}
+after the @samp{\}. If either one follows a @samp{\}, that character is
+output literally. The interpretation of @samp{\} and @samp{&} then becomes
+like this:
+
+@c thanks to Karl Berry for formatting this table
+@tex
+\vbox{\bigskip
+% This table has lots of &'s and \'s, so unspecialize them.
+\catcode`\& = \other \catcode`\\ = \other
+% But then we need character for escape and tab.
+@catcode`! = 4
+@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
+ You type!@code{sub} sees!@code{sub} generates@cr
+@hrulefill!@hrulefill!@hrulefill@cr
+ @code{&}! @code{&}!the matched text@cr
+ @code{\\&}! @code{\&}!a literal @samp{&}@cr
+@code{\\\\&}! @code{\\&}!a literal @samp{\}, then the matched text@cr
+@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr
+}
+@bigskip}
+@end tex
+@ifinfo
+@display
+ You type @code{sub} sees @code{sub} generates
+ -------- ---------- ---------------
+ @code{&} @code{&} the matched text
+ @code{\\&} @code{\&} a literal @samp{&}
+ @code{\\\\&} @code{\\&} a literal @samp{\}, then the matched text
+@code{\\\\\\&} @code{\\\&} a literal @samp{\&}
+@end display
+@end ifinfo
+
+@noindent
+This would appear to solve the problem.
+Unfortunately, the phrasing of the standard is unusual. It
+says, in effect, that @samp{\} turns off the special meaning of any
+following character, but that for anything other than @samp{\} and @samp{&},
+such special meaning is undefined. This wording leads to two problems.
+
+@enumerate
+@item
+Backslashes must now be doubled in the @var{replacement} string, breaking
+historical @code{awk} programs.
+
+@item
+To make sure that an @code{awk} program is portable, @emph{every} character
+in the @var{replacement} string must be preceded with a
+backslash.@footnote{This consequence was certainly unintended.}
+@c I can say that, 'cause I was involved in making this change
+@end enumerate
+
+The POSIX standard is under revision.@footnote{As of December 1995,
+with final approval and publication hopefully sometime in 1996.}
+Because of the above problems, proposed text for the revised standard
+reverts to rules that correspond more closely to the original existing
+practice. The proposed rules have special cases that make it possible
+to produce a @samp{\} preceding the matched text.
+
+@tex
+\vbox{\bigskip
+% This table has lots of &'s and \'s, so unspecialize them.
+\catcode`\& = \other \catcode`\\ = \other
+% But then we need character for escape and tab.
+@catcode`! = 4
+@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
+ You type!@code{sub} sees!@code{sub} generates@cr
+@hrulefill!@hrulefill!@hrulefill@cr
+@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr
+@code{\\\\&}! @code{\\&}!a literal @samp{\}, followed by the matched text@cr
+ @code{\\&}! @code{\&}!a literal @samp{&}@cr
+ @code{\\q}! @code{\q}!a literal @samp{\q}@cr
+}
+@bigskip}
+@end tex
+@ifinfo
+@display
+ You type @code{sub} sees @code{sub} generates
+ -------- ---------- ---------------
+@code{\\\\\\&} @code{\\\&} a literal @samp{\&}
+ @code{\\\\&} @code{\\&} a literal @samp{\}, followed by the matched text
+ @code{\\&} @code{\&} a literal @samp{&}
+ @code{\\q} @code{\q} a literal @samp{\q}
+@end display
+@end ifinfo
+
+In a nutshell, at the run-time level, there are now three special sequences
+of characters, @samp{\\\&}, @samp{\\&} and @samp{\&}, whereas historically,
+there was only one. However, as in the historical case, any @samp{\} that
+is not part of one of these three sequences is not special, and appears
+in the output literally.
+
+@code{gawk} 3.0 follows these proposed POSIX rules for @code{sub} and
+@code{gsub}.
+@c As much as we think it's a lousy idea. You win some, you lose some. Sigh.
+Whether these proposed rules will actually become codified into the
+standard is unknown at this point. Subsequent @code{gawk} releases will
+track the standard and implement whatever the final version specifies;
+this @value{DOCUMENT} will be updated as well.
+
+The rules for @code{gensub} are considerably simpler. At the run-time
+level, whenever @code{gawk} sees a @samp{\}, if the following character
+is a digit, then the text that matched the corresponding parenthesized
+subexpression is placed in the generated output. Otherwise,
+no matter what the character after the @samp{\} is, that character will
+appear in the generated text, and the @samp{\} will not.
+
+@tex
+\vbox{\bigskip
+% This table has lots of &'s and \'s, so unspecialize them.
+\catcode`\& = \other \catcode`\\ = \other
+% But then we need character for escape and tab.
+@catcode`! = 4
+@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr
+ You type!@code{gensub} sees!@code{gensub} generates@cr
+@hrulefill!@hrulefill!@hrulefill@cr
+ @code{&}! @code{&}!the matched text@cr
+ @code{\\&}! @code{\&}!a literal @samp{&}@cr
+ @code{\\\\}! @code{\\}!a literal @samp{\}@cr
+ @code{\\\\&}! @code{\\&}!a literal @samp{\}, then the matched text@cr
+@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr
+ @code{\\q}! @code{\q}!a literal @samp{q}@cr
+}
+@bigskip}
+@end tex
+@ifinfo
+@display
+ You type @code{gensub} sees @code{gensub} generates
+ -------- ------------- ------------------
+ @code{&} @code{&} the matched text
+ @code{\\&} @code{\&} a literal @samp{&}
+ @code{\\\\} @code{\\} a literal @samp{\}
+ @code{\\\\&} @code{\\&} a literal @samp{\}, then the matched text
+@code{\\\\\\&} @code{\\\&} a literal @samp{\&}
+ @code{\\q} @code{\q} a literal @samp{q}
+@end display
+@end ifinfo
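+
+As an illustration of the second and fourth rows of this table, the
+following sketch (the target string @code{"abc"} is arbitrary) should
+behave as shown under the rules just described:
+
+@example
+$ gawk 'BEGIN @{
+>     print gensub(/b/, "\\&", 1, "abc")
+>     print gensub(/b/, "\\\\&", 1, "abc")
+> @}'
+@print{} a&c
+@print{} a\bc
+@end example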
+
+Because of the complexity of the lexical and run-time level processing,
+and the special cases for @code{sub} and @code{gsub},
+we recommend using @code{gawk} and @code{gensub} whenever you have
+to do substitutions.
+
+@node I/O Functions, Time Functions, String Functions, Built-in
+@section Built-in Functions for Input/Output
+
+The following functions are related to Input/Output (I/O).
+Optional parameters are enclosed in square brackets (``['' and ``]'').
+
+@table @code
+@item close(@var{filename})
+@findex close
+Close the file @var{filename}, for input or output. The argument may
+alternatively be a shell command that was used for redirecting to or
+from a pipe; then the pipe is closed.
+@xref{Close Files And Pipes, ,Closing Input and Output Files and Pipes},
+for more information.
+
+@item fflush(@r{[}@var{filename}@r{]})
+@findex fflush
+@cindex portability issues
+@cindex flushing buffers
+@cindex buffers, flushing
+@cindex buffering output
+@cindex output, buffering
+Flush any buffered output associated with @var{filename}, which is either a
+file opened for writing, or a shell command for redirecting output to
+a pipe.
+
+Many utility programs will @dfn{buffer} their output; they save information
+to be written to a disk file or terminal in memory, until there is enough
+for it to be worthwhile to send the data to the output device.
+This is often more efficient than writing
+every little bit of information as soon as it is ready. However, sometimes
+it is necessary to force a program to @dfn{flush} its buffers; that is,
+write the information to its destination, even if a buffer is not full.
+This is the purpose of the @code{fflush} function; @code{gawk} too
+buffers its output, and the @code{fflush} function can be used to force
+@code{gawk} to flush its buffers.
+
+@code{fflush} is a recent (1994) addition to the Bell Labs research
+version of @code{awk}; it is not part of the POSIX standard, and will
+not be available if @samp{--posix} has been specified on the command
+line (@pxref{Options, ,Command Line Options}).
+
+@code{gawk} extends the @code{fflush} function in two ways. The first
+is to allow no argument at all. In this case, the buffer for the
+standard output is flushed. The second way is to allow the null string
+(@w{@code{""}}) as the argument. In this case, the buffers for
+@emph{all} open output files and pipes are flushed.
+
+@code{fflush} returns zero if the buffer was successfully flushed,
+and nonzero otherwise.
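+
+A typical use is to make each output record visible immediately, for
+example to another program that is following the file. In this sketch,
+the file name @file{progress.log} is only a placeholder:
+
+@example
+@{
+    print $0 > "progress.log"
+    fflush("progress.log")    # write the record out now
+@}
+@end example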
+
+@item system(@var{command})
+@findex system
+@cindex interaction, @code{awk} and other programs
+The @code{system} function allows the user to execute operating system
+commands and then return to the @code{awk} program. It executes the
+command given by the string @var{command}, and returns, as
+its value, the status returned by the command that was executed.
+
+For example, if the following fragment of code is put in your @code{awk}
+program:
+
+@example
+END @{
+ system("date | mail -s 'awk run done' root")
+@}
+@end example
+
+@noindent
+the system administrator will be sent mail when the @code{awk} program
+finishes processing input and begins its end-of-input processing.
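+
+The return value can be tested like any other number. A small sketch,
+assuming a Unix-style shell in which the @code{test} command is available
+and returns zero on success:
+
+@example
+BEGIN @{
+    if (system("test -d /tmp") == 0)
+        print "/tmp is a directory"
+    else
+        print "/tmp is missing or not a directory"
+@}
+@end example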
+
+Note that redirecting @code{print} or @code{printf} into a pipe is often
+enough to accomplish your task. However, if your @code{awk}
+program is interactive, @code{system} is useful for cranking up large
+self-contained programs, such as a shell or an editor.
+
+Some operating systems cannot implement the @code{system} function.
+@code{system} causes a fatal error if it is not supported.
+@end table
+
+@c fakenode --- for prepinfo
+@subheading Controlling Output Buffering with @code{system}
+@cindex flushing buffers
+@cindex buffers, flushing
+@cindex buffering output
+@cindex output, buffering
+
+The @code{fflush} function provides explicit control over output buffering for
+individual files and pipes. However, its use is not portable to many other
+@code{awk} implementations. An alternative method to flush output
+buffers is by calling @code{system} with a null string as its argument:
+
+@example
+system("") # flush output
+@end example
+
+@noindent
+@code{gawk} treats this use of the @code{system} function as a special
+case, and is smart enough not to run a shell (or other command
+interpreter) with the empty command. Therefore, with @code{gawk}, this
+idiom is not only useful, it is efficient. While this method should work
+with other @code{awk} implementations, it will not necessarily avoid
+starting an unnecessary shell. (Other implementations may only
+flush the buffer associated with the standard output, and not necessarily
+all buffered output.)
+
+If you think about what a programmer expects, it makes sense that
+@code{system} should flush any pending output. The following program:
+
+@example
+BEGIN @{
+ print "first print"
+ system("echo system echo")
+ print "second print"
+@}
+@end example
+
+@noindent
+must print
+
+@example
+first print
+system echo
+second print
+@end example
+
+@noindent
+and not
+
+@example
+system echo
+first print
+second print
+@end example
+
+If @code{awk} did not flush its buffers before calling @code{system}, the
+latter (undesirable) output is what you would see.
+
+@node Time Functions, , I/O Functions, Built-in
+@section Functions for Dealing with Time Stamps
+
+@cindex timestamps
+@cindex time of day
+A common use for @code{awk} programs is the processing of log files
+containing time stamp information, indicating when a
+particular log record was written. Many programs log their time stamp
+in the form returned by the @code{time} system call, which is the
+number of seconds since a particular epoch. On POSIX systems,
+it is the number of seconds since Midnight, January 1, 1970, UTC.
+
+In order to make it easier to process such log files, and to produce
+useful reports, @code{gawk} provides two functions for working with time
+stamps. Both of these are @code{gawk} extensions; they are not specified
+in the POSIX standard, nor are they in any other known version
+of @code{awk}.
+
+Optional parameters are enclosed in square brackets (``['' and ``]'').
+
+@table @code
+@item systime()
+@findex systime
+This function returns the current time as the number of seconds since
+the system epoch. On POSIX systems, this is the number of seconds
+since Midnight, January 1, 1970, UTC. It may be a different number on
+other systems.
+
+@item strftime(@r{[}@var{format} @r{[}, @var{timestamp}@r{]]})
+@findex strftime
+This function returns a string. It is similar to the function of the
+same name in ANSI C. The time specified by @var{timestamp} is used to
+produce a string, based on the contents of the @var{format} string.
+The @var{timestamp} is in the same format as the value returned by the
+@code{systime} function. If no @var{timestamp} argument is supplied,
+@code{gawk} will use the current time of day as the time stamp.
+If no @var{format} argument is supplied, @code{strftime} uses
+@code{@w{"%a %b %d %H:%M:%S %Z %Y"}}. This format string produces
+output (almost) equivalent to that of the @code{date} utility.
+(Versions of @code{gawk} prior to 3.0 require the @var{format} argument.)
+@end table
+
+The @code{systime} function allows you to compare a time stamp from a
+log file with the current time of day. In particular, it is easy to
+determine how long ago a particular record was logged. It also allows
+you to produce log records using the ``seconds since the epoch'' format.
+
+The @code{strftime} function allows you to easily turn a time stamp
+into human-readable information. It is similar in nature to the @code{sprintf}
+function
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}),
+in that it copies non-format specification characters verbatim to the
+returned string, while substituting date and time values for format
+specifications in the @var{format} string.
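+
+As a sketch of how the two functions combine, the following rule
+assumes (purely for illustration) that the first field of each log
+record is a time stamp in seconds since the epoch:
+
+@example
+@{
+    age = systime() - $1
+    printf "%s: logged %d seconds ago\n",
+           strftime("%H:%M:%S", $1), age
+@}
+@end example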
+
+@code{strftime} is guaranteed by the ANSI C standard to support
+the following date format specifications:
+
+@table @code
+@item %a
+The locale's abbreviated weekday name.
+
+@item %A
+The locale's full weekday name.
+
+@item %b
+The locale's abbreviated month name.
+
+@item %B
+The locale's full month name.
+
+@item %c
+The locale's ``appropriate'' date and time representation.
+
+@item %d
+The day of the month as a decimal number (01--31).
+
+@item %H
+The hour (24-hour clock) as a decimal number (00--23).
+
+@item %I
+The hour (12-hour clock) as a decimal number (01--12).
+
+@item %j
+The day of the year as a decimal number (001--366).
+
+@item %m
+The month as a decimal number (01--12).
+
+@item %M
+The minute as a decimal number (00--59).
+
+@item %p
+The locale's equivalent of the AM/PM designations associated
+with a 12-hour clock.
+
+@item %S
+The second as a decimal number (00--61).@footnote{Occasionally there are
+minutes in a year with one or two leap seconds, which is why the
+seconds can go up to 61.}
+
+@item %U
+The week number of the year (the first Sunday as the first day of week one)
+as a decimal number (00--53).
+
+@item %w
+The weekday as a decimal number (0--6). Sunday is day zero.
+
+@item %W
+The week number of the year (the first Monday as the first day of week one)
+as a decimal number (00--53).
+
+@item %x
+The locale's ``appropriate'' date representation.
+
+@item %X
+The locale's ``appropriate'' time representation.
+
+@item %y
+The year without century as a decimal number (00--99).
+
+@item %Y
+The year with century as a decimal number (e.g., 1995).
+
+@item %Z
+The time zone name or abbreviation, or no characters if
+no time zone is determinable.
+
+@item %%
+A literal @samp{%}.
+@end table
+
+If a conversion specifier is not one of the above, the behavior is
+undefined.@footnote{This is because ANSI C leaves the
+behavior of the C version of @code{strftime} undefined, and @code{gawk}
+will use the system's version of @code{strftime} if it's there.
+Typically, the conversion specifier will either not appear in the
+returned string, or it will appear literally.}
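+
+As a minimal sketch of the standard specifications in use, the following
+formats time stamp 0 (the epoch) with the two-argument form of
+@code{strftime}. The @code{AWK} shell variable is an assumption of this
+sketch: it prefers @code{gawk} and falls back to the system @code{awk},
+which works only if that @code{awk} also provides @code{strftime}.
+

```shell
# Prefer gawk; fall back to the system awk, which is ASSUMED to provide
# strftime as well (true of gawk, not guaranteed for other awks).
AWK=$(command -v gawk || command -v awk)

# Time stamp 0 is the epoch; TZ=UTC0 pins the time zone so the result
# does not depend on the local settings of the machine.
stamp=$(TZ=UTC0 "$AWK" 'BEGIN { print strftime("%Y-%m-%d %H:%M:%S (day %j)", 0) }')
echo "$stamp"
```

+
+Only numeric specifications are used here, so the result should not
+depend on the locale.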
+
+@cindex locale, definition of
+Informally, a @dfn{locale} is the geographic place in which a program
+is meant to run. For example, a common way to abbreviate the date
+September 4, 1991 in the United States would be ``9/4/91''.
+In many countries in Europe, however, it would be abbreviated ``4.9.91''.
+Thus, the @samp{%x} specification in a @code{"US"} locale might produce
+@samp{9/4/91}, while in a @code{"EUROPE"} locale, it might produce
+@samp{4.9.91}. The ANSI C standard defines a default @code{"C"}
+locale, which is an environment that is typical of what most C programmers
+are used to.
+
+A public-domain C version of @code{strftime} is supplied with @code{gawk}
+for systems that are not yet fully ANSI-compliant. If that version is
+used to compile @code{gawk} (@pxref{Installation, ,Installing @code{gawk}}),
+then the following additional format specifications are available:
+
+@table @code
+@item %D
+Equivalent to specifying @samp{%m/%d/%y}.
+
+@item %e
+The day of the month, padded with a space if it is only one digit.
+
+@item %h
+Equivalent to @samp{%b}, above.
+
+@item %n
+A newline character (ASCII LF).
+
+@item %r
+Equivalent to specifying @samp{%I:%M:%S %p}.
+
+@item %R
+Equivalent to specifying @samp{%H:%M}.
+
+@item %T
+Equivalent to specifying @samp{%H:%M:%S}.
+
+@item %t
+A tab character.
+
+@item %k
+The hour (24-hour clock) as a decimal number (0--23).
+Single digit numbers are padded with a space.
+
+@item %l
+The hour (12-hour clock) as a decimal number (1--12).
+Single digit numbers are padded with a space.
+
+@item %C
+The century, as a number between 00 and 99.
+
+@item %u
+The weekday as a decimal number
+[1 (Monday)--7].
+
+@cindex ISO 8601
+@item %V
+The week number of the year (the first Monday as the first
+day of week one) as a decimal number (01--53).
+The method for determining the week number is as specified by ISO 8601
+(to wit: if the week containing January 1 has four or more days in the
+new year, then it is week one, otherwise it is week 53 of the previous year
+and the next week is week one).
+
+@item %G
+The year with century of the ISO week number, as a decimal number.
+
+For example, January 1, 1993, is in week 53 of 1992. Thus, the year
+of its ISO week number is 1992, even though its year is 1993.
+Similarly, December 31, 1973, is in week 1 of 1974. Thus, the year
+of its ISO week number is 1974, even though its year is 1973.
+
+@item %g
+The year without century of the ISO week number, as a decimal number (00--99).
+
+@item %Ec %EC %Ex %Ey %EY %Od %Oe %OH %OI
+@itemx %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy
+These are ``alternate representations'' for the specifications
+that use only the second letter (@samp{%c}, @samp{%C}, and so on).
+They are recognized, but their normal representations are
+used.@footnote{If you don't understand any of this, don't worry about
+it; these facilities are meant to make it easier to ``internationalize''
+programs.}
+(These facilitate compliance with the POSIX @code{date} utility.)
+
+@item %v
+The date in VMS format (e.g., 20-JUN-1991).
+
+@cindex RFC-822
+@cindex RFC-1036
+@item %z
+The time zone offset in a +HHMM format (e.g., the format necessary to
+produce RFC-822/RFC-1036 date headers).
+@end table
+
+This example is an @code{awk} implementation of the POSIX
+@code{date} utility. Normally, the @code{date} utility prints the
+current date and time of day in a well-known format. However, if you
+provide an argument to it that begins with a @samp{+}, @code{date}
+will copy non-format specifier characters to the standard output, and
+will interpret the current time according to the format specifiers in
+the string. For example:
+
+@example
+$ date '+Today is %A, %B %d, %Y.'
+@print{} Today is Thursday, July 11, 1991.
+@end example
+
+Here is the @code{gawk} version of the @code{date} utility.
+It has a shell ``wrapper'', to handle the @samp{-u} option,
+which requires that @code{date} run as if the time zone
+were set to UTC.
+
+@example
+@group
+#! /bin/sh
+#
+# date --- approximate the P1003.2 'date' command
+
+case $1 in
+-u)  TZ=GMT0     # use UTC
+     export TZ
+     shift ;;
+esac
+@end group
+
+@group
+gawk 'BEGIN  @{
+    format = "%a %b %d %H:%M:%S %Z %Y"
+    exitval = 0
+@end group
+
+@group
+    if (ARGC > 2)
+        exitval = 1
+    else if (ARGC == 2) @{
+        format = ARGV[1]
+        if (format ~ /^\+/)
+            format = substr(format, 2)   # remove leading +
+    @}
+    print strftime(format)
+    exit exitval
+@}' "$@@"
+@end group
+@end example
+
+@node User-defined, Invoking Gawk, Built-in, Top
+@chapter User-defined Functions
+
+@cindex user-defined functions
+@cindex functions, user-defined
+Complicated @code{awk} programs can often be simplified by defining
+your own functions. User-defined functions can be called just like
+built-in ones (@pxref{Function Calls}), but it is up to you to define
+them---to tell @code{awk} what they should do.
+
+@menu
+* Definition Syntax:: How to write definitions and what they mean.
+* Function Example:: An example function definition and what it
+ does.
+* Function Caveats:: Things to watch out for.
+* Return Statement:: Specifying the value a function returns.
+@end menu
+
+@node Definition Syntax, Function Example, User-defined, User-defined
+@section Function Definition Syntax
+@cindex defining functions
+@cindex function definition
+
+Definitions of functions can appear anywhere between the rules of an
+@code{awk} program. Thus, the general form of an @code{awk} program is
+extended to include sequences of rules @emph{and} user-defined function
+definitions.
+There is no need in @code{awk} to put the definition of a function
+before all uses of the function. This is because @code{awk} reads the
+entire program before starting to execute any of it.
+
+The definition of a function named @var{name} looks like this:
+
+@example
+function @var{name}(@var{parameter-list})
+@{
+    @var{body-of-function}
+@}
+@end example
+
+@cindex names, use of
+@cindex namespaces
+@noindent
+@var{name} is the name of the function to be defined. A valid function
+name is like a valid variable name: a sequence of letters, digits and
+underscores, not starting with a digit.
+Within a single @code{awk} program, any particular name can only be
+used as a variable, array or function.
+
+@var{parameter-list} is a list of the function's arguments and local
+variable names, separated by commas. When the function is called,
+the argument names are used to hold the argument values given in
+the call. The local variables are initialized to the empty string.
+A function cannot have two parameters with the same name.
+
+The @var{body-of-function} consists of @code{awk} statements. It is the
+most important part of the definition, because it says what the function
+should actually @emph{do}. The argument names exist to give the body a
+way to talk about the arguments; local variables, to give the body
+places to keep temporary values.
+
+Argument names are not distinguished syntactically from local variable
+names; instead, the number of arguments supplied when the function is
+called determines how many argument variables there are. Thus, if three
+argument values are given, the first three names in @var{parameter-list}
+are arguments, and the rest are local variables.
+
+It follows that if the number of arguments is not the same in all calls
+to the function, some of the names in @var{parameter-list} may be
+arguments on some occasions and local variables on others. Another
+way to think of this is that omitted arguments default to the
+null string.
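+
+A minimal sketch of this behavior follows. The function name
+@code{show} is hypothetical and exists only for illustration; it is
+called once with both arguments and once with the second one omitted.
+

```shell
out=$(awk '
# Both names are declared as parameters; each call decides how many
# of them actually receive a value.
function show(a, b)
{
    return "[" a "][" b "]"
}

BEGIN {
    print show("x", "y")    # both arguments supplied
    print show("x")         # b is omitted, so it starts out as ""
}')
echo "$out"
```

+
+The second call prints @samp{[x][]}, since the omitted @code{b} is the
+null string.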
+
+Usually when you write a function you know how many names you intend to
+use for arguments and how many you intend to use as local variables. It is
+conventional to place some extra space between the arguments and
+the local variables, to document how your function is supposed to be used.
+
+@cindex variable shadowing
+During execution of the function body, the arguments and local variable
+values hide or @dfn{shadow} any variables of the same names used in the
+rest of the program. The shadowed variables are not accessible in the
+function definition, because there is no way to name them while their
+names have been taken away for the local variables. All other variables
+used in the @code{awk} program can be referenced or set normally in the
+function's body.
+
+The arguments and local variables last only as long as the function body
+is executing. Once the body finishes, you can once again access the
+variables that were shadowed while the function was running.
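+
+Shadowing can be sketched as follows. The function @code{bump} and the
+variable names are assumptions made up for this illustration; the point
+is only that the global @code{i} survives the call untouched.
+

```shell
out=$(awk '
# n is the argument; i is a local variable (note the extra space).
function bump(n,    i)
{
    i = n + 1               # changes only the local i
    return i
}

BEGIN {
    i = "global"            # a global that happens to share the name
    r = bump(41)
    print r, i              # the global i is visible again, unchanged
}')
echo "$out"
```

+
+This prints @samp{42 global}: the assignment inside @code{bump} went to
+the local variable, not to the global one.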
+
+@cindex recursive function
+@cindex function, recursive
+The function body can contain expressions which call functions. They
+can even call this function, either directly or by way of another
+function. When this happens, we say the function is @dfn{recursive}.
+
+@cindex @code{awk} language, POSIX version
+@cindex POSIX @code{awk}
+In many @code{awk} implementations, including @code{gawk},
+the keyword @code{function} may be
+abbreviated @code{func}. However, POSIX only specifies the use of
+the keyword @code{function}. This actually has some practical implications.
+If @code{gawk} is in POSIX-compatibility mode
+(@pxref{Options, ,Command Line Options}), then the following
+statement will @emph{not} define a function:
+
+@example
+func foo() @{ a = sqrt($1) ; print a @}
+@end example
+
+@noindent
+Instead it defines a rule that, for each record, concatenates the value
+of the variable @samp{func} with the return value of the function @samp{foo}.
+If the resulting string is non-null, the action is executed.
+This is probably not what was desired. (@code{awk} accepts this input as
+syntactically valid, since functions may be used before they are defined
+in @code{awk} programs.)
+
+@cindex portability issues
+To ensure that your @code{awk} programs are portable, always use the
+keyword @code{function} when defining a function.
+
+@node Function Example, Function Caveats, Definition Syntax, User-defined
+@section Function Definition Examples
+
+Here is an example of a user-defined function, called @code{myprint}, that
+takes a number and prints it in a specific format.
+
+@example
+function myprint(num)
+@{
+    printf "%6.3g\n", num
+@}
+@end example
+
+@noindent
+To illustrate, here is an @code{awk} rule which uses our @code{myprint}
+function:
+
+@example
+$3 > 0 @{ myprint($3) @}
+@end example
+
+@noindent
+This program prints, in our special format, all the third fields that
+contain a positive number in our input. Therefore, when given:
+
+@example
+ 1.2   3.4    5.6   7.8
+ 9.10 11.12 -13.14 15.16
+17.18 19.20  21.22 23.24
+@end example
+
+@noindent
+this program, using our function to format the results, prints:
+
+@example
+   5.6
+  21.2
+@end example
+
+This function deletes all the elements in an array.
+
+@example
+function delarray(a,    i)
+@{
+    for (i in a)
+        delete a[i]
+@}
+@end example
+
+When working with arrays, it is often necessary to delete all the elements
+in an array and start over with a new list of elements
+(@pxref{Delete, ,The @code{delete} Statement}).
+Instead of having
+to repeat this loop everywhere in your program that you need to clear out
+an array, your program can just call @code{delarray}.
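+
+Here is a short sketch of @code{delarray} in use. The helper
+@code{count} and the array name @code{seen} are hypothetical; they are
+there only to show that the array is empty after the call.
+

```shell
out=$(awk '
function delarray(a,    i)
{
    for (i in a)
        delete a[i]
}

# Helper for this sketch only: count the elements of an array.
function count(a,    i, n)
{
    n = 0
    for (i in a)
        n++
    return n
}

BEGIN {
    seen["x"] = 1; seen["y"] = 2
    before = count(seen)
    delarray(seen)
    print before, count(seen)
}')
echo "$out"
```

+
+The output is @samp{2 0}: two elements before the call, none after it.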
+
+Here is an example of a recursive function. It takes a string
+as an input parameter, and returns the string in backwards order.
+
+@example
+function rev(str, start)
+@{
+    if (start == 0)
+        return ""
+
+    return (substr(str, start, 1) rev(str, start - 1))
+@}
+@end example
+
+If this function is in a file named @file{rev.awk}, we can test it
+this way:
+
+@example
+$ echo "Don't Panic!" |
+> gawk --source '@{ print rev($0, length($0)) @}' -f rev.awk
+@print{} !cinaP t'noD
+@end example
+
+Here is an example that uses the built-in function @code{strftime}.
+(@xref{Time Functions, ,Functions for Dealing with Time Stamps},
+for more information on @code{strftime}.)
+The C @code{ctime} function takes a time stamp and returns it in a string,
+formatted in a well-known fashion. Here is an @code{awk} version:
+
+@example
+@c file eg/lib/ctime.awk
+@group
+# ctime.awk
+#
+# awk version of C ctime(3) function
+
+function ctime(ts,    format)
+@{
+    format = "%a %b %d %H:%M:%S %Z %Y"
+    if (ts == 0)
+        ts = systime()       # use current time as default
+    return strftime(format, ts)
+@}
+@c endfile
+@end group
+@end example
+
+@node Function Caveats, Return Statement, Function Example, User-defined
+@section Calling User-defined Functions
+
+@cindex call by value
+@cindex call by reference
+@cindex calling a function
+@cindex function call
+@dfn{Calling a function} means causing the function to run and do its job.
+A function call is an expression, and its value is the value returned by
+the function.
+
+A function call consists of the function name followed by the arguments
+in parentheses. What you write in the call for the arguments are
+@code{awk} expressions; each time the call is executed, these
+expressions are evaluated, and the values are the actual arguments. For
+example, here is a call to @code{foo} with three arguments (the first
+being a string concatenation):
+
+@example
+foo(x y, "lose", 4 * z)
+@end example
+
+@strong{Caution:} whitespace characters (spaces and tabs) are not allowed
+between the function name and the open-parenthesis of the argument list.
+If you write whitespace by mistake, @code{awk} might think that you mean
+to concatenate a variable with an expression in parentheses. However, it
+notices that you used a function name and not a variable name, and reports
+an error.
+
+@cindex call by value
+When a function is called, it is given a @emph{copy} of the values of
+its arguments. This is known as @dfn{call by value}. The caller may use
+a variable as the expression for the argument, but the called function
+does not know this: it only knows what value the argument had. For
+example, if you write this code:
+
+@example
+foo = "bar"
+z = myfunc(foo)
+@end example
+
+@noindent
+then you should not think of the argument to @code{myfunc} as being
+``the variable @code{foo}.'' Instead, think of the argument as the
+string value, @code{"bar"}.
+
+If the function @code{myfunc} alters the values of its local variables,
+this has no effect on any other variables. Thus, if @code{myfunc}
+does this:
+
+@example
+@group
+function myfunc(str)
+@{
+    print str
+    str = "zzz"
+    print str
+@}
+@end group
+@end example
+
+@noindent
+to change its first argument variable @code{str}, this @emph{does not}
+change the value of @code{foo} in the caller. The role of @code{foo} in
+calling @code{myfunc} ended when its value, @code{"bar"}, was computed.
+If @code{str} also exists outside of @code{myfunc}, the function body
+cannot alter this outer value, because it is shadowed during the
+execution of @code{myfunc} and cannot be seen or changed from there.
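+
+Putting the two fragments together gives a complete program that can be
+run to confirm this. It is a sketch that calls @code{myfunc} as a plain
+statement and then prints @code{foo} to show that the caller's variable
+is unchanged.
+

```shell
out=$(awk '
function myfunc(str)
{
    print str
    str = "zzz"     # changes only the local copy
    print str
}

BEGIN {
    foo = "bar"
    myfunc(foo)
    print foo       # still "bar"
}')
echo "$out"
```

+
+The three lines printed are @samp{bar}, @samp{zzz}, and @samp{bar}.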
+
+@cindex call by reference
+However, when arrays are the parameters to functions, they are @emph{not}
+copied. Instead, the array itself is made available for direct manipulation
+by the function. This is usually called @dfn{call by reference}.
+Changes made to an array parameter inside the body of a function @emph{are}
+visible outside that function.
+@ifinfo
+This can be @strong{very} dangerous if you do not watch what you are
+doing. For example:
+@end ifinfo
+@iftex
+@emph{This can be very dangerous if you do not watch what you are
+doing.} For example:
+@end iftex
+
+@example
+function changeit(array, ind, nvalue)
+@{
+    array[ind] = nvalue
+@}
+
+BEGIN @{
+    a[1] = 1; a[2] = 2; a[3] = 3
+    changeit(a, 2, "two")
+    printf "a[1] = %s, a[2] = %s, a[3] = %s\n",
+            a[1], a[2], a[3]
+@}
+@end example
+
+@noindent
+This program prints @samp{a[1] = 1, a[2] = two, a[3] = 3}, because
+@code{changeit} stores @code{"two"} in the second element of @code{a}.
+
+@cindex undefined functions
+@cindex functions, undefined
+Some @code{awk} implementations allow you to call a function that
+has not been defined, and only report a problem at run-time when the
+program actually tries to call the function. For example:
+
+@example
+@group
+BEGIN @{
+    if (0)
+        foo()
+    else
+        bar()
+@}
+function bar() @{ @dots{} @}
+# note that `foo' is not defined
+@end group
+@end example
+
+@noindent
+Since the condition of the @samp{if} statement is never true, @code{foo}
+is never called, so it is not really a problem that it has not been
+defined. Usually though, it is a problem if a program calls an
+undefined function.
+
+@ignore
+At one point, I had gawk dying on this, but later decided that this might
+break old programs and/or test suites.
+@end ignore
+
+If @samp{--lint} has been specified
+(@pxref{Options, ,Command Line Options}),
+@code{gawk} will report calls to undefined functions.
+
+@node Return Statement, , Function Caveats, User-defined
+@section The @code{return} Statement
+@cindex @code{return} statement
+
+The body of a user-defined function can contain a @code{return} statement.
+This statement returns control to the rest of the @code{awk} program. It
+can also be used to return a value for use in the rest of the @code{awk}
+program. It looks like this:
+
+@example
+return @r{[}@var{expression}@r{]}
+@end example
+
+The @var{expression} part is optional. If it is omitted, then the returned
+value is undefined and, therefore, unpredictable.
+
+A @code{return} statement with no value expression is assumed at the end of
+every function definition. So if control reaches the end of the function
+body, then the function returns an unpredictable value. @code{awk}
+will @emph{not} warn you if you use the return value of such a function.
+
+Sometimes, you want to write a function for what it does, not for
+what it returns. Such a function corresponds to a @code{void} function
+in C or to a @code{procedure} in Pascal. Thus, it may be appropriate to not
+return any value; you should simply bear in mind that if you use the return
+value of such a function, you do so at your own risk.
+
+Here is an example of a user-defined function that returns a value
+for the largest number among the elements of an array:
+
+@example
+@group
+function maxelt(vec,   i, ret)
+@{
+    for (i in vec) @{
+        if (ret == "" || vec[i] > ret)
+            ret = vec[i]
+    @}
+    return ret
+@}
+@end group
+@end example
+
+@noindent
+You call @code{maxelt} with one argument, which is an array name. The local
+variables @code{i} and @code{ret} are not intended to be arguments;
+while there is nothing to stop you from passing two or three arguments
+to @code{maxelt}, the results would be strange. The extra space before
+@code{i} in the function parameter list indicates that @code{i} and
+@code{ret} are not supposed to be arguments. This is a convention that
+you should follow when you define functions.
+
+Here is a program that uses our @code{maxelt} function. It loads an
+array, calls @code{maxelt}, and then reports the maximum number in that
+array:
+
+@example
+@group
+awk '
+function maxelt(vec,   i, ret)
+@{
+    for (i in vec) @{
+        if (ret == "" || vec[i] > ret)
+            ret = vec[i]
+    @}
+    return ret
+@}
+@end group
+
+@group
+# Load all fields of each record into nums.
+@{
+    for(i = 1; i <= NF; i++)
+        nums[NR, i] = $i
+@}
+
+END @{
+    print maxelt(nums)
+@}'
+@end group
+@end example
+
+Given the following input:
+
+@example
+@group
+ 1 5 23 8 16
+44 3 5 2 8 26
+256 291 1396 2962 100
+-6 467 998 1101
+99385 11 0 225
+@end group
+@end example
+
+@noindent
+our program tells us (predictably) that @code{99385} is the largest number
+in our array.
+
+@node Invoking Gawk, Library Functions, User-defined, Top
+@chapter Running @code{awk}
+@cindex command line
+@cindex invocation of @code{gawk}
+@cindex arguments, command line
+@cindex options, command line
+@cindex long options
+@cindex options, long
+
+There are two ways to run @code{awk}: with an explicit program, or with
+one or more program files. Here are templates for both of them; items
+enclosed in @samp{@r{[}@dots{}@r{]}} in these templates are optional.
+
+Besides traditional one-letter POSIX-style options, @code{gawk} also
+supports GNU long options.
+
+@example
+awk @r{[@var{options}]} -f progfile @r{[@code{--}]} @var{file} @dots{}
+awk @r{[@var{options}]} @r{[@code{--}]} '@var{program}' @var{file} @dots{}
+@end example
+
+@cindex empty program
+@cindex dark corner
+It is possible to invoke @code{awk} with an empty program:
+
+@example
+$ awk '' datafile1 datafile2
+@end example
+
+@noindent
+Doing so makes little sense though; @code{awk} will simply exit
+silently when given an empty program (d.c.). If @samp{--lint} has
+been specified on the command line, @code{gawk} will issue a
+warning that the program is empty.
+
+@menu
+* Options:: Command line options and their meanings.
+* Other Arguments:: Input file names and variable assignments.
+* AWKPATH Variable:: Searching directories for @code{awk} programs.
+* Obsolete:: Obsolete Options and/or features.
+* Undocumented:: Undocumented Options and Features.
+* Known Bugs:: Known Bugs in @code{gawk}.
+@end menu
+
+@node Options, Other Arguments, Invoking Gawk, Invoking Gawk
+@section Command Line Options
+
+Options begin with a dash, and consist of a single character.
+GNU style long options consist of two dashes and a keyword.
+The keyword can be abbreviated, as long as the abbreviation allows the option
+to be uniquely identified. If the option takes an argument, then the
+keyword is either immediately followed by an equals sign (@samp{=}) and the
+argument's value, or the keyword and the argument's value are separated
+by whitespace. For brevity, the discussion below only refers to the
+traditional short options; however the long and short options are
+interchangeable in all contexts.
+
+Each long option for @code{gawk} has a corresponding
+POSIX-style option. The options and their meanings are as follows:
+
+@table @code
+@item -F @var{fs}
+@itemx --field-separator @var{fs}
+@cindex @code{-F} option
+@cindex @code{--field-separator} option
+Sets the @code{FS} variable to @var{fs}
+(@pxref{Field Separators, ,Specifying How Fields are Separated}).
+
+@item -f @var{source-file}
+@itemx --file @var{source-file}
+@cindex @code{-f} option
+@cindex @code{--file} option
+Indicates that the @code{awk} program is to be found in @var{source-file}
+instead of in the first non-option argument.
+
+@item -v @var{var}=@var{val}
+@itemx --assign @var{var}=@var{val}
+@cindex @code{-v} option
+@cindex @code{--assign} option
+Sets the variable @var{var} to the value @var{val} @strong{before}
+execution of the program begins. Such variable values are available
+inside the @code{BEGIN} rule
+(@pxref{Other Arguments, ,Other Command Line Arguments}).
+
+The @samp{-v} option can only set one variable, but you can use
+it more than once, setting another variable each time, like this:
+@samp{awk @w{-v foo=1} @w{-v bar=2} @dots{}}.
+
+@item -mf=@var{NNN}
+@itemx -mr=@var{NNN}
+Set various memory limits to the value @var{NNN}. The @samp{f} flag sets
+the maximum number of fields, and the @samp{r} flag sets the maximum
+record size. These two flags and the @samp{-m} option are from the
+Bell Labs research version of Unix @code{awk}. They are provided
+for compatibility, but otherwise ignored by
+@code{gawk}, since @code{gawk} has no predefined limits.
+
+@item -W @var{gawk-opt}
+@cindex @code{-W} option
+Following the POSIX standard, options that are implementation
+specific are supplied as arguments to the @samp{-W} option. With @code{gawk},
+these arguments may be separated by commas, or quoted and separated by
+whitespace. Case is ignored when processing these options. These options
+also have corresponding GNU style long options.
+See below.
+
+@item --
+Signals the end of the command line options. The following arguments
+are not treated as options even if they begin with @samp{-}. This
+interpretation of @samp{--} follows the POSIX argument parsing
+conventions.
+
+This is useful if you have file names that start with @samp{-},
+or in shell scripts, if you have file names that will be specified
+by the user which could start with @samp{-}.
+@end table
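+
+The use of @samp{--} described above can be sketched as follows. The
+file names @file{show.awk} and @file{-data} are hypothetical, and the
+sketch assumes the widely available (but non-POSIX) @code{mktemp -d}
+for a scratch directory.
+

```shell
dir=$(mktemp -d)
printf '{ print FILENAME ": " $0 }\n' > "$dir/show.awk"
printf 'hello\n' > "$dir/-data"

# Without "--", the argument "-data" would be taken for an option.
out=$(cd "$dir" && awk -f show.awk -- -data)
echo "$out"
rm -rf "$dir"
```

+
+With @samp{--} in place, @file{-data} is read as an ordinary input file
+and the program prints @samp{-data: hello}.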
+
+The following @code{gawk}-specific options are available:
+
+@table @code
+@item -W traditional
+@itemx -W compat
+@itemx --traditional
+@itemx --compat
+@cindex @code{--compat} option
+@cindex @code{--traditional} option
+@cindex compatibility mode
+Specifies @dfn{compatibility mode}, in which the GNU extensions to
+the @code{awk} language are disabled, so that @code{gawk} behaves just
+like the Bell Labs research version of Unix @code{awk}.
+@samp{--traditional} is the preferred form of this option.
+@xref{POSIX/GNU, ,Extensions in @code{gawk} Not in POSIX @code{awk}},
+which summarizes the extensions. Also see
+@ref{Compatibility Mode, ,Downward Compatibility and Debugging}.
+
+@item -W copyleft
+@itemx -W copyright
+@itemx --copyleft
+@itemx --copyright
+@cindex @code{--copyleft} option
+@cindex @code{--copyright} option
+Print the short version of the General Public License.
+This option may disappear in a future version of @code{gawk}.
+
+@item -W help
+@itemx -W usage
+@itemx --help
+@itemx --usage
+@cindex @code{--help} option
+@cindex @code{--usage} option
+Print a ``usage'' message summarizing the short and long style options
+that @code{gawk} accepts, and then exit.
+
+@item -W lint
+@itemx --lint
+@cindex @code{--lint} option
+Warn about constructs that are dubious or non-portable to
+other @code{awk} implementations.
+Some warnings are issued when @code{gawk} first reads your program. Others
+are issued at run-time, as your program executes.
+
+@item -W lint-old
+@itemx --lint-old
+@cindex @code{--lint-old} option
+Warn about constructs that are not available in
+the original Version 7 Unix version of @code{awk}
+(@pxref{V7/SVR3.1, , Major Changes between V7 and SVR3.1}).
+
+@item -W posix
+@itemx --posix
+@cindex @code{--posix} option
+@cindex POSIX mode
+Operate in strict POSIX mode. This disables all @code{gawk}
+extensions (just like @samp{--traditional}), and adds the following additional
+restrictions:
+
+@c IMPORTANT! Keep this list in sync with the one in node POSIX
+
+@itemize @bullet
+@item
+@code{\x} escape sequences are not recognized
+(@pxref{Escape Sequences}).
+
+@item
+The synonym @code{func} for the keyword @code{function} is not
+recognized (@pxref{Definition Syntax, ,Function Definition Syntax}).
+
+@item
+The operators @samp{**} and @samp{**=} cannot be used in
+place of @samp{^} and @samp{^=} (@pxref{Arithmetic Ops, ,Arithmetic Operators},
+and also @pxref{Assignment Ops, ,Assignment Expressions}).
+
+@item
+Specifying @samp{-Ft} on the command line does not set the value
+of @code{FS} to be a single tab character
+(@pxref{Field Separators, ,Specifying How Fields are Separated}).
+
+@item
+The @code{fflush} built-in function is not supported
+(@pxref{I/O Functions, , Built-in Functions for Input/Output}).
+@end itemize
+
+If you supply both @samp{--traditional} and @samp{--posix} on the
+command line, @samp{--posix} will take precedence. @code{gawk}
+will also issue a warning if both options are supplied.
+
+@item -W re-interval
+@itemx --re-interval
+Allow interval expressions
+(@pxref{Regexp Operators, , Regular Expression Operators}),
+in regexps.
+Because interval expressions were traditionally not available in @code{awk},
+@code{gawk} does not provide them by default. This prevents old @code{awk}
+programs from breaking.
+
+@item -W source @var{program-text}
+@itemx --source @var{program-text}
+@cindex @code{--source} option
+Program source code is taken from the @var{program-text}. This option
+allows you to mix source code in files with source
+code that you enter on the command line. This is particularly useful
+when you have library functions that you wish to use from your command line
+programs (@pxref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}).
+
+@item -W version
+@itemx --version
+@cindex @code{--version} option
+Prints version information for this particular copy of @code{gawk}.
+This allows you to determine if your copy of @code{gawk} is up to date
+with respect to whatever the Free Software Foundation is currently
+distributing.
+It is also useful for bug reports
+(@pxref{Bugs, , Reporting Problems and Bugs}).
+@end table
+
+Any other options are flagged as invalid with a warning message, but
+are otherwise ignored.
+
+In compatibility mode, as a special case, if the value of @var{fs} supplied
+to the @samp{-F} option is @samp{t}, then @code{FS} is set to the tab
+character (@code{"\t"}). This is only true for @samp{--traditional}, and not
+for @samp{--posix}
+(@pxref{Field Separators, ,Specifying How Fields are Separated}).
+
+The @samp{-f} option may be used more than once on the command line.
+If it is, @code{awk} reads its program source from all of the named files, as
+if they had been concatenated together into one big file. This is
+useful for creating libraries of @code{awk} functions. Useful functions
+can be written once, and then retrieved from a standard place, instead
+of having to be included into each individual program.
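+
+As a sketch of this arrangement, the following keeps a function in one
+file and the main program in another, and names both with @samp{-f}.
+The file names @file{lib.awk} and @file{main.awk} and the function
+@code{twice} are hypothetical; @code{mktemp -d} is assumed to be
+available for a scratch directory.
+

```shell
dir=$(mktemp -d)

# A tiny "library" holding one reusable function.
cat > "$dir/lib.awk" <<'EOF'
function twice(x) { return 2 * x }
EOF

# The main program, which relies on the library.
cat > "$dir/main.awk" <<'EOF'
BEGIN { print twice(21) }
EOF

# awk reads both files as if they were one program.
out=$(awk -f "$dir/lib.awk" -f "$dir/main.awk")
echo "$out"
rm -rf "$dir"
```

+
+The order of the two @samp{-f} options does not matter here, since
+@code{awk} reads the entire program before executing any of it.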
+
+You can type in a program at the terminal and still use library functions,
+by specifying @samp{-f /dev/tty}. @code{awk} will read a file from the terminal
+to use as part of the @code{awk} program. After typing your program,
+type @kbd{Control-d} (the end-of-file character) to terminate it.
+(You may also use @samp{-f -} to read program source from the standard
+input, but then you will not be able to also use the standard input as a
+source of data.)
+
+Because it is clumsy using the standard @code{awk} mechanisms to mix source
+file and command line @code{awk} programs, @code{gawk} provides the
+@samp{--source} option. This does not require you to pre-empt the standard
+input for your source code, and allows you to easily mix command line
+and library source code
+(@pxref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}).
+
+If no @samp{-f} or @samp{--source} option is specified, then @code{gawk}
+will use the first non-option command line argument as the text of the
+program source code.
+
+@cindex @code{POSIXLY_CORRECT} environment variable
+@cindex environment variable, @code{POSIXLY_CORRECT}
+If the environment variable @code{POSIXLY_CORRECT} exists,
+then @code{gawk} will behave in strict POSIX mode, exactly as if
+you had supplied the @samp{--posix} command line option.
+Many GNU programs look for this environment variable to turn on
+strict POSIX mode. If you supply @samp{--lint} on the command line,
+and @code{gawk} turns on POSIX mode because of @code{POSIXLY_CORRECT},
+then it will print a warning message indicating that POSIX
+mode is in effect.
+
+You would typically set this variable in your shell's startup file.
+For a Bourne compatible shell (such as Bash), you would add these
+lines to the @file{.profile} file in your home directory.
+
+@example
+@group
+POSIXLY_CORRECT=true
+export POSIXLY_CORRECT
+@end group
+@end example
+
+For a @code{csh} compatible shell,@footnote{Not recommended.}
+you would add this line to the @file{.login} file in your home directory.
+
+@example
+setenv POSIXLY_CORRECT true
+@end example
+
+@node Other Arguments, AWKPATH Variable, Options, Invoking Gawk
+@section Other Command Line Arguments
+
+Any additional arguments on the command line are normally treated as
+input files to be processed in the order specified. However, an
+argument that has the form @code{@var{var}=@var{value}}, assigns
+the value @var{value} to the variable @var{var}---it does not specify a
+file at all.
+
+@vindex ARGIND
+@vindex ARGV
+All these arguments are made available to your @code{awk} program in the
+@code{ARGV} array (@pxref{Built-in Variables}). Command line options
+and the program text (if present) are omitted from @code{ARGV}.
+All other arguments, including variable assignments, are
+included. As each element of @code{ARGV} is processed, @code{gawk}
+sets the variable @code{ARGIND} to the index in @code{ARGV} of the
+current element.
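+
+As a rough illustration (the variable @code{pass} and the file name
+@file{mydata} are only placeholders), the following command prints every
+element of @code{ARGV}. The program text should not appear in the
+output, while the assignment @samp{pass=1} should:
+
+@example
+awk 'BEGIN @{
+    for (i = 0; i < ARGC; i++)
+        print i, ARGV[i]
+@}' pass=1 mydata
+@end example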
+
+The distinction between file name arguments and variable-assignment
+arguments is made when @code{awk} is about to open the next input file.
+At that point in execution, it checks the ``file name'' to see whether
+it is really a variable assignment; if so, @code{awk} sets the variable
+instead of reading a file.
+
+Therefore, the variables actually receive the given values after all
+previously specified files have been read. In particular, the values of
+variables assigned in this fashion are @emph{not} available inside a
+@code{BEGIN} rule
+(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}),
+since such rules are run before @code{awk} begins scanning the argument list.
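+
+A small sketch makes the timing visible. Here @code{n} and @file{mydata}
+are hypothetical names. Assuming @file{mydata} holds at least one record,
+the @code{BEGIN} rule should see an empty @code{n}, while the main rule
+sees the value 5:
+
+@example
+awk 'BEGIN @{ print "in BEGIN, n = [" n "]" @}
+     @{ print "in main rule, n = [" n "]" @}' n=5 mydata
+@end example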
+
+@cindex dark corner
+The variable values given on the command line are processed for escape
+sequences (d.c.) (@pxref{Escape Sequences}).
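+
+For instance, in the following sketch (with the placeholder names
+@code{sep} and @file{mydata}), the two characters @samp{\t} in the
+assignment should reach the program as a single tab character:
+
+@example
+awk '@{ print $1 sep $2 @}' 'sep=\t' mydata
+@end example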
+
+In some earlier implementations of @code{awk}, when a variable assignment
+occurred before any file names, the assignment would happen @emph{before}
+the @code{BEGIN} rule was executed. @code{awk}'s behavior was thus
+inconsistent; some command line assignments were available inside the
+@code{BEGIN} rule, while others were not. However,
+some applications came to depend
+upon this ``feature.'' When @code{awk} was changed to be more consistent,
+the @samp{-v} option was added to accommodate applications that depended
+upon the old behavior.
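+
+When a value really is needed inside a @code{BEGIN} rule, @samp{-v} is
+the option to use. A minimal sketch, with @code{n} as an arbitrary
+variable name:
+
+@example
+awk -v n=5 'BEGIN @{ print "in BEGIN, n = [" n "]" @}'
+@end example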
+
+The variable assignment feature is most useful for assigning to variables
+such as @code{RS}, @code{OFS}, and @code{ORS}, which control input and
+output formats, before scanning the data files. It is also useful for
+controlling state if multiple passes are needed over a data file. For
+example:
+
+@cindex multiple passes over data
+@cindex passes, multiple
+@example
+awk 'pass == 1 @{ @var{pass 1 stuff} @}
+ pass == 2 @{ @var{pass 2 stuff} @}' pass=1 mydata pass=2 mydata
+@end example
+
+Given the variable assignment feature, the @samp{-F} option for setting
+the value of @code{FS} is not
+strictly necessary. It remains for historical compatibility.
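+
+As a sketch of the equivalence, both of the following commands should
+print the first colon-separated field of each line of @file{/etc/passwd}
+(chosen here only as a familiar colon-delimited file). The two forms do
+differ inside a @code{BEGIN} rule, where only the @samp{-F} setting is
+already in effect:
+
+@example
+awk -F: '@{ print $1 @}' /etc/passwd
+awk '@{ print $1 @}' FS=: /etc/passwd
+@end example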
+
+@node AWKPATH Variable, Obsolete, Other Arguments, Invoking Gawk
+@section The @code{AWKPATH} Environment Variable
+@cindex @code{AWKPATH} environment variable
+@cindex environment variable, @code{AWKPATH}
+@cindex search path
+@cindex directory search
+@cindex path, search
+@cindex differences between @code{gawk} and @code{awk}
+
+The previous section described how @code{awk} program files can be named
+on the command line with the @samp{-f} option. In most @code{awk}
+implementations, you must supply a precise path name for each program
+file, unless the file is in the current directory.
+
+@cindex search path, for source files
+But in @code{gawk}, if the file name supplied to the @samp{-f} option
+does not contain a @samp{/}, then @code{gawk} searches a list of
+directories (called the @dfn{search path}), one by one, looking for a
+file with the specified name.
+
+The search path is a string consisting of directory names
+separated by colons. @code{gawk} gets its search path from the
+@code{AWKPATH} environment variable. If that variable does not exist,
+@code{gawk} uses a default path, which is
+@samp{.:/usr/local/share/awk}.@footnote{Your version of @code{gawk}
+may use a directory that is different than @file{/usr/local/share/awk}; it
+will depend upon how @code{gawk} was built and installed. The actual
+directory will be the value of @samp{$(datadir)} generated when
+@code{gawk} was configured. You probably don't need to worry about this
+though.} (Programs written for use by
+system administrators should use an @code{AWKPATH} variable that
+does not include the current directory, @file{.}.)
+
+The search path feature is particularly useful for building up libraries
+of useful @code{awk} functions. The library files can be placed in a
+standard directory that is in the default path, and then specified on
+the command line with a short file name. Otherwise, the full file name
+would have to be typed for each file.
+
+By using both the @samp{--source} and @samp{-f} options, your command line
+@code{awk} programs can use facilities in @code{awk} library files.
+@xref{Library Functions, , A Library of @code{awk} Functions}.
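+
+A minimal sketch of such a mix, assuming that the @file{join.awk} library
+file from that chapter has been installed in a directory on the search
+path. The array name @code{parts} is arbitrary:
+
+@example
+gawk -f join.awk --source 'BEGIN @{
+    n = split("a b c", parts)
+    print join(parts, 1, n, "-")
+@}'
+@end example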
+
+Path searching is not done if @code{gawk} is in compatibility mode.
+This is true for both @samp{--traditional} and @samp{--posix}.
+@xref{Options, ,Command Line Options}.
+
+@strong{Note:} if you want files in the current directory to be found,
+you must include the current directory in the path, either by including
+@file{.} explicitly in the path, or by writing a null entry in the
+path. (A null entry is indicated by starting or ending the path with a
+colon, or by placing two colons next to each other (@samp{::}).) If the
+current directory is not included in the path, then files cannot be
+found in the current directory. This path search mechanism is identical
+to the shell's.
+@c someday, @cite{The Bourne Again Shell}....
+
+Starting with version 3.0, if @code{AWKPATH} is not defined in the
+environment, @code{gawk} will place its default search path into
+@code{ENVIRON["AWKPATH"]}. This makes it easy to determine
+the actual search path @code{gawk} will use.
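+
+For example, this one-line program should display the path that
+@code{gawk} will search:
+
+@example
+gawk 'BEGIN @{ print ENVIRON["AWKPATH"] @}'
+@end example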
+
+@node Obsolete, Undocumented, AWKPATH Variable, Invoking Gawk
+@section Obsolete Options and/or Features
+
+@cindex deprecated options
+@cindex obsolete options
+@cindex deprecated features
+@cindex obsolete features
+This section describes features and/or command line options from
+previous releases of @code{gawk} that are either not available in the
+current version, or that are still supported but deprecated (meaning that
+they will @emph{not} be in the next release).
+
+@c update this section for each release!
+
+For version @value{VERSION} of @code{gawk}, there are no deprecated command
+line options or other deprecated features from the previous version of @code{gawk}.
+@iftex
+This section
+@end iftex
+@ifinfo
+This node
+@end ifinfo
+is thus essentially a place holder,
+in case some option becomes obsolete in a future version of @code{gawk}.
+
+@ignore
+@c This is pretty old news...
+The public-domain version of @code{strftime} that is distributed with
+@code{gawk} changed for the 2.14 release. The @samp{%V} conversion specifier
+that used to generate the date in VMS format was changed to @samp{%v}.
+This is because the POSIX standard for the @code{date} utility now
+specifies a @samp{%V} conversion specifier.
+@xref{Time Functions, ,Functions for Dealing with Time Stamps}, for details.
+@end ignore
+
+@node Undocumented, Known Bugs, Obsolete, Invoking Gawk
+@section Undocumented Options and Features
+@cindex undocumented features
+
+This section intentionally left blank.
+
+@c Read The Source, Luke!
+
+@ignore
+@c If these came out in the Info file or TeX document, then they wouldn't
+@c be undocumented, would they?
+
+@code{gawk} has one undocumented option:
+
+@table @code
+@item -W nostalgia
+@itemx --nostalgia
+Print the message @code{"awk: bailing out near line 1"} and dump core.
+This option was inspired by the common behavior of very early versions of
+Unix @code{awk}, and by a t--shirt.
+@end table
+
+Early versions of @code{awk} used to not require any separator (either
+a newline or @samp{;}) between the rules in @code{awk} programs. Thus,
+it was common to see one-line programs like:
+
+@example
+awk '@{ sum += $1 @} END @{ print sum @}'
+@end example
+
+@code{gawk} actually supports this, but it is purposely undocumented
+since it is considered bad style. The correct way to write such a program
+is either
+
+@example
+awk '@{ sum += $1 @} ; END @{ print sum @}'
+@end example
+
+@noindent
+or
+
+@example
+awk '@{ sum += $1 @}
+ END @{ print sum @}' data
+@end example
+
+@noindent
+@xref{Statements/Lines, ,@code{awk} Statements Versus Lines}, for a fuller
+explanation.
+
+@end ignore
+
+@node Known Bugs, , Undocumented, Invoking Gawk
+@section Known Bugs in @code{gawk}
+@cindex bugs, known in @code{gawk}
+@cindex known bugs
+
+@itemize @bullet
+@item
+The @samp{-F} option for changing the value of @code{FS}
+(@pxref{Options, ,Command Line Options})
+is not necessary given the command line variable
+assignment feature; it remains only for backwards compatibility.
+
+@item
+If your system actually has support for @file{/dev/fd} and the
+associated @file{/dev/stdin}, @file{/dev/stdout}, and
+@file{/dev/stderr} files, you may get different output from @code{gawk}
+than you would get on a system without those files. When @code{gawk}
+interprets these files internally, it synchronizes output to the
+standard output with output to @file{/dev/stdout}, while on a system
+with those files, the output is actually to different open files
+(@pxref{Special Files, ,Special File Names in @code{gawk}}).
+
+@item
+Syntactically invalid single-character programs tend to overflow
+the parse stack, generating a rather unhelpful message. Such programs
+are surprisingly difficult to diagnose in the completely general case,
+and the effort to do so really is not worth it.
+
+@item
+The word ``GNU'' is incorrectly capitalized in at least one
+file in the source code.
+@end itemize
+
+@node Library Functions, Sample Programs, Invoking Gawk, Top
+@chapter A Library of @code{awk} Functions
+
+@c 2e: USE TEXINFO-2 FUNCTION DEFINITION STUFF!!!!!!!!!!!!!
+This chapter presents a library of useful @code{awk} functions. The
+sample programs presented later
+(@pxref{Sample Programs, ,Practical @code{awk} Programs})
+use these functions.
+The functions are presented here in a progression from simple to complex.
+
+@ref{Extract Program, ,Extracting Programs from Texinfo Source Files},
+presents a program that you can use to extract the source code for
+these example library functions and programs from the Texinfo source
+for this @value{DOCUMENT}.
+(This has already been done as part of the @code{gawk} distribution.)
+
+If you have written one or more useful, general purpose @code{awk} functions,
+and would like to contribute them for a subsequent edition of this @value{DOCUMENT},
+please contact the author. @xref{Bugs, ,Reporting Problems and Bugs},
+for information on doing this. Don't just send code, as you will be
+required to either place your code in the public domain,
+publish it under the GPL (@pxref{Copying, ,GNU GENERAL PUBLIC LICENSE}),
+or assign the copyright in it to the Free Software Foundation.
+
+@menu
+* Portability Notes:: What to do if you don't have @code{gawk}.
+* Nextfile Function:: Two implementations of a @code{nextfile}
+ function.
+* Assert Function:: A function for assertions in @code{awk}
+ programs.
+* Ordinal Functions:: Functions for using characters as numbers and
+ vice versa.
+* Join Function:: A function to join an array into a string.
+* Mktime Function:: A function to turn a date into a timestamp.
+* Gettimeofday Function:: A function to get formatted times.
+* Filetrans Function:: A function for handling data file transitions.
+* Getopt Function:: A function for processing command line
+ arguments.
+* Passwd Functions:: Functions for getting user information.
+* Group Functions:: Functions for getting group information.
+* Library Names:: How to best name private global variables in
+ library functions.
+@end menu
+
+@node Portability Notes, Nextfile Function, Library Functions, Library Functions
+@section Simulating @code{gawk}-specific Features
+@cindex portability issues
+
+The programs in this chapter and in
+@ref{Sample Programs, ,Practical @code{awk} Programs},
+freely use features that are specific to @code{gawk}.
+This section briefly discusses how you can rewrite these programs for
+different implementations of @code{awk}.
+
+Diagnostic error messages are sent to @file{/dev/stderr}.
+Use @samp{| "cat 1>&2"} instead of @samp{> "/dev/stderr"}, if your system
+does not have a @file{/dev/stderr}, or if you cannot use @code{gawk}.
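+
+For example, a diagnostic that a @code{gawk} program would write as
+
+@example
+print "mylib: something went wrong" > "/dev/stderr"
+@end example
+
+@noindent
+could be written portably as follows (the message text is only a
+placeholder):
+
+@example
+print "mylib: something went wrong" | "cat 1>&2"
+@end example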
+
+A number of programs use @code{nextfile}
+(@pxref{Nextfile Statement, ,The @code{nextfile} Statement}),
+to skip any remaining input in the input file.
+@ref{Nextfile Function, ,Implementing @code{nextfile} as a Function},
+shows you how to write a function that will do the same thing.
+
+Finally, some of the programs choose to ignore upper-case and lower-case
+distinctions in their input. They do this by assigning one to @code{IGNORECASE}.
+You can achieve the same effect by adding the following rule to the
+beginning of the program:
+
+@example
+# ignore case
+@{ $0 = tolower($0) @}
+@end example
+
+@noindent
+Also, verify that all regexp and string constants used in
+comparisons use only lower-case letters.
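+
+Putting the pieces together, a portable case-insensitive match might look
+like this sketch, where the pattern @samp{/error/} and the variable
+@code{count} are arbitrary examples. Note that the record itself is
+converted, so anything printed from @code{$0} comes out in lower case too:
+
+@example
+# ignore case
+@{ $0 = tolower($0) @}
+
+# matches "error", "Error", "ERROR", and so on
+/error/ @{ count++ @}
+
+END @{ print count + 0, "matching records" @}
+@end example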
+
+@node Nextfile Function, Assert Function, Portability Notes, Library Functions
+@section Implementing @code{nextfile} as a Function
+
+@cindex skipping input files
+@cindex input files, skipping
+The @code{nextfile} statement presented in
+@ref{Nextfile Statement, ,The @code{nextfile} Statement},
+is a @code{gawk}-specific extension. It is not available in other
+implementations of @code{awk}. This section shows two versions of a
+@code{nextfile} function that you can use to simulate @code{gawk}'s
+@code{nextfile} statement if you cannot use @code{gawk}.
+
+Here is a first attempt at writing a @code{nextfile} function.
+
+@example
+@group
+# nextfile --- skip remaining records in current file
+
+# this should be read in before the "main" awk program
+
+function nextfile() @{ _abandon_ = FILENAME; next @}
+
+_abandon_ == FILENAME @{ next @}
+@end group
+@end example
+
+This file should be included before the main program, because it supplies
+a rule that must be executed first. This rule compares the current data
+file's name (which is always in the @code{FILENAME} variable) to a private
+variable named @code{_abandon_}. If the file name matches, then the action
+part of the rule executes a @code{next} statement, to go on to the next
+record. (The use of @samp{_} in the variable name is a convention.
+It is discussed more fully in
+@ref{Library Names, , Naming Library Function Global Variables}.)
+
+The use of the @code{next} statement effectively creates a loop that reads
+all the records from the current data file.
+Eventually, the end of the file is reached, and
+a new data file is opened, changing the value of @code{FILENAME}.
+Once this happens, the comparison of @code{_abandon_} to @code{FILENAME}
+fails, and execution continues with the first rule of the ``real'' program.
+
+The @code{nextfile} function itself simply sets the value of @code{_abandon_}
+and then executes a @code{next} statement to start the loop
+going.@footnote{Some implementations of @code{awk} do not allow you to
+execute @code{next} from within a function body. Some other work-around
+will be necessary if you use such a version.}
+@c mawk is what we're talking about.
+
+This initial version has a subtle problem. What happens if the same data
+file is listed @emph{twice} on the command line, one right after the other,
+or even with just a variable assignment between the two occurrences of
+the file name?
+
+@c @findex nextfile
+@c do it this way, since all the indices are merged
+@cindex @code{nextfile} function
+In such a case,
+this code will skip right through the file a second time, even though
+it should stop when it gets to the end of the first occurrence.
+Here is a second version of @code{nextfile} that remedies this problem.
+
+@example
+@group
+@c file eg/lib/nextfile.awk
+# nextfile --- skip remaining records in current file
+# correctly handle successive occurrences of the same file
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May, 1993
+
+# this should be read in before the "main" awk program
+
+function nextfile() @{ _abandon_ = FILENAME; next @}
+
+_abandon_ == FILENAME @{
+ if (FNR == 1)
+ _abandon_ = ""
+ else
+ next
+@}
+@c endfile
+@end group
+@end example
+
+The @code{nextfile} function has not changed. It sets @code{_abandon_}
+equal to the current file name and then executes a @code{next} statement.
+The @code{next} statement reads the next record and increments @code{FNR},
+so @code{FNR} is guaranteed to have a value of at least two.
+However, if @code{nextfile} is called for the last record in the file,
+then @code{awk} will close the current data file and move on to the next
+one. Upon doing so, @code{FILENAME} will be set to the name of the new file,
+and @code{FNR} will be reset to one. If this next file is the same as
+the previous one, @code{_abandon_} will still be equal to @code{FILENAME}.
+However, @code{FNR} will be equal to one, telling us that this is a new
+occurrence of the file, and not the one we were reading when the
+@code{nextfile} function was executed. In that case, @code{_abandon_}
+is reset to the empty string, so that further executions of this rule
+will fail (until the next time that @code{nextfile} is called).
+
+If @code{FNR} is not one, then we are still in the original data file,
+and the program executes a @code{next} statement to skip through it.
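+
+As a usage sketch, suppose the library above is saved as
+@file{nextfile.awk} and the following two rules as @file{head2.awk}
+(a hypothetical name). In an @code{awk} without a built-in
+@code{nextfile}, the command
+@samp{awk -f nextfile.awk -f head2.awk file1 file2}
+should then print at most the first two records of each file:
+
+@example
+# head2.awk: print the first two records of each input file
+@{ print FILENAME ": " $0 @}
+
+FNR == 2 @{ nextfile() @}
+@end example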
+
+An important question to ask at this point is: ``Given that the
+functionality of @code{nextfile} can be provided with a library file,
+why is it built into @code{gawk}?'' The question matters because adding
+features for little reason leads to larger, slower programs that are
+harder to maintain.
+
+The answer is that building @code{nextfile} into @code{gawk} provides
+significant gains in efficiency. If the @code{nextfile} function is executed
+at the beginning of a large data file, @code{awk} still has to scan the entire
+file, splitting it up into records, just to skip over it. The built-in
+@code{nextfile} can simply close the file immediately and proceed to the
+next one, saving a lot of time. This is particularly important in
+@code{awk}, since @code{awk} programs are generally I/O bound (i.e.@:
+they spend most of their time doing input and output, instead of performing
+computations).
+
+@node Assert Function, Ordinal Functions, Nextfile Function, Library Functions
+@section Assertions
+
+@cindex assertions
+@cindex @code{assert}, C version
+When writing large programs, it is often useful to be able to know
+that a condition or set of conditions is true. Before proceeding with a
+particular computation, you make a statement about what you believe to be
+the case. Such a statement is known as an
+``assertion.'' The C language provides an @code{<assert.h>} header file
+and corresponding @code{assert} macro that the programmer can use to make
+assertions. If an assertion fails, the @code{assert} macro arranges to
+print a diagnostic message describing the condition that should have
+been true but was not, and then it kills the program. In C, using
+@code{assert} looks like this:
+
+@example
+#include <assert.h>
+
+int myfunc(int a, double b)
+@{
+ assert(a <= 5 && b >= 17);
+ @dots{}
+@}
+@end example
+
+If the assertion failed, the program would print a message similar to
+this:
+
+@example
+prog.c:5: assertion failed: a <= 5 && b >= 17
+@end example
+
+@findex assert
+The ANSI C language makes it possible to turn the condition into a string for use
+in printing the diagnostic message. This is not possible in @code{awk}, so
+this @code{assert} function also requires a string version of the condition
+that is being tested.
+
+@example
+@c @group
+@c file eg/lib/assert.awk
+# assert --- assert that a condition is true. Otherwise exit.
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May, 1993
+
+function assert(condition, string)
+@{
+ if (! condition) @{
+ printf("%s:%d: assertion failed: %s\n",
+ FILENAME, FNR, string) > "/dev/stderr"
+ _assert_exit = 1
+ exit 1
+ @}
+@}
+
+END @{
+ if (_assert_exit)
+ exit 1
+@}
+@c endfile
+@c @end group
+@end example
+
+The @code{assert} function tests the @code{condition} parameter. If it
+is false, it prints a message to standard error, using the @code{string}
+parameter to describe the failed condition. It then sets the variable
+@code{_assert_exit} to one, and executes the @code{exit} statement.
+The @code{exit} statement jumps to the @code{END} rule. If the @code{END}
+rule finds @code{_assert_exit} to be true, then it exits immediately.
+
+The purpose of the @code{END} rule with its test is to
+keep any other @code{END} rules from running. When an assertion fails, the
+program should exit immediately.
+If no assertions fail, then @code{_assert_exit} will still be
+false when the @code{END} rule is run normally, and the rest of the
+program's @code{END} rules will execute.
+For all of this to work correctly, @file{assert.awk} must be the
+first source file read by @code{awk}.
+
+You would use this function in your programs this way:
+
+@example
+function myfunc(a, b)
+@{
+ assert(a <= 5 && b >= 17, "a <= 5 && b >= 17")
+ @dots{}
+@}
+@end example
+
+@noindent
+If the assertion failed, you would see a message like this:
+
+@example
+mydata:1357: assertion failed: a <= 5 && b >= 17
+@end example
+
+There is a problem with this version of @code{assert} that it may not
+be possible to work around. An @code{END} rule is automatically added
+to the program calling @code{assert}. Normally, if a program consists
+of just a @code{BEGIN} rule, the input files and/or standard input are
+not read. However, now that the program has an @code{END} rule, @code{awk}
+will attempt to read the input data files, or standard input
+(@pxref{Using BEGIN/END, , Startup and Cleanup Actions}),
+most likely causing the program to hang, waiting for input.
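+
+One partial remedy, offered here only as a sketch, is to give such a
+program an empty input, so that @code{awk} reaches end of file at once
+and goes straight on to the @code{END} rules. The program file name
+@file{beginonly.awk} is hypothetical:
+
+@example
+awk -f assert.awk -f beginonly.awk < /dev/null
+@end example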
+
+@cindex backslash continuation
+Just a note on programming style. Some of the @code{BEGIN} and @code{END}
+rules in these examples use backslash continuation, with the open brace on
+a line by itself. This is so that they more closely resemble the way
+functions are written. Many of the examples
+@iftex
+in this chapter and the next one
+@end iftex
+use this style. You can decide for yourself if you like writing
+your @code{BEGIN} and @code{END} rules this way,
+or not.
+
+@node Ordinal Functions, Join Function, Assert Function, Library Functions
+@section Translating Between Characters and Numbers
+
+@cindex numeric character values
+@cindex values of characters as numbers
+One commercial implementation of @code{awk} supplies a built-in function,
+@code{ord}, which takes a character and returns the numeric value for that
+character in the machine's character set. If the string passed to
+@code{ord} has more than one character, only the first one is used.
+
+The inverse of this function is @code{chr} (from the function of the same
+name in Pascal), which takes a number and returns the corresponding character.
+
+Both functions can be written very nicely in @code{awk}; there is no real
+reason to build them into the @code{awk} interpreter.
+
+@findex ord
+@findex chr
+@example
+@c @group
+@c file eg/lib/ord.awk
+# ord.awk --- do ord and chr
+#
+# Global identifiers:
+# _ord_: numerical values indexed by characters
+# _ord_init: function to initialize _ord_
+#
+# Arnold Robbins
+# arnold@@gnu.ai.mit.edu
+# Public Domain
+# 16 January, 1992
+# 20 July, 1992, revised
+
+BEGIN @{ _ord_init() @}
+@c endfile
+@c @end group
+
+@c @group
+@c file eg/lib/ord.awk
+function _ord_init( low, high, i, t)
+@{
+ low = sprintf("%c", 7) # BEL is ascii 7
+ if (low == "\a") @{ # regular ascii
+ low = 0
+ high = 127
+ @} else if (sprintf("%c", 128 + 7) == "\a") @{
+ # ascii, mark parity
+ low = 128
+ high = 255
+ @} else @{ # ebcdic(!)
+ low = 0
+ high = 255
+ @}
+
+ for (i = low; i <= high; i++) @{
+ t = sprintf("%c", i)
+ _ord_[t] = i
+ @}
+@}
+@c endfile
+@c @end group
+@end example
+
+@cindex character sets
+@cindex character encodings
+@cindex ASCII
+@cindex EBCDIC
+@cindex mark parity
+Some explanation of the numbers used by @code{_ord_init} is worthwhile.
+The most prominent character set in use today is ASCII. Although an
+eight-bit byte can hold 256 distinct values (from zero to 255), ASCII only
+defines characters that use the values from zero to 127.@footnote{ASCII
+has been extended in many countries to use the values from 128 to 255
+for country-specific characters. If your system uses these extensions,
+you can simplify @code{_ord_init} to loop from zero to 255.}
+At least one computer manufacturer that we know of
+@c Pr1me, blech
+uses ASCII, but with mark parity, meaning that the leftmost bit in the byte
+is always one. What this means is that on those systems, characters
+have numeric values from 128 to 255.
+Finally, large mainframe systems use the EBCDIC character set, which
+uses all 256 values.
+While there are other character sets in use on some older systems,
+they are not really worth worrying about.
+
+@example
+@group
+@c file eg/lib/ord.awk
+function ord(str, c)
+@{
+ # only first character is of interest
+ c = substr(str, 1, 1)
+ return _ord_[c]
+@}
+@c endfile
+@end group
+
+@group
+@c file eg/lib/ord.awk
+function chr(c)
+@{
+ # force c to be numeric by adding 0
+ return sprintf("%c", c + 0)
+@}
+@c endfile
+@end group
+
+@c @group
+@c file eg/lib/ord.awk
+#### test code ####
+# BEGIN \
+# @{
+# for (;;) @{
+# printf("enter a character: ")
+# if (getline var <= 0)
+# break
+# printf("ord(%s) = %d\n", var, ord(var))
+# @}
+# @}
+@c endfile
+@c @end group
+@end example
+
+An obvious improvement to these functions would be to move the code for the
+@code{@w{_ord_init}} function into the body of the @code{BEGIN} rule. It was
+written this way initially for ease of development.
+
+There is a ``test program'' in a @code{BEGIN} rule, for testing the
+function. It is commented out for production use.
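+
+A short usage sketch, assuming the functions above are in a file named
+@file{ord.awk} and that the system uses ASCII. The file name
+@file{try.awk} is hypothetical; @file{ord.awk} must be read first so
+that its @code{BEGIN} rule fills in the table before this one runs:
+
+@example
+# run as: awk -f ord.awk -f try.awk
+BEGIN @{
+    print ord("A")         # 65 in ASCII
+    print chr(97)          # "a" in ASCII
+    print chr(ord("x"))    # round trip gives back "x"
+@}
+@end example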
+
+@node Join Function, Mktime Function, Ordinal Functions, Library Functions
+@section Merging an Array Into a String
+
+@cindex merging strings
+When doing string processing, it is often useful to be able to join
+all the strings in an array into one long string. The following function,
+@code{join}, accomplishes this task. It is used later in several of
+the application programs
+(@pxref{Sample Programs, ,Practical @code{awk} Programs}).
+
+Good function design is important; this function needs to be general, but it
+should also have a reasonable default behavior. It is called with an array
+and the beginning and ending indices of the elements in the array to be
+merged. This assumes that the array indices are numeric---a reasonable
+assumption since the array was likely created with @code{split}
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+
+@findex join
+@example
+@group
+@c file eg/lib/join.awk
+# join.awk --- join an array into a string
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+function join(array, start, end, sep, result, i)
+@{
+ if (sep == "")
+ sep = " "
+ else if (sep == SUBSEP) # magic value
+ sep = ""
+ result = array[start]
+ for (i = start + 1; i <= end; i++)
+ result = result sep array[i]
+ return result
+@}
+@c endfile
+@end group
+@end example
+
+An optional additional argument is the separator to use when joining the
+strings back together. If the caller supplies a non-empty value,
+@code{join} uses it. If it is not supplied, it will have a null
+value. In this case, @code{join} uses a single blank as a default
+separator for the strings. If the value is equal to @code{SUBSEP},
+then @code{join} joins the strings with no separator between them.
+@code{SUBSEP} serves as a ``magic'' value to indicate that there should
+be no separation between the component strings.
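+
+The three cases can be seen side by side in this sketch, which assumes
+that @code{join} has been loaded from @file{join.awk}. The array name
+@code{parts} is arbitrary:
+
+@example
+BEGIN @{
+    n = split("a b c", parts, " ")
+    print join(parts, 1, n)            # default separator: a b c
+    print join(parts, 1, n, "-")       # explicit separator: a-b-c
+    print join(parts, 1, n, SUBSEP)    # no separator: abc
+@}
+@end example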
+
+It would be nice if @code{awk} had an assignment operator for concatenation.
+The lack of an explicit operator for concatenation makes string operations
+more difficult than they really need to be.
+
+@node Mktime Function, Gettimeofday Function, Join Function, Library Functions
+@section Turning Dates Into Timestamps
+
+The @code{systime} function built in to @code{gawk}
+returns the current time of day as
+a timestamp in ``seconds since the Epoch.'' This timestamp
+can be converted into a printable date of almost infinitely variable
+format using the built-in @code{strftime} function.
+(For more information on @code{systime} and @code{strftime},
+@pxref{Time Functions, ,Functions for Dealing with Time Stamps}.)
+
+@cindex converting dates to timestamps
+@cindex dates, converting to timestamps
+@cindex timestamps, converting from dates
+An interesting but difficult problem is to convert a readable representation
+of a date back into a timestamp. The ANSI C library provides a @code{mktime}
+function that does the basic job, converting a canonical representation of a
+date into a timestamp.
+
+It would appear at first glance that @code{gawk} would have to supply a
+@code{mktime} built-in function that was simply a ``hook'' to the C language
+version. In fact though, @code{mktime} can be implemented entirely in
+@code{awk}.
+
+Here is a version of @code{mktime} for @code{awk}. It takes a simple
+representation of the date and time, and converts it into a timestamp.
+
+The code is presented here intermixed with explanatory prose. In
+@ref{Extract Program, ,Extracting Programs from Texinfo Source Files},
+you will see how the Texinfo source file for this @value{DOCUMENT}
+can be processed to extract the code into a single source file.
+
+The program begins with a descriptive comment and a @code{BEGIN} rule
+that initializes a table @code{_tm_months}. This table is a two-dimensional
+array that has the lengths of the months. The first index is zero for
+regular years, and one for leap years. The values are the same for all the
+months in both kinds of years, except for February; thus the use of multiple
+assignment.
+
+@example
+@c @group
+@c file eg/lib/mktime.awk
+# mktime.awk --- convert a canonical date representation
+# into a timestamp
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+BEGIN \
+@{
+ # Initialize table of month lengths
+ _tm_months[0,1] = _tm_months[1,1] = 31
+ _tm_months[0,2] = 28; _tm_months[1,2] = 29
+ _tm_months[0,3] = _tm_months[1,3] = 31
+ _tm_months[0,4] = _tm_months[1,4] = 30
+ _tm_months[0,5] = _tm_months[1,5] = 31
+ _tm_months[0,6] = _tm_months[1,6] = 30
+ _tm_months[0,7] = _tm_months[1,7] = 31
+ _tm_months[0,8] = _tm_months[1,8] = 31
+ _tm_months[0,9] = _tm_months[1,9] = 30
+ _tm_months[0,10] = _tm_months[1,10] = 31
+ _tm_months[0,11] = _tm_months[1,11] = 30
+ _tm_months[0,12] = _tm_months[1,12] = 31
+@}
+@c endfile
+@c @end group
+@end example
+
+The benefit of merging multiple @code{BEGIN} rules
+(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns})
+is particularly clear when writing library files. Functions in library
+files can cleanly initialize their own private data and also provide clean-up
+actions in private @code{END} rules.
+
+The next function is a simple one that computes whether a given year is or
+is not a leap year. If a year is evenly divisible by four, but not evenly
+divisible by 100, or if it is evenly divisible by 400, then it is a leap
+year. Thus, 1904 was a leap year, 1900 was not, but 2000 will be.
+@c Change this after the year 2000 to ``2000 was'' (:-)
+
+@findex _tm_isleap
+@example
+@group
+@c file eg/lib/mktime.awk
+# decide if a year is a leap year
+function _tm_isleap(year, ret)
+@{
+ ret = (year % 4 == 0 && year % 100 != 0) ||
+ (year % 400 == 0)
+
+ return ret
+@}
+@c endfile
+@end group
+@end example
+
+This function is only used a few times in this file, and its computation
+could have been written @dfn{in-line} (at the point where it's used).
+Making it a separate function made the original development easier, and also
+avoids the possibility of typing errors when duplicating the code in
+multiple places.
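+
+As a quick check of the rule, the following sketch (not part of
+@file{mktime.awk}; the array name @code{y} is only for illustration)
+runs @code{_tm_isleap} over the three years mentioned above. It assumes
+that @file{mktime.awk} has been loaded first with @samp{-f}.
+
+@example
+BEGIN @{
+    # y is a scratch array, used only in this example
+    n = split("1900 1904 2000", y, " ")
+    for (i = 1; i <= n; i++)
+        printf("%d: %s\n", y[i],
+               (_tm_isleap(y[i]) ? "leap year" : "not a leap year"))
+@}
+@end example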
+
+The next function is more interesting. It does most of the work of
+generating a timestamp, which is converting a date and time into some number
+of seconds since the Epoch. The caller passes an array (rather
+imaginatively named @code{a}) containing six
+values: the year including century, the month as a number between one and 12,
+the day of the month, the hour as a number between zero and 23, the minute in
+the hour, and the seconds within the minute.
+
+The function uses several local variables to precompute the number of
+seconds in an hour, seconds in a day, and seconds in a year. Often,
+similar C code simply writes out the expression in-line, expecting the
+compiler to do @dfn{constant folding}. E.g., most C compilers would
+turn @samp{60 * 60} into @samp{3600} at compile time, instead of recomputing
+it every time at run time. Precomputing these values makes the
+function more efficient.
+
+@findex _tm_addup
+@example
+@c @group
+@c file eg/lib/mktime.awk
+# convert a date into seconds
+function _tm_addup(a, total, yearsecs, daysecs,
+ hoursecs, i, j)
+@{
+ hoursecs = 60 * 60
+ daysecs = 24 * hoursecs
+ yearsecs = 365 * daysecs
+
+ total = (a[1] - 1970) * yearsecs
+
+@group
+ # extra day for leap years
+ for (i = 1970; i < a[1]; i++)
+ if (_tm_isleap(i))
+ total += daysecs
+@end group
+
+@group
+ j = _tm_isleap(a[1])
+ for (i = 1; i < a[2]; i++)
+ total += _tm_months[j, i] * daysecs
+@end group
+
+ total += (a[3] - 1) * daysecs
+ total += a[4] * hoursecs
+ total += a[5] * 60
+ total += a[6]
+
+ return total
+@}
+@c endfile
+@c @end group
+@end example
+
+The function starts with a first approximation of all the seconds between
+Midnight, January 1, 1970,@footnote{This is the Epoch on POSIX systems.
+It may be different on other systems.} and the beginning of the current
+year. It then goes through all those years, and for every leap year,
+adds an additional day's worth of seconds.
+
+The variable @code{j} holds one if the current year is a leap year, and
+zero if it is not.
+For every month in the current year prior to the current month, it adds
+the number of seconds in the month, using the appropriate entry in the
+@code{_tm_months} array.
+
+Finally, it adds in the seconds for the number of days prior to the current
+day, and the number of hours, minutes, and seconds in the current day.
+
+The result is a count of seconds since January 1, 1970. This value is not
+yet what is needed though. The reason why is described shortly.
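+
+To make the arithmetic concrete, here is a sketch that calls
+@code{_tm_addup} directly on a sample date. It is not part of
+@file{mktime.awk}; it assumes that file has been loaded first with
+@samp{-f}, and the array name @code{d} is only for illustration.
+
+@example
+BEGIN @{
+    # 23 years * 365 days, plus 6 leap days (1972 to 1992): 8401 days
+    # January through April 1993 (31 + 28 + 31 + 30):       120 days
+    # whole days before the 23rd:                            22 days
+    # 8543 * 86400 + 18 * 3600 + 23 * 60 + 12 = 738181392
+    split("1993 5 23 18 23 12", d, " ")
+    print _tm_addup(d)
+@}
+@end example
+
+The value printed should be 738181392, which is the given date and time
+interpreted as UTC.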
+
+The main @code{mktime} function takes a single character string argument.
+This string is a representation of a date and time in a ``canonical''
+(fixed) form. This string should be
+@code{"@var{year} @var{month} @var{day} @var{hour} @var{minute} @var{second}"}.
+
+@findex mktime
+@example
+@c @group
+@c file eg/lib/mktime.awk
+# mktime --- convert a date into seconds,
+# compensate for time zone
+
+function mktime(str, res1, res2, a, b, i, j, t, diff)
+@{
+ i = split(str, a, " ") # don't rely on FS
+
+ if (i != 6)
+ return -1
+
+ # force numeric
+ for (j in a)
+ a[j] += 0
+
+@group
+ # validate
+ if (a[1] < 1970 ||
+ a[2] < 1 || a[2] > 12 ||
+ a[3] < 1 || a[3] > 31 ||
+ a[4] < 0 || a[4] > 23 ||
+ a[5] < 0 || a[5] > 59 ||
+ a[6] < 0 || a[6] > 61 )
+ return -1
+@end group
+
+ res1 = _tm_addup(a)
+ t = strftime("%Y %m %d %H %M %S", res1)
+
+ if (_tm_debug)
+ printf("(%s) -> (%s)\n", str, t) > "/dev/stderr"
+
+ split(t, b, " ")
+ res2 = _tm_addup(b)
+
+ diff = res1 - res2
+
+ if (_tm_debug)
+ printf("diff = %d seconds\n", diff) > "/dev/stderr"
+
+ res1 += diff
+
+ return res1
+@}
+@c endfile
+@c @end group
+@end example
+
+The function first splits the string into an array, using spaces and tabs as
+separators. If there are not six elements in the array, it returns an
+error, signaled as the value @minus{}1.
+Next, it forces each element of the array to be numeric, by adding zero to it.
+The following @samp{if} statement then makes sure that each element is
+within an allowable range. (This checking could be extended further, e.g.,
+to make sure that the day of the month is within the correct range for the
+particular month supplied.) All of this is essentially preliminary set-up
+and error checking.
+
+Recall that @code{_tm_addup} generated a value in seconds since Midnight,
+January 1, 1970. This value is not directly usable as the result we want,
+@emph{since the calculation does not account for the local timezone}. In other
+words, the value represents the count in seconds since the Epoch, but only
+for UTC (Coordinated Universal Time). If the local timezone is east or west
+of UTC, then some number of hours should be either added to, or subtracted from
+the resulting timestamp.
+
+For example, Atlanta, Georgia (USA), is normally five hours west
+of (behind) UTC. It is only four hours behind UTC if daylight saving
+time is in effect.
+If you are calling @code{mktime} in Atlanta, with the argument
+@code{@w{"1993 5 23 18 23 12"}}, the result from @code{_tm_addup} will be
+for 6:23 p.m. UTC, which is only 2:23 p.m. in Atlanta. It is necessary to
+add another four hours worth of seconds to the result.
+
+How can @code{mktime} determine how far away it is from UTC? This is
+surprisingly easy. The returned timestamp represents the time passed to
+@code{mktime} @emph{as UTC}. This timestamp can be fed back to
+@code{strftime}, which will format it as a @emph{local} time; i.e.@: as
+if it already had the UTC difference added in to it. This is done by
+giving @code{@w{"%Y %m %d %H %M %S"}} to @code{strftime} as the format
+argument. It returns the computed timestamp in the original string
+format. The result represents a time that accounts for the UTC
+difference. When the new time is converted back to a timestamp, the
+difference between the two timestamps is the difference (in seconds)
+between the local timezone and UTC. This difference is then added back
+to the original result. An example demonstrating this is presented below.
+
+Finally, there is a ``main'' program for testing the function.
+
+@example
+@c @group
+@c file eg/lib/mktime.awk
+BEGIN @{
+ if (_tm_test) @{
+ printf "Enter date as yyyy mm dd hh mm ss: "
+ getline _tm_test_date
+
+ t = mktime(_tm_test_date)
+ r = strftime("%Y %m %d %H %M %S", t)
+ printf "Got back (%s)\n", r
+ @}
+@}
+@c endfile
+@c @end group
+@end example
+
+The entire program uses two variables that can be set on the command
+line to control debugging output and to enable the test in the final
+@code{BEGIN} rule. Here is the result of a test run. (Note that debugging
+output is to standard error, and test output is to standard output.)
+
+@example
+@c @group
+$ gawk -f mktime.awk -v _tm_test=1 -v _tm_debug=1
+@print{} Enter date as yyyy mm dd hh mm ss: 1993 5 23 15 35 10
+@error{} (1993 5 23 15 35 10) -> (1993 05 23 11 35 10)
+@error{} diff = 14400 seconds
+@print{} Got back (1993 05 23 15 35 10)
+@c @end group
+@end example
+
+The time entered was 3:35 p.m. (15:35 on a 24-hour clock), on May 23, 1993.
+The first line
+of debugging output shows the entered time, treated as UTC, converted to
+local time; UTC is four hours ahead of the local time zone. The second
+line shows that the difference is 14400
+seconds, which is four hours. (The difference is only four hours, since
+daylight saving time is in effect during May.)
+The final line of test output shows that the timezone compensation
+algorithm works; the returned time is the same as the entered time.
+
+This program does not solve the general problem of turning an arbitrary date
+representation into a timestamp. That problem is very involved. However,
+the @code{mktime} function provides a foundation upon which to build. Other
+software can convert month names into numeric months, and AM/PM times into
+24-hour clocks, to generate the ``canonical'' format that @code{mktime}
+requires.
+
+@node Gettimeofday Function, Filetrans Function, Mktime Function, Library Functions
+@section Managing the Time of Day
+
+@cindex formatted timestamps
+@cindex timestamps, formatted
+The @code{systime} and @code{strftime} functions described in
+@ref{Time Functions, ,Functions for Dealing with Time Stamps},
+provide the minimum functionality necessary for dealing with the time of day
+in human readable form. While @code{strftime} is extensive, the control
+formats are not necessarily easy to remember or intuitively obvious when
+reading a program.
+
+The following function, @code{gettimeofday}, populates a user-supplied array
+with pre-formatted time information. It returns a string with the current
+time formatted in the same way as the @code{date} utility.
+
+@findex gettimeofday
+@example
+@c @group
+@c file eg/lib/gettime.awk
+# gettimeofday --- get the time of day in a usable format
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain, May 1993
+#
+# Returns a string in the format of output of date(1)
+# Populates the array argument time with individual values:
+# time["second"] -- seconds (0 - 59)
+# time["minute"] -- minutes (0 - 59)
+# time["hour"] -- hours (0 - 23)
+# time["althour"] -- hours (1 - 12)
+# time["monthday"] -- day of month (1 - 31)
+# time["month"] -- month of year (1 - 12)
+# time["monthname"] -- name of the month
+# time["shortmonth"] -- short name of the month
+# time["year"] -- year within century (0 - 99)
+# time["fullyear"] -- year with century (19xx or 20xx)
+# time["weekday"] -- day of week (Sunday = 0)
+# time["altweekday"] -- day of week (Monday = 1)
+# time["weeknum"] -- week number, Sunday first day
+# time["altweeknum"] -- week number, Monday first day
+# time["dayname"] -- name of weekday
+# time["shortdayname"] -- short name of weekday
+# time["yearday"] -- day of year (1 - 366)
+# time["timezone"] -- abbreviation of timezone name
+# time["ampm"] -- AM or PM designation
+
+@group
+function gettimeofday(time, ret, now, i)
+@{
+ # get time once, avoids unnecessary system calls
+ now = systime()
+
+ # return date(1)-style output
+ ret = strftime("%a %b %d %H:%M:%S %Z %Y", now)
+
+ # clear out target array
+ for (i in time)
+ delete time[i]
+@end group
+
+@group
+ # fill in values, force numeric values to be
+ # numeric by adding 0
+ time["second"] = strftime("%S", now) + 0
+ time["minute"] = strftime("%M", now) + 0
+ time["hour"] = strftime("%H", now) + 0
+ time["althour"] = strftime("%I", now) + 0
+ time["monthday"] = strftime("%d", now) + 0
+ time["month"] = strftime("%m", now) + 0
+ time["monthname"] = strftime("%B", now)
+ time["shortmonth"] = strftime("%b", now)
+ time["year"] = strftime("%y", now) + 0
+ time["fullyear"] = strftime("%Y", now) + 0
+ time["weekday"] = strftime("%w", now) + 0
+ time["altweekday"] = strftime("%u", now) + 0
+ time["dayname"] = strftime("%A", now)
+ time["shortdayname"] = strftime("%a", now)
+ time["yearday"] = strftime("%j", now) + 0
+ time["timezone"] = strftime("%Z", now)
+ time["ampm"] = strftime("%p", now)
+ time["weeknum"] = strftime("%U", now) + 0
+ time["altweeknum"] = strftime("%W", now) + 0
+
+ return ret
+@}
+@end group
+@c endfile
+@end example
+
+The string indices are easier to use and read than the various formats
+required by @code{strftime}. The @code{alarm} program presented in
+@ref{Alarm Program, ,An Alarm Clock Program},
+uses this function.
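+
+For instance, a program might use the array like this. This is only a
+sketch: the array name @code{now} and the output format are arbitrary
+choices, and @file{gettime.awk} is assumed to be loaded with @samp{-f}.
+
+@example
+BEGIN @{
+    # now is filled in by gettimeofday; the return value is ignored here
+    gettimeofday(now)
+    printf("It is %02d:%02d on %s, %s %d, %d\n",
+           now["hour"], now["minute"], now["dayname"],
+           now["monthname"], now["monthday"], now["fullyear"])
+@}
+@end example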
+
+@c exercise!!!
+The @code{gettimeofday} function is presented above as it was written. A
+more general design for this function would have allowed the user to supply
+an optional timestamp value that would have been used instead of the current
+time.
+
+@node Filetrans Function, Getopt Function, Gettimeofday Function, Library Functions
+@section Noting Data File Boundaries
+
+@cindex per file initialization and clean-up
+The @code{BEGIN} and @code{END} rules are each executed exactly once, at
+the beginning and end respectively of your @code{awk} program
+(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}).
+We (the @code{gawk} authors) once had a user who mistakenly thought that the
+@code{BEGIN} rule was executed at the beginning of each data file and the
+@code{END} rule was executed at the end of each data file. When informed
+that this was not the case, the user requested that we add new special
+patterns to @code{gawk}, named @code{BEGIN_FILE} and @code{END_FILE}, that
+would have the desired behavior. He even supplied us the code to do so.
+
+However, after a little thought, I came up with the following library program.
+It arranges to call two user-supplied functions, @code{beginfile} and
+@code{endfile}, at the beginning and end of each data file.
+Besides solving the problem in only nine(!) lines of code, it does so
+@emph{portably}; this will work with any implementation of @code{awk}.
+
+@example
+@c @group
+# transfile.awk
+#
+# Give the user a hook for filename transitions
+#
+# The user must supply functions beginfile() and endfile()
+# that each take the name of the file being started or
+# finished, respectively.
+#
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, January 1992
+# Public Domain
+
+FILENAME != _oldfilename \
+@{
+ if (_oldfilename != "")
+ endfile(_oldfilename)
+ _oldfilename = FILENAME
+ beginfile(FILENAME)
+@}
+
+END @{ endfile(FILENAME) @}
+@c @end group
+@end example
+
+This file must be loaded before the user's ``main'' program, so that the
+rule it supplies will be executed first.
+
+This rule relies on @code{awk}'s @code{FILENAME} variable that
+automatically changes for each new data file. The current file name is
+saved in a private variable, @code{_oldfilename}. If @code{FILENAME} does
+not equal @code{_oldfilename}, then a new data file is being processed, and
+it is necessary to call @code{endfile} for the old file. Since
+@code{endfile} should only be called if a file has been processed, the
+program first checks to make sure that @code{_oldfilename} is not the null
+string. The program then assigns the current file name to
+@code{_oldfilename}, and calls @code{beginfile} for the file.
+Since, like all @code{awk} variables, @code{_oldfilename} will be
+initialized to the null string, this rule executes correctly even for the
+first data file.
+
+The program also supplies an @code{END} rule, to do the final processing for
+the last file. Since this @code{END} rule comes before any @code{END} rules
+supplied in the ``main'' program, @code{endfile} will be called first. Once
+again the value of multiple @code{BEGIN} and @code{END} rules should be clear.
+
+@findex beginfile
+@findex endfile
+This version has the same problem as the first version of @code{nextfile}
+(@pxref{Nextfile Function, ,Implementing @code{nextfile} as a Function}).
+If the same data file occurs twice in a row on the command line, then
+@code{endfile} and @code{beginfile} will not be executed at the end of the
+first pass and at the beginning of the second pass.
+The following version solves the problem.
+
+@example
+@c @group
+@c file eg/lib/ftrans.awk
+# ftrans.awk --- handle data file transitions
+#
+# user supplies beginfile() and endfile() functions
+#
+# Arnold Robbins, arnold@@gnu.ai.mit.edu. November 1992
+# Public Domain
+
+FNR == 1 @{
+ if (_filename_ != "")
+ endfile(_filename_)
+ _filename_ = FILENAME
+ beginfile(FILENAME)
+@}
+
+END @{ endfile(_filename_) @}
+@c endfile
+@c @end group
+@end example
+
+In @ref{Wc Program, ,Counting Things},
+you will see how this library function can be used, and
+how it simplifies writing the main program.
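+
+As a small illustration ahead of that, here is a sketch of a ``main''
+program that counts the lines in each data file. The file name
+@file{count.awk} and the variable @code{_lines} are hypothetical; only
+the functions @code{beginfile} and @code{endfile} are required by the
+library.
+
+@example
+# count.awk: print a line count for each data file
+# run as: gawk -f ftrans.awk -f count.awk file1 file2
+
+# called by ftrans.awk when a new data file starts
+function beginfile(name) @{ _lines = 0 @}
+
+# called by ftrans.awk when a data file is finished
+function endfile(name)
+@{
+    printf("%s: %d lines\n", name, _lines)
+@}
+
+@{ _lines++ @}
+@end example
+
+Because @file{ftrans.awk} is loaded first, its @samp{FNR == 1} rule
+resets the count before the counting rule sees the first record of each
+file.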
+
+@node Getopt Function, Passwd Functions, Filetrans Function, Library Functions
+@section Processing Command Line Options
+
+@cindex @code{getopt}, C version
+@cindex processing arguments
+@cindex argument processing
+Most utilities on POSIX compatible systems take options or ``switches'' on
+the command line that can be used to change the way a program behaves.
+@code{awk} is an example of such a program
+(@pxref{Options, ,Command Line Options}).
+Often, options take @dfn{arguments}, data that the program needs to
+correctly obey the command line option. For example, @code{awk}'s
+@samp{-F} option requires a string to use as the field separator.
+The first occurrence on the command line of either @samp{--} or a
+string that does not begin with @samp{-} ends the options.
+
+Most Unix systems provide a C function named @code{getopt} for processing
+command line arguments. The programmer provides a string describing the one
+letter options. If an option requires an argument, it is followed in the
+string with a colon. @code{getopt} is also passed the
+count and values of the command line arguments, and is called in a loop.
+@code{getopt} processes the command line arguments for option letters.
+Each time around the loop, it returns a single character representing the
+next option letter that it found, or @samp{?} if it found an invalid option.
+When it returns @minus{}1, there are no options left on the command line.
+
+When using @code{getopt}, options that do not take arguments can be
+grouped together. Furthermore, options that take arguments require that the
+argument be present. The argument can immediately follow the option letter,
+or it can be a separate command line argument.
+
+Given a hypothetical program that takes
+three command line options, @samp{-a}, @samp{-b}, and @samp{-c}, and
+@samp{-b} requires an argument, all of the following are valid ways of
+invoking the program:
+
+@example
+@c @group
+prog -a -b foo -c data1 data2 data3
+prog -ac -bfoo -- data1 data2 data3
+prog -acbfoo data1 data2 data3
+@c @end group
+@end example
+
+Notice that when the argument is grouped with its option, the rest of
+the command line argument is considered to be the option's argument.
+In the above example, @samp{-acbfoo} indicates that all of the
+@samp{-a}, @samp{-b}, and @samp{-c} options were supplied,
+and that @samp{foo} is the argument to the @samp{-b} option.
+
+@code{getopt} provides four external variables that the programmer can use.
+
+@table @code
+@item optind
+The index in the argument value array (@code{argv}) where the first
+non-option command line argument can be found.
+
+@item optarg
+The string value of the argument to an option.
+
+@item opterr
+Usually @code{getopt} prints an error message when it finds an invalid
+option. Setting @code{opterr} to zero disables this feature. (An
+application might wish to print its own error message.)
+
+@item optopt
+The letter representing the command line option.
+While not usually documented, most versions supply this variable.
+@end table
+
+The following C fragment shows how @code{getopt} might process command line
+arguments for @code{awk}.
+
+@example
+@group
+int
+main(int argc, char *argv[])
+@{
+ @dots{}
+ /* print our own message */
+ opterr = 0;
+@end group
+@group
+ while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) @{
+ switch (c) @{
+ case 'f': /* file */
+ @dots{}
+ break;
+ case 'F': /* field separator */
+ @dots{}
+ break;
+ case 'v': /* variable assignment */
+ @dots{}
+ break;
+ case 'W': /* extension */
+ @dots{}
+ break;
+ case '?':
+ default:
+ usage();
+ break;
+ @}
+ @}
+ @dots{}
+@}
+@end group
+@end example
+
+As a side point, @code{gawk} actually uses the GNU @code{getopt_long}
+function to process both normal and GNU-style long options
+(@pxref{Options, ,Command Line Options}).
+
+The abstraction provided by @code{getopt} is very useful, and would be quite
+handy in @code{awk} programs as well. Here is an @code{awk} version of
+@code{getopt}. This function highlights one of the greatest weaknesses in
+@code{awk}, which is that it is very poor at manipulating single characters.
+Repeated calls to @code{substr} are necessary for accessing individual
+characters (@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+
+The discussion walks through the code a bit at a time.
+
+@example
+@c @group
+@c file eg/lib/getopt.awk
+# getopt --- do C library getopt(3) function in awk
+#
+# arnold@@gnu.ai.mit.edu
+# Public domain
+#
+# Initial version: March, 1991
+# Revised: May, 1993
+
+# External variables:
+# Optind -- index of ARGV for first non-option argument
+# Optarg -- string value of argument to current option
+# Opterr -- if non-zero, print our own diagnostic
+# Optopt -- current option letter
+
+# Returns
+# -1 at end of options
+# ? for unrecognized option
+# <c> a character representing the current option
+
+# Private Data
+# _opti index in multi-flag option, e.g., -abc
+@c endfile
+@c @end group
+@end example
+
+The function starts out with some documentation: who wrote the code,
+and when it was revised, followed by a list of the global variables it uses,
+what the return values are and what they mean, and any global variables that
+are ``private'' to this library function. Such documentation is essential
+for any program, and particularly for library functions.
+
+@findex getopt
+@example
+@c @group
+@c file eg/lib/getopt.awk
+function getopt(argc, argv, options, optl, thisopt, i)
+@{
+ optl = length(options)
+ if (optl == 0) # no options given
+ return -1
+
+ if (argv[Optind] == "--") @{ # all done
+ Optind++
+ _opti = 0
+ return -1
+ @} else if (argv[Optind] !~ /^-[^: \t\n\f\r\v\b]/) @{
+ _opti = 0
+ return -1
+ @}
+@c endfile
+@c @end group
+@end example
+
+The function first checks that it was indeed called with a string of options
+(the @code{options} parameter). If @code{options} has a zero length,
+@code{getopt} immediately returns @minus{}1.
+
+The next thing to check for is the end of the options. A @samp{--} ends the
+command line options, as does any command line argument that does not begin
+with a @samp{-}. @code{Optind} is used to step through the array of command
+line arguments; it retains its value across calls to @code{getopt}, since it
+is a global variable.
+
+The regexp used, @code{@w{/^-[^: \t\n\f\r\v\b]/}}, is
+perhaps a bit of overkill; it checks for a @samp{-} followed by anything
+that is not whitespace and not a colon.
+If the current command line argument does not match this pattern,
+it is not an option, and it ends option processing.
+
+@example
+@group
+@c file eg/lib/getopt.awk
+ if (_opti == 0)
+ _opti = 2
+ thisopt = substr(argv[Optind], _opti, 1)
+ Optopt = thisopt
+ i = index(options, thisopt)
+ if (i == 0) @{
+ if (Opterr)
+ printf("%c -- invalid option\n",
+ thisopt) > "/dev/stderr"
+ if (_opti >= length(argv[Optind])) @{
+ Optind++
+ _opti = 0
+ @} else
+ _opti++
+ return "?"
+ @}
+@c endfile
+@end group
+@end example
+
+The @code{_opti} variable tracks the position in the current command line
+argument (@code{argv[Optind]}). In the case that multiple options were
+grouped together with one @samp{-} (e.g., @samp{-abx}), it is necessary
+to return them to the user one at a time.
+
+If @code{_opti} is equal to zero, it is set to two, the index in the string
+of the next character to look at (we skip the @samp{-}, which is at position
+one). The variable @code{thisopt} holds the character, obtained with
+@code{substr}. It is saved in @code{Optopt} for the main program to use.
+
+If @code{thisopt} is not in the @code{options} string, then it is an
+invalid option. If @code{Opterr} is non-zero, @code{getopt} prints an error
+message on the standard error that is similar to the message from the C
+version of @code{getopt}.
+
+Since the option is invalid, it is necessary to skip it and move on to the
+next option character. If @code{_opti} is greater than or equal to the
+length of the current command line argument, then it is necessary to move on
+to the next one, so @code{Optind} is incremented and @code{_opti} is reset
+to zero. Otherwise, @code{Optind} is left alone and @code{_opti} is merely
+incremented.
+
+In any case, since the option was invalid, @code{getopt} returns @samp{?}.
+The main program can examine @code{Optopt} if it needs to know what the
+invalid option letter actually was.
+
+@example
+@group
+@c file eg/lib/getopt.awk
+ if (substr(options, i + 1, 1) == ":") @{
+ # get option argument
+ if (length(substr(argv[Optind], _opti + 1)) > 0)
+ Optarg = substr(argv[Optind], _opti + 1)
+ else
+ Optarg = argv[++Optind]
+ _opti = 0
+ @} else
+ Optarg = ""
+@c endfile
+@end group
+@end example
+
+If the option requires an argument, the option letter is followed by a colon
+in the @code{options} string. If there are remaining characters in the
+current command line argument (@code{argv[Optind]}), then the rest of that
+string is assigned to @code{Optarg}. Otherwise, the next command line
+argument is used (@samp{-xFOO} vs. @samp{@w{-x FOO}}). In either case,
+@code{_opti} is reset to zero, since there are no more characters left to
+examine in the current command line argument.
+
+@example
+@c @group
+@c file eg/lib/getopt.awk
+ if (_opti == 0 || _opti >= length(argv[Optind])) @{
+ Optind++
+ _opti = 0
+ @} else
+ _opti++
+ return thisopt
+@}
+@c endfile
+@c @end group
+@end example
+
+Finally, if @code{_opti} is either zero or greater than or equal to the
+length of the current command line argument, it means this element in
+@code{argv} is
+through being processed, so @code{Optind} is incremented to point to the
+next element in @code{argv}. If neither condition is true, then only
+@code{_opti} is incremented, so that the next option letter can be processed
+on the next call to @code{getopt}.
+
+@example
+@c @group
+@c file eg/lib/getopt.awk
+BEGIN @{
+ Opterr = 1 # default is to diagnose
+ Optind = 1 # skip ARGV[0]
+
+ # test program
+ if (_getopt_test) @{
+ while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
+ printf("c = <%c>, optarg = <%s>\n",
+ _go_c, Optarg)
+ printf("non-option arguments:\n")
+ for (; Optind < ARGC; Optind++)
+ printf("\tARGV[%d] = <%s>\n",
+ Optind, ARGV[Optind])
+ @}
+@}
+@c endfile
+@c @end group
+@end example
+
+The @code{BEGIN} rule initializes both @code{Opterr} and @code{Optind} to one.
+@code{Opterr} is set to one, since the default behavior is for @code{getopt}
+to print a diagnostic message upon seeing an invalid option. @code{Optind}
+is set to one, since there's no reason to look at the program name, which is
+in @code{ARGV[0]}.
+
+The rest of the @code{BEGIN} rule is a simple test program. Here is the
+result of two sample runs of the test program.
+
+@example
+@group
+$ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
+@print{} c = <a>, optarg = <>
+@print{} c = <c>, optarg = <>
+@print{} c = <b>, optarg = <ARG>
+@print{} non-option arguments:
+@print{} ARGV[3] = <bax>
+@print{} ARGV[4] = <-x>
+@end group
+
+@group
+$ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
+@print{} c = <a>, optarg = <>
+@error{} x -- invalid option
+@print{} c = <?>, optarg = <>
+@print{} non-option arguments:
+@print{} ARGV[4] = <xyz>
+@print{} ARGV[5] = <abc>
+@end group
+@end example
+
+The first @samp{--} terminates the arguments to @code{awk}, so that it does
+not try to interpret the @samp{-a} etc. as its own options.
+
+Several of the sample programs presented in
+@ref{Sample Programs, ,Practical @code{awk} Programs},
+use @code{getopt} to process their arguments.
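+
+The typical calling pattern looks something like the following sketch.
+The option letters, the variable names @code{verbose} and
+@code{outfile}, and the @code{usage} function are made up for the
+example. @file{getopt.awk} must be loaded first, so that its
+@code{BEGIN} rule initializes @code{Optind} and @code{Opterr}.
+
+@example
+# hypothetical helper: complain and quit
+function usage()
+@{
+    print "usage: prog [-v] [-o file] [files]" > "/dev/stderr"
+    exit 1
+@}
+
+BEGIN @{
+    while ((c = getopt(ARGC, ARGV, "vo:")) != -1) @{
+        if (c == "v")
+            verbose = 1
+        else if (c == "o")
+            outfile = Optarg
+        else
+            usage()
+    @}
+
+    # clear the options so that awk does not treat them as file names
+    for (i = 1; i < Optind; i++)
+        ARGV[i] = ""
+@}
+@end example
+
+Setting the processed elements of @code{ARGV} to the null string makes
+@code{awk} skip them when it later scans the command line for data files.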
+
+@node Passwd Functions, Group Functions, Getopt Function, Library Functions
+@section Reading the User Database
+
+@cindex @file{/dev/user}
+The @file{/dev/user} special file
+(@pxref{Special Files, ,Special File Names in @code{gawk}})
+provides access to the current user's real and effective user and group id
+numbers, and if available, the user's supplementary group set.
+However, since these are numbers, they do not provide very useful
+information to the average user. There needs to be some way to find the
+user information associated with the user and group numbers. This
+section presents a suite of functions for retrieving information from the
+user database. @xref{Group Functions, ,Reading the Group Database},
+for a similar suite that retrieves information from the group database.
+
+@cindex @code{getpwent}, C version
+@cindex user information
+@cindex login information
+@cindex account information
+@cindex password file
+The POSIX standard does not define the file where user information is
+kept. Instead, it provides the @code{<pwd.h>} header file
+and several C language subroutines for obtaining user information.
+The primary function is @code{getpwent}, for ``get password entry.''
+The ``password'' comes from the original user database file,
+@file{/etc/passwd}, which kept user information, along with the
+encrypted passwords (hence the name).
+
+While an @code{awk} program could simply read @file{/etc/passwd} directly
+(the format is well known), because of the way password
+files are handled on networked systems,
+this file may not contain complete information about the system's set of users.
+
+@cindex @code{pwcat} program
+To be sure of being
+able to produce a readable, complete version of the user database, it is
+necessary to write a small C program that calls @code{getpwent}.
+@code{getpwent} is defined to return a pointer to a @code{struct passwd}.
+Each time it is called, it returns the next entry in the database.
+When there are no more entries, it returns @code{NULL}, the null pointer.
+When this happens, the C program should call @code{endpwent} to close the
+database.
+Here is @code{pwcat}, a C program that ``cats'' the password database.
+
+@findex pwcat.c
+@example
+@c @group
+@c file eg/lib/pwcat.c
+/*
+ * pwcat.c
+ *
+ * Generate a printable version of the password database
+ *
+ * Arnold Robbins
+ * arnold@@gnu.ai.mit.edu
+ * May 1993
+ * Public Domain
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <pwd.h>
+
+int
+main(int argc, char **argv)
+@{
+ struct passwd *p;
+
+ while ((p = getpwent()) != NULL)
+ printf("%s:%s:%d:%d:%s:%s:%s\n",
+ p->pw_name, p->pw_passwd, p->pw_uid,
+ p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell);
+
+ endpwent();
+ exit(0);
+@}
+@c endfile
+@c @end group
+@end example
+
+If you don't understand C, don't worry about it.
+The output from @code{pwcat} is the user database, in the traditional
+@file{/etc/passwd} format of colon-separated fields. The fields are:
+
+@table @asis
+@item Login name
+The user's login name.
+
+@item Encrypted password
+The user's encrypted password. This may not be available on some systems.
+
+@item User-ID
+The user's numeric user-id number.
+
+@item Group-ID
+The user's numeric group-id number.
+
+@item Full name
+The user's full name, and perhaps other information associated with the
+user.
+
+@item Home directory
+The user's login, or ``home'' directory (familiar to shell programmers as
+@code{$HOME}).
+
+@item Login shell
+The program that will be run when the user logs in. This is usually a
+shell, such as Bash (the GNU Bourne-Again shell).
+@end table
+
+Here are a few lines representative of @code{pwcat}'s output.
+
+@example
+@c @group
+$ pwcat
+@print{} root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh
+@print{} nobody:*:65534:65534::/:
+@print{} daemon:*:1:1::/:
+@print{} sys:*:2:2::/:/bin/csh
+@print{} bin:*:3:3::/bin:
+@print{} arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
+@print{} miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh
+@dots{}
+@c @end group
+@end example
+
+With that introduction, here is a group of functions for getting user
+information. There are several functions here, corresponding to the C
+functions of the same name.
+
+@findex _pw_init
+@example
+@c file eg/lib/passwdawk.in
+@group
+# passwd.awk --- access password file information
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+BEGIN @{
+ # tailor this to suit your system
+ _pw_awklib = "/usr/local/libexec/awk/"
+@}
+@end group
+
+function _pw_init( oldfs, oldrs, olddol0, pwcat)
+@{
+ if (_pw_inited)
+ return
+ oldfs = FS
+ oldrs = RS
+ olddol0 = $0
+ FS = ":"
+ RS = "\n"
+ pwcat = _pw_awklib "pwcat"
+ while ((pwcat | getline) > 0) @{
+ _pw_byname[$1] = $0
+ _pw_byuid[$3] = $0
+ _pw_bycount[++_pw_total] = $0
+ @}
+ close(pwcat)
+ _pw_count = 0
+ _pw_inited = 1
+ FS = oldfs
+ RS = oldrs
+ $0 = olddol0
+@}
+@c endfile
+@c @end group
+@end example
+
+The @code{BEGIN} rule sets a private variable to the directory where
+@code{pwcat} is stored. Since it is used to help out an @code{awk} library
+routine, we have chosen to put it in @file{/usr/local/libexec/awk}.
+You might want it to be in a different directory on your system.
+
+The function @code{_pw_init} keeps three copies of the user information
+in three associative arrays. The arrays are indexed by user name
+(@code{_pw_byname}), by user-id number (@code{_pw_byuid}), and by order of
+occurrence (@code{_pw_bycount}).
+
+The variable @code{_pw_inited} is used for efficiency; @code{_pw_init} only
+needs to be called once.
+
+Since this function uses @code{getline} to read information from
+@code{pwcat}, it first saves the values of @code{FS}, @code{RS}, and
+@code{$0}. Doing so is necessary, since these functions could be called
+from anywhere within a user's program, and the user may have his or her
+own values for @code{FS} and @code{RS}.
+@ignore
+Problem, what if FIELDWIDTHS is in use? Sigh.
+@end ignore
+
+The main part of the function uses a loop to read database lines, split
+the line into fields, and then store the line into each array as necessary.
+When the loop is done, @code{@w{_pw_init}} cleans up by closing the pipeline,
+setting @code{@w{_pw_inited}} to one, and restoring @code{FS}, @code{RS}, and
+@code{$0}. The use of @code{@w{_pw_count}} will be explained below.
+
+@findex getpwnam
+@example
+@group
+@c file eg/lib/passwdawk.in
+function getpwnam(name)
+@{
+ _pw_init()
+ if (name in _pw_byname)
+ return _pw_byname[name]
+ return ""
+@}
+@c endfile
+@end group
+@end example
+
+The @code{getpwnam} function takes a user name as a string argument. If that
+user is in the database, it returns the appropriate line. Otherwise it
+returns the null string.
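+
+Since the returned line is still in colon-separated form, the caller has to
+pick it apart. The following is a minimal sketch, not part of
+@file{passwd.awk}, of how that might be done with @code{split}; the function
+name @code{home_dir} is hypothetical, and the field number assumes the record
+layout described above.
+
+@example
+# home_dir --- return a user's home directory, or "" if unknown
+# (illustrative helper; relies on getpwnam from passwd.awk)
+function home_dir(user,    line, fields)
+@{
+    line = getpwnam(user)
+    if (line == "")
+        return ""          # no such user
+    split(line, fields, ":")
+    return fields[6]       # sixth field is the home directory
+@}
+@end example
+
+@noindent
+With the sample data shown earlier, @code{home_dir("arnold")} would return
+@file{/home/arnold}.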
+
+@findex getpwuid
+@example
+@group
+@c file eg/lib/passwdawk.in
+function getpwuid(uid)
+@{
+ _pw_init()
+ if (uid in _pw_byuid)
+ return _pw_byuid[uid]
+ return ""
+@}
+@c endfile
+@end group
+@end example
+
+Similarly,
+the @code{getpwuid} function takes a user-id number argument. If that
+user number is in the database, it returns the appropriate line. Otherwise it
+returns the null string.
+
+@findex getpwent
+@example
+@c @group
+@c file eg/lib/passwdawk.in
+function getpwent()
+@{
+ _pw_init()
+ if (_pw_count < _pw_total)
+ return _pw_bycount[++_pw_count]
+ return ""
+@}
+@c endfile
+@c @end group
+@end example
+
+The @code{getpwent} function simply steps through the database, one entry at
+a time. It uses @code{_pw_count} to track its current position in the
+@code{_pw_bycount} array.
+
+@findex endpwent
+@example
+@c @group
+@c file eg/lib/passwdawk.in
+function endpwent()
+@{
+ _pw_count = 0
+@}
+@c endfile
+@c @end group
+@end example
+
+The @code{@w{endpwent}} function resets @code{@w{_pw_count}} to zero, so that
+subsequent calls to @code{getpwent} will start over again.
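+
+Taken together, @code{getpwent} and @code{endpwent} allow a program to walk
+the whole database. As a rough sketch (the variable names @code{entry} and
+@code{f} are arbitrary, and @file{passwd.awk} is assumed to be loaded with an
+additional @samp{-f} option), a program to list each user's login name and
+shell might look like this:
+
+@example
+BEGIN @{
+    while ((entry = getpwent()) != "") @{
+        split(entry, f, ":")
+        printf "%-10s %s\n", f[1], f[7]    # login name, login shell
+    @}
+    endpwent()    # rewind, in case other code wants to scan again
+@}
+@end example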
+
+A conscious design decision in this suite is that each subroutine calls
+@code{@w{_pw_init}} to initialize the database arrays. The overhead of running
+a separate process to generate the user database, and the I/O to scan it,
+will only be incurred if the user's main program actually calls one of these
+functions. If this library file is loaded along with a user's program, but
+none of the routines are ever called, then there is no extra run-time overhead.
+(The alternative would be to move the body of @code{@w{_pw_init}} into a
+@code{BEGIN} rule, which would always run @code{pwcat}. This simplifies the
+code but runs an extra process that may never be needed.)
+
+In turn, calling @code{_pw_init} is not too expensive, since the
+@code{_pw_inited} variable keeps the program from reading the data more than
+once. If you are worried about squeezing every last cycle out of your
+@code{awk} program, the check of @code{_pw_inited} could be moved out of
+@code{_pw_init} and duplicated in all the other functions. In practice,
+this is not necessary, since most @code{awk} programs are I/O bound, and it
+would clutter up the code.
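+
+For comparison, here is roughly what that variation would look like for
+@code{getpwnam}; this is only a sketch of the alternative described above,
+not a recommended change:
+
+@example
+function getpwnam(name)
+@{
+    if (! _pw_inited)      # test duplicated from _pw_init
+        _pw_init()
+    if (name in _pw_byname)
+        return _pw_byname[name]
+    return ""
+@}
+@end example
+
+@noindent
+The same two lines would have to be repeated in @code{getpwuid} and
+@code{getpwent}, saving at most one function call per lookup.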
+
+The @code{id} program in @ref{Id Program, ,Printing Out User Information},
+uses these functions.
+
+@node Group Functions, Library Names, Passwd Functions, Library Functions
+@section Reading the Group Database
+
+@cindex @code{getgrent}, C version
+@cindex group information
+@cindex account information
+@cindex group file
+Much of the discussion presented in
+@ref{Passwd Functions, ,Reading the User Database},
+applies to the group database as well. Although there has traditionally
+been a well-known file, @file{/etc/group}, in a well-known format, the POSIX
+standard only provides a set of C library routines
+(@code{<grp.h>} and @code{getgrent})
+for accessing the information.
+Even though this file may exist, it likely does not have
+complete information. Therefore, as with the user database, it is necessary
+to have a small C program that generates the group database as its output.
+
+@cindex @code{grcat} program
+Here is @code{grcat}, a C program that ``cats'' the group database.
+
+@findex grcat.c
+@example
+@c @group
+@c file eg/lib/grcat.c
+/*
+ * grcat.c
+ *
+ * Generate a printable version of the group database
+ *
+ * Arnold Robbins, arnold@@gnu.ai.mit.edu
+ * May 1993
+ * Public Domain
+ */
+
+#include <stdio.h>
+#include <grp.h>
+
+@group
+int
+main(argc, argv)
+int argc;
+char **argv;
+@{
+ struct group *g;
+ int i;
+@end group
+
+ while ((g = getgrent()) != NULL) @{
+ printf("%s:%s:%d:", g->gr_name, g->gr_passwd,
+ g->gr_gid);
+ for (i = 0; g->gr_mem[i] != NULL; i++) @{
+ printf("%s", g->gr_mem[i]);
+ if (g->gr_mem[i+1] != NULL)
+ putchar(',');
+ @}
+ putchar('\n');
+ @}
+ endgrent();
+ exit(0);
+@}
+@c endfile
+@c @end group
+@end example
+
+Each line in the group database represents one group. The fields are
+separated with colons, and represent the following information.
+
+@table @asis
+@item Group Name
+The name of the group.
+
+@item Group Password
+The encrypted group password. In practice, this field is never used. It is
+usually empty, or set to @samp{*}.
+
+@item Group ID Number
+The numeric group-id number. This number should be unique within the file.
+
+@item Group Member List
+A comma-separated list of user names. These users are members of the group.
+Most Unix systems allow users to be members of several groups
+simultaneously. If your system does, then reading @file{/dev/user} will
+return those group-id numbers in @code{$5} through @code{$NF}.
+(Note that @file{/dev/user} is a @code{gawk} extension;
+@pxref{Special Files, ,Special File Names in @code{gawk}}.)
+@end table
+
+@iftex
+@page
+@end iftex
+Here is what running @code{grcat} might produce:
+
+@example
+@group
+$ grcat
+@print{} wheel:*:0:arnold
+@print{} nogroup:*:65534:
+@print{} daemon:*:1:
+@print{} kmem:*:2:
+@print{} staff:*:10:arnold,miriam,andy
+@print{} other:*:20:
+@dots{}
+@end group
+@end example
+
+Here are the functions for obtaining information from the group database.
+There are several, modeled after the C library functions of the same names.
+
+@findex _gr_init
+@example
+@group
+@c file eg/lib/groupawk.in
+# group.awk --- functions for dealing with the group file
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+BEGIN \
+@{
+ # Change to suit your system
+ _gr_awklib = "/usr/local/libexec/awk/"
+@}
+@c endfile
+@end group
+
+@group
+@c file eg/lib/groupawk.in
+function _gr_init( oldfs, oldrs, olddol0, grcat, n, a, i)
+@{
+ if (_gr_inited)
+ return
+@end group
+
+@group
+ oldfs = FS
+ oldrs = RS
+ olddol0 = $0
+ FS = ":"
+ RS = "\n"
+@end group
+
+@group
+ grcat = _gr_awklib "grcat"
+ while ((grcat | getline) > 0) @{
+ if ($1 in _gr_byname)
+ _gr_byname[$1] = _gr_byname[$1] "," $4
+ else
+ _gr_byname[$1] = $0
+ if ($3 in _gr_bygid)
+ _gr_bygid[$3] = _gr_bygid[$3] "," $4
+ else
+ _gr_bygid[$3] = $0
+
+ n = split($4, a, "[ \t]*,[ \t]*")
+@end group
+@group
+ for (i = 1; i <= n; i++)
+ if (a[i] in _gr_groupsbyuser)
+ _gr_groupsbyuser[a[i]] = \
+ _gr_groupsbyuser[a[i]] " " $1
+ else
+ _gr_groupsbyuser[a[i]] = $1
+@end group
+
+@group
+ _gr_bycount[++_gr_count] = $0
+ @}
+@end group
+@group
+ close(grcat)
+ _gr_count = 0
+ _gr_inited++
+ FS = oldfs
+ RS = oldrs
+ $0 = olddol0
+@}
+@c endfile
+@end group
+@end example
+
+The @code{BEGIN} rule sets a private variable to the directory where
+@code{grcat} is stored. Since it is used to help out an @code{awk} library
+routine, we have chosen to put it in @file{/usr/local/libexec/awk}. You might
+want it to be in a different directory on your system.
+
+These routines follow the same general outline as the user database routines
+(@pxref{Passwd Functions, ,Reading the User Database}).
+The @code{@w{_gr_inited}} variable is used to
+ensure that the database is scanned no more than once.
+The @code{@w{_gr_init}} function first saves @code{FS}, @code{RS}, and
+@code{$0}, and then sets @code{FS} and @code{RS} to the correct values for
+scanning the group information.
+
+The group information is stored in several associative arrays.
+The arrays are indexed by group name (@code{@w{_gr_byname}}), by group-id number
+(@code{@w{_gr_bygid}}), and by position in the database (@code{@w{_gr_bycount}}).
+There is an additional array indexed by user name (@code{@w{_gr_groupsbyuser}});
+each element is a space-separated list of the groups that the user belongs to.
+
+Unlike the user database, it is possible to have multiple records in the
+database for the same group. This is common when a group has a large number
+of members. Such a pair of entries might look like:
+
+@example
+tvpeople:*:101:johny,jay,arsenio
+tvpeople:*:101:david,conan,tom,joan
+@end example
+
+For this reason, @code{_gr_init} looks to see if a group name or
+group-id number has already been seen. If it has, then the user names are
+simply concatenated onto the previous list of users. (There is actually a
+subtle problem with the code presented above. Suppose that
+the first time there were no names. This code adds the names with
+a leading comma. It also doesn't check that there is a @code{$4}.)
+
+Finally, @code{_gr_init} closes the pipeline to @code{grcat}, restores
+@code{FS}, @code{RS}, and @code{$0}, initializes @code{_gr_count} to zero
+(it is used later), and makes @code{_gr_inited} non-zero.
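+
+One possible way to avoid the leading-comma problem noted above is to
+funnel the concatenation through a small helper. This is only a sketch;
+the name @code{_gr_append} is hypothetical, and it assumes that every line
+from @code{grcat} has a colon after the group-id number, as the C program
+shown earlier always prints.
+
+@example
+# _gr_append --- add a member list to a stored group entry
+function _gr_append(entry, members)
+@{
+    if (members == "")       # nothing to add (or no $4 at all)
+        return entry
+    if (entry ~ /:$/)        # stored member list is still empty
+        return entry members
+    return entry "," members
+@}
+@end example
+
+@noindent
+The assignments in the loop would then become, for example,
+@samp{_gr_byname[$1] = _gr_append(_gr_byname[$1], $4)}.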
+
+@findex getgrnam
+@example
+@c @group
+@c file eg/lib/groupawk.in
+function getgrnam(group)
+@{
+ _gr_init()
+ if (group in _gr_byname)
+ return _gr_byname[group]
+ return ""
+@}
+@c endfile
+@c @end group
+@end example
+
+The @code{getgrnam} function takes a group name as its argument and, if that
+group exists, returns the group's entry. Otherwise, @code{getgrnam} returns
+the null string.
+
+@findex getgrgid
+@example
+@c @group
+@c file eg/lib/groupawk.in
+function getgrgid(gid)
+@{
+ _gr_init()
+ if (gid in _gr_bygid)
+ return _gr_bygid[gid]
+ return ""
+@}
+@c endfile
+@c @end group
+@end example
+
+The @code{getgrgid} function is similar; it takes a numeric group-id, and
+looks up the information associated with that group-id.
+
+@findex getgruser
+@example
+@group
+@c file eg/lib/groupawk.in
+function getgruser(user)
+@{
+ _gr_init()
+ if (user in _gr_groupsbyuser)
+ return _gr_groupsbyuser[user]
+ return ""
+@}
+@c endfile
+@end group
+@end example
+
+The @code{getgruser} function does not have a C counterpart. It takes a
+user name, and returns the list of groups that have the user as a member.
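+
+Because the result is a single space-separated string, @code{split} with
+a single space as the separator turns it back into an array. A short
+sketch, using a user name from the sample @code{grcat} output (the variable
+names are arbitrary):
+
+@example
+BEGIN @{
+    n = split(getgruser("arnold"), groups, " ")
+    for (i = 1; i <= n; i++)
+        print groups[i]      # one group name per line
+@}
+@end example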
+
+@findex getgrent
+@example
+@c @group
+@c file eg/lib/groupawk.in
+function getgrent()
+@{
+ _gr_init()
+ if (++_gr_count in _gr_bycount)
+ return _gr_bycount[_gr_count]
+ return ""
+@}
+@c endfile
+@c @end group
+@end example
+
+The @code{getgrent} function steps through the database one entry at a time.
+It uses @code{_gr_count} to track its position in the list.
+
+@findex endgrent
+@example
+@group
+@c file eg/lib/groupawk.in
+function endgrent()
+@{
+ _gr_count = 0
+@}
+@c endfile
+@end group
+@end example
+
+@code{endgrent} resets @code{_gr_count} to zero so that @code{getgrent} can
+start over again.
+
+As with the user database routines, each function calls @code{_gr_init} to
+initialize the arrays. Doing so only incurs the extra overhead of running
+@code{grcat} if these functions are used (as opposed to moving the body of
+@code{_gr_init} into a @code{BEGIN} rule).
+
+Most of the work is in scanning the database and building the various
+associative arrays. The functions that the user calls are themselves very
+simple, relying on @code{awk}'s associative arrays to do the work.
+
+The @code{id} program in @ref{Id Program, ,Printing Out User Information},
+uses these functions.
+
+@node Library Names, , Group Functions, Library Functions
+@section Naming Library Function Global Variables
+
+@cindex namespace issues in @code{awk}
+@cindex documenting @code{awk} programs
+@cindex programs, documenting
+Due to the way the @code{awk} language evolved, variables are either
+@dfn{global} (usable by the entire program), or @dfn{local} (usable just by
+a specific function). There is no intermediate state analogous to
+@code{static} variables in C.
+
+Library functions often need to have global variables that they can use to
+preserve state information between calls to the function. For example,
+@code{getopt}'s variable @code{_opti}
+(@pxref{Getopt Function, ,Processing Command Line Options}),
+and the @code{_tm_months} array used by @code{mktime}
+(@pxref{Mktime Function, ,Turning Dates Into Timestamps}).
+Such variables are called @dfn{private}, since the only functions that need to
+use them are the ones in the library.
+
+When writing a library function, you should try to choose names for your
+private variables so that they will not conflict with any variables used by
+either another library function or a user's main program. For example, a
+name like @samp{i} or @samp{j} is not a good choice, since user programs
+often use variable names like these for their own purposes.
+
+The example programs shown in this chapter all start the names of their
+private variables with an underscore (@samp{_}). Users generally don't use
+leading underscores in their variable names, so this convention immediately
+decreases the chances that the variable name will be accidentally shared
+with the user's program.
+
+In addition, several of the library functions use a prefix that helps
+indicate what function or set of functions uses the variables. For example,
+@code{_tm_months} in @code{mktime}
+(@pxref{Mktime Function, ,Turning Dates Into Timestamps}), and
+@code{_pw_byname} in the user data base routines
+(@pxref{Passwd Functions, ,Reading the User Database}).
+This convention is recommended, since it even further decreases the chance
+of inadvertent conflict among variable names.
+Note that this convention can be used equally well both for variable names
+and for private function names too.
+
+While I could have re-written all the library routines to use this
+convention, I did not do so, in order to show how my own @code{awk}
+programming style has evolved, and to provide some basis for this
+discussion.
+
+As a final note on variable naming, if a function makes global variables
+available for use by a main program, it is a good convention to start that
+variable's name with a capital letter.
+For example, @code{getopt}'s @code{Opterr} and @code{Optind} variables
+(@pxref{Getopt Function, ,Processing Command Line Options}).
+The leading capital letter indicates that it is global, while the fact that
+the variable name is not all capital letters indicates that the variable is
+not one of @code{awk}'s built-in variables, like @code{FS}.
+
+It is also important that @emph{all} variables in library functions
+that do not need to save state are in fact declared local. If this is
+not done, the variable could accidentally be used in the user's program,
+leading to bugs that are very difficult to track down.
+
+@example
+function lib_func(x, y, l1, l2)
+@{
+ @dots{}
+ @var{use variable} some_var # some_var could be local
+ @dots{} # but is not by oversight
+@}
+@end example
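+
+The usual remedy is to list such variables as extra parameters, which
+@code{awk} treats as local to the function. The following self-contained
+sketch (the function @code{count_chars} is made up for illustration) shows
+that a properly declared local does not disturb a global of the same name:
+
+@example
+# count_chars --- count occurrences of character c in string s
+function count_chars(s, c,    i, n)    # i and n are local
+@{
+    n = 0
+    for (i = 1; i <= length(s); i++)
+        if (substr(s, i, 1) == c)
+            n++
+    return n
+@}
+
+BEGIN @{
+    i = 42
+    print count_chars("banana", "a")    # prints 3
+    print i                             # prints 42
+@}
+@end example
+
+@noindent
+Had @code{i} been left out of the parameter list, the second @code{print}
+would show 7 instead, since the loop would have modified the global @code{i}.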
+
+@cindex Tcl
+A different convention, common in the Tcl community, is to use a single
+associative array to hold the values needed by the library function(s), or
+``package.'' This significantly decreases the number of actual global names
+in use. For example, the functions described in
+@ref{Passwd Functions, , Reading the User Database},
+might have used @code{@w{PW_data["inited"]}}, @code{@w{PW_data["total"]}},
+@code{@w{PW_data["count"]}} and @code{@w{PW_data["awklib"]}}, instead of
+@code{@w{_pw_inited}}, @code{@w{_pw_awklib}}, @code{@w{_pw_total}},
+and @code{@w{_pw_count}}.
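+
+As an illustration only, here is how @code{getpwent} might read under that
+convention. The sketch assumes that @code{_pw_init} has been changed to
+maintain @code{PW_data["total"]} and @code{PW_data["count"]} in place of the
+separate variables; the array name is the hypothetical one used above.
+
+@example
+function getpwent()
+@{
+    _pw_init()
+    if (PW_data["count"] < PW_data["total"])
+        return _pw_bycount[++PW_data["count"]]
+    return ""
+@}
+@end example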
+
+The conventions presented in this section are exactly that: conventions. You
+are not required to write your programs this way; we merely recommend that
+you do so.
+
+@node Sample Programs, Language History, Library Functions, Top
+@chapter Practical @code{awk} Programs
+
+This chapter presents a potpourri of @code{awk} programs for your reading
+enjoyment.
+@iftex
+There are two sections. The first presents @code{awk}
+versions of several common POSIX utilities.
+The second is a grab-bag of interesting programs.
+@end iftex
+
+Many of these programs use the library functions presented in
+@ref{Library Functions, ,A Library of @code{awk} Functions}.
+
+@menu
+* Clones:: Clones of common utilities.
+* Miscellaneous Programs:: Some interesting @code{awk} programs.
+@end menu
+
+@node Clones, Miscellaneous Programs, Sample Programs, Sample Programs
+@section Re-inventing Wheels for Fun and Profit
+
+This section presents a number of POSIX utilities that are implemented in
+@code{awk}. Re-inventing these programs in @code{awk} is often enjoyable,
+since the algorithms can be very clearly expressed, and usually the code is
+very concise and simple. This is true because @code{awk} does so much for you.
+
+It should be noted that these programs are not necessarily intended to
+replace the installed versions on your system. Instead, their
+purpose is to illustrate @code{awk} language programming for ``real world''
+tasks.
+
+The programs are presented in alphabetical order.
+
+@menu
+* Cut Program:: The @code{cut} utility.
+* Egrep Program:: The @code{egrep} utility.
+* Id Program:: The @code{id} utility.
+* Split Program:: The @code{split} utility.
+* Tee Program:: The @code{tee} utility.
+* Uniq Program:: The @code{uniq} utility.
+* Wc Program:: The @code{wc} utility.
+@end menu
+
+@node Cut Program, Egrep Program, Clones, Clones
+@subsection Cutting Out Fields and Columns
+
+@cindex @code{cut} utility
+The @code{cut} utility selects, or ``cuts,'' either characters or fields
+from its standard
+input and sends them to its standard output. @code{cut} can cut out either
+a list of characters, or a list of fields. By default, fields are separated
+by tabs, but you may supply a command line option to change the field
+@dfn{delimiter}, i.e.@: the field separator character. @code{cut}'s definition
+of fields is less general than @code{awk}'s.
+
+A common use of @code{cut} might be to pull out just the login name of
+logged-on users from the output of @code{who}. For example, the following
+pipeline generates a sorted, unique list of the logged-on users:
+
+@example
+who | cut -c1-8 | sort | uniq
+@end example
+
+The options for @code{cut} are:
+
+@table @code
+@item -c @var{list}
+Use @var{list} as the list of characters to cut out. Items within the list
+may be separated by commas, and ranges of characters can be separated with
+dashes. The list @samp{1-8,15,22-35} specifies characters one through
+eight, 15, and 22 through 35.
+
+@item -f @var{list}
+Use @var{list} as the list of fields to cut out.
+
+@item -d @var{delim}
+Use @var{delim} as the field separator character instead of the tab
+character.
+
+@item -s
+Suppress printing of lines that do not contain the field delimiter.
+@end table
+
+The @code{awk} implementation of @code{cut} uses the @code{getopt} library
+function (@pxref{Getopt Function, ,Processing Command Line Options}),
+and the @code{join} library function
+(@pxref{Join Function, ,Merging an Array Into a String}).
+
+The program begins with a comment describing the options and a @code{usage}
+function which prints out a usage message and exits. @code{usage} is called
+if invalid arguments are supplied.
+
+@findex cut.awk
+@example
+@c @group
+@c file eg/prog/cut.awk
+# cut.awk --- implement cut in awk
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+# Options:
+# -f list Cut fields
+# -d c Field delimiter character
+# -c list Cut characters
+#
+# -s Suppress lines without the delimiter character
+
+function usage( e1, e2)
+@{
+ e1 = "usage: cut [-f list] [-d c] [-s] [files...]"
+ e2 = "usage: cut [-c list] [files...]"
+ print e1 > "/dev/stderr"
+ print e2 > "/dev/stderr"
+ exit 1
+@}
+@c endfile
+@c @end group
+@end example
+
+@noindent
+The variables @code{e1} and @code{e2} are used so that the function
+fits nicely on the
+@iftex
+page.
+@end iftex
+@ifinfo
+screen.
+@end ifinfo
+
+Next comes a @code{BEGIN} rule that parses the command line options.
+It sets @code{FS} to a single tab character, since that is @code{cut}'s
+default field separator. The output field separator is also set to be the
+same as the input field separator. Then @code{getopt} is used to step
+through the command line options. One or the other of the variables
+@code{by_fields} or @code{by_chars} is set to true, to indicate that
+processing should be done by fields or by characters respectively.
+When cutting by characters, the output field separator is set to the null
+string.
+
+@example
+@c @group
+@c file eg/prog/cut.awk
+BEGIN \
+@{
+ FS = "\t" # default
+ OFS = FS
+ while ((c = getopt(ARGC, ARGV, "sf:c:d:")) != -1) @{
+ if (c == "f") @{
+ by_fields = 1
+ fieldlist = Optarg
+ @} else if (c == "c") @{
+ by_chars = 1
+ fieldlist = Optarg
+ OFS = ""
+ @} else if (c == "d") @{
+ if (length(Optarg) > 1) @{
+ printf("Using first character of %s" \
+ " for delimiter\n", Optarg) > "/dev/stderr"
+ Optarg = substr(Optarg, 1, 1)
+ @}
+ FS = Optarg
+ OFS = FS
+ if (FS == " ") # defeat awk semantics
+ FS = "[ ]"
+ @} else if (c == "s")
+ suppress++
+ else
+ usage()
+ @}
+
+ for (i = 1; i < Optind; i++)
+ ARGV[i] = ""
+@c endfile
+@c @end group
+@end example
+
+Special care is taken when the field delimiter is a space. Using
+@code{@w{" "}} (a single space) for the value of @code{FS} is
+incorrect---@code{awk} would
+separate fields with runs of spaces and/or tabs, and we want them to be
+separated with individual spaces. Also, note that after @code{getopt} is
+through, we have to clear out the elements of @code{ARGV} from one up to,
+but not including, @code{Optind}, so that @code{awk} will not try to process
+the command line options as file names.
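+
+The difference between the two settings is easy to see with @code{split},
+which treats its third argument the same way that @code{FS} is treated.
+This small, self-contained sketch is not part of @file{cut.awk}:
+
+@example
+BEGIN @{
+    line = "a  b c"                     # two spaces after the "a"
+    print split(line, parts, " ")       # prints 3: a, b, c
+    print split(line, parts, "[ ]")     # prints 4: a, "", b, c
+@}
+@end example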
+
+After dealing with the command line options, the program verifies that the
+options make sense. Only one or the other of @samp{-c} and @samp{-f} should
+be used, and both require a field list. Then either @code{set_fieldlist} or
+@code{set_charlist} is called to pull apart the list of fields or
+characters.
+
+@example
+@c @group
+@c file eg/prog/cut.awk
+ if (by_fields && by_chars)
+ usage()
+
+ if (by_fields == 0 && by_chars == 0)
+ by_fields = 1 # default
+
+ if (fieldlist == "") @{
+ print "cut: needs list for -c or -f" > "/dev/stderr"
+ exit 1
+ @}
+
+@group
+ if (by_fields)
+ set_fieldlist()
+ else
+ set_charlist()
+@}
+@c endfile
+@end group
+@end example
+
+Here is @code{set_fieldlist}. It first splits the field list apart
+at the commas, into an array. Then, for each element of the array, it
+looks to see if it is actually a range, and if so splits it apart. The range
+is verified to make sure the first number is smaller than the second.
+Each number in the list is added to the @code{flist} array, which simply
+lists the fields that will be printed.
+Normal field splitting is used: the program lets @code{awk}
+handle the job of splitting each record into fields.
+
+@example
+@c @group
+@c file eg/prog/cut.awk
+function set_fieldlist( n, m, i, j, k, f, g)
+@{
+ n = split(fieldlist, f, ",")
+ j = 1 # index in flist
+ for (i = 1; i <= n; i++) @{
+ if (index(f[i], "-") != 0) @{ # a range
+ m = split(f[i], g, "-")
+ if (m != 2 || g[1] >= g[2]) @{
+ printf("bad field list: %s\n",
+ f[i]) > "/dev/stderr"
+ exit 1
+ @}
+ for (k = g[1]; k <= g[2]; k++)
+ flist[j++] = k
+ @} else
+ flist[j++] = f[i]
+ @}
+ nfields = j - 1
+@}
+@c endfile
+@c @end group
+@end example
+
+The @code{set_charlist} function is more complicated than @code{set_fieldlist}.
+The idea here is to use @code{gawk}'s @code{FIELDWIDTHS} variable
+(@pxref{Constant Size, ,Reading Fixed-width Data}),
+which describes constant width input. When using a character list, that is
+exactly what we have.
+
+Setting up @code{FIELDWIDTHS} is more complicated than simply listing the
+fields that need to be printed. We have to keep track of the fields to be
+printed, and also the intervening characters that have to be skipped.
+For example, suppose you wanted characters one through eight, 15, and
+22 through 35. You would use @samp{-c 1-8,15,22-35}. The necessary value
+for @code{FIELDWIDTHS} would be @code{@w{"8 6 1 6 14"}}. This gives us five
+fields, and what should be printed are @code{$1}, @code{$3}, and @code{$5}.
+The intermediate fields are ``filler,'' stuff in between the desired data.
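+
+To make this concrete, the following stand-alone @code{gawk} sketch
+hard-codes that value instead of computing it, and prints the three data
+fields run together, as @code{cut -c} would:
+
+@example
+BEGIN @{ FIELDWIDTHS = "8 6 1 6 14" @}
+
+@{ print $1 $3 $5 @}    # characters 1-8, 15, and 22-35
+@end example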
+
+@code{flist} lists the fields to be printed, and @code{t} tracks the
+complete field list, including filler fields.
+
+@example
+@c @group
+@c file eg/prog/cut.awk
+function set_charlist( field, i, j, f, g, n, m, t,
+ filler, last, len)
+@{
+ field = 1 # count total fields
+ n = split(fieldlist, f, ",")
+ j = 1 # index in flist
+ for (i = 1; i <= n; i++) @{
+ if (index(f[i], "-") != 0) @{ # range
+ m = split(f[i], g, "-")
+ if (m != 2 || g[1] >= g[2]) @{
+ printf("bad character list: %s\n",
+ f[i]) > "/dev/stderr"
+ exit 1
+ @}
+ len = g[2] - g[1] + 1
+ if (g[1] > 1) # compute length of filler
+ filler = g[1] - last - 1
+ else
+ filler = 0
+ if (filler)
+ t[field++] = filler
+ t[field++] = len # length of field
+ last = g[2]
+ flist[j++] = field - 1
+ @} else @{
+ if (f[i] > 1)
+ filler = f[i] - last - 1
+ else
+ filler = 0
+ if (filler)
+ t[field++] = filler
+ t[field++] = 1
+ last = f[i]
+ flist[j++] = field - 1
+ @}
+ @}
+@group
+ FIELDWIDTHS = join(t, 1, field - 1)
+ nfields = j - 1
+@}
+@end group
+@c endfile
+@end example
+
+Here is the rule that actually processes the data. If the @samp{-s} option
+was given, then @code{suppress} will be true. The first @code{if} statement
+makes sure that the input record does have the field separator. If
+@code{cut} is processing fields, @code{suppress} is true, and the field
+separator character is not in the record, then the record is skipped.
+
+If the record is valid, then at this point, @code{gawk} has split the data
+into fields, either using the character in @code{FS} or using fixed-length
+fields and @code{FIELDWIDTHS}. The loop goes through the list of fields
+that should be printed. If the corresponding field has data in it, it is
+printed. If the next field also has data, then the separator character is
+written out in between the fields.
+
+@c 2e: Could use `index($0, FS) != 0' instead of `$0 !~ FS', below
+
+@example
+@c @group
+@c file eg/prog/cut.awk
+@{
+ if (by_fields && suppress && $0 !~ FS)
+ next
+
+ for (i = 1; i <= nfields; i++) @{
+ if ($flist[i] != "") @{
+ printf "%s", $flist[i]
+ if (i < nfields && $flist[i+1] != "")
+ printf "%s", OFS
+ @}
+ @}
+ print ""
+@}
+@c endfile
+@c @end group
+@end example
+
+This version of @code{cut} relies on @code{gawk}'s @code{FIELDWIDTHS}
+variable to do the character-based cutting. While it would be possible in
+other @code{awk} implementations to use @code{substr}
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}),
+it would also be extremely painful to do so.
+The @code{FIELDWIDTHS} variable supplies an elegant solution to the problem
+of picking the input line apart by characters.
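+
+For a single, fixed character list the @code{substr} approach is simple
+enough; the extra work lies in handling an arbitrary list. As a sketch,
+the list @samp{1-8,15,22-35} could be hard-coded portably as:
+
+@example
+@{
+    print substr($0, 1, 8) substr($0, 15, 1) substr($0, 22, 14)
+@}
+@end example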
+
+@node Egrep Program, Id Program, Cut Program, Clones
+@subsection Searching for Regular Expressions in Files
+
+@cindex @code{egrep} utility
+The @code{egrep} utility searches files for patterns. It uses regular
+expressions that are almost identical to those available in @code{awk}
+(@pxref{Regexp Constants, ,Regular Expression Constants}). It is used this way:
+
+@example
+egrep @r{[} @var{options} @r{]} '@var{pattern}' @var{files} @dots{}
+@end example
+
+The @var{pattern} is a regexp.
+In typical usage, the regexp is quoted to prevent the shell from expanding
+any of the special characters as file name wildcards.
+Normally, @code{egrep} prints the
+lines that matched. If multiple file names are provided on the command
+line, each output line is preceded by the name of the file and a colon.
+
+The options are:
+
+@table @code
+@item -c
+Print out a count of the lines that matched the pattern, instead of the
+lines themselves.
+
+@item -s
+Be silent. No output is produced, and the exit value indicates whether
+or not the pattern was matched.
+
+@item -v
+Invert the sense of the test. @code{egrep} prints the lines that do
+@emph{not} match the pattern, and exits successfully if the pattern was not
+matched.
+
+@item -i
+Ignore case distinctions in both the pattern and the input data.
+
+@item -l
+Only print the names of the files that matched, not the lines that matched.
+
+@item -e @var{pattern}
+Use @var{pattern} as the regexp to match. The purpose of the @samp{-e}
+option is to allow patterns that start with a @samp{-}.
+@end table
+
+This version uses the @code{getopt} library function
+(@pxref{Getopt Function, ,Processing Command Line Options}),
+and the file transition library program
+(@pxref{Filetrans Function, ,Noting Data File Boundaries}).
+
+The program begins with a descriptive comment, and then a @code{BEGIN} rule
+that processes the command line arguments with @code{getopt}. The @samp{-i}
+(ignore case) option is particularly easy with @code{gawk}; we just use the
+@code{IGNORECASE} built-in variable
+(@pxref{Built-in Variables}).
+
+@findex egrep.awk
+@example
+@c @group
+@c file eg/prog/egrep.awk
+# egrep.awk --- simulate egrep in awk
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+# Options:
+# -c count of lines
+# -s silent - use exit value
+# -v invert test, success if no match
+# -i ignore case
+# -l print filenames only
+# -e argument is pattern
+
+BEGIN @{
+ while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) @{
+ if (c == "c")
+ count_only++
+ else if (c == "s")
+ no_print++
+ else if (c == "v")
+ invert++
+ else if (c == "i")
+ IGNORECASE = 1
+ else if (c == "l")
+ filenames_only++
+ else if (c == "e")
+ pattern = Optarg
+ else
+ usage()
+ @}
+@c endfile
+@c @end group
+@end example
+
+Next comes the code that handles the @code{egrep} specific behavior. If no
+pattern was supplied with @samp{-e}, the first non-option on the command
+line is used. The @code{awk} command line arguments up to @code{ARGV[Optind]}
+are cleared, so that @code{awk} won't try to process them as files. If no
+files were specified, the standard input is used, and if multiple files were
+specified, we make sure to note this so that the file names can precede the
+matched lines in the output.
+
+The last two lines are commented out, since they are not needed in
+@code{gawk}. They should be uncommented if you have to use another version
+of @code{awk}.
+
+@example
+@c @group
+@c file eg/prog/egrep.awk
+ if (pattern == "")
+ pattern = ARGV[Optind++]
+
+ for (i = 1; i < Optind; i++)
+ ARGV[i] = ""
+ if (Optind >= ARGC) @{
+ ARGV[1] = "-"
+ ARGC = 2
+ @} else if (ARGC - Optind > 1)
+ do_filenames++
+
+# if (IGNORECASE)
+# pattern = tolower(pattern)
+@}
+@c endfile
+@c @end group
+@end example
+
+The next set of lines should be uncommented if you are not using
+@code{gawk}. This rule translates all the characters in the input line
+into lower-case if the @samp{-i} option was specified. The rule is
+commented out since it is not necessary with @code{gawk}.
+@c bug: if a match happens, we output the translated line, not the original
+
+@example
+@c @group
+@c file eg/prog/egrep.awk
+#@{
+# if (IGNORECASE)
+# $0 = tolower($0)
+#@}
+@c endfile
+@c @end group
+@end example
+
+The @code{beginfile} function is called by the rule in @file{ftrans.awk}
+when each new file is processed. In this case, it is very simple; all it
+does is initialize a variable @code{fcount} to zero. @code{fcount} tracks
+how many lines in the current file matched the pattern.
+
+@example
+@c @group
+@c file eg/prog/egrep.awk
+function beginfile(junk)
+@{
+ fcount = 0
+@}
+@c endfile
+@c @end group
+@end example
+
+The @code{endfile} function is called after each file has been processed.
+It is used only when the user wants a count of the number of lines that
+matched. @code{no_print} will be true only if the exit status is desired.
+@code{count_only} will be true if line counts are desired. @code{egrep}
+will therefore only print line counts if printing and counting are enabled.
+The output format must be adjusted depending upon the number of files to be
+processed. Finally, @code{fcount} is added to @code{total}, so that we
+know how many lines altogether matched the pattern.
+
+@example
+@c @group
+@c file eg/prog/egrep.awk
+function endfile(file)
+@{
+ if (! no_print && count_only)
+ if (do_filenames)
+ print file ":" fcount
+ else
+ print fcount
+
+ total += fcount
+@}
+@c endfile
+@c @end group
+@end example
+
+This rule does most of the work of matching lines. The variable
+@code{matches} will be true if the line matched the pattern. If the user
+wants lines that did not match, the sense of @code{matches} is inverted
+using the @samp{!} operator. @code{fcount} is incremented with the value of
+@code{matches}, which will be either one or zero, depending upon a
+successful or unsuccessful match. If the line did not match, the
+@code{next} statement just moves on to the next record.
+
+There are several optimizations for performance in the following few lines
+of code. If the user only wants exit status (@code{no_print} is true), and
+we don't have to count lines, then it is enough to know that one line in
+this file matched, and we can skip on to the next file with @code{nextfile}.
+Along similar lines, if we are only printing file names, and we
+don't need to count lines, we can print the file name, and then skip to the
+next file with @code{nextfile}.
+
+Finally, each line is printed, with a leading filename and colon if
+necessary.
+
+@ignore
+2e: note, probably better to recode the last few lines as
+ if (! count_only) @{
+ if (no_print)
+ nextfile
+
+ if (filenames_only) @{
+ print FILENAME
+ nextfile
+ @}
+
+ if (do_filenames)
+ print FILENAME ":" $0
+ else
+ print
+ @}
+@end ignore
+
+@example
+@c @group
+@c file eg/prog/egrep.awk
+@{
+ matches = ($0 ~ pattern)
+ if (invert)
+ matches = ! matches
+
+ fcount += matches # 1 or 0
+
+ if (! matches)
+ next
+
+ if (no_print && ! count_only)
+ nextfile
+
+ if (filenames_only && ! count_only) @{
+ print FILENAME
+ nextfile
+ @}
+
+ if (do_filenames && ! count_only)
+ print FILENAME ":" $0
+ else if (! count_only)
+ print
+@}
+@c endfile
+@c @end group
+@end example
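+
+The counting logic in this rule can be tried on its own. The following is
+a minimal sketch, not part of @file{egrep.awk}: the shell function name
+@code{count_matches} and its argument order are assumptions made for
+illustration, and only the behavior of @samp{-c} and @samp{-v} is modeled.
+

```shell
# count_matches PATTERN INVERT
# Hypothetical helper: reads standard input and prints how many lines
# match PATTERN, or how many do not match when INVERT is 1.
count_matches() {
    awk -v pattern="$1" -v invert="$2" '
    {
        matches = ($0 ~ pattern)    # 1 on a match, 0 otherwise
        if (invert)
            matches = ! matches
        fcount += matches           # adds 1 or 0
    }
    END { print fcount + 0 }        # + 0 prints 0 for empty input
    '
}

printf 'apple\nbanana\ncherry\n' | count_matches 'an' 0
printf 'apple\nbanana\ncherry\n' | count_matches 'an' 1
```

+
+The two calls print 1 and 2 respectively: one line contains @samp{an},
+and two do not.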
+
+@c @strong{Exercise}: rearrange the code inside @samp{if (! count_only)}.
+
+The @code{END} rule takes care of producing the correct exit status. If
+there were no matches, the exit status is one; otherwise, it is zero.
+
+@example
+@c @group
+@c file eg/prog/egrep.awk
+END \
+@{
+ if (total == 0)
+ exit 1
+ exit 0
+@}
+@c endfile
+@c @end group
+@end example
+
+The @code{usage} function prints a usage message in case of invalid options
+and then exits.
+
+@example
+@c @group
+@c file eg/prog/egrep.awk
+function usage( e)
+@{
+ e = "Usage: egrep [-csvil] [-e pat] [files ...]"
+ print e > "/dev/stderr"
+ exit 1
+@}
+@c endfile
+@c @end group
+@end example
+
+The variable @code{e} is used so that the function fits nicely
+on the printed page.
+
+@node Id Program, Split Program, Egrep Program, Clones
+@subsection Printing Out User Information
+
+@cindex @code{id} utility
+The @code{id} utility lists a user's real and effective user-id numbers,
+real and effective group-id numbers, and the user's group set, if any.
+@code{id} will only print the effective user-id and group-id if they are
+different from the real ones. If possible, @code{id} will also supply the
+corresponding user and group names. The output might look like this:
+
+@example
+$ id
+@print{} uid=2076(arnold) gid=10(staff) groups=10(staff),4(tty)
+@end example
+
+This information is exactly what is provided by @code{gawk}'s
+@file{/dev/user} special file (@pxref{Special Files, ,Special File Names in @code{gawk}}).
+However, the @code{id} utility provides a more palatable output than just a
+string of numbers.
+
+Here is a simple version of @code{id} written in @code{awk}.
+It uses the user database library functions
+(@pxref{Passwd Functions, ,Reading the User Database}),
+and the group database library functions
+(@pxref{Group Functions, ,Reading the Group Database}).
+
+The program is fairly straightforward. All the work is done in the
+@code{BEGIN} rule. The user and group id numbers are obtained from
+@file{/dev/user}. If there is no support for @file{/dev/user}, the program
+gives up.
+
+The code is repetitive. The entry in the user database for the real user-id
+number is split into parts at the @samp{:}. The name is the first field.
+Similar code is used for the effective user-id number, and the group
+numbers.
+
+@findex id.awk
+@example
+@c @group
+@c file eg/prog/id.awk
+# id.awk --- implement id in awk
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+# output is:
+# uid=12(foo) euid=34(bar) gid=3(baz) \
+# egid=5(blat) groups=9(nine),2(two),1(one)
+
+BEGIN \
+@{
+ if ((getline < "/dev/user") < 0) @{
+ err = "id: no /dev/user support - cannot run"
+ print err > "/dev/stderr"
+ exit 1
+ @}
+ close("/dev/user")
+
+ uid = $1
+ euid = $2
+ gid = $3
+ egid = $4
+
+ printf("uid=%d", uid)
+ pw = getpwuid(uid)
+@group
+ if (pw != "") @{
+ split(pw, a, ":")
+ printf("(%s)", a[1])
+ @}
+@end group
+
+ if (euid != uid) @{
+ printf(" euid=%d", euid)
+ pw = getpwuid(euid)
+ if (pw != "") @{
+ split(pw, a, ":")
+ printf("(%s)", a[1])
+ @}
+ @}
+
+ printf(" gid=%d", gid)
+ pw = getgrgid(gid)
+ if (pw != "") @{
+ split(pw, a, ":")
+ printf("(%s)", a[1])
+ @}
+
+ if (egid != gid) @{
+ printf(" egid=%d", egid)
+ pw = getgrgid(egid)
+ if (pw != "") @{
+ split(pw, a, ":")
+ printf("(%s)", a[1])
+ @}
+ @}
+
+ if (NF > 4) @{
+ printf(" groups=");
+ for (i = 5; i <= NF; i++) @{
+ printf("%d", $i)
+ pw = getgrgid($i)
+ if (pw != "") @{
+ split(pw, a, ":")
+ printf("(%s)", a[1])
+ @}
+ if (i < NF)
+ printf(",")
+ @}
+ @}
+ print ""
+@}
+@c endfile
+@c @end group
+@end example
+
+@c exercise!!!
+@ignore
+The POSIX version of @code{id} takes arguments that control which
+information is printed. Modify this version to accept the same
+arguments and perform in the same way.
+@end ignore
+
+@node Split Program, Tee Program, Id Program, Clones
+@subsection Splitting a Large File Into Pieces
+
+@cindex @code{split} utility
+The @code{split} program splits large text files into smaller pieces. By default,
+the output files are named @file{xaa}, @file{xab}, and so on. Each file has
+1000 lines in it, with the likely exception of the last file. To change the
+number of lines in each file, you supply a number on the command line
+preceded with a minus, e.g., @samp{-500} for files with 500 lines in them
+instead of 1000. To change the name of the output files to something like
+@file{myfileaa}, @file{myfileab}, and so on, you supply an additional
+argument that specifies the filename.
+
+Here is a version of @code{split} in @code{awk}. It uses the @code{ord} and
+@code{chr} functions presented in
+@ref{Ordinal Functions, ,Translating Between Characters and Numbers}.
+
+The program first sets its defaults, and then tests to make sure there are
+not too many arguments. It then looks at each argument in turn. The
+first argument could be a minus followed by a number. If it is, this happens
+to look like a negative number, so it is made positive, and that is the
+count of lines. The data file name is skipped over, and the final argument
+is used as the prefix for the output file names.
+
+@findex split.awk
+@example
+@c @group
+@c file eg/prog/split.awk
+# split.awk --- do split in awk
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+# usage: split [-num] [file] [outname]
+
+BEGIN \
+@{
+ outfile = "x" # default
+ count = 1000
+ if (ARGC > 4)
+ usage()
+
+ i = 1
+ if (ARGV[i] ~ /^-[0-9]+$/) @{
+ count = -ARGV[i]
+ ARGV[i] = ""
+ i++
+ @}
+ # test argv in case reading from stdin instead of file
+ if (i in ARGV)
+ i++ # skip data file name
+ if (i in ARGV) @{
+ outfile = ARGV[i]
+ ARGV[i] = ""
+ @}
+
+ s1 = s2 = "a"
+ out = (outfile s1 s2)
+@}
+@c endfile
+@c @end group
+@end example
+
+The next rule does most of the work. @code{tcount} (temporary count) tracks
+how many lines have been printed to the output file so far. If it is greater
+than @code{count}, it is time to close the current file and start a new one.
+@code{s1} and @code{s2} track the current suffixes for the file name. If
+they are both @samp{z}, the file is just too big. Otherwise, @code{s2}
+moves to the next letter in the alphabet; when @code{s2} is already
+@samp{z}, @code{s1} moves to the next letter and @code{s2} starts over
+again at @samp{a}.
+
+@example
+@c @group
+@c file eg/prog/split.awk
+@{
+ if (++tcount > count) @{
+ close(out)
+ if (s2 == "z") @{
+ if (s1 == "z") @{
+ printf("split: %s is too large to split\n", \
+ FILENAME) > "/dev/stderr"
+ exit 1
+ @}
+ s1 = chr(ord(s1) + 1)
+ s2 = "a"
+ @} else
+ s2 = chr(ord(s2) + 1)
+ out = (outfile s1 s2)
+ tcount = 1
+ @}
+ print > out
+@}
+@c endfile
+@c @end group
+@end example
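+
+The suffix arithmetic can be exercised in isolation. This is only a
+sketch under stated assumptions: the shell function @code{next_suffix} is
+hypothetical, and it looks letters up in an alphabet string with
+@code{index} and @code{substr} instead of calling the @code{ord} and
+@code{chr} library functions, so that it runs without any library files.
+

```shell
# next_suffix SUFFIX
# Hypothetical helper: prints the two-letter suffix that follows SUFFIX,
# or fails with a nonzero status once "zz" has been used up.
next_suffix() {
    awk -v cur="$1" '
    BEGIN {
        letters = "abcdefghijklmnopqrstuvwxyz"
        s1 = substr(cur, 1, 1)
        s2 = substr(cur, 2, 1)
        if (s2 == "z") {
            if (s1 == "z")
                exit 1              # no suffixes left
            # index() stands in for ord(), substr() for chr()
            s1 = substr(letters, index(letters, s1) + 1, 1)
            s2 = "a"
        } else
            s2 = substr(letters, index(letters, s2) + 1, 1)
        print s1 s2
    }'
}

next_suffix aa
next_suffix az
```

+
+The calls print @samp{ab} and @samp{ba}, which matches the order in which
+the output files are created.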
+
+The @code{usage} function simply prints an error message and exits.
+
+@example
+@c @group
+@c file eg/prog/split.awk
+function usage( e)
+@{
+ e = "usage: split [-num] [file] [outname]"
+ print e > "/dev/stderr"
+ exit 1
+@}
+@c endfile
+@c @end group
+@end example
+
+@noindent
+The variable @code{e} is used so that the function
+fits nicely on the
+@iftex
+page.
+@end iftex
+@ifinfo
+screen.
+@end ifinfo
+
+This program is a bit sloppy; it relies on @code{awk} to close the last file
+for it automatically, instead of doing it in an @code{END} rule.
+
+@node Tee Program, Uniq Program, Split Program, Clones
+@subsection Duplicating Output Into Multiple Files
+
+@cindex @code{tee} utility
+The @code{tee} program is known as a ``pipe fitting.'' @code{tee} copies
+its standard input to its standard output, and also duplicates it to the
+files named on the command line. Its usage is:
+
+@example
+tee @r{[}-a@r{]} file @dots{}
+@end example
+
+The @samp{-a} option tells @code{tee} to append to the named files, instead of
+truncating them and starting over.
+
+The @code{BEGIN} rule first makes a copy of all the command line arguments,
+into an array named @code{copy}.
+@code{ARGV[0]} is not copied, since it is not needed.
+@code{tee} cannot use @code{ARGV} directly, since @code{awk} will attempt to
+process each file named in @code{ARGV} as input data.
+
+If the first argument is @samp{-a}, then the flag variable
+@code{append} is set to true, and both @code{ARGV[1]} and
+@code{copy[1]} are deleted. If @code{ARGC} is less than two, then no file
+names were supplied, and @code{tee} prints a usage message and exits.
+Finally, @code{awk} is forced to read the standard input by setting
+@code{ARGV[1]} to @code{"-"}, and @code{ARGC} to two.
+
+@c 2e: the `ARGC--' in the `if (ARGV[1] == "-a")' isn't needed.
+
+@findex tee.awk
+@example
+@c @group
+@c file eg/prog/tee.awk
+# tee.awk --- tee in awk
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+# Revised December 1995
+
+BEGIN \
+@{
+ for (i = 1; i < ARGC; i++)
+ copy[i] = ARGV[i]
+
+ if (ARGV[1] == "-a") @{
+ append = 1
+ delete ARGV[1]
+ delete copy[1]
+ ARGC--
+ @}
+ if (ARGC < 2) @{
+ print "usage: tee [-a] file ..." > "/dev/stderr"
+ exit 1
+ @}
+ ARGV[1] = "-"
+ ARGC = 2
+@}
+@c endfile
+@c @end group
+@end example
+
+The single rule does all the work. Since there is no pattern, it is
+executed for each line of input. The body of the rule simply prints the
+line into each file on the command line, and then to the standard output.
+
+@example
+@group
+@c file eg/prog/tee.awk
+@{
+ # moving the if outside the loop makes it run faster
+ if (append)
+ for (i in copy)
+ print >> copy[i]
+ else
+ for (i in copy)
+ print > copy[i]
+ print
+@}
+@c endfile
+@end group
+@end example
+
+It would have been possible to code the loop this way:
+
+@example
+for (i in copy)
+ if (append)
+ print >> copy[i]
+ else
+ print > copy[i]
+@end example
+
+@noindent
+This is more concise, but it is also less efficient. The @samp{if} is
+tested for each record and for each output file. By duplicating the loop
+body, the @samp{if} is only tested once for each input record. If there are
+@var{N} input records and @var{M} output files, the first method only
+executes @var{N} @samp{if} statements, while the second would execute
+@var{N}@code{*}@var{M} @samp{if} statements.
+
+Finally, the @code{END} rule cleans up, by closing all the output files.
+
+@example
+@c @group
+@c file eg/prog/tee.awk
+END \
+@{
+ for (i in copy)
+ close(copy[i])
+@}
+@c endfile
+@c @end group
+@end example
+
+@node Uniq Program, Wc Program, Tee Program, Clones
+@subsection Printing Non-duplicated Lines of Text
+
+@cindex @code{uniq} utility
+The @code{uniq} utility reads sorted lines of data on its standard input,
+and (by default) removes duplicate lines. In other words, only unique lines
+are printed, hence the name. @code{uniq} has a number of options. The usage is:
+
+@example
+uniq @r{[}-udc @r{[}-@var{n}@r{]]} @r{[}+@var{n}@r{]} @r{[} @var{input file} @r{[} @var{output file} @r{]]}
+@end example
+
+The option meanings are:
+
+@table @code
+@item -d
+Only print repeated lines.
+
+@item -u
+Only print non-repeated lines.
+
+@item -c
+Count lines. This option overrides @samp{-d} and @samp{-u}. Both repeated
+and non-repeated lines are counted.
+
+@item -@var{n}
+Skip @var{n} fields before comparing lines. The definition of fields is the
+same as @code{awk}'s default: non-whitespace characters separated by runs of
+spaces and/or tabs.
+
+@item +@var{n}
+Skip @var{n} characters before comparing lines. Any fields specified with
+@samp{-@var{n}} are skipped first.
+
+@item @var{input file}
+Data is read from the input file named on the command line, instead of from
+the standard input.
+
+@item @var{output file}
+The generated output is sent to the named output file, instead of to the
+standard output.
+@end table
+
+Normally @code{uniq} behaves as if both the @samp{-d} and @samp{-u} options
+had been provided.
+
+Here is an @code{awk} implementation of @code{uniq}. It uses the
+@code{getopt} library function
+(@pxref{Getopt Function, ,Processing Command Line Options}),
+and the @code{join} library function
+(@pxref{Join Function, ,Merging an Array Into a String}).
+
+The program begins with a @code{usage} function and then a brief outline of
+the options and their meanings in a comment.
+
+The @code{BEGIN} rule deals with the command line arguments and options. It
+uses a trick to get @code{getopt} to handle options of the form @samp{-25},
+treating such an option as the option letter @samp{2} with an argument of
+@samp{5}. If indeed two or more digits were supplied (@code{Optarg} looks
+like a number), @code{Optarg} is
+concatenated with the option digit, and then the result is added to zero to make
+it into a number. If there is only one digit in the option, then
+@code{Optarg} is not needed, and @code{Optind} must be decremented so that
+@code{getopt} will process it next time. This code is admittedly a bit
+tricky.
+
+If no options were supplied, then the default is taken, to print both
+repeated and non-repeated lines. The output file, if provided, is assigned
+to @code{outputfile}. Earlier, @code{outputfile} was initialized to the
+standard output, @file{/dev/stdout}.
+
+@findex uniq.awk
+@example
+@c @group
+@c file eg/prog/uniq.awk
+# uniq.awk --- do uniq in awk
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+function usage( e)
+@{
+ e = "Usage: uniq [-udc [-n]] [+n] [ in [ out ]]"
+ print e > "/dev/stderr"
+ exit 1
+@}
+
+# -c count lines. overrides -d and -u
+# -d only repeated lines
+# -u only non-repeated lines
+# -n skip n fields
+# +n skip n characters, skip fields first
+
+BEGIN \
+@{
+ count = 1
+ outputfile = "/dev/stdout"
+ opts = "udc0:1:2:3:4:5:6:7:8:9:"
+ while ((c = getopt(ARGC, ARGV, opts)) != -1) @{
+ if (c == "u")
+ non_repeated_only++
+ else if (c == "d")
+ repeated_only++
+ else if (c == "c")
+ do_count++
+ else if (index("0123456789", c) != 0) @{
+ # getopt requires args to options
+ # this messes us up for things like -5
+ if (Optarg ~ /^[0-9]+$/)
+ fcount = (c Optarg) + 0
+ else @{
+ fcount = c + 0
+ Optind--
+ @}
+ @} else
+ usage()
+ @}
+
+ if (ARGV[Optind] ~ /^\+[0-9]+$/) @{
+ charcount = substr(ARGV[Optind], 2) + 0
+ Optind++
+ @}
+
+ for (i = 1; i < Optind; i++)
+ ARGV[i] = ""
+
+ if (repeated_only == 0 && non_repeated_only == 0)
+ repeated_only = non_repeated_only = 1
+
+ if (ARGC - Optind == 2) @{
+ outputfile = ARGV[ARGC - 1]
+ ARGV[ARGC - 1] = ""
+ @}
+@}
+@c endfile
+@c @end group
+@end example
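+
+The digit handling is easier to follow with the @code{getopt} loop taken
+away. The sketch below is illustrative only: the shell function
+@code{field_count} is hypothetical, and it receives the option letter and
+@code{Optarg} as plain arguments instead of obtaining them from
+@code{getopt}. It does not model the adjustment of @code{Optind}.
+

```shell
# field_count DIGIT OPTARG
# Hypothetical helper mirroring the digit handling in the BEGIN rule:
# DIGIT is the option letter that getopt returned, OPTARG its argument.
field_count() {
    awk -v c="$1" -v optarg="$2" '
    BEGIN {
        if (optarg ~ /^[0-9]+$/)
            fcount = (c optarg) + 0   # "-25": "2" and "5" give 25
        else
            fcount = c + 0            # "-5 file": the argument is not a digit string
        print fcount
    }'
}

field_count 2 5
field_count 5 data.txt
```

+
+The first call prints 25 and the second prints 5. In the second case the
+real program also decrements @code{Optind}, so that the argument is
+processed again on the next pass.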
+
+The following function, @code{are_equal}, compares the current line,
+@code{$0}, to the
+previous line, @code{last}. It handles skipping fields and characters.
+
+If no field count and no character count were specified, @code{are_equal}
+simply returns one or zero depending upon the result of a simple string
+comparison of @code{last} and @code{$0}. Otherwise, things get more
+complicated.
+
+If fields have to be skipped, each line is broken into an array using
+@code{split}
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}),
+and then the desired fields are joined back into a line using @code{join}.
+The joined lines are stored in @code{clast} and @code{cline}.
+If no fields are skipped, @code{clast} and @code{cline} are set to
+@code{last} and @code{$0} respectively.
+
+Finally, if characters are skipped, @code{substr} is used to strip off the
+leading @code{charcount} characters in @code{clast} and @code{cline}. The
+two strings are then compared, and @code{are_equal} returns the result.
+
+@example
+@c @group
+@c file eg/prog/uniq.awk
+function are_equal( n, m, clast, cline, alast, aline)
+@{
+ if (fcount == 0 && charcount == 0)
+ return (last == $0)
+
+ if (fcount > 0) @{
+ n = split(last, alast)
+ m = split($0, aline)
+ clast = join(alast, fcount+1, n)
+ cline = join(aline, fcount+1, m)
+ @} else @{
+ clast = last
+ cline = $0
+ @}
+ if (charcount) @{
+ clast = substr(clast, charcount + 1)
+ cline = substr(cline, charcount + 1)
+ @}
+
+ return (clast == cline)
+@}
+@c endfile
+@c @end group
+@end example
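+
+The comparison can be tried without the rest of @file{uniq.awk}. The
+following sketch makes several assumptions: the shell function
+@code{same_after_skip} and the @code{awk} function @code{rejoin} are
+hypothetical names, the two lines are passed as arguments instead of
+coming from @code{last} and @code{$0}, and @code{rejoin} is a simplified
+stand-in for the @code{join} library function.
+

```shell
# same_after_skip FIELDS CHARS LINE1 LINE2
# Hypothetical helper: prints 1 if the two lines compare equal after
# skipping FIELDS leading fields and then CHARS leading characters,
# and 0 otherwise.
same_after_skip() {
    awk -v fcount="$1" -v charcount="$2" -v a="$3" -v b="$4" '
    # Rejoin elements start..n of array arr with single spaces;
    # a simplified stand-in for the join library function.
    function rejoin(arr, start, n,    i, s) {
        for (i = start; i <= n; i++)
            s = s (i > start ? " " : "") arr[i]
        return s
    }
    BEGIN {
        if (fcount > 0) {
            n = split(a, aa); a = rejoin(aa, fcount + 1, n)
            m = split(b, bb); b = rejoin(bb, fcount + 1, m)
        }
        if (charcount > 0) {
            a = substr(a, charcount + 1)
            b = substr(b, charcount + 1)
        }
        # appending "" forces a string comparison
        print ((a "") == (b ""))
    }'
}

same_after_skip 1 0 '10:01 login ok' '10:02 login ok'
same_after_skip 0 0 '10:01 login ok' '10:02 login ok'
```

+
+The first call prints 1, since the lines agree once the leading time
+stamp field is skipped. The second prints 0.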
+
+The following two rules are the body of the program. The first one is
+executed only for the very first line of data. It sets @code{last} equal to
+@code{$0}, so that subsequent lines of text have something to be compared to.
+
+The second rule does the work. The variable @code{equal} will be one or zero
+depending upon the results of @code{are_equal}'s comparison. If @code{uniq}
+is counting repeated lines, then the @code{count} variable is incremented if
+the lines are equal. Otherwise the line is printed and @code{count} is
+reset, since the two lines are not equal.
+
+If @code{uniq} is not counting, @code{count} is incremented if the lines are
+equal. Otherwise, if @code{uniq} is printing repeated lines and the line
+was seen more than once, or if @code{uniq} is printing non-repeated lines
+and the line was seen only once, then the line is printed, and @code{count}
+is reset.
+
+Finally, similar logic is used in the @code{END} rule to print the final
+line of input data.
+
+@example
+@c @group
+@c file eg/prog/uniq.awk
+@group
+NR == 1 @{
+ last = $0
+ next
+@}
+@end group
+
+@{
+ equal = are_equal()
+
+ if (do_count) @{ # overrides -d and -u
+ if (equal)
+ count++
+ else @{
+ printf("%4d %s\n", count, last) > outputfile
+ last = $0
+ count = 1 # reset
+ @}
+ next
+ @}
+
+ if (equal)
+ count++
+ else @{
+ if ((repeated_only && count > 1) ||
+ (non_repeated_only && count == 1))
+ print last > outputfile
+ last = $0
+ count = 1
+ @}
+@}
+
+@group
+END @{
+ if (do_count)
+ printf("%4d %s\n", count, last) > outputfile
+ else if ((repeated_only && count > 1) ||
+ (non_repeated_only && count == 1))
+ print last > outputfile
+@}
+@end group
+@c endfile
+@c @end group
+@end example
+
+@node Wc Program, , Uniq Program, Clones
+@subsection Counting Things
+
+@cindex @code{wc} utility
+The @code{wc} (word count) utility counts lines, words, and characters in
+one or more input files. Its usage is:
+
+@example
+wc @r{[}-lwc@r{]} @r{[} @var{files} @dots{} @r{]}
+@end example
+
+If no files are specified on the command line, @code{wc} reads its standard
+input. If there are multiple files, it will also print total counts for all
+the files. The options and their meanings are:
+
+@table @code
+@item -l
+Only count lines.
+
+@item -w
+Only count words.
+A ``word'' is a contiguous sequence of non-whitespace characters, separated
+by spaces and/or tabs. Happily, this is the normal way @code{awk} separates
+fields in its input data.
+
+@item -c
+Only count characters.
+@end table
+
+Implementing @code{wc} in @code{awk} is particularly elegant, since
+@code{awk} does a lot of the work for us; it splits lines into words (i.e.@:
+fields) and counts them, it counts lines (i.e.@: records) for us, and it can
+easily tell us how long a line is.
+
+This version uses the @code{getopt} library function
+(@pxref{Getopt Function, ,Processing Command Line Options}),
+and the file transition functions
+(@pxref{Filetrans Function, ,Noting Data File Boundaries}).
+
+This version has one major difference from traditional versions of @code{wc}.
+Our version always prints the counts in the order lines, words,
+and characters. Traditional versions note the order of the @samp{-l},
+@samp{-w}, and @samp{-c} options on the command line, and print the counts
+in that order.
+
+The @code{BEGIN} rule does the argument processing.
+The variable @code{print_total} will
+be true if more than one file was named on the command line.
+
+@findex wc.awk
+@example
+@c @group
+@c file eg/prog/wc.awk
+# wc.awk --- count lines, words, characters
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+# Options:
+# -l only count lines
+# -w only count words
+# -c only count characters
+#
+# Default is to count lines, words, characters
+
+BEGIN @{
+ # let getopt print a message about
+ # invalid options. we ignore them
+ while ((c = getopt(ARGC, ARGV, "lwc")) != -1) @{
+ if (c == "l")
+ do_lines = 1
+ else if (c == "w")
+ do_words = 1
+ else if (c == "c")
+ do_chars = 1
+ @}
+ for (i = 1; i < Optind; i++)
+ ARGV[i] = ""
+
+ # if no options, do all
+ if (! do_lines && ! do_words && ! do_chars)
+ do_lines = do_words = do_chars = 1
+
+ print_total = (ARGC - i > 1)
+@}
+@c endfile
+@c @end group
+@end example
+
+The @code{beginfile} function is simple; it just resets the counts of lines,
+words, and characters to zero, and saves the current file name in
+@code{fname}.
+
+The @code{endfile} function adds the current file's numbers to the running
+totals of lines, words, and characters. It then prints out those numbers
+for the file that was just read. It relies on @code{beginfile} to reset the
+numbers for the following data file.
+
+@example
+@c @group
+@c file eg/prog/wc.awk
+function beginfile(file)
+@{
+ chars = lines = words = 0
+ fname = FILENAME
+@}
+
+function endfile(file)
+@{
+ tchars += chars
+ tlines += lines
+ twords += words
+@group
+ if (do_lines)
+ printf "\t%d", lines
+@end group
+ if (do_words)
+ printf "\t%d", words
+ if (do_chars)
+ printf "\t%d", chars
+ printf "\t%s\n", fname
+@}
+@c endfile
+@c @end group
+@end example
+
+There is one rule that is executed for each line. It adds the length of the
+record to @code{chars}. It has to add one, since the newline character
+separating records (the value of @code{RS}) is not part of the record
+itself. @code{lines} is incremented for each line read, and @code{words} is
+incremented by the value of @code{NF}, the number of ``words'' on this
+line.@footnote{Examine the code in
+@ref{Filetrans Function, ,Noting Data File Boundaries}.
+Why must @code{wc} use a separate @code{lines} variable, instead of using
+the value of @code{FNR} in @code{endfile}?}
+
+Finally, the @code{END} rule simply prints the totals for all the files.
+
+@example
+@c @group
+@c file eg/prog/wc.awk
+# do per line
+@{
+ chars += length($0) + 1 # get newline
+ lines++
+ words += NF
+@}
+
+END @{
+ if (print_total) @{
+ if (do_lines)
+ printf "\t%d", tlines
+ if (do_words)
+ printf "\t%d", twords
+ if (do_chars)
+ printf "\t%d", tchars
+ print "\ttotal"
+ @}
+@}
+@c endfile
+@c @end group
+@end example
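+
+The per-line rule is small enough to check by hand. Here is a minimal
+sketch that keeps only the counting: the shell function @code{count_lwc}
+is a hypothetical name, it reads the standard input only, and it leaves
+out the option handling and the per-file totals.
+

```shell
# count_lwc
# Hypothetical helper: prints lines, words, and characters read from
# standard input, in that order, separated by spaces.
count_lwc() {
    awk '
    {
        chars += length($0) + 1     # + 1 for the newline that ended the record
        lines++
        words += NF
    }
    END { print lines + 0, words + 0, chars + 0 }
    '
}

printf 'one two\nthree\n' | count_lwc
```

+
+For this input the sketch prints @samp{2 3 14}: two lines, three words,
+and fourteen characters including the two newlines.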
+
+@node Miscellaneous Programs, , Clones, Sample Programs
+@section A Grab Bag of @code{awk} Programs
+
+This section is a large ``grab bag'' of miscellaneous programs.
+We hope you find them both interesting and enjoyable.
+
+@menu
+* Dupword Program:: Finding duplicated words in a document.
+* Alarm Program:: An alarm clock.
+* Translate Program:: A program similar to the @code{tr} utility.
+* Labels Program:: Printing mailing labels.
+* Word Sorting:: A program to produce a word usage count.
+* History Sorting:: Eliminating duplicate entries from a history
+ file.
+* Extract Program:: Pulling out programs from Texinfo source
+ files.
+* Simple Sed:: A Simple Stream Editor.
+* Igawk Program:: A wrapper for @code{awk} that includes files.
+@end menu
+
+@node Dupword Program, Alarm Program, Miscellaneous Programs, Miscellaneous Programs
+@subsection Finding Duplicated Words in a Document
+
+A common error when writing large amounts of prose is to accidentally
+duplicate words. Often you will see this in text as something like ``the
+the program does the following @dots{}.'' When the text is on-line, often
+the duplicated words occur at the end of one line and the beginning of
+another, making them very difficult to spot.
+@c as here!
+
+This program, @file{dupword.awk}, scans through a file one line at a time,
+and looks for adjacent occurrences of the same word. It also saves the last
+word on a line (in the variable @code{prev}) for comparison with the first
+word on the next line.
+
+The first statement makes sure that the line is all lower-case, so that,
+for example,
+``The'' and ``the'' compare equal to each other. The second statement
+removes all non-alphanumeric and non-whitespace characters from the line, so
+that punctuation does not affect the comparison either. This sometimes
+leads to reports of duplicated words that really are different, but this is
+unusual.
+
+@findex dupword.awk
+@example
+@group
+@c file eg/prog/dupword.awk
+# dupword --- find duplicate words in text
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# December 1991
+
+@{
+ $0 = tolower($0)
+ gsub(/[^A-Za-z0-9 \t]/, "");
+ if ($1 == prev)
+ printf("%s:%d: duplicate %s\n",
+ FILENAME, FNR, $1)
+ for (i = 2; i <= NF; i++)
+ if ($i == $(i-1))
+ printf("%s:%d: duplicate %s\n",
+ FILENAME, FNR, $i)
+ prev = $NF
+@}
+@c endfile
+@end group
+@end example
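+
+A short run shows the carry-over from one line to the next. This sketch
+wraps the same rule in a hypothetical shell function, @code{find_dups}.
+As an assumption made for the example, it reports only the line number
+and the word, and omits @code{FILENAME}, since the input arrives on a
+pipe.
+

```shell
# find_dups
# Hypothetical wrapper around the dupword rule; it reports the line
# number and the repeated word for text read from standard input.
find_dups() {
    awk '
    {
        $0 = tolower($0)
        gsub(/[^A-Za-z0-9 \t]/, "")
        if ($1 == prev)
            printf("%d: duplicate %s\n", FNR, $1)
        for (i = 2; i <= NF; i++)
            if ($i == $(i-1))
                printf("%d: duplicate %s\n", FNR, $i)
        prev = $NF
    }'
}

printf 'This is the\nthe test, test.\n' | find_dups
```

+
+Two duplicates are reported on line 2: @samp{the}, which repeats the last
+word of line 1, and @samp{test}, which is found once the comma has been
+removed.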
+
+@node Alarm Program, Translate Program, Dupword Program, Miscellaneous Programs
+@subsection An Alarm Clock Program
+
+The following program is a simple ``alarm clock'' program.
+You give it a time of day, and an optional message. At the given time,
+it prints the message on the standard output. In addition, you can give it
+the number of times to repeat the message, and also a delay between
+repetitions.
+
+This program uses the @code{gettimeofday} function from
+@ref{Gettimeofday Function, ,Managing the Time of Day}.
+
+All the work is done in the @code{BEGIN} rule. The first part is argument
+checking and setting of defaults: the delay, the count, and the message to
+print. If the user supplied a message, but it does not contain the ASCII BEL
+character (known as the ``alert'' character, @samp{\a}), then it is added to
+the message. (On many systems, printing the ASCII BEL generates some sort
+of audible alert. Thus, when the alarm goes off, the system calls attention
+to itself, in case the user is not looking at their computer or terminal.)
+
+@findex alarm.awk
+@example
+@c @group
+@c file eg/prog/alarm.awk
+# alarm --- set an alarm
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+# usage: alarm time [ "message" [ count [ delay ] ] ]
+
+BEGIN \
+@{
+ # Initial argument sanity checking
+ usage1 = "usage: alarm time ['message' [count [delay]]]"
+ usage2 = sprintf("\t(%s) time ::= hh:mm", ARGV[1])
+
+ if (ARGC < 2) @{
+ print usage1 > "/dev/stderr"
+ exit 1
+ @} else if (ARGC == 5) @{
+ delay = ARGV[4] + 0
+ count = ARGV[3] + 0
+ message = ARGV[2]
+ @} else if (ARGC == 4) @{
+ count = ARGV[3] + 0
+ message = ARGV[2]
+ @} else if (ARGC == 3) @{
+ message = ARGV[2]
+ @} else if (ARGV[1] !~ /[0-9]?[0-9]:[0-9][0-9]/) @{
+ print usage1 > "/dev/stderr"
+ print usage2 > "/dev/stderr"
+ exit 1
+ @}
+
+ # set defaults for once we reach the desired time
+ if (delay == 0)
+ delay = 180 # 3 minutes
+ if (count == 0)
+ count = 5
+@group
+ if (message == "")
+ message = sprintf("\aIt is now %s!\a", ARGV[1])
+ else if (index(message, "\a") == 0)
+ message = "\a" message "\a"
+@end group
+@c endfile
+@end example
+
+The next section of code turns the alarm time into hours and minutes,
+and converts it if necessary to a 24-hour clock. Then it turns that
+time into a count of the seconds since midnight. Next it turns the current
+time into a count of seconds since midnight. The difference between the two
+is how long to wait before setting off the alarm.
+
+@example
+@c @group
+@c file eg/prog/alarm.awk
+ # split up dest time
+ split(ARGV[1], atime, ":")
+ hour = atime[1] + 0 # force numeric
+ minute = atime[2] + 0 # force numeric
+
+ # get current broken down time
+ gettimeofday(now)
+
+ # if time given is 12-hour hours and it's after that
+ # hour, e.g., `alarm 5:30' at 9 a.m. means 5:30 p.m.,
+ # then add 12 to real hour
+ if (hour < 12 && now["hour"] > hour)
+ hour += 12
+
+ # set target time in seconds since midnight
+ target = (hour * 60 * 60) + (minute * 60)
+
+ # get current time in seconds since midnight
+ current = (now["hour"] * 60 * 60) + \
+ (now["minute"] * 60) + now["second"]
+
+ # how long to sleep for
+ naptime = target - current
+ if (naptime <= 0) @{
+ print "time is in the past!" > "/dev/stderr"
+ exit 1
+ @}
+@c endfile
+@c @end group
+@end example
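+
+The arithmetic can be checked with a fixed ``current'' time. The sketch
+below is an illustration under stated assumptions: the shell function
+@code{nap_seconds} is hypothetical, and it takes the current hour, minute,
+and second as arguments instead of calling @code{gettimeofday}.
+

```shell
# nap_seconds HH:MM NOW_HOUR NOW_MINUTE NOW_SECOND
# Hypothetical helper: prints how many seconds remain until HH:MM,
# taking the current time from its arguments instead of gettimeofday.
nap_seconds() {
    awk -v when="$1" -v nh="$2" -v nm="$3" -v ns="$4" '
    BEGIN {
        split(when, atime, ":")
        hour = atime[1] + 0         # force numeric
        minute = atime[2] + 0       # force numeric
        # a 12-hour time that has already passed is taken as p.m.
        if (hour < 12 && nh > hour)
            hour += 12
        target = (hour * 60 * 60) + (minute * 60)
        current = (nh * 60 * 60) + (nm * 60) + ns
        print target - current
    }'
}

nap_seconds 5:30 9 0 0
```

+
+At 9:00:00 a.m., an alarm for @samp{5:30} is taken to mean 5:30 p.m., so
+the call prints 30600, which is eight and a half hours in seconds.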
+
+Finally, the program uses the @code{system} function
+(@pxref{I/O Functions, ,Built-in Functions for Input/Output})
+to call the @code{sleep} utility. The @code{sleep} utility simply pauses
+for the given number of seconds. If the exit status is not zero,
+the program assumes that @code{sleep} was interrupted, and exits. If
+@code{sleep} exited with an OK status (zero), then the program prints the
+message in a loop, again using @code{sleep} to delay for however many
+seconds are necessary.
+
+@example
+@c @group
+@c file eg/prog/alarm.awk
+ # zzzzzz..... go away if interrupted
+ if (system(sprintf("sleep %d", naptime)) != 0)
+ exit 1
+
+ # time to notify!
+ command = sprintf("sleep %d", delay)
+ for (i = 1; i <= count; i++) @{
+ print message
+ # if sleep command interrupted, go away
+ if (system(command) != 0)
+ break
+ @}
+
+ exit 0
+@}
+@c endfile
+@c @end group
+@end example
+
+@node Translate Program, Labels Program, Alarm Program, Miscellaneous Programs
+@subsection Transliterating Characters
+
+The system @code{tr} utility transliterates characters. For example, it is
+often used to map upper-case letters into lower-case, for further
+processing.
+
+@example
+@var{generate data} | tr '[A-Z]' '[a-z]' | @var{process data} @dots{}
+@end example
+
+You give @code{tr} two lists of characters enclosed in square brackets.
+Usually, the lists are quoted to keep the shell from attempting to do a
+filename expansion.@footnote{On older, non-POSIX systems, @code{tr} often
+does not require that the lists be enclosed in square brackets and quoted.
+This is a feature.} When processing the input, the
+first character in the first list is replaced with the first character in the
+second list, the second character in the first list is replaced with the
+second character in the second list, and so on.
+If there are more characters in the ``from'' list than in the ``to'' list,
+the last character of the ``to'' list is used for the remaining characters
+in the ``from'' list.
+
+Some time ago,
+@c early or mid-1989!
+a user proposed to us that we add a transliteration function to @code{gawk}.
+Being opposed to ``creeping featurism,'' I wrote the following program to
+prove that character transliteration could be done with a user-level
+function. This program is not as complete as the system @code{tr} utility,
+but it will do most of the job.
+
+The @code{translate} program demonstrates one of the few weaknesses of
+standard
+@code{awk}: dealing with individual characters is very painful, requiring
+repeated use of the @code{substr}, @code{index}, and @code{gsub} built-in
+functions
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).@footnote{This
+program was written before @code{gawk} acquired the ability to
+split each character in a string into separate array elements.
+How might this ability simplify the program?}
+
+There are two functions. The first, @code{stranslate}, takes three
+arguments.
+
+@table @code
+@item from
+A list of characters to translate from.
+
+@item to
+A list of characters to translate to.
+
+@item target
+The string to do the translation on.
+@end table
+
+Associative arrays make the translation part fairly easy. @code{t_ar} holds
+the ``to'' characters, indexed by the ``from'' characters. Then a simple
+loop goes through @code{from}, one character at a time. For each character
+in @code{from}, if the character appears in @code{target}, @code{gsub}
+is used to change it to the corresponding @code{to} character.
+
+The @code{translate} function simply calls @code{stranslate} using @code{$0}
+as the target. The main program sets two global variables, @code{FROM} and
+@code{TO}, from the command line, and then changes @code{ARGV} so that
+@code{awk} will read from the standard input.
+
+Finally, the processing rule simply calls @code{translate} for each record.
+
+@findex translate.awk
+@example
+@c @group
+@c file eg/prog/translate.awk
+# translate --- do tr like stuff
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# August 1989
+
+# bugs: does not handle things like: tr A-Z a-z, it has
+# to be spelled out. However, if `to' is shorter than `from',
+# the last character in `to' is used for the rest of `from'.
+
+function stranslate(from, to, target,     lf, lt, t_ar, i, c)
+@{
+    lf = length(from)
+    lt = length(to)
+    for (i = 1; i <= lt; i++)
+        t_ar[substr(from, i, 1)] = substr(to, i, 1)
+    if (lt < lf)
+        for (; i <= lf; i++)
+            t_ar[substr(from, i, 1)] = substr(to, lt, 1)
+    for (i = 1; i <= lf; i++) @{
+        c = substr(from, i, 1)
+        if (index(target, c) > 0)
+            gsub(c, t_ar[c], target)
+    @}
+    return target
+@}
+
+@group
+function translate(from, to)
+@{
+    return $0 = stranslate(from, to, $0)
+@}
+@end group
+
+# main program
+BEGIN @{
+    if (ARGC < 3) @{
+        print "usage: translate from to" > "/dev/stderr"
+        exit
+    @}
+    FROM = ARGV[1]
+    TO = ARGV[2]
+    ARGC = 2
+    ARGV[1] = "-"
+@}
+
+@{
+    translate(FROM, TO)
+    print
+@}
+@c endfile
+@c @end group
+@end example
+
+While it is possible to do character transliteration in a user-level
+function, it is not necessarily efficient, and we started to consider adding
+a built-in function. However, shortly after writing this program, we learned
+that the System V Release 4 @code{awk} had added the @code{toupper} and
+@code{tolower} functions. These functions handle the vast majority of the
+cases where character transliteration is necessary, and so we chose to
+simply add those functions to @code{gawk} as well, and then leave well
+enough alone.
+
+An obvious improvement to this program would be to set up the
+@code{t_ar} array only once, in a @code{BEGIN} rule. However, this
+assumes that the ``from'' and ``to'' lists
+will never change throughout the lifetime of the program.
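+
+As a rough illustration of that improvement, here is a minimal sketch (not
+part of the original program) that builds the table once in the
+@code{BEGIN} rule and then maps each input character with @code{substr}
+instead of @code{gsub}. The file name @file{translate2.awk} and the
+variable @code{out} are made up for this example, and the sketch assumes
+that the lists stay fixed for the whole run. Because it avoids
+@code{gsub}, characters in the ``from'' list are never interpreted as
+regexp operators, and each input character is mapped at most once.
+
+@example
+# translate2.awk --- hypothetical variant of translate.awk
+BEGIN @{
+    if (ARGC < 3) @{
+        print "usage: translate2 from to" > "/dev/stderr"
+        exit 1
+    @}
+    lf = length(ARGV[1])
+    lt = length(ARGV[2])
+    # build the table once; a short `to' list reuses its last character
+    for (i = 1; i <= lf; i++)
+        t_ar[substr(ARGV[1], i, 1)] = substr(ARGV[2], (i <= lt ? i : lt), 1)
+    ARGC = 2
+    ARGV[1] = "-"       # read the standard input
+@}
+
+@{
+    out = ""
+    n = length($0)
+    for (i = 1; i <= n; i++) @{
+        c = substr($0, i, 1)
+        out = out ((c in t_ar) ? t_ar[c] : c)
+    @}
+    print out
+@}
+@end example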
+
+@node Labels Program, Word Sorting, Translate Program, Miscellaneous Programs
+@subsection Printing Mailing Labels
+
+Here is a ``real world''@footnote{``Real world'' is defined as
+``a program actually used to get something done.''}
+program. This script reads lists of names and
+addresses, and generates mailing labels. Each page of labels has 20 labels
+on it, two across and ten down. The addresses are guaranteed to be no more
+than five lines of data. Each address is separated from the next by a blank
+line.
+
+The basic idea is to read 20 labels' worth of data. Each line of each label
+is stored in the @code{line} array. The single rule takes care of filling
+the @code{line} array and printing the page when 20 labels have been read.
+
+The @code{BEGIN} rule simply sets @code{RS} to the empty string, so that
+@code{awk} will split records at blank lines
+(@pxref{Records, ,How Input is Split into Records}).
+It sets @code{MAXLINES} to 100, since @code{MAXLINES} is the maximum number
+of lines on the page (20 * 5 = 100).
+
+Most of the work is done in the @code{printpage} function.
+The label lines are stored sequentially in the @code{line} array. But they
+have to be printed horizontally: @code{line[1]} next to @code{line[6]},
+@code{line[2]} next to @code{line[7]}, and so on. Two loops are used to
+accomplish this. The outer loop, controlled by @code{i}, steps through
+every 10 lines of data; this is each row of labels. The inner loop,
+controlled by @code{j}, goes through the lines within the row.
+As @code{j} goes from zero to four, @samp{i+j} is the @code{j}'th line in
+the row, and @samp{i+j+5} is the entry next to it. The output ends up
+looking something like this:
+
+@example
+line 1 line 6
+line 2 line 7
+line 3 line 8
+line 4 line 9
+line 5 line 10
+@end example
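+
+To see which array elements are paired, the index arithmetic can be run on
+its own. The following sketch is only an illustration; it prints the
+subscripts for one full page of 100 lines rather than any label text.
+
+@example
+BEGIN @{
+    for (i = 1; i <= 100; i += 10)
+        for (j = 0; j < 5; j++)
+            printf "line[%d] line[%d]\n", i + j, i + j + 5
+@}
+@end example
+
+@noindent
+The first row pairs subscripts 1 through 5 with 6 through 10, the second
+row pairs 11 through 15 with 16 through 20, and so on.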
+
+As a final note, at lines 21 and 61, an extra blank line is printed, to keep
+the output lined up on the labels. This is dependent on the particular
+brand of labels in use when the program was written. You will also note
+that there are two blank lines at the top and two blank lines at the bottom.
+
+The @code{END} rule arranges to flush the final page of labels; there may
+not have been an even multiple of 20 labels in the data.
+
+@findex labels.awk
+@example
+@c @group
+@c file eg/prog/labels.awk
+# labels.awk
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# June 1992
+
+# Program to print labels. Each label is 5 lines of data
+# that may have blank lines. The label sheets have 2
+# blank lines at the top and 2 at the bottom.
+
+BEGIN @{ RS = "" ; MAXLINES = 100 @}
+
+function printpage(    i, j)
+@{
+    if (Nlines <= 0)
+        return
+
+    printf "\n\n"        # header
+
+    for (i = 1; i <= Nlines; i += 10) @{
+        if (i == 21 || i == 61)
+            print ""
+        for (j = 0; j < 5; j++) @{
+            if (i + j > MAXLINES)
+                break
+            printf " %-41s %s\n", line[i+j], line[i+j+5]
+        @}
+        print ""
+    @}
+
+    printf "\n\n"        # footer
+
+    for (i in line)
+        line[i] = ""
+@}
+
+# main rule
+@{
+    if (Count >= 20) @{
+        printpage()
+        Count = 0
+        Nlines = 0
+    @}
+    n = split($0, a, "\n")
+    for (i = 1; i <= n; i++)
+        line[++Nlines] = a[i]
+    for (; i <= 5; i++)
+        line[++Nlines] = ""
+    Count++
+@}
+
+END \
+@{
+    printpage()
+@}
+@c endfile
+@c @end group
+@end example
+
+@node Word Sorting, History Sorting, Labels Program, Miscellaneous Programs
+@subsection Generating Word Usage Counts
+
+The following @code{awk} program prints
+the number of occurrences of each word in its input. It illustrates the
+associative nature of @code{awk} arrays by using strings as subscripts. It
+also demonstrates the @samp{for @var{x} in @var{array}} construction.
+Finally, it shows how @code{awk} can be used in conjunction with other
+utility programs to do a useful task of some complexity with a minimum of
+effort. Some explanations follow the program listing.
+
+@example
+awk '
+# Print list of word frequencies
+@{
+    for (i = 1; i <= NF; i++)
+        freq[$i]++
+@}
+
+END @{
+    for (word in freq)
+        printf "%s\t%d\n", word, freq[word]
+@}'
+@end example
+
+The first thing to notice about this program is that it has two rules. The
+first rule, because it has an empty pattern, is executed on every line of
+the input. It uses @code{awk}'s field-accessing mechanism
+(@pxref{Fields, ,Examining Fields}) to pick out the individual words from
+the line, and the built-in variable @code{NF} (@pxref{Built-in Variables})
+to know how many fields are available.
+
+For each input word, an element of the array @code{freq} is incremented to
+reflect that the word has been seen an additional time.
+
+The second rule, because it has the pattern @code{END}, is not executed
+until the input has been exhausted. It prints out the contents of the
+@code{freq} table that has been built up inside the first action.
+
+This program has several problems that would prevent it from being
+useful by itself on real text files:
+
+@itemize @bullet
+@item
+Words are detected using the @code{awk} convention that fields are
+separated by whitespace and that other characters in the input (except
+newlines) don't have any special meaning to @code{awk}. This means that
+punctuation characters count as part of words.
+
+@item
+The @code{awk} language considers upper- and lower-case characters to be
+distinct. Therefore, @samp{bartender} and @samp{Bartender} are not treated
+as the same word. This is undesirable since, in normal text, words
+are capitalized if they begin sentences, and a frequency analyzer should not
+be sensitive to capitalization.
+
+@iftex
+@page
+@end iftex
+@item
+The output does not come out in any useful order. You're more likely to be
+interested in which words occur most frequently, or having an alphabetized
+table of how frequently each word occurs.
+@end itemize
+
+The way to solve these problems is to use some of the more advanced
+features of the @code{awk} language. First, we use @code{tolower} to remove
+case distinctions. Next, we use @code{gsub} to remove punctuation
+characters. Finally, we use the system @code{sort} utility to process the
+output of the @code{awk} script. Here is the new version of
+the program:
+
+@findex wordfreq.awk
+@example
+@c file eg/prog/wordfreq.awk
+# Print list of word frequencies
+@{
+    $0 = tolower($0)    # remove case distinctions
+    gsub(/[^a-z0-9_ \t]/, "", $0)  # remove punctuation
+    for (i = 1; i <= NF; i++)
+        freq[$i]++
+@}
+@c endfile
+
+END @{
+    for (word in freq)
+        printf "%s\t%d\n", word, freq[word]
+@}
+@end example
+
+Assuming we have saved this program in a file named @file{wordfreq.awk},
+and that the data is in @file{file1}, the following pipeline
+
+@example
+awk -f wordfreq.awk file1 | sort +1 -nr
+@end example
+
+@noindent
+produces a table of the words appearing in @file{file1} in order of
+decreasing frequency.
+
+The @code{awk} program suitably massages the data and produces a word
+frequency table, which is not ordered.
+
+The @code{awk} script's output is then sorted by the @code{sort} utility and
+printed on the terminal. The options given to @code{sort} in this example
+specify to sort using the second field of each input line (skipping one field),
+that the sort keys should be treated as numeric quantities (otherwise
+@samp{15} would come before @samp{5}), and that the sorting should be done
+in descending (reverse) order.
+
+We could have even done the @code{sort} from within the program, by
+changing the @code{END} action to:
+
+@example
+@c file eg/prog/wordfreq.awk
+END @{
+    sort = "sort +1 -nr"
+    for (word in freq)
+        printf "%s\t%d\n", word, freq[word] | sort
+    close(sort)
+@}
+@c endfile
+@end example
+
+You would have to use this way of sorting on systems that do not
+have true pipes.
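+
+The @samp{+1} key syntax is the traditional one; a @code{sort} that follows
+later versions of the POSIX standard may not accept it. On such a
+system, the same @code{END} action can be sketched with the @samp{-k}
+option instead. This variant is an assumption about your @code{sort}, not
+part of the original program:
+
+@example
+END @{
+    sort = "sort -k 2nr"    # key starts at field 2, numeric, reversed
+    for (word in freq)
+        printf "%s\t%d\n", word, freq[word] | sort
+    close(sort)
+@}
+@end example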
+
+See the general operating system documentation for more information on how
+to use the @code{sort} program.
+
+@node History Sorting, Extract Program, Word Sorting, Miscellaneous Programs
+@subsection Removing Duplicates from Unsorted Text
+
+The @code{uniq} program
+(@pxref{Uniq Program, ,Printing Non-duplicated Lines of Text}),
+removes duplicate lines from @emph{sorted} data.
+
+Suppose, however, that you need to remove duplicate lines from a data file
+while preserving the order the lines are in. A good example of
+this might be a shell history file. The history file keeps a copy of all
+the commands you have entered, and it is not unusual to repeat a command
+several times in a row. Occasionally you might wish to compact the history
+by removing duplicate entries. Yet it is desirable to maintain the order
+of the original commands.
+
+This simple program does the job. It uses two arrays. The @code{data}
+array is indexed by the text of each line.
+For each line, @code{data[$0]} is incremented.
+
+If a particular line has not
+been seen before, then @code{data[$0]} will be zero.
+In that case, the text of the line is stored in @code{lines[count]}.
+Each element of @code{lines} is a unique command, and the indices of
+@code{lines} indicate the order in which those lines were encountered.
+The @code{END} rule simply prints out the lines, in order.
+
+@cindex Rakitzis, Byron
+@findex histsort.awk
+@example
+@group
+@c file eg/prog/histsort.awk
+# histsort.awk --- compact a shell history file
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+# Thanks to Byron Rakitzis for the general idea
+@{
+    if (data[$0]++ == 0)
+        lines[++count] = $0
+@}
+
+END @{
+    for (i = 1; i <= count; i++)
+        print lines[i]
+@}
+@c endfile
+@end group
+@end example
+
+This program also provides a foundation for generating other useful
+information. For example, using the following @code{print} statement in the
+@code{END} rule would indicate how often a particular command was used.
+
+@example
+print data[lines[i]], lines[i]
+@end example
+
+This works because @code{data[$0]} was incremented each time a line was
+seen.
+
+@node Extract Program, Simple Sed, History Sorting, Miscellaneous Programs
+@subsection Extracting Programs from Texinfo Source Files
+
+@iftex
+Both this chapter and the previous chapter
+(@ref{Library Functions, ,A Library of @code{awk} Functions}),
+present a large number of @code{awk} programs.
+@end iftex
+@ifinfo
+The nodes
+@ref{Library Functions, ,A Library of @code{awk} Functions},
+and @ref{Sample Programs, ,Practical @code{awk} Programs},
+are the top level nodes for a large number of @code{awk} programs.
+@end ifinfo
+If you wish to experiment with these programs, it is tedious to have to type
+them in by hand. Here we present a program that can extract parts of a
+Texinfo input file into separate files.
+
+This @value{DOCUMENT} is written in Texinfo, the GNU project's document
+formatting language. A single Texinfo source file can be used to produce both
+printed and on-line documentation.
+@iftex
+Texinfo is fully documented in @cite{Texinfo---The GNU Documentation Format},
+available from the Free Software Foundation.
+@end iftex
+@ifinfo
+The Texinfo language is described fully, starting with
+@ref{Top, , Introduction, texi, Texinfo---The GNU Documentation Format}.
+@end ifinfo
+
+For our purposes, it is enough to know three things about Texinfo input
+files.
+
+@itemize @bullet
+@item
+The ``at'' symbol, @samp{@@}, is special in Texinfo, much like @samp{\} in C
+or @code{awk}. Literal @samp{@@} symbols are represented in Texinfo source
+files as @samp{@@@@}.
+
+@item
+Comments start with either @samp{@@c} or @samp{@@comment}.
+The file extraction program will work by using special comments that start
+at the beginning of a line.
+
+@item
+Example text that should not be split across a page boundary is bracketed
+between lines containing @samp{@@group} and @samp{@@end group} commands.
+@end itemize
+
+The following program, @file{extract.awk}, reads through a Texinfo source
+file, and does two things, based on the special comments.
+Upon seeing @samp{@w{@@c system @dots{}}},
+it runs a command, by extracting the command text from the
+control line and passing it on to the @code{system} function
+(@pxref{I/O Functions, ,Built-in Functions for Input/Output}).
+Upon seeing @samp{@@c file @var{filename}}, each subsequent line is sent to
+the file @var{filename}, until @samp{@@c endfile} is encountered.
+The rules in @file{extract.awk} will match either @samp{@@c} or
+@samp{@@comment} by letting the @samp{omment} part be optional.
+Lines containing @samp{@@group} and @samp{@@end group} are simply removed.
+@file{extract.awk} uses the @code{join} library function
+(@pxref{Join Function, ,Merging an Array Into a String}).
+
+The example programs in the on-line Texinfo source for @cite{@value{TITLE}}
+(@file{gawk.texi}) have all been bracketed inside @samp{file}
+and @samp{endfile} lines. The @code{gawk} distribution uses a copy of
+@file{extract.awk} to extract the sample
+programs and install many of them in a standard directory, where
+@code{gawk} can find them.
+
+@file{extract.awk} begins by setting @code{IGNORECASE} to one, so that
+mixed upper-case and lower-case letters in the directives won't matter.
+
+The first rule handles calling @code{system}, checking that a command was
+given (@code{NF} is at least three), and also checking that the command
+exited with a zero exit status, signifying OK.
+
+@findex extract.awk
+@example
+@c @group
+@c file eg/prog/extract.awk
+# extract.awk --- extract files and run programs
+# from texinfo files
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# May 1993
+
+BEGIN @{ IGNORECASE = 1 @}
+
+@group
+/^@@c(omment)?[ \t]+system/ \
+@{
+    if (NF < 3) @{
+        e = (FILENAME ":" FNR)
+        e = (e ": badly formed `system' line")
+        print e > "/dev/stderr"
+        next
+    @}
+    $1 = ""
+    $2 = ""
+    stat = system($0)
+    if (stat != 0) @{
+        e = (FILENAME ":" FNR)
+        e = (e ": warning: system returned " stat)
+        print e > "/dev/stderr"
+    @}
+@}
+@end group
+@c endfile
+@end example
+
+@noindent
+The variable @code{e} is used so that the rule
+fits nicely on the
+@iftex
+page.
+@end iftex
+@ifinfo
+screen.
+@end ifinfo
+
+The second rule handles moving data into files. It verifies that a file
+name was given in the directive. If the file named is not the current file,
+then the current file is closed. This means that an @samp{@@c endfile} was
+not given for that file. (We should probably print a diagnostic in this
+case, although at the moment we do not.)
+
+The @samp{for} loop does the work. It reads lines using @code{getline}
+(@pxref{Getline, ,Explicit Input with @code{getline}}).
+For an unexpected end of file, it calls the @code{@w{unexpected_eof}}
+function. If the line is an ``endfile'' line, then it breaks out of
+the loop.
+If the line is an @samp{@@group} or @samp{@@end group} line, then it
+ignores it, and goes on to the next line.
+
+Most of the work is in the following few lines. If the line has no @samp{@@}
+symbols, it can be printed directly. Otherwise, each leading @samp{@@} must be
+stripped off.
+
+To remove the @samp{@@} symbols, the line is split into separate elements of
+the array @code{a}, using the @code{split} function
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+Each element of @code{a} that is empty indicates two successive @samp{@@}
+symbols in the original line. For each two empty elements (@samp{@@@@} in
+the original file), we have to add back in a single @samp{@@} symbol.
+
+When the processing of the array is finished, @code{join} is called with the
+value of @code{SUBSEP}, to rejoin the pieces back into a single
+line. That line is then printed to the output file.
+
+@example
+@c @group
+@c file eg/prog/extract.awk
+/^@@c(omment)?[ \t]+file/ \
+@{
+@group
+    if (NF != 3) @{
+        e = (FILENAME ":" FNR ": badly formed `file' line")
+        print e > "/dev/stderr"
+        next
+    @}
+@end group
+    if ($3 != curfile) @{
+        if (curfile != "")
+            close(curfile)
+        curfile = $3
+    @}
+
+    for (;;) @{
+        if ((getline line) <= 0)
+            unexpected_eof()
+        if (line ~ /^@@c(omment)?[ \t]+endfile/)
+            break
+        else if (line ~ /^@@(end[ \t]+)?group/)
+            continue
+        if (index(line, "@@") == 0) @{
+            print line > curfile
+            continue
+        @}
+        n = split(line, a, "@@")
+@group
+        # if a[1] == "", means leading @@,
+        # don't add one back in.
+@end group
+        for (i = 2; i <= n; i++) @{
+            if (a[i] == "") @{ # was an @@@@
+                a[i] = "@@"
+                if (a[i+1] == "")
+                    i++
+            @}
+        @}
+        print join(a, 1, n, SUBSEP) > curfile
+    @}
+@}
+@c endfile
+@c @end group
+@end example
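+
+The splitting and rejoining can be tried in isolation. The following
+stand-alone sketch uses a made-up function name, @code{unescape}, and
+concatenates the pieces directly instead of calling the @code{join}
+library function; it is meant only to show what happens to the array
+elements, under the assumption that the input lines are escaped the way
+Texinfo source is.
+
+@example
+function unescape(line,    a, n, i, out)
+@{
+    n = split(line, a, "@@")
+    for (i = 2; i <= n; i++) @{
+        if (a[i] == "") @{      # was an @@@@
+            a[i] = "@@"
+            if (a[i+1] == "")
+                i++
+        @}
+    @}
+    out = ""
+    for (i = 1; i <= n; i++)    # rejoin with no separator
+        out = out a[i]
+    return out
+@}
+
+BEGIN @{
+    # prints: arnold@@gnu.ai.mit.edu
+    print unescape("arnold@@@@gnu.ai.mit.edu")
+    # prints: BEGIN @{ x = 1 @}
+    print unescape("BEGIN @@@{ x = 1 @@@}")
+@}
+@end example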
+
+An important thing to note is the use of the @samp{>} redirection.
+Output done with @samp{>} only opens the file once; it stays open and
+subsequent output is appended to the file
+(@pxref{Redirection, , Redirecting Output of @code{print} and @code{printf}}).
+This allows us to easily mix program text and explanatory prose for the same
+sample source file (as has been done here!) without any hassle. The file is
+only closed when a new data file name is encountered, or at the end of the
+input file.
+
+Finally, the function @code{@w{unexpected_eof}} prints an appropriate
+error message and then exits.
+
+The @code{END} rule handles the final cleanup, closing the open file.
+
+@example
+@c file eg/prog/extract.awk
+@group
+function unexpected_eof()
+@{
+    printf("%s:%d: unexpected EOF or error\n", \
+        FILENAME, FNR) > "/dev/stderr"
+    exit 1
+@}
+@end group
+
+END @{
+    if (curfile)
+        close(curfile)
+@}
+@c endfile
+@end example
+
+@node Simple Sed, Igawk Program, Extract Program, Miscellaneous Programs
+@subsection A Simple Stream Editor
+
+@cindex @code{sed} utility
+The @code{sed} utility is a ``stream editor,'' a program that reads a
+stream of data, makes changes to it, and passes the modified data on.
+It is often used to make global changes to a large file, or to a stream
+of data generated by a pipeline of commands.
+
+While @code{sed} is a complicated program in its own right, its most common
+use is to perform global substitutions in the middle of a pipeline:
+
+@example
+command1 < orig.data | sed 's/old/new/g' | command2 > result
+@end example
+
+Here, the @samp{s/old/new/g} tells @code{sed} to look for the regexp
+@samp{old} on each input line, and replace it with the text @samp{new},
+globally (i.e.@: all the occurrences on a line). This is similar to
+@code{awk}'s @code{gsub} function
+(@pxref{String Functions, , Built-in Functions for String Manipulation}).
+
+The following program, @file{awksed.awk}, accepts at least two command line
+arguments: the pattern to look for and the text to replace it with. Any
+additional arguments are treated as data file names to process. If none
+are provided, the standard input is used.
+
+@cindex Brennan, Michael
+@cindex @code{awksed}
+@cindex simple stream editor
+@cindex stream editor, simple
+@example
+@c @group
+@c file eg/prog/awksed.awk
+# awksed.awk --- do s/foo/bar/g using just print
+# Thanks to Michael Brennan for the idea
+
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# August 1995
+
+function usage()
+@{
+    print "usage: awksed pat repl [files...]" > "/dev/stderr"
+    exit 1
+@}
+
+BEGIN @{
+    # validate arguments
+    if (ARGC < 3)
+        usage()
+
+    RS = ARGV[1]
+    ORS = ARGV[2]
+
+    # don't use arguments as files
+    ARGV[1] = ARGV[2] = ""
+@}
+
+# look ma, no hands!
+@{
+    if (RT == "")
+        printf "%s", $0
+    else
+        print
+@}
+@c endfile
+@c @end group
+@end example
+
+The program relies on @code{gawk}'s ability to have @code{RS} be a regexp
+and on the setting of @code{RT} to the actual text that terminated the
+record (@pxref{Records, ,How Input is Split into Records}).
+
+The idea is to have @code{RS} be the pattern to look for. @code{gawk}
+will automatically set @code{$0} to the text between matches of the pattern.
+This is text that we wish to keep, unmodified. Then, by setting @code{ORS}
+to the replacement text, a simple @code{print} statement will output the
+text we wish to keep, followed by the replacement text.
+
+There is one wrinkle to this scheme: what should happen if the last record
+doesn't end with text that matches @code{RS}? Using a @code{print}
+statement unconditionally prints the replacement text, which is not correct.
+
+However, if the file did not end in text that matches @code{RS}, @code{RT}
+will be set to the null string. In this case, we can print @code{$0} using
+@code{printf}
+(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}).
+
+The @code{BEGIN} rule handles the setup, checking for the right number
+of arguments, and calling @code{usage} if there is a problem. Then it sets
+@code{RS} and @code{ORS} from the command line arguments, and sets
+@code{ARGV[1]} and @code{ARGV[2]} to the null string, so that they will
+not be treated as file names
+(@pxref{ARGC and ARGV, , Using @code{ARGC} and @code{ARGV}}).
+
+The @code{usage} function prints an error message and exits.
+
+Finally, the single rule handles the printing scheme outlined above,
+using @code{print} or @code{printf} as appropriate, depending upon the
+value of @code{RT}.
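+
+For comparison, a more conventional approach can be sketched with
+@code{gsub}. This version is not the program described above; the variable
+names @code{pat} and @code{repl} are chosen here for illustration. Note
+that @code{gsub} gives @samp{&} a special meaning in the replacement text,
+which is not available with the @code{RS} and @code{ORS} approach.
+
+@example
+BEGIN @{
+    if (ARGC < 3) @{
+        print "usage: awksed pat repl [files...]" > "/dev/stderr"
+        exit 1
+    @}
+    pat = ARGV[1]
+    repl = ARGV[2]
+
+    # don't use arguments as files
+    ARGV[1] = ARGV[2] = ""
+@}
+
+# replace every match on the line, then print it
+@{ gsub(pat, repl); print @}
+@end example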
+
+@ignore
+Exercise, compare the performance of this version with the more
+straightforward:
+
+BEGIN {
+ pat = ARGV[1]
+ repl = ARGV[2]
+ ARGV[1] = ARGV[2] = ""
+}
+
+{ gsub(pat, repl); print }
+
+Exercise: what are the advantages and disadvantages of this version vs. sed?
+ Advantage: egrep regexps
+ speed (?)
+ Disadvantage: no & in replacement text
+
+Others?
+@end ignore
+
+@node Igawk Program, , Simple Sed, Miscellaneous Programs
+@subsection An Easy Way to Use Library Functions
+
+Using library functions in @code{awk} can be very beneficial. It
+encourages code re-use and the writing of general functions. Programs are
+smaller, and therefore clearer.
+However, using library functions is only easy when writing @code{awk}
+programs; it is painful when running them, requiring multiple @samp{-f}
+options. If @code{gawk} is unavailable, then so too is the @code{AWKPATH}
+environment variable and the ability to put @code{awk} functions into a
+library directory (@pxref{Options, ,Command Line Options}).
+
+It would be nice to be able to write programs like so:
+
+@example
+# library functions
+@@include getopt.awk
+@@include join.awk
+@dots{}
+
+# main program
+BEGIN @{
+    while ((c = getopt(ARGC, ARGV, "a:b:cde")) != -1)
+        @dots{}
+    @dots{}
+@}
+@end example
+
+The following program, @file{igawk.sh}, provides this service.
+It simulates @code{gawk}'s searching of the @code{AWKPATH} variable,
+and also allows @dfn{nested} includes; i.e.@: a file that has been included
+with @samp{@@include} can contain further @samp{@@include} statements.
+@code{igawk} will make an effort to only include files once, so that nested
+includes don't accidentally include a library function twice.
+
+@code{igawk} should behave externally just like @code{gawk}. This means it
+should accept all of @code{gawk}'s command line arguments, including the
+ability to have multiple source files specified via @samp{-f}, and the
+ability to mix command line and library source files.
+
+The program is written using the POSIX Shell (@code{sh}) command language.
+The way the program works is as follows:
+
+@enumerate
+@item
+Loop through the arguments, saving anything that doesn't represent
+@code{awk} source code for later, when the expanded program is run.
+
+@item
+For any arguments that do represent @code{awk} text, put the arguments into
+a temporary file that will be expanded. There are two cases.
+
+@enumerate a
+@item
+Literal text, provided with @samp{--source} or @samp{--source=}. This
+text is just echoed directly. The @code{echo} program will automatically
+supply a trailing newline.
+
+@item
+File names provided with @samp{-f}. We use a neat trick, and echo
+@samp{@@include @var{filename}} into the temporary file. Since the file
+inclusion program will work the way @code{gawk} does, this will get the text
+of the file included into the program at the correct point.
+@end enumerate
+
+@item
+Run an @code{awk} program (naturally) over the temporary file to expand
+@samp{@@include} statements. The expanded program is placed in a second
+temporary file.
+
+@item
+Run the expanded program with @code{gawk} and any other original command line
+arguments that the user supplied (such as the data file names).
+@end enumerate
+
+The initial part of the program turns on shell tracing if the first
+argument was @samp{debug}. Otherwise, a shell @code{trap} statement
+arranges to clean up any temporary files on program exit or upon an
+interrupt.
+
+@c 2e: For the temp file handling, go with Darrel's ig=${TMP:-/tmp}/igs.$$
+@c 2e: or something as similar as possible.
+
+The next part loops through all the command line arguments.
+There are several cases of interest.
+
+@table @code
+@item --
+This ends the arguments to @code{igawk}. Anything else should be passed on
+to the user's @code{awk} program without being evaluated.
+
+@item -W
+This indicates that the next option is specific to @code{gawk}. To make
+argument processing easier, the @samp{-W} is prepended to the
+remaining arguments and the loop continues. (This is an @code{sh}
+programming trick. Don't worry about it if you are not familiar with
+@code{sh}.)
+
+@item -v
+@itemx -F
+These are saved and passed on to @code{gawk}.
+
+@item -f
+@itemx --file
+@itemx --file=
+@itemx -Wfile=
+The file name is saved to the temporary file @file{/tmp/ig.s.$$} with an
+@samp{@@include} statement.
+The @code{sed} utility is used to remove the leading option part of the
+argument (e.g., @samp{--file=}).
+
+@item --source
+@itemx --source=
+@itemx -Wsource=
+The source text is echoed into @file{/tmp/ig.s.$$}.
+
+@iftex
+@page
+@end iftex
+@item --version
+@itemx -Wversion
+@code{igawk} prints its version number, and runs @samp{gawk --version}
+to get the @code{gawk} version information, and then exits.
+@end table
+
+If none of @samp{-f}, @samp{--file}, @samp{-Wfile}, @samp{--source},
+or @samp{-Wsource}, were supplied, then the first non-option argument
+should be the @code{awk} program. If there are no command line
+arguments left, @code{igawk} prints an error message and exits.
+Otherwise, the first argument is echoed into @file{/tmp/ig.s.$$}.
+
+In any case, after the arguments have been processed,
+@file{/tmp/ig.s.$$} contains the complete text of the original @code{awk}
+program.
+
+The @samp{$$} in @code{sh} represents the current process ID number.
+It is often used in shell programs to generate unique temporary file
+names. This allows multiple users to run @code{igawk} without worrying
+that the temporary file names will clash.
+
+@cindex @code{sed} utility
+Here's the program:
+
+@findex igawk.sh
+@example
+@c @group
+@c file eg/prog/igawk.sh
+#! /bin/sh
+
+# igawk --- like gawk but do @@include processing
+# Arnold Robbins, arnold@@gnu.ai.mit.edu, Public Domain
+# July 1993
+
+if [ "$1" = debug ]
+then
+    set -x
+    shift
+else
+    # cleanup on exit, hangup, interrupt, quit, termination
+    trap 'rm -f /tmp/ig.[se].$$' 0 1 2 3 15
+fi
+
+while [ $# -ne 0 ] # loop over arguments
+do
+    case $1 in
+    --)     shift; break;;
+
+    -W)     shift
+            set -- -W"$@@"
+            continue;;
+
+    -[vF])  opts="$opts $1 '$2'"
+            shift;;
+
+    -[vF]*) opts="$opts '$1'" ;;
+
+    -f)     echo @@include "$2" >> /tmp/ig.s.$$
+            shift;;
+
+    -f*)    f=`echo "$1" | sed 's/-f//'`
+            echo @@include "$f" >> /tmp/ig.s.$$ ;;
+
+    -?file=*)    # -Wfile or --file
+            f=`echo "$1" | sed 's/-.file=//'`
+            echo @@include "$f" >> /tmp/ig.s.$$ ;;
+
+    -?file)    # get arg, $2
+            echo @@include "$2" >> /tmp/ig.s.$$
+            shift;;
+
+    -?source=*)    # -Wsource or --source
+            t=`echo "$1" | sed 's/-.source=//'`
+            echo "$t" >> /tmp/ig.s.$$ ;;
+
+    -?source)    # get arg, $2
+            echo "$2" >> /tmp/ig.s.$$
+            shift;;
+
+    -?version)
+            echo igawk: version 1.0 1>&2
+            gawk --version
+            exit 0 ;;
+
+    -[W-]*)    opts="$opts '$1'" ;;
+
+    *)      break;;
+    esac
+    shift
+done
+
+if [ ! -s /tmp/ig.s.$$ ]
+then
+    if [ -z "$1" ]
+    then
+        echo igawk: no program! 1>&2
+        exit 1
+    else
+        echo "$1" > /tmp/ig.s.$$
+        shift
+    fi
+fi
+
+# at this point, /tmp/ig.s.$$ has the program
+@c endfile
+@c @end group
+@end example
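+
+The @samp{-W} case in the loop above depends on how the shell expands
+@code{"$@@"} when text is placed directly in front of it: the text is
+joined to the first argument only. The following sketch isolates that step.
+The function name @code{glue_w} is invented for this illustration.
+
```shell
# glue_w ARGS...: if the first argument is a detached -W, join it to the
# argument that follows, then print the resulting arguments one per line.
# Hypothetical helper, not part of igawk.
glue_w() {
    if [ "$1" = -W ]
    then
        shift
        set -- -W"$@"      # the -W prefix attaches to the first argument only
    fi
    printf '%s\n' "$@"
}

glue_w -W file=lib.awk data.txt
```
+
+Once the two pieces are joined, the later @samp{-?file=*} and
+@samp{-?source=*} patterns can handle the option without a separate case.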
+
+The @code{awk} program to process @samp{@@include} directives reads through
+the program, one line at a time, using @code{getline}
+(@pxref{Getline, ,Explicit Input with @code{getline}}).
+The input file names and @samp{@@include} statements are managed using a
+stack. As each @samp{@@include} is encountered, the current file name is
+``pushed'' onto the stack, and the file named in the @samp{@@include}
+directive becomes
+the current file name. As each file is finished, the stack is ``popped,''
+and the previous input file becomes the current input file again.
+The process is started by making the original file the first one on the
+stack.
+
+The @code{pathto} function does the work of finding the full path to a
+file. It simulates @code{gawk}'s behavior when searching the @code{AWKPATH}
+environment variable
+(@pxref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}).
+If a file name has a @samp{/} in it, no path search
+is done. Otherwise, the file name is concatenated with the name of each
+directory in the path, and an attempt is made to open the generated file
+name. The only way in @code{awk} to test if a file can be read is to go
+ahead and try to read it with @code{getline}; that is what @code{pathto}
+does. If the file can be read, it is closed, and the file name is
+returned.
+@ignore
+An alternative way to test for the file's existence would be to call
+@samp{system("test -r " t)}, which uses the @code{test} utility to
+see if the file exists and is readable. The disadvantage to this method
+is that it requires creating an extra process, and can thus be slightly
+slower.
+@end ignore
+
+@example
+@c @group
+@c file eg/prog/igawk.sh
+gawk -- '
+# process @@include directives
+
+function pathto(file, i, t, junk)
+@{
+ if (index(file, "/") != 0)
+ return file
+
+ for (i = 1; i <= ndirs; i++) @{
+ t = (pathlist[i] "/" file)
+ if ((getline junk < t) > 0) @{
+ # found it
+ close(t)
+ return t
+ @}
+ @}
+ return ""
+@}
+@c endfile
+@c @end group
+@end example
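+
+The readability test at the heart of @code{pathto} can be tried on its own.
+The following is only a sketch: the shell function @code{can_read} is a name
+invented here, and it assumes that some @code{awk} is available on the
+search path. Note that an empty file makes @code{getline} return zero, so
+this test, like @code{pathto}, treats an empty file as not found.
+
```shell
# can_read FILE: succeed only if awk can read a line from FILE.
# Hypothetical helper; it mirrors the test made inside pathto.
can_read() {
    awk -v t="$1" 'BEGIN {
        if ((getline junk < t) > 0) {   # at least one line was read
            close(t)
            exit 0
        }
        exit 1                          # missing, unreadable, or empty
    }'
}

if can_read /etc/passwd
then
    echo "readable"
else
    echo "not readable"
fi
```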
+
+The main program is contained inside one @code{BEGIN} rule. The first thing it
+does is set up the @code{pathlist} array that @code{pathto} uses. After
+splitting the path on @samp{:}, null elements are replaced with @code{"."},
+which represents the current directory.
+
+@example
+@c @group
+@c file eg/prog/igawk.sh
+BEGIN @{
+ path = ENVIRON["AWKPATH"]
+ ndirs = split(path, pathlist, ":")
+ for (i = 1; i <= ndirs; i++) @{
+ if (pathlist[i] == "")
+ pathlist[i] = "."
+ @}
+@c endfile
+@c @end group
+@end example
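+
+The effect of the null-element replacement is easiest to see with a path
+that contains an empty component. The following sketch repeats the same
+steps outside of @code{igawk}. The function name @code{split_path} and the
+sample path are invented for this illustration.
+
```shell
# split_path PATH: split a colon-separated search path the way the BEGIN
# rule does, printing one directory per line. Hypothetical helper.
split_path() {
    AWKPATH=$1 awk 'BEGIN {
        ndirs = split(ENVIRON["AWKPATH"], pathlist, ":")
        for (i = 1; i <= ndirs; i++) {
            if (pathlist[i] == "")
                pathlist[i] = "."      # null element means current directory
            print pathlist[i]
        }
    }'
}

split_path '/usr/lib/awk::/opt/awk'
```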
+
+The stack is initialized with @code{ARGV[1]}, which will be @file{/tmp/ig.s.$$}.
+The main loop comes next. Input lines are read in succession. Lines that
+do not start with @samp{@@include} are printed verbatim.
+
+If the line does start with @samp{@@include}, the file name is in @code{$2}.
+@code{pathto} is called to generate the full path. If it cannot find the
+file, an error message is printed and processing continues with the next line.
+
+The next thing to check is if the file has been included already. The
+@code{processed} array is indexed by the full file name of each included
+file, and it tracks this information for us. If the file has been
+seen, a warning message is printed. Otherwise, the new file name is
+pushed onto the stack and processing continues.
+
+Finally, when @code{getline} encounters the end of the input file, the file
+is closed and the stack is popped. When @code{stackptr} is less than zero,
+the program is done.
+
+@example
+@c @group
+@c file eg/prog/igawk.sh
+ stackptr = 0
+ input[stackptr] = ARGV[1] # ARGV[1] is first file
+
+ for (; stackptr >= 0; stackptr--) @{
+ while ((getline < input[stackptr]) > 0) @{
+ if (tolower($1) != "@@include") @{
+ print
+ continue
+ @}
+ fpath = pathto($2)
+ if (fpath == "") @{
+ printf("igawk:%s:%d: cannot find %s\n", \
+ input[stackptr], FNR, $2) > "/dev/stderr"
+ continue
+ @}
+@group
+ if (! (fpath in processed)) @{
+ processed[fpath] = input[stackptr]
+ input[++stackptr] = fpath
+ @} else
+ print $2, "included in", input[stackptr], \
+ "already included in", \
+ processed[fpath] > "/dev/stderr"
+ @}
+@end group
+@group
+ close(input[stackptr])
+ @}
+@}' /tmp/ig.s.$$ > /tmp/ig.e.$$
+@end group
+@c endfile
+@c @end group
+@end example
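+
+The stack technique can be seen in isolation in the following simplified
+sketch. It is not the @code{igawk} code: the function name
+@code{expand_includes} is invented, there is no @code{AWKPATH} search and
+no @code{processed} array, and lines are read into a variable named
+@code{line} instead of @code{$0}.
+
```shell
# expand_includes FILE: print FILE with each "@include name" line replaced
# by the contents of the named file. Simplified sketch: no path search and
# no check for files that are included more than once.
expand_includes() {
    awk -v top="$1" 'BEGIN {
        stackptr = 0
        input[stackptr] = top
        for (; stackptr >= 0; stackptr--) {
            while ((getline line < input[stackptr]) > 0) {
                n = split(line, word, " ")
                if (n < 2 || word[1] != "@include") {
                    print line
                    continue
                }
                input[++stackptr] = word[2]   # push: read this file next
            }
            close(input[stackptr])            # done; the loop step pops it
        }
    }'
}
```
+
+Because the parent file is not closed while an included file is read, the
+@code{getline} loop resumes in the parent exactly where it left off once
+the stack is popped.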
+
+The last step is to call @code{gawk} with the expanded program and the original
+options and command line arguments that the user supplied. @code{gawk}'s
+exit status is passed back on to @code{igawk}'s calling program.
+
+@c this causes more problems than it solves, so leave it out.
+@ignore
+The special file @file{/dev/null} is passed as a data file to @code{gawk}
+to handle an interesting case. Suppose that the user's program only has
+a @code{BEGIN} rule, and there are no data files to read. The program should exit without reading any data
+files. However, suppose that an included library file defines an @code{END}
+rule of its own. In this case, @code{gawk} will hang, reading standard
+input. In order to avoid this, @file{/dev/null} is explicitly added to the
+command line. Reading from @file{/dev/null} always returns an immediate
+end of file indication.
+
+@c Hmm. Add /dev/null if $# is 0? Still messes up ARGV. Sigh.
+@end ignore
+
+@example
+@c @group
+@c file eg/prog/igawk.sh
+eval gawk -f /tmp/ig.e.$$ $opts -- "$@@"
+
+exit $?
+@c endfile
+@c @end group
+@end example
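+
+The @code{eval} is needed because the options were saved in @code{opts}
+wrapped in single quotes, and those quotes only take effect when the shell
+parses the text a second time. The following sketch shows the difference.
+The function @code{count_args} and the sample option are invented for this
+illustration.
+
```shell
# count_args: report how many arguments it received.
count_args() { echo $#; }

opts=
for arg in -v 'msg=hello world'
do
    opts="$opts '$arg'"     # save each option wrapped in single quotes
done

count_args $opts            # plain expansion splits on every space: 3
eval count_args $opts       # eval re-parses the quotes: 2
```
+
+As in @code{igawk} itself, an option value that contains a single quote
+would defeat this simple quoting scheme.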
+
+This version of @code{igawk} represents my third attempt at this program.
+There are three key simplifications that made the program work better.
+
+@enumerate
+@item
+Using @samp{@@include} even for the files named with @samp{-f} makes building
+the initial collected @code{awk} program much simpler; all the
+@samp{@@include} processing can be done once.
+
+@item
+The @code{pathto} function doesn't try to save the line read with
+@code{getline} when testing for the file's accessibility. Trying to save
+this line for use with the main program complicates things considerably.
+@c what problem does this engender though - exercise
+@c answer, reading from "-" or /dev/stdin
+
+@item
+Using a @code{getline} loop in the @code{BEGIN} rule does it all in one
+place. It is not necessary to call out to a separate loop for processing
+nested @samp{@@include} statements.
+@end enumerate
+
+Also, this program illustrates that it is often worthwhile to combine
+@code{sh} and @code{awk} programming. You can usually accomplish
+quite a lot, without having to resort to low-level programming in C or C++, and it
+is frequently easier to do certain kinds of string and argument manipulation
+using the shell than it is in @code{awk}.
+
+Finally, @code{igawk} shows that it is not always necessary to add new
+features to a program; they can often be layered on top. With @code{igawk},
+there is no real reason to build @samp{@@include} processing into
+@code{gawk} itself.
+
+As an additional example of this, consider the idea of having two
+files in a directory in the search path.
+
+@table @file
+@item default.awk
+This file would contain a set of default library functions, such
+as @code{getopt} and @code{assert}.
+
+@item site.awk
+This file would contain library functions that are specific to a site or
+installation, i.e.@: locally developed functions.
+Having a separate file allows @file{default.awk} to change with
+new @code{gawk} releases, without requiring the system administrator to
+update it each time by adding the local functions.
+@end table
+
+One user
+@c Karl Berry, karl@ileaf.com, 10/95
+suggested that @code{gawk} be modified to automatically read these files
+upon startup. Instead, it would be very simple to modify @code{igawk}
+to do this. Since @code{igawk} can process nested @samp{@@include}
+directives, @file{default.awk} could simply contain @samp{@@include}
+statements for the desired library functions.
+
+@c Exercise: make this change
+
+@node Language History, Gawk Summary, Sample Programs, Top
+@chapter The Evolution of the @code{awk} Language
+
+This @value{DOCUMENT} describes the GNU implementation of @code{awk}, which follows
+the POSIX specification. Many @code{awk} users are only familiar
+with the original @code{awk} implementation in Version 7 Unix.
+(This implementation was the basis for @code{awk} in Berkeley Unix,
+through 4.3--Reno. The 4.4 release of Berkeley Unix uses @code{gawk} 2.15.2
+for its version of @code{awk}.) This chapter briefly describes the
+evolution of the @code{awk} language, with cross references to other parts
+of the @value{DOCUMENT} where you can find more information.
+
+@menu
+* V7/SVR3.1:: The major changes between V7 and System V
+ Release 3.1.
+* SVR4:: Minor changes between System V Releases 3.1
+ and 4.
+* POSIX:: New features from the POSIX standard.
+* BTL:: New features from the AT&T Bell Laboratories
+ version of @code{awk}.
+* POSIX/GNU:: The extensions in @code{gawk} not in POSIX
+ @code{awk}.
+@end menu
+
+@node V7/SVR3.1, SVR4, Language History, Language History
+@section Major Changes between V7 and SVR3.1
+
+The @code{awk} language evolved considerably between the release of
+Version 7 Unix (1978) and the new version first made generally available in
+System V Release 3.1 (1987). This section summarizes the changes, with
+cross-references to further details.
+
+@itemize @bullet
+@item
+The requirement for @samp{;} to separate rules on a line
+(@pxref{Statements/Lines, ,@code{awk} Statements Versus Lines}).
+
+@item
+User-defined functions, and the @code{return} statement
+(@pxref{User-defined, ,User-defined Functions}).
+
+@item
+The @code{delete} statement (@pxref{Delete, ,The @code{delete} Statement}).
+
+@item
+The @code{do}-@code{while} statement
+(@pxref{Do Statement, ,The @code{do}-@code{while} Statement}).
+
+@item
+The built-in functions @code{atan2}, @code{cos}, @code{sin}, @code{rand} and
+@code{srand} (@pxref{Numeric Functions, ,Numeric Built-in Functions}).
+
+@item
+The built-in functions @code{gsub}, @code{sub}, and @code{match}
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+
+@item
+The built-in functions @code{close} and @code{system}
+(@pxref{I/O Functions, ,Built-in Functions for Input/Output}).
+
+@item
+The @code{ARGC}, @code{ARGV}, @code{FNR}, @code{RLENGTH}, @code{RSTART},
+and @code{SUBSEP} built-in variables (@pxref{Built-in Variables}).
+
+@item
+The conditional expression using the ternary operator @samp{?:}
+(@pxref{Conditional Exp, ,Conditional Expressions}).
+
+@item
+The exponentiation operator @samp{^}
+(@pxref{Arithmetic Ops, ,Arithmetic Operators}) and its assignment operator
+form @samp{^=} (@pxref{Assignment Ops, ,Assignment Expressions}).
+
+@item
+C-compatible operator precedence, which breaks some old @code{awk}
+programs (@pxref{Precedence, ,Operator Precedence (How Operators Nest)}).
+
+@item
+Regexps as the value of @code{FS}
+(@pxref{Field Separators, ,Specifying How Fields are Separated}), and as the
+third argument to the @code{split} function
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+
+@item
+Dynamic regexps as operands of the @samp{~} and @samp{!~} operators
+(@pxref{Regexp Usage, ,How to Use Regular Expressions}).
+
+@item
+The escape sequences @samp{\b}, @samp{\f}, and @samp{\r}
+(@pxref{Escape Sequences}).
+(Some vendors have updated their old versions of @code{awk} to
+recognize @samp{\r}, @samp{\b}, and @samp{\f}, but this is not
+something you can rely on.)
+
+@item
+Redirection of input for the @code{getline} function
+(@pxref{Getline, ,Explicit Input with @code{getline}}).
+
+@item
+Multiple @code{BEGIN} and @code{END} rules
+(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}).
+
+@item
+Multi-dimensional arrays
+(@pxref{Multi-dimensional, ,Multi-dimensional Arrays}).
+@end itemize
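+
+Several of the features listed above appear together in the following short
+sketch. The shell function @code{new_awk_demo} and the @code{awk} function
+@code{max} are names invented for this illustration.
+
```shell
# new_awk_demo: exercise a few features that Version 7 awk lacked.
new_awk_demo() {
    awk '
    # A user-defined function that uses the ?: operator.
    function max(a, b) { return (a > b) ? a : b }

    BEGIN {
        print max(3, 7)              # user-defined function and ?:
        print 2 ^ 10                 # exponentiation
        s = "foo bar foo"
        n = gsub(/foo/, "baz", s)    # global substitution; returns the count
        print n, s
        if (match(s, /bar/))         # match sets RSTART and RLENGTH
            print RSTART, RLENGTH
    }'
}

new_awk_demo
```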
+
+@node SVR4, POSIX, V7/SVR3.1, Language History
+@section Changes between SVR3.1 and SVR4
+
+@cindex @code{awk} language, V.4 version
+The System V Release 4 version of Unix @code{awk} added these features
+(some of which originated in @code{gawk}):
+
+@itemize @bullet
+@item
+The @code{ENVIRON} variable (@pxref{Built-in Variables}).
+
+@item
+Multiple @samp{-f} options on the command line
+(@pxref{Options, ,Command Line Options}).
+
+@item
+The @samp{-v} option for assigning variables before program execution begins
+(@pxref{Options, ,Command Line Options}).
+
+@item
+The @samp{--} option for terminating command line options.
+
+@item
+The @samp{\a}, @samp{\v}, and @samp{\x} escape sequences
+(@pxref{Escape Sequences}).
+
+@item
+A defined return value for the @code{srand} built-in function
+(@pxref{Numeric Functions, ,Numeric Built-in Functions}).
+
+@item
+The @code{toupper} and @code{tolower} built-in string functions
+for case translation
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+
+@item
+A cleaner specification for the @samp{%c} format-control letter in the
+@code{printf} function
+(@pxref{Control Letters, ,Format-Control Letters}).
+
+@item
+The ability to dynamically pass the field width and precision (@code{"%*.*d"})
+in the argument list of the @code{printf} function
+(@pxref{Control Letters, ,Format-Control Letters}).
+
+@item
+The use of regexp constants such as @code{/foo/} as expressions, where
+they are equivalent to using the matching operator, as in @samp{$0 ~ /foo/}
+(@pxref{Using Constant Regexps, ,Using Regular Expression Constants}).
+@end itemize
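+
+A brief sketch of three of these features follows: the @samp{-v} option,
+a regexp constant used as an expression, and @code{toupper}. The function
+name @code{svr4_demo} and the variable names are invented for this
+illustration.
+
```shell
# svr4_demo: read lines on standard input and report, for each one,
# whether it matches /foo/, along with an upper-case copy.
svr4_demo() {
    awk -v prefix=line '{
        hit = /foo/                  # same as: hit = ($0 ~ /foo/)
        print prefix, NR, hit, toupper($0)
    }'
}

printf 'foo bar\nbaz\n' | svr4_demo
```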
+
+@node POSIX, BTL, SVR4, Language History
+@section Changes between SVR4 and POSIX @code{awk}
+
+The POSIX Command Language and Utilities standard for @code{awk}
+introduced the following changes into the language:
+
+@itemize @bullet
+@item
+The use of @samp{-W} for implementation-specific options.
+
+@item
+The use of @code{CONVFMT} for controlling the conversion of numbers
+to strings (@pxref{Conversion, ,Conversion of Strings and Numbers}).
+
+@item
+The concept of a numeric string, and tighter comparison rules to go
+with it (@pxref{Typing and Comparison, ,Variable Typing and Comparison Expressions}).
+
+@item
+More complete documentation of many of the previously undocumented
+features of the language.
+@end itemize
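+
+The effect of @code{CONVFMT} and of the numeric-string comparison rules can
+be seen in the following sketch. The function name @code{posix_demo} and
+the sample values are invented for this illustration.
+
```shell
# posix_demo: show CONVFMT conversion and numeric-string comparison.
posix_demo() {
    echo '10 9' | awk '{
        CONVFMT = "%.2f"
        x = 3.14159
        s = x ""                # number-to-string conversion uses CONVFMT
        print s
        print x                 # print uses OFMT instead
        print ($1 > $2)         # fields that look numeric compare as numbers
        print ("10" > "9")      # string constants compare as strings
    }'
}

posix_demo
```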
+
+The following common extensions are not permitted by the POSIX
+standard:
+
+@c IMPORTANT! Keep this list in sync with the one in node Options
+
+@itemize @bullet
+@item
+@code{\x} escape sequences are not recognized
+(@pxref{Escape Sequences}).
+
+@item
+The synonym @code{func} for the keyword @code{function} is not
+recognized (@pxref{Definition Syntax, ,Function Definition Syntax}).
+
+@item
+The operators @samp{**} and @samp{**=} cannot be used in
+place of @samp{^} and @samp{^=} (@pxref{Arithmetic Ops, ,Arithmetic Operators},
+and also @pxref{Assignment Ops, ,Assignment Expressions}).
+
+@item
+Specifying @samp{-Ft} on the command line does not set the value
+of @code{FS} to be a single tab character
+(@pxref{Field Separators, ,Specifying How Fields are Separated}).
+
+@item
+The @code{fflush} built-in function is not supported
+(@pxref{I/O Functions, , Built-in Functions for Input/Output}).
+@end itemize
+
+@node BTL, POSIX/GNU, POSIX, Language History
+@section Extensions in the AT&T Bell Laboratories @code{awk}
+
+@cindex Kernighan, Brian
+Brian Kernighan, one of the original designers of Unix @code{awk},
+has made his version available via anonymous @code{ftp}
+(@pxref{Other Versions, ,Other Freely Available @code{awk} Implementations}).
+This section describes extensions in his version of @code{awk} that are
+not in POSIX @code{awk}.
+
+@itemize @bullet
+@item
+The @samp{-mf=@var{NNN}} and @samp{-mr=@var{NNN}} command line options
+to set the maximum number of fields, and the maximum
+record size, respectively
+(@pxref{Options, ,Command Line Options}).
+
+@item
+The @code{fflush} built-in function for flushing buffered output
+(@pxref{I/O Functions, ,Built-in Functions for Input/Output}).
+
+@ignore
+@item
+The @code{SYMTAB} array, that allows access to the internal symbol
+table of @code{awk}. This feature is not documented, largely because
+it is somewhat shakily implemented. For instance, you cannot access arrays
+or array elements through it.
+@end ignore
+@end itemize
+
+@node POSIX/GNU, , BTL, Language History
+@section Extensions in @code{gawk} Not in POSIX @code{awk}
+
+@cindex compatibility mode
+The GNU implementation, @code{gawk}, adds a number of features.
+This section lists them in the order they were added to @code{gawk}.
+They can all be disabled with either the @samp{--traditional} or
+@samp{--posix} option
+(@pxref{Options, ,Command Line Options}).
+
+Version 2.10 of @code{gawk} introduced these features:
+
+@itemize @bullet
+@item
+The @code{AWKPATH} environment variable for specifying a path search for
+the @samp{-f} command line option
+(@pxref{Options, ,Command Line Options}).
+
+@item
+The @code{IGNORECASE} variable and its effects
+(@pxref{Case-sensitivity, ,Case-sensitivity in Matching}).
+
+@item
+The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr}, and
+@file{/dev/fd/@var{n}} file name interpretation
+(@pxref{Special Files, ,Special File Names in @code{gawk}}).
+@end itemize
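+
+The following sketch shows the most common use of these file names, sending
+a diagnostic to the standard error while normal output goes to the standard
+output. The function name @code{warn_demo} is invented. The sketch calls
+plain @code{awk} and therefore assumes either @code{gawk} or a system that
+provides @file{/dev/stderr} as a real file.
+
```shell
# warn_demo: copy standard input to standard output, and send one
# diagnostic line per record to standard error.
warn_demo() {
    awk '{
        print "warning: check input" > "/dev/stderr"
        print $0
    }'
}

echo data | warn_demo
```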
+
+Version 2.13 of @code{gawk} introduced these features:
+
+@itemize @bullet
+@item
+The @code{FIELDWIDTHS} variable and its effects
+(@pxref{Constant Size, ,Reading Fixed-width Data}).
+
+@item
+The @code{systime} and @code{strftime} built-in functions for obtaining
+and printing time stamps
+(@pxref{Time Functions, ,Functions for Dealing with Time Stamps}).
+
+@item
+The @samp{-W lint} option to provide source code and run time error
+and portability checking
+(@pxref{Options, ,Command Line Options}).
+
+@item
+The @samp{-W compat} option to turn off these extensions
+(@pxref{Options, ,Command Line Options}).
+
+@item
+The @samp{-W posix} option for full POSIX compliance
+(@pxref{Options, ,Command Line Options}).
+@end itemize
+
+Version 2.14 of @code{gawk} introduced these features:
+
+@itemize @bullet
+@item
+The @code{next file} statement for skipping to the next data file
+(@pxref{Nextfile Statement, ,The @code{nextfile} Statement}).
+@end itemize
+
+Version 2.15 of @code{gawk} introduced these features:
+
+@itemize @bullet
+@item
+The @code{ARGIND} variable, that tracks the movement of @code{FILENAME}
+through @code{ARGV} (@pxref{Built-in Variables}).
+
+@item
+The @code{ERRNO} variable, that contains the system error message when
+@code{getline} returns @minus{}1, or when @code{close} fails
+(@pxref{Built-in Variables}).
+
+@item
+The ability to use GNU-style long named options that start with @samp{--}
+(@pxref{Options, ,Command Line Options}).
+
+@item
+The @samp{--source} option for mixing command line and library
+file source code
+(@pxref{Options, ,Command Line Options}).
+
+@item
+The @file{/dev/pid}, @file{/dev/ppid}, @file{/dev/pgrpid}, and
+@file{/dev/user} file name interpretation
+(@pxref{Special Files, ,Special File Names in @code{gawk}}).
+@end itemize
+
+Version 3.0 of @code{gawk} introduced these features:
+
+@itemize @bullet
+@item
+The @code{next file} statement became @code{nextfile}
+(@pxref{Nextfile Statement, ,The @code{nextfile} Statement}).
+
+@item
+The @samp{--lint-old} option to
+warn about constructs that are not available in
+the original Version 7 Unix version of @code{awk}
+(@pxref{V7/SVR3.1, , Major Changes between V7 and SVR3.1}).
+
+@item
+The @samp{--traditional} option was added as a better name for
+@samp{--compat} (@pxref{Options, ,Command Line Options}).
+
+@item
+The ability for @code{FS} to be a null string, and for the third
+argument to @code{split} to be the null string
+(@pxref{Single Character Fields, , Making Each Character a Separate Field}).
+
+@item
+The ability for @code{RS} to be a regexp
+(@pxref{Records, , How Input is Split into Records}).
+
+@item
+The @code{RT} variable
+(@pxref{Records, , How Input is Split into Records}).
+
+@item
+The @code{gensub} function for more powerful text manipulation
+(@pxref{String Functions, , Built-in Functions for String Manipulation}).
+
+@item
+The @code{strftime} function acquired a default time format,
+allowing it to be called with no arguments
+(@pxref{Time Functions, , Functions for Dealing with Time Stamps}).
+
+@item
+Full support for both POSIX and GNU regexps
+(@pxref{Regexp, , Regular Expressions}).
+
+@item
+The @samp{--re-interval} option to provide interval expressions in regexps
+(@pxref{Regexp Operators, , Regular Expression Operators}).
+
+@item
+@code{IGNORECASE} changed, now applying to string comparison as well
+as regexp operations
+(@pxref{Case-sensitivity, ,Case-sensitivity in Matching}).
+
+@item
+The @samp{-m} option and the @code{fflush} function from the
+Bell Labs research version of @code{awk}
+(@pxref{Options, ,Command Line Options}; also
+@pxref{I/O Functions, ,Built-in Functions for Input/Output}).
+
+@item
+The use of GNU Autoconf to control the configuration process
+(@pxref{Quick Installation, , Compiling @code{gawk} for Unix}).
+
+@item
+Amiga support
+(@pxref{Amiga Installation, ,Installing @code{gawk} on an Amiga}).
+
+@c XXX ADD MORE STUFF HERE
+
+@end itemize
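+
+Three of the Version 3.0 additions are sketched below: @code{gensub}, a
+regexp value for @code{RS} together with @code{RT}, and a null @code{FS}.
+The sketch requires @code{gawk} itself; the function name
+@code{gawk30_demo} and the sample data are invented for this illustration.
+
```shell
# gawk30_demo: exercise gensub, regexp RS with RT, and a null FS.
gawk30_demo() {
    # gensub: swap the two parenthesized groups.
    gawk 'BEGIN { print gensub(/(a+)(b+)/, "\\2\\1", "g", "aabb") }'

    # Regexp RS: runs of digits separate records; RT holds the matched text.
    printf 'a1b22c' |
    gawk 'BEGIN { RS = "[0-9]+" } { printf "%s|%s\n", $0, RT }'

    # Null FS: every character becomes its own field.
    echo abc | gawk 'BEGIN { FS = "" } { print NF, $2 }'
}

if command -v gawk >/dev/null 2>&1
then
    gawk30_demo
fi
```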
+
+@node Gawk Summary, Installation, Language History, Top
+@appendix @code{gawk} Summary
+
+This appendix provides a brief summary of the @code{gawk} command line and the
+@code{awk} language. It is designed to serve as a ``quick reference.'' It is
+therefore terse, but complete.
+
+@menu
+* Command Line Summary:: Recapitulation of the command line.
+* Language Summary:: A terse review of the language.
+* Variables/Fields:: Variables, fields, and arrays.
+* Rules Summary:: Patterns and Actions, and their component
+ parts.
+* Actions Summary:: Quick overview of actions.
+* Functions Summary:: Defining and calling functions.
+* Historical Features:: Some undocumented but supported ``features''.
+@end menu
+
+@node Command Line Summary, Language Summary, Gawk Summary, Gawk Summary
+@appendixsec Command Line Options Summary
+
+The command line consists of options to @code{gawk} itself, the
+@code{awk} program text (if not supplied via the @samp{-f} option), and
+values to be made available in the @code{ARGC} and @code{ARGV}
+predefined @code{awk} variables:
+
+@example
+gawk @r{[@var{POSIX or GNU style options}]} -f @var{source-file} @r{[@code{--}]} @var{file} @dots{}
+gawk @r{[@var{POSIX or GNU style options}]} @r{[@code{--}]} '@var{program}' @var{file} @dots{}
+@end example
+
+The options that @code{gawk} accepts are:
+
+@table @code
+@item -F @var{fs}
+@itemx --field-separator @var{fs}
+Use @var{fs} for the input field separator (the value of the @code{FS}
+predefined variable).
+
+@item -f @var{program-file}
+@itemx --file @var{program-file}
+Read the @code{awk} program source from the file @var{program-file}, instead
+of from the first command line argument.
+
+@item -mf=@var{NNN}
+@itemx -mr=@var{NNN}
+The @samp{f} flag sets
+the maximum number of fields, and the @samp{r} flag sets the maximum
+record size. These options are ignored by @code{gawk}, since @code{gawk}
+has no predefined limits; they are only for compatibility with the
+Bell Labs research version of Unix @code{awk}.
+
+@item -v @var{var}=@var{val}
+@itemx --assign @var{var}=@var{val}
+Assign the variable @var{var} the value @var{val} before program execution
+begins.
+
+@item -W traditional
+@itemx -W compat
+@itemx --traditional
+@itemx --compat
+Use compatibility mode, in which @code{gawk} extensions are turned
+off.
+
+@item -W copyleft
+@itemx -W copyright
+@itemx --copyleft
+@itemx --copyright
+Print the short version of the General Public License on the error
+output. This option may disappear in a future version of @code{gawk}.
+
+@item -W help
+@itemx -W usage
+@itemx --help
+@itemx --usage
+Print a relatively short summary of the available options on the error output.
+
+@item -W lint
+@itemx --lint
+Give warnings about dubious or non-portable @code{awk} constructs.
+
+@item -W lint-old
+@itemx --lint-old
+Warn about constructs that are not available in
+the original Version 7 Unix version of @code{awk}.
+
+@item -W posix
+@itemx --posix
+Use POSIX compatibility mode, in which @code{gawk} extensions
+are turned off and additional restrictions apply.
+
+@item -W re-interval
+@itemx --re-interval
+Allow interval expressions
+(@pxref{Regexp Operators, , Regular Expression Operators}),
+in regexps.
+
+@item -W source=@var{program-text}
+@itemx --source @var{program-text}
+Use @var{program-text} as @code{awk} program source code. This option allows
+mixing command line source code with source code from files, and is
+particularly useful for mixing command line programs with library functions.
+
+@item -W version
+@itemx --version
+Print version information for this particular copy of @code{gawk} on the error
+output.
+
+@item --
+Signal the end of options. This is useful to allow further arguments to the
+@code{awk} program itself to start with a @samp{-}. This is mainly for
+consistency with POSIX argument parsing conventions.
+@end table
+
+Any other options are flagged as invalid, but are otherwise ignored.
+@xref{Options, ,Command Line Options}, for more details.
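+
+As a brief illustration of the most frequently used options, the following
+sketch combines @samp{-F} and @samp{-v}. The function name
+@code{list_field} and the sample data are invented for this illustration.
+
```shell
# list_field N: print field N of each colon-separated line on standard input.
list_field() {
    awk -F: -v n="$1" '{ print $n }'
}

printf 'root:x:0\nbin:x:1\n' | list_field 3
```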
+
+@node Language Summary, Variables/Fields, Command Line Summary, Gawk Summary
+@appendixsec Language Summary
+
+An @code{awk} program consists of a sequence of zero or more pattern-action
+statements and optional function definitions. One or the other of the
+pattern and action may be omitted.
+
+@example
+@var{pattern} @{ @var{action statements} @}
+@var{pattern}
+ @{ @var{action statements} @}
+
+function @var{name}(@var{parameter list}) @{ @var{action statements} @}
+@end example
+
+@code{gawk} first reads the program source from the
+@var{program-file}(s), if specified, or from the first non-option
+argument on the command line. The @samp{-f} option may be used multiple
+times on the command line. @code{gawk} reads the program text from all
+the @var{program-file} files, effectively concatenating them in the
+order they are specified. This is useful for building libraries of
+@code{awk} functions, without having to include them in each new
+@code{awk} program that uses them. To use a library function in a file
+from a program typed in on the command line, specify
+@samp{--source '@var{program}'}, and type your program in between the single
+quotes.
+@xref{Options, ,Command Line Options}.
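+
+The following sketch shows a library file and a main program combined with
+two @samp{-f} options. The file names @file{lib.awk} and @file{main.awk},
+the function @code{double}, and the temporary directory are all made up for
+this illustration.
+
```shell
# Create a tiny library and a main program in a scratch directory.
d=$(mktemp -d)
cat > "$d/lib.awk" <<'EOF'
function double(x) { return 2 * x }
EOF
cat > "$d/main.awk" <<'EOF'
BEGIN { print double(21) }
EOF

# The two files are read in order, as if they were one program.
result=$(awk -f "$d/lib.awk" -f "$d/main.awk")
echo "$result"
rm -rf "$d"
```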
+
+The environment variable @code{AWKPATH} specifies a search path to use
+when finding source files named with the @samp{-f} option. The default
+path, which is
+@samp{.:/usr/local/share/awk}@footnote{The path may use a directory
+other than @file{/usr/local/share/awk}, depending upon how @code{gawk}
+was built and installed.} is used if @code{AWKPATH} is not set.
+If a file name given to the @samp{-f} option contains a @samp{/} character,
+no path search is performed.
+@xref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}.
+
+@code{gawk} compiles the program into an internal form, and then proceeds to
+read each file named in the @code{ARGV} array.
+The initial values of @code{ARGV} come from the command line arguments.
+If there are no files named
+on the command line, @code{gawk} reads the standard input.
+
+If a ``file'' named on the command line has the form
+@samp{@var{var}=@var{val}}, it is treated as a variable assignment: the
+variable @var{var} is assigned the value @var{val}.
+If any of the files have a value that is the null string, that
+element in the list is skipped.
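+
+The following sketch shows a variable being reassigned between two data
+files. The file names and the variable @code{tag} are invented for this
+illustration.
+
```shell
# Create two one-line data files in a scratch directory.
d=$(mktemp -d)
echo one > "$d/a.txt"
echo two > "$d/b.txt"

# Each assignment takes effect when awk reaches it in the argument list.
result=$(awk '{ print tag, $0 }' tag=first "$d/a.txt" tag=second "$d/b.txt")
echo "$result"
rm -rf "$d"
```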
+
+For each record in the input, @code{gawk} tests to see if it matches any
+@var{pattern} in the @code{awk} program. For each pattern that the record
+matches, the associated @var{action} is executed.
+
+@node Variables/Fields, Rules Summary, Language Summary, Gawk Summary
+@appendixsec Variables and Fields
+
+@code{awk} variables are not declared; they come into existence when they are
+first used. Their values are either floating-point numbers or strings.
+@code{awk} also has one-dimensional arrays; multi-dimensional arrays
+may be simulated. There are several predefined variables that
+@code{awk} sets as a program runs; these are summarized below.
+
+@menu
+* Fields Summary:: Input field splitting.
+* Built-in Summary:: @code{awk}'s built-in variables.
+* Arrays Summary:: Using arrays.
+* Data Type Summary:: Values in @code{awk} are numbers or strings.
+@end menu
+
+@node Fields Summary, Built-in Summary, Variables/Fields, Variables/Fields
+@appendixsubsec Fields
+
+As each input line is read, @code{gawk} splits the line into
+@var{fields}, using the value of the @code{FS} variable as the field
+separator. If @code{FS} is a single character, fields are separated by
+that character. Otherwise, @code{FS} is expected to be a full regular
+expression. In the special case that @code{FS} is a single space,
+fields are separated by runs of spaces and/or tabs.
+If @code{FS} is the null string (@code{""}), then each individual
+character in the record becomes a separate field.
+Note that the value
+of @code{IGNORECASE} (@pxref{Case-sensitivity, ,Case-sensitivity in Matching})
+also affects how fields are split when @code{FS} is a regular expression.
+
+Each field in the input line may be referenced by its position, @code{$1},
+@code{$2}, and so on. @code{$0} is the whole line. The value of a field may
+be assigned to as well. Field numbers need not be constants:
+
+@example
+n = 5
+print $n
+@end example
+
+@noindent
+prints the fifth field in the input line. The variable @code{NF} is set to
+the total number of fields in the input line.
+
+References to non-existent fields (i.e.@: fields after @code{$NF}) return
+the null string. However, assigning to a non-existent field (e.g.,
+@code{$(NF+2) = 5}) increases the value of @code{NF}, creates any
+intervening fields with the null string as their value, and causes the
+value of @code{$0} to be recomputed, with the fields being separated by
+the value of @code{OFS}.
+@xref{Reading Files, ,Reading Input Files}.
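+
+The following sketch shows the effect of assigning to a field beyond
+@code{$NF}. The function name @code{extend_record} and the choice of
+@samp{-} for @code{OFS} are made up for this illustration.
+
```shell
# extend_record: assign to a field two past the end of each input record,
# then print the rebuilt record and the new field count.
extend_record() {
    awk 'BEGIN { OFS = "-" } { $(NF+2) = 5; print; print NF }'
}

echo 'a b c' | extend_record
```
+
+The empty fourth field appears in the output as two adjacent separators.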
+
+@node Built-in Summary, Arrays Summary, Fields Summary, Variables/Fields
+@appendixsubsec Built-in Variables
+
+@code{gawk}'s built-in variables are:
+
+@table @code
+@item ARGC
+The number of elements in @code{ARGV}. See below for what is actually
+included in @code{ARGV}.
+
+@item ARGIND
+The index in @code{ARGV} of the current file being processed.
+When @code{gawk} is processing the input data files,
+it is always true that @samp{FILENAME == ARGV[ARGIND]}.
+
+@item ARGV
+The array of command line arguments. The array is indexed from zero to
+@code{ARGC} @minus{} 1. Dynamically changing @code{ARGC} and
+the contents of @code{ARGV}
+can control the files used for data. A null-valued element in
+@code{ARGV} is ignored. @code{ARGV} does not include the options to
+@code{awk} or the text of the @code{awk} program itself.
+
+@item CONVFMT
+The conversion format to use when converting numbers to strings.
+
+@item FIELDWIDTHS
+A space-separated list of numbers giving the field widths of fixed-width input data.
+
+@item ENVIRON
+An array of environment variable values. The array
+is indexed by variable name, each element being the value of that
+variable. Thus, the environment variable @code{HOME} is
+@code{ENVIRON["HOME"]}. One possible value might be @file{/home/arnold}.
+
+Changing this array does not affect the environment seen by programs
+which @code{gawk} spawns via redirection or the @code{system} function.
+(This may change in a future version of @code{gawk}.)
+
+Some operating systems do not have environment variables.
+The @code{ENVIRON} array is empty when running on these systems.
+
+@item ERRNO
+The system error message when an error occurs using @code{getline}
+or @code{close}.
+
+@item FILENAME
+The name of the current input file. If no files are specified on the command
+line, the value of @code{FILENAME} is the null string.
+
+@item FNR
+The input record number in the current input file.
+
+@item FS
+The input field separator, a space by default.
+
+@item IGNORECASE
+The case-sensitivity flag for string comparisons and regular expression
+operations. If @code{IGNORECASE} has a non-zero value, then pattern
+matching in rules, record separating with @code{RS}, field splitting
+with @code{FS}, regular expression matching with @samp{~} and
+@samp{!~}, and the @code{gensub}, @code{gsub}, @code{index},
+@code{match}, @code{split} and @code{sub} built-in functions all
+ignore case when doing regular expression operations, and all string
+comparisons are done ignoring case.
+
+@item NF
+The number of fields in the current input record.
+
+@item NR
+The total number of input records seen so far.
+
+@item OFMT
+The output format for numbers for the @code{print} statement,
+@code{"%.6g"} by default.
+
+@item OFS
+The output field separator, a space by default.
+
+@item ORS
+The output record separator, by default a newline.
+
+@item RS
+The input record separator, by default a newline.
+If @code{RS} is set to the null string, then records are separated by
+blank lines. When @code{RS} is set to the null string, then the newline
+character always acts as a field separator, in addition to whatever value
+@code{FS} may have. If @code{RS} is set to a multi-character
+string, it denotes a regexp; input text matching the regexp
+separates records.
+
+@item RT
+The input text that matched the text denoted by @code{RS},
+the record separator.
+
+@item RSTART
+The index of the first character of the string last matched by @code{match}; zero if no match.
+
+@item RLENGTH
+The length of the string last matched by @code{match}; @minus{}1 if no match.
+
+@item SUBSEP
+The string used to separate multiple subscripts in array elements, by
+default @code{"\034"}.
+@end table
+
+@xref{Built-in Variables}, for more information.
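+
+As a small sketch of how @code{RS}, @code{NR}, and @code{NF} interact,
+the following command reads its input in ``paragraph mode.'' It assumes
+a POSIX-compatible @code{awk}; the input text and the shell variable
+@code{out} are invented for the illustration.
+

```shell
# With RS set to the null string, blank lines separate records and
# a newline inside a record also separates fields.
out=$(printf 'a b\nc\n\nd e\n' | awk 'BEGIN { RS = "" } { print NR ": " NF }')
echo "$out"
```

+
+The first record spans two input lines and has three fields; the second
+has two.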
+
+@node Arrays Summary, Data Type Summary, Built-in Summary, Variables/Fields
+@appendixsubsec Arrays
+
+Arrays are subscripted with an expression between square brackets
+(@samp{[} and @samp{]}). Array subscripts are @emph{always} strings;
+numbers are converted to strings as necessary, following the standard
+conversion rules
+(@pxref{Conversion, ,Conversion of Strings and Numbers}).
+
+If you use multiple expressions separated by commas inside the square
+brackets, then the array subscript is a string consisting of the
+concatenation of the individual subscript values, converted to strings,
+separated by the subscript separator (the value of @code{SUBSEP}).
+
+The special operator @code{in} may be used in a conditional context
+to see if an array has an index consisting of a particular value.
+
+@example
+if (val in array)
+ print array[val]
+@end example
+
+If the array has multiple subscripts, use @samp{(i, j, @dots{}) in @var{array}}
+to test for existence of an element.
+
+The @code{in} construct may also be used in a @code{for} loop to iterate
+over all the elements of an array.
+@xref{Scanning an Array, ,Scanning All Elements of an Array}.
+
+You can remove an element from an array using the @code{delete} statement.
+
+You can clear an entire array using @samp{delete @var{array}}.
+
+@xref{Arrays, ,Arrays in @code{awk}}.
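+
+The points above can be sketched in one short program. It is only an
+illustration, assuming a POSIX-compatible @code{awk}; the names
+@code{grid}, @code{key}, @code{n}, and @code{out} are hypothetical.
+

```shell
out=$(awk 'BEGIN {
    grid[1, 2] = "x"                        # stored under the key 1 SUBSEP 2
    if ((1, 2) in grid) print "found"       # membership test, two subscripts
    if (!((2, 1) in grid)) print "missing"  # testing does not create the element
    key = 1 SUBSEP 2
    print grid[key]                         # same element, key built by hand
    delete grid[1, 2]                       # remove that one element
    n = 0
    for (k in grid) n++                     # count what is left
    print n
}')
echo "$out"
```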
+
+@node Data Type Summary, , Arrays Summary, Variables/Fields
+@appendixsubsec Data Types
+
+The value of an @code{awk} expression is always either a number
+or a string.
+
+Some contexts (such as arithmetic operators) require numeric
+values. They convert strings to numbers by interpreting the text
+of the string as a number. If the string does not look like a
+number, it converts to zero.
+
+Other contexts (such as concatenation) require string values.
+They convert numbers to strings by effectively printing them
+with @code{sprintf}.
+@xref{Conversion, ,Conversion of Strings and Numbers}, for the details.
+
+To force conversion of a string value to a number, simply add zero
+to it. If the value you start with is already a number, this
+does not change it.
+
+To force conversion of a numeric value to a string, concatenate it with
+the null string.
+
+Comparisons are done numerically if both operands are numeric, or if
+one is numeric and the other is a numeric string. Otherwise one or
+both operands are converted to strings and a string comparison is
+performed. Fields, @code{getline} input, @code{FILENAME}, @code{ARGV}
+elements, @code{ENVIRON} elements and the elements of an array created
+by @code{split} are the only items that can be numeric strings. String
+constants, such as @code{"3.1415927"}, are not numeric strings; they are
+string constants. The full rules for comparisons are described in
+@ref{Typing and Comparison, ,Variable Typing and Comparison Expressions}.
+
+Uninitialized variables have the string value @code{""} (the null, or
+empty, string). In contexts where a number is required, this is
+equivalent to zero.
+
+@xref{Variables}, for more information on variable naming and initialization;
+@pxref{Conversion, ,Conversion of Strings and Numbers}, for more information
+on how variable values are interpreted.
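+
+The conversion and comparison rules can be tried out with a command
+such as the following. It is a sketch under the assumption of a
+POSIX-compatible @code{awk}; the variables @code{a}, @code{b},
+@code{c}, @code{unset}, and the shell variable @code{out} are chosen
+only for the example.
+

```shell
out=$(echo '10 9' | awk '{
    a = ($1 > $2) ? "yes" : "no"             # fields are numeric strings: 10 > 9
    b = ("10" > "9") ? "yes" : "no"          # string constants: "10" sorts before "9"
    c = (($1 "") > ($2 "")) ? "yes" : "no"   # concatenating "" forces a string comparison
    print a, b, c, "abc" + 0, unset + 0      # non-numeric and uninitialized values give 0
}')
echo "$out"
```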
+
+@node Rules Summary, Actions Summary, Variables/Fields, Gawk Summary
+@appendixsec Patterns
+
+@menu
+* Pattern Summary:: Quick overview of patterns.
+* Regexp Summary:: Quick overview of regular expressions.
+@end menu
+
+An @code{awk} program is mostly composed of rules, each consisting of a
+pattern followed by an action. The action is enclosed in @samp{@{} and
+@samp{@}}. Either the pattern may be missing, or the action may be
+missing, but not both. If the pattern is missing, the
+action is executed for every input record. A missing action is
+equivalent to @samp{@w{@{ print @}}}, which prints the entire line.
+
+@c These paragraphs repeated for both patterns and actions. I don't
+@c like this, but I also don't see any way around it. Update both copies
+@c if they need fixing.
+Comments begin with the @samp{#} character, and continue until the end of the
+line. Blank lines may be used to separate statements. Statements normally
+end with a newline; however, this is not the case for lines ending in a
+@samp{,}, @samp{@{}, @samp{?}, @samp{:}, @samp{&&}, or @samp{||}. Lines
+ending in @code{do} or @code{else} also have their statements automatically
+continued on the following line. In other cases, a line can be continued by
+ending it with a @samp{\}, in which case the newline is ignored.
+
+Multiple statements may be put on one line by separating each one with
+a @samp{;}.
+This applies to both the statements within the action part of a rule (the
+usual case), and to the rule statements.
+
+@xref{Comments, ,Comments in @code{awk} Programs}, for information on
+@code{awk}'s commenting convention;
+@pxref{Statements/Lines, ,@code{awk} Statements Versus Lines}, for a
+description of the line continuation mechanism in @code{awk}.
+
+@node Pattern Summary, Regexp Summary, Rules Summary, Rules Summary
+@appendixsubsec Pattern Summary
+
+@code{awk} patterns may be one of the following:
+
+@example
+/@var{regular expression}/
+@var{relational expression}
+@var{pattern} && @var{pattern}
+@var{pattern} || @var{pattern}
+@var{pattern} ? @var{pattern} : @var{pattern}
+(@var{pattern})
+! @var{pattern}
+@var{pattern1}, @var{pattern2}
+BEGIN
+END
+@end example
+
+@code{BEGIN} and @code{END} are two special kinds of patterns that are not
+tested against the input. The action parts of all @code{BEGIN} rules are
+concatenated as if all the statements had been written in a single @code{BEGIN}
+rule. They are executed before any of the input is read. Similarly, all the
+@code{END} rules are concatenated, and executed when all the input is exhausted (or
+when an @code{exit} statement is executed). @code{BEGIN} and @code{END}
+patterns cannot be combined with other patterns in pattern expressions.
+@code{BEGIN} and @code{END} rules cannot have missing action parts.
+
+For @code{/@var{regular-expression}/} patterns, the associated statement is
+executed for each input record that matches the regular expression. Regular
+expressions are summarized below.
+
+A @var{relational expression} may use any of the operators defined below in
+the section on actions. These generally test whether certain fields match
+certain regular expressions.
+
+The @samp{&&}, @samp{||}, and @samp{!} operators are logical ``and,''
+logical ``or,'' and logical ``not,'' respectively, as in C. They do
+short-circuit evaluation, also as in C, and are used for combining more
+primitive pattern expressions. As in most languages, parentheses may be
+used to change the order of evaluation.
+
+The @samp{?:} operator is like the same operator in C. If the first
+pattern matches, then the second pattern is matched against the input
+record; otherwise, the third is matched. Only one of the second and
+third patterns is matched.
+
+The @samp{@var{pattern1}, @var{pattern2}} form of a pattern is called a
+range pattern. It matches all input lines starting with a line that
+matches @var{pattern1}, and continuing until a line that matches
+@var{pattern2}, inclusive. A range pattern cannot be used as an operand
+of any of the pattern operators.
+
+@xref{Pattern Overview, ,Pattern Elements}.
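+
+Several of the pattern forms can be seen together in a short program.
+This is a sketch only, assuming a POSIX-compatible @code{awk}; the
+input lines and the shell variable @code{out} are invented for the
+illustration.
+

```shell
out=$(printf 'one\nstart\ntwo\nstop\nthree\n' | awk '
BEGIN            { print "begin" }
/start/, /stop/                    # range pattern, no action: print each record
NR == 5 && /thr/ { print "last: " $0 }
END              { print "end, " NR " records" }
')
echo "$out"
```

+
+The range rule prints the three records from @samp{start} through
+@samp{stop}, inclusive; the first and last input lines are outside the
+range.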
+
+@node Regexp Summary, , Pattern Summary, Rules Summary
+@appendixsubsec Regular Expressions
+
+Regular expressions are based on POSIX EREs (extended regular expressions).
+The escape sequences allowed in string constants are also valid in
+regular expressions (@pxref{Escape Sequences}).
+Regexps are composed of characters as follows:
+
+@table @code
+@item @var{c}
+matches the character @var{c} (assuming @var{c} is none of the characters
+listed below).
+
+@item \@var{c}
+matches the literal character @var{c}.
+
+@item .
+matches any character, @emph{including} newline.
+In strict POSIX mode, @samp{.} does not match the @sc{nul}
+character, which is a character with all bits equal to zero.
+
+@item ^
+matches the beginning of a string.
+
+@item $
+matches the end of a string.
+
+@item [@var{abc}@dots{}]
+matches any of the characters @var{abc}@dots{} (character list).
+
+@item [[:@var{class}:]]
+matches any character in the character class @var{class}. Allowable classes
+are @code{alnum}, @code{alpha}, @code{blank}, @code{cntrl},
+@code{digit}, @code{graph}, @code{lower}, @code{print}, @code{punct},
+@code{space}, @code{upper}, and @code{xdigit}.
+
+@item [[.@var{symbol}.]]
+matches the multi-character collating symbol @var{symbol}.
+@code{gawk} does not currently support collating symbols.
+
+@item [[=@var{chars}=]]
+matches any of the equivalent characters in @var{chars}.
+@code{gawk} does not currently support equivalence classes.
+
+@item [^@var{abc}@dots{}]
+matches any character except @var{abc}@dots{} and newline (negated
+character list).
+
+@item @var{r1}|@var{r2}
+matches either @var{r1} or @var{r2} (alternation).
+
+@item @var{r1r2}
+matches @var{r1}, and then @var{r2} (concatenation).
+
+@item @var{r}+
+matches one or more @var{r}'s.
+
+@item @var{r}*
+matches zero or more @var{r}'s.
+
+@item @var{r}?
+matches zero or one @var{r}'s.
+
+@item (@var{r})
+matches @var{r} (grouping).
+
+@item @var{r}@{@var{n}@}
+@itemx @var{r}@{@var{n},@}
+@itemx @var{r}@{@var{n},@var{m}@}
+matches exactly @var{n}, at least @var{n}, or from @var{n} to @var{m}
+occurrences of @var{r} (interval expressions).
+
+@item \y
+matches the empty string at either the beginning or the
+end of a word.
+
+@item \B
+matches the empty string within a word.
+
+@item \<
+matches the empty string at the beginning of a word.
+
+@item \>
+matches the empty string at the end of a word.
+
+@item \w
+matches any word-constituent character (alphanumeric characters and
+the underscore).
+
+@item \W
+matches any character that is not word-constituent.
+
+@item \`
+matches the empty string at the beginning of a buffer (same as a string
+in @code{gawk}).
+
+@item \'
+matches the empty string at the end of a buffer.
+@end table
+
+The various command line options
+control how @code{gawk} interprets characters in regexps.
+
+@c NOTE!!! Keep this in sync with the same table in the regexp chapter!
+@table @asis
+@item No options
+In the default case, @code{gawk} provides all the facilities of
+POSIX regexps and the GNU regexp operators described above.
+However, interval expressions are not supported.
+
+@item @code{--posix}
+Only POSIX regexps are supported; the GNU operators are not special
+(e.g., @samp{\w} matches a literal @samp{w}). Interval expressions
+are allowed.
+
+@item @code{--traditional}
+Traditional Unix @code{awk} regexps are matched. The GNU operators
+are not special, interval expressions are not available, and neither
+are the POSIX character classes (@code{[[:alnum:]]} and so on).
+Characters described by octal and hexadecimal escape sequences are
+treated literally, even if they represent regexp metacharacters.
+
+@item @code{--re-interval}
+Allow interval expressions in regexps, even if @samp{--traditional}
+has been provided.
+@end table
+
+@xref{Regexp, ,Regular Expressions}.
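+
+As a brief sketch, the command below combines anchors, grouping,
+alternation, and the @samp{?} operator. It deliberately avoids interval
+expressions and the GNU operators, since their availability depends on
+the options described above. A POSIX-compatible @code{awk} is assumed;
+the input words and the shell variable @code{out} are made up.
+

```shell
# Print records that are exactly "cat" or "dog", optionally followed by "s".
out=$(printf 'cat\ncats\nconcat\ndog\n' | awk '/^(cat|dog)s?$/')
echo "$out"
```

+
+The record @samp{concat} is rejected because @samp{^} anchors the match
+at the beginning of the string.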
+
+@node Actions Summary, Functions Summary, Rules Summary, Gawk Summary
+@appendixsec Actions
+
+Action statements are enclosed in braces, @samp{@{} and @samp{@}}.
+A missing action statement is equivalent to @samp{@w{@{ print @}}}.
+
+Action statements consist of the usual assignment, conditional, and looping
+statements found in most languages. The operators, control statements,
+and Input/Output statements available are similar to those in C.
+
+@c These paragraphs repeated for both patterns and actions. I don't
+@c like this, but I also don't see any way around it. Update both copies
+@c if they need fixing.
+Comments begin with the @samp{#} character, and continue until the end of the
+line. Blank lines may be used to separate statements. Statements normally
+end with a newline; however, this is not the case for lines ending in a
+@samp{,}, @samp{@{}, @samp{?}, @samp{:}, @samp{&&}, or @samp{||}. Lines
+ending in @code{do} or @code{else} also have their statements automatically
+continued on the following line. In other cases, a line can be continued by
+ending it with a @samp{\}, in which case the newline is ignored.
+
+Multiple statements may be put on one line by separating each one with
+a @samp{;}.
+This applies to both the statements within the action part of a rule (the
+usual case), and to the rule statements.
+
+@xref{Comments, ,Comments in @code{awk} Programs}, for information on
+@code{awk}'s commenting convention;
+@pxref{Statements/Lines, ,@code{awk} Statements Versus Lines}, for a
+description of the line continuation mechanism in @code{awk}.
+
+@menu
+* Operator Summary:: @code{awk} operators.
+* Control Flow Summary:: The control statements.
+* I/O Summary:: The I/O statements.
+* Printf Summary:: A summary of @code{printf}.
+* Special File Summary:: Special file names interpreted internally.
+* Built-in Functions Summary:: Built-in numeric and string functions.
+* Time Functions Summary:: Built-in time functions.
+* String Constants Summary:: Escape sequences in strings.
+@end menu
+
+@node Operator Summary, Control Flow Summary, Actions Summary, Actions Summary
+@appendixsubsec Operators
+
+The operators in @code{awk}, in order of decreasing precedence, are:
+
+@table @code
+@item (@dots{})
+Grouping.
+
+@item $
+Field reference.
+
+@item ++ --
+Increment and decrement, both prefix and postfix.
+
+@item ^
+Exponentiation (@samp{**} may also be used, and @samp{**=} for the assignment
+operator, but they are not specified in the POSIX standard).
+
+@item + - !
+Unary plus, unary minus, and logical negation.
+
+@item * / %
+Multiplication, division, and modulus.
+
+@item + -
+Addition and subtraction.
+
+@item @var{space}
+String concatenation.
+
+@item < <= > >= != ==
+The usual relational operators.
+
+@item ~ !~
+Regular expression match, negated match.
+
+@item in
+Array membership.
+
+@item &&
+Logical ``and''.
+
+@item ||
+Logical ``or''.
+
+@item ?:
+A conditional expression. This has the form @samp{@var{expr1} ?
+@var{expr2} : @var{expr3}}. If @var{expr1} is true, the value of the
+expression is @var{expr2}; otherwise it is @var{expr3}. Only one of
+@var{expr2} and @var{expr3} is evaluated.
+
+@item = += -= *= /= %= ^=
+Assignment. Both absolute assignment (@code{@var{var}=@var{value}})
+and operator assignment (the other forms) are supported.
+@end table
+
+@xref{Expressions}.
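+
+A few of the precedence rules can be checked with a sketch like the
+following, which assumes a POSIX-compatible @code{awk}. The variable
+@code{x} and the shell variable @code{out} are used only for the
+illustration.
+

```shell
out=$(awk 'BEGIN {
    print 1 + 2 " " 3 * 4             # arithmetic binds tighter than concatenation
    x = 5
    x ^= 2                            # operator assignment: x is now 25
    print x
    print (x > 20 ? "big" : "small")  # conditional expression
    print 2 + 3 * 4 ^ 2               # exponentiation, then *, then +
}')
echo "$out"
```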
+
+@node Control Flow Summary, I/O Summary, Operator Summary, Actions Summary
+@appendixsubsec Control Statements
+
+The control statements are as follows:
+
+@example
+if (@var{condition}) @var{statement} @r{[} else @var{statement} @r{]}
+while (@var{condition}) @var{statement}
+do @var{statement} while (@var{condition})
+for (@var{expr1}; @var{expr2}; @var{expr3}) @var{statement}
+for (@var{var} in @var{array}) @var{statement}
+break
+continue
+delete @var{array}[@var{index}]
+delete @var{array}
+exit @r{[} @var{expression} @r{]}
+@{ @var{statements} @}
+@end example
+
+@xref{Statements, ,Control Statements in Actions}.
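+
+The following sketch exercises @code{do}, @code{if}, @code{continue},
+and @code{break} in one loop. It assumes a POSIX-compatible
+@code{awk}; the counter @code{i} and the shell variable @code{out} are
+hypothetical names.
+

```shell
out=$(awk 'BEGIN {
    i = 0
    do {
        i++
        if (i == 2) continue    # skip the print for 2, go on to the loop test
        if (i > 4) break        # leave the loop entirely
        print i
    } while (1)
}')
echo "$out"
```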
+
+@node I/O Summary, Printf Summary, Control Flow Summary, Actions Summary
+@appendixsubsec I/O Statements
+
+The Input/Output statements are as follows:
+
+@table @code
+@item getline
+Set @code{$0} from next input record; set @code{NF}, @code{NR}, @code{FNR}.
+@xref{Getline, ,Explicit Input with @code{getline}}.
+
+@item getline <@var{file}
+Set @code{$0} from next record of @var{file}; set @code{NF}.
+
+@item getline @var{var}
+Set @var{var} from next input record; set @code{NR}, @code{FNR}.
+
+@item getline @var{var} <@var{file}
+Set @var{var} from next record of @var{file}.
+
+@item @var{command} | getline
+Run @var{command}, piping its output into @code{getline}; sets @code{$0},
+@code{NF}, @code{NR}.
+
+@item @var{command} | getline @var{var}
+Run @var{command}, piping its output into @code{getline}; sets @var{var}.
+
+@item next
+Stop processing the current input record. The next input record is read and
+processing starts over with the first pattern in the @code{awk} program.
+If the end of the input data is reached, the @code{END} rule(s), if any,
+are executed.
+@xref{Next Statement, ,The @code{next} Statement}.
+
+@item nextfile
+Stop processing the current input file. The next input record read comes
+from the next input file. @code{FILENAME} is updated, @code{FNR} is set to one,
+@code{ARGIND} is incremented,
+and processing starts over with the first pattern in the @code{awk} program.
+If the end of the input data is reached, the @code{END} rule(s), if any,
+are executed.
+Earlier versions of @code{gawk} used @samp{next file}; this usage is still
+supported, but is considered to be deprecated.
+@xref{Nextfile Statement, ,The @code{nextfile} Statement}.
+
+@item print
+Prints the current record.
+@xref{Printing, ,Printing Output}.
+
+@item print @var{expr-list}
+Prints expressions.
+
+@item print @var{expr-list} > @var{file}
+Prints expressions to @var{file}. If @var{file} does not exist, it is
+created. If it does exist, its contents are deleted the first time the
+@code{print} is executed.
+
+@item print @var{expr-list} >> @var{file}
+Prints expressions to @var{file}. The previous contents of @var{file}
+are retained, and the output of @code{print} is appended to the file.
+
+@item print @var{expr-list} | @var{command}
+Prints expressions, sending the output down a pipe to @var{command}.
+The pipeline to the command stays open until the @code{close} function
+is called.
+
+@item printf @var{fmt, expr-list}
+Format and print.
+
+@item printf @var{fmt, expr-list} > @var{file}
+Format and print to @var{file}. If @var{file} does not exist, it is
+created. If it does exist, its contents are deleted the first time the
+@code{printf} is executed.
+
+@item printf @var{fmt, expr-list} >> @var{file}
+Format and print to @var{file}. The previous contents of @var{file}
+are retained, and the output of @code{printf} is appended to the file.
+
+@item printf @var{fmt, expr-list} | @var{command}
+Format and print, sending the output down a pipe to @var{command}.
+The pipeline to the command stays open until the @code{close} function
+is called.
+@end table
+
+@code{getline} returns one on success, zero on end of file, and @minus{}1 on an error.
+In the event of an error, @code{getline} will set @code{ERRNO} to
+the value of a system-dependent string that describes the error.
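+
+A common way to use these return values is to loop while
+@code{getline} returns a positive result. The sketch below assumes a
+POSIX-compatible @code{awk} and a shell that provides @code{echo}; the
+command string, the variable names, and the path
+@file{/nonexistent/gawk-demo-input} (assumed not to exist) are invented
+for the example.
+

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)   # positive while records keep arriving
        n++
    close(cmd)                         # close the pipe once the loop is done
    print n, line                      # line still holds the last record read
    r = (getline x < "/nonexistent/gawk-demo-input")
    print r                            # -1: the file could not be opened
}')
echo "$out"
```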
+
+@node Printf Summary, Special File Summary, I/O Summary, Actions Summary
+@appendixsubsec @code{printf} Summary
+
+Conversion specifications have the form
+@code{%}[@var{flag}][@var{width}][@code{.}@var{prec}]@var{format}.
+@c whew!
+Items in brackets are optional.
+
+The @code{awk} @code{printf} statement and @code{sprintf} function
+accept the following conversion specification formats:
+
+@table @code
+@item %c
+An ASCII character. If the argument used for @samp{%c} is numeric, it is
+treated as a character and printed. Otherwise, the argument is assumed to
+be a string, and only the first character of that string is printed.
+
+@item %d
+@itemx %i
+A decimal number (the integer part).
+
+@item %e
+@itemx %E
+A floating point number of the form
+@samp{@r{[}-@r{]}d.dddddde@r{[}+-@r{]}dd}.
+The @samp{%E} format uses @samp{E} instead of @samp{e}.
+
+@item %f
+A floating point number of the form
+@r{[}@code{-}@r{]}@code{ddd.dddddd}.
+
+@item %g
+@itemx %G
+Use either the @samp{%e} or @samp{%f} formats, whichever produces a shorter
+string, with non-significant zeros suppressed.
+@samp{%G} will use @samp{%E} instead of @samp{%e}.
+
+@item %o
+An unsigned octal number (again, an integer).
+
+@item %s
+A character string.
+
+@item %x
+@itemx %X
+An unsigned hexadecimal number (an integer).
+The @samp{%X} format uses @samp{A} through @samp{F} instead of
+@samp{a} through @samp{f} for decimal 10 through 15.
+
+@item %%
+A single @samp{%} character; no argument is converted.
+@end table
+
+There are optional, additional parameters that may lie between the @samp{%}
+and the control letter:
+
+@table @code
+@item -
+The expression should be left-justified within its field.
+
+@item @var{space}
+For numeric conversions, prefix positive values with a space, and
+negative values with a minus sign.
+
+@item +
+The plus sign, used before the width modifier (see below),
+says to always supply a sign for numeric conversions, even if the data
+to be formatted is positive. The @samp{+} overrides the space modifier.
+
+@item #
+Use an ``alternate form'' for certain control letters.
+For @samp{o}, supply a leading zero.
+For @samp{x}, and @samp{X}, supply a leading @samp{0x} or @samp{0X} for
+a non-zero result.
+For @samp{e}, @samp{E}, and @samp{f}, the result will always contain a
+decimal point.
+For @samp{g}, and @samp{G}, trailing zeros are not removed from the result.
+
+@item 0
+A leading @samp{0} (zero) acts as a flag that indicates output should be
+padded with zeros instead of spaces.
+This applies even to non-numeric output formats.
+This flag only has an effect when the field width is wider than the
+value to be printed.
+
+@item @var{width}
+The field should be padded to this width. The field is normally padded
+with spaces. If the @samp{0} flag has been used, it is padded with zeros.
+
+@item .@var{prec}
+A number that specifies the precision to use when printing.
+For the @samp{e}, @samp{E}, and @samp{f} formats, this specifies the
+number of digits you want printed to the right of the decimal point.
+For the @samp{g}, and @samp{G} formats, it specifies the maximum number
+of significant digits. For the @samp{d}, @samp{o}, @samp{i}, @samp{u},
+@samp{x}, and @samp{X} formats, it specifies the minimum number of
+digits to print. For the @samp{s} format, it specifies the maximum number of
+characters from the string that should be printed.
+@end table
+
+Either or both of the @var{width} and @var{prec} values may be specified
+as @samp{*}. In that case, the particular value is taken from the argument
+list.
+
+@xref{Printf, ,Using @code{printf} Statements for Fancier Printing}.
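+
+The flags, width, and precision can be compared side by side with a
+sketch such as this one. It assumes a POSIX-compatible @code{awk} on an
+ASCII-based system; the values printed and the shell variable
+@code{out} are chosen only for illustration, and the square brackets
+are there to make the padding visible.
+

```shell
out=$(awk 'BEGIN {
    printf "[%5d]\n", 42           # right-justified in a width of 5
    printf "[%-5d]\n", 42          # left-justified
    printf "[%05d]\n", 42          # padded with zeros
    printf "[%+d]\n", 42           # sign always supplied
    printf "[%.2f]\n", 3.14159     # two digits after the decimal point
    printf "[%.3s]\n", "abcdef"    # at most three characters of the string
    printf "[%c%c]\n", 65, "xyz"   # number as a character, then first character of a string
    printf "[%x %X %o]\n", 255, 255, 8
}')
echo "$out"
```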
+
+@node Special File Summary, Built-in Functions Summary, Printf Summary, Actions Summary
+@appendixsubsec Special File Names
+
+When doing I/O redirection from either @code{print} or @code{printf} into a
+file, or via @code{getline} from a file, @code{gawk} recognizes certain special
+file names internally. These file names allow access to open file descriptors
+inherited from @code{gawk}'s parent process (usually the shell). The
+file names are:
+
+@table @file
+@item /dev/stdin
+The standard input.
+
+@item /dev/stdout
+The standard output.
+
+@item /dev/stderr
+The standard error output.
+
+@item /dev/fd/@var{n}
+The file denoted by the open file descriptor @var{n}.
+@end table
+
+In addition, reading the following files provides process-related information
+about the running @code{gawk} program. All returned records are terminated
+with a newline.
+
+@table @file
+@item /dev/pid
+Returns the process ID of the current process.
+
+@item /dev/ppid
+Returns the parent process ID of the current process.
+
+@item /dev/pgrpid
+Returns the process group ID of the current process.
+
+@item /dev/user
+At least four space-separated fields, containing the return values of
+the @code{getuid}, @code{geteuid}, @code{getgid}, and @code{getegid}
+system calls.
+If there are any additional fields, they are the group IDs returned by
+the @code{getgroups} system call.
+(Multiple groups may not be supported on all systems.)
+@end table
+
+@noindent
+These file names may also be used on the command line to name data files.
+These file names are only recognized internally if you do not
+actually have files with these names on your system.
+
+@xref{Special Files, ,Special File Names in @code{gawk}}, for a longer description that
+provides the motivation for this feature.
+
+@node Built-in Functions Summary, Time Functions Summary, Special File Summary, Actions Summary
+@appendixsubsec Built-in Functions
+
+@code{awk} provides a number of built-in functions for performing
+numeric operations, string-related operations, and I/O-related operations.
+
+The built-in arithmetic functions are:
+
+@table @code
+@item atan2(@var{y}, @var{x})
+the arctangent of @var{y/x} in radians.
+
+@item cos(@var{expr})
+the cosine of @var{expr}, which is in radians.
+
+@item exp(@var{expr})
+the exponential function (@code{e ^ @var{expr}}).
+
+@item int(@var{expr})
+truncates to integer.
+
+@item log(@var{expr})
+the natural logarithm of @var{expr}.
+
+@item rand()
+a random number between zero and one.
+
+@item sin(@var{expr})
+the sine of @var{expr}, which is in radians.
+
+@item sqrt(@var{expr})
+the square root function.
+
+@item srand(@r{[}@var{expr}@r{]})
+use @var{expr} as a new seed for the random number generator. If no @var{expr}
+is provided, the time of day is used. The return value is the previous
+seed for the random number generator.
+@end table
+
+@iftex
+@page
+@end iftex
+@code{awk} has the following built-in string functions:
+
+@table @code
+@item gensub(@var{regex}, @var{subst}, @var{how} @r{[}, @var{target}@r{]})
+If @var{how} is a string beginning with @samp{g} or @samp{G}, then
+replace each match of @var{regex} in @var{target} with @var{subst}.
+Otherwise, replace the @var{how}'th occurrence. If @var{target} is not
+supplied, use @code{$0}. The return value is the changed string; the
+original @var{target} is not modified. Within @var{subst},
+@samp{\@var{n}}, where @var{n} is a digit from one to nine, can be used to
+indicate the text that matched the @var{n}'th parenthesized
+subexpression.
+
+@item gsub(@var{regex}, @var{subst} @r{[}, @var{target}@r{]})
+for each substring matching the regular expression @var{regex} in the string
+@var{target}, substitute the string @var{subst}, and return the number of
+substitutions. If @var{target} is not supplied, use @code{$0}.
+
+@item index(@var{str}, @var{search})
+returns the index of the string @var{search} in the string @var{str}, or
+zero if
+@var{search} is not present.
+
+@item length(@r{[}@var{str}@r{]})
+returns the length of the string @var{str}. The length of @code{$0}
+is returned if no argument is supplied.
+
+@item match(@var{str}, @var{regex})
+returns the position in @var{str} where the regular expression @var{regex}
+occurs, or zero if @var{regex} is not present, and sets the values of
+@code{RSTART} and @code{RLENGTH}.
+
+@item split(@var{str}, @var{arr} @r{[}, @var{regex}@r{]})
+splits the string @var{str} into the array @var{arr} on the regular expression
+@var{regex}, and returns the number of elements. If @var{regex} is omitted,
+@code{FS} is used instead. @var{regex} can be the null string, causing
+each character to be placed into its own array element.
+The array @var{arr} is cleared first.
+
+@item sprintf(@var{fmt}, @var{expr-list})
+prints @var{expr-list} according to @var{fmt}, and returns the resulting string.
+
+@item sub(@var{regex}, @var{subst} @r{[}, @var{target}@r{]})
+just like @code{gsub}, but only the first matching substring is replaced.
+
+@item substr(@var{str}, @var{index} @r{[}, @var{len}@r{]})
+returns the @var{len}-character substring of @var{str} starting at @var{index}.
+If @var{len} is omitted, the rest of @var{str} is used.
+
+@item tolower(@var{str})
+returns a copy of the string @var{str}, with all the upper-case characters in
+@var{str} translated to their corresponding lower-case counterparts.
+Non-alphabetic characters are left unchanged.
+
+@item toupper(@var{str})
+returns a copy of the string @var{str}, with all the lower-case characters in
+@var{str} translated to their corresponding upper-case counterparts.
+Non-alphabetic characters are left unchanged.
+@end table
+
+The I/O related functions are:
+
+@table @code
+@item close(@var{expr})
+Close the open file or pipe denoted by @var{expr}.
+
+@item fflush(@r{[}@var{expr}@r{]})
+Flush any buffered output for the output file or pipe denoted by @var{expr}.
+If @var{expr} is omitted, standard output is flushed.
+If @var{expr} is the null string (@code{""}), all output buffers are flushed.
+
+@item system(@var{cmd-line})
+Execute the command @var{cmd-line}, and return the exit status.
+If your operating system does not support @code{system}, calling it will
+generate a fatal error.
+
+@samp{system("")} can be used to force @code{awk} to flush any pending
+output. This is more portable, but less obvious, than calling @code{fflush}.
+@end table
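+
+The sketch below exercises several of the string functions and one of
+the numeric ones. It sticks to functions that POSIX @code{awk} also
+provides, so it assumes only a POSIX-compatible @code{awk}; the sample
+strings and the names @code{s}, @code{n}, @code{k}, @code{parts}, and
+@code{out} are hypothetical.
+

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "wor")               # position of the substring
    print substr(s, 1, 5)               # five characters starting at position 1
    print toupper(substr(s, 7))         # rest of the string, in upper case
    n = gsub(/o/, "0", s)               # changes s in place, returns the count
    print n, s
    if (match(s, /w[0-9a-z]+/))         # sets RSTART and RLENGTH
        print RSTART, RLENGTH
    k = split("a:b:c", parts, ":")      # returns the number of elements
    print k, parts[3]
    print length(s), int(-3.7)          # int truncates toward zero
}')
echo "$out"
```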
+
+@node Time Functions Summary, String Constants Summary, Built-in Functions Summary, Actions Summary
+@appendixsubsec Time Functions
+
+The following two functions are available for getting the current
+time of day, and for formatting time stamps.
+
+@table @code
+@item systime()
+returns the current time of day as the number of seconds since a particular
+epoch (Midnight, January 1, 1970 UTC, on POSIX systems).
+
+@item strftime(@r{[}@var{format}@r{[}, @var{timestamp}@r{]]})
+formats @var{timestamp} according to the specification in @var{format}.
+The current time of day is used if no @var{timestamp} is supplied.
+A default format equivalent to the output of the @code{date} utility is used if
+no @var{format} is supplied.
+@xref{Time Functions, ,Functions for Dealing with Time Stamps}, for the
+details on the conversion specifiers that @code{strftime} accepts.
+@end table
+
+@iftex
+@xref{Built-in, ,Built-in Functions}, for a description of all of
+@code{awk}'s built-in functions.
+@end iftex
+
+@node String Constants Summary, , Time Functions Summary, Actions Summary
+@appendixsubsec String Constants
+
+String constants in @code{awk} are sequences of characters enclosed
+in double quotes (@code{"}). Within strings, certain @dfn{escape sequences}
+are recognized, as in C. These are:
+
+@table @code
+@item \\
+A literal backslash.
+
+@item \a
+The ``alert'' character; usually the ASCII BEL character.
+
+@item \b
+Backspace.
+
+@item \f
+Formfeed.
+
+@item \n
+Newline.
+
+@item \r
+Carriage return.
+
+@item \t
+Horizontal tab.
+
+@item \v
+Vertical tab.
+
+@item \x@var{hex digits}
+The character represented by the string of hexadecimal digits following
+the @samp{\x}. As in ANSI C, all following hexadecimal digits are
+considered part of the escape sequence. E.g., @code{"\x1B"} is a
+string containing the ASCII ESC (escape) character. (The @samp{\x}
+escape sequence is not in POSIX @code{awk}.)
+
+@item \@var{ddd}
+The character represented by the one, two, or three digit sequence of octal
+digits. Thus, @code{"\033"} is also a string containing the ASCII ESC
+(escape) character.
+
+@item \@var{c}
+The literal character @var{c}, if @var{c} is not one of the above.
+@end table
+
+The escape sequences may also be used inside constant regular expressions
+(e.g., the regexp @code{@w{/[@ \t\f\n\r\v]/}} matches whitespace
+characters).
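+
+As a quick check of the hexadecimal and octal forms described above
+(a sketch that assumes @code{gawk} is not running in POSIX mode, since
+@samp{\x} is not in POSIX @code{awk}), the following compares the two
+spellings of the ASCII ESC character:
+
+@example
+$ gawk 'BEGIN @{ print ("\x1B" == "\033") @}'
+@print{} 1
+@end example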
+
+@xref{Escape Sequences}.
+
+@node Functions Summary, Historical Features, Actions Summary, Gawk Summary
+@appendixsec User-defined Functions
+
+Functions in @code{awk} are defined as follows:
+
+@example
+function @var{name}(@var{parameter list}) @{ @var{statements} @}
+@end example
+
+Actual parameters supplied in the function call are used to instantiate
+the formal parameters declared in the function. Arrays are passed by
+reference; other variables are passed by value.
+
+If there are fewer arguments passed than there are names in @var{parameter list},
+the extra names are given the null string as their value. Extra names have the
+effect of local variables.
+
+The open-parenthesis in a function call of a user-defined function must
+immediately follow the function name, without any intervening white space.
+This is to avoid a syntactic ambiguity with the concatenation operator.
+
+The word @code{func} may be used in place of @code{function} (but not in
+POSIX @code{awk}).
+
+Use the @code{return} statement to return a value from a function.
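+
+The following sketch pulls these rules together; the function name
+@code{sum} and its parameter names are invented for illustration.
+The array @code{arr} is passed by reference, while the extra names
+@code{i} and @code{total} receive no arguments in the call and so
+act as local variables:
+
+@example
+$ awk 'function sum(arr, n,    i, total) @{
+>          for (i = 1; i <= n; i++)
+>              total += arr[i]
+>          return total
+>      @}
+>      BEGIN @{ a[1] = 3; a[2] = 4; print sum(a, 2) @}'
+@print{} 7
+@end example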
+
+@xref{User-defined, ,User-defined Functions}.
+
+@node Historical Features, , Functions Summary, Gawk Summary
+@appendixsec Historical Features
+
+@cindex historical features
+There are two features of historical @code{awk} implementations that
+@code{gawk} supports.
+
+First, it is possible to call the @code{length} built-in function not only
+with no arguments, but even without parentheses!
+
+@example
+a = length
+@end example
+
+@noindent
+is the same as either of
+
+@example
+a = length()
+a = length($0)
+@end example
+
+@noindent
+For example:
+
+@example
+$ echo abcdef | awk '@{ print length @}'
+@print{} 6
+@end example
+
+@noindent
+This feature is marked as ``deprecated'' in the POSIX standard, and
+@code{gawk} will issue a warning about its use if @samp{--lint} is
+specified on the command line.
+(The ability to use @code{length} this way was actually an accident of the
+original Unix @code{awk} implementation. If any built-in function used
+@code{$0} as its default argument, it was possible to call that function
+without the parentheses. In particular, it was common practice to use
+the @code{length} function in this fashion, and this usage was documented
+in the @code{awk} manual page.)
+
+The other historical feature is the use of either the @code{break} statement,
+or the @code{continue} statement
+outside the body of a @code{while}, @code{for}, or @code{do} loop. Traditional
+@code{awk} implementations have treated such usage as equivalent to the
+@code{next} statement. More recent versions of Unix @code{awk} do not allow
+it. @code{gawk} supports this usage if @samp{--traditional} has been
+specified.
+
+@xref{Options, ,Command Line Options}, for more information about the
+@samp{--lint} and @samp{--traditional} options.
+
+@node Installation, Notes, Gawk Summary, Top
+@appendix Installing @code{gawk}
+
+This appendix provides instructions for installing @code{gawk} on the
+various platforms that are supported by the developers. The primary
+developers support Unix (and one day, GNU), while the other ports were
+contributed. The file @file{ACKNOWLEDGMENT} in the @code{gawk}
+distribution lists the electronic mail addresses of the people who did
+the respective ports, and they are also provided in
+@ref{Bugs, , Reporting Problems and Bugs}.
+
+@menu
+* Gawk Distribution:: What is in the @code{gawk} distribution.
+* Unix Installation:: Installing @code{gawk} under various versions
+ of Unix.
+* VMS Installation:: Installing @code{gawk} on VMS.
+* PC Installation:: Installing and Compiling @code{gawk} on MS-DOS
+ and OS/2.
+* Atari Installation:: Installing @code{gawk} on the Atari ST.
+* Amiga Installation:: Installing @code{gawk} on an Amiga.
+* Bugs:: Reporting Problems and Bugs.
+* Other Versions:: Other freely available @code{awk}
+ implementations.
+@end menu
+
+@node Gawk Distribution, Unix Installation, Installation, Installation
+@appendixsec The @code{gawk} Distribution
+
+This section describes how to get the @code{gawk}
+distribution, how to extract it, and what is in its various files and
+subdirectories.
+
+@menu
+* Getting:: How to get the distribution.
+* Extracting:: How to extract the distribution.
+* Distribution contents:: What is in the distribution.
+@end menu
+
+@node Getting, Extracting, Gawk Distribution, Gawk Distribution
+@appendixsubsec Getting the @code{gawk} Distribution
+@cindex getting @code{gawk}
+@cindex anonymous @code{ftp}
+@cindex @code{ftp}, anonymous
+@cindex Free Software Foundation
+There are three ways you can get GNU software.
+
+@enumerate
+@item
+You can copy it from someone else who already has it.
+
+@cindex Free Software Foundation
+@item
+You can order @code{gawk} directly from the Free Software Foundation.
+Software distributions are available for Unix, MS-DOS, and VMS, on
+tape, CD-ROM, or floppies (MS-DOS only). The address is:
+
+@quotation
+Free Software Foundation @*
+59 Temple Place---Suite 330 @*
+Boston, MA 02111-1307 USA @*
+Phone: +1-617-542-5942 @*
+Fax (including Japan): +1-617-542-2652 @*
+E-mail: @code{gnu@@prep.ai.mit.edu} @*
+@end quotation
+
+@noindent
+Ordering from the FSF directly contributes to the support of the foundation
+and to the production of more free software.
+
+@item
+You can get @code{gawk} by using anonymous @code{ftp} to the Internet host
+@code{ftp.gnu.ai.mit.edu}, in the directory @file{/pub/gnu}.
+
+Here is a list of alternate @code{ftp} sites from which you can obtain GNU
+software. When a site is listed as ``@var{site}@code{:}@var{directory}'' the
+@var{directory} indicates the directory where GNU software is kept.
+You should use a site that is geographically close to you.
+
+@table @asis
+@item Asia:
+@table @code
+@item cair-archive.kaist.ac.kr:/pub/gnu
+@itemx ftp.cs.titech.ac.jp
+@itemx ftp.nectec.or.th:/pub/mirrors/gnu
+@itemx utsun.s.u-tokyo.ac.jp:/ftpsync/prep
+@end table
+
+@item Australia:
+@table @code
+@item archie.au:/gnu
+(@code{archie.oz} or @code{archie.oz.au} for ACSnet)
+@end table
+
+@item Africa:
+@table @code
+@item ftp.sun.ac.za:/pub/gnu
+@end table
+
+@item Middle East:
+@table @code
+@item ftp.technion.ac.il:/pub/unsupported/gnu
+@end table
+
+@item Europe:
+@table @code
+@item archive.eu.net
+@itemx ftp.denet.dk
+@itemx ftp.eunet.ch
+@itemx ftp.funet.fi:/pub/gnu
+@itemx ftp.ieunet.ie:pub/gnu
+@itemx ftp.informatik.rwth-aachen.de:/pub/gnu
+@itemx ftp.informatik.tu-muenchen.de
+@itemx ftp.luth.se:/pub/unix/gnu
+@itemx ftp.mcc.ac.uk
+@itemx ftp.stacken.kth.se
+@itemx ftp.sunet.se:/pub/gnu
+@itemx ftp.univ-lyon1.fr:pub/gnu
+@itemx ftp.win.tue.nl:/pub/gnu
+@itemx irisa.irisa.fr:/pub/gnu
+@itemx isy.liu.se
+@itemx nic.switch.ch:/mirror/gnu
+@itemx src.doc.ic.ac.uk:/gnu
+@itemx unix.hensa.ac.uk:/pub/uunet/systems/gnu
+@end table
+
+@item South America:
+@table @code
+@item ftp.inf.utfsm.cl:/pub/gnu
+@itemx ftp.unicamp.br:/pub/gnu
+@end table
+
+@item Western Canada:
+@table @code
+@item ftp.cs.ubc.ca:/mirror2/gnu
+@end table
+
+@item USA:
+@table @code
+@item col.hp.com:/mirrors/gnu
+@itemx f.ms.uky.edu:/pub3/gnu
+@itemx ftp.cc.gatech.edu:/pub/gnu
+@itemx ftp.cs.columbia.edu:/archives/gnu/prep
+@itemx ftp.digex.net:/pub/gnu
+@itemx ftp.hawaii.edu:/mirrors/gnu
+@itemx ftp.kpc.com:/pub/mirror/gnu
+@end table
+
+@iftex
+@page
+@end iftex
+@item USA (continued):
+@table @code
+@item ftp.uu.net:/systems/gnu
+@itemx gatekeeper.dec.com:/pub/GNU
+@itemx jaguar.utah.edu:/gnustuff
+@itemx labrea.stanford.edu
+@itemx mrcnext.cso.uiuc.edu:/pub/gnu
+@itemx vixen.cso.uiuc.edu:/gnu
+@itemx wuarchive.wustl.edu:/systems/gnu
+@end table
+@end table
+@end enumerate
+
+@node Extracting, Distribution contents, Getting, Gawk Distribution
+@appendixsubsec Extracting the Distribution
+@code{gawk} is distributed as a @code{tar} file compressed with the
+GNU Zip program, @code{gzip}.
+
+Once you have the distribution (for example,
+@file{gawk-@value{VERSION}.0.tar.gz}), first use @code{gzip} to expand the
+file, and then use @code{tar} to extract it. You can use the following
+pipeline to unpack the @code{gawk} distribution:
+
+@example
+# Under System V, add 'o' to the tar flags
+gzip -d -c gawk-@value{VERSION}.0.tar.gz | tar -xvpf -
+@end example
+
+@noindent
+This will create a directory named @file{gawk-@value{VERSION}.0} in the current
+directory.
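+
+If you would rather perform the two steps separately, something like
+the following should be equivalent (a sketch; note that @samp{gzip -d}
+replaces the compressed file with the uncompressed @file{.tar} file):
+
+@example
+gzip -d gawk-@value{VERSION}.0.tar.gz
+tar -xvpf gawk-@value{VERSION}.0.tar
+@end example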
+
+The distribution file name is of the form
+@file{gawk-@var{V}.@var{R}.@var{n}.tar.gz}.
+The @var{V} represents the major version of @code{gawk},
+the @var{R} represents the current release of version @var{V}, and
+the @var{n} represents a @dfn{patch level}, meaning that minor bugs have
+been fixed in the release. The current patch level is 0, but when
+retrieving distributions, you should get the version with the highest
+version, release, and patch level. (Note that release levels greater than
+or equal to 90 denote ``beta,'' or non-production software; you may not wish
+to retrieve such a version unless you don't mind experimenting.)
+
+If you are not on a Unix system, you will need to make other arrangements
+for getting and extracting the @code{gawk} distribution. You should consult
+a local expert.
+
+@node Distribution contents, , Extracting, Gawk Distribution
+@appendixsubsec Contents of the @code{gawk} Distribution
+
+The @code{gawk} distribution has a number of C source files,
+documentation files,
+subdirectories and files related to the configuration process
+(@pxref{Unix Installation, ,Compiling and Installing @code{gawk} on Unix}),
+and several subdirectories related to different, non-Unix,
+operating systems.
+
+@table @asis
+@item various @samp{.c}, @samp{.y}, and @samp{.h} files
+These files are the actual @code{gawk} source code.
+@end table
+
+@iftex
+@page
+@end iftex
+@table @file
+@item README
+@itemx README_d/README.*
+Descriptive files: @file{README} for @code{gawk} under Unix, and the
+rest for the various hardware and software combinations.
+
+@item INSTALL
+A file providing an overview of the configuration and installation process.
+
+@item PORTS
+A list of systems to which @code{gawk} has been ported, and which
+have successfully run the test suite.
+
+@item ACKNOWLEDGMENT
+A list of the people who contributed major parts of the code or documentation.
+
+@item ChangeLog
+A detailed list of source code changes as bugs are fixed or improvements made.
+
+@item NEWS
+A list of changes to @code{gawk} since the last release or patch.
+
+@item COPYING
+The GNU General Public License.
+
+@item FUTURES
+A brief list of features and/or changes being contemplated for future
+releases, with some indication of the time frame for the feature, based
+on its difficulty.
+
+@item LIMITATIONS
+A list of those factors that limit @code{gawk}'s performance.
+Most of these depend on the hardware or operating system software, and
+are not limits in @code{gawk} itself.
+
+@item POSIX.STD
+A description of one area where the POSIX standard for @code{awk} is
+incorrect, and how @code{gawk} handles the problem.
+
+@item PROBLEMS
+A file describing known problems with the current release.
+
+@item doc/gawk.1
+The @code{troff} source for a manual page describing @code{gawk}.
+This is distributed for the convenience of Unix users.
+
+@item doc/gawk.texi
+The Texinfo source file for this @value{DOCUMENT}.
+It should be processed with @TeX{} to produce a printed document, and
+with @code{makeinfo} to produce an Info file.
+
+@item doc/gawk.info
+The generated Info file for this @value{DOCUMENT}.
+
+@item doc/igawk.1
+The @code{troff} source for a manual page describing the @code{igawk}
+program presented in
+@ref{Igawk Program, ,An Easy Way to Use Library Functions}.
+
+@item doc/Makefile.in
+The input file used during the configuration process to generate the
+actual @file{Makefile} for creating the documentation.
+
+@item Makefile.in
+@itemx acconfig.h
+@itemx aclocal.m4
+@itemx configh.in
+@itemx configure.in
+@itemx configure
+@itemx custom.h
+@itemx missing/*
+These files and the @file{missing} subdirectory are used when configuring @code{gawk}
+for various Unix systems. They are explained in detail in
+@ref{Unix Installation, ,Compiling and Installing @code{gawk} on Unix}.
+
+@item awklib/extract.awk
+@itemx awklib/Makefile.in
+The @file{awklib} directory contains a copy of @file{extract.awk}
+(@pxref{Extract Program, ,Extracting Programs from Texinfo Source Files}),
+which can be used to extract the sample programs from the Texinfo
+source file for this @value{DOCUMENT}, and a @file{Makefile.in} file, which
+@code{configure} uses to generate a @file{Makefile}.
+As part of the process of building @code{gawk}, the library functions from
+@ref{Library Functions, , A Library of @code{awk} Functions},
+and the @code{igawk} program from
+@ref{Igawk Program, , An Easy Way to Use Library Functions},
+are extracted into ready-to-use files.
+They are installed as part of the installation process.
+
+@item amiga/*
+Files needed for building @code{gawk} on an Amiga.
+@xref{Amiga Installation, ,Installing @code{gawk} on an Amiga}, for details.
+
+@item atari/*
+Files needed for building @code{gawk} on an Atari ST.
+@xref{Atari Installation, ,Installing @code{gawk} on the Atari ST}, for details.
+
+@item pc/*
+Files needed for building @code{gawk} under MS-DOS and OS/2.
+@xref{PC Installation, ,MS-DOS and OS/2 Installation and Compilation}, for details.
+
+@item vms/*
+Files needed for building @code{gawk} under VMS.
+@xref{VMS Installation, ,How to Compile and Install @code{gawk} on VMS}, for details.
+
+@item test/*
+A test suite for
+@code{gawk}. You can use @samp{make check} from the top level @code{gawk}
+directory to run your version of @code{gawk} against the test suite.
+If @code{gawk} successfully passes @samp{make check}, then you can
+be confident of a successful port.
+@end table
+
+@node Unix Installation, VMS Installation, Gawk Distribution, Installation
+@appendixsec Compiling and Installing @code{gawk} on Unix
+
+Usually, you can compile and install @code{gawk} by typing only two
+commands. However, if you do use an unusual system, you may need
+to configure @code{gawk} for your system yourself.
+
+@menu
+* Quick Installation:: Compiling @code{gawk} under Unix.
+* Configuration Philosophy:: How it's all supposed to work.
+@end menu
+
+@node Quick Installation, Configuration Philosophy, Unix Installation, Unix Installation
+@appendixsubsec Compiling @code{gawk} for Unix
+
+@cindex installation, unix
+After you have extracted the @code{gawk} distribution, @code{cd}
+to @file{gawk-@value{VERSION}.0}. Like most GNU software,
+@code{gawk} is configured
+automatically for your Unix system by running the @code{configure} program.
+This program is a Bourne shell script that was generated automatically using
+GNU @code{autoconf}.
+@iftex
+(The @code{autoconf} software is
+described fully in
+@cite{Autoconf---Generating Automatic Configuration Scripts},
+which is available from the Free Software Foundation.)
+@end iftex
+@ifinfo
+(The @code{autoconf} software is described fully starting with
+@ref{Top, , Introduction, autoconf, Autoconf---Generating Automatic Configuration Scripts}.)
+@end ifinfo
+
+To configure @code{gawk}, simply run @code{configure}:
+
+@example
+sh ./configure
+@end example
+
+This produces a @file{Makefile} and @file{config.h} tailored to your system.
+The @file{config.h} file describes various facts about your system.
+You may wish to edit the @file{Makefile} to
+change the @code{CFLAGS} variable, which controls
+the command line options that are passed to the C compiler (such as
+optimization levels, or compiling for debugging).
+
+Alternatively, you can add your own values for most @code{make}
+variables, such as @code{CC} and @code{CFLAGS}, on the command line when
+running @code{configure}:
+
+@example
+CC=cc CFLAGS=-g sh ./configure
+@end example
+
+@noindent
+See the file @file{INSTALL} in the @code{gawk} distribution for
+all the details.
+
+After you have run @code{configure}, and possibly edited the @file{Makefile},
+type:
+
+@example
+make
+@end example
+
+@noindent
+and shortly thereafter, you should have an executable version of @code{gawk}.
+That's all there is to it!
+(If these steps do not work, please send in a bug report;
+@pxref{Bugs, ,Reporting Problems and Bugs}.)
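+
+Putting the steps above together, a typical session might look like
+the following sketch (it assumes the distribution was extracted into
+the current directory; @samp{make check} runs the test suite found in
+the @file{test} subdirectory):
+
+@example
+cd gawk-@value{VERSION}.0
+sh ./configure
+make
+make check
+@end example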
+
+@node Configuration Philosophy, , Quick Installation, Unix Installation
+@appendixsubsec The Configuration Process
+
+@cindex configuring @code{gawk}
+(This section is of interest only if you know something about using the
+C language and the Unix operating system.)
+
+The source code for @code{gawk} generally attempts to adhere to formal
+standards wherever possible. This means that @code{gawk} uses library
+routines that are specified by the ANSI C standard and by the POSIX
+operating system interface standard. When using an ANSI C compiler,
+function prototypes are used to help improve the compile-time checking.
+
+Many Unix systems do not support all of either the ANSI or the
+POSIX standards. The @file{missing} subdirectory in the @code{gawk}
+distribution contains replacement versions of those subroutines that are
+most likely to be missing.
+
+The @file{config.h} file that is created by the @code{configure} program
+contains definitions that describe features of the particular operating
+system where you are attempting to compile @code{gawk}. This file
+describes three things: which header files are available, so that
+they can be correctly included;
+which (supposedly) standard functions are actually available in your C
+libraries; and
+other miscellaneous facts about your
+variant of Unix. For example, there may not be an @code{st_blksize}
+element in the @code{stat} structure. In this case @samp{HAVE_ST_BLKSIZE}
+would be undefined.
+
+@cindex @code{custom.h} configuration file
+It is possible for your C compiler to lie to @code{configure}. It may
+do so by not exiting with an error when a library function is not
+available. To get around this, you can edit the file @file{custom.h}.
+Use an @samp{#ifdef} that is appropriate for your system, and either
+@code{#define} any constants that @code{configure} should have defined but
+didn't, or @code{#undef} any constants that @code{configure} defined and
+should not have. @file{custom.h} is automatically included by
+@file{config.h}.
+
+It is also possible that the @code{configure} program generated by
+@code{autoconf}
+will not work on your system in some other fashion. If you do have a problem,
+the file
+@file{configure.in} is the input for @code{autoconf}. You may be able to
+change this file, and generate a new version of @code{configure} that will
+work on your system. @xref{Bugs, ,Reporting Problems and Bugs}, for
+information on how to report problems in configuring @code{gawk}. The same
+mechanism may be used to send in updates to @file{configure.in} and/or
+@file{custom.h}.
+
+@node VMS Installation, PC Installation, Unix Installation, Installation
+@appendixsec How to Compile and Install @code{gawk} on VMS
+
+@c based on material from Pat Rankin <rankin@eql.caltech.edu>
+
+@cindex installation, vms
+This section describes how to compile and install @code{gawk} under VMS.
+
+@menu
+* VMS Compilation:: How to compile @code{gawk} under VMS.
+* VMS Installation Details:: How to install @code{gawk} under VMS.
+* VMS Running:: How to run @code{gawk} under VMS.
+* VMS POSIX:: Alternate instructions for VMS POSIX.
+@end menu
+
+@node VMS Compilation, VMS Installation Details, VMS Installation, VMS Installation
+@appendixsubsec Compiling @code{gawk} on VMS
+
+To compile @code{gawk} under VMS, there is a @code{DCL} command procedure that
+will issue all the necessary @code{CC} and @code{LINK} commands, and there is
+also a @file{Makefile} for use with the @code{MMS} utility. From the source
+directory, use either
+
+@example
+$ @@[.VMS]VMSBUILD.COM
+@end example
+
+@noindent
+or
+
+@example
+$ MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK
+@end example
+
+Depending upon which C compiler you are using, follow one of the sets
+of instructions in this table:
+
+@table @asis
+@item VAX C V3.x
+Use either @file{vmsbuild.com} or @file{descrip.mms} as is. These use
+@code{CC/OPTIMIZE=NOLINE}, which is essential for Version 3.0.
+
+@item VAX C V2.x
+You must have Version 2.3 or 2.4; older ones won't work. Edit either
+@file{vmsbuild.com} or @file{descrip.mms} according to the comments in them.
+For @file{vmsbuild.com}, this just entails removing two @samp{!} delimiters.
+Also edit @file{config.h} (which is a copy of file @file{[.config]vms-conf.h})
+and comment out or delete the two lines @samp{#define __STDC__ 0} and
+@samp{#define VAXC_BUILTINS} near the end.
+
+@item GNU C
+Edit @file{vmsbuild.com} or @file{descrip.mms}; the changes are different
+from those for VAX C V2.x, but equally straightforward. No changes to
+@file{config.h} should be needed.
+
+@item DEC C
+Edit @file{vmsbuild.com} or @file{descrip.mms} according to their comments.
+No changes to @file{config.h} should be needed.
+@end table
+
+@code{gawk} has been tested under VAX/VMS 5.5-1 using VAX C V3.2 and
+GNU C 1.40 and 2.3. It should work without modifications for VMS V4.6 and up.
+
+@node VMS Installation Details, VMS Running, VMS Compilation, VMS Installation
+@appendixsubsec Installing @code{gawk} on VMS
+
+To install @code{gawk}, all you need is a ``foreign'' command, which is
+a @code{DCL} symbol whose value begins with a dollar sign. For example:
+
+@example
+$ GAWK :== $disk1:[gnubin]GAWK
+@end example
+
+@noindent
+(Substitute the actual location of @code{gawk.exe} for
+@samp{$disk1:[gnubin]}.) The symbol should be placed in the
+@file{login.com} of any user who wishes to run @code{gawk},
+so that it will be defined every time the user logs on.
+Alternatively, the symbol may be placed in the system-wide
+@file{sylogin.com} procedure, which will allow all users
+to run @code{gawk}.
+
+Optionally, the help entry can be loaded into a VMS help library:
+
+@example
+$ LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP
+@end example
+
+@noindent
+(You may want to substitute a site-specific help library rather than
+the standard VMS library @samp{HELPLIB}.) After loading the help text,
+
+@example
+$ HELP GAWK
+@end example
+
+@noindent
+will provide information about both the @code{gawk} implementation and the
+@code{awk} programming language.
+
+The logical name @samp{AWK_LIBRARY} can designate a default location
+for @code{awk} program files. For the @samp{-f} option, if the specified
+filename has no device or directory path information in it, @code{gawk}
+will look in the current directory first, then in the directory specified
+by the translation of @samp{AWK_LIBRARY} if the file was not found.
+If after searching in both directories, the file still is not found,
+then @code{gawk} appends the suffix @samp{.awk} to the filename and the
+file search will be re-tried. If @samp{AWK_LIBRARY} is not defined, that
+portion of the file search will fail benignly.
+
+@node VMS Running, VMS POSIX, VMS Installation Details, VMS Installation
+@appendixsubsec Running @code{gawk} on VMS
+
+Command line parsing and quoting conventions are significantly different
+on VMS, so examples in this @value{DOCUMENT} or from other sources often need minor
+changes. They @emph{are} minor though, and all @code{awk} programs
+should run correctly.
+
+Here are a couple of trivial tests:
+
+@example
+$ gawk -- "BEGIN @{print ""Hello, World!""@}"
+$ gawk -"W" version
+! could also be -"W version" or "-W version"
+@end example
+
+@noindent
+Note that upper-case and mixed-case text must be quoted.
+
+The VMS port of @code{gawk} includes a @code{DCL}-style interface in addition
+to the original shell-style interface (see the help entry for details).
+One side-effect of dual command line parsing is that if there is only a
+single parameter (as in the quoted string program above), the command
+becomes ambiguous. To work around this, the normally optional @samp{--}
+flag is required to force Unix style rather than @code{DCL} parsing. If any
+other dash-type options (or multiple parameters such as data files to be
+processed) are present, there is no ambiguity and @samp{--} can be omitted.
+
+The default search path when looking for @code{awk} program files specified
+by the @samp{-f} option is @code{"SYS$DISK:[],AWK_LIBRARY:"}. The logical
+name @samp{AWKPATH} can be used to override this default. The format
+of @samp{AWKPATH} is a comma-separated list of directory specifications.
+When defining it, the value should be quoted so that it retains a single
+translation, and not a multi-translation @code{RMS} searchlist.
+
+@node VMS POSIX, , VMS Running, VMS Installation
+@appendixsubsec Building and Using @code{gawk} on VMS POSIX
+
+Ignore the instructions above, although @file{vms/gawk.hlp} should still
+be made available in a help library. Make sure that the @code{configure}
+script is executable; use @samp{chmod +x}
+on it if necessary. Then execute the following commands:
+
+@example
+@group
+$ POSIX
+psx> CC=vms/posix-cc.sh configure
+psx> CC=c89 make gawk
+@end group
+@end example
+
+@noindent
+The @code{configure} command will construct files @file{config.h} and @file{Makefile}
+out of templates. The @code{make} command will compile and link @code{gawk}.
+@ignore
+Due to a @code{make} bug in VMS POSIX V1.0 and V1.1,
+the file @file{awktab.c} must be given as an explicit target or it will
+not be built and the final link step will fail.
+@end ignore
+Ignore the warning
+@code{"Could not find lib m in lib list"}; it is harmless, caused by the
+explicit use of @samp{-lm} as a linker option which is not needed
+under VMS POSIX. Under V1.1 (but not V1.0) a problem with the @code{yacc}
+skeleton @file{/etc/yyparse.c} will cause a compiler warning for
+@file{awktab.c}, followed by a linker warning about compilation warnings
+in the resulting object module. These warnings can be ignored.
+
+Once built, @code{gawk} will work like any other shell utility. Unlike
+the normal VMS port of @code{gawk}, no special command line manipulation is
+needed in the VMS POSIX environment.
+
+@c Rewritten by Scott Deifik <scottd@amgen.com>
+@c and Darrel Hankerson <hankedr@mail.auburn.edu>
+@node PC Installation, Atari Installation, VMS Installation, Installation
+@appendixsec MS-DOS and OS/2 Installation and Compilation
+
+@cindex installation, MS-DOS and OS/2
+If you have received a binary distribution prepared by the DOS
+maintainers, then @code{gawk} and the necessary support files will appear
+under the @file{gnu} directory, with executables in @file{gnu/bin},
+libraries in @file{gnu/lib/awk}, and manual pages under @file{gnu/man}.
+This is designed for easy installation to a @file{/gnu} directory on your
+drive, but the files can be installed anywhere provided @code{AWKPATH} is
+set properly. Regardless of the installation directory, the first line of
+@file{igawk.cmd} and @file{igawk.bat} (in @file{gnu/bin}) may need to be
+edited.
+
+The binary distribution will contain a separate file describing the
+contents. In particular, it may include more than one version of the
+@code{gawk} executable. OS/2 binary distributions may have a
+different arrangement, but installation is similar.
+
+The OS/2 and MS-DOS versions of @code{gawk} search for program files as
+described in @ref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}.
+However, semicolons (rather than colons) separate elements
+in the @code{AWKPATH} variable. If @code{AWKPATH} is not set or is empty,
+then the default search path is @code{@w{".;c:/lib/awk;c:/gnu/lib/awk"}}.
+
+An @code{sh}-like shell (as opposed to @code{command.com} under MS-DOS
+or @code{cmd.exe} under OS/2) may be useful for @code{awk} programming.
+Ian Stewartson has written an excellent shell for MS-DOS and OS/2, and a
+@code{ksh} clone and GNU Bash are available for OS/2. The file
+@file{README_d/README.pc} in the @code{gawk} distribution contains
+information on these shells. Users of Stewartson's shell on DOS should
+examine its documentation on handling of command-lines. In particular,
+the setting for @code{gawk} in the shell configuration may need to be
+changed, and the @code{ignoretype} option may also be of interest.
+
+@code{gawk} can be compiled for MS-DOS and OS/2 using the GNU development tools
+from DJ Delorie (DJGPP, MS-DOS-only) or Eberhard Mattes (EMX, MS-DOS and OS/2).
+Microsoft C can be used to build 16-bit versions for MS-DOS and OS/2. The file
+@file{README_d/README.pc} in the @code{gawk} distribution contains additional
+notes, and @file{pc/Makefile} contains important notes on compilation options.
+
+To build @code{gawk}, copy the files in the @file{pc} directory to the
+directory with the rest of the @code{gawk} sources. The @file{Makefile}
+contains a configuration section with comments, and may need to be
+edited in order to work with your @code{make} utility.
+
+The @file{Makefile} contains a number of targets for building various MS-DOS
+and OS/2 versions. A list of targets will be printed if the @code{make}
+command is given without a target. As an example, to build @code{gawk}
+using the DJGPP tools, enter @samp{make djgpp}.
+
+Using @code{make} to run the standard tests and to install @code{gawk}
+requires additional Unix-like tools, including @code{sh}, @code{sed}, and
+@code{cp}. In order to run the tests, the @file{test/*.ok} files may need to
+be converted so that they have the usual DOS-style end-of-line markers. Most
+of the tests will work properly with Stewartson's shell along with the
+companion utilities or appropriate GNU utilities. However, some editing of
+@file{test/Makefile} is required. It is recommended that the file
+@file{pc/Makefile.tst} be copied to @file{test/Makefile} as a
+replacement. Details can be found in @file{README_d/README.pc}.
+
+@node Atari Installation, Amiga Installation, PC Installation, Installation
+@appendixsec Installing @code{gawk} on the Atari ST
+
+@c based on material from Michal Jaegermann <michal@gortel.phys.ualberta.ca>
+
+@cindex atari
+@cindex installation, atari
+There are no substantial differences when installing @code{gawk} on
+various Atari models. Compiled @code{gawk} executables do not require
+a large amount of memory with most @code{awk} programs and should run on all
+Motorola processor-based models (referred to below as ST, even though that
+is not strictly accurate).
+
+In order to use @code{gawk}, you need to have a shell, either text or
+graphics, that does not map all the characters of a command line to
+upper-case. Maintaining case distinction in option flags is very
+important (@pxref{Options, ,Command Line Options}).
+These days this is the default, and it may only be a problem for some
+very old machines. If your system does not preserve the case of option
+flags, you will need to upgrade your tools. Support for I/O
+redirection is necessary to make it easy to import @code{awk} programs
+from other environments. Pipes are nice to have, but not vital.
+
+@menu
+* Atari Compiling:: Compiling @code{gawk} on Atari
+* Atari Using:: Running @code{gawk} on Atari
+@end menu
+
+@node Atari Compiling, Atari Using, Atari Installation, Atari Installation
+@appendixsubsec Compiling @code{gawk} on the Atari ST
+
+A proper compilation of @code{gawk} sources when @code{sizeof(int)}
+differs from @code{sizeof(void *)} requires an ANSI C compiler. An initial
+port was done with @code{gcc}. You may actually prefer executables
+where @code{int}s are four bytes wide, but the other variant works as well.
+
+You may need quite a bit of memory when trying to recompile the @code{gawk}
+sources, as some source files (@file{regex.c} in particular) are quite
+big. If you run out of memory compiling such a file, try reducing the
+optimization level for this particular file; this may help.
+
+@cindex Linux
+With a reasonable shell (Bash will do), and in particular if you run
+Linux, MiNT or a similar operating system, you have a pretty good
+chance that the @code{configure} utility will succeed. Otherwise
+sample versions of @file{config.h} and @file{Makefile.st} are given in the
+@file{atari} subdirectory and can be edited and copied to the
+corresponding files in the main source directory. Even if
+@code{configure} produced something, it might be advisable to compare
+its results with the sample versions and possibly make adjustments.
+
+Some @code{gawk} source code fragments depend on a preprocessor define
+@samp{atarist}. This basically assumes the TOS environment with @code{gcc}.
+Modify these sections as appropriate if they are not right for your
+environment. Also see the remarks about @code{AWKPATH} and @code{envsep} in
+@ref{Atari Using, ,Running @code{gawk} on the Atari ST}.
+
+As shipped, the sample @file{config.h} claims that the @code{system}
+function is missing from the libraries, which is not true, and an
+alternative implementation of this function is provided in
+@file{atari/system.c}. Depending upon your particular combination of
+shell and operating system, you may wish to change the file to indicate
+that @code{system} is available.
+
+@node Atari Using, , Atari Compiling, Atari Installation
+@appendixsubsec Running @code{gawk} on the Atari ST
+
+An executable version of @code{gawk} should be placed, as usual,
+anywhere in your @code{PATH} where your shell can find it.
+
+While executing, @code{gawk} creates a number of temporary files. When
+using @code{gcc} libraries for TOS, @code{gawk} looks for either of
+the environment variables @code{TEMP} or @code{TMPDIR}, in that order.
+If either one is found, its value is assumed to be a directory for
+temporary files. This directory must exist, and if you can spare the
+memory, it is a good idea to put it on a RAM drive. If neither
+@code{TEMP} nor @code{TMPDIR} is found, then @code{gawk} uses the
+current directory for its temporary files.
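+
+The lookup order just described can be sketched in C as follows. This is
+only an illustration of the order of the checks, not the code found in the
+@code{gcc} libraries for TOS; the function name @code{pick_tmpdir} is
+hypothetical.
+

```c
#include <stdlib.h>

/* pick_tmpdir --- hypothetical helper: choose a directory for temporary files */

static const char *
pick_tmpdir(void)
{
	const char *dir;

	/* TEMP is consulted first, then TMPDIR */
	if ((dir = getenv("TEMP")) != NULL)
		return dir;
	if ((dir = getenv("TMPDIR")) != NULL)
		return dir;

	/* neither variable is set: fall back to the current directory */
	return ".";
}
```

+
+The sketch does not check that the directory exists; as noted above,
+that remains the user's responsibility.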
+
+The ST version of @code{gawk} searches for its program files as described in
+@ref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}.
+The default value for the @code{AWKPATH} variable is taken from
+@code{DEFPATH} defined in @file{Makefile}. The sample @code{gcc}/TOS
+@file{Makefile} for the ST in the distribution sets @code{DEFPATH} to
+@code{@w{".,c:\lib\awk,c:\gnu\lib\awk"}}. The search path can be
+modified by explicitly setting @code{AWKPATH} to whatever you wish.
+Note that colons cannot be used on the ST to separate elements in the
+@code{AWKPATH} variable, since they have another, reserved, meaning.
+Instead, you must use a comma to separate elements in the path. When
+recompiling, the separating character can be modified by initializing
+the @code{envsep} variable in @file{atari/gawkmisc.atr} to another
+value.
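+
+To make the role of the separator concrete, here is a minimal sketch of
+splitting a search path on a configurable character. It is not the code
+used by @code{gawk}; the helper @code{split_path} and the constant
+@code{PATH_ELEM_MAX} are hypothetical, and @code{envsep} is simply
+assumed to hold a comma, as described above.
+

```c
#include <stddef.h>
#include <string.h>

#define PATH_ELEM_MAX	64	/* hypothetical limit on one path element */

/* assumed separator; in gawk it is initialized in atari/gawkmisc.atr */
static const char envsep = ',';

/* split_path --- hypothetical helper: copy the elements of path into elems */

static int
split_path(const char *path, char elems[][PATH_ELEM_MAX], int max)
{
	char seps[2];
	int n = 0;

	seps[0] = envsep;
	seps[1] = '\0';

	while (n < max) {
		size_t len = strcspn(path, seps);	/* length up to next separator */
		size_t copy = (len < PATH_ELEM_MAX) ? len : PATH_ELEM_MAX - 1;

		memcpy(elems[n], path, copy);	/* over-long elements are truncated */
		elems[n][copy] = '\0';
		n++;

		if (path[len] == '\0')
			break;			/* that was the last element */
		path += len + 1;		/* skip the separator */
	}

	return n;	/* an empty path yields one empty element */
}
```

+
+Because the drive letter in @samp{c:\lib\awk} contains a colon, splitting
+on a comma keeps each element intact.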
+
+Although @code{awk} allows great flexibility in doing I/O redirections
+from within a program, this facility should be used with care on the ST
+running under TOS. In some circumstances the OS routines for file
+handle pool processing lose track of certain events, causing the
+computer to crash, and requiring a reboot. Often a warm reboot is
+sufficient. Fortunately, this happens infrequently, and in rather
+esoteric situations. In particular, avoid having one part of an
+@code{awk} program using @code{print} statements explicitly redirected
+to @code{"/dev/stdout"}, while other @code{print} statements use the
+default standard output, and a calling shell has redirected standard
+output to a file.
+
+When @code{gawk} is compiled with the ST version of @code{gcc} and its
+usual libraries, it will accept both @samp{/} and @samp{\} as path separators.
+While this is convenient, it should be remembered that this removes one,
+technically valid, character (@samp{/}) from your file names, and that
+it may create problems for external programs, called via the @code{system}
+function, which may not support this convention. Whenever it is possible
+that a file created by @code{gawk} will be used by some other program,
+use only backslashes. Also remember that in @code{awk}, backslashes in
+strings have to be doubled in order to get literal backslashes
+(@pxref{Escape Sequences}).
+
+@node Amiga Installation, Bugs, Atari Installation, Installation
+@appendixsec Installing @code{gawk} on an Amiga
+
+@cindex amiga
+@cindex installation, amiga
+You can install @code{gawk} on an Amiga system using a Unix emulation
+environment available via anonymous @code{ftp} from
+@code{wuarchive.wustl.edu} in the directory @file{pub/aminet/dev/gcc}.
+This includes a shell based on @code{pdksh}. The primary component of
+this environment is a Unix emulation library, @file{ixemul.lib}.
+@c could really use more background here, who wrote this, etc.
+
+A more complete distribution for the Amiga is available on
+the FreshFish CD-ROM from:
+
+@quotation
+Amiga Library Services @*
+610 North Alma School Road, Suite 18 @*
+Chandler, AZ 85224 USA @*
+Phone: +1-602-491-0048 @*
+FAX: +1-602-491-0048 @*
+E-mail: @code{orders@@amigalib.com}
+@end quotation
+
+Once you have the distribution, you can configure @code{gawk} simply by
+running @code{configure}:
+
+@example
+configure -v m68k-cbm-amigados
+@end example
+
+Then run @code{make}, and you should be all set!
+(If these steps do not work, please send in a bug report;
+@pxref{Bugs, ,Reporting Problems and Bugs}.)
+
+@node Bugs, Other Versions, Amiga Installation, Installation
+@appendixsec Reporting Problems and Bugs
+
+If you have problems with @code{gawk} or think that you have found a bug,
+please report it to the developers; we cannot promise to do anything
+but we might well want to fix it.
+
+Before reporting a bug, make sure you have actually found a real bug.
+Carefully reread the documentation and see if it really says you can do
+what you're trying to do. If it's not clear whether you should be able
+to do something or not, report that too; it's a bug in the documentation!
+
+Before reporting a bug or trying to fix it yourself, try to isolate it
+to the smallest possible @code{awk} program and input data file that
+reproduces the problem. Then send us the program and data file,
+some idea of what kind of Unix system you're using, and the exact results
+@code{gawk} gave you. Also say what you expected to occur; this will help
+us decide whether the problem was really in the documentation.
+
+Once you have a precise problem, there are two e-mail addresses you
+can send mail to.
+
+@table @asis
+@item Internet:
+@samp{bug-gnu-utils@@prep.ai.mit.edu}
+
+@item UUCP:
+@samp{uunet!prep.ai.mit.edu!bug-gnu-utils}
+@end table
+
+Please include the
+version number of @code{gawk} you are using. You can get this information
+with the command @samp{gawk --version}.
+You should send a carbon copy of your mail to Arnold Robbins, who can
+be reached at @samp{arnold@@gnu.ai.mit.edu}.
+
+@cindex @code{comp.lang.awk}
+@strong{Important!} Do @emph{not} try to report bugs in @code{gawk} by
+posting to the Usenet/Internet newsgroup @code{comp.lang.awk}.
+While the @code{gawk} developers do occasionally read this newsgroup,
+there is no guarantee that we will see your posting. The steps described
+above are the official, recognized ways for reporting bugs.
+
+Non-bug suggestions are always welcome as well. If you have questions
+about things that are unclear in the documentation or are just obscure
+features, ask Arnold Robbins; he will try to help you out, although he
+may not have the time to fix the problem. You can send him electronic
+mail at the Internet address above.
+
+If you find bugs in one of the non-Unix ports of @code{gawk}, please send
+an electronic mail message to the person who maintains that port. They
+are listed below, and also in the @file{README} file in the @code{gawk}
+distribution. Information in the @file{README} file should be considered
+authoritative if it conflicts with this @value{DOCUMENT}.
+
+The people maintaining the non-Unix ports of @code{gawk} are:
+
+@cindex Deifik, Scott
+@cindex Fish, Fred
+@cindex Hankerson, Darrel
+@cindex Jaegermann, Michal
+@cindex Rankin, Pat
+@cindex Rommel, Kai Uwe
+@table @asis
+@item MS-DOS
+Scott Deifik, @samp{scottd@@amgen.com}, and
+Darrel Hankerson, @samp{hankedr@@mail.auburn.edu}.
+
+@item OS/2
+Kai Uwe Rommel, @samp{rommel@@ars.de}.
+
+@item VMS
+Pat Rankin, @samp{rankin@@eql.caltech.edu}.
+
+@item Atari ST
+Michal Jaegermann, @samp{michal@@gortel.phys.ualberta.ca}.
+
+@item Amiga
+Fred Fish, @samp{fnf@@amigalib.com}.
+@end table
+
+If your bug is also reproducible under Unix, please send copies of your
+report to the general GNU bug list, as well as to Arnold Robbins, at the
+addresses listed above.
+
+@node Other Versions, , Bugs, Installation
+@appendixsec Other Freely Available @code{awk} Implementations
+
+There are two other freely available @code{awk} implementations.
+This section briefly describes where to get them.
+
+@table @asis
+@cindex Kernighan, Brian
+@cindex anonymous @code{ftp}
+@cindex @code{ftp}, anonymous
+@item Unix @code{awk}
+Brian Kernighan has been able to make his implementation of
+@code{awk} freely available. You can get it via anonymous @code{ftp}
+to the host @code{@w{netlib.att.com}}. Change directory to
+@file{/netlib/research}. Use ``binary'' or ``image'' mode, and
+retrieve @file{awk.bundle.Z}.
+
+This is a shell archive that has been compressed with the @code{compress}
+utility. It can be uncompressed with either @code{uncompress} or the
+GNU @code{gunzip} utility.
+
+This version requires an ANSI C compiler; GCC (the GNU C compiler)
+works quite nicely.
+
+@cindex Brennan, Michael
+@cindex @code{mawk}
+@item @code{mawk}
+Michael Brennan has written an independent implementation of @code{awk},
+called @code{mawk}. It is available under the GPL
+(@pxref{Copying, ,GNU GENERAL PUBLIC LICENSE}),
+just as @code{gawk} is.
+
+You can get it via anonymous @code{ftp} to the host
+@code{@w{oxy.edu}}. Change directory to @file{/public}. Use ``binary''
+or ``image'' mode, and retrieve @file{mawk1.2.1.tar.gz} (or the latest
+version that is there).
+
+@code{gunzip} may be used to decompress this file. Installation
+is similar to @code{gawk}'s
+(@pxref{Unix Installation, , Compiling and Installing @code{gawk} on Unix}).
+@end table
+
+@node Notes, Glossary, Installation, Top
+@appendix Implementation Notes
+
+This appendix contains information mainly of interest to implementors and
+maintainers of @code{gawk}. Everything in it applies specifically to
+@code{gawk}, and not to other implementations.
+
+@menu
+* Compatibility Mode:: How to disable certain @code{gawk} extensions.
+* Additions:: Making Additions To @code{gawk}.
+* Future Extensions:: New features that may be implemented one day.
+* Improvements:: Suggestions for improvements by volunteers.
+@end menu
+
+@node Compatibility Mode, Additions, Notes, Notes
+@appendixsec Downward Compatibility and Debugging
+
+@xref{POSIX/GNU, ,Extensions in @code{gawk} Not in POSIX @code{awk}},
+for a summary of the GNU extensions to the @code{awk} language and program.
+All of these features can be turned off by invoking @code{gawk} with the
+@samp{--traditional} option, or with the @samp{--posix} option.
+
+If @code{gawk} is compiled for debugging with @samp{-DDEBUG}, then there
+is one more option available on the command line:
+
+@table @code
+@item -W parsedebug
+@itemx --parsedebug
+Print out the parse stack information as the program is being parsed.
+@end table
+
+This option is intended only for serious @code{gawk} developers,
+and not for the casual user. It probably has not even been compiled into
+your version of @code{gawk}, since it slows down execution.
+
+@node Additions, Future Extensions, Compatibility Mode, Notes
+@appendixsec Making Additions to @code{gawk}
+
+If you should find that you wish to enhance @code{gawk} in a significant
+fashion, you are perfectly free to do so. That is the point of having
+free software; the source code is available, and you are free to change
+it as you wish (@pxref{Copying, ,GNU GENERAL PUBLIC LICENSE}).
+
+This section discusses the ways you might wish to change @code{gawk},
+and any considerations you should bear in mind.
+
+@menu
+* Adding Code:: Adding code to the main body of @code{gawk}.
+* New Ports:: Porting @code{gawk} to a new operating system.
+@end menu
+
+@node Adding Code, New Ports, Additions, Additions
+@appendixsubsec Adding New Features
+
+@cindex adding new features
+@cindex features, adding
+You are free to add any new features you like to @code{gawk}.
+However, if you want your changes to be incorporated into the @code{gawk}
+distribution, there are several steps that you need to take in order to
+make it possible for me to include your changes.
+
+@enumerate 1
+@item
+Get the latest version.
+It is much easier for me to integrate changes if they are relative to
+the most recent distributed version of @code{gawk}. If your version of
+@code{gawk} is very old, I may not be able to integrate them at all.
+@xref{Getting, ,Getting the @code{gawk} Distribution},
+for information on getting the latest version of @code{gawk}.
+
+@item
+@iftex
+Follow the @cite{GNU Coding Standards}.
+@end iftex
+@ifinfo
+See @inforef{Top, , Version, standards, GNU Coding Standards}.
+@end ifinfo
+This document describes how GNU software should be written. If you haven't
+read it, please do so, preferably @emph{before} starting to modify @code{gawk}.
+(The @cite{GNU Coding Standards} are available as part of the Autoconf
+distribution, from the FSF.)
+
+@cindex @code{gawk} coding style
+@cindex coding style used in @code{gawk}
+@item
+Use the @code{gawk} coding style.
+The C code for @code{gawk} follows the instructions in the
+@cite{GNU Coding Standards}, with minor exceptions. The code is formatted
+using the traditional ``K&R'' style, particularly as regards the placement
+of braces and the use of tabs. In brief, the coding rules for @code{gawk}
+are:
+
+@itemize @bullet
+@item
+Use old style (non-prototype) function headers when defining functions.
+
+@item
+Put the name of the function at the beginning of its own line.
+
+@item
+Put the return type of the function, even if it is @code{int}, on the
+line above the line with the name and arguments of the function.
+
+@item
+The declarations for the function arguments should not be indented.
+
+@item
+Put spaces around parentheses used in control structures
+(@code{if}, @code{while}, @code{for}, @code{do}, @code{switch}
+and @code{return}).
+
+@item
+Do not put spaces in front of parentheses used in function calls.
+
+@item
+Put spaces around all C operators, and after commas in function calls.
+
+@item
+Do not use the comma operator to produce multiple side-effects, except
+in @code{for} loop initialization and increment parts, and in macro bodies.
+
+@item
+Use real tabs for indenting, not spaces.
+
+@item
+Use the ``K&R'' brace layout style.
+
+@item
+Use comparisons against @code{NULL} and @code{'\0'} in the conditions of
+@code{if}, @code{while} and @code{for} statements, and in the @code{case}s
+of @code{switch} statements, instead of just the
+plain pointer or character value.
+
+@item
+Use the @code{TRUE}, @code{FALSE}, and @code{NULL} symbolic constants,
+and the character constant @code{'\0'} where appropriate, instead of @code{1}
+and @code{0}.
+
+@item
+Provide one-line descriptive comments for each function.
+
+@item
+Do not use @samp{#elif}. Many older Unix C compilers cannot handle it.
+@end itemize
+
+If I have to reformat your code to follow the coding style used in
+@code{gawk}, I may not bother.
+
+@item
+Be prepared to sign the appropriate paperwork.
+In order for the FSF to distribute your changes, you must either place
+those changes in the public domain, and submit a signed statement to that
+effect, or assign the copyright in your changes to the FSF.
+Both of these actions are easy to do, and @emph{many} people have done so
+already. If you have questions, please contact me
+(@pxref{Bugs, , Reporting Problems and Bugs}),
+or @code{gnu@@prep.ai.mit.edu}.
+
+@item
+Update the documentation.
+Along with your new code, please supply new sections and/or chapters
+for this @value{DOCUMENT}. If at all possible, please use real
+Texinfo, instead of just supplying unformatted ASCII text (although
+even that is better than no documentation at all).
+Conventions to be followed in @cite{@value{TITLE}} are provided
+after the @samp{@@bye} at the end of the Texinfo source file.
+If possible, please update the man page as well.
+
+You will also have to sign paperwork for your documentation changes.
+
+@item
+Submit changes as context diffs or unified diffs.
+Use @samp{diff -c -r -N} or @samp{diff -u -r -N} to compare
+the original @code{gawk} source tree with your version.
+(I find context diffs to be more readable, but unified diffs are
+more compact.)
+I recommend using the GNU version of @code{diff}.
+Send the output produced by either run of @code{diff} to me when you
+submit your changes.
+@xref{Bugs, , Reporting Problems and Bugs}, for the electronic mail
+information.
+
+Using this format makes it easy for me to apply your changes to the
+master version of the @code{gawk} source code (using @code{patch}).
+If I have to apply the changes manually, using a text editor, I may
+not do so, particularly if there are lots of changes.
+@end enumerate
+
+Although this sounds like a lot of work, please remember that while you
+may write the new code, I have to maintain it and support it, and if it
+isn't possible for me to do that with a minimum of extra work, then I
+probably will not.
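+
+As an illustration of the coding rules listed above, here is a small
+function laid out in that style. It is not taken from the @code{gawk}
+sources; the function @code{has_blank} is hypothetical, and @code{TRUE}
+and @code{FALSE} are defined locally only so that the fragment stands
+alone. One deliberate deviation: the header uses a prototype so that the
+fragment compiles with current compilers, and the old style header that
+the rules ask for is shown in a comment.
+

```c
#include <stddef.h>

#define TRUE	1
#define FALSE	0

/*
 * The old style (non-prototype) header asked for by the rules would be:
 *
 *	static int
 *	has_blank(s)
 *	const char *s;
 */

/* has_blank --- return TRUE if the string contains a space or a tab */

static int
has_blank(const char *s)
{
	/* compare against NULL explicitly, not just `if (! s)' */
	if (s == NULL)
		return FALSE;

	/* compare against '\0' explicitly; spaces around all operators */
	for (; *s != '\0'; s++) {
		if (*s == ' ' || *s == '\t')
			return TRUE;
	}

	return FALSE;
}
```

+
+Note the return type on its own line, the function name at the start of
+the next line, the one-line descriptive comment, and the use of real tabs
+for indentation.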
+
+@node New Ports, , Adding Code, Additions
+@appendixsubsec Porting @code{gawk} to a New Operating System
+
+@cindex porting @code{gawk}
+If you wish to port @code{gawk} to a new operating system, there are
+several steps to follow.
+
+@enumerate 1
+@item
+Follow the guidelines in
+@ref{Adding Code, ,Adding New Features},
+concerning coding style, submission of diffs, and so on.
+
+@item
+When doing a port, bear in mind that your code must co-exist peacefully
+with the rest of @code{gawk}, and the other ports. Avoid gratuitous
+changes to the system-independent parts of the code. If at all possible,
+avoid sprinkling @samp{#ifdef}s just for your port throughout the
+code.
+
+If the changes needed for a particular system affect too much of the
+code, I probably will not accept them. In such a case, you will, of course,
+be able to distribute your changes on your own, as long as you comply
+with the GPL
+(@pxref{Copying, ,GNU GENERAL PUBLIC LICENSE}).
+
+@item
+A number of the files that come with @code{gawk} are maintained by other
+people at the Free Software Foundation. Thus, you should not change them
+unless it is for a very good reason. I.e.@: changes are not out of the
+question, but changes to these files will be scrutinized extra carefully.
+The files are @file{alloca.c}, @file{getopt.h}, @file{getopt.c},
+@file{getopt1.c}, @file{regex.h}, @file{regex.c}, @file{dfa.h},
+@file{dfa.c}, @file{install-sh}, and @file{mkinstalldirs}.
+
+@item
+Be willing to continue to maintain the port.
+Non-Unix operating systems are supported by volunteers who maintain
+the code needed to compile and run @code{gawk} on their systems. If no one
+volunteers to maintain a port, that port becomes unsupported, and it may
+be necessary to remove it from the distribution.
+
+@item
+Supply an appropriate @file{gawkmisc.???} file.
+Each port has its own @file{gawkmisc.???} that implements certain
+operating system specific functions. This is cleaner than a plethora of
+@samp{#ifdef}s scattered throughout the code. The @file{gawkmisc.c} in
+the main source directory includes the appropriate
+@file{gawkmisc.???} file from each subdirectory.
+Be sure to update it as well.
+
+Each port's @file{gawkmisc.???} file has a suffix reminiscent of the machine
+or operating system for the port. For example, @file{pc/gawkmisc.pc} and
+@file{vms/gawkmisc.vms}. The use of separate suffixes, instead of plain
+@file{gawkmisc.c}, makes it possible to move files from a port's subdirectory
+into the main subdirectory, without accidentally destroying the real
+@file{gawkmisc.c} file. (Currently, this is only an issue for the MS-DOS
+and OS/2 ports.)
+
+@item
+Supply a @file{Makefile} and any other C source and header files that are
+necessary for your operating system. All your code should be in a
+separate subdirectory, with a name that is the same as, or reminiscent
+of, either your operating system or the computer system. If possible,
+try to structure things so that it is not necessary to move files out
+of the subdirectory into the main source directory. If that is not
+possible, then be sure to avoid using names for your files that
+duplicate the names of files in the main source directory.
+
+@item
+Update the documentation.
+Please write a section (or sections) for this @value{DOCUMENT} describing the
+installation and compilation steps needed to install and/or compile
+@code{gawk} for your system.
+
+@item
+Be prepared to sign the appropriate paperwork.
+In order for the FSF to distribute your code, you must either place
+your code in the public domain, and submit a signed statement to that
+effect, or assign the copyright in your code to the FSF.
+@ifinfo
+Both of these actions are easy to do, and @emph{many} people have done so
+already. If you have questions, please contact me, or
+@code{gnu@@prep.ai.mit.edu}.
+@end ifinfo
+@end enumerate
+
+Following these steps will make it much easier to integrate your changes
+into @code{gawk}, and have them co-exist happily with the code for other
+operating systems that is already there.
+
+In the code that you supply, and that you maintain, feel free to use a
+coding style and brace layout that suits your taste.
+
+@c why should this be needed? sigh
+@iftex
+@page
+@end iftex
+@node Future Extensions, Improvements, Additions, Notes
+@appendixsec Probable Future Extensions
+
+@ignore
+From emory!scalpel.netlabs.com!lwall Tue Oct 31 12:43:17 1995
+Return-Path: <emory!scalpel.netlabs.com!lwall>
+Message-Id: <9510311732.AA28472@scalpel.netlabs.com>
+To: arnold@skeeve.atl.ga.us (Arnold D. Robbins)
+Subject: Re: May I quote you?
+In-Reply-To: Your message of "Tue, 31 Oct 95 09:11:00 EST."
+ <m0tAHPQ-00014MC@skeeve.atl.ga.us>
+Date: Tue, 31 Oct 95 09:32:46 -0800
+From: Larry Wall <emory!scalpel.netlabs.com!lwall>
+
+: Greetings. I am working on the release of gawk 3.0. Part of it will be a
+: thoroughly updated manual. One of the sections deals with planned future
+: extensions and enhancements. I have the following at the beginning
+: of it:
+:
+: @cindex PERL
+: @cindex Wall, Larry
+: @display
+: @i{AWK is a language similar to PERL, only considerably more elegant.} @*
+: Arnold Robbins
+: @sp 1
+: @i{Hey!} @*
+: Larry Wall
+: @end display
+:
+: Before I actually release this for publication, I wanted to get your
+: permission to quote you. (Hopefully, in the spirit of much of GNU, the
+: implied humor is visible... :-)
+
+I think that would be fine.
+
+Larry
+@end ignore
+
+@cindex PERL
+@cindex Wall, Larry
+@display
+@i{AWK is a language similar to PERL, only considerably more elegant.}
+Arnold Robbins
+
+@i{Hey!}
+Larry Wall
+@end display
+
+This section briefly lists extensions and possible improvements
+that indicate the directions we are
+currently considering for @code{gawk}. The file @file{FUTURES} in the
+@code{gawk} distribution lists these extensions as well.
+
+This is a list of probable future changes that will be usable by the
+@code{awk} language programmer.
+
+@c these are ordered by likelihood
+@table @asis
+@item Localization
+The GNU project is starting to support multiple languages.
+It will at least be possible to make @code{gawk} print its warnings and
+error messages in languages other than English.
+It may be possible for @code{awk} programs to also use the multiple
+language facilities, separate from @code{gawk} itself.
+
+@item Databases
+It may be possible to map a GDBM/NDBM/SDBM file into an @code{awk} array.
+
+@item A @code{PROCINFO} Array
+The special files that provide process-related information
+(@pxref{Special Files, ,Special File Names in @code{gawk}})
+may be superseded by a @code{PROCINFO} array that would provide the same
+information, in an easier to access fashion.
+
+@item More @code{lint} warnings
+There are more things that could be checked for portability.
+
+@item Control of subprocess environment
+Changes made in @code{gawk} to the array @code{ENVIRON} may be
+propagated to subprocesses run by @code{gawk}.
+
+@ignore
+@item @code{RECLEN} variable for fixed length records
+Along with @code{FIELDWIDTHS}, this would speed up the processing of
+fixed-length records.
+
+@item A @code{restart} keyword
+After modifying @code{$0}, @code{restart} would restart the pattern
+matching loop, without reading a new record from the input.
+
+@item A @samp{|&} redirection
+The @samp{|&} redirection, in place of @samp{|}, would open a two-way
+pipeline for communication with a sub-process (via @code{getline} and
+@code{print} and @code{printf}).
+
+@item Function valued variables
+It would be possible to assign the name of a user-defined or built-in
+function to a regular @code{awk} variable, and then call the function
+indirectly, by using the regular variable. This would make it possible
+to write general purpose sorting and comparing routines, for example,
+by simply passing the name of one function into another.
+
+@item A built-in @code{stat} function
+The @code{stat} function would provide an easy-to-use hook to the
+@code{stat} system call so that @code{awk} programs could determine information
+about files.
+
+@item A built-in @code{ftw} function
+Combined with function valued variables and the @code{stat} function,
+@code{ftw} (file tree walk) would make it easy for an @code{awk} program
+to walk an entire file tree.
+@end ignore
+@end table
+
+This is a list of probable improvements that will make @code{gawk}
+perform better.
+
+@table @asis
+@item An Improved Version of @code{dfa}
+The @code{dfa} pattern matcher from GNU @code{grep} has some
+problems. Either a new version or a fixed one will deal with some
+important regexp matching issues.
+
+@item Use of @code{mmap}
+On systems that support the @code{mmap} system call, its use would provide
+much faster file input, and considerably simplified input buffer management.
+
+@item Use of GNU @code{malloc}
+The GNU version of @code{malloc} could potentially speed up @code{gawk},
+since it relies heavily on the use of dynamic memory allocation.
+
+@item Use of the @code{rx} regexp library
+The @code{rx} regular expression library could potentially speed up
+all regexp operations that require knowing the exact location of matches.
+This includes record termination, field and array splitting,
+and the @code{sub}, @code{gsub}, @code{gensub} and @code{match} functions.
+@end table
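+
+To show why @code{mmap} simplifies input handling, here is a minimal
+sketch, assuming a POSIX system, that scans a whole file through a single
+mapping. It is not proposed @code{gawk} code; the function
+@code{count_records} is hypothetical and only counts newline characters.
+

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* count_records --- hypothetical helper: count newlines in a file via mmap */

static long
count_records(const char *file)
{
	struct stat sb;
	const char *data;
	long count = 0;
	off_t i;
	int fd;

	if ((fd = open(file, O_RDONLY)) < 0)
		return -1;
	if (fstat(fd, &sb) < 0) {
		close(fd);
		return -1;
	}
	if (sb.st_size == 0) {		/* zero-length mappings are not allowed */
		close(fd);
		return 0;
	}

	data = mmap(NULL, (size_t) sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);			/* the mapping remains valid after close */
	if (data == MAP_FAILED)
		return -1;

	/* the whole file is addressable: no read loop, no buffer refills */
	for (i = 0; i < sb.st_size; i++) {
		if (data[i] == '\n')
			count++;
	}

	munmap((void *) data, (size_t) sb.st_size);
	return count;
}
```

+
+A real implementation would still need a fallback for input that cannot
+be mapped, such as pipes and terminals.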
+
+@node Improvements, , Future Extensions, Notes
+@appendixsec Suggestions for Improvements
+
+Here are some projects that would-be @code{gawk} hackers might like to take
+on. They vary in size from a few days to a few weeks of programming,
+depending on which one you choose and how fast a programmer you are. Please
+send any improvements you write to the maintainers at the GNU project.
+@xref{Adding Code, , Adding New Features},
+for guidelines to follow when adding new features to @code{gawk}.
+@xref{Bugs, ,Reporting Problems and Bugs}, for information on
+contacting the maintainers.
+
+@enumerate
+@item
+Compilation of @code{awk} programs: @code{gawk} uses a Bison (YACC-like)
+parser to convert the script given it into a syntax tree; the syntax
+tree is then executed by a simple recursive evaluator. This method incurs
+a lot of overhead, since the recursive evaluator performs many procedure
+calls to do even the simplest things.
+
+It should be possible for @code{gawk} to convert the script's parse tree
+into a C program which the user would then compile, using the normal
+C compiler and a special @code{gawk} library to provide all the needed
+functions (regexps, fields, associative arrays, type coercion, and so
+on).
+
+An easier possibility might be for an intermediate phase of @code{gawk} to
+convert the parse tree into a linear byte code form like the one used
+in GNU Emacs Lisp. The recursive evaluator would then be replaced by
+a straight line byte code interpreter that would be intermediate in speed
+between running a compiled program and doing what @code{gawk} does
+now.
+
+@item
+The programs in the test suite could use documenting in this @value{DOCUMENT}.
+
+@item
+See the @file{FUTURES} file for more ideas. Contact us if you would
+seriously like to tackle any of the items listed there.
+@end enumerate
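+
+The difference between a recursive evaluator and a straight line byte code
+interpreter, as discussed in the first item above, can be sketched with a
+toy arithmetic language. None of this comes from the @code{gawk} sources:
+the types @code{struct node} and @code{struct insn} and the functions
+@code{tree_eval}, @code{flatten} and @code{run} are all hypothetical, and
+error checking is omitted.
+

```c
#include <stddef.h>

enum op { OP_PUSH, OP_ADD, OP_MUL };

/* a parse tree node: a constant leaf or a binary operator */
struct node {
	enum op op;
	double value;			/* used when op == OP_PUSH */
	struct node *left, *right;	/* used otherwise */
};

/* one instruction of the linear form */
struct insn {
	enum op op;
	double value;
};

/* tree_eval --- recursive evaluator: one function call per node */

static double
tree_eval(const struct node *n)
{
	switch (n->op) {
	case OP_PUSH:
		return n->value;
	case OP_ADD:
		return tree_eval(n->left) + tree_eval(n->right);
	default:	/* OP_MUL */
		return tree_eval(n->left) * tree_eval(n->right);
	}
}

/* flatten --- emit the tree in postfix order; return the new code length */

static size_t
flatten(const struct node *n, struct insn *code, size_t len)
{
	if (n->op != OP_PUSH) {
		len = flatten(n->left, code, len);
		len = flatten(n->right, code, len);
	}
	code[len].op = n->op;	/* caller must supply a large enough array */
	code[len].value = n->value;
	return len + 1;
}

/* run --- straight line interpreter: a single loop and an operand stack */

static double
run(const struct insn *code, size_t len)
{
	double stack[32];	/* no overflow check in this sketch */
	size_t sp = 0, pc;

	for (pc = 0; pc < len; pc++) {
		switch (code[pc].op) {
		case OP_PUSH:
			stack[sp++] = code[pc].value;
			break;
		case OP_ADD:
			sp--;
			stack[sp - 1] += stack[sp];
			break;
		default:	/* OP_MUL */
			sp--;
			stack[sp - 1] *= stack[sp];
			break;
		}
	}
	return stack[0];
}
```

+
+The tree is flattened once, after parsing; every later evaluation then
+runs the loop in @code{run} instead of making a function call per node.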
+
+@node Glossary, Copying, Notes, Top
+@appendix Glossary
+
+@table @asis
+@item Action
+A series of @code{awk} statements attached to a rule. If the rule's
+pattern matches an input record, @code{awk} executes the
+rule's action. Actions are always enclosed in curly braces.
+@xref{Action Overview, ,Overview of Actions}.
+
+@item Amazing @code{awk} Assembler
+Henry Spencer at the University of Toronto wrote a retargetable assembler
+completely as @code{awk} scripts. It is thousands of lines long, including
+machine descriptions for several eight-bit microcomputers.
+It is a good example of a
+program that would have been better written in another language.
+
+@item Amazingly Workable Formatter (@code{awf})
+Henry Spencer at the University of Toronto wrote a formatter that accepts
+a large subset of the @samp{nroff -ms} and @samp{nroff -man} formatting
+commands, using @code{awk} and @code{sh}.
+
+@item ANSI
+The American National Standards Institute. This organization produces
+many standards, among them the standards for the C and C++ programming
+languages.
+
+@item Assignment
+An @code{awk} expression that changes the value of some @code{awk}
+variable or data object. An object that you can assign to is called an
+@dfn{lvalue}. The assigned values are called @dfn{rvalues}.
+@xref{Assignment Ops, ,Assignment Expressions}.
+
+@item @code{awk} Language
+The language in which @code{awk} programs are written.
+
+@item @code{awk} Program
+An @code{awk} program consists of a series of @dfn{patterns} and
+@dfn{actions}, collectively known as @dfn{rules}. For each input record
+given to the program, the program's rules are all processed in turn.
+@code{awk} programs may also contain function definitions.
+
+@item @code{awk} Script
+Another name for an @code{awk} program.
+
+@item Bash
+The GNU version of the standard shell (the Bourne-Again shell).
+See ``Bourne Shell.''
+
+@item BBS
+See ``Bulletin Board System.''
+
+@item Boolean Expression
+Named after the English mathematician Boole. See ``Logical Expression.''
+
+@item Bourne Shell
+The standard shell (@file{/bin/sh}) on Unix and Unix-like systems,
+originally written by Steven R.@: Bourne.
+Many shells (Bash, @code{ksh}, @code{pdksh}, @code{zsh}) are
+generally upwardly compatible with the Bourne shell.
+
+@item Built-in Function
+The @code{awk} language provides built-in functions that perform various
+numerical, time stamp related, and string computations. Examples are
+@code{sqrt} (for the square root of a number) and @code{substr} (for a
+substring of a string). @xref{Built-in, ,Built-in Functions}.
+
+@item Built-in Variable
+@code{ARGC}, @code{ARGIND}, @code{ARGV}, @code{CONVFMT}, @code{ENVIRON},
+@code{ERRNO}, @code{FIELDWIDTHS}, @code{FILENAME}, @code{FNR}, @code{FS},
+@code{IGNORECASE}, @code{NF}, @code{NR}, @code{OFMT}, @code{OFS}, @code{ORS},
+@code{RLENGTH}, @code{RSTART}, @code{RS}, @code{RT}, and @code{SUBSEP}
+are the variables that have special meaning to @code{awk}.
+Changing some of them affects @code{awk}'s running environment.
+Several of these variables are specific to @code{gawk}.
+@xref{Built-in Variables}.
+
+@item Braces
+See ``Curly Braces.''
+
+@item Bulletin Board System
+A computer system allowing users to log in and read and/or leave messages
+for other users of the system, much like leaving paper notes on a bulletin
+board.
+
+@item C
+The system programming language that most GNU software is written in. The
+@code{awk} programming language has C-like syntax, and this @value{DOCUMENT}
+points out similarities between @code{awk} and C when appropriate.
+
+@cindex ISO 8859-1
+@cindex ISO Latin-1
+@item Character Set
+The set of numeric codes used by a computer system to represent the
+characters (letters, numbers, punctuation, etc.) of a particular country
+or place. The most common character set in use today is ASCII (American
+Standard Code for Information Interchange). Many European
+countries use an extension of ASCII known as ISO-8859-1 (ISO Latin-1).
+
+@item CHEM
+A preprocessor for @code{pic} that reads descriptions of molecules
+and produces @code{pic} input for drawing them. It was written in @code{awk}
+by Brian Kernighan and Jon Bentley, and is available from
+@code{@w{netlib@@research.att.com}}.
+
+@item Compound Statement
+A series of @code{awk} statements, enclosed in curly braces. Compound
+statements may be nested.
+@xref{Statements, ,Control Statements in Actions}.
+
+@item Concatenation
+Concatenating two strings means sticking them together, one after another,
+giving a new string. For example, the string @samp{foo} concatenated with
+the string @samp{bar} gives the string @samp{foobar}.
+@xref{Concatenation, ,String Concatenation}.
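+
+As a minimal illustration (the variable name @code{s} is arbitrary),
+writing two string values next to each other concatenates them:
+
+@example
+$ awk 'BEGIN @{ s = "foo" "bar"; print s @}'
+@print{} foobar
+@end example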
+
+@item Conditional Expression
+An expression using the @samp{?:} ternary operator, such as
+@samp{@var{expr1} ? @var{expr2} : @var{expr3}}. The expression
+@var{expr1} is evaluated; if the result is true, the value of the whole
+expression is the value of @var{expr2}, otherwise the value is
+@var{expr3}. In either case, only one of @var{expr2} and @var{expr3}
+is evaluated. @xref{Conditional Exp, ,Conditional Expressions}.
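+
+The following sketch (the variable names are arbitrary) suggests one
+way to observe that only one branch is evaluated. Because the condition
+is true, @code{i} is incremented and @code{j} is left alone:
+
+@example
+$ awk 'BEGIN @{ i = j = 0; x = (1 ? i++ : j++); print i, j @}'
+@print{} 1 0
+@end example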
+
+@item Comparison Expression
+A relation that is either true or false, such as @samp{(a < b)}.
+Comparison expressions are used in @code{if}, @code{while}, @code{do},
+and @code{for}
+statements, and in patterns to select which input records to process.
+@xref{Typing and Comparison, ,Variable Typing and Comparison Expressions}.
+
+@item Curly Braces
+The characters @samp{@{} and @samp{@}}. Curly braces are used in
+@code{awk} for delimiting actions, compound statements, and function
+bodies.
+
+@item Dark Corner
+An area in the language where specifications often were (or still
+are) not clear, leading to unexpected or undesirable behavior.
+Such areas are marked in this @value{DOCUMENT} with ``(d.c.)'' in the
+text, and are indexed under the heading ``dark corner.''
+
+@item Data Objects
+These are numbers and strings of characters. Numbers are converted into
+strings and vice versa, as needed.
+@xref{Conversion, ,Conversion of Strings and Numbers}.
+
+@item Double Precision
+An internal representation of numbers that can have fractional parts.
+Double precision numbers keep track of more digits than do single precision
+numbers, but operations on them are more expensive. This is the way
+@code{awk} stores numeric values. It is the C type @code{double}.
+
+@item Dynamic Regular Expression
+A dynamic regular expression is a regular expression written as an
+ordinary expression. It could be a string constant, such as
+@code{"foo"}, but it may also be an expression whose value can vary.
+@xref{Computed Regexps, , Using Dynamic Regexps}.
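+
+For instance, in this small sketch (the variable name @code{re} and the
+input are made up for the example), the regexp is held in a variable and
+so could be changed while the program runs:
+
+@example
+$ echo foobar | awk '@{ re = "^fo+"; if ($0 ~ re) print "matched" @}'
+@print{} matched
+@end example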
+
+@item Environment
+A collection of strings, of the form @var{name@code{=}val}, that each
+program has available to it. Users generally place values into the
+environment in order to provide information to various programs. Typical
+examples are the environment variables @code{HOME} and @code{PATH}.
+
+@item Empty String
+See ``Null String.''
+
+@item Escape Sequences
+A special sequence of characters used for describing non-printing
+characters, such as @samp{\n} for newline, or @samp{\033} for the ASCII
+ESC (escape) character. @xref{Escape Sequences}.
+
+@item Field
+When @code{awk} reads an input record, it splits the record into pieces
+separated by whitespace (or by a separator regexp which you can
+change by setting the built-in variable @code{FS}). Such pieces are
+called fields. If the pieces are of fixed length, you can use the built-in
+variable @code{FIELDWIDTHS} to describe their lengths.
+@xref{Field Separators, ,Specifying How Fields are Separated},
+and @ref{Constant Size, , Reading Fixed-width Data}.
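+
+As a brief sketch using made-up input, the first command below splits
+the record on colons by setting @code{FS} (here with the @samp{-F}
+option), and the second uses the @code{gawk}-specific @code{FIELDWIDTHS}
+variable to split fixed-width data:
+
+@example
+$ echo a:b:c | awk -F: '@{ print $2 @}'
+@print{} b
+$ echo abcdefgh | gawk 'BEGIN @{ FIELDWIDTHS = "3 2 3" @} @{ print $2 @}'
+@print{} de
+@end example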
+
+@item Floating Point Number
+Often referred to in mathematical terms as a ``rational'' number, this is
+just a number that can have a fractional part.
+See ``Double Precision'' and ``Single Precision.''
+
+@item Format
+Format strings are used to control the appearance of output in the
+@code{printf} statement. Also, data conversions from numbers to strings
+are controlled by the format string contained in the built-in variable
+@code{CONVFMT}. @xref{Control Letters, ,Format-Control Letters}.
+
+@item Function
+A specialized group of statements used to encapsulate general
+or program-specific tasks. @code{awk} has a number of built-in
+functions, and also allows you to define your own.
+@xref{Built-in, ,Built-in Functions},
+and @ref{User-defined, ,User-defined Functions}.
+
+@item FSF
+See ``Free Software Foundation.''
+
+@item Free Software Foundation
+A non-profit organization dedicated
+to the production and distribution of freely distributable software.
+It was founded by Richard M.@: Stallman, the author of the original
+Emacs editor. GNU Emacs is the most widely used version of Emacs today.
+
+@item @code{gawk}
+The GNU implementation of @code{awk}.
+
+@item General Public License
+This document describes the terms under which @code{gawk} and its source
+code may be distributed (@pxref{Copying, ,GNU GENERAL PUBLIC LICENSE}).
+
+@item GNU
+``GNU's not Unix''. An on-going project of the Free Software Foundation
+to create a complete, freely distributable, POSIX-compliant computing
+environment.
+
+@item GPL
+See ``General Public License.''
+
+@item Hexadecimal
+Base 16 notation, where the digits are @code{0}-@code{9} and
+@code{A}-@code{F}, with @samp{A}
+representing 10, @samp{B} representing 11, and so on up to @samp{F} for 15.
+Hexadecimal numbers are written in C using a leading @samp{0x},
+to indicate their base. Thus, @code{0x12} is 18 (one times 16 plus 2).
+
+@item I/O
+Abbreviation for ``Input/Output,'' the act of moving data into and/or
+out of a running program.
+
+@item Input Record
+A single chunk of data read in by @code{awk}. Usually, an @code{awk} input
+record consists of one line of text.
+@xref{Records, ,How Input is Split into Records}.
+
+@item Integer
+A whole number, i.e.@: a number that does not have a fractional part.
+
+@item Keyword
+In the @code{awk} language, a keyword is a word that has special
+meaning. Keywords are reserved and may not be used as variable names.
+
+@code{gawk}'s keywords are:
+@code{BEGIN},
+@code{END},
+@code{if},
+@code{else},
+@code{while},
+@code{do@dots{}while},
+@code{for},
+@code{for@dots{}in},
+@code{break},
+@code{continue},
+@code{delete},
+@code{next},
+@code{nextfile},
+@code{function},
+@code{func},
+and @code{exit}.
+
+@item Logical Expression
+An expression using the operators for logic, AND, OR, and NOT, written
+@samp{&&}, @samp{||}, and @samp{!} in @code{awk}. Often called Boolean
+expressions, after the mathematician who pioneered this kind of
+mathematical logic.
+
+@item Lvalue
+An expression that can appear on the left side of an assignment
+operator. In most languages, lvalues can be variables or array
+elements. In @code{awk}, a field designator can also be used as an
+lvalue.
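+
+For example (the input line is invented for illustration), assigning to
+a field designator changes that field, and the record is rebuilt to
+reflect the new value:
+
+@example
+$ echo a b c | awk '@{ $2 = "X"; print @}'
+@print{} a X c
+@end example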
+
+@item Null String
+A string with no characters in it. It is represented explicitly in
+@code{awk} programs by placing two double-quote characters next to
+each other (@code{""}). It can appear in input data by having two successive
+occurrences of the field separator appear next to each other.
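+
+As a small sketch with made-up input, two adjacent colons below produce
+an empty second field. The record still has three fields, and the
+length of the second one is zero:
+
+@example
+$ echo a::c | awk -F: '@{ print NF, length($2) @}'
+@print{} 3 0
+@end example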
+
+@item Number
+A numeric valued data object. The @code{gawk} implementation uses double
+precision floating point to represent numbers.
+Very old @code{awk} implementations use single precision floating
+point.
+
+@item Octal
+Base-eight notation, where the digits are @code{0}-@code{7}.
+Octal numbers are written in C using a leading @samp{0},
+to indicate their base. Thus, @code{013} is 11 (one times 8 plus 3).
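+
+The arithmetic in this entry and in the ``Hexadecimal'' entry can be
+checked, as a quick sketch, with the @samp{%x} and @samp{%o}
+format-control letters of @code{printf}, which print the decimal values
+18 and 11 in base 16 and base eight respectively:
+
+@example
+$ awk 'BEGIN @{ printf "%x %o\n", 18, 11 @}'
+@print{} 12 13
+@end example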
+
+@item Pattern
+Patterns tell @code{awk} which input records are interesting to which
+rules.
+
+A pattern is an arbitrary conditional expression against which input is
+tested. If the condition is satisfied, the pattern is said to @dfn{match}
+the input record. A typical pattern might compare the input record against
+a regular expression. @xref{Pattern Overview, ,Pattern Elements}.
+
+@item POSIX
+The name for a series of standards being developed by the IEEE
+that specify a Portable Operating System interface. The ``IX'' denotes
+the Unix heritage of these standards. The main standard of interest for
+@code{awk} users is
+@cite{IEEE Standard for Information Technology, Standard 1003.2-1992,
+Portable Operating System Interface (POSIX) Part 2: Shell and Utilities}.
+Informally, this standard is often referred to as simply ``P1003.2.''
+
+@item Private
+Variables and/or functions that are meant for use exclusively by library
+functions, and not for the main @code{awk} program. Special care must be
+taken when naming such variables and functions.
+@xref{Library Names, , Naming Library Function Global Variables}.
+
+@item Range (of input lines)
+A sequence of consecutive lines from the input file. A pattern
+can specify ranges of input lines for @code{awk} to process, or it can
+specify single lines. @xref{Pattern Overview, ,Pattern Elements}.
+
+@item Recursion
+When a function calls itself, either directly or indirectly.
+If this isn't clear, refer to the entry for ``recursion.''
+
+@item Redirection
+Redirection means performing input from other than the standard input
+stream, or output to other than the standard output stream.
+
+You can redirect the output of the @code{print} and @code{printf} statements
+to a file or a system command, using the @samp{>}, @samp{>>}, and @samp{|}
+operators. You can redirect input to the @code{getline} statement using
+the @samp{<} and @samp{|} operators.
+@xref{Redirection, ,Redirecting Output of @code{print} and @code{printf}},
+and @ref{Getline, ,Explicit Input with @code{getline}}.
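+
+As a short sketch (the choice of @code{sort} and @code{echo} as the
+commands, and the variable name @code{line}, are only for illustration),
+output can be piped into a command, and a command's output can be read
+with @code{getline}:
+
+@example
+$ awk 'BEGIN @{ print "pear\napple" | "sort"; close("sort") @}'
+@print{} apple
+@print{} pear
+$ awk 'BEGIN @{ "echo hello" | getline line; print line @}'
+@print{} hello
+@end example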
+
+@item Regexp
+Short for @dfn{regular expression}. A regexp is a pattern that denotes a
+set of strings, possibly an infinite set. For example, the regexp
+@samp{R.*xp} matches any string starting with the letter @samp{R}
+and ending with the letters @samp{xp}. In @code{awk}, regexps are
+used in patterns and in conditional expressions. Regexps may contain
+escape sequences. @xref{Regexp, ,Regular Expressions}.
+
+@item Regular Expression
+See ``Regexp.''
+
+@item Regular Expression Constant
+A regular expression constant is a regular expression written within
+slashes, such as @code{/foo/}. This regular expression is chosen
+when you write the @code{awk} program, and cannot be changed during
+its execution. @xref{Regexp Usage, ,How to Use Regular Expressions}.
+
+@item Rule
+A segment of an @code{awk} program that specifies how to process single
+input records. A rule consists of a @dfn{pattern} and an @dfn{action}.
+@code{awk} reads an input record; then, for each rule, if the input record
+satisfies the rule's pattern, @code{awk} executes the rule's action.
+Otherwise, the rule does nothing for that input record.
+
+@item Rvalue
+A value that can appear on the right side of an assignment operator.
+In @code{awk}, essentially every expression has a value. These values
+are rvalues.
+
+@item @code{sed}
+See ``Stream Editor.''
+
+@item Short-Circuit
+The nature of the @code{awk} logical operators @samp{&&} and @samp{||}.
+If the value of the entire expression can be deduced from evaluating just
+the left-hand side of these operators, the right-hand side will not
+be evaluated
+(@pxref{Boolean Ops, ,Boolean Expressions}).
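+
+A minimal sketch of this behavior follows (the variable name @code{n}
+is arbitrary). Because the left-hand side of @samp{&&} is false, the
+increment on the right-hand side is never performed, so @code{n} is
+still zero afterwards:
+
+@example
+$ awk 'BEGIN @{ n = 0; if (0 && n++) print "no"; print n @}'
+@print{} 0
+@end example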
+
+@item Side Effect
+A side effect occurs when an expression has an effect aside from merely
+producing a value. Assignment expressions, increment and decrement
+expressions and function calls have side effects.
+@xref{Assignment Ops, ,Assignment Expressions}.
+
+@item Single Precision
+An internal representation of numbers that can have fractional parts.
+Single precision numbers keep track of fewer digits than do double precision
+numbers, but operations on them are less expensive in terms of CPU time.
+This is the type used by some very old versions of @code{awk} to store
+numeric values. It is the C type @code{float}.
+
+@item Space
+The character generated by hitting the space bar on the keyboard.
+
+@item Special File
+A file name interpreted internally by @code{gawk}, instead of being handed
+directly to the underlying operating system. For example, @file{/dev/stderr}.
+@xref{Special Files, ,Special File Names in @code{gawk}}.
+
+@item Stream Editor
+A program that reads records from an input stream and processes them one
+or more at a time. This is in contrast with batch programs, which may
+expect to read their input files in entirety before starting to do
+anything, and with interactive programs, which require input from the
+user.
+
+@item String
+A datum consisting of a sequence of characters, such as @samp{I am a
+string}. Constant strings are written with double-quotes in the
+@code{awk} language, and may contain escape sequences.
+@xref{Escape Sequences}.
+
+@item Tab
+The character generated by hitting the @kbd{TAB} key on the keyboard.
+It usually expands to up to eight spaces upon output.
+
+@item Unix
+A computer operating system originally developed in the early 1970's at
+AT&T Bell Laboratories. It initially became popular in universities around
+the world, and later moved into commercial environments as a software
+development system and network server system. There are many commercial
+versions of Unix, as well as several work-alike systems whose source code
+is freely available (such as Linux, NetBSD, and FreeBSD).
+
+@item Whitespace
+A sequence of space or tab characters occurring inside an input record or a
+string.
+@end table
+
+@node Copying, Index, Glossary, Top
+@unnumbered GNU GENERAL PUBLIC LICENSE
+@center Version 2, June 1991
+
+@display
+Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
+59 Temple Place --- Suite 330, Boston, MA 02111-1307, USA
+
+Everyone is permitted to copy and distribute verbatim copies
+of this license document, but changing it is not allowed.
+@end display
+
+@c fakenode --- for prepinfo
+@unnumberedsec Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software---to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+@iftex
+@c fakenode --- for prepinfo
+@unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+@end iftex
+@ifinfo
+@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+@end ifinfo
+
+@enumerate 0
+@item
+This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The ``Program'', below,
+refers to any such program or work, and a ``work based on the Program''
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term ``modification''.) Each licensee is addressed as ``you''.
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+@item
+You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+@item
+You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+@enumerate a
+@item
+You must cause the modified files to carry prominent notices
+stating that you changed the files and the date of any change.
+
+@item
+You must cause any work that you distribute or publish, that in
+whole or in part contains or is derived from the Program or any
+part thereof, to be licensed as a whole at no charge to all third
+parties under the terms of this License.
+
+@item
+If the modified program normally reads commands interactively
+when run, you must cause it, when started running for such
+interactive use in the most ordinary way, to print or display an
+announcement including an appropriate copyright notice and a
+notice that there is no warranty (or else, saying that you provide
+a warranty) and that users may redistribute the program under
+these conditions, and telling the user how to view a copy of this
+License. (Exception: if the Program itself is interactive but
+does not normally print such an announcement, your work based on
+the Program is not required to print an announcement.)
+@end enumerate
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+@item
+You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+@enumerate a
+@item
+Accompany it with the complete corresponding machine-readable
+source code, which must be distributed under the terms of Sections
+1 and 2 above on a medium customarily used for software interchange; or,
+
+@item
+Accompany it with a written offer, valid for at least three
+years, to give any third party, for a charge no more than your
+cost of physically performing source distribution, a complete
+machine-readable copy of the corresponding source code, to be
+distributed under the terms of Sections 1 and 2 above on a medium
+customarily used for software interchange; or,
+
+@item
+Accompany it with the information you received as to the offer
+to distribute corresponding source code. (This alternative is
+allowed only for non-commercial distribution and only if you
+received the program in object code or executable form with such
+an offer, in accord with Subsection b above.)
+@end enumerate
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+@item
+You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+@item
+You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+@item
+Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+@item
+If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+@item
+If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+@item
+The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and ``any
+later version'', you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+@item
+If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+@iftex
+@c fakenode --- for prepinfo
+@heading NO WARRANTY
+@end iftex
+@ifinfo
+@center NO WARRANTY
+@end ifinfo
+
+@item
+BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW@. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE@. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU@. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+@item
+IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+@end enumerate
+
+@iftex
+@c fakenode --- for prepinfo
+@heading END OF TERMS AND CONDITIONS
+@end iftex
+@ifinfo
+@center END OF TERMS AND CONDITIONS
+@end ifinfo
+
+@page
+@c fakenode --- for prepinfo
+@unnumberedsec How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the ``copyright'' line and a pointer to where the full notice is found.
+
+@smallexample
+@var{one line to give the program's name and an idea of what it does.}
+Copyright (C) 19@var{yy} @var{name of author}
+
+This program is free software; you can redistribute it and/or
+modify it under the terms of the GNU General Public License
+as published by the Free Software Foundation; either version 2
+of the License, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE@. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place --- Suite 330, Boston, MA 02111-1307, USA.
+@end smallexample
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+@smallexample
+Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
+Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
+type `show w'. This is free software, and you are welcome
+to redistribute it under certain conditions; type `show c'
+for details.
+@end smallexample
+
+The hypothetical commands @samp{show w} and @samp{show c} should show
+the appropriate parts of the General Public License. Of course, the
+commands you use may be called something other than @samp{show w} and
+@samp{show c}; they could even be mouse-clicks or menu items---whatever
+suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a ``copyright disclaimer'' for the program, if
+necessary. Here is a sample; alter the names:
+
+@smallexample
+@group
+Yoyodyne, Inc., hereby disclaims all copyright
+interest in the program `Gnomovision'
+(which makes passes at compilers) written
+by James Hacker.
+
+@var{signature of Ty Coon}, 1 April 1989
+Ty Coon, President of Vice
+@end group
+@end smallexample
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Library General
+Public License instead of this License.
+
+@node Index, , Copying, Top
+@unnumbered Index
+@printindex cp
+
+@summarycontents
+@contents
+@bye
+
+Unresolved Issues:
+------------------
+1. From ADR.
+
+ Robert J. Chassell points out that awk programs should have some indication
+ of how to use them. It would be useful to perhaps have a "programming
+ style" section of the manual that would include this and other tips.
+
+2. The default AWKPATH search path should be configurable via `configure'.
+ The default and how this changes need to be documented.
+
+Consistency issues:
+ /.../ regexps are in @code, not @samp
+ ".." strings are in @code, not @samp
+ no @print before @dots
+ values of expressions in the text (@code{x} has the value 15),
+ should be in roman, not @code
+ Use tab and not TAB
+ Use ESC and not ESCAPE
+ Use space and not blank to describe the space bar's character
+ The term "blank" is thus basically reserved for "blank lines" etc.
+ The `(d.c.)' should appear inside the closing `.' of a sentence
+ It should come before (pxref{...})
+ " " should have an @w{} around it
+ Use "non-" everywhere
+ Use @code{ftp} when talking about anonymous ftp
+ Use upper-case and lower-case, not "upper case" and "lower case"
+ Use alphanumeric, not alpha-numeric
+ Use --foo, not -Wfoo when describing long options
+ Use findex for all programs and functions in the example chapters
+ Use "Bell Labs" or "AT&T Bell Laboratories", but not
+ "AT&T Bell Labs".
+ Use "behavior" instead of "behaviour".
+ Use "zeros" instead of "zeroes".
+ Use "Input/Output", not "input/output". Also "I/O", not "i/o".
+ Use @code{do}, and not @code{do}-@code{while}, except where
+ actually discussing the do-while.
+ The words "a", "and", "as", "between", "for", "from", "in", "of",
+ "on", "that", "the", "to", "with", and "without",
+ should not be capitalized in @chapter, @section etc.
+ "Into" and "How" should.
+ Search for @dfn; make sure important items are also indexed.
+ "e.g." should always be followed by a comma.
+ "i.e." should never be followed by a comma, and should be followed
+ by `@:'.
+ The numbers zero through ten should be spelled out, except when
+ talking about file descriptor numbers. > 10 and < 0, it's
+ ok to use numbers.
+ In tables, put command line options in @code, while in the text,
+ put them in @samp.
+ When using @strong, use "Note:" or "Caution:" with colons and
+ not exclamation points. Do not surround the paragraphs
+ with @quotation ... @end quotation.
+
+Date: Wed, 13 Apr 94 15:20:52 -0400
+From: rms@gnu.ai.mit.edu (Richard Stallman)
+To: gnu-prog@gnu.ai.mit.edu
+Subject: A reminder: no pathnames in GNU
+
+It's a GNU convention to use the term "file name" for the name of a
+file, never "pathname". We use the term "path" for search paths,
+which are lists of file names. Using it for a single file name as
+well is potentially confusing to users.
+
+So please check any documentation you maintain, if you think you might
+have used "pathname".
+
+Note that "file name" should be two words when it appears as ordinary
+text. It's ok as one word when it's a metasyntactic variable, though.
+
+Suggestions:
+------------
+Enhance FIELDWIDTHS with some way to indicate "the rest of the record".
+E.g., a length of 0 or -1 or something. Maybe "n"?
+
+Make FIELDWIDTHS be an array?
+
+What if FIELDWIDTHS has invalid values in it?