Tag for 2.1.6 release

git-svn-id: svn://svn.code.sf.net/p/pyparsing/code/tags/pyparsing_2.1.6@402 9bf210a0-9d2d-494c-87cf-cfb32e7dff7b
author: ptmcg <ptmcg@9bf210a0-9d2d-494c-87cf-cfb32e7dff7b> 2016-08-09 00:23:49 +0000
committer: ptmcg <ptmcg@9bf210a0-9d2d-494c-87cf-cfb32e7dff7b> 2016-08-09 00:23:49 +0000
commit: b2c3ade75384efe76b8774b607e17fe98fab92ef (patch)
tree: b162262f3f0a4bc976d45bed08ccfc6cc9a2eb23 /trunk/src/HowToUsePyparsing.txt
parent: 0be19d2d8545f1ac4b93ffd0d10524613837ba39 (diff)
1 files changed, 993 insertions, 0 deletions
diff --git a/trunk/src/HowToUsePyparsing.txt b/trunk/src/HowToUsePyparsing.txt
new file mode 100644
index 0000000..71d5313
--- /dev/null
+++ b/trunk/src/HowToUsePyparsing.txt
@@ -0,0 +1,993 @@
+==========================
+Using the pyparsing module
+==========================
+
+:author: Paul McGuire
+:address: ptmcg@users.sourceforge.net
+
+:revision: 2.0.1
+:date: July, 2013
+
+:copyright: Copyright |copy| 2003-2013 Paul McGuire. 
+
+.. |copy| unicode:: 0xA9
+
+:abstract: This document provides how-to instructions for the
+    pyparsing library, an easy-to-use Python module for constructing
+    and executing basic text parsers.  The pyparsing module is useful
+    for evaluating user-definable
+    expressions, processing custom application language commands, or
+    extracting data from formatted reports.
+
+.. sectnum::    :depth: 4
+
+.. contents::   :depth: 4
+
+
+Steps to follow
+===============
+
+To parse an incoming data string, the client code must follow these steps:
+
+1. First define the tokens and patterns to be matched, and assign
+   this to a program variable.  Optional results names or parsing
+   actions can also be defined at this time.
+
+2. Call ``parseString()`` or ``scanString()`` on this variable, passing in 
+   the string to
+   be parsed.  During the matching process, whitespace between
+   tokens is skipped by default (although this can be changed).
+   When token matches occur, any defined parse action methods are
+   called.
+
+3. Process the parsed results, returned as a list of strings.
+   Matching results may also be accessed as named attributes of
+   the returned results, if names are defined in the definition of
+   the token pattern, using ``setResultsName()``.
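+
+As a minimal sketch of these three steps (the names ``assignment``, ``key``,
+and ``value`` are illustrative assumptions, not part of pyparsing)::
+
+    from pyparsing import Word, alphas, nums
+
+    # step 1: define the grammar, attaching optional results names
+    assignment = Word( alphas )("key") + "=" + Word( nums )("value")
+
+    # step 2: parse an input string (whitespace between tokens is skipped)
+    result = assignment.parseString( "answer = 42" )
+
+    # step 3: process the results, as a list or by name
+    print(result.asList())    # ['answer', '=', '42']
+    print(result.key)
+    print(result["value"])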
+
+
+Hello, World!
+-------------
+
+The following complete Python program will parse the greeting "Hello, World!",
+or any other greeting of the form "<salutation>, <addressee>!"::
+
+    from pyparsing import Word, alphas
+    
+    greet = Word( alphas ) + "," + Word( alphas ) + "!"
+    greeting = greet.parseString( "Hello, World!" )
+    print(greeting)
+    
+The parsed tokens are returned in the following form::
+
+    ['Hello', ',', 'World', '!']
+    
+
+Usage notes
+-----------
+
+- The pyparsing module can be used to interpret simple command
+  strings or algebraic expressions, or can be used to extract data
+  from text reports with complicated format and structure ("screen
+  or report scraping").  However, it is possible that your defined
+  matching patterns may accept invalid inputs.  Use pyparsing to
+  extract data from strings assumed to be well-formatted.  
+
+- To keep up the readability of your code, use operators_  such as ``+``, ``|``, 
+  ``^``, and ``~`` to combine expressions.  You can also combine
+  string literals with ParseExpressions - they will be
+  automatically converted to Literal objects.  For example::
+  
+    integer  = Word( nums )            # simple unsigned integer
+    variable = Word( alphas, max=1 )   # single letter variable, such as x, z, m, etc.
+    arithOp  = Word( "+-*/", max=1 )   # arithmetic operators
+    equation = variable + "=" + integer + arithOp + integer    # will match "x=2+2", etc.
+
+  In the definition of ``equation``, the string ``"="`` will get added as
+  a ``Literal("=")``, but in a more readable way.
+
+- The pyparsing module's default behavior is to ignore whitespace.  This is
+  what the great majority of grammars need, and it allows you to write simple, clean
+  grammars, such as the above ``equation``, without having to clutter them up with
+  extraneous ``ws`` markers.  The ``equation`` grammar will successfully parse all of the
+  following statements::
+    
+    x=2+2
+    x = 2+2
+    a = 10   *   4
+    r= 1234/ 100000
+    
+  Of course, it is quite simple to extend this example to support more elaborate expressions, with
+  nesting with parentheses, floating point numbers, scientific notation, and named constants 
+  (such as ``e`` or ``pi``).  See ``fourFn.py``, included in the examples directory.
+
+- To modify pyparsing's default whitespace skipping, you can use one or
+  more of the following methods:
+  
+  - use the static method ``ParserElement.setDefaultWhitespaceChars``
+    to override the normal set of whitespace chars (' \n\t\r').  For instance
+    when defining a grammar in which newlines are significant, you should
+    call ``ParserElement.setDefaultWhitespaceChars(' \t')`` to remove 
+    newline from the set of skippable whitespace characters.  Calling
+    this method will affect all pyparsing expressions defined afterward.
+    
+  - call ``leaveWhitespace()`` on individual expressions, to suppress the 
+    skipping of whitespace before trying to match the expression
+    
+  - use ``Combine`` to require that successive expressions must be
+    adjacent in the input string.  For instance, this expression::
+    
+      real = Word(nums) + '.' + Word(nums)
+    
+    will match "3.14159", but will also match "3 . 12".  It will also 
+    return the matched results as ['3', '.', '14159'].  By changing this
+    expression to::
+    
+      real = Combine( Word(nums) + '.' + Word(nums) )
+    
+    it will not match numbers with embedded spaces, and it will return a
+    single concatenated string '3.14159' as the parsed token.
+
+- Repetition of expressions can be indicated using the '*' operator.  An
+  expression may be multiplied by an integer value (to indicate an exact
+  repetition count), or by a tuple containing
+  two integers, or None and an integer, representing min and max repetitions
+  (with None representing no min or no max, depending whether it is the first or
+  second tuple element).  See the following examples, where n is used to 
+  indicate an integer value:
+
+  - ``expr*3`` is equivalent to ``expr + expr + expr``
+  
+  - ``expr*(2,3)`` is equivalent to ``expr + expr + Optional(expr)``
+  
+  - ``expr*(n,None)`` or ``expr*(n,)`` is equivalent
+    to ``expr*n + ZeroOrMore(expr)`` (read as "at least n instances of expr")
+    
+  - ``expr*(None,n)`` is equivalent to ``expr*(0,n)``
+    (read as "0 to n instances of expr")
+    
+  - ``expr*(None,None)`` is equivalent to ``ZeroOrMore(expr)``
+  
+  - ``expr*(1,None)`` is equivalent to ``OneOrMore(expr)``
+
+  Note that ``expr*(None,n)`` does not raise an exception if
+  more than n exprs exist in the input stream; that is,
+  ``expr*(None,n)`` does not enforce a maximum number of expr
+  occurrences.  If this behavior is desired, then write
+  ``expr*(None,n) + ~expr``.
+  
+- ``MatchFirst`` expressions are matched left-to-right, and the first
+  match found will skip all later expressions within, so be sure
+  to define less-specific patterns after more-specific patterns.
+  If you are not sure which expressions are most specific, use Or
+  expressions (defined using the ``^`` operator) - they will always
+  match the longest expression, although they are more
+  compute-intensive.
+  
+- ``Or`` expressions will evaluate all of the specified subexpressions
+  to determine which is the "best" match, that is, which matches
+  the longest string in the input data.  In case of a tie, the
+  left-most expression in the ``Or`` list will win.
+
+- If parsing the contents of an entire file, pass it to the
+  ``parseFile`` method using::
+    
+    expr.parseFile( sourceFile )
+    
+- ``ParseExceptions`` will report the location where an expected token
+  or expression failed to match.  For example, if we tried to use our
+  "Hello, World!" parser to parse "Hello World!" (leaving out the separating
+  comma), we would get an exception, with the message::
+  
+    pyparsing.ParseException: Expected "," (6), (1,7)
+  
+  In the case of complex
+  expressions, the reported location may not be exactly where you
+  would expect.  See more information under ParseException_ .
+
+- Use the ``Group`` class to enclose logical groups of tokens within a
+  sublist.  This will help organize your results into more
+  hierarchical form (the default behavior is to return matching
+  tokens as a flat list of matching input strings).
+  
+- Punctuation may be significant for matching, but is rarely of
+  much interest in the parsed results.  Use the ``suppress()`` method
+  to keep these tokens from cluttering up your returned lists of
+  tokens.  For example, ``delimitedList()`` matches a succession of
+  one or more expressions, separated by delimiters (commas by
+  default), but only returns a list of the actual expressions -
+  the delimiters are used for parsing, but are suppressed from the
+  returned output.
+  
+- Parse actions can be used to convert values from strings to
+  other data types (ints, floats, booleans, etc.).
+  
+- Results names are recommended for retrieving tokens from complex
+  expressions.  It is much easier to access a token using its field
+  name than using a positional index, especially if the expression 
+  contains optional elements.  You can also shortcut
+  the ``setResultsName`` call::
+  
+    stats = "AVE:" + realNum.setResultsName("average") + \
+            "MIN:" + realNum.setResultsName("min") + \
+            "MAX:" + realNum.setResultsName("max")  
+
+  can now be written as this::
+  
+    stats = "AVE:" + realNum("average") + \
+            "MIN:" + realNum("min") + \
+            "MAX:" + realNum("max")  
+  
+- Be careful when defining parse actions that modify global variables or
+  data structures (as in ``fourFn.py``), especially for low level tokens 
+  or expressions that may occur within an ``And`` expression; an early element 
+  of an ``And`` may match, but the overall expression may fail.
+
+- Performance of pyparsing may be slow for complex grammars and/or large
+  input strings.  The psyco_ package can be used to improve the speed of the
+  pyparsing module with no changes to grammar or program logic - observed
+  improvements have been in the 20-50% range.
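+
+Several of the notes above (``Group``, ``suppress()``, parse actions, and
+results names) can be combined in one short sketch.  The names ``integer``,
+``point``, ``x``, and ``y`` are illustrative assumptions, not part of pyparsing::
+
+    from pyparsing import Word, nums, Group, Literal, OneOrMore
+
+    # the parse action converts each matched integer from str to int
+    integer = Word( nums ).setParseAction( lambda s,l,t: [ int(t[0]) ] )
+    lpar, rpar, comma = [ Literal(c).suppress() for c in "()," ]
+
+    # each point is returned as its own sublist, with punctuation suppressed
+    point = Group( lpar + integer("x") + comma + integer("y") + rpar )
+    result = OneOrMore( point ).parseString( "(1,2) (3,4)" )
+
+    print(result.asList())    # [[1, 2], [3, 4]]
+    print(result[1].y)        # 4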
+
+.. _psyco: http://psyco.sourceforge.net/
+
+
+Classes
+=======
+
+Classes in the pyparsing module
+-------------------------------
+
+``ParserElement`` - abstract base class for all pyparsing classes;
+the methods intended for use by client code are:
+
+- ``parseString( sourceString, parseAll=False )`` - only called once, on the overall
+  matching pattern; returns a ParseResults_ object that makes the
+  matched tokens available as a list, and optionally as a dictionary, 
+  or as an object with named attributes; if parseAll is set to True, then
+  parseString will raise a ParseException if the grammar does not process
+  the complete input string.
+
+- ``parseFile( sourceFile )`` - a convenience function, that accepts an
+  input file object or filename.  The file contents are passed as a 
+  string to ``parseString()``.  ``parseFile`` also supports the ``parseAll`` argument.
+  
+- ``scanString( sourceString )`` - generator function, used to find and
+  extract matching text in the given source string; for each matched text, 
+  returns a tuple of:
+  
+  - matched tokens (packaged as a ParseResults_ object)
+  
+  - start location of the matched text in the given source string
+  
+  - end location in the given source string
+  
+  ``scanString`` allows you to scan through the input source string for
+  scattered matches, instead of exhaustively defining the grammar for the entire
+  source text (as would be required with ``parseString``).
+
+- ``transformString( sourceString )`` - convenience wrapper function for
+  ``scanString``, to process the input source string, and replace matching
+  text with the tokens returned from parse actions defined in the grammar
+  (see setParseAction_).
+
+- ``searchString( sourceString )`` - another convenience wrapper function for
+  ``scanString``, returns a list of the matching tokens returned from each
+  call to ``scanString``.
+
+- ``setName( name )`` - associate a short descriptive name for this
+  element, useful in displaying exceptions and trace information
+
+- ``setResultsName( string, listAllMatches=False )`` - name to be given 
+  to tokens matching
+  the element; if multiple tokens match within
+  a repetition group (such as ``ZeroOrMore`` or ``delimitedList``) the
+  default is to return only the last matching token - if listAllMatches
+  is set to True, then a list of all the matching tokens is returned. 
+  (New in 1.5.6 - a results name with a trailing '*' character will be
+  interpreted as setting listAllMatches to True.)
+  Note: 
+  ``setResultsName`` returns a *copy* of the element so that a single
+  basic element can be referenced multiple times and given
+  different names within a complex grammar.
+
+.. _setParseAction:
+
+- ``setParseAction( *fn )`` - specify one or more functions to call after successful
+  matching of the element; each function is defined as ``fn( s,
+  loc, toks )``, where:
+  
+  - ``s`` is the original parse string
+  
+  - ``loc`` is the location in the string where matching started
+  
+  - ``toks`` is the list of the matched tokens, packaged as a ParseResults_ object
+  
+  Multiple functions can be attached to a ParserElement by specifying multiple
+  arguments to setParseAction, or by calling setParseAction multiple times.
+  
+  Each parse action function can return a modified ``toks`` list, to perform conversion, or
+  string modifications.  For brevity, ``fn`` may also be a
+  lambda - here is an example of using a parse action to convert matched
+  integer tokens from strings to integers::
+  
+    intNumber = Word(nums).setParseAction( lambda s,l,t: [ int(t[0]) ] )
+
+  If ``fn`` does not modify the ``toks`` list, it does not need to return
+  anything at all.
+
+- ``setBreak( breakFlag=True )`` - if breakFlag is True, calls pdb.set_trace()
+  as this expression is about to be parsed
+
+- ``copy()`` - returns a copy of a ParserElement; can be used to use the same
+  parse expression in different places in a grammar, with different parse actions
+  attached to each
+
+- ``leaveWhitespace()`` - change default behavior of skipping
+  whitespace before starting matching (mostly used internally to the 
+  pyparsing module, rarely used by client code)
+
+- ``setWhitespaceChars( chars )`` - define the set of chars to be ignored
+  as whitespace before trying to match a specific ParserElement, in place of the
+  default set of whitespace (space, tab, newline, and return)
+
+- ``setDefaultWhitespaceChars( chars )`` - class-level method to override
+  the default set of whitespace chars for all subsequently created ParserElements
+  (including copies); useful when defining grammars that treat one or more of the
+  default whitespace characters as significant (such as a line-sensitive grammar, to 
+  omit newline from the list of ignorable whitespace)
+
+- ``suppress()`` - convenience function to suppress the output of the
+  given element, instead of wrapping it with a Suppress object.
+
+- ``ignore( expr )`` - function to specify parse expression to be
+  ignored while matching defined patterns; can be called
+  repeatedly to specify multiple expressions; useful to specify
+  patterns of comment syntax, for example
+
+- ``setDebug( dbgFlag=True )`` - function to enable/disable tracing output 
+  when trying to match this element
+
+- ``validate()`` - function to verify that the defined grammar does not
+  contain infinitely recursive constructs
+  
+.. _parseWithTabs:
+
+- ``parseWithTabs()`` - function to override default behavior of converting
+  tabs to spaces before parsing the input string; rarely used, except when
+  specifying whitespace-significant grammars using the White_ class.
+
+- ``enablePackrat()`` - a class-level static method to enable a memoizing
+  performance enhancement, known as "packrat parsing".  Packrat parsing is 
+  disabled by default, since it may conflict with some user programs that use
+  parse actions.  To activate the packrat feature, your
+  program must call the class method ParserElement.enablePackrat().  If
+  your program uses psyco to "compile as you go", you must call 
+  enablePackrat before calling psyco.full().  If you do not do this,
+  Python will crash.  For best results, call enablePackrat() immediately
+  after importing pyparsing.
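+
+As a rough sketch of how ``scanString``, ``searchString``, and
+``transformString`` differ (``integer`` and ``bracketed`` are illustrative
+names, and the sample text is invented)::
+
+    from pyparsing import Word, nums
+
+    integer = Word( nums )
+    text = "order 66 shipped 3 items"
+
+    # scanString yields (tokens, start, end) for each match found
+    for tokens, start, end in integer.scanString( text ):
+        print(tokens[0], start, end)
+
+    # searchString collects just the matched tokens
+    print(integer.searchString( text ).asList())    # [['66'], ['3']]
+
+    # transformString replaces each match with the parse action's result
+    bracketed = Word( nums ).setParseAction( lambda s,l,t: "<" + t[0] + ">" )
+    print(bracketed.transformString( text ))    # order <66> shipped <3> items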
+     
+
+Basic ParserElement subclasses
+------------------------------
+
+- ``Literal`` - construct with a string to be matched exactly
+
+- ``CaselessLiteral`` - construct with a string to be matched, but
+  without case checking; results are always returned as the
+  defining literal, NOT as they are found in the input string
+
+- ``Keyword`` - similar to Literal, but must be immediately followed by
+  whitespace, punctuation, or other non-keyword characters; prevents
+  accidental matching of a non-keyword that happens to begin with a
+  defined keyword
+  
+- ``CaselessKeyword`` - similar to Keyword, but with caseless matching
+  behavior
+  
+.. _Word:
+
+- ``Word`` - one or more contiguous characters; construct with a
+  string containing the set of allowed initial characters, and an
+  optional second string of allowed body characters; for instance,
+  a common Word construct is to match a code identifier - in C, a
+  valid identifier must start with an alphabetic character or an 
+  underscore ('_'), followed by a body that can also include numeric
+  digits.  That is, ``a``, ``i``, ``MAX_LENGTH``, ``_a1``, ``b_109_``, and 
+  ``plan9FromOuterSpace``
+  are all valid identifiers; ``9b7z``, ``$a``, ``.section``, and ``0debug``
+  are not.  To
+  define an identifier using a Word, use either of the following::
+  
+    Word( alphas+"_", alphanums+"_" )
+    Word( srange("[a-zA-Z_]"), srange("[a-zA-Z0-9_]") )
+  
+  If only one
+  string given, it specifies that the same character set defined
+  for the initial character is used for the word body; for instance, to
+  define an identifier that can only be composed of capital letters and
+  underscores, use::
+  
+    Word( "ABCDEFGHIJKLMNOPQRSTUVWXYZ_" )
+    Word( srange("[A-Z_]") )
+
+  A Word may
+  also be constructed with any of the following optional parameters:
+  
+  - ``min`` - indicating a minimum length of matching characters
+  
+  - ``max`` - indicating a maximum length of matching characters
+  
+  - ``exact`` - indicating an exact length of matching characters
+
+  If ``exact`` is specified, it will override any values for ``min`` or ``max``.
+  
+  New in 1.5.6 - Sometimes you want to define a word using all 
+  characters in a range except for one or two of them; you can do this
+  with the new ``excludeChars`` argument. This is helpful if you want to define
+  a word with all printables except for a single delimiter character, such
+  as '.'. Previously, you would have to create a custom string to pass to Word.
+  With this change, you can just create ``Word(printables, excludeChars='.')``.
+
+- ``CharsNotIn`` - similar to Word_, but matches characters not
+  in the given constructor string (accepts only one string for both
+  initial and body characters); also supports ``min``, ``max``, and ``exact``
+  optional parameters.
+
+- ``Regex`` - a powerful construct, that accepts a regular expression
+  to be matched at the current parse position; accepts an optional
+  ``flags`` parameter, corresponding to the flags parameter in the re.compile
+  method; if the expression includes named sub-fields, they will be 
+  represented in the returned ParseResults_
+
+- ``QuotedString`` - supports the definition of custom quoted string
+  formats, in addition to pyparsing's built-in ``dblQuotedString`` and
+  ``sglQuotedString``.  ``QuotedString`` allows you to specify the following 
+  parameters:
+  
+  - ``quoteChar`` - string of one or more characters defining the quote delimiting string
+  
+  - ``escChar`` - character to escape quotes, typically backslash (default=None)
+  
+  - ``escQuote`` - special quote sequence to escape an embedded quote string (such as SQL's "" to escape an embedded ") (default=None)
+  
+  - ``multiline`` - boolean indicating whether quotes can span multiple lines (default=False)
+  
+  - ``unquoteResults`` - boolean indicating whether the matched text should be unquoted (default=True)
+  
+  - ``endQuoteChar`` - string of one or more characters defining the end of the quote delimited string (default=None => same as quoteChar)
+    
+- ``SkipTo`` - skips ahead in the input string, accepting any
+  characters up to the specified pattern; may be constructed with 
+  the following optional parameters:
+  
+  - ``include`` - if set to true, also consumes the match expression
+    (default is false)
+  
+  - ``ignore`` - allows the user to specify patterns to not be matched,
+    to prevent false matches
+  
+  - ``failOn`` - if a literal string or expression is given for this argument, it defines an expression that
+    should cause the ``SkipTo`` expression to fail, and not skip over that expression
+
+.. _White:
+
+- ``White`` - also similar to Word_, but matches whitespace 
+  characters.  Not usually needed, as whitespace is implicitly
+  ignored by pyparsing.  However, some grammars are whitespace-sensitive,
+  such as those that use leading tabs or spaces to indicate grouping
+  or hierarchy.  (If matching on tab characters, be sure to call 
+  parseWithTabs_ on the top-level parse element.)
+  
+- ``Empty`` - a null expression, requiring no characters - will always
+  match; useful for debugging and for specialized grammars
+  
+- ``NoMatch`` - opposite of Empty, will never match; useful for debugging
+  and for specialized grammars
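+
+The distinction between ``Literal`` and ``Keyword`` drawn above can be
+sketched as follows (the sample input is invented for illustration)::
+
+    from pyparsing import Literal, Keyword, ParseException
+
+    # Literal matches the leading "if" of "ifdef"
+    print(Literal( "if" ).parseString( "ifdef" ).asList())    # ['if']
+
+    # Keyword refuses, because "if" is followed by more identifier characters
+    try:
+        Keyword( "if" ).parseString( "ifdef" )
+    except ParseException as err:
+        print(err)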
+
+
+Expression subclasses
+---------------------
+
+- ``And`` - construct with a list of ParserElements, all of which must
+  match for And to match; can also be created using the '+'
+  operator; multiple expressions can be Anded together using the '*'
+  operator as in::
+  
+    ipAddress = Word(nums) + ('.'+Word(nums))*3
+    
+  A tuple can be used as the multiplier, indicating a min/max::
+  
+    usPhoneNumber = Word(nums) + ('-'+Word(nums))*(1,2)
+
+  A special form of ``And`` is created if the '-' operator is used 
+  instead of the '+' operator.  In the ipAddress example above, if
+  no trailing '.' and Word(nums) are found after matching the initial
+  Word(nums), then pyparsing will back up in the grammar and try other
+  alternatives to ipAddress.  However, if ipAddress is defined as::
+  
+    strictIpAddress = Word(nums) - ('.'+Word(nums))*3
+    
+  then no backing up is done.  If the first Word(nums) of strictIpAddress
+  is matched, then any mismatch after that will raise a ParseSyntaxException,
+  which will halt the parsing process immediately.  By careful use of the
+  '-' operator, grammars can provide meaningful error messages close to 
+  the location where the incoming text does not match the specified
+  grammar.
+
+- ``Or`` - construct with a list of ParserElements, at least one of which must
+  match for Or to match; if more than one expression matches, the
+  expression that makes the longest match will be used; can also
+  be created using the '^' operator
+
+- ``MatchFirst`` - construct with a list of ParserElements, at least one of
+  which must match for MatchFirst to match; matching is done
+  left-to-right, taking the first expression that matches; can
+  also be created using the '|' operator
+
+- ``Each`` - similar to And, in that all of the provided expressions
+  must match; however, Each permits matching to be done in any order;
+  can also be created using the '&' operator
+  
+- ``Optional`` - construct with a ParserElement, but this element is
+  not required to match; can be constructed with an optional ``default`` argument,
+  containing a default string or object to be supplied if the given optional
+  parse element is not found in the input string; parse action will only
+  be called if a match is found, or if a default is specified
+
+- ``ZeroOrMore`` - similar to Optional, but can be repeated
+
+- ``OneOrMore`` - similar to ZeroOrMore, but at least one match must
+  be present
+
+- ``FollowedBy`` - a lookahead expression, requires matching of the given
+  expressions, but does not advance the parsing position within the input string
+
+- ``NotAny`` - a negative lookahead expression, prevents matching of the given
+  expressions, does not advance the parsing position within the input string; 
+  can also be created using the unary '~' operator
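+
+The difference between ``MatchFirst`` and ``Or`` can be seen in a short
+sketch (``integer`` and ``real`` are illustrative names)::
+
+    from pyparsing import Word, nums, Combine
+
+    integer = Word( nums )
+    real = Combine( Word( nums ) + "." + Word( nums ) )
+
+    # MatchFirst: integer is tried first and matches, so only "3" is consumed
+    print(( integer | real ).parseString( "3.14" ).asList())   # ['3']
+
+    # Or: both alternatives are tried, and the longer match wins
+    print(( integer ^ real ).parseString( "3.14" ).asList())   # ['3.14']
+
+Listing ``real`` before ``integer`` in the ``MatchFirst`` would give the
+same result as the ``Or``, at lower cost.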
+
+
+.. _operators:
+
+Expression operators
+--------------------
+
+- ``~`` - creates NotAny using the expression after the operator
+
+- ``+`` - creates And using the expressions before and after the operator
+
+- ``|`` - creates MatchFirst (first left-to-right match) using the expressions before and after the operator
+
+- ``^`` - creates Or (longest match) using the expressions before and after the operator
+
+- ``&`` - creates Each using the expressions before and after the operator
+
+- ``*`` - creates And by multiplying the expression by the integer operand; if
+  expression is multiplied by a 2-tuple, creates an And of (min,max)
+  expressions (similar to "{min,max}" form in regular expressions); if
+  min is None, interpret as (0,max); if max is None, interpret as 
+  expr*min + ZeroOrMore(expr)
+    
+- ``-`` - like ``+`` but with no backup and retry of alternatives
+
+- ``*`` - repetition of expression
+
+- ``==`` - matching expression to string; returns True if the string matches the given expression
+
+- ``<<=`` - inserts the expression following the operator as the body of the 
+  Forward expression before the operator
+
+
+
+Positional subclasses
+---------------------
+
+- ``StringStart`` - matches beginning of the text
+
+- ``StringEnd`` - matches the end of the text
+
+- ``LineStart`` - matches beginning of a line (lines delimited by ``\n`` characters)
+
+- ``LineEnd`` - matches the end of a line
+
+- ``WordStart`` - matches a leading word boundary
+
+- ``WordEnd`` - matches a trailing word boundary
+
+
+
+Converter subclasses
+--------------------
+
+- ``Upcase`` - converts matched tokens to uppercase (deprecated -
+  use ``upcaseTokens`` parse action instead)
+
+- ``Combine`` - joins all matched tokens into a single string, using
+  specified joinString (default ``joinString=""``); expects
+  all matching tokens to be adjacent, with no intervening
+  whitespace (can be overridden by specifying ``adjacent=False`` in constructor)
+
+- ``Suppress`` - clears matched tokens; useful to keep returned
+  results from being cluttered with required but uninteresting
+  tokens (such as list delimiters)
+
+
+Special subclasses
+------------------
+
+- ``Group`` - causes the matched tokens to be enclosed in a list;
+  useful in repeated elements like ``ZeroOrMore`` and ``OneOrMore`` to
+  break up matched tokens into groups for each repeated pattern
+
+- ``Dict`` - like ``Group``, but also constructs a dictionary, using the
+  [0]'th elements of all enclosed token lists as the keys, and
+  each token list as the value
+
+- ``SkipTo`` - catch-all matching expression that accepts all characters
+  up until the given pattern is found to match; useful for specifying
+  incomplete grammars
+
+- ``Forward`` - placeholder token used to define recursive token
+  patterns; when defining the actual expression later in the
+  program, insert it into the ``Forward`` object using the ``<<=``
+  operator (see ``fourFn.py`` for an example).
+
+
+Other classes
+-------------
+.. _ParseResults:
+
+- ``ParseResults`` - class used to contain and manage the lists of tokens
+  created from parsing the input using the user-defined parse 
+  expression.  ParseResults can be accessed in a number of ways:
+
+  - as a list
+  
+    - total list of elements can be found using len()
+    
+    - individual elements can be found using [0], [1], [-1], etc.
+    
+    - elements can be deleted using ``del``
+    
+    - the last element can be extracted and removed in a single operation
+      using ``pop()``, or any element can be extracted and removed 
+      using ``pop(n)``
+    
+  - as a dictionary
+  
+    - if ``setResultsName()`` is used to name elements within the 
+      overall parse expression, then these fields can be referenced
+      as dictionary elements or as attributes
+      
+    - the Dict class generates dictionary entries using the data of the
+      input text - in addition to ParseResults listed as ``[ [ a1, b1, c1, ...], [ a2, b2, c2, ...]  ]``
+      it also acts as a dictionary with entries defined as ``{ a1 : [ b1, c1, ... ] }, { a2 : [ b2, c2, ... ] }``; 
+      this is especially useful when processing tabular data where the first column contains a key 
+      value for that line of data
+      
+    - list elements that are deleted using ``del`` will still be accessible by their
+      dictionary keys
+      
+    - supports ``get()``, ``items()`` and ``keys()`` methods, similar to a dictionary
+    
+    - a keyed item can be extracted and removed using ``pop(key)``.  Here
+      key must be non-numeric (such as a string), in order to use dict 
+      extraction instead of list extraction.
+      
+    - new named elements can be added (in a parse action, for instance), using the same
+      syntax as adding an item to a dict (``parseResults["X"]="new item"``); named elements can be removed using ``del parseResults["X"]``
+      
+  - as a nested list
+  
+    - results returned from the Group class are encapsulated within their
+      own list structure, so that the tokens can be handled as a hierarchical
+      tree
+      
+  ParseResults can also be converted to an ordinary list of strings
+  by calling ``asList()``.  Note that this will strip the results of any
+  field names that have been defined for any embedded parse elements.
+  (The ``pprint`` module is especially good at printing out the nested contents
+  given by ``asList()``.)
+  
+  Finally, ParseResults can be converted to an XML string by calling ``asXML()``. Where
+  possible, results will be tagged using the results names defined for the respective
+  ParseExpressions.  ``asXML()`` takes two optional arguments:
+  
+  - ``doctagname`` - for ParseResults that do not have a defined name, this argument
+    will wrap the resulting XML in a set of opening and closing tags ``<doctagname>``
+    and ``</doctagname>``.
+
+  - ``namedItemsOnly`` (default=``False``) - flag to indicate if the generated XML should 
+    skip items that do not have defined names.  If a nested group item is named, then all
+    embedded items will be included, whether they have names or not.
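+
+A brief sketch of the list, dictionary, and attribute styles of access
+described above (the results names ``month`` and ``day`` are illustrative
+assumptions)::
+
+    from pyparsing import Word, alphas, nums
+
+    date = Word( alphas )("month") + Word( nums )("day")
+    result = date.parseString( "August 9" )
+
+    print(len(result), result[0], result[-1])    # as a list
+    print(result["month"])                       # as a dictionary
+    print(result.day)                            # as an attribute
+    print(result.get( "year", "unknown" ))       # missing name, default returned
+    print(result.asList())                       # ['August', '9']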
+
+
+Exception classes and Troubleshooting
+-------------------------------------
+
+.. _ParseException:
+
+- ``ParseException`` - exception raised when a grammar parse fails;
+  ParseExceptions have attributes loc, msg, line, lineno, and column; to view the 
+  text line and location where the reported ParseException occurs, use::
+  
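+    # expr and sourceString stand for your own grammar and input text
+    try:
+        result = expr.parseString( sourceString )
+    except ParseException as err:
+        print(err.line)
+        print(" "*(err.column-1) + "^")
+        print(err)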
+  
+- ``RecursiveGrammarException`` - exception raised by ``validate()`` if
+  the grammar contains a recursive infinite loop, such as::
+  
+    badGrammar = Forward()
+    goodToken = Literal("A")
+    badGrammar <<= Optional(goodToken) + badGrammar
+
+- ``ParseFatalException`` - exception that parse actions can raise to stop parsing
+  immediately.  Should be used when a semantic error is found in the input text, such
+  as a mismatched XML tag.
+
+- ``ParseSyntaxException`` - subclass of ``ParseFatalException`` raised when a
+  syntax error is found, based on the use of the '-' operator when defining
+  a sequence of expressions in an ``And`` expression.
+
+You can also gain insight into the parsing logic by attaching diagnostic parse actions
+or calling ``setDebug()``, or by testing expression fragments on their own with
+``scanString()``.
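
The error-reporting idiom described above can be wrapped in a small helper. This is only a sketch: ``sum_expr`` and ``describe_failure`` are hypothetical names chosen for illustration, and the grammar is deliberately trivial.

```python
from pyparsing import ParseException, Word, nums

# Hypothetical grammar: integer "+" integer.
sum_expr = Word(nums) + "+" + Word(nums)


def describe_failure(text):
    """Return a three-line error report for a failed parse, or None on success."""
    try:
        sum_expr.parseString(text)
    except ParseException as err:
        # err.column is 1-based, so pad with column-1 spaces before the caret.
        marker = " " * (err.column - 1) + "^"
        return "\n".join([err.line, marker, str(err)])
    return None


report = describe_failure("12 + x")
print(report)
```

The first line of the report is the offending input line, the second marks the column where matching stopped, and the third is the exception's own message.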
+
+
+Miscellaneous attributes and methods
+====================================
+
+Helper methods
+--------------
+
+- ``delimitedList( expr, delim=',')`` - convenience function for
+  matching one or more occurrences of expr, separated by delim.
+  By default, the delimiters are suppressed, so the returned results contain
+  only the separate list elements.  Can optionally specify ``combine=True``,
+  indicating that the expressions and delimiters should be returned as one
+  combined value (useful for scoped variables, such as "a.b.c", or 
+  "a::b::c", or paths such as "a/b/c").
+
+- ``countedArray( expr )`` - convenience function for a pattern where a list of
+  instances of the given expression is preceded by an integer giving the count of
+  elements in the list.  Returns an expression that parses the leading integer,
+  reads exactly that many expressions, and returns the array of expressions in the
+  parse results - the leading integer is suppressed from the results (although it
+  is easily reconstructed by calling ``len()`` on the returned array).
+
+- ``oneOf( string, caseless=False )`` - convenience function for quickly declaring an
+  alternative set of ``Literal`` tokens, by splitting the given string on 
+  whitespace boundaries.  The tokens are sorted so that longer
+  matches are attempted first; this ensures that a short token does
+  not mask a longer one that starts with the same characters. If ``caseless=True``, 
+  will create an alternative set of CaselessLiteral tokens.
+
+- ``dictOf( key, value )`` - convenience function for quickly declaring a 
+  dictionary pattern of ``Dict( ZeroOrMore( Group( key + value ) ) )``.
+
+- ``makeHTMLTags( tagName )`` and ``makeXMLTags( tagName )`` - convenience
+  functions to create definitions of opening and closing tag expressions.  Returns
+  a pair of expressions, for the corresponding <tag> and </tag> strings.  Includes
+  support for attributes in the opening tag, such as <tag attr1="abc"> - attributes
+  are returned as keyed tokens in the returned ParseResults.  ``makeHTMLTags`` is less
+  restrictive than ``makeXMLTags``, especially with respect to case sensitivity.
+
+- ``infixNotation(baseOperand, operatorList)`` - (formerly named ``operatorPrecedence``) convenience function to define a 
+  grammar for parsing infix notation 
+  expressions with a hierarchical precedence of operators. To use the ``infixNotation`` 
+  helper:
+  
+  1.  Define the base "atom" operand term of the grammar.
+      In a simple arithmetic grammar, for example, the smallest operand is
+      either an integer or a variable.  This will be the first argument
+      to the ``infixNotation`` function.
+      
+  2.  Define a list of tuples, one for each level of operator
+      precedence.  Each tuple is of the form
+      ``(opExpr, numTerms, rightLeftAssoc, parseAction)``, where:
+      
+      - ``opExpr`` - the pyparsing expression for the operator;
+        may also be a string, which will be converted to a Literal; if
+        None, indicates an empty operator, such as the implied
+        multiplication operation between 'm' and 'x' in "y = mx + b".
+      
+      - ``numTerms`` - the number of terms for this operator (must
+        be 1, 2, or 3)
+      
+      - ``rightLeftAssoc`` - indicates whether the operator is
+        right or left associative, using the pyparsing-defined
+        constants ``opAssoc.RIGHT`` and ``opAssoc.LEFT``.
+      
+      - ``parseAction`` - the parse action to be associated with 
+        expressions matching this operator expression (the
+        ``parseAction`` tuple member may be omitted)
+        
+  3.  Call ``infixNotation`` passing the operand expression and
+      the operator precedence list, and save the returned value
+      as the generated pyparsing expression.  You can then use
+      this expression to parse input strings, or incorporate it
+      into a larger, more complex grammar.
+ 
+- ``matchPreviousLiteral`` and ``matchPreviousExpr`` - functions to define an 
+  expression that matches the same content
+  as was parsed in a previous parse expression.  For instance::
+  
+        first = Word(nums)
+        matchExpr = first + ":" + matchPreviousLiteral(first)
+  
+  will match "1:1", but not "1:2".  Since this matches at the literal
+  level, this will also match the leading "1:1" in "1:10".
+  
+  In contrast::
+  
+        first = Word(nums)
+        matchExpr = first + ":" + matchPreviousExpr(first)
+           
+  will *not* match the leading "1:1" in "1:10"; the expressions are
+  evaluated first, and then compared, so "1" is compared with "10".
+
+- ``nestedExpr(opener="(", closer=")", content=None, ignoreExpr=quotedString)`` - function for defining nested 
+  lists enclosed in opening and closing delimiters.
+
+  - ``opener`` - opening character for a nested list (default="("); can also be a pyparsing expression
+    
+  - ``closer`` - closing character for a nested list (default=")"); can also be a pyparsing expression
+    
+  - ``content`` - expression for items within the nested lists (default=None)
+    
+  - ``ignoreExpr`` - expression for ignoring opening and closing delimiters (default=quotedString)
+
+  If an expression is not provided for the content argument, the nested
+  expression will capture all whitespace-delimited content between delimiters
+  as a list of separate values.
+
+  Use the ignoreExpr argument to define expressions that may contain
+  opening or closing characters that should not be treated as opening
+  or closing characters for nesting, such as quotedString or a comment
+  expression.  Specify multiple expressions using an Or or MatchFirst.
+  The default is quotedString, but if no expressions are to be ignored,
+  then pass None for this argument.
+
+
+- ``indentedBlock( statementExpr, indentationStackVar, indent=True)`` -
+  function to define an indented block of statements, similar to 
+  indentation-based blocking in Python source code:
+  
+  - ``statementExpr`` - the expression defining a statement that
+    will be found in the indented block; a valid ``indentedBlock``
+    must contain at least 1 matching ``statementExpr``
+
+  - ``indentationStackVar`` - a Python list variable; this variable
+    should be common to all ``indentedBlock`` expressions defined
+    within the same grammar, and should be reinitialized to [1]
+    each time the grammar is to be used
+        
+  - ``indent`` - a boolean flag indicating whether the expressions
+    within the block must be indented from the current parse
+    location; if using ``indentedBlock`` to define the left-most
+    statements (all starting in column 1), set ``indent`` to False
+
+.. _originalTextFor:
+
+- ``originalTextFor( expr )`` - helper function to preserve the originally parsed text, regardless of any
+  token processing or conversion done by the contained expression.  For instance, the following expression::
+  
+        fullName = Word(alphas) + Word(alphas)
+
+  will return the parse of "John Smith" as ['John', 'Smith'].  In some applications, the actual name as it
+  was given in the input string is what is desired.  To do this, use ``originalTextFor``::
+  
+        fullName = originalTextFor(Word(alphas) + Word(alphas))
+
+- ``ungroup( expr )`` - function to "ungroup" returned tokens; useful
+  to undo the default behavior of And to always group the returned tokens, even
+  if there is only one in the list. (New in 1.5.6)
+
+- ``lineno( loc, string )`` - function to give the line number of the
+  location within the string; the first line is line 1, newlines
+  start new rows
+
+- ``col( loc, string )`` - function to give the column number of the
+  location within the string; the first column is column 1,
+  newlines reset the column number to 1
+
+- ``line( loc, string )`` - function to retrieve the line of text
+  representing ``lineno( loc, string )``; useful when printing out diagnostic
+  messages for exceptions
+
+- ``srange( rangeSpec )`` - function to define a string of characters, 
+  given a string of the form used by regexp string ranges, such as ``"[0-9]"`` for 
+  all numeric digits, ``"[A-Z_]"`` for uppercase characters plus underscore, and 
+  so on (note that rangeSpec does not include support for generic regular 
+  expressions, just string range specs)
+
+- ``getTokensEndLoc()`` - function to call from within a parse action to get
+  the ending location for the matched tokens
+  
+- ``traceParseAction(fn)`` - decorator function to debug parse actions. Lists
+  each call, called arguments, and return value or exception
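
To illustrate a few of the list-oriented helpers above (``delimitedList``, ``oneOf`` and ``dictOf``), here is a short sketch. The expression names and input strings are made up for the example.

```python
from pyparsing import Suppress, Word, alphas, delimitedList, dictOf, nums, oneOf

identifier = Word(alphas)

# Delimiters are suppressed by default, leaving only the list elements.
names = delimitedList(identifier).parseString("a, b, c").asList()

# combine=True returns the elements and delimiters as one combined token.
dotted = delimitedList(identifier, delim=".", combine=True).parseString("a.b.c").asList()

# Longer alternatives are tried first, so "<=" is not cut short to "<".
compare_op = oneOf("< <= > >= = !=")
op = compare_op.parseString("<=").asList()

# dictOf builds key/value lookups from repeated (key + value) groups.
settings = dictOf(identifier, Suppress("=") + Word(nums)).parseString("a=1 b=22")
b_value = settings["b"]
```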
+  
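The three ``infixNotation`` steps described earlier in this section can be sketched as follows. The grammar is a small illustrative one (sign, multiplication and addition operators), not a definition taken from the library.

```python
from pyparsing import Word, alphas, infixNotation, nums, oneOf, opAssoc

# Step 1: the base operand is an integer or a variable name.
operand = Word(nums) | Word(alphas)

# Step 2: one tuple per precedence level, highest precedence first.
precedence_levels = [
    (oneOf("+ -"), 1, opAssoc.RIGHT),   # unary sign
    (oneOf("* /"), 2, opAssoc.LEFT),    # multiplication, division
    (oneOf("+ -"), 2, opAssoc.LEFT),    # addition, subtraction
]

# Step 3: build the expression and use it like any other parser element.
arith_expr = infixNotation(operand, precedence_levels)

tree = arith_expr.parseString("a + 2 * -b").asList()
print(tree)
```

Each precedence level that actually matches an operator produces its own nested group, so the returned list mirrors the evaluation order of the input.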
+  
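The difference between ``matchPreviousLiteral`` and ``matchPreviousExpr``, and the default behaviour of ``nestedExpr``, can be checked with a short sketch. The variable names are illustrative, and the two ``first_*`` expressions are kept separate to mirror the two definitions shown earlier.

```python
from pyparsing import (
    ParseException,
    Word,
    matchPreviousExpr,
    matchPreviousLiteral,
    nestedExpr,
    nums,
)

first_lit = Word(nums)
literal_repeat = first_lit + ":" + matchPreviousLiteral(first_lit)

first_expr = Word(nums)
expr_repeat = first_expr + ":" + matchPreviousExpr(first_expr)

# Literal-level matching accepts the leading "1:1" of "1:10".
literal_tokens = literal_repeat.parseString("1:10").asList()

# Expression-level matching parses "10" first, then compares it with "1".
try:
    expr_repeat.parseString("1:10")
    expr_matched = True
except ParseException:
    expr_matched = False

# With no content expression, nestedExpr splits on whitespace inside "(...)".
nested = nestedExpr().parseString("(a (b c) d)").asList()
```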
+
+Helper parse actions
+--------------------
+
+- ``removeQuotes`` - removes the first and last characters of a quoted string;
+  useful to remove the delimiting quotes from quoted strings
+  
+- ``replaceWith(replString)`` - returns a parse action that simply returns the
+  replString; useful when using transformString, or converting HTML entities, as in::
+  
+      nbsp = Literal("&nbsp;").setParseAction( replaceWith("<BLANK>") )
+
+- ``keepOriginalText`` - (deprecated, use originalTextFor_ instead) restores any internal whitespace or suppressed 
+  text within the tokens for a matched parse
+  expression.  This is especially useful when defining expressions
+  for scanString or transformString applications.
+
+- ``withAttribute( *args, **kwargs )`` - helper to create a validating parse action to be used with start tags created 
+  with ``makeXMLTags`` or ``makeHTMLTags``. Use ``withAttribute`` to qualify a starting tag 
+  with a required attribute value, to avoid false matches on common tags such as 
+  ``<TD>`` or ``<DIV>``.
+  
+  ``withAttribute`` can be called with:
+  
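+  - keyword arguments, as in ``(align="right")``, or as an explicit dict with the
+    ``**`` operator when an attribute name is also a Python reserved word, as in
+    ``**{"class":"Customer", "align":"right"}``, or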
+  
+  - a list of name-value tuples, as in ``( ("ns1:class", "Customer"), ("ns2:align","right") )``
+
+  An attribute can be specified to have the special value 
+  ``withAttribute.ANY_VALUE``, which will match any value - use this to 
+  ensure that an attribute is present but any attribute value is
+  acceptable.
+
+- ``downcaseTokens`` - converts all matched tokens to lowercase
+
+- ``upcaseTokens`` - converts all matched tokens to uppercase
+
+- ``matchOnlyAtCol( columnNumber )`` - a parse action that verifies that
+  an expression was matched at a particular column, raising a 
+  ParseException if matching at a different column number; useful when parsing
+  tabular data
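
As a sketch of two of the parse actions above, the following uses ``replaceWith`` with ``transformString`` and ``withAttribute`` with a start tag from ``makeHTMLTags``. The HTML snippet and the names ``grid_div`` and ``pattern`` are invented for the example. Because ``class`` is a Python reserved word, it is passed through a ``**`` dict rather than as a plain keyword argument.

```python
from pyparsing import Literal, SkipTo, makeHTMLTags, replaceWith, withAttribute

# replaceWith: substitute a fixed string wherever the expression matches.
nbsp = Literal("&nbsp;").setParseAction(replaceWith(" "))
cleaned = nbsp.transformString("a&nbsp;b&nbsp;c")

# withAttribute: accept only <div> start tags whose class attribute is "grid".
div_start, div_end = makeHTMLTags("div")
grid_div = div_start.copy().addParseAction(withAttribute(**{"class": "grid"}))
pattern = grid_div + SkipTo(div_end).setResultsName("body") + div_end

html = '<div class="menu">skip me</div><div class="grid">keep me</div>'
bodies = [tokens.body for tokens, start, end in pattern.scanString(html)]
```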
+
+
+
+Common string and token constants
+---------------------------------
+
+- ``alphas`` - the upper- and lowercase ASCII letters (the same characters as ``string.ascii_letters``)
+
+- ``nums`` - same as ``string.digits``
+
+- ``alphanums`` - a string containing ``alphas + nums``
+
+- ``alphas8bit`` - a string containing alphabetic 8-bit characters::
+
+    ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
+
+- ``printables`` - same as ``string.printable``, minus the whitespace characters (space, tab, newline, and so on)
+
+- ``empty`` - a global ``Empty()``; will always match
+
+- ``sglQuotedString`` - a string of characters enclosed in single quotes (``'``); may
+  include whitespace, but not newlines
+
+- ``dblQuotedString`` - a string of characters enclosed in double quotes (``"``); may
+  include whitespace, but not newlines
+
+- ``quotedString`` - ``sglQuotedString | dblQuotedString``
+
+- ``cStyleComment`` - a comment block delimited by ``'/*'`` and ``'*/'`` sequences; can span
+  multiple lines, but does not support nesting of comments
+
+- ``htmlComment`` - a comment block delimited by ``'<!--'`` and ``'-->'`` sequences; can span
+  multiple lines, but does not support nesting of comments
+
+- ``commaSeparatedList`` - similar to ``delimitedList``, except that the
+  list expressions can be any text value, or a quoted string; quoted strings can
+  safely include commas without incorrectly breaking the string into two tokens
+
+- ``restOfLine`` - all remaining printable characters up to but not including the next
+  newline
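
Several of these constants are typically combined in one grammar. The sketch below is a hypothetical assignment parser, not part of the library: it uses ``alphas``, ``alphanums`` and ``nums`` to define words, ``quotedString`` for string values, and ``cStyleComment`` as an ignorable expression.

```python
from pyparsing import (
    Suppress,
    Word,
    alphanums,
    alphas,
    cStyleComment,
    nums,
    quotedString,
    removeQuotes,
)

identifier = Word(alphas, alphanums + "_")

# Copy quotedString before attaching a parse action, so the shared
# module-level constant is left unchanged.
value = quotedString.copy().setParseAction(removeQuotes) | Word(nums)

assignment = identifier + Suppress("=") + value + Suppress(";")
assignment.ignore(cStyleComment)

# The ";" inside the quotes does not end the statement, and the comment is skipped.
tokens = assignment.parseString('title /* label */ = "Hello; world";').asList()
```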