Diffstat (limited to 'doc')
-rw-r--r-- | doc/misc/ChangeLog | 6 |
-rw-r--r-- | doc/misc/bovine.texi | 480 |
2 files changed, 486 insertions, 0 deletions
diff --git a/doc/misc/ChangeLog b/doc/misc/ChangeLog
index 22e0e9d85ae..3557e27184c 100644
--- a/doc/misc/ChangeLog
+++ b/doc/misc/ChangeLog
@@ -1,3 +1,9 @@
+2012-12-13 Eric Ludlam <zappo@gnu.org>
+           David Ponce <david@dponce.com>
+           Richard Kim <emacs18@gmail.com>
+
+        * bovine.texi: New file, imported from CEDET trunk.
+
 2012-12-12 Glenn Morris <rgm@gnu.org>
 
         * flymake.texi (Customizable variables, Locating the buildfile):
diff --git a/doc/misc/bovine.texi b/doc/misc/bovine.texi
new file mode 100644
index 00000000000..b24e0e0dd7d
--- /dev/null
+++ b/doc/misc/bovine.texi
@@ -0,0 +1,480 @@
+\input texinfo @c -*-texinfo-*-
+@c %**start of header
+@setfilename bovine.info
+@set TITLE Bovine parser development
+@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
+@settitle @value{TITLE}
+
+@c *************************************************************************
+@c @ Header
+@c *************************************************************************
+
+@c Merge all indexes into a single index for now.
+@c We can always separate them later into two or more as needed.
+@syncodeindex vr cp
+@syncodeindex fn cp
+@syncodeindex ky cp
+@syncodeindex pg cp
+@syncodeindex tp cp
+
+@c @footnotestyle separate
+@c @paragraphindent 2
+@c @@smallbook
+@c %**end of header
+
+@copying
+This manual documents Bovine parser development in Semantic
+
+Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 Eric M. Ludlam
+Copyright @copyright{} 2001, 2002, 2003, 2004 David Ponce
+Copyright @copyright{} 2002, 2003 Richard Y. Kim
+
+@quotation
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.1 or
+any later version published by the Free Software Foundation; with the
+Invariant Sections being list their titles, with the Front-Cover Texts
+being list, and with the Back-Cover Texts being list. A copy of the
+license is included in the section entitled ``GNU Free Documentation
+License''.
+@end quotation
+@end copying
+
+@ifinfo
+@dircategory Emacs
+@direntry
+* Semantic bovine parser development: (bovine).
+@end direntry
+@end ifinfo
+
+@iftex
+@finalout
+@end iftex
+
+@c @setchapternewpage odd
+@c @setchapternewpage off
+
+@ifinfo
+This file documents parser development with the bovine parser generator
+@emph{Infrastructure for parser based text analysis in Emacs}
+
+Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
+@end ifinfo
+
+@titlepage
+@sp 10
+@title @value{TITLE}
+@author by @value{AUTHOR}
+@vskip 0pt plus 1 fill
+Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
+@page
+@vskip 0pt plus 1 fill
+@insertcopying
+@end titlepage
+@page
+
+@c MACRO inclusion
+@include semanticheader.texi
+
+
+@c *************************************************************************
+@c @ Document
+@c *************************************************************************
+@contents
+
+@node top
+@top @value{TITLE}
+
+The @dfn{bovine} parser is the original @semantic{} parser, and is an
+implementation of an @acronym{LL} parser. It is good for simple
+languages. It has many conveniences making grammar writing easy. The
+conveniences make it less powerful than a Bison-like @acronym{LALR}
+parser. For more information, @inforef{top, the Wisent Parser Manual,
+wisent}.
+
+Bovine @acronym{LL} grammars are stored in files with a @file{.by}
+extension. When compiled, the contents is converted into a file of
+the form @file{NAME-by.el}. This, in turn is byte compiled.
+@inforef{top, Grammar Framework Manual, grammar-fw}.
+
+@menu
+* Starting Rules::              The starting rules for the grammar.
+* Bovine Grammar Rules::        Rules used to parse a language
+* Optional Lambda Expression::  Actions to take when a rule is matched
+* Bovine Examples::             Simple Samples
+* GNU Free Documentation License::
+* Index::
+@end menu
+
+@node Starting Rules
+@chapter Starting Rules
+
+In Bison, one and only one nonterminal is designated as the ``start''
+symbol. In @semantic{}, one or more nonterminals can be designated as
+the ``start'' symbol. They are declared following the @code{%start}
+keyword separated by spaces. @inforef{start Decl, ,grammar-fw}.
+
+If no @code{%start} keyword is used in a grammar, then the very first
+nonterminal is used. Internally the first start nonterminal is targeted by the
+reserved symbol @code{bovine-toplevel}, so it can be found by the
+parser harness.
+
+To find locally defined variables, the local context handler needs to
+parse the body of functional code. The @code{scopestart} declaration
+specifies the name of a nonterminal used as the goal to parse a local
+context, @inforef{scopestart Decl, ,grammar-fw}. Internally the
+scopestart nonterminal is targeted by the reserved symbol
+@code{bovine-inner-scope}, so it can be found by the parser harness.
+
+@node Bovine Grammar Rules
+@chapter Bovine Grammar Rules
+
+The rules are what allow the compiler to create tags from a language
+file. Once the setup is done in the prologue, you can start writing
+rules. @inforef{Grammar Rules, ,grammar-fw}.
+
+@example
+@var{result} : @var{components1} @var{optional-semantic-action1}
+       | @var{components2} @var{optional-semantic-action2}
+       ;
+@end example
+
+@var{result} is a nonterminal, that is a symbol synthesized in your grammar.
+@var{components} is a list of elements that are to be matched if @var{result}
+is to be made. @var{optional-semantic-action} is an optional sequence
+of simplified Emacs Lisp expressions for concocting the parse tree.
+
+In Bison, each time an element of @var{components} is found, it is
+@dfn{shifted} onto the parser stack. (The stack of matched elements.)
+When all @var{components}' elements have been matched, it is
+@dfn{reduced} to @var{result}. @xref{(bison)Algorithm}.
+
+A particular @var{result} written into your grammar becomes
+the parser's goal. It is designated by a @code{%start} statement
+(@pxref{Starting Rules}). The value returned by the associated
+@var{optional-semantic-action} is the parser's result. It should be
+a tree of @semantic{} @dfn{tags}, @inforef{Semantic Tags, ,
+semantic-appdev}.
+
+@var{components} is made up of symbols. A symbol such as @code{FOO}
+means that a syntactic token of class @code{FOO} must be matched.
+
+@menu
+* How Lexical Tokens Match::
+* Grammar-to-Lisp Details::
+* Order of components in rules::
+@end menu
+
+@node How Lexical Tokens Match
+@section How Lexical Tokens Match
+
+A lexical rule must be used to define how to match a lexical token.
+
+For instance:
+
+@example
+%keyword FOO "foo"
+@end example
+
+Means that @code{FOO} is a reserved language keyword, matched as such
+by looking up into a keyword table, @inforef{keyword Decl,
+,grammar-fw}. This is because @code{"foo"} will be converted to
+@code{FOO} in the lexical analysis stage. Thus the symbol @code{FOO}
+won't be available any other way.
+
+If we specify our token in this way:
+
+@example
+%token <symbol> FOO "foo"
+@end example
+
+then @code{FOO} will match the string @code{"foo"} explicitly, but it
+won't do so at the lexical level, allowing use of the text
+@code{"foo"} in other forms of regular expressions.
+
+In that case, @code{FOO} is a @code{symbol}-type token. To match, a
+@code{symbol} must first be encountered, and then it must
+@code{string-match "foo"}.
+
+@table @strong
+@item Caution:
+Be especially careful to remember that @code{"foo"}, and more
+generally the %token's match-value string, is a regular expression!
+@end table
+
+Non-symbol tokens are also allowed. For example:
+
+@example
+%token <punctuation> PERIOD "[.]"
+
+filename : symbol PERIOD symbol
+         ;
+@end example
+
+@code{PERIOD} is a @code{punctuation}-type token that will explicitly
+match one period when used in the above rule.
+
+@table @strong
+@item Please Note:
+@code{symbol}, @code{punctuation}, etc., are predefined lexical token
+types, based on the @dfn{syntax class}-character associations
+currently in effect.
+@end table
+
+@node Grammar-to-Lisp Details
+@section Grammar-to-Lisp Details
+
+For the bovinator, lexical token matching patterns are @emph{inlined}.
+When the grammar-to-lisp converter encounters a lexical token
+declaration of the form:
+
+@example
+%token <@var{type}> @var{token-name} @var{match-value}
+@end example
+
+It substitutes every occurrence of @var{token-name} in rules, by its
+expanded form:
+
+@example
+@var{type} @var{match-value}
+@end example
+
+For example:
+
+@example
+%token <symbol> MOOSE "moose"
+
+find_a_moose: MOOSE
+            ;
+@end example
+
+Will generate this pseudo equivalent-rule:
+
+@example
+find_a_moose: symbol "moose" ;; invalid syntax!
+            ;
+@end example
+
+Thus, from the bovinator point of view, the @var{components} part of a
+rule is made up of symbols and strings. A string in the mix means
+that the previous symbol must have the additional constraint of
+exactly matching it, as described in @ref{How Lexical Tokens Match}.
+
+@table @strong
+@item Please Note:
+For the bovinator, this task was mixed into the language definition to
+simplify implementation, though Bison's technique is more efficient.
+@end table
+
+@node Order of components in rules
+@section Order of components in rules
+
+If a rule has multiple components, order is important, for example
+
+@example
+headerfile : symbol PERIOD symbol
+           | symbol
+           ;
+@end example
+
+would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
+The bovine parser will first attempt to match the long form, and then
+the short form. If they were in reverse order, then the long form
+would never be tested.
+
+@c @xref{Default syntactic tokens}.
+
+@node Optional Lambda Expression
+@chapter Optional Lambda Expressions
+
+The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
+a bovine lambda. This lambda has special short-cuts to simplify
+reading the semantic action definition. An @acronym{OLE} like this:
+
+@example
+( $1 )
+@end example
+
+results in a lambda return which consists entirely of the string
+or object found by matching the first (zeroth) element of match.
+An @acronym{OLE} like this:
+
+@example
+( ,(foo $1) )
+@end example
+
+executes @code{foo} on the first argument, and then splices its return
+into the return list whereas:
+
+@example
+( (foo $1) )
+@end example
+
+executes @code{foo}, and that is placed in the return list.
+
+Here are other things that can appear inline:
+
+@table @code
+@item $1
+The first object matched.
+
+@item ,$1
+The first object spliced into the list (assuming it is a list from a
+non-terminal).
+
+@item '$1
+The first object matched, placed in a list. i.e. @code{( $1 )}.
+
+@item foo
+The symbol @code{foo} (exactly as displayed).
+
+@item (foo)
+A function call to foo which is stuck into the return list.
+
+@item ,(foo)
+A function call to foo which is spliced into the return list.
+
+@item '(foo)
+A function call to foo which is stuck into the return list in a list.
+
+@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
+A list starting with @code{EXPAND} performs a recursive parse on the
+token passed to it (represented by @samp{$1} above.) The
+@dfn{semantic list} is a common token to expand, as there are often
+interesting things in the list. The @var{nonterminal} is a symbol in
+your table which the bovinator will start with when parsing.
+@var{nonterminal}'s definition is the same as any other nonterminal.
+@var{depth} should be at least @samp{1} when descending into a
+semantic list.
+
+@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
+Is like @code{EXPAND}, except that the parser will iterate over
+@var{nonterminal} until there are no more matches. (The same way the
+parser iterates over the starting rule (@pxref{Starting Rules}).) This
+lets you have much simpler rules in this specific case, and also lets
+you have positional information in the returned tokens, and error
+skipping.
+
+@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
+This is used for creating an association list. Each @var{symbol} is
+included in the list if the associated @var{value} is non-@code{nil}.
+While the items are all listed explicitly, the created structure is an
+association list of the form:
+
+@example
+((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
+@end example
+
+@item (TAG @var{name} @var{class} [@var{attributes}])
+This creates one tag in the current buffer.
+
+@table @var
+@item name
+Is a string that represents the tag in the language.
+
+@item class
+Is the kind of tag being created, such as @code{function}, or
+@code{variable}, though any symbol will work.
+
+@item attributes
+Is an optional set of labeled values such as @w{@code{:constant-flag t :parent
+"parenttype"}}.
+@end table
+
+@item (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
+@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
+@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
+@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
+@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
+@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
+Create a tag with @var{name} of respectively the class
+@code{variable}, @code{function}, @code{type}, @code{include},
+@code{package}, and @code{code}.
+See @inforef{Creating Tags, , semantic-appdev} for the lisp
+functions these translate into.
+@end table
+
+If the symbol @code{%quotemode backquote} is specified, then use
+@code{,@@} to splice a list in, and @code{,} to evaluate the expression.
+This lets you send @code{$1} as a symbol into a list instead of having
+it expanded inline.
+
+@node Bovine Examples
+@chapter Examples
+
+The rule:
+
+@example
+any-symbol: symbol
+          ;
+@end example
+
+is equivalent to
+
+@example
+any-symbol: symbol
+            ( $1 )
+          ;
+@end example
+
+which, if it matched the string @samp{"A"}, would return
+
+@example
+( "A" )
+@end example
+
+If this rule were used like this:
+
+@example
+%token <punctuation> EQUAL "="
+@dots{}
+assign: any-symbol EQUAL any-symbol
+        ( $1 $3 )
+      ;
+@end example
+
+it would match @samp{"A=B"}, and return
+
+@example
+( ("A") ("B") )
+@end example
+
+The letters @samp{A} and @samp{B} come back in lists because
+@samp{any-symbol} is a nonterminal, not an actual lexical element.
+
+To get a better result with nonterminals, use @asis{,} to splice lists
+in like this:
+
+@example
+%token <punctuation> EQUAL "="
+@dots{}
+assign: any-symbol EQUAL any-symbol
+        ( ,$1 ,$3 )
+      ;
+@end example
+
+which would return
+
+@example
+( "A" "B" )
+@end example
+
+@node GNU Free Documentation License
+@appendix GNU Free Documentation License
+
+@include fdl.texi
+
+@node Index
+@unnumbered Index
+@printindex cp
+
+@iftex
+@contents
+@summarycontents
+@end iftex
+
+@bye
+
+@c Following comments are for the benefit of ispell.
+
+@c LocalWords: bovinator inlined
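Note on the Examples chapter added by this patch: the difference between `( $1 $3 )` and `( ,$1 ,$3 )` can be mimicked outside Emacs. The Python below is only an illustrative sketch, not how Semantic implements semantic actions. The function names `any_symbol`, `assign_nested` and `assign_spliced` are hypothetical stand-ins for the `any-symbol` and `assign` rules, and Python lists stand in for Lisp lists.

```python
from typing import List


def any_symbol(token: str) -> List[str]:
    """Stand-in for `any-symbol: symbol ( $1 )`: the action returns a one-element list."""
    return [token]


def assign_nested(lhs: str, rhs: str) -> List[List[str]]:
    """Stand-in for the action `( $1 $3 )`: each nonterminal value stays a sub-list."""
    return [any_symbol(lhs), any_symbol(rhs)]


def assign_spliced(lhs: str, rhs: str) -> List[str]:
    """Stand-in for the action `( ,$1 ,$3 )`: each nonterminal value is spliced in."""
    return [*any_symbol(lhs), *any_symbol(rhs)]


if __name__ == "__main__":
    print(assign_nested("A", "B"))   # nested, like ( ("A") ("B") )
    print(assign_spliced("A", "B"))  # flat, like ( "A" "B" )
```

The unpacking operator plays the role that the comma plays in the grammar action: it flattens the list produced by the nonterminal into the enclosing result.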