author Akim Demaille <akim.demaille@gmail.com> 2019-10-15 07:28:22 +0200
committer Akim Demaille <akim.demaille@gmail.com> 2019-10-15 07:28:33 +0200
commit ee35055b490dc01faaa7310f4ed84dda0031c26c
tree 51b1d3a6fac12a996853f897eb1356c955c9d8ed /TODO
parent e5cbac98b66ddb61fbbadfc77ffcfcd87ea3cb71
TODO: update
Diffstat (limited to 'TODO')
-rw-r--r-- TODO 60
1 files changed, 45 insertions, 15 deletions
diff --git a/TODO b/TODO
index f3f08ce1..5ca7e27a 100644
--- a/TODO
+++ b/TODO
@@ -7,9 +7,6 @@ breaks.
Also, we seem to teach YYPRINT very early on, although it should be
considered deprecated: %printer is superior.
-** glr.cc
-move glr.c into the yy namespace
-
** improve syntax errors (UTF-8, internationalization)
Bison depends on the current locale. For instance:
@@ -58,7 +55,7 @@ Maybe we should exhibit the YYUNDEFTOK token. It could also be assigned a
semantic value so that yyerror could be used to report invalid lexemes.
* Bison 3.6
-** Unit rules
+** Unit rules / Injection rules (Akim Demaille)
Maybe we could expand unit rules (or "injections", see
https://homepages.cwi.nl/~daybuild/daily-books/syntax/2-sdf/sdf.html), i.e.,
transform
@@ -77,10 +74,11 @@ Practice' is impossible to find, but according to 'Parsing Techniques: a
Practical Guide', it includes information about this issue. Does anybody
have it?
-** Injection rules
-See above.
+** clean up (Akim Demaille)
+Do not work on these items now, as I (Akim) have branches with a lot of
+changes in this area, and no desire to have to fix conflicts. This
+cleanup will happen after my branches have been merged.
-** clean up
*** lalr.c
Introduce a goto struct, and use it in place of from_state/to_state.
Rename states1 as path, length as pathlen.
@@ -139,12 +137,6 @@ itself uses int (for yylen for instance), yet stack is based on size_t.
Maybe locations should also move to ints.
-** C
-Introduce state_type rather than spreading yytype_int16 everywhere?
-
-** glr.c
-yyspaceLeft should probably be a pointer diff.
-
** Graphviz display code thoughts
The code for the --graph option is over two files: print_graph, and
graphviz. This is because Bison used to also produce VCG graphs, but since
@@ -224,11 +216,13 @@ since it is no longer bound to a particular parser, it's just a
(standalone symbol).
* Various
-** Rewrite glr.cc in C++
+** Rewrite glr.cc in C++ (Valentin Tolmer)
As a matter of fact, it would be very interesting to see how much we can
share between lalr1.cc and glr.cc. Most of the skeletons should be common.
It would be a very nice source of inspiration for the other languages.
+Valentin Tolmer is working on this.
+
** YYERRCODE
Defined to 256, but not used, not documented. Probably the token
number for the error token, which POSIX wants to be 256, but which
@@ -298,6 +292,12 @@ other improvements and also made it faster (probably because memory
management is performed once instead of three times). I suggest that
we do the same in yacc.c.
+(Some time later): it's also very nice to have three stacks: they are denser
+as we don't lose bits to padding. For instance, the typical stack for states
+will use 8 bits, while a state is likely to consume 32 bits in a struct.
+
+We need trustworthy benchmarking for Bison, for all our backends.
+
** yysyntax_error
The code in glr.c and yacc.c is really alike, we can certainly factor
some parts.
@@ -341,7 +341,24 @@ LORIA, INRIA Nancy - Grand Est, Nancy, France
* Extensions
** Multiple start symbols
-Would be very useful when parsing closely related languages.
+Would be very useful when parsing closely related languages. The idea is to
+declare several start symbols, for instance
+
+ %start stmt expr
+ %%
+ stmt: ...
+ expr: ...
+
+and to generate parse, parse_stmt and parse_expr. Technically, the above
+grammar would be transformed into
+
+ %start yy_start
+ yy_start: YY_START_STMT stmt | YY_START_EXPR expr
+
+so that there are no conflicts in the grammar (as would undoubtedly happen
+with yy_start: stmt | expr). Then all that remains to be done is to adjust
+the skeletons so that this initial token (YY_START_STMT, YY_START_EXPR) is
+shifted first.
** Better error messages
The users are not provided with enough tools to forge their error messages.
@@ -359,6 +376,12 @@ should make this reasonably easy to implement.
Bruce Mardle <marblypup@yahoo.co.uk>
https://lists.gnu.org/archive/html/bison-patches/2015-09/msg00000.html
+However, there are many other things to do before having such a feature,
+because I don't want a % equivalent to #include (which we all learned to
+hate). I want something that builds "modules" of grammars, and assembles
+them together, paying attention to keep separate bits separate, in
+pseudo namespaces.
+
** Push parsers
There is demand for push parsers in Java and C++. And GLR I guess.
@@ -385,6 +408,10 @@ must be in the scanner: we must not parse what is in a switched off
part of %if. Akim Demaille thinks it should be in the parser, so as
to avoid falling into another CPP mistake.
+(Later): I'm sure there's actually a good case for this. People who need that
+feature can use m4/cpp on top of Bison. I don't think it is worth the
+trouble in Bison itself.
+
** XML Output
There are a couple of available extensions of Bison targeting some XML
output. Some day we should consider including them. One issue is
@@ -404,6 +431,9 @@ XML output for GNU Bison
https://lists.gnu.org/archive/html/bug-bison/2016-06/msg00000.html
http://www.cs.cornell.edu/andru/papers/cupex/
+Andrew Myers and Vincent Imbimbo are working on this item, see
+https://github.com/akimd/bison/issues/12
+
* Coding system independence
Paul notes: