author | Akim Demaille <akim.demaille@gmail.com> | 2019-10-15 07:28:22 +0200 |
---|---|---|
committer | Akim Demaille <akim.demaille@gmail.com> | 2019-10-15 07:28:33 +0200 |
commit | ee35055b490dc01faaa7310f4ed84dda0031c26c (patch) | |
tree | 51b1d3a6fac12a996853f897eb1356c955c9d8ed /TODO | |
parent | e5cbac98b66ddb61fbbadfc77ffcfcd87ea3cb71 (diff) | |
TODO: update
Diffstat (limited to 'TODO')
-rw-r--r-- | TODO | 60 |
1 files changed, 45 insertions, 15 deletions
@@ -7,9 +7,6 @@ breaks.
 Also, we seem to teach YYPRINT very early on, although it should be
 considered deprecated: %printer is superior.

-** glr.cc
-move glr.c into the yy namespace
-
 ** improve syntax errors (UTF-8, internationalization)
 Bison depends on the current locale. For instance:

@@ -58,7 +55,7 @@ Maybe we should exhibit the YYUNDEFTOK token. It could also be assigned a
 semantic value so that yyerror could be used to report invalid lexemes.

 * Bison 3.6
-** Unit rules
+** Unit rules / Injection rules (Akim Demaille)
 Maybe we could expand unit rules (or "injections", see
 https://homepages.cwi.nl/~daybuild/daily-books/syntax/2-sdf/sdf.html), i.e.,
 transform
@@ -77,10 +74,11 @@ Practice' is impossible to find, but according to 'Parsing Techniques: a
 Practical Guide', it includes information about this issue. Does anybody
 have it?

-** Injection rules
-See above.
+** clean up (Akim Demaille)
+Do not work on these items now, as I (Akim) have branches with a lot of
+changes in this area, and no desire to have to fix conflicts. These
+cleaning up will happen after my branches have been merged.

-** clean up
 *** lalr.c
 Introduce a goto struct, and use it in place of from_state/to_state.
 Rename states1 as path, length as pathlen.
@@ -139,12 +137,6 @@ itself uses int (for yylen for instance), yet stack is based on size_t.

 Maybe locations should also move to ints.

-** C
-Introduce state_type rather than spreading yytype_int16 everywhere?
-
-** glr.c
-yyspaceLeft should probably be a pointer diff.
-
 ** Graphviz display code thoughts
 The code for the --graph option is over two files: print_graph, and
 graphviz. This is because Bison used to also produce VCG graphs, but since
@@ -224,11 +216,13 @@ since it is no longer bound to a particular parser, it's just a
 (standalone symbol).

 * Various
-** Rewrite glr.cc in C++
+** Rewrite glr.cc in C++ (Valentin Tolmer)
 As a matter of fact, it would be very interesting to see how much we can
 share between lalr1.cc and glr.cc. Most of the skeletons should be common.
 It would be a very nice source of inspiration for the other languages.

+Valentin Tolmer is working on this.
+
 ** YYERRCODE
 Defined to 256, but not used, not documented. Probably the token
 number for the error token, which POSIX wants to be 256, but which
@@ -298,6 +292,12 @@ other improvements and also made it faster (probably because memory
 management is performed once instead of three times). I suggest that
 we do the same in yacc.c.

+(Some time later): it's also very nice to have three stacks: it's more dense
+as we don't lose bits to padding. For instance the typical stack for states
+will use 8 bits, while it is likely to consume 32 bits in a struct.
+
+We need trustworth benching for Bison, for all our backends.
+
 ** yysyntax_error
 The code bw glr.c and yacc.c is really alike, we can certainly factor
 some parts.
@@ -341,7 +341,24 @@ LORIA, INRIA Nancy - Grand Est, Nancy, France

 * Extensions
 ** Multiple start symbols
-Would be very useful when parsing closely related languages.
+Would be very useful when parsing closely related languages. The idea is to
+declared several start symbols, for instance
+
+ %start: stmt expr
+ %%
+ stmt: ...
+ expr: ...
+
+and to generate parse, parse_stmt and parse_expr. Technically, the above
+grammar would be transformed into
+
+ %start: yy_start
+ yy_start: YY_START_STMT stmt | YY_START_EXPR expr
+
+so that there are no conflicts in the grammar (as would undoubtedly happen
+with yy_start: stmt | expr). Then all that remains to do is to adjust the
+skeletons so that this initial token (YY_START_STMT, YY_START_EXPR) be
+shifted first.

 ** Better error messages
 The users are not provided with enough tools to forge their error messages.
@@ -359,6 +376,12 @@ should make this reasonably easy to implement.
 Bruce Mardle <marblypup@yahoo.co.uk>
 https://lists.gnu.org/archive/html/bison-patches/2015-09/msg00000.html

+However, there are many other things to do before having such a feature,
+because I don't want a % equivalent to #include (which we all learned to
+hate). I want something that builds "modules" of grammars, and assembles
+them together, paying attention to keep separate bits separates, in
+pseudo name spaces.
+
 ** Push parsers
 There is demand for push parsers in Java and C++. And GLR I guess.

@@ -385,6 +408,10 @@ must be in the scanner: we must not parse what is in a switched off
 part of %if. Akim Demaille thinks it should be in the parser, so as
 to avoid falling into another CPP mistake.

+(Later): I'm sure there's actually good case for this. People who need that
+feature can use m4/cpp on top of Bison. I don't think it is worth the
+trouble in Bison itself.
+
 ** XML Output
 There are couple of available extensions of Bison targeting some XML
 output. Some day we should consider including them. One issue is
@@ -404,6 +431,9 @@ XML output for GNU Bison
 https://lists.gnu.org/archive/html/bug-bison/2016-06/msg00000.html
 http://www.cs.cornell.edu/andru/papers/cupex/

+Andrew Myers and Vincent Imbimbo are working on this item, see
+https://github.com/akimd/bison/issues/12
+
 * Coding system independence
 Paul notes:
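The "(Some time later)" note added in this commit argues that three separate stacks are denser than one stack of structs, because a struct pads its small state field up to the alignment of its larger members. The C sketch below illustrates that point only. Every type and function name in it (state_t, value_t, location_t, stack_item_t, bytes_single_stack, bytes_three_stacks, bytes_saved) is hypothetical and chosen for illustration; none of them is taken from yacc.c or glr.c, and the actual sizes depend on the platform ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not Bison's definitions. */
typedef signed char state_t;                      /* 8-bit state number */
typedef union { int ival; double dval; } value_t; /* semantic value */
typedef struct
{
  int first_line, first_column, last_line, last_column;
} location_t;                                     /* source location */

/* One stack whose items bundle the three fields.  The compiler may
   insert padding after `state` so that `value` is properly aligned. */
typedef struct
{
  state_t state;
  value_t value;
  location_t location;
} stack_item_t;

/* Bytes needed by a single stack of structs holding `depth` items. */
size_t
bytes_single_stack (size_t depth)
{
  return depth * sizeof (stack_item_t);
}

/* Bytes needed by three parallel arrays holding `depth` items each. */
size_t
bytes_three_stacks (size_t depth)
{
  return depth * (sizeof (state_t) + sizeof (value_t) + sizeof (location_t));
}

/* Bytes lost to padding by the single-stack layout.  A struct is never
   smaller than the sum of its members, so the subtraction cannot wrap. */
size_t
bytes_saved (size_t depth)
{
  assert (bytes_three_stacks (depth) <= bytes_single_stack (depth));
  return bytes_single_stack (depth) - bytes_three_stacks (depth);
}
```

Calling bytes_saved with a typical stack depth gives a rough idea of the padding cost on a given platform. This says nothing about speed, which is why the same note asks for proper benchmarks.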