From b47340982b38e5583b444454e5b6035cb33e1d9c Mon Sep 17 00:00:00 2001
From: Akim Demaille
Date: Tue, 15 Oct 2019 08:28:15 +0200
Subject: TODO: more updates

---
 TODO | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 106 insertions(+), 17 deletions(-)

(limited to 'TODO')

diff --git a/TODO b/TODO
index 5ca7e27a..d2c56b73 100644
--- a/TODO
+++ b/TODO
@@ -76,8 +76,9 @@ have it?
 
 ** clean up (Akim Demaille)
 Do not work on these items now, as I (Akim) have branches with a lot of
-changes in this area, and no desire to have to fix conflicts. These
-cleaning up will happen after my branches have been merged.
+changes in this area (hitting several files), and no desire to have to fix
+conflicts. Addressing these items will happen after my branches have been
+merged.
 
 *** lalr.c
 Introduce a goto struct, and use it in place of from_state/to_state.
@@ -128,6 +129,84 @@ $ ./tests/testsuite -l | grep errors | sed q
  38: input.at:1730 errors
 
 * Short term
+** Stop indentation in diagnostics
+Before Bison 2.7, we printed "flatly" the dependencies in long diagnostics:
+
+  input.y:2.7-12: %type redeclaration for exp
+  input.y:1.7-12: previous declaration
+
+In Bison 2.7, we indented them
+
+  input.y:2.7-12: error: %type redeclaration for exp
+  input.y:1.7-12:     previous declaration
+
+Later we quoted the source in the diagnostics, and today we have:
+
+  /tmp/foo.y:1.12-14: warning: symbol FOO redeclared [-Wother]
+      1 | %token FOO FOO
+        |            ^~~
+  /tmp/foo.y:1.8-10:      previous declaration
+      1 | %token FOO FOO
+        |        ^~~
+
+The indentation is no longer helping. We should probably get rid of it, or
+maybe keep it only when -fno-caret.
GCC displays this as a "note":
+
+  $ g++-mp-9 -Wall /tmp/foo.c -c
+  /tmp/foo.c:1:10: error: redefinition of 'int foo'
+      1 | int foo, foo;
+        |          ^~~
+  /tmp/foo.c:1:5: note: 'int foo' previously declared here
+      1 | int foo, foo;
+        |     ^~~
+
+Likewise for Clang, contrary to what I believed (because "note:" is written
+in black, so it doesn't show in my terminal :-)
+
+  $ clang++-mp-8.0 -Wall /tmp/foo.c -c
+  clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
+  /tmp/foo.c:1:10: error: redefinition of 'foo'
+  int foo, foo;
+           ^
+  /tmp/foo.c:1:5: note: previous definition is here
+  int foo, foo;
+      ^
+  1 error generated.
+
+** Better design for diagnostics
+The current implementation of diagnostics is adhoc, it grew organically. It
+works as a series of calls to several functions, with dependency of the
+latter calls on the former. For instance:
+
+  complain (&sym->location,
+            sym->content->status == needed ? complaint : Wother,
+            _("symbol %s is used, but is not defined as a token"
+              " and has no rules; did you mean %s?"),
+            quote_n (0, sym->tag),
+            quote_n (1, best->tag));
+  if (feature_flag & feature_caret)
+    location_caret_suggestion (sym->location, best->tag, stderr);
+
+We should rewrite this in a more FP way:
+
+1. build a rich structure that denotes the (complete) diagnostic.
+   "Complete" in the sense that it also contains the suggestions, the list
+   of possible matches, etc.
+
+2. send this to the pretty-printing routine. The diagnostic structure
+   should be sufficient so that we can generate all the 'format' of
+   diagnostics, including the fixits.
+
+If properly done, this diagnostic module can be detached from Bison and be
+put in gnulib. It could be used, for instance, for errors caught by
+xgettext.
+
+There's certainly already something alike in GCC.
At least that's the
+impression I get from reading the "-fdiagnostics-format=FORMAT" part of this
+page:
+
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html
+
 ** consistency
 token vs terminal
 
@@ -137,6 +216,11 @@ itself uses int (for yylen for instance), yet stack is based on size_t.
 
 Maybe locations should also move to ints.
 
+Paul Eggert already covered most of this. But before publishing these
+changes, we need to ask our C++ users if they agree with that change, or if
+we need some migration path. Could be a %define variable, or simply
+%require "3.5".
+
 ** Graphviz display code thoughts
 The code for the --graph option is over two files: print_graph, and
 graphviz. This is because Bison used to also produce VCG graphs, but since
@@ -156,9 +240,6 @@ Little effort seems to have been given to factoring these files and their
 rint{,-xml} counterpart. We would very much like to re-use the pretty
 format of states from .output for the graphs, etc.
 
-Also, the underscore in print_graph.[ch] isn't very fitting considering the
-dashes in the other filenames.
-
 Since graphviz dies on medium-to-big grammars, maybe consider an other tool?
 
 ** push-parser
@@ -296,12 +377,17 @@ we do the same in yacc.c.
 as we don't lose bits to padding. For instance the typical stack for states
 will use 8 bits, while it is likely to consume 32 bits in a struct.
 
-We need trustworth benching for Bison, for all our backends.
+We need trustworthy benchmarks for Bison, for all our backends. Akim has a
+few things scattered around; we need to put them in the repo, and make them
+more useful.
 
 ** yysyntax_error
 The code bw glr.c and yacc.c is really alike, we can certainly factor some
 parts.
 
+This should be worked on when we also address the expected improvements for
+error generation (e.g., i18n).
+
 
 * Report
 
@@ -342,23 +428,25 @@ LORIA, INRIA Nancy - Grand Est, Nancy, France
 * Extensions
 ** Multiple start symbols
 Would be very useful when parsing closely related languages.
The idea is to
-declared several start symbols, for instance
+declare several start symbols, for instance
 
-  %start: stmt expr
+  %start stmt expr
   %%
   stmt: ...
   expr: ...
 
-and to generate parse, parse_stmt and parse_expr. Technically, the above
-grammar would be transformed into
+and to generate parse(), parse_stmt() and parse_expr(). Technically, the
+above grammar would be transformed into
 
-  %start: yy_start
+  %start yy_start
+  %token YY_START_STMT YY_START_EXPR
+  %%
   yy_start: YY_START_STMT stmt | YY_START_EXPR expr
 
-so that there are no conflicts in the grammar (as would undoubtedly happen
-with yy_start: stmt | expr). Then all that remains to do is to adjust the
-skeletons so that this initial token (YY_START_STMT, YY_START_EXPR) be
-shifted first.
+so that there are no new conflicts in the grammar (as would undoubtedly
+happen with yy_start: stmt | expr). Then adjust the skeletons so that this
+initial token (YY_START_STMT, YY_START_EXPR) be shifted first in the
+corresponding parse function.
 
 ** Better error messages
 The users are not provided with enough tools to forge their error messages.
@@ -379,8 +467,8 @@ https://lists.gnu.org/archive/html/bison-patches/2015-09/msg00000.html
 However, there are many other things to do before having such a feature,
 because I don't want a % equivalent to #include (which we all learned to
 hate). I want something that builds "modules" of grammars, and assembles
-them together, paying attention to keep separate bits separates, in
-pseudo name spaces.
+them together, paying attention to keep separate bits separated, in pseudo
+name spaces.
 
 ** Push parsers
 There is demand for push parsers in Java and C++. And GLR I guess.
@@ -463,6 +551,7 @@ It is unfortunate that there is a total order for precedence. It makes it
 impossible to have modular precedence information. We should move to
 partial orders (sounds like series/parallel orders to me).
 
+This is a prerequisite for modules.
 
 * $undefined
 From Hans:
--
cgit v1.2.1
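
Note on the "Better design for diagnostics" item added by this patch: the
two steps it lists (build a complete diagnostic structure, then hand it to a
pretty-printer) could look roughly like the sketch below. This is only an
illustration under stated assumptions: every name in it (span, diag_part,
diagnostic, diagnostic_add, diagnostic_render, example_redeclaration) is
hypothetical and does not exist in Bison, and the renderer shown is just one
possible "flat" format that labels the secondary location as a "note", in
the way the GCC and Clang outputs quoted in the patch do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* All names below are hypothetical; none of them exist in Bison.  */

enum { max_parts = 4 };

typedef enum { sev_error, sev_warning, sev_note } severity;

typedef struct
{
  const char *file;
  int line;
  int col_first;
  int col_last;             /* Inclusive.  */
} span;

typedef struct
{
  severity sev;
  span loc;
  const char *message;
  const char *suggestion;   /* Replacement text for a fixit, or NULL.  */
} diag_part;

typedef struct
{
  diag_part parts[max_parts];
  int nparts;
} diagnostic;

/* Step 1: accumulate one piece of the diagnostic; nothing is printed.  */
static void
diagnostic_add (diagnostic *d, severity sev, span loc,
                const char *message, const char *suggestion)
{
  assert (d->nparts < max_parts);
  diag_part *p = &d->parts[d->nparts++];
  p->sev = sev;
  p->loc = loc;
  p->message = message;
  p->suggestion = suggestion;
}

/* Step 2: one possible renderer (flat, no caret).  Return the number of
   bytes written to BUF, or -1 if BUF is too small.  */
static int
diagnostic_render (const diagnostic *d, char *buf, size_t size)
{
  static const char *const names[] = { "error", "warning", "note" };
  size_t used = 0;
  if (size == 0)
    return -1;
  buf[0] = '\0';
  for (int i = 0; i < d->nparts; ++i)
    {
      const diag_part *p = &d->parts[i];
      int n = snprintf (buf + used, size - used, "%s:%d.%d-%d: %s: %s\n",
                        p->loc.file, p->loc.line, p->loc.col_first,
                        p->loc.col_last, names[p->sev], p->message);
      if (n < 0 || (size_t) n >= size - used)
        return -1;
      used += (size_t) n;
      if (p->suggestion)
        {
          n = snprintf (buf + used, size - used, "  suggestion: %s\n",
                        p->suggestion);
          if (n < 0 || (size_t) n >= size - used)
            return -1;
          used += (size_t) n;
        }
    }
  return (int) used;
}

/* Usage: the redeclaration warning quoted in the patch, built first and
   rendered afterwards.  */
static int
example_redeclaration (char *buf, size_t size)
{
  diagnostic d = { .nparts = 0 };
  diagnostic_add (&d, sev_warning, (span) { "/tmp/foo.y", 1, 12, 14 },
                  "symbol FOO redeclared", NULL);
  diagnostic_add (&d, sev_note, (span) { "/tmp/foo.y", 1, 8, 10 },
                  "previous declaration", NULL);
  return diagnostic_render (&d, buf, size);
}
```

Because the structure is complete before anything is printed, a second
renderer (with caret quoting, or a machine-readable format) could consume
the same value without touching the call sites, which is what would make
such a module separable from Bison.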