author Akim Demaille <akim.demaille@gmail.com> 2019-10-15 08:28:15 +0200
committer Akim Demaille <akim.demaille@gmail.com> 2019-10-15 08:40:50 +0200
commit b47340982b38e5583b444454e5b6035cb33e1d9c
tree e81ba6fc11c542d7cd98c6eb9a29b48dfaef3974
parent ee35055b490dc01faaa7310f4ed84dda0031c26c
TODO: more updates
-rw-r--r-- TODO 123
1 files changed, 106 insertions, 17 deletions
diff --git a/TODO b/TODO
index 5ca7e27a..d2c56b73 100644
--- a/TODO
+++ b/TODO
@@ -76,8 +76,9 @@ have it?
** clean up (Akim Demaille)
Do not work on these items now, as I (Akim) have branches with a lot of
-changes in this area, and no desire to have to fix conflicts. These
-cleaning up will happen after my branches have been merged.
+changes in this area (hitting several files), and no desire to have to fix
+conflicts. Addressing these items will happen after my branches have been
+merged.
*** lalr.c
Introduce a goto struct, and use it in place of from_state/to_state.
@@ -128,6 +129,84 @@ $ ./tests/testsuite -l | grep errors | sed q
38: input.at:1730 errors
* Short term
+** Stop indentation in diagnostics
+Before Bison 2.7, we printed "flatly" the dependencies in long diagnostics:
+
+ input.y:2.7-12: %type redeclaration for exp
+ input.y:1.7-12: previous declaration
+
+In Bison 2.7, we indented them
+
+ input.y:2.7-12: error: %type redeclaration for exp
+ input.y:1.7-12:     previous declaration
+
+Later we quoted the source in the diagnostics, and today we have:
+
+ /tmp/foo.y:1.12-14: warning: symbol FOO redeclared [-Wother]
+     1 | %token FOO FOO
+       |            ^~~
+ /tmp/foo.y:1.8-10:     previous declaration
+     1 | %token FOO FOO
+       |        ^~~
+
+The indentation is no longer helping. We should probably get rid of it, or
+maybe keep it only when -fno-caret. GCC displays this as a "note":
+
+ $ g++-mp-9 -Wall /tmp/foo.c -c
+ /tmp/foo.c:1:10: error: redefinition of 'int foo'
+     1 | int foo, foo;
+       |          ^~~
+ /tmp/foo.c:1:5: note: 'int foo' previously declared here
+     1 | int foo, foo;
+       |     ^~~
+
+Likewise for Clang, contrary to what I believed (because "note:" is written
+in black, so it doesn't show in my terminal :-)
+
+ $ clang++-mp-8.0 -Wall /tmp/foo.c -c
+ clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
+ /tmp/foo.c:1:10: error: redefinition of 'foo'
+ int foo, foo;
+          ^
+ /tmp/foo.c:1:5: note: previous definition is here
+ int foo, foo;
+     ^
+ 1 error generated.
+
+** Better design for diagnostics
+The current implementation of diagnostics is ad hoc; it grew organically. It
+works as a series of calls to several functions, with the later calls
+depending on the earlier ones. For instance:
+
+ complain (&sym->location,
+           sym->content->status == needed ? complaint : Wother,
+           _("symbol %s is used, but is not defined as a token"
+             " and has no rules; did you mean %s?"),
+           quote_n (0, sym->tag),
+           quote_n (1, best->tag));
+ if (feature_flag & feature_caret)
+   location_caret_suggestion (sym->location, best->tag, stderr);
+
+We should rewrite this in a more FP way:
+
+1. build a rich structure that denotes the (complete) diagnostic.
+ "Complete" in the sense that it also contains the suggestions, the list
+ of possible matches, etc.
+
+2. send this to the pretty-printing routine. The diagnostic structure
+ should be sufficient so that we can generate all the 'formats' of
+ diagnostics, including the fixits.
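The two steps above can be sketched in C as follows. This is only an illustration under stated assumptions, not Bison code: every name (severity, span, diag_part, diagnostic, diagnostic_render) is hypothetical, and the output format is a simplified flat one with the dependent message tagged as a "note".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* All names below are hypothetical; none of them exist in Bison.  */

typedef enum { SEV_NOTE, SEV_WARNING, SEV_ERROR } severity;

/* A source range: file, line, and first/last column.  */
typedef struct
{
  const char *file;
  int line;
  int col_begin;
  int col_end;
} span;

/* One message, with an optional fix-it (replacement text, or NULL).  */
typedef struct
{
  severity sev;
  span where;
  const char *message;
  const char *fixit;
} diag_part;

/* Step 1: the complete diagnostic, i.e., a primary part plus the
   notes that depend on it.  */
typedef struct
{
  diag_part primary;
  const diag_part *notes;
  size_t nnotes;
} diagnostic;

static const char *
severity_name (severity s)
{
  switch (s)
    {
    case SEV_NOTE:    return "note";
    case SEV_WARNING: return "warning";
    default:          return "error";
    }
}

/* Render P into BUF (SIZE >= 1 bytes), truncating if needed.
   Return the number of bytes stored, excluding the final NUL.  */
static size_t
render_part (const diag_part *p, char *buf, size_t size)
{
  buf[0] = '\0';
  snprintf (buf, size, "%s:%d.%d-%d: %s: %s\n",
            p->where.file, p->where.line, p->where.col_begin,
            p->where.col_end, severity_name (p->sev), p->message);
  size_t used = strlen (buf);
  if (p->fixit != NULL)
    {
      snprintf (buf + used, size - used, "  fix-it: \"%s\"\n", p->fixit);
      used += strlen (buf + used);
    }
  return used;
}

/* Step 2: the pretty-printer sees the whole structure at once, so
   other output formats could be added without touching callers.  */
size_t
diagnostic_render (const diagnostic *d, char *buf, size_t size)
{
  assert (d != NULL && buf != NULL && size > 0);
  size_t pos = render_part (&d->primary, buf, size);
  for (size_t i = 0; i < d->nnotes; ++i)
    pos += render_part (&d->notes[i], buf + pos, size - pos);
  return pos;
}
```

A caller would fill in one diagnostic value (primary message, notes, fix-its) and make a single call to diagnostic_render, instead of a sequence of interdependent calls.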
+
+If properly done, this diagnostic module can be detached from Bison and be
+put in gnulib. It could be used, for instance, for errors caught by
+xgettext.
+
+There's certainly already something similar in GCC. At least that's the
+impression I get from reading the "-fdiagnostics-format=FORMAT" part of this
+page:
+
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html
+
** consistency
token vs terminal
@@ -137,6 +216,11 @@ itself uses int (for yylen for instance), yet stack is based on size_t.
Maybe locations should also move to ints.
+Paul Eggert already covered most of this. But before publishing these
+changes, we need to ask our C++ users if they agree with that change, or if
+we need some migration path. Could be a %define variable, or simply
+%require "3.5".
+
** Graphviz display code thoughts
The code for the --graph option is over two files: print_graph, and
graphviz. This is because Bison used to also produce VCG graphs, but since
@@ -156,9 +240,6 @@ Little effort seems to have been given to factoring these files and their
print{,-xml} counterpart. We would very much like to re-use the pretty format
of states from .output for the graphs, etc.
-Also, the underscore in print_graph.[ch] isn't very fitting considering the
-dashes in the other filenames.
-
Since graphviz dies on medium-to-big grammars, maybe consider another tool?
** push-parser
@@ -296,12 +377,17 @@ we do the same in yacc.c.
as we don't lose bits to padding. For instance the typical stack for states
will use 8 bits, while it is likely to consume 32 bits in a struct.
-We need trustworth benching for Bison, for all our backends.
+We need trustworthy benchmarks for Bison, for all our backends. Akim has a
+few things scattered around; we need to put them in the repo, and make them
+more useful.
** yysyntax_error
The code in glr.c and yacc.c is very similar; we can certainly factor
some parts.
+This should be worked on when we also address the expected improvements for
+error generation (e.g., i18n).
+
* Report
@@ -342,23 +428,25 @@ LORIA, INRIA Nancy - Grand Est, Nancy, France
* Extensions
** Multiple start symbols
Would be very useful when parsing closely related languages. The idea is to
-declared several start symbols, for instance
+declare several start symbols, for instance
- %start: stmt expr
+ %start stmt expr
%%
stmt: ...
expr: ...
-and to generate parse, parse_stmt and parse_expr. Technically, the above
-grammar would be transformed into
+and to generate parse(), parse_stmt() and parse_expr(). Technically, the
+above grammar would be transformed into
- %start: yy_start
+ %start yy_start
+ %token YY_START_STMT YY_START_EXPR
+ %%
yy_start: YY_START_STMT stmt | YY_START_EXPR expr
-so that there are no conflicts in the grammar (as would undoubtedly happen
-with yy_start: stmt | expr). Then all that remains to do is to adjust the
-skeletons so that this initial token (YY_START_STMT, YY_START_EXPR) be
-shifted first.
+so that there are no new conflicts in the grammar (as would undoubtedly
+happen with yy_start: stmt | expr). Then adjust the skeletons so that this
+initial token (YY_START_STMT, YY_START_EXPR) be shifted first in the
+corresponding parse function.
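One way to picture "shifted first" is a thin wrapper around the scanner that hands the parser the start token before any real token. The sketch below is an assumption-laden illustration, not skeleton code: the token codes, start_lexer, start_lexer_init, start_lexer_next, and the array-based demo scanner are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; a real parser would get them from the
   generated header.  */
enum { YY_START_STMT = 258, YY_START_EXPR = 259 };

/* A scanner callback: returns the next token code, 0 at end of input.  */
typedef int (*lex_fn) (void *user);

/* Wraps a scanner so that the chosen start token comes out first.  */
typedef struct
{
  int start_token;
  int start_sent;
  lex_fn lex;
  void *user;
} start_lexer;

static start_lexer
start_lexer_init (int start_token, lex_fn lex, void *user)
{
  start_lexer res = { start_token, 0, lex, user };
  return res;
}

/* What a hypothetical parse_expr() would call instead of the scanner:
   the first call yields the start token, later calls delegate.  */
static int
start_lexer_next (start_lexer *s)
{
  assert (s != NULL);
  if (!s->start_sent)
    {
      s->start_sent = 1;
      return s->start_token;
    }
  return s->lex (s->user);
}

/* Demo scanner reading token codes from a 0-terminated array.  */
typedef struct
{
  const int *tokens;
  size_t pos;
} array_scanner;

static int
array_lex (void *user)
{
  array_scanner *a = user;
  int tok = a->tokens[a->pos];
  if (tok != 0)
    a->pos++;
  return tok;
}
```

With this shape, parse_stmt() and parse_expr() would differ only in the start token they pass to start_lexer_init, and the user's scanner stays unaware of the extra tokens.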
** Better error messages
The users are not provided with enough tools to forge their error messages.
@@ -379,8 +467,8 @@ https://lists.gnu.org/archive/html/bison-patches/2015-09/msg00000.html
However, there are many other things to do before having such a feature,
because I don't want a % equivalent to #include (which we all learned to
hate). I want something that builds "modules" of grammars, and assembles
-them together, paying attention to keep separate bits separates, in
-pseudo name spaces.
+them together, paying attention to keep separate bits separated, in pseudo
+name spaces.
** Push parsers
There is demand for push parsers in Java and C++. And GLR I guess.
@@ -463,6 +551,7 @@ It is unfortunate that there is a total order for precedence. It
makes it impossible to have modular precedence information. We should
move to partial orders (sounds like series/parallel orders to me).
+This is a prerequisite for modules.
* $undefined
From Hans: