author | Akim Demaille <akim.demaille@gmail.com> | 2020-04-04 19:27:07 +0200 |
---|---|---|
committer | Akim Demaille <akim.demaille@gmail.com> | 2020-04-05 08:56:23 +0200 |
commit | 4e3c06b0f87ddb161ce00cd1dc644ff983b8efdf | |
tree | 3484d591c7b180b6b5aef31378dd329ba2d35699 /TODO | |
parent | 4e26809ab9eaf18beb15f5af3331f42c782f9572 | |
download | bison-4e3c06b0f87ddb161ce00cd1dc644ff983b8efdf.tar.gz |
todo: update
Diffstat (limited to 'TODO')
-rw-r--r-- | TODO | 69 |
1 files changed, 4 insertions, 65 deletions
```diff
@@ -1,28 +1,24 @@
 * Bison 3.6
 ** Documentation
 - yyexpected_tokens in all the languages.
-- remove yysyntax_error_arguments.
 - YYNOMEM
 - i18n in Java
 
-** Java
-Check api.token.raw
-
 ** Naming conventions
-yysyntax_error_arguments should be yy_syntax_error_arguments, since it's a
-private implementation detail.
-
 There's no good reason to use the "yy" prefix in parser::context, is there?
 See also the case of Java. We should keep the prefix for private
 implementation details, but maybe not for public APIs.
 
-** User token number, internal synbol number, external token number, etc.
+** User token number, internal symbol number, external token number, etc.
 There is some confusion over these terms, which is even a problem for
 translators. We need something clear, especially if we provide access to
 the symbol numbers (which would be useful for custom error messages).
 
 We could use "number" and "code".
 
+Update: the current best options would be "token kind" and "symbol kind",
+instead of "token type" and "symbol type".
+
 *** The documentation
 You can explicitly specify the numeric code for a token type...
 
@@ -50,75 +46,18 @@ uses "user token number" in most places.
 *** M4
 Make it consistent with the rest (it uses "user_number" and "number").
 
-** Symbol numbers
-Giving names to symbol numbers would be useful in custom error messages. It
-would actually also make the following point gracefully handled (status of
-YYERRCODE, YYUNDEFTOK, etc.). Possibly we could also define YYEMPTY (twice:
-as a token and as a symbol). And YYEOF.
-
-** Consistency
-YYUNDEFTOK is an internal symbol number, as YYTERROR.
-But YYERRCODE is an external token number.
-
 ** Java: EOF
 We should be able to redefine EOF like we do in C.
 
 ** Java: calc.at
 Stop hard-coding "Calc". Adjust local.at (look for FIXME).
 
-** Java: _
-We must not use _ in Java, it is becoming a keyword in Java 9.
-
-examples/java/calc/Calc.java:998: warning: '_' used as an identifier
-  "$end", "error", "$undefined", _("end of line"), _("number"), "'='",
-                                 ^
-  (use of '_' as an identifier might not be supported in releases after Java SE 8)
-
 ** doc
 I feel it's ugly to use the GNU style to declare functions in the doc. It
 generates tons of white space in the page, and may contribute to bad page
 breaks.
 
 ** improve syntax errors (UTF-8, internationalization)
-Bison depends on the current locale. For instance:
-
-%define parse.error verbose
-%code top {
-  #include <stdio.h>
-  #include <stdlib.h>
-  void yyerror(const char* msg) { fprintf(stderr, "%s\n", msg); }
-  int yylex() { return 0; }
-}
-%%
-exp: "↦" | "🎅🐃" | '\n'
-%%
-int main() { return yyparse(); }
-
-gives different results with/without LC_ALL=C.
-
-$ LC_ALL=C /opt/local/bin/bison /tmp/mangle.y -o ascii.c
-$ /opt/local/bin/bison /tmp/mangle.y -o utf8.c
-$ diff -u ascii.c utf8.c -I#line
---- ascii.c 2019-01-12 08:15:35.878010093 +0100
-+++ utf8.c 2019-01-12 08:15:38.856495929 +0100
-@@ -415,9 +415,8 @@
-    First, the terminals, then, starting at YYNTOKENS, nonterminals.  */
- static const char *const yytname[] =
- {
--  "$end", "error", "$undefined", "\"\\342\\206\\246\"",
--  "\"\\360\\237\\216\\205\\360\\237\\220\\203\"", "'\\n'", "$accept",
--  "exp", YY_NULLPTR
-+  "$end", "error", "$undefined", "\"↦\"", "\"🎅🐃\"", "'\\n'",
-+  "$accept", "exp", YY_NULLPTR
- };
- #endif
-
-$ gcc ascii.c -o ascii && ./ascii
-syntax error, unexpected $end, expecting "\342\206\246" or "\360\237\216\205\360\237\220\203" or '\n'
-$ gcc utf8.c -o utf8 && ./utf8
-syntax error, unexpected $end, expecting ↦ or 🎅🐃 or '\n'
-
 While at it, we should stop using "$end" by default, in favor of "end of
 file", or "end of input", whatever. See how lalr1.java does that.
 
```