author: Akim Demaille <akim.demaille@gmail.com> 2020-04-04 19:27:07 +0200
committer: Akim Demaille <akim.demaille@gmail.com> 2020-04-05 08:56:23 +0200
commit: 4e3c06b0f87ddb161ce00cd1dc644ff983b8efdf
tree: 3484d591c7b180b6b5aef31378dd329ba2d35699 /TODO
parent: 4e26809ab9eaf18beb15f5af3331f42c782f9572
todo: update
Diffstat (limited to 'TODO')
-rw-r--r-- TODO | 69
1 files changed, 4 insertions, 65 deletions
diff --git a/TODO b/TODO
index 5558c2a3..ac61ac26 100644
--- a/TODO
+++ b/TODO
@@ -1,28 +1,24 @@
* Bison 3.6
** Documentation
- yyexpected_tokens in all the languages.
-- remove yysyntax_error_arguments.
- YYNOMEM
- i18n in Java
-** Java
-Check api.token.raw
-
** Naming conventions
-yysyntax_error_arguments should be yy_syntax_error_arguments, since it's a
-private implementation detail.
-
There's no good reason to use the "yy" prefix in parser::context, is there?
See also the case of Java. We should keep the prefix for private
implementation details, but maybe not for public APIs.
-** User token number, internal synbol number, external token number, etc.
+** User token number, internal symbol number, external token number, etc.
There is some confusion over these terms, which is even a problem for
translators. We need something clear, especially if we provide access to
the symbol numbers (which would be useful for custom error messages).
We could use "number" and "code".
+Update: the current best options would be "token kind" and "symbol kind",
+instead of "token type" and "symbol type".
+
*** The documentation
You can explicitly specify the numeric code for a token type...
@@ -50,75 +46,18 @@ uses "user token number" in most places.
*** M4
Make it consistent with the rest (it uses "user_number" and "number").
-** Symbol numbers
-Giving names to symbol numbers would be useful in custom error messages. It
-would actually also make the following point gracefully handled (status of
-YYERRCODE, YYUNDEFTOK, etc.). Possibly we could also define YYEMPTY (twice:
-as a token and as a symbol). And YYEOF.
-
-** Consistency
-YYUNDEFTOK is an internal symbol number, as YYTERROR.
-But YYERRCODE is an external token number.
-
** Java: EOF
We should be able to redefine EOF like we do in C.
** Java: calc.at
Stop hard-coding "Calc". Adjust local.at (look for FIXME).
-** Java: _
-We must not use _ in Java, it is becoming a keyword in Java 9.
-
-examples/java/calc/Calc.java:998: warning: '_' used as an identifier
- "$end", "error", "$undefined", _("end of line"), _("number"), "'='",
- ^
- (use of '_' as an identifier might not be supported in releases after Java SE 8)
-
** doc
I feel it's ugly to use the GNU style to declare functions in the doc. It
generates tons of white space in the page, and may contribute to bad page
breaks.
** improve syntax errors (UTF-8, internationalization)
-Bison depends on the current locale. For instance:
-
-%define parse.error verbose
-%code top {
- #include <stdio.h>
- #include <stdlib.h>
- void yyerror(const char* msg) { fprintf(stderr, "%s\n", msg); }
- int yylex() { return 0; }
-}
-%%
-exp: "↦" | "🎅🐃" | '\n'
-%%
-int main() { return yyparse(); }
-
-gives different results with/without LC_ALL=C.
-
-$ LC_ALL=C /opt/local/bin/bison /tmp/mangle.y -o ascii.c
-$ /opt/local/bin/bison /tmp/mangle.y -o utf8.c
-$ diff -u ascii.c utf8.c -I#line
---- ascii.c 2019-01-12 08:15:35.878010093 +0100
-+++ utf8.c 2019-01-12 08:15:38.856495929 +0100
-@@ -415,9 +415,8 @@
- First, the terminals, then, starting at YYNTOKENS, nonterminals. */
- static const char *const yytname[] =
- {
-- "$end", "error", "$undefined", "\"\\342\\206\\246\"",
-- "\"\\360\\237\\216\\205\\360\\237\\220\\203\"", "'\\n'", "$accept",
-- "exp", YY_NULLPTR
-+ "$end", "error", "$undefined", "\"↦\"", "\"🎅🐃\"", "'\\n'",
-+ "$accept", "exp", YY_NULLPTR
- };
- #endif
-
-$ gcc ascii.c -o ascii && ./ascii
-syntax error, unexpected $end, expecting "\342\206\246" or "\360\237\216\205\360\237\220\203" or '\n'
-$ gcc utf8.c -o utf8 && ./utf8
-syntax error, unexpected $end, expecting ↦ or 🎅🐃 or '\n'
-
-
While at it, we should stop using "$end" by default, in favor of "end of
file", or "end of input", whatever. See how lalr1.java does that.