yacc.c: introduce an enum that defines the symbol numbers

There are a number of advantages in exposing the (internal) symbol numbers:

- custom error messages can use them to decide how to represent a given
  symbol, or a set of symbols.

- we need something similar in uses of yyexpected_tokens.  For instance,
  currently, bistromathic's completion() reads:

      int ntokens = expected_tokens (line, tokens, YYNTOKENS);
      [...]
      for (int i = 0; i < ntokens; ++i)
        if (tokens[i] == YYTRANSLATE (TOK_VAR))
          [...]
        else if (tokens[i] == YYTRANSLATE (TOK_FUN))
          [...]
        else
          [...]

- now that the symbol numbers are compile-time constants, we can easily
  build static tables, write switch statements, etc.

- some users depended on the ability to get the token number from a symbol
  to write test cases for their scanners.  But Bison 3.5 removed the table
  this feature depended upon (a reverse yytranslate).  Now they can check
  against the actual symbol number, without having to pay (in space and
  time) for a conversion.  See
  https://lists.gnu.org/r/bug-bison/2020-01/msg00001.html and
  https://lists.gnu.org/archive/html/bug-bison/2020-03/msg00015.html.

- it helps us clearly separate the internal symbol numbers from the
  external token numbers, a difference that is sometimes blurred in the
  code when the values coincide (e.g. "yychar = yytoken = YYEOF").

- it allows us to get rid of ugly macros with inconsistent names such as
  YYUNDEFTOK and YYTERROR, and to group related definitions together.

- similarly, it provides clean access to the $accept symbol (which proves
  convenient in an experiment of mine with several %start symbols).

Let's declare this type as a private type (in the *.c file, not the *.h
one), so that it does not need to be influenced by the api prefix.

* data/skeletons/bison.m4 (b4_symbol_sid): New.
(b4_symbol): Use it.
* data/skeletons/c.m4 (b4_symbol_enum, b4_declare_symbol_enum): New.
* data/skeletons/yacc.c: Use b4_declare_symbol_enum.
(YYUNDEFTOK, YYTERROR): Remove.
Use the corresponding symbol enum instead.
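The following is a minimal, self-contained sketch of what compile-time symbol numbers allow. It assumes a toy calculator grammar whose tokens NUM, VAR, FUN and PLUS get symbol numbers 3 to 6. The enum `symbol_num`, its `S_*` enumerators and the helpers `classify` and `report_completions` are hypothetical illustrations. They are not the names the skeleton actually generates. Only the numbering of the first three symbols ($end = 0, error = 1, $undefined = 2) follows Bison's existing convention, which is what YYTERROR and YYUNDEFTOK encoded. The sketch uses the numbers both as indices into a static table and as switch labels, in place of the chain of YYTRANSLATE comparisons quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the symbol enum generated in the parser's
   *.c file.  Names and the token set are illustrative only; the first
   three values follow Bison's numbering ($end, error, $undefined). */
enum symbol_num
{
  S_YYEOF = 0,     /* $end, i.e., end of input */
  S_YYERROR = 1,   /* "error", formerly YYTERROR */
  S_YYUNDEF = 2,   /* $undefined, formerly YYUNDEFTOK */
  S_NUM = 3,
  S_VAR = 4,
  S_FUN = 5,
  S_PLUS = 6,
  S_NTOKENS = 7    /* number of terminal symbols in this sketch */
};

/* The enumerators are constant expressions, so they can index a static
   table through C99 designated initializers.  Unlisted entries are
   null pointers. */
static const char *const symbol_role[S_NTOKENS] = {
  [S_YYEOF] = "end of input",
  [S_NUM] = "number",
  [S_VAR] = "variable",
  [S_FUN] = "function",
  [S_PLUS] = "operator",
};

enum completion_kind { COMPLETE_VAR, COMPLETE_FUN, COMPLETE_OTHER };

/* They can also label the cases of a switch, replacing comparisons
   such as "tokens[i] == YYTRANSLATE (TOK_VAR)". */
static enum completion_kind
classify (int sym)
{
  switch (sym)
    {
    case S_VAR: return COMPLETE_VAR;
    case S_FUN: return COMPLETE_FUN;
    default:    return COMPLETE_OTHER;
    }
}

/* Print the role of each expected token and return how many of them
   allow completing a variable or function name.  TOKENS/NTOKENS stand
   for the output of a call like the expected_tokens () one above. */
static int
report_completions (const int *tokens, int ntokens)
{
  int completable = 0;
  for (int i = 0; i < ntokens; ++i)
    {
      assert (0 <= tokens[i] && tokens[i] < S_NTOKENS);
      const char *role = symbol_role[tokens[i]];
      printf ("%s\n", role ? role : "(unnamed)");
      if (classify (tokens[i]) != COMPLETE_OTHER)
        ++completable;
    }
  return completable;
}
```

The commit keeps the type private to the generated *.c file. Code like this would therefore presumably have to live in that file, for instance in the grammar's epilogue, rather than in a separate translation unit that only sees the *.h header.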
author: Akim Demaille <akim.demaille@gmail.com> 2020-03-28 10:33:06 +0100
committer: Akim Demaille <akim.demaille@gmail.com> 2020-04-01 08:31:33 +0200
commit: 3ba001baacec616c54309b847a0466202ef05bdf
tree: bda404c896ed15227e3369c0a3415da179a57ad8 /TODO
parent: 4140320a0a992c7abdff19998ab0f19504047a04
1 file changed, 5 insertions, 0 deletions
diff --git a/TODO b/TODO
index 196af194..f314682a 100644
--- a/TODO
+++ b/TODO
@@ -18,6 +18,8 @@ There is some confusion over these terms, which is even a problem for
 translators.  We need something clear, especially if we provide access to
 the symbol numbers (which would be useful for custom error messages).
 
+We could use "number" and "code".
+
 *** The documentation
 
 You can explicitly specify the numeric code for a token type...
@@ -42,6 +44,9 @@ uses "user token number" in most places.
     complain (&loc, complaint, _("user token number of %s too large"),
               sym->tag);
 
+*** M4
+Make it consistent with the rest (it uses "user_number" and "number").
+
 ** Symbol numbers
 Giving names to symbol numbers would be useful in custom error messages.  It
 would actually also make the following point gracefully handled (status of