This involves removing the custom parsing loop in parsing/parse.ml
and instead using the parsing loop provided by MenhirLib. We use
Menhir's so-called "simplified" error-handling strategy (which is
new as of 20201216). This strategy differs from the "legacy"
strategy in two ways:
* It does not read one token too far (so the custom code in
parsing/parse.ml can be removed).
* If the current state cannot shift or reduce the error token,
then it just dies, whereas the legacy strategy would pop an
item off the stack and continue.
The second point impacts some of the syntax error messages produced
by the parser. Popping items off the stack meant forgetting part of
the input that had just been read, and would lead the parser to
produce messages that did not make sense to the user, because there
was no way for the user to tell that the message was relative to an
earlier point in the input. Here is an example:
```
$ more foo.ml
(2 + )
$ ocamlc -c foo.ml
File "foo.ml", line 1, characters 5-6:
1 | (2 + )
         ^
Error: Syntax error: ')' expected
File "foo.ml", line 1, characters 0-1:
1 | (2 + )
    ^
This '(' might be unmatched
```
With the simplified parsing strategy, a plain "Syntax error" message
is printed. This is arguably an improvement: the message may seem less
informative, but it is less wrong and less confusing, and it is a sounder
basis for producing better syntax error messages in the future.
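The difference between the two strategies can be illustrated with a toy
model. This is only a sketch and not Menhir's implementation: the parser stack
is modelled as a list of integer states (top first), and [can_handle_error] is
a hypothetical predicate telling whether a state can shift or reduce the error
token.

```ocaml
(* Toy model only. Names and types here are illustrative assumptions,
   not part of MenhirLib. *)
type outcome =
  | Recover of int list  (* stack from which the error token is handled *)
  | Die                  (* the parser gives up: plain syntax error *)

(* Legacy strategy: pop states until one can handle the error token.
   Everything popped corresponds to input that the parser "forgets". *)
let rec legacy can_handle_error stack =
  match stack with
  | [] -> Die
  | top :: rest ->
      if can_handle_error top then Recover stack
      else legacy can_handle_error rest

(* Simplified strategy: only the current (top) state is consulted. *)
let simplified can_handle_error stack =
  match stack with
  | top :: _ when can_handle_error top -> Recover stack
  | _ -> Die

let () =
  (* Hypothetical automaton in which only state 1 handles the error token. *)
  let can_handle_error s = (s = 1) in
  let stack = [3; 2; 1; 0] in
  (* Legacy pops states 3 and 2, then reports from state 1. *)
  assert (legacy can_handle_error stack = Recover [1; 0]);
  (* Simplified fails at once, because state 3 cannot handle the error. *)
  assert (simplified can_handle_error stack = Die)
```

In this model, the states popped by [legacy] play the role of the input that
the user can no longer relate to the resulting message.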
|
|
(plus tooling) (#10086)
* Improve [make clean-menhir] to remove parser.{automaton,conflicts}.
* Distinguish MENHIRBASICFLAGS and MENHIRFLAGS.
The former is a subset of the latter, and suffices when running Menhir
to perform an analysis of the grammar.
This allows [make interpret-menhir] to be used even if ocamlrun and ocamlc
have not been built yet.
* Define an alias (i.e., concrete syntax) for every token. Add --require-aliases.
The flag --require-aliases makes sure that the property that every token
has an alias will be preserved in the future.
This requires Menhir 20201214.
* Add [make list-parse-errors].
This rule runs Menhir's reachability analysis, which produces a list of all
states where a syntax error can be detected (and a corresponding list of
erroneous sentences). This data is stored in parsing/parser.auto.messages.
All text between BEGIN AVOID and END AVOID is removed from the grammar before
the analysis is run. This can be used to filter out productions and
declarations that the analysis should ignore.
* Add [make generate-parse-errors].
This rule turns the error sentences stored in parsing/parser.auto.messages
into concrete .ml files, which can be used as tests. One file per sentence is
created. The file name is derived from the sentence. The test files are placed
in the directory testsuite/tests/generated-parse-errors.
* Mark the three productions that use [not_expecting] with [AVOID].
* Mark the production that allows puns with [AVOID].
This prevents [make list-parse-errors] from generating sentences that exploit
this production. Indeed, there is a risk of generating sentences that
cause syntax errors, due to the auxiliary function [addlb], which rejects
certain puns.
* Mark some of the start symbols with [AVOID].
* Add one new test file, testsuite/tests/generated-parse-errors/errors.ml.
This file was produced by [make generate-parse-errors].
This file contains:
1072 sentences whose start symbol is implementation.
5 sentences whose start symbol is use_file.
9 sentences whose start symbol is toplevel_phrase.
The parser's output can be described as follows:
1086 syntax errors reported.
721 syntax errors without explanation.
365 syntax errors with an indication of what was expected.
307 syntax errors with an indication of an unmatched delimiter.
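The filtering step described under [make list-parse-errors], where all text
between BEGIN AVOID and END AVOID is removed, can be sketched as follows. This
is an illustration under stated assumptions, not the rule used in the
Makefile: it works line by line, treats the markers as substrings of a line,
and drops the marker lines themselves. The names [contains] and [strip_avoid]
are hypothetical.

```ocaml
(* Sketch only: the actual build rule may use a different tool and a
   different granularity. Marker strings are taken from the text above. *)

(* [contains ~sub s] is true if [sub] occurs somewhere in [s]. *)
let contains ~sub s =
  let n = String.length sub and m = String.length s in
  let rec go i = i + n <= m && (String.sub s i n = sub || go (i + 1)) in
  go 0

(* Drop every line from a BEGIN AVOID marker through the next END AVOID
   marker, marker lines included. A line carrying both markers is dropped
   on its own. *)
let strip_avoid lines =
  let step (skipping, kept) line =
    if skipping then (not (contains ~sub:"END AVOID" line), kept)
    else if contains ~sub:"BEGIN AVOID" line then
      (not (contains ~sub:"END AVOID" line), kept)
    else (false, line :: kept)
  in
  let _, kept = List.fold_left step (false, []) lines in
  List.rev kept

let () =
  (* The comment syntax and grammar lines below are made up. *)
  let grammar =
    [ "expr: INT";
      "/* BEGIN AVOID */";
      "  | pun";
      "/* END AVOID */";
      "  | expr PLUS expr" ]
  in
  assert (strip_avoid grammar = [ "expr: INT"; "  | expr PLUS expr" ])
```

Lines outside a marked region pass through unchanged, so a grammar without
markers is returned as is.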
|