= How to add support for a new language to flex

= Theory

The flex code was historically written to generate parsers in C, but
it has been refactored to isolate knowledge of the specifics of each
target language from the logic for building the lexer state tables as
much as possible.

The only assumption that is absolutely baked into all of flex is that
the bodies of initializers for arrays of integers consist of decimal
numeric literals separated by commas (and optional whitespace).
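
As an illustration of that one fixed assumption, the sketch below
formats a table of integers as a run of decimal literals separated by
commas.  The helper name format_int_array_body is hypothetical and is
not part of flex; only the initializer body is produced, since the
declaration wrapped around it is what differs between target languages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not a flex function: write the body of an
 * integer array initializer (decimal literals separated by ", ")
 * into buf.  Returns the number of characters written, or -1 if
 * buf is too small to hold the whole body. */
int format_int_array_body(char *buf, size_t size, const int *vals, size_t n)
{
    size_t used = 0;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* Every element after the first is preceded by a separator. */
        int w = snprintf(buf + used, size - used, "%s%d",
                         i ? ", " : "", vals[i]);
        if (w < 0 || (size_t)w >= size - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

A C back end might wrap such a body in something like
`static const int tbl[] = { ... };`, while another language would supply
its own declaration syntax around the same body.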

Otherwise, knowledge of each target language's syntax lives in two
places: (1) a table of language-specific syntax-generator methods,
and (2) a language-specific skeleton file.

For example: The methods for the C and C++ back end live in a source
file named cpp_backend.c (so named because both languages use the C
preprocessor), and in a skeleton file named cpp-flex.skl.

Syntactically C-like languages such as Go, Rust, and Java should be easy
targets.  Almost anything generally descended from Algol shouldn't be
much more difficult; this certainly includes the whole
Pascal/Modula/Oberon family.

= Writing a new backend

All the code that accesses language-specific code generators goes
through a global pointer named "backend" to a method table.  The
results of these generators are used to fill in some parts of the
language-specific skeleton file and conditionalize others.
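
The dispatch pattern described here can be sketched as follows.  This is
a deliberately tiny stand-in, not the real struct backend_t: the type
example_backend, its comment member, and the pointer name
example_backend_ptr are all hypothetical and chosen only to show how
language-independent code calls through one global pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified method table.  The real table is
 * struct backend_t in src/flexdefs.h and has different members. */
struct example_backend {
    const char *name;
    /* Format a one-line comment in the target language. */
    int (*comment)(char *buf, size_t size, const char *text);
};

static int c_comment(char *buf, size_t size, const char *text)
{
    return snprintf(buf, size, "/* %s */", text);
}

static int go_comment(char *buf, size_t size, const char *text)
{
    return snprintf(buf, size, "// %s", text);
}

const struct example_backend example_c_backend = { "c", c_comment };
const struct example_backend example_go_backend = { "go", go_comment };

/* Language-independent code only ever goes through this pointer. */
const struct example_backend *example_backend_ptr = &example_c_backend;

/* Caller that knows nothing about the target language's syntax. */
int emit_comment(char *buf, size_t size, const char *text)
{
    assert(example_backend_ptr != NULL);
    return example_backend_ptr->comment(buf, size, text);
}
```

Switching target language then amounts to pointing the global at a
different table; none of the callers change.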

Read the definition of struct backend_t in src/flexdefs.h, and
attached comments, to get a feel for the methods.  Don't worry
about understanding table generator names at first.

To write support for a language, you'll want to do the following
steps:

1. Clone one of the existing back-end/skeleton pairs.  If the language
   you are supporting is named "foo", you should create files named
   foo_backend.c and foo-flex.skl.

2. Add foo_backend.c to COMMON_SOURCES in src/Makefile.am.  Add the
   name of your skeleton file to EXTRA_DIST.

3. Add a production to src/Makefile.am parallel to the one that
   produces cpp-skel.h.  Your objective is to make a string list
   initializer from your skeleton file that can be linked with flex
   and is pointed at by the skel member of your language back end.

4. Add some logic to main.c that enables the new back end with a
   new command-line option.  Following this step you should be
   able to run flex on a specification and get code out in the
   language of whatever back end you cloned.

5. The interesting part: mutate your new back end and skeleton so they
   produce code in your desired target language.

6. Write a test suite for your back end.  You should be able to clone
   one of the existing sets of test loads to get good coverage.  Note
   that it is highly unlikely your back end will be accepted into the
   flex distribution without a test suite.
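
Steps 3 and 4 can be pictured with the rough sketch below.  It assumes
the generated header holds one string per skeleton line and ends the
list with a null pointer; that terminator, the type
example_skel_backend, and both function names are assumptions made for
illustration, not flex's actual definitions.  The real option handling
in main.c will look different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a generated skeleton header: each line of
 * foo-flex.skl becomes one string.  The NULL terminator is an
 * assumption of this sketch. */
static const char *foo_skel[] = {
    "line one of foo-flex.skl",
    "line two of foo-flex.skl",
    NULL
};

/* Hypothetical back-end record carrying only a name and its skeleton. */
struct example_skel_backend {
    const char *name;
    const char **skel;
};

static const struct example_skel_backend example_backends[] = {
    { "foo", foo_skel },
};

/* Look a back end up by name, as command-line handling might do. */
const struct example_skel_backend *example_select_backend(const char *name)
{
    size_t count = sizeof example_backends / sizeof example_backends[0];

    assert(name != NULL);
    for (size_t i = 0; i < count; i++)
        if (strcmp(example_backends[i].name, name) == 0)
            return &example_backends[i];
    return NULL;
}

/* Count the skeleton lines reachable through the skel member. */
size_t example_skel_lines(const struct example_skel_backend *be)
{
    size_t n = 0;

    assert(be != NULL);
    while (be->skel[n] != NULL)
        n++;
    return n;
}
```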

A hint about step 5:

* Don't bother supporting non-reentrant parser generation.
  The interface of original lex with all those globals hanging out
  needs to be supported in C for backwards compatibility, but
  there

The following assumptions in the code might trip you up and
require fixes outside a back end.

1. The language has a case-arm syntax that looks like
   a (possibly empty) prefix, followed by a value
   expression, followed by a colon.

2. Case arms can either be stacked as in C (that is, there is
   an implicit fallthrough if the case arm has no code), or
   there is an explicit fallthrough keyword that enables this,
   as in Go.
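
The two assumptions above can be made concrete with a small sketch.
The structure example_case_syntax and the function example_case_arm are
hypothetical names, not flex's; they only show a case arm being built
from a prefix, a value, and a colon, with an optional keyword for an
arm that has no code of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of a language's case-arm syntax. */
struct example_case_syntax {
    const char *prefix;      /* text before the value; may be "" */
    const char *fallthrough; /* keyword for an empty arm, or "" when
                                stacking is implicit as in C */
};

/* Format one case arm as prefix, value, colon.  If the arm has no code
 * and the language needs an explicit keyword to fall through, append
 * that keyword.  Returns snprintf's result. */
int example_case_arm(char *buf, size_t size,
                     const struct example_case_syntax *s,
                     int value, int has_code)
{
    int need_keyword;

    assert(buf != NULL && s != NULL);
    need_keyword = !has_code && s->fallthrough[0] != '\0';
    return snprintf(buf, size, "%s%d:%s%s", s->prefix, value,
                    need_keyword ? " " : "",
                    need_keyword ? s->fallthrough : "");
}
```

With a prefix of "case " and an empty keyword this yields a stackable
C-style arm such as `case 3:`; with the keyword set to "fallthrough" an
empty arm comes out as `case 3: fallthrough`, the Go form.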

By putting a yyterminate() call in the fallthrough member
and a null pointer in the endcase member, you could handle
languages like Pascal