manual/src/tutorials/polyvariant.etex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179

\chapter{Polymorphic variants} \label{c:poly-variant}
%HEVEA\cutname{polyvariant.html}
{\it (Chapter written by Jacques Garrigue)}

Variants as presented in section~\ref{s:tut-recvariants} are a
powerful tool to build data structures and algorithms. However they
sometimes lack flexibility when used in modular programming. This is
due to the fact that every constructor is assigned to a unique type
when defined and used. Even if the same name appears in the definition
of multiple types, the constructor itself belongs to only one type.
Therefore, one cannot decide that a given constructor belongs to
multiple types, or consider a value of some type to belong to some
other type with more constructors.

With polymorphic variants, this original assumption is removed. That
is, a variant tag does not belong to any type in particular, the type
system will just check that it is an admissible value according to its
use. You need not define a type before using a variant tag. A variant
type will be inferred independently for each of its uses.

\section{s:polyvariant:basic-use}{Basic use}

In programs, polymorphic variants work like usual ones. You just have
to prefix their names with a backquote character "`".
\begin{caml_example}{toplevel}
[`On; `Off];;
`Number 1;;
let f = function `On -> 1 | `Off -> 0 | `Number n -> n;;
List.map f [`On; `Off];;
\end{caml_example}
"[>`Off|`On] list" means that to match this list, you should at
least be able to match "`Off" and "`On", without argument.
"[<`On|`Off|`Number of int]" means that "f" may be applied to "`Off",
"`On" (both without argument), or "`Number" $n$ where
$n$ is an integer.
The ">" and "<" inside the variant types show that they may still be
refined, either by defining more tags or by allowing less. As such, they
contain an implicit type variable. Because each of the variant types
appears only once in the whole type, their implicit type variables are
not shown.

The above variant types were polymorphic, allowing further refinement.
When writing type annotations, one will most often describe fixed
variant types, that is types that cannot be refined. This is
also the case for type abbreviations. Such types do not contain "<" or
">", but just an enumeration of the tags and their associated types,
just like in a normal datatype definition.
\begin{caml_example}{toplevel}
type 'a vlist = [`Nil | `Cons of 'a * 'a vlist];;
let rec map f : 'a vlist -> 'b vlist = function
  | `Nil -> `Nil
  | `Cons(a, l) -> `Cons(f a, map f l)
;;
\end{caml_example}

\section{s:polyvariant-advanced}{Advanced use}

Type-checking polymorphic variants is a subtle thing, and some
expressions may result in more complex type information.

\begin{caml_example}{toplevel}
let f = function `A -> `C | `B -> `D | x -> x;;
f `E;;
\end{caml_example}
Here we are seeing two phenomena. First, since this matching is open
(the last case catches any tag), we obtain the type "[> `A | `B]"
rather than "[< `A | `B]" in a closed matching. Then, since "x" is
returned as is, input and return types are identical. The notation "as
'a" denotes such type sharing. If we apply "f" to yet another tag
"`E", it gets added to the list.

\begin{caml_example}{toplevel}
let f1 = function `A x -> x = 1 | `B -> true | `C -> false
let f2 = function `A x -> x = "a" | `B -> true ;;
let f x = f1 x && f2 x;;
\end{caml_example}
Here "f1" and "f2" both accept the variant tags "`A" and "`B", but the
argument of "`A" is "int" for "f1" and "string" for "f2". In "f"'s
type "`C", only accepted by "f1", disappears, but both argument types
appear for "`A" as "int & string". This means that if we
pass the variant tag "`A" to "f", its argument should be {\em both}
"int" and "string". Since there is no such value, "f" cannot be
applied to "`A", and "`B" is the only accepted input.

Even if a value has a fixed variant type, one can still give it a
larger type through coercions. Coercions are normally written with
both the source type and the destination type, but in simple cases the
source type may be omitted.
\begin{caml_example}{toplevel}
type 'a wlist = [`Nil | `Cons of 'a * 'a wlist | `Snoc of 'a wlist * 'a];;
let wlist_of_vlist  l = (l : 'a vlist :> 'a wlist);;
let open_vlist l = (l : 'a vlist :> [> 'a vlist]);;
fun x -> (x :> [`A|`B|`C]);;
\end{caml_example}

You may also selectively coerce values through pattern matching.
\begin{caml_example}{toplevel}
let split_cases = function
  | `Nil | `Cons _ as x -> `A x
  | `Snoc _ as x -> `B x
;;
\end{caml_example}
When an or-pattern composed of variant tags is wrapped inside an
alias-pattern, the alias is given a type containing only the tags
enumerated in the or-pattern. This allows for many useful idioms, like
incremental definition of functions.

\begin{caml_example}{toplevel}
let num x = `Num x
let eval1 eval (`Num x) = x
let rec eval x = eval1 eval x ;;
let plus x y = `Plus(x,y)
let eval2 eval = function
  | `Plus(x,y) -> eval x + eval y
  | `Num _ as x -> eval1 eval x
let rec eval x = eval2 eval x ;;
\end{caml_example}

To make this even more comfortable, you may use type definitions as
abbreviations for or-patterns. That is, if you have defined "type
myvariant = [`Tag1 of int | `Tag2 of bool]", then the pattern "#myvariant" is
equivalent to writing "(`Tag1(_ : int) | `Tag2(_ : bool))".
\begin{caml_eval}
type myvariant = [`Tag1 of int | `Tag2 of bool];;
\end{caml_eval}

Such abbreviations may be used alone,
\begin{caml_example}{toplevel}
let f = function
  | #myvariant -> "myvariant"
  | `Tag3 -> "Tag3";;
\end{caml_example}
or combined with with aliases.
\begin{caml_example}{toplevel}
let g1 = function `Tag1 _ -> "Tag1" | `Tag2 _ -> "Tag2";;
let g = function
  | #myvariant as x -> g1 x
  | `Tag3 -> "Tag3";;
\end{caml_example}

\section{s:polyvariant-weaknesses}{Weaknesses of polymorphic variants}

After seeing the power of polymorphic variants, one may wonder why
they were added to core language variants, rather than replacing them.

The answer is twofold. The first aspect is that while being pretty
efficient, the lack of static type information allows for less
optimizations, and makes polymorphic variants slightly heavier than
core language ones. However noticeable differences would only
appear on huge data structures.

More important is the fact that polymorphic variants, while being
type-safe, result in a weaker type discipline. That is, core language
variants do actually much more than ensuring type-safety, they also
check that you use only declared constructors, that all constructors
present in a data-structure are compatible, and they enforce typing
constraints to their parameters.

For this reason, you must be more careful about making types explicit
when you use polymorphic variants. When you write a library, this is
easy since you can describe exact types in interfaces, but for simple
programs you are probably better off with core language variants.

Beware also that some idioms make trivial errors very hard to find.
For instance, the following code is probably wrong but the compiler
has no way to see it.
\begin{caml_example}{toplevel}
type abc = [`A | `B | `C] ;;
let f = function
  | `As -> "A"
  | #abc -> "other" ;;
let f : abc -> string = f ;;
\end{caml_example}
You can avoid such risks by annotating the definition itself.
\begin{caml_example}{toplevel}[error]
let f : abc -> string = function
  | `As -> "A"
  | #abc -> "other" ;;
\end{caml_example}