= Hacking the compiler :camel:

This document is a work-in-progress attempt to provide useful
information for people willing to inspect or modify the compiler
distribution's codebase. Feel free to improve it by sending change
proposals for it.

If you already have a patch that you would like to contribute to the
official distribution, please see link:CONTRIBUTING.md[].

=== Your first compiler modification

1. Create a new git branch to store your changes.
+
----
git checkout -b my-modification
----
Usually, this branch should be based on `trunk`. If your changes must be on a
specific release, use its release branch (*not* the release tag) instead. For
example, to make a fix for 4.11.1, base your branch on *4.11* (not on *4.11.1*).
The `configure` step for the compiler recognises a development build from the
`+dev` in the version number (see file `VERSION`). Release tarballs and tagged
Git commits do not have it, which disables some important development features
(ocamltest and converting C compiler warnings to errors).

2. Consult link:INSTALL.adoc[] for build instructions. Here is the gist of it:
+
----
./configure
make -j 4
----
If you are on a release build and need development options, you can add
`--enable-ocamltest` (to allow running the testsuite) and `--enable-warn-error`
(so you don't get caught by CI later!).

3. Try the newly built compiler binaries `ocamlc`, `ocamlopt` or their
`.opt` version. To try the toplevel, use:
+
----
make runtop
----

4. Hack frenetically and keep rebuilding.

5. Run the testsuite from time to time.
+
----
make tests
----

6. You did it. Well done! Consult link:CONTRIBUTING.md[] to send your contribution upstream.

See also our <<tips,development tips and tricks>>, for example on how to
<<opam-switch,create an opam switch>> to test your modified compiler.
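
As a quick sanity check for step 1, the following sketch reports whether a
version string marks a development build. It assumes that the version string is
the first line of `VERSION`; the `is_dev_version` helper is hypothetical and not
part of the distribution.

```shell
# Report whether a version string contains the "+dev" marker.
is_dev_version() {
  case "$1" in
    *+dev*) echo yes ;;
    *)      echo no ;;
  esac
}

# Only inspect VERSION when run from the root of a source checkout.
# Assumption: the version string is the first line of that file.
if [ -f VERSION ]; then
  is_dev_version "$(head -n 1 VERSION)"
fi
```

If this prints `no`, you are probably on a release tag or tarball, and the
development options mentioned in step 2 have to be enabled explicitly.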

=== What to do

There are always plenty of potential tasks, for old hands and
newcomers alike. Here are various potential projects:

* https://github.com/ocaml/ocaml/issues[The OCaml
  bugtracker] contains reported bugs and feature requests. Some
  changes that should be accessible to newcomers are marked with the
  tag link:++https://github.com/ocaml/ocaml/issues?q=is%3Aopen+is%3Aissue+label%3Anewcomer-job++[
  newcomer-job].

* The
  https://github.com/ocamllabs/compiler-hacking/wiki/Things-to-work-on[OCaml
  Labs compiler-hacking wiki] contains various ideas of changes to
  propose, some easy, some requiring a fair amount of work.

* Documentation improvements are always much appreciated, either in
  the various `.mli` files or in the official manual
  (see link:manual/README.md[]). If you invest effort in understanding
  a part of the codebase, submitting a pull request that adds
  clarifying comments can be an excellent contribution: it helps you
  next time, and it helps other readers of the code.

* The https://github.com/ocaml/ocaml[GitHub project] contains a lot of
  pull requests, many of them being in dire need of a review -- we
  have more people willing to contribute changes than to review
  someone else's change. Picking one of them, trying to understand the
  code (looking at the code around it) and asking questions about what
  you don't understand or what feels odd is super-useful. It helps the
  contribution process, and it is also an excellent way to get to know
  various parts of the compiler from the angle of a specific aspect or
  feature.
+
Again, reviewing small or medium-sized pull requests is accessible to
anyone with OCaml programming experience, and helps maintainers and
other contributors. If you also submit pull requests yourself, a good
discipline is to review at least as many pull requests as you submit.

== Structure of the compiler

The compiler codebase can be intimidating at first sight. Here are
a few pointers to get started.

=== Compilation pipeline

==== The driver -- link:driver/[]

The driver contains the "main" function of the compilers that drive
compilation. It parses the command-line arguments and composes the
required compiler passes by calling functions from the various parts
of the compiler described below.

==== Parsing -- link:parsing/[]

Parses source files and produces an Abstract Syntax Tree (AST)
(link:parsing/parsetree.mli[] has a lot of helpful comments). See
link:parsing/HACKING.adoc[].

The logic for Camlp4 and Ppx preprocessing is not in link:parsing/[],
but in link:driver/[], see link:driver/pparse.mli[] and
link:driver/pparse.ml[].

==== Typing -- link:typing/[]

Type-checks the AST and produces a typed representation of the program
(link:typing/typedtree.mli[] has some helpful comments). See
link:typing/HACKING.adoc[].

==== The bytecode compiler -- link:bytecomp/[]

==== The native compiler -- link:middle_end/[] and link:asmcomp/[]

=== Runtime system

The low-level routines that OCaml programs use during their execution:
garbage collection, interaction with the operating system
(IO in particular), low-level primitives to manipulate some OCaml data
structures, etc. Mostly implemented in C, with some rare bits of
assembly code in architecture-specific files. The "includes"
corresponding to the `.c` files are in the link:runtime/caml[]
subdirectory.

Some files are only used by bytecode programs, some only used by
native-compiled programs, but most of the runtime code is
common. (See link:runtime/Makefile[] for the list of common,
bytecode-only and native-only source files.)

See link:runtime/HACKING.adoc[].

=== Libraries

link:stdlib/[]:: The standard library. Each file is largely
independent and should not need further knowledge.

link:otherlibs/[]:: External libraries such as `unix`, `threads`,
`dynlink` and `str`.

Instructions for building the full reference manual are provided in
link:manual/README.md[]. However, if you only modify the documentation
comments in `.mli` files in the compiler codebase, you can observe the
result by running

----
make html_doc
----

and then opening link:./api_docgen/ocamldoc/build/html/libref/index.html[] in a web browser.
The documentation is located in
link:./api_docgen/odoc/build/html/libref/index.html[] when `--with-odoc` is
passed to the configure script.
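
The two locations above differ only in the name of the documentation tool. A
minimal sketch, where `libref_index` is a hypothetical helper and not a script
shipped with the compiler:

```shell
# Print the path of the generated library reference index.
# Pass "odoc" if configure was run with --with-odoc; anything else
# (or no argument) selects the default ocamldoc location.
libref_index() {
  if [ "${1:-}" = odoc ]; then
    tool=odoc
  else
    tool=ocamldoc
  fi
  echo "api_docgen/$tool/build/html/libref/index.html"
}

libref_index        # api_docgen/ocamldoc/build/html/libref/index.html
libref_index odoc   # api_docgen/odoc/build/html/libref/index.html
```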

=== Tools

link:lex/[]:: The `ocamllex` lexer generator.

link:yacc/[]:: The `ocamlyacc` parser generator. We do not recommend
using it for user projects in need of a parser generator. Please
consider using and contributing to
link:http://gallium.inria.fr/~fpottier/menhir/[menhir] instead, which
has tons of extra features, lets you write more readable grammars, and
has excellent documentation.

=== Complete file listing

  BOOTSTRAP.adoc::        instructions for bootstrapping
  Changes::               what's new with each release
  CONTRIBUTING.md::       how to contribute to OCaml
  HACKING.adoc::          this file
  INSTALL.adoc::          instructions for installation
  LICENSE::               license and copyright notice
  Makefile::              main Makefile
  Makefile.common::       common Makefile definitions
  README.adoc::           general information on the compiler distribution
  README.win32.adoc::     general information on the Windows ports of OCaml
  VERSION::               version string. Run `make configure` after changing.
  asmcomp/::              native-code compiler and linker
  boot/::                 bootstrap compiler
  build-aux/::            autotools support scripts
  bytecomp/::             bytecode compiler and linker
  compilerlibs/::         the OCaml compiler as a library
  configure::             configure script
  configure.ac::          autoconf input file
  debugger/::             source-level replay debugger
  driver/::               driver code for the compilers
  flexdll/::              git submodule -- see link:README.win32.adoc[]
  lex/::                  lexer generator
  man/::                  man pages
  manual/::               system to generate the manual
  middle_end/::           the flambda optimisation phase
  ocamldoc/::             documentation generator
  ocamltest/::            test driver
  otherlibs/::            several additional libraries
  parsing/::              syntax analysis -- see link:parsing/HACKING.adoc[]
  release-info/::         documentation and tools to prepare releases
  runtime/::              bytecode interpreter and runtime systems
  stdlib/::               standard library
  testsuite/::            tests -- see link:testsuite/HACKING.adoc[]
  tools/::                various utilities
  toplevel/::             interactive system
  typing/::               typechecking -- see link:typing/HACKING.adoc[]
  utils/::                utility libraries
  yacc/::                 parser generator

[#tips]
== Development tips and tricks

=== Keep merge commits when merging and cherry-picking GitHub PRs

Having the GitHub PR number show up in the git log is very useful for
later triaging. We recently disabled the "Rebase and merge" button,
precisely because it does not produce a merge commit.

When you cherry-pick a PR in another branch, please cherry-pick this
merge-style commit rather than individual commits, whenever
possible. (Picking a merge commit typically requires the `-m 1`
option.) You should also use the `-x` option to include the hash of
the original commit in the commit message.

----
git cherry-pick -x -m 1 <merge-commit-hash>
----

[#opam-switch]
=== Testing with `opam`

If you are working on a development version of the compiler, you can create an
opam switch from it by running the following from the development repository:

-----
opam switch create . --empty
opam install .
-----

If you want to test someone else's development version from a public
git repository, you can build a switch directly (without cloning their
work locally) by pinning:

----
opam switch create my-switch-name --empty
# Replace $VERSION by the trunk version
opam pin add ocaml-variants.$VERSION+branch git+https://$REPO#branch
----
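
The package name passed to `opam pin add` can be assembled from a version
string. This sketch assumes that `$VERSION` stands for the numeric part of the
trunk version (everything before the first `+`), which is a guess about the
intended naming rather than a documented rule; `pin_package_name` is a
hypothetical helper.

```shell
# Build an ocaml-variants package name from a full version string
# and a branch name.
# Assumption: only the part before the first "+" is kept as the version.
pin_package_name() {
  full_version=$1   # e.g. the first line of the VERSION file
  branch=$2
  echo "ocaml-variants.${full_version%%+*}+${branch}"
}

pin_package_name '5.3.0+dev0-2023-12-22' mybranch
# prints: ocaml-variants.5.3.0+mybranch
```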

==== Incremental builds with `opam`

This section documents some tips to speed up your workflow when you need to
alternate between testing your branch and patching the compiler.
We'll assume that you're currently in a clone of the compiler's source code.

===== Initial setup

For the rest of the section to work, you'll need your compiler to be
configured in the same way as `opam` would have configured it. The simplest
way is to run the normal commands for the switch initialization, with the extra
`--inplace-build` flag:

-----
opam switch create . --empty
opam install . --inplace-build
-----

However, if you need specific configuration options, you can also configure it
manually, as long as you make sure that the configuration prefix is the one
where `opam` would install the compiler.
You will then need to install the compiler, either from the working directory
(that you must build yourself) or using the regular sandboxed builds.

-----
# Example with regular opam build
opam switch create . --empty
opam install .
./configure --prefix=$(opam var prefix) # put extra configuration args here
-----

-----
# Example with installation from the current directory, installing only the
# bytecode versions of the tools
opam switch create . --empty
./configure --prefix=$(opam var prefix) # put extra configuration args here
make world && make opt
opam install . --assume-built
-----

===== Basic workflow

We will assume that the workflow alternates between work on the compiler and
external (`opam`-related) commands.
As an example, debugging an issue in the compiler can start with a step
that triggers the issue (by installing a given `opam` package), followed by
adding some logging to the compiler and re-triggering the issue; based on the
logs, you then either add more logging or try a patch, and so on.

The part of this workflow that we're going to optimize is when we switch from
working on the compiler to using the compiler. The basic way to do this is to
run `opam install .` again, but this will recompile the compiler from scratch
and also trigger a recompilation of all the packages in the switch.

===== Using `opam-custom-install`

The `opam-custom-install` plugin allows you to install a package using a custom
command instead of the package-supplied one. It can be installed following
instructions https://gitlab.ocamlpro.com/louis/opam-custom-install[here].

In our case, we need to build the compiler, and when we've built everything
that we need then we run `opam custom-install ocaml-variants -- make install`.
This will make `opam` remove the previously installed version of the compiler
(if any), then install the new one in its stead.

-----
# reinstall the compiler, and rebuild all opam packages
opam custom-install ocaml-variants -- make install
-----

Since most `opam` packages depend on the compiler, this will trigger a
reinstallation of all the packages in the switch.
If you want to avoid that (for instance, your patch only adds some logging
so you expect the core libraries and all the already compiled packages to be
identical), you can use the additional `--no-recompilations` flag.
There are no checks that it's safe to do so, so if your patch ends up
changing even slightly one of the core libraries' files, you will likely
get inconsistent assumptions errors later.

-----
# reinstall the compiler, leaving the opam packages untouched -- unsafe!
opam custom-install --no-recompilations ocaml-variants -- make install
-----

Note about the first installation:
When you start from an empty switch, and install a compiler (in our case,
the `ocaml-variants` package provided by the compiler's `opam` file), then
a number of additional packages are installed to ensure that the switch
will work correctly. Mainly, the `ocaml` package needs to be installed,
and while it's done automatically when using regular `opam` commands, the
`custom-install` plugin will not force installation of dependencies.
Moreover, if you try to fix the problem by manually installing the `ocaml`
package, `opam` will try to recompile `ocaml-variants`, using the default
instructions. You can get around this by running
`opam reinstall --forget-pending` just after the `opam custom-install` command
and just before the `opam install ocaml` command.
Full example:

-----
opam switch create . --empty
./configure --prefix=$(opam var prefix) --disable-ocamldoc --disable-ocamltest
make world && make opt
opam custom-install ocaml-variants -- make install
opam reinstall --forget-pending --yes
opam install ocaml
# You now have a working switch, in which you can start installing packages
-----

One advantage of this plugin over a plain `make install` is that it
correctly tracks the files associated with the compiler, so if your
`make install` command only installs the bytecode versions of the tools,
then with `opam-custom-install` you will end up in a state where only the
bytecode tools are installed, whereas with a raw `make install` you will have
stale native binaries remaining in your switch.
Since it's significantly faster to build the bytecode version of the tools,
and many `opam` packages will pick the native version of the compilers if
present and the bytecode version otherwise, you can build your initial switch
with the native versions (to get quickly to a state where a bug appears),
then clean your working directory and start building bytecode tools only
for the actual debugging phase.

===== Without `opam-custom-install`

You can achieve some improvements using built-in `opam` commands.

Using `opam install . --assume-built` will simply remove the
package for the compiler, then run the installation instructions
(`make install`) in the working directory, tracking the installed files
correctly. The main difference with the `opam-custom-install` version
is that there's no way to prevent this command from triggering a full
recompilation of your switch.

You can also run `make install` manually, which will not trigger a
recompilation, but will not remove the previous version either and can
mess with `opam`'s tracking of installed files.

=== Useful Makefile targets and options

Besides the targets listed in link:INSTALL.adoc[] for build and
installation, the following targets may be of use:

`make runtop` :: builds and runs the ocaml toplevel of the distribution
                          (optionally uses `rlwrap` for readline+history support)
`make natruntop`:: builds and runs the native ocaml toplevel (experimental)

`make partialclean`:: Clean the OCaml files but keep the compiled C files.

`make depend`:: Regenerate the `.depend` file. Should be used each time new dependencies are added between files.

`make -C testsuite parallel`:: see link:testsuite/HACKING.adoc[]

You can use `make foo V=1` to build the target `foo` and show full
commands instead of abbreviated names like OCAMLC, etc. This can be
useful to know the flags to use to manually rebuild a file.

Additionally, there are some developer specific targets in link:Makefile.dev[].
These targets are automatically available when working in a Git clone of the
repository, but are not available from a tarball.

=== Automatic configure options

If you have options to `configure` which you always (or at least frequently)
use, it's possible to store them in Git, and `configure` will automatically add
them. For example, you may wish to avoid building the debug runtime by default
while developing, in which case you can issue
`git config --global ocaml.configure '--disable-debug-runtime'`. The `configure`
script will alert you that it has picked up this option and added it _before_
any options you specified for `configure`.

Options are added before those passed on the command line, so it's possible to
override them, for example `./configure --enable-debug-runtime` will build the
debug runtime, since the enable flag appears after the disable flag. You can
also use the full power of Git's `config` command and have options specific to
a particular clone or worktree.
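
The "last flag wins" behaviour can be modelled with a few lines of shell. This
is only an illustration of the ordering rule, not the logic used by `configure`
itself, and `resolve_debug_runtime` is a hypothetical helper.

```shell
# Walk the arguments in order; a later flag overrides an earlier one.
resolve_debug_runtime() {
  state=default
  for arg in "$@"; do
    case "$arg" in
      --enable-debug-runtime)  state=enabled ;;
      --disable-debug-runtime) state=disabled ;;
    esac
  done
  echo "$state"
}

# Options stored in Git come first, command-line options come last.
stored='--disable-debug-runtime'   # as if read from: git config ocaml.configure
resolve_debug_runtime $stored --enable-debug-runtime
# prints: enabled
```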

=== Speeding up configure

`configure` includes the standard `-C` option which caches various test results
in the file `config.cache` and can use those results to avoid running tests in
subsequent invocations. This mechanism works fine, except that it is easy to
clean the cache by mistake (e.g. with `git clean -dfX`). The cache is also
host-specific which means the file has to be deleted if you run `configure` with
a new `--host` value (this is quite common on Windows, where `configure` is
also quite slow to run).

You can elect to have host-specific cache files by issuing
`git config --global ocaml.configure-cache .`. The `configure` script will now
automatically create `ocaml-host.cache` (e.g. `ocaml-x86_64-pc-windows.cache`,
or `ocaml-default.cache`). If you work with multiple worktrees, you can share
these cache files by issuing `git config --global ocaml.configure-cache ..`. The
directory is interpreted _relative_ to the `configure` script.
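
To make the naming scheme concrete, here is a sketch that computes the cache
file name from the configured directory and an optional host. It mirrors only
the examples given above; the actual logic lives in `configure`, and
`cache_file_for` is a hypothetical helper.

```shell
# Compute the cache file path for a given cache directory and host.
# The directory is relative to the configure script; when no host is
# given, the name falls back to "default".
cache_file_for() {
  dir=$1
  host=${2:-default}
  echo "$dir/ocaml-$host.cache"
}

cache_file_for . x86_64-pc-windows   # ./ocaml-x86_64-pc-windows.cache
cache_file_for ..                    # ../ocaml-default.cache
```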

=== Bootstrapping

The OCaml compiler is bootstrapped. This means that
previously-compiled bytecode versions of the compiler and lexer are
included in the repository under the
link:boot/[] directory. These bytecode images are used once the
bytecode runtime (which is written in C) has been built to compile the
standard library and then to build a fresh compiler. Details can be
found in link:BOOTSTRAP.adoc[].

=== Speeding up builds

Once you've built a natively-compiled `ocamlc.opt`, you can use it to
speed up future builds by copying it to `boot`:

----
cp ocamlc.opt boot/
----

If `boot/ocamlc` changes (e.g. because you ran `make bootstrap`), then
the build will revert to the slower bytecode-compiled `ocamlc` until
you do the above step again.

=== Using merlin

During the development of the compiler, the internal format of compiled object
files evolves, and quickly becomes incompatible with the format of the last
OCaml release. In particular, even an up-to-date merlin will be unable to use
them during most of the development cycle: opening a compiler source file with
merlin gives a frustrating error message.

To use merlin on the compiler, you want to build the compiler with an older
version of itself. One easy way to do this is to use the experimental build
rules for Dune, which are distributed with the compiler (with no guarantees that
the build will work all the time). Assuming you already have a recent OCaml
version installed with merlin and dune, you can just run the following from the
compiler sources:

----
./configure # if not already done
make clean && dune build @libs
----

which will do a bytecode build of all the distribution (without linking
the executables), using your OCaml compiler.

Merlin will be looking at the artefacts generated by dune (in `_build`), rather
than trying to open the incompatible artefacts produced by a Makefile build. In
particular, you need to repeat the dune build every time you change the interface
of some compilation unit, so that merlin is aware of the new interface.

You only need to run `configure` once, but you will need to run `make clean`
every time you want to run `dune` after you built something with `make`;
otherwise dune will complain that build artefacts are present among the sources.

Finally, there will be times where the compiler simply cannot be built with an
older version of itself. One example of this is when a new primitive is added to
the runtime and then used in the standard library straight away: since the rest
of the compiler requires the `stdlib` library to build, nothing can be built. In
such situations, you will have to either live without merlin, or develop on an
older branch of the compiler, for example the maintenance branch of the last
released version. Developing a patch from a release branch can later introduce a
substantial amount of extra work, when you rebase to the current development
version. But it also makes it a lot easier to test the impact of your work on
third-party code, by installing a local <<opam-switch,opam switch>>: opam
packages tend to be compatible with released versions of the compiler, whereas
most packages are incompatible with the in-progress development version.

=== Continuous integration

[#check-typo]
==== check-typo

The `tools/check-typo` script enforces various typographical rules in the
OCaml compiler codebase.

Running `./tools/check-typo` from the repository root will check all
source files. This can be fairly slow (2 minutes for example). Use
`./tools/check-typo <path>` to run it on some file or directory
(recursively) only.

Running `./tools/check-typo-since trunk` checks all files that changed
in the commits since `trunk` -- this works with any git reference. It
runs much faster than a full `./tools/check-typo`, typically instantly.

You can also set up a git commit hook to automatically run `check-typo`
on the changes you commit, by copying the file
`tools/pre-commit-githook` to `.git/hooks/pre-commit`. If changes in a commit
alter the `configure` script, the hook also checks that the committed `configure`
script is up-to-date.
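
A minimal sketch of the hook installation, wrapped in a hypothetical
`install_typo_hook` helper. It assumes a regular clone in which `.git` is a
directory (in a linked worktree the hooks directory lives elsewhere).

```shell
# Copy the pre-commit hook into place and make sure it is executable,
# since git ignores hooks that are not executable.
install_typo_hook() {
  repo=${1:-.}
  cp "$repo/tools/pre-commit-githook" "$repo/.git/hooks/pre-commit" &&
    chmod +x "$repo/.git/hooks/pre-commit"
}

# Only install when run from the root of a regular clone.
if [ -f tools/pre-commit-githook ] && [ -d .git/hooks ]; then
  install_typo_hook .
fi
```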

Some files need special rules to opt out of `check-typo` checks; this
is specified in the `.gitattributes` file at the root of the
repository, using `typo.foo` attributes.

==== GitHub's Continuous Integration: GitHub Actions and AppVeyor

The scripts that are run on GitHub Actions are described in
link:.github/workflows/build.yml[].

For example, if you want to reproduce the default build on your
machine, you can use the configuration values and run command taken from
link:tools/ci/actions/runner.sh[]:

----
XARCH=x64 bash -ex tools/ci/actions/runner.sh configure
----

The link:.github/workflows/hygiene.yml[] script supports other kinds of
tests which inspect the patch submitted as part of a pull request. These
tests rely on ancillary data generated by GitHub Actions which you have to
set explicitly to reproduce them locally.

`Changes updated` checks that the link:Changes[] file has been modified
(hopefully to add a new entry). It can be disabled by including "_(no change
entry needed)_" in one of your commit messages -- but in general all patches
submitted should come with a Changes entry; see the guidelines in
link:CONTRIBUTING.md[].

The Windows ports take a long time to test. INRIA's precheck service is the
best option when all 6 Windows ports need testing for a branch, but the
AppVeyor scripts also support the other ports. The matrix is controlled by
the following environment variables, which should be set in link:appveyor.yml[]:

- `PORT` - this must be set on each job. Either `mingw`, `msvc` or `cygwin`
  followed by `32` or `64`.
- `BOOTSTRAP_FLEXDLL` - must be set on each job. Either `true` or `false`.
  At present, must be `false` for Cygwin builds. Controls whether flexlink
  is bootstrapped as part of the test or installed from a binary archive.
- `FORCE_CYGWIN_UPGRADE`. Default: `0`. Set to `1` to force an upgrade of
  Cygwin packages as part of the build. Normally a full upgrade is only
  triggered if the packages installed require it.
- `BUILD_MODE`. Default: `world.opt`. Either `world.opt`, `steps`, or `C`.
  Controls whether the build uses the `world.opt` target or the classic
  `world`, `opt`, `opt.opt` targets. The `C` build is a fast test used to
  build just enough of the tree to cover the C sources (it's used to test
  old MSVC compilers).
- `SDK`. Defaults to Visual Studio 2015. Specifies the exact command to run
  to set-up the Microsoft build environment.
- `CYGWIN_DIST`. Default: `64`. Either `64` or `32`, selects 32-bit or 64-bit
  Cygwin as the build environment.
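
The accepted `PORT` values described above can be checked before a job is added
to the matrix. A minimal sketch; `valid_port` is a hypothetical helper, and the
AppVeyor scripts may validate the variable differently or not at all.

```shell
# Succeed only for the six combinations of toolchain and word size.
valid_port() {
  case "$1" in
    mingw32|mingw64|msvc32|msvc64|cygwin32|cygwin64) return 0 ;;
    *) return 1 ;;
  esac
}

for port in mingw64 msvc32 cygwin; do
  if valid_port "$port"; then
    echo "$port: ok"
  else
    echo "$port: invalid (expected mingw, msvc or cygwin followed by 32 or 64)"
  fi
done
```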

==== INRIA's Continuous Integration (CI)

INRIA provides a Jenkins continuous integration service that OCaml
uses, see link:https://ci.inria.fr/ocaml/[]. It provides wider
architecture support (MSVC and MinGW, a zsystems s390x machine, and
various macOS versions) than the GitHub Actions/AppVeyor testing on GitHub,
but only runs on commits to the trunk or release branches, not on every
PR.

You do not need to be an INRIA employee to open an account on this
Jenkins service; anyone can create an account there to access build
logs and manually restart builds. If you run into trouble creating an
account, please email ocaml-ci-admin@inria.fr.

To be notified by email of build failures, you can subscribe to the
ocaml-ci-notifications@inria.fr mailing list by visiting
https://sympa.inria.fr/sympa/info/ocaml-ci-notifications[its web page].

==== Running INRIA's CI on a publicly available git branch

If you have suspicions that your changes may fail on exotic architectures
(they touch the build system or the backend code generator,
for example) and would like to get wider testing than GitHub's CI
provides, it is possible to manually start INRIA's CI on arbitrary git
branches even before opening a pull request as follows:

1. Make sure you have an account on Inria's CI as described before.

2. Make sure you have been added to the ocaml project.

3. Prepare a branch with the code you'd like to test, say "mybranch". It
is probably a good idea to make sure your branch is based on the latest
trunk.

4. Make your branch publicly available. For instance, you can fork
OCaml's GitHub repository and then push "mybranch" to your fork.

5. Visit https://ci.inria.fr/ocaml/job/precheck and log in. Click on
"Build with parameters".

6. Fill in the REPO_URL and BRANCH fields as appropriate and run the build.

7. You should receive a bunch of e-mails with the build logs for each
slave and each tested configuration (with and without flambda) attached.

==== Changing what the CI does

INRIA's CI "main" and "precheck" jobs run the script
tools/ci-build. In particular, when running the CI on a publicly
available branch via the "precheck" job as explained in the previous
section, you can edit this script to change what the CI will test.

For instance, parallel builds are only tested for the "trunk"
branch. In order to use "precheck" to test parallel builds on a custom
branch, add this at the beginning of tools/ci-build:

----
OCAML_JOBS=10
----

=== The `caml-commits` mailing list

If you would like to receive email notifications of all commits made to the main
git repository, you can subscribe to the caml-commits@inria.fr mailing list by
visiting https://sympa.inria.fr/sympa/info/caml-commits[its web page].

Happy Hacking!