summaryrefslogtreecommitdiff
path: root/docs/users_guide/extending_ghc.rst
diff options
context:
space:
mode:
Diffstat (limited to 'docs/users_guide/extending_ghc.rst')
-rw-r--r--docs/users_guide/extending_ghc.rst535
1 files changed, 535 insertions, 0 deletions
diff --git a/docs/users_guide/extending_ghc.rst b/docs/users_guide/extending_ghc.rst
new file mode 100644
index 0000000000..efe18b0a3f
--- /dev/null
+++ b/docs/users_guide/extending_ghc.rst
@@ -0,0 +1,535 @@
+.. _extending-ghc:
+
+Extending and using GHC as a Library
+====================================
+
+GHC exposes its internal APIs to users through the built-in ghc package.
+It allows you to write programs that leverage GHC's entire compilation
+driver, in order to analyze or compile Haskell code programmatically.
+Furthermore, GHC gives users the ability to load compiler plugins during
+compilation - modules which are allowed to view and change GHC's
+internal intermediate representation, Core. Plugins are suitable for
+things like experimental optimizations or analysis, and offer a lower
+barrier of entry to compiler development for many common cases.
+
+Furthermore, GHC offers a lightweight annotation mechanism that you can
+use to annotate your source code with metadata, which you can later
+inspect with either the compiler API or a compiler plugin.
+
+.. _annotation-pragmas:
+
+Source annotations
+------------------
+
+Annotations are small pragmas that allow you to attach data to
+identifiers in source code, which are persisted when compiled. These
+pieces of data can then inspected and utilized when using GHC as a
+library or writing a compiler plugin.
+
+.. _ann-pragma:
+
+Annotating values
+~~~~~~~~~~~~~~~~~
+
+.. index::
+ single: ANN pragma
+ single: pragma; ANN
+ single: source annotations
+
+Any expression that has both ``Typeable`` and ``Data`` instances may be
+attached to a top-level value binding using an ``ANN`` pragma. In
+particular, this means you can use ``ANN`` to annotate data constructors
+(e.g. ``Just``) as well as normal values (e.g. ``take``). By way of
+example, to annotate the function ``foo`` with the annotation
+``Just "Hello"`` you would do this:
+
+::
+
+ {-# ANN foo (Just "Hello") #-}
+ foo = ...
+
+A number of restrictions apply to use of annotations:
+
+- The binder being annotated must be at the top level (i.e. no nested
+ binders)
+
+- The binder being annotated must be declared in the current module
+
+- The expression you are annotating with must have a type with
+ ``Typeable`` and ``Data`` instances
+
+- The `Template Haskell staging restrictions <>`__ apply to the
+ expression being annotated with, so for example you cannot run a
+ function from the module being compiled.
+
+ To be precise, the annotation ``{-# ANN x e #-}`` is well staged if
+ and only if ``$(e)`` would be (disregarding the usual type
+ restrictions of the splice syntax, and the usual restriction on
+ splicing inside a splice - ``$([|1|])`` is fine as an annotation,
+ albeit redundant).
+
+If you feel strongly that any of these restrictions are too onerous,
+:ghc-wiki:`please give the GHC team a shout <MailingListsAndIRC>`.
+
+However, apart from these restrictions, many things are allowed,
+including expressions which are not fully evaluated! Annotation
+expressions will be evaluated by the compiler just like Template Haskell
+splices are. So, this annotation is fine:
+
+::
+
+ {-# ANN f SillyAnnotation { foo = (id 10) + $([| 20 |]), bar = 'f } #-}
+ f = ...
+
+.. _typeann-pragma:
+
+Annotating types
+~~~~~~~~~~~~~~~~
+
+.. index::
+ single: ANN pragma; on types
+
+You can annotate types with the ``ANN`` pragma by using the ``type``
+keyword. For example:
+
+::
+
+ {-# ANN type Foo (Just "A `Maybe String' annotation") #-}
+ data Foo = ...
+
+.. _modann-pragma:
+
+Annotating modules
+~~~~~~~~~~~~~~~~~~
+
+.. index::
+ single: ANN pragma; on modules
+
+You can annotate modules with the ``ANN`` pragma by using the ``module``
+keyword. For example:
+
+::
+
+ {-# ANN module (Just "A `Maybe String' annotation") #-}
+
+.. _ghc-as-a-library:
+
+Using GHC as a Library
+----------------------
+
+The ``ghc`` package exposes most of GHC's frontend to users, and thus
+allows you to write programs that leverage it. This library is actually
+the same library used by GHC's internal, frontend compilation driver,
+and thus allows you to write tools that programmatically compile source
+code and inspect it. Such functionality is useful in order to write
+things like IDE or refactoring tools. As a simple example, here's a
+program which compiles a module, much like ghc itself does by default
+when invoked:
+
+::
+
+ import GHC
+ import GHC.Paths ( libdir )
+ import DynFlags ( defaultLogAction )
+
+ main =
+ defaultErrorHandler defaultLogAction $ do
+ runGhc (Just libdir) $ do
+ dflags <- getSessionDynFlags
+ setSessionDynFlags dflags
+ target <- guessTarget "test_main.hs" Nothing
+ setTargets [target]
+ load LoadAllTargets
+
+The argument to ``runGhc`` is a bit tricky. GHC needs this to find its
+libraries, so the argument must refer to the directory that is printed
+by ``ghc --print-libdir`` for the same version of GHC that the program
+is being compiled with. Above we therefore use the ``ghc-paths`` package
+which provides this for us.
+
+Compiling it results in:
+
+::
+
+ $ cat test_main.hs
+ main = putStrLn "hi"
+ $ ghc -package ghc simple_ghc_api.hs
+ [1 of 1] Compiling Main ( simple_ghc_api.hs, simple_ghc_api.o )
+ Linking simple_ghc_api ...
+ $ ./simple_ghc_api
+ $ ./test_main
+ hi
+ $
+
+For more information on using the API, as well as more samples and
+references, please see `this Haskell.org wiki
+page <http://haskell.org/haskellwiki/GHC/As_a_library>`__.
+
+.. _compiler-plugins:
+
+Compiler Plugins
+----------------
+
+GHC has the ability to load compiler plugins at compile time. The
+feature is similar to the one provided by
+`GCC <http://gcc.gnu.org/wiki/plugins>`__, and allows users to write
+plugins that can adjust the behaviour of the constraint solver, inspect
+and modify the compilation pipeline, as well as transform and inspect
+GHC's intermediate language, Core. Plugins are suitable for experimental
+analysis or optimization, and require no changes to GHC's source code to
+use.
+
+Plugins cannot optimize/inspect C--, nor can they implement things like
+parser/front-end modifications like GCC, apart from limited changes to
+the constraint solver. If you feel strongly that any of these
+restrictions are too onerous,
+:ghc-wiki:`please give the GHC team a shout <MailingListsAndIRC>`.
+
+.. _using-compiler-plugins:
+
+Using compiler plugins
+~~~~~~~~~~~~~~~~~~~~~~
+
+Plugins can be specified on the command line with the option
+``-fplugin=module`` where ⟨module⟩ is a module in a registered package
+that exports a plugin. Arguments can be given to plugins with the
+command line option ``-fplugin-opt=module:args``, where ⟨args⟩ are
+arguments interpreted by the plugin provided by ⟨module⟩.
+
+As an example, in order to load the plugin exported by ``Foo.Plugin`` in
+the package ``foo-ghc-plugin``, and give it the parameter "baz", we
+would invoke GHC like this:
+
+::
+
+ $ ghc -fplugin Foo.Plugin -fplugin-opt Foo.Plugin:baz Test.hs
+ [1 of 1] Compiling Main ( Test.hs, Test.o )
+ Loading package ghc-prim ... linking ... done.
+ Loading package integer-gmp ... linking ... done.
+ Loading package base ... linking ... done.
+ Loading package ffi-1.0 ... linking ... done.
+ Loading package foo-ghc-plugin-0.1 ... linking ... done.
+ ...
+ Linking Test ...
+ $
+
+Since plugins are exported by registered packages, it's safe to put
+dependencies on them in cabal for example, and specify plugin arguments
+to GHC through the ``ghc-options`` field.
+
+.. _writing-compiler-plugins:
+
+Writing compiler plugins
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Plugins are modules that export at least a single identifier,
+``plugin``, of type ``GhcPlugins.Plugin``. All plugins should
+``import GhcPlugins`` as it defines the interface to the compilation
+pipeline.
+
+A ``Plugin`` effectively holds a function which installs a compilation
+pass into the compiler pipeline. By default there is the empty plugin
+which does nothing, ``GhcPlugins.defaultPlugin``, which you should
+override with record syntax to specify your installation function. Since
+the exact fields of the ``Plugin`` type are open to change, this is the
+best way to ensure your plugins will continue to work in the future with
+minimal interface impact.
+
+``Plugin`` exports a field, ``installCoreToDos`` which is a function of
+type ``[CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]``. A
+``CommandLineOption`` is effectively just ``String``, and a ``CoreToDo``
+is basically a function of type ``Core -> Core``. A ``CoreToDo`` gives
+your pass a name and runs it over every compiled module when you invoke
+GHC.
+
+As a quick example, here is a simple plugin that just does nothing and
+just returns the original compilation pipeline, unmodified, and says
+'Hello':
+
+::
+
+ module DoNothing.Plugin (plugin) where
+ import GhcPlugins
+
+ plugin :: Plugin
+ plugin = defaultPlugin {
+ installCoreToDos = install
+ }
+
+ install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
+ install _ todo = do
+ reinitializeGlobals
+ putMsgS "Hello!"
+ return todo
+
+Provided you compiled this plugin and registered it in a package (with
+cabal for instance,) you can then use it by just specifying
+``-fplugin=DoNothing.Plugin`` on the command line, and during the
+compilation you should see GHC say 'Hello'.
+
+Note carefully the ``reinitializeGlobals`` call at the beginning of the
+installation function. Due to bugs in the windows linker dealing with
+``libghc``, this call is necessary to properly ensure compiler plugins
+have the same global state as GHC at the time of invocation. Without
+``reinitializeGlobals``, compiler plugins can crash at runtime because
+they may require state that hasn't otherwise been initialized.
+
+In the future, when the linking bugs are fixed, ``reinitializeGlobals``
+will be deprecated with a warning, and changed to do nothing.
+
+.. _core-plugins-in-more-detail:
+
+Core plugins in more detail
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``CoreToDo`` is effectively a data type that describes all the kinds of
+optimization passes GHC does on Core. There are passes for
+simplification, CSE, vectorisation, etc. There is a specific case for
+plugins, ``CoreDoPluginPass :: String -> PluginPass -> CoreToDo`` which
+should be what you always use when inserting your own pass into the
+pipeline. The first parameter is the name of the plugin, and the second
+is the pass you wish to insert.
+
+``CoreM`` is a monad that all of the Core optimizations live and operate
+inside of.
+
+A plugin's installation function (``install`` in the above example)
+takes a list of ``CoreToDo``\ s and returns a list of ``CoreToDo``.
+Before GHC begins compiling modules, it enumerates all the needed
+plugins you tell it to load, and runs all of their installation
+functions, initially on a list of passes that GHC specifies itself.
+After doing this for every plugin, the final list of passes is given to
+the optimizer, and are run by simply going over the list in order.
+
+You should be careful with your installation function, because the list
+of passes you give back isn't questioned or double checked by GHC at the
+time of this writing. An installation function like the following:
+
+::
+
+ install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
+ install _ _ = return []
+
+is certainly valid, but also certainly not what anyone really wants.
+
+.. _manipulating-bindings:
+
+Manipulating bindings
+^^^^^^^^^^^^^^^^^^^^^
+
+In the last section we saw that besides a name, a ``CoreDoPluginPass``
+takes a pass of type ``PluginPass``. A ``PluginPass`` is a synonym for
+``(ModGuts -> CoreM ModGuts)``. ``ModGuts`` is a type that represents
+the one module being compiled by GHC at any given time.
+
+A ``ModGuts`` holds all of the module's top level bindings which we can
+examine. These bindings are of type ``CoreBind`` and effectively
+represent the binding of a name to body of code. Top-level module
+bindings are part of a ``ModGuts`` in the field ``mg_binds``.
+Implementing a pass that manipulates the top level bindings merely needs
+to iterate over this field, and return a new ``ModGuts`` with an updated
+``mg_binds`` field. Because this is such a common case, there is a
+function provided named ``bindsOnlyPass`` which lifts a function of type
+``([CoreBind] -> CoreM [CoreBind])`` to type
+``(ModGuts -> CoreM ModGuts)``.
+
+Continuing with our example from the last section, we can write a simple
+plugin that just prints out the name of all the non-recursive bindings
+in a module it compiles:
+
+::
+
+ module SayNames.Plugin (plugin) where
+ import GhcPlugins
+
+ plugin :: Plugin
+ plugin = defaultPlugin {
+ installCoreToDos = install
+ }
+
+ install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
+ install _ todo = do
+ reinitializeGlobals
+ return (CoreDoPluginPass "Say name" pass : todo)
+
+ pass :: ModGuts -> CoreM ModGuts
+ pass guts = do dflags <- getDynFlags
+ bindsOnlyPass (mapM (printBind dflags)) guts
+ where printBind :: DynFlags -> CoreBind -> CoreM CoreBind
+ printBind dflags bndr@(NonRec b _) = do
+ putMsgS $ "Non-recursive binding named " ++ showSDoc dflags (ppr b)
+ return bndr
+ printBind _ bndr = return bndr
+
+.. _getting-annotations:
+
+Using Annotations
+^^^^^^^^^^^^^^^^^
+
+Previously we discussed annotation pragmas (:ref:`annotation-pragmas`),
+which we mentioned could be used to give compiler plugins extra guidance
+or information. Annotations for a module can be retrieved by a plugin,
+but you must go through the modules ``ModGuts`` in order to get it.
+Because annotations can be arbitrary instances of ``Data`` and
+``Typeable``, you need to give a type annotation specifying the proper
+type of data to retrieve from the interface file, and you need to make
+sure the annotation type used by your users is the same one your plugin
+uses. For this reason, we advise distributing annotations as part of the
+package which also provides compiler plugins if possible.
+
+To get the annotations of a single binder, you can use
+``getAnnotations`` and specify the proper type. Here's an example that
+will print out the name of any top-level non-recursive binding with the
+``SomeAnn`` annotation:
+
+::
+
+ {-# LANGUAGE DeriveDataTypeable #-}
+ module SayAnnNames.Plugin (plugin, SomeAnn(..)) where
+ import GhcPlugins
+ import Control.Monad (unless)
+ import Data.Data
+
+ data SomeAnn = SomeAnn deriving (Data, Typeable)
+
+ plugin :: Plugin
+ plugin = defaultPlugin {
+ installCoreToDos = install
+ }
+
+ install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
+ install _ todo = do
+ reinitializeGlobals
+ return (CoreDoPluginPass "Say name" pass : todo)
+
+ pass :: ModGuts -> CoreM ModGuts
+ pass g = do
+ dflags <- getDynFlags
+ mapM_ (printAnn dflags g) (mg_binds g) >> return g
+ where printAnn :: DynFlags -> ModGuts -> CoreBind -> CoreM CoreBind
+ printAnn dflags guts bndr@(NonRec b _) = do
+ anns <- annotationsOn guts b :: CoreM [SomeAnn]
+ unless (null anns) $ putMsgS $ "Annotated binding found: " ++ showSDoc dflags (ppr b)
+ return bndr
+ printAnn _ _ bndr = return bndr
+
+ annotationsOn :: Data a => ModGuts -> CoreBndr -> CoreM [a]
+ annotationsOn guts bndr = do
+ anns <- getAnnotations deserializeWithData guts
+ return $ lookupWithDefaultUFM anns [] (varUnique bndr)
+
+Please see the GHC API documentation for more about how to use internal
+APIs, etc.
+
+.. _typechecker-plugins:
+
+Typechecker plugins
+~~~~~~~~~~~~~~~~~~~
+
+In addition to Core plugins, GHC has experimental support for
+typechecker plugins, which allow the behaviour of the constraint solver
+to be modified. For example, they make it possible to interface the
+compiler to an SMT solver, in order to support a richer theory of
+type-level arithmetic expressions than the theory built into GHC (see
+:ref:`typelit-tyfuns`).
+
+The ``Plugin`` type has a field ``tcPlugin`` of type
+``[CommandLineOption] -> Maybe TcPlugin``, where the ``TcPlugin`` type
+is defined thus:
+
+::
+
+ data TcPlugin = forall s . TcPlugin
+ { tcPluginInit :: TcPluginM s
+ , tcPluginSolve :: s -> TcPluginSolver
+ , tcPluginStop :: s -> TcPluginM ()
+ }
+
+ type TcPluginSolver = [Ct] -> [Ct] -> [Ct] -> TcPluginM TcPluginResult
+
+ data TcPluginResult = TcPluginContradiction [Ct] | TcPluginOk [(EvTerm,Ct)] [Ct]
+
+(The details of this representation are subject to change as we gain
+more experience writing typechecker plugins. It should not be assumed to
+be stable between GHC releases.)
+
+The basic idea is as follows:
+
+- When type checking a module, GHC calls ``tcPluginInit`` once before
+ constraint solving starts. This allows the plugin to look things up
+ in the context, initialise mutable state or open a connection to an
+ external process (e.g. an external SMT solver). The plugin can return
+ a result of any type it likes, and the result will be passed to the
+ other two fields.
+
+- During constraint solving, GHC repeatedly calls ``tcPluginSolve``.
+ This function is provided with the current set of constraints, and
+ should return a ``TcPluginResult`` that indicates whether a
+ contradiction was found or progress was made. If the plugin solver
+ makes progress, GHC will re-start the constraint solving pipeline,
+ looping until a fixed point is reached.
+
+- Finally, GHC calls ``tcPluginStop`` after constraint solving is
+ finished, allowing the plugin to dispose of any resources it has
+ allocated (e.g. terminating the SMT solver process).
+
+Plugin code runs in the ``TcPluginM`` monad, which provides a restricted
+interface to GHC API functionality that is relevant for typechecker
+plugins, including ``IO`` and reading the environment. If you need
+functionality that is not exposed in the ``TcPluginM`` module, you can
+use ``unsafeTcPluginTcM :: TcM a -> TcPluginM a``, but are encouraged to
+contact the GHC team to suggest additions to the interface. Note that
+``TcPluginM`` can perform arbitrary IO via
+``tcPluginIO :: IO a -> TcPluginM a``, although some care must be taken
+with side effects (particularly in ``tcPluginSolve``). In general, it is
+up to the plugin author to make sure that any IO they do is safe.
+
+.. _constraint-solving-with-plugins:
+
+Constraint solving with plugins
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The key component of a typechecker plugin is a function of type
+``TcPluginSolver``, like this:
+
+::
+
+ solve :: [Ct] -> [Ct] -> [Ct] -> TcPluginM TcPluginResult
+ solve givens deriveds wanteds = ...
+
+This function will be invoked at two points in the constraint solving
+process: after simplification of given constraints, and after
+unflattening of wanted constraints. The two phases can be distinguished
+because the deriveds and wanteds will be empty in the first case. In
+each case, the plugin should either
+
+- return ``TcPluginContradiction`` with a list of impossible
+ constraints (which must be a subset of those passed in), so they can
+ be turned into errors; or
+
+- return ``TcPluginOk`` with lists of solved and new constraints (the
+ former must be a subset of those passed in and must be supplied with
+ corresponding evidence terms).
+
+If the plugin cannot make any progress, it should return
+``TcPluginOk [] []``. Otherwise, if there were any new constraints, the
+main constraint solver will be re-invoked to simplify them, then the
+plugin will be invoked again. The plugin is responsible for making sure
+that this process eventually terminates.
+
+Plugins are provided with all available constraints (including
+equalities and typeclass constraints), but it is easy for them to
+discard those that are not relevant to their domain, because they need
+return only those constraints for which they have made progress (either
+by solving or contradicting them).
+
+Constraints that have been solved by the plugin must be provided with
+evidence in the form of an ``EvTerm`` of the type of the constraint.
+This evidence is ignored for given and derived constraints, which GHC
+"solves" simply by discarding them; typically this is used when they are
+uninformative (e.g. reflexive equations). For wanted constraints, the
+evidence will form part of the Core term that is generated after
+typechecking, and can be checked by ``-dcore-lint``. It is possible for
+the plugin to create equality axioms for use in evidence terms, but GHC
+does not check their consistency, and inconsistent axiom sets may lead
+to segfaults or other runtime misbehaviour.