diff options
Diffstat (limited to 'docs/users_guide/ffi-chap.rst')
-rw-r--r-- | docs/users_guide/ffi-chap.rst | 658 |
1 files changed, 658 insertions, 0 deletions
diff --git a/docs/users_guide/ffi-chap.rst b/docs/users_guide/ffi-chap.rst new file mode 100644 index 0000000000..f9ebe5ae6e --- /dev/null +++ b/docs/users_guide/ffi-chap.rst @@ -0,0 +1,658 @@ +.. _ffi: + +Foreign function interface (FFI) +================================ + +.. index:: + single: -XForeignFunctionInterface + single: Foreign function interface + single: interfacing with native code + +GHC (mostly) conforms to the Haskell Foreign Function Interface, whose +definition is part of the Haskell Report on +```http://www.haskell.org/`` <http://www.haskell.org/>`__. + +FFI support is enabled by default, but can be enabled or disabled +explicitly with the ``-XForeignFunctionInterface`` flag. + +GHC implements a number of GHC-specific extensions to the FFI Addendum. +These extensions are described in :ref:`ffi-ghcexts`, but please note +that programs using these features are not portable. Hence, these +features should be avoided where possible. + +The FFI libraries are documented in the accompanying library +documentation; see for example the :base-ref:`Foreign` module. + +.. _ffi-ghcexts: + +GHC extensions to the FFI Addendum +---------------------------------- + +The FFI features that are described in this section are specific to GHC. +Your code will not be portable to other compilers if you use them. + +Unboxed types +~~~~~~~~~~~~~ + +The following unboxed types may be used as basic foreign types (see FFI +Addendum, Section 3.2): ``Int#``, ``Word#``, ``Char#``, ``Float#``, +``Double#``, ``Addr#``, ``StablePtr# a``, ``MutableByteArray#``, +``ForeignObj#``, and ``ByteArray#``. + +.. _ffi-newtype-io: + +Newtype wrapping of the IO monad +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The FFI spec requires the IO monad to appear in various places, but it +can sometimes be convenient to wrap the IO monad in a ``newtype``, thus: + +:: + + newtype MyIO a = MIO (IO a) + +(A reason for doing so might be to prevent the programmer from calling +arbitrary IO procedures in some part of the program.) + +The Haskell FFI already specifies that arguments and results of foreign +imports and exports will be automatically unwrapped if they are newtypes +(Section 3.2 of the FFI addendum). GHC extends the FFI by automatically +unwrapping any newtypes that wrap the IO monad itself. More precisely, +wherever the FFI specification requires an ``IO`` type, GHC will accept any +newtype-wrapping of an ``IO`` type. For example, these declarations are OK: + +:: + + foreign import foo :: Int -> MyIO Int + foreign import "dynamic" baz :: (Int -> MyIO Int) -> CInt -> MyIO Int + +.. _ffi-prim: + +Primitive imports +~~~~~~~~~~~~~~~~~ + +GHC extends the FFI with an additional calling convention ``prim``, +e.g.: + +:: + + foreign import prim "foo" foo :: ByteArray# -> (# Int#, Int# #) + +This is used to import functions written in Cmm code that follow an +internal GHC calling convention. The arguments and results must be +unboxed types, except that an argument may be of type ``Any`` (by way of +``unsafeCoerce#``) and the result type is allowed to be an unboxed tuple +or the type ``Any``. + +This feature is not intended for use outside of the core libraries that +come with GHC. For more details see the +:ghc-wiki:`GHC developer wiki <Commentary/PrimOps>`. + +.. _ffi-interruptible: + +Interruptible foreign calls +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This concerns the interaction of foreign calls with +``Control.Concurrent.throwTo``. Normally when the target of a +``throwTo`` is involved in a foreign call, the exception is not raised +until the call returns, and in the meantime the caller is blocked. This +can result in unresponsiveness, which is particularly undesirable in the +case of user interrupt (e.g. Control-C). The default behaviour when a +Control-C signal is received (``SIGINT`` on Unix) is to raise the +``UserInterrupt`` exception in the main thread; if the main thread is +blocked in a foreign call at the time, then the program will not respond +to the user interrupt. + +The problem is that it is not possible in general to interrupt a foreign +call safely. However, GHC does provide a way to interrupt blocking +system calls which works for most system calls on both Unix and Windows. +When the ``InterruptibleFFI`` extension is enabled, a foreign call can +be annotated with ``interruptible`` instead of ``safe`` or ``unsafe``: + +:: + + foreign import ccall interruptible + "sleep" sleepBlock :: CUint -> IO CUint + +``interruptible`` behaves exactly as ``safe``, except that when a +``throwTo`` is directed at a thread in an interruptible foreign call, an +OS-specific mechanism will be used to attempt to cause the foreign call +to return: + +Unix systems + The thread making the foreign call is sent a ``SIGPIPE`` signal + using ``pthread_kill()``. This is usually enough to cause a blocking + system call to return with ``EINTR`` (GHC by default installs an + empty signal handler for ``SIGPIPE``, to override the default + behaviour which is to terminate the process immediately). + +Windows systems + [Vista and later only] The RTS calls the Win32 function + ``CancelSynchronousIO``, which will cause a blocking I/O operation + to return with the error ``ERROR_OPERATION_ABORTED``. + +If the system call is successfully interrupted, it will return to +Haskell whereupon the exception can be raised. Be especially careful +when using ``interruptible`` that the caller of the foreign function is +prepared to deal with the consequences of the call being interrupted; on +Unix it is good practice to check for ``EINTR`` always, but on Windows +it is not typically necessary to handle ``ERROR_OPERATION_ABORTED``. + +.. _ffi-capi: + +The CAPI calling convention +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ``CApiFFI`` extension allows a calling convention of ``capi`` to be +used in foreign declarations, e.g. + +:: + + foreign import capi "header.h f" f :: CInt -> IO CInt + +Rather than generating code to call ``f`` according to the platform's +ABI, we instead call ``f`` using the C API defined in the header +``header.h``. Thus ``f`` can be called even if it may be defined as a +CPP ``#define`` rather than a proper function. + +When using ``capi``, it is also possible to import values, rather than +functions. For example, + +:: + + foreign import capi "pi.h value pi" c_pi :: CDouble + +will work regardless of whether ``pi`` is defined as + +:: + + const double pi = 3.14; + +or with + +:: + + #define pi 3.14 + +In order to tell GHC the C type that a Haskell type corresponds to when +it is used with the CAPI, a ``CTYPE`` pragma can be used on the type +definition. The header which defines the type can optionally also be +specified. The syntax looks like: + +:: + + data {-# CTYPE "unistd.h" "useconds_t" #-} T = ... + newtype {-# CTYPE "useconds_t" #-} T = ... + +``hs_thread_done()`` +~~~~~~~~~~~~~~~~~~~~ + +:: + + void hs_thread_done(void); + +GHC allocates a small amount of thread-local memory when a thread calls +a Haskell function via a ``foreign export``. This memory is not normally +freed until ``hs_exit()``; the memory is cached so that subsequent calls +into Haskell are fast. However, if your application is long-running and +repeatedly creates new threads that call into Haskell, you probably want +to arrange that this memory is freed in those threads that have finished +calling Haskell functions. To do this, call ``hs_thread_done()`` from +the thread whose memory you want to free. + +Calling ``hs_thread_done()`` is entirely optional. You can call it as +often or as little as you like. It is safe to call it from a thread that +has never called any Haskell functions, or one that never will. If you +forget to call it, the worst that can happen is that some memory remains +allocated until ``hs_exit()`` is called. If you call it too often, the +worst that can happen is that the next call to a Haskell function incurs +some extra overhead. + +.. _ffi-ghc: + +Using the FFI with GHC +---------------------- + +The following sections also give some hints and tips on the use of the +foreign function interface in GHC. + +.. _foreign-export-ghc: + +Using ``foreign export`` and ``foreign import ccall "wrapper"`` with GHC +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. index:: + single: foreign export; with GHC + +When GHC compiles a module (say ``M.hs``) which uses ``foreign export`` +or ``foreign import "wrapper"``, it generates a ``M_stub.h`` for use by +C programs. + +For a plain ``foreign export``, the file ``M_stub.h`` contains a C +prototype for the foreign exported function. For example, if we compile +the following module: + +:: + + module Foo where + + foreign export ccall foo :: Int -> IO Int + + foo :: Int -> IO Int + foo n = return (length (f n)) + + f :: Int -> [Int] + f 0 = [] + f n = n:(f (n-1)) + +Then ``Foo_stub.h`` will contain something like this: + +:: + + #include "HsFFI.h" + extern HsInt foo(HsInt a0); + +To invoke ``foo()`` from C, just ``#include "Foo_stub.h"`` and call +``foo()``. + +The ``Foo_stub.h`` file can be redirected using the ``-stubdir`` option; +see :ref:`options-output`. + +.. _using-own-main: + +Using your own ``main()`` +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Normally, GHC's runtime system provides a ``main()``, which arranges to +invoke ``Main.main`` in the Haskell program. However, you might want to +link some Haskell code into a program which has a main function written +in another language, say C. In order to do this, you have to initialize +the Haskell runtime system explicitly. + +Let's take the example from above, and invoke it from a standalone C +program. Here's the C code: + +:: + + #include <stdio.h> + #include "HsFFI.h" + + #ifdef __GLASGOW_HASKELL__ + #include "Foo_stub.h" + #endif + + int main(int argc, char *argv[]) + { + int i; + + hs_init(&argc, &argv); + + for (i = 0; i < 5; i++) { + printf("%d\n", foo(2500)); + } + + hs_exit(); + return 0; + } + +We've surrounded the GHC-specific bits with +``#ifdef __GLASGOW_HASKELL__``; the rest of the code should be portable +across Haskell implementations that support the FFI standard. + +The call to ``hs_init()`` initializes GHC's runtime system. Do NOT try +to invoke any Haskell functions before calling ``hs_init()``: bad things +will undoubtedly happen. + +We pass references to ``argc`` and ``argv`` to ``hs_init()`` so that it +can separate out any arguments for the RTS (i.e. those arguments between +``+RTS...-RTS``). + +After we've finished invoking our Haskell functions, we can call +``hs_exit()``, which terminates the RTS. + +There can be multiple calls to ``hs_init()``, but each one should be +matched by one (and only one) call to ``hs_exit()`` [1]_. + +.. note:: + When linking the final program, it is normally easiest to do the + link using GHC, although this isn't essential. If you do use GHC, then + don't forget the flag ``-no-hs-main``\ ``-no-hs-main``, otherwise GHC + will try to link to the ``Main`` Haskell module. + +.. [1] + The outermost ``hs_exit()`` will actually de-initialise the system. + NOTE that currently GHC's runtime cannot reliably re-initialise after + this has happened, see :ref:`infelicities-ffi`. + +To use ``+RTS`` flags with ``hs_init()``, we have to modify the example +slightly. By default, GHC's RTS will only accept "safe" ``+RTS`` flags +(see :ref:`options-linker`), and the ``-rtsopts``\ ``-rtsopts`` +link-time flag overrides this. However, ``-rtsopts`` has no effect when +``-no-hs-main`` is in use (and the same goes for ``-with-rtsopts``). To +set these options we have to call a GHC-specific API instead of +``hs_init()``: + +:: + + #include <stdio.h> + #include "HsFFI.h" + + #ifdef __GLASGOW_HASKELL__ + #include "Foo_stub.h" + #include "Rts.h" + #endif + + int main(int argc, char *argv[]) + { + int i; + + #if __GLASGOW_HASKELL__ >= 703 + { + RtsConfig conf = defaultRtsConfig; + conf.rts_opts_enabled = RtsOptsAll; + hs_init_ghc(&argc, &argv, conf); + } + #else + hs_init(&argc, &argv); + #endif + + for (i = 0; i < 5; i++) { + printf("%d\n", foo(2500)); + } + + hs_exit(); + return 0; + } + +Note two changes: we included ``Rts.h``, which defines the GHC-specific +external RTS interface, and we called ``hs_init_ghc()`` instead of +``hs_init()``, passing an argument of type ``RtsConfig``. ``RtsConfig`` +is a struct with various fields that affect the behaviour of the runtime +system. Its definition is: + +:: + + typedef struct { + RtsOptsEnabledEnum rts_opts_enabled; + const char *rts_opts; + } RtsConfig; + + extern const RtsConfig defaultRtsConfig; + + typedef enum { + RtsOptsNone, // +RTS causes an error + RtsOptsSafeOnly, // safe RTS options allowed; others cause an error + RtsOptsAll // all RTS options allowed + } RtsOptsEnabledEnum; + +There is a default value ``defaultRtsConfig`` that should be used to +initialise variables of type ``RtsConfig``. More fields will undoubtedly +be added to ``RtsConfig`` in the future, so in order to keep your code +forwards-compatible it is best to initialise with ``defaultRtsConfig`` +and then modify the required fields, as in the code sample above. + +.. _ffi-library: + +Making a Haskell library that can be called from foreign code +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The scenario here is much like in :ref:`using-own-main`, except that the +aim is not to link a complete program, but to make a library from +Haskell code that can be deployed in the same way that you would deploy +a library of C code. + +The main requirement here is that the runtime needs to be initialized +before any Haskell code can be called, so your library should provide +initialisation and deinitialisation entry points, implemented in C or +C++. For example: + +:: + + #include <stdlib.h> + #include "HsFFI.h" + + HsBool mylib_init(void){ + int argc = 2; + char *argv[] = { "+RTS", "-A32m", NULL }; + char **pargv = argv; + + // Initialize Haskell runtime + hs_init(&argc, &pargv); + + // do any other initialization here and + // return false if there was a problem + return HS_BOOL_TRUE; + } + + void mylib_end(void){ + hs_exit(); + } + +The initialisation routine, ``mylib_init``, calls ``hs_init()`` as +normal to initialise the Haskell runtime, and the corresponding +deinitialisation function ``mylib_end()`` calls ``hs_exit()`` to shut +down the runtime. + +.. _glasgow-foreign-headers: + +Using header files +~~~~~~~~~~~~~~~~~~ + +.. index:: + single: C calls, function headers + +C functions are normally declared using prototypes in a C header file. +Earlier versions of GHC (6.8.3 and earlier) ``#include``\ d the header +file in the C source file generated from the Haskell code, and the C +compiler could therefore check that the C function being called via the +FFI was being called at the right type. + +GHC no longer includes external header files when compiling via C, so +this checking is not performed. The change was made for compatibility +with the :ref:`native code generator <native-code-gen>` (``-fasm``) and to +comply strictly with the FFI specification, which requires that FFI calls are +not subject to macro expansion and other CPP conversions that may be applied +when using C header files. This approach also simplifies the inlining of foreign +calls across module and package boundaries: there's no need for the header file +to be available when compiling an inlined version of a foreign call, so the +compiler is free to inline foreign calls in any context. + +The ``-#include`` option is now deprecated, and the ``include-files`` +field in a Cabal package specification is ignored. + +Memory Allocation +~~~~~~~~~~~~~~~~~ + +The FFI libraries provide several ways to allocate memory for use with +the FFI, and it isn't always clear which way is the best. This decision +may be affected by how efficient a particular kind of allocation is on a +given compiler/platform, so this section aims to shed some light on how +the different kinds of allocation perform with GHC. + +``alloca`` + Useful for short-term allocation when the allocation is intended to + scope over a given ``IO`` computation. This kind of allocation is + commonly used when marshalling data to and from FFI functions. + + In GHC, ``alloca`` is implemented using ``MutableByteArray#``, so + allocation and deallocation are fast: much faster than C's + ``malloc/free``, but not quite as fast as stack allocation in C. Use + ``alloca`` whenever you can. + +``mallocForeignPtr`` + Useful for longer-term allocation which requires garbage collection. + If you intend to store the pointer to the memory in a foreign data + structure, then ``mallocForeignPtr`` is *not* a good choice, + however. + + In GHC, ``mallocForeignPtr`` is also implemented using + ``MutableByteArray#``. Although the memory is pointed to by a + ``ForeignPtr``, there are no actual finalizers involved (unless you + add one with ``addForeignPtrFinalizer``), and the deallocation is + done using GC, so ``mallocForeignPtr`` is normally very cheap. + +``malloc/free`` + If all else fails, then you need to resort to ``Foreign.malloc`` and + ``Foreign.free``. These are just wrappers around the C functions of + the same name, and their efficiency will depend ultimately on the + implementations of these functions in your platform's C library. We + usually find ``malloc`` and ``free`` to be significantly slower than + the other forms of allocation above. + +``Foreign.Marshal.Pool`` + Pools are currently implemented using ``malloc/free``, so while they + might be a more convenient way to structure your memory allocation + than using one of the other forms of allocation, they won't be any + more efficient. We do plan to provide an improved-performance + implementation of Pools in the future, however. + +.. _ffi-threads: + +Multi-threading and the FFI +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In order to use the FFI in a multi-threaded setting, you must use the +``-threaded`` option (see :ref:`options-linker`). + +Foreign imports and multi-threading +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When you call a ``foreign import``\ ed function that is annotated as +``safe`` (the default), and the program was linked using ``-threaded``, +then the call will run concurrently with other running Haskell threads. +If the program was linked without ``-threaded``, then the other Haskell +threads will be blocked until the call returns. + +This means that if you need to make a foreign call to a function that +takes a long time or blocks indefinitely, then you should mark it +``safe`` and use ``-threaded``. Some library functions make such calls +internally; their documentation should indicate when this is the case. + +If you are making foreign calls from multiple Haskell threads and using +``-threaded``, make sure that the foreign code you are calling is +thread-safe. In particularly, some GUI libraries are not thread-safe and +require that the caller only invokes GUI methods from a single thread. +If this is the case, you may need to restrict your GUI operations to a +single Haskell thread, and possibly also use a bound thread (see +:ref:`haskell-threads-and-os-threads`). + +Note that foreign calls made by different Haskell threads may execute in +*parallel*, even when the ``+RTS -N`` flag is not being used +(:ref:`parallel-options`). The ``+RTS -N`` flag controls parallel +execution of Haskell threads, but there may be an arbitrary number of +foreign calls in progress at any one time, regardless of the ``+RTS -N`` +value. + +If a call is annotated as ``interruptible`` and the program was +multithreaded, the call may be interrupted in the event that the Haskell +thread receives an exception. The mechanism by which the interrupt +occurs is platform dependent, but is intended to cause blocking system +calls to return immediately with an interrupted error code. The +underlying operating system thread is not to be destroyed. See +:ref:`ffi-interruptible` for more details. + +.. _haskell-threads-and-os-threads: + +The relationship between Haskell threads and OS threads +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Normally there is no fixed relationship between Haskell threads and OS +threads. This means that when you make a foreign call, that call may +take place in an unspecified OS thread. Furthermore, there is no +guarantee that multiple calls made by one Haskell thread will be made by +the same OS thread. + +This usually isn't a problem, and it allows the GHC runtime system to +make efficient use of OS thread resources. However, there are cases +where it is useful to have more control over which OS thread is used, +for example when calling foreign code that makes use of thread-local +state. For cases like this, we provide *bound threads*, which are +Haskell threads tied to a particular OS thread. For information on bound +threads, see the documentation for the :base-ref:`Control.Concurrent` module. + +Foreign exports and multi-threading +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When the program is linked with ``-threaded``, then you may invoke +``foreign export``\ ed functions from multiple OS threads concurrently. +The runtime system must be initialised as usual by calling +``hs_init()``, and this call must complete before invoking any +``foreign export``\ ed functions. + +.. _hs-exit: + +On the use of ``hs_exit()`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``hs_exit()`` normally causes the termination of any running Haskell +threads in the system, and when ``hs_exit()`` returns, there will be no +more Haskell threads running. The runtime will then shut down the system +in an orderly way, generating profiling output and statistics if +necessary, and freeing all the memory it owns. + +It isn't always possible to terminate a Haskell thread forcibly: for +example, the thread might be currently executing a foreign call, and we +have no way to force the foreign call to complete. What's more, the +runtime must assume that in the worst case the Haskell code and runtime +are about to be removed from memory (e.g. if this is a +:ref:`Windows DLL <win32-dlls>`, ``hs_exit()`` is normally called before unloading +the DLL). So ``hs_exit()`` *must* wait until all outstanding foreign +calls return before it can return itself. + +The upshot of this is that if you have Haskell threads that are blocked +in foreign calls, then ``hs_exit()`` may hang (or possibly busy-wait) +until the calls return. Therefore it's a good idea to make sure you +don't have any such threads in the system when calling ``hs_exit()``. +This includes any threads doing I/O, because I/O may (or may not, +depending on the type of I/O and the platform) be implemented using +blocking foreign calls. + +The GHC runtime treats program exit as a special case, to avoid the need +to wait for blocked threads when a standalone executable exits. Since +the program and all its threads are about to terminate at the same time +that the code is removed from memory, it isn't necessary to ensure that +the threads have exited first. (Unofficially, if you want to use this +fast and loose version of ``hs_exit()``, then call +``shutdownHaskellAndExit()`` instead). + +.. _ffi-floating-point: + +Floating point and the FFI +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. index:: + single: Floating point; and the FFI + +The standard C99 ``fenv.h`` header provides operations for inspecting +and modifying the state of the floating point unit. In particular, the +rounding mode used by floating point operations can be changed, and the +exception flags can be tested. + +In Haskell, floating-point operations have pure types, and the +evaluation order is unspecified. So strictly speaking, since the +``fenv.h`` functions let you change the results of, or observe the +effects of floating point operations, use of ``fenv.h`` renders the +behaviour of floating-point operations anywhere in the program +undefined. + +Having said that, we *can* document exactly what GHC does with respect +to the floating point state, so that if you really need to use +``fenv.h`` then you can do so with full knowledge of the pitfalls: + +- GHC completely ignores the floating-point environment, the runtime + neither modifies nor reads it. + +- The floating-point environment is not saved over a normal thread + context-switch. So if you modify the floating-point state in one + thread, those changes may be visible in other threads. Furthermore, + testing the exception state is not reliable, because a context switch + may change it. If you need to modify or test the floating point state + and use threads, then you must use bound threads + (``Control.Concurrent.forkOS``), because a bound thread has its own + OS thread, and OS threads do save and restore the floating-point + state. + +- It is safe to modify the floating-point unit state temporarily during + a foreign call, because foreign calls are never pre-empted by GHC. + |