Diffstat (limited to 'docs/src/userguide/parallelism.rst')
-rw-r--r-- | docs/src/userguide/parallelism.rst | 89 |
1 files changed, 58 insertions, 31 deletions
diff --git a/docs/src/userguide/parallelism.rst b/docs/src/userguide/parallelism.rst
index e9d473e66..7cdae95b3 100644
--- a/docs/src/userguide/parallelism.rst
+++ b/docs/src/userguide/parallelism.rst
@@ -8,6 +8,9 @@ Using Parallelism
 **********************************
 
 
+.. include::
+    ../two-syntax-variants-used
+
 Cython supports native parallelism through the :py:mod:`cython.parallel`
 module. To use this kind of parallelism, the GIL must be released
 (see :ref:`Releasing the GIL <nogil>`).
@@ -87,7 +90,7 @@ It currently supports OpenMP, but later on more backends might be supported.
         runtime:
             The schedule and chunk size are taken from the runtime scheduling
             variable, which can be set through the ``openmp.omp_set_schedule()``
-            function call, or the OMP_SCHEDULE environment variable. Note that
+            function call, or the ``OMP_SCHEDULE`` environment variable. Note that
             this essentially disables any static compile time optimisations of
             the scheduling code itself and may therefore show a slightly worse
             performance than when the same scheduling policy is statically
@@ -116,17 +119,27 @@ It currently supports OpenMP, but later on more backends might be supported.
 
 Example with a reduction:
 
-.. literalinclude:: ../../examples/userguide/parallelism/simple_sum.pyx
+.. tabs::
+
+    .. group-tab:: Pure Python
+
+        .. literalinclude:: ../../examples/userguide/parallelism/simple_sum.py
+
+    .. group-tab:: Cython
+
+        .. literalinclude:: ../../examples/userguide/parallelism/simple_sum.pyx
 
-Example with a :term:`typed memoryview<Typed memoryview>` (e.g. a NumPy array)::
+Example with a :term:`typed memoryview<Typed memoryview>` (e.g. a NumPy array)
 
-    from cython.parallel import prange
+.. tabs::
 
-    def func(double[:] x, double alpha):
-        cdef Py_ssize_t i
+    .. group-tab:: Pure Python
 
-        for i in prange(x.shape[0]):
-            x[i] = alpha * x[i]
+        .. literalinclude:: ../../examples/userguide/parallelism/memoryview_sum.py
+
+    .. group-tab:: Cython
+
+        .. literalinclude:: ../../examples/userguide/parallelism/memoryview_sum.pyx
 
 
 .. function:: parallel(num_threads=None)
@@ -137,29 +150,17 @@ Example with a :term:`typed memoryview<Typed memoryview>` (e.g. a NumPy array)::
     is also private to the prange. Variables that are private in the parallel
     block are unavailable after the parallel block.
 
-    Example with thread-local buffers::
-
-        from cython.parallel import parallel, prange
-        from libc.stdlib cimport abort, malloc, free
+    Example with thread-local buffers
 
-        cdef Py_ssize_t idx, i, n = 100
-        cdef int * local_buf
-        cdef size_t size = 10
+    .. tabs::
 
-        with nogil, parallel():
-            local_buf = <int *> malloc(sizeof(int) * size)
-            if local_buf is NULL:
-                abort()
+        .. group-tab:: Pure Python
 
-            # populate our local buffer in a sequential loop
-            for i in xrange(size):
-                local_buf[i] = i * 2
+            .. literalinclude:: ../../examples/userguide/parallelism/parallel.py
 
-            # share the work using the thread-local buffer(s)
-            for i in prange(n, schedule='guided'):
-                func(local_buf)
+        .. group-tab:: Cython
 
-            free(local_buf)
+            .. literalinclude:: ../../examples/userguide/parallelism/parallel.pyx
 
     Later on sections might be supported in parallel blocks, to distribute
     code sections of work among threads.
@@ -174,9 +175,17 @@ Compiling
 =========
 
 To actually use the OpenMP support, you need to tell the C or C++ compiler to
-enable OpenMP. For gcc this can be done as follows in a setup.py:
+enable OpenMP. For gcc this can be done as follows in a ``setup.py``:
+
+.. tabs::
+
+    .. group-tab:: Pure Python
 
-.. literalinclude:: ../../examples/userguide/parallelism/setup.py
+        .. literalinclude:: ../../examples/userguide/parallelism/setup_py.py
+
+    .. group-tab:: Cython
+
+        .. literalinclude:: ../../examples/userguide/parallelism/setup_pyx.py
 
 For Microsoft Visual C++ compiler, use ``'/openmp'`` instead of ``'-fopenmp'``.
 
@@ -188,13 +197,21 @@ The parallel with and prange blocks support the statements break, continue and
 return in nogil mode. Additionally, it is valid to use a ``with gil`` block
 inside these blocks, and have exceptions propagate from them.
 However, because the blocks use OpenMP, they can not just be left, so the
-exiting procedure is best-effort. For prange() this means that the loop
+exiting procedure is best-effort. For ``prange()`` this means that the loop
 body is skipped after the first break, return or exception for any subsequent
 iteration in any thread. It is undefined which value shall be returned if
 multiple different values may be returned, as the iterations are in no
 particular order:
 
-.. literalinclude:: ../../examples/userguide/parallelism/breaking_loop.pyx
+.. tabs::
+
+    .. group-tab:: Pure Python
+
+        .. literalinclude:: ../../examples/userguide/parallelism/breaking_loop.py
+
+    .. group-tab:: Cython
+
+        .. literalinclude:: ../../examples/userguide/parallelism/breaking_loop.pyx
 
 In the example above it is undefined whether an exception shall be raised,
 whether it will simply break or whether it will return 2.
@@ -203,7 +220,17 @@ Using OpenMP Functions
 ======================
 OpenMP functions can be used by cimporting ``openmp``:
 
-.. literalinclude:: ../../examples/userguide/parallelism/cimport_openmp.pyx
+.. tabs::
+
+    .. group-tab:: Pure Python
+
+        .. literalinclude:: ../../examples/userguide/parallelism/cimport_openmp.py
+            :lines: 3-
+
+    .. group-tab:: Cython
+
+        .. literalinclude:: ../../examples/userguide/parallelism/cimport_openmp.pyx
+            :lines: 3-
 
 .. rubric:: References
 