author | Ross Barnowski <rossbar@berkeley.edu> | 2022-03-13 09:25:38 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-03-13 09:25:38 -0700 |
commit | ed276211e7054fc6277dc3d11ea6ebaba998013c (patch) | |
tree | a2d381c4ea45be63c982424e11dffa1ef5602907 | |
parent | 217ef3438317f845066b8a122ae1d742259fbb7a (diff) | |
First draft. (#5359)
-rw-r--r-- | doc/developer/nxeps/nxep-0004.rst | 217 |
1 files changed, 217 insertions, 0 deletions
diff --git a/doc/developer/nxeps/nxep-0004.rst b/doc/developer/nxeps/nxep-0004.rst
new file mode 100644
index 00000000..bbd4d891
--- /dev/null
+++ b/doc/developer/nxeps/nxep-0004.rst
@@ -0,0 +1,217 @@
+.. _NXEP4:
+
+======================================================================
+NXEP 4 — Adopting `numpy.random.Generator` as default random interface
+======================================================================
+
+:Author: Ross Barnowski (rossbar@berkeley.edu)
+:Status: Draft
+:Type: Standards Track
+:Created: 2022-02-24
+
+
+Abstract
+--------
+
+Pseudo-random numbers play an important role in many graph and network analysis
+algorithms in NetworkX.
+NetworkX provides a :ref:`standard interface to random number generators <randomness>`
+that includes support for `numpy.random` and the Python built-in `random` module.
+`numpy.random` is used extensively within NetworkX and in most cases is the
+preferred package for random number generation.
+NumPy introduced a new interface in the `numpy.random` package in NumPy version
+1.17.
+According to :doc:`NEP19`, the new interface based on `numpy.random.Generator`
+is recommended over the legacy `numpy.random.RandomState` as the former has
+`better statistical properties <pcg_table>`_, :ref:`more features <what's_new_or_different>`,
+and :doc:`improved performance <random/performance>`.
+This NXEP proposes a strategy for adopting `numpy.random.Generator` as the
+**default** interface for random number generation within NetworkX.
+
+.. _pcg_table: https://www.pcg-random.org/index.html
+
+Motivation and Scope
+--------------------
+
+The primary motivation for adopting `numpy.random.Generator` as the default
+random number generation engine in NetworkX is to allow users to benefit from
+the improvements in `numpy.random.Generator`, including:
+  - Advances in statistical quality of modern pRNGs
+  - Improved performance
+  - Additional features
+
+The `numpy.random.Generator` API is very similar to the `numpy.random.RandomState`
+API, so users can benefit from these improvements without any additional changes
+[#f1]_ to their existing NetworkX code.
+
+In principle this change would impact NetworkX users that use any of the
+functions decorated by `~networkx.utils.decorators.np_random_state`
+or `~networkx.utils.decorators.py_random_state` (when the ``random_state`` argument
+involves ``numpy``).
+See the next section for details.
+
+.. [#f1] See note about the compatibility layer in the :ref:`Implementation section <Implementation>`
+
+Usage and Impact
+----------------
+
+In NetworkX, random number generators are typically created via a decorator::
+
+    from networkx.utils import np_random_state
+
+    @np_random_state("seed")  # Or could be the arg position, i.e. 0
+    def foo(seed=None):
+        return seed
+
+The decorator is responsible for mapping various different inputs into an
+instance of a random number generator within the function.
+Currently, the random number generator instance that is returned is a
+`numpy.random.RandomState` object::
+
+    >>> type(foo(None))
+    numpy.random.mtrand.RandomState
+    >>> type(foo(12345))
+    numpy.random.mtrand.RandomState
+
+The only way to get a `numpy.random.Generator` instance from the random state
+decorators is to pass the instance in directly::
+
+    >>> import numpy as np
+    >>> rng = np.random.default_rng()
+    >>> type(foo(rng))
+    numpy.random._generator.Generator
+
+This NXEP proposes to change the behavior so that when e.g. an integer or
+`None` is given for the ``seed`` parameter, a `numpy.random.Generator` instance
+is returned instead, i.e.::
+
+    >>> type(foo(None))
+    numpy.random._generator.Generator
+    >>> type(foo(12345))
+    numpy.random._generator.Generator
+
+`numpy.random.RandomState` instances can still be used as ``seed``, but they
+must be explicitly passed in::
+
+    >>> rs = np.random.RandomState(12345)
+    >>> type(foo(rs))
+    numpy.random.mtrand.RandomState
+
+Backward compatibility
+----------------------
+
+There are three main concerns:
+
+1. The ``Generator`` interface is not stream-compatible with ``RandomState``,
+   thus the results of the ``Generator`` methods will not be exactly the same
+   as the corresponding ``RandomState`` methods.
+2. There are a few slight differences in method names and availability between
+   the ``RandomState`` and ``Generator`` APIs.
+3. There is no global ``Generator`` instance internal to `numpy.random` as is
+   the case for `numpy.random.RandomState`.
+
+The `numpy.random.Generator` interface breaks the stream-compatibility
+guarantee that `numpy.random.RandomState` upheld of exact reproducibility of
+values.
+Switching the default random number generator from ``RandomState`` to
+``Generator`` would mean functions decorated with ``np_random_state`` would
+produce different results when a value *other than an instantiated rng* is used
+as the seed.
+For example, let's take the following function::
+
+    @np_random_state("seed")
+    def bar(num, seed=None):
+        """Return an array of `num` uniform random numbers."""
+        return seed.random(num)
+
+With the current implementation of ``np_random_state``, a user can pass in an
+integer value to ``seed`` which will be used to seed a new ``RandomState``
+instance.
+Using the same seed value guarantees the output is always exactly reproducible::
+
+    >>> bar(10, seed=12345)
+    array([0.92961609, 0.31637555, 0.18391881, 0.20456028, 0.56772503,
+           0.5955447 , 0.96451452, 0.6531771 , 0.74890664, 0.65356987])
+    >>> bar(10, seed=12345)
+    array([0.92961609, 0.31637555, 0.18391881, 0.20456028, 0.56772503,
+           0.5955447 , 0.96451452, 0.6531771 , 0.74890664, 0.65356987])
+
+However, after changing the default rng returned by ``np_random_state`` to
+a ``Generator`` instance, the values produced by the decorated ``bar`` function
+for integer seeds would no longer be identical::
+
+    >>> bar(10, seed=12345)
+    array([0.22733602, 0.31675834, 0.79736546, 0.67625467, 0.39110955,
+           0.33281393, 0.59830875, 0.18673419, 0.67275604, 0.94180287])
+
+In order to recover exact reproducibility of the original results, a seeded
+``RandomState`` instance would need to be explicitly created and passed in
+via ``seed``::
+
+    >>> import numpy as np
+    >>> rng = np.random.RandomState(12345)
+    >>> bar(10, seed=rng)
+    array([0.92961609, 0.31637555, 0.18391881, 0.20456028, 0.56772503,
+           0.5955447 , 0.96451452, 0.6531771 , 0.74890664, 0.65356987])
+
+Because the streams would no longer be compatible, it is proposed in this NXEP
+that switching the default random number generator only be considered for a
+major release, e.g. the transition from NetworkX 2.X to NetworkX 3.0.
+
+The second point is only a concern for users who are using
+`~networkx.utils.misc.create_random_state` and the corresponding decorator
+`~networkx.utils.decorators.np_random_state` in their own libraries.
+For example, the `numpy.random.RandomState.randint` method has been replaced
+by `numpy.random.Generator.integers`.
+Thus any code that uses `create_random_state` or `create_py_random_state` and
+relies on the ``randint`` method of the returned rng would result in an
+`AttributeError`.
+This can be addressed with a compatibility class similar to the
+`networkx.utils.misc.PythonRandomInterface` class, which provides a compatibility
+layer between `random` and `numpy.random.RandomState`.
+
+`create_random_state` currently returns the global ``numpy.random.mtrand._rand``
+`RandomState` instance when the input is `None` or the numpy.random module.
+By switching to `numpy.random.Generator`, this will no longer be possible as
+there is no global, internal `Generator` instance in the `numpy.random` module.
+This should have no effect on users.
+
+Detailed description
+--------------------
+
+This NXEP proposes to change the default random number generator produced by
+the `~networkx.utils.misc.create_random_state` function (and the related
+decorator `~networkx.utils.decorators.np_random_state`) from a `numpy.random.RandomState`
+instance to a `numpy.random.Generator` instance when the input to the
+function is either an integer or `None`.
+
+Related Work
+------------
+
+- NEP 19
+- TODO
+
+Implementation
+--------------
+
+
+TODO: simple diff here
+
+The implementation itself is quite simple. Most of the work will go into
+improved/reorganized tests.
+
+Alternatives
+------------
+
+The status quo, i.e. using ``RandomState`` by default, is a completely
+acceptable alternative.
+``RandomState`` is not deprecated, and is expected to maintain its stream-compatibility
+guarantee in perpetuity.
+
+Discussion
+----------
+
+This section may just be a bullet list including links to any discussions
+regarding the NXEP:
+
+- This includes links to mailing list threads or relevant GitHub issues.
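For illustration only, and not part of the patch above: a minimal Python sketch of the seed handling the NXEP describes, where `None` or an integer yields a `numpy.random.Generator` and an already-instantiated rng is passed through unchanged. The names `create_rng` and `GeneratorCompat` are hypothetical and are not NetworkX APIs; the shim is one possible shape for the compatibility class the NXEP mentions, assuming NumPy 1.17 or later.

```python
import numpy as np


def create_rng(seed=None):
    """Hypothetical stand-in for the behavior proposed for create_random_state."""
    # None or an integer: build a new Generator (the proposed default).
    if seed is None or isinstance(seed, (int, np.integer)):
        return np.random.default_rng(seed)
    # An instantiated rng of either kind is returned as-is, so a seeded
    # RandomState can still be passed in to reproduce legacy streams.
    if isinstance(seed, (np.random.Generator, np.random.RandomState)):
        return seed
    raise ValueError(f"{seed!r} cannot be used to create a random number generator")


class GeneratorCompat:
    """Hypothetical shim exposing RandomState-style ``randint`` on a Generator."""

    def __init__(self, rng):
        self._rng = rng

    def randint(self, low, high=None, size=None):
        # Generator.integers excludes the upper bound by default,
        # matching the half-open interval of RandomState.randint.
        return self._rng.integers(low, high, size=size)

    def __getattr__(self, name):
        # Everything else is delegated to the wrapped Generator.
        return getattr(self._rng, name)


rng = create_rng(12345)
legacy = create_rng(np.random.RandomState(12345))
print(type(rng).__name__, type(legacy).__name__)
```

Delegating unknown attributes keeps the shim small, but it only covers the `randint` rename; any other API differences between the two classes would need their own handling.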