author Ross Barnowski <rossbar@berkeley.edu> 2022-03-13 09:25:38 -0700
committer GitHub <noreply@github.com> 2022-03-13 09:25:38 -0700
commit ed276211e7054fc6277dc3d11ea6ebaba998013c (patch)
tree a2d381c4ea45be63c982424e11dffa1ef5602907
parent 217ef3438317f845066b8a122ae1d742259fbb7a (diff)
First draft. (#5359)
-rw-r--r-- doc/developer/nxeps/nxep-0004.rst 217
1 files changed, 217 insertions, 0 deletions
diff --git a/doc/developer/nxeps/nxep-0004.rst b/doc/developer/nxeps/nxep-0004.rst
new file mode 100644
index 00000000..bbd4d891
--- /dev/null
+++ b/doc/developer/nxeps/nxep-0004.rst
@@ -0,0 +1,217 @@
+.. _NXEP4:
+
+======================================================================
+NXEP 4 — Adopting `numpy.random.Generator` as default random interface
+======================================================================
+
+:Author: Ross Barnowski (rossbar@berkeley.edu)
+:Status: Draft
+:Type: Standards Track
+:Created: 2022-02-24
+
+
+Abstract
+--------
+
+Pseudo-random numbers play an important role in many graph and network analysis
+algorithms in NetworkX.
+NetworkX provides a :ref:`standard interface to random number generators <randomness>`
+that includes support for `numpy.random` and the Python built-in `random` module.
+`numpy.random` is used extensively within NetworkX and in most cases is the
+preferred package for random number generation.
+NumPy introduced a new interface in the `numpy.random` package in NumPy version
+1.17.
+According to :doc:`NEP19`, the new interface based on `numpy.random.Generator`
+is recommended over the legacy `numpy.random.RandomState` as the former has
+`better statistical properties <pcg_table_>`_, :ref:`more features <what's_new_or_different>`,
+and :doc:`improved performance <random/performance>`.
+This NXEP proposes a strategy for adopting `numpy.random.Generator` as the
+**default** interface for random number generation within NetworkX.
+
+.. _pcg_table: https://www.pcg-random.org/index.html
+
+Motivation and Scope
+--------------------
+
+The primary motivation for adopting `numpy.random.Generator` as the default
+random number generation engine in NetworkX is to allow users to benefit from
+the improvements in `numpy.random.Generator`, including:
+ - Advances in statistical quality of modern PRNGs
+ - Improved performance
+ - Additional features
+
+The `numpy.random.Generator` API is very similar to the `numpy.random.RandomState`
+API, so users can benefit from these improvements without any additional changes
+[#f1]_ to their existing NetworkX code.
+
+In principle this change would impact NetworkX users that use any of the
+functions decorated by `~networkx.utils.decorators.np_random_state`
+or `~networkx.utils.decorators.py_random_state` (when the ``random_state`` argument
+involves ``numpy``).
+See the next section for details.
+
+.. [#f1] See note about the compatibility layer in the :ref:`Implementation section <Implementation>`
+
+Usage and Impact
+----------------
+
+In NetworkX, random number generators are typically created via a decorator::
+
+ from networkx.utils import np_random_state
+
+ @np_random_state("seed") # Or could be the arg position, i.e. 0
+ def foo(seed=None):
+     return seed
+
+The decorator is responsible for mapping the supported input types to an
+instance of a random number generator within the function.
+Currently, the random number generator instance that is returned is a
+`numpy.random.RandomState` object::
+
+ >>> type(foo(None))
+ numpy.random.mtrand.RandomState
+ >>> type(foo(12345))
+ numpy.random.mtrand.RandomState
+
+The only way to get a `numpy.random.Generator` instance from the random state
+decorators is to pass the instance in directly::
+
+ >>> import numpy as np
+ >>> rng = np.random.default_rng()
+ >>> type(foo(rng))
+ numpy.random._generator.Generator
+
+This NXEP proposes to change the behavior so that when e.g. an integer or
+`None` is given for the ``seed`` parameter, a `numpy.random.Generator` instance
+is returned instead, i.e.::
+
+ >>> type(foo(None))
+ numpy.random._generator.Generator
+ >>> type(foo(12345))
+ numpy.random._generator.Generator
+
+`numpy.random.RandomState` instances can still be used as ``seed``, but they
+must be explicitly passed in::
+
+ >>> rs = np.random.RandomState(12345)
+ >>> type(foo(rs))
+ numpy.random.mtrand.RandomState
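
A minimal sketch, not part of this commit, of the seed-to-generator mapping proposed above. The helper name ``to_rng`` is hypothetical and is not the NetworkX implementation; it only illustrates that `None` and integers would yield a ``Generator``, while instantiated rngs pass through unchanged.

```python
import numbers

import numpy as np


def to_rng(seed=None):
    """Hypothetical helper mapping ``seed`` to a random number generator.

    Illustrates the proposed behavior only; not NetworkX's actual code.
    """
    # Already-instantiated rngs (new or legacy) are returned untouched.
    if isinstance(seed, (np.random.Generator, np.random.RandomState)):
        return seed
    # None or an integer creates a new Generator (the proposed default).
    if seed is None or isinstance(seed, numbers.Integral):
        return np.random.default_rng(seed)
    raise ValueError(f"{seed!r} cannot be used to create a random number generator")
```

Under these assumptions a legacy ``RandomState`` is used only when the caller constructs one and passes it in explicitly.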
+
+Backward compatibility
+----------------------
+
+There are three main concerns:
+
+1. The ``Generator`` interface is not stream-compatible with ``RandomState``,
+ thus the results of the ``Generator`` methods will not be exactly the same
+ as the corresponding ``RandomState`` methods.
+2. There are a few slight differences in method names and availability between
+ the ``RandomState`` and ``Generator`` APIs.
+3. There is no global ``Generator`` instance internal to `numpy.random` as is
+ the case for `numpy.random.RandomState`.
+
+The `numpy.random.Generator` interface does not carry over the
+stream-compatibility guarantee of `numpy.random.RandomState`, i.e. exact
+reproducibility of the values produced from a given seed.
+Switching the default random number generator from ``RandomState`` to
+``Generator`` would mean functions decorated with ``np_random_state`` would
+produce different results when a value *other than an instantiated rng* is used
+as the seed.
+For example, let's take the following function::
+
+ @np_random_state("seed")
+ def bar(num, seed=None):
+     """Return an array of `num` uniform random numbers."""
+     return seed.random(num)
+
+With the current implementation of ``np_random_state``, a user can pass in an
+integer value to ``seed`` which will be used to seed a new ``RandomState``
+instance.
+Using the same seed value guarantees the output is always exactly reproducible::
+
+ >>> bar(10, seed=12345)
+ array([0.92961609, 0.31637555, 0.18391881, 0.20456028, 0.56772503,
+ 0.5955447 , 0.96451452, 0.6531771 , 0.74890664, 0.65356987])
+ >>> bar(10, seed=12345)
+ array([0.92961609, 0.31637555, 0.18391881, 0.20456028, 0.56772503,
+ 0.5955447 , 0.96451452, 0.6531771 , 0.74890664, 0.65356987])
+
+However, after changing the default rng returned by ``np_random_state`` to
+a ``Generator`` instance, the values produced by the decorated ``bar`` function
+for integer seeds would no longer be identical::
+
+ >>> bar(10, seed=12345)
+ array([0.22733602, 0.31675834, 0.79736546, 0.67625467, 0.39110955,
+ 0.33281393, 0.59830875, 0.18673419, 0.67275604, 0.94180287])
+
+In order to recover exact reproducibility of the original results, a seeded
+``RandomState`` instance would need to be explicitly created and passed in
+via ``seed``::
+
+ >>> import numpy as np
+ >>> rng = np.random.RandomState(12345)
+ >>> bar(10, seed=rng)
+ array([0.92961609, 0.31637555, 0.18391881, 0.20456028, 0.56772503,
+ 0.5955447 , 0.96451452, 0.6531771 , 0.74890664, 0.65356987])
+
+Because the streams would no longer be compatible, it is proposed in this NXEP
+that switching the default random number generator only be considered for a
+major release, e.g. the transition from NetworkX 2.X to NetworkX 3.0.
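
The stream incompatibility can be checked with NumPy alone, without any NetworkX decorator. This is a small sketch, not part of this commit: each interface is reproducible for a fixed seed, but the two interfaces do not produce the same stream from the same seed.

```python
import numpy as np

# Two legacy RandomState instances with the same seed agree with each other.
legacy_a = np.random.RandomState(12345).random(5)
legacy_b = np.random.RandomState(12345).random(5)

# Two Generator instances with the same seed also agree with each other.
modern_a = np.random.default_rng(12345).random(5)
modern_b = np.random.default_rng(12345).random(5)

same_legacy = np.array_equal(legacy_a, legacy_b)
same_modern = np.array_equal(modern_a, modern_b)
# The same seed does not give the same values across the two interfaces.
same_across = np.array_equal(legacy_a, modern_a)
```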
+
+The second point is only a concern for users who are using
+`~networkx.utils.misc.create_random_state` and the corresponding decorator
+`~networkx.utils.decorators.np_random_state` in their own libraries.
+For example, the `numpy.random.RandomState.randint` method has been replaced
+by `numpy.random.Generator.integers`.
+Thus any code that uses `create_random_state` or `create_py_random_state` and
+relies on the ``randint`` method of the returned rng would result in an
+`AttributeError`.
+This can be addressed with a compatibility class similar to the
+`networkx.utils.misc.PythonRandomInterface` class, which provides a compatibility
+layer between `random` and `numpy.random.RandomState`.
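
One possible shape for such a compatibility class is sketched below. It is not part of this commit, and the class name ``GeneratorRandintShim`` is hypothetical. It relies on ``Generator.integers`` using the same half-open ``[low, high)`` convention as ``RandomState.randint`` by default.

```python
import numpy as np


class GeneratorRandintShim:
    """Hypothetical wrapper exposing ``randint`` on top of a ``Generator``."""

    def __init__(self, rng=None):
        # default_rng accepts None, an int seed, or an existing Generator.
        self._rng = np.random.default_rng(rng)

    def randint(self, low, high=None, size=None):
        # Forward the legacy method name to its Generator replacement.
        return self._rng.integers(low, high, size=size)

    def __getattr__(self, name):
        # Only called for attributes not found on the shim itself.
        if name == "_rng":
            # Avoid infinite recursion if _rng has not been set yet.
            raise AttributeError(name)
        return getattr(self._rng, name)
```

Code written against ``randint`` would then keep working, while every other method call is delegated to the wrapped ``Generator``.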
+
+`create_random_state` currently returns the global ``numpy.random.mtrand._rand``
+`RandomState` instance when the input is `None` or the `numpy.random` module.
+By switching to `numpy.random.Generator`, this will no longer be possible as
+there is no global, internal `Generator` instance in the `numpy.random` module.
+This should have no effect on users.
+
+Detailed description
+--------------------
+
+This NXEP proposes to change the default random number generator produced by
+the `~networkx.utils.misc.create_random_state` function (and the related
+decorator `~networkx.utils.decorators.np_random_state`) from a `numpy.random.RandomState`
+instance to a `numpy.random.Generator` instance when the input to the
+function is either an integer or `None`.
+
+Related Work
+------------
+
+- NEP 19
+- TODO
+
+Implementation
+--------------
+
+
+TODO: simple diff here
+
+The implementation itself is quite simple. Most of the work will go into
+improved/reorganized tests.
+
+Alternatives
+------------
+
+The status quo, i.e. using ``RandomState`` by default, is a completely
+acceptable alternative.
+``RandomState`` is not deprecated, and is expected to maintain its stream-compatibility
+guarantee in perpetuity.
+
+Discussion
+----------
+
+This section may just be a bullet list including links to any discussions
+regarding the NXEP:
+
+- This includes links to mailing list threads or relevant GitHub issues.