author Ross Barnowski <rossbar@berkeley.edu> 2022-03-14 22:23:49 -0700
committer GitHub <noreply@github.com> 2022-03-14 22:23:49 -0700
commit c6e9065e7c0f3a8b1f58a2ffdd5ea041a33f0b94
tree 5d65ec52b200824af5b1efa77d476a943d926183
parent 51eb1ad5b4b4483f4ca1b751936e804b03142b0b
Finish up NXEP 4 first draft (#5391)

* Minor updates to wording.
* Summarize and add links to relevant sklearn discussions.
* Add an implementation section.
* Add suggestion from Dan re: pkg-level toggle.

-rw-r--r-- doc/developer/nxeps/nxep-0004.rst | 106
1 files changed, 96 insertions, 10 deletions
diff --git a/doc/developer/nxeps/nxep-0004.rst b/doc/developer/nxeps/nxep-0004.rst
index bbd4d891..22168437 100644
--- a/doc/developer/nxeps/nxep-0004.rst
+++ b/doc/developer/nxeps/nxep-0004.rst
@@ -17,7 +17,7 @@ Pseudo-random numbers play an important role in many graph and network analysis
algorithms in NetworkX.
NetworkX provides a :ref:`standard interface to random number generators <randomness>`
that includes support for `numpy.random` and the Python built-in `random` module.
-`numpy.random` is used extensively within NetworkX and in most cases is the
+`numpy.random` is used extensively within NetworkX and in several cases is the
preferred package for random number generation.
NumPy introduced a new interface in the `numpy.random` package in NumPy version
1.17.
@@ -171,10 +171,11 @@ This can be addressed with a compatibility class similar to the
layer between `random` and `numpy.random.RandomState`.
`create_random_state` currently returns the global ``numpy.random.mtrand._rand``
-`RandomState` instance when the input is `None` or the numpy.random module.
+`RandomState` instance when the input is `None` or the ``numpy.random`` module.
By switching to `numpy.random.Generator`, this will no longer be possible as
there is no global, internal `Generator` instance in the `numpy.random` module.
-This should have no effect on users.
+This should have no effect on users, as ``seed=None`` currently does not
+guarantee reproducible results.
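The sketch below contrasts the two NumPy interfaces to show the difference in global state. It relies on ``numpy.random.mtrand._rand``, a private attribute and the same object that `create_random_state` currently returns for ``None``, so it is for illustration only. With no argument, ``np.random.default_rng()`` returns a fresh ``Generator`` seeded from OS entropy on every call, and no shared global instance exists.

```python
import numpy as np

# Legacy interface: module-level functions share one hidden global RandomState.
# ``np.random.mtrand._rand`` is private and is accessed here only for illustration.
global_rs = np.random.mtrand._rand
assert isinstance(global_rs, np.random.RandomState)

# Seeding the module reseeds that shared instance, which affects every caller.
np.random.seed(0)
a = np.random.random_sample()
np.random.seed(0)
b = global_rs.random_sample()
same_global_stream = a == b  # True: both draws come from the same global object

# New interface: there is no global Generator. Each default_rng() call
# returns a new, independently (OS-entropy) seeded instance.
g1 = np.random.default_rng()
g2 = np.random.default_rng()
distinct_generators = g1 is not g2
```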
Detailed description
--------------------
@@ -188,17 +189,80 @@ function is either an integer or `None`.
Related Work
------------
-- NEP 19
-- TODO
+Scikit-learn has a similar pattern for imposing determinism on functions that
+depend on randomness.
+For example, many functions in ``scikit-learn`` have a ``random_state`` argument
+that behaves similarly to the ``seed`` argument in many NetworkX function
+signatures.
+One difference between ``scikit-learn`` and ``networkx`` is that scikit-learn
+**only** supports ``RandomState`` via the ``random_state`` keyword argument,
+whereas NetworkX implicitly supports the built-in `random` module as well as
+numpy ``RandomState`` and ``Generator`` instances (depending on the type of
+``seed``).
+This difference is reflected in the keyword-argument names: ``random_state``
+(used by scikit-learn) is less ambiguous than ``seed`` (used by NetworkX).
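The sketch below shows why ``seed`` is ambiguous. The same integer produces unrelated streams depending on which generator family it seeds. The code uses only the standard library and public NumPy APIs. It illustrates the point and is not a description of how NetworkX dispatches internally.

```python
import random

import numpy as np

SEED = 42

# One integer, three different generator families.
py_value = random.Random(SEED).random()                 # built-in Mersenne Twister
rs_value = np.random.RandomState(SEED).random_sample()  # legacy NumPy MT19937
gen_value = np.random.default_rng(SEED).random()        # new NumPy Generator (PCG64)

# Each family is reproducible on its own...
assert random.Random(SEED).random() == py_value
assert np.random.RandomState(SEED).random_sample() == rs_value
assert np.random.default_rng(SEED).random() == gen_value

# ...but the three streams differ, so ``seed=42`` alone does not say which one you get.
print(py_value, rs_value, gen_value)
```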
+
+There are multiple relevant discussions in the scikit-learn community about
+potential approaches to supporting the new NumPy random interface:
+
+- `scikit-learn/scikit-learn#16988 <sklearn16988_>`_ covers strategies and concerns
+  related to letting users adopt the ``Generator``-based random number generators.
+- `scikit-learn/scikit-learn#14042 <sklearn14042_>`_ is a higher-level discussion
+  that includes additional information about the design considerations and constraints
+  related to scikit-learn's ``random_state``.
+- There is also a related `SLEP <slep011_>`_.
+
+.. _sklearn16988: https://github.com/scikit-learn/scikit-learn/issues/16988
+.. _sklearn14042: https://github.com/scikit-learn/scikit-learn/issues/14042
+.. _slep011: https://github.com/scikit-learn/enhancement_proposals/pull/24
Implementation
--------------
-
-TODO: simple diff here
-
-The implementation itself is quite simple. Most of the work will go into
-improved/reorganized tests.
+The implementation itself is quite simple. The logic that determines how
+inputs are mapped to random number generators is encapsulated in the
+`~networkx.utils.misc.create_random_state` function (and the related
+`~networkx.utils.misc.create_py_random_state`).
+Currently (i.e. NetworkX <= 2.X), this function maps inputs like ``None``,
+``numpy.random``, and integers to ``RandomState`` instances::
+
+    import numpy as np
+
+    def create_random_state(random_state=None):
+        if random_state is None or random_state is np.random:
+            return np.random.mtrand._rand
+        if isinstance(random_state, np.random.RandomState):
+            return random_state
+        if isinstance(random_state, int):
+            return np.random.RandomState(random_state)
+        if isinstance(random_state, np.random.Generator):
+            return random_state
+        msg = (
+            f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
+            "numpy.random.Generator instance"
+        )
+        raise ValueError(msg)
+
+This NXEP proposes to modify the function to produce ``Generator`` instances
+for these inputs. An example implementation might look something like::
+
+    import numpy as np
+
+    def create_random_state(random_state=None):
+        if random_state is None or random_state is np.random:
+            return np.random.default_rng()
+        if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
+            return random_state
+        if isinstance(random_state, int):
+            return np.random.default_rng(random_state)
+        msg = (
+            f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
+            "numpy.random.Generator instance"
+        )
+        raise ValueError(msg)
+
+The above captures the essential change in logic, though implementation details
+may differ.
+Most of the work related to implementing this change will be associated with
+improved/reorganized tests, including adding tests for RNG-stream reproducibility.
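The sketch below illustrates the kind of reproducibility test mentioned above. It re-declares the proposed ``create_random_state`` logic locally so that it runs with NumPy alone. It then checks two things: equal integer seeds produce identical streams, and existing generator instances pass through unchanged. The local copy and the ``check_stream_reproducibility`` helper are hypothetical, not actual NetworkX test code.

```python
import numpy as np


def create_random_state(random_state=None):
    """Local copy of the proposed mapping: inputs -> numpy Generator."""
    if random_state is None or random_state is np.random:
        return np.random.default_rng()
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        return random_state
    if isinstance(random_state, int):
        return np.random.default_rng(random_state)
    raise ValueError(
        f"{random_state} cannot be used to create a numpy.random.RandomState or "
        "numpy.random.Generator instance"
    )


def check_stream_reproducibility(seed, n=5):
    """Hypothetical test helper: equal seeds must give identical draws."""
    first = create_random_state(seed).random(n)
    second = create_random_state(seed).random(n)
    return np.array_equal(first, second)


assert check_stream_reproducibility(12345)

# Integers now map to Generator instances.
assert isinstance(create_random_state(12345), np.random.Generator)

# Existing RandomState/Generator instances are returned as-is.
rs = np.random.RandomState(0)
gen = np.random.default_rng(0)
assert create_random_state(rs) is rs
assert create_random_state(gen) is gen
```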
Alternatives
------------
@@ -208,6 +272,28 @@ acceptable alternative.
``RandomState`` is not deprecated, and is expected to maintain its stream-compatibility
guarantee in perpetuity.
+Another possible alternative would be to provide a package-level toggle that
+users could set to switch the behavior of the ``seed`` kwarg for all functions
+decorated by ``np_random_state`` or ``py_random_state``.
+To illustrate (ignoring implementation details)::
+
+    >>> import networkx as nx
+    >>> from networkx.utils.misc import create_random_state
+    >>> # NetworkX 2.X behavior: RandomState by default
+    >>> type(create_random_state(12345))
+    <class 'numpy.random.mtrand.RandomState'>
+    >>> # Change the random backend by setting a package attribute
+    >>> nx._random_backend = "Generator"
+    >>> type(create_random_state(12345))
+    <class 'numpy.random._generator.Generator'>
+
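One way the hypothetical toggle could work is a module-level flag that the dispatch function consults on every call. The sketch below is a self-contained stand-in. The ``_random_backend`` name comes from the illustration above. The ``SimpleNamespace`` standing in for the ``networkx`` package and the accepted values ``"RandomState"`` and ``"Generator"`` are assumptions.

```python
from types import SimpleNamespace

import numpy as np

# Hypothetical stand-in for the package namespace. In the illustration above this
# would be the ``networkx`` module itself, carrying a ``_random_backend`` attribute.
nx = SimpleNamespace(_random_backend="RandomState")


def create_random_state(random_state=None):
    """Map ``seed`` inputs to a NumPy RNG, honoring the package-level toggle."""
    use_generator = nx._random_backend == "Generator"
    if random_state is None or random_state is np.random:
        # Private global RandomState (legacy) vs. a fresh unseeded Generator.
        return np.random.default_rng() if use_generator else np.random.mtrand._rand
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        return random_state
    if isinstance(random_state, int):
        if use_generator:
            return np.random.default_rng(random_state)
        return np.random.RandomState(random_state)
    raise ValueError(
        f"{random_state} cannot be used to create a numpy.random.RandomState or "
        "numpy.random.Generator instance"
    )


legacy = create_random_state(12345)  # RandomState (2.X default)
nx._random_backend = "Generator"
modern = create_random_state(12345)  # Generator after flipping the toggle
```

A process-wide flag like this is simple to use, but it changes behavior for every caller in the process. A real toggle would need to weigh that trade-off.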
Discussion
----------