Finish up NXEP 4 first draft (#5391)

* Minor updates to wording.
* Summarize and add links to relevant sklearn discussions.
* Add an implementation section.
* Add suggestion from Dan re: pkg-level toggle.
author: Ross Barnowski <rossbar@berkeley.edu> 2022-03-14 22:23:49 -0700
committer: GitHub <noreply@github.com> 2022-03-14 22:23:49 -0700
commit: c6e9065e7c0f3a8b1f58a2ffdd5ea041a33f0b94 (patch)
tree: 5d65ec52b200824af5b1efa77d476a943d926183
parent: 51eb1ad5b4b4483f4ca1b751936e804b03142b0b (diff)
download: networkx-c6e9065e7c0f3a8b1f58a2ffdd5ea041a33f0b94.tar.gz
1 files changed, 96 insertions, 10 deletions
diff --git a/doc/developer/nxeps/nxep-0004.rst b/doc/developer/nxeps/nxep-0004.rst
index bbd4d891..22168437 100644
--- a/doc/developer/nxeps/nxep-0004.rst
+++ b/doc/developer/nxeps/nxep-0004.rst
@@ -17,7 +17,7 @@ Pseudo-random numbers play an important role in many graph and network analysis
 algorithms in NetworkX.
 NetworkX provides a :ref:`standard interface to random number generators <randomness>`
 that includes support for `numpy.random` and the Python built-in `random` module.
-`numpy.random` is used extensively within NetworkX and in most cases is the
+`numpy.random` is used extensively within NetworkX and in several cases is the
 preferred package for random number generation.
 NumPy introduced a new interface in the `numpy.random` package in NumPy version
 1.17.
@@ -171,10 +171,11 @@ This can be addressed with a compatiblity class similar to the
 layer between `random` and `numpy.random.RandomState`.
 
 `create_random_state` currently returns the global ``numpy.random.mtrand._rand``
-`RandomState` instance when the input is `None` or the numpy.random module.
+`RandomState` instance when the input is `None` or the ``numpy.random`` module.
 By switching to `numpy.random.Generator`, this will no longer be possible as
 there is no global, internal `Generator` instance in the `numpy.random` module.
-This should have no effect on users.
+This should have no effect on users, as ``seed=None`` currently does not
+guarantee reproducible results.
 
 Detailed description
 --------------------
@@ -188,17 +189,80 @@ function is either an integer or `None`.
 Related Work
 ------------
 
-- NEP 19
-- TODO
+Scikit-learn has a similar pattern for imposing determinism on functions that
+depend on randomness.
+For example, many functions in ``scikit-learn`` have a ``random_state`` argument
+that functions similarly to how ``seed`` behaves in many NetworkX function
+signatures.
+One difference between ``scikit-learn`` and ``networkx`` is that scikit-learn
+**only** supports ``RandomState`` via the ``random_state`` keyword argument,
+whereas NetworkX implicitly supports the built-in `random` module, as well
+as both the numpy ``RandomState`` and ``Generator`` instances (depending on
+the type of ``seed``).
+This is reflected in the name of the keyword argument as ``random_state``
+(used by scikit-learn) is less ambiguous than ``seed`` (used by NetworkX).
+
+There are multiple relevant discussions in the scikit-learn community about
+potential approaches to supporting the new NumPy random interface:
+
+- `scikit-learn/scikit-learn#16988 <sklearn16988_>`_ covers strategies and concerns
+  related to enabling users to use the ``Generator``-based random number generators.
+- `scikit-learn/scikit-learn#14042 <sklearn14042_>`_ is a higher-level discussion
+  that includes additional information about the design considerations and constraints
+  related to scikit-learn's ``random_state``.
+- There is also a related `SLEP <slep011_>`_.
+
+.. _sklearn16988: https://github.com/scikit-learn/scikit-learn/issues/16988
+.. _sklearn14042: https://github.com/scikit-learn/scikit-learn/issues/14042
+.. _slep011: https://github.com/scikit-learn/enhancement_proposals/pull/24
 
 Implementation
 --------------
 
-
-TODO: simple diff here
-
-The implementation itself is quite simple. Most of the work will go into
-improved/reorganized tests.
+The implementation itself is quite simple. The logic that determines how
+inputs are mapped to random number generators is encapsulated in the
+`~networkx.utils.misc.create_random_state` function (and the related
+`~networkx.utils.misc.create_py_random_state`).
+Currently (i.e. NetworkX <= 2.X), this function maps inputs like ``None``,
+``numpy.random``, and integers to ``RandomState`` instances::
+
+    def create_random_state(random_state=None):
+        if random_state is None or random_state is np.random:
+            return np.random.mtrand._rand
+        if isinstance(random_state, np.random.RandomState):
+            return random_state
+        if isinstance(random_state, int):
+            return np.random.RandomState(random_state)
+        if isinstance(random_state, np.random.Generator):
+            return random_state
+        msg = (
+            f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
+            "numpy.random.Generator instance"
+        )
+        raise ValueError(msg)
+
+This NXEP proposes to modify the function to produce ``Generator`` instances
+for these inputs. An example implementation might look something like::
+
+
+    def create_random_state(random_state=None):
+        if random_state is None or random_state is np.random:
+            return np.random.default_rng()
+        if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
+            return random_state
+        if isinstance(random_state, int):
+            return np.random.default_rng(random_state)
+        msg = (
+            f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
+            "numpy.random.Generator instance"
+        )
+        raise ValueError(msg)
+
+
+The above captures the essential change in logic, though implementation details
+may differ.
+Most of the work related to implementing this change will be associated with
+improved/reorganized tests, including adding tests for rng-stream reproducibility.
 
 Alternatives
 ------------
@@ -208,6 +272,28 @@ acceptable alternative.
 ``RandomState`` is not deprecated, and is expected to maintain its stream-compatibility
 guarantee in perpetuity.
 
+Another possible alternative would be to provide a package-level toggle that
+users could use to switch the behavior of the ``seed`` kwarg for all functions
+decorated by ``np_random_state`` or ``py_random_state``.
+To illustrate (ignoring implementation details)::
+
+    
+    >>> import networkx as nx
+    >>> from networkx.utils.misc import create_random_state
+
+    # NetworkX 2.X behavior: RandomState by default
+
+    >>> type(create_random_state(12345))
+    numpy.random.mtrand.RandomState
+
+    # Change random backend by setting pkg attr
+
+    >>> nx._random_backend = "Generator"
+
+    >>> type(create_random_state(12345))
+    numpy.random._generator.Generator
+
+
 Discussion
 ----------
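
Note on the diff above: the package-level toggle in the Alternatives hunk is shown "ignoring implementation details". The following is a minimal standalone sketch of how such a toggle could combine the two `create_random_state` variants from the Implementation hunk. It is not NetworkX code. The module-level `_random_backend` variable mirrors the `nx._random_backend` attribute from the diff, and the `set_random_backend` helper is a hypothetical name introduced only for this illustration.

```python
import numpy as np

# Hypothetical module-level switch, standing in for the `nx._random_backend`
# attribute sketched in the NXEP. "RandomState" mimics NetworkX 2.X behavior.
_random_backend = "RandomState"


def set_random_backend(name):
    """Hypothetical helper: select which NumPy interface new states use."""
    global _random_backend
    if name not in ("RandomState", "Generator"):
        raise ValueError(f"unknown random backend: {name!r}")
    _random_backend = name


def create_random_state(random_state=None):
    """Map None, numpy.random, ints, or existing instances to a NumPy rng."""
    # Existing instances are passed through unchanged under either backend.
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        return random_state
    use_generator = _random_backend == "Generator"
    if random_state is None or random_state is np.random:
        if use_generator:
            # No global Generator exists, so a fresh unseeded one is created.
            return np.random.default_rng()
        # Legacy behavior: the global RandomState behind numpy.random.
        return np.random.mtrand._rand
    if isinstance(random_state, int):
        if use_generator:
            return np.random.default_rng(random_state)
        return np.random.RandomState(random_state)
    msg = (
        f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
        "numpy.random.Generator instance"
    )
    raise ValueError(msg)


# Usage: the same integer seed maps to a different rng type per backend.
legacy = create_random_state(12345)
set_random_backend("Generator")
modern = create_random_state(12345)
print(type(legacy).__name__, type(modern).__name__)
```

Under either backend an integer seed still yields a reproducible stream, but the two streams differ from each other, which is why the NXEP calls for tests of rng-stream reproducibility.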