author | Ross Barnowski <rossbar@berkeley.edu> | 2022-03-14 22:23:49 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-03-14 22:23:49 -0700 |
commit | c6e9065e7c0f3a8b1f58a2ffdd5ea041a33f0b94 (patch) | |
tree | 5d65ec52b200824af5b1efa77d476a943d926183 | |
parent | 51eb1ad5b4b4483f4ca1b751936e804b03142b0b (diff) | |
download | networkx-c6e9065e7c0f3a8b1f58a2ffdd5ea041a33f0b94.tar.gz |
Finish up NXEP 4 first draft (#5391)
* Minor updates to wording.
* Summarize and add links to relevant sklearn discussions.
* Add an implementation section.
* Add suggestion from Dan re: pkg-level toggle.
-rw-r--r-- | doc/developer/nxeps/nxep-0004.rst | 106 |
1 files changed, 96 insertions, 10 deletions
diff --git a/doc/developer/nxeps/nxep-0004.rst b/doc/developer/nxeps/nxep-0004.rst
index bbd4d891..22168437 100644
--- a/doc/developer/nxeps/nxep-0004.rst
+++ b/doc/developer/nxeps/nxep-0004.rst
@@ -17,7 +17,7 @@ Pseudo-random numbers play an important role in many graph and network analysis
 algorithms in NetworkX. NetworkX provides a :ref:`standard interface to random
 number generators <randomness>` that includes support for `numpy.random` and
 the Python built-in `random` module.
-`numpy.random` is used extensively within NetworkX and in most cases is the
+`numpy.random` is used extensively within NetworkX and in several cases is the
 preferred package for random number generation.
 NumPy introduced a new interface in the `numpy.random` package in NumPy version
 1.17.
@@ -171,10 +171,11 @@ This can be addressed with a compatiblity class similar to the layer between
 `random` and `numpy.random.RandomState`.

 `create_random_state` currently returns the global ``numpy.random.mtrand._rand``
-`RandomState` instance when the input is `None` or the numpy.random module.
+`RandomState` instance when the input is `None` or the ``numpy.random`` module.
 By switching to `numpy.random.Generator`, this will no longer be possible as
 there is no global, internal `Generator` instance in the `numpy.random` module.
-This should have no effect on users.
+This should have no effect on users, as ``seed=None`` currently does not
+guarantee reproducible results.

 Detailed description
 --------------------
@@ -188,17 +189,80 @@ function is either an integer or `None`.

 Related Work
 ------------
-- NEP 19
-- TODO
+Scikit-learn has a similar pattern for imposing determinism on functions that
+depend on randomness.
+For example, many functions in ``scikit-learn`` have a ``random_state`` argument
+that functions similarly to how ``seed`` behaves in many NetworkX function
+signatures.
+One difference between ``scikit-learn`` and ``networkx`` is that scikit-learn
+**only** supports ``RandomState`` via the ``random_state`` keyword argument,
+whereas NetworkX implicitly supports both the built-in `random` module, as well
+as both the numpy ``RandomState`` and ``Generator`` instances (depending on
+the type of ``seed``).
+This is reflected in the name of the keyword argument as ``random_state``
+(used by scikit-learn) is les ambiguous than ``seed`` (used by NetworkX).
+
+There are multiple relevant discussions in the scikit-learn community about
+potential approaches to supporting the new NumPy random interface:
+
+- `scikit-learn/scikit-learn#16988 <sklearn16988>`_ covers strategies and concerns
+  related to enabling users to use the ``Generator``-based random number generators.
+- `scikit-learn/scikit-learn#14042 <sklearn14042>`_ is a higher-level discussion
+  that includes additional information about the design considerations and constraints
+  related to scikit-learn's ``random_state``.
+- There is also a releated `SLEP <slep011>`_.
+
+.. _sklearn16988: https://github.com/scikit-learn/scikit-learn/issues/16988
+.. _sklearn14042: https://github.com/scikit-learn/scikit-learn/issues/14042
+.. _slep011: https://github.com/scikit-learn/enhancement_proposals/pull/24

 Implementation
 --------------

-
-TODO: simple diff here
-
-The implementation itself is quite simple. Most of the work will go into
-improved/reorganized tests.
+The implementation itself is quite simple. The logic that determines how
+inputs are mapped to random number generators is encapsulated in the
+`~networkx.utils.misc.create_random_state` function (and the related
+`~networkx.utils.misc.create_py_random_state`).
+Currently (i.e. NetworkX <= 2.X), this function maps inputs like ``None``,
+``numpy.random``, and integers to ``RandomState`` instances::
+
+    def create_random_state(random_state=None):
+        if random_state is None or random_state is np.random:
+            return np.random.mtrand._rand
+        if isinstance(random_state, np.random.RandomState):
+            return random_state
+        if isinstance(random_state, int):
+            return np.random.RandomState(random_state)
+        if isinstance(random_state, np.random.Generator):
+            return random_state
+        msg = (
+            f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
+            "numpy.random.Generator instance"
+        )
+        raise ValueError(msg)
+
+This NXEP proposes to modify the function to produce ``Generator`` instances
+for these inputs. An example implementation might look something like::
+
+
+    def create_random_state(random_state=None):
+        if random_state is None or random_state is np.random:
+            return np.random.default_rng()
+        if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
+            return random_state
+        if isinstance(random_state, int):
+            return np.random.default_rng(random_state)
+        msg = (
+            f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
+            "numpy.random.Generator instance"
+        )
+        raise ValueError(msg)
+
+
+The above captures the essential change in logic, though implementation details
+may differ.
+Most of the work related implementing this change will be associated with
+improved/reorganized tests; including adding tests rng-stream reproducibility.

 Alternatives
 ------------
@@ -208,6 +272,28 @@ acceptable alternative.
 ``RandomState`` is not deprecated, and is expected to maintain its
 stream-compatibility guarantee in perpetuity.

+Another possible alternative would be to provide a package-level toggle that
+users could use to switch the behavior the ``seed`` kwarg for all functions
+decorated by ``np_random_state`` or ``py_random_state``.
+To illustrate (ignoring implementation details)::
+
+
+    >>> import networkx as nx
+    >>> from networkx.utils.misc import create_random_state
+
+    # NetworkX 2.X behavior: RandomState by default
+
+    >>> type(create_random_state(12345))
+    numpy.random.mtrand.RandomState
+
+    # Change random backend by setting pkg attr
+
+    >>> nx._random_backend = "Generator"
+
+    >>> type(create_random_state(12345))
+    numpy.random._generator.Generator
+
+
 Discussion
 ----------
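
For illustration only, and not part of this commit: the proposed mapping from the Implementation section can be tried in isolation. The sketch below re-declares a standalone `create_random_state` modeled on the example in the patch instead of importing the NetworkX function, so its behavior is an assumption about the proposal, not a description of any released NetworkX version. It assumes NumPy 1.17 or later is installed; the variable names are hypothetical.

```python
import numpy as np


def create_random_state(random_state=None):
    """Standalone copy of the Generator-based mapping sketched in the patch."""
    # None or the numpy.random module: return a fresh, unseeded Generator.
    if random_state is None or random_state is np.random:
        return np.random.default_rng()
    # Existing RandomState or Generator instances are passed through unchanged.
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        return random_state
    # Integer seeds yield a deterministic Generator.
    if isinstance(random_state, int):
        return np.random.default_rng(random_state)
    msg = (
        f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
        "numpy.random.Generator instance"
    )
    raise ValueError(msg)


# Same integer seed, three ways: legacy RandomState and two new Generators.
legacy_draws = np.random.RandomState(12345).random_sample(5)
new_draws = create_random_state(12345).random(5)
repeat_draws = create_random_state(12345).random(5)

print("Generator is reproducible:", np.array_equal(new_draws, repeat_draws))
print("Matches legacy stream:", np.array_equal(new_draws, legacy_draws))
```

Two `Generator` instances built from the same integer seed produce the same draws, but those draws differ from what `RandomState` produces for that seed, because `default_rng` uses the PCG64 bit generator while `RandomState` uses MT19937. This is the stream change that the reorganized tests mentioned in the patch would need to account for.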