From 7a18e4ac4f0b9f0933a190ac25cd75729bcdd146 Mon Sep 17 00:00:00 2001 From: Ryan C Cooper Date: Fri, 12 Feb 2021 12:47:53 -0500 Subject: DOC: Update arraycreation (#17887) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * rewriting basics.creation * halfway through first draft basics.creation * first draft for array creation rewrite * correct some broken links * correct typo in code format * Apply suggestions from code review - @melissawm PEP8 corrections and some other mistakes. thanks @melissawm! Co-authored-by: Melissa Weber Mendonça * remove comments on L347 and 356, fix link on L22 * remove typo on L355 section title error * change ND array -> ndarray docs consistent vocab * added np.* to numpy functions and dtypes * @mattip review changes Great changes by @mattip. I like the change from "object" -> "function" Co-authored-by: Matti Picus * added default_rng() example and some links * Update doc/source/user/basics.creation.rst reformat 3D array L40 Co-authored-by: Eric Wieser * Apply suggestions from code review thanks @eric-wieser for catching some formatting typos. 
Co-authored-by: Eric Wieser * working on updates and trying to build * fixed heading underlines and some links * Apply suggestions from code review fixing link to default_rng Co-authored-by: Eric Wieser * Apply suggestions from code review - @melissawm adding @melissawm comments to changes Co-authored-by: Melissa Weber Mendonça * reword ndarray creation distinctions * L34 indented and L94 literal block removed * L34 remove newline to try to fix indent make error * Update doc/source/user/basics.creation.rst @melissawm caught a formatting issue for a list, `::` -> `:` and extra line added Co-authored-by: Melissa Weber Mendonça * MAINT: Update doc/source/user/basics.creation.rst Co-authored-by: Eric Wieser * MAINT: Update doc/source/user/basics.creation.rst Co-authored-by: Eric Wieser * MAINT: Update doc/source/user/basics.creation.rst Co-authored-by: Eric Wieser Co-authored-by: Melissa Weber Mendonça Co-authored-by: Matti Picus Co-authored-by: Eric Wieser Co-authored-by: Charles Harris --- doc/source/user/basics.creation.rst | 386 +++++++++++++++++++++++++++++------- 1 file changed, 312 insertions(+), 74 deletions(-) (limited to 'doc/source') diff --git a/doc/source/user/basics.creation.rst b/doc/source/user/basics.creation.rst index 671a8ec59..ccd6de184 100644 --- a/doc/source/user/basics.creation.rst +++ b/doc/source/user/basics.creation.rst @@ -9,55 +9,102 @@ Array creation Introduction ============ -There are 5 general mechanisms for creating arrays: +There are 6 general mechanisms for creating arrays: -1) Conversion from other Python structures (e.g., lists, tuples) -2) Intrinsic numpy array creation objects (e.g., arange, ones, zeros, +1) Conversion from other Python structures (i.e. lists and tuples) +2) Intrinsic NumPy array creation functions (e.g. arange, ones, zeros, etc.) 
-3) Reading arrays from disk, either from standard or custom formats -4) Creating arrays from raw bytes through the use of strings or buffers -5) Use of special library functions (e.g., random) +3) Replicating, joining, or mutating existing arrays +4) Reading arrays from disk, either from standard or custom formats +5) Creating arrays from raw bytes through the use of strings or buffers +6) Use of special library functions (e.g., random) -This section will not cover means of replicating, joining, or otherwise -expanding or mutating existing arrays. Nor will it cover creating object -arrays or structured arrays. Both of those are covered in their own sections. +You can use these methods to create ndarrays or :ref:`structured_arrays`. +This document will cover general methods for ndarray creation. -Converting Python array_like Objects to NumPy Arrays -==================================================== - -In general, numerical data arranged in an array-like structure in Python can -be converted to arrays through the use of the array() function. The most -obvious examples are lists and tuples. See the documentation for array() for -details for its use. Some objects may support the array-protocol and allow -conversion to arrays this way. A simple way to find out if the object can be -converted to a numpy array using array() is simply to try it interactively and -see if it works! (The Python Way). +1) Converting Python sequences to NumPy Arrays +============================================== -Examples: :: +NumPy arrays can be defined using Python sequences such as lists and +tuples. Lists and tuples are defined using ``[...]`` and ``(...)``, +respectively. 
Lists and tuples can define ndarray creation: - >>> x = np.array([2,3,1,0]) - >>> x = np.array([2, 3, 1, 0]) - >>> x = np.array([[1,2.0],[0,0],(1+1j,3.)]) # note mix of tuple and lists, - and types - >>> x = np.array([[ 1.+0.j, 2.+0.j], [ 0.+0.j, 0.+0.j], [ 1.+1.j, 3.+0.j]]) +* a list of numbers will create a 1D array, +* a list of lists will create a 2D array, +* further nested lists will create higher-dimensional arrays. In general, any array object is called an **ndarray** in NumPy. -Intrinsic NumPy Array Creation -============================== - -NumPy has built-in functions for creating arrays from scratch: +:: -zeros(shape) will create an array filled with 0 values with the specified -shape. The default dtype is float64. :: + >>> a1D = np.array([1, 2, 3, 4]) + >>> a2D = np.array([[1, 2], [3, 4]]) + >>> a3D = np.array([[[1, 2], [3, 4]], + [[5, 6], [7, 8]]]) - >>> np.zeros((2, 3)) - array([[ 0., 0., 0.], [ 0., 0., 0.]]) +When you use :func:`numpy.array` to define a new array, you should +consider the :doc:`dtype ` of the elements in the array, +which can be specified explicitly. This feature gives you +more control over the underlying data structures and how the elements +are handled in C/C++ functions. If you are not careful with ``dtype`` +assignments, you can get unwanted overflow, as such -ones(shape) will create an array filled with 1 values. It is identical to -zeros in all other respects. +:: -arange() will create arrays with regularly incrementing values. Check the -docstring for complete information on the various ways it can be used. A few -examples will be given here: :: + >>> a = np.array([127, 128, 129], dtype=np.int8) + >>> a + array([ 127, -128, -127], dtype=int8) + +An 8-bit signed integer represents integers from -128 to 127. +Assigning the ``int8`` array to integers outside of this range results +in overflow. This feature can often be misunderstood. 
If you +perform calculations with mismatching ``dtypes``, you can get unwanted +results, for example:: + + >>> a = np.array([2, 3, 4], dtype=np.uint32) + >>> b = np.array([5, 6, 7], dtype=np.uint32) + >>> c_unsigned32 = a - b + >>> print('unsigned c:', c_unsigned32, c_unsigned32.dtype) + unsigned c: [4294967293 4294967293 4294967293] uint32 + >>> c_signed32 = a - b.astype(np.int32) + >>> print('signed c:', c_signed32, c_signed32.dtype) + signed c: [-3 -3 -3] int64 + +Notice that when you perform operations with two arrays of the same +``dtype``, here ``uint32``, the resulting array is the same type. When you +perform operations with different ``dtype``, NumPy will +assign a new type that satisfies all of the array elements involved in +the computation; here ``uint32`` and ``int32`` can both be represented +as ``int64``. + +The default NumPy behavior is to create arrays in either 64-bit signed +integers or double precision floating point numbers, ``int64`` and +``float``, respectively. If you expect your arrays to be a certain type, +then you need to specify the ``dtype`` while you create the array. + +2) Intrinsic NumPy array creation functions +=========================================== +.. + 40 functions seems like a small number, but the routines.array-creation + has ~47. I'm sure there are more. + +NumPy has over 40 built-in functions for creating arrays as laid +out in the :ref:`Array creation routines `. +These functions can be split into roughly three categories, based on the +dimension of the array they create: + +1) 1D arrays +2) 2D arrays +3) ndarrays + +1 - 1D array creation functions +------------------------------- + +The 1D array creation functions e.g. :func:`numpy.linspace` and +:func:`numpy.arange` generally need at least two inputs, ``start`` and +``stop``. + +:func:`numpy.arange` creates arrays with regularly incrementing values. +Check the documentation for complete information and examples.
A few +examples are shown:: >>> np.arange(10) array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) @@ -66,43 +113,216 @@ examples will be given here: :: >>> np.arange(2, 3, 0.1) array([ 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9]) -Note that there are some subtleties regarding the last usage that the user -should be aware of that are described in the arange docstring. +Note: best practice for :func:`numpy.arange` is to use integer start, end, and +step values. There are some subtleties regarding ``dtype``. In the second +example, the ``dtype`` is defined. In the third example, the array is +``dtype=float`` to accommodate the step size of ``0.1``. Due to roundoff error, +the ``stop`` value is sometimes included. -linspace() will create arrays with a specified number of elements, and +:func:`numpy.linspace` will create arrays with a specified number of elements, and spaced equally between the specified beginning and end values. For example: :: >>> np.linspace(1., 4., 6) array([ 1. , 1.6, 2.2, 2.8, 3.4, 4. ]) -The advantage of this creation function is that one can guarantee the -number of elements and the starting and end point, which arange() -generally will not do for arbitrary start, stop, and step values. +The advantage of this creation function is that you guarantee the +number of elements and the starting and end point. The previous +``arange(start, stop, step)`` will not include the value ``stop``. + +2 - 2D array creation functions +------------------------------- + +The 2D array creation functions e.g. :func:`numpy.eye`, :func:`numpy.diag`, and :func:`numpy.vander` +define properties of special matrices represented as 2D arrays. + +``np.eye(n, m)`` defines a 2D identity matrix.
The elements where i=j (row index and column index are equal) are 1 +and the rest are 0, as such:: + + >>> np.eye(3) + array([[1., 0., 0.], + [0., 1., 0.], + [0., 0., 1.]]) + >>> np.eye(3, 5) + array([[1., 0., 0., 0., 0.], + [0., 1., 0., 0., 0.], + [0., 0., 1., 0., 0.]]) + +:func:`numpy.diag` can define either a square 2D array with given values along +the diagonal *or* if given a 2D array returns a 1D array that is +only the diagonal elements. The two array creation functions can be helpful while +doing linear algebra, as such:: + + >>> np.diag([1, 2, 3]) + array([[1, 0, 0], + [0, 2, 0], + [0, 0, 3]]) + >>> np.diag([1, 2, 3], 1) + array([[0, 1, 0, 0], + [0, 0, 2, 0], + [0, 0, 0, 3], + [0, 0, 0, 0]]) + >>> a = np.array([[1, 2], [3, 4]]) + >>> np.diag(a) + array([1, 4]) + +``vander(x, n)`` defines a Vandermonde matrix as a 2D NumPy array. Each column +of the Vandermonde matrix is a decreasing power of the input 1D array or +list or tuple, +``x`` where the highest polynomial order is ``n-1``. This array creation +routine is helpful in generating linear least squares models, as such:: + + >>> np.vander(np.linspace(0, 2, 5), 3) + array([[0. , 0. , 1. ], + [0.25, 0.5 , 1. ], + [1. , 1. , 1. ], + [2.25, 1.5 , 1. ], + [4. , 2. , 1. ]]) + >>> np.vander([1, 2, 3, 4], 2) + array([[1, 1], + [2, 1], + [3, 1], + [4, 1]]) + >>> np.vander((1, 2, 3, 4), 4) + array([[ 1, 1, 1, 1], + [ 8, 4, 2, 1], + [27, 9, 3, 1], + [64, 16, 4, 1]]) + +3 - general ndarray creation functions +-------------------------------------- + +The ndarray creation functions e.g. :func:`numpy.ones`, +:func:`numpy.zeros`, and :meth:`~numpy.random.Generator.random` define +arrays based upon the desired shape. The ndarray creation functions +can create arrays with any dimension by specifying how many dimensions +and length along that dimension in a tuple or list. + +:func:`numpy.zeros` will create an array filled with 0 values with the +specified shape.
The default dtype is ``float64``:: -indices() will create a set of arrays (stacked as a one-higher dimensioned -array), one per dimension with each representing variation in that dimension. -An example illustrates much better than a verbal description: :: + >>> np.zeros((2, 3)) + array([[0., 0., 0.], + [0., 0., 0.]]) + >>> np.zeros((2, 3, 2)) + array([[[0., 0.], + [0., 0.], + [0., 0.]], + + [[0., 0.], + [0., 0.], + [0., 0.]]]) + +:func:`numpy.ones` will create an array filled with 1 values. It is identical to +``zeros`` in all other respects, as such:: + + >>> np.ones((2, 3)) + array([[1., 1., 1.], + [1., 1., 1.]]) + >>> np.ones((2, 3, 2)) + array([[[1., 1.], + [1., 1.], + [1., 1.]], + + [[1., 1.], + [1., 1.], + [1., 1.]]]) + +The :meth:`~numpy.random.Generator.random` method of the result of +``default_rng`` will create an array filled with random +values between 0 and 1. It is included with the :mod:`numpy.random` +module. Below, two arrays are created with shapes (2,3) and (2,3,2), +respectively. The seed is set to 42 so you can reproduce these +pseudorandom numbers:: + + >>> from numpy.random import default_rng + >>> default_rng(42).random((2,3)) + array([[0.77395605, 0.43887844, 0.85859792], + [0.69736803, 0.09417735, 0.97562235]]) + >>> default_rng(42).random((2,3,2)) + array([[[0.77395605, 0.43887844], + [0.85859792, 0.69736803], + [0.09417735, 0.97562235]], + [[0.7611397 , 0.78606431], + [0.12811363, 0.45038594], + [0.37079802, 0.92676499]]]) + +:func:`numpy.indices` will create a set of arrays (stacked as a one-higher +dimensioned array), one per dimension with each representing variation in that +dimension: :: >>> np.indices((3,3)) - array([[[0, 0, 0], [1, 1, 1], [2, 2, 2]], [[0, 1, 2], [0, 1, 2], [0, 1, 2]]]) + array([[[0, 0, 0], + [1, 1, 1], + [2, 2, 2]], + [[0, 1, 2], + [0, 1, 2], + [0, 1, 2]]]) This is particularly useful for evaluating functions of multiple dimensions on a regular grid.
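Reviewer sketch (not part of the commit): the seeded generator and ``np.indices`` usage described in the hunk above can be tried as follows. The variable names (``rng``, ``sample``, ``repeat``, ``grid_sum``) and the function ``f(i, j) = i + j`` are illustrative assumptions, not taken from the patch.

```python
import numpy as np
from numpy.random import default_rng

# Seeded generator: the same seed reproduces the same pseudorandom values.
rng = default_rng(42)
sample = rng.random((2, 3))              # shape (2, 3), values in [0, 1)
repeat = default_rng(42).random((2, 3))  # fresh generator, same seed

# np.indices((3, 3)) returns one index array per dimension, stacked on axis 0.
rows, cols = np.indices((3, 3))

# Evaluate the illustrative function f(i, j) = i + j on the 3x3 grid.
grid_sum = rows + cols
```

Creating a fresh generator per call, as the patch does, restarts the stream each time; reusing one ``rng`` object draws successive values instead.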
-Reading Arrays From Disk -======================== +3) Replicating, joining, or mutating existing arrays +==================================================== -This is presumably the most common case of large array creation. The details, -of course, depend greatly on the format of data on disk and so this section -can only give general pointers on how to handle various formats. +Once you have created arrays, you can replicate, join, or mutate those +existing arrays to create new arrays. When you assign an array or its +elements to a new variable, you have to explicitly :func:`numpy.copy` the array, +otherwise the variable is a view into the original array. Consider the +following example:: + + >>> a = np.array([1, 2, 3, 4, 5, 6]) + >>> b = a[:2] + >>> b += 1 + >>> print('a =', a, '; b =', b) + a = [2 3 3 4 5 6] ; b = [2 3] + +In this example, you did not create a new array. You created a variable, +``b`` that viewed the first 2 elements of ``a``. Adding 1 to ``b`` gives +the same result as adding 1 to ``a[:2]``. If you want to create a +*new* array, use the :func:`numpy.copy` array creation routine as such:: + + >>> a = np.array([1, 2, 3, 4]) + >>> b = a[:2].copy() + >>> b += 1 + >>> print('a =', a, '; b =', b) + a = [1 2 3 4] ; b = [2 3] + +For more information and examples look at :ref:`Copies and Views +`. + +There are a number of routines to join existing arrays e.g. :func:`numpy.vstack`, +:func:`numpy.hstack`, and :func:`numpy.block`. Here is an example of joining four 2-by-2 +arrays into a 4-by-4 array using ``block``:: + + >>> A = np.ones((2, 2)) + >>> B = np.eye(2) + >>> C = np.zeros((2, 2)) + >>> D = np.diag((-3, -4)) + >>> np.block([[A, B], + [C, D]]) + array([[ 1., 1., 1., 0.], + [ 1., 1., 0., 1.], + [ 0., 0., -3., 0.], + [ 0., 0., 0., -4.]]) + +Other routines use similar syntax to join ndarrays. Check the +routine's documentation for further examples and syntax.
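Reviewer sketch (not part of the commit): the view-versus-copy distinction and the ``block`` join described in the hunk above can be checked with ``np.shares_memory``. The names ``view``, ``copy``, ``shares_view``, ``shares_copy``, and ``blocked`` are illustrative assumptions; ``np.eye`` is called with a single integer here because it takes the number of rows, not a shape tuple.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
view = a[:2]          # basic slicing returns a view onto a's data
copy = a[:2].copy()   # explicit copy owns its own data

view += 1             # also changes the first two elements of a
copy += 10            # leaves a untouched

shares_view = np.shares_memory(a, view)
shares_copy = np.shares_memory(a, copy)

# Join four 2-by-2 arrays into one 4-by-4 array.
A = np.ones((2, 2))
B = np.eye(2)
C = np.zeros((2, 2))
D = np.diag((-3, -4))
blocked = np.block([[A, B],
                    [C, D]])
```

Because ``A``, ``B``, and ``C`` are floating point, the integer diagonal matrix ``D`` is promoted and ``blocked`` comes out as ``float64``.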
+ +4) Reading arrays from disk, either from standard or custom formats +=================================================================== + +This is the most common case of large array creation. The details depend +greatly on the format of data on disk. This section gives general pointers on +how to handle various formats. For more detailed examples of IO look at +:ref:`How to Read and Write files `. Standard Binary Formats ----------------------- Various fields have standard formats for array data. The following lists the -ones with known python libraries to read them and return numpy arrays (there -may be others for which it is possible to read and convert to numpy arrays so +ones with known Python libraries to read them and return NumPy arrays (there +may be others for which it is possible to read and convert to NumPy arrays so check the last section as well) :: @@ -114,33 +334,51 @@ convert are those formats supported by libraries like PIL (able to read and write many image formats such as jpg, png, etc). Common ASCII Formats ------------------------- +-------------------- -Comma Separated Value files (CSV) are widely used (and an export and import -option for programs like Excel). There are a number of ways of reading these -files in Python. There are CSV functions in Python and functions in pylab -(part of matplotlib). +Delimited files such as comma separated value (csv) and tab separated +value (tsv) files are used for programs like Excel and LabView. Python +functions can read and parse these files line-by-line. NumPy has two +standard routines for importing a file with delimited data :func:`numpy.loadtxt` +and :func:`numpy.genfromtxt`. These functions have more involved use cases in +:doc:`how-to-io`. A simple example given a ``simple.csv``: -More generic ascii files can be read using the io package in scipy. +.. 
code-block:: bash -Custom Binary Formats ---------------------- + $ cat simple.csv + x, y + 0, 0 + 1, 1 + 2, 4 + 3, 9 + +Importing ``simple.csv`` is accomplished using :func:`numpy.loadtxt`:: + + >>> np.loadtxt('simple.csv', delimiter=',', skiprows=1) # doctest: +SKIP + array([[0., 0.], + [1., 1.], + [2., 4.], + [3., 9.]]) + + +More generic ASCII files can be read using `scipy.io` and `Pandas +`_. + +5) Creating arrays from raw bytes through the use of strings or buffers +======================================================================= There are a variety of approaches one can use. If the file has a relatively -simple format then one can write a simple I/O library and use the numpy -fromfile() function and .tofile() method to read and write numpy arrays +simple format then one can write a simple I/O library and use the NumPy +``fromfile()`` function and ``.tofile()`` method to read and write NumPy arrays directly (mind your byteorder though!) If a good C or C++ library exists that read the data, one can wrap that library with a variety of techniques though that certainly is much more work and requires significantly more advanced knowledge to interface with C or C++. -Use of Special Libraries ------------------------- - -There are libraries that can be used to generate arrays for special purposes -and it isn't possible to enumerate all of them. The most common uses are use -of the many array generation functions in random that can generate arrays of -random values, and some utility functions to generate special matrices (e.g. -diagonal). - +6) Use of special library functions (e.g., SciPy, Pandas, and OpenCV) +===================================================================== +NumPy is the fundamental library for array containers in the Python Scientific Computing +stack. Many Python libraries, including SciPy, Pandas, and OpenCV, use NumPy ndarrays +as the common format for data exchange. These libraries can create, +operate on, and work with NumPy arrays.
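Reviewer sketch (not part of the commit): the ``loadtxt`` import and the ``fromfile()``/``.tofile()`` round trip mentioned in the patch above can be exercised without any files on hand. The in-memory ``csv_text`` stands in for ``simple.csv``, and the temporary path ``raw.bin`` and the names ``table``, ``original``, and ``restored`` are illustrative assumptions.

```python
import io
import os
import tempfile

import numpy as np

# In-memory stand-in for the simple.csv file shown in the patch.
csv_text = "x, y\n0, 0\n1, 1\n2, 4\n3, 9\n"
table = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# Raw binary round trip. tofile() writes only the bytes, with no dtype or
# shape metadata, so the reader has to supply both. The "<i4" dtype pins
# little-endian 32-bit integers so the byte order is explicit.
original = np.arange(6, dtype="<i4").reshape(2, 3)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw.bin")  # hypothetical file name
    original.tofile(path)
    restored = np.fromfile(path, dtype="<i4").reshape(2, 3)
```

Spelling out the byte order in the dtype is one way to address the "mind your byteorder" caution in the text; a format that records dtype and shape, such as ``.npy`` via ``np.save``, avoids the issue altogether.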