author | Stefan Krah <skrah@bytereef.org> | 2012-02-25 12:24:21 +0100 |
---|---|---|
committer | Stefan Krah <skrah@bytereef.org> | 2012-02-25 12:24:21 +0100 |
commit | 9a2d99e28a5c2989b2db4023acae4f550885f2ef (patch) | |
tree | 29bb99fc008de30ecc1e765d6d14ee35cd5bdfe5 /Doc | |
parent | 5a3d04623b0dc8219326989bc3619d5f56737a94 (diff) | |
- Issue #10181: New memoryview implementation fixes multiple ownership
and lifetime issues of dynamically allocated Py_buffer members (#9990)
as well as crashes (#8305, #7433). Many new features have been added
(See whatsnew/3.3), and the documentation has been updated extensively.
The ndarray test object from _testbuffer.c implements all aspects of
PEP-3118, so further development towards the complete implementation
of the PEP can proceed in a test-driven manner.
Thanks to Nick Coghlan, Antoine Pitrou and Pauli Virtanen for review
and many ideas.
- Issue #12834: Fix incorrect results of memoryview.tobytes() for
non-contiguous arrays.
- Issue #5231: Introduce memoryview.cast() method that allows changing
format and shape without making a copy of the underlying memory.
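
As a rough illustration of the last two entries (not part of the commit itself), the sketch below assumes Python 3.3 or later and uses only the built-in `memoryview`, `bytearray`, `bytes` and the standard `array` module. The variable names are made up for the example. It shows `memoryview.cast()` reinterpreting memory without copying, and `memoryview.tobytes()` returning the logical contents of a non-contiguous view.

```python
import array

# A bytearray exports a writable, one-dimensional buffer of unsigned bytes.
data = bytearray(range(6))
view = memoryview(data)

# cast() gives the same memory a new shape (2 rows, 3 columns); nothing is copied.
matrix = view.cast('B', shape=[2, 3])
rows_before = matrix.tolist()

# A write through the flat view is visible through the reshaped view and in
# the underlying bytearray, because all three share one memory block.
view[0] = 255
first_byte = data[0]
rows_after = matrix.tolist()

# cast() can also change the item format: view an array of C ints as raw bytes.
ints = array.array('i', [1, 2, 3])
as_bytes = memoryview(ints).cast('B')

# Slicing with a step produces a non-contiguous view. tobytes() returns the
# logical items (every second byte), not the raw underlying block.
strided = memoryview(bytes(range(6)))[::2]
strided_bytes = strided.tobytes()
```

Because the views share memory with the exporter, the `bytearray` cannot be resized while any of them is alive; calling `release()` on the views, or using them in a `with` block, lifts that restriction.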
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/c-api/buffer.rst | 485 |
-rw-r--r-- | Doc/c-api/memoryview.rst | 29 |
-rw-r--r-- | Doc/c-api/typeobj.rst | 88 |
-rw-r--r-- | Doc/library/stdtypes.rst | 298 |
-rw-r--r-- | Doc/whatsnew/3.3.rst | 56 |
5 files changed, 712 insertions, 244 deletions
diff --git a/Doc/c-api/buffer.rst b/Doc/c-api/buffer.rst index d98ece3e98..2d199922f2 100644 --- a/Doc/c-api/buffer.rst +++ b/Doc/c-api/buffer.rst @@ -7,6 +7,7 @@ Buffer Protocol .. sectionauthor:: Greg Stein <gstein@lyra.org> .. sectionauthor:: Benjamin Peterson +.. sectionauthor:: Stefan Krah .. index:: @@ -20,7 +21,7 @@ as image processing or numeric analysis. While each of these types have their own semantics, they share the common characteristic of being backed by a possibly large memory buffer. It is -then desireable, in some situations, to access that buffer directly and +then desirable, in some situations, to access that buffer directly and without intermediate copying. Python provides such a facility at the C level in the form of the *buffer @@ -60,8 +61,10 @@ isn't needed anymore. Failure to do so could lead to various issues such as resource leaks. -The buffer structure -==================== +.. _buffer-structure: + +Buffer structure +================ Buffer structures (or simply "buffers") are useful as a way to expose the binary data from another object to the Python programmer. They can also be @@ -81,246 +84,400 @@ can be created. .. c:type:: Py_buffer - .. c:member:: void *buf + .. c:member:: void \*obj + + A new reference to the exporting object or *NULL*. The reference is owned + by the consumer and automatically decremented and set to *NULL* by + :c:func:`PyBuffer_Release`. + + For temporary buffers that are wrapped by :c:func:`PyMemoryView_FromBuffer` + this field must be *NULL*. - A pointer to the start of the memory for the object. + .. c:member:: void \*buf + + A pointer to the start of the logical structure described by the buffer + fields. This can be any location within the underlying physical memory + block of the exporter. For example, with negative :c:member:`~Py_buffer.strides` + the value may point to the end of the memory block. + + For contiguous arrays, the value points to the beginning of the memory + block. .. 
c:member:: Py_ssize_t len - :noindex: - The total length of the memory in bytes. + ``product(shape) * itemsize``. For contiguous arrays, this is the length + of the underlying memory block. For non-contiguous arrays, it is the length + that the logical structure would have if it were copied to a contiguous + representation. + + Accessing ``((char *)buf)[0] up to ((char *)buf)[len-1]`` is only valid + if the buffer has been obtained by a request that guarantees contiguity. In + most cases such a request will be :c:macro:`PyBUF_SIMPLE` or :c:macro:`PyBUF_WRITABLE`. .. c:member:: int readonly - An indicator of whether the buffer is read only. + An indicator of whether the buffer is read-only. This field is controlled + by the :c:macro:`PyBUF_WRITABLE` flag. + + .. c:member:: Py_ssize_t itemsize + + Item size in bytes of a single element. Same as the value of :func:`struct.calcsize` + called on non-NULL :c:member:`~Py_buffer.format` values. + + Important exception: If a consumer requests a buffer without the + :c:macro:`PyBUF_FORMAT` flag, :c:member:`~Py_Buffer.format` will + be set to *NULL*, but :c:member:`~Py_buffer.itemsize` still has + the value for the original format. + + If :c:member:`~Py_Buffer.shape` is present, the equality + ``product(shape) * itemsize == len`` still holds and the consumer + can use :c:member:`~Py_buffer.itemsize` to navigate the buffer. + + If :c:member:`~Py_Buffer.shape` is *NULL* as a result of a :c:macro:`PyBUF_SIMPLE` + or a :c:macro:`PyBUF_WRITABLE` request, the consumer must disregard + :c:member:`~Py_buffer.itemsize` and assume ``itemsize == 1``. - .. c:member:: const char *format - :noindex: + .. c:member:: const char \*format - A *NULL* terminated string in :mod:`struct` module style syntax giving - the contents of the elements available through the buffer. If this is - *NULL*, ``"B"`` (unsigned bytes) is assumed. + A *NUL* terminated string in :mod:`struct` module style syntax describing + the contents of a single item. 
If this is *NULL*, ``"B"`` (unsigned bytes) + is assumed. + + This field is controlled by the :c:macro:`PyBUF_FORMAT` flag. .. c:member:: int ndim - The number of dimensions the memory represents as a multi-dimensional - array. If it is 0, :c:data:`strides` and :c:data:`suboffsets` must be - *NULL*. - - .. c:member:: Py_ssize_t *shape - - An array of :c:type:`Py_ssize_t`\s the length of :c:data:`ndim` giving the - shape of the memory as a multi-dimensional array. Note that - ``((*shape)[0] * ... * (*shape)[ndims-1])*itemsize`` should be equal to - :c:data:`len`. - - .. c:member:: Py_ssize_t *strides - - An array of :c:type:`Py_ssize_t`\s the length of :c:data:`ndim` giving the - number of bytes to skip to get to a new element in each dimension. - - .. c:member:: Py_ssize_t *suboffsets - - An array of :c:type:`Py_ssize_t`\s the length of :c:data:`ndim`. If these - suboffset numbers are greater than or equal to 0, then the value stored - along the indicated dimension is a pointer and the suboffset value - dictates how many bytes to add to the pointer after de-referencing. A - suboffset value that it negative indicates that no de-referencing should - occur (striding in a contiguous memory block). - - Here is a function that returns a pointer to the element in an N-D array - pointed to by an N-dimensional index when there are both non-NULL strides - and suboffsets:: - - void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides, - Py_ssize_t *suboffsets, Py_ssize_t *indices) { - char *pointer = (char*)buf; - int i; - for (i = 0; i < ndim; i++) { - pointer += strides[i] * indices[i]; - if (suboffsets[i] >=0 ) { - pointer = *((char**)pointer) + suboffsets[i]; - } - } - return (void*)pointer; - } + The number of dimensions the memory represents as an n-dimensional array. + If it is 0, :c:member:`~Py_Buffer.buf` points to a single item representing + a scalar. 
In this case, :c:member:`~Py_buffer.shape`, :c:member:`~Py_buffer.strides` + and :c:member:`~Py_buffer.suboffsets` MUST be *NULL*. + The macro :c:macro:`PyBUF_MAX_NDIM` limits the maximum number of dimensions + to 64. Exporters MUST respect this limit, consumers of multi-dimensional + buffers SHOULD be able to handle up to :c:macro:`PyBUF_MAX_NDIM` dimensions. - .. c:member:: Py_ssize_t itemsize + .. c:member:: Py_ssize_t \*shape + + An array of :c:type:`Py_ssize_t` of length :c:member:`~Py_buffer.ndim` + indicating the shape of the memory as an n-dimensional array. Note that + ``shape[0] * ... * shape[ndim-1] * itemsize`` MUST be equal to + :c:member:`~Py_buffer.len`. + + Shape values are restricted to ``shape[n] >= 0``. The case + ``shape[n] == 0`` requires special attention. See `complex arrays`_ + for further information. + + The shape array is read-only for the consumer. + + .. c:member:: Py_ssize_t \*strides + + An array of :c:type:`Py_ssize_t` of length :c:member:`~Py_buffer.ndim` + giving the number of bytes to skip to get to a new element in each + dimension. + + Stride values can be any integer. For regular arrays, strides are + usually positive, but a consumer MUST be able to handle the case + ``strides[n] <= 0``. See `complex arrays`_ for further information. + + The strides array is read-only for the consumer. + + .. c:member:: Py_ssize_t \*suboffsets + + An array of :c:type:`Py_ssize_t` of length :c:member:`~Py_buffer.ndim`. + If ``suboffsets[n] >= 0``, the values stored along the nth dimension are + pointers and the suboffset value dictates how many bytes to add to each + pointer after de-referencing. A suboffset value that is negative + indicates that no de-referencing should occur (striding in a contiguous + memory block). - This is a storage for the itemsize (in bytes) of each element of the - shared memory. 
It is technically un-necessary as it can be obtained - using :c:func:`PyBuffer_SizeFromFormat`, however an exporter may know - this information without parsing the format string and it is necessary - to know the itemsize for proper interpretation of striding. Therefore, - storing it is more convenient and faster. + This type of array representation is used by the Python Imaging Library + (PIL). See `complex arrays`_ for further information how to access elements + of such an array. - .. c:member:: void *internal + The suboffsets array is read-only for the consumer. + + .. c:member:: void \*internal This is for use internally by the exporting object. For example, this might be re-cast as an integer by the exporter and used to store flags about whether or not the shape, strides, and suboffsets arrays must be - freed when the buffer is released. The consumer should never alter this + freed when the buffer is released. The consumer MUST NOT alter this value. +.. _buffer-request-types: -Buffer-related functions -======================== +Buffer request types +==================== +Buffers are usually obtained by sending a buffer request to an exporting +object via :c:func:`PyObject_GetBuffer`. Since the complexity of the logical +structure of the memory can vary drastically, the consumer uses the *flags* +argument to specify the exact buffer type it can handle. -.. c:function:: int PyObject_CheckBuffer(PyObject *obj) +All :c:data:`Py_buffer` fields are unambiguously defined by the request +type. + +request-independent fields +~~~~~~~~~~~~~~~~~~~~~~~~~~ +The following fields are not influenced by *flags* and must always be filled in +with the correct values: :c:member:`~Py_buffer.obj`, :c:member:`~Py_buffer.buf`, +:c:member:`~Py_buffer.len`, :c:member:`~Py_buffer.itemsize`, :c:member:`~Py_buffer.ndim`. - Return 1 if *obj* supports the buffer interface otherwise 0. When 1 is - returned, it doesn't guarantee that :c:func:`PyObject_GetBuffer` will - succeed. 
+readonly, format +~~~~~~~~~~~~~~~~ -.. c:function:: int PyObject_GetBuffer(PyObject *obj, Py_buffer *view, int flags) + .. c:macro:: PyBUF_WRITABLE - Export a view over some internal data from the target object *obj*. - *obj* must not be NULL, and *view* must point to an existing - :c:type:`Py_buffer` structure allocated by the caller (most uses of - this function will simply declare a local variable of type - :c:type:`Py_buffer`). The *flags* argument is a bit field indicating - what kind of buffer is requested. The buffer interface allows - for complicated memory layout possibilities; however, some callers - won't want to handle all the complexity and instead request a simple - view of the target object (using :c:macro:`PyBUF_SIMPLE` for a read-only - view and :c:macro:`PyBUF_WRITABLE` for a read-write view). + Controls the :c:member:`~Py_buffer.readonly` field. If set, the exporter + MUST provide a writable buffer or else report failure. Otherwise, the + exporter MAY provide either a read-only or writable buffer, but the choice + MUST be consistent for all consumers. - Some exporters may not be able to share memory in every possible way and - may need to raise errors to signal to some consumers that something is - just not possible. These errors should be a :exc:`BufferError` unless - there is another error that is actually causing the problem. The - exporter can use flags information to simplify how much of the - :c:data:`Py_buffer` structure is filled in with non-default values and/or - raise an error if the object can't support a simpler view of its memory. + .. c:macro:: PyBUF_FORMAT - On success, 0 is returned and the *view* structure is filled with useful - values. On error, -1 is returned and an exception is raised; the *view* - is left in an undefined state. + Controls the :c:member:`~Py_buffer.format` field. If set, this field MUST + be filled in correctly. Otherwise, this field MUST be *NULL*. 
- The following are the possible values to the *flags* arguments. - .. c:macro:: PyBUF_SIMPLE +:c:macro:`PyBUF_WRITABLE` can be \|'d to any of the flags in the next section. +Since :c:macro:`PyBUF_SIMPLE` is defined as 0, :c:macro:`PyBUF_WRITABLE` +can be used as a stand-alone flag to request a simple writable buffer. - This is the default flag. The returned buffer exposes a read-only - memory area. The format of data is assumed to be raw unsigned bytes, - without any particular structure. This is a "stand-alone" flag - constant. It never needs to be '|'d to the others. The exporter will - raise an error if it cannot provide such a contiguous buffer of bytes. +:c:macro:`PyBUF_FORMAT` can be \|'d to any of the flags except :c:macro:`PyBUF_SIMPLE`. +The latter already implies format ``B`` (unsigned bytes). - .. c:macro:: PyBUF_WRITABLE - Like :c:macro:`PyBUF_SIMPLE`, but the returned buffer is writable. If - the exporter doesn't support writable buffers, an error is raised. +shape, strides, suboffsets +~~~~~~~~~~~~~~~~~~~~~~~~~~ - .. c:macro:: PyBUF_STRIDES +The flags that control the logical structure of the memory are listed +in decreasing order of complexity. Note that each flag contains all bits +of the flags below it. - This implies :c:macro:`PyBUF_ND`. The returned buffer must provide - strides information (i.e. the strides cannot be NULL). This would be - used when the consumer can handle strided, discontiguous arrays. - Handling strides automatically assumes you can handle shape. The - exporter can raise an error if a strided representation of the data is - not possible (i.e. without the suboffsets). - .. c:macro:: PyBUF_ND ++-----------------------------+-------+---------+------------+ +| Request | shape | strides | suboffsets | ++=============================+=======+=========+============+ +| .. c:macro:: PyBUF_INDIRECT | yes | yes | if needed | ++-----------------------------+-------+---------+------------+ +| .. 
c:macro:: PyBUF_STRIDES | yes | yes | NULL | ++-----------------------------+-------+---------+------------+ +| .. c:macro:: PyBUF_ND | yes | NULL | NULL | ++-----------------------------+-------+---------+------------+ +| .. c:macro:: PyBUF_SIMPLE | NULL | NULL | NULL | ++-----------------------------+-------+---------+------------+ - The returned buffer must provide shape information. The memory will be - assumed C-style contiguous (last dimension varies the fastest). The - exporter may raise an error if it cannot provide this kind of - contiguous buffer. If this is not given then shape will be *NULL*. - .. c:macro:: PyBUF_C_CONTIGUOUS - PyBUF_F_CONTIGUOUS - PyBUF_ANY_CONTIGUOUS +contiguity requests +~~~~~~~~~~~~~~~~~~~ - These flags indicate that the contiguity returned buffer must be - respectively, C-contiguous (last dimension varies the fastest), Fortran - contiguous (first dimension varies the fastest) or either one. All of - these flags imply :c:macro:`PyBUF_STRIDES` and guarantee that the - strides buffer info structure will be filled in correctly. +C or Fortran contiguity can be explicitly requested, with and without stride +information. Without stride information, the buffer must be C-contiguous. - .. c:macro:: PyBUF_INDIRECT ++-----------------------------------+-------+---------+------------+--------+ +| Request | shape | strides | suboffsets | contig | ++===================================+=======+=========+============+========+ +| .. c:macro:: PyBUF_C_CONTIGUOUS | yes | yes | NULL | C | ++-----------------------------------+-------+---------+------------+--------+ +| .. c:macro:: PyBUF_F_CONTIGUOUS | yes | yes | NULL | F | ++-----------------------------------+-------+---------+------------+--------+ +| .. c:macro:: PyBUF_ANY_CONTIGUOUS | yes | yes | NULL | C or F | ++-----------------------------------+-------+---------+------------+--------+ +| .. 
c:macro:: PyBUF_ND | yes | NULL | NULL | C | ++-----------------------------------+-------+---------+------------+--------+ - This flag indicates the returned buffer must have suboffsets - information (which can be NULL if no suboffsets are needed). This can - be used when the consumer can handle indirect array referencing implied - by these suboffsets. This implies :c:macro:`PyBUF_STRIDES`. - .. c:macro:: PyBUF_FORMAT +compound requests +~~~~~~~~~~~~~~~~~ - The returned buffer must have true format information if this flag is - provided. This would be used when the consumer is going to be checking - for what 'kind' of data is actually stored. An exporter should always - be able to provide this information if requested. If format is not - explicitly requested then the format must be returned as *NULL* (which - means ``'B'``, or unsigned bytes). +All possible requests are fully defined by some combination of the flags in +the previous section. For convenience, the buffer protocol provides frequently +used combinations as single flags. - .. c:macro:: PyBUF_STRIDED +In the following table *U* stands for undefined contiguity. The consumer would +have to call :c:func:`PyBuffer_IsContiguous` to determine contiguity. - This is equivalent to ``(PyBUF_STRIDES | PyBUF_WRITABLE)``. - .. c:macro:: PyBUF_STRIDED_RO - This is equivalent to ``(PyBUF_STRIDES)``. ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| Request | shape | strides | suboffsets | contig | readonly | format | ++===============================+=======+=========+============+========+==========+========+ +| .. c:macro:: PyBUF_FULL | yes | yes | if needed | U | 0 | yes | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_FULL_RO | yes | yes | if needed | U | 1 or 0 | yes | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. 
c:macro:: PyBUF_RECORDS | yes | yes | NULL | U | 0 | yes | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_RECORDS_RO | yes | yes | NULL | U | 1 or 0 | yes | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_STRIDED | yes | yes | NULL | U | 0 | NULL | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_STRIDED_RO | yes | yes | NULL | U | 1 or 0 | NULL | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_CONTIG | yes | NULL | NULL | C | 0 | NULL | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_CONTIG_RO | yes | NULL | NULL | C | 1 or 0 | NULL | ++-------------------------------+-------+---------+------------+--------+----------+--------+ - .. c:macro:: PyBUF_RECORDS - This is equivalent to ``(PyBUF_STRIDES | PyBUF_FORMAT | - PyBUF_WRITABLE)``. +Complex arrays +============== - .. c:macro:: PyBUF_RECORDS_RO +NumPy-style: shape and strides +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The logical structure of NumPy-style arrays is defined by :c:member:`~Py_buffer.itemsize`, +:c:member:`~Py_buffer.ndim`, :c:member:`~Py_buffer.shape` and :c:member:`~Py_buffer.strides`. + +If ``ndim == 0``, the memory location pointed to by :c:member:`~Py_buffer.buf` is +interpreted as a scalar of size :c:member:`~Py_buffer.itemsize`. In that case, +both :c:member:`~Py_buffer.shape` and :c:member:`~Py_buffer.strides` are *NULL*. + +If :c:member:`~Py_buffer.strides` is *NULL*, the array is interpreted as +a standard n-dimensional C-array. Otherwise, the consumer must access an +n-dimensional array as follows: + + ``ptr = (char *)buf + indices[0] * strides[0] + ... 
+ indices[n-1] * strides[n-1]`` + ``item = *((typeof(item) *)ptr);`` + + +As noted above, :c:member:`~Py_buffer.buf` can point to any location within +the actual memory block. An exporter can check the validity of a buffer with +this function: + +.. code-block:: python + + def verify_structure(memlen, itemsize, ndim, shape, strides, offset): + """Verify that the parameters represent a valid array within + the bounds of the allocated memory: + char *mem: start of the physical memory block + memlen: length of the physical memory block + offset: (char *)buf - mem + """ + if offset % itemsize: + return False + if offset < 0 or offset+itemsize > memlen: + return False + if any(v % itemsize for v in strides): + return False + + if ndim <= 0: + return ndim == 0 and not shape and not strides + if 0 in shape: + return True + + imin = sum(strides[j]*(shape[j]-1) for j in range(ndim) + if strides[j] <= 0) + imax = sum(strides[j]*(shape[j]-1) for j in range(ndim) + if strides[j] > 0) + + return 0 <= offset+imin and offset+imax+itemsize <= memlen + + +PIL-style: shape, strides and suboffsets +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In addition to the regular items, PIL-style arrays can contain pointers +that must be followed in order to get to the next element in a dimension. +For example, the regular three-dimensional C-array ``char v[2][2][3]`` can +also be viewed as an array of 2 pointers to 2 two-dimensional arrays: +``char (*v[2])[2][3]``. In suboffsets representation, those two pointers +can be embedded at the start of :c:member:`~Py_buffer.buf`, pointing +to two ``char x[2][3]`` arrays that can be located anywhere in memory. 
+ + +Here is a function that returns a pointer to the element in an N-D array +pointed to by an N-dimensional index when there are both non-NULL strides +and suboffsets:: + + void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides, + Py_ssize_t *suboffsets, Py_ssize_t *indices) { + char *pointer = (char*)buf; + int i; + for (i = 0; i < ndim; i++) { + pointer += strides[i] * indices[i]; + if (suboffsets[i] >=0 ) { + pointer = *((char**)pointer) + suboffsets[i]; + } + } + return (void*)pointer; + } - This is equivalent to ``(PyBUF_STRIDES | PyBUF_FORMAT)``. - .. c:macro:: PyBUF_FULL +Buffer-related functions +======================== - This is equivalent to ``(PyBUF_INDIRECT | PyBUF_FORMAT | - PyBUF_WRITABLE)``. +.. c:function:: int PyObject_CheckBuffer(PyObject *obj) - .. c:macro:: PyBUF_FULL_RO + Return 1 if *obj* supports the buffer interface otherwise 0. When 1 is + returned, it doesn't guarantee that :c:func:`PyObject_GetBuffer` will + succeed. - This is equivalent to ``(PyBUF_INDIRECT | PyBUF_FORMAT)``. - .. c:macro:: PyBUF_CONTIG +.. c:function:: int PyObject_GetBuffer(PyObject *exporter, Py_buffer *view, int flags) - This is equivalent to ``(PyBUF_ND | PyBUF_WRITABLE)``. + Send a request to *exporter* to fill in *view* as specified by *flags*. + If the exporter cannot provide a buffer of the exact type, it MUST raise + :c:data:`PyExc_BufferError`, set :c:member:`view->obj` to *NULL* and + return -1. - .. c:macro:: PyBUF_CONTIG_RO + On success, fill in *view*, set :c:member:`view->obj` to a new reference + to *exporter* and return 0. - This is equivalent to ``(PyBUF_ND)``. + Successful calls to :c:func:`PyObject_GetBuffer` must be paired with calls + to :c:func:`PyBuffer_Release`, similar to :c:func:`malloc` and :c:func:`free`. + Thus, after the consumer is done with the buffer, :c:func:`PyBuffer_Release` + must be called exactly once. .. c:function:: void PyBuffer_Release(Py_buffer *view) - Release the buffer *view*. 
This should be called when the buffer is no - longer being used as it may free memory from it. + Release the buffer *view* and decrement the reference count for + :c:member:`view->obj`. This function MUST be called when the buffer + is no longer being used, otherwise reference leaks may occur. + + It is an error to call this function on a buffer that was not obtained via + :c:func:`PyObject_GetBuffer`. .. c:function:: Py_ssize_t PyBuffer_SizeFromFormat(const char *) - Return the implied :c:data:`~Py_buffer.itemsize` from the struct-stype - :c:data:`~Py_buffer.format`. + Return the implied :c:data:`~Py_buffer.itemsize` from :c:data:`~Py_buffer.format`. + This function is not yet implemented. -.. c:function:: int PyBuffer_IsContiguous(Py_buffer *view, char fortran) +.. c:function:: int PyBuffer_IsContiguous(Py_buffer *view, char order) - Return 1 if the memory defined by the *view* is C-style (*fortran* is - ``'C'``) or Fortran-style (*fortran* is ``'F'``) contiguous or either one - (*fortran* is ``'A'``). Return 0 otherwise. + Return 1 if the memory defined by the *view* is C-style (*order* is + ``'C'``) or Fortran-style (*order* is ``'F'``) contiguous or either one + (*order* is ``'A'``). Return 0 otherwise. -.. c:function:: void PyBuffer_FillContiguousStrides(int ndim, Py_ssize_t *shape, Py_ssize_t *strides, Py_ssize_t itemsize, char fortran) +.. c:function:: void PyBuffer_FillContiguousStrides(int ndim, Py_ssize_t *shape, Py_ssize_t *strides, Py_ssize_t itemsize, char order) Fill the *strides* array with byte-strides of a contiguous (C-style if - *fortran* is ``'C'`` or Fortran-style if *fortran* is ``'F'``) array of the + *order* is ``'C'`` or Fortran-style if *order* is ``'F'``) array of the given shape with the given number of bytes per element. -.. c:function:: int PyBuffer_FillInfo(Py_buffer *view, PyObject *obj, void *buf, Py_ssize_t len, int readonly, int infoflags) +.. 
c:function:: int PyBuffer_FillInfo(Py_buffer *view, PyObject *exporter, void *buf, Py_ssize_t len, int readonly, int flags) + + Handle buffer requests for an exporter that wants to expose *buf* of size *len* + with writability set according to *readonly*. *buf* is interpreted as a sequence + of unsigned bytes. + + The *flags* argument indicates the request type. This function always fills in + *view* as specified by flags, unless *buf* has been designated as read-only + and :c:macro:`PyBUF_WRITABLE` is set in *flags*. + + On success, set :c:member:`view->obj` to a new reference to *exporter* and + return 0. Otherwise, raise :c:data:`PyExc_BufferError`, set + :c:member:`view->obj` to *NULL* and return -1; + + If this function is used as part of a :ref:`getbufferproc <buffer-structs>`, + *exporter* MUST be set to the exporting object. Otherwise, *exporter* MUST + be NULL. + - Fill in a buffer-info structure, *view*, correctly for an exporter that can - only share a contiguous chunk of memory of "unsigned bytes" of the given - length. Return 0 on success and -1 (with raising an error) on error. diff --git a/Doc/c-api/memoryview.rst b/Doc/c-api/memoryview.rst index 6b49cdf70a..ef039753af 100644 --- a/Doc/c-api/memoryview.rst +++ b/Doc/c-api/memoryview.rst @@ -17,16 +17,19 @@ any other object. Create a memoryview object from an object that provides the buffer interface. If *obj* supports writable buffer exports, the memoryview object will be - readable and writable, otherwise it will be read-only. + read/write, otherwise it may be either read-only or read/write at the + discretion of the exporter. +.. c:function:: PyObject *PyMemoryView_FromMemory(char *mem, Py_ssize_t size, int flags) + + Create a memoryview object using *mem* as the underlying buffer. + *flags* can be one of :c:macro:`PyBUF_READ` or :c:macro:`PyBUF_WRITE`. .. c:function:: PyObject *PyMemoryView_FromBuffer(Py_buffer *view) Create a memoryview object wrapping the given buffer structure *view*. 
- The memoryview object then owns the buffer represented by *view*, which - means you shouldn't try to call :c:func:`PyBuffer_Release` yourself: it - will be done on deallocation of the memoryview object. - + For simple byte buffers, :c:func:`PyMemoryView_FromMemory` is the preferred + function. .. c:function:: PyObject *PyMemoryView_GetContiguous(PyObject *obj, int buffertype, char order) @@ -43,10 +46,16 @@ any other object. currently allowed to create subclasses of :class:`memoryview`. -.. c:function:: Py_buffer *PyMemoryView_GET_BUFFER(PyObject *obj) +.. c:function:: Py_buffer *PyMemoryView_GET_BUFFER(PyObject *mview) + + Return a pointer to the memoryview's private copy of the exporter's buffer. + *mview* **must** be a memoryview instance; this macro doesn't check its type, + you must do it yourself or you will risk crashes. + +.. c:function:: Py_buffer *PyMemoryView_GET_BASE(PyObject *mview) - Return a pointer to the buffer structure wrapped by the given - memoryview object. The object **must** be a memoryview instance; - this macro doesn't check its type, you must do it yourself or you - will risk crashes. + Return either a pointer to the exporting object that the memoryview is based + on or *NULL* if the memoryview has been created by one of the functions + :c:func:`PyMemoryView_FromMemory` or :c:func:`PyMemoryView_FromBuffer`. + *mview* **must** be a memoryview instance. diff --git a/Doc/c-api/typeobj.rst b/Doc/c-api/typeobj.rst index 68ca9adaa9..b15d927719 100644 --- a/Doc/c-api/typeobj.rst +++ b/Doc/c-api/typeobj.rst @@ -1198,46 +1198,74 @@ Buffer Object Structures .. sectionauthor:: Greg J. Stein <greg@lyra.org> .. sectionauthor:: Benjamin Peterson +.. sectionauthor:: Stefan Krah +.. c:type:: PyBufferProcs -The :ref:`buffer interface <bufferobjects>` exports a model where an object can expose its internal -data. + This structure holds pointers to the functions required by the + :ref:`Buffer protocol <bufferobjects>`. 
The protocol defines how + an exporter object can expose its internal data to consumer objects. -If an object does not export the buffer interface, then its :attr:`tp_as_buffer` -member in the :c:type:`PyTypeObject` structure should be *NULL*. Otherwise, the -:attr:`tp_as_buffer` will point to a :c:type:`PyBufferProcs` structure. +.. c:member:: getbufferproc PyBufferProcs.bf_getbuffer + The signature of this function is:: -.. c:type:: PyBufferProcs + int (PyObject *exporter, Py_buffer *view, int flags); + + Handle a request to *exporter* to fill in *view* as specified by *flags*. + A standard implementation of this function will take these steps: + + - Check if the request can be met. If not, raise :c:data:`PyExc_BufferError`, + set :c:data:`view->obj` to *NULL* and return -1. + + - Fill in the requested fields. + + - Increment an internal counter for the number of exports. + + - Set :c:data:`view->obj` to *exporter* and increment :c:data:`view->obj`. + + - Return 0. + + The individual fields of *view* are described in section + :ref:`Buffer structure <buffer-structure>`, the rules how an exporter + must react to specific requests are in section + :ref:`Buffer request types <buffer-request-types>`. + + All memory pointed to in the :c:type:`Py_buffer` structure belongs to + the exporter and must remain valid until there are no consumers left. + :c:member:`~Py_buffer.shape`, :c:member:`~Py_buffer.strides`, + :c:member:`~Py_buffer.suboffsets` and :c:member:`~Py_buffer.internal` + are read-only for the consumer. + + :c:func:`PyBuffer_FillInfo` provides an easy way of exposing a simple + bytes buffer while dealing correctly with all request types. + + :c:func:`PyObject_GetBuffer` is the interface for the consumer that + wraps this function. + +.. 
c:member:: releasebufferproc PyBufferProcs.bf_releasebuffer + + The signature of this function is:: + + void (PyObject *exporter, Py_buffer *view); - Structure used to hold the function pointers which define an implementation of - the buffer protocol. + Handle a request to release the resources of the buffer. If no resources + need to be released, this field may be *NULL*. A standard implementation + of this function will take these steps: - .. c:member:: getbufferproc bf_getbuffer + - Decrement an internal counter for the number of exports. - This should fill a :c:type:`Py_buffer` with the necessary data for - exporting the type. The signature of :data:`getbufferproc` is ``int - (PyObject *obj, Py_buffer *view, int flags)``. *obj* is the object to - export, *view* is the :c:type:`Py_buffer` struct to fill, and *flags* gives - the conditions the caller wants the memory under. (See - :c:func:`PyObject_GetBuffer` for all flags.) :c:member:`bf_getbuffer` is - responsible for filling *view* with the appropriate information. - (:c:func:`PyBuffer_FillView` can be used in simple cases.) See - :c:type:`Py_buffer`\s docs for what needs to be filled in. + - If the counter is 0, free all memory associated with *view*. + The exporter MUST use the :c:member:`~Py_buffer.internal` field to keep + track of buffer-specific resources (if present). This field is guaranteed + to remain constant, while a consumer MAY pass a copy of the original buffer + as the *view* argument. - .. c:member:: releasebufferproc bf_releasebuffer - This should release the resources of the buffer. The signature of - :c:data:`releasebufferproc` is ``void (PyObject *obj, Py_buffer *view)``. - If the :c:data:`bf_releasebuffer` function is not provided (i.e. it is - *NULL*), then it does not ever need to be called. + This function MUST NOT decrement :c:data:`view->obj`, since that is + done automatically in :c:func:`PyBuffer_Release`. 
- The exporter of the buffer interface must make sure that any memory - pointed to in the :c:type:`Py_buffer` structure remains valid until - releasebuffer is called. Exporters will need to define a - :c:data:`bf_releasebuffer` function if they can re-allocate their memory, - strides, shape, suboffsets, or format variables which they might share - through the struct bufferinfo. - See :c:func:`PyBuffer_Release`. + :c:func:`PyBuffer_Release` is the interface for the consumer that + wraps this function. diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst index a07be4f91f..183b2f78fc 100644 --- a/Doc/library/stdtypes.rst +++ b/Doc/library/stdtypes.rst @@ -2377,7 +2377,7 @@ memoryview type :class:`memoryview` objects allow Python code to access the internal data of an object that supports the :ref:`buffer protocol <bufferobjects>` without -copying. Memory is generally interpreted as simple bytes. +copying. .. class:: memoryview(obj) @@ -2391,52 +2391,88 @@ copying. Memory is generally interpreted as simple bytes. is a single byte, but other types such as :class:`array.array` may have bigger elements. - ``len(view)`` returns the total number of elements in the memoryview, - *view*. The :class:`~memoryview.itemsize` attribute will give you the + ``len(view)`` is equal to the length of :class:`~memoryview.tolist`. + If ``view.ndim = 0``, the length is 1. If ``view.ndim = 1``, the length + is equal to the number of elements in the view. For higher dimensions, + the length is equal to the length of the nested list representation of + the view. The :class:`~memoryview.itemsize` attribute will give you the number of bytes in a single element. - A :class:`memoryview` supports slicing to expose its data. Taking a single - index will return a single element as a :class:`bytes` object. Full - slicing will result in a subview:: + A :class:`memoryview` supports slicing to expose its data. 
If + :class:`~memoryview.format` is one of the native format specifiers + from the :mod:`struct` module, indexing will return a single element + with the correct type. Full slicing will result in a subview:: + + >>> v = memoryview(b'abcefg') + >>> v[1] + 98 + >>> v[-1] + 103 + >>> v[1:4] + <memory at 0x7f3ddc9f4350> + >>> bytes(v[1:4]) + b'bce' + + Other native formats:: + + >>> import array + >>> a = array.array('l', [-11111111, 22222222, -33333333, 44444444]) + >>> a[0] + -11111111 + >>> a[-1] + 44444444 + >>> a[2:3].tolist() + [-33333333] + >>> a[::2].tolist() + [-11111111, -33333333] + >>> a[::-1].tolist() + [44444444, -33333333, 22222222, -11111111] - >>> v = memoryview(b'abcefg') - >>> v[1] - b'b' - >>> v[-1] - b'g' - >>> v[1:4] - <memory at 0x77ab28> - >>> bytes(v[1:4]) - b'bce' - - If the object the memoryview is over supports changing its data, the - memoryview supports slice assignment:: + .. versionadded:: 3.3 + + If the underlying object is writable, the memoryview supports slice + assignment. Resizing is not allowed:: >>> data = bytearray(b'abcefg') >>> v = memoryview(data) >>> v.readonly False - >>> v[0] = b'z' + >>> v[0] = ord(b'z') >>> data bytearray(b'zbcefg') >>> v[1:4] = b'123' >>> data bytearray(b'z123fg') - >>> v[2] = b'spam' + >>> v[2:3] = b'spam' Traceback (most recent call last): - File "<stdin>", line 1, in <module> - ValueError: cannot modify size of memoryview object - - Notice how the size of the memoryview object cannot be changed. + File "<stdin>", line 1, in <module> + ValueError: memoryview assignment: lvalue and rvalue have different structures + >>> v[2:6] = b'spam' + >>> data + bytearray(b'z1spam') - Memoryviews of hashable (read-only) types are also hashable and their - hash value matches the corresponding bytes object:: + Memoryviews of hashable (read-only) types are also hashable. 
The hash + is defined as ``hash(m) == hash(m.tobytes())``:: >>> v = memoryview(b'abcefg') >>> hash(v) == hash(b'abcefg') True >>> hash(v[2:4]) == hash(b'ce') True + >>> hash(v[::-2]) == hash(b'abcefg'[::-2]) + True + + Hashing of multi-dimensional objects is supported:: + + >>> buf = bytes(list(range(12))) + >>> x = memoryview(buf) + >>> y = x.cast('B', shape=[2,2,3]) + >>> x.tolist() + [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] + >>> y.tolist() + [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]] + >>> hash(x) == hash(y) == hash(y.tobytes()) + True .. versionchanged:: 3.3 Memoryview objects are now hashable. @@ -2455,12 +2491,20 @@ copying. Memory is generally interpreted as simple bytes. >>> bytes(m) b'abc' + For non-contiguous arrays the result is equal to the flattened list + representation with all elements converted to bytes. + .. method:: tolist() - Return the data in the buffer as a list of integers. :: + Return the data in the buffer as a list of elements. :: >>> memoryview(b'abc').tolist() [97, 98, 99] + >>> import array + >>> a = array.array('d', [1.1, 2.2, 3.3]) + >>> m = memoryview(a) + >>> m.tolist() + [1.1, 2.2, 3.3] .. method:: release() @@ -2487,7 +2531,7 @@ copying. Memory is generally interpreted as simple bytes. >>> with memoryview(b'abc') as m: ... m[0] ... - b'a' + 97 >>> m[0] Traceback (most recent call last): File "<stdin>", line 1, in <module> @@ -2495,45 +2539,219 @@ copying. Memory is generally interpreted as simple bytes. .. versionadded:: 3.2 + .. method:: cast(format[, shape]) + + Cast a memoryview to a new format or shape. *shape* defaults to + ``[byte_length//new_itemsize]``, which means that the result view + will be one-dimensional. The return value is a new memoryview, but + the buffer itself is not copied. Supported casts are 1D -> C-contiguous + and C-contiguous -> 1D. One of the formats must be a byte format + ('B', 'b' or 'c'). The byte length of the result must be the same + as the original length. 
+ + Cast 1D/long to 1D/unsigned bytes:: + + >>> import array + >>> a = array.array('l', [1,2,3]) + >>> x = memoryview(a) + >>> x.format + 'l' + >>> x.itemsize + 8 + >>> len(x) + 3 + >>> x.nbytes + 24 + >>> y = x.cast('B') + >>> y.format + 'B' + >>> y.itemsize + 1 + >>> len(y) + 24 + >>> y.nbytes + 24 + + Cast 1D/unsigned bytes to 1D/char:: + + >>> b = bytearray(b'zyz') + >>> x = memoryview(b) + >>> x[0] = b'a' + Traceback (most recent call last): + File "<stdin>", line 1, in <module> + ValueError: memoryview: invalid value for format "B" + >>> y = x.cast('c') + >>> y[0] = b'a' + >>> b + bytearray(b'ayz') + + Cast 1D/bytes to 3D/ints to 1D/signed char:: + + >>> import struct + >>> buf = struct.pack("i"*12, *list(range(12))) + >>> x = memoryview(buf) + >>> y = x.cast('i', shape=[2,2,3]) + >>> y.tolist() + [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]] + >>> y.format + 'i' + >>> y.itemsize + 4 + >>> len(y) + 2 + >>> y.nbytes + 48 + >>> z = y.cast('b') + >>> z.format + 'b' + >>> z.itemsize + 1 + >>> len(z) + 48 + >>> z.nbytes + 48 + + Cast 1D/unsigned char to to 2D/unsigned long:: + + >>> buf = struct.pack("L"*6, *list(range(6))) + >>> x = memoryview(buf) + >>> y = x.cast('L', shape=[2,3]) + >>> len(y) + 2 + >>> y.nbytes + 48 + >>> y.tolist() + [[0, 1, 2], [3, 4, 5]] + + .. versionadded:: 3.3 + There are also several readonly attributes available: + .. attribute:: obj + + The underlying object of the memoryview:: + + >>> b = bytearray(b'xyz') + >>> m = memoryview(b) + >>> m.obj is b + True + + .. versionadded:: 3.3 + + .. attribute:: nbytes + + ``nbytes == product(shape) * itemsize == len(m.tobytes())``. This is + the amount of space in bytes that the array would use in a contiguous + representation. 
It is not necessarily equal to len(m):: + + >>> import array + >>> a = array.array('i', [1,2,3,4,5]) + >>> m = memoryview(a) + >>> len(m) + 5 + >>> m.nbytes + 20 + >>> y = m[::2] + >>> len(y) + 3 + >>> y.nbytes + 12 + >>> len(y.tobytes()) + 12 + + Multi-dimensional arrays:: + + >>> import struct + >>> buf = struct.pack("d"*12, *[1.5*x for x in range(12)]) + >>> x = memoryview(buf) + >>> y = x.cast('d', shape=[3,4]) + >>> y.tolist() + [[0.0, 1.5, 3.0, 4.5], [6.0, 7.5, 9.0, 10.5], [12.0, 13.5, 15.0, 16.5]] + >>> len(y) + 3 + >>> y.nbytes + 96 + + .. versionadded:: 3.3 + + .. attribute:: readonly + + A bool indicating whether the memory is read only. + .. attribute:: format A string containing the format (in :mod:`struct` module style) for each - element in the view. This defaults to ``'B'``, a simple bytestring. + element in the view. A memoryview can be created from exporters with + arbitrary format strings, but some methods (e.g. :meth:`tolist`) are + restricted to native single element formats. Special care must be taken + when comparing memoryviews. Since comparisons are required to return a + value for ``==`` and ``!=``, two memoryviews referencing the same + exporter can compare as not-equal if the exporter's format is not + understood:: + + >>> from ctypes import BigEndianStructure, c_long + >>> class BEPoint(BigEndianStructure): + ... _fields_ = [("x", c_long), ("y", c_long)] + ... + >>> point = BEPoint(100, 200) + >>> a = memoryview(point) + >>> b = memoryview(point) + >>> a == b + False + >>> a.tolist() + Traceback (most recent call last): + File "<stdin>", line 1, in <module> + NotImplementedError: memoryview: unsupported format T{>l:x:>l:y:} .. 
attribute:: itemsize The size in bytes of each element of the memoryview:: - >>> m = memoryview(array.array('H', [1,2,3])) + >>> import array, struct + >>> m = memoryview(array.array('H', [32000, 32001, 32002])) >>> m.itemsize 2 >>> m[0] - b'\x01\x00' - >>> len(m[0]) == m.itemsize + 32000 + >>> struct.calcsize('H') == m.itemsize True - .. attribute:: shape - - A tuple of integers the length of :attr:`ndim` giving the shape of the - memory as a N-dimensional array. - .. attribute:: ndim An integer indicating how many dimensions of a multi-dimensional array the memory represents. + .. attribute:: shape + + A tuple of integers the length of :attr:`ndim` giving the shape of the + memory as a N-dimensional array. + .. attribute:: strides A tuple of integers the length of :attr:`ndim` giving the size in bytes to access each element for each dimension of the array. - .. attribute:: readonly + .. attribute:: suboffsets - A bool indicating whether the memory is read only. + Used internally for PIL-style arrays. The value is informational only. + + .. attribute:: c_contiguous + + A bool indicating whether the memory is C-contiguous. + + .. versionadded:: 3.3 + + .. attribute:: f_contiguous + + A bool indicating whether the memory is Fortran contiguous. + + .. versionadded:: 3.3 + + .. attribute:: contiguous + + A bool indicating whether the memory is contiguous. - .. memoryview.suboffsets isn't documented because it only seems useful for C + .. versionadded:: 3.3 .. _typecontextmanager: diff --git a/Doc/whatsnew/3.3.rst b/Doc/whatsnew/3.3.rst index 20e2914bc0..560331f39b 100644 --- a/Doc/whatsnew/3.3.rst +++ b/Doc/whatsnew/3.3.rst @@ -49,6 +49,62 @@ This article explains the new features in Python 3.3, compared to 3.2. +.. _pep-3118: + +PEP 3118: New memoryview implementation and buffer protocol documentation +========================================================================= + +:issue:`10181` - memoryview bug fixes and features. + Written by Stefan Krah. 
+ +The new memoryview implementation comprehensively fixes all ownership and +lifetime issues of dynamically allocated fields in the Py_buffer struct +that led to multiple crash reports. Additionally, several functions that +crashed or returned incorrect results for non-contiguous or multi-dimensional +input have been fixed. + +The memoryview object now has a PEP-3118 compliant getbufferproc() +that checks the consumer's request type. Many new features have been +added, most of them work in full generality for non-contiguous arrays +and arrays with suboffsets. + +The documentation has been updated, clearly spelling out responsibilities +for both exporters and consumers. Buffer request flags are grouped into +basic and compound flags. The memory layout of non-contiguous and +multi-dimensional NumPy-style arrays is explained. + +Features +-------- + +* All native single character format specifiers in struct module syntax + (optionally prefixed with '@') are now supported. + +* With some restrictions, the cast() method allows changing of format and + shape of C-contiguous arrays. + +* Multi-dimensional list representations are supported for any array type. + +* Multi-dimensional comparisons are supported for any array type. + +* All array types are hashable if the exporting object is hashable + and the view is read-only. + +* Arbitrary slicing of any 1-D arrays type is supported. For example, it + is now possible to reverse a memoryview in O(1) by using a negative step. + +API changes +----------- + +* The maximum number of dimensions is officially limited to 64. + +* The representation of empty shape, strides and suboffsets is now + an empty tuple instead of None. + +* Accessing a memoryview element with format 'B' (unsigned bytes) + now returns an integer (in accordance with the struct module syntax). + For returning a bytes object the view must be cast to 'c' first. + + .. _pep-393: PEP 393: Flexible String Representation |
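The memoryview behaviour described in the ``whatsnew/3.3.rst`` hunk above (integer results for format 'B', ``cast()`` without copying, multi-dimensional list representations, O(1) reversal, hashing of read-only views) can be checked with a short session. This is a minimal sketch, assuming Python 3.3 or later; the variable names are illustrative and are not part of the commit.

```python
# Indexing a view with format 'B' (unsigned bytes) returns an int.
v = memoryview(b'abcefg')
first = v[1]

# Casting to 'c' first makes indexing return a length-1 bytes object.
as_char = v.cast('c')[1]

# cast() reinterprets the same memory with a new shape; nothing is copied.
data = bytearray(range(12))
flat = memoryview(data)
cube = flat.cast('B', shape=[2, 2, 3])
nested = cube.tolist()

# A write through the 1-D view is visible through the 3-D view,
# because both views share the bytearray's memory.
flat[0] = 42
shared = cube.tolist()[0][0][0]

# A negative step gives a reversed view without copying the data.
rev_first = flat[::-1][0]

# Views of hashable (read-only) exporters hash like the equivalent bytes.
same_hash = hash(v) == hash(b'abcefg')

print(first, as_char, shared, rev_first, same_hash)
```

Note that ``len(cube)`` reports the length of the first dimension only, while ``cube.nbytes`` reports the total byte size of the view.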