authorGeorg Brandl <georg@python.org>2007-08-15 14:28:01 +0000
committerGeorg Brandl <georg@python.org>2007-08-15 14:28:01 +0000
commit8ec7f656134b1230ab23003a94ba3266d7064122 (patch)
treebc730d5fb3302dc375edd26b26f750d609b61d72 /Doc/library/pickle.rst
parentf56181ff53ba00b7bed3997a4dccd9a1b6217b57 (diff)
Move the 2.6 reST doc tree in place.
Diffstat (limited to 'Doc/library/pickle.rst')
-rw-r--r--Doc/library/pickle.rst868
1 files changed, 868 insertions, 0 deletions
diff --git a/Doc/library/pickle.rst b/Doc/library/pickle.rst
new file mode 100644
index 0000000000..ab19ff89e9
--- /dev/null
+++ b/Doc/library/pickle.rst
@@ -0,0 +1,868 @@
+
+:mod:`pickle` --- Python object serialization
+=============================================
+
+.. index::
+ single: persistence
+ pair: persistent; objects
+ pair: serializing; objects
+ pair: marshalling; objects
+ pair: flattening; objects
+ pair: pickling; objects
+
+.. module:: pickle
+ :synopsis: Convert Python objects to streams of bytes and back.
+
+
+.. % Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
+.. % Rewritten by Barry Warsaw <barry@zope.com>
+
+The :mod:`pickle` module implements a fundamental, but powerful algorithm for
+serializing and de-serializing a Python object structure. "Pickling" is the
+process whereby a Python object hierarchy is converted into a byte stream, and
+"unpickling" is the inverse operation, whereby a byte stream is converted back
+into an object hierarchy. Pickling (and unpickling) is alternatively known as
+"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
+confusion, the terms used here are "pickling" and "unpickling".
+
+This documentation describes both the :mod:`pickle` module and the
+:mod:`cPickle` module.
+
+
+Relationship to other Python modules
+------------------------------------
+
+The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
+module. As its name implies, :mod:`cPickle` is written in C, so it can be up to
+1000 times faster than :mod:`pickle`. However, it does not support subclassing
+of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
+these are functions, not classes. Most applications have no need for this
+functionality, and can benefit from the improved performance of :mod:`cPickle`.
+Other than that, the interfaces of the two modules are nearly identical; the
+common interface is described in this manual and differences are pointed out
+where necessary. In the following discussions, we use the term "pickle" to
+collectively describe the :mod:`pickle` and :mod:`cPickle` modules.
+
+The data streams the two modules produce are guaranteed to be interchangeable.
+
+Python has a more primitive serialization module called :mod:`marshal`, but in
+general :mod:`pickle` should always be the preferred way to serialize Python
+objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
+files.
+
+The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:
+
+* The :mod:`pickle` module keeps track of the objects it has already serialized,
+ so that later references to the same object won't be serialized again.
+ :mod:`marshal` doesn't do this.
+
+ This has implications both for recursive objects and object sharing. Recursive
+ objects are objects that contain references to themselves. These are not
+ handled by marshal, and in fact, attempting to marshal recursive objects will
+ crash your Python interpreter. Object sharing happens when there are multiple
+ references to the same object in different places in the object hierarchy being
+ serialized. :mod:`pickle` stores such objects only once, and ensures that all
+ other references point to the master copy. Shared objects remain shared, which
+ can be very important for mutable objects.
+
+* :mod:`marshal` cannot be used to serialize user-defined classes and their
+ instances. :mod:`pickle` can save and restore class instances transparently;
+ however, the class definition must be importable and live in the same module as
+ when the object was stored.
+
+* The :mod:`marshal` serialization format is not guaranteed to be portable
+ across Python versions. Because its primary job in life is to support
+ :file:`.pyc` files, the Python implementers reserve the right to change the
+ serialization format in non-backwards compatible ways should the need arise.
+ The :mod:`pickle` serialization format is guaranteed to be backwards compatible
+ across Python releases.
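The object-sharing and recursion behaviour described in the first bullet can be checked directly. The following is a minimal sketch rather than part of the original documentation: the variable names are illustrative, and protocol 2 is passed explicitly so the snippet does not depend on the interpreter's default protocol.

```python
import pickle

# One list referenced from two places in the same object hierarchy.
shared = [1, 2, 3]
container = {'first': shared, 'second': shared}

# A list that contains a reference to itself.
recursive = []
recursive.append(recursive)

restored = pickle.loads(pickle.dumps(container, 2))
restored_recursive = pickle.loads(pickle.dumps(recursive, 2))

# The shared list is stored once, so both keys refer to one object again.
still_shared = restored['first'] is restored['second']

# The self-reference survives the round trip as well.
still_recursive = restored_recursive[0] is restored_recursive
```

Because the two references stay linked, mutating `restored['first']` after unpickling is also visible through `restored['second']`.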
+
+.. warning::
+
+ The :mod:`pickle` module is not intended to be secure against erroneous or
+ maliciously constructed data. Never unpickle data received from an untrusted or
+ unauthenticated source.
+
+Note that serialization is a more primitive notion than persistence; although
+:mod:`pickle` reads and writes file objects, it does not handle the issue of
+naming persistent objects, nor the (even more complicated) issue of concurrent
+access to persistent objects. The :mod:`pickle` module can transform a complex
+object into a byte stream and it can transform the byte stream into an object
+with the same internal structure. Perhaps the most obvious thing to do with
+these byte streams is to write them onto a file, but it is also conceivable to
+send them across a network or store them in a database. The module
+:mod:`shelve` provides a simple interface to pickle and unpickle objects on
+DBM-style database files.
+
+
+Data stream format
+------------------
+
+.. index::
+ single: XDR
+ single: External Data Representation
+
+The data format used by :mod:`pickle` is Python-specific. This has the
+advantage that there are no restrictions imposed by external standards such as
+XDR (which can't represent pointer sharing); however it means that non-Python
+programs may not be able to reconstruct pickled Python objects.
+
+By default, the :mod:`pickle` data format uses a printable ASCII representation.
+This is slightly more voluminous than a binary representation. The big
+advantage of using printable ASCII (and of some other characteristics of
+:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
+possible for a human to read the pickled file with a standard text editor.
+
+There are currently 3 different protocols which can be used for pickling.
+
+* Protocol version 0 is the original ASCII protocol and is backwards compatible
+ with earlier versions of Python.
+
+* Protocol version 1 is the old binary format which is also compatible with
+ earlier versions of Python.
+
+* Protocol version 2 was introduced in Python 2.3. It provides much more
+ efficient pickling of new-style classes.
+
+Refer to :pep:`307` for more information.
+
+If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
+as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
+available will be used.
+
+.. versionchanged:: 2.3
+ Introduced the *protocol* parameter.
+
+A binary format, which is slightly more efficient, can be chosen by specifying a
+*protocol* version >= 1.
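As a rough illustration of the *protocol* argument, the sketch below pickles the same object with several protocol values. The `record` dictionary is made up for the example, and the header check relies on the protocol 2 stream layout described in PEP 307.

```python
import pickle

record = {'name': 'spam', 'values': [1, 2, 3]}

# Protocol 0: the original ASCII format.
ascii_pickle = pickle.dumps(record, 0)

# Protocol 2: binary; the stream starts with the PROTO opcode (0x80)
# followed by the protocol number.
binary_pickle = pickle.dumps(record, 2)
has_proto_header = binary_pickle[:2] == b'\x80\x02'

# Two equivalent ways of asking for the newest protocol available.
newest_by_constant = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
newest_by_negative = pickle.dumps(record, -1)

# No protocol argument is needed when loading; the format is detected.
results = [pickle.loads(data) for data in
           (ascii_pickle, binary_pickle, newest_by_constant, newest_by_negative)]
```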
+
+
+Usage
+-----
+
+To serialize an object hierarchy, you first create a pickler, then you call the
+pickler's :meth:`dump` method. To de-serialize a data stream, you first create
+an unpickler, then you call the unpickler's :meth:`load` method. The
+:mod:`pickle` module provides the following constant:
+
+
+.. data:: HIGHEST_PROTOCOL
+
+ The highest protocol version available. This value can be passed as a
+ *protocol* value.
+
+ .. versionadded:: 2.3
+
+.. note::
+
+ Be sure to always open pickle files created with protocols >= 1 in binary mode.
+ For the old ASCII-based pickle protocol 0 you can use either text mode or binary
+ mode as long as you stay consistent.
+
+ A pickle file written with protocol 0 in binary mode will contain lone linefeeds
+ as line terminators and therefore will look "funny" when viewed in Notepad or
+ other editors which do not support this format.
+
+The :mod:`pickle` module provides the following functions to make the pickling
+process more convenient:
+
+
+.. function:: dump(obj, file[, protocol])
+
+ Write a pickled representation of *obj* to the open file object *file*. This is
+ equivalent to ``Pickler(file, protocol).dump(obj)``.
+
+ If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
+ specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
+ version will be used.
+
+ .. versionchanged:: 2.3
+ Introduced the *protocol* parameter.
+
+ *file* must have a :meth:`write` method that accepts a single string argument.
+ It can thus be a file object opened for writing, a :mod:`StringIO` object, or
+ any other custom object that meets this interface.
+
+
+.. function:: load(file)
+
+ Read a string from the open file object *file* and interpret it as a pickle data
+ stream, reconstructing and returning the original object hierarchy. This is
+ equivalent to ``Unpickler(file).load()``.
+
+ *file* must have two methods, a :meth:`read` method that takes an integer
+ argument, and a :meth:`readline` method that requires no arguments. Both
+ methods should return a string. Thus *file* can be a file object opened for
+ reading, a :mod:`StringIO` object, or any other custom object that meets this
+ interface.
+
+ This function automatically determines whether the data stream was written in
+ binary mode or not.
+
+
+.. function:: dumps(obj[, protocol])
+
+ Return the pickled representation of the object as a string, instead of writing
+ it to a file.
+
+ If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
+ specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
+ version will be used.
+
+ .. versionchanged:: 2.3
+ The *protocol* parameter was added.
+
+
+.. function:: loads(string)
+
+ Read a pickled object hierarchy from a string. Characters in the string past
+ the pickled object's representation are ignored.
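The four convenience functions can be exercised with an in-memory file object. This is a sketch under assumptions: `io.BytesIO` stands in for a file opened in binary mode, the `inventory` data is invented for the example, and protocol 2 is chosen explicitly.

```python
import io
import pickle

inventory = {'apples': 3, 'pears': [1, 2]}

# dumps()/loads(): pickle to and from an in-memory string of bytes.
as_string = pickle.dumps(inventory, 2)
from_string = pickle.loads(as_string)

# loads() stops at the end of the first pickle; trailing data is ignored.
from_padded = pickle.loads(as_string + b'trailing data')

# dump()/load(): the same round trip through a file-like object that
# provides write() for pickling and read()/readline() for unpickling.
buffer = io.BytesIO()
pickle.dump(inventory, buffer, 2)
buffer.seek(0)
from_file = pickle.load(buffer)
```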
+
+The :mod:`pickle` module also defines three exceptions:
+
+
+.. exception:: PickleError
+
+ A common base class for the other exceptions defined below. This inherits from
+ :exc:`Exception`.
+
+
+.. exception:: PicklingError
+
+ This exception is raised when an unpicklable object is passed to the
+ :meth:`dump` method.
+
+
+.. exception:: UnpicklingError
+
+ This exception is raised when there is a problem unpickling an object. Note that
+ other exceptions may also be raised during unpickling, including (but not
+ necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
+ :exc:`ImportError`, and :exc:`IndexError`.
+
+The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
+:class:`Unpickler`:
+
+
+.. class:: Pickler(file[, protocol])
+
+ This takes a file-like object to which it will write a pickle data stream.
+
+ If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
+ specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
+ protocol version will be used.
+
+ .. versionchanged:: 2.3
+ Introduced the *protocol* parameter.
+
+ *file* must have a :meth:`write` method that accepts a single string argument.
+ It can thus be an open file object, a :mod:`StringIO` object, or any other
+ custom object that meets this interface.
+
+:class:`Pickler` objects define one (or two) public methods:
+
+
+.. method:: Pickler.dump(obj)
+
+ Write a pickled representation of *obj* to the open file object given in the
+ constructor. Either the binary or ASCII format will be used, depending on the
+ value of the *protocol* argument passed to the constructor.
+
+
+.. method:: Pickler.clear_memo()
+
+ Clears the pickler's "memo". The memo is the data structure that remembers
+ which objects the pickler has already seen, so that shared or recursive objects
+ are pickled by reference and not by value. This method is useful when re-using
+ picklers.
+
+ .. note::
+
+ Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
+ created by :mod:`cPickle`. In the :mod:`pickle` module, picklers have an
+ instance variable called :attr:`memo` which is a Python dictionary. So to clear
+ the memo for a :mod:`pickle` module pickler, you could do the following::
+
+ mypickler.memo.clear()
+
+ Code that does not need to support older versions of Python should simply use
+ :meth:`clear_memo`.
+
+It is possible to make multiple calls to the :meth:`dump` method of the same
+:class:`Pickler` instance. These must then be matched to the same number of
+calls to the :meth:`load` method of the corresponding :class:`Unpickler`
+instance. If the same object is pickled by multiple :meth:`dump` calls, the
+:meth:`load` calls will all yield references to the same object. [#]_
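The pairing of :meth:`dump` and :meth:`load` calls can be sketched as follows. The example assumes an in-memory `io.BytesIO` stream and illustrative variable names; the important part is that one pickler writes both pickles and one unpickler reads both back.

```python
import io
import pickle

stream = io.BytesIO()
pickler = pickle.Pickler(stream, 2)

shared = ['a', 'b']
pickler.dump(shared)            # first pickle: the list itself
pickler.dump({'ref': shared})   # second pickle: refers to the list via the memo

stream.seek(0)
unpickler = pickle.Unpickler(stream)
first = unpickler.load()        # one load() per dump(), in the same order
second = unpickler.load()

# Both loads yield references to the same reconstructed list.
same_object = second['ref'] is first
```

Calling `pickler.clear_memo()` between the two :meth:`dump` calls would make the second pickle self-contained, at the cost of losing the shared identity.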
+
+:class:`Unpickler` objects are defined as:
+
+
+.. class:: Unpickler(file)
+
+ This takes a file-like object from which it will read a pickle data stream.
+ This class automatically determines whether the data stream was written in
+ binary mode or not, so it does not need a flag as in the :class:`Pickler`
+ factory.
+
+ *file* must have two methods, a :meth:`read` method that takes an integer
+ argument, and a :meth:`readline` method that requires no arguments. Both
+ methods should return a string. Thus *file* can be a file object opened for
+ reading, a :mod:`StringIO` object, or any other custom object that meets this
+ interface.
+
+:class:`Unpickler` objects have one (or two) public methods:
+
+
+.. method:: Unpickler.load()
+
+ Read a pickled object representation from the open file object given in the
+ constructor, and return the reconstituted object hierarchy specified therein.
+
+ This method automatically determines whether the data stream was written in
+ binary mode or not.
+
+
+.. method:: Unpickler.noload()
+
+ This is just like :meth:`load` except that it doesn't actually create any
+ objects. This is useful primarily for finding what are called "persistent ids"
+ that may be referenced in a pickle data stream. See section
+ :ref:`pickle-protocol` below for more details.
+
+ **Note:** the :meth:`noload` method is currently only available on
+ :class:`Unpickler` objects created with the :mod:`cPickle` module.
+ :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
+ method.
+
+
+What can be pickled and unpickled?
+----------------------------------
+
+The following types can be pickled:
+
+* ``None``, ``True``, and ``False``
+
+* integers, long integers, floating point numbers, complex numbers
+
+* normal and Unicode strings
+
+* tuples, lists, sets, and dictionaries containing only picklable objects
+
+* functions defined at the top level of a module
+
+* built-in functions defined at the top level of a module
+
+* classes that are defined at the top level of a module
+
+* instances of such classes whose :attr:`__dict__` or the result of calling
+ :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol` for details)
+
+Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
+exception; when this happens, an unspecified number of bytes may have already
+been written to the underlying file. Trying to pickle a highly recursive data
+structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
+raised in this case. You can carefully raise this limit with
+:func:`sys.setrecursionlimit`.
+
+Note that functions (built-in and user-defined) are pickled by "fully qualified"
+name reference, not by value. This means that only the function name is
+pickled, along with the name of the module the function is defined in. Neither the
+function's code, nor any of its function attributes are pickled. Thus the
+defining module must be importable in the unpickling environment, and the module
+must contain the named object, otherwise an exception will be raised. [#]_
+
+Similarly, classes are pickled by named reference, so the same restrictions in
+the unpickling environment apply. Note that none of the class's code or data is
+pickled, so in the following example the class attribute ``attr`` is not
+restored in the unpickling environment::
+
+   class Foo:
+       attr = 'a class attr'
+
+   picklestring = pickle.dumps(Foo)
+
+These restrictions are why picklable functions and classes must be defined in
+the top level of a module.
+
+Similarly, when class instances are pickled, their class's code and data are not
+pickled along with them. Only the instance data are pickled. This is done on
+purpose, so you can fix bugs in a class or add methods to the class and still
+load objects that were created with an earlier version of the class. If you
+plan to have long-lived objects that will see many versions of a class, it may
+be worthwhile to put a version number in the objects so that suitable
+conversions can be made by the class's :meth:`__setstate__` method.
+
+
+.. _pickle-protocol:
+
+The pickle protocol
+-------------------
+
+This section describes the "pickling protocol" that defines the interface
+between the pickler/unpickler and the objects that are being serialized. This
+protocol provides a standard way for you to define, customize, and control how
+your objects are serialized and de-serialized. The description in this section
+doesn't cover specific customizations that you can employ to make the unpickling
+environment slightly safer from untrusted pickle data streams; see section
+:ref:`pickle-sub` for more details.
+
+
+.. _pickle-inst:
+
+Pickling and unpickling normal class instances
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. index::
+ single: __getinitargs__() (copy protocol)
+ single: __init__() (instance constructor)
+
+When a pickled class instance is unpickled, its :meth:`__init__` method is
+normally *not* invoked. If it is desirable that the :meth:`__init__` method be
+called on unpickling, an old-style class can define a method
+:meth:`__getinitargs__`, which should return a *tuple* containing the arguments
+to be passed to the class constructor (:meth:`__init__` for example). The
+:meth:`__getinitargs__` method is called at pickle time; the tuple it returns is
+incorporated in the pickle for the instance.
+
+.. index:: single: __getnewargs__() (copy protocol)
+
+New-style types can provide a :meth:`__getnewargs__` method that is used for
+protocol 2. Implementing this method is needed if the type establishes some
+internal invariants when the instance is created, or if the memory allocation is
+affected by the values passed to the :meth:`__new__` method for the type (as it
+is for tuples and strings). Instances of a new-style type :class:`C` are
+created using ::
+
+ obj = C.__new__(C, *args)
+
+
+where *args* is the result of calling :meth:`__getnewargs__` on the original
+object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.
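A small sketch of :meth:`__getnewargs__` follows. The `Pair` class is hypothetical and exists only for this example; it is a tuple subclass whose :meth:`__new__` requires two arguments, which is the situation the paragraph above describes.

```python
import pickle

class Pair(tuple):
    """Hypothetical two-element tuple subclass used for illustration."""

    def __new__(cls, a, b):
        # The contents of a tuple are fixed when the object is allocated,
        # so they must be passed to __new__ rather than set afterwards.
        return tuple.__new__(cls, (a, b))

    def __getnewargs__(self):
        # Arguments handed to Pair.__new__(Pair, *args) at unpickling time.
        return (self[0], self[1])

original = Pair(1, 2)
restored = pickle.loads(pickle.dumps(original, 2))
```

Without :meth:`__getnewargs__` an empty argument tuple is assumed, which would not satisfy the two required parameters of `Pair.__new__`.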
+
+.. index::
+ single: __getstate__() (copy protocol)
+ single: __setstate__() (copy protocol)
+ single: __dict__ (instance attribute)
+
+Classes can further influence how their instances are pickled; if the class
+defines the method :meth:`__getstate__`, it is called and the return state is
+pickled as the contents for the instance, instead of the contents of the
+instance's dictionary. If there is no :meth:`__getstate__` method, the
+instance's :attr:`__dict__` is pickled.
+
+Upon unpickling, if the class also defines the method :meth:`__setstate__`, it
+is called with the unpickled state. [#]_ If there is no :meth:`__setstate__`
+method, the pickled state must be a dictionary and its items are assigned to the
+new instance's dictionary. If a class defines both :meth:`__getstate__` and
+:meth:`__setstate__`, the state object needn't be a dictionary and these methods
+can do what they want. [#]_
+
+.. warning::
+
+ For new-style classes, if :meth:`__getstate__` returns a false value, the
+ :meth:`__setstate__` method will not be called.
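The state hooks, together with the version-number suggestion made earlier, might be combined as in the sketch below. Everything about the `Account` class is an assumption made for illustration: the attribute names, the `version` key, and the older state layout that stored a `balance` in whole units.

```python
import pickle

class Account(object):
    """Hypothetical class whose pickled state carries a version number."""

    STATE_VERSION = 2

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents
        self._cache = {}            # derived data, not worth pickling

    def __getstate__(self):
        state = self.__dict__.copy()    # copy, since the dict is modified
        del state['_cache']
        state['version'] = self.STATE_VERSION
        return state                    # never empty, so never a false value

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop('version', 1)
        if version < 2:
            # Assumed older layout: whole units stored under 'balance'.
            state['balance_cents'] = state.pop('balance') * 100
        self.__dict__.update(state)
        self._cache = {}                # rebuild what was left out

original = Account('alice', 1250)
restored = pickle.loads(pickle.dumps(original, 2))

# Simulate state written by the assumed older version of the class.
legacy = Account.__new__(Account)
legacy.__setstate__({'owner': 'bob', 'balance': 3})
```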
+
+
+Pickling and unpickling extension types
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When the :class:`Pickler` encounters an object of a type it knows nothing about
+--- such as an extension type --- it looks in two places for a hint of how to
+pickle it. One alternative is for the object to implement a :meth:`__reduce__`
+method. If provided, at pickling time :meth:`__reduce__` will be called with no
+arguments, and it must return either a string or a tuple.
+
+If a string is returned, it names a global variable whose contents are pickled
+as normal. The string returned by :meth:`__reduce__` should be the object's
+local name relative to its module; the pickle module searches the module
+namespace to determine the object's module.
+
+When a tuple is returned, it must be between two and five elements long.
+Optional elements can either be omitted, or ``None`` can be provided as their
+value. The semantics of each element are:
+
+* A callable object that will be called to create the initial version of the
+ object. The next element of the tuple will provide arguments for this callable,
+ and later elements provide additional state information that will subsequently
+ be used to fully reconstruct the pickled data.
+
+ In the unpickling environment this object must be either a class, a callable
+ registered as a "safe constructor" (see below), or it must have an attribute
+ :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
+ :exc:`UnpicklingError` will be raised in the unpickling environment. Note that
+ as usual, the callable itself is pickled by name.
+
+* A tuple of arguments for the callable object.
+
+ .. versionchanged:: 2.5
+ Formerly, this argument could also be ``None``.
+
+* Optionally, the object's state, which will be passed to the object's
+ :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If the
+ object has no :meth:`__setstate__` method, then, as above, the value must be a
+ dictionary and it will be added to the object's :attr:`__dict__`.
+
+* Optionally, an iterator (and not a sequence) yielding successive list items.
+ These list items will be pickled, and appended to the object using either
+ ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is primarily used
+ for list subclasses, but may be used by other classes as long as they have
+ :meth:`append` and :meth:`extend` methods with the appropriate signature.
+ (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
+ protocol version is used as well as the number of items to append, so both must
+ be supported.)
+
+* Optionally, an iterator (not a sequence) yielding successive dictionary items,
+ which should be tuples of the form ``(key, value)``. These items will be
+ pickled and stored to the object using ``obj[key] = value``. This is primarily
+ used for dictionary subclasses, but may be used by other classes as long as they
+ implement :meth:`__setitem__`.
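The first three tuple elements can be illustrated with a short sketch. The `Point` class is hypothetical; its :meth:`__reduce__` returns the callable, the argument tuple, and a state dictionary, and it defines no :meth:`__setstate__`, so the state is added to the instance's :attr:`__dict__`.

```python
import pickle

class Point(object):
    """Hypothetical class that customizes pickling through __reduce__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.label = None

    def __reduce__(self):
        # (callable, arguments for the callable, optional state)
        return (Point, (self.x, self.y), {'label': self.label})

original = Point(3, 4)
original.label = 'home'

# Unpickling calls Point(3, 4) and then applies the state dictionary.
restored = pickle.loads(pickle.dumps(original, 2))
```

Note that, unlike the default mechanism, this route does call :meth:`__init__`, because the class itself is the callable named in the tuple.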
+
+It is sometimes useful to know the protocol version when implementing
+:meth:`__reduce__`. This can be done by implementing a method named
+:meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`, when
+it exists, is called in preference over :meth:`__reduce__` (you may still
+provide :meth:`__reduce__` for backwards compatibility). The
+:meth:`__reduce_ex__` method will be called with a single integer argument, the
+protocol version.
+
+The :class:`object` class implements both :meth:`__reduce__` and
+:meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__` but
+not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation detects this
+and calls :meth:`__reduce__`.
+
+An alternative to implementing a :meth:`__reduce__` method on the object to be
+pickled, is to register the callable with the :mod:`copy_reg` module. This
+module provides a way for programs to register "reduction functions" and
+constructors for user-defined types. Reduction functions have the same
+semantics and interface as the :meth:`__reduce__` method described above, except
+that they are called with a single argument, the object to be pickled.
+
+The registered constructor is deemed a "safe constructor" for purposes of
+unpickling as described above.
+
+
+Pickling and unpickling external objects
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For the benefit of object persistence, the :mod:`pickle` module supports the
+notion of a reference to an object outside the pickled data stream. Such
+objects are referenced by a "persistent id", which is just an arbitrary string
+of printable ASCII characters. The resolution of such names is not defined by
+the :mod:`pickle` module; it will delegate this resolution to user defined
+functions on the pickler and unpickler. [#]_
+
+To define external persistent id resolution, you need to set the
+:attr:`persistent_id` attribute of the pickler object and the
+:attr:`persistent_load` attribute of the unpickler object.
+
+To pickle objects that have an external persistent id, the pickler must have a
+custom :func:`persistent_id` method that takes an object as an argument and
+returns either ``None`` or the persistent id for that object. When ``None`` is
+returned, the pickler simply pickles the object as normal. When a persistent id
+string is returned, the pickler will pickle that string, along with a marker so
+that the unpickler will recognize the string as a persistent id.
+
+To unpickle external objects, the unpickler must have a custom
+:func:`persistent_load` function that takes a persistent id string and returns
+the referenced object.
+
+Here's a silly example that *might* shed more light::
+
+   import pickle
+   from cStringIO import StringIO
+
+   src = StringIO()
+   p = pickle.Pickler(src)
+
+   def persistent_id(obj):
+       if hasattr(obj, 'x'):
+           return 'the value %d' % obj.x
+       else:
+           return None
+
+   p.persistent_id = persistent_id
+
+   class Integer:
+       def __init__(self, x):
+           self.x = x
+       def __str__(self):
+           return 'My name is integer %d' % self.x
+
+   i = Integer(7)
+   print i
+   p.dump(i)
+
+   datastream = src.getvalue()
+   print repr(datastream)
+   dst = StringIO(datastream)
+
+   up = pickle.Unpickler(dst)
+
+   class FancyInteger(Integer):
+       def __str__(self):
+           return 'I am the integer %d' % self.x
+
+   def persistent_load(persid):
+       if persid.startswith('the value '):
+           value = int(persid.split()[2])
+           return FancyInteger(value)
+       else:
+           raise pickle.UnpicklingError, 'Invalid persistent id'
+
+   up.persistent_load = persistent_load
+
+   j = up.load()
+   print j
+
+In the :mod:`cPickle` module, the unpickler's :attr:`persistent_load` attribute
+can also be set to a Python list, in which case, when the unpickler reaches a
+persistent id, the persistent id string will simply be appended to this list.
+This functionality exists so that a pickle data stream can be "sniffed" for
+object references without actually instantiating all the objects in a pickle.
+[#]_ Setting :attr:`persistent_load` to a list is usually used in conjunction
+with the :meth:`noload` method on the Unpickler.
+
+.. % BAW: Both pickle and cPickle support something called
+.. % inst_persistent_id() which appears to give unknown types a second
+.. % shot at producing a persistent id. Since Jim Fulton can't remember
+.. % why it was added or what it's for, I'm leaving it undocumented.
+
+
+.. _pickle-sub:
+
+Subclassing Unpicklers
+----------------------
+
+By default, unpickling will import any class that it finds in the pickle data.
+You can control exactly what gets unpickled and what gets called by customizing
+your unpickler. Unfortunately, exactly how you do this is different depending
+on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_
+
+In the :mod:`pickle` module, you need to derive a subclass from
+:class:`Unpickler`, overriding the :meth:`load_global` method.
+:meth:`load_global` should read two lines from the pickle data stream where the
+first line will be the name of the module containing the class and the second line
+will be the name of the instance's class. It then looks up the class, possibly
+importing the module and digging out the attribute, then it appends what it
+finds to the unpickler's stack. Later on, this class will be assigned to the
+:attr:`__class__` attribute of an empty class, as a way of magically creating an
+instance without calling its class's :meth:`__init__`. Your job (should you
+choose to accept it), would be to have :meth:`load_global` push onto the
+unpickler's stack, a known safe version of any class you deem safe to unpickle.
+It is up to you to produce such a class. Or you could raise an error if you
+want to disallow all unpickling of instances. If this sounds like a hack,
+you're right. Refer to the source code to make this work.
+
+Things are a little cleaner with :mod:`cPickle`, but not by much. To control
+what gets unpickled, you can set the unpickler's :attr:`find_global` attribute
+to a function or ``None``. If it is ``None`` then any attempts to unpickle
+instances will raise an :exc:`UnpicklingError`. If it is a function, then it
+should accept a module name and a class name, and return the corresponding class
+object. It is responsible for looking up the class and performing any necessary
+imports, and it may raise an error to prevent instances of the class from being
+unpickled.
+
+The moral of the story is that you should be really careful about the source of
+the strings your application unpickles.
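One way to approximate the allow-list idea is sketched below. It is an assumption-laden example, not the procedure described above: rather than overriding :meth:`load_global` or setting :attr:`find_global`, it overrides `find_class`, the lookup method that the pure-Python unpickler delegates to and that later interpreters also expose. The `ALLOWED` set, `RestrictedUnpickler` and `restricted_loads` are names invented for this sketch.

```python
import collections
import io
import pickle

# Hypothetical allow-list of (module, name) pairs that may be looked up.
ALLOWED = set([('collections', 'OrderedDict')])

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve only globals that were explicitly allowed.
        if (module, name) in ALLOWED:
            return pickle.Unpickler.find_class(self, module, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Like pickle.loads(), but refuses globals outside ALLOWED."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain containers need no global lookup at all.
plain = restricted_loads(pickle.dumps([1, 2, {'a': None}], 2))

# An allowed class is reconstructed as usual.
ordered = restricted_loads(
    pickle.dumps(collections.OrderedDict([('a', 1)]), 2))

# A pickle that refers to any other global, here a built-in function,
# is rejected.
try:
    restricted_loads(pickle.dumps(len, 2))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Even with such a filter in place, the warning above still applies: an allow-list narrows what a pickle can reach, but it does not make untrusted data safe.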
+
+
+.. _pickle-example:
+
+Example
+-------
+
+For the simplest code, use the :func:`dump` and :func:`load` functions. Note
+that a self-referencing list is pickled and restored correctly. ::
+
+ import pickle
+
+ data1 = {'a': [1, 2.0, 3, 4+6j],
+ 'b': ('string', u'Unicode string'),
+ 'c': None}
+
+ selfref_list = [1, 2, 3]
+ selfref_list.append(selfref_list)
+
+ output = open('data.pkl', 'wb')
+
+ # Pickle dictionary using protocol 0.
+ pickle.dump(data1, output)
+
+ # Pickle the list using the highest protocol available.
+ pickle.dump(selfref_list, output, -1)
+
+ output.close()
+
+The following example reads the resulting pickled data. When reading a
+pickle-containing file, you should open the file in binary mode because you
+can't be sure if the ASCII or binary format was used. ::
+
+ import pprint, pickle
+
+ pkl_file = open('data.pkl', 'rb')
+
+ data1 = pickle.load(pkl_file)
+ pprint.pprint(data1)
+
+ data2 = pickle.load(pkl_file)
+ pprint.pprint(data2)
+
+ pkl_file.close()
+
+Here's a larger example that shows how to modify pickling behavior for a class.
+The :class:`TextReader` class opens a text file, and returns the line number and
+line contents each time its :meth:`readline` method is called. If a
+:class:`TextReader` instance is pickled, all attributes *except* the file object
+member are saved. When the instance is unpickled, the file is reopened, and
+reading resumes from the last location. The :meth:`__setstate__` and
+:meth:`__getstate__` methods are used to implement this behavior. ::
+
+   #!/usr/local/bin/python
+
+   class TextReader:
+       """Print and number lines in a text file."""
+       def __init__(self, file):
+           self.file = file
+           self.fh = open(file)
+           self.lineno = 0
+
+       def readline(self):
+           self.lineno = self.lineno + 1
+           line = self.fh.readline()
+           if not line:
+               return None
+           if line.endswith("\n"):
+               line = line[:-1]
+           return "%d: %s" % (self.lineno, line)
+
+       def __getstate__(self):
+           odict = self.__dict__.copy() # copy the dict since we change it
+           del odict['fh']              # remove filehandle entry
+           return odict
+
+       def __setstate__(self, dict):
+           fh = open(dict['file'])      # reopen file
+           count = dict['lineno']       # read from file...
+           while count:                 # until line count is restored
+               fh.readline()
+               count = count - 1
+           self.__dict__.update(dict)   # update attributes
+           self.fh = fh                 # save the file object
+
+A sample usage might be something like this::
+
+ >>> import TextReader
+ >>> obj = TextReader.TextReader("TextReader.py")
+ >>> obj.readline()
+ '1: #!/usr/local/bin/python'
+ >>> obj.readline()
+ '2: '
+ >>> obj.readline()
+ '3: class TextReader:'
+ >>> import pickle
+ >>> pickle.dump(obj, open('save.p', 'wb'))
+
+If you want to see that :mod:`pickle` works across Python processes, start
+another Python session before continuing. What follows can be run in either
+the same process or a new process. ::
+
+ >>> import pickle
+ >>> reader = pickle.load(open('save.p', 'rb'))
+ >>> reader.readline()
+ '4: """Print and number lines in a text file."""'
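The interactive session above depends on a ``TextReader.py`` file being
importable. As a rough, self-contained check of the same behavior, the sketch
below condenses the class, reads one line from a throwaway temporary file,
pickles the reader in memory, and continues reading from the restored copy. The
file contents and the names ``path``, ``payload`` and ``restored`` are
illustrative assumptions, not part of the original example.

```python
import os
import pickle
import tempfile


class TextReader:
    """Number the lines of a text file (condensed from the example above)."""

    def __init__(self, file):
        self.file = file
        self.fh = open(file)
        self.lineno = 0

    def readline(self):
        self.lineno += 1
        line = self.fh.readline()
        if not line:
            return None
        return "%d: %s" % (self.lineno, line.rstrip("\n"))

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['fh']                   # file objects cannot be pickled
        return state

    def __setstate__(self, state):
        fh = open(state['file'])          # reopen the file by name
        for _ in range(state['lineno']):  # skip the lines already read
            fh.readline()
        self.__dict__.update(state)
        self.fh = fh


# Create a small throwaway text file to read from.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write("alpha\nbeta\ngamma\n")

reader = TextReader(path)
first = reader.readline()            # read line 1 before pickling

payload = pickle.dumps(reader)       # __getstate__ drops the file handle
restored = pickle.loads(payload)     # __setstate__ reopens and repositions
second = restored.readline()         # continues with line 2

reader.fh.close()
restored.fh.close()
os.remove(path)
```

Note that the restored reader locates the file by its saved name, so the file
must still exist, unchanged, when the pickle is loaded.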
+
+
+.. seealso::
+
+ Module :mod:`copy_reg`
+ Pickle interface constructor registration for extension types.
+
+ Module :mod:`shelve`
+ Indexed databases of objects; uses :mod:`pickle`.
+
+ Module :mod:`copy`
+ Shallow and deep object copying.
+
+ Module :mod:`marshal`
+ High-performance serialization of built-in types.
+
+
+:mod:`cPickle` --- A faster :mod:`pickle`
+=========================================
+
+.. module:: cPickle
+ :synopsis: Faster version of pickle, but not subclassable.
+.. moduleauthor:: Jim Fulton <jim@zope.com>
+.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
+
+
+.. index:: module: pickle
+
+The :mod:`cPickle` module supports serialization and de-serialization of Python
+objects, providing an interface and functionality nearly identical to the
+:mod:`pickle` module. There are several differences, the most important being
+performance and subclassability.
+
+First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
+the former is implemented in C. Second, in the :mod:`cPickle` module the
+callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
+This means that you cannot use them to derive custom pickling and unpickling
+subclasses. Most applications have no need for this functionality and should
+benefit from the greatly improved performance of the :mod:`cPickle` module.
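For the cases that do need subclassing, such as restricting which globals may
be unpickled, the class-based :mod:`pickle` module is the one to use. The
following is a hedged sketch rather than the documented recipe for this
release: it assumes the ``find_class()`` hook as documented for
``pickle.Unpickler`` in Python 3, and the names ``SAFE_BUILTINS``,
``RestrictedUnpickler`` and ``restricted_loads`` are hypothetical.

```python
import builtins
import io
import pickle

# Hypothetical allow-list: the only globals this unpickler will resolve.
SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler subclass that refuses globals outside the allow-list."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Analogue of pickle.loads() that uses the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# range is on the allow-list, so this round trip succeeds.
allowed = restricted_loads(pickle.dumps([1, 2, range(15)]))

# len is a builtin that is not on the allow-list, so loading it is refused.
try:
    restricted_loads(pickle.dumps(len))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

An allow-list of this kind narrows what a pickle can reference; it does not by
itself make unpickling untrusted data safe.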
+
+The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` use the
+same format, so it is possible to use :mod:`pickle` and :mod:`cPickle`
+interchangeably with existing pickles. [#]_
+
+There are additional minor differences in API between :mod:`cPickle` and
+:mod:`pickle`; however, for most applications they are interchangeable. More
+documentation is provided in the :mod:`pickle` module documentation, which
+includes a list of the documented differences.
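Because the two modules are largely interchangeable, code that only needs the
module-level functions commonly imports the C version when it is available and
falls back to :mod:`pickle` otherwise. The sketch below shows that idiom; the
``record`` data is made up for illustration, and on Python 3, where
:mod:`cPickle` no longer exists, the fallback branch is the one that runs.

```python
try:
    import cPickle as pickle   # C implementation, where it exists (Python 2)
except ImportError:
    import pickle              # otherwise fall back to the pickle module

record = {'name': 'example', 'values': [1, 2, 3]}

# Protocol 2 is understood by both modules.
blob = pickle.dumps(record, 2)
restored = pickle.loads(blob)
```

Code written this way must avoid subclassing ``Pickler`` or ``Unpickler``, since
that is not possible with the :mod:`cPickle` versions.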
+
+.. rubric:: Footnotes
+
+.. [#] Don't confuse this with the :mod:`marshal` module.
+
+.. [#] In the :mod:`pickle` module these callables are classes, which you could
+ subclass to customize the behavior. However, in the :mod:`cPickle` module these
+ callables are factory functions and so cannot be subclassed. One common reason
+ to subclass is to control what objects can actually be unpickled. See section
+ :ref:`pickle-sub` for more details.
+
+.. [#] *Warning*: this is intended for pickling multiple objects without intervening
+ modifications to the objects or their parts. If you modify an object and then
+ pickle it again using the same :class:`Pickler` instance, the object is not
+ pickled again --- a reference to it is pickled and the :class:`Unpickler` will
+ return the old value, not the modified one. There are two problems here: (1)
+ detecting changes, and (2) marshalling a minimal set of changes. Garbage
+ collection may also become a problem here.
+
+.. [#] The exception raised will likely be an :exc:`ImportError` or an
+ :exc:`AttributeError` but it could be something else.
+
+.. [#] These methods can also be used to implement copying class instances.
+
+.. [#] This protocol is also used by the shallow and deep copying operations defined in
+ the :mod:`copy` module.
+
+.. [#] The actual mechanism for associating these user defined functions is slightly
+ different for :mod:`pickle` and :mod:`cPickle`. The description given here
+ works the same for both implementations. Users of the :mod:`pickle` module
+ could also use subclassing to effect the same results, overriding the
+ :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
+ classes.
+
+.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
+ in their living rooms.
+
+.. [#] A word of caution: the mechanisms described here use internal attributes and
+ methods, which are subject to change in future versions of Python. We intend to
+ someday provide a common interface for controlling this behavior, which will
+ work in either :mod:`pickle` or :mod:`cPickle`.
+
+.. [#] Since the pickle data format is actually a tiny stack-oriented programming
+ language, and some freedom is taken in the encodings of certain objects, it is
+ possible that the two modules produce different data streams for the same input
+ objects. However, it is guaranteed that they will always be able to read each
+ other's data streams.
+
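The footnote warning about reusing a single :class:`Pickler` instance can be
reproduced with a short sketch. It assumes the class-based :mod:`pickle`
interface and uses an in-memory ``io.BytesIO`` buffer; the variable names are
illustrative only.

```python
import io
import pickle

buf = io.BytesIO()
pickler = pickle.Pickler(buf)

data = [1, 2]
pickler.dump(data)      # the full list is written and remembered in the memo

data.append(3)
pickler.dump(data)      # same object: only a reference to the memo is written

buf.seek(0)
unpickler = pickle.Unpickler(buf)
first = unpickler.load()
second = unpickler.load()   # resolves the reference to the object loaded first
```

Here ``second`` holds the old value ``[1, 2]`` rather than the modified list.
Using a fresh :class:`Pickler` for the second dump, or calling its
``clear_memo()`` method between dumps, causes the modified object to be written
in full.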