author     Bob Ippolito <bob@redivi.com>  2014-08-09 21:08:39 -0700
committer  Bob Ippolito <bob@redivi.com>  2014-08-09 21:08:39 -0700
commit     baf6d6445c289f5d7eb5dbb14de65de48fbf56ad
tree       d35bb235941ce0f5bb6f9c9bdc5ed59191e57505 /index.rst
parent     0ffb14c0b466891821d219175b1a463a7f81dfe1
doc updates from http://bugs.python.org/issue21514 re: #102 (tag: v3.6.2)
Diffstat (limited to 'index.rst')
-rw-r--r--  index.rst  92
1 files changed, 76 insertions, 16 deletions
diff --git a/index.rst b/index.rst
index a959d0a..b0242cc 100644
--- a/index.rst
+++ b/index.rst
@@ -7,9 +7,11 @@
 .. sectionauthor:: Bob Ippolito <bob@redivi.com>

 `JSON (JavaScript Object Notation) <http://json.org>`_, specified by
-:rfc:`4627`, is a lightweight data interchange format based on a subset of
-`JavaScript <http://en.wikipedia.org/wiki/JavaScript>`_ syntax (`ECMA-262 3rd
-edition <http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdf>`_).
+:rfc:`7159` (which obsoletes :rfc:`4627`) and by
+`ECMA-404 <http://www.ecma-international.org/publications/standards/Ecma-404.htm>`_,
+is a lightweight data interchange format inspired by
+`JavaScript <http://en.wikipedia.org/wiki/JavaScript>`_ object literal syntax
+(although it is not a strict subset of JavaScript [#rfc-errata]_ ).

 :mod:`simplejson` exposes an API familiar to users of the standard library
 :mod:`marshal` and :mod:`pickle` modules. It is the externally maintained
@@ -286,7 +288,7 @@ Basic Usage
 .. versionchanged:: 3.5.0
 *int_as_string_bitcount* is new in 3.5.0.

- .. note::
+ .. note::

 JSON is not a framed protocol so unlike :mod:`pickle` or :mod:`marshal` it
 does not make sense to serialize more than one JSON document without some
@@ -777,18 +779,18 @@ Exceptions
 The column corresponding to end (may be ``None``)


-Standard Compliance
--------------------
+Standard Compliance and Interoperability
+----------------------------------------

-The JSON format is specified by :rfc:`4627`. This section details this
-module's level of compliance with the RFC. For simplicity,
-:class:`JSONEncoder` and :class:`JSONDecoder` subclasses, and parameters other
-than those explicitly mentioned, are not considered.
+The JSON format is specified by :rfc:`7159` and by
+`ECMA-404 <http://www.ecma-international.org/publications/standards/Ecma-404.htm>`_.
+This section details this module's level of compliance with the RFC.
+For simplicity, :class:`JSONEncoder` and :class:`JSONDecoder` subclasses, and
+parameters other than those explicitly mentioned, are not considered.

 This module does not comply with the RFC in a strict fashion, implementing some
 extensions that are valid JavaScript but not valid JSON. In particular:

-- Top-level non-object, non-array values are accepted and output;
 - Infinite and NaN number values are accepted and output;
 - Repeated names within an object are accepted, and only the value of the last
   name-value pair is used.
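The two extensions listed in the hunk above can be observed directly. The following is a minimal sketch. It uses the standard library ``json`` module, which was derived from simplejson, so that it runs without extra installs. It assumes that module's defaults match the simplejson behaviour described here, so check them against the simplejson release you actually use.

```python
import json  # stdlib module derived from simplejson, used here as a stand-in

# Infinity and NaN are accepted on input even though they are not valid JSON.
values = json.loads('[Infinity, -Infinity, NaN]')
print(values)              # [inf, -inf, nan]

# ...and they are produced on output by default.
print(json.dumps(values))  # [Infinity, -Infinity, NaN]

# allow_nan=False makes the serializer reject such values instead.
try:
    json.dumps(values, allow_nan=False)
except ValueError as exc:
    print("rejected:", exc)

# Repeated names: only the value of the last name-value pair is kept.
print(json.loads('{"a": 1, "a": 2, "a": 3}'))  # {'a': 3}
```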
@@ -802,7 +804,7 @@ Character Encodings
 ^^^^^^^^^^^^^^^^^^^

 The RFC recommends that JSON be represented using either UTF-8, UTF-16, or
-UTF-32, with UTF-8 being the default.
+UTF-32, with UTF-8 being the recommended default for maximum interoperability.

 As permitted, though not required, by the RFC, this module's serializer sets
 *ensure_ascii=True* by default, thus escaping the output so that the resulting
@@ -810,9 +812,23 @@ strings only contain ASCII characters.

 Other than the *ensure_ascii* parameter, this module is defined strictly in
 terms of conversion between Python objects and
-:class:`Unicode strings <str>`, and thus does not otherwise address the issue
-of character encodings.
-
+:class:`Unicode strings <str>`, and thus does not otherwise directly address
+the issue of character encodings.
+
+The RFC prohibits adding a byte order mark (BOM) to the start of a JSON text,
+and this module's serializer does not add a BOM to its output.
+The RFC permits, but does not require, JSON deserializers to ignore an initial
+BOM in their input. This module's deserializer will ignore an initial BOM, if
+present.
+
+.. versionchanged:: 3.6.0
+ Older versions would raise :exc:`ValueError` when an initial BOM is present
+
+The RFC does not explicitly forbid JSON strings which contain byte sequences
+that don't correspond to valid Unicode characters (e.g. unpaired UTF-16
+surrogates), but it does note that they may cause interoperability problems.
+By default, this module accepts and outputs (when present in the original
+:class:`str`) codepoints for such sequences.

 Infinite and NaN Number Values
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -841,7 +857,7 @@ Repeated Names Within an Object
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The RFC specifies that the names within a JSON object should be unique, but
-does not specify how repeated names in JSON objects should be handled. By
+does not mandate how repeated names in JSON objects should be handled. By
 default, this module does not raise an exception; instead, it ignores all but
 the last name-value pair for a given name::

@@ -851,6 +867,42 @@ the last name-value pair for a given name::

 The *object_pairs_hook* parameter can be used to alter this behavior.

+
+Top-level Non-Object, Non-Array Values
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The old version of JSON specified by the obsolete :rfc:`4627` required that
+the top-level value of a JSON text must be either a JSON object or array
+(Python :class:`dict` or :class:`list`), and could not be a JSON null,
+boolean, number, or string value. :rfc:`7159` removed that restriction, and
+this module does not and has never implemented that restriction in either its
+serializer or its deserializer.
+
+Regardless, for maximum interoperability, you may wish to voluntarily adhere
+to the restriction yourself.
+
+
+Implementation Limitations
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some JSON deserializer implementations may set limits on:
+
+* the size of accepted JSON texts
+* the maximum level of nesting of JSON objects and arrays
+* the range and precision of JSON numbers
+* the content and maximum length of JSON strings
+
+This module does not impose any such limits beyond those of the relevant
+Python datatypes themselves or the Python interpreter itself.
+
+When serializing to JSON, beware any such limitations in applications that may
+consume your JSON. In particular, it is common for JSON numbers to be
+deserialized into IEEE 754 double precision numbers and thus subject to that
+representation's range and precision limitations. This is especially relevant
+when serializing Python :class:`int` values of extremely large magnitude, or
+when serializing instances of "exotic" numerical types such as
+:class:`decimal.Decimal`.
+

 .. highlight:: bash
 .. _json-commandline:
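The hunk above makes three technical points. *object_pairs_hook* can change how repeated names are handled. A caller may voluntarily keep to the old top-level restriction. Large integers can lose precision in consumers that parse numbers as IEEE 754 doubles. The sketch below illustrates all three under stated assumptions. The standard library ``json`` module again stands in for simplejson. ``reject_duplicates`` and ``dumps_container_only`` are hypothetical helpers that belong to neither library. ``parse_int=float`` is used only to simulate a double-based consumer.

```python
import json  # stdlib module derived from simplejson, used here as a stand-in


def reject_duplicates(pairs):
    """Hypothetical object_pairs_hook: raise on a repeated name
    instead of silently keeping the last value."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError("repeated name in JSON object: %r" % (name,))
        result[name] = value
    return result


def dumps_container_only(obj, **kwargs):
    """Hypothetical wrapper that voluntarily applies the obsolete RFC 4627
    rule: the top-level value must be a dict or a list."""
    if not isinstance(obj, (dict, list)):
        raise TypeError("top-level value must be a dict or list, got %s"
                        % type(obj).__name__)
    return json.dumps(obj, **kwargs)


# The hook receives each object's (name, value) pairs in document order.
try:
    json.loads('{"x": 1, "x": 2}', object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)  # repeated name in JSON object: 'x'

# Top-level scalars are accepted by the module itself...
print(json.loads("42"), json.dumps("text"))  # 42 "text"
# ...but the hypothetical wrapper refuses them.
try:
    dumps_container_only("text")
except TypeError as exc:
    print(exc)

# Simulate a consumer that decodes every JSON number as an IEEE 754 double.
big = 2**53 + 1
as_double = json.loads(json.dumps(big), parse_int=float)
print(as_double == big, int(as_double))  # False 9007199254740992
```

Values up to 2**53 survive the round trip through a double exactly. Above that, neighbouring integers collapse onto the same double, as in the example.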
@@ -897,3 +949,11 @@ Command line options

 Write the output of the *infile* to the given *outfile*. Otherwise, write it
 to :attr:`sys.stdout`.
+
+.. rubric:: Footnotes
+
+.. [#rfc-errata] As noted in `the errata for RFC 7159
+ <http://www.rfc-editor.org/errata_search.php?rfc=7159>`_,
+ JSON permits literal U+2028 (LINE SEPARATOR) and
+ U+2029 (PARAGRAPH SEPARATOR) characters in strings, whereas JavaScript
+ (as of ECMAScript Edition 5.1) does not.
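The footnote's point about U+2028 and U+2029 interacts with the *ensure_ascii* default described under Character Encodings. With ``ensure_ascii=True``, every non-ASCII character is escaped, so these separators never appear literally in the output. Below is a minimal sketch that again uses the standard library ``json`` module as a stand-in. The ``ensure_ascii=False`` line describes only the standard library's behaviour and is not a claim about simplejson.

```python
import json  # stdlib module derived from simplejson, used here as a stand-in

doc = {"sep": "\u2028\u2029"}

# Default ensure_ascii=True: both separators are written as \u escapes,
# so the output is pure ASCII.
escaped = json.dumps(doc)
print(escaped)  # {"sep": "\u2028\u2029"}

# With ensure_ascii=False the stdlib module writes the characters literally.
# That is valid JSON, but per the footnote it is not a valid JavaScript
# string literal as of ECMAScript Edition 5.1.
literal = json.dumps(doc, ensure_ascii=False)
print("\u2028" in literal)  # True

# Decoding accepts both forms and yields the same value.
assert json.loads(escaped) == json.loads(literal) == doc
```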