1 files changed, 141 insertions, 137 deletions
diff --git a/docs/unicode.txt b/docs/unicode.txt
index 01fff198c1..2313c28c98 100644
--- a/docs/unicode.txt
+++ b/docs/unicode.txt
@@ -8,24 +8,24 @@ Django natively supports Unicode data everywhere. Providing your database can
 somehow store the data, you can safely pass around Unicode strings to
 templates, models and the database.
 
-This files describes some things to be aware of if you are writing applications
-which do not only use ASCII-encoded data.
+This document tells you what you need to know if you're writing applications
+that use data or templates that are encoded in something other than ASCII.
 
 Creating the database
 =====================
+
 Make sure your database is configured to be able to store arbitrary string
 data. Normally, this means giving it an encoding of UTF-8 or UTF-16. If you use
-a more restrictive encoding -- for example, latin1 (iso8859-1) -- there will be
-some characters that you cannot store in the database and information will be
-lost.
+a more restrictive encoding -- for example, latin1 (iso8859-1) -- you won't be
+able to store certain characters in the database, and information will be lost.
 
- * For MySQL users, refer to the `MySQL manual`_ (section 10.3.2 for MySQL 5.1)
-   for details on how to set or alter the database character set encoding.
+ * MySQL users, refer to the `MySQL manual`_ (section 10.3.2 for MySQL 5.1) for
+   details on how to set or alter the database character set encoding.
 
- * For PostgreSQL users, refer to the `PostgreSQL manual`_ (section 21.2.2 in
+ * PostgreSQL users, refer to the `PostgreSQL manual`_ (section 21.2.2 in
    PostgreSQL 8) for details on creating databases with the correct encoding.
 
- * For SQLite users, there is nothing you need to do. SQLite always uses UTF-8
+ * SQLite users, there is nothing you need to do. SQLite always uses UTF-8
    for internal encoding.
 
 .. _MySQL manual: http://www.mysql.org/doc/refman/5.1/en/charset-database.html
@@ -37,119 +37,119 @@ convert strings retrieved from the database into Python Unicode strings. You
 don't even need to tell Django what encoding your database uses: that is
 handled transparently.
 
+For more, see the section "The database API" below.
+
 General string handling
 =======================
 
-Whenever you use strings with Django, you have two choices. You can use Unicode
-strings or you can use normal strings (sometimes called bytestrings) that are
-encoded using UTF-8.
+Whenever you use strings with Django -- e.g., in database lookups, template
+rendering or anywhere else -- you have two choices for encoding those strings.
+You can use Unicode strings, or you can use normal strings (sometimes called
+"bytestrings") that are encoded using UTF-8.
 
 .. warning::
-    A bytestring does not carry any information with it about its encoding. So
-    we have to make an assumption and Django assumes that all bytestrings are
-    in UTF-8. If you pass a string to Django that has been encoded in some
-    other format, things will go wrong in interesting ways. Usually Django will
-    raise a UnicodeDecodeError at some point.
-
-If your code only uses ASCII data, you are quite safe to simply use your normal
-strings (since ASCII is a subset of UTF-8) and pass them around at will.
-
-Do not be fooled into thinking that if your ``DEFAULT_CHARSET`` setting is set
-to something other than ``utf-8`` you can use that encoding in your
-bytestrings!  The ``DEFAULT_CHARSET`` only applies to the strings generated as
-the result of template rendering (and email). Django will always assume UTF-8
+    A bytestring does not carry any information with it about its encoding.
+    For that reason, we have to make an assumption, and Django assumes that all
+    bytestrings are in UTF-8.
+
+    If you pass a string to Django that has been encoded in some other format,
+    things will go wrong in interesting ways. Usually, Django will raise a
+    ``UnicodeDecodeError`` at some point.
+
+If your code only uses ASCII data, it's safe to use your normal strings,
+passing them around at will, because ASCII is a subset of UTF-8.
+
+Don't be fooled into thinking that if your ``DEFAULT_CHARSET`` setting is set
+to something other than ``'utf-8'`` you can use that other encoding in your
+bytestrings! ``DEFAULT_CHARSET`` only applies to the strings generated as
+the result of template rendering (and e-mail). Django will always assume UTF-8
 encoding for internal bytestrings. The reason for this is that the
 ``DEFAULT_CHARSET`` setting is not actually under your control (if you are the
-application developer). It is under the control of the person installing and
-using your application and if they choose a different setting, your code must
-still continue to work. Ergo, it cannot rely on that setting.
+application developer). It's under the control of the person installing and
+using your application -- and if that person chooses a different setting, your
+code must still continue to work. Ergo, it cannot rely on that setting.
 
 In most cases when Django is dealing with strings, it will convert them to
-Unicode strings before doing anything else. So if you pass in a bytestring, be
-prepared to receive a Unicode string back in the result.
-
-.. _lazy translation:
+Unicode strings before doing anything else. So, as a general rule, if you pass
+in a bytestring, be prepared to receive a Unicode string back in the result.
 
 Translated strings
 ------------------
 
-There is actually a third type of string-like object you may encounter when
-using Django. If you are using the internationalization features of Django,
-there is the concept of a "lazy translation". This is a string that has been
-marked as translated, but the actual result is not determined until the object
-is used in a string. This is useful because the locale that should be used for
-the translation will not be known until the string is used, even though the
-string might have originally been created when the code was first imported.
+Aside from Unicode strings and bytestrings, there's a third type of string-like
+object you may encounter when using Django. The framework's
+internationalization features introduce the concept of a "lazy translation" --
+a string that has been marked as translated but whose actual translation result
+isn't determined until the object is used in a string. This feature is useful
+in cases where the translation locale is unknown until the string is used, even
+though the string might have originally been created when the code was first
+imported.
 
 Normally, you won't have to worry about lazy translations. Just be aware that
 if you examine an object and it claims to be a
 ``django.utils.functional.__proxy__`` object, it is a lazy translation.
-Calling ``unicode()`` with the translation as the argument will generate a
-string in the current locale.
+Calling ``unicode()`` with the lazy translation as the argument will generate a
+Unicode string in the current locale.
 
 For more details about lazy translation objects, refer to the
 internationalization_ documentation.
 
 .. _internationalization: ../i18n/#lazy-translation
 
-.. _utility functions:
-
 Useful utility functions
 ------------------------
 
-Since some string operations come up again and again, Django ships with a few
-useful functions that should make working with unicode and bytestring objects
+Because some string operations come up again and again, Django ships with a few
+useful functions that should make working with Unicode and bytestring objects
 a bit easier.
 
 Conversion functions
 ~~~~~~~~~~~~~~~~~~~~
 
 The ``django.utils.encoding`` module contains a few functions that are handy
-for converting back and forth between unicode and bytestrings.
+for converting back and forth between Unicode and bytestrings.
 
     * ``smart_unicode(s, encoding='utf-8', errors='strict')`` converts its
-      input to unicode string. The ``encoding`` parameter specifies the input
-      encoding of any bytestring -- Django uses this internally when
-      processing form input data, for example, which might not be UTF-8
-      encoded. The ``errors`` parameter takes any of the values that are
-      accepted by Python's ``unicode()`` function for its error handling.
+      input to a Unicode string. The ``encoding`` parameter specifies the input
+      encoding. (For example, Django uses this internally when processing form
+      input data, which might not be UTF-8 encoded.) The ``errors`` parameter
+      takes any of the values that are accepted by Python's ``unicode()``
+      function for its error handling.
 
       If you pass ``smart_unicode()`` an object that has a ``__unicode__``
       method, it will use that method to do the conversion.
 
     * ``force_unicode(s, encoding='utf-8', errors='strict')`` is identical to
       ``smart_unicode()`` in almost all cases. The difference is when the
-      first argument is a `lazy translation`_ instance. Whilst
+      first argument is a `lazy translation`_ instance. While
       ``smart_unicode()`` preserves lazy translations, ``force_unicode()``
-      forces those objects to a unicode string (causing the translation to
-      occur). Normally, you will want to use ``smart_unicode()``. However,
-      ``force_unicode()`` is useful in filters and template tags when you
-      absolutely must have a string to work with, not just something that can
+      forces those objects to a Unicode string (causing the translation to
+      occur). Normally, you'll want to use ``smart_unicode()``. However,
+      ``force_unicode()`` is useful in template tags and filters that
+      absolutely *must* have a string to work with, not just something that can
       be converted to a string.
 
     * ``smart_str(s, encoding='utf-8', strings_only=False, errors='strict')``
       is essentially the opposite of ``smart_unicode()``. It forces the first
-      argument to a string. The ``strings_only`` parameter, if set to True,
+      argument to a bytestring. The ``strings_only`` parameter, if set to True,
       will result in Python integers, booleans and ``None`` not being
       converted to a string (they keep their original types). This is slightly
       different semantics from Python's builtin ``str()`` function, but the
-      difference is needed in a few places internally.
+      difference is needed in a few places within Django's internals.
 
-Normally, you will only need to use ``smart_unicode()``. Call it as early as
-possible on any input data that might be either a unicode or bytestring and
-from then on you can treat the result as always being unicode.
-
-.. _uri_and_iri:
+Normally, you'll only need to use ``smart_unicode()``. Call it as early as
+possible on any input data that might be either Unicode or a bytestring, and
+from then on, you can treat the result as always being Unicode.
 
 URI and IRI handling
 ~~~~~~~~~~~~~~~~~~~~
 
 Web frameworks have to deal with URLs (which are a type of URI_). One
 requirement of URLs is that they are encoded using only ASCII characters.
-However, in an international environment, you will often need to construct a
-URL from an IRI_ (very loosely speaking, a URI that can contain unicode
-characters). Getting the quoting and conversion from IRI to URI correct can be
-a little tricky, so Django provides some assistance.
+However, in an international environment, you might need to construct a
+URL from an IRI_ -- very loosely speaking, a URI that can contain Unicode
+characters. Quoting and converting an IRI to URI can be a little tricky, so
+Django provides some assistance.
 
     * The function ``django.utils.encoding.iri_to_uri()`` implements the
       conversion from IRI to URI as required by the specification (`RFC
@@ -158,9 +158,9 @@ a little tricky, so Django provides some assistance.
     * The functions ``django.utils.http.urlquote()`` and
       ``django.utils.http.urlquote_plus()`` are versions of Python's standard
       ``urllib.quote()`` and ``urllib.quote_plus()`` that work with non-ASCII
-      characters (the data is converted to UTF-8 prior to encoding).
+      characters. (The data is converted to UTF-8 prior to encoding.)
 
-These two groups of functions have slightly different purposes and it is
+These two groups of functions have slightly different purposes, and it's
 important to keep them straight. Normally, you would use ``urlquote()`` on the
 individual portions of the IRI or URI path so that any reserved characters
 such as '&' or '%' are correctly encoded. Then, you apply ``iri_to_uri()`` to
@@ -168,10 +168,9 @@ the full IRI and it converts any non-ASCII characters to the correct encoded
 values.
 
 .. note::
-    It isn't completely correct to say that ``iri_to_uri()`` implements the
-    full algorithm in the IRI specification. It does not perform the
-    international domain name encoding portion of the algorithm (at the
-    moment).
+    Technically, it isn't correct to say that ``iri_to_uri()`` implements the
+    full algorithm in the IRI specification. It doesn't (yet) perform the
+    international domain name encoding portion of the algorithm.
 
 The ``iri_to_uri()`` function will not change ASCII characters that are
 otherwise permitted in a URL. So, for example, the character '%' is not
@@ -208,45 +207,46 @@ double-quoting problems.
 Models
 ======
 
-Because all strings are returned from the database as unicode strings, model
+Because all strings are returned from the database as Unicode strings, model
 fields that are character based (CharField, TextField, URLField, etc) will
-contain unicode values when Django retrieves the model from the database. This
-is always the case, even if the data could fit into an ASCII string.
+contain Unicode values when Django retrieves data from the database. This
+is *always* the case, even if the data could fit into an ASCII bytestring.
 
-As always, you can pass in bytestrings when creating a model or populating a
-field and Django will convert it to unicode when it needs to.
+You can pass in bytestrings when creating a model or populating a field, and
+Django will convert them to Unicode when it needs to.
 
 Choosing between ``__str__()`` and ``__unicode__()``
------------------------------------------------------
-
-One consequence of using unicode by default is that you have to take some care
-when printing data from the model. In particular, rather than writing a
-``__str__()`` method, it is recommended to write a ``__unicode__()`` method for
-your model. In the ``__unicode__()`` method, you can quite safely return the
-values of all your fields without having to worry about whether they fit into a
-bytestring or not (the result of ``__str__()`` is *always* a bytestring, even
-if you accidentally try to return a unicode object).
-
-You can still create a ``__str__()`` method on your models if you wish, of
-course. However, Django's ``Model`` base class automatically provides you with
-a ``__str__()`` method that calls your ``__unicode__()`` method and then
-encodes the result correctly into UTF-8. So you would normally only create a
-``__unicode__()`` method and let Django handle the coercion to a bytestring
-when required.
+----------------------------------------------------
+
+One consequence of using Unicode by default is that you have to take some care
+when printing data from the model.
+
+In particular, rather than giving your model a ``__str__()`` method, we
+recommend you implement a ``__unicode__()`` method. In the ``__unicode__()``
+method, you can quite safely return the values of all your fields without
+having to worry about whether they fit into a bytestring or not. (The way
+Python works, the result of ``__str__()`` is *always* a bytestring, even if you
+accidentally try to return a Unicode object.)
+
+You can still create a ``__str__()`` method on your models if you want, of
+course, but you shouldn't need to do this unless you have a good reason.
+Django's ``Model`` base class automatically provides a ``__str__()``
+implementation that calls ``__unicode__()`` and encodes the result into UTF-8.
+This means you'll normally only need to implement a ``__unicode__()`` method
+and let Django handle the coercion to a bytestring when required.
 
 Taking care in ``get_absolute_url()``
 -------------------------------------
 
-URLs can only contain ASCII characters. If you are constructing a URL from
-pieces of data that might be non-ASCII, you must be careful to encode the
-results in a way that is suitable for a URL. If you are using the
-``django.db.models.permalink()`` decorator, this is handled automatically by
-the decorator.
+URLs can only contain ASCII characters. If you're constructing a URL from
+pieces of data that might be non-ASCII, be careful to encode the results in a
+way that is suitable for a URL. The ``django.db.models.permalink()`` decorator
+handles this for you automatically.
 
-If you are constructing the URL manually, you need to take care of the
-encoding yourself. Normally, this would involve a combination of the
-``iri_to_uri()`` and ``urlquote()`` functions that were documented above_. For
-example::
+If you're constructing a URL manually (i.e., *not* using the ``permalink()``
+decorator), you'll need to take care of the encoding yourself. In this case,
+use the ``iri_to_uri()`` and ``urlquote()`` functions that were documented
+above_. For example::
 
     from django.utils.encoding import iri_to_uri
     from django.utils.http import urlquote
@@ -265,28 +265,31 @@ non-ASCII characters would have been removed in quoting in the first line.)
 The database API
 ================
 
-You can happily pass unicode strings or bytestrings as arguments to
+You can pass either Unicode strings or UTF-8 bytestrings as arguments to
 ``filter()`` methods and the like in the database API. The following two
 querysets are identical::
 
     qs = People.objects.filter(name__contains=u'Å')
     qs = People.objects.filter(name__contains='\xc3\85') # UTF-8 encoding of Å
 
-
 Templates
 =========
 
-As usual, templates can be created from unicode or bytestrings. However, they
-can also be created by reading a file from disk and this creates a slight
-complication: not all filesystems store their data encoded as UTF-8. If your
-template files are not stored with a UTF-8 encoding, set the ``FILE_CHARSET``
-setting to the encoding of the on-disk files. When Django reads in a template
-file it will convert the data from this encoding to unicode.
+You can use either Unicode or bytestrings when creating templates manually::
+
+    from django.template import Template
+    t1 = Template('This is a bytestring template.')
+    t2 = Template(u'This is a Unicode template.')
 
-When a template is rendered for sending out as an HTML document or an e-mail,
-it may be convenient to use an encoding other than UTF-8. You should set the
-``DEFAULT_CHARSET`` parameter to control the rendered template encoding (the
-default setting is utf-8).
+But the common case is to read templates from the filesystem, and this creates
+a slight complication: not all filesystems store their data encoded as UTF-8.
+If your template files are not stored with a UTF-8 encoding, set the
+``FILE_CHARSET`` setting to the encoding of the files on disk. When Django
+reads in a template file, it will convert the data from this encoding to
+Unicode. (``FILE_CHARSET`` is set to ``'utf-8'`` by default.)
+
+The ``DEFAULT_CHARSET`` setting controls the encoding of rendered templates.
+This is set to UTF-8 by default.
 
 Template tags and filters
 -------------------------
@@ -299,18 +302,20 @@ A couple of tips to remember when writing your own template tags and filters:
     * Use ``force_unicode()`` in preference to ``smart_unicode()`` in these
       places. Tag rendering and filter calls occur as the template is being
       rendered, so there is no advantage to postponing the conversion of lazy
-      transation objects into strings any longer. It is easier to work solely
-      with Unicode strings at this point.
+      translation objects into strings. It's easier to work solely with Unicode
+      strings at that point.
 
 E-mail
 ======
 
-Django's email framework (in ``django.core.mail``) supports unicode
-transparently. You can use unicode data in the message bodies and any headers.
-However, you must still respect the requirements of the email specifications,
-so, for example, email addresses should use ASCII characters. The following
-code is certainly possible (demonstrating the everything except e-mail
-addresses can be non-ASCII)::
+Django's e-mail framework (in ``django.core.mail``) supports Unicode
+transparently. You can use Unicode data in the message bodies and any headers.
+However, you're still obligated to respect the requirements of the e-mail
+specifications, so, for example, e-mail addresses should use only ASCII
+characters.
+
+The following code example demonstrates that everything except e-mail addresses
+can be non-ASCII::
 
     from django.core.mail import EmailMessage
 
@@ -320,19 +325,20 @@ addresses can be non-ASCII)::
     body = u'...'
     EmailMessage(subject, body, sender, recipients).send()
 
-
 Form submission
 ===============
 
-HTML form submission is a tricky area. There is no guarantee that the
-submission will include encoding information.
+HTML form submission is a tricky area. There's no guarantee that the
+submission will include encoding information, which means the framework might
+have to guess at the encoding of submitted data.
 
 Django adopts a "lazy" approach to decoding form data. The data in an
 ``HttpRequest`` object is only decoded when you access it. In fact, most of
 the data is not decoded at all. Only the ``HttpRequest.GET`` and
 ``HttpRequest.POST`` data structures have any decoding applied to them. Those
-two fields will return their members as unicode data. All other members will
-be returned exactly as they were submitted by the client.
+two fields will return their members as Unicode data. All other attributes and
+methods of ``HttpRequest`` return data exactly as it was submitted by the
+client.
 
 By default, the ``DEFAULT_CHARSET`` setting is used as the assumed encoding
 for form data. If you need to change this for a particular form, you can set
@@ -346,14 +352,12 @@ does this for you. For example::
         ...
 
 You can even change the encoding after having accessed ``request.GET`` or
-``request.POST`` and all subsequent accesses will use the new encoding.
-
-It will typically be very rare that you would need to worry about changing the
-form encoding. However, if you are talking to a legacy system or a system
-beyond your control with particular ideas about encoding, you do have a way to
-control the decoding of the data.
+``request.POST``, and all subsequent accesses will use the new encoding.
 
-For request features such as file uploads, no automatic decoding takes place,
-because those attributes are normally treated as collections of bytes, rather
-than strings. Any decoding would alter the meaning of the stream of bytes.
+Most developers won't need to worry about changing form encoding, but this is
+a useful feature for applications that talk to legacy systems whose encoding
+you cannot control.
 
+Django does not decode the data of file uploads, because that data is normally
+treated as collections of bytes, rather than strings. Any automatic decoding
+there would alter the meaning of the stream of bytes.
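
The "decode early, assume UTF-8" rule that the patched text describes can be sketched in a few lines. This is only an illustration in Python 3 using the standard library: ``to_text()`` is a hypothetical helper, not Django's ``smart_unicode()``, and it ignores lazy translations and ``__unicode__`` methods entirely.

```python
# Hypothetical helper (an assumption for illustration, not a Django API):
# convert input to text once, as early as possible, assuming UTF-8 for bytes.
def to_text(value, encoding="utf-8", errors="strict"):
    """Return value as text, decoding bytes with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return str(value)


utf8_bytes = "Å".encode("utf-8")      # b'\xc3\x85'
latin1_bytes = "Å".encode("latin-1")  # b'\xc5'

# UTF-8 bytes decode cleanly under the default assumption.
assert to_text(utf8_bytes) == "Å"

# Bytes in another encoding work only if the caller names that encoding.
assert to_text(latin1_bytes, encoding="latin-1") == "Å"

# Without it, the UTF-8 assumption fails, as the warning in the docs says.
try:
    to_text(latin1_bytes)
except UnicodeDecodeError:
    decode_failed = True
else:
    decode_failed = False
```

The explicit ``encoding`` argument plays the role the docs assign to the ``encoding`` parameter of the conversion functions: the caller, not the bytestring, supplies the encoding information.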