Unicode
=======

Python 2.x
----------

Under Python 2, the string type can hold arbitrary encoded byte strings.
PycURL will pass whatever byte strings it is given verbatim to libcurl.

If your application works with encoded byte strings, you should be able to
pass them to PycURL. If your application works with Unicode data, you need to
encode the data to byte strings yourself. Which encoding to use depends on
the protocol you are working with: HTTP headers should be encoded in latin1,
while HTTP request bodies are commonly encoded in UTF-8, with the encoding
declared in the Content-Type header value.
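
As a minimal sketch of the encoding step described above, the following
prepares a header line and a JSON request body as byte strings. The header
name ``X-Author`` and the payload are hypothetical, and the snippet assumes
the server expects a UTF-8 encoded JSON body.

.. code-block:: python

    import json

    # Hypothetical Unicode values held by the application.
    author = u"Ren\u00e9"                   # representable in latin1
    payload = {u"city": u"\u6771\u4eac"}    # not representable in latin1

    # HTTP headers: encode as latin1 (ISO-8859-1).
    author_header = (u"X-Author: " + author).encode("latin1")

    # HTTP body: encode as UTF-8 and declare the charset in Content-Type.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    content_type_header = b"Content-Type: application/json; charset=utf-8"

The resulting byte strings are what would then be handed to PycURL, for
instance the two header lines in a list for ``HTTPHEADER`` and ``body`` for
``POSTFIELDS``.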

Prior to PycURL 7.19.3, PycURL did not accept Unicode data under Python 2.
Even Unicode strings containing only ASCII code points had to be encoded to
byte strings.

As of PycURL 7.19.3, for compatibility with Python 3, PycURL will accept
Unicode strings under Python 2 provided they contain ASCII code points only.
In other words, PycURL will encode Unicode into ASCII for you. If you supply
a Unicode string containing characters that are outside of ASCII, the call will
fail with a UnicodeEncodeError.

PycURL will return data from libcurl, like response bodies and header values,
as byte strings. If the data is ASCII, you can treat it as string data.
Otherwise you will need to decode the byte strings using the correct encoding.
Which encoding is correct depends on the protocol and potentially on the
returned data itself: HTTP response headers are supposed to be latin1 encoded,
whereas the encoding of the response body is specified in the Content-Type
header.

Python 3.x (from PycURL 7.19.3 onward)
--------------------------------------

Under Python 3, the rules are as follows:

PycURL will accept bytes for any string data passed to libcurl (e.g.
arguments to curl_easy_setopt).

PycURL will accept (Unicode) strings for string arguments to curl_easy_setopt.
libcurl generally expects the options to be already appropriately encoded
and escaped (e.g. for CURLOPT_URL). Also, Python 2 code dealing with
Unicode values for string options will typically manually encode such values.
Therefore PycURL will attempt to encode Unicode strings with the ascii codec
only, allowing the application to pass ASCII data in a straightforward manner
but requiring Unicode data to be appropriately encoded.
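
To illustrate the rule above, here is a minimal sketch in which the ``ascii``
codec stands in for the conversion PycURL attempts on Unicode strings. The
URL is hypothetical, and percent-encoding the path as UTF-8 is an assumption
about what the server expects.

.. code-block:: python

    from urllib.parse import quote

    path = "/caf\u00e9/menu"    # hypothetical non-ASCII path

    # A plain ASCII conversion fails for this string.
    try:
        path.encode("ascii")
        ascii_ok = True
    except UnicodeEncodeError:
        ascii_ok = False

    # Percent-encode the path (UTF-8 by default) so the URL is pure ASCII.
    url = "http://example.com" + quote(path)
    url.encode("ascii")         # succeeds now

A string like ``url`` contains only ASCII code points, so it could be passed
as a Unicode string to an option such as ``URL``.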

It may be helpful to remember that libcurl operates on byte arrays.
It is a C library and does not do any Unicode encoding or decoding, leaving
that task to the application using it. PycURL, being a thin wrapper around
libcurl, passes the Unicode encoding and decoding responsibilities to you
except for the trivial case of encoding Unicode data containing only ASCII
characters into ASCII.

Caution: when using CURLOPT_READFUNCTION in tandem with CURLOPT_POSTFIELDSIZE,
as would be done for HTTP for example, take care to pass the length of
encoded data to CURLOPT_POSTFIELDSIZE if you are doing the encoding from
Unicode strings. If you pass the number of Unicode characters rather than
encoded bytes to libcurl, the server will receive a wrong Content-Length.
Alternatively you can return Unicode strings from a CURLOPT_READFUNCTION
function, if you are certain they will only contain ASCII code points.
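
The difference between character count and byte count can be sketched as
follows. The request text is hypothetical, and ``configure_post`` is an
illustrative helper, not part of PycURL; it defers the ``pycurl`` import so
that the counting part runs even without PycURL installed.

.. code-block:: python

    from io import BytesIO

    text = "na\u00efve caf\u00e9"    # hypothetical body, 10 characters
    body = text.encode("utf-8")      # 12 bytes: two characters take 2 bytes

    char_count = len(text)
    byte_count = len(body)           # the value POSTFIELDSIZE needs

    # A read callback that hands out the already encoded bytes.
    read_callback = BytesIO(body).read


    def configure_post(curl):
        """Apply the upload options to a pycurl.Curl object."""
        import pycurl  # deferred import, see the note above

        curl.setopt(pycurl.POST, 1)
        curl.setopt(pycurl.POSTFIELDSIZE, byte_count)
        curl.setopt(pycurl.READFUNCTION, read_callback)

Passing ``char_count`` instead of ``byte_count`` here would announce a body
two bytes shorter than the one actually sent.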

If encoding to ASCII fails, PycURL will return an error to libcurl, and
libcurl in turn will fail the request with an error like
"read function error/data error". You may examine sys.last_value for
information on the exception that occurred during encoding in this case.

PycURL will return all data read from the network as bytes. In particular,
this means that BytesIO should be used rather than StringIO for writing the
response to memory. A header function will also receive bytes.
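
A minimal sketch of the decoding step follows. ``decode_body`` is a
hypothetical helper, the ``BytesIO`` buffer and header value are stand-ins for
what a write target and a header function would collect, and falling back to
ISO-8859-1 when no charset is declared is an assumption rather than something
PycURL does.

.. code-block:: python

    from email.message import Message
    from io import BytesIO


    def decode_body(body, content_type, default="iso-8859-1"):
        """Decode response bytes using the charset named in Content-Type."""
        msg = Message()
        # The header value arrives as bytes; decode it as latin1 first.
        msg["Content-Type"] = content_type.decode("iso-8859-1")
        charset = msg.get_content_charset() or default
        return body.decode(charset)


    # Stand-ins for data collected during a transfer.
    buffer = BytesIO()
    buffer.write("gr\u00fc\u00df".encode("utf-8"))
    header_value = b"text/plain; charset=UTF-8"

    text = decode_body(buffer.getvalue(), header_value)

Note that ``decode_body`` expects only the header value; a header function
receives whole header lines, so the name and trailing line ending would need
to be split off first.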

Because PycURL does not perform encoding or decoding, other than to ASCII,
any file objects that PycURL is meant to interact with via CURLOPT_READDATA,
CURLOPT_WRITEDATA, CURLOPT_WRITEHEADER, CURLOPT_READFUNCTION,
CURLOPT_WRITEFUNCTION or CURLOPT_HEADERFUNCTION must be opened in binary
mode ("b" flag to open() call).
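
The reason for the binary mode requirement can be sketched without any
network transfer. Here a plain ``write()`` call stands in for what happens
when bytes are written to the file object; the file name and data are
hypothetical.

.. code-block:: python

    import os
    import tempfile

    chunk = b"caf\xc3\xa9"    # bytes, as they would arrive from the network

    path = os.path.join(tempfile.mkdtemp(), "response.bin")

    # A file opened in text mode rejects bytes under Python 3.
    try:
        with open(path, "w") as f:
            f.write(chunk)
        text_mode_ok = True
    except TypeError:
        text_mode_ok = False

    # A file opened in binary mode stores the bytes unchanged.
    with open(path, "wb") as f:
        f.write(chunk)

    with open(path, "rb") as f:
        stored = f.read()

A file object opened with ``"wb"`` as above is the kind of object suitable
for an option such as ``WRITEDATA``.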

Python 3.x before PycURL 7.19.3
-------------------------------

PycURL did not have official Python 3 support prior to PycURL 7.19.3.
There were two patches on SourceForge (original_, revised_)
adding Python 3 support, but they did not handle Unicode strings correctly.
Instead of using Python encoding functionality, these patches used
C standard library Unicode-to-multibyte conversion functions, and thus
they may behave the same as Python encoding code or entirely differently.

Python 3 support as implemented in PycURL 7.19.3 and documented here
does not, as mentioned, actually perform any encoding other than to convert
Unicode strings containing only ASCII code points to ASCII byte strings.

Linux distributions that offered Python 3 packages of PycURL prior to 7.19.3
used the SourceForge patches and may behave in ways contradictory to what is
described in this document.

.. _original: http://sourceforge.net/p/pycurl/patches/5/
.. _revised: http://sourceforge.net/p/pycurl/patches/12/