User Guide
==========

.. currentmodule:: urllib3

Making requests
---------------

First things first, import the urllib3 module::

    >>> import urllib3

You'll need a :class:`~poolmanager.PoolManager` instance to make requests.
This object handles all of the details of connection pooling and thread safety
so that you don't have to::

    >>> http = urllib3.PoolManager()

To make a request use :meth:`~poolmanager.PoolManager.request`::

    >>> r = http.request('GET', 'http://httpbin.org/robots.txt')
    >>> r.data
    b'User-agent: *\nDisallow: /deny\n'

``request()`` returns a :class:`~response.HTTPResponse` object. The
:ref:`response_content` section explains how to handle various responses.

You can use :meth:`~poolmanager.PoolManager.request` to make requests using any
HTTP verb::

    >>> r = http.request(
    ...     'POST',
    ...     'http://httpbin.org/post',
    ...     fields={'hello': 'world'})

The :ref:`request_data` section covers sending other kinds of request data,
including JSON, files, and binary data.

.. _response_content:

Response content
----------------

The :class:`~response.HTTPResponse` object provides
:attr:`~response.HTTPResponse.status`, :attr:`~response.HTTPResponse.data`, and
:attr:`~response.HTTPResponse.headers` attributes::

    >>> r = http.request('GET', 'http://httpbin.org/ip')
    >>> r.status
    200
    >>> r.data
    b'{\n  "origin": "104.232.115.37"\n}\n'
    >>> r.headers
    HTTPHeaderDict({'Content-Length': '33', ...})

JSON content
~~~~~~~~~~~~

JSON content can be loaded by decoding and deserializing the
:attr:`~response.HTTPResponse.data` attribute of the response::

    >>> import json
    >>> r = http.request('GET', 'http://httpbin.org/ip')
    >>> json.loads(r.data.decode('utf-8'))
    {'origin': '127.0.0.1'}
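
The decoding step can be wrapped in a small helper. The following is a minimal
sketch rather than part of urllib3: the name ``decode_json_body`` is
hypothetical, and it assumes the body is UTF-8 unless another encoding is
given::

    import json


    def decode_json_body(raw, encoding='utf-8'):
        """Decode response bytes (such as ``r.data``) and parse them as JSON."""
        return json.loads(raw.decode(encoding))


    # Sample bytes standing in for ``r.data`` from the example above.
    print(decode_json_body(b'{"origin": "127.0.0.1"}'))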

Binary content
~~~~~~~~~~~~~~

The :attr:`~response.HTTPResponse.data` attribute of the response is always set
to a byte string representing the response content::

    >>> r = http.request('GET', 'http://httpbin.org/bytes/8')
    >>> r.data
    b'\xaa\xa5H?\x95\xe9\x9b\x11'

.. note:: For larger responses, it's sometimes better to :ref:`stream <stream>`
    the response.
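
As a rough sketch of what streaming can look like, the hypothetical helper
below (``save_to_file`` is not a urllib3 function) passes
``preload_content=False`` so the body is not read up front, then copies it to
disk in chunks with ``r.stream()``::

    def save_to_file(http, url, path, chunk_size=1024):
        """Stream the body of ``url`` into ``path``; return the bytes written."""
        # preload_content=False defers reading the body until we ask for it.
        r = http.request('GET', url, preload_content=False)
        written = 0
        try:
            with open(path, 'wb') as out:
                for chunk in r.stream(chunk_size):
                    out.write(chunk)
                    written += len(chunk)
        finally:
            # Hand the connection back to the pool once we are done with it.
            r.release_conn()
        return written

Here ``http`` is expected to be a :class:`~poolmanager.PoolManager`, for
example ``save_to_file(http, 'http://httpbin.org/bytes/1024', 'out.bin')``.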

.. _request_data:

Request data
------------

Headers
~~~~~~~

You can specify headers as a dictionary in the ``headers`` argument to
:meth:`~poolmanager.PoolManager.request`::

    >>> r = http.request(
    ...     'GET',
    ...     'http://httpbin.org/headers',
    ...     headers={
    ...         'X-Something': 'value'
    ...     })
    >>> json.loads(r.data.decode('utf-8'))['headers']
    {'X-Something': 'value', ...}
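
For common headers you do not have to build the dictionary by hand. As a
sketch, ``urllib3.make_headers`` can generate one; the user agent string and
the credentials below are placeholders, not values used elsewhere in this
guide::

    import urllib3

    # Build a headers dictionary with a User-Agent and HTTP basic auth.
    headers = urllib3.make_headers(
        user_agent='my-app/1.0',
        basic_auth='user:password')

    # The result is a plain dict that can be passed as ``headers=`` to
    # ``request()``.
    print(headers)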

Query parameters
~~~~~~~~~~~~~~~~

For ``GET``, ``HEAD``, and ``DELETE`` requests, you can simply pass the
arguments as a dictionary in the ``fields`` argument to
:meth:`~poolmanager.PoolManager.request`::

    >>> r = http.request(
    ...     'GET',
    ...     'http://httpbin.org/get',
    ...     fields={'arg': 'value'})
    >>> json.loads(r.data.decode('utf-8'))['args']
    {'arg': 'value'}

For ``POST`` and ``PUT`` requests, you need to manually encode query parameters
in the URL::

    >>> from urllib.parse import urlencode
    >>> encoded_args = urlencode({'arg': 'value'})
    >>> url = 'http://httpbin.org/post?' + encoded_args
    >>> r = http.request('POST', url)
    >>> json.loads(r.data.decode('utf-8'))['args']
    {'arg': 'value'}
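
If you build such URLs in several places, the encoding can live in one small
function. This is only a sketch using the standard library; ``with_query`` is
a hypothetical name, and it assumes the URL has no fragment::

    from urllib.parse import urlencode


    def with_query(url, params):
        """Append URL-encoded ``params`` to ``url`` as a query string."""
        # Use '&' if the URL already carries a query string, '?' otherwise.
        separator = '&' if '?' in url else '?'
        return url + separator + urlencode(params)


    print(with_query('http://httpbin.org/post', {'arg': 'value'}))

The returned string can be passed directly as the URL of a ``POST`` or ``PUT``
request.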


.. _form_data:

Form data
~~~~~~~~~

For ``PUT`` and ``POST`` requests, urllib3 will automatically form-encode the
dictionary in the ``fields`` argument provided to
:meth:`~poolmanager.PoolManager.request`::

    >>> r = http.request(
    ...     'POST',
    ...     'http://httpbin.org/post',
    ...     fields={'field': 'value'})
    >>> json.loads(r.data.decode('utf-8'))['form']
    {'field': 'value'}

JSON
~~~~

You can send JSON in a request by specifying the encoded data as the ``body``
argument and setting the ``Content-Type`` header when calling
:meth:`~poolmanager.PoolManager.request`::

    >>> import json
    >>> data = {'attribute': 'value'}
    >>> encoded_data = json.dumps(data).encode('utf-8')
    >>> r = http.request(
    ...     'POST',
    ...     'http://httpbin.org/post',
    ...     body=encoded_data,
    ...     headers={'Content-Type': 'application/json'})
    >>> json.loads(r.data.decode('utf-8'))['json']
    {'attribute': 'value'}
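
To avoid repeating the encoding and the header on every call, both can be
produced together. The helper below is a sketch under that assumption;
``json_request_kwargs`` is a hypothetical name, not a urllib3 API::

    import json


    def json_request_kwargs(payload):
        """Return ``body`` and ``headers`` keyword arguments for a JSON request."""
        return {
            'body': json.dumps(payload).encode('utf-8'),
            'headers': {'Content-Type': 'application/json'},
        }


    kwargs = json_request_kwargs({'attribute': 'value'})
    # Intended use: http.request('POST', 'http://httpbin.org/post', **kwargs)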

Files & binary data
~~~~~~~~~~~~~~~~~~~

For uploading files using ``multipart/form-data`` encoding you can use the same
approach as :ref:`form_data` and specify the file field as a tuple of
``(file_name, file_data)``::

    >>> with open('example.txt') as fp:
    ...     file_data = fp.read()
    >>> r = http.request(
    ...     'POST',
    ...     'http://httpbin.org/post',
    ...     fields={
    ...         'filefield': ('example.txt', file_data),
    ...     })
    >>> json.loads(r.data.decode('utf-8'))['files']
    {'filefield': '...'}

While specifying the filename is not strictly required, it's recommended in
order to match browser behavior. You can also pass a third item in the tuple
to specify the file's MIME type explicitly::

    >>> r = http.request(
    ...     'POST',
    ...     'http://httpbin.org/post',
    ...     fields={
    ...         'filefield': ('example.txt', file_data, 'text/plain'),
    ...     })
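
The three-item tuple can also be built from a path. The following sketch uses
the standard-library :mod:`mimetypes` module to guess the MIME type;
``file_field`` is a hypothetical helper, and the fallback type is an
assumption rather than something urllib3 requires::

    import mimetypes
    import os


    def file_field(path):
        """Build a ``(file_name, file_data, mime_type)`` tuple for ``fields``."""
        # Read in binary mode so the contents are uploaded unchanged.
        with open(path, 'rb') as fp:
            file_data = fp.read()
        # guess_type() returns (type, encoding); fall back to a generic type.
        mime_type = mimetypes.guess_type(path)[0] or 'application/octet-stream'
        return (os.path.basename(path), file_data, mime_type)

It would be used as ``fields={'filefield': file_field('example.txt')}``.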

For sending raw binary data simply specify the ``body`` argument. It's also
recommended to set the ``Content-Type`` header::

    >>> with open('example.jpg', 'rb') as fp:
    ...     binary_data = fp.read()
    >>> r = http.request(
    ...     'POST',
    ...     'http://httpbin.org/post',
    ...     body=binary_data,
    ...     headers={'Content-Type': 'image/jpeg'})
    >>> json.loads(r.data.decode('utf-8'))['data']
    '...'

.. _ssl:

Certificate verification
------------------------

It is highly recommended to always use SSL certificate verification.
**By default, urllib3 does not verify HTTPS requests**.

In order to enable verification you will need a set of root certificates. The
easiest and most reliable method is to use the
`certifi <https://certifi.io/en/latest>`_ package, which provides Mozilla's
root certificate bundle::

    pip install certifi

You can also install certifi along with urllib3 by using the ``secure``
extra::

    pip install urllib3[secure]

.. warning:: If you're using Python 2 you may need additional packages. See the :ref:`section below <ssl_py2>` for more details.

Once you have certificates, you can create a :class:`~poolmanager.PoolManager`
that verifies certificates when making requests::

    >>> import certifi
    >>> import urllib3
    >>> http = urllib3.PoolManager(
    ...     cert_reqs='CERT_REQUIRED',
    ...     ca_certs=certifi.where())

The :class:`~poolmanager.PoolManager` will automatically handle certificate
verification and will raise :class:`~exceptions.SSLError` if verification fails::

    >>> http.request('GET', 'https://google.com')
    (No exception)
    >>> http.request('GET', 'https://expired.badssl.com')
    urllib3.exceptions.SSLError ...

.. note:: You can use OS-provided certificates if desired. Just specify the full
    path to the certificate bundle as the ``ca_certs`` argument instead of
    ``certifi.where()``. For example, most Linux systems store the certificates
    at ``/etc/ssl/certs/ca-certificates.crt``. Other operating systems can
    be `difficult <https://stackoverflow.com/questions/10095676/openssl-reasonable-default-for-trusted-ca-certificates>`_.
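
If you are unsure where the OS bundle lives, the standard library can report
the location OpenSSL was built to use. This is only a sketch:
``system_ca_bundle`` is a hypothetical helper, and the path may be missing on
some platforms, in which case it returns ``None``::

    import ssl


    def system_ca_bundle():
        """Return the path of OpenSSL's default CA file, or None if not found."""
        paths = ssl.get_default_verify_paths()
        # ``cafile`` is None when the default bundle does not exist here.
        return paths.cafile


    ca_certs = system_ca_bundle()
    # Intended use, when a path was found:
    #   urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=ca_certs)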

.. _ssl_py2:

Certificate verification in Python 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Older versions of Python 2 are built with an :mod:`ssl` module that lacks
:ref:`SNI support <sni_warning>` and can lag behind security updates. For these reasons it's recommended to use
`pyOpenSSL <https://pyopenssl.readthedocs.io/en/latest/>`_.

If you install urllib3 with the ``secure`` extra, all required packages for
certificate verification on Python 2 will be installed::

    pip install urllib3[secure]

If you want to install the packages manually, you will need ``pyOpenSSL``,
``cryptography``, ``idna``, and ``certifi``.

.. note:: If you are not using macOS or Windows, note that `cryptography
    <https://cryptography.io/en/latest/>`_ requires additional system packages
    to compile. See `building cryptography on Linux
    <https://cryptography.io/en/latest/installation/#building-cryptography-on-linux>`_
    for the list of packages required.

Once installed, you can tell urllib3 to use pyOpenSSL by using :mod:`urllib3.contrib.pyopenssl`::

    >>> import urllib3.contrib.pyopenssl
    >>> urllib3.contrib.pyopenssl.inject_into_urllib3()

Finally, you can create a :class:`~poolmanager.PoolManager` that verifies
certificates when performing requests::

    >>> import certifi
    >>> import urllib3
    >>> http = urllib3.PoolManager(
    ...     cert_reqs='CERT_REQUIRED',
    ...     ca_certs=certifi.where())

If you do not wish to use pyOpenSSL, you can simply omit the call to
:func:`urllib3.contrib.pyopenssl.inject_into_urllib3`. urllib3 will fall back
to the standard-library :mod:`ssl` module. You may experience
:ref:`several warnings <ssl_warnings>` when doing this.

.. warning:: If you do not use pyOpenSSL, Python must be compiled with ssl
    support for certificate verification to work. It is uncommon, but it is
    possible to compile Python without SSL support. See this
    `Stack Overflow thread <https://stackoverflow.com/questions/5128845/importerror-no-module-named-ssl>`_
    for more details.

    If you are on Google App Engine, you must explicitly enable SSL
    support in your ``app.yaml``::

        libraries:
        - name: ssl
          version: latest

Using timeouts
--------------

Timeouts allow you to control how long requests are allowed to run before
being aborted. In simple cases, you can specify a timeout as a ``float``
to :meth:`~poolmanager.PoolManager.request`::

    >>> http.request(
    ...     'GET', 'http://httpbin.org/delay/3', timeout=4.0)
    <urllib3.response.HTTPResponse>
    >>> http.request(
    ...     'GET', 'http://httpbin.org/delay/3', timeout=2.5)
    MaxRetryError caused by ReadTimeoutError

For more granular control you can use a :class:`~util.timeout.Timeout`
instance which lets you specify separate connect and read timeouts::

    >>> http.request(
    ...     'GET',
    ...     'http://httpbin.org/delay/3',
    ...     timeout=urllib3.Timeout(connect=1.0))
    <urllib3.response.HTTPResponse>
    >>> http.request(
    ...     'GET',
    ...     'http://httpbin.org/delay/3',
    ...     timeout=urllib3.Timeout(connect=1.0, read=2.0))
    MaxRetryError caused by ReadTimeoutError


If you want all requests to be subject to the same timeout, you can specify
the timeout at the :class:`~urllib3.poolmanager.PoolManager` level::

    >>> http = urllib3.PoolManager(timeout=3.0)
    >>> http = urllib3.PoolManager(
    ...     timeout=urllib3.Timeout(connect=1.0, read=2.0))

You can still override this pool-level timeout by specifying ``timeout`` to
:meth:`~poolmanager.PoolManager.request`.
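
The examples above show a timeout surfacing as ``MaxRetryError`` caused by
``ReadTimeoutError``. As a sketch of how calling code might tell the two kinds
of timeout apart, the hypothetical helper ``timeout_kind`` below looks at the
``reason`` attribute of ``MaxRetryError``, and also accepts the bare timeout
exceptions that are raised when retries are disabled::

    from urllib3.exceptions import (
        ConnectTimeoutError, MaxRetryError, ReadTimeoutError)


    def timeout_kind(error):
        """Return 'connect', 'read', or None for an exception from request()."""
        # With retries enabled, the timeout is wrapped in MaxRetryError.reason.
        cause = error.reason if isinstance(error, MaxRetryError) else error
        if isinstance(cause, ConnectTimeoutError):
            return 'connect'
        if isinstance(cause, ReadTimeoutError):
            return 'read'
        return None

A caller would wrap ``http.request(...)`` in ``try``/``except`` and pass the
caught exception to ``timeout_kind``.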

Retrying requests
-----------------

urllib3 can automatically retry idempotent requests. This same mechanism also
handles redirects. You can control the retries using the ``retries`` parameter
to :meth:`~poolmanager.PoolManager.request`. By default, urllib3 will retry
requests 3 times and follow up to 3 redirects.

To change the number of retries just specify an integer::

    >>> http.request('GET', 'http://httpbin.org/ip', retries=10)

To disable all retry and redirect logic specify ``retries=False``::

    >>> http.request(
    ...     'GET', 'http://nxdomain.example.com', retries=False)
    NewConnectionError
    >>> r = http.request(
    ...     'GET', 'http://httpbin.org/redirect/1', retries=False)
    >>> r.status
    302

To disable redirects but keep the retrying logic, specify ``redirect=False``::

    >>> r = http.request(
    ...     'GET', 'http://httpbin.org/redirect/1', redirect=False)
    >>> r.status
    302

For more granular control you can use a :class:`~util.retry.Retry` instance.
This class allows you far greater control of how requests are retried.

For example, to do a total of 3 retries, but limit to only 2 redirects::

    >>> http.request(
    ...     'GET',
    ...     'http://httpbin.org/redirect/3',
    ...     retries=urllib3.Retry(3, redirect=2))
    MaxRetryError

You can also disable exceptions for too many redirects and just return the
``302`` response::

    >>> r = http.request(
    ...     'GET',
    ...     'http://httpbin.org/redirect/3',
    ...     retries=urllib3.Retry(
    ...         redirect=2, raise_on_redirect=False))
    >>> r.status
    302

If you want all requests to be subject to the same retry policy, you can
specify the retry at the :class:`~urllib3.poolmanager.PoolManager` level::

    >>> http = urllib3.PoolManager(retries=False)
    >>> http = urllib3.PoolManager(
    ...     retries=urllib3.Retry(5, redirect=2))

You can still override this pool-level retry policy by specifying ``retries`` to
:meth:`~poolmanager.PoolManager.request`.
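
Beyond redirect limits, :class:`~util.retry.Retry` accepts further options.
The sketch below combines ``backoff_factor`` and ``status_forcelist``; the
specific numbers and status codes are illustrative assumptions, not
recommended defaults::

    import urllib3

    # Allow up to 5 retries with a growing delay between attempts, and also
    # retry when the server answers with one of the listed status codes
    # (for the request methods urllib3 considers safe to retry).
    retries = urllib3.Retry(
        total=5,
        backoff_factor=0.5,
        status_forcelist=[502, 503, 504])

    http = urllib3.PoolManager(retries=retries)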

Errors & Exceptions
-------------------

urllib3 wraps lower-level exceptions, for example::

    >>> try:
    ...     http.request('GET', 'http://nx.example.com', retries=False)
    ... except urllib3.exceptions.NewConnectionError:
    ...     print('Connection failed.')

See :mod:`~urllib3.exceptions` for the full list of all exceptions.
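
When the exact failure does not matter, catching the common base class is
often enough. The wrapper below is a sketch; ``request_or_none`` is a
hypothetical name, and it relies on ``urllib3.exceptions.HTTPError`` being the
base class of most of urllib3's own exceptions::

    from urllib3.exceptions import HTTPError


    def request_or_none(http, method, url, **kwargs):
        """Call ``http.request`` and return None if urllib3 reports an error."""
        try:
            return http.request(method, url, **kwargs)
        except HTTPError as error:
            # Covers NewConnectionError, MaxRetryError, timeouts, and so on.
            print('Request failed: {!r}'.format(error))
            return None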

Logging
-------

If you are using the standard library :mod:`logging` module, urllib3 will
emit several log messages. In some cases this can be undesirable. You can use
the standard logger interface to change the log level for urllib3's logger::

    >>> import logging
    >>> logging.getLogger("urllib3").setLevel(logging.WARNING)