natsort
=======

.. image:: https://travis-ci.org/SethMMorton/natsort.svg?branch=master
    :target: https://travis-ci.org/SethMMorton/natsort

.. image:: https://coveralls.io/repos/SethMMorton/natsort/badge.png?branch=master
    :target: https://coveralls.io/r/SethMMorton/natsort?branch=master

Natural sorting for Python.

    - Source Code: https://github.com/SethMMorton/natsort
    - Downloads: https://pypi.python.org/pypi/natsort
    - Documentation: http://pythonhosted.org/natsort
    - `Optional Dependencies`_

      - `fastnumbers <https://pypi.python.org/pypi/fastnumbers>`_ >= 0.7.1
      - `PyICU <https://pypi.python.org/pypi/PyICU>`_ >= 1.0.0

Quick Description
-----------------

When you try to sort a list of strings that contain numbers, Python's built-in
sort orders them lexicographically, so you might not get the results that you
expect:

.. code-block:: python

    >>> a = ['a2', 'a9', 'a1', 'a4', 'a10']
    >>> sorted(a)
    ['a1', 'a10', 'a2', 'a4', 'a9']

Notice that it has the order ('1', '10', '2'). This is because the list is
being sorted in lexicographical order, which compares digits character by
character, just as it would letters (e.g. 'b', 'ba', 'c').

``natsort`` provides a function ``natsorted`` that helps sort lists
"naturally", either as real numbers (i.e. signed/unsigned floats or ints),
or as versions.  Using ``natsorted`` is simple:

.. code-block:: python

    >>> from natsort import natsorted
    >>> a = ['a2', 'a9', 'a1', 'a4', 'a10']
    >>> natsorted(a)
    ['a1', 'a2', 'a4', 'a9', 'a10']
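
As a rough illustration of the idea, and not ``natsort``'s actual
implementation, a natural sort key can be sketched with the standard library
by splitting each string into text and digit chunks. The helper name
``simple_natural_key`` is hypothetical:

.. code-block:: python

    import re

    def simple_natural_key(text):
        """Split text into tagged chunks so that digit runs compare as ints."""
        # With a capture group, re.split alternates: text, digits, text, ...
        chunks = re.split(r'(\d+)', text)
        # Odd positions hold the digit runs. Tag every chunk so that
        # numbers and text are never compared with each other directly.
        return [(0, int(c)) if i % 2 else (1, c)
                for i, c in enumerate(chunks) if c]

    a = ['a2', 'a9', 'a1', 'a4', 'a10']
    result = sorted(a, key=simple_natural_key)
    print(result)  # ['a1', 'a2', 'a4', 'a9', 'a10']

This sketch only handles unsigned integers; ``natsorted`` covers many more
cases.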

``natsorted`` identifies numbers anywhere in a string and sorts them
naturally. Here are some other things you can do with ``natsort``
(please see the `examples <http://pythonhosted.org//natsort/examples.html>`_
for a quick start guide, or the
`api <http://pythonhosted.org//natsort/api.html>`_ for more details).

Sorting Versions
++++++++++++++++

This is handled properly by default (as of ``natsort`` version 4.0.0):

.. code-block:: python

    >>> a = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10']
    >>> natsorted(a)
    ['version-1.9', 'version-1.10', 'version-1.11', 'version-2.0']

If you need to sort release candidates, please see
http://pythonhosted.org//natsort/examples.html#rc-sorting for a useful hack.

Sorting by Real Numbers (i.e. Signed Floats)
++++++++++++++++++++++++++++++++++++++++++++

This is useful in scientific data analysis and was
the default behavior of ``natsorted`` for ``natsort``
version < 4.0.0. Use the ``realsorted`` function:

.. code-block:: python

    >>> from natsort import realsorted, ns
    >>> a = ['num5.10', 'num-3', 'num5.3', 'num2']
    >>> natsorted(a)
    ['num2', 'num5.3', 'num5.10', 'num-3']
    >>> natsorted(a, alg=ns.REAL)
    ['num-3', 'num2', 'num5.10', 'num5.3']
    >>> realsorted(a)  # shortcut for natsorted with alg=ns.REAL
    ['num-3', 'num2', 'num5.10', 'num5.3']
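
One way to picture what ``ns.REAL`` changes is that a leading sign and a
decimal point are treated as part of the number. The sketch below is only an
approximation using the standard library (it ignores exponents, for instance)
and is not how ``natsort`` is implemented; ``simple_real_key`` is a
hypothetical name:

.. code-block:: python

    import re

    # Optional sign, digits, optional fractional part.
    SIGNED_FLOAT = re.compile(r'([-+]?\d+(?:\.\d+)?)')

    def simple_real_key(text):
        """Split text into tagged chunks, converting signed decimals to float."""
        chunks = SIGNED_FLOAT.split(text)
        # Odd positions hold the captured numbers. Tag every chunk so that
        # numbers and text are never compared with each other directly.
        return [(0, float(c)) if i % 2 else (1, c)
                for i, c in enumerate(chunks) if c]

    a = ['num5.10', 'num-3', 'num5.3', 'num2']
    result = sorted(a, key=simple_real_key)
    print(result)  # ['num-3', 'num2', 'num5.10', 'num5.3']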

Locale-Aware Sorting (or "Human Sorting")
+++++++++++++++++++++++++++++++++++++++++

This is where the non-numeric characters are ordered based on their meaning,
not on their ordinal value, and a locale-dependent thousands separator
is accounted for in the number.
This can be achieved with the ``humansorted`` function:

.. code-block:: python

    >>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
    >>> natsorted(a)
    ['Apple', 'Banana', 'apple14,689', 'apple15', 'banana']
    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
    'en_US.UTF-8'
    >>> natsorted(a, alg=ns.LOCALE)
    ['apple15', 'apple14,689', 'Apple', 'banana', 'Banana']
    >>> from natsort import humansorted
    >>> humansorted(a)  # shortcut for natsorted with alg=ns.LOCALE
    ['apple15', 'apple14,689', 'Apple', 'banana', 'Banana']

You may find you need to explicitly set the locale to get this to work
(as shown in the example).
Please see http://pythonhosted.org/natsort/locale_issues.html and the
Installation section below before using the ``humansorted`` function.

Sorting Mixed Types
+++++++++++++++++++

You can mix and match ``int``, ``float``, and ``str`` (or ``unicode``) types
when you sort:

.. code-block:: python

    >>> a = ['4.5', 6, 2.0, '5', 'a']
    >>> natsorted(a)
    [2.0, '4.5', '5', 6, 'a']
    >>> # On Python 2, sorted(a) would return [2.0, 6, '4.5', '5', 'a']
    >>> # On Python 3, sorted(a) would raise an "unorderable types" TypeError

Handling Bytes on Python 3
++++++++++++++++++++++++++

``natsort`` does not officially support the ``bytes`` type on Python 3, but
convenience functions are provided that help you decode to ``str`` first:

.. code-block:: python

    >>> from natsort import as_utf8
    >>> a = [b'a', 14.0, 'b']
    >>> # On Python 2, natsorted(a) would work as expected.
    >>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
    >>> natsorted(a, key=as_utf8) == [14.0, b'a', 'b']
    True
    >>> a = [b'a56', b'a5', b'a6', b'a40']
    >>> # On Python 2, natsorted(a) would work as expected.
    >>> # On Python 3, natsorted(a) would return the same results as sorted(a)
    >>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56']
    True
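
The underlying idea is to decode ``bytes`` to ``str`` inside the sorting key
so that every element is compared as text. A standard-library sketch of that
idea follows; the helper ``decode_if_bytes`` is hypothetical and is not
``natsort``'s ``as_utf8``:

.. code-block:: python

    def decode_if_bytes(value, encoding='utf-8'):
        """Return value decoded to str if it is bytes, otherwise unchanged."""
        if isinstance(value, bytes):
            return value.decode(encoding)
        return value

    a = [b'pear', 'apple', b'fig']
    # Without the key, Python 3 raises TypeError when comparing bytes with str.
    result = sorted(a, key=decode_if_bytes)
    print(result)  # ['apple', b'fig', b'pear']

The key only affects how elements are compared, so the original ``bytes``
objects are returned unchanged.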

Generating a Reusable Sorting Key
+++++++++++++++++++++++++++++++++

All of the ``*sorted`` functions from ``natsort`` are convenience wrappers
around something similar to the following:

.. code-block:: python

    >>> from natsort import natsort_keygen
    >>> natsort_key = natsort_keygen()
    >>> a = ['a2', 'a9', 'a1', 'a4', 'a10']
    >>> natsorted(a) == sorted(a, key=natsort_key)
    True

You can use this key for your own purposes (such as passing it to ``list.sort``).
You can also customize the key with the ``ns`` enum
(see `the ns enum <http://pythonhosted.org//natsort/ns_class.html>`_).
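
For instance, a key can be generated once and reused for in-place sorting.
This short example assumes ``natsort`` is installed and uses ``ns.IGNORECASE``
as one possible customization; the sample lists are made up for illustration:

.. code-block:: python

    from natsort import natsort_keygen, ns

    # Build the keys once and reuse them wherever a key function is accepted.
    natsort_key = natsort_keygen()
    ignorecase_key = natsort_keygen(alg=ns.IGNORECASE)

    a = ['a2', 'a9', 'a1', 'a4', 'a10']
    a.sort(key=natsort_key)  # in-place sort
    print(a)  # ['a1', 'a2', 'a4', 'a9', 'a10']

    b = ['Banana', 'apple', 'Cherry']
    b.sort(key=ignorecase_key)
    print(b)  # ['apple', 'Banana', 'Cherry']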

Other Useful Things
+++++++++++++++++++

 - recursively descending into lists of lists
 - controlling the case-sensitivity (see http://pythonhosted.org//natsort/examples.html#case-sort)
 - sorting file paths correctly (see http://pythonhosted.org//natsort/examples.html#path-sort)
 - using custom sorting keys (see http://pythonhosted.org//natsort/examples.html#custom-sort)

Shell script
------------

``natsort`` comes with a shell script called ``natsort``, which can also be run
from the command line with ``python -m natsort``.

Requirements
------------

``natsort`` requires Python version 2.6 or greater or Python 3.3 or greater.
It may run on (but is not tested against) Python 3.2.

Optional Dependencies
---------------------

fastnumbers
+++++++++++

The most efficient sorting can occur if you install the 
`fastnumbers <https://pypi.python.org/pypi/fastnumbers>`_ package
(version >=0.7.1); it helps with the string to number conversions.
``natsort`` will still run (efficiently) without the package, but if you need
to squeeze out that extra juice it is recommended you include this as a dependency.
``natsort`` will not require (or check) that
`fastnumbers <https://pypi.python.org/pypi/fastnumbers>`_ is installed
at installation.

PyICU
+++++

It is recommended that you install `PyICU <https://pypi.python.org/pypi/PyICU>`_
if you wish to sort in a locale-dependent manner; see
http://pythonhosted.org/natsort/locale_issues.html for an explanation of why.

Author
------

Seth M. Morton

History
-------

These are the last three entries of the changelog.  See the package documentation
for the complete `changelog <http://pythonhosted.org//natsort/changelog.html>`_.

05-08-2016 v. 5.0.0
+++++++++++++++++++

    - ``ns.LOCALE``/``humansorted`` now accounts for thousands separators.
    - Refactored entire codebase to be more functional (as in use functions as
      units). Previously, the code was rather monolithic and difficult to follow. The
      goal is that with the code existing in smaller units, contributing will
      be easier.
    - Deprecated ``ns.TYPESAFE`` option as it is now always on (due to a new
      iterator-based algorithm, the typesafe function is now cheap).
    - Increased speed of execution (came for free with the new functional approach
      because the new factory function paradigm eliminates most ``if`` branches
      during execution).

      - For most cases, the code is 30-40% faster than version 4.0.4.
      - If using ``ns.LOCALE`` or ``humansorted``, the code is 1100% faster than
        version 4.0.4.

    - Improved clarity of documentation with regard to locale-aware sorting.
    - Added a new ``chain_functions`` function for convenience in creating
      a complex user-given ``key`` from several existing functions.

11-01-2015 v. 4.0.4
+++++++++++++++++++

    - Improved coverage of unit tests.
    - Unit tests use new and improved hypothesis library.
    - Fixed compatibility issues with Python 3.5.

06-25-2015 v. 4.0.3
+++++++++++++++++++

    - Fixed bad install on last release (sorry guys!).