author | Seth Morton <seth.m.morton@gmail.com> | 2018-11-10 10:13:53 -0800 |
---|---|---|
committer | GitHub <noreply@github.com> | 2018-11-10 10:13:53 -0800 |
commit | f7cffcc5c5e1c876387c6aad7a8ed6d5bca9e822 | |
tree | f45bcc375dbe40f86c871a42a1b4bdd7aeeb9a35 | |
parent | c1b72d8e2660f586fde546c176dbe7189a5b0999 | |
parent | d19b12902eb8c81190fc9b9a72c4eee248923f37 | |
Merge pull request #76 from jdufresne/pycon
Correct highlighting language in docs
| mode | file | changes |
|---|---|---|
| -rw-r--r-- | README.rst | 32 |
| -rw-r--r-- | docs/source/examples.rst | 42 |
| -rw-r--r-- | docs/source/howitworks.rst | 86 |
| -rw-r--r-- | docs/source/intro.rst | 24 |
| -rw-r--r-- | docs/source/locale_issues.rst | 4 |
| -rw-r--r-- | docs/source/shell.rst | 26 |
6 files changed, 116 insertions, 98 deletions
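Most hunks below apply one mechanical rule: interactive sessions with `>>>` prompts move from the `python` lexer to `pycon`, and shell sessions with `$` prompts move from `sh` to `console`. The following Python function is a minimal sketch of that rule applied to an RST string. It was not used in this PR, and `retag_code_blocks` is a hypothetical name. It assumes directive bodies are indented with spaces, and it does not cover the other changes in the diff, such as turning bare `::` literal blocks into `.. code-block:: none` or `console`.

```python
import re

# Hypothetical helper for illustration only; not part of natsort or Sphinx.
# Assumption: RST directives and their bodies are indented with spaces.
DIRECTIVE = re.compile(r"^(\s*)\.\. code-block:: (python|sh)\s*$")

# Old lexer -> (new lexer, prompt that marks an interactive session)
RETAG = {"python": ("pycon", ">>>"), "sh": ("console", "$ ")}


def indent_of(line: str) -> int:
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def retag_code_blocks(rst: str) -> str:
    """Retag session-style code blocks the way this commit does by hand.

    ``.. code-block:: python`` whose body contains ``>>>`` prompts becomes
    ``pycon``; ``.. code-block:: sh`` whose body contains ``$ `` prompts
    becomes ``console``. Blocks without prompts are left unchanged.
    """
    lines = rst.splitlines()
    out = []
    i = 0
    while i < len(lines):
        match = DIRECTIVE.match(lines[i])
        if not match:
            out.append(lines[i])
            i += 1
            continue
        lead, old_lang = match.groups()
        # The body is every following line that is blank or indented
        # deeper than the directive itself.
        j = i + 1
        while j < len(lines) and (
            not lines[j].strip() or indent_of(lines[j]) > len(lead)
        ):
            j += 1
        body = lines[i + 1 : j]
        new_lang, prompt = RETAG[old_lang]
        is_session = any(line.strip().startswith(prompt) for line in body)
        lang = new_lang if is_session else old_lang
        out.append(f"{lead}.. code-block:: {lang}")
        out.extend(body)
        i = j
    return "\n".join(out)
```

Blocks without a matching prompt keep their original lexer, so ordinary Python source tagged `python` is not misclassified as a console session.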
@@ -42,7 +42,7 @@ When you try to sort a list of strings that contain numbers, the normal python sort algorithm sorts lexicographically, so you might not get the results that you expect: -.. code-block:: python +.. code-block:: pycon >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in'] >>> sorted(a) @@ -57,7 +57,7 @@ letters (i.e. 'b', 'ba', 'c'). sorting based on meaning and not computer code point). Using ``natsorted`` is simple: -.. code-block:: python +.. code-block:: pycon >>> from natsort import natsorted >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in'] @@ -75,7 +75,7 @@ for a quick start guide, or the To sort a list and assign the output to the same variable, you must explicitly assign the output to a variable: -.. code-block:: python +.. code-block:: pycon >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in'] >>> natsorted(a) @@ -97,7 +97,7 @@ Sorting Versions This is handled properly by default (as of ``natsort`` version >= 4.0.0): -.. code-block:: python +.. code-block:: pycon >>> a = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10'] >>> natsorted(a) @@ -113,7 +113,7 @@ This is useful in scientific data analysis and was the default behavior of ``natsorted`` for ``natsort`` version < 4.0.0. Use the ``realsorted`` function: -.. code-block:: python +.. code-block:: pycon >>> from natsort import realsorted, ns >>> # Note that when interpreting as signed floats, the below numbers are @@ -134,7 +134,7 @@ not on their ordinal value, and a locale-dependent thousands separator and decim separator is accounted for in the number. This can be achieved with the ``humansorted`` function: -.. code-block:: python +.. 
code-block:: pycon >>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana'] >>> natsorted(a) @@ -160,7 +160,7 @@ If you need to combine multiple algorithm modifiers (such as ``ns.REAL``, ``ns.LOCALE``, and ``ns.IGNORECASE``), you can combine the options using the bitwise OR operator (``|``). For example, -.. code-block:: python +.. code-block:: pycon >>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana'] >>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE) @@ -180,7 +180,7 @@ All of the available customizations can be found in the documentation for You can also add your own custom transformation functions with the ``key`` argument. These can be used with ``alg`` if you wish. -.. code-block:: python +.. code-block:: pycon >>> a = ['apple2.50', '2.3apple'] >>> natsorted(a, key=lambda x: x.replace('apple', ''), alg=ns.REAL) @@ -192,7 +192,7 @@ Sorting Mixed Types You can mix and match ``int``, ``float``, and ``str`` (or ``unicode``) types when you sort: -.. code-block:: python +.. code-block:: pycon >>> a = ['4.5', 6, 2.0, '5', 'a'] >>> natsorted(a) @@ -206,7 +206,7 @@ Handling Bytes on Python 3 ``natsort`` does not officially support the `bytes` type on Python 3, but convenience functions are provided that help you decode to `str` first: -.. code-block:: python +.. code-block:: pycon >>> from natsort import as_utf8 >>> a = [b'a', 14.0, 'b'] @@ -229,7 +229,7 @@ key using ``natsort_keygen`` and then passes that to the built-in generate a custom sorting key to sort in-place using the ``list.sort`` method. -.. code-block:: python +.. code-block:: pycon >>> from natsort import natsort_keygen >>> natsort_key = natsort_keygen() @@ -282,7 +282,7 @@ How *does* ``natsort`` work? key generator ``natsort.natsort_keygen()``. ``natsort.natsorted()`` is essentially a wrapper for the following code: - .. code-block:: python + .. 
code-block:: pycon >>> from natsort import natsort_keygen >>> natsort_key = natsort_keygen() @@ -351,7 +351,7 @@ Installation Use ``pip``! -.. code-block:: sh +.. code-block:: console $ pip install natsort @@ -361,7 +361,7 @@ at installation time to install those dependencies as well - use ``fast`` for `fastnumbers <https://pypi.org/project/fastnumbers>`_ and ``icu`` for `PyICU <https://pypi.org/project/PyICU>`_. -.. code-block:: sh +.. code-block:: console # Install both optional dependencies. $ pip install natsort[fast,icu] @@ -377,7 +377,7 @@ The recommended way to run tests is with `tox <https://tox.readthedocs.io/en/lat After installing ``tox``, running tests is as simple as executing the following in the ``natsort`` directory: -.. code-block:: sh +.. code-block:: console $ tox @@ -390,7 +390,7 @@ tests manually using `pytest <https://docs.pytest.org/en/latest/>`_ - ``natsort` contains a ``Pipfile`` for use with `pipenv <https://github.com/pypa/pipenv>`_ that makes it easy for you to install the testing dependencies: -.. code-block:: sh +.. code-block:: console $ pipenv install --skip-lock --dev $ pipenv run python -m pytest diff --git a/docs/source/examples.rst b/docs/source/examples.rst index 8a1fb29..e30df91 100644 --- a/docs/source/examples.rst +++ b/docs/source/examples.rst @@ -18,7 +18,7 @@ Basic Usage In the most basic use case, simply import :func:`~natsorted` and use it as you would :func:`sorted`: -.. code-block:: python +.. code-block:: pycon >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in'] >>> sorted(a) @@ -42,7 +42,7 @@ Sorting with Alpha, Beta, and Release Candidates By default, if you wish to sort versions with a non-strict versioning scheme, you may not get the results you expect: -.. code-block:: python +.. 
code-block:: pycon >>> a = ['1.2', '1.2rc1', '1.2beta2', '1.2beta1', '1.2alpha', '1.2.1', '1.1', '1.3'] >>> natsorted(a) @@ -51,7 +51,7 @@ scheme, you may not get the results you expect: To make the '1.2' pre-releases come before '1.2.1', you need to use the following recipe: -.. code-block:: python +.. code-block:: pycon >>> natsorted(a, key=lambda x: x.replace('.', '~')) ['1.1', '1.2', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.2.1', '1.3'] @@ -59,7 +59,7 @@ recipe: If you also want '1.2' after all the alpha, beta, and rc candidates, you can modify the above recipe: -.. code-block:: python +.. code-block:: pycon >>> natsorted(a, key=lambda x: x.replace('.', '~')+'z') ['1.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.2', '1.2.1', '1.3'] @@ -76,7 +76,7 @@ In some cases when sorting file paths with OS-Generated names, the default :mod:`~natsorted` algorithm may not be sufficient. In cases like these, you may need to use the ``ns.PATH`` option: -.. code-block:: python +.. code-block:: pycon >>> a = ['./folder/file (1).txt', ... './folder/file.txt', @@ -100,7 +100,7 @@ characters, it will also properly interpret non-'.' decimal separators and also properly order case. It may be more convenient to just use the :func:`humansorted` function: -.. code-block:: python +.. code-block:: pycon >>> from natsort import humansorted >>> import locale @@ -125,7 +125,7 @@ Controlling Case When Sorting For non-numbers, by default :mod:`natsort` used ordinal sorting (i.e. it sorts by the character's value in the ASCII table). For example: -.. code-block:: python +.. code-block:: pycon >>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana'] >>> natsorted(a) @@ -134,7 +134,7 @@ it sorts by the character's value in the ASCII table). For example: There are times when you wish to ignore the case when sorting, you can easily do this with the ``ns.IGNORECASE`` option: -.. code-block:: python +.. 
code-block:: pycon >>> natsorted(a, alg=ns.IGNORECASE) ['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn'] @@ -147,7 +147,7 @@ Upper-case letters appear first in the ASCII table, but many natural sorting methods place lower-case first. To do this, use ``ns.LOWERCASEFIRST``: -.. code-block:: python +.. code-block:: pycon >>> natsorted(a, alg=ns.LOWERCASEFIRST) ['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn'] @@ -157,7 +157,7 @@ and the lower-case letters grouped together; most would expect all "a"s to bet together regardless of case, and all "b"s, and so on. To achieve this, use ``ns.GROUPLETTERS``: -.. code-block:: python +.. code-block:: pycon >>> natsorted(a, alg=ns.GROUPLETTERS) ['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn'] @@ -165,7 +165,7 @@ achieve this, use ``ns.GROUPLETTERS``: You might combine this with ``ns.LOWERCASEFIRST`` to get what most would expect to be "natural" sorting: -.. code-block:: python +.. code-block:: pycon >>> natsorted(a, alg=ns.G | ns.LF) ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn'] @@ -178,7 +178,7 @@ a valid Python float literal, such as 5, 0.4, -4.78, +4.2E-34, etc. using the ``ns.FLOAT`` key. You can disable the exponential component of the number with ``ns.NOEXP``. -.. code-block:: python +.. code-block:: pycon >>> a = ['a50', 'a51.', 'a+50.4', 'a5.034e1', 'a+50.300'] >>> natsorted(a, alg=ns.FLOAT) @@ -195,7 +195,7 @@ function. Please note that the behavior of the :func:`~realsorted` function was the default behavior of :func:`~natsorted` for :mod:`natsort` version < 4.0.0: -.. code-block:: python +.. code-block:: pycon >>> natsorted(a, alg=ns.REAL) ['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.'] @@ -211,7 +211,7 @@ Using a Custom Sorting Key Like the built-in ``sorted`` function, ``natsorted`` can accept a custom sort key so that: -.. code-block:: python +.. 
code-block:: pycon >>> from operator import attrgetter, itemgetter >>> a = [['a', 'num4'], ['b', 'num8'], ['c', 'num2']] @@ -233,7 +233,7 @@ If you need to sort a list in-place, you cannot use :func:`~natsorted`; you need to pass a key to the :meth:`list.sort` method. The function :func:`~natsort_keygen` is a convenient way to generate these keys for you: -.. code-block:: python +.. code-block:: pycon >>> from natsort import natsort_keygen >>> a = ['a50', 'a51.', 'a50.4', 'a5.034e1', 'a50.300'] @@ -256,7 +256,7 @@ Natural Sorting with ``cmp`` (Python 2 only) If you are using a legacy codebase that requires you to use :func:`cmp` instead of a key-function, you can use :func:`~natcmp`. -.. code-block:: python +.. code-block:: pycon >>> import sys >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in'] @@ -279,7 +279,7 @@ To achieve this you could use the :func:`~index_natsorted` in combination with the convenience function :func:`~order_by_index`: -.. code-block:: python +.. code-block:: pycon >>> from natsort import index_natsorted, order_by_index >>> a = ['a2', 'a9', 'a1', 'a4', 'a10'] @@ -299,7 +299,7 @@ Returning Results in Reverse Order Just like the :func:`sorted` built-in function, you can supply the ``reverse`` option to return the results in reverse order: -.. code-block:: python +.. code-block:: pycon >>> a = ['a2', 'a9', 'a1', 'a4', 'a10'] >>> natsorted(a, reverse=True) @@ -320,7 +320,7 @@ will allow :mod:`natsort` to convert byte arrays to strings for sorting; these functions know not to raise an error if the input is not a byte array, so you can use the key on any arbitrary collection of data. -.. code-block:: python +.. code-block:: pycon >>> from natsort import as_ascii >>> a = [b'a', 14.0, 'b'] @@ -335,7 +335,7 @@ run :mod:`natsort` on a list of bytes, you will get results that are like Python's default sorting behavior. Of course, you can use the decoding functions to solve this: -.. code-block:: python +.. 
code-block:: pycon >>> from natsort import as_utf8 >>> a = [b'a56', b'a5', b'a6', b'a40'] @@ -347,7 +347,7 @@ functions to solve this: If you need a codec different from ASCII or UTF-8, you can use :func:`decoder` to generate a custom key: -.. code-block:: python +.. code-block:: pycon >>> from natsort import decoder >>> a = [b'a56', b'a5', b'a6', b'a40'] diff --git a/docs/source/howitworks.rst b/docs/source/howitworks.rst index 05f8cdf..12bf569 100644 --- a/docs/source/howitworks.rst +++ b/docs/source/howitworks.rst @@ -28,13 +28,15 @@ First, How Does Natural Sorting Work At a High Level? If I want to compare '2 ft 7 in' to '2 ft 11 in', I might do the following -.. code-block:: python +.. code-block:: pycon >>> '2 ft 7 in' < '2 ft 11 in' False We as humans know that the above should be true, but why does Python think it -is false? Here is how it is performing the comparison:: +is false? Here is how it is performing the comparison: + +.. code-block:: none '2' <=> '2' ==> equal, so keep going ' ' <=> ' ' ==> equal, so keep going @@ -55,12 +57,14 @@ right thing." Luckily, it handles sorting lists of strings right out-of-the-box, so the only hard part is actually making this string-to-list transformation and then Python will handle the rest. -:: +.. code-block:: none '2 ft 7 in' ==> (2, ' ft ', 7, ' in') '2 ft 11 in' ==> (2, ' ft ', 11, ' in') -When Python compares the two, it roughly follows the below logic:: +When Python compares the two, it roughly follows the below logic: + +.. code-block:: none 2 <=> 2 ==> equal, so keep going ' ft ' <=> ' ft ' ==> a string is a special type of sequence - evaluate each character individually @@ -93,7 +97,7 @@ Remarkably, this turns out to be the easy part, owing mostly to Python's easy ac to regular expressions. Breaking an arbitrary string based on a pattern is pretty straightforward. -.. code-block:: python +.. 
code-block:: pycon >>> import re >>> re.split(r'(\d+)', '2 ft 11 in') @@ -107,7 +111,7 @@ unsigned integers as the above example contains. By real numbers, I mean those l ``-45.4920E-23``. :mod:`natsort` can handle just about any number definition; to that end, here are all the regular expressions used in :mod:`natsort`: -.. code-block:: python +.. code-block:: pycon >>> unsigned_int = r'([0-9]+)' >>> signed_int = r'([-+]?[0-9]+)' @@ -120,7 +124,7 @@ Note that ``"inf"`` and ``"nan"`` are deliberately omitted from the float defini wouldn't want (for example) ``"banana"`` to be converted into ``['ba', 'nan', 'a']``, Let's see an example: -.. code-block:: python +.. code-block:: pycon >>> re.split(signed_float, 'The mass of 3 electrons is 2.732815068E-30 kg') ['The mass of ', '3', ' electrons is ', '2.732815068E-30', ' kg'] @@ -155,7 +159,7 @@ coerce a string to a number if it can be coerced, and leaving it alone otherwise (see `this one for coercion`_ and `this one for checking`_ for some high traffic questions), but it mostly boils down to two different solutions, shown here: -.. code-block:: python +.. code-block:: pycon >>> def coerce_try_except(x): ... try: @@ -171,7 +175,7 @@ but it mostly boils down to two different solutions, shown here: Here are some timing results run on my machine: -:: +.. code-block:: pycon In [0]: numbers = list(map(str, range(100))) # A list of numbers as strings @@ -212,7 +216,7 @@ into numeric and non-numeric content *before* being passed to this coercion func the assumption can be made that *if a string begins with a digit or a sign, it can be coerced into a number*. -.. code-block:: python +.. code-block:: pycon >>> def coerce_to_int(x): ... if x[0] in '0123456789+-': @@ -226,7 +230,7 @@ can be coerced into a number*. So how does this perform compared to the standard coercion methods? -:: +.. 
code-block:: pycon In [6]: %timeit [coerce_to_int(x) for x in numbers] 10000 loops, best of 3: 71.6 µs per loop @@ -244,7 +248,7 @@ if I could get any faster writing a C extension. It's called `fastnumbers`_ and contains a C implementation of the above coercion functions called :func:`fast_int`. How does it fair? Pretty well. -:: +.. code-block:: pycon In [8]: %timeit [fast_int(x) for x in numbers] 10000 loops, best of 3: 30.9 µs per loop @@ -264,7 +268,7 @@ and otherwise the hybrid method will be used. Modifying the hybrid coercion function for floats is straightforward. - .. code-block:: python + .. code-block:: pycon >>> def coerce_to_float(x): ... if x[0] in '.0123456789+-' or x.lower().lstrip()[:3] in ('nan', 'inf'): @@ -283,7 +287,7 @@ TL;DR 1 - The Simple "No Special Cases" Algorithm At this point, our :mod:`natsort` algorithm is essentially the following: -.. code-block:: python +.. code-block:: pycon >>> import re >>> def natsort_key(x, as_float=False, signed=False): @@ -327,7 +331,7 @@ to :mod:`natsort`). Let's apply the :func:`natsort_key` from above to some filesystem paths that you might see being auto-generated from your operating system: -.. code-block:: python +.. code-block:: pycon >>> paths = ['/p/Folder (10)/file.tar.gz', ... '/p/Folder/file.tar.gz', @@ -343,7 +347,7 @@ space character (number 32) comes before the ``/`` character (number 47). If we remove the common prefix in all of the above strings (``'/p/Folder'``), we can see why this happens: -.. code-block:: python +.. code-block:: pycon >>> ' (1)/file.tar.gz' < '/file.tar.gz' True @@ -354,7 +358,7 @@ This isn't very convenient... how do we solve it? We can split the path across the path separators and then sort. A convenient way do to this is with the `Path.parts`_ method from :mod:`pathlib`: -.. code-block:: python +.. code-block:: pycon >>> import pathlib >>> sorted(paths, key=lambda x: tuple(natsort_key(s) for s in pathlib.Path(x).parts)) @@ -364,7 +368,7 @@ Almost! 
It seems like there is some funny business going on in the final filename component as well. We can solve that nicely and quickly with `Path.suffixes`_ and `Path.stem`_. -.. code-block:: python +.. code-block:: pycon >>> def decompose_path_into_components(x): ... path_split = list(pathlib.Path(x).parts) @@ -392,7 +396,7 @@ separated components is sent to the :mod:`natsort` algorithm, so the result is a tuple of tuples. Once that is done, we can see how comparisons can be done in the expected manner. -.. code-block:: python +.. code-block:: pycon >>> a = natsort_key_with_path_support('/p/Folder (1)/file (1).tar.gz') >>> a @@ -420,7 +424,7 @@ strings is walking a dangerous line if it does not have special handling for comparing numbers and strings. My imagination was not so great at first. Let's take a look at all the ways this can fail with real-world data. -.. code-block:: python +.. code-block:: pycon >>> def natsort_key_with_poor_real_number_support(x): ... split_input = re.split(signed_float, x) @@ -471,7 +475,7 @@ any non-empty string, and we typically want numbers to come before strings. Let's take a look at how this works out. -.. code-block:: python +.. code-block:: pycon >>> from natsort.utils import sep_inserter >>> list(sep_inserter(iter(['apples']), '')) @@ -504,7 +508,7 @@ Handling NaN Let's see what happens when you try to sort a plain old list of numbers when there is a **NaN** floating around in there. -.. code-block:: python +.. code-block:: pycon >>> danger = [7, float('nan'), 22.7, 19, -14, 59.123, 4] >>> sorted(danger) @@ -514,7 +518,7 @@ Clearly that isn't correct, and for once it isn't my fault! `It's hard to compare floating point numbers`_. By definition, **NaN** is unorderable to any other number, and is never equal to any other number, including itself. -.. code-block:: python +.. code-block:: pycon >>> nan = float('nan') >>> 5 > nan @@ -544,7 +548,7 @@ some other value. But what value is *least* astonishing? 
I chose to replace **NaN** with :math:`-\infty` so that these poorly behaved elements always end up at the front where the users will most likely be alerted to their presence. -.. code-block:: python +.. code-block:: pycon >>> def fix_nan(x): ... if x != x: # only true for NaN @@ -601,7 +605,7 @@ input like ``['/home/me', 42]``. Let's take it out for a spin! -.. code-block:: python +.. code-block:: pycon >>> danger = [7, float('nan'), 22.7, '19', '-14', '59.123', 4] >>> sorted(danger, key=lambda x: natsort_key(x, as_float=True, signed=True)) @@ -675,7 +679,7 @@ Without even thinking about the mess that is adding :mod:`locale` support, First, let's take a look at how it is sorted by default (due to where characters lie on the `ASCII table`_). -.. code-block:: python +.. code-block:: pycon >>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana'] >>> sorted(a) @@ -692,7 +696,7 @@ Some believe that both should be true ☹. Some people don't care at all [#f4]_. Solving the first case (I call it *LOWERCASEFIRST*) is actually pretty easy... just call the :meth:`str.swapcase` method on the input. -.. code-block:: python +.. code-block:: pycon >>> sorted(a, key=lambda x: x.swapcase()) ['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn'] @@ -705,7 +709,7 @@ a good thing that in Python 3.3 all case information from unicode characters in non-latin alphabets. -.. code-block:: python +.. code-block:: pycon >>> def remove_case(x): ... try: @@ -720,7 +724,7 @@ The middle case (I call it *GROUPLETTERS*) is less straightforward. The most efficient way to handle this is to duplicate each character with its lowercase version and then the original character. -.. code-block:: python +.. code-block:: pycon >>> import itertools >>> def groupletters(x): @@ -741,7 +745,7 @@ appropriately with respect to each other. There's a problem with this, though. Within the context of :mod:`natsort` we are trying to correctly sort numbers and those should be left alone. -.. 
code-block:: python +.. code-block:: pycon >>> a = ['Apple5', 'apple', 'Apple4E10', 'Banana'] >>> sorted(a, key=lambda x: natsort_key(x, as_float=True)) @@ -756,7 +760,7 @@ We messed up the numbers! Looks like :func:`groupletters` needs to be applied how this is done here, but basically it requires applying the function in the ``else:`` block of :func:`coerce_to_int`/:func:`coerce_to_float`. -.. code-block:: python +.. code-block:: pycon >>> better_groupletters = natsort_keygen(alg=ns.GROUPLETTERS | ns.REAL) >>> better_groupletters('Apple4E10') @@ -772,7 +776,7 @@ Basic Unicode Support Unicode is hard and complicated. Here's an example. -.. code-block:: python +.. code-block:: pycon >>> b = [b'\x66', b'\x65', b'\xc3\xa9', b'\x65\xcc\x81', b'\x61', b'\x7a'] >>> a = [x.decode('utf8') for x in b] @@ -787,7 +791,7 @@ In fact, many characters have multiple representations. This is a challenge because comparing the two representations would return ``False`` even though they *look* the same. -.. code-block:: python +.. code-block:: pycon >>> a[2] == a[3] False @@ -834,7 +838,7 @@ It seems that most Unicode data is stored and shared in the compressed form which makes it challenging to sort. This can be solved by normalizing all incoming Unicode data to the decompressed form ('NFD') and *then* sorting. -.. code-block:: python +.. code-block:: pycon >>> import unicodedata >>> c = [unicodedata.normalize('NFD', x) for x in a] @@ -861,7 +865,7 @@ First, how to use :mod:`locale` to compare strings? It's actually pretty straightforward. Simply run the input through the :mod:`locale` transformation function :func:`locale.strxfrm`. -.. code-block:: python +.. code-block:: pycon >>> import locale, sys >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') @@ -906,7 +910,7 @@ fixed?), and so :mod:`locale` does not work as expected. How do I define doesn't work as expected? -.. code-block:: python +.. 
code-block:: pycon >>> a = ['apple', 'Banana', 'banana', 'Apple'] >>> sorted(a) @@ -926,7 +930,7 @@ So, how to deal with this situation? There are two ways to do so. #. Detect if :mod:`locale` is sorting incorrectly (i.e. ``dumb``) by seeing if ``'A'`` is sorted before ``'a'`` (incorrect) or not. - .. code-block:: python + .. code-block:: pycon >>> # This is genuinely the name of this function. >>> # See natsort.compat.locale.py @@ -964,7 +968,7 @@ So what is the problem? Consider the number ``1,234,567`` (assuming the and you will get a :exc:`ValueError`. To handle this properly the thousands separators must be removed. -.. code-block:: python +.. code-block:: pycon >>> float('1,234,567'.replace(',', '')) 1234567.0 @@ -972,7 +976,7 @@ separators must be removed. What if, in our current locale, the thousands separator is ``'.'`` and the ``','`` is the decimal separator (like for the German locale *de_DE*)? -.. code-block:: python +.. code-block:: pycon >>> float('1.234.567'.replace('.', '').replace(',', '.')) 1234567.0 @@ -985,7 +989,7 @@ use this method under its hood? Well, let's take a look at what would happen if we send some possible :mod:`natsort` input through our the above function: -.. code-block:: python +.. code-block:: pycon >>> natsort_key('1,234 apples, please.'.replace(',', '')) ('', 1234, ' apples please.') @@ -1012,7 +1016,7 @@ shown previously will work. Beware, these regular expressions will make your eyes bleed. -.. code-block:: python +.. code-block:: pycon >>> decimal = ',' # Assume German locale, so decimal separator is ',' >>> # Look-behind assertions cannot accept range modifiers, so instead of i.e. diff --git a/docs/source/intro.rst b/docs/source/intro.rst index 6cb4701..b53f8ac 100644 --- a/docs/source/intro.rst +++ b/docs/source/intro.rst @@ -39,7 +39,7 @@ When you try to sort a list of strings that contain numbers, the normal python sort algorithm sorts lexicographically, so you might not get the results that you expect: -.. 
code-block:: python +.. code-block:: pycon >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in'] >>> sorted(a) @@ -54,7 +54,7 @@ letters (i.e. 'b', 'ba', 'c'). sorting based on meaning and not computer code point).. Using :func:`~natsorted` is simple: -.. code-block:: python +.. code-block:: pycon >>> from natsort import natsorted >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in'] @@ -73,7 +73,7 @@ for more details). `does not sort in-place`. To sort a list and assign the output to the same variable, you must explicitly assign the output to a variable: - .. code-block:: python + .. code-block:: pycon >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in'] >>> natsorted(a) @@ -95,7 +95,7 @@ Sorting Versions This is handled properly by default (as of :mod:`natsort` version >= 4.0.0): -.. code-block:: python +.. code-block:: pycon >>> a = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10'] >>> natsorted(a) @@ -111,7 +111,7 @@ This is useful in scientific data analysis and was the default behavior of :func:`~natsorted` for :mod:`natsort` version < 4.0.0. Use the :func:`~realsorted` function: -.. code-block:: python +.. code-block:: pycon >>> from natsort import realsorted, ns >>> # Note that when interpreting as signed floats, the below numbers are @@ -132,7 +132,7 @@ not on their ordinal value, and a locale-dependent thousands separator and decim separator is accounted for in the number. This can be achieved with the :func:`~humansorted` function: -.. code-block:: python +.. code-block:: pycon >>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana'] >>> natsorted(a) @@ -158,7 +158,7 @@ If you need to combine multiple algorithm modifiers (such as ``ns.REAL``, ``ns.LOCALE``, and ``ns.IGNORECASE``), you can combine the options using the bitwise OR operator (``|``). For example, -.. code-block:: python +.. 
code-block:: pycon >>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana'] >>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE) @@ -178,7 +178,7 @@ the :class:`~natsort.ns` enum. You can also add your own custom transformation functions with the ``key`` argument. These can be used with ``alg`` if you wish: -.. code-block:: python +.. code-block:: pycon >>> a = ['apple2.50', '2.3apple'] >>> natsorted(a, key=lambda x: x.replace('apple', ''), alg=ns.REAL) @@ -190,7 +190,7 @@ Sorting Mixed Types You can mix and match ``int``, ``float``, and ``str`` (or ``unicode``) types when you sort: -.. code-block:: python +.. code-block:: pycon >>> a = ['4.5', 6, 2.0, '5', 'a'] >>> natsorted(a) @@ -204,7 +204,7 @@ Handling Bytes on Python 3 :mod:`natsort` does not officially support the `bytes` type on Python 3, but convenience functions are provided that help you decode to `str` first: -.. code-block:: python +.. code-block:: pycon >>> from natsort import as_utf8 >>> a = [b'a', 14.0, 'b'] @@ -227,7 +227,7 @@ key using :func:`~natsort_keygen` and then passes that to the built-in generate a custom sorting key to sort in-place using the :meth:`list.sort` method. -.. code-block:: python +.. code-block:: pycon >>> from natsort import natsort_keygen >>> natsort_key = natsort_keygen() @@ -280,7 +280,7 @@ How *does* :mod:`natsort` work? key generator :func:`natsort.natsort_keygen`. :func:`natsort.natsorted` is essentially a wrapper for the following code: - .. code-block:: python + .. code-block:: pycon >>> from natsort import natsort_keygen >>> natsort_key = natsort_keygen() diff --git a/docs/source/locale_issues.rst b/docs/source/locale_issues.rst index c799019..9100468 100644 --- a/docs/source/locale_issues.rst +++ b/docs/source/locale_issues.rst @@ -61,7 +61,9 @@ Explicitly Set the Locale Before Using :func:`~natsort.humansorted` or ``ns.LOCA I have found that unless you explicitly set a locale, the sorted order may not be what you expect. 
Setting this is straightforward (in the below example I use 'en_US.UTF-8', but you should use your -locale):: +locale): + +.. code-block:: pycon >>> import locale >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') diff --git a/docs/source/shell.rst b/docs/source/shell.rst index 17d1d7b..8a17ccc 100644 --- a/docs/source/shell.rst +++ b/docs/source/shell.rst @@ -14,7 +14,7 @@ Below is the usage and some usage examples for the ``natsort`` shell script. Usage ----- -:: +.. code-block:: none usage: natsort [-h] [--version] [-p] [-f LOW HIGH] [-F LOW HIGH] [-e EXCLUDE] [-r] [-t {digit,int,float,version,ver}] [--nosign] [--noexp] @@ -74,7 +74,9 @@ Description ``natsort`` was originally written to aid in computational chemistry research so that it would be easy to analyze large sets of output files -named after the parameter used:: +named after the parameter used: + +.. code-block:: console $ ls *.out mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out @@ -83,7 +85,9 @@ named after the parameter used:: that the shell sorts in lexicographical order. This is the behavior of programs like ``find`` as well as ``ls``. The problem is passing these files to an analysis program causes them not to appear in numerical order, which can lead -to bad analysis. To remedy this, use ``natsort``:: +to bad analysis. To remedy this, use ``natsort``: + +.. code-block:: console $ natsort *.out mode744.43.out @@ -93,11 +97,15 @@ to bad analysis. To remedy this, use ``natsort``:: $ natsort -t r *.out | xargs your_program ``-t r`` is short for ``--number-type real``. You can also place natsort in -the middle of a pipe:: +the middle of a pipe: + +.. code-block:: console $ find . -name "*.out" | natsort -t r | xargs your_program -To sort version numbers, use the default ``--number-type``:: +To sort version numbers, use the default ``--number-type``: + +.. 
code-block:: console $ ls * prog-1.10.zip prog-1.9.zip prog-2.0.zip @@ -108,7 +116,9 @@ To sort version numbers, use the default ``--number-type``:: In general, all ``natsort`` shell script options mirror the :func:`~natsorted` API, with notable exception of the ``--filter``, ``--reverse-filter``, and ``--exclude`` -options. These three options are used as follows:: +options. These three options are used as follows: + +.. code-block:: console $ ls *.out mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out @@ -124,7 +134,9 @@ options. These three options are used as follows:: mode1243.34.out If you are sorting paths with OS-generated filenames, you may require the -``--paths``/``-p`` option:: +``--paths``/``-p`` option: + +.. code-block:: console $ find . ! -path . -type f ./folder/file (1).txt |