-rw-r--r-- | README.rst | 91
-rw-r--r-- | docs/api.rst | 12
-rw-r--r-- | docs/examples.rst | 18
-rw-r--r-- | docs/howitworks.rst | 210
-rw-r--r-- | docs/intro.rst | 94
-rw-r--r-- | docs/locale_issues.rst | 16
-rw-r--r-- | docs/shell.rst | 8
7 files changed, 239 insertions, 210 deletions
@@ -42,8 +42,8 @@ Quick Description ----------------- When you try to sort a list of strings that contain numbers, the normal python -sort algorithm sorts lexicographically, so you might not get the results that you -expect: +sort algorithm sorts lexicographically, so you might not get the results that +you expect: .. code-block:: pycon @@ -73,10 +73,10 @@ naturally. Below are some other things you can do with ``natsort`` for a quick start guide, or the `api <https://natsort.readthedocs.io/en/master/api.html>`_ for complete details). -**Note**: ``natsorted`` is designed to be a drop-in replacement for the built-in -``sorted`` function. Like ``sorted``, ``natsorted`` `does not sort in-place`. -To sort a list and assign the output to the same variable, you must -explicitly assign the output to a variable: +**Note**: ``natsorted`` is designed to be a drop-in replacement for the +built-in ``sorted`` function. Like ``sorted``, ``natsorted`` +`does not sort in-place`. To sort a list and assign the output to the same +variable, you must explicitly assign the output to a variable: .. code-block:: pycon @@ -137,9 +137,9 @@ version < 4.0.0). Use the ``realsorted`` function: Locale-Aware Sorting (or "Human Sorting") +++++++++++++++++++++++++++++++++++++++++ -This is where the non-numeric characters are also ordered based on their meaning, -not on their ordinal value, and a locale-dependent thousands separator and decimal -separator is accounted for in the number. +This is where the non-numeric characters are also ordered based on their +meaning, not on their ordinal value, and a locale-dependent thousands +separator and decimal separator is accounted for in the number. This can be achieved with the ``humansorted`` function: .. code-block:: pycon @@ -185,8 +185,8 @@ bitwise OR operator (``|``). For example, All of the available customizations can be found in the documentation for `the ns enum <https://natsort.readthedocs.io/en/master/api.html#natsort.ns>`_. 
-You can also add your own custom transformation functions with the ``key`` argument. -These can be used with ``alg`` if you wish. +You can also add your own custom transformation functions with the ``key`` +argument. These can be used with ``alg`` if you wish. .. code-block:: pycon @@ -248,8 +248,9 @@ method. >>> a ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in'] -All of the algorithm customizations mentioned in the `Further Customizing Natsort`_ -section can also be applied to ``natsort_keygen`` through the *alg* keyword option. +All of the algorithm customizations mentioned in the +`Further Customizing Natsort`_ section can also be applied to +``natsort_keygen`` through the *alg* keyword option. Other Useful Things +++++++++++++++++++ @@ -266,18 +267,20 @@ FAQ How do I debug ``natsort.natsorted()``? The best way to debug ``natsorted()`` is to generate a key using ``natsort_keygen()`` with the same options being passed to ``natsorted``. One can take a look at - exactly what is being done with their input using this key - it is highly recommended + exactly what is being done with their input using this key - it is highly + recommended to `look at this issue describing how to debug <https://github.com/SethMMorton/natsort/issues/13#issuecomment-50422375>`_ for *how* to debug, and also to review the `How Does Natsort Work? <https://natsort.readthedocs.io/en/master/howitworks.html>`_ page for *why* ``natsort`` is doing that to your data. - If you are trying to sort custom classes and running into trouble, please take a look at - https://github.com/SethMMorton/natsort/issues/60. In short, + If you are trying to sort custom classes and running into trouble, please + take a look at https://github.com/SethMMorton/natsort/issues/60. 
In short, custom classes are not likely to be sorted correctly if one relies - on the behavior of ``__lt__`` and the other rich comparison operators in their - custom class - it is better to use a ``key`` function with ``natsort``, or - use the ``natsort`` key as part of your rich comparison operator definition. + on the behavior of ``__lt__`` and the other rich comparison operators in + their custom class - it is better to use a ``key`` function with + ``natsort``, or use the ``natsort`` key as part of your rich comparison + operator definition. How *does* ``natsort`` work? If you don't want to read `How Does Natsort Work? <https://natsort.readthedocs.io/en/master/howitworks.html>`_, @@ -286,9 +289,9 @@ How *does* ``natsort`` work? ``natsort`` provides a `key function <https://docs.python.org/3/howto/sorting.html#key-functions>`_ that can be passed to `list.sort() <https://docs.python.org/3/library/stdtypes.html#list.sort>`_ or `sorted() <https://docs.python.org/3/library/functions.html#sorted>`_ in order to - modify the default sorting behavior. This key is generated on-demand with the - key generator ``natsort.natsort_keygen()``. ``natsort.natsorted()`` is essentially - a wrapper for the following code: + modify the default sorting behavior. This key is generated on-demand with + the key generator ``natsort.natsort_keygen()``. ``natsort.natsorted()`` + is essentially a wrapper for the following code: .. code-block:: pycon @@ -341,8 +344,8 @@ The most efficient sorting can occur if you install the `fastnumbers <https://pypi.org/project/fastnumbers>`_ package (version >=2.0.0); it helps with the string to number conversions. ``natsort`` will still run (efficiently) without the package, but if you need -to squeeze out that extra juice it is recommended you include this as a dependency. -``natsort`` will not require (or check) that +to squeeze out that extra juice it is recommended you include this as a +dependency. 
``natsort`` will not require (or check) that `fastnumbers <https://pypi.org/project/fastnumbers>`_ is installed at installation. @@ -381,17 +384,18 @@ How to Run Tests Please note that ``natsort`` is NOT set-up to support ``python setup.py test``. The recommended way to run tests is with `tox <https://tox.readthedocs.io/en/latest/>`_. -After installing ``tox``, running tests is as simple as executing the following in the -``natsort`` directory: +After installing ``tox``, running tests is as simple as executing the following +in the ``natsort`` directory: .. code-block:: console $ tox -``tox`` will create virtual a virtual environment for your tests and install all the -needed testing requirements for you. You can specify a particular python version -with the ``-e`` flag, e.g. ``tox -e py36``. Static analysis is done with ``tox -e flake8``. -You can see all available testing environments with ``tox --listenvs``. +``tox`` will create virtual a virtual environment for your tests and install +all the needed testing requirements for you. You can specify a particular +python version with the ``-e`` flag, e.g. ``tox -e py36``. Static analysis +is done with ``tox -e flake8``. You can see all available testing environments +with ``tox --listenvs``. If you do not wish to use ``tox``, you can install the testing dependencies with the ``dev/requirements.txt`` file and then run the tests manually using @@ -408,7 +412,8 @@ Note that above I invoked ``python -m pytest`` instead of just ``pytest`` - this How to Build Documentation -------------------------- -If you want to build the documentation for ``natsort``, it is recommended to use ``tox``: +If you want to build the documentation for ``natsort``, it is recommended to +use ``tox``: .. code-block:: console @@ -430,10 +435,10 @@ Dropping Python 2.7 Support ``natsort`` version 7.0.0 will drop support for Python 2.7. 
-The version 6.X branch will remain as a "long term support" branch where bug fixes -are applied so that users who cannot update from Python 2.7 will not be forced to -use a buggy ``natsort`` version. Once version 7.0.0 is released, new features -will not be added to version 6.X, only bug fixes. +The version 6.X branch will remain as a "long term support" branch where bug +fixes are applied so that users who cannot update from Python 2.7 will not be +forced to use a buggy ``natsort`` version. Once version 7.0.0 is released, new +features will not be added to version 6.X, only bug fixes. Deprecated APIs +++++++++++++++ @@ -448,19 +453,21 @@ In ``natsort`` version 6.0.0, the following APIs and functions were removed - ``ns.TYPESAFE`` (deprecated since version 5.0.0) - ``ns.DIGIT`` (deprecated since version 5.0.0) - ``ns.VERSION`` (deprecated since version 5.0.0) - - ``versorted()`` (discouraged since version 4.0.0, officially deprecated since version 5.5.0) - - ``index_versorted()`` (discouraged since version 4.0.0, officially deprecated since version 5.5.0) + - ``versorted()`` (discouraged since version 4.0.0, + officially deprecated since version 5.5.0) + - ``index_versorted()`` (discouraged since version 4.0.0, + officially deprecated since version 5.5.0) -In general, if you want to determine if you are using deprecated APIs you can run your -code with the following flag +In general, if you want to determine if you are using deprecated APIs you +can run your code with the following flag .. code-block:: console $ python -Wdefault::DeprecationWarning my-code.py -By default ``DeprecationWarnings`` are not shown, but this will cause them to be shown. -Alternatively, you can just set the environment variable ``PYTHONWARNINGS`` to -"default::DeprecationWarning" and then run your code. +By default ``DeprecationWarnings`` are not shown, but this will cause them +to be shown. 
Alternatively, you can just set the environment variable +``PYTHONWARNINGS`` to "default::DeprecationWarning" and then run your code. Dropped Pipenv for Development ++++++++++++++++++++++++++++++ diff --git a/docs/api.rst b/docs/api.rst index 8052606..64794ff 100644 --- a/docs/api.rst +++ b/docs/api.rst @@ -89,16 +89,16 @@ Help With Creating Function Keys ++++++++++++++++++++++++++++++++ If you need to create a complicated *key* argument to (for example) -:func:`natsorted` that is actually multiple functions called one after the other, -the following function can help you easily perform this action. It is +:func:`natsorted` that is actually multiple functions called one after the +other, the following function can help you easily perform this action. It is used internally to :mod:`natsort`, and has been exposed publicly for the convenience of the user. .. autofunction:: chain_functions -If you need to be able to search your input for numbers using the same definition -as :mod:`natsort`, you can do so using the following function. Given your chosen -algorithm (selected using the :class:`~natsort.ns` enum), the corresponding regular -expression to locate numbers will be returned. +If you need to be able to search your input for numbers using the same +definition as :mod:`natsort`, you can do so using the following function. +Given your chosen algorithm (selected using the :class:`~natsort.ns` enum), +the corresponding regular expression to locate numbers will be returned. .. 
autofunction:: numeric_regex_chooser diff --git a/docs/examples.rst b/docs/examples.rst index 04ca632..f09f8e5 100644 --- a/docs/examples.rst +++ b/docs/examples.rst @@ -47,8 +47,8 @@ By default, if you wish to sort versions that are not as simple as >>> natsorted(a) ['1.1', '1.2', '1.2.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.3'] -To make the '1.2' pre-releases come before '1.2.1', you need to use the following -recipe: +To make the '1.2' pre-releases come before '1.2.1', you need to use the +following recipe: .. code-block:: pycon @@ -76,8 +76,8 @@ to assist in sorting. Some examples might be `SemVer <https://python-semver.readthedocs.io/en/latest/api.html>`_. If we are being honest, using these methods to parse a version means you don't -need to use :mod:`natsort` - you should probably just use :func:`sorted` directly. -Here's an example with SemVer: +need to use :mod:`natsort` - you should probably just use :func:`sorted` +directly. Here's an example with SemVer: .. code-block:: pycon @@ -253,8 +253,8 @@ Accounting for Units When Sorting :mod:`natsort` does not come with any pre-built mechanism to sort units, but you can write your own `key` to do this. Below, I will demonstrate sorting imperial lengths (e.g. feet an inches), but of course you can extend this to any -set of units you need. This example is based on code from -`this issue <https://github.com/SethMMorton/natsort/issues/100#issuecomment-530659310>`_, +set of units you need. This example is based on code +`from this issue <https://github.com/SethMMorton/natsort/issues/100#issuecomment-530659310>`_, and uses the function :func:`natsort.numeric_regex_chooser` to build a regular expression that will parse numbers in the same manner as :mod:`natsort` itself. 
@@ -426,9 +426,9 @@ If you need a codec different from ASCII or UTF-8, you can use Sorting a Pandas DataFrame -------------------------- -As of Pandas version 0.16.0, the sorting methods do not accept a ``key`` argument, -so you cannot simply pass :func:`natsort_keygen` to a Pandas DataFrame and sort. -This request has been made to the Pandas devs; see +As of Pandas version 0.16.0, the sorting methods do not accept a ``key`` +argument, so you cannot simply pass :func:`natsort_keygen` to a Pandas +DataFrame and sort. This request has been made to the Pandas devs; see `issue 3942 <https://github.com/pydata/pandas/issues/3942>`_ if you are interested. If you need to sort a Pandas DataFrame, please check out `this answer on StackOverflow <https://stackoverflow.com/a/29582718/1399279>`_ diff --git a/docs/howitworks.rst b/docs/howitworks.rst index a8176e3..fb157c6 100644 --- a/docs/howitworks.rst +++ b/docs/howitworks.rst @@ -36,7 +36,7 @@ If I want to compare '2 ft 7 in' to '2 ft 11 in', I might do the following We as humans know that the above should be true, but why does Python think it is false? Here is how it is performing the comparison: -.. code-block:: none +.. code-block:: '2' <=> '2' ==> equal, so keep going ' ' <=> ' ' ==> equal, so keep going @@ -53,18 +53,18 @@ The best way to handle this is to break the string into sub-components of numbers and non-numbers, and then convert the numeric parts into :func:`float` or :func:`int` types. This will force Python to actually understand the context of what it is sorting and then "do the -right thing." Luckily, it handles sorting lists of strings right out-of-the-box, -so the only hard part is actually making this string-to-list transformation -and then Python will handle the rest. +right thing." Luckily, it handles sorting lists of strings right +out-of-the-box, so the only hard part is actually making this string-to-list +transformation and then Python will handle the rest. -.. code-block:: none +.. 
code-block:: '2 ft 7 in' ==> (2, ' ft ', 7, ' in') '2 ft 11 in' ==> (2, ' ft ', 11, ' in') When Python compares the two, it roughly follows the below logic: -.. code-block:: none +.. code-block:: 2 <=> 2 ==> equal, so keep going ' ft ' <=> ' ft ' ==> a string is a special type of sequence - evaluate each character individually @@ -92,10 +92,10 @@ Natsort's Approach Decomposing Strings Into Sub-Components +++++++++++++++++++++++++++++++++++++++ -The first major hurtle to overcome is to decompose the string into sub-components. -Remarkably, this turns out to be the easy part, owing mostly to Python's easy access -to regular expressions. Breaking an arbitrary string based on a pattern is pretty -straightforward. +The first major hurtle to overcome is to decompose the string into +sub-components. Remarkably, this turns out to be the easy part, owing mostly +to Python's easy access to regular expressions. Breaking an arbitrary string +based on a pattern is pretty straightforward. .. code-block:: pycon @@ -106,10 +106,11 @@ straightforward. Clear (assuming you can read regular expressions) and concise. The reason I began developing :mod:`natsort` in the first place was because I -needed to handle the natural sorting of strings containing *real numbers*, not just -unsigned integers as the above example contains. By real numbers, I mean those like -``-45.4920E-23``. :mod:`natsort` can handle just about any number definition; -to that end, here are all the regular expressions used in :mod:`natsort`: +needed to handle the natural sorting of strings containing *real numbers*, not +just unsigned integers as the above example contains. By real numbers, I mean +those like ``-45.4920E-23``. :mod:`natsort` can handle just about any number +definition; to that end, here are all the regular expressions used in +:mod:`natsort`: .. 
code-block:: pycon @@ -120,9 +121,9 @@ to that end, here are all the regular expressions used in :mod:`natsort`: >>> unsigned_float_no_exponent = r'((?:[0-9]+\.?[0-9]*|\.[0-9]+))' >>> signed_float_no_exponent = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+))' -Note that ``"inf"`` and ``"nan"`` are deliberately omitted from the float definition because you -wouldn't want (for example) ``"banana"`` to be converted into ``['ba', 'nan', 'a']``, -Let's see an example: +Note that ``"inf"`` and ``"nan"`` are deliberately omitted from the float +definition because you wouldn't want (for example) ``"banana"`` to be converted +into ``['ba', 'nan', 'a']``, Let's see an example: .. code-block:: pycon @@ -135,21 +136,21 @@ Let's see an example: actual code there is also handling for non-ASCII unicode characters (such as ⑦), but I will ignore that aspect of :mod:`natsort` in this discussion. -Now, when the user wants to change the definition of a number, it is as easy as changing -the pattern supplied to the regular expression engine. - -Choosing the right default is hard, though (well, in this case it shouldn't have been -but I was rather thick-headed). -In retrospect, it should have been obvious that since essentially all the code examples -I had/have seen for natural sorting were for *unsigned integers*, I should have made the default -definition of a number an *unsigned integer*. But, in the brash days of my youth I assumed -that since my use case was real numbers, everyone else would be happier sorting by real numbers; -so, I made the default definition of a number a *signed float with exponent*. -`This astonished`_ `a lot`_ `of people`_ +Now, when the user wants to change the definition of a number, it is as easy as +changing the pattern supplied to the regular expression engine. + +Choosing the right default is hard, though (well, in this case it shouldn't +have been but I was rather thick-headed). 
In retrospect, it should have been +obvious that since essentially all the code examples I had/have seen for +natural sorting were for *unsigned integers*, I should have made the default +definition of a number an *unsigned integer*. But, in the brash days of my +youth I assumed that since my use case was real numbers, everyone else would +be happier sorting by real numbers; so, I made the default definition of a +number a *signed float with exponent*. `This astonished`_ `a lot`_ `of people`_ (`and some people aren't very nice when they are astonished`_). Starting with :mod:`natsort` version 4.0.0 the default number definition was -changed to an *unsigned integer* which satisfies the "least astonishment" principle, and -I have not heard a complaint since. +changed to an *unsigned integer* which satisfies the "least astonishment" +principle, and I have not heard a complaint since. Coercing Strings Containing Numbers Into Numbers ++++++++++++++++++++++++++++++++++++++++++++++++ @@ -193,28 +194,29 @@ Here are some timing results run on my machine: In [5]: %timeit [coerce_regex(x) for x in numbers] 10000 loops, best of 3: 123 µs per loop -What can we learn from this? The ``try: except`` method (arguably the most "pythonic" -of the solutions) is best for numeric input, but performs over 5X slower for non-numeric -input. Conversely, the regular expression method, though slower than ``try: except`` for -both input types, is more efficient for non-numeric input than for input that can be -converted to an ``int``. Further, even though the regular expression method is slower -for both input types, it is always at least twice as fast as the worst case for the -``try: except``. - -Why do I care? Shouldn't I just pick a method and not worry about it? Probably. However, -I am very conscious about the performance of :mod:`natsort`, and want it to be a true -drop-in replacement for :func:`sorted` without having to incur a performance penalty. 
-For the purposes of :mod:`natsort`, there is no clear winner between the two algorithms - -the data being passed to this function will likely be a mix of numeric and non-numeric -string content. Do I use the ``try: except`` method and hope the speed gains on -numbers will offset the non-number performance, or do I use regular expressions and -take the more stable performance? +What can we learn from this? The ``try: except`` method (arguably the most +"pythonic" of the solutions) is best for numeric input, but performs over 5X +slower for non-numeric input. Conversely, the regular expression method, though +slower than ``try: except`` for both input types, is more efficient for +non-numeric input than for input that can be converted to an ``int``. Further, +even though the regular expression method is slower for both input types, it is +always at least twice as fast as the worst case for the ``try: except``. + +Why do I care? Shouldn't I just pick a method and not worry about it? Probably. +However, I am very conscious about the performance of :mod:`natsort`, and want +it to be a true drop-in replacement for :func:`sorted` without having to incur +a performance penalty. For the purposes of :mod:`natsort`, there is no clear +winner between the two algorithms - the data being passed to this function will +likely be a mix of numeric and non-numeric string content. Do I use the +``try: except`` method and hope the speed gains on numbers will offset the +non-number performance, or do I use regular expressions and take the more +stable performance? It turns out that within the context of :mod:`natsort`, some assumptions can be made that make a hybrid approach attractive. Because all strings are pre-split -into numeric and non-numeric content *before* being passed to this coercion function, -the assumption can be made that *if a string begins with a digit or a sign, it -can be coerced into a number*. 
+into numeric and non-numeric content *before* being passed to this coercion +function, the assumption can be made that *if a string begins with a digit or a +sign, it can be coerced into a number*. .. code-block:: pycon @@ -238,9 +240,9 @@ So how does this perform compared to the standard coercion methods? In [7]: %timeit [coerce_to_int(x) for x in not_numbers] 10000 loops, best of 3: 26.4 µs per loop -The hybrid method eliminates most of the time wasted on numbers checking that it -is in fact a number before passing to :func:`int`, and eliminates the time wasted -in the exception stack for input that is not a number. +The hybrid method eliminates most of the time wasted on numbers checking +that it is in fact a number before passing to :func:`int`, and eliminates +the time wasted in the exception stack for input that is not a number. That's as fast as we can get, right? In pure Python, probably. At least, it's close. But because I am crazy and a glutton for punishment, I decided to see @@ -257,12 +259,12 @@ called :func:`fast_int`. How does it fair? Pretty well. 10000 loops, best of 3: 30 µs per loop During development of :mod:`natsort`, I wanted to ensure that using it did not -get in the way of a user's program by introducing a performance penalty to their code. -To that end, I do not feel like my adventures down the rabbit hole of optimization -of coercion functions was a waste; I can confidently look users in the eye and -say I considered every option in ensuring :mod:`natsort` is as efficient as possible. -This is why if `fastnumbers`_ is installed it will be used for this step, -and otherwise the hybrid method will be used. +get in the way of a user's program by introducing a performance penalty to +their code. To that end, I do not feel like my adventures down the rabbit hole +of optimization of coercion functions was a waste; I can confidently look users +in the eye and say I considered every option in ensuring :mod:`natsort` is as +efficient as possible. 
This is why if `fastnumbers`_ is installed it will be +used for this step, and otherwise the hybrid method will be used. .. note:: @@ -392,11 +394,11 @@ filename component as well. We can solve that nicely and quickly with >>> sorted(paths, key=natsort_key_with_path_support) ['/p/Folder/file.tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (1)/file (1).tar.gz', '/p/Folder (10)/file.tar.gz'] -This works because in addition to breaking the input by path separators, the final -filename component is separated from its extensions as well [#f1]_. *Then*, each of these -separated components is sent to the :mod:`natsort` algorithm, so the result is -a tuple of tuples. Once that is done, we can see how comparisons can be done in -the expected manner. +This works because in addition to breaking the input by path separators, +the final filename component is separated from its extensions as well +[#f1]_. *Then*, each of these separated components is sent to the +:mod:`natsort` algorithm, so the result is a tuple of tuples. Once that +is done, we can see how comparisons can be done in the expected manner. .. code-block:: pycon @@ -455,22 +457,24 @@ Let's break these down. #. ``natsort_key_with_poor_real_number_support('12 apples') < natsort_key_with_poor_real_number_support('apples')`` is the same as ``(12.0, ' apples') < ('apples',)``, and thus a number gets compared to a string [#f2]_ which also is a no-no. -#. This one scores big on the astonishment scale, especially if one accidentally - uses signed integers or real numbers when they mean to use unsigned integers. +#. This one scores big on the astonishment scale, especially if one + accidentally uses signed integers or real numbers when they mean + to use unsigned integers. 
``natsort_key_with_poor_real_number_support('version5.3.0') < natsort_key_with_poor_real_number_support('version5.3rc1')`` - is the same as ``('version', 5.3, 0.0) < ('version', 5.3, 'rc', 1.0)``, so in the - third element a number gets compared to a string, once again the same - old no-no. (The same would happen with ``'version5-3'`` and ``'version5-a'``, - which would be come ``('version', 5, -3)`` and ``('version', 5, '-a')``). - -As you might expect, the solution to the first issue is to wrap the ``re.split`` -call in a ``try: except:`` block and handle the number specially if a -:exc:`TypeError` is raised. The second and third cases *could* be handled + is the same as ``('version', 5.3, 0.0) < ('version', 5.3, 'rc', 1.0)``, + so in the third element a number gets compared to a string, once again + the same old no-no. (The same would happen with ``'version5-3'`` and + ``'version5-a'``, which would become ``('version', 5, -3)`` and + ``('version', 5, '-a')``). + +As you might expect, the solution to the first issue is to wrap the +``re.split`` call in a ``try: except:`` block and handle the number specially +if a :exc:`TypeError` is raised. The second and third cases *could* be handled in a "special case" manner, meaning only respond and do something different if these problems are detected. But a less error-prone method is to ensure that the data is correct-by-construction, and this can be done by ensuring that the returned tuples *always* start with a string, and then alternate -in a string-number-string-number-string patter;n this can be achieved by +in a string-number-string-number-string pattern; this can be achieved by adding an empty string wherever the pattern is not followed [#f3]_. This ends up working out pretty nicely because empty strings are always "less" than any non-empty string, and we typically want numbers to come before strings. @@ -501,7 +505,8 @@ Let's take a look at how this works out. 
>>> sorted(['version5.3.0', 'version5.3rc1'], key=natsort_key_with_good_real_number_support) ['version5.3.0', 'version5.3rc1'] -How the "good" version works will be given in `TL;DR 2 - Handling Crappy, Real-World Input`_. +How the "good" version works will be given in +`TL;DR 2 - Handling Crappy, Real-World Input`_. Handling NaN ++++++++++++ @@ -548,7 +553,8 @@ to know how **NaN** will behave in a sorting algorithm). The simplest way to satisfy the "least astonishment" principle is to substitute **NaN** with some other value. But what value is *least* astonishing? I chose to replace **NaN** with :math:`-\infty` so that these poorly behaved elements always -end up at the front where the users will most likely be alerted to their presence. +end up at the front where the users will most likely be alerted to their +presence. .. code-block:: pycon @@ -571,6 +577,8 @@ Let's see how our elegant key function from :ref:`TL;DR 1 <tldr1>` has become bastardized in order to support handling mixed real-world data and user customizations. +.. code-block:: pycon + >>> def natsort_key(x, as_float=False, signed=False, as_path=False): ... if as_float: ... regex = signed_float if signed else unsigned_float @@ -600,10 +608,10 @@ and user customizations. ... return tuple(sep_inserter(coerced_input, '')) ... -And this doesn't even show handling :class:`bytes` type! Notice that we have +And this doesn't even show handling :class:`bytes` type! Notice that we have to do non-obvious things like modify the return form of numbers when ``as_path`` -is given, just to avoid comparing strings and numbers for the case in which a user provides -input like ``['/home/me', 42]``. +is given, just to avoid comparing strings and numbers for the case in which a +user provides input like ``['/home/me', 42]``. Let's take it out for a spin! 
@@ -629,9 +637,10 @@ Probably the most challenging special case I had to handle was getting :mod:`natsort` to handle sorting the non-numerical parts of input correctly, and also allowing it to sort the numerical bits in different locales. This was in no way what I originally set out to do with this -library, so I was `caught a bit off guard when the request was initially made`_. -I discovered the :mod:`locale` library, and assumed that if it's part of Python's -StdLib there can't be too many dragons, right? +library, so I was +`caught a bit off guard when the request was initially made`_. +I discovered the :mod:`locale` library, and assumed that if it's part of +Python's StdLib there can't be too many dragons, right? .. admonition:: INCOMPLETE LIST OF DRAGONS @@ -653,9 +662,11 @@ These can be summed up as follows: #. :mod:`locale` is a thin wrapper over your operating system's *locale* library, so if *that* is broken (like it is on BSD and OSX) then :mod:`locale` is broken in Python. -#. Because of a bug in legacy Python (i.e. Python 2), there is no uniform way to use - the :mod:`locale` sorting functionality between legacy Python and Python 3. -#. People have differing opinions of how capitalization should affect word order. +#. Because of a bug in legacy Python (i.e. Python 2), there is no uniform + way to use the :mod:`locale` sorting functionality between legacy Python + and Python 3. +#. People have differing opinions of how capitalization should affect word + order. #. There is no built-in way to handle locale-dependent thousands separators and decimal points *robustly*. #. Proper handling of Unicode is complicated. @@ -692,7 +703,8 @@ so all capitalized words appear first. Not everyone agrees that this is the correct order. Some believe that the capitalized words should be last (``['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']``). 
 Some believe that both the lowercase and uppercase versions
-should appear together (``['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']``).
+should appear together
+(``['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']``).
 Some believe that both should be true ☹. Some people don't care at all [#f4]_.

 Solving the first case (I call it *LOWERCASEFIRST*) is actually pretty
@@ -787,7 +799,6 @@ Unicode is hard and complicated. Here's an example.
 >>> sorted(a) # doctest: +SKIP
 ['a', 'e', 'é', 'f', 'z', 'é']

-
 There are more than one way to represent the character 'é' in Unicode.
 In fact, many characters have multiple representations. This is a challenge
 because comparing the two representations would return ``False`` even though
@@ -806,12 +817,14 @@ The original approach that :mod:`natsort` took with respect to non-ASCII
 Unicode characters was to say "just use the :mod:`locale` or :mod:`PyICU`
 library" and then cross it's fingers and hope those libraries take care of
 it. As you will find in the following
-sections, that comes with its own baggage, and turned out to not always work anyway
-(see https://stackoverflow.com/q/45734562/1399279). A more robust approach is to
-handle the Unicode out-of-the-box without invoking a heavy-handed library
-like :mod:`locale` or :mod:`PyICU`. To do this, we must use *normalization*.
-
-To fully understand Unicode normalization, `check out some official Unicode documentation`_.
+sections, that comes with its own baggage, and turned out to not always work
+anyway (see https://stackoverflow.com/q/45734562/1399279). A more robust
+approach is to handle the Unicode out-of-the-box without invoking a
+heavy-handed library like :mod:`locale` or :mod:`PyICU`.
+To do this, we must use *normalization*.
+
+To fully understand Unicode normalization,
+`check out some official Unicode documentation`_.
 Just kidding... that's too much text.
 The following StackOverflow answers do
 a good job at explaining Unicode normalization in simple terms:
 https://stackoverflow.com/a/7934397/1399279 and
@@ -1076,11 +1089,12 @@ what the rest of the world assumes.
 :func:`sep_inserter` in `util.py`_.

 .. [#f4] Handling each of these is straightforward, but coupled with the rapidly
- fracturing execution paths presented in :ref:`TL;DR 2 <tldr2>` one can imagine
- this will get out of hand quickly. If you take a look at `natsort.py`_ and
- `util.py`_ you can observe that to avoid this I take a more functional approach
- to construting the :mod:`natsort` algorithm as opposed to the procedural approach
- illustrated in :ref:`TL;DR 1 <tldr1>` and :ref:`TL;DR 2 <tldr2>`.
+ fracturing execution paths presented in :ref:`TL;DR 2 <tldr2>` one can
+ imagine this will get out of hand quickly. If you take a look at
+ `natsort.py`_ and `util.py`_ you can observe that to avoid this I take
+ a more functional approach to construting the :mod:`natsort` algorithm
+ as opposed to the procedural approach illustrated in
+ :ref:`TL;DR 1 <tldr1>` and :ref:`TL;DR 2 <tldr2>`.

 .. _ASCII table: https://www.asciitable.com/
 .. _getting sorting right is surprisingly hard: http://www.compciv.org/guides/python/fundamentals/sorting-collections-with-sorted/
diff --git a/docs/intro.rst b/docs/intro.rst
index e5905fc..0a25a35 100644
--- a/docs/intro.rst
+++ b/docs/intro.rst
@@ -17,11 +17,11 @@ Simple yet flexible natural sorting in Python.
 **NOTE**: Please see the `Deprecation Schedule`_ section for changes in
 :mod:`natsort` version 6.0.0 and in the upcoming version 7.0.0.

-:mod:`natsort` is a general utility for sorting lists *naturally*; the definition
-of "naturally" is not well-defined, but the most common definition is that numbers
-contained within the string should be sorted as numbers and not as you would
-other characters. If you need to present sorted output to a user, you probably
-want to sort it naturally.
+:mod:`natsort` is a general utility for sorting lists *naturally*; the
+definition of "naturally" is not well-defined, but the most common definition
+is that numbers contained within the string should be sorted as numbers and not
+as you would other characters. If you need to present sorted output to a user,
+you probably want to sort it naturally.

 :mod:`natsort` was initially created for sorting scientific output filenames that
 contained signed floating point numbers in the names. There was a lack of
@@ -32,8 +32,9 @@ and its answers and links therein,
 `this ActiveState forum <https://code.activestate.com/recipes/285264-natural-string-sorting/>`_,
 and of course `this great article on natural sorting <https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/>`_
 from CodingHorror.com for examples of what I mean.
-:mod:`natsort` was created to fill in this gap, but has since expanded to handle
-just about any definition of a number, as well as other sorting customizations.
+:mod:`natsort` was created to fill in this gap, but has since expanded to
+handle just about any definition of a number, as well as other sorting
+customizations.

 Quick Description
 -----------------
@@ -183,8 +184,8 @@ bitwise OR operator (``|``). For example,
 All of the available customizations can be found in the documentation for
 the :class:`~natsort.ns` enum.

-You can also add your own custom transformation functions with the ``key`` argument.
-These can be used with ``alg`` if you wish:
+You can also add your own custom transformation functions with the ``key``
+argument. These can be used with ``alg`` if you wish:

 .. code-block:: pycon

@@ -246,8 +247,9 @@ method.
 >>> a
 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']

-All of the algorithm customizations mentioned in the `Further Customizing Natsort`_
-section can also be applied to :func:`~natsort_keygen` through the *alg* keyword option.
+All of the algorithm customizations mentioned in the
+`Further Customizing Natsort`_ section can also be applied to :func:`~natsort_keygen`
+through the *alg* keyword option.

 Other Useful Things
 +++++++++++++++++++
@@ -263,18 +265,19 @@ FAQ

 How do I debug :func:`~natsorted`?
 The best way to debug :func:`~natsorted` is to generate a key using :func:`~natsort_keygen`
- with the same options being passed to :func:`~natsorted`. One can take a look at
- exactly what is being done with their input using this key - it is highly recommended
- to `look at this issue describing how to debug <https://github.com/SethMMorton/natsort/issues/13#issuecomment-50422375>`_
+ with the same options being passed to :func:`~natsorted`. One can take a
+ look at exactly what is being done with their input using this key - it is
+ highly recommended to `look at this issue describing how to debug <https://github.com/SethMMorton/natsort/issues/13#issuecomment-50422375>`_
 for *how* to debug, and also to review the :ref:`howitworks` page for *why*
 :mod:`natsort` is doing that to your data.

- If you are trying to sort custom classes and running into trouble, please take a look at
- https://github.com/SethMMorton/natsort/issues/60. In short,
+ If you are trying to sort custom classes and running into trouble, please
+ take a look at https://github.com/SethMMorton/natsort/issues/60. In short,
 custom classes are not likely to be sorted correctly if one relies
- on the behavior of ``__lt__`` and the other rich comparison operators in their
- custom class - it is better to use a ``key`` function with :mod:`natsort`, or
- use the :mod:`natsort` key as part of your rich comparison operator definition.
+ on the behavior of ``__lt__`` and the other rich comparison operators in
+ their custom class - it is better to use a ``key`` function with
+ :mod:`natsort`, or use the :mod:`natsort` key as part of your rich
+ comparison operator definition.

 How *does* :mod:`natsort` work?
 If you don't want to read :ref:`howitworks`, here is a quick primer.
@@ -318,8 +321,8 @@ How *does* :mod:`natsort` work?
 Shell script
 ------------

-:mod:`natsort` comes with a shell script called :mod:`natsort`, or can also be called
-from the command line with ``python -m natsort``.
+:mod:`natsort` comes with a shell script called :mod:`natsort`, or can also be
+called from the command line with ``python -m natsort``.

 Requirements
 ------------
@@ -335,9 +338,9 @@ fastnumbers
 The most efficient sorting can occur if you install the
 `fastnumbers <https://pypi.org/project/fastnumbers>`_ package
 (version >=2.0.0); it helps with the string to number conversions.
-:mod:`natsort` will still run (efficiently) without the package, but if you need
-to squeeze out that extra juice it is recommended you include this as a dependency.
-:mod:`natsort` will not require (or check) that
+:mod:`natsort` will still run (efficiently) without the package, but if you
+need to squeeze out that extra juice it is recommended you include this as a
+dependency. :mod:`natsort` will not require (or check) that
 `fastnumbers <https://pypi.org/project/fastnumbers>`_ is installed
 at installation.

@@ -373,20 +376,22 @@ at installation time to install those dependencies as well - use ``fast`` for
 How to Run Tests
 ----------------

-Please note that :mod:`natsort` is NOT set-up to support ``python setup.py test``.
+Please note that :mod:`natsort` is NOT set-up to support
+``python setup.py test``.

 The recommended way to run tests is with `tox <https://tox.readthedocs.io/en/latest/>`_.
-After installing ``tox``, running tests is as simple as executing the following in the
-``natsort`` directory:
+After installing ``tox``, running tests is as simple as executing the following
+in the ``natsort`` directory:

 .. code-block:: sh

 $ tox

-``tox`` will create virtual a virtual environment for your tests and install all the
-needed testing requirements for you.
You can specify a particular python version
-with the ``-e`` flag, e.g. ``tox -e py36``. Static analysis is done with ``tox -e flake8``.
-You can see all available testing environments with ``tox --listenvs``.
+``tox`` will create virtual a virtual environment for your tests and install
+all the needed testing requirements for you. You can specify a particular
+python version with the ``-e`` flag, e.g. ``tox -e py36``. Static analysis is
+done with ``tox -e flake8``. You can see all available testing environments
+with ``tox --listenvs``.

 If you do not wish to use ``tox``, you can install the testing dependencies with the
 ``dev/requirements.txt`` file and then run the tests manually using
@@ -403,7 +408,8 @@ Note that above I invoked ``python -m pytest`` instead of just ``pytest`` - this
 How to Build Documentation
 --------------------------

-If you want to build the documentation for :mod:`natsort`, it is recommended to use ``tox``:
+If you want to build the documentation for :mod:`natsort`, it is recommended to
+use ``tox``:

 .. code-block:: console

@@ -425,10 +431,10 @@ Dropping Python 2.7 Support

 :mod:`natsort` version 7.0.0 will drop support for Python 2.7.

-The version 6.X branch will remain as a "long term support" branch where bug fixes
-are applied so that users who cannot update from Python 2.7 will not be forced to
-use a buggy :mod:`natsort` version. Once version 7.0.0 is released, new features
-will not be added to version 6.X, only bug fixes.
+The version 6.X branch will remain as a "long term support" branch where bug
+fixes are applied so that users who cannot update from Python 2.7 will not be
+forced to use a buggy :mod:`natsort` version. Once version 7.0.0 is released,
+new features will not be added to version 6.X, only bug fixes.
 Deprecated APIs
 +++++++++++++++
@@ -443,19 +449,21 @@ In :mod:`natsort` version 6.0.0, the following APIs and functions were removed
 - ``ns.TYPESAFE`` (deprecated since version 5.0.0)
 - ``ns.DIGIT`` (deprecated since version 5.0.0)
 - ``ns.VERSION`` (deprecated since version 5.0.0)
- - :func:`~natsort.versorted` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
- - :func:`~natsort.index_versorted` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
+ - :func:`~natsort.versorted` (discouraged since version 4.0.0,
+ officially deprecated since version 5.5.0)
+ - :func:`~natsort.index_versorted` (discouraged since version 4.0.0,
+ officially deprecated since version 5.5.0)

-In general, if you want to determine if you are using deprecated APIs you can run your
-code with the following flag
+In general, if you want to determine if you are using deprecated APIs you can
+run your code with the following flag

 .. code-block:: console

 $ python -Wdefault::DeprecationWarning my-code.py

-By default :exc:`DeprecationWarnings` are not shown, but this will cause them to be shown.
-Alternatively, you can just set the environment variable ``PYTHONWARNINGS`` to
-"default::DeprecationWarning" and then run your code.
+By default :exc:`DeprecationWarnings` are not shown, but this will cause them
+to be shown. Alternatively, you can just set the environment variable
+``PYTHONWARNINGS`` to "default::DeprecationWarning" and then run your code.
 Dropped Pipenv for Development
 ++++++++++++++++++++++++++++++
diff --git a/docs/locale_issues.rst b/docs/locale_issues.rst
index 88cf3b8..427eedc 100644
--- a/docs/locale_issues.rst
+++ b/docs/locale_issues.rst
@@ -9,9 +9,9 @@ Possible Issues with :func:`~natsort.humansorted` or ``ns.LOCALE``
 Being Locale-Aware Means Both Numbers and Non-Numbers
 -----------------------------------------------------

-In addition to modifying how characters are sorted, ``ns.LOCALE`` will take into
-account locale-dependent thousands separators (and locale-dependent decimal
-separators if ``ns.FLOAT`` is enabled). This means that if you are in a
+In addition to modifying how characters are sorted, ``ns.LOCALE`` will take
+into account locale-dependent thousands separators (and locale-dependent
+decimal separators if ``ns.FLOAT`` is enabled). This means that if you are in a
 locale that uses commas as the thousands separator, a number like
 ``123,456`` will be interpreted as ``123456``. If this is not what you want,
 you may consider using ``ns.LOCALEALPHA`` which will only enable locale-aware
@@ -52,8 +52,8 @@ installed, please keep the following known problems and issues in mind.
 .. note:: Remember, if you have `PyICU`_ installed you shouldn't need to worry
 about any of these.

-Explicitly Set the Locale Before Using :func:`~natsort.humansorted` or ``ns.LOCALE``
-++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+Explicitly Set the Locale Before Using ``ns.LOCALE``
+++++++++++++++++++++++++++++++++++++++++++++++++++++

 I have found that unless you explicitly set a locale, the sorted order may not
 be what you expect. Setting this is straightforward
@@ -90,8 +90,8 @@ install this there is some hope.
 a built-in lookup table of thousands separators that are incorrect on
 OS X/BSD (but is possible it is not complete... please file an
 issue if you see it is not complete)
- 2. Use "\*.ISO8859-1" locale (i.e. 'en_US.ISO8859-1') rather than "\*.UTF-8"
- locale.
I have found that these have fewer issues than "UTF-8", but
- your mileage may vary.
+ 2. Use "\*.ISO8859-1" locale (i.e. 'en_US.ISO8859-1') rather than
+ "\*.UTF-8" locale. I have found that these have fewer issues than
+ "UTF-8", but your mileage may vary.

 .. _PyICU: https://pypi.org/project/PyICU
diff --git a/docs/shell.rst b/docs/shell.rst
index 8a17ccc..0d7d3c9 100644
--- a/docs/shell.rst
+++ b/docs/shell.rst
@@ -14,7 +14,7 @@ Below is the usage and some usage examples for the ``natsort`` shell script.
 Usage
 -----

-.. code-block:: none
+.. code-block::

 usage: natsort [-h] [--version] [-p] [-f LOW HIGH] [-F LOW HIGH] [-e EXCLUDE]
 [-r] [-t {digit,int,float,version,ver}] [--nosign] [--noexp]
@@ -81,7 +81,7 @@ named after the parameter used:
 $ ls *.out
 mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out

-(Obviously, in reality there would be more files, but you get the idea.) Notice
+(Obviously, in reality there would be more files, but you get the idea.) Notice
 that the shell sorts in lexicographical order. This is the behavior of programs
 like ``find`` as well as ``ls``. The problem is passing these files to an
 analysis program causes them not to appear in numerical order, which can lead
@@ -114,8 +114,8 @@ To sort version numbers, use the default ``--number-type``:
 prog-1.10.zip
 prog-2.0.zip

-In general, all ``natsort`` shell script options mirror the :func:`~natsorted` API,
-with notable exception of the ``--filter``, ``--reverse-filter``, and ``--exclude``
+In general, all ``natsort`` shell script options mirror the :func:`~natsorted`
+API, with notable exception of the ``--filter``, ``--reverse-filter``, and ``--exclude``
 options. These three options are used as follows:

 .. code-block:: console
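
As a closing illustration of the ``docs/locale_issues.rst`` hunk above, which advises explicitly setting a locale before relying on locale-aware sorting, here is a minimal sketch using only the standard-library :mod:`locale` module. It uses ``locale.strxfrm`` as a stand-in collation key and is not how :mod:`natsort` implements ``ns.LOCALE``. The helper name ``sort_with_locale`` is hypothetical, and which locale names are available (other than ``"C"``) is an assumption about your system.

```python
import locale


def sort_with_locale(words, locale_name=""):
    """Hypothetical helper: set LC_ALL explicitly, sort, then restore it.

    An empty locale_name asks for the user's default locale from the
    environment; "C" is the portable fallback that always exists.
    """
    previous = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, locale_name)
        # strxfrm transforms each string into a locale-aware collation key.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_ALL, previous)


words = ["banana", "Apple", "apple"]
in_c_locale = sort_with_locale(words, "C")
```

Under the ``"C"`` locale the result is plain ordinal order, so capitalized words come first. Passing a name such as ``'en_US.UTF-8'`` would typically group ``apple`` and ``Apple`` together, but only if that locale is installed, and as the hunk notes, results on OS X/BSD may differ.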