Added documentation about interaction between `subqueryload` and LIMIT/OFFSET.

author: Jack Zhou <univerio@gmail.com> 2014-07-29 11:49:52 -0700
committer: Jack Zhou <univerio@gmail.com> 2014-08-07 12:08:43 -0700
commit: 9e621d18aa84bf20640283e1e7a4abd17af47df9 (patch)
tree: 336e87c07112658ded9b869d682bf242a6e78ebf
parent: e5620993bf89338959a0cce4ba567d92b1dd1821 (diff)
download: sqlalchemy-9e621d18aa84bf20640283e1e7a4abd17af47df9.tar.gz
3 files changed, 105 insertions, 0 deletions
diff --git a/doc/build/faq.rst b/doc/build/faq.rst
index 0c8314cb5..d642d1de8 100644
--- a/doc/build/faq.rst
+++ b/doc/build/faq.rst
@@ -622,6 +622,90 @@ The same idea applies to all the other arguments, such as ``foreign_keys``::
 
         foo = relationship(Dest, foreign_keys=[foo_id, bar_id])
 
+.. _faq_subqueryload_sort:
+
+Why must I always ``ORDER BY`` a unique column when using ``subqueryload``?
+----------------------------------------------------------------------------
+
+The SQL standard prescribes that RDBMSs are free to return rows in any order it
+deems appropriate, if no ``ORDER BY`` clause is specified. This even extends to
+the case where the ``ORDER BY`` clause is not unique across all rows, i.e. rows
+with the same value in the ``ORDER BY`` column(s) will not necessarily be
+returned in a deterministic order.
+
+SQLAlchemy implements :func:`.orm.subqueryload` by issuing a separate query
+(where the table specified in the relationship is joined to the original query)
+and then attempting to match up the results in Python. This works fine
+normally:
+
+.. sourcecode:: python+sql
+
+    >>> session.query(User).options(subqueryload(User.addresses)).all()
+    {opensql}# the "main" query
+    SELECT users.id AS users_id
+    FROM users
+    {stop}
+    {opensql}# the "load" query issued by subqueryload
+    SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
+    FROM (SELECT users.id AS users_id
+    FROM users) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id
+
+Notice how the main query is a subquery in the load query. When an
+``OFFSET``/``LIMIT`` is involved, however, things get a bit tricky:
+
+.. sourcecode:: python+sql
+
+    >>> user = session.query(User).options(subqueryload(User.addresses)).first()
+    {opensql}# the "main" query
+    SELECT users.id AS users_id
+    FROM users
+     LIMIT 1
+    {stop}
+    {opensql}# the "load" query issued by subqueryload
+    SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
+    FROM (SELECT users.id AS users_id
+    FROM users
+     LIMIT 1) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id
+
+The main query is still a subquery in the load query, but *it may return a
+different set of results in the second query from the first* because it does
+not have a deterministic sort order! Depending on database internals, there is
+a chance we may get the following resultset for the two queries::
+
+    +--------+
+    |users_id|
+    +--------+
+    |       1|
+    +--------+
+
+    +------------+-----------------+---------------+
+    |addresses_id|addresses_user_id|anon_1_users_id|
+    +------------+-----------------+---------------+
+    |           3|                2|              2|
+    +------------+-----------------+---------------+
+    |           4|                2|              2|
+    +------------+-----------------+---------------+
+
+From SQLAlchemy's point of view, it didn't get any addresses back for user 1,
+so ``user.addresses`` is empty. Oops.
+
+The solution to this problem is to always specify a deterministic sort order,
+so that the main query always returns the same set of rows. This generally
+means that you should :meth:`.Query.order_by` on a unique column on the table,
+usually the primary key::
+
+    session.query(User).options(subqueryload(User.addresses)).order_by(User.id).first()
+
+You can get away with not doing a sort if the ``OFFSET``/``LIMIT`` does not
+throw away any rows at all, but it's much simpler to remember to always ``ORDER
+BY`` the primary key::
+
+    session.query(User).options(subqueryload(User.addresses)).filter(User.id == 1).first()
+
+Note that :func:`.joinedload` does not suffer from the same problem because
+only one query is ever issued, so the load query cannot be different from the
+main query.
+
 Performance
 ===========
 
diff --git a/doc/build/orm/loading.rst b/doc/build/orm/loading.rst
index 6c2fac004..27846b9b2 100644
--- a/doc/build/orm/loading.rst
+++ b/doc/build/orm/loading.rst
@@ -120,6 +120,21 @@ query options:
     # set children to load eagerly with a second statement
     session.query(Parent).options(subqueryload('children')).all()
 
+.. _subquery_loading_tips:
+
+Subquery Loading Tips
+^^^^^^^^^^^^^^^^^^^^^
+
+If you have ``LIMIT`` or ``OFFSET`` in your query, you **must** ``ORDER BY`` a
+unique column, generally the primary key of your table, in order to ensure
+correct results (see :ref:`faq_subqueryload_sort`)::
+
+    # incorrect
+    session.query(User).options(subqueryload(User.addresses)).order_by(User.name).first()
+
+    # correct
+    session.query(User).options(subqueryload(User.addresses)).order_by(User.name, User.id).first()
+
 Loading Along Paths
 -------------------
 
diff --git a/doc/build/orm/tutorial.rst b/doc/build/orm/tutorial.rst
index f90dc48d2..e75eda1ee 100644
--- a/doc/build/orm/tutorial.rst
+++ b/doc/build/orm/tutorial.rst
@@ -1703,6 +1703,12 @@ very easy to use:
     >>> jack.addresses
     [<Address(email_address='jack@google.com')>, <Address(email_address='j25@yahoo.com')>]
 
+.. warning::
+
+   If you use :func:`.subqueryload`, you should generally
+   :meth:`.Query.order_by` on a unique column in order to ensure correct
+   results. See :ref:`subquery_loading_tips`.
+
 Joined Load
 -------------
author	Jack Zhou <univerio@gmail.com>	2014-07-29 11:49:52 -0700
committer	Jack Zhou <univerio@gmail.com>	2014-08-07 12:08:43 -0700
commit	9e621d18aa84bf20640283e1e7a4abd17af47df9 (patch)
tree	336e87c07112658ded9b869d682bf242a6e78ebf
parent	e5620993bf89338959a0cce4ba567d92b1dd1821 (diff)
download	sqlalchemy-9e621d18aa84bf20640283e1e7a4abd17af47df9.tar.gz