summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJack Zhou <univerio@gmail.com>2014-07-29 11:49:52 -0700
committerJack Zhou <univerio@gmail.com>2014-08-07 12:08:43 -0700
commit9e621d18aa84bf20640283e1e7a4abd17af47df9 (patch)
tree336e87c07112658ded9b869d682bf242a6e78ebf
parente5620993bf89338959a0cce4ba567d92b1dd1821 (diff)
downloadsqlalchemy-9e621d18aa84bf20640283e1e7a4abd17af47df9.tar.gz
Added documentation about interaction between `subqueryload` and LIMIT/OFFSET.
-rw-r--r--doc/build/faq.rst84
-rw-r--r--doc/build/orm/loading.rst15
-rw-r--r--doc/build/orm/tutorial.rst6
3 files changed, 105 insertions, 0 deletions
diff --git a/doc/build/faq.rst b/doc/build/faq.rst
index 0c8314cb5..d642d1de8 100644
--- a/doc/build/faq.rst
+++ b/doc/build/faq.rst
@@ -622,6 +622,90 @@ The same idea applies to all the other arguments, such as ``foreign_keys``::
foo = relationship(Dest, foreign_keys=[foo_id, bar_id])
+.. _faq_subqueryload_sort:
+
+Why must I always ``ORDER BY`` a unique column when using ``subqueryload``?
+----------------------------------------------------------------------------
+
+The SQL standard prescribes that RDBMSs are free to return rows in any order it
+deems appropriate, if no ``ORDER BY`` clause is specified. This even extends to
+the case where the ``ORDER BY`` clause is not unique across all rows, i.e. rows
+with the same value in the ``ORDER BY`` column(s) will not necessarily be
+returned in a deterministic order.
+
+SQLAlchemy implements :func:`.orm.subqueryload` by issuing a separate query
+(where the table specified in the relationship is joined to the original query)
+and then attempting to match up the results in Python. This works fine
+normally:
+
+.. sourcecode:: python+sql
+
+ >>> session.query(User).options(subqueryload(User.addresses)).all()
+ {opensql}# the "main" query
+ SELECT users.id AS users_id
+ FROM users
+ {stop}
+ {opensql}# the "load" query issued by subqueryload
+ SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
+ FROM (SELECT users.id AS users_id
+ FROM users) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id
+
+Notice how the main query is a subquery in the load query. When an
+``OFFSET``/``LIMIT`` is involved, however, things get a bit tricky:
+
+.. sourcecode:: python+sql
+
+ >>> user = session.query(User).options(subqueryload(User.addresses)).first()
+ {opensql}# the "main" query
+ SELECT users.id AS users_id
+ FROM users
+ LIMIT 1
+ {stop}
+ {opensql}# the "load" query issued by subqueryload
+ SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
+ FROM (SELECT users.id AS users_id
+ FROM users
+ LIMIT 1) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id
+
+The main query is still a subquery in the load query, but *it may return a
+different set of results in the second query from the first* because it does
+not have a deterministic sort order! Depending on database internals, there is
+a chance we may get the following resultset for the two queries::
+
+ +--------+
+ |users_id|
+ +--------+
+ | 1|
+ +--------+
+
+ +------------+-----------------+---------------+
+ |addresses_id|addresses_user_id|anon_1_users_id|
+ +------------+-----------------+---------------+
+ | 3| 2| 2|
+ +------------+-----------------+---------------+
+ | 4| 2| 2|
+ +------------+-----------------+---------------+
+
+From SQLAlchemy's point of view, it didn't get any addresses back for user 1,
+so ``user.addresses`` is empty. Oops.
+
+The solution to this problem is to always specify a deterministic sort order,
+so that the main query always returns the same set of rows. This generally
+means that you should :meth:`.Query.order_by` on a unique column on the table,
+usually the primary key::
+
+ session.query(User).options(subqueryload(User.addresses)).order_by(User.id).first()
+
+You can get away with not doing a sort if the ``OFFSET``/``LIMIT`` does not
+throw away any rows at all, but it's much simpler to remember to always ``ORDER
+BY`` the primary key::
+
+ session.query(User).options(subqueryload(User.addresses)).filter(User.id == 1).first()
+
+Note that :func:`.joinedload` does not suffer from the same problem because
+only one query is ever issued, so the load query cannot be different from the
+main query.
+
Performance
===========
diff --git a/doc/build/orm/loading.rst b/doc/build/orm/loading.rst
index 6c2fac004..27846b9b2 100644
--- a/doc/build/orm/loading.rst
+++ b/doc/build/orm/loading.rst
@@ -120,6 +120,21 @@ query options:
# set children to load eagerly with a second statement
session.query(Parent).options(subqueryload('children')).all()
+.. _subquery_loading_tips:
+
+Subquery Loading Tips
+^^^^^^^^^^^^^^^^^^^^^
+
+If you have ``LIMIT`` or ``OFFSET`` in your query, you **must** ``ORDER BY`` a
+unique column, generally the primary key of your table, in order to ensure
+correct results (see :ref:`faq_subqueryload_sort`)::
+
+ # incorrect
+ session.query(User).options(subqueryload(User.addresses)).order_by(User.name).first()
+
+ # correct
+ session.query(User).options(subqueryload(User.addresses)).order_by(User.name, User.id).first()
+
Loading Along Paths
-------------------
diff --git a/doc/build/orm/tutorial.rst b/doc/build/orm/tutorial.rst
index f90dc48d2..e75eda1ee 100644
--- a/doc/build/orm/tutorial.rst
+++ b/doc/build/orm/tutorial.rst
@@ -1703,6 +1703,12 @@ very easy to use:
>>> jack.addresses
[<Address(email_address='jack@google.com')>, <Address(email_address='j25@yahoo.com')>]
+.. warning::
+
+ If you use :func:`.subqueryload`, you should generally
+ :meth:`.Query.order_by` on a unique column in order to ensure correct
+ results. See :ref:`subquery_loading_tips`.
+
Joined Load
-------------