diff options
author | Jack Zhou <univerio@gmail.com> | 2014-07-29 11:49:52 -0700 |
---|---|---|
committer | Jack Zhou <univerio@gmail.com> | 2014-08-07 12:08:43 -0700 |
commit | 9e621d18aa84bf20640283e1e7a4abd17af47df9 (patch) | |
tree | 336e87c07112658ded9b869d682bf242a6e78ebf | |
parent | e5620993bf89338959a0cce4ba567d92b1dd1821 (diff) | |
download | sqlalchemy-9e621d18aa84bf20640283e1e7a4abd17af47df9.tar.gz |
Added documentation about interaction between `subqueryload` and LIMIT/OFFSET.
-rw-r--r-- | doc/build/faq.rst | 84 | ||||
-rw-r--r-- | doc/build/orm/loading.rst | 15 | ||||
-rw-r--r-- | doc/build/orm/tutorial.rst | 6 |
3 files changed, 105 insertions, 0 deletions
diff --git a/doc/build/faq.rst b/doc/build/faq.rst index 0c8314cb5..d642d1de8 100644 --- a/doc/build/faq.rst +++ b/doc/build/faq.rst @@ -622,6 +622,90 @@ The same idea applies to all the other arguments, such as ``foreign_keys``:: foo = relationship(Dest, foreign_keys=[foo_id, bar_id]) +.. _faq_subqueryload_sort: + +Why must I always ``ORDER BY`` a unique column when using ``subqueryload``? +---------------------------------------------------------------------------- + +The SQL standard prescribes that RDBMSs are free to return rows in any order it +deems appropriate, if no ``ORDER BY`` clause is specified. This even extends to +the case where the ``ORDER BY`` clause is not unique across all rows, i.e. rows +with the same value in the ``ORDER BY`` column(s) will not necessarily be +returned in a deterministic order. + +SQLAlchemy implements :func:`.orm.subqueryload` by issuing a separate query +(where the table specified in the relationship is joined to the original query) +and then attempting to match up the results in Python. This works fine +normally: + +.. sourcecode:: python+sql + + >>> session.query(User).options(subqueryload(User.addresses)).all() + {opensql}# the "main" query + SELECT users.id AS users_id + FROM users + {stop} + {opensql}# the "load" query issued by subqueryload + SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id + FROM (SELECT users.id AS users_id + FROM users) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id + +Notice how the main query is a subquery in the load query. When an +``OFFSET``/``LIMIT`` is involved, however, things get a bit tricky: + +.. sourcecode:: python+sql + + >>> user = session.query(User).options(subqueryload(User.addresses)).first() + {opensql}# the "main" query + SELECT users.id AS users_id + FROM users + LIMIT 1 + {stop} + {opensql}# the "load" query issued by subqueryload + SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id + FROM (SELECT users.id AS users_id + FROM users + LIMIT 1) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id + +The main query is still a subquery in the load query, but *it may return a +different set of results in the second query from the first* because it does +not have a deterministic sort order! Depending on database internals, there is +a chance we may get the following resultset for the two queries:: + + +--------+ + |users_id| + +--------+ + | 1| + +--------+ + + +------------+-----------------+---------------+ + |addresses_id|addresses_user_id|anon_1_users_id| + +------------+-----------------+---------------+ + | 3| 2| 2| + +------------+-----------------+---------------+ + | 4| 2| 2| + +------------+-----------------+---------------+ + +From SQLAlchemy's point of view, it didn't get any addresses back for user 1, +so ``user.addresses`` is empty. Oops. + +The solution to this problem is to always specify a deterministic sort order, +so that the main query always returns the same set of rows. This generally +means that you should :meth:`.Query.order_by` on a unique column on the table, +usually the primary key:: + + session.query(User).options(subqueryload(User.addresses)).order_by(User.id).first() + +You can get away with not doing a sort if the ``OFFSET``/``LIMIT`` does not +throw away any rows at all, but it's much simpler to remember to always ``ORDER +BY`` the primary key:: + + session.query(User).options(subqueryload(User.addresses)).filter(User.id == 1).first() + +Note that :func:`.joinedload` does not suffer from the same problem because +only one query is ever issued, so the load query cannot be different from the +main query. + Performance =========== diff --git a/doc/build/orm/loading.rst b/doc/build/orm/loading.rst index 6c2fac004..27846b9b2 100644 --- a/doc/build/orm/loading.rst +++ b/doc/build/orm/loading.rst @@ -120,6 +120,21 @@ query options: # set children to load eagerly with a second statement session.query(Parent).options(subqueryload('children')).all() +.. _subquery_loading_tips: + +Subquery Loading Tips +^^^^^^^^^^^^^^^^^^^^^ + +If you have ``LIMIT`` or ``OFFSET`` in your query, you **must** ``ORDER BY`` a +unique column, generally the primary key of your table, in order to ensure +correct results (see :ref:`faq_subqueryload_sort`):: + + # incorrect + session.query(User).options(subqueryload(User.addresses)).order_by(User.name).first() + + # correct + session.query(User).options(subqueryload(User.addresses)).order_by(User.name, User.id).first() + Loading Along Paths ------------------- diff --git a/doc/build/orm/tutorial.rst b/doc/build/orm/tutorial.rst index f90dc48d2..e75eda1ee 100644 --- a/doc/build/orm/tutorial.rst +++ b/doc/build/orm/tutorial.rst @@ -1703,6 +1703,12 @@ very easy to use: >>> jack.addresses [<Address(email_address='jack@google.com')>, <Address(email_address='j25@yahoo.com')>] +.. warning:: + + If you use :func:`.subqueryload`, you should generally + :meth:`.Query.order_by` on a unique column in order to ensure correct + results. See :ref:`subquery_loading_tips`. + Joined Load ------------- |