summaryrefslogtreecommitdiff
path: root/doc/build/orm/relationship_persistence.rst
blob: f56773caa833f4112f33eb6a1ca20e4016e67abd (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
Special Relationship Persistence Patterns
=========================================

.. _post_update:

Rows that point to themselves / Mutually Dependent Rows
-------------------------------------------------------

This is a very specific case where relationship() must perform an INSERT and a
second UPDATE in order to properly populate a row (and vice versa an UPDATE
and DELETE in order to delete without violating foreign key constraints). The
two use cases are:

* A table contains a foreign key to itself, and a single row will
  have a foreign key value pointing to its own primary key.
* Two tables each contain a foreign key referencing the other
  table, with a row in each table referencing the other.

For example:

.. sourcecode:: text

              user
    ---------------------------------
    user_id    name   related_user_id
       1       'ed'          1

Or:

.. sourcecode:: text

                 widget                                                  entry
    -------------------------------------------             ---------------------------------
    widget_id     name        favorite_entry_id             entry_id      name      widget_id
       1       'somewidget'          5                         5       'someentry'     1

In the first case, a row points to itself. Technically, a database that uses
sequences such as PostgreSQL or Oracle can INSERT the row at once using a
previously generated value, but databases which rely upon autoincrement-style
primary key identifiers cannot. The :func:`~sqlalchemy.orm.relationship`
always assumes a "parent/child" model of row population during flush, so
unless you are populating the primary key/foreign key columns directly,
:func:`~sqlalchemy.orm.relationship` needs to use two statements.

In the second case, the "widget" row must be inserted before any referring
"entry" rows, but then the "favorite_entry_id" column of that "widget" row
cannot be set until the "entry" rows have been generated. In this case, it's
typically impossible to insert the "widget" and "entry" rows using just two
INSERT statements; an UPDATE must be performed in order to keep foreign key
constraints fulfilled. The exception is if the foreign keys are configured as
"deferred until commit" (a feature some databases support) and if the
identifiers were populated manually (again essentially bypassing
:func:`~sqlalchemy.orm.relationship`).

To enable the usage of a supplementary UPDATE statement,
we use the :paramref:`_orm.relationship.post_update` option
of :func:`_orm.relationship`.  This specifies that the linkage between the
two rows should be created using an UPDATE statement after both rows
have been INSERTED; it also causes the rows to be de-associated with
each other via UPDATE before a DELETE is emitted.  The flag should
be placed on just *one* of the relationships, preferably the
many-to-one side.  Below we illustrate
a complete example, including two :class:`_schema.ForeignKey` constructs::

    from sqlalchemy import Integer, ForeignKey
    from sqlalchemy.orm import mapped_column
    from sqlalchemy.orm import DeclarativeBase
    from sqlalchemy.orm import relationship


    class Base(DeclarativeBase):
        pass


    class Entry(Base):
        __tablename__ = "entry"
        entry_id = mapped_column(Integer, primary_key=True)
        widget_id = mapped_column(Integer, ForeignKey("widget.widget_id"))
        name = mapped_column(String(50))


    class Widget(Base):
        __tablename__ = "widget"

        widget_id = mapped_column(Integer, primary_key=True)
        favorite_entry_id = mapped_column(
            Integer, ForeignKey("entry.entry_id", name="fk_favorite_entry")
        )
        name = mapped_column(String(50))

        entries = relationship(Entry, primaryjoin=widget_id == Entry.widget_id)
        favorite_entry = relationship(
            Entry, primaryjoin=favorite_entry_id == Entry.entry_id, post_update=True
        )

When a structure against the above configuration is flushed, the "widget" row will be
INSERTed minus the "favorite_entry_id" value, then all the "entry" rows will
be INSERTed referencing the parent "widget" row, and then an UPDATE statement
will populate the "favorite_entry_id" column of the "widget" table (it's one
row at a time for the time being):

.. sourcecode:: pycon+sql

    >>> w1 = Widget(name="somewidget")
    >>> e1 = Entry(name="someentry")
    >>> w1.favorite_entry = e1
    >>> w1.entries = [e1]
    >>> session.add_all([w1, e1])
    >>> session.commit()
    {execsql}BEGIN (implicit)
    INSERT INTO widget (favorite_entry_id, name) VALUES (?, ?)
    (None, 'somewidget')
    INSERT INTO entry (widget_id, name) VALUES (?, ?)
    (1, 'someentry')
    UPDATE widget SET favorite_entry_id=? WHERE widget.widget_id = ?
    (1, 1)
    COMMIT

An additional configuration we can specify is to supply a more
comprehensive foreign key constraint on ``Widget``, such that
it's guaranteed that ``favorite_entry_id`` refers to an ``Entry``
that also refers to this ``Widget``.  We can use a composite foreign key,
as illustrated below::

    from sqlalchemy import (
        Integer,
        ForeignKey,
        String,
        UniqueConstraint,
        ForeignKeyConstraint,
    )
    from sqlalchemy.orm import DeclarativeBase
    from sqlalchemy.orm import mapped_column
    from sqlalchemy.orm import relationship


    class Base(DeclarativeBase):
        pass


    class Entry(Base):
        __tablename__ = "entry"
        entry_id = mapped_column(Integer, primary_key=True)
        widget_id = mapped_column(Integer, ForeignKey("widget.widget_id"))
        name = mapped_column(String(50))
        __table_args__ = (UniqueConstraint("entry_id", "widget_id"),)


    class Widget(Base):
        __tablename__ = "widget"

        widget_id = mapped_column(Integer, autoincrement="ignore_fk", primary_key=True)
        favorite_entry_id = mapped_column(Integer)

        name = mapped_column(String(50))

        __table_args__ = (
            ForeignKeyConstraint(
                ["widget_id", "favorite_entry_id"],
                ["entry.widget_id", "entry.entry_id"],
                name="fk_favorite_entry",
            ),
        )

        entries = relationship(
            Entry, primaryjoin=widget_id == Entry.widget_id, foreign_keys=Entry.widget_id
        )
        favorite_entry = relationship(
            Entry,
            primaryjoin=favorite_entry_id == Entry.entry_id,
            foreign_keys=favorite_entry_id,
            post_update=True,
        )

The above mapping features a composite :class:`_schema.ForeignKeyConstraint`
bridging the ``widget_id`` and ``favorite_entry_id`` columns.  To ensure
that ``Widget.widget_id`` remains an "autoincrementing" column we specify
:paramref:`_schema.Column.autoincrement` to the value ``"ignore_fk"``
on :class:`_schema.Column`, and additionally on each
:func:`_orm.relationship` we must limit those columns considered as part of
the foreign key for the purposes of joining and cross-population.

.. _passive_updates:

Mutable Primary Keys / Update Cascades
--------------------------------------

When the primary key of an entity changes, related items
which reference the primary key must also be updated as
well. For databases which enforce referential integrity,
the best strategy is to use the database's ON UPDATE CASCADE
functionality in order to propagate primary key changes
to referenced foreign keys - the values cannot be out
of sync for any moment unless the constraints are marked as "deferrable",
that is, not enforced until the transaction completes.

It is **highly recommended** that an application which seeks to employ
natural primary keys with mutable values to use the ``ON UPDATE CASCADE``
capabilities of the database.   An example mapping which
illustrates this is::

    class User(Base):
        __tablename__ = "user"
        __table_args__ = {"mysql_engine": "InnoDB"}

        username = mapped_column(String(50), primary_key=True)
        fullname = mapped_column(String(100))

        addresses = relationship("Address")


    class Address(Base):
        __tablename__ = "address"
        __table_args__ = {"mysql_engine": "InnoDB"}

        email = mapped_column(String(50), primary_key=True)
        username = mapped_column(
            String(50), ForeignKey("user.username", onupdate="cascade")
        )

Above, we illustrate ``onupdate="cascade"`` on the :class:`_schema.ForeignKey`
object, and we also illustrate the ``mysql_engine='InnoDB'`` setting
which, on a MySQL backend, ensures that the ``InnoDB`` engine supporting
referential integrity is used.  When using SQLite, referential integrity
should be enabled, using the configuration described at
ref_sqlite_foreign_keys.

.. seealso::

    :ref:`passive_deletes` - supporting ON DELETE CASCADE with relationships

    :paramref:`.orm.mapper.passive_updates` - similar feature on :class:`_orm.Mapper`


Simulating limited ON UPDATE CASCADE without foreign key support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In those cases when a database that does not support referential integrity
is used, and natural primary keys with mutable values are in play,
SQLAlchemy offers a feature in order to allow propagation of primary key
values to already-referenced foreign keys to a **limited** extent,
by emitting an UPDATE statement against foreign key columns that immediately
reference a primary key column whose value has changed.
The primary platforms without referential integrity features are
MySQL when the ``MyISAM`` storage engine is used, and SQLite when the
``PRAGMA foreign_keys=ON`` pragma is not used.  The Oracle database also
has no support for ``ON UPDATE CASCADE``, but because it still enforces
referential integrity, needs constraints to be marked as deferrable
so that SQLAlchemy can emit UPDATE statements.

The feature is enabled by setting the
:paramref:`_orm.relationship.passive_updates` flag to ``False``,
most preferably on a one-to-many or
many-to-many :func:`_orm.relationship`.  When "updates" are no longer
"passive" this indicates that SQLAlchemy will
issue UPDATE statements individually for
objects referenced in the collection referred to by the parent object
with a changing primary key value.  This also implies that collections
will be fully loaded into memory if not already locally present.

Our previous mapping using ``passive_updates=False`` looks like::

    class User(Base):
        __tablename__ = "user"

        username = mapped_column(String(50), primary_key=True)
        fullname = mapped_column(String(100))

        # passive_updates=False *only* needed if the database
        # does not implement ON UPDATE CASCADE
        addresses = relationship("Address", passive_updates=False)


    class Address(Base):
        __tablename__ = "address"

        email = mapped_column(String(50), primary_key=True)
        username = mapped_column(String(50), ForeignKey("user.username"))

Key limitations of ``passive_updates=False`` include:

* it performs much more poorly than direct database ON UPDATE CASCADE,
  because it needs to fully pre-load affected collections using SELECT
  and also must emit  UPDATE statements against those values, which it
  will attempt to run  in "batches" but still runs on a per-row basis
  at the DBAPI level.

* the feature cannot "cascade" more than one level.  That is,
  if mapping X has a foreign key which refers to the primary key
  of mapping Y, but then mapping Y's primary key is itself a foreign key
  to mapping Z, ``passive_updates=False`` cannot cascade a change in
  primary key value from ``Z`` to ``X``.

* Configuring ``passive_updates=False`` only on the many-to-one
  side of a relationship will not have a full effect, as the
  unit of work searches only through the current identity
  map for objects that may be referencing the one with a
  mutating primary key, not throughout the database.

As virtually all databases other than Oracle now support ``ON UPDATE CASCADE``,
it is highly recommended that traditional ``ON UPDATE CASCADE`` support be used
in the case that natural and mutable primary key values are in use.