.. _faq_performance:

Performance
===========

.. contents::
    :local:
    :class: faq
    :backlinks: none

.. _faq_new_caching:

Why is my application slow after upgrading to 1.4 and/or 2.x?
--------------------------------------------------------------

SQLAlchemy as of version 1.4 includes a
:ref:`SQL compilation caching facility <sql_caching>` which will allow
Core and ORM SQL constructs to cache their stringified form, along with other
structural information used to fetch results from the statement, allowing the
relatively expensive string compilation process to be skipped when another
structurally equivalent construct is next used. This system
relies upon functionality that is implemented for all SQL constructs, including
objects such as  :class:`_schema.Column`,
:func:`_sql.select`, and :class:`_types.TypeEngine` objects, to produce a
**cache key** which fully represents their state to the degree that it affects
the SQL compilation process.

The caching system allows SQLAlchemy 1.4 and above to be more performant than
SQLAlchemy 1.3 with regards to the time spent converting SQL constructs into
strings repeatedly.  However, this only works if caching is enabled for the
dialect and SQL constructs in use; if not, string compilation is usually
similar to that of SQLAlchemy 1.3, with a slight decrease in speed in some
cases.

There is one case, however, where ORM performance may in fact be
significantly poorer than that of 1.3 or other prior releases: when
SQLAlchemy's new caching system has been disabled (for the reasons below).
This is due to the lack of caching within ORM lazy loaders and object refresh
queries, which in the 1.3 and earlier releases used the now-legacy
``BakedQuery`` system. If an application is seeing significant (30% or higher)
degradations in performance (measured in time for operations to complete) when
switching to 1.4, this is the likely cause of the issue, with steps to
mitigate below.

.. seealso::

    :ref:`sql_caching` - overview of the caching system

    :ref:`caching_caveats` - additional information regarding the warnings
    generated for elements that don't enable caching.

Step one - turn on SQL logging and confirm whether or not caching is working
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here, we want to use the technique described at
:ref:`engine logging <sql_caching_logging>`, looking for statements with the
``[no key]`` indicator or even ``[dialect does not support caching]``.
SQL statements that are successfully participating in the caching system
indicate ``[generated in Xs]`` when they are invoked for the first time and
then ``[cached since Xs ago]`` for the vast majority of subsequent
invocations. If ``[no key]`` is prevalent, in particular for SELECT
statements, or if caching is disabled entirely due to
``[dialect does not support caching]``, this can be the cause of significant
performance degradation.
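
As a minimal sketch of this check, assuming an in-memory SQLite database and a
hypothetical ``ListHandler`` class that collects log messages (neither is
required by SQLAlchemy), the following enables ``INFO`` logging on the
``sqlalchemy.engine`` logger and runs two structurally equivalent statements::

    import logging

    from sqlalchemy import create_engine, literal, select


    class ListHandler(logging.Handler):
        """Hypothetical helper: keep formatted log messages in a list."""

        def __init__(self):
            super().__init__()
            self.messages = []

        def emit(self, record):
            self.messages.append(record.getMessage())


    handler = ListHandler()
    engine_logger = logging.getLogger("sqlalchemy.engine")
    engine_logger.addHandler(handler)
    engine_logger.setLevel(logging.INFO)

    engine = create_engine("sqlite://")


    def run_query(value):
        # a new select() is built on every call; only the bound value differs
        with engine.connect() as conn:
            return conn.execute(select(literal(value))).scalar()


    run_query(1)
    run_query(2)

    # keep only the log lines that carry a cache indicator
    cache_lines = [
        m for m in handler.messages if "generated in" in m or "cached since" in m
    ]
    for line in cache_lines:
        print(line)

If caching is working, the first printed line is expected to contain
``[generated in Xs]`` and the second ``[cached since Xs ago]``, even though
each call constructs a new statement object.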

.. seealso::

    :ref:`sql_caching_logging`


Step two - identify what constructs are blocking caching from being enabled
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Assuming statements are not being cached, there should be warnings emitted
early in the application's log (SQLAlchemy 1.4.28 and above only) indicating
dialects, :class:`.TypeEngine` objects, and SQL constructs that are not
participating in caching.

For user defined datatypes such as those which extend :class:`_types.TypeDecorator`
and :class:`_types.UserDefinedType`, the warnings will look like:

.. sourcecode:: text

    sqlalchemy.exc.SAWarning: MyType will not produce a cache key because the
    ``cache_ok`` attribute is not set to True. This can have significant
    performance implications including some performance degradations in
    comparison to prior SQLAlchemy versions. Set this attribute to True if this
    type object's state is safe to use in a cache key, or False to disable this
    warning.

For custom and third party SQL elements, such as those constructed using
the techniques described at :ref:`sqlalchemy.ext.compiler_toplevel`, these
warnings will look like:

.. sourcecode:: text

    sqlalchemy.exc.SAWarning: Class MyClass will not make use of SQL
    compilation caching as it does not set the 'inherit_cache' attribute to
    ``True``. This can have significant performance implications including some
    performance degradations in comparison to prior SQLAlchemy versions. Set
    this attribute to True if this object can make use of the cache key
    generated by the superclass. Alternatively, this attribute may be set to
    False which will disable this warning.

For custom and third party dialects which make use of the :class:`.Dialect`
class hierarchy, the warnings will look like:

.. sourcecode:: text

    sqlalchemy.exc.SAWarning: Dialect database:driver will not make use of SQL
    compilation caching as it does not set the 'supports_statement_cache'
    attribute to ``True``. This can have significant performance implications
    including some performance degradations in comparison to prior SQLAlchemy
    versions. Dialect maintainers should seek to set this attribute to True
    after appropriate development and testing for SQLAlchemy 1.4 caching
    support. Alternatively, this attribute may be set to False which will
    disable this warning.


Step three - enable caching for the given objects and/or seek alternatives
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Steps to mitigate the lack of caching include:

* Review and set :attr:`.ExternalType.cache_ok` to ``True`` for all custom types
  which extend from :class:`_types.TypeDecorator`,
  :class:`_types.UserDefinedType`, as well as subclasses of these such as
  :class:`_types.PickleType`.  Set this **only** if the custom type does not
  include any additional state attributes which affect how it renders SQL::

        from sqlalchemy import String, TypeDecorator


        class MyCustomType(TypeDecorator):
            cache_ok = True
            impl = String

  If the types in use are from a third-party library, consult with the
  maintainers of that library so that it may be adjusted and released.

  .. seealso::

    :attr:`.ExternalType.cache_ok` - background on requirements to enable
    caching for custom datatypes.

* Make sure third party dialects set :attr:`.Dialect.supports_statement_cache`
  to ``True``. What this indicates is that the maintainers of a third party
  dialect have made sure their dialect works with SQLAlchemy 1.4 or greater,
  and that their dialect doesn't include any compilation features which may get
  in the way of caching. As there are some common compilation patterns which
  can in fact interfere with caching, it's important that dialect maintainers
  check and test this carefully, adjusting for any of the legacy patterns
  which won't work with caching.

  .. seealso::

      :ref:`engine_thirdparty_caching` - background and examples for third-party
      dialects to participate in SQL statement caching.

* Review custom SQL classes, including all DQL / DML constructs one might
  create using the :ref:`sqlalchemy.ext.compiler_toplevel`, as well as ad-hoc
  subclasses of objects such as :class:`_schema.Column` or
  :class:`_schema.Table`.  The :attr:`.HasCacheKey.inherit_cache` attribute
  may be set to ``True`` for trivial subclasses which do not contain any
  subclass-specific state information that affects SQL compilation.

  .. seealso::

    :ref:`compilerext_caching` - guidelines for applying the
    :attr:`.HasCacheKey.inherit_cache` attribute.
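
To illustrate the first item above, here is a sketch of a custom type that
carries constructor state yet can still be marked as cacheable.  The
``PrefixedString`` type and the ``notes`` table are hypothetical; the sketch
assumes, following the :attr:`.ExternalType.cache_ok` documentation, that
hashable state stored under the same name as its ``__init__()`` parameter is
taken into account when the cache key is built::

    from sqlalchemy import Column, Integer, MetaData, String, Table
    from sqlalchemy import TypeDecorator, create_engine, select


    class PrefixedString(TypeDecorator):
        """Hypothetical type: store strings with a fixed prefix."""

        impl = String

        # the only state is ``prefix``, a hashable string kept under the
        # same name as the constructor argument
        cache_ok = True

        def __init__(self, prefix, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.prefix = prefix

        def process_bind_param(self, value, dialect):
            return None if value is None else self.prefix + value

        def process_result_value(self, value, dialect):
            return None if value is None else value[len(self.prefix):]


    metadata = MetaData()
    notes = Table(
        "notes",
        metadata,
        Column("id", Integer, primary_key=True),
        Column("body", PrefixedString("v1:")),
    )

    engine = create_engine("sqlite://")
    metadata.create_all(engine)

    with engine.begin() as conn:
        conn.execute(notes.insert(), {"body": "hello"})
        # raw driver-level read bypasses the type; the Core read applies it
        stored = conn.exec_driver_sql("SELECT body FROM notes").scalar()
        loaded = conn.execute(select(notes.c.body)).scalar()

    print(stored, loaded)  # v1:hello hello

A type whose state is unhashable, or is stored under a different attribute
name than its constructor argument, would need more care before setting
``cache_ok = True``.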


.. seealso::

    :ref:`sql_caching` - caching system overview

    :ref:`caching_caveats` - background on warnings emitted when caching
    is not enabled for specific constructs and/or dialects.


.. _faq_how_to_profile:

How can I profile a SQLAlchemy powered application?
---------------------------------------------------

Looking for performance issues typically involves two strategies.  One
is query profiling, and the other is code profiling.

Query Profiling
^^^^^^^^^^^^^^^

Sometimes just plain SQL logging (enabled via Python's logging module
or via the ``echo=True`` argument on :func:`_sa.create_engine`) can give an
idea how long things are taking.  For example, if you log something
right after a SQL operation, you'd see something like this in your
log:

.. sourcecode:: text

    17:37:48,325 INFO  [sqlalchemy.engine.base.Engine.0x...048c] SELECT ...
    17:37:48,326 INFO  [sqlalchemy.engine.base.Engine.0x...048c] {<params>}
    17:37:48,660 DEBUG [myapp.somemessage]

if you logged ``myapp.somemessage`` right after the operation, you know
it took 334ms to complete the SQL part of things.

Logging SQL will also illustrate if dozens or hundreds of queries are
being issued which could be better organized into far fewer queries.
When using the SQLAlchemy ORM, the "eager loading" feature is provided to
partially (:func:`.contains_eager()`) or fully (:func:`_orm.joinedload()`,
:func:`.subqueryload()`) automate this activity.  Without the ORM, "eager
loading" typically means using joins so that results across multiple
tables can be loaded in one result set, instead of multiplying the number
of queries as more depth is added (i.e. ``r + r*r2 + r*r2*r3`` ...).
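
The effect is easy to observe by counting statements.  The following sketch
assumes hypothetical ``Parent`` and ``Child`` mappings on an in-memory SQLite
database, and counts the SELECT statements emitted with and without
:func:`_orm.joinedload`::

    from sqlalchemy import Column, ForeignKey, Integer
    from sqlalchemy import create_engine, event, select
    from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

    Base = declarative_base()


    class Parent(Base):
        __tablename__ = "parent"
        id = Column(Integer, primary_key=True)
        children = relationship("Child")


    class Child(Base):
        __tablename__ = "child"
        id = Column(Integer, primary_key=True)
        parent_id = Column(ForeignKey("parent.id"))


    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add_all([Parent(children=[Child(), Child()]) for _ in range(5)])
        session.commit()

    counter = {"selects": 0}


    @event.listens_for(engine, "before_cursor_execute")
    def count_selects(conn, cursor, statement, parameters, context, executemany):
        if statement.lstrip().upper().startswith("SELECT"):
            counter["selects"] += 1


    def selects_emitted(stmt):
        """Load all parents, touch each collection, return the SELECT count."""
        counter["selects"] = 0
        with Session(engine) as session:
            for parent in session.scalars(stmt).unique():
                len(parent.children)
        return counter["selects"]


    lazy = selects_emitted(select(Parent))
    eager = selects_emitted(select(Parent).options(joinedload(Parent.children)))
    print(lazy, eager)  # 6 1

With lazy loading, one SELECT loads the five parents and each collection
access emits one more; with joined eager loading, everything arrives in a
single result set.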

For more long-term profiling of queries, or to implement an application-side
"slow query" monitor, events can be used to intercept cursor executions,
using a recipe like the following::

    from sqlalchemy import event
    from sqlalchemy.engine import Engine
    import time
    import logging

    logging.basicConfig()
    logger = logging.getLogger("myapp.sqltime")
    logger.setLevel(logging.DEBUG)


    @event.listens_for(Engine, "before_cursor_execute")
    def before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        conn.info.setdefault("query_start_time", []).append(time.time())
        logger.debug("Start Query: %s", statement)


    @event.listens_for(Engine, "after_cursor_execute")
    def after_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        total = time.time() - conn.info["query_start_time"].pop(-1)
        logger.debug("Query Complete!")
        logger.debug("Total Time: %f", total)

Above, we use the :meth:`_events.ConnectionEvents.before_cursor_execute` and
:meth:`_events.ConnectionEvents.after_cursor_execute` events to establish an interception
point around when a statement is executed.  We attach a timer onto the
connection using the :attr:`_engine.Connection.info` dictionary; we use a
stack here for the occasional case where the cursor execute events may be nested.
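
The same pair of events can be turned into a simple "slow query" monitor by
recording only those statements that exceed a threshold.  The sketch below is
one possible arrangement, not a prescribed one: the ``SLOW_THRESHOLD`` value,
the ``slow_statements`` list and the SQLite ``sleep()`` function (registered
only to simulate a slow statement) are all assumptions made for
illustration::

    import time

    from sqlalchemy import create_engine, event, text

    SLOW_THRESHOLD = 0.05  # seconds; a hypothetical cutoff for this sketch

    engine = create_engine("sqlite://")
    slow_statements = []


    @event.listens_for(engine, "connect")
    def register_sleep(dbapi_connection, connection_record):
        # demo only: expose time.sleep() to SQLite to simulate slow SQL
        dbapi_connection.create_function("sleep", 1, time.sleep)


    @event.listens_for(engine, "before_cursor_execute")
    def start_timer(conn, cursor, statement, parameters, context, executemany):
        conn.info.setdefault("query_start_time", []).append(time.perf_counter())


    @event.listens_for(engine, "after_cursor_execute")
    def record_if_slow(conn, cursor, statement, parameters, context, executemany):
        elapsed = time.perf_counter() - conn.info["query_start_time"].pop(-1)
        if elapsed >= SLOW_THRESHOLD:
            slow_statements.append((statement, elapsed))


    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))
        conn.execute(text("SELECT sleep(0.1)"))

    for statement, elapsed in slow_statements:
        print("slow (%.3fs): %s" % (elapsed, statement))

Only the second statement is expected to be reported.  In a real application
the entries would more likely be sent to a logger than collected in a list.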

.. _faq_code_profiling:

Code Profiling
^^^^^^^^^^^^^^

If logging reveals that individual queries are taking too long, you'd
need a breakdown of how much time was spent within the database
processing the query, sending results over the network, being handled
by the :term:`DBAPI`, and finally being received by SQLAlchemy's result set
and/or ORM layer.   Each of these stages can present its own
individual bottlenecks, depending on specifics.

For that you need to use the
`Python Profiling Module <https://docs.python.org/3/library/profile.html>`_.
Below is a simple recipe which wraps profiling in a context manager::

    import cProfile
    import io
    import pstats
    import contextlib


    @contextlib.contextmanager
    def profiled():
        pr = cProfile.Profile()
        pr.enable()
        yield
        pr.disable()
        s = io.StringIO()
        ps = pstats.Stats(pr, stream=s).sort_stats("cumulative")
        ps.print_stats()
        # uncomment this to see who's calling what
        # ps.print_callers()
        print(s.getvalue())

To profile a section of code::

    with profiled():
        session.scalars(select(FooClass).where(FooClass.somevalue == 8)).all()

The output of profiling can be used to give an idea where time is
being spent.   A section of profiling output looks like this:

.. sourcecode:: text

    13726 function calls (13042 primitive calls) in 0.014 seconds

    Ordered by: cumulative time

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    222/21    0.001    0.000    0.011    0.001 lib/sqlalchemy/orm/loading.py:26(instances)
    220/20    0.002    0.000    0.010    0.001 lib/sqlalchemy/orm/loading.py:327(_instance)
    220/20    0.000    0.000    0.010    0.000 lib/sqlalchemy/orm/loading.py:284(populate_state)
       20    0.000    0.000    0.010    0.000 lib/sqlalchemy/orm/strategies.py:987(load_collection_from_subq)
       20    0.000    0.000    0.009    0.000 lib/sqlalchemy/orm/strategies.py:935(get)
        1    0.000    0.000    0.009    0.009 lib/sqlalchemy/orm/strategies.py:940(_load)
       21    0.000    0.000    0.008    0.000 lib/sqlalchemy/orm/strategies.py:942(<genexpr>)
        2    0.000    0.000    0.004    0.002 lib/sqlalchemy/orm/query.py:2400(__iter__)
        2    0.000    0.000    0.002    0.001 lib/sqlalchemy/orm/query.py:2414(_execute_and_instances)
        2    0.000    0.000    0.002    0.001 lib/sqlalchemy/engine/base.py:659(execute)
        2    0.000    0.000    0.002    0.001 lib/sqlalchemy/sql/elements.py:321(_execute_on_connection)
        2    0.000    0.000    0.002    0.001 lib/sqlalchemy/engine/base.py:788(_execute_clauseelement)

    ...

Above, we can see that the ``instances()`` SQLAlchemy function was called 222
times (recursively, and 21 times from the outside), taking a total of .011
seconds for all calls combined.
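
The ``total/primitive`` notation in the ``ncalls`` column can be reproduced
with a small standalone sketch that uses only the standard library; the
recursive ``walk()`` function is hypothetical and stands in for a recursive
library call::

    import cProfile
    import io
    import pstats


    def walk(depth):
        """Recurse ``depth`` levels, for depth + 1 calls in total."""
        if depth > 0:
            walk(depth - 1)


    pr = cProfile.Profile()
    pr.enable()
    for _ in range(3):
        walk(4)  # 3 outside calls, 5 calls each including recursion
    pr.disable()

    s = io.StringIO()
    ps = pstats.Stats(pr, stream=s).sort_stats("cumulative")
    ps.print_stats("walk")  # restrict the report to entries matching "walk"
    report = s.getvalue()
    print(report)

The ``walk`` entry is reported with ``ncalls`` of ``15/3``: fifteen calls in
total, three of which were made from outside the function.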

Execution Slowness
^^^^^^^^^^^^^^^^^^

The specifics of these calls can tell us where the time is being spent.
If for example, you see time being spent within ``cursor.execute()``,
e.g. against the DBAPI:

.. sourcecode:: text

    2    0.102    0.102    0.204    0.102 {method 'execute' of 'sqlite3.Cursor' objects}

this would indicate that the database is taking a long time to start returning
results, and it means your query should be optimized, either by adding indexes
or restructuring the query and/or underlying schema.  For that task,
analysis of the query plan is warranted, using a system such as EXPLAIN,
SHOW PLAN, etc. as is provided by the database backend.
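
As a sketch of what such an analysis can look like, the following uses
SQLite's ``EXPLAIN QUERY PLAN`` through a textual statement, before and after
an index is added.  The ``item`` table, its index and the ``plan_details()``
helper are hypothetical, and the plan syntax and wording differ on other
backends::

    from sqlalchemy import create_engine, text

    engine = create_engine("sqlite://")
    query = "SELECT id FROM item WHERE somevalue = 8"


    def plan_details(conn):
        """Return the human-readable detail column of each plan row."""
        rows = conn.execute(text("EXPLAIN QUERY PLAN " + query)).all()
        return [row[3] for row in rows]


    with engine.begin() as conn:
        conn.execute(
            text("CREATE TABLE item (id INTEGER PRIMARY KEY, somevalue INTEGER)")
        )
        before = plan_details(conn)
        conn.execute(text("CREATE INDEX ix_item_somevalue ON item (somevalue)"))
        after = plan_details(conn)

    print(before)  # a scan of the whole table is expected here
    print(after)  # a search using ix_item_somevalue is expected here

The exact wording of the plan depends on the SQLite version in use, so the
output is best read for the change from a scan to an index search.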

Result Fetching Slowness - Core
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If on the other hand you see many thousands of calls related to fetching rows,
or very long calls to ``fetchall()``, it may
mean your query is returning more rows than expected, or that the fetching
of rows itself is slow.   The ORM itself typically uses ``fetchall()`` to fetch
rows (or ``fetchmany()`` if the :meth:`_query.Query.yield_per` option is used).

An inordinately large number of rows would be indicated
by a very slow call to ``fetchall()`` at the DBAPI level:

.. sourcecode:: text

    2    0.300    0.600    0.300    0.600 {method 'fetchall' of 'sqlite3.Cursor' objects}

An unexpectedly large number of rows, even if the ultimate result doesn't seem
to have many rows, can be the result of a cartesian product - when multiple
sets of rows are combined together without appropriately joining the tables
together.   It's often easy to produce this behavior with SQLAlchemy Core or
ORM query if the wrong :class:`_schema.Column` objects are used in a complex query,
pulling in additional FROM clauses that are unexpected.
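
A minimal Core sketch of this situation, assuming hypothetical ``parent`` and
``child`` tables with three and four rows respectively, compares a statement
that selects from both tables without a join condition against the same
columns with an explicit join::

    import warnings

    from sqlalchemy import Column, ForeignKey, Integer, MetaData, Table
    from sqlalchemy import create_engine, select

    metadata = MetaData()
    parent = Table("parent", metadata, Column("id", Integer, primary_key=True))
    child = Table(
        "child",
        metadata,
        Column("id", Integer, primary_key=True),
        Column("parent_id", ForeignKey("parent.id")),
    )

    engine = create_engine("sqlite://")
    metadata.create_all(engine)

    with engine.begin() as conn:
        conn.execute(parent.insert(), [{"id": n} for n in range(1, 4)])
        conn.execute(
            child.insert(),
            [{"id": n, "parent_id": (n % 3) + 1} for n in range(1, 5)],
        )

    with engine.connect() as conn:
        # two FROM elements and no join condition: every parent row is
        # paired with every child row
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            product = conn.execute(select(parent.c.id, child.c.id)).all()

        # the same columns, joined along the foreign key
        joined = conn.execute(
            select(parent.c.id, child.c.id).join_from(parent, child)
        ).all()

    print(len(product), len(joined))  # 12 4
    for w in caught:
        print(w.message)

On SQLAlchemy 1.4 and above, the first statement is also expected to produce
a warning describing the cartesian product, which this sketch collects in
``caught`` rather than letting it print.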

On the other hand, a fast call to ``fetchall()`` at the DBAPI level, but then
slowness when SQLAlchemy's :class:`_engine.CursorResult` is asked to do a ``fetchall()``,
may indicate slowness in processing of datatypes, such as unicode conversions
and similar:

.. sourcecode:: text

    # the DBAPI cursor is fast...
    2    0.020    0.040    0.020    0.040 {method 'fetchall' of 'sqlite3.Cursor' objects}

    ...

    # but SQLAlchemy's result proxy is slow, this is type-level processing
    2    0.100    0.200    0.100    0.200 lib/sqlalchemy/engine/result.py:778(fetchall)

In some cases, a backend might be doing type-level processing that isn't
needed.   More specifically, slow calls within the type API are a better
indicator. Below is what it looks like when we use a type like
this::

    from sqlalchemy import String, TypeDecorator
    import time


    class Foo(TypeDecorator):
        impl = String
        cache_ok = True

        def process_result_value(self, value, dialect):
            # intentionally add slowness for illustration purposes
            time.sleep(0.001)
            return value

the profiling output of this intentionally slow operation can be seen like this:

.. sourcecode:: text

      200    0.001    0.000    0.237    0.001 lib/sqlalchemy/sql/type_api.py:911(process)
      200    0.001    0.000    0.236    0.001 test.py:28(process_result_value)
      200    0.235    0.001    0.235    0.001 {time.sleep}

that is, we see many expensive calls within the ``type_api`` system, and the actual
time consuming thing is the ``time.sleep()`` call.

Make sure to check the :ref:`Dialect documentation <dialect_toplevel>`
for notes on known performance tuning suggestions at this level, especially for
databases like Oracle.  There may be systems related to ensuring numeric accuracy
or string processing that may not be needed in all cases.

There also may be even more low-level points at which row-fetching performance is suffering;
for example, if time spent seems to focus on a call like ``socket.recv()``,
that could indicate that everything is fast except for the actual network connection,
and too much time is spent with data moving over the network.

Result Fetching Slowness - ORM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To detect slowness in ORM fetching of rows (which is the most common area
of performance concern), calls like ``populate_state()`` and ``_instance()`` will
illustrate individual ORM object populations:

.. sourcecode:: text

    # the ORM calls _instance for each ORM-loaded row it sees, and
    # populate_state for each ORM-loaded row that results in the population
    # of an object's attributes
    220/20    0.001    0.000    0.010    0.000 lib/sqlalchemy/orm/loading.py:327(_instance)
    220/20    0.000    0.000    0.009    0.000 lib/sqlalchemy/orm/loading.py:284(populate_state)

The ORM's slowness in turning rows into ORM-mapped objects is a product
of the complexity of this operation combined with the overhead of CPython.
Common strategies to mitigate this include:

* Fetch individual columns instead of full entities, that is::

      select(User.id, User.name)

  instead of::

      select(User)

* Use :class:`.Bundle` objects to organize column-based results::

      u_b = Bundle("user", User.id, User.name)
      a_b = Bundle("address", Address.id, Address.email)

      for user, address in session.execute(select(u_b, a_b).join(User.addresses)):
          ...

* Use result caching - see :ref:`examples_caching` for an in-depth example
  of this.

* Consider a faster interpreter like that of PyPy.
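
The difference between the first strategy and a full entity load can be seen
in what the :class:`.Session` ends up tracking.  The sketch below assumes a
hypothetical ``User`` mapping on an in-memory SQLite database::

    from sqlalchemy import Column, Integer, String, create_engine, select
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()


    class User(Base):
        __tablename__ = "user_account"
        id = Column(Integer, primary_key=True)
        name = Column(String)


    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add_all([User(name="user %d" % n) for n in range(3)])
        session.commit()

    with Session(engine) as session:
        # column-based: plain rows, nothing is added to the identity map
        rows = session.execute(select(User.id, User.name)).all()
        tracked_after_columns = len(session.identity_map)

        # entity-based: each row becomes a tracked User instance
        users = session.scalars(select(User)).all()
        tracked_after_entities = len(session.identity_map)

    print(tracked_after_columns, tracked_after_entities)  # 0 3

Column-based rows skip the per-object population step, which is where the
savings come from; the tradeoff is that they are read-only values rather than
objects the unit of work can track.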

The output of a profile can be a little daunting, but after some
practice it becomes very easy to read.

.. seealso::

    :ref:`examples_performance` - a suite of performance demonstrations
    with bundled profiling capabilities.

I'm inserting 400,000 rows with the ORM and it's really slow!
-------------------------------------------------------------

The nature of ORM inserts has changed, as most included drivers use RETURNING
with :ref:`insertmanyvalues <engine_insertmanyvalues>` support as of SQLAlchemy
2.0. See the section :ref:`change_6047` for details.

Overall, SQLAlchemy built-in drivers other than that of MySQL should now
offer very fast ORM bulk insert performance.

Third party drivers can opt in to the new bulk infrastructure as well with some
small code changes assuming their backends support the necessary syntaxes.
SQLAlchemy developers would encourage users of third party dialects to post
issues with these drivers, so that the driver maintainers may contact
SQLAlchemy developers for assistance.
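
As a brief sketch of the two common ways to insert many rows through a
:class:`.Session`, assuming SQLAlchemy 2.0, an in-memory SQLite database and
a hypothetical ``Customer`` mapping (the row counts are kept small here so
that the example runs quickly)::

    from sqlalchemy import Column, Integer, String
    from sqlalchemy import create_engine, func, insert, select
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()


    class Customer(Base):
        __tablename__ = "customer"
        id = Column(Integer, primary_key=True)
        name = Column(String(255))


    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        # unit of work style: pending objects are flushed together on commit
        session.add_all([Customer(name="NAME %d" % n) for n in range(1000)])
        session.commit()

        # bulk style: one insert() construct plus a list of parameter
        # dictionaries, with no Customer objects created in Python
        session.execute(
            insert(Customer),
            [{"name": "NAME %d" % n} for n in range(1000, 2000)],
        )
        session.commit()

        total = session.scalar(select(func.count()).select_from(Customer))

    print(total)  # 2000

Which style is faster for a given workload is best determined by measuring
with the profiling techniques described earlier in this document.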