author    Mike Bayer <mike_mp@zzzcomputing.com>  2022-07-18 15:08:37 -0400
committer Mike Bayer <mike_mp@zzzcomputing.com>  2022-09-24 11:15:32 -0400
commit    2bcc97da424eef7db9a5d02f81d02344925415ee (patch)
tree      13d4f04bc7dd40a0207f86aa2fc3a3b49e065674 /doc
parent    332188e5680574368001ded52eb0a9d259ecdef5 (diff)
download  sqlalchemy-2bcc97da424eef7db9a5d02f81d02344925415ee.tar.gz
implement batched INSERT..VALUES () () for executemany
The feature is enabled for all built-in backends when RETURNING is used, except for Oracle, which doesn't need it; on psycopg2 and mssql+pyodbc it is used for all INSERT statements, not just those that use RETURNING. Third party dialects would need to opt in to the new feature by setting use_insertmanyvalues to True. Also adds dialect-level guards against using RETURNING with executemany where we don't have an implementation to suit it; a single execute with RETURNING still defers to the server without us checking. Fixes: #6047 Fixes: #7907 Change-Id: I3936d3c00003f02e322f2e43fb949d0e6e568304
Diffstat (limited to 'doc')
-rw-r--r--doc/build/changelog/unreleased_20/6047.rst16
-rw-r--r--doc/build/changelog/whatsnew_20.rst155
-rw-r--r--doc/build/core/connections.rst226
-rw-r--r--doc/build/faq/performance.rst240
-rw-r--r--doc/build/orm/persistence_techniques.rst57
-rw-r--r--doc/build/orm/quickstart.rst16
-rw-r--r--doc/build/tutorial/orm_data_manipulation.rst7
-rw-r--r--doc/build/tutorial/orm_related_objects.rst7
8 files changed, 444 insertions, 280 deletions
diff --git a/doc/build/changelog/unreleased_20/6047.rst b/doc/build/changelog/unreleased_20/6047.rst
new file mode 100644
index 000000000..1addc26ae
--- /dev/null
+++ b/doc/build/changelog/unreleased_20/6047.rst
@@ -0,0 +1,16 @@
+.. change::
+ :tags: feature, orm, sql
+ :tickets: 6047
+
+ Added new feature to all included dialects that support RETURNING
+ called "insertmanyvalues". This is a generalization of the
+ "fast executemany" feature first introduced for the psycopg2 driver
+ in 1.4 at :ref:`change_5263`, which allows the ORM to batch INSERT
+ statements into a much more efficient SQL structure while still being
+ able to fetch newly generated primary key and SQL default values
+ using RETURNING.
+
+ The feature now applies to the many dialects that support RETURNING along
+ with multiple VALUES constructs for INSERT, including all PostgreSQL
+ drivers, SQLite, MariaDB, and SQL Server. Separately, the Oracle dialect
+ also gains the same capability using native cx_Oracle or OracleDB features.
diff --git a/doc/build/changelog/whatsnew_20.rst b/doc/build/changelog/whatsnew_20.rst
index d1a3980a1..1c6d14b22 100644
--- a/doc/build/changelog/whatsnew_20.rst
+++ b/doc/build/changelog/whatsnew_20.rst
@@ -812,6 +812,161 @@ positional arguments as configured::
:ref:`orm_declarative_native_dataclasses`
+.. _change_6047:
+
+Optimized ORM bulk insert now implemented for all backends other than MySQL
+----------------------------------------------------------------------------
+
+The dramatic performance improvement introduced in the 1.4 series and described
+at :ref:`change_5263` has now been generalized to all included backends that
+support RETURNING, which is all backends other than MySQL: SQLite, MariaDB,
+PostgreSQL (all drivers), Oracle, and SQL Server. While the original feature
+was most critical for the psycopg2 driver, which otherwise had major
+performance issues when using ``cursor.executemany()``, the change also
+benefits other PostgreSQL drivers such as asyncpg, where single-statement
+INSERT..RETURNING remains unacceptably slow, as well as SQL Server, which
+appears to have very slow executemany speed for INSERT statements
+regardless of whether or not RETURNING is used.
+
+The new feature provides an across-the-board order of magnitude performance
+increase for nearly every driver when INSERTing ORM objects that don't have a
+pre-assigned primary key value, as indicated in the table below; in most cases
+the improvement is specific to the use of RETURNING, which is not normally
+supported with executemany().
+
+The psycopg2 "fast execution helper" approach consists of transforming an
+INSERT..RETURNING statement with a single parameter set into a single
+statement that INSERTs many parameter sets, using multiple "VALUES..."
+clauses so that it can accommodate many parameter sets at once.
+Parameter sets are then typically batched into groups of 1000
+or similar, so that no single INSERT statement is excessively large, and the
+INSERT statement is then invoked for each batch of parameters, rather than
+for each individual parameter set. Primary key values and server defaults
+are returned by RETURNING, which continues to work as each statement execution
+is invoked using ``cursor.execute()``, rather than ``cursor.executemany()``.
+
+This allows many rows to be inserted in one statement while also being able to
+return newly-generated primary key values as well as SQL and server defaults.
+SQLAlchemy historically has always needed to invoke one statement per parameter
+set, as it relied upon Python DBAPI features such as ``cursor.lastrowid`` which
+do not support multiple rows.
+
+With most databases now offering RETURNING (with the conspicuous exception of
+MySQL, given that MariaDB supports it), the new change generalizes the psycopg2
+"fast execution helper" approach to all dialects that support RETURNING and
+for which no other approach to "executemany plus RETURNING" is possible:
+SQLite, MariaDB, all PostgreSQL drivers, and SQL Server. The cx_Oracle and
+oracledb drivers used for Oracle
+support RETURNING with executemany natively, and this has also been implemented
+to provide equivalent performance improvements. With SQLite and MariaDB now
+offering RETURNING support, ORM use of ``cursor.lastrowid`` is nearly a thing
+of the past, with only MySQL still relying upon it.
+
+For INSERT statements that don't use RETURNING, traditional executemany()
+behavior is used for most backends, with the current exceptions of psycopg2
+and mssql+pyodbc, which both have very slow executemany() performance overall
+and are still improved by the "insertmanyvalues" approach.
+
+Benchmarks
+~~~~~~~~~~
+
+SQLAlchemy includes a :ref:`Performance Suite <examples_performance>` within
+the ``examples/`` directory, where we can make use of the ``bulk_insert``
+suite to benchmark INSERTs of many rows using both Core and ORM in different
+ways.
+
+For the tests below, we are inserting **100,000 objects**, and in all cases we
+actually have 100,000 real Python ORM objects in memory, either created up
+front or generated on the fly. All databases other than SQLite are run over a
+local network connection, not localhost; the added network latency makes the
+slower configurations dramatically slower still.
+
+Operations that are improved by this feature include:
+
+* unit of work flushes for objects added to the session using
+ :meth:`_orm.Session.add` and :meth:`_orm.Session.add_all`.
+* the :class:`_orm.Session` "bulk" operations described at
+ :ref:`bulk_operations`
+* An upcoming feature known as "ORM Enabled Insert Statements" that will be
+ an improvement upon the existing :ref:`orm_dml_returning_objects` first
+ introduced as an experimental feature in SQLAlchemy 1.4.
+
+To get a sense of the scale of the operation, below are performance
+measurements using the ``test_flush_no_pk`` performance suite, which
+historically represents SQLAlchemy's worst-case INSERT performance task,
+where objects that don't have primary key values need to be INSERTed, and
+then the newly generated primary key values must be fetched so that the
+objects can be used for subsequent flush operations, such as establishment
+within relationships, flushing joined-inheritance models, etc::
+
+ @Profiler.profile
+ def test_flush_no_pk(n):
+ """INSERT statements via the ORM (batched with RETURNING if available),
+ fetching generated row id"""
+ session = Session(bind=engine)
+ for chunk in range(0, n, 1000):
+ session.add_all(
+ [
+ Customer(
+ name="customer name %d" % i,
+ description="customer description %d" % i,
+ )
+ for i in range(chunk, chunk + 1000)
+ ]
+ )
+ session.flush()
+ session.commit()
+
+This test can be run from any SQLAlchemy source tree as follows::
+
+ python -m examples.performance.bulk_inserts --test test_flush_no_pk
+
+The table below summarizes performance measurements with
+the latest 1.4 series of SQLAlchemy compared to 2.0, both running
+the same test:
+
+============================ ==================== ====================
+Driver                       SQLA 1.4 Time (secs) SQLA 2.0 Time (secs)
+============================ ==================== ====================
+sqlite+pysqlite (memory)     6.204843             3.554856
+postgresql+asyncpg (network) 88.292285            4.561492
+postgresql+psycopg (network) N/A (psycopg3)       4.861368
+oracle+cx_Oracle (network)   92.603953            4.809520
+mssql+pyodbc (network)       158.396667           4.825139
+mariadb+mysqldb (network)    71.705197            4.075377
+============================ ==================== ====================
+
+Two additional drivers have no change in performance: the psycopg2 driver,
+for which fast executemany was already implemented in SQLAlchemy 1.4,
+and MySQL, which continues to not offer RETURNING support:
+
+============================= ==================== ====================
+Driver                        SQLA 1.4 Time (secs) SQLA 2.0 Time (secs)
+============================= ==================== ====================
+postgresql+psycopg2 (network) 4.704876             4.699883
+mysql+mysqldb (network)       77.281997            76.132995
+============================= ==================== ====================
+
+Summary of Changes
+~~~~~~~~~~~~~~~~~~
+
+The following bullets list the individual changes made within 2.0 in order to
+get all drivers to this state:
+
+* RETURNING implemented for SQLite - :ticket:`6195`
+* RETURNING implemented for MariaDB - :ticket:`7011`
+* Fix multi-row RETURNING for Oracle - :ticket:`6245`
+* Make insert() executemany() support RETURNING for as many dialects as
+  possible, usually with VALUES() - :ticket:`6047`
+* Emit a warning when RETURNING w/ executemany is used for a non-supporting
+  backend (currently no RETURNING backend has this limitation) - :ticket:`7907`
+
+.. seealso::
+
+ :ref:`engine_insertmanyvalues` - Documentation and background on the
+ new feature as well as how to configure it
+
+
.. _change_7311:
Installation is now fully pep-517 enabled
diff --git a/doc/build/core/connections.rst b/doc/build/core/connections.rst
index 2e3c661be..2cdbba193 100644
--- a/doc/build/core/connections.rst
+++ b/doc/build/core/connections.rst
@@ -1318,6 +1318,7 @@ SELECTs with LIMIT/OFFSET are correctly rendered and cached.
:ref:`faq_new_caching` - in the :ref:`faq_toplevel` section
+
.. _engine_lambda_caching:
Using Lambdas to add significant speed gains to statement production
@@ -1742,6 +1743,231 @@ For a series of examples of "lambda" caching with performance comparisons,
see the "short_selects" test suite within the :ref:`examples_performance`
performance example.
+.. _engine_insertmanyvalues:
+
+"Insert Many Values" Behavior for INSERT statements
+====================================================
+
+.. versionadded:: 2.0 see :ref:`change_6047` for background on the change
+ including sample performance tests
+
+As more databases have added support for INSERT..RETURNING, SQLAlchemy has
+undergone a major change in how it approaches the subject of INSERT statements
+where there's a need to acquire server-generated values, most importantly
+server-generated primary key values which allow the new row to be referenced in
+subsequent operations. This issue has for over a decade prevented SQLAlchemy
+from being able to batch large sets of rows into a small number of database
+round trips for the very common case where primary key values are
+server-generated, and historically has been the most significant performance
+bottleneck in the ORM.
+
+With recent support for RETURNING added to SQLite and MariaDB, SQLAlchemy no
+longer needs to rely upon the single-row-only
+`cursor.lastrowid <https://peps.python.org/pep-0249/#lastrowid>`_ attribute
+provided by the :term:`DBAPI` for most backends; RETURNING may now be used for
+all included backends with the exception of MySQL. The remaining performance
+limitation, that the
+`cursor.executemany() <https://peps.python.org/pep-0249/#executemany>`_ DBAPI
+method does not allow for rows to be fetched, is resolved for most backends by
+foregoing the use of ``executemany()`` and instead restructuring individual
+INSERT statements to each accommodate a large number of rows in a single
+statement that is invoked using ``cursor.execute()``. This approach originates
+from the
+`psycopg2 fast execution helpers <https://www.psycopg.org/docs/extras.html#fast-execution-helpers>`_
+feature of the ``psycopg2`` DBAPI, for which SQLAlchemy incrementally added
+support over recent release series.
+
+Concretely, for most backends the behavior will rewrite a statement of the
+form:
+
+.. sourcecode:: none
+
+ INSERT INTO a (data, x, y) VALUES (%(data)s, %(x)s, %(y)s) RETURNING a.id
+
+into a "batched" form as:
+
+.. sourcecode:: none
+
+ INSERT INTO a (data, x, y) VALUES
+ (%(data_0)s, %(x_0)s, %(y_0)s),
+ (%(data_1)s, %(x_1)s, %(y_1)s),
+ (%(data_2)s, %(x_2)s, %(y_2)s),
+ ...
+ (%(data_78)s, %(x_78)s, %(y_78)s)
+ RETURNING a.id
+
+It's also important to note that the feature will invoke **multiple INSERT
+statements** using the DBAPI ``cursor.execute()`` method,
+within the scope of a **single** call to the Core-level
+:meth:`_engine.Connection.execute` method,
+with each statement containing up to a fixed limit of parameter sets.
+This limit is configurable as described below at :ref:`engine_insertmanyvalues_page_size`.
+The separate calls to ``cursor.execute()`` are logged individually and
+also individually passed along to event listeners such as
+:meth:`.ConnectionEvents.before_cursor_execute` (see :ref:`engine_insertmanyvalues_events`
+below).
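
The rewriting step can be illustrated with a short pure-Python sketch. This is an illustration only, not SQLAlchemy's actual implementation; the function name is made up:

```python
def batched_insert_sql(table, cols, num_rows, returning=None):
    # Expand a single-row INSERT template into one statement with many
    # VALUES clauses, numbering each parameter set: data_0, data_1, ...
    rows = ",\n    ".join(
        "(%s)" % ", ".join("%%(%s_%d)s" % (c, i) for c in cols)
        for i in range(num_rows)
    )
    sql = "INSERT INTO %s (%s) VALUES\n    %s" % (table, ", ".join(cols), rows)
    if returning:
        sql += "\n    RETURNING %s" % returning
    return sql


print(batched_insert_sql("a", ["data", "x", "y"], 3, returning="a.id"))
```

The numbered parameter names (``data_0``, ``data_1``, ...) correspond to the flattened parameter dictionary that accompanies the rewritten statement.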
+
+The feature is enabled for included SQLAlchemy backends that support RETURNING
+as well as "multiple VALUES()" clauses within INSERT statements,
+and takes place for all INSERT...RETURNING statements that are used with
+"executemany" style execution, which occurs when passing a list of dictionaries
+to the :paramref:`_engine.Connection.execute.parameters` parameter of the
+:meth:`_engine.Connection.execute` method, as well as throughout Core and ORM
+for any similar method including ORM methods like :meth:`_orm.Session.execute`
+and asyncio methods like :meth:`_asyncio.AsyncConnection.execute` and
+:meth:`_asyncio.AsyncSession.execute`. The ORM itself also makes use of the
+feature within the :term:`unit of work` process when inserting many rows,
+that is, for large numbers of objects added to a :class:`_orm.Session` using
+methods such as :meth:`_orm.Session.add` and :meth:`_orm.Session.add_all`.
+
+For SQLAlchemy's included dialects, support (or an equivalent) is currently
+as follows:
+
+* SQLite - supported for SQLite versions 3.35 and above
+* PostgreSQL - all supported PostgreSQL versions (9 and above)
+* SQL Server - all supported SQL Server versions
+* MariaDB - supported for MariaDB versions 10.5 and above
+* MySQL - no support, no RETURNING feature is present
+* Oracle - supports RETURNING with executemany using native cx_Oracle / OracleDB
+ APIs, for all supported Oracle versions 9 and above, using multi-row OUT
+parameters. This is not the same implementation as "insertmanyvalues",
+however it has the same usage patterns and equivalent performance benefits.
+
+Enabling/Disabling the Feature
+------------------------------
+
+To disable the "insertmanyvalues" feature for a given backend for an
+:class:`.Engine` overall, pass the
+:paramref:`_sa.create_engine.use_insertmanyvalues` parameter as ``False`` to
+:func:`_sa.create_engine`::
+
+ engine = create_engine(
+ "mariadb+mariadbconnector://scott:tiger@host/db",
+ use_insertmanyvalues=False
+ )
+
+The feature can also be disabled from being used implicitly for a particular
+:class:`_schema.Table` object by passing the
+:paramref:`_schema.Table.implicit_returning` parameter as ``False``::
+
+ t = Table(
+ 't',
+ metadata,
+ Column('id', Integer, primary_key=True),
+ Column('x', Integer),
+ implicit_returning=False
+ )
+
+The reason one might want to disable RETURNING for a specific table is to
+work around backend-specific limitations. For example, there is a known
+limitation of SQL Server that the ``OUTPUT inserted.<colname>`` feature
+may not work correctly for a table that has INSERT triggers established;
+such a table may need to include ``implicit_returning=False`` (see
+:ref:`mssql_triggers`).
+
+.. _engine_insertmanyvalues_page_size:
+
+Controlling the Batch Size
+---------------------------
+
+A key characteristic of "insertmanyvalues" is that the size of the INSERT
+statement is limited to a fixed max number of "values" clauses as well as a
+dialect-specific fixed total number of bound parameters that may be represented
+in one INSERT statement at a time. When the number of parameter dictionaries
+given exceeds a fixed limit, or when the total number of bound parameters to be
+rendered in a single INSERT statement exceeds a fixed limit (the two fixed
+limits are separate), multiple INSERT statements will be invoked within the
+scope of a single :meth:`_engine.Connection.execute` call, each of which
+accommodates a portion of the parameter dictionaries, referred to as a
+"batch". The number of parameter dictionaries represented within each
+"batch" is then known as the "batch size". For example, a batch size of
+500 means that each INSERT statement emitted will INSERT at most 500 rows.
+
+It's potentially important to be able to adjust the batch size,
+as a larger batch size may be more performant for an INSERT where the value
+sets themselves are relatively small, and a smaller batch size may be more
+appropriate for an INSERT that uses very large value sets, where both the size
+of the rendered SQL as well as the total data size being passed in one
+statement may benefit from being limited to a certain size based on backend
+behavior and memory constraints. For this reason the batch size
+can be configured on a per-:class:`.Engine` as well as a per-statement
+basis. The parameter limit on the other hand is fixed based on the known
+characteristics of the database in use.
+
+The batch size defaults to 1000 for most backends, with an additional
+per-dialect "max number of parameters" limiting factor that may reduce the
+batch size further on a per-statement basis. The max number of parameters
+varies by dialect and server version; the largest size is 32700 (chosen as a
+healthy distance away from PostgreSQL's limit of 32767 and SQLite's modern
+limit of 32766, while leaving room for additional parameters in the statement
+as well as for DBAPI quirkiness). Older versions of SQLite (prior to 3.32.0)
+will set this value to 999; SQL Server sets it to 2099. MariaDB has no
+established limit however 32700 remains as a limiting factor for SQL message
+size.
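
The interplay of the two limits can be sketched as follows; the arithmetic is an assumed simplification for illustration, not SQLAlchemy's exact internals:

```python
def effective_batch_size(num_cols, page_size=1000, max_params=32700):
    # Assumed logic: a batch can never exceed the configured page size,
    # nor place more than max_params bound parameters in one statement.
    return max(1, min(page_size, max_params // num_cols))


# A 3-column table: the default page size of 1000 governs.
print(effective_batch_size(3))    # 1000

# A 200-column table: the per-statement parameter cap governs instead,
# since 200 columns * 1000 rows would far exceed 32700 parameters.
print(effective_batch_size(200))  # 163
```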
+
+The value of the "batch size" can be set :class:`_engine.Engine`-wide via
+the :paramref:`_sa.create_engine.insertmanyvalues_page_size` parameter.
+For example, to have INSERT statements include up to 100 parameter sets
+in each statement::
+
+ e = create_engine("sqlite://", insertmanyvalues_page_size=100)
+
+The batch size may also be set on a per-statement basis using the
+:paramref:`_engine.Connection.execution_options.insertmanyvalues_page_size`
+execution option, for example per execution::
+
+ with e.begin() as conn:
+ result = conn.execute(
+ table.insert().returning(table.c.id),
+ parameterlist,
+ execution_options={"insertmanyvalues_page_size": 100}
+ )
+
+Or configured on the statement itself::
+
+ stmt = table.insert().returning(table.c.id).execution_options(
+ insertmanyvalues_page_size=100
+ )
+ with e.begin() as conn:
+ result = conn.execute(stmt, parameterlist)
+
+.. _engine_insertmanyvalues_events:
+
+Logging and Events
+-------------------
+
+The "insertmanyvalues" feature integrates fully with SQLAlchemy's statement
+logging as well as cursor events such as :meth:`.ConnectionEvents.before_cursor_execute`.
+When the list of parameters is broken into separate batches, **each INSERT
+statement is logged and passed to event handlers individually**. This is a major change
+compared to how the psycopg2-only feature worked in previous 1.x series of
+SQLAlchemy, where the production of multiple INSERT statements was hidden from
+logging and events. Logging display will truncate the long lists of parameters for readability,
+and will also indicate the specific batch of each statement. The example below illustrates
+an excerpt of this logging::
+
+ INSERT INTO a (data, x, y) VALUES (?, ?, ?), ... 795 characters truncated ... (?, ?, ?), (?, ?, ?) RETURNING id
+ [generated in 0.00177s (insertmanyvalues)] ('d0', 0, 0, 'd1', ...
+ INSERT INTO a (data, x, y) VALUES (?, ?, ?), ... 795 characters truncated ... (?, ?, ?), (?, ?, ?) RETURNING id
+ [insertmanyvalues batch 2 of 10] ('d100', 100, 1000, 'd101', ...
+
+ ...
+
+ INSERT INTO a (data, x, y) VALUES (?, ?, ?), ... 795 characters truncated ... (?, ?, ?), (?, ?, ?) RETURNING id
+ [insertmanyvalues batch 10 of 10] ('d900', 900, 9000, 'd901', ...
+
+Upsert Support
+--------------
+
+The PostgreSQL, SQLite, and MariaDB dialects offer backend-specific
+"upsert" constructs :func:`_postgresql.insert`, :func:`_sqlite.insert`
+and :func:`_mysql.insert`, which are each :class:`_dml.Insert` constructs that
+have an additional method such as ``on_conflict_do_update()`` or
+``on_duplicate_key_update()``. These constructs also support "insertmanyvalues"
+behaviors when they are used with RETURNING, allowing efficient upserts
+with RETURNING to take place.
+
+
.. _engine_disposal:
Engine Disposal
diff --git a/doc/build/faq/performance.rst b/doc/build/faq/performance.rst
index 9da73c7a7..5c0e399d3 100644
--- a/doc/build/faq/performance.rst
+++ b/doc/build/faq/performance.rst
@@ -433,238 +433,18 @@ practice they are very easy to read.
I'm inserting 400,000 rows with the ORM and it's really slow!
-------------------------------------------------------------
-The SQLAlchemy ORM uses the :term:`unit of work` pattern when synchronizing
-changes to the database. This pattern goes far beyond simple "inserts"
-of data. It includes that attributes which are assigned on objects are
-received using an attribute instrumentation system which tracks
-changes on objects as they are made, includes that all rows inserted
-are tracked in an identity map which has the effect that for each row
-SQLAlchemy must retrieve its "last inserted id" if not already given,
-and also involves that rows to be inserted are scanned and sorted for
-dependencies as needed. Objects are also subject to a fair degree of
-bookkeeping in order to keep all of this running, which for a very
-large number of rows at once can create an inordinate amount of time
-spent with large data structures, hence it's best to chunk these.
-
-Basically, unit of work is a large degree of automation in order to
-automate the task of persisting a complex object graph into a
-relational database with no explicit persistence code, and this
-automation has a price.
-
-ORMs are basically not intended for high-performance bulk inserts -
-this is the whole reason SQLAlchemy offers the Core in addition to the
-ORM as a first-class component.
-
-For the use case of fast bulk inserts, the
-SQL generation and execution system that the ORM builds on top of
-is part of the :ref:`Core <sqlexpression_toplevel>`. Using this system directly, we can produce an INSERT that
-is competitive with using the raw database API directly.
-
-.. note::
-
- When using the psycopg2 dialect, consider making use of the :ref:`batch
- execution helpers <psycopg2_executemany_mode>` feature of psycopg2, now
- supported directly by the SQLAlchemy psycopg2 dialect.
-
-Alternatively, the SQLAlchemy ORM offers the :ref:`bulk_operations`
-suite of methods, which provide hooks into subsections of the unit of
-work process in order to emit Core-level INSERT and UPDATE constructs with
-a small degree of ORM-based automation.
-
-The example below illustrates time-based tests for several different
-methods of inserting rows, going from the most automated to the least.
-With cPython, runtimes observed::
-
- Python: 3.8.12 | packaged by conda-forge | (default, Sep 29 2021, 19:42:05) [Clang 11.1.0 ]
- sqlalchemy v1.4.22 (future=True)
- SQLA ORM:
- Total time for 100000 records 5.722 secs
- SQLA ORM pk given:
- Total time for 100000 records 3.781 secs
- SQLA ORM bulk_save_objects:
- Total time for 100000 records 1.385 secs
- SQLA ORM bulk_save_objects, return_defaults:
- Total time for 100000 records 3.858 secs
- SQLA ORM bulk_insert_mappings:
- Total time for 100000 records 0.472 secs
- SQLA ORM bulk_insert_mappings, return_defaults:
- Total time for 100000 records 2.840 secs
- SQLA Core:
- Total time for 100000 records 0.246 secs
- sqlite3:
- Total time for 100000 records 0.153 secs
-
-We can reduce the time by a factor of nearly three using recent versions of `PyPy <https://pypy.org/>`_::
-
- Python: 3.7.10 | packaged by conda-forge | (77787b8f, Sep 07 2021, 14:06:31) [PyPy 7.3.5 with GCC Clang 11.1.0]
- sqlalchemy v1.4.25 (future=True)
- SQLA ORM:
- Total time for 100000 records 2.976 secs
- SQLA ORM pk given:
- Total time for 100000 records 1.676 secs
- SQLA ORM bulk_save_objects:
- Total time for 100000 records 0.658 secs
- SQLA ORM bulk_save_objects, return_defaults:
- Total time for 100000 records 1.158 secs
- SQLA ORM bulk_insert_mappings:
- Total time for 100000 records 0.403 secs
- SQLA ORM bulk_insert_mappings, return_defaults:
- Total time for 100000 records 0.976 secs
- SQLA Core:
- Total time for 100000 records 0.241 secs
- sqlite3:
- Total time for 100000 records 0.128 secs
-
-Script::
+The nature of ORM inserts has changed, as most included drivers use RETURNING
+with :ref:`insertmanyvalues <engine_insertmanyvalues>` support as of SQLAlchemy
+2.0. See the section :ref:`change_6047` for details.
- import contextlib
- import sqlite3
- import sys
- import tempfile
- import time
-
- from sqlalchemy.ext.declarative import declarative_base
- from sqlalchemy import __version__, Column, Integer, String, create_engine, insert
- from sqlalchemy.orm import Session
-
- Base = declarative_base()
+Overall, SQLAlchemy's built-in drivers other than that of MySQL should now
+offer very fast ORM bulk insert performance.
+Third party dialects can opt in to the new bulk infrastructure as well, with
+some small code changes, assuming their backends support the necessary
+syntaxes. SQLAlchemy developers encourage users of third party dialects to
+report performance issues with those drivers, so that their maintainers may
+contact SQLAlchemy developers for assistance.
- class Customer(Base):
- __tablename__ = "customer"
- id = Column(Integer, primary_key=True)
- name = Column(String(255))
- @contextlib.contextmanager
- def sqlalchemy_session(future):
- with tempfile.NamedTemporaryFile(suffix=".db") as handle:
- dbpath = handle.name
- engine = create_engine(f"sqlite:///{dbpath}", future=future, echo=False)
- session = Session(
- bind=engine, future=future, autoflush=False, expire_on_commit=False
- )
- Base.metadata.create_all(engine)
- yield session
- session.close()
-
-
- def print_result(name, nrows, seconds):
- print(f"{name}:\n{' '*10}Total time for {nrows} records {seconds:.3f} secs")
-
-
- def test_sqlalchemy_orm(n=100000, future=True):
- with sqlalchemy_session(future) as session:
- t0 = time.time()
- for i in range(n):
- customer = Customer()
- customer.name = "NAME " + str(i)
- session.add(customer)
- if i % 1000 == 0:
- session.flush()
- session.commit()
- print_result("SQLA ORM", n, time.time() - t0)
-
-
- def test_sqlalchemy_orm_pk_given(n=100000, future=True):
- with sqlalchemy_session(future) as session:
- t0 = time.time()
- for i in range(n):
- customer = Customer(id=i + 1, name="NAME " + str(i))
- session.add(customer)
- if i % 1000 == 0:
- session.flush()
- session.commit()
- print_result("SQLA ORM pk given", n, time.time() - t0)
-
-
- def test_sqlalchemy_orm_bulk_save_objects(n=100000, future=True, return_defaults=False):
- with sqlalchemy_session(future) as session:
- t0 = time.time()
- for chunk in range(0, n, 10000):
- session.bulk_save_objects(
- [
- Customer(name="NAME " + str(i))
- for i in range(chunk, min(chunk + 10000, n))
- ],
- return_defaults=return_defaults,
- )
- session.commit()
- print_result(
- f"SQLA ORM bulk_save_objects{', return_defaults' if return_defaults else ''}",
- n,
- time.time() - t0,
- )
-
-
- def test_sqlalchemy_orm_bulk_insert(n=100000, future=True, return_defaults=False):
- with sqlalchemy_session(future) as session:
- t0 = time.time()
- for chunk in range(0, n, 10000):
- session.bulk_insert_mappings(
- Customer,
- [
- dict(name="NAME " + str(i))
- for i in range(chunk, min(chunk + 10000, n))
- ],
- return_defaults=return_defaults,
- )
- session.commit()
- print_result(
- f"SQLA ORM bulk_insert_mappings{', return_defaults' if return_defaults else ''}",
- n,
- time.time() - t0,
- )
-
-
- def test_sqlalchemy_core(n=100000, future=True):
- with sqlalchemy_session(future) as session:
- with session.bind.begin() as conn:
- t0 = time.time()
- conn.execute(
- insert(Customer.__table__),
- [{"name": "NAME " + str(i)} for i in range(n)],
- )
- conn.commit()
- print_result("SQLA Core", n, time.time() - t0)
-
-
- @contextlib.contextmanager
- def sqlite3_conn():
- with tempfile.NamedTemporaryFile(suffix=".db") as handle:
- dbpath = handle.name
- conn = sqlite3.connect(dbpath)
- c = conn.cursor()
- c.execute("DROP TABLE IF EXISTS customer")
- c.execute(
- "CREATE TABLE customer (id INTEGER NOT NULL, "
- "name VARCHAR(255), PRIMARY KEY(id))"
- )
- conn.commit()
- yield conn
-
-
- def test_sqlite3(n=100000):
- with sqlite3_conn() as conn:
- c = conn.cursor()
- t0 = time.time()
- for i in range(n):
- row = ("NAME " + str(i),)
- c.execute("INSERT INTO customer (name) VALUES (?)", row)
- conn.commit()
- print_result("sqlite3", n, time.time() - t0)
-
-
- if __name__ == "__main__":
- rows = 100000
- _future = True
- print(f"Python: {' '.join(sys.version.splitlines())}")
- print(f"sqlalchemy v{__version__} (future={_future})")
- test_sqlalchemy_orm(rows, _future)
- test_sqlalchemy_orm_pk_given(rows, _future)
- test_sqlalchemy_orm_bulk_save_objects(rows, _future)
- test_sqlalchemy_orm_bulk_save_objects(rows, _future, True)
- test_sqlalchemy_orm_bulk_insert(rows, _future)
- test_sqlalchemy_orm_bulk_insert(rows, _future, True)
- test_sqlalchemy_core(rows, _future)
- test_sqlite3(rows)
diff --git a/doc/build/orm/persistence_techniques.rst b/doc/build/orm/persistence_techniques.rst
index 94e5601f5..0a4d7e056 100644
--- a/doc/build/orm/persistence_techniques.rst
+++ b/doc/build/orm/persistence_techniques.rst
@@ -875,21 +875,24 @@ Bulk Operations
of the Unit of Work's facilities for emitting INSERT and UPDATE statements
on primary key targeted rows. These routines were added to suit some
cases where many rows being inserted or updated could be run into the
- database without as much of the usual unit of work overhead, in that
- most unit of work features are **disabled**.
+ database without as much of the usual unit of work overhead, by disabling
+ many unit of work features.
- There is **usually no need to use these routines, and they are not easy
- to use as there are many missing behaviors that are usually expected when
- using ORM objects**; for efficient
- bulk inserts, it's better to use the Core :class:`_sql.Insert` construct
- directly. Please read all caveats at :ref:`bulk_operations_caveats`.
+ There is **usually no need to use these routines, particularly in
+ modern SQLAlchemy 2.0, which has greatly improved the performance
+ of ORM unit-of-work INSERTs for most backends.** Ordinary ORM
+ INSERT operations and the bulk methods documented here both take
+ advantage of the same :ref:`engine_insertmanyvalues` feature introduced
+ in SQLAlchemy 2.0. For backends that support RETURNING, the vast majority
+ of performance overhead for bulk inserts has been resolved.
+
+ As the bulk operations forego many unit of work features, please read all
+ caveats at :ref:`bulk_operations_caveats`.
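To make the "insertmanyvalues" idea concrete, here is a minimal illustrative sketch (not SQLAlchemy's actual internals): an executemany-style INSERT is rewritten into a single statement with one VALUES group per parameter set, so that RETURNING can be used across all rows at once. The `batch_insert_sql` helper and the `RETURNING id` suffix are assumptions for illustration only.

```python
# Illustrative sketch of the "insertmanyvalues" batching concept: build a
# single multi-row INSERT ... VALUES ... RETURNING statement plus the
# flattened parameter list that goes with it. This is NOT SQLAlchemy's
# real implementation, just a demonstration of the SQL rewriting idea.

def batch_insert_sql(table, columns, param_sets):
    """Return (sql, flat_params) for a batched multi-row INSERT."""
    # one "(?, ?)" group per incoming parameter dictionary
    group = "(%s)" % ", ".join("?" for _ in columns)
    values_clause = ", ".join(group for _ in param_sets)
    sql = "INSERT INTO %s (%s) VALUES %s RETURNING id" % (
        table, ", ".join(columns), values_clause)
    # flatten the dicts into one positional parameter list
    flat_params = [p[col] for p in param_sets for col in columns]
    return sql, flat_params

sql, params = batch_insert_sql(
    "user_account", ["name", "fullname"],
    [{"name": "spongebob", "fullname": "Spongebob Squarepants"},
     {"name": "sandy", "fullname": "Sandy Cheeks"}],
)
# sql → INSERT INTO user_account (name, fullname)
#       VALUES (?, ?), (?, ?) RETURNING id
```

One statement, one round trip, one RETURNING result set for all rows: this is where the performance gain over per-row `execute()` calls comes from.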
.. note:: Bulk INSERT and UPDATE should not be confused with the
- more common feature known as :ref:`orm_expression_update_delete`. This
- feature allows a single UPDATE or DELETE statement with arbitrary WHERE
- criteria to be emitted. There is also an option on some backends to
- use true "upsert" with the ORM, such as on PostgreSQL. See the section
- :ref:`orm_dml_returning_objects` for examples.
+ feature known as :ref:`orm_expression_update_delete`, which
+ allows a single UPDATE or DELETE statement with arbitrary WHERE
+ criteria to be emitted.
.. seealso::
@@ -899,8 +902,6 @@ Bulk Operations
:ref:`orm_dml_returning_objects` - use UPDATE, INSERT or upsert operations to
return ORM objects
-.. versionadded:: 1.0.0
-
Bulk INSERT/per-row UPDATE operations on the :class:`.Session` include
:meth:`.Session.bulk_save_objects`, :meth:`.Session.bulk_insert_mappings`, and
:meth:`.Session.bulk_update_mappings`. The purpose of these methods is to
@@ -908,12 +909,12 @@ directly expose internal elements of the unit of work system, such that
facilities for emitting INSERT and UPDATE statements given dictionaries or
object states can be utilized alone, bypassing the normal unit of work
mechanics of state, relationship and attribute management. The advantages to
-this approach is strictly one of reduced Python overhead:
+this approach are strictly those of reduced Python overhead:
* The flush() process, including the survey of all objects, their state,
their cascade status, the status of all objects associated with them
via :func:`_orm.relationship`, and the topological sort of all operations to
- be performed is completely bypassed. This reduces a great amount of
+ be performed is bypassed. This can in many cases reduce
Python overhead.
* The objects as given have no defined relationship to the target
@@ -935,13 +936,13 @@ this approach is strictly one of reduced Python overhead:
passed to the DBAPI, ensure that the incoming list of objects
are grouped by type.
-* The process of fetching primary keys after an INSERT also is disabled by
- default. When performed correctly, INSERT statements can now more readily
- be batched by the unit of work process into ``executemany()`` blocks, which
- perform vastly better than individual statement invocations.
+* In most cases, the bulk operations don't need to fetch newly generated
+ primary key values after the INSERT proceeds. This was historically a
+ major performance bottleneck in the ORM; however, in modern use most
+ backends fully support RETURNING with multi-row INSERT statements.
* UPDATE statements can similarly be tailored such that all attributes
- are subject to the SET clause unconditionally, again making it much more
+ are subject to the SET clause unconditionally, making it more
likely that ``executemany()`` blocks can be used.
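The "grouped by type" advice in the bullets above can be sketched with plain stdlib tools; the `User` and `Address` classes below are hypothetical stand-ins for mapped classes, and the grouping step is an assumption about caller-side preparation, not an ORM API.

```python
# Hypothetical sketch: group a heterogeneous list of objects by class
# before bulk-saving, so each group compiles to one INSERT statement and
# can be sent to the DBAPI as a single executemany() batch.
from itertools import groupby

class User:      # stand-in for a mapped class
    pass

class Address:   # stand-in for a mapped class
    pass

objects = [User(), Address(), User(), User(), Address()]

# sort so identical classes are adjacent, then group; each resulting
# group would be handed to one bulk operation call
grouped = {
    cls: list(items)
    for cls, items in groupby(
        sorted(objects, key=lambda o: type(o).__name__), key=type
    )
}
# grouped[User] holds 3 objects, grouped[Address] holds 2
```

Interleaving types in a single call would force the unit of work to break the batch at every class boundary, defeating the `executemany()` optimization.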
The performance behavior of the bulk routines should be studied using the
@@ -1023,10 +1024,9 @@ are **not available** when using these methods:
value of each row is not available, so the WHERE criteria cannot be
generated.
-* SQL expression inserts / updates (e.g. :ref:`flush_embedded_sql_expressions`) -
- having to evaluate these would prevent INSERT and UPDATE statements from
- being batched together in a straightforward way for a single executemany()
- call as they alter the SQL compilation of the statement itself.
+* SQL expression inserts / updates (e.g. :ref:`flush_embedded_sql_expressions`)
+  are not supported in this mode, as evaluating them would prevent INSERT /
+  UPDATE statements from being efficiently batched.
* ORM events such as :meth:`.MapperEvents.before_insert`, etc. The bulk
session methods have no event support.
@@ -1037,10 +1037,7 @@ Features that **are available** include:
* Version identifier support
-* Multi-table mappings, such as joined-inheritance - however, an object
- to be inserted across multiple tables either needs to have primary key
- identifiers fully populated ahead of time, else the
- :paramref:`.Session.bulk_save_objects.return_defaults` flag must be used,
- which will greatly reduce the performance benefits
+* Multi-table mappings, such as joined-inheritance. The
+  :paramref:`.Session.bulk_save_objects.return_defaults` flag must be
+  enabled for this to be used.
diff --git a/doc/build/orm/quickstart.rst b/doc/build/orm/quickstart.rst
index 68ec0d01f..3b909f2ad 100644
--- a/doc/build/orm/quickstart.rst
+++ b/doc/build/orm/quickstart.rst
@@ -202,18 +202,10 @@ is used:
...
... session.commit()
{opensql}BEGIN (implicit)
- INSERT INTO user_account (name, fullname) VALUES (?, ?)
- [...] ('spongebob', 'Spongebob Squarepants')
- INSERT INTO user_account (name, fullname) VALUES (?, ?)
- [...] ('sandy', 'Sandy Cheeks')
- INSERT INTO user_account (name, fullname) VALUES (?, ?)
- [...] ('patrick', 'Patrick Star')
- INSERT INTO address (email_address, user_id) VALUES (?, ?)
- [...] ('spongebob@sqlalchemy.org', 1)
- INSERT INTO address (email_address, user_id) VALUES (?, ?)
- [...] ('sandy@sqlalchemy.org', 2)
- INSERT INTO address (email_address, user_id) VALUES (?, ?)
- [...] ('sandy@squirrelpower.org', 2)
+ INSERT INTO user_account (name, fullname) VALUES (?, ?), (?, ?), (?, ?) RETURNING id
+ [...] ('spongebob', 'Spongebob Squarepants', 'sandy', 'Sandy Cheeks', 'patrick', 'Patrick Star')
+ INSERT INTO address (email_address, user_id) VALUES (?, ?), (?, ?), (?, ?) RETURNING id
+ [...] ('spongebob@sqlalchemy.org', 1, 'sandy@sqlalchemy.org', 2, 'sandy@squirrelpower.org', 2)
COMMIT
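The batched statement shown in the quickstart output above can be tried directly against the stdlib ``sqlite3`` driver. This is a sketch for illustration: the table mirrors the quickstart's ``user_account``, and ``RETURNING`` is omitted because it requires SQLite 3.35 or newer.

```python
# Demonstration with the stdlib sqlite3 driver: one multi-row VALUES
# statement inserts all three rows in a single round trip, matching the
# batched INSERT shown in the quickstart output (RETURNING omitted, as it
# needs SQLite 3.35+).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_account "
    "(id INTEGER PRIMARY KEY, name TEXT, fullname TEXT)"
)
conn.execute(
    "INSERT INTO user_account (name, fullname) VALUES (?, ?), (?, ?), (?, ?)",
    ("spongebob", "Spongebob Squarepants",
     "sandy", "Sandy Cheeks",
     "patrick", "Patrick Star"),
)
conn.commit()
rows = conn.execute("SELECT name FROM user_account ORDER BY id").fetchall()
# rows → [('spongebob',), ('sandy',), ('patrick',)]
```

Compare this with three separate ``execute()`` calls: the SQL text and result handling happen once instead of once per row, which is the overhead the insertmanyvalues feature eliminates.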
diff --git a/doc/build/tutorial/orm_data_manipulation.rst b/doc/build/tutorial/orm_data_manipulation.rst
index b0b67f53c..f6237f4aa 100644
--- a/doc/build/tutorial/orm_data_manipulation.rst
+++ b/doc/build/tutorial/orm_data_manipulation.rst
@@ -122,10 +122,9 @@ method:
>>> session.flush()
{opensql}BEGIN (implicit)
- INSERT INTO user_account (name, fullname) VALUES (?, ?)
- [...] ('squidward', 'Squidward Tentacles')
- INSERT INTO user_account (name, fullname) VALUES (?, ?)
- [...] ('ehkrabs', 'Eugene H. Krabs')
+ INSERT INTO user_account (name, fullname) VALUES (?, ?), (?, ?) RETURNING id
+ [...] ('squidward', 'Squidward Tentacles', 'ehkrabs', 'Eugene H. Krabs')
+
Above we observe the :class:`_orm.Session` was first called upon to emit SQL,
so it created a new transaction and emitted the appropriate INSERT statements
diff --git a/doc/build/tutorial/orm_related_objects.rst b/doc/build/tutorial/orm_related_objects.rst
index 0df611e45..a6bb2b9a3 100644
--- a/doc/build/tutorial/orm_related_objects.rst
+++ b/doc/build/tutorial/orm_related_objects.rst
@@ -198,12 +198,11 @@ newly generated primary key of the ``user_account`` row is applied to the
>>> session.commit()
{opensql}INSERT INTO user_account (name, fullname) VALUES (?, ?)
[...] ('pkrabs', 'Pearl Krabs')
- INSERT INTO address (email_address, user_id) VALUES (?, ?)
- [...] ('pearl.krabs@gmail.com', 6)
- INSERT INTO address (email_address, user_id) VALUES (?, ?)
- [...] ('pearl@aol.com', 6)
+ INSERT INTO address (email_address, user_id) VALUES (?, ?), (?, ?) RETURNING id
+ [...] ('pearl.krabs@gmail.com', 6, 'pearl@aol.com', 6)
COMMIT
+
.. _tutorial_loading_relationships:
Loading Relationships