diff options
| author | Mike Bayer <mike_mp@zzzcomputing.com> | 2020-06-06 20:40:43 -0400 |
|---|---|---|
| committer | Mike Bayer <mike_mp@zzzcomputing.com> | 2020-06-10 15:29:01 -0400 |
| commit | b0cfa7379cf8513a821a3dbe3028c4965d9f85bd (patch) | |
| tree | 19a79632b4f159092d955765ff9f7e842808bce7 /lib/sqlalchemy/engine | |
| parent | 3ab2364e78641c4f0e4b6456afc2cbed39b0d0e6 (diff) | |
| download | sqlalchemy-b0cfa7379cf8513a821a3dbe3028c4965d9f85bd.tar.gz | |
Turn on caching everywhere, add logging
A variety of caching issues found by running
all tests with statement caching turned on.
The cache system now has a more conservative approach where
any subclass of a SQL element will by default invalidate
the cache key unless it adds the flag inherit_cache=True
at the class level, or if it implements its own caching.
Add working caching to a few elements that were
omitted previously; fix some caching implementations
to suit lesser used edge cases such as json casts
and array slices.
Refine the way BaseCursorResult and CursorMetaData
interact with caching; to suit cases like Alembic
modifying table structures, don't cache the
cursor metadata if it was created against a
cursor.description using non-positional matching,
e.g. "select *". if a table re-ordered its columns
or added/removed, now that data is obsolete.
Additionally we have to adapt the cursor metadata
_keymap regardless of if we just processed
cursor.description, because if we ran against
a cached SQLCompiler we won't have the right
columns in _keymap.
Other refinements to how and when we do this
adaptation as some weird cases
were exposed in the Postgresql dialect,
a text() construct that names just one column that
is not actually in the statement. Fixed that
also as it looks like a cut-and-paste artifact
that doesn't actually affect anything.
Various issues with re-use of compiled result maps
and cursor metadata in conjunction with tables being
changed, such as change in order of columns.
mappers can be cleared but the class remains, meaning
a mapper has to use itself as the cache key not the class.
lots of bound parameter / literal issues, due to Alembic
creating a straight subclass of bindparam that renders
inline directly. While we can update Alembic to not
do this, we have to assume other people might be doing
this, so bindparam() implements the inherit_cache=True
logic as well that was a bit involved.
turn on cache stats in logging.
Includes a fix to subqueryloader which moves all setup to
the create_row_processor() phase and eliminates any storage
within the compiled context. This includes some changes
to create_row_processor() signature and a revising of the
technique used to determine if the loader can participate
in polymorphic queries, which is also applied to
selectinloading.
DML update.values() and ordered_values() now coerces the
keys as we have tests that pass an arbitrary class here
which only includes __clause_element__(), so the
key can't be cached unless it is coerced. this in turn
changed how composite attributes support bulk update
to use the standard approach of ClauseElement with
annotations that are parsed in the ORM context.
memory profiling successfully caught that the Session
from Query was getting passed into _statement_20()
so that was a big win for that test suite.
Apparently Compiler had .execute() and .scalar() methods
stuck on it, these date back to version 0.4 and there
was a single test in the PostgreSQL dialect tests
that exercised it for no apparent reason. Removed
these methods as well as the concept of a Compiler
holding onto a "bind".
Fixes: #5386
Change-Id: I990b43aab96b42665af1b2187ad6020bee778784
Diffstat (limited to 'lib/sqlalchemy/engine')
| -rw-r--r-- | lib/sqlalchemy/engine/base.py | 105 | ||||
| -rw-r--r-- | lib/sqlalchemy/engine/create.py | 13 | ||||
| -rw-r--r-- | lib/sqlalchemy/engine/cursor.py | 89 | ||||
| -rw-r--r-- | lib/sqlalchemy/engine/default.py | 15 |
4 files changed, 122 insertions, 100 deletions
diff --git a/lib/sqlalchemy/engine/base.py b/lib/sqlalchemy/engine/base.py index a36f4eee2..3e02a29fe 100644 --- a/lib/sqlalchemy/engine/base.py +++ b/lib/sqlalchemy/engine/base.py @@ -1175,46 +1175,17 @@ class Connection(Connectable): ) compiled_cache = execution_options.get( - "compiled_cache", self.dialect._compiled_cache + "compiled_cache", self.engine._compiled_cache ) - if compiled_cache is not None: - elem_cache_key = elem._generate_cache_key() - else: - elem_cache_key = None - - if elem_cache_key: - cache_key, extracted_params = elem_cache_key - key = ( - dialect, - cache_key, - tuple(keys), - bool(schema_translate_map), - inline, - ) - compiled_sql = compiled_cache.get(key) - - if compiled_sql is None: - compiled_sql = elem.compile( - dialect=dialect, - cache_key=elem_cache_key, - column_keys=keys, - inline=inline, - schema_translate_map=schema_translate_map, - linting=self.dialect.compiler_linting - | compiler.WARN_LINTING, - ) - compiled_cache[key] = compiled_sql - else: - extracted_params = None - compiled_sql = elem.compile( - dialect=dialect, - column_keys=keys, - inline=inline, - schema_translate_map=schema_translate_map, - linting=self.dialect.compiler_linting | compiler.WARN_LINTING, - ) - + compiled_sql, extracted_params, cache_hit = elem._compile_w_cache( + dialect=dialect, + compiled_cache=compiled_cache, + column_keys=keys, + inline=inline, + schema_translate_map=schema_translate_map, + linting=self.dialect.compiler_linting | compiler.WARN_LINTING, + ) ret = self._execute_context( dialect, dialect.execution_ctx_cls._init_compiled, @@ -1225,6 +1196,7 @@ class Connection(Connectable): distilled_params, elem, extracted_params, + cache_hit=cache_hit, ) if has_events: self.dispatch.after_execute( @@ -1389,7 +1361,8 @@ class Connection(Connectable): statement, parameters, execution_options, - *args + *args, + **kw ): """Create an :class:`.ExecutionContext` and execute, returning a :class:`_engine.CursorResult`.""" @@ -1407,7 +1380,7 @@ class 
Connection(Connectable): conn = self._revalidate_connection() context = constructor( - dialect, self, conn, execution_options, *args + dialect, self, conn, execution_options, *args, **kw ) except (exc.PendingRollbackError, exc.ResourceClosedError): raise @@ -1455,32 +1428,21 @@ class Connection(Connectable): self.engine.logger.info(statement) - # stats = context._get_cache_stats() + stats = context._get_cache_stats() if not self.engine.hide_parameters: - # TODO: I love the stats but a ton of tests that are hardcoded. - # to certain log output are failing. self.engine.logger.info( - "%r", + "[%s] %r", + stats, sql_util._repr_params( parameters, batches=10, ismulti=context.executemany ), ) - # self.engine.logger.info( - # "[%s] %r", - # stats, - # sql_util._repr_params( - # parameters, batches=10, ismulti=context.executemany - # ), - # ) else: self.engine.logger.info( - "[SQL parameters hidden due to hide_parameters=True]" + "[%s] [SQL parameters hidden due to hide_parameters=True]" + % (stats,) ) - # self.engine.logger.info( - # "[%s] [SQL parameters hidden due to hide_parameters=True]" - # % (stats,) - # ) evt_handled = False try: @@ -2369,6 +2331,7 @@ class Engine(Connectable, log.Identified): url, logging_name=None, echo=None, + query_cache_size=500, execution_options=None, hide_parameters=False, ): @@ -2379,14 +2342,43 @@ class Engine(Connectable, log.Identified): self.logging_name = logging_name self.echo = echo self.hide_parameters = hide_parameters + if query_cache_size != 0: + self._compiled_cache = util.LRUCache( + query_cache_size, size_alert=self._lru_size_alert + ) + else: + self._compiled_cache = None log.instance_logger(self, echoflag=echo) if execution_options: self.update_execution_options(**execution_options) + def _lru_size_alert(self, cache): + if self._should_log_info: + self.logger.info( + "Compiled cache size pruning from %d items to %d. 
" + "Increase cache size to reduce the frequency of pruning.", + len(cache), + cache.capacity, + ) + @property def engine(self): return self + def clear_compiled_cache(self): + """Clear the compiled cache associated with the dialect. + + This applies **only** to the built-in cache that is established + via the :paramref:`.create_engine.query_cache_size` parameter. + It will not impact any dictionary caches that were passed via the + :paramref:`.Connection.execution_options.query_cache` parameter. + + .. versionadded:: 1.4 + + """ + if self._compiled_cache: + self._compiled_cache.clear() + def update_execution_options(self, **opt): r"""Update the default execution_options dictionary of this :class:`_engine.Engine`. @@ -2874,6 +2866,7 @@ class OptionEngineMixin(object): self.dialect = proxied.dialect self.logging_name = proxied.logging_name self.echo = proxied.echo + self._compiled_cache = proxied._compiled_cache self.hide_parameters = proxied.hide_parameters log.instance_logger(self, echoflag=self.echo) diff --git a/lib/sqlalchemy/engine/create.py b/lib/sqlalchemy/engine/create.py index 4c912349e..9bf72eb06 100644 --- a/lib/sqlalchemy/engine/create.py +++ b/lib/sqlalchemy/engine/create.py @@ -436,7 +436,13 @@ def create_engine(url, **kwargs): .. versionadded:: 1.2.3 :param query_cache_size: size of the cache used to cache the SQL string - form of queries. Defaults to zero, which disables caching. + form of queries. Set to zero to disable caching. + + The cache is pruned of its least recently used items when its size reaches + N * 1.5. Defaults to 500, meaning the cache will always store at least + 500 SQL statements when filled, and will grow up to 750 items at which + point it is pruned back down to 500 by removing the 250 least recently + used items. Caching is accomplished on a per-statement basis by generating a cache key that represents the statement's structure, then generating @@ -446,6 +452,11 @@ def create_engine(url, **kwargs): bypass the cache. 
SQL logging will indicate statistics for each statement whether or not it were pull from the cache. + .. note:: some ORM functions related to unit-of-work persistence as well + as some attribute loading strategies will make use of individual + per-mapper caches outside of the main cache. + + .. seealso:: ``engine_caching`` - TODO: this will be an upcoming section describing diff --git a/lib/sqlalchemy/engine/cursor.py b/lib/sqlalchemy/engine/cursor.py index d03d79df7..abffe0d1f 100644 --- a/lib/sqlalchemy/engine/cursor.py +++ b/lib/sqlalchemy/engine/cursor.py @@ -51,6 +51,7 @@ class CursorResultMetaData(ResultMetaData): "_keys", "_tuplefilter", "_translated_indexes", + "_safe_for_cache" # don't need _unique_filters support here for now. Can be added # if a need arises. ) @@ -104,11 +105,11 @@ class CursorResultMetaData(ResultMetaData): return new_metadata def _adapt_to_context(self, context): - """When using a cached result metadata against a new context, - we need to rewrite the _keymap so that it has the specific - Column objects in the new context inside of it. this accommodates - for select() constructs that contain anonymized columns and - are cached. + """When using a cached Compiled construct that has a _result_map, + for a new statement that used the cached Compiled, we need to ensure + the keymap has the Column objects from our new statement as keys. + So here we rewrite keymap with new entries for the new columns + as matched to those of the cached statement. """ if not context.compiled._result_columns: @@ -124,14 +125,15 @@ class CursorResultMetaData(ResultMetaData): # to the result map. 
md = self.__class__.__new__(self.__class__) - md._keymap = self._keymap.copy() + md._keymap = dict(self._keymap) # match up new columns positionally to the result columns for existing, new in zip( context.compiled._result_columns, invoked_statement._exported_columns_iterator(), ): - md._keymap[new] = md._keymap[existing[RM_NAME]] + if existing[RM_NAME] in md._keymap: + md._keymap[new] = md._keymap[existing[RM_NAME]] md.case_sensitive = self.case_sensitive md._processors = self._processors @@ -147,6 +149,7 @@ class CursorResultMetaData(ResultMetaData): self._tuplefilter = None self._translated_indexes = None self.case_sensitive = dialect.case_sensitive + self._safe_for_cache = False if context.result_column_struct: ( @@ -341,6 +344,10 @@ class CursorResultMetaData(ResultMetaData): self._keys = [elem[0] for elem in result_columns] # pure positional 1-1 case; doesn't need to read # the names from cursor.description + + # this metadata is safe to cache because we are guaranteed + # to have the columns in the same order for new executions + self._safe_for_cache = True return [ ( idx, @@ -359,9 +366,12 @@ class CursorResultMetaData(ResultMetaData): for idx, rmap_entry in enumerate(result_columns) ] else: + # name-based or text-positional cases, where we need # to read cursor.description names + if textual_ordered: + self._safe_for_cache = True # textual positional case raw_iterator = self._merge_textual_cols_by_position( context, cursor_description, result_columns @@ -369,6 +379,9 @@ class CursorResultMetaData(ResultMetaData): elif num_ctx_cols: # compiled SQL with a mismatch of description cols # vs. 
compiled cols, or textual w/ unordered columns + # the order of columns can change if the query is + # against a "select *", so not safe to cache + self._safe_for_cache = False raw_iterator = self._merge_cols_by_name( context, cursor_description, @@ -376,7 +389,9 @@ class CursorResultMetaData(ResultMetaData): loose_column_name_matching, ) else: - # no compiled SQL, just a raw string + # no compiled SQL, just a raw string, order of columns + # can change for "select *" + self._safe_for_cache = False raw_iterator = self._merge_cols_by_none( context, cursor_description ) @@ -1152,7 +1167,6 @@ class BaseCursorResult(object): out_parameters = None _metadata = None - _metadata_from_cache = False _soft_closed = False closed = False @@ -1209,33 +1223,38 @@ class BaseCursorResult(object): def _init_metadata(self, context, cursor_description): if context.compiled: if context.compiled._cached_metadata: - cached_md = self.context.compiled._cached_metadata - self._metadata_from_cache = True - - # result rewrite/ adapt step. two translations can occur here. - # one is if we are invoked against a cached statement, we want - # to rewrite the ResultMetaData to reflect the column objects - # that are in our current selectable, not the cached one. the - # other is, the CompileState can return an alternative Result - # object. Finally, CompileState might want to tell us to not - # actually do the ResultMetaData adapt step if it in fact has - # changed the selected columns in any case. 
- compiled = context.compiled - if ( - compiled - and not compiled._rewrites_selected_columns - and compiled.statement is not context.invoked_statement - ): - cached_md = cached_md._adapt_to_context(context) - - self._metadata = metadata = cached_md - + metadata = self.context.compiled._cached_metadata else: - self._metadata = ( - metadata - ) = context.compiled._cached_metadata = self._cursor_metadata( - self, cursor_description - ) + metadata = self._cursor_metadata(self, cursor_description) + if metadata._safe_for_cache: + context.compiled._cached_metadata = metadata + + # result rewrite/ adapt step. this is to suit the case + # when we are invoked against a cached Compiled object, we want + # to rewrite the ResultMetaData to reflect the Column objects + # that are in our current SQL statement object, not the one + # that is associated with the cached Compiled object. + # the Compiled object may also tell us to not + # actually do this step; this is to support the ORM where + # it is to produce a new Result object in any case, and will + # be using the cached Column objects against this database result + # so we don't want to rewrite them. + # + # Basically this step suits the use case where the end user + # is using Core SQL expressions and is accessing columns in the + # result row using row._mapping[table.c.column]. 
+ compiled = context.compiled + if ( + compiled + and compiled._result_columns + and context.cache_hit + and not compiled._rewrites_selected_columns + and compiled.statement is not context.invoked_statement + ): + metadata = metadata._adapt_to_context(context) + + self._metadata = metadata + else: self._metadata = metadata = self._cursor_metadata( self, cursor_description diff --git a/lib/sqlalchemy/engine/default.py b/lib/sqlalchemy/engine/default.py index c682a8ee1..4d516e97c 100644 --- a/lib/sqlalchemy/engine/default.py +++ b/lib/sqlalchemy/engine/default.py @@ -230,7 +230,6 @@ class DefaultDialect(interfaces.Dialect): supports_native_boolean=None, max_identifier_length=None, label_length=None, - query_cache_size=0, # int() is because the @deprecated_params decorator cannot accommodate # the direct reference to the "NO_LINTING" object compiler_linting=int(compiler.NO_LINTING), @@ -262,10 +261,6 @@ class DefaultDialect(interfaces.Dialect): if supports_native_boolean is not None: self.supports_native_boolean = supports_native_boolean self.case_sensitive = case_sensitive - if query_cache_size != 0: - self._compiled_cache = util.LRUCache(query_cache_size) - else: - self._compiled_cache = None self._user_defined_max_identifier_length = max_identifier_length if self._user_defined_max_identifier_length: @@ -794,6 +789,7 @@ class DefaultExecutionContext(interfaces.ExecutionContext): parameters, invoked_statement, extracted_parameters, + cache_hit=False, ): """Initialize execution context for a Compiled construct.""" @@ -804,6 +800,7 @@ class DefaultExecutionContext(interfaces.ExecutionContext): self.extracted_parameters = extracted_parameters self.invoked_statement = invoked_statement self.compiled = compiled + self.cache_hit = cache_hit self.execution_options = execution_options @@ -1027,13 +1024,15 @@ class DefaultExecutionContext(interfaces.ExecutionContext): def _get_cache_stats(self): if self.compiled is None: - return "raw SQL" + return "raw sql" now = time.time() 
if self.compiled.cache_key is None: - return "gen %.5fs" % (now - self.compiled._gen_time,) + return "no key %.5fs" % (now - self.compiled._gen_time,) + elif self.cache_hit: + return "cached for %.4gs" % (now - self.compiled._gen_time,) else: - return "cached %.5fs" % (now - self.compiled._gen_time,) + return "generated in %.5fs" % (now - self.compiled._gen_time,) @util.memoized_property def engine(self): |
