MDEV-28846 Poor performance when rowid filter contains no elementsbb-10.5-MDEV-28846

When a range rowid filter was used with an index ref access the cost of accessing the index entries for the records rejected by the filter was not taken into account. For a ref access by an index with big average number of records per key this led to poor execution plans if selectivity of the used filter was high. The patch resolves this problem. It also introduces a minor optimization that skips look-ups into a filter that turns out to be empty. With this patch the output of ANALYZE stmt reports the number of look-ups into used rowid filters. The test cases that were supposed to use rowid filters have been adjusted in order to use similar execution plans after this fix. This patch prepared for 10.5 differs from the patch resolving the problem for 10.4 because he code that calculates the cost of index only access has been changed for 10.5 making usage of rowid filters more favorable. Approved by Oleksandr Byelkin <sanja@mariadb.com>
author: Igor Babaev <igor@askmonty.org> 2022-10-25 18:07:27 -0700
committer: Igor Babaev <igor@askmonty.org> 2022-10-25 18:07:27 -0700
commit: eee0fda325fcf353ea863c87a46ff2995eb1711f (patch)
tree: 6117cd851fd12973e5fddfe6acc42056e9722317 /sql/sql_select.cc
parent: 42802ad66c49b6de11b37c7ea4e4658ccc5a94aa (diff)
download: mariadb-git-bb-10.5-MDEV-28846.tar.gz
1 files changed, 46 insertions, 4 deletions
diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 81c4991e801..3d880365d42 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -7496,9 +7496,11 @@ best_access_path(JOIN      *join,
   double best=              DBL_MAX;
   double best_time=         DBL_MAX;
   double records=           DBL_MAX;
+  ha_rows records_for_key=   0;
   table_map best_ref_depends_map= 0;
   Range_rowid_filter_cost_info *best_filter= 0;
   double tmp;
+  double keyread_tmp= 0;
   ha_rows rec;
   bool best_uses_jbuf= FALSE;
   MY_BITMAP *eq_join_set= &s->table->eq_join_set;
@@ -7772,8 +7774,12 @@ best_access_path(JOIN      *join,
             /* Limit the number of matched rows */
             tmp= cost_for_index_read(thd, table, key, (ha_rows) records,
                                      (ha_rows) s->worst_seeks);
+            records_for_key= (ha_rows) records;
+            set_if_smaller(records_for_key, thd->variables.max_seeks_for_key);
+            keyread_tmp= table->file->keyread_time(key, 1, records_for_key);
         got_cost:
             tmp= COST_MULT(tmp, record_count);
+            keyread_tmp= COST_MULT(keyread_tmp, record_count);
           }
         }
         else
@@ -7950,12 +7956,14 @@ best_access_path(JOIN      *join,
             }
 
             /* Limit the number of matched rows */
-            tmp= records;
-            set_if_smaller(tmp, (double) thd->variables.max_seeks_for_key);
-            tmp= cost_for_index_read(thd, table, key, (ha_rows) tmp,
+            tmp= cost_for_index_read(thd, table, key, (ha_rows) records,
                                      (ha_rows) s->worst_seeks);
+            records_for_key= (ha_rows) records;
+            set_if_smaller(records_for_key, thd->variables.max_seeks_for_key);
+            keyread_tmp= table->file->keyread_time(key, 1, records_for_key);
         got_cost2:
             tmp= COST_MULT(tmp, record_count);
+            keyread_tmp= COST_MULT(keyread_tmp, record_count);
           }
           else
           {
@@ -7974,7 +7982,35 @@ best_access_path(JOIN      *join,
 	  (found_part & 1))   // start_key->key can be used for index access
       {
         double rows= record_count * records;
-        double access_cost_factor= MY_MIN(tmp / rows, 1.0);
+
+        /*
+          If we use filter F with selectivity s the the cost of fetching data
+          by key using this filter will be
+             cost_of_fetching_1_row * rows * s +
+             cost_of_fetching_1_key_tuple * rows * (1 - s) +
+             cost_of_1_lookup_into_filter * rows
+          Without using any filter the cost would be just
+             cost_of_fetching_1_row * rows
+
+          So the gain in access cost per row will be
+             cost_of_fetching_1_row * (1 - s) -
+             cost_of_fetching_1_key_tuple * (1 - s) -
+             cost_of_1_lookup_into_filter
+             =
+             (cost_of_fetching_1_row - cost_of_fetching_1_key_tuple) * (1 - s)
+             - cost_of_1_lookup_into_filter
+
+          Here we have:
+             cost_of_fetching_1_row = tmp/rows
+             cost_of_fetching_1_key_tuple = keyread_tmp/rows
+
+          Note that access_cost_factor may be greater than 1.0. In this case
+          we still can expect a gain of using rowid filter due to smaller number
+          of checks for conditions pushed to the joined table.
+	*/
+        double rows_access_cost= MY_MIN(rows, s->worst_seeks);
+        double access_cost_factor= MY_MIN((rows_access_cost - keyread_tmp) /
+                                           rows, 1.0);
         filter=
           table->best_range_rowid_filter_for_partial_join(start_key->key, rows,
                                                           access_cost_factor);
@@ -8135,6 +8171,10 @@ best_access_path(JOIN      *join,
         double rows= record_count * s->found_records;
         double access_cost_factor= MY_MIN(tmp / rows, 1.0);
         uint key_no= s->quick->index;
+
+        /* See the comment concerning using rowid filter for with ref access */
+        keyread_tmp= s->table->opt_range[key_no].index_only_cost;
+        access_cost_factor= MY_MIN((rows - keyread_tmp) / rows, 1.0);
         filter=
         s->table->best_range_rowid_filter_for_partial_join(key_no, rows,
                                                            access_cost_factor);
@@ -20929,6 +20969,8 @@ sub_select(JOIN *join,JOIN_TAB *join_tab,bool end_of_records)
     DBUG_RETURN(NESTED_LOOP_ERROR);
 
   join_tab->build_range_rowid_filter_if_needed();
+  if (join_tab->rowid_filter && join_tab->rowid_filter->is_empty())
+    rc= NESTED_LOOP_NO_MORE_ROWS;
 
   join->return_tab= join_tab;
author	Igor Babaev <igor@askmonty.org>	2022-10-25 18:07:27 -0700
committer	Igor Babaev <igor@askmonty.org>	2022-10-25 18:07:27 -0700
commit	eee0fda325fcf353ea863c87a46ff2995eb1711f (patch)
tree	6117cd851fd12973e5fddfe6acc42056e9722317 /sql/sql_select.cc
parent	42802ad66c49b6de11b37c7ea4e4658ccc5a94aa (diff)
download	mariadb-git-bb-10.5-MDEV-28846.tar.gz