path: root/sql
author    Monty <monty@mariadb.org>              2022-08-11 13:05:23 +0300
committer Sergei Petrunia <sergey@mariadb.com>   2023-02-02 23:54:45 +0300
commit    b66cdbd1eaeed7e96317a03a190c496fd062ec71 (patch)
tree      501c4ea585cf64180163b5c4f2eeeb424bc36614 /sql
parent    590416e21c12129fadf3be7b919006c5074f4377 (diff)
download  mariadb-git-b66cdbd1eaeed7e96317a03a190c496fd062ec71.tar.gz
Changing all cost calculation to be given in milliseconds
This makes it easier to compare different costs and also allows the
optimizer to optimize different storage engines more reliably.

- Added tests/check_costs.pl, a tool to verify optimizer cost calculations.
  Most engine costs have been found with this program. All steps to
  calculate the new costs are documented in Docs/optimizer_costs.txt
- User optimizer_cost variables are given in microseconds (as individual
  costs can be very small). Internally they are stored in ms.
- Changed DISK_READ_COST (was DISK_SEEK_BASE_COST) from a hard disk cost
  (9 ms) to a common SSD cost (400MB/sec).
- Removed cost calculations for hard disks (rotation etc).
- Changed the following handler functions to return IO_AND_CPU_COST.
  This makes it easy to apply different cost modifiers in ha_..time()
  functions for io and cpu costs:
  - scan_time()
  - rnd_pos_time() & rnd_pos_call_time()
  - keyread_time()
- Enhanced keyread_time() to calculate the full cost of reading a set of
  keys with a given number of ranges and an optional number of blocks
  that need to be accessed.
- Removed read_time(), as keyread_time() + rnd_pos_time() can do the
  same thing and more.
- Tuned costs for: heap, myisam, Aria, InnoDB, archive and MyRocks.
  Used heap table costs for json_table. The rest are using default
  engine costs.
- Added the following new optimizer variables:
  - optimizer_disk_read_ratio
  - optimizer_disk_read_cost
  - optimizer_key_lookup_cost
  - optimizer_row_lookup_cost
  - optimizer_row_next_find_cost
  - optimizer_scan_cost
- Moved all engine-specific costs to the OPTIMIZER_COSTS structure.
- Changed costs to use 'records_out' instead of 'records_read' when
  recalculating costs.
- Split optimizer_costs.h into optimizer_costs.h and optimizer_defaults.h.
  This allows one to change costs without having to recompile a lot of
  files.
- Updated costs for filter lookup.
- Use a better cost estimate for the sorting cost in
  best_extension_by_limited_search().
- Fixed previous issues with the 'filtered' explain column, as we are now
  using 'records_out' (min rows seen for a table) to calculate filtering.
  This greatly simplifies the filtering code in
  JOIN_TAB::save_explain_data().

This change caused a lot of queries to be optimized differently than
before, which exposed different issues in the optimizer that need to be
fixed. These fixes are in the following commits. To not have to change
the same test case over and over again, the changes in the test cases
are done in a single commit after all the critical change sets are done.

InnoDB changes:
- Updated InnoDB to not divide big range cost by 2.
- Added costs for InnoDB (innobase_update_optimizer_costs()).
- Don't mark the clustered primary key with HA_KEYREAD_ONLY.
  This prevents the optimizer from trying to use index-only scans on
  the clustered key.
- Disabled ha_innobase::scan_time(), ha_innobase::read_time() and
  ha_innobase::rnd_pos_time(), as the default engine cost functions now
  work well for InnoDB.

Other things:
- Added a --show-query-costs (\Q) option to mysql.cc to show the query
  cost after each query (good when working with query costs).
- Extended my_getopt with GET_ADJUSTED_VALUE, which allows one to adjust
  the value the user has given. This is used to change costs from
  microseconds (user input) to milliseconds (what the server uses
  internally).
- Added include/my_tracker.h, a useful include file to quickly test the
  costs of a function.
- Use handler::set_table() in all places instead of 'table= arg'.
- Added SHOW_OPTIMIZER_COSTS to sys variables. These are input and shown
  in microseconds for the user but stored as milliseconds. This is to
  make the numbers easier to read for the user (fewer leading zeros).
  Implemented in the 'Sys_var_optimizer_cost' class.
- In test_quick_select(), do not use index scans if 'no_keyread' is set
  for the table. This is what we do in other places of the server.
- Added a THD parameter to Unique::get_use_cost(),
  check_index_intersect_extension() and similar functions to be able to
  provide costs to called functions.
- Changed 'records' to 'rows' in optimizer_trace.
- Write more information to optimizer_trace.
- Added INDEX_BLOCK_FILL_FACTOR_MUL (4) and INDEX_BLOCK_FILL_FACTOR_DIV
  (3) to calculate the space usage of keys in b-trees. (Before, we used
  numeric constants.)
- Removed code that assumed that b-trees have costs similar to binary
  trees. Replaced with engine calls that return the cost.
- Added Bitmap::find_first_bit().
- Added timings to join_cache for ANALYZE table (patch by Sergei
  Petrunia).
- Added records_init and records_after_filter to POSITION to remember
  more of what best_access_path() calculates.
- table_after_join_selectivity() changed to recalculate 'records_out'
  based on the new fields from best_access_path().

Bug fixes:
- Some queries did not update last_query_cost (it was 0). Fixed by
  moving the setting of thd->...last_query_cost into JOIN::optimize().
- Write '0' as the number of rows for const tables with a matching row.

Some internals:
- Engine costs are stored in the OPTIMIZER_COSTS structure. When a
  handlerton is created, we also create a new cost variable for the
  handlerton. We also create a new variable if the user changes an
  optimizer cost for a not yet loaded handlerton, either with command
  line arguments or with SET @@global.engine.optimizer_cost_variable=xx.
- There are 3 global OPTIMIZER_COSTS variables:
  - default_optimizer_costs: the default costs plus changes from the
    command line without an engine specifier.
  - heap_optimizer_costs: heap table costs, used for temporary tables.
  - tmp_table_optimizer_costs: the costs for the default on-disk
    internal temporary table (MyISAM or Aria).
- The engine costs for a table are stored in table_share. To speed up
  accesses, the handler has a pointer to this. The costs are copied to
  the table on first access.
  If one wants to change the costs, one must first update the global
  engine costs and then do a FLUSH TABLES. This was done to be able to
  access the costs for an open table without any locks.
- When a handlerton is created, the costs are updated the following way
  (see sql/keycaches.cc for details):
  - Use 'default_optimizer_costs' as a base.
  - Call hton->update_optimizer_costs() to override with the engine's
    default costs.
  - Override the costs that the user has specified for the engine.
- On handler open, copy the engine costs from the handlerton to
  TABLE_SHARE.
  - Call handler::update_optimizer_costs() to allow the engine to update
    the costs for this particular table.
- There are two costs stored in THD. These are copied to the handler
  when the table is used in a query:
  - optimizer_where_cost
  - optimizer_scan_setup_cost
- Simplify code in best_access_path() by storing all cost results in a
  structure. (Idea/suggestion by Igor)
Diffstat (limited to 'sql')
-rw-r--r--  sql/CMakeLists.txt         2
-rw-r--r--  sql/filesort.cc            4
-rw-r--r--  sql/filesort_utils.cc     23
-rw-r--r--  sql/filesort_utils.h       1
-rw-r--r--  sql/ha_partition.cc      127
-rw-r--r--  sql/ha_partition.h        14
-rw-r--r--  sql/handler.cc           220
-rw-r--r--  sql/handler.h            246
-rw-r--r--  sql/item_func.cc           2
-rw-r--r--  sql/json_table.cc          5
-rw-r--r--  sql/keycaches.cc         141
-rw-r--r--  sql/keycaches.h            2
-rw-r--r--  sql/multi_range_read.cc  110
-rw-r--r--  sql/mysqld.cc            113
-rw-r--r--  sql/mysqld.h              17
-rw-r--r--  sql/opt_range.cc         121
-rw-r--r--  sql/opt_split.cc          35
-rw-r--r--  sql/opt_subselect.cc      92
-rw-r--r--  sql/opt_subselect.h        8
-rw-r--r--  sql/opt_trace.cc           4
-rw-r--r--  sql/optimizer_costs.h    216
-rw-r--r--  sql/optimizer_defaults.h 183
-rw-r--r--  sql/rowid_filter.cc       19
-rw-r--r--  sql/rowid_filter.h        17
-rw-r--r--  sql/set_var.cc             8
-rw-r--r--  sql/set_var.h              2
-rw-r--r--  sql/sql_bitmap.h          12
-rw-r--r--  sql/sql_class.cc           1
-rw-r--r--  sql/sql_class.h           14
-rw-r--r--  sql/sql_const.h           14
-rw-r--r--  sql/sql_explain.cc         9
-rw-r--r--  sql/sql_explain.h          4
-rw-r--r--  sql/sql_join_cache.cc      2
-rw-r--r--  sql/sql_plugin.h           1
-rw-r--r--  sql/sql_select.cc        773
-rw-r--r--  sql/sql_select.h          42
-rw-r--r--  sql/sql_show.cc           68
-rw-r--r--  sql/sql_yacc.yy            2
-rw-r--r--  sql/sys_vars.cc          152
-rw-r--r--  sql/sys_vars.inl         141
-rw-r--r--  sql/table.cc              45
-rw-r--r--  sql/table.h               11
-rw-r--r--  sql/uniques.cc            31
-rw-r--r--  sql/uniques.h              2
44 files changed, 2045 insertions, 1011 deletions
diff --git a/sql/CMakeLists.txt b/sql/CMakeLists.txt
index d45ab2adde1..a11bb31bb4f 100644
--- a/sql/CMakeLists.txt
+++ b/sql/CMakeLists.txt
@@ -174,7 +174,7 @@ SET (SQL_SOURCE
sql_tvc.cc sql_tvc.h
opt_split.cc
rowid_filter.cc rowid_filter.h
- optimizer_costs.h
+ optimizer_costs.h optimizer_defaults.h
opt_trace.cc
table_cache.cc encryption.cc temporary_tables.cc
json_table.cc
diff --git a/sql/filesort.cc b/sql/filesort.cc
index 071108f7e91..96eabfdab89 100644
--- a/sql/filesort.cc
+++ b/sql/filesort.cc
@@ -1132,8 +1132,6 @@ write_keys(Sort_param *param, SORT_INFO *fs_info, uint count,
for (uint ix= 0; ix < count; ++ix)
{
uchar *record= fs_info->get_sorted_record(ix);
-
-
if (my_b_write(tempfile, record, param->get_record_length(record)))
DBUG_RETURN(1); /* purecov: inspected */
}
@@ -1678,7 +1676,7 @@ ulong read_to_buffer(IO_CACHE *fromfile, Merge_chunk *buffpek,
num_bytes_read= bytes_to_read;
buffpek->init_current_key();
- buffpek->advance_file_position(num_bytes_read); /* New filepos */
+ buffpek->advance_file_position(num_bytes_read); /* New filepos */
buffpek->decrement_rowcount(count);
buffpek->set_mem_count(count);
return (ulong) num_bytes_read;
diff --git a/sql/filesort_utils.cc b/sql/filesort_utils.cc
index 854033cc8d8..1aa17deb16e 100644
--- a/sql/filesort_utils.cc
+++ b/sql/filesort_utils.cc
@@ -19,7 +19,7 @@
#include "sql_const.h"
#include "sql_sort.h"
#include "table.h"
-
+#include "optimizer_defaults.h"
PSI_memory_key key_memory_Filesort_buffer_sort_keys;
@@ -58,7 +58,6 @@ const LEX_CSTRING filesort_names[]=
Cost of the operation.
*/
-static
double get_qsort_sort_cost(ha_rows num_rows, bool with_addon_fields)
{
const double row_copy_cost= with_addon_fields ? DEFAULT_ROW_COPY_COST :
@@ -106,12 +105,13 @@ double get_pq_sort_cost(size_t num_rows, size_t queue_size,
static
double get_merge_cost(ha_rows num_elements, ha_rows num_buffers,
- size_t elem_size, double compare_cost)
+ size_t elem_size, double compare_cost,
+ double disk_read_cost)
{
/* 2 -> 1 read + 1 write */
const double io_cost= (2.0 * (num_elements * elem_size +
DISK_CHUNK_SIZE - 1) /
- DISK_CHUNK_SIZE);
+ DISK_CHUNK_SIZE) * disk_read_cost;
/* 2 -> 1 insert, 1 pop for the priority queue used to merge the buffers. */
const double cpu_cost= (2.0 * num_elements * log2(1.0 + num_buffers) *
compare_cost) * PQ_SORT_SLOWNESS_CORRECTION_FACTOR;
@@ -131,6 +131,7 @@ double get_merge_many_buffs_cost_fast(ha_rows num_rows,
ha_rows num_keys_per_buffer,
size_t elem_size,
double key_compare_cost,
+ double disk_read_cost,
bool with_addon_fields)
{
DBUG_ASSERT(num_keys_per_buffer != 0);
@@ -162,7 +163,7 @@ double get_merge_many_buffs_cost_fast(ha_rows num_rows,
total_cost+=
num_merge_calls *
get_merge_cost(num_keys_per_buffer * MERGEBUFF, MERGEBUFF, elem_size,
- key_compare_cost);
+ key_compare_cost, disk_read_cost);
// # of records in remaining buffers.
last_n_elems+= num_remaining_buffs * num_keys_per_buffer;
@@ -170,7 +171,7 @@ double get_merge_many_buffs_cost_fast(ha_rows num_rows,
// Cost of merge sort of remaining buffers.
total_cost+=
get_merge_cost(last_n_elems, 1 + num_remaining_buffs, elem_size,
- key_compare_cost);
+ key_compare_cost, disk_read_cost);
num_buffers= num_merge_calls;
num_keys_per_buffer*= MERGEBUFF;
@@ -179,7 +180,7 @@ double get_merge_many_buffs_cost_fast(ha_rows num_rows,
// Simulate final merge_buff call.
last_n_elems+= num_keys_per_buffer * num_buffers;
total_cost+= get_merge_cost(last_n_elems, 1 + num_buffers, elem_size,
- key_compare_cost);
+ key_compare_cost, disk_read_cost);
return total_cost;
}
@@ -238,7 +239,7 @@ void Sort_costs::compute_pq_sort_costs(Sort_param *param, ha_rows num_rows,
{
costs[PQ_SORT_ORDER_BY_FIELDS]=
get_pq_sort_cost(num_rows, queue_size, false) +
- param->sort_form->file->ha_rnd_pos_time(MY_MIN(queue_size - 1, num_rows));
+ param->sort_form->file->ha_rnd_pos_call_time(MY_MIN(queue_size - 1, num_rows));
}
/* Calculate cost with addon fields */
@@ -272,9 +273,10 @@ void Sort_costs::compute_merge_sort_costs(Sort_param *param,
costs[MERGE_SORT_ORDER_BY_FIELDS]=
get_merge_many_buffs_cost_fast(num_rows, num_available_keys,
row_length, DEFAULT_KEY_COMPARE_COST,
+ default_optimizer_costs.disk_read_cost,
false) +
- param->sort_form->file->ha_rnd_pos_time(MY_MIN(param->limit_rows,
- num_rows));
+ param->sort_form->file->ha_rnd_pos_call_time(MY_MIN(param->limit_rows,
+ num_rows));
if (with_addon_fields)
{
@@ -286,6 +288,7 @@ void Sort_costs::compute_merge_sort_costs(Sort_param *param,
costs[MERGE_SORT_ALL_FIELDS]=
get_merge_many_buffs_cost_fast(num_rows, num_available_keys,
row_length, DEFAULT_KEY_COMPARE_COST,
+ DISK_READ_COST_THD(thd),
true);
}
diff --git a/sql/filesort_utils.h b/sql/filesort_utils.h
index b97fc4632c5..73aa2f76a18 100644
--- a/sql/filesort_utils.h
+++ b/sql/filesort_utils.h
@@ -352,6 +352,7 @@ extern const LEX_CSTRING filesort_names[];
double cost_of_filesort(TABLE *table, ORDER *order_by, ha_rows rows_to_read,
ha_rows limit_rows, enum sort_type *used_sort_type);
+double get_qsort_sort_cost(ha_rows num_rows, bool with_addon_fields);
int compare_packed_sort_keys(void *sort_keys, unsigned char **a,
unsigned char **b);
qsort2_cmp get_packed_keys_compare_ptr();
diff --git a/sql/ha_partition.cc b/sql/ha_partition.cc
index 35fb6be53b7..701bffd8dd7 100644
--- a/sql/ha_partition.cc
+++ b/sql/ha_partition.cc
@@ -9737,24 +9737,27 @@ uint ha_partition::get_biggest_used_partition(uint *part_index)
time for scan
*/
-double ha_partition::scan_time()
+IO_AND_CPU_COST ha_partition::scan_time()
{
- double scan_time= 0;
+ IO_AND_CPU_COST scan_time= {0,0};
uint i;
DBUG_ENTER("ha_partition::scan_time");
for (i= bitmap_get_first_set(&m_part_info->read_partitions);
i < m_tot_parts;
i= bitmap_get_next_set(&m_part_info->read_partitions, i))
- scan_time+= m_file[i]->scan_time();
+ {
+ IO_AND_CPU_COST cost= m_file[i]->scan_time();
+ scan_time.io+= cost.io;
+ scan_time.cpu+= cost.cpu;
+ }
if (m_tot_parts)
{
/*
Add TABLE_SCAN_SETUP_COST for partitions to make cost similar to
in ha_scan_time()
*/
- scan_time+= (TABLE_SCAN_SETUP_COST * avg_io_cost() * (m_tot_parts - 1) /
- optimizer_cache_cost);
+ scan_time.cpu+= TABLE_SCAN_SETUP_COST * (m_tot_parts - 1);
}
DBUG_RETURN(scan_time);
}
@@ -9769,34 +9772,78 @@ double ha_partition::scan_time()
@return time for scanning index inx
*/
-double ha_partition::key_scan_time(uint inx)
+IO_AND_CPU_COST ha_partition::key_scan_time(uint inx, ha_rows rows)
{
- double scan_time= 0;
+ IO_AND_CPU_COST scan_time= {0,0};
uint i;
+ uint partitions= bitmap_bits_set(&m_part_info->read_partitions);
+ ha_rows rows_per_part;
DBUG_ENTER("ha_partition::key_scan_time");
+
+ if (partitions == 0)
+ DBUG_RETURN(scan_time);
+ set_if_bigger(rows, 1);
+ rows_per_part= (rows + partitions - 1)/partitions;
+
for (i= bitmap_get_first_set(&m_part_info->read_partitions);
i < m_tot_parts;
i= bitmap_get_next_set(&m_part_info->read_partitions, i))
- scan_time+= m_file[i]->key_scan_time(inx);
+ {
+ IO_AND_CPU_COST cost= m_file[i]->key_scan_time(inx, rows_per_part);
+ scan_time.io+= cost.io;
+ scan_time.cpu+= cost.cpu;
+ }
DBUG_RETURN(scan_time);
}
-double ha_partition::keyread_time(uint inx, uint ranges, ha_rows rows)
+IO_AND_CPU_COST ha_partition::keyread_time(uint inx, ulong ranges, ha_rows rows,
+ ulonglong blocks)
{
- double read_time= 0;
+ IO_AND_CPU_COST read_time= {0,0};
uint i;
+ uint partitions= bitmap_bits_set(&m_part_info->read_partitions);
DBUG_ENTER("ha_partition::keyread_time");
- if (!ranges)
- DBUG_RETURN(handler::keyread_time(inx, ranges, rows));
+ if (partitions == 0)
+ DBUG_RETURN(read_time);
+
+ ha_rows rows_per_part= (rows + partitions - 1)/partitions;
for (i= bitmap_get_first_set(&m_part_info->read_partitions);
i < m_tot_parts;
i= bitmap_get_next_set(&m_part_info->read_partitions, i))
- read_time+= m_file[i]->keyread_time(inx, ranges, rows);
+ {
+ IO_AND_CPU_COST cost= m_file[i]->keyread_time(inx, ranges, rows_per_part,
+ blocks);
+ read_time.io+= cost.io;
+ read_time.cpu+= cost.cpu;
+ }
+ /* Add that we have to do a key lookup for all ranges in all partitions */
+ read_time.cpu= (partitions-1) * ranges * KEY_LOOKUP_COST;
DBUG_RETURN(read_time);
}
+IO_AND_CPU_COST ha_partition::rnd_pos_time(ha_rows rows)
+{
+ IO_AND_CPU_COST read_time= {0,0};
+ uint i;
+ uint partitions= bitmap_bits_set(&m_part_info->read_partitions);
+ if (partitions == 0)
+ return read_time;
+
+ ha_rows rows_per_part= (rows + partitions - 1)/partitions;
+ for (i= bitmap_get_first_set(&m_part_info->read_partitions);
+ i < m_tot_parts;
+ i= bitmap_get_next_set(&m_part_info->read_partitions, i))
+ {
+ IO_AND_CPU_COST cost= m_file[i]->rnd_pos_time(rows_per_part);
+ read_time.io+= cost.io;
+ read_time.cpu+= cost.cpu;
+ }
+ return read_time;
+}
+
+
/**
Find number of records in a range.
@param inx Index number
@@ -9853,6 +9900,8 @@ ha_rows ha_partition::records_in_range(uint inx, const key_range *min_key,
if (estimated_rows && checked_rows &&
checked_rows >= min_rows_to_check)
{
+ /* We cannot use page ranges when there is more than one partion */
+ *pages= unused_page_range;
DBUG_PRINT("info",
("records_in_range(inx %u): %lu (%lu * %lu / %lu)",
inx,
@@ -9866,6 +9915,8 @@ ha_rows ha_partition::records_in_range(uint inx, const key_range *min_key,
DBUG_PRINT("info", ("records_in_range(inx %u): %lu",
inx,
(ulong) estimated_rows));
+ /* We cannot use page ranges when there is more than one partion */
+ *pages= unused_page_range;
DBUG_RETURN(estimated_rows);
}
@@ -9896,33 +9947,6 @@ ha_rows ha_partition::estimate_rows_upper_bound()
}
-/*
- Get time to read
-
- SYNOPSIS
- read_time()
- index Index number used
- ranges Number of ranges
- rows Number of rows
-
- RETURN VALUE
- time for read
-
- DESCRIPTION
- This will be optimised later to include whether or not the index can
- be used with partitioning. To achieve we need to add another parameter
- that specifies how many of the index fields that are bound in the ranges.
- Possibly added as a new call to handlers.
-*/
-
-double ha_partition::read_time(uint index, uint ranges, ha_rows rows)
-{
- DBUG_ENTER("ha_partition::read_time");
-
- DBUG_RETURN(get_open_file_sample()->read_time(index, ranges, rows));
-}
-
-
/**
Number of rows in table. see handler.h
@@ -12145,6 +12169,18 @@ ha_partition::can_convert_nocopy(const Field &field,
return true;
}
+/*
+ Get table costs for the current statement that should be stored in
+ handler->cost variables.
+
+ When we want to support many different table handlers, we should set
+ m_file[i]->costs to point to an unique cost structure per open
+ instance and call something similar as
+ TABLE_SHARE::update_optimizer_costs(handlerton *hton) and
+ handler::update_optimizer_costs(&costs) on it.
+*/
+
+
void ha_partition::set_optimizer_costs(THD *thd)
{
handler::set_optimizer_costs(thd);
@@ -12154,6 +12190,17 @@ void ha_partition::set_optimizer_costs(THD *thd)
m_file[i]->set_optimizer_costs(thd);
}
+/*
+ Get unique table costs for the first instance of the handler and store
+ in table->share
+*/
+
+void ha_partition::update_optimizer_costs(OPTIMIZER_COSTS *costs)
+{
+ uint i= bitmap_get_first_set(&m_part_info->read_partitions);
+ m_file[i]->update_optimizer_costs(costs);
+}
+
struct st_mysql_storage_engine partition_storage_engine=
{ MYSQL_HANDLERTON_INTERFACE_VERSION };
diff --git a/sql/ha_partition.h b/sql/ha_partition.h
index 8aaf87b699c..febc62d9bcf 100644
--- a/sql/ha_partition.h
+++ b/sql/ha_partition.h
@@ -1031,17 +1031,15 @@ public:
/*
Called in test_quick_select to determine if indexes should be used.
*/
- double scan_time() override;
+ IO_AND_CPU_COST scan_time() override;
- double key_scan_time(uint inx) override;
+ IO_AND_CPU_COST key_scan_time(uint inx, ha_rows rows) override;
- double keyread_time(uint inx, uint ranges, ha_rows rows) override;
+ IO_AND_CPU_COST keyread_time(uint inx, ulong ranges, ha_rows rows,
+ ulonglong blocks) override;
+ IO_AND_CPU_COST rnd_pos_time(ha_rows rows) override;
/*
- The next method will never be called if you do not implement indexes.
- */
- double read_time(uint index, uint ranges, ha_rows rows) override;
- /*
For the given range how many records are estimated to be in this range.
Used by optimiser to calculate cost of using a particular index.
*/
@@ -1638,5 +1636,7 @@ public:
bool can_convert_nocopy(const Field &field,
const Column_definition &new_field) const override;
void set_optimizer_costs(THD *thd) override;
+ void update_optimizer_costs(OPTIMIZER_COSTS *costs) override;
};
+
#endif /* HA_PARTITION_INCLUDED */
diff --git a/sql/handler.cc b/sql/handler.cc
index 44c51af5248..a9c832c9274 100644
--- a/sql/handler.cc
+++ b/sql/handler.cc
@@ -46,6 +46,7 @@
#include "ha_sequence.h"
#include "rowid_filter.h"
#include "mysys_err.h"
+#include "optimizer_defaults.h"
#ifdef WITH_PARTITION_STORAGE_ENGINE
#include "ha_partition.h"
@@ -621,8 +622,44 @@ int ha_finalize_handlerton(st_plugin_int *plugin)
}
-const char *hton_no_exts[]= { 0 };
+/*
+ Get a pointer to the global engine optimizer costs (like
+ innodb.disk_read_cost) and store the pointer in the handlerton.
+
+ This is called once when a handlerton is created.
+ We also update the not set global costs with the default costs
+ to allow information_schema to print the real used values.
+*/
+
+static bool update_optimizer_costs(handlerton *hton)
+{
+ OPTIMIZER_COSTS costs= default_optimizer_costs;
+ LEX_CSTRING *name= hton_name(hton);
+
+ if (hton->update_optimizer_costs)
+ hton->update_optimizer_costs(&costs);
+
+ mysql_mutex_lock(&LOCK_optimizer_costs);
+ hton->optimizer_costs= get_or_create_optimizer_costs(name->str,
+ name->length);
+ if (!hton->optimizer_costs)
+ {
+ mysql_mutex_unlock(&LOCK_optimizer_costs);
+ return 1; // OOM
+ }
+
+ /* Update not set values from current default costs */
+ for (uint i=0 ; i < sizeof(OPTIMIZER_COSTS)/sizeof(double) ; i++)
+ {
+ double *var= ((double*) hton->optimizer_costs)+i;
+ if (*var == OPTIMIZER_COST_UNDEF)
+ *var= ((double*) &costs)[i];
+ }
+ mysql_mutex_unlock(&LOCK_optimizer_costs);
+ return 0;
+}
+const char *hton_no_exts[]= { 0 };
int ha_initialize_handlerton(st_plugin_int *plugin)
{
@@ -725,6 +762,12 @@ int ha_initialize_handlerton(st_plugin_int *plugin)
hton->savepoint_offset= savepoint_alloc_size;
savepoint_alloc_size+= tmp;
hton2plugin[hton->slot]=plugin;
+
+ if (plugin->plugin->type == MYSQL_STORAGE_ENGINE_PLUGIN &&
+ !(hton->flags & HTON_HIDDEN) &&
+ update_optimizer_costs(hton))
+ goto err_deinit;
+
if (hton->prepare)
{
total_ha_2pc++;
@@ -764,7 +807,6 @@ int ha_initialize_handlerton(st_plugin_int *plugin)
resolve_sysvar_table_options(hton);
update_discovery_counters(hton, 1);
-
DBUG_RETURN(0);
err_deinit:
@@ -3209,6 +3251,7 @@ handler *handler::clone(const char *name, MEM_ROOT *mem_root)
if (new_handler->ha_open(table, name, table->db_stat,
HA_OPEN_IGNORE_IF_LOCKED, mem_root))
goto err;
+ new_handler->set_optimizer_costs(ha_thd());
return new_handler;
@@ -3240,58 +3283,97 @@ LEX_CSTRING *handler::engine_name()
return hton_name(ht);
}
-
/*
- It is assumed that the value of the parameter 'ranges' can be only 0 or 1.
- If ranges == 1 then the function returns the cost of index only scan
- by index 'keyno' of one range containing 'rows' key entries.
- If ranges == 0 then the function returns only the cost of copying
- those key entries into the engine buffers.
-
- This function doesn't take in account into copying the key to record
- (KEY_COPY_COST) or comparing the key to the where clause (WHERE_COST)
+ Calculate cost for an index scan for given index and number of records.
+
+ @param index Index to use
+ @param ranges Number of ranges (b-tree dives in case of b-tree).
+ Used by partition engine
+ @param rows Number of expected rows
+ @param blocks Number of disk blocks to read (from range optimizer).
+ 0 if not known
+
+ This function does not take in account into looking up the key,
+ copying the key to record and finding the next key. These cost are
+ handled in ha_keyread_time()
*/
-double handler::keyread_time(uint index, uint ranges, ha_rows rows)
+IO_AND_CPU_COST handler::keyread_time(uint index, ulong ranges, ha_rows rows,
+ ulonglong blocks)
{
- size_t len;
- double cost;
- DBUG_ASSERT(ranges == 0 || ranges == 1);
- len= table->key_info[index].key_length + ref_length;
- if (table->file->is_clustering_key(index))
- len= table->s->stored_rec_length;
+ IO_AND_CPU_COST cost;
+ ulonglong io_blocks= 0;
+ DBUG_ASSERT(ranges > 0);
- cost= ((double)rows*len/(stats.block_size+1) *
- INDEX_BLOCK_COPY_COST(table->in_use));
- /*
- We divide the cost with optimizer_cache_cost as ha_keyread_time()
- and ha_key_scan_time() will multiply the result value with
- optimizer_cache_cost and we want to keep the above 'memory operation'
- cost unaffected by this multiplication.
- */
- cost/= optimizer_cache_cost;
- if (ranges)
+ /* memory engine has stats.block_size == 0 */
+ if (stats.block_size)
{
- uint keys_per_block= (uint) (stats.block_size*3/4/len+1);
- /*
- We let the cost grow slowly in proportion to number of rows to
- promote indexes with less rows.
- We do not calculate exact number of block reads as then index
- only reads will be more costly than normal reads, especially
- compared to InnoDB clustered keys.
-
- KEY_LOOKUP_COST is the cost of finding the first key in the
- range. Finding the next key is usually a fast operation so we
- don't count it here, it is taken into account in
- ha_keyread_and_copy_time()
- */
- cost+= (((double) (rows / keys_per_block) + KEY_LOOKUP_COST) *
- avg_io_cost());
+ if (!blocks)
+ {
+ /* Estimate length of index data */
+ if (rows <= 1) // EQ_REF optimization
+ {
+ blocks= 1;
+ io_blocks= (stats.block_size + IO_SIZE - 1)/ IO_SIZE;
+ }
+ else
+ {
+ size_t len= table->key_storage_length(index);
+ blocks= ((ulonglong) ((rows * len / INDEX_BLOCK_FILL_FACTOR_DIV *
+ INDEX_BLOCK_FILL_FACTOR_MUL +
+ stats.block_size-1)) / stats.block_size +
+ (ranges - 1));
+ io_blocks= blocks * stats.block_size / IO_SIZE;
+ }
+ }
+ else
+ io_blocks= blocks * stats.block_size / IO_SIZE;
}
+ cost.io= (double) io_blocks * avg_io_cost();
+ cost.cpu= blocks * INDEX_BLOCK_COPY_COST;
return cost;
}
+/*
+ Cost of doing a set of range scans and finding the key position.
+ This function is used both with index scans (in which case there should be
+ an additional KEY_COPY_COST) and when normal index + fetch row scan,
+ in which case there should an additional rnd_pos_time() cost.
+*/
+
+double handler::ha_keyread_time(uint index, ulong ranges, ha_rows rows,
+ ulonglong blocks)
+{
+ if (rows < ranges)
+ rows= ranges;
+ IO_AND_CPU_COST cost= keyread_time(index, ranges, rows, blocks);
+ return (cost.io * DISK_READ_RATIO +
+ cost.cpu + ranges * KEY_LOOKUP_COST +
+ (rows - ranges) * KEY_NEXT_FIND_COST);
+}
+
+
+/*
+ Read a row from a clustered index
+
+ Cost is similar to ha_rnd_pos_call_time() as a index_read() on a clusterd
+ key has identical code as rnd_pos() (At least in InnoDB:)
+*/
+
+double handler::ha_keyread_clustered_and_copy_time(uint index, ulong ranges,
+ ha_rows rows,
+ ulonglong blocks)
+{
+ if (rows < ranges)
+ rows= ranges;
+ IO_AND_CPU_COST cost= keyread_time(index, ranges, rows, blocks);
+ return (cost.io * DISK_READ_RATIO +
+ cost.cpu + ranges * ROW_LOOKUP_COST +
+ (rows - ranges) * ROW_NEXT_FIND_COST +
+ rows * ROW_COPY_COST);
+}
+
THD *handler::ha_thd(void) const
{
DBUG_ASSERT(!table || !table->in_use || table->in_use == current_thd);
@@ -3364,7 +3446,7 @@ int handler::ha_open(TABLE *table_arg, const char *name, int mode,
name, ht->db_type, table_arg->db_stat, mode,
test_if_locked));
- table= table_arg;
+ set_table(table_arg);
DBUG_ASSERT(table->s == table_share);
DBUG_ASSERT(m_lock_type == F_UNLCK);
DBUG_PRINT("info", ("old m_lock_type: %d F_UNLCK %d", m_lock_type, F_UNLCK));
@@ -3414,14 +3496,15 @@ int handler::ha_open(TABLE *table_arg, const char *name, int mode,
else
dup_ref=ref+ALIGN_SIZE(ref_length);
cached_table_flags= table_flags();
-
+ if (!table->s->optimizer_costs_inited)
+ {
+ table->s->optimizer_costs_inited=1;
+ /* Copy data from global 'engine'.optimizer_costs to TABLE_SHARE */
+ table->s->update_optimizer_costs(partition_ht());
+ /* Update costs depend on table structure */
+ update_optimizer_costs(&table->s->optimizer_costs);
+ }
/* Copy current optimizer costs. Needed in case clone() is used */
- set_optimizer_costs(table->in_use);
- DBUG_ASSERT(optimizer_key_copy_cost >= 0.0);
- DBUG_ASSERT(optimizer_key_next_find_cost >= 0.0);
- DBUG_ASSERT(optimizer_row_copy_cost >= 0.0);
- DBUG_ASSERT(optimizer_where_cost >= 0.0);
- DBUG_ASSERT(optimizer_key_cmp_cost >= 0.0);
reset_statistics();
}
internal_tmp_table= MY_TEST(test_if_locked & HA_OPEN_INTERNAL_TABLE);
@@ -3453,6 +3536,15 @@ int handler::ha_close(void)
DBUG_RETURN(close());
}
+void handler::change_table_ptr(TABLE *table_arg, TABLE_SHARE *share)
+{
+ DBUG_ASSERT(table_arg->s == share);
+ table= table_arg;
+ table_share= share;
+ costs= &share->optimizer_costs;
+ reset_statistics();
+}
+
int handler::ha_rnd_next(uchar *buf)
{
@@ -8767,27 +8859,19 @@ Table_scope_and_contents_source_st::fix_period_fields(THD *thd,
}
/*
- Copy common optimizer cost variables to the engine
-
- This is needed to provide fast acccess to these variables during
- optimization (as we refer to them multiple times).
+ Copy upper level cost to the engine as part of start statement
- The other option would be to access them from thd, but that
- would require a function call (as we cannot access THD from
- an inline handler function) and two extra memory accesses
- for each variable.
+ This is needed to provide fast access to these variables during
+ optimization (as we refer to them multiple times during one query).
- index_block_copy_cost is not copied as it is used so seldom.
+ The other option would be to access them from THD, but that would
+ require a function call (as we cannot easily access THD from an
+ inline handler function) and two extra memory accesses for each
+ variable.
*/
-
void handler::set_optimizer_costs(THD *thd)
{
- optimizer_key_copy_cost= thd->variables.optimizer_key_copy_cost;
- optimizer_key_next_find_cost=
- thd->variables.optimizer_key_next_find_cost;
- optimizer_row_copy_cost= thd->variables.optimizer_row_copy_cost;
- optimizer_where_cost= thd->variables.optimizer_where_cost;
- optimizer_key_cmp_cost= thd->variables.optimizer_key_cmp_cost;
- set_optimizer_cache_cost(thd->optimizer_cache_hit_ratio);
+ optimizer_where_cost= thd->variables.optimizer_where_cost;
+ optimizer_scan_setup_cost= thd->variables.optimizer_scan_setup_cost;
}
diff --git a/sql/handler.h b/sql/handler.h
index 72ab1ed00a3..98ff37cd268 100644
--- a/sql/handler.h
+++ b/sql/handler.h
@@ -26,9 +26,9 @@
#endif
#include "sql_const.h"
-#include "optimizer_costs.h"
#include "sql_basic_types.h"
#include "mysqld.h" /* server_id */
+#include "optimizer_costs.h"
#include "sql_plugin.h" /* plugin_ref, st_plugin_int, plugin */
#include "thr_lock.h" /* thr_lock_type, THR_LOCK_DATA */
#include "sql_cache.h"
@@ -36,6 +36,7 @@
#include "sql_array.h" /* Dynamic_array<> */
#include "mdl.h"
#include "vers_string.h"
+#include "optimizer_costs.h"
#include "sql_analyze_stmt.h" // for Exec_time_tracker
@@ -1046,6 +1047,7 @@ enum enum_schema_tables
SCH_KEY_CACHES,
SCH_KEY_COLUMN_USAGE,
SCH_OPEN_TABLES,
+ SCH_OPTIMIZER_COSTS,
SCH_OPT_TRACE,
SCH_PARAMETERS,
SCH_PARTITIONS,
@@ -1496,6 +1498,10 @@ struct handlerton
/* Called for all storage handlers after ddl recovery is done */
void (*signal_ddl_recovery_done)(handlerton *hton);
+ /* Called at startup to update default engine costs */
+ void (*update_optimizer_costs)(OPTIMIZER_COSTS *costs);
+ void *optimizer_costs; /* Costs are stored here */
+
/*
Optional clauses in the CREATE/ALTER TABLE
*/
@@ -3085,6 +3091,21 @@ enum class Compare_keys : uint32_t
NotEqual
};
+/* Cost for reading a row through an index */
+struct INDEX_READ_COST
+{
+ double read_cost;
+ double index_only_cost;
+};
+
+/* Separated costs for IO and CPU. For handler::keyread_time() */
+struct IO_AND_CPU_COST
+{
+ double io;
+ double cpu;
+};
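The new struct separates the two cost components so the ha_..._time() wrappers can scale IO by the cache-dependent disk-read ratio before adding CPU cost. A standalone sketch of that combining rule, with illustrative values rather than the server's tuned defaults:

```cpp
#include <cassert>

// Mirrors the struct added in handler.h: costs are split so the IO
// part can be weighted by how often a block access really hits disk.
struct IO_AND_CPU_COST { double io; double cpu; };

// disk_read_ratio of 0.0 means fully cached, 1.0 means every block
// access goes to disk (example values, not the server defaults).
static double combine(IO_AND_CPU_COST c, double disk_read_ratio)
{
  return c.io * disk_read_ratio + c.cpu;
}
```

With a fully cached table only the CPU part remains, which is exactly why the patch stops folding cache effects into the engine-reported numbers.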
+
+
/**
The handler class is the interface for dynamically loadable
storage engines. Do not add ifdefs and take care when adding or
@@ -3145,9 +3166,10 @@ protected:
ha_rows estimation_rows_to_insert;
handler *lookup_handler;
public:
- handlerton *ht; /* storage engine of this handler */
- uchar *ref; /* Pointer to current row */
- uchar *dup_ref; /* Pointer to duplicate row */
+ handlerton *ht; /* storage engine of this handler */
+ OPTIMIZER_COSTS *costs; /* Points to table->share->costs */
+ uchar *ref; /* Pointer to current row */
+ uchar *dup_ref; /* Pointer to duplicate row */
uchar *lookup_buffer;
ha_statistics stats;
@@ -3220,15 +3242,6 @@ public:
ulonglong rows_changed;
/* One bigger than needed to avoid to test if key == MAX_KEY */
ulonglong index_rows_read[MAX_KEY+1];
- /*
- Cost of using key/record cache: (100-cache_hit_ratio)/100
- Updated from THD in open_tables()
- */
- double optimizer_cache_cost;
- double optimizer_key_next_find_cost;
- double optimizer_row_copy_cost, optimizer_key_copy_cost;
- double optimizer_where_cost, optimizer_key_cmp_cost;
-
ha_copy_info copy_info;
private:
@@ -3347,13 +3360,15 @@ private:
For non partitioned handlers this is &TABLE_SHARE::ha_share.
*/
Handler_share **ha_share;
+ double optimizer_where_cost; // Copy of THD->...optimizer_where_cost
+ double optimizer_scan_setup_cost; // Copy of THD->...optimizer_scan_...
public:
handler(handlerton *ht_arg, TABLE_SHARE *share_arg)
:table_share(share_arg), table(0),
estimation_rows_to_insert(0),
lookup_handler(this),
- ht(ht_arg), ref(0), lookup_buffer(NULL), end_range(NULL),
+ ht(ht_arg), costs(0), ref(0), lookup_buffer(NULL), end_range(NULL),
implicit_emptied(0),
mark_trx_read_write_done(0),
check_table_binlog_row_based_done(0),
@@ -3364,7 +3379,6 @@ public:
ref_length(sizeof(my_off_t)),
ft_handler(0), inited(NONE), pre_inited(NONE),
pushed_cond(0), next_insert_id(0), insert_id_for_cur_row(0),
- optimizer_cache_cost((100-DEFAULT_CACHE_HIT_RATIO)/100.0),
tracker(NULL),
pushed_idx_cond(NULL),
pushed_idx_cond_keyno(MAX_KEY),
@@ -3378,12 +3392,19 @@ public:
m_psi_numrows(0),
m_psi_locker(NULL),
row_logging(0), row_logging_init(0),
- m_lock_type(F_UNLCK), ha_share(NULL)
+ m_lock_type(F_UNLCK), ha_share(NULL), optimizer_where_cost(0),
+ optimizer_scan_setup_cost(0)
{
DBUG_PRINT("info",
("handler created F_UNLCK %d F_RDLCK %d F_WRLCK %d",
F_UNLCK, F_RDLCK, F_WRLCK));
reset_statistics();
+ /*
+ The following variables should be updated in set_optimizer_costs()
+ which is to be run as part of setting up the table for the query
+ */
+ MEM_UNDEFINED(&optimizer_where_cost, sizeof(optimizer_where_cost));
+ MEM_UNDEFINED(&optimizer_scan_setup_cost, sizeof(optimizer_scan_setup_cost));
}
virtual ~handler(void)
{
@@ -3584,22 +3605,22 @@ public:
bzero(&copy_info, sizeof(copy_info));
reset_copy_info();
}
- virtual void change_table_ptr(TABLE *table_arg, TABLE_SHARE *share)
- {
- table= table_arg;
- table_share= share;
- reset_statistics();
- }
+ virtual void change_table_ptr(TABLE *table_arg, TABLE_SHARE *share);
/*
Time for a full table data scan. To be overrided by engines, should not
be used by the sql level.
*/
protected:
- virtual double scan_time()
+ virtual IO_AND_CPU_COST scan_time()
{
- return (((ulonglong2double(stats.data_file_length) / stats.block_size)) *
- avg_io_cost());
+ IO_AND_CPU_COST cost;
+ ulonglong length= stats.data_file_length;
+ cost.io= (double) (length / IO_SIZE) * avg_io_cost();
+ cost.cpu= (!stats.block_size ? 0.0 :
+ (double) ((length + stats.block_size-1)/stats.block_size) *
+ INDEX_BLOCK_COPY_COST);
+ return cost;
}
public:
@@ -3615,147 +3636,149 @@ public:
a few rows and the extra cost has no practical effect.
*/
- inline double ha_scan_time()
+ inline double ha_scan_time(ha_rows rows)
{
- return (scan_time() * optimizer_cache_cost +
- TABLE_SCAN_SETUP_COST * avg_io_cost());
+ IO_AND_CPU_COST cost= scan_time();
+ return (cost.io * DISK_READ_RATIO +
+ cost.cpu + TABLE_SCAN_SETUP_COST +
+ (double) rows * (ROW_NEXT_FIND_COST + ROW_COPY_COST));
}
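ha_scan_time() above combines the engine-reported block costs with fixed per-row constants. A self-contained sketch of the same formula, using made-up constant values (the real ones live in optimizer_defaults.h):

```cpp
#include <cassert>

// Illustrative per-row costs in milliseconds, chosen so the example
// arithmetic is exact in binary floating point; not the real defaults.
static const double ROW_NEXT_FIND_COST= 0.125;
static const double ROW_COPY_COST= 0.0625;
static const double TABLE_SCAN_SETUP_COST= 1.0;

// Mirrors ha_scan_time(): weighted IO, engine CPU, a fixed setup
// cost, and per-row work for stepping to and copying each row.
static double scan_cost(double io, double cpu, double disk_read_ratio,
                        double rows)
{
  return io * disk_read_ratio + cpu + TABLE_SCAN_SETUP_COST +
         rows * (ROW_NEXT_FIND_COST + ROW_COPY_COST);
}
```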
/*
- Time for a full table scan, fetching the rows from the table and comparing
- the row with the where clause
+ Time for a full table scan, fetching the rows from the table and comparing
+ the row with the where clause
*/
- inline double ha_scan_and_compare_time(ha_rows records)
+ inline double ha_scan_and_compare_time(ha_rows rows)
{
- return (ha_scan_time() +
- (double) records * (ROW_COPY_COST + WHERE_COST));
+ return ha_scan_time(rows) + (double) rows * WHERE_COST;
}
+ /* Cost of (random) reading a block of IO_SIZE */
virtual double avg_io_cost()
{
- return 1.0;
+ return DISK_READ_COST;
}
- virtual void set_optimizer_costs(THD *thd);
-
/*
- Set cost for finding a row in the engine cache
- This allows the handler to override the cost if there is no
- caching of rows, like in heap or federatedx.
+ Update table->share optimizer costs for this particular table.
+ Called once when the table is opened for the first time.
*/
- virtual void set_optimizer_cache_cost(double cost)
- {
- optimizer_cache_cost= cost;
- }
-
- /**
- The cost of reading a set of ranges from the table using an index
- to access it.
-
- @param index The index number.
- @param ranges The number of ranges to be read. If 0, it means that
- we calculate separately the cost of reading the key.
- @param rows Total number of rows to be read.
-
- This method can be used to calculate the total cost of scanning a table
- using an index by calling it using read_time(index, 1, table_size).
+ virtual void update_optimizer_costs(OPTIMIZER_COSTS *costs) {}
- This function is to be reimplemented by engines (if needed). The sql_level
- should call ha_read_time(), ha_read_and_copy_time() or
- ha_read_and_compare_time().
+ /*
+ Set handler optimizer cost variables.
+ Called for each table used by the statement.
+ This is virtual mainly for the partition engine.
*/
+ virtual void set_optimizer_costs(THD *thd);
+
protected:
- virtual double read_time(uint index, uint ranges, ha_rows rows)
+ /*
+ Cost of reading 'rows' number of rows with a rowid
+ */
+ virtual IO_AND_CPU_COST rnd_pos_time(ha_rows rows)
{
- return ((rows2double(rows) * ROW_LOOKUP_COST +
- rows2double(ranges) * KEY_LOOKUP_COST) * avg_io_cost());
+ double r= rows2double(rows);
+ return
+ {
+ r * avg_io_cost() * stats.block_size/IO_SIZE, // Blocks read
+ r * INDEX_BLOCK_COPY_COST // Copy block from cache
+ };
}
public:
- /* Same as above, but take into account CACHE_COST */
- inline double ha_read_time(uint index, uint ranges, ha_rows rows)
- {
- return read_time(index, ranges, rows) * optimizer_cache_cost;
- }
+ /*
+ Time for doing an internal rnd_pos() inside the engine. For some
+ engines, this is more efficient than the SQL layer calling
+ rnd_pos() as there is no overhead in converting/checking the
+ rnd_pos_value. This is used when calculating the cost of fetching
+ a key+row in one go (like when scanning an index and fetching the
+ row).
+ */
- /* Same as above, but take into account also copying of the row to 'record' */
- inline double ha_read_and_copy_time(uint index, uint ranges, ha_rows rows)
+ inline double ha_rnd_pos_time(ha_rows rows)
{
- return (ha_read_time(index, ranges, rows) +
- rows2double(rows) * ROW_COPY_COST);
+ IO_AND_CPU_COST cost= rnd_pos_time(rows);
+ return (cost.io * DISK_READ_RATIO +
+ cost.cpu + rows2double(rows) * (ROW_LOOKUP_COST + ROW_COPY_COST));
}
- /* Same as above, but take into account also copying and comparing the row */
- inline double ha_read_and_compare_time(uint index, uint ranges, ha_rows rows)
+ /*
+ This cost is used when we are calling rnd_pos() explicitly.
+ For the moment this function is identical to ha_rnd_pos_time(),
+ but that may change in the future after we do more cost checks for
+ more engines.
+ */
+ inline double ha_rnd_pos_call_time(ha_rows rows)
{
- return (ha_read_time(index, ranges, rows) +
- rows2double(rows) * (ROW_COPY_COST + WHERE_COST));
+ IO_AND_CPU_COST cost= rnd_pos_time(rows);
+ return (cost.io * DISK_READ_RATIO +
+ cost.cpu + rows2double(rows) * (ROW_LOOKUP_COST + ROW_COPY_COST));
}
- /* Cost of reading a row with rowid */
-protected:
- virtual double rnd_pos_time(ha_rows rows)
+ inline double ha_rnd_pos_call_and_compare_time(ha_rows rows)
{
- return rows2double(rows) * ROW_LOOKUP_COST * avg_io_cost();
- }
-public:
- /*
- Same as above, but take into account cache_cost and copying of the row
- to 'record'.
- Note that this should normally be same as ha_read_time(some_key, 0, rows)
- */
- inline double ha_rnd_pos_time(ha_rows rows)
- {
- return (rnd_pos_time(rows) * optimizer_cache_cost +
- rows2double(rows) * ROW_COPY_COST);
+ return (ha_rnd_pos_call_time(rows) + rows2double(rows) * WHERE_COST);
}
/**
- Calculate cost of 'index_only' scan for given index and number of records.
-
- @param index Index to read
- @param flag If flag == 1 then the function returns the cost of
- index only scan by index 'index' of one range containing
- 'rows' key entries.
- If flag == 0 then function returns only the cost of copying
- those key entries into the engine buffers.
- @param rows #of records to read
+ Calculate cost of 'index_only' scan for given index, a number of ranges
+ and number of records.
+
+ @param index Index to read
+ @param rows #of records to read
+ @param blocks Number of IO blocks that needs to be accessed.
+ 0 if not known (in which case it's calculated)
*/
protected:
- virtual double keyread_time(uint index, uint flag, ha_rows rows);
+ virtual IO_AND_CPU_COST keyread_time(uint index, ulong ranges, ha_rows rows,
+ ulonglong blocks);
public:
/*
Calculate cost of 'keyread' scan for given index and number of records
including fetching the key to the 'record' buffer.
*/
+ double ha_keyread_time(uint index, ulong ranges, ha_rows rows,
+ ulonglong blocks);
- inline double ha_keyread_time(uint index, uint flag, ha_rows rows)
+ /* Same as above, but take into account copying the key to the SQL layer */
+ inline double ha_keyread_and_copy_time(uint index, ulong ranges,
+ ha_rows rows, ulonglong blocks)
{
- return (keyread_time(index, flag, rows) * optimizer_cache_cost);
+ return (ha_keyread_time(index, ranges, rows, blocks) +
+ (double) rows * KEY_COPY_COST);
}
- /* Same as above, but take into account copying the key the the SQL layer */
- inline double ha_keyread_and_copy_time(uint index, uint flag, ha_rows rows)
+ inline double ha_keyread_and_compare_time(uint index, ulong ranges,
+ ha_rows rows, ulonglong blocks)
{
- return ha_keyread_time(index, flag, rows) + (double) rows * KEY_COPY_COST;
+ return (ha_keyread_time(index, ranges, rows, blocks) +
+ (double) rows * (KEY_COPY_COST + WHERE_COST));
}
+ double ha_keyread_clustered_and_copy_time(uint index, ulong ranges,
+ ha_rows rows,
+ ulonglong blocks);
/*
Time for a full table index scan (without copy or compare cost).
To be overrided by engines, sql level should use ha_key_scan_time().
+ Note that IO_AND_CPU_COST does not include avg_io_cost()!
*/
protected:
- virtual double key_scan_time(uint index)
+ virtual IO_AND_CPU_COST key_scan_time(uint index, ha_rows rows)
{
- return keyread_time(index, 1, records());
+ return keyread_time(index, 1, MY_MAX(rows, 1), 0);
}
public:
/* Cost of doing a full index scan */
- inline double ha_key_scan_time(uint index)
+ inline double ha_key_scan_time(uint index, ha_rows rows)
{
- return (key_scan_time(index) * optimizer_cache_cost);
+ IO_AND_CPU_COST cost= key_scan_time(index, rows);
+ return (cost.io * DISK_READ_RATIO +
+ cost.cpu + INDEX_SCAN_SETUP_COST + KEY_LOOKUP_COST +
+ (double) rows * (KEY_NEXT_FIND_COST + KEY_COPY_COST));
}
/*
@@ -3764,8 +3787,7 @@ public:
*/
inline double ha_key_scan_and_compare_time(uint index, ha_rows rows)
{
- return (ha_key_scan_time(index) +
- (double) rows * (KEY_COPY_COST + WHERE_COST));
+ return ha_key_scan_time(index, rows) + (double) rows * WHERE_COST;
}
virtual const key_map *keys_to_use_for_scanning() { return &key_map_empty; }
@@ -5218,7 +5240,7 @@ public:
ha_share= arg_ha_share;
return false;
}
- void set_table(TABLE* table_arg) { table= table_arg; }
+ inline void set_table(TABLE* table_arg);
int get_lock_type() const { return m_lock_type; }
public:
/* XXX to be removed, see ha_partition::partition_ht() */
@@ -5292,7 +5314,7 @@ protected:
void unlock_shared_ha_data();
/*
- Mroonga needs to call read_time() directly for it's internal handler
+ Mroonga needs to call some xxx_time() directly for its internal handler
methods
*/
friend class ha_mroonga;
diff --git a/sql/item_func.cc b/sql/item_func.cc
index a07595cbbd8..2f110406a6d 100644
--- a/sql/item_func.cc
+++ b/sql/item_func.cc
@@ -5920,6 +5920,7 @@ bool Item_func_get_system_var::fix_length_and_dec(THD *thd)
decimals=0;
break;
case SHOW_DOUBLE:
+ case SHOW_OPTIMIZER_COST:
decimals= 6;
collation= DTCollation_numeric();
fix_char_length(DBL_DIG + 6);
@@ -5977,6 +5978,7 @@ const Type_handler *Item_func_get_system_var::type_handler() const
case SHOW_CHAR_PTR:
case SHOW_LEX_STRING:
return &type_handler_varchar;
+ case SHOW_OPTIMIZER_COST:
case SHOW_DOUBLE:
return &type_handler_double;
default:
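The new SHOW_OPTIMIZER_COST type above is formatted like SHOW_DOUBLE because, per the commit message, users see cost variables in microseconds while the server stores them in milliseconds. The conversion at that boundary can be sketched as (function names hypothetical):

```cpp
#include <cassert>

// User-facing unit is microseconds; internal storage is milliseconds.
static double cost_to_user(double ms)  { return ms * 1000.0; }
static double cost_from_user(double us) { return us / 1000.0; }
```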
diff --git a/sql/json_table.cc b/sql/json_table.cc
index 05ee83bd3d8..949175d8027 100644
--- a/sql/json_table.cc
+++ b/sql/json_table.cc
@@ -54,6 +54,7 @@ public:
bzero(&m_hton, sizeof(m_hton));
m_hton.tablefile_extensions= hton_no_exts;
m_hton.slot= HA_SLOT_UNDEF;
+ m_hton.flags= HTON_HIDDEN;
}
};
@@ -245,6 +246,10 @@ public:
int open(const char *name, int mode, uint test_if_locked) override
{ return 0; }
int close(void) override { return 0; }
+ void update_optimizer_costs(OPTIMIZER_COSTS *costs) override
+ {
+ memcpy(costs, &heap_optimizer_costs, sizeof(*costs));
+ }
int rnd_init(bool scan) override;
int rnd_next(uchar *buf) override;
int rnd_pos(uchar * buf, uchar *pos) override;
diff --git a/sql/keycaches.cc b/sql/keycaches.cc
index 10bec7c1de8..250a287e229 100644
--- a/sql/keycaches.cc
+++ b/sql/keycaches.cc
@@ -15,6 +15,10 @@
#include "mariadb.h"
#include "keycaches.h"
+#include "optimizer_costs.h"
+#include "optimizer_defaults.h"
+#include "handler.h"
+#include "sql_class.h"
/****************************************************************************
Named list handling
@@ -22,10 +26,13 @@
NAMED_ILIST key_caches;
NAMED_ILIST rpl_filters;
+NAMED_ILIST linked_optimizer_costs;
extern "C" PSI_memory_key key_memory_KEY_CACHE;
extern PSI_memory_key key_memory_NAMED_ILINK_name;
+LEX_CSTRING default_base= {STRING_WITH_LEN("default")};
+
/**
ilink (intrusive list element) with a name
*/
@@ -46,7 +53,7 @@ public:
}
inline bool cmp(const char *name_cmp, size_t length)
{
- return length == name_length && !memcmp(name, name_cmp, length);
+ return !system_charset_info->strnncoll(name, name_length, name_cmp, length);
}
~NAMED_ILINK()
{
@@ -72,7 +79,8 @@ uchar* find_named(I_List<NAMED_ILINK> *list, const char *name, size_t length,
}
-bool NAMED_ILIST::delete_element(const char *name, size_t length, void (*free_element)(const char *name, void*))
+bool NAMED_ILIST::delete_element(const char *name, size_t length,
+ void (*free_element)(const char *name, void*))
{
I_List_iterator<NAMED_ILINK> it(*this);
NAMED_ILINK *element;
@@ -104,14 +112,12 @@ void NAMED_ILIST::delete_elements(void (*free_element)(const char *name, void*))
/* Key cache functions */
-LEX_CSTRING default_key_cache_base= {STRING_WITH_LEN("default")};
-
KEY_CACHE zero_key_cache; ///< @@nonexistent_cache.param->value_ptr() points here
KEY_CACHE *get_key_cache(const LEX_CSTRING *cache_name)
{
if (!cache_name || ! cache_name->length)
- cache_name= &default_key_cache_base;
+ cache_name= &default_base;
return ((KEY_CACHE*) find_named(&key_caches,
cache_name->str, cache_name->length, 0));
}
@@ -234,3 +240,128 @@ void free_all_rpl_filters()
{
rpl_filters.delete_elements(free_rpl_filter);
}
+
+
+/******************************************************************************
+ Optimizer costs functions
+******************************************************************************/
+
+LEX_CSTRING default_costs_base= {STRING_WITH_LEN("default")};
+
+OPTIMIZER_COSTS default_optimizer_costs=
+{
+ DEFAULT_DISK_READ_COST, // disk_read_cost
+ DEFAULT_INDEX_BLOCK_COPY_COST, // index_block_copy_cost
+ DEFAULT_WHERE_COST/4, // key_cmp_cost
+ DEFAULT_KEY_COPY_COST, // key_copy_cost
+ DEFAULT_KEY_LOOKUP_COST, // key_lookup_cost
+ DEFAULT_KEY_NEXT_FIND_COST, // key_next_find_cost
+ DEFAULT_DISK_READ_RATIO, // disk_read_ratio
+ DEFAULT_ROW_COPY_COST, // row_copy_cost
+ DEFAULT_ROW_LOOKUP_COST, // row_lookup_cost
+ DEFAULT_ROW_NEXT_FIND_COST, // row_next_find_cost
+ DEFAULT_ROWID_COMPARE_COST, // rowid_compare_cost
+ DEFAULT_ROWID_COPY_COST, // rowid_copy_cost
+ 1 // Cannot be deleted
+};
+
+OPTIMIZER_COSTS heap_optimizer_costs, tmp_table_optimizer_costs;
+
+OPTIMIZER_COSTS *get_optimizer_costs(const LEX_CSTRING *cache_name)
+{
+ if (!cache_name->length)
+ return &default_optimizer_costs;
+ return ((OPTIMIZER_COSTS*) find_named(&linked_optimizer_costs,
+ cache_name->str, cache_name->length,
+ 0));
+}
+
+OPTIMIZER_COSTS *create_optimizer_costs(const char *name, size_t length)
+{
+ OPTIMIZER_COSTS *optimizer_costs;
+ DBUG_ENTER("create_optimizer_costs");
+ DBUG_PRINT("enter",("name: %.*s", (int) length, name));
+
+ if ((optimizer_costs= (OPTIMIZER_COSTS*)
+ my_malloc(key_memory_KEY_CACHE,
+ sizeof(OPTIMIZER_COSTS), MYF(MY_ZEROFILL | MY_WME))))
+ {
+ if (!new NAMED_ILINK(&linked_optimizer_costs, name, length,
+ (uchar*) optimizer_costs))
+ {
+ my_free(optimizer_costs);
+ optimizer_costs= 0;
+ }
+ else
+ {
+ /* Mark that values are not yet set */
+ for (uint i=0 ; i < sizeof(OPTIMIZER_COSTS)/sizeof(double) ; i++)
+ ((double*) optimizer_costs)[i]= OPTIMIZER_COST_UNDEF;
+ }
+ }
+ DBUG_RETURN(optimizer_costs);
+}
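create_optimizer_costs() marks every member undefined by walking the struct as an array of doubles, which works because OPTIMIZER_COSTS contains only double members. The same pattern in miniature, with an illustrative struct and sentinel value:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in for OPTIMIZER_COSTS: every member is a
// double, so the struct can be filled through a double pointer.
struct COSTS { double a, b, c; };
static_assert(sizeof(COSTS) % sizeof(double) == 0,
              "struct must be a plain array of doubles");

static const double COST_UNDEF= -1.0;

// Mirrors the loop in create_optimizer_costs(): mark all members
// as "not yet set" without naming each one.
static void mark_undefined(COSTS *costs)
{
  for (size_t i= 0; i < sizeof(COSTS)/sizeof(double); i++)
    ((double*) costs)[i]= COST_UNDEF;
}
```

The upside is that adding a new cost member needs no change here; the downside is the implicit layout requirement, which the static_assert partially guards.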
+
+
+OPTIMIZER_COSTS *get_or_create_optimizer_costs(const char *name, size_t length)
+{
+ LEX_CSTRING optimizer_costs_name;
+ OPTIMIZER_COSTS *optimizer_costs;
+
+ optimizer_costs_name.str= name;
+ optimizer_costs_name.length= length;
+ if (!(optimizer_costs= get_optimizer_costs(&optimizer_costs_name)))
+ optimizer_costs= create_optimizer_costs(name, length);
+ return optimizer_costs;
+}
+
+extern "C"
+{
+bool process_optimizer_costs(process_optimizer_costs_t func, TABLE *param)
+{
+ I_List_iterator<NAMED_ILINK> it(linked_optimizer_costs);
+ NAMED_ILINK *element;
+ int res= 0;
+
+ while ((element= it++))
+ {
+ LEX_CSTRING name= { element->name, element->name_length };
+ OPTIMIZER_COSTS *costs= (OPTIMIZER_COSTS *) element->data;
+ res |= func(&name, costs, param);
+ }
+ return res != 0;
+}
+}
+
+bool create_default_optimizer_costs()
+{
+ return (new NAMED_ILINK(&linked_optimizer_costs,
+ default_base.str, default_base.length,
+ (uchar*) &default_optimizer_costs)) == 0;
+}
+
+
+/*
+ Make a copy of heap and tmp_table engine costs to be able to create
+ internal temporary tables without taking a mutex.
+*/
+
+void copy_tmptable_optimizer_costs()
+{
+ memcpy(&heap_optimizer_costs, heap_hton->optimizer_costs,
+ sizeof(heap_optimizer_costs));
+ memcpy(&tmp_table_optimizer_costs, TMP_ENGINE_HTON->optimizer_costs,
+ sizeof(tmp_table_optimizer_costs));
+}
+
+
+static void free_optimizer_costs(const char *name, void *cost)
+{
+ if ((OPTIMIZER_COSTS*) cost != &default_optimizer_costs)
+ my_free(cost);
+}
+
+void free_all_optimizer_costs()
+{
+ linked_optimizer_costs.delete_elements(free_optimizer_costs);
+}
diff --git a/sql/keycaches.h b/sql/keycaches.h
index 68c3dd3a2b0..721251b6745 100644
--- a/sql/keycaches.h
+++ b/sql/keycaches.h
@@ -35,7 +35,7 @@ class NAMED_ILIST: public I_List<NAMED_ILINK>
};
/* For key cache */
-extern LEX_CSTRING default_key_cache_base;
+extern LEX_CSTRING default_base;
extern KEY_CACHE zero_key_cache;
extern NAMED_ILIST key_caches;
diff --git a/sql/multi_range_read.cc b/sql/multi_range_read.cc
index 234c4cdfd7a..d4103f669fa 100644
--- a/sql/multi_range_read.cc
+++ b/sql/multi_range_read.cc
@@ -20,6 +20,7 @@
#include "key.h"
#include "sql_statistics.h"
#include "rowid_filter.h"
+#include "optimizer_defaults.h"
/****************************************************************************
* Default MRR implementation (MRR to non-MRR converter)
@@ -302,46 +303,37 @@ handler::multi_range_read_info_const(uint keyno, RANGE_SEQ_IF *seq,
if (total_rows != HA_POS_ERROR)
{
- double io_cost= avg_io_cost();
- double range_lookup_cost= (io_cost * KEY_LOOKUP_COST *
- optimizer_cache_cost);
+ double key_cost;
set_if_smaller(total_rows, max_rows);
/* The following calculation is the same as in multi_range_read_info(): */
*flags |= HA_MRR_USE_DEFAULT_IMPL;
cost->reset();
- cost->avg_io_cost= cost->idx_avg_io_cost= io_cost;
+ cost->avg_io_cost= cost->idx_avg_io_cost= 0; // Not used!
if (!is_clustering_key(keyno))
{
- cost->idx_io_count= (double) io_blocks;
+ key_cost= ha_keyread_time(keyno, n_ranges, total_rows, io_blocks);
+ cost->idx_cpu_cost= key_cost;
+
if (!(*flags & HA_MRR_INDEX_ONLY))
{
- cost->idx_cpu_cost= (ha_keyread_time(keyno, 1, total_rows) +
- (n_ranges-1) * range_lookup_cost);
- cost->cpu_cost= ha_read_time(keyno, 0, total_rows);
- cost->copy_cost= rows2double(total_rows) * ROW_COPY_COST;
+ /* ha_rnd_pos_time includes ROW_COPY_COST */
+ cost->cpu_cost= ha_rnd_pos_time(total_rows);
}
else
{
/* Index only read */
- cost->idx_cpu_cost= (ha_keyread_time(keyno, 1, total_rows) +
- (n_ranges-1) * range_lookup_cost);
- cost->copy_cost= rows2double(total_rows) * KEY_COPY_COST;
+ cost->copy_cost= rows2double(total_rows) * KEY_COPY_COST;
}
}
else
{
- /*
- Clustered index
- If all index dives are to a few blocks, then limit the
- ranges used by read_time to the number of dives.
- */
+ /* Clustered index */
io_blocks+= unassigned_single_point_ranges;
- uint limited_ranges= (uint) MY_MIN((ulonglong) n_ranges, io_blocks);
- cost->idx_cpu_cost= limited_ranges * range_lookup_cost;
- cost->cpu_cost= ha_read_time(keyno, 0, total_rows);
- cost->copy_cost= rows2double(total_rows) * ROW_COPY_COST;
+ key_cost= ha_keyread_time(keyno, n_ranges, total_rows, io_blocks);
+ cost->idx_cpu_cost= key_cost;
+ cost->copy_cost= rows2double(total_rows) * ROW_COPY_COST;
}
cost->comp_cost= (rows2double(total_rows) * WHERE_COST +
MULTI_RANGE_READ_SETUP_COST);
@@ -378,7 +370,7 @@ handler::multi_range_read_info_const(uint keyno, RANGE_SEQ_IF *seq,
@param keyno Index number
@param n_ranges Estimated number of ranges (i.e. intervals) in the
range sequence.
- @param n_rows Estimated total number of records contained within all
+ @param total_rows Estimated total number of records contained within all
of the ranges
@param bufsz INOUT IN: Size of the buffer available for use
OUT: Size of the buffer that will be actually used, or
@@ -393,7 +385,7 @@ handler::multi_range_read_info_const(uint keyno, RANGE_SEQ_IF *seq,
other Error or can't perform the requested scan
*/
-ha_rows handler::multi_range_read_info(uint keyno, uint n_ranges, uint n_rows,
+ha_rows handler::multi_range_read_info(uint keyno, uint n_ranges, uint total_rows,
uint key_parts, uint *bufsz,
uint *flags, Cost_estimate *cost)
{
@@ -410,38 +402,27 @@ ha_rows handler::multi_range_read_info(uint keyno, uint n_ranges, uint n_rows,
/* Produce the same cost as non-MRR code does */
if (!is_clustering_key(keyno))
{
- double range_lookup_cost= (avg_io_cost() * KEY_LOOKUP_COST *
- optimizer_cache_cost);
- /*
- idx_io_count could potentially be increased with the number of
- index leaf blocks we have to read for finding n_rows.
- */
- cost->idx_io_count= n_ranges;
+ double key_cost= ha_keyread_time(keyno, n_ranges, total_rows, 0);
+ cost->idx_cpu_cost= key_cost;
+
if (!(*flags & HA_MRR_INDEX_ONLY))
{
- cost->idx_cpu_cost= (keyread_time(keyno, 1, n_rows) +
- (n_ranges-1) * range_lookup_cost);
- cost->cpu_cost= read_time(keyno, 0, n_rows);
- cost->copy_cost= rows2double(n_rows) * ROW_COPY_COST;
+ /* ha_rnd_pos_time includes ROW_COPY_COST */
+ cost->cpu_cost= ha_rnd_pos_time(total_rows);
}
else
{
- /*
- Same as above, but take into account copying the key to the upper
- level.
- */
- cost->idx_cpu_cost= (keyread_time(keyno, 1, n_rows) +
- (n_ranges-1) * range_lookup_cost);
- cost->copy_cost= rows2double(n_rows) * KEY_COPY_COST;
+ /* Index only read */
+ cost->copy_cost= rows2double(total_rows) * KEY_COPY_COST;
}
}
else
{
/* Clustering key */
- cost->cpu_cost= read_time(keyno, n_ranges, n_rows);
- cost->copy_cost= rows2double(n_rows) * ROW_COPY_COST;
+ cost->cpu_cost= ha_keyread_time(keyno, n_ranges, total_rows, 0);
+ cost->copy_cost= rows2double(total_rows) * ROW_COPY_COST;
}
- cost->comp_cost= rows2double(n_rows) * WHERE_COST;
+ cost->comp_cost= rows2double(total_rows) * WHERE_COST;
return 0;
}
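After the rewrite, both branches above price key access through ha_keyread_time() and differ only in what is added on top: a key copy for index-only scans, or a row fetch otherwise. Schematically (names hypothetical):

```cpp
#include <cassert>

// Hypothetical simplified model of multi_range_read_info(): the key
// lookup cost is always paid; only the extra per-row work differs
// between index-only and row-fetching scans.
static double mrr_cost(double key_cost, double row_fetch_cost,
                       double key_copy_cost, bool index_only)
{
  return key_cost + (index_only ? key_copy_cost : row_fetch_cost);
}
```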
@@ -2043,7 +2024,7 @@ bool DsMrr_impl::get_disk_sweep_mrr_cost(uint keynr, ha_rows rows, uint flags,
cost->mem_cost= (double)rows_in_last_step * elem_size;
/* Total cost of all index accesses */
- index_read_cost= primary_file->ha_keyread_and_copy_time(keynr, 1, rows);
+ index_read_cost= primary_file->ha_keyread_and_copy_time(keynr, 1, rows, 0);
cost->add_io(index_read_cost, 1 /* Random seeks */);
return FALSE;
}
@@ -2081,42 +2062,6 @@ void get_sort_and_sweep_cost(TABLE *table, ha_rows nrows, Cost_estimate *cost)
/**
Get cost of reading nrows table records in a "disk sweep"
- A disk sweep read is a sequence of handler->rnd_pos(rowid) calls that made
- for an ordered sequence of rowids.
-
- We assume hard disk IO. The read is performed as follows:
-
- 1. The disk head is moved to the needed cylinder
- 2. The controller waits for the plate to rotate
- 3. The data is transferred
-
- Time to do #3 is insignificant compared to #2+#1.
-
- Time to move the disk head is proportional to head travel distance.
-
- Time to wait for the plate to rotate depends on whether the disk head
- was moved or not.
-
- If disk head wasn't moved, the wait time is proportional to distance
- between the previous block and the block we're reading.
-
- If the head was moved, we don't know how much we'll need to wait for the
- plate to rotate. We assume the wait time to be a variate with a mean of
- 0.5 of full rotation time.
-
- Our cost units are "random disk seeks". The cost of random disk seek is
- actually not a constant, it depends one range of cylinders we're going
- to access. We make it constant by introducing a fuzzy concept of "typical
- datafile length" (it's fuzzy as it's hard to tell whether it should
- include index file, temp.tables etc). Then random seek cost is:
-
- 1 = half_rotation_cost + move_cost * 1/3 * typical_data_file_length
-
- We define half_rotation_cost as DISK_SEEK_BASE_COST=0.9.
-
- If handler::avg_io_cost() < 1.0, then we will trust the handler
- when it comes to the average cost (this is for example true for HEAP).
-
@param table Table to be accessed
@param nrows Number of rows to retrieve
@param interrupted TRUE <=> Assume that the disk sweep will be
@@ -2131,8 +2076,7 @@ void get_sweep_read_cost(TABLE *table, ha_rows nrows, bool interrupted,
cost->reset();
#ifndef OLD_SWEEP_COST
- cost->cpu_cost= table->file->ha_rnd_pos_time(nrows);
- cost->avg_io_cost= table->file->avg_io_cost();
+ cost->cpu_cost= table->file->ha_rnd_pos_call_time(nrows);
#else
if (table->file->pk_is_clustering_key(table->s->primary_key))
{
diff --git a/sql/mysqld.cc b/sql/mysqld.cc
index edb9732c2eb..738a606f072 100644
--- a/sql/mysqld.cc
+++ b/sql/mysqld.cc
@@ -52,6 +52,7 @@
#include "sql_expression_cache.h" // subquery_cache_miss, subquery_cache_hit
#include "sys_vars_shared.h"
#include "ddl_log.h"
+#include "optimizer_defaults.h"
#include <m_ctype.h>
#include <my_dir.h>
@@ -733,7 +734,7 @@ mysql_mutex_t LOCK_prepared_stmt_count;
#ifdef HAVE_OPENSSL
mysql_mutex_t LOCK_des_key_file;
#endif
-mysql_mutex_t LOCK_backup_log;
+mysql_mutex_t LOCK_backup_log, LOCK_optimizer_costs;
mysql_rwlock_t LOCK_grant, LOCK_sys_init_connect, LOCK_sys_init_slave;
mysql_rwlock_t LOCK_ssl_refresh;
mysql_rwlock_t LOCK_all_status_vars;
@@ -903,7 +904,7 @@ PSI_mutex_key key_BINLOG_LOCK_index, key_BINLOG_LOCK_xid_list,
key_LOCK_crypt, key_LOCK_delayed_create,
key_LOCK_delayed_insert, key_LOCK_delayed_status, key_LOCK_error_log,
key_LOCK_gdl, key_LOCK_global_system_variables,
- key_LOCK_manager, key_LOCK_backup_log,
+ key_LOCK_manager, key_LOCK_backup_log, key_LOCK_optimizer_costs,
key_LOCK_prepared_stmt_count,
key_LOCK_rpl_status, key_LOCK_server_started,
key_LOCK_status, key_LOCK_temp_pool,
@@ -966,6 +967,7 @@ static PSI_mutex_info all_server_mutexes[]=
{ &key_hash_filo_lock, "hash_filo::lock", 0},
{ &key_LOCK_active_mi, "LOCK_active_mi", PSI_FLAG_GLOBAL},
{ &key_LOCK_backup_log, "LOCK_backup_log", PSI_FLAG_GLOBAL},
+ { &key_LOCK_optimizer_costs, "LOCK_optimizer_costs", PSI_FLAG_GLOBAL},
{ &key_LOCK_temp_pool, "LOCK_temp_pool", PSI_FLAG_GLOBAL},
{ &key_LOCK_thread_id, "LOCK_thread_id", PSI_FLAG_GLOBAL},
{ &key_LOCK_crypt, "LOCK_crypt", PSI_FLAG_GLOBAL},
@@ -2006,6 +2008,7 @@ static void clean_up(bool print_message)
mdl_destroy();
dflt_key_cache= 0;
key_caches.delete_elements(free_key_cache);
+ free_all_optimizer_costs();
wt_end();
multi_keycache_free();
sp_cache_end();
@@ -2128,6 +2131,7 @@ static void clean_up_mutexes()
mysql_mutex_destroy(&LOCK_active_mi);
mysql_rwlock_destroy(&LOCK_ssl_refresh);
mysql_mutex_destroy(&LOCK_backup_log);
+ mysql_mutex_destroy(&LOCK_optimizer_costs);
mysql_mutex_destroy(&LOCK_temp_pool);
mysql_rwlock_destroy(&LOCK_sys_init_connect);
mysql_rwlock_destroy(&LOCK_sys_init_slave);
@@ -4502,6 +4506,8 @@ static int init_thread_environment()
mysql_mutex_init(key_LOCK_commit_ordered, &LOCK_commit_ordered,
MY_MUTEX_INIT_SLOW);
mysql_mutex_init(key_LOCK_backup_log, &LOCK_backup_log, MY_MUTEX_INIT_FAST);
+ mysql_mutex_init(key_LOCK_optimizer_costs, &LOCK_optimizer_costs,
+ MY_MUTEX_INIT_FAST);
mysql_mutex_init(key_LOCK_temp_pool, &LOCK_temp_pool, MY_MUTEX_INIT_FAST);
#ifdef HAVE_OPENSSL
@@ -4998,10 +5004,10 @@ static int init_server_components()
/* need to configure logging before initializing storage engines */
if (!opt_bin_log_used && !WSREP_ON)
{
- if (opt_log_slave_updates)
+ if (opt_log_slave_updates && (global_system_variables.log_warnings > 1))
sql_print_warning("You need to use --log-bin to make "
"--log-slave-updates work.");
- if (binlog_format_used)
+ if (binlog_format_used && (global_system_variables.log_warnings > 1))
sql_print_warning("You need to use --log-bin to make "
"--binlog-format work.");
}
@@ -5432,6 +5438,7 @@ static int init_server_components()
unireg_abort(1);
}
#endif
+ copy_tmptable_optimizer_costs();
#ifdef WITH_WSREP
/*
@@ -5490,7 +5497,7 @@ static int init_server_components()
}
else
{
- if (binlog_expire_logs_seconds)
+ if (binlog_expire_logs_seconds && (global_system_variables.log_warnings > 1))
sql_print_warning("You need to use --log-bin to make --expire-logs-days "
"or --binlog-expire-logs-seconds work.");
}
@@ -7825,12 +7832,17 @@ static int mysql_init_variables(void)
strnmov(server_version, MYSQL_SERVER_VERSION, sizeof(server_version)-1);
thread_cache.init();
key_caches.empty();
- if (!(dflt_key_cache= get_or_create_key_cache(default_key_cache_base.str,
- default_key_cache_base.length)))
+ if (!(dflt_key_cache= get_or_create_key_cache(default_base.str,
+ default_base.length)))
{
sql_print_error("Cannot allocate the keycache");
return 1;
}
+ if (create_default_optimizer_costs())
+ {
+ sql_print_error("Cannot allocate optimizer_costs");
+ return 1;
+ }
/* set key_cache_hash.default_value = dflt_key_cache */
multi_keycache_init();
@@ -8404,11 +8416,14 @@ mysqld_get_one_option(const struct my_option *opt, const char *argument,
}
-/** Handle arguments for multiple key caches. */
+/**
+ Handle arguments for multiple key caches, replication_options and
+ optimizer_costs
+ */
C_MODE_START
-static void*
+static void *
mysql_getopt_value(const char *name, uint length,
const struct my_option *option, int *error)
{
@@ -8446,6 +8461,7 @@ mysql_getopt_value(const char *name, uint length,
}
/* We return in all cases above. Let us silence -Wimplicit-fallthrough */
DBUG_ASSERT(0);
+ break;
#ifdef HAVE_REPLICATION
/* fall through */
case OPT_REPLICATE_DO_DB:
@@ -8473,11 +8489,87 @@ mysql_getopt_value(const char *name, uint length,
}
return 0;
}
-#endif
+ break;
+#endif
+ case OPT_COSTS_DISK_READ_COST:
+ case OPT_COSTS_INDEX_BLOCK_COPY_COST:
+ case OPT_COSTS_KEY_CMP_COST:
+ case OPT_COSTS_KEY_COPY_COST:
+ case OPT_COSTS_KEY_LOOKUP_COST:
+ case OPT_COSTS_KEY_NEXT_FIND_COST:
+ case OPT_COSTS_DISK_READ_RATIO:
+ case OPT_COSTS_ROW_COPY_COST:
+ case OPT_COSTS_ROW_LOOKUP_COST:
+ case OPT_COSTS_ROW_NEXT_FIND_COST:
+ case OPT_COSTS_ROWID_CMP_COST:
+ case OPT_COSTS_ROWID_COPY_COST:
+ {
+ OPTIMIZER_COSTS *costs;
+ if (unlikely(!(costs= get_or_create_optimizer_costs(name, length))))
+ {
+ if (error)
+ *error= EXIT_OUT_OF_MEMORY;
+ return 0;
+ }
+ switch (option->id) {
+ case OPT_COSTS_DISK_READ_COST:
+ return &costs->disk_read_cost;
+ case OPT_COSTS_INDEX_BLOCK_COPY_COST:
+ return &costs->index_block_copy_cost;
+ case OPT_COSTS_KEY_CMP_COST:
+ return &costs->key_cmp_cost;
+ case OPT_COSTS_KEY_COPY_COST:
+ return &costs->key_copy_cost;
+ case OPT_COSTS_KEY_LOOKUP_COST:
+ return &costs->key_lookup_cost;
+ case OPT_COSTS_KEY_NEXT_FIND_COST:
+ return &costs->key_next_find_cost;
+ case OPT_COSTS_DISK_READ_RATIO:
+ return &costs->disk_read_ratio;
+ case OPT_COSTS_ROW_COPY_COST:
+ return &costs->row_copy_cost;
+ case OPT_COSTS_ROW_LOOKUP_COST:
+ return &costs->row_lookup_cost;
+ case OPT_COSTS_ROW_NEXT_FIND_COST:
+ return &costs->row_next_find_cost;
+ case OPT_COSTS_ROWID_CMP_COST:
+ return &costs->rowid_cmp_cost;
+ case OPT_COSTS_ROWID_COPY_COST:
+ return &costs->rowid_copy_cost;
+ default:
+ DBUG_ASSERT(0);
+ }
+ }
}
return option->value;
}
+
+static void
+mariadb_getopt_adjust_value(const struct my_option *option, void *value)
+{
+ switch (option->id) {
+ case OPT_COSTS_DISK_READ_COST:
+ case OPT_COSTS_INDEX_BLOCK_COPY_COST:
+ case OPT_COSTS_KEY_CMP_COST:
+ case OPT_COSTS_KEY_COPY_COST:
+ case OPT_COSTS_KEY_LOOKUP_COST:
+ case OPT_COSTS_KEY_NEXT_FIND_COST:
+ case OPT_COSTS_DISK_READ_RATIO:
+ case OPT_COSTS_ROW_COPY_COST:
+ case OPT_COSTS_ROW_LOOKUP_COST:
+ case OPT_COSTS_ROW_NEXT_FIND_COST:
+ case OPT_COSTS_ROWID_CMP_COST:
+ case OPT_COSTS_ROWID_COPY_COST:
+ /* Value from the command line is given in usec. Convert to ms */
+ *(double*) value= *(double*) value/1000.0;
+ break;
+ default:
+ break;
+ }
+}
+
+
static void option_error_reporter(enum loglevel level, const char *format, ...)
{
va_list args;
@@ -8516,6 +8608,7 @@ static int get_options(int *argc_ptr, char ***argv_ptr)
my_getopt_get_addr= mysql_getopt_value;
my_getopt_error_reporter= option_error_reporter;
+ my_getopt_adjust_value= mariadb_getopt_adjust_value;
/* prepare all_options array */
my_init_dynamic_array(PSI_INSTRUMENT_ME, &all_options, sizeof(my_option),
diff --git a/sql/mysqld.h b/sql/mysqld.h
index 449c2e50a58..54cafdcde15 100644
--- a/sql/mysqld.h
+++ b/sql/mysqld.h
@@ -330,7 +330,7 @@ extern PSI_mutex_key key_BINLOG_LOCK_index, key_BINLOG_LOCK_xid_list,
key_LOCK_logger, key_LOCK_manager,
key_LOCK_prepared_stmt_count,
key_LOCK_rpl_status, key_LOCK_server_started,
- key_LOCK_status,
+ key_LOCK_status, key_LOCK_optimizer_costs,
key_LOCK_thd_data, key_LOCK_thd_kill,
key_LOCK_user_conn, key_LOG_LOCK_log,
key_master_info_data_lock, key_master_info_run_lock,
@@ -759,7 +759,8 @@ extern mysql_mutex_t
LOCK_error_log, LOCK_delayed_insert, LOCK_short_uuid_generator,
LOCK_delayed_status, LOCK_delayed_create, LOCK_crypt, LOCK_timezone,
LOCK_active_mi, LOCK_manager, LOCK_user_conn,
- LOCK_prepared_stmt_count, LOCK_error_messages, LOCK_backup_log;
+ LOCK_prepared_stmt_count, LOCK_error_messages, LOCK_backup_log,
+ LOCK_optimizer_costs;
extern MYSQL_PLUGIN_IMPORT mysql_mutex_t LOCK_global_system_variables;
extern mysql_rwlock_t LOCK_all_status_vars;
extern mysql_mutex_t LOCK_start_thread;
@@ -794,6 +795,18 @@ enum options_mysqld
OPT_BINLOG_IGNORE_DB,
OPT_BIN_LOG,
OPT_BOOTSTRAP,
+ OPT_COSTS_DISK_READ_COST,
+ OPT_COSTS_INDEX_BLOCK_COPY_COST,
+ OPT_COSTS_KEY_CMP_COST,
+ OPT_COSTS_KEY_COPY_COST,
+ OPT_COSTS_KEY_LOOKUP_COST,
+ OPT_COSTS_KEY_NEXT_FIND_COST,
+ OPT_COSTS_DISK_READ_RATIO,
+ OPT_COSTS_ROW_COPY_COST,
+ OPT_COSTS_ROW_LOOKUP_COST,
+ OPT_COSTS_ROW_NEXT_FIND_COST,
+ OPT_COSTS_ROWID_CMP_COST,
+ OPT_COSTS_ROWID_COPY_COST,
OPT_EXPIRE_LOGS_DAYS,
OPT_BINLOG_EXPIRE_LOGS_SECONDS,
OPT_CONSOLE,
diff --git a/sql/opt_range.cc b/sql/opt_range.cc
index 72e9c35f3e9..9224f1d03f2 100644
--- a/sql/opt_range.cc
+++ b/sql/opt_range.cc
@@ -2740,7 +2740,7 @@ int SQL_SELECT::test_quick_select(THD *thd, key_map keys_to_use,
table_info.add_table_name(head);
Json_writer_object trace_range(thd, "range_analysis");
- if (unlikely(thd->trace_started()))
+ if (unlikely(thd->trace_started()) && read_time != DBL_MAX)
{
Json_writer_object table_rec(thd, "table_scan");
table_rec.add("rows", records).add("cost", read_time);
@@ -2867,10 +2867,11 @@ int SQL_SELECT::test_quick_select(THD *thd, key_map keys_to_use,
thd->mem_root= &alloc;
/* Calculate cost of full index read for the shortest covering index */
- if (!force_quick_range && !head->covering_keys.is_clear_all())
+ if (!force_quick_range && !head->covering_keys.is_clear_all() &&
+ !head->no_keyread)
{
- int key_for_use= find_shortest_key(head, &head->covering_keys);
double key_read_time;
+ uint key_for_use= find_shortest_key(head, &head->covering_keys);
key_read_time= head->file->ha_key_scan_and_compare_time(key_for_use,
records);
DBUG_PRINT("info", ("'all'+'using index' scan will be using key %d, "
@@ -3057,7 +3058,8 @@ int SQL_SELECT::test_quick_select(THD *thd, key_map keys_to_use,
param.table->set_opt_range_condition_rows(group_trp->records);
DBUG_PRINT("info", ("table_rows: %llu opt_range_condition_rows: %llu "
"group_trp->records: %ull",
- table_records, param.table->opt_range_condition_rows,
+ table_records,
+ param.table->opt_range_condition_rows,
group_trp->records));
Json_writer_object grp_summary(thd, "best_group_range_summary");
@@ -5079,7 +5081,7 @@ static double get_sweep_read_cost(const PARAM *param, ha_rows records,
{
DBUG_ENTER("get_sweep_read_cost");
#ifndef OLD_SWEEP_COST
- double cost= (param->table->file->ha_rnd_pos_time(records) +
+ double cost= (param->table->file->ha_rnd_pos_call_time(records) +
(add_time_for_compare ?
records * param->thd->variables.optimizer_where_cost : 0));
DBUG_PRINT("return", ("cost: %g", cost));
@@ -5095,7 +5097,7 @@ static double get_sweep_read_cost(const PARAM *param, ha_rows records,
We are using the primary key to find the rows.
Calculate the cost for this.
*/
- result= table->file->ha_rnd_pos_time(records);
+ result= table->file->ha_rnd_pos_call_time(records);
}
else
{
@@ -5133,7 +5135,7 @@ static double get_sweep_read_cost(const PARAM *param, ha_rows records,
*/
result= busy_blocks;
}
- result+= rows2double(n_rows) * ROW_COPY_COST_THD(param->table->thd);
+ result+= rows2double(n_rows) * param->table->file->ROW_COPY_COST;
}
DBUG_PRINT("return",("cost: %g", result));
DBUG_RETURN(result);
@@ -5347,7 +5349,7 @@ TABLE_READ_PLAN *get_best_disjunct_quick(PARAM *param, SEL_IMERGE *imerge,
is done in QUICK_RANGE_SELECT::row_in_ranges)
*/
double rid_comp_cost= (rows2double(non_cpk_scan_records) *
- ROWID_COMPARE_COST_THD(param->thd));
+ default_optimizer_costs.rowid_cmp_cost);
imerge_cost+= rid_comp_cost;
trace_best_disjunct.add("cost_of_mapping_rowid_in_non_clustered_pk_scan",
rid_comp_cost);
@@ -5359,7 +5361,7 @@ TABLE_READ_PLAN *get_best_disjunct_quick(PARAM *param, SEL_IMERGE *imerge,
double sweep_cost= get_sweep_read_cost(param, non_cpk_scan_records, 0);
imerge_cost+= sweep_cost;
trace_best_disjunct.
- add("records", non_cpk_scan_records).
+ add("rows", non_cpk_scan_records).
add("cost_sort_rowid_and_read_disk", sweep_cost).
add("cost", imerge_cost);
}
@@ -5389,7 +5391,7 @@ TABLE_READ_PLAN *get_best_disjunct_quick(PARAM *param, SEL_IMERGE *imerge,
}
{
- const double dup_removal_cost= Unique::get_use_cost(
+ const double dup_removal_cost= Unique::get_use_cost(thd,
param->imerge_cost_buff, (uint)non_cpk_scan_records,
param->table->file->ref_length,
(size_t)param->thd->variables.sortbuff_size,
@@ -5463,10 +5465,9 @@ skip_to_ror_scan:
double cost;
if ((*cur_child)->is_ror)
{
- /* Ok, we have index_only cost, now get full rows scan cost */
+ /* Ok, we have index_only cost, now get full row lookup cost */
cost= param->table->file->
- ha_read_and_compare_time(param->real_keynr[(*cur_child)->key_idx], 1,
- (*cur_child)->records);
+ ha_rnd_pos_call_and_compare_time((*cur_child)->records);
}
else
cost= read_time;
@@ -5935,7 +5936,7 @@ bool prepare_search_best_index_intersect(PARAM *param,
continue;
}
- cost= table->opt_range[(*index_scan)->keynr].index_only_fetch_cost(thd);
+ cost= table->opt_range[(*index_scan)->keynr].index_only_fetch_cost(table);
idx_scan.add("cost", cost);
@@ -6041,7 +6042,7 @@ bool prepare_search_best_index_intersect(PARAM *param,
selected_idx.add("index", key_info->name);
print_keyparts(thd, key_info, (*scan_ptr)->used_key_parts);
selected_idx.
- add("records", (*scan_ptr)->records).
+ add("rows", (*scan_ptr)->records).
add("filtered_records", (*scan_ptr)->filtered_out);
}
}
@@ -6058,7 +6059,7 @@ bool prepare_search_best_index_intersect(PARAM *param,
selected_idx.add("index", key_info->name);
print_keyparts(thd, key_info, (*scan_ptr)->used_key_parts);
selected_idx.
- add("records", (*scan_ptr)->records).
+ add("rows", (*scan_ptr)->records).
add("filtered_records", (*scan_ptr)->filtered_out);
}
}
@@ -6324,7 +6325,8 @@ double get_cpk_filter_cost(ha_rows filtered_records,
*/
static
-bool check_index_intersect_extension(PARTIAL_INDEX_INTERSECT_INFO *curr,
+bool check_index_intersect_extension(THD *thd,
+ PARTIAL_INDEX_INTERSECT_INFO *curr,
INDEX_SCAN_INFO *ext_index_scan,
PARTIAL_INDEX_INTERSECT_INFO *next)
{
@@ -6371,7 +6373,7 @@ bool check_index_intersect_extension(PARTIAL_INDEX_INTERSECT_INFO *curr,
size_t max_memory_size= common_info->max_memory_size;
records_sent_to_unique+= ext_index_scan_records;
- cost= Unique::get_use_cost(buff_elems, (size_t) records_sent_to_unique,
+ cost= Unique::get_use_cost(thd, buff_elems, (size_t) records_sent_to_unique,
key_size,
max_memory_size, compare_factor, TRUE,
&next->in_memory);
@@ -6382,7 +6384,7 @@ bool check_index_intersect_extension(PARTIAL_INDEX_INTERSECT_INFO *curr,
double cost2;
bool in_memory2;
ha_rows records2= records_sent_to_unique-records_filtered_out_by_cpk;
- cost2= Unique::get_use_cost(buff_elems, (size_t) records2, key_size,
+ cost2= Unique::get_use_cost(thd, buff_elems, (size_t) records2, key_size,
max_memory_size, compare_factor, TRUE,
&in_memory2);
cost2+= get_cpk_filter_cost(ext_index_scan_records, common_info->cpk_scan,
@@ -6442,7 +6444,8 @@ bool check_index_intersect_extension(PARTIAL_INDEX_INTERSECT_INFO *curr,
*/
static
-void find_index_intersect_best_extension(PARTIAL_INDEX_INTERSECT_INFO *curr)
+void find_index_intersect_best_extension(THD *thd,
+ PARTIAL_INDEX_INTERSECT_INFO *curr)
{
PARTIAL_INDEX_INTERSECT_INFO next;
COMMON_INDEX_INTERSECT_INFO *common_info= curr->common_info;
@@ -6475,8 +6478,9 @@ void find_index_intersect_best_extension(PARTIAL_INDEX_INTERSECT_INFO *curr)
{
*rem_first_index_scan_ptr= *index_scan_ptr;
*index_scan_ptr= rem_first_index_scan;
- if (check_index_intersect_extension(curr, *rem_first_index_scan_ptr, &next))
- find_index_intersect_best_extension(&next);
+ if (check_index_intersect_extension(thd, curr, *rem_first_index_scan_ptr,
+ &next))
+ find_index_intersect_best_extension(thd, &next);
*index_scan_ptr= *rem_first_index_scan_ptr;
*rem_first_index_scan_ptr= rem_first_index_scan;
}
@@ -6528,7 +6532,7 @@ TRP_INDEX_INTERSECT *get_best_index_intersect(PARAM *param, SEL_TREE *tree,
read_time))
DBUG_RETURN(NULL);
- find_index_intersect_best_extension(&init);
+ find_index_intersect_best_extension(thd, &init);
if (common.best_length <= 1 && !common.best_uses_cpk)
DBUG_RETURN(NULL);
@@ -6697,7 +6701,7 @@ ROR_SCAN_INFO *make_ror_scan(const PARAM *param, int idx, SEL_ARG *sel_arg)
*/
ror_scan->index_read_cost=
param->table->file->ha_keyread_and_copy_time(ror_scan->keynr, 1,
- ror_scan->records);
+ ror_scan->records, 0);
DBUG_RETURN(ror_scan);
}
@@ -13885,10 +13889,10 @@ get_best_group_min_max(PARAM *param, SEL_TREE *tree, double read_time)
cause= "not single_table";
else if (join->select_lex->olap == ROLLUP_TYPE) /* Check (B3) for ROLLUP */
cause= "rollup";
- else if (table->s->keys == 0) /* There are no indexes to use. */
+ else if (table->s->keys == 0) // There are no indexes to use.
cause= "no index";
else if (join->conds && join->conds->used_tables()
- & OUTER_REF_TABLE_BIT) /* Cannot execute with correlated conditions. */
+ & OUTER_REF_TABLE_BIT) // Cannot execute with correlated conditions.
cause= "correlated conditions";
if (cause)
@@ -14093,7 +14097,8 @@ get_best_group_min_max(PARAM *param, SEL_TREE *tree, double read_time)
does not qualify as covering in our case. If this is the case, below
we check that all query fields are indeed covered by 'cur_index'.
*/
- if (cur_index_info->user_defined_key_parts == table->actual_n_key_parts(cur_index_info)
+ if (cur_index_info->user_defined_key_parts ==
+ table->actual_n_key_parts(cur_index_info)
&& pk < MAX_KEY && cur_index != pk &&
(table->file->ha_table_flags() & HA_PRIMARY_KEY_IN_READ_INDEX))
{
@@ -14136,7 +14141,8 @@ get_best_group_min_max(PARAM *param, SEL_TREE *tree, double read_time)
first Item? If so, then why? What is the array for?
*/
/* Above we already checked that all group items are fields. */
- DBUG_ASSERT((*tmp_group->item)->real_item()->type() == Item::FIELD_ITEM);
+ DBUG_ASSERT((*tmp_group->item)->real_item()->type() ==
+ Item::FIELD_ITEM);
Item_field *group_field= (Item_field *) (*tmp_group->item)->real_item();
if (group_field->field->eq(cur_part->field))
{
@@ -15000,24 +15006,28 @@ void cost_group_min_max(TABLE* table, KEY *index_info, uint used_key_parts,
bool have_min, bool have_max,
double *read_cost, ha_rows *records)
{
+ uint keys_per_block, key_length;
ha_rows table_records;
ha_rows num_groups;
ha_rows num_blocks;
- uint keys_per_block;
ha_rows keys_per_group;
ha_rows keys_per_subgroup; /* Average number of keys in sub-groups */
/* formed by a key infix. */
double p_overlap; /* Probability that a sub-group overlaps two blocks. */
double quick_prefix_selectivity;
double io_cost;
+ handler *file= table->file;
DBUG_ENTER("cost_group_min_max");
+ /* Same code as in handler::key_read_time() */
table_records= table->stat_records();
- /* Assume block is 75 % full */
- keys_per_block= (uint) (table->file->stats.block_size * 3 / 4 /
- (index_info->key_length + table->file->ref_length)
- + 1);
- num_blocks= (ha_rows)(table_records / keys_per_block) + 1;
+ key_length= (index_info->key_length + file->ref_length);
+ num_blocks= (table_records * key_length / INDEX_BLOCK_FILL_FACTOR_DIV *
+ INDEX_BLOCK_FILL_FACTOR_MUL) / file->stats.block_size + 1;
+ keys_per_block= (file->stats.block_size /
+ (key_length * INDEX_BLOCK_FILL_FACTOR_MUL /
+ INDEX_BLOCK_FILL_FACTOR_DIV) +
+ 1);
/* Compute the number of keys in a group. */
if (!group_key_parts)
@@ -15035,7 +15045,10 @@ void cost_group_min_max(TABLE* table, KEY *index_info, uint used_key_parts,
keys_per_group= (table_records / 10) + 1;
}
}
- num_groups= (table_records / keys_per_group) + 1;
+ if (keys_per_group > 1)
+ num_groups= (table_records / keys_per_group) + 1;
+ else
+ num_groups= table_records;
/* Apply the selectivity of the quick select for group prefixes. */
if (range_tree && (quick_prefix_records != HA_POS_ERROR))
@@ -15059,8 +15072,7 @@ void cost_group_min_max(TABLE* table, KEY *index_info, uint used_key_parts,
/* There cannot be more groups than matched records */
set_if_smaller(num_groups, quick_prefix_records);
}
- /* Ensure we don't have more groups than rows in table */
- set_if_smaller(num_groups, table_records);
+ DBUG_ASSERT(num_groups <= table_records);
if (used_key_parts > group_key_parts)
{
@@ -15081,39 +15093,22 @@ void cost_group_min_max(TABLE* table, KEY *index_info, uint used_key_parts,
io_cost= (double) MY_MIN(num_groups * (1 + p_overlap), num_blocks);
}
else
- io_cost= (keys_per_group > keys_per_block) ?
- (have_min && have_max) ? (double) (num_groups + 1) :
- (double) num_groups :
- (double) num_blocks;
+ io_cost= ((keys_per_group > keys_per_block) ?
+ (have_min && have_max) ? (double) (num_groups + 1) :
+ (double) num_groups :
+ (double) num_blocks);
/*
CPU cost must be comparable to that of an index scan as computed
in SQL_SELECT::test_quick_select(). When the groups are small,
e.g. for a unique index, using index scan will be cheaper since it
reads the next record without having to re-position to it on every
- group. To make the CPU cost reflect this, we estimate the CPU cost
- as the sum of:
- 1. Cost for evaluating the condition for each num_group
- KEY_COMPARE_COST (similarly as for index scan).
- 2. Cost for navigating the index structure (assuming a b-tree).
- Note: We only add the cost for one index comparision per block. For a
- b-tree the number of comparisons will be larger. However the cost
- is low as all of the upper level b-tree blocks should be in
- memory.
- TODO: This cost should be provided by the storage engine.
- 3. Cost for comparing the row with the where clause
+ group.
*/
- const THD *thd= table->in_use;
- const double tree_traversal_cost=
- ceil(log(static_cast<double>(table_records))/
- log(static_cast<double>(keys_per_block))) *
- thd->variables.optimizer_key_cmp_cost;
-
- const double cpu_cost= (num_groups *
- (tree_traversal_cost +
- thd->variables.optimizer_where_cost));
-
- *read_cost= io_cost + cpu_cost;
+ uint keyno= (uint) (index_info - table->key_info);
+ *read_cost= file->ha_keyread_and_compare_time(keyno, (ulong) num_groups,
+ num_groups,
+ io_cost);
*records= num_groups;
DBUG_PRINT("info",
diff --git a/sql/opt_split.cc b/sql/opt_split.cc
index 8cb82693d96..8848c1820df 100644
--- a/sql/opt_split.cc
+++ b/sql/opt_split.cc
@@ -188,6 +188,7 @@
#include "mariadb.h"
#include "sql_select.h"
#include "opt_trace.h"
+#include "optimizer_defaults.h"
/* Info on a splitting field */
struct SplM_field_info
@@ -665,6 +666,8 @@ add_ext_keyuses_for_splitting_field(Dynamic_array<KEYUSE_EXT> *ext_keyuses,
/*
@brief
Cost of the post join operation used in specification of splittable table
+ This does not include the cost of creating the temporary table, as this
+ operation can be executed many times against the same temporary table.
*/
static
@@ -673,13 +676,18 @@ double spl_postjoin_oper_cost(THD *thd, double join_record_count, uint rec_len)
double cost;
TMPTABLE_COSTS tmp_cost= get_tmp_table_costs(thd, join_record_count,
rec_len, 0, 1);
- // cost to fill tmp table
- cost= tmp_cost.create + tmp_cost.write * join_record_count;
- // cost to perform post join operation used here
+ /* cost to fill tmp table */
+ cost= tmp_cost.write * join_record_count;
+ /* cost to perform post join operation used here */
cost+= tmp_cost.lookup * join_record_count;
- cost+= (join_record_count == 0 ? 0 :
- join_record_count * log2 (join_record_count)) *
- SORT_INDEX_CMP_COST; // cost to perform sorting
+ /* cost to perform sorting */
+ /* QQQ
+ We should use cost_of_filesort() for computing the sort cost.
+ Do we always perform sorting? If not, this should be done conditionally.
+ */
+ cost+= ((join_record_count == 0 ? 0 :
+ join_record_count * log2 (join_record_count)) *
+ SORT_INDEX_CMP_COST);
return cost;
}
@@ -873,7 +881,7 @@ void reset_validity_vars_for_keyuses(KEYUSE_EXT *key_keyuse_ext_start,
splitting the function set it as the true plan of materialization
of the table T.
The function caches the found plans for materialization of table T
- together if the info what key was used for splitting. Next time when
+ together with the info about which key was used for splitting. Next time when
the optimizer prefers to use the same key the plan is taken from
the cache of plans
@@ -1004,12 +1012,11 @@ SplM_plan_info * JOIN_TAB::choose_best_splitting(double record_count,
spl_opt_info->unsplit_card : 1);
uint rec_len= table->s->rec_buff_length;
-
double split_card= spl_opt_info->unsplit_card * spl_plan->split_sel;
- double oper_cost= split_card *
- spl_postjoin_oper_cost(thd, split_card, rec_len);
- spl_plan->cost= join->best_positions[join->table_count-1].read_time +
- + oper_cost;
+ double oper_cost= (split_card *
+ spl_postjoin_oper_cost(thd, split_card, rec_len));
+ spl_plan->cost= (join->best_positions[join->table_count-1].read_time +
+ oper_cost);
if (unlikely(thd->trace_started()))
{
@@ -1030,7 +1037,7 @@ SplM_plan_info * JOIN_TAB::choose_best_splitting(double record_count,
}
if (spl_plan)
{
- if(record_count * spl_plan->cost < spl_opt_info->unsplit_cost - 0.01)
+ if (record_count * spl_plan->cost + COST_EPS < spl_opt_info->unsplit_cost)
{
/*
The best plan that employs splitting is cheaper than
@@ -1054,7 +1061,7 @@ SplM_plan_info * JOIN_TAB::choose_best_splitting(double record_count,
trace.
add("startup_cost", startup_cost).
add("splitting_cost", spl_plan->cost).
- add("records", records);
+ add("rows", records);
}
}
else
diff --git a/sql/opt_subselect.cc b/sql/opt_subselect.cc
index eed85efbb6d..93459c49c23 100644
--- a/sql/opt_subselect.cc
+++ b/sql/opt_subselect.cc
@@ -35,6 +35,7 @@
#include "sql_test.h"
#include <my_bit.h>
#include "opt_trace.h"
+#include "optimizer_defaults.h"
/*
This file contains optimizations for semi-join subqueries.
@@ -1471,8 +1472,8 @@ void get_delayed_table_estimates(TABLE *table,
hash_sj_engine->tmp_table->s->reclength);
/* Do like in handler::ha_scan_and_compare_time, but ignore the where cost */
- *scan_time= ((data_size/table->file->stats.block_size+2) *
- table->file->avg_io_cost()) + *out_rows * file->ROW_COPY_COST;
+ *scan_time= ((data_size/IO_SIZE * table->file->avg_io_cost()) +
+ *out_rows * file->ROW_COPY_COST);
}
@@ -2595,11 +2596,9 @@ bool optimize_semijoin_nests(JOIN *join, table_map all_table_map)
We don't need to check the where clause for each row, so no
WHERE_COST is needed.
*/
- scan_cost= (TABLE_SCAN_SETUP_COST +
- (cost.block_size == 0 ? 0 :
- ((rowlen * (double) sjm->rows) / cost.block_size +
- TABLE_SCAN_SETUP_COST)));
+ scan_cost= (rowlen * (double) sjm->rows) / cost.block_size;
total_cost= (scan_cost * cost.cache_hit_ratio * cost.avg_io_cost +
+ TABLE_SCAN_SETUP_COST_THD(thd) +
row_copy_cost * sjm->rows);
sjm->scan_cost.convert_from_cost(total_cost);
@@ -2699,8 +2698,6 @@ get_tmp_table_costs(THD *thd, double row_count, uint row_size, bool blobs_used,
bool add_copy_cost)
{
TMPTABLE_COSTS cost;
- double row_copy_cost= add_copy_cost ? ROW_COPY_COST_THD(thd) : 0;
-
/* From heap_prepare_hp_create_info(), assuming one hash key used */
row_size+= sizeof(char*)*2;
row_size= MY_ALIGN(MY_MAX(row_size, sizeof(char*)) + 1, sizeof(char*));
@@ -2708,24 +2705,31 @@ get_tmp_table_costs(THD *thd, double row_count, uint row_size, bool blobs_used,
if (row_count > thd->variables.max_heap_table_size / (double) row_size ||
blobs_used)
{
+ double row_copy_cost= (add_copy_cost ?
+ tmp_table_optimizer_costs.row_copy_cost :
+ 0);
/* Disk based table */
- cost.lookup= ((DISK_TEMPTABLE_LOOKUP_COST *
- thd->optimizer_cache_hit_ratio)) + row_copy_cost;
- cost.write= cost.lookup + row_copy_cost;
+ cost.lookup= ((tmp_table_optimizer_costs.key_lookup_cost *
+ tmp_table_optimizer_costs.disk_read_ratio) +
+ row_copy_cost);
+ cost.write= cost.lookup;
cost.create= DISK_TEMPTABLE_CREATE_COST;
cost.block_size= DISK_TEMPTABLE_BLOCK_SIZE;
- cost.avg_io_cost= 1.0;
- cost.cache_hit_ratio= thd->optimizer_cache_hit_ratio;
+ cost.avg_io_cost= tmp_table_optimizer_costs.disk_read_cost;
+ cost.cache_hit_ratio= tmp_table_optimizer_costs.disk_read_ratio;
}
else
{
/* Values are as they are in heap.h */
+ double row_copy_cost= (add_copy_cost ?
+ heap_optimizer_costs.row_copy_cost :
+ 0);
cost.lookup= HEAP_TEMPTABLE_LOOKUP_COST + row_copy_cost;
- cost.write= cost.lookup + row_copy_cost;
+ cost.write= cost.lookup;
cost.create= HEAP_TEMPTABLE_CREATE_COST;
- cost.block_size= 0;
- cost.avg_io_cost= HEAP_TEMPTABLE_LOOKUP_COST;
- cost.cache_hit_ratio= 1.0;
+ cost.block_size= 1;
+ cost.avg_io_cost= 0;
+ cost.cache_hit_ratio= 0;
}
return cost;
}
@@ -3196,7 +3200,7 @@ bool Sj_materialization_picker::check_qep(JOIN *join,
if (unlikely(trace.trace_started()))
{
trace.
- add("records", *record_count).
+ add("rows", *record_count).
add("cost", *read_time);
}
return TRUE;
@@ -3250,7 +3254,7 @@ bool Sj_materialization_picker::check_qep(JOIN *join,
best_access_path(join, join->positions[i].table, rem_tables,
join->positions, i,
disable_jbuf, prefix_rec_count, &curpos, &dummy);
- prefix_rec_count= COST_MULT(prefix_rec_count, curpos.records_read);
+ prefix_rec_count= COST_MULT(prefix_rec_count, curpos.records_out);
prefix_cost= COST_ADD(prefix_cost, curpos.read_time);
//TODO: take into account join condition selectivity here
}
@@ -3277,7 +3281,7 @@ bool Sj_materialization_picker::check_qep(JOIN *join,
if (unlikely(trace.trace_started()))
{
trace.
- add("records", *record_count).
+ add("rows", *record_count).
add("cost", *read_time);
}
return TRUE;
@@ -3378,7 +3382,7 @@ bool LooseScan_picker::check_qep(JOIN *join,
if (unlikely(trace.trace_started()))
{
trace.
- add("records", *record_count).
+ add("rows", *record_count).
add("cost", *read_time);
}
return TRUE;
@@ -3476,7 +3480,7 @@ bool Firstmatch_picker::check_qep(JOIN *join,
- remove fanout added by the last table
*/
if (*record_count)
- *record_count /= join->positions[idx].records_read;
+ *record_count /= join->positions[idx].records_out;
}
else
{
@@ -3497,7 +3501,7 @@ bool Firstmatch_picker::check_qep(JOIN *join,
if (unlikely(trace.trace_started()))
{
trace.
- add("records", *record_count).
+ add("rows", *record_count).
add("cost", *read_time);
}
return TRUE;
@@ -3624,21 +3628,22 @@ bool Duplicate_weedout_picker::check_qep(JOIN *join,
*/
uint first_tab= first_dupsweedout_table;
double dups_cost;
- double prefix_rec_count;
+ double first_weedout_table_rec_count;
double sj_inner_fanout= 1.0;
double sj_outer_fanout= 1.0;
uint temptable_rec_size;
if (first_tab == join->const_tables)
{
- prefix_rec_count= 1.0;
+ first_weedout_table_rec_count= 1.0;
temptable_rec_size= 0;
dups_cost= 0.0;
}
else
{
dups_cost= join->positions[first_tab - 1].prefix_cost;
- prefix_rec_count= join->positions[first_tab - 1].prefix_record_count;
+ first_weedout_table_rec_count=
+ join->positions[first_tab - 1].prefix_record_count;
temptable_rec_size= 8; /* This is not true but we'll make it so */
}
@@ -3677,17 +3682,14 @@ bool Duplicate_weedout_picker::check_qep(JOIN *join,
sj_outer_fanout,
temptable_rec_size,
0, 0);
- double write_cost=
- COST_ADD(one_cost.create,
- COST_MULT(join->positions[first_tab].prefix_record_count,
- sj_outer_fanout * one_cost.write));
- double full_lookup_cost=
- COST_MULT(join->positions[first_tab].prefix_record_count,
- COST_MULT(sj_outer_fanout,
- sj_inner_fanout * one_cost.lookup));
- *read_time= COST_ADD(dups_cost, COST_ADD(write_cost, full_lookup_cost));
+ double prefix_record_count= join->positions[first_tab].prefix_record_count;
+ double write_cost= (one_cost.create +
+ prefix_record_count * sj_outer_fanout * one_cost.write);
+ double full_lookup_cost= (prefix_record_count * sj_outer_fanout *
+ sj_inner_fanout * one_cost.lookup);
+ *read_time= dups_cost + write_cost + full_lookup_cost;
- *record_count= prefix_rec_count * sj_outer_fanout;
+ *record_count= first_weedout_table_rec_count * sj_outer_fanout;
*handled_fanout= dups_removed_fanout;
*strategy= SJ_OPT_DUPS_WEEDOUT;
if (unlikely(join->thd->trace_started()))
@@ -3695,7 +3697,10 @@ bool Duplicate_weedout_picker::check_qep(JOIN *join,
Json_writer_object trace(join->thd);
trace.
add("strategy", "DuplicateWeedout").
- add("records", *record_count).
+ add("prefix_row_count", prefix_record_count).
+ add("tmp_table_rows", sj_outer_fanout).
+ add("sj_inner_fanout", sj_inner_fanout).
+ add("rows", *record_count).
add("dups_cost", dups_cost).
add("write_cost", write_cost).
add("full_lookup_cost", full_lookup_cost).
@@ -3899,7 +3904,7 @@ static void recalculate_prefix_record_count(JOIN *join, uint start, uint end)
prefix_count= 1.0;
else
prefix_count= COST_MULT(join->best_positions[j-1].prefix_record_count,
- join->best_positions[j-1].records_read);
+ join->best_positions[j-1].records_out);
join->best_positions[j].prefix_record_count= prefix_count;
}
@@ -4051,7 +4056,7 @@ void fix_semijoin_strategies_for_picked_join_order(JOIN *join)
join->best_positions, i,
FALSE, prefix_rec_count,
join->best_positions + i, &dummy);
- prefix_rec_count *= join->best_positions[i].records_read;
+ prefix_rec_count *= join->best_positions[i].records_out;
rem_tables &= ~join->best_positions[i].table->table->map;
}
}
@@ -4093,7 +4098,7 @@ void fix_semijoin_strategies_for_picked_join_order(JOIN *join)
TRUE /* no jbuf */,
record_count, join->best_positions + idx, &dummy);
}
- record_count *= join->best_positions[idx].records_read;
+ record_count *= join->best_positions[idx].records_out;
rem_tables &= ~join->best_positions[idx].table->table->map;
}
}
@@ -4151,7 +4156,7 @@ void fix_semijoin_strategies_for_picked_join_order(JOIN *join)
}
}
rem_tables &= ~join->best_positions[idx].table->table->map;
- record_count *= join->best_positions[idx].records_read;
+ record_count *= join->best_positions[idx].records_out;
}
first_pos->sj_strategy= SJ_OPT_LOOSE_SCAN;
first_pos->n_sj_tables= my_count_bits(first_pos->table->emb_sj_nest->sj_inner_tables);
@@ -5368,7 +5373,8 @@ int setup_semijoin_dups_elimination(JOIN *join, ulonglong options,
Got a table that's not within any semi-join nest. This is a case
like this:
- SELECT * FROM ot1, nt1 WHERE ot1.col IN (SELECT expr FROM it1, it2)
+ SELECT * FROM ot1, nt1 WHERE
+ ot1.col IN (SELECT expr FROM it1, it2)
with a join order of
@@ -6780,7 +6786,7 @@ bool JOIN::choose_subquery_plan(table_map join_tables)
Json_writer_object trace_wrapper(thd);
Json_writer_object trace_subquery(thd, "subquery_plan");
trace_subquery.
- add("records", inner_record_count_1).
+ add("rows", inner_record_count_1).
add("materialization_cost", materialize_strategy_cost).
add("in_exist_cost", in_exists_strategy_cost).
add("choosen", strategy);
diff --git a/sql/opt_subselect.h b/sql/opt_subselect.h
index 4ba90f6c60b..b0053d3db14 100644
--- a/sql/opt_subselect.h
+++ b/sql/opt_subselect.h
@@ -226,15 +226,17 @@ public:
if (!(found_part & 1 ) && /* no usable ref access for 1st key part */
s->table->covering_keys.is_set(key))
{
+ double records, read_time;
part1_conds_met= TRUE;
DBUG_PRINT("info", ("Can use full index scan for LooseScan"));
/* Calculate the cost of complete loose index scan. */
- double records= rows2double(s->table->file->stats.records);
+ records= rows2double(s->table->file->stats.records);
/* The cost is entire index scan cost (divided by 2) */
- double read_time= s->table->file->ha_keyread_and_copy_time(key, 1,
- (ha_rows) records);
+ read_time= s->table->file->ha_keyread_and_copy_time(key, 1,
+ (ha_rows) records,
+ 0);
/*
Now find out how many different keys we will get (for now we
diff --git a/sql/opt_trace.cc b/sql/opt_trace.cc
index 374fc41aba8..d7b3d83bb18 100644
--- a/sql/opt_trace.cc
+++ b/sql/opt_trace.cc
@@ -696,8 +696,8 @@ void print_best_access_for_table(THD *thd, POSITION *pos)
Json_writer_object obj(thd, "chosen_access_method");
obj.
add("type", pos->type == JT_ALL ? "scan" : join_type_str[pos->type]).
- add("records_read", pos->records_read).
- add("records_out", pos->records_out).
+ add("rows_read", pos->records_read).
+ add("rows_out", pos->records_out).
add("cost", pos->read_time).
add("uses_join_buffering", pos->use_join_buffer);
if (pos->range_rowid_filter_info)
diff --git a/sql/optimizer_costs.h b/sql/optimizer_costs.h
index de933969131..698cdbfe41e 100644
--- a/sql/optimizer_costs.h
+++ b/sql/optimizer_costs.h
@@ -18,41 +18,79 @@
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA
*/
-/* This file includes costs variables used by the optimizer */
-
/*
- The table/index cache hit ratio in %. 0 means that a searched for key or row
- will never be in the cache while 100 means it always in the cache.
-
- According to folklore, one need at least 80 % hit rate in the cache for
- MariaDB to run very well. We set CACHE_HIT_RATIO to a bit smaller
- as there is still a cost involved in finding the row in the B tree, hash
- or other seek structure.
+ This file defines costs structures and cost functions used by the optimizer
+*/
- Increasing CACHE_HIT_RATIO will make MariaDB prefer key lookups over
- table scans as the impact of ROW_COPY_COST and INDEX_COPY cost will
- have a larger impact when more rows are exmined..
- Note that avg_io_cost() is multipled with this constant!
+/*
+ OPTIMIZER_COSTS stores cost variables for each engine. They are stored
+ in linked_optimizer_costs (pointed to by handlerton) and TABLE_SHARE.
*/
-#define DEFAULT_CACHE_HIT_RATIO 50
-/* Convert ratio to cost */
-
-static inline double cache_hit_ratio(uint ratio)
+#define OPTIMIZER_COST_UNDEF -1.0
+struct OPTIMIZER_COSTS
{
- return (((double) (100 - ratio)) / 100.0);
-}
+ double disk_read_cost;
+ double index_block_copy_cost;
+ double key_cmp_cost;
+ double key_copy_cost;
+ double key_lookup_cost;
+ double key_next_find_cost;
+ double disk_read_ratio;
+ double row_copy_cost;
+ double row_lookup_cost;
+ double row_next_find_cost;
+ double rowid_cmp_cost;
+ double rowid_copy_cost;
+ double initialized; // Set if default or connected with handlerton
+};
+
+/* Default optimizer costs */
+extern OPTIMIZER_COSTS default_optimizer_costs;
+/*
+ These are used to avoid taking a mutex while creating tmp tables.
+ They are created once after the server is started, so they are
+ not dynamic.
+*/
+extern OPTIMIZER_COSTS heap_optimizer_costs, tmp_table_optimizer_costs;
/*
- Base cost for finding keys and rows from the engine is 1.0
- All other costs should be proportional to these
+ Interface to the engine cost variables. See optimizer_defaults.h for
+ the default values.
*/
-/* Cost for finding the first key in a key scan */
-#define KEY_LOOKUP_COST ((double) 1.0)
-/* Cost of finding a key from a row_ID (not used for clustered keys) */
-#define ROW_LOOKUP_COST ((double) 1.0)
+#define DISK_READ_RATIO costs->disk_read_ratio
+#define KEY_LOOKUP_COST costs->key_lookup_cost
+#define ROW_LOOKUP_COST costs->row_lookup_cost
+#define INDEX_BLOCK_COPY_COST costs->index_block_copy_cost
+#define KEY_COPY_COST costs->key_copy_cost
+#define ROW_COPY_COST costs->row_copy_cost
+#define ROW_COPY_COST_THD(THD) default_optimizer_costs.row_copy_cost
+#define KEY_NEXT_FIND_COST costs->key_next_find_cost
+#define ROW_NEXT_FIND_COST costs->row_next_find_cost
+#define KEY_COMPARE_COST costs->key_cmp_cost
+#define SORT_INDEX_CMP_COST default_optimizer_costs.key_cmp_cost
+#define DISK_READ_COST costs->disk_read_cost
+#define DISK_READ_COST_THD(thd) default_optimizer_costs.disk_read_cost
+
+/* Cost of comparing two rowids. This is set relative to KEY_COMPARE_COST */
+#define ROWID_COMPARE_COST costs->rowid_cmp_cost
+#define ROWID_COMPARE_COST_THD(THD) default_optimizer_costs.rowid_cmp_cost
+
+/* Cost of copying a rowid. This is set relative to KEY_COPY_COST */
+#define ROWID_COPY_COST costs->rowid_copy_cost
+
+/* Engine unrelated costs. Stored in THD so that the user can change them */
+#define WHERE_COST optimizer_where_cost
+#define WHERE_COST_THD(THD) ((THD)->variables.optimizer_where_cost)
+#define TABLE_SCAN_SETUP_COST optimizer_scan_setup_cost
+#define TABLE_SCAN_SETUP_COST_THD(THD) (THD)->variables.optimizer_scan_setup_cost
+#define INDEX_SCAN_SETUP_COST optimizer_scan_setup_cost/2
+
+/* The default fill factor of a (b-tree) index block is assumed to be 0.75 */
+#define INDEX_BLOCK_FILL_FACTOR_DIV 3
+#define INDEX_BLOCK_FILL_FACTOR_MUL 4
/*
These constants impact the cost of QSORT and priority queue sorting,
@@ -68,94 +106,13 @@ static inline double cache_hit_ratio(uint ratio)
*/
#define QSORT_SORT_SLOWNESS_CORRECTION_FACTOR (0.1)
#define PQ_SORT_SLOWNESS_CORRECTION_FACTOR (0.1)
-/*
- Cost of finding and copying keys from the storage engine index cache to
- an internal cache as part of an index scan.
- Used in handler::keyread_time()
-*/
-#define DEFAULT_INDEX_BLOCK_COPY_COST ((double) 1 / 5.0)
-#define INDEX_BLOCK_COPY_COST(THD) ((THD)->variables.optimizer_index_block_copy_cost)
-
-/*
- Cost of finding the next row during table scan and copying it to
- 'table->record'.
- If this is too small, then table scans will be prefered over 'ref'
- as with table scans there are no key read (KEY_LOOKUP_COST), fewer
- disk reads but more record copying and row comparisions. If it's
- too big then MariaDB will used key lookup even when table scan is
- better.
-*/
-#define DEFAULT_ROW_COPY_COST ((double) 1.0 / 20.0)
-#define ROW_COPY_COST optimizer_row_copy_cost
-#define ROW_COPY_COST_THD(THD) ((THD)->variables.optimizer_row_copy_cost)
/*
Creating a record from the join cache is faster than getting a row from
the engine. JOIN_CACHE_ROW_COPY_COST_FACTOR is the factor used to
take this into account. This is multiplied with ROW_COPY_COST.
*/
-#define JOIN_CACHE_ROW_COPY_COST_FACTOR 0.75
-
-/*
- Cost of finding the next key during index scan and copying it to
- 'table->record'
-
- If this is too small, then index scans will be prefered over 'ref'
- as with table scans there are no key read (KEY_LOOKUP_COST) and
- fewer disk reads.
-*/
-#define DEFAULT_KEY_COPY_COST ((double) 1.0 / 40.0)
-#define KEY_COPY_COST optimizer_key_copy_cost
-#define KEY_COPY_COST_THD(THD) ((THD)->variables.optimizer_key_copy_cost)
-
-/*
- Cost of finding the next index entry and checking it against filter
- This cost is very low as it's done inside the storage engine.
- Should be smaller than KEY_COPY_COST.
- */
-#define DEFAULT_KEY_NEXT_FIND_COST ((double) 1.0 / 80.0)
-#define KEY_NEXT_FIND_COST optimizer_next_find_cost
-
-/**
- The following is used to decide if MariaDB should use table scanning
- instead of reading with keys. The number says how many evaluation of the
- WHERE clause is comparable to reading one extra row from a table.
-*/
-#define DEFAULT_WHERE_COST (1 / 5.0)
-#define WHERE_COST optimizer_where_cost
-#define WHERE_COST_THD(THD) ((THD)->variables.optimizer_where_cost)
-
-#define DEFAULT_KEY_COMPARE_COST (1 / 20.0)
-#define KEY_COMPARE_COST optimizer_key_cmp_cost
-
-/*
- Cost of comparing two rowids. This is set relative to KEY_COMPARE_COST
- This is usally just a memcmp!
-*/
-#define ROWID_COMPARE_COST KEY_COMPARE_COST/10.0
-#define ROWID_COMPARE_COST_THD(THD) ((THD)->variables.KEY_COMPARE_COST / 10.0)
-
-/*
- Setup cost for different operations
-*/
-
-/* Extra cost for doing a range scan. Used to prefer 'ref' over range */
-#define MULTI_RANGE_READ_SETUP_COST (double) (1.0 / 50.0)
-
-/*
- These costs are mainly to handle small tables, like the one we have in the
- mtr test suite
-*/
-/* Extra cost for full table scan. Used to prefer range over table scans */
-#define TABLE_SCAN_SETUP_COST 1.0
-/* Extra cost for full index scan. Used to prefer range over index scans */
-#define INDEX_SCAN_SETUP_COST 1.0
-
-/*
- The lower bound of accepted rows when using filter.
- This is used to ensure that filters are not too agressive.
-*/
-#define MIN_ROWS_AFTER_FILTERING 1.0
+#define JOIN_CACHE_ROW_COPY_COST_FACTOR(thd) 1.0
/*
cost1 is better that cost2 only if cost1 + COST_EPS < cost2
@@ -163,33 +120,8 @@ static inline double cache_hit_ratio(uint ratio)
when there are identical plans. Without COST_EPS some plans in the
test suite would vary depending on floating point calculations done
in different paths.
- */
-#define COST_EPS 0.0001
-
-/*
- For sequential disk seeks the cost formula is:
- DISK_SEEK_BASE_COST + DISK_SEEK_PROP_COST * #blocks_to_skip
-
- The cost of average seek
- DISK_SEEK_BASE_COST + DISK_SEEK_PROP_COST*BLOCKS_IN_AVG_SEEK =1.0.
-*/
-#define DISK_SEEK_BASE_COST ((double)0.9)
-
-#define BLOCKS_IN_AVG_SEEK 128
-
-#define DISK_SEEK_PROP_COST ((double)0.1/BLOCKS_IN_AVG_SEEK)
-
-/*
- Subquery materialization-related constants
*/
-/* This should match ha_heap::read_time() */
-#define HEAP_TEMPTABLE_LOOKUP_COST 0.05
-#define HEAP_TEMPTABLE_CREATE_COST 1.0
-#define DISK_TEMPTABLE_LOOKUP_COST 1.0
-#define DISK_TEMPTABLE_CREATE_COST TMPFILE_CREATE_COST*2 /* 2 tmp tables */
-#define DISK_TEMPTABLE_BLOCK_SIZE 8192
-
-#define SORT_INDEX_CMP_COST 0.02
+#define COST_EPS 0.0000001
#define COST_MAX (DBL_MAX * (1.0 - DBL_EPSILON))
@@ -207,4 +139,22 @@ static inline double COST_MULT(double c, double f)
return (COST_MAX / (f) > (c) ? (c) * (f) : COST_MAX);
}
+OPTIMIZER_COSTS *get_optimizer_costs(const LEX_CSTRING *cache_name);
+OPTIMIZER_COSTS *create_optimizer_costs(const char *name, size_t length);
+OPTIMIZER_COSTS *get_or_create_optimizer_costs(const char *name,
+ size_t length);
+bool create_default_optimizer_costs();
+void copy_tmptable_optimizer_costs();
+void free_all_optimizer_costs();
+struct TABLE;
+
+extern "C"
+{
+ typedef int (*process_optimizer_costs_t) (const LEX_CSTRING *,
+ const OPTIMIZER_COSTS *,
+ TABLE *);
+ bool process_optimizer_costs(process_optimizer_costs_t func, TABLE *param);
+}
+
+
#endif /* OPTIMIZER_COSTS_INCLUDED */
diff --git a/sql/optimizer_defaults.h b/sql/optimizer_defaults.h
new file mode 100644
index 00000000000..8d74bb91cc3
--- /dev/null
+++ b/sql/optimizer_defaults.h
@@ -0,0 +1,183 @@
+#ifndef OPTIMIZER_DEFAULTS_INCLUDED
+#define OPTIMIZER_DEFAULTS_INCLUDED
+/*
+ Copyright (c) 2022, MariaDB AB
+
+ This program is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License
+ as published by the Free Software Foundation; version 2 of
+ the License.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA
+*/
+
+/*
+ This file contains costs constants used by the optimizer
+ All costs should be based on milliseconds (1 cost = 1 ms)
+*/
+
+/* Cost for finding the first key in a key scan */
+#define DEFAULT_KEY_LOOKUP_COST ((double) 0.000435777)
+
+/* Cost of finding a row based on row_ID */
+#define DEFAULT_ROW_LOOKUP_COST ((double) 0.000130839)
+
+/*
+ Cost of finding and copying key and row blocks from the storage
+ engine index cache to an internal cache as part of an index
+ scan. This includes all mutexes that need to be taken to get
+ exclusive access to a page. The number is taken from accessing
+ existing blocks from the Aria page cache.
+ Used in handler::scan_time() and handler::keyread_time()
+*/
+#define DEFAULT_INDEX_BLOCK_COPY_COST ((double) 3.56e-05)
+
+/*
+ Cost of copying a row to 'table->record'.
+ Used by scan_time() and rnd_pos_time() methods.
+
+ If this is too small, then table scans will be preferred over 'ref'
+ as with table scans there are no key reads (KEY_LOOKUP_COST), fewer
+ disk reads but more record copying and row comparisons. If it's
+ too big then MariaDB will use key lookups even when a table scan is
+ better.
+*/
+#define DEFAULT_ROW_COPY_COST ((double) 0.000060866)
+
+/*
+ Cost of copying the key to 'table->record'
+
+ If this is too small, then, for small tables, index scans will be
+ preferred over 'ref' as with index scans there are fewer disk reads.
+*/
+#define DEFAULT_KEY_COPY_COST ((double) 0.000015685)
+
+/*
+ Cost of finding the next index entry and checking its rowid against the filter.
+ This cost is very low as it's done inside the storage engine.
+ Should be smaller than KEY_COPY_COST.
+ */
+#define DEFAULT_KEY_NEXT_FIND_COST ((double) 0.000082347)
+
+/* Cost of finding the next row when scanning a table */
+#define DEFAULT_ROW_NEXT_FIND_COST ((double) 0.000045916)
+
+/**
+ The cost of executing the WHERE clause as part of any row check.
+ Increasing this would force the optimizer to use row combinations
+ that read fewer rows.
+ The default cost comes from recording times from a simple where clause that
+ compares two fields (date and a double) with constants.
+*/
+#define DEFAULT_WHERE_COST ((double) 3.2e-05)
+
+/* The cost of comparing a key when using range access or sorting */
+#define DEFAULT_KEY_COMPARE_COST 0.000011361
+
+/* Rowid compare is usually just a single memcmp of a short string */
+#define DEFAULT_ROWID_COMPARE_COST 0.000002653
+/* Rowid copy is usually just a single memcpy of a short string */
+#define DEFAULT_ROWID_COPY_COST 0.000002653
+
+/*
+ Average disk seek time on a hard disk is 8-10 ms, which is also
+ about the time to read an IO_SIZE (8192) block.
+
+ A medium SSD reads about 400MB/second, which gives us the time for
+ reading an IO_SIZE block as IO_SIZE/400000000 = 0.0000205 sec = 0.02 ms.
+*/
+#define DEFAULT_DISK_READ_COST ((double) IO_SIZE / 400000000.0 * 1000)
+
+/*
+ The following is an old comment for hard disks; please ignore it,
+ except if you like history:
+
+ For sequential hard disk seeks the cost formula is:
+ DISK_SEEK_BASE_COST + DISK_SEEK_PROP_COST * #blocks_to_skip
+
+ The cost of average seek
+ DISK_SEEK_BASE_COST + DISK_SEEK_PROP_COST*BLOCKS_IN_AVG_SEEK = 10.
+*/
+
+
+/*
+ The table/index cache_miss/total_cache_request ratio.
+ 1.0 means that a searched-for key or row will never be in the cache, while
+ 0.0 means it is always in the cache (and we don't have to do any disk reads).
+
+ According to folklore, one should not have to access the disk for more
+ than 20% of the cache requests for MariaDB to run very well.
+ However in practice when we read rows or keys in a query, we will often
+ read the same row over and over again. Because of this we set
+ DEFAULT_DISK_READ_RATIO to 0.20/10 = 0.02.
+
+ Increasing DISK_READ_RATIO will make MariaDB prefer key lookups over
+ table scans, as ROW_COPY_COST and INDEX_COPY cost will have a larger
+ impact when more rows are examined.
+
+ We are not yet taking cache usage statistics into account, as the
+ EXPLAIN output and costs for a query would then change between two
+ query calls, which may confuse users (and also make the mtr tests
+ very unpredictable).
+
+ Note that the engine's avg_io_cost() (DEFAULT_DISK_READ_COST by default)
+ is multiplied with this constant!
+*/
+
+#define DEFAULT_DISK_READ_RATIO 0.02
+
+/*
+ The following costs are mainly to ensure we don't do table and index
+ scans for small tables, like the ones we have in the mtr test suite.
+
+ This is mostly to make the mtr tests use indexes (as the optimizer
+ would if the tables were large). It will also ensure that EXPLAIN
+ shows more key usage for users who are testing queries with small
+ tables at the start of projects.
+ This is probably OK in most cases, as the execution time difference
+ between table or index scans and key lookups is small when using
+ small tables. It also helps to fill the index cache, which will help
+ mitigate the speed difference.
+*/
+
+/*
+ Extra cost for full table and index scan. Used to prefer key and range
+ over index and table scans
+
+ INDEX_SCAN_SETUP_COST (defined in optimizer_costs.h) is half of
+ table_scan_setup_cost to get the optimizer to prefer index scans to table
+ scans, as key copy is faster than row copy and index blocks provide
+ more information in the cache.
+
+ This will also help MyISAM, as with MyISAM table scans have a cost
+ very close to index scans (they are fast but require a read call
+ that we want to avoid even if it's small).
+
+ 10 usec is about 10 MyISAM row lookups with optimizer_disk_read_ratio= 0.02
+*/
+#define DEFAULT_TABLE_SCAN_SETUP_COST 0.01 // 10 usec
+
+/* Extra cost for doing a range scan. Used to prefer 'ref' over range */
+#define MULTI_RANGE_READ_SETUP_COST KEY_LOOKUP_COST
+
+/*
+ Temporary file and temporary table related costs
+ Used with subquery materialization, derived tables etc
+*/
+
+#define TMPFILE_CREATE_COST 0.5 // Cost of creating and deleting files
+#define HEAP_TEMPTABLE_CREATE_COST 0.025 // ms
+/* Cost taken from HEAP_LOOKUP_COST in ha_heap.cc */
+#define HEAP_TEMPTABLE_LOOKUP_COST (0.00016097*1000 + heap_optimizer_costs.row_copy_cost)
+#define DISK_TEMPTABLE_LOOKUP_COST(thd) (tmp_table_optimizer_costs.key_lookup_cost + tmp_table_optimizer_costs.row_lookup_cost + tmp_table_optimizer_costs.row_copy_cost)
+#define DISK_TEMPTABLE_CREATE_COST TMPFILE_CREATE_COST*2 // 2 tmp tables
+#define DISK_TEMPTABLE_BLOCK_SIZE IO_SIZE
+
+#endif /* OPTIMIZER_DEFAULTS_INCLUDED */
diff --git a/sql/rowid_filter.cc b/sql/rowid_filter.cc
index c0f7fe0755a..4f713edb47f 100644
--- a/sql/rowid_filter.cc
+++ b/sql/rowid_filter.cc
@@ -32,7 +32,7 @@ lookup_cost(Rowid_filter_container_type cont_type)
{
switch (cont_type) {
case SORTED_ARRAY_CONTAINER:
- return log(est_elements)*0.01+key_next_find_cost;
+ return log(est_elements) * rowid_compare_cost + base_lookup_cost;
default:
DBUG_ASSERT(0);
return 0;
@@ -125,11 +125,13 @@ void Range_rowid_filter_cost_info::init(Rowid_filter_container_type cont_type,
key_no= idx;
est_elements= (ulonglong) table->opt_range[key_no].rows;
cost_of_building_range_filter= build_cost(container_type);
+
where_cost= tab->in_use->variables.optimizer_where_cost;
- key_next_find_cost= tab->in_use->variables.optimizer_key_next_find_cost;
+ base_lookup_cost= tab->file->ROW_NEXT_FIND_COST;
+ rowid_compare_cost= tab->file->ROWID_COMPARE_COST;
selectivity= est_elements/((double) table->stat_records());
gain= avg_access_and_eval_gain_per_row(container_type,
- tab->file->optimizer_cache_cost);
+ tab->file->ROW_LOOKUP_COST);
if (gain > 0)
cross_x= cost_of_building_range_filter/gain;
else
@@ -147,15 +149,18 @@ double
Range_rowid_filter_cost_info::build_cost(Rowid_filter_container_type cont_type)
{
double cost;
+ OPTIMIZER_COSTS *costs= &table->s->optimizer_costs;
DBUG_ASSERT(table->opt_range_keys.is_set(key_no));
- cost= table->opt_range[key_no].index_only_fetch_cost(table->in_use);
+ /* Cost of fetching keys */
+ cost= table->opt_range[key_no].index_only_fetch_cost(table);
switch (cont_type) {
-
case SORTED_ARRAY_CONTAINER:
- cost+= ARRAY_WRITE_COST * est_elements; /* cost filling the container */
- cost+= ARRAY_SORT_C * est_elements * log(est_elements); /* sorting cost */
+ /* Add cost of filling container and cost of sorting */
+ cost= (est_elements *
+ (costs->rowid_copy_cost + // Copying rowid
+ costs->rowid_cmp_cost * log2(est_elements))); // Sort
break;
default:
DBUG_ASSERT(0);
diff --git a/sql/rowid_filter.h b/sql/rowid_filter.h
index f7386631c3d..91ffd8c065a 100644
--- a/sql/rowid_filter.h
+++ b/sql/rowid_filter.h
@@ -143,20 +143,6 @@ class SQL_SELECT;
class Rowid_filter_container;
class Range_rowid_filter_cost_info;
-/*
- Cost to write rowid into array. Assume inserting 1000 row id's into the
- array has same cost as a 'disk io' or key fetch
-*/
-#define ARRAY_WRITE_COST 0.001
-/*
- Factor used to calculate cost of sorting rowids in array
- This is multiplied by 'elements * log(elements)', so this factor
- has a very high cost weight!
- A value of 0.001 will have 200 rows have a cost of 1.05 and
- 1000 rows a cost of 6.90.
-*/
-#define ARRAY_SORT_C 0.001
-
typedef enum
{
SORTED_ARRAY_CONTAINER,
@@ -406,7 +392,8 @@ class Range_rowid_filter_cost_info final: public Sql_alloc
/* The index whose range scan would be used to build the range filter */
uint key_no;
double cost_of_building_range_filter;
- double where_cost, key_next_find_cost;
+ double where_cost, base_lookup_cost, rowid_compare_cost;
+
/*
(gain*row_combinations)-cost_of_building_range_filter yields the gain of
the filter for 'row_combinations' key tuples of the index key_no
diff --git a/sql/set_var.cc b/sql/set_var.cc
index 8cb5fcd4870..274ee07f07d 100644
--- a/sql/set_var.cc
+++ b/sql/set_var.cc
@@ -310,7 +310,13 @@ do { \
case SHOW_HA_ROWS: do_num_val (ha_rows,CMD);
#define case_for_double(CMD) \
- case SHOW_DOUBLE: do_num_val (double,CMD)
+ case SHOW_DOUBLE: do_num_val (double,CMD); \
+ case SHOW_OPTIMIZER_COST: \
+ { \
+ double val= ((*(double*) value) == OPTIMIZER_COST_UNDEF ? OPTIMIZER_COST_UNDEF : \
+ (*(double*) value) * 1000); \
+ CMD; \
+ } while (0)
#define case_get_string_as_lex_string \
case SHOW_CHAR: \
diff --git a/sql/set_var.h b/sql/set_var.h
index 570703a8222..38a395adf0f 100644
--- a/sql/set_var.h
+++ b/sql/set_var.h
@@ -84,7 +84,7 @@ protected:
typedef bool (*on_update_function)(sys_var *self, THD *thd, enum_var_type type);
int flags; ///< or'ed flag_enum values
- const SHOW_TYPE show_val_type; ///< what value_ptr() returns for sql_show.cc
+ SHOW_TYPE show_val_type; ///< what value_ptr() returns for sql_show.cc
PolyLock *guard; ///< *second* lock that protects the variable
ptrdiff_t offset; ///< offset to the value from global_system_variables
on_check_function on_check;
diff --git a/sql/sql_bitmap.h b/sql/sql_bitmap.h
index 02dc8198c7c..61b3df2d086 100644
--- a/sql/sql_bitmap.h
+++ b/sql/sql_bitmap.h
@@ -270,13 +270,21 @@ public:
{
return buffer[0];
}
- uint bits_set()
+ uint bits_set() const
{
uint res= 0;
for (size_t i= 0; i < ARRAY_ELEMENTS; i++)
- res += my_count_bits(buffer[i]);
+ if (buffer[i])
+ res+= my_count_bits(buffer[i]);
return res;
}
+ uint find_first_bit() const
+ {
+ for (size_t i= 0; i < ARRAY_ELEMENTS; i++)
+ if (buffer[i])
+ return (uint)i*BITS_PER_ELEMENT + my_find_first_bit(buffer[i]);
+ return width;
+ }
class Iterator
{
const Bitmap& map;
diff --git a/sql/sql_class.cc b/sql/sql_class.cc
index e3fd96822e4..17ce7b3c048 100644
--- a/sql/sql_class.cc
+++ b/sql/sql_class.cc
@@ -1236,7 +1236,6 @@ void THD::init()
*/
variables.pseudo_thread_id= thread_id;
variables.default_master_connection.str= default_master_connection_buff;
- optimizer_cache_hit_ratio= cache_hit_ratio(variables.optimizer_cache_hit_ratio);
::strmake(default_master_connection_buff,
global_system_variables.default_master_connection.str,
variables.default_master_connection.length);
diff --git a/sql/sql_class.h b/sql/sql_class.h
index 047c9516d47..6cdb553629c 100644
--- a/sql/sql_class.h
+++ b/sql/sql_class.h
@@ -692,9 +692,7 @@ typedef struct system_variables
ulonglong slave_skip_counter;
ulonglong max_relay_log_size;
- double optimizer_index_block_copy_cost, optimizer_key_next_find_cost;
- double optimizer_row_copy_cost, optimizer_key_copy_cost;
- double optimizer_where_cost, optimizer_key_cmp_cost;
+ double optimizer_where_cost, optimizer_scan_setup_cost;
double long_query_time_double, max_statement_time_double;
double sample_percentage;
@@ -793,7 +791,6 @@ typedef struct system_variables
uint group_concat_max_len;
uint eq_range_index_dive_limit;
- uint optimizer_cache_hit_ratio; // Stored in handler::optimizer_cache_cost
uint idle_transaction_timeout;
uint idle_readonly_transaction_timeout;
uint idle_write_transaction_timeout;
@@ -831,7 +828,6 @@ typedef struct system_variables
my_bool session_track_user_variables;
#endif // USER_VAR_TRACKING
my_bool tcp_nodelay;
-
plugin_ref table_plugin;
plugin_ref tmp_table_plugin;
plugin_ref enforced_table_plugin;
@@ -2674,7 +2670,6 @@ public:
struct system_status_var org_status_var; // For user statistics
struct system_status_var *initial_status_var; /* used by show status */
THR_LOCK_INFO lock_info; // Locking info of this thread
- double optimizer_cache_hit_ratio; // From optimizer_cache_hit_ratio
/**
Protects THD data accessed from other threads:
@@ -7424,6 +7419,13 @@ inline void handler::decrement_statistics(ulong SSV::*offset) const
status_var_decrement(table->in_use->status_var.*offset);
}
+/* Update references in the handler to the table */
+
+inline void handler::set_table(TABLE* table_arg)
+{
+ table= table_arg;
+ costs= &table_arg->s->optimizer_costs;
+}
inline int handler::ha_ft_read(uchar *buf)
{
diff --git a/sql/sql_const.h b/sql/sql_const.h
index 59faf94dd15..1e5fef4af36 100644
--- a/sql/sql_const.h
+++ b/sql/sql_const.h
@@ -121,11 +121,11 @@
/*
This is used when reading large blocks, sequential read.
- We assume that reading this much will be the same cost as 1 seek / fetching
- one row from the storage engine.
+ We assume that reading this much will be roughly the same cost as 1
+ seek / fetching one row from the storage engine.
+ Cost of one read of DISK_CHUNK_SIZE is DISK_SEEK_BASE_COST (ms).
*/
#define DISK_CHUNK_SIZE (uint) (65536) /* Size of diskbuffer for tmpfiles */
-#define TMPFILE_CREATE_COST 2.0 /* Creating and deleting tmp file */
#define FRM_VER_TRUE_VARCHAR (FRM_VER+4) /* 10 */
#define FRM_VER_EXPRESSSIONS (FRM_VER+5) /* 11 */
@@ -204,8 +204,14 @@
#define MIN_ROWS_TO_USE_TABLE_CACHE 100
#define MIN_ROWS_TO_USE_BULK_INSERT 100
+/*
+ The lower bound of accepted rows when using a filter.
+ This is used to ensure that filters are not too aggressive.
+*/
+#define MIN_ROWS_AFTER_FILTERING 1.0
+
/**
- Number of rows in a reference table when refereed through a not unique key.
+ Number of rows in a reference table when referred to through a non-unique key.
This value is only used when we don't know anything about the key
distribution.
*/
diff --git a/sql/sql_explain.cc b/sql/sql_explain.cc
index 7c3e87d42e3..b4adfe8546c 100644
--- a/sql/sql_explain.cc
+++ b/sql/sql_explain.cc
@@ -1379,10 +1379,12 @@ double Explain_table_access::get_r_filtered()
}
-int Explain_table_access::print_explain(select_result_sink *output, uint8 explain_flags,
+int Explain_table_access::print_explain(select_result_sink *output,
+ uint8 explain_flags,
bool is_analyze,
uint select_id, const char *select_type,
- bool using_temporary, bool using_filesort)
+ bool using_temporary,
+ bool using_filesort)
{
THD *thd= output->thd; // note: for SHOW EXPLAIN, this is target thd.
MEM_ROOT *mem_root= thd->mem_root;
@@ -2011,6 +2013,9 @@ void Explain_table_access::print_explain_json(Explain_query *query,
writer->add_double(jbuf_tracker.get_filtered_after_where()*100.0);
else
writer->add_null();
+
+ writer->add_member("r_unpack_time_ms");
+ writer->add_double(jbuf_unpack_tracker.get_time_ms());
}
}
diff --git a/sql/sql_explain.h b/sql/sql_explain.h
index 38c5c3e6595..42a1c360e5b 100644
--- a/sql/sql_explain.h
+++ b/sql/sql_explain.h
@@ -753,7 +753,7 @@ public:
class Explain_table_access : public Sql_alloc
{
public:
- Explain_table_access(MEM_ROOT *root) :
+ Explain_table_access(MEM_ROOT *root, bool timed) :
derived_select_number(0),
non_merged_sjm_number(0),
extra_tags(root),
@@ -766,6 +766,7 @@ public:
pushed_index_cond(NULL),
sjm_nest(NULL),
pre_join_sort(NULL),
+ jbuf_unpack_tracker(timed),
rowid_filter(NULL)
{}
~Explain_table_access() { delete sjm_nest; }
@@ -874,6 +875,7 @@ public:
Gap_time_tracker extra_time_tracker;
Table_access_tracker jbuf_tracker;
+ Time_and_counter_tracker jbuf_unpack_tracker;
Explain_rowid_filter *rowid_filter;
diff --git a/sql/sql_join_cache.cc b/sql/sql_join_cache.cc
index 45419d7e5b1..27eb9a0c83e 100644
--- a/sql/sql_join_cache.cc
+++ b/sql/sql_join_cache.cc
@@ -1600,6 +1600,7 @@ bool JOIN_CACHE::put_record()
bool JOIN_CACHE::get_record()
{
bool res;
+ ANALYZE_START_TRACKING(thd(), join_tab->jbuf_unpack_tracker);
uchar *prev_rec_ptr= 0;
if (with_length)
pos+= size_of_rec_len;
@@ -1615,6 +1616,7 @@ bool JOIN_CACHE::get_record()
if (prev_cache)
prev_cache->get_record_by_pos(prev_rec_ptr);
}
+ ANALYZE_STOP_TRACKING(thd(), join_tab->jbuf_unpack_tracker);
return res;
}
diff --git a/sql/sql_plugin.h b/sql/sql_plugin.h
index d4df8c6468f..df5cd37c3c6 100644
--- a/sql/sql_plugin.h
+++ b/sql/sql_plugin.h
@@ -24,6 +24,7 @@
#define SHOW_always_last SHOW_KEY_CACHE_LONG, \
SHOW_HAVE, SHOW_MY_BOOL, SHOW_HA_ROWS, SHOW_SYS, \
SHOW_LONG_NOFLUSH, SHOW_LEX_STRING, SHOW_ATOMIC_COUNTER_UINT32_T, \
+ SHOW_OPTIMIZER_COST, \
/* SHOW_*_STATUS must be at the end, SHOW_LONG_STATUS being first */ \
SHOW_LONG_STATUS, SHOW_DOUBLE_STATUS, SHOW_LONGLONG_STATUS, \
SHOW_UINT32_STATUS
diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 1dffb98075c..e92f22ebc4a 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -47,6 +47,7 @@
// print_sjm, print_plan, TEST_join
#include "records.h" // init_read_record, end_read_record
#include "filesort.h" // filesort_free_buffers
+#include "filesort_utils.h" // get_qsort_sort_cost
#include "sql_union.h" // mysql_union
#include "opt_subselect.h"
#include "sql_derived.h"
@@ -69,6 +70,7 @@
#include "opt_trace.h"
#include "derived_handler.h"
#include "create_tmp_table.h"
+#include "optimizer_defaults.h"
/*
A key part number that means we're using a fulltext scan.
@@ -100,14 +102,7 @@
#define crash_if_first_double_is_bigger(A,B) DBUG_ASSERT(((A) == 0.0 && (B) == 0.0) || (A)/(B) < 1.0000001)
-#define double_to_rows(A) ((A) >= ((double)HA_POS_ERROR) ? HA_POS_ERROR : (ha_rows) (A))
-
-/* Cost for reading a row through an index */
-struct INDEX_READ_COST
-{
- double read_cost;
- double index_only_cost;
-};
+#define double_to_rows(A) ((A) >= ((double)HA_ROWS_MAX) ? HA_ROWS_MAX : (ha_rows) (A))
const char *join_type_str[]={ "UNKNOWN","system","const","eq_ref","ref",
"MAYBE_REF","ALL","range","index","fulltext",
@@ -258,7 +253,6 @@ static COND *make_cond_for_table_from_pred(THD *thd, Item *root_cond,
bool is_top_and_level);
static Item* part_of_refkey(TABLE *form,Field *field);
-uint find_shortest_key(TABLE *table, const key_map *usable_keys);
static bool test_if_cheaper_ordering(const JOIN_TAB *tab,
ORDER *order, TABLE *table,
key_map usable_keys, int key,
@@ -332,7 +326,8 @@ static bool find_order_in_list(THD *, Ref_ptr_array, TABLE_LIST *, ORDER *,
List<Item> &, List<Item> &, bool, bool, bool);
static double table_after_join_selectivity(JOIN *join, uint idx, JOIN_TAB *s,
- table_map rem_tables);
+ table_map rem_tables,
+ double *records_out);
void set_postjoin_aggr_write_func(JOIN_TAB *tab);
static Item **get_sargable_cond(JOIN *join, TABLE *table);
@@ -434,7 +429,7 @@ bool dbug_user_var_equals_str(THD *thd, const char *name, const char* value)
POSITION::POSITION()
{
table= 0;
- records_read= cond_selectivity= read_time= records_out= 0.0;
+ records_read= cond_selectivity= read_time= records_out= records_init= 0.0;
prefix_record_count= 0.0;
key= 0;
forced_index= 0;
@@ -1903,6 +1898,13 @@ int JOIN::optimize()
res= build_explain();
optimization_state= JOIN::OPTIMIZATION_DONE;
}
+
+ /*
+ Store the cost of this query into a user variable
+ TODO: calculate a correct cost for a query with subqueries and UNIONs.
+ */
+ if (select_lex->select_number == 1)
+ thd->status_var.last_query_cost= best_read;
return res;
}
@@ -2052,6 +2054,7 @@ JOIN::optimize_inner()
{
DBUG_ENTER("JOIN::optimize_inner");
subq_exit_fl= false;
+ best_read= 0.0;
DEBUG_SYNC(thd, "before_join_optimize");
THD_STAGE_INFO(thd, stage_optimizing);
@@ -3598,7 +3601,7 @@ bool JOIN::make_aggr_tables_info()
TABLE* table= create_tmp_table(thd, curr_tab->tmp_table_param,
all_fields,
NULL, distinct,
- TRUE, select_options, HA_POS_ERROR,
+ TRUE, select_options, HA_ROWS_MAX,
&empty_clex_str, !need_tmp,
keep_row_order);
if (!table)
@@ -4243,7 +4246,7 @@ bool
JOIN::add_sorting_to_table(JOIN_TAB *tab, ORDER *order)
{
tab->filesort=
- new (thd->mem_root) Filesort(order, HA_POS_ERROR, tab->keep_current_rowid,
+ new (thd->mem_root) Filesort(order, HA_ROWS_MAX, tab->keep_current_rowid,
tab->select);
if (!tab->filesort)
return true;
@@ -5280,7 +5283,6 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
DYNAMIC_ARRAY *keyuse_array)
{
int error= 0;
- TABLE *UNINIT_VAR(table); /* inited in all loops */
uint i,table_count,const_count,key;
uint sort_space;
table_map found_const_table_map, all_table_map;
@@ -5341,8 +5343,9 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
for (s= stat, i= 0; (tables= ti++); s++, i++)
{
TABLE_LIST *embedding= tables->embedding;
+ TABLE *table= tables->table;
stat_vector[i]=s;
- table_vector[i]=s->table=table=tables->table;
+ table_vector[i]= s->table= table;
s->tab_list= tables;
table->pos_in_table_list= tables;
error= tables->fetch_number_of_rows();
@@ -5475,7 +5478,7 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
for (s= stat ; s < stat_end ; s++)
{
- table= s->table;
+ TABLE *table= s->table;
for (JOIN_TAB *t= stat ; t < stat_end ; t++)
{
if (t->dependent & table->map)
@@ -5579,7 +5582,7 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
for (JOIN_TAB **pos=stat_vector+const_count ; (s= *pos) ; pos++)
{
- table=s->table;
+ TABLE *table= s->table;
if (table->is_filled_at_execution())
continue;
@@ -5632,7 +5635,7 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
(*s->on_expr_ref)->is_expensive()))
{ // system table
int tmp= 0;
- s->type=JT_SYSTEM;
+ s->type= JT_SYSTEM;
join->const_table_map|=table->map;
set_position(join,const_count++,s,(KEYUSE*) 0);
if ((tmp= join_read_const_table(join->thd, s,
@@ -5835,19 +5838,20 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
s->startup_cost= 0;
if (s->type == JT_SYSTEM || s->type == JT_CONST)
{
-
Json_writer_object table_records(thd);
- /* Only one matching row */
- s->found_records= s->records= 1;
- s->records_out= 1.0;
+ ha_rows records= 1;
+ if (s->type == JT_SYSTEM || s->table->file->stats.records == 0)
+ records= s->table->file->stats.records;
+ /* zero or one matching row */
+ s->records= s->found_records= records;
+ s->records_init= s->records_out= rows2double(records);
s->read_time=1.0;
s->worst_seeks=1.0;
- table_records.add_table_name(s)
- .add("rows", s->found_records)
- .add("cost", s->read_time)
- .add("table_type", s->type == JT_CONST ?
- "const" :
- "system");
+ table_records.add_table_name(s).
+ add("rows", s->found_records).
+ add("cost", s->read_time).
+ add("table_type", s->type == JT_CONST ?
+ "const" : "system");
continue;
}
/*
@@ -5899,7 +5903,7 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
s->table->pos_in_table_list->is_materialized_derived())) // (3)
{
bool impossible_range= FALSE;
- ha_rows records= HA_POS_ERROR;
+ ha_rows records= HA_ROWS_MAX;
SQL_SELECT *select= 0;
Item **sargable_cond= NULL;
if (!s->const_keys.is_clear_all())
@@ -5966,6 +5970,7 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
}
else
{
+ double records= 1;
join->const_table_map|= s->table->map;
set_position(join,const_count++,s,(KEYUSE*) 0);
s->type= JT_CONST;
@@ -5976,7 +5981,10 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
s->info= ET_IMPOSSIBLE_ON_CONDITION;
found_const_table_map|= s->table->map;
mark_as_null_row(s->table); // All fields are NULL
+ records= 0;
}
+ s->records_init= s->records_out= records;
+ s->found_records= s->records= (ha_rows)records;
}
}
if (records != HA_POS_ERROR)
@@ -6050,7 +6058,8 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
memcpy((uchar*) join->best_positions,(uchar*) join->positions,
sizeof(POSITION)*join->const_tables);
join->join_record_count= 1.0;
- join->best_read=1.0;
+ /* Const tables are part of optimizer setup and not counted in cost */
+ join->best_read=0.0;
}
if (!(join->select_options & SELECT_DESCRIBE) &&
@@ -6065,7 +6074,7 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
for (i= 0; i < join->table_count ; i++)
if (double rr= join->best_positions[i].records_read)
records= COST_MULT(records, rr);
- rows= records > (double) HA_ROWS_MAX ? HA_ROWS_MAX : (ha_rows) records;
+ rows= double_to_rows(records);
set_if_smaller(rows, unit->lim.get_select_limit());
join->select_lex->increase_derived_records(rows);
}
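As an aside, the new double_to_rows() helper used above replaces the open-coded clamp against HA_ROWS_MAX. A standalone sketch of its behavior (the typedef and sentinel are assumptions mirroring the server's definitions, not copied from this patch):

```cpp
#include <cassert>
#include <cstdint>

typedef uint64_t ha_rows;                 // assumed: matches MariaDB's ha_rows
static const ha_rows HA_ROWS_MAX= ~(ha_rows) 0;  // assumed sentinel value

// Spirit of double_to_rows(): clamp a floating-point row estimate into
// ha_rows, saturating at HA_ROWS_MAX instead of overflowing on conversion.
static ha_rows double_to_rows_sketch(double records)
{
  if (records >= (double) HA_ROWS_MAX)
    return HA_ROWS_MAX;
  return (ha_rows) records;
}
```

The point of centralizing this is that every caller previously repeated the `records > (double) HA_ROWS_MAX ? HA_ROWS_MAX : (ha_rows) records` ternary by hand.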
@@ -7707,8 +7716,9 @@ void set_position(JOIN *join,uint idx,JOIN_TAB *table,KEYUSE *key)
{
join->positions[idx].table= table;
join->positions[idx].key=key;
- join->positions[idx].records_read=1.0; /* This is a const table */
- join->positions[idx].records_out=1.0; /* This is a const table */
+ join->positions[idx].records_read=1.0; /* This is a const table */
+ join->positions[idx].records_out=1.0; /* This is a const table */
+ join->positions[idx].records_init=1.0; /* This is a const table */
join->positions[idx].cond_selectivity= 1.0;
join->positions[idx].ref_depend_map= 0;
@@ -7761,7 +7771,7 @@ void set_position(JOIN *join,uint idx,JOIN_TAB *table,KEYUSE *key)
TODO:
Extend with_found_constraint' to be set for a top level expression of type
X=Y where X and Y has fields from current table and at least one field from
- one o more previous tables.
+ one or more previous tables.
@see also
table_after_join_selectivity() produces selectivity of condition that is
@@ -7861,37 +7871,29 @@ INDEX_READ_COST cost_for_index_read(const THD *thd, const TABLE *table,
DBUG_ENTER("cost_for_index_read");
rows_adjusted= MY_MIN(rows2double(records), (double) thd->variables.max_seeks_for_key);
+ set_if_bigger(rows_adjusted, 1);
+
#ifdef OLD_CODE_LIMITED_SEEKS
set_if_smaller(rows_adjusted, worst_seeks);
#endif
if (file->is_clustering_key(key))
{
- cost.index_only_cost= file->ha_read_time(key, 1, (ha_rows)rows_adjusted);
- /*
- Same computation as in ha_read_and_copy_time()
- We do it explicitely here as we want to use the original value of
- records to compute the record copy cost.
- */
- cost.read_cost= (cost.index_only_cost +
- rows2double(records) * ROW_COPY_COST_THD(thd));
+ cost.index_only_cost=
+ file->ha_keyread_clustered_and_copy_time(key, 1, rows_adjusted, 0);
+ /* There is no 'index_only_read' with a clustered index */
+ cost.read_cost= cost.index_only_cost;
}
else if (table->covering_keys.is_set(key) && !table->no_keyread)
{
- cost.index_only_cost= file->ha_keyread_time(key, 1, (ha_rows)rows_adjusted);
+ cost.index_only_cost= file->ha_keyread_time(key, 1, rows_adjusted, 0);
/* Same computation as in ha_keyread_and_copy_time() */
cost.read_cost= (cost.index_only_cost +
- rows2double(records) * KEY_COPY_COST_THD(thd));
+ rows2double(records) * file->KEY_COPY_COST);
}
else
{
- cost.index_only_cost= file->ha_keyread_time(key, 1, (ha_rows) rows_adjusted);
- /*
- Note that ha_read_time() + ..ROW_COPY_COST should be same
- as ha_rnd_pos_time().
- */
- cost.read_cost= (cost.index_only_cost +
- file->ha_read_time(key, 0, (ha_rows)rows_adjusted) +
- rows2double(records) * ROW_COPY_COST_THD(thd));
+ cost.index_only_cost= file->ha_keyread_time(key, 1, rows_adjusted, 0);
+ cost.read_cost= (cost.index_only_cost + file->ha_rnd_pos_time(records));
}
DBUG_PRINT("statistics", ("index_cost: %.3f full_cost: %.3f",
cost.index_only_cost, cost.read_cost));
@@ -7960,8 +7962,8 @@ apply_filter(THD *thd, TABLE *table, double *cost, double *records_arg,
read even if selectivity (and thus new_records) would be very low.
*/
new_cost= (MY_MAX(cost_of_accepted_rows,
- ranges * KEY_LOOKUP_COST * io_cost *
- table->file->optimizer_cache_cost) +
+ ranges * table->file->KEY_LOOKUP_COST +
+ ranges * io_cost * table->file->DISK_READ_RATIO) +
cost_of_rejected_rows + filter_lookup_cost);
new_total_cost= ((new_cost + new_records * WHERE_COST_THD(thd)) *
prev_records + filter_startup_cost);
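The reshaped new_cost expression above charges the engine-specific KEY_LOOKUP_COST per range directly and scales only the IO term by DISK_READ_RATIO, instead of multiplying both by the old optimizer_cache_cost. A minimal standalone sketch of that formula (the constant values are illustrative placeholders, not the server's tuned defaults):

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-engine constants, in milliseconds (this patch stores all
// costs internally in ms); real values come from the OPTIMIZER_COSTS struct.
static const double KEY_LOOKUP_COST= 0.000435;
static const double DISK_READ_RATIO= 0.02;

// Mirrors the shape of new_cost in apply_filter(): the accepted-row cost
// cannot drop below the key lookups plus the IO needed to read the ranges,
// even if the filter rejects almost every row.
static double filter_access_cost(double cost_of_accepted_rows,
                                 double cost_of_rejected_rows,
                                 double filter_lookup_cost,
                                 double ranges, double io_cost)
{
  double floor_cost= (ranges * KEY_LOOKUP_COST +
                      ranges * io_cost * DISK_READ_RATIO);
  return (std::max(cost_of_accepted_rows, floor_cost) +
          cost_of_rejected_rows + filter_lookup_cost);
}
```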
@@ -8025,6 +8027,24 @@ apply_filter(THD *thd, TABLE *table, double *cost, double *records_arg,
None
*/
+struct best_plan
+{
+ double cost; // Smallest cost found
+ double records; // Old 'Records'
+ double records_read; // Records accessed
+ double records_after_filter; // Records_read + filter
+ double records_out; // Smallest record count seen
+ Range_rowid_filter_cost_info *filter; // Best filter
+ KEYUSE *key; // Best key
+ SplM_plan_info *spl_plan;
+ table_map ref_depends_map;
+ enum join_type type;
+ uint forced_index;
+ uint max_key_part;
+ bool uses_jbuf;
+};
+
+
void
best_access_path(JOIN *join,
JOIN_TAB *s,
@@ -8040,14 +8060,7 @@ best_access_path(JOIN *join,
uint use_cond_selectivity=
thd->variables.optimizer_use_condition_selectivity;
TABLE *table= s->table;
- KEYUSE *best_key= 0;
- uint best_max_key_part= 0;
- uint best_forced_index= MAX_KEY, forced_index= MAX_KEY;
my_bool found_constraint= 0;
- double best_cost= DBL_MAX;
- double records= DBL_MAX;
- double records_out= table->stat_records() * table->cond_selectivity;
- table_map best_ref_depends_map= 0;
/*
key_dependent is 0 if all key parts could be used or if there was an
EQ_REF table found (which uses all key parts). In other words, we cannot
@@ -8055,18 +8068,29 @@ best_access_path(JOIN *join,
Otherwise it's a bitmap of tables that could improve key usage.
*/
table_map key_dependent= 0;
- Range_rowid_filter_cost_info *best_filter= 0;
double tmp;
ha_rows rec;
- bool best_uses_jbuf= FALSE;
MY_BITMAP *eq_join_set= &s->table->eq_join_set;
KEYUSE *hj_start_key= 0;
- SplM_plan_info *spl_plan= 0;
- enum join_type best_type= JT_UNKNOWN, type= JT_UNKNOWN;
Loose_scan_opt loose_scan_opt;
+ struct best_plan best;
Json_writer_object trace_wrapper(thd, "best_access_path");
DBUG_ENTER("best_access_path");
+ best.cost= DBL_MAX;
+ best.records= DBL_MAX;
+ best.records_read= DBL_MAX;
+ best.records_after_filter= DBL_MAX;
+ best.records_out= table->stat_records() * table->cond_selectivity;
+ best.filter= 0;
+ best.key= 0;
+ best.max_key_part= 0;
+ best.type= JT_UNKNOWN;
+ best.forced_index= MAX_KEY;
+ best.ref_depends_map= 0;
+ best.uses_jbuf= FALSE;
+ best.spl_plan= 0;
+
disable_jbuf= disable_jbuf || idx == join->const_tables;
trace_wrapper.add_table_name(s);
@@ -8076,7 +8100,7 @@ best_access_path(JOIN *join,
loose_scan_opt.init(join, s, remaining_tables);
if (table->is_splittable())
- spl_plan= s->choose_best_splitting(record_count, remaining_tables);
+ best.spl_plan= s->choose_best_splitting(record_count, remaining_tables);
if (unlikely(thd->trace_started()))
{
@@ -8087,10 +8111,10 @@ best_access_path(JOIN *join,
if (s->keyuse)
{ /* Use key if possible */
- KEYUSE *keyuse;
- KEYUSE *start_key=0;
- double best_records= DBL_MAX, index_only_cost= DBL_MAX;
+ KEYUSE *keyuse, *start_key= 0;
+ double index_only_cost= DBL_MAX;
uint max_key_part=0;
+ enum join_type type= JT_UNKNOWN;
/* Test how we can use keys */
rec= s->records/MATCHING_ROWS_IN_OTHER_TABLE; // Assumed records/key
@@ -8112,7 +8136,7 @@ best_access_path(JOIN *join,
key_part_map ref_or_null_part= 0;
key_part_map all_parts= 0;
double startup_cost= s->startup_cost;
- double records_after_filter;
+ double records_after_filter, records_best_filter, records;
Range_rowid_filter_cost_info *filter= 0;
if (is_hash_join_key_no(key))
@@ -8343,7 +8367,6 @@ best_access_path(JOIN *join,
((double) (table->s->max_key_length-keyinfo->key_length) /
(double) table->s->max_key_length)));
set_if_smaller(records, (double)s->records);
- set_if_smaller(records_out, records);
if (records < 2.0)
records=2.0; /* Can't be as good as a unique */
}
@@ -8410,6 +8433,8 @@ best_access_path(JOIN *join,
(!(table->file->index_flags(key, 0, 0) & HA_ONLY_WHOLE_INDEX) ||
found_part == PREV_BITS(uint,keyinfo->user_defined_key_parts)))
{
+ double extra_cost= 0;
+
max_key_part= max_part_bit(found_part);
/*
ReuseRangeEstimateForRef-3:
@@ -8534,7 +8559,7 @@ best_access_path(JOIN *join,
a*keyinfo->user_defined_key_parts - rec_per_key)/
(keyinfo->user_defined_key_parts-1);
else
- records= a;
+ records= rows2double(s->records);
set_if_bigger(records, MIN_ROWS_AFTER_FILTERING);
}
}
@@ -8543,6 +8568,7 @@ best_access_path(JOIN *join,
{
/* We need to do two key searches to find row */
records *= 2.0;
+ extra_cost= s->table->file->KEY_LOOKUP_COST;
}
/*
@@ -8571,13 +8597,15 @@ best_access_path(JOIN *join,
}
}
+ /* Limit the number of matched rows */
+ set_if_smaller(records, (double) s->records);
tmp= records;
set_if_smaller(tmp, (double) thd->variables.max_seeks_for_key);
INDEX_READ_COST cost= cost_for_index_read(thd, table, key,
(ha_rows) tmp,
(ha_rows) s->worst_seeks);
tmp= cost.read_cost;
- index_only_cost= cost.index_only_cost;
+ index_only_cost= cost.index_only_cost+extra_cost;
}
else
{
@@ -8599,7 +8627,7 @@ best_access_path(JOIN *join,
if (records == DBL_MAX) // Key not usable
continue;
- records_after_filter= records;
+ records_best_filter= records_after_filter= records;
/*
Check that start_key->key can be used for index access
@@ -8661,7 +8689,8 @@ best_access_path(JOIN *join,
tmp,
index_only_cost,
record_count,
- &records_out);
+ &records_best_filter);
+ set_if_smaller(best.records_out, records_best_filter);
if (filter)
filter= filter->apply_filter(thd, table, &tmp, &records_after_filter,
&startup_cost,
@@ -8682,20 +8711,31 @@ best_access_path(JOIN *join,
The COST_EPS is here to ensure we use the first key if there are
two 'identical keys' that could be used.
*/
- if (tmp + COST_EPS < best_cost)
+ if (tmp + COST_EPS < best.cost)
{
trace_access_idx.add("chosen", true);
- best_cost= tmp;
+ best.cost= tmp;
/*
We use 'records' instead of 'records_after_filter' here as we want
to have EXPLAIN print the number of rows found by the key access.
*/
- best_records= records; // Records before filter!
- best_key= start_key;
- best_max_key_part= max_key_part;
- best_ref_depends_map= found_ref;
- best_filter= filter;
- best_type= type;
+ best.records= records; // Records before filter!
+ best.records_read= records;
+ /*
+ If we are using 'use_cond_selectivity > 1' then
+ table_after_join_selectivity() may take into account other
+ filters than what is currently used, so we have to use
+ records_after_filter. If 'use_cond_selectivity <= 1' then we
+ can use information from the best filter.
+ */
+ best.records_after_filter= ((use_cond_selectivity > 1) ?
+ records_after_filter :
+ records_best_filter);
+ best.key= start_key;
+ best.max_key_part= max_key_part;
+ best.ref_depends_map= found_ref;
+ best.filter= filter;
+ best.type= type;
}
else if (unlikely(thd->trace_started()))
{
@@ -8703,9 +8743,8 @@ best_access_path(JOIN *join,
add("chosen", false).
add("cause", cause ? cause : "cost");
}
- set_if_smaller(records_out, records);
+ set_if_smaller(best.records_out, records);
} /* for each key */
- records= best_records;
}
else
{
@@ -8728,7 +8767,7 @@ best_access_path(JOIN *join,
/* Add dependency for sub queries */
key_dependent|= s->embedded_dependent;
- } /* if (s->keyuse) */
+ } /* if (s->keyuse) */
/* Check that s->key_dependent contains all used_tables found in s->keyuse */
@@ -8744,7 +8783,7 @@ best_access_path(JOIN *join,
(1) s is inner table of semi-join -> join cache is allowed for semijoins
(2) s is inner table of outer join -> join cache is allowed for outer joins
*/
- if (idx > join->const_tables && best_key == 0 &&
+ if (idx > join->const_tables && best.key == 0 &&
(join->allowed_join_cache_types & JOIN_CACHE_HASHED_BIT) &&
join->max_allowed_join_cache_level > 2 &&
!bitmap_is_clear_all(eq_join_set) && !disable_jbuf &&
@@ -8753,11 +8792,11 @@ best_access_path(JOIN *join,
(!(table->map & join->outer_join) ||
join->allowed_outer_join_with_cache)) // (2)
{
- double refills, cmp_time;
+ double refills, row_copy_cost, cmp_time;
/* Estimate the cost of the hash join access to the table */
- double rnd_records= matching_candidates_in_table(s, found_constraint,
+ double rnd_records= matching_candidates_in_table(s, 0,
use_cond_selectivity);
- set_if_smaller(records_out, rnd_records);
+ set_if_smaller(best.records_out, rnd_records);
/*
The following cost calculation is identical to the cost calculation for
@@ -8786,18 +8825,22 @@ best_access_path(JOIN *join,
We assume here that, thanks to the hash, we don't have to compare all
row combinations, only a HASH_FANOUT (10%) rows in the cache.
*/
- cmp_time= (rnd_records * record_count * HASH_FANOUT *
- (ROW_COPY_COST_THD(thd) * JOIN_CACHE_ROW_COPY_COST_FACTOR +
+ row_copy_cost= (ROW_COPY_COST_THD(thd) * 2 *
+ JOIN_CACHE_ROW_COPY_COST_FACTOR(thd));
+ cmp_time= (record_count * row_copy_cost +
+ rnd_records * record_count * HASH_FANOUT *
+ ((idx - join->const_tables) * row_copy_cost +
WHERE_COST_THD(thd)));
tmp= COST_ADD(tmp, cmp_time);
- best_cost= tmp;
- records= rnd_records;
- best_key= hj_start_key;
- best_ref_depends_map= 0;
- best_uses_jbuf= TRUE;
- best_filter= 0;
- best_type= JT_HASH;
+ best.cost= tmp;
+ best.records_read= best.records_after_filter= rows2double(s->records);
+ best.records= rnd_records;
+ best.key= hj_start_key;
+ best.ref_depends_map= 0;
+ best.uses_jbuf= TRUE;
+ best.filter= 0;
+ best.type= JT_HASH;
Json_writer_object trace_access_hash(thd);
if (unlikely(trace_access_hash.trace_started()))
trace_access_hash.
@@ -8805,7 +8848,7 @@ best_access_path(JOIN *join,
add("index", "hj-key").
add("rows", rnd_records).
add("refills", refills).
- add("cost", best_cost).
+ add("cost", best.cost).
add("chosen", true);
}
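The new hash-join cmp_time above now separates two terms: one row-copy per cached record combination, plus, for the HASH_FANOUT fraction of combinations the hash does not prune, copying the (idx - const_tables) cached tables and evaluating the WHERE. A standalone sketch of that shape (parameter names are illustrative; row_copy_cost stands in for ROW_COPY_COST_THD * 2 * JOIN_CACHE_ROW_COPY_COST_FACTOR):

```cpp
#include <cassert>

static const double HASH_FANOUT= 0.1;  // fraction of cache rows compared

// Shape of the cmp_time term in the hash-join branch of best_access_path():
// copy every cached combination once, then pay per surviving combination.
static double hash_cmp_time(double record_count, double rnd_records,
                            unsigned cached_tables,
                            double row_copy_cost, double where_cost)
{
  return record_count * row_copy_cost +
         rnd_records * record_count * HASH_FANOUT *
         (cached_tables * row_copy_cost + where_cost);
}
```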
@@ -8845,21 +8888,25 @@ best_access_path(JOIN *join,
be used for cases with small datasets, which is annoying.
*/
Json_writer_object trace_access_scan(thd);
- if ((records >= s->found_records || best_cost > s->read_time) && // (1)
- !(best_key && best_key->key == MAX_KEY) && // (2)
+ if ((best.records_read >= s->found_records ||
+ best.cost > s->read_time) && // (1)
+ !(best.key && best.key->key == MAX_KEY) && // (2)
!(s->quick &&
s->quick->get_type() != QUICK_SELECT_I::QS_TYPE_GROUP_MIN_MAX && // (2)
- best_key && s->quick->index == best_key->key && // (2)
- best_max_key_part >= table->opt_range[best_key->key].key_parts) &&// (2)
- !((table->file->ha_table_flags() & HA_TABLE_SCAN_ON_INDEX) && // (3)
- !table->covering_keys.is_clear_all() && best_key && !s->quick) &&// (3)
- !(table->force_index_join && best_key && !s->quick) && // (4)
- !(best_key && table->pos_in_table_list->jtbm_subselect)) // (5)
+ best.key && s->quick->index == best.key->key && // (2)
+ best.max_key_part >= table->opt_range[best.key->key].key_parts) &&// (2)
+ !((table->file->ha_table_flags() & HA_TABLE_SCAN_ON_INDEX) && // (3)
+ !table->covering_keys.is_clear_all() && best.key && !s->quick) &&// (3)
+ !(table->force_index_join && best.key && !s->quick) && // (4)
+ !(best.key && table->pos_in_table_list->jtbm_subselect)) // (5)
{ // Check full join
- double rnd_records, records_after_filter, org_records;
+ double records_after_filter, org_records;
+ double records_best_filter;
Range_rowid_filter_cost_info *filter= 0;
double startup_cost= s->startup_cost;
const char *scan_type= "";
+ enum join_type type;
+ uint forced_index= MAX_KEY;
/*
Range optimizer never proposes a RANGE if it isn't better
@@ -8889,7 +8936,8 @@ best_access_path(JOIN *join,
This is done to make records found comparable to what we get with
'ref' access.
*/
- org_records= records_after_filter= rnd_records= rows2double(s->found_records);
+ org_records= records_after_filter= rows2double(s->found_records);
+ records_best_filter= org_records;
if (s->quick->get_type() == QUICK_SELECT_I::QS_TYPE_RANGE)
{
@@ -8907,11 +8955,13 @@ best_access_path(JOIN *join,
range->cost / s->quick->read_time >= 0.9999999));
filter=
- table->best_range_rowid_filter_for_partial_join(key_no, rows2double(range->rows),
+ table->best_range_rowid_filter_for_partial_join(key_no,
+ rows2double(range->rows),
range->find_cost,
range->index_only_cost,
record_count,
- &records_out);
+ &records_best_filter);
+ set_if_smaller(best.records_out, records_best_filter);
if (filter)
{
double filter_cost= range->fetch_cost;
@@ -8940,20 +8990,18 @@ best_access_path(JOIN *join,
{
type= JT_INDEX_MERGE;
}
- set_if_smaller(records_out, records_after_filter);
loose_scan_opt.check_range_access(join, idx, s->quick);
}
else
{
/* We will now calculate cost of scan, with or without join buffer */
- rnd_records= matching_candidates_in_table(s, found_constraint,
- use_cond_selectivity);
- records_after_filter= rnd_records;
- set_if_smaller(records_out, rnd_records);
+ records_best_filter= records_after_filter=
+ matching_candidates_in_table(s, 0, use_cond_selectivity);
+ DBUG_ASSERT(records_after_filter <= s->records);
- org_records= rows2double(s->records);
+ set_if_smaller(best.records_out, records_after_filter);
- DBUG_ASSERT(rnd_records <= s->records);
+ org_records= rows2double(s->records);
/* Estimate cost of reading table. */
if (s->cached_forced_index_type)
@@ -8964,7 +9012,7 @@ best_access_path(JOIN *join,
}
else
{
- if (table->force_index_join && !best_key)
+ if (table->force_index_join && !best.key)
{
/*
The query is using 'forced_index' and we did not find a usable key.
@@ -9008,6 +9056,7 @@ best_access_path(JOIN *join,
tmp= s->cached_scan_and_compare_time;
type= JT_ALL;
}
+ /* Cache result for other calls */
s->cached_forced_index_type= type;
s->cached_forced_index_cost= tmp;
s->cached_forced_index= forced_index;
@@ -9034,7 +9083,7 @@ best_access_path(JOIN *join,
else
{
/* Scan trough join cache */
- double cmp_time, refills;
+ double cmp_time, row_copy_cost, refills;
/*
Calculate cost of checking the the WHERE for this table.
@@ -9052,13 +9101,16 @@ best_access_path(JOIN *join,
/* We come here only if there are already rows in the join cache */
DBUG_ASSERT(idx != join->const_tables);
/*
- Cost of moving each row from each previous table from the join cache
- to it's table record and comparing it with the found and accepted
- row.
+ Cost of:
+ - Copying all previous record combinations to the join cache
+ - Copying the tables from the join cache to table records
+ - Checking the WHERE against the final row combination
*/
- cmp_time= (rnd_records * record_count *
- (ROW_COPY_COST_THD(thd) * (idx - join->const_tables) *
- JOIN_CACHE_ROW_COPY_COST_FACTOR +
+ row_copy_cost= (ROW_COPY_COST_THD(thd) *
+ JOIN_CACHE_ROW_COPY_COST_FACTOR(thd));
+ cmp_time= (record_count * row_copy_cost +
+ records_after_filter * record_count *
+ ((idx - join->const_tables) * row_copy_cost +
WHERE_COST_THD(thd)));
tmp= COST_ADD(tmp, cmp_time);
}
@@ -9074,10 +9126,10 @@ best_access_path(JOIN *join,
trace_access_scan.
add("access_type",
type == JT_ALL ? scan_type : join_type_str[type]).
- add("rows", org_records).
- add("rows_after_scan", rnd_records).
- add("rows_after_filter", records_after_filter).
- add("cost", tmp);
+ add("rows", org_records).
+ add("rows_after_filter", records_after_filter).
+ add("rows_out", best.records_out).
+ add("cost", tmp);
if (type == JT_ALL)
{
trace_access_scan.add("index_only",
@@ -9085,27 +9137,38 @@ best_access_path(JOIN *join,
}
}
- if (tmp + COST_EPS < best_cost)
+ if (tmp + COST_EPS < best.cost)
{
/*
If the table has a range (s->quick is set) make_join_select()
will ensure that this will be used
*/
- best_cost= tmp;
- records= rnd_records;
- best_key= 0;
- best_forced_index= forced_index;
+ best.cost= tmp;
+ best.records_read= org_records; // Records accessed
+ best.records= records_after_filter; // Records to be checked with WHERE
+ /*
+ If we are using 'use_cond_selectivity > 1' then
+ table_after_join_selectivity may take into account other
+ filters than what is currently used, so we have to use
+ records_after_filter. If 'use_cond_selectivity <= 1' then we
+ can use information from the best filter.
+ */
+ best.records_after_filter= ((use_cond_selectivity > 1) ?
+ records_after_filter :
+ records_best_filter);
+ best.key= 0;
+ best.forced_index= forced_index;
/*
filter is only set if
s->quick->get_type() == QUICK_SELECT_I::QS_TYPE_RANGE
*/
- best_filter= filter;
+ best.filter= filter;
/* range/index_merge/ALL/index access method are "independent", so: */
- best_ref_depends_map= 0;
- best_uses_jbuf= MY_TEST(!disable_jbuf && !((table->map &
+ best.ref_depends_map= 0;
+ best.uses_jbuf= MY_TEST(!disable_jbuf && !((table->map &
join->outer_join)));
- spl_plan= 0;
- best_type= type;
+ best.spl_plan= 0;
+ best.type= type;
trace_access_scan.add("chosen", true);
}
else
@@ -9120,29 +9183,33 @@ best_access_path(JOIN *join,
add("cause", "cost");
}
+ crash_if_first_double_is_bigger(best.records_out, best.records);
+ crash_if_first_double_is_bigger(best.records_out, best.records_read);
+
/* Update the cost information for the current partial plan */
- crash_if_first_double_is_bigger(records_out, records);
- pos->records_read= records;
- pos->records_out= records_out;
- pos->read_time= best_cost;
- pos->key= best_key;
- pos->forced_index= best_forced_index;
- pos->type= best_type;
+ pos->records_init= best.records_read;
+ pos->records_after_filter= best.records_after_filter;
+ pos->records_read= best.records;
+ pos->records_out= best.records_out;
+ pos->read_time= best.cost;
+ pos->key= best.key;
+ pos->forced_index= best.forced_index;
+ pos->type= best.type;
pos->table= s;
- pos->ref_depend_map= best_ref_depends_map;
+ pos->ref_depend_map= best.ref_depends_map;
pos->loosescan_picker.loosescan_key= MAX_KEY;
- pos->use_join_buffer= best_uses_jbuf;
- pos->spl_plan= spl_plan;
- pos->range_rowid_filter_info= best_filter;
- pos->key_dependent= (best_type == JT_EQ_REF ? (table_map) 0 :
+ pos->use_join_buffer= best.uses_jbuf;
+ pos->spl_plan= best.spl_plan;
+ pos->range_rowid_filter_info= best.filter;
+ pos->key_dependent= (best.type == JT_EQ_REF ? (table_map) 0 :
key_dependent & remaining_tables);
loose_scan_opt.save_to_position(s, loose_scan_pos);
- if (!best_key &&
- idx == join->const_tables &&
+ if (!best.key &&
+ idx == join->const_tables && // First table
table == join->sort_by_table &&
- join->unit->lim.get_select_limit() >= records)
+ join->unit->lim.get_select_limit() >= best.records) // QQQ Why?
{
trace_access_scan.add("use_tmp_table", true);
join->sort_by_table= (TABLE*) 1; // Must use temporary table
@@ -9377,15 +9444,6 @@ choose_plan(JOIN *join, table_map join_tables, TABLE_LIST *emb_sjm_nest)
DBUG_RETURN(TRUE);
}
- /*
- Store the cost of this query into a user variable
- Don't update last_query_cost for statements that are not "flat joins" :
- i.e. they have subqueries, unions or call stored procedures.
- TODO: calculate a correct cost for a query with subqueries and UNIONs.
- */
- if (join->thd->lex->is_single_level_stmt())
- join->thd->status_var.last_query_cost= join->best_read;
-
join->emb_sjm_nest= 0;
DBUG_RETURN(FALSE);
}
@@ -9652,6 +9710,8 @@ optimize_straight_join(JOIN *join, table_map remaining_tables)
{
POSITION *position= join->positions + idx;
Json_writer_object trace_one_table(thd);
+ double original_record_count, current_record_count;
+
if (unlikely(thd->trace_started()))
trace_plan_prefix(join, idx, remaining_tables);
/* Find the best access method from 's' to the current partial plan */
@@ -9660,22 +9720,71 @@ optimize_straight_join(JOIN *join, table_map remaining_tables)
position, &loose_scan_pos);
/* Compute the cost of the new plan extended with 's' */
- record_count= COST_MULT(record_count, position->records_read);
+ current_record_count= COST_MULT(record_count, position->records_out);
read_time= COST_ADD(read_time, position->read_time);
- optimize_semi_joins(join, remaining_tables, idx, &record_count, &read_time,
- &loose_scan_pos);
+ original_record_count= current_record_count;
+ optimize_semi_joins(join, remaining_tables, idx, &current_record_count,
+ &read_time, &loose_scan_pos);
+ if (position->sj_strategy != SJ_OPT_NONE && original_record_count)
+ {
+ /* Adjust records_out to contain the final number of rows */
+ double ratio= current_record_count / original_record_count;
+ /* QQQ This is just to stop an assert later */
+ if (ratio < 1)
+ position->records_out*= ratio;
+ }
+
remaining_tables&= ~(s->table->map);
- double pushdown_cond_selectivity= 1.0;
- if (use_cond_selectivity > 1)
+ if (use_cond_selectivity > 1 && position->sj_strategy == SJ_OPT_NONE)
+ {
+ double pushdown_cond_selectivity, records_out;
pushdown_cond_selectivity= table_after_join_selectivity(join, idx, s,
- remaining_tables);
- position->cond_selectivity= pushdown_cond_selectivity;
+ remaining_tables,
+ &records_out);
+ if (unlikely(thd->trace_started()) &&
+ pushdown_cond_selectivity != 1.0)
+ {
+ trace_one_table.
+ add("pushdown_cond_selectivity", pushdown_cond_selectivity).
+ add("rows_out", records_out);
+ }
+ position->cond_selectivity= pushdown_cond_selectivity;
+ position->records_out= records_out;
+ current_record_count= COST_MULT(record_count, records_out);
+ }
+ else
+ position->cond_selectivity= 1.0;
++idx;
+ record_count= current_record_count;
}
if (join->sort_by_table &&
join->sort_by_table != join->positions[join->const_tables].table->table)
- read_time+= record_count; // We have to make a temp table
+ {
+ /*
+ We may have to make a temp table. Note that this is only a
+ heuristic, since at this point we cannot know for sure if we
+ are going to use addon fields or have to flush the sort to
+ disk. We also don't know whether the temporary table will be
+ in memory or on disk.
+ The following calculation takes a middle ground where we assume
+ we can sort the keys in memory but have to use a disk based
+ temporary table to retrieve the rows.
+ This cost is probably much bigger than it has to be...
+ */
+ double sort_cost;
+ sort_cost= (get_qsort_sort_cost((ha_rows)record_count, 0) +
+ record_count *
+ DISK_TEMPTABLE_LOOKUP_COST(thd));
+ {
+ if (unlikely(thd->trace_started()))
+ {
+ Json_writer_object trace_one_table(thd);
+ trace_one_table.add("estimated_cost_for_sorting", sort_cost);
+ }
+ }
+ read_time= COST_ADD(read_time, sort_cost);
+ }
memcpy((uchar*) join->best_positions, (uchar*) join->positions,
sizeof(POSITION)*idx);
join->join_record_count= record_count;
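The sort-cost heuristic introduced in the hunk above replaces the old `read_time += record_count` guess with an in-memory qsort of the keys plus one disk temp-table lookup per row. A standalone sketch under assumed constants (the real values of get_qsort_sort_cost() and DISK_TEMPTABLE_LOOKUP_COST are per-THD tunables, not these placeholders):

```cpp
#include <cassert>
#include <cmath>

// Illustrative costs in milliseconds; the server's values are configurable.
static const double QSORT_COMPARE_COST= 0.0001;
static const double DISK_TEMPTABLE_LOOKUP_COST= 0.01;

// Rough n*log2(n) comparison model, in the spirit of get_qsort_sort_cost().
static double qsort_cost(double rows)
{
  if (rows <= 1.0)
    return 0.0;
  return rows * std::log2(rows) * QSORT_COMPARE_COST;
}

// Middle-ground heuristic from optimize_straight_join(): sort the keys in
// memory, then fetch each row back from a disk-based temporary table.
static double temp_sort_cost(double record_count)
{
  return qsort_cost(record_count) +
         record_count * DISK_TEMPTABLE_LOOKUP_COST;
}
```

As the patch comment itself concedes, this is deliberately pessimistic; the per-row disk lookup dominates for large record counts.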
@@ -10054,8 +10163,7 @@ double JOIN::get_examined_rows()
COST_MULT((double) (tab->get_examined_rows()), prev_fanout));
prev_tab= tab;
}
- examined_rows= (double)
- (records > (double) HA_ROWS_MAX ? HA_ROWS_MAX : (ha_rows) records);
+ examined_rows= double_to_rows(records);
return examined_rows;
}
@@ -10186,9 +10294,10 @@ double table_multi_eq_cond_selectivity(JOIN *join, uint idx, JOIN_TAB *s,
@brief
Get the selectivity of conditions when joining a table
- @param join The optimized join
- @param s The table to be joined for evaluation
- @param rem_tables The bitmap of tables to be joined later
+ @param join The optimized join
+ @param s The table to be joined for evaluation
+ @param rem_tables The bitmap of tables to be joined later
+ @param new_records_out OUT Set to number of rows accepted
@detail
Get selectivity of conditions that can be applied when joining this table
@@ -10202,12 +10311,14 @@ double table_multi_eq_cond_selectivity(JOIN *join, uint idx, JOIN_TAB *s,
condition, "COND(this_table) AND COND(this_table, previous_tables)".
@retval
- selectivity of the conditions imposed on the rows of s
+ selectivity of the conditions imposed on the rows of s, relative to
+ the rows that we are expected to read (position->records_init).
*/
static
double table_after_join_selectivity(JOIN *join, uint idx, JOIN_TAB *s,
- table_map rem_tables)
+ table_map rem_tables,
+ double *new_records_out)
{
uint16 ref_keyuse_steps_buf[MAX_REF_PARTS];
uint ref_keyuse_size= MAX_REF_PARTS;
@@ -10215,13 +10326,14 @@ double table_after_join_selectivity(JOIN *join, uint idx, JOIN_TAB *s,
Field *field;
TABLE *table= s->table;
MY_BITMAP *read_set= table->read_set;
- double sel= table->cond_selectivity;
POSITION *pos= &join->positions[idx];
+ double sel, records_out= pos->records_out;
uint keyparts= 0;
uint found_part_ref_or_null= 0;
if (pos->key != 0)
{
+ sel= table->cond_selectivity;
/*
A ref access or hash join is used for this table. ref access is created
from
@@ -10395,35 +10507,22 @@ double table_after_join_selectivity(JOIN *join, uint idx, JOIN_TAB *s,
}
keyuse++;
}
- }
- else
- {
/*
- The table is accessed with full table scan, or quick select.
- Selectivity of COND(table) is already accounted for in
- matching_candidates_in_table().
- */
- sel= 1.0;
- }
+ If the field f from the table is equal to a field from one of the
+ earlier joined tables then the selectivity of the range conditions
+ over the field f must be discounted.
- /*
- If the field f from the table is equal to a field from one the
- earlier joined tables then the selectivity of the range conditions
- over the field f must be discounted.
-
- We need to discount selectivity only if we're using ref-based
- access method (and have sel!=1).
- If we use ALL/range/index_merge, then sel==1, and no need to discount.
- */
- if (pos->key != NULL)
- {
+ We need to discount selectivity only if we're using ref-based
+ access method (and have sel!=1).
+ If we use ALL/range/index_merge, then sel==1, and no need to discount.
+ */
for (Field **f_ptr=table->field ; (field= *f_ptr) ; f_ptr++)
{
if (!bitmap_is_set(read_set, field->field_index) ||
!field->next_equal_field)
- continue;
- for (Field *next_field= field->next_equal_field;
- next_field != field;
+ continue;
+ for (Field *next_field= field->next_equal_field;
+ next_field != field;
next_field= next_field->next_equal_field)
{
if (!(next_field->table->map & rem_tables) &&
@@ -10438,14 +10537,39 @@ double table_after_join_selectivity(JOIN *join, uint idx, JOIN_TAB *s,
}
}
}
+ /*
+ We have now calculated a more exact 'records_out' taking more index
+ costs into account.
+ pos->records_out previously contained the smallest record count for
+ all range or ref access, which should not be smaller than what we
+ calculated above.
+ */
+ records_out= pos->records_after_filter * sel;
+ set_if_smaller(records_out, pos->records_out);
}
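The adjustment above (compute `records_after_filter * sel`, then never exceed the smallest estimate already stored in `pos->records_out`) can be sketched as a standalone helper. This is a hypothetical illustration, not the actual MariaDB function; `set_if_smaller()` in the real code is the clamp:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch: apply the condition selectivity to the
// post-filter row estimate, but never let the result exceed the
// smallest estimate already found for any range/ref access.
static double adjusted_records_out(double records_after_filter,
                                   double selectivity,
                                   double current_records_out)
{
  double records_out= records_after_filter * selectivity;
  // set_if_smaller(records_out, current_records_out) in MariaDB terms
  return std::min(records_out, current_records_out);
}
```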
- sel*= table_multi_eq_cond_selectivity(join, idx, s, rem_tables,
+ sel= table_multi_eq_cond_selectivity(join, idx, s, rem_tables,
keyparts, ref_keyuse_steps);
+ records_out*= sel;
+
+ /*
+ Update sel to be relative pos->records_read as that is what some old
+ code expects. Newer code should just use 'position->records_out' instead.
+ */
+ if (pos->records_read == 0)
+ sel= 1.0;
+ else
+ {
+ sel= records_out / pos->records_read;
+ DBUG_ASSERT(sel >= 0.0 and sel <= 1.00001);
+ if (sel > 1.0)
+ sel= 1.0;
+ }
+
exit:
+ *new_records_out= records_out;
if (ref_keyuse_steps != ref_keyuse_steps_buf)
my_free(ref_keyuse_steps);
- DBUG_ASSERT(sel >= 0.0 and sel <= 1.0);
return sel;
}
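The conversion near the end of the function — turning the absolute `records_out` back into a selectivity relative to `pos->records_read`, clamped to [0, 1] — can be sketched as follows (a hypothetical helper mirroring the logic above, not MariaDB's own code):

```cpp
#include <cassert>

// Hypothetical sketch: express records_out as a selectivity relative
// to records_read, for older code that expects a [0, 1] factor.
static double relative_selectivity(double records_out, double records_read)
{
  if (records_read == 0)
    return 1.0;                     // avoid division by zero
  double sel= records_out / records_read;
  return sel > 1.0 ? 1.0 : sel;    // clamp rounding overshoot
}
```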
@@ -10464,7 +10588,7 @@ check_if_edge_table(POSITION *pos,
if ((pos->type == JT_EQ_REF ||
(pos->type == JT_REF &&
- pos->records_read == 1 &&
+ pos->records_init == 1 &&
!pos->range_rowid_filter_info)) &&
pushdown_cond_selectivity >= 0.999)
return SEARCH_FOUND_EDGE;
@@ -10657,7 +10781,7 @@ get_costs_for_tables(JOIN *join, table_map remaining_tables, uint idx,
// pplan_cost already too great, stop search
continue;
- pplan= expand pplan by best_access_method;
+ pplan= expand plan by best_access_method;
remaining_tables= remaining_tables - table T;
if (remaining_tables is not an empty set
and
@@ -10728,8 +10852,8 @@ best_extension_by_limited_search(JOIN *join,
{
THD *thd= join->thd;
/*
- 'join' is a partial plan with lower cost than the best plan so far,
- so continue expanding it further with the tables in 'remaining_tables'.
+ 'join' is a partial plan with lower cost than the best plan so far,
+ so continue expanding it further with the tables in 'remaining_tables'.
*/
JOIN_TAB *s;
double best_record_count= DBL_MAX;
@@ -10746,14 +10870,14 @@ best_extension_by_limited_search(JOIN *join,
if (dbug_user_var_equals_int(thd,
"show_explain_probe_select_id",
join->select_lex->select_number))
- dbug_serve_apcs(thd, 1);
- );
+ dbug_serve_apcs(thd, 1);
+ );
if (unlikely(thd->check_killed())) // Abort
DBUG_RETURN(SEARCH_ABORT);
DBUG_EXECUTE("opt", print_plan(join, idx, record_count, read_time, read_time,
- "part_plan"););
+ "part_plan"););
status_var_increment(thd->status_var.optimizer_join_prefixes_check_calls);
if (join->emb_sjm_nest)
@@ -10842,7 +10966,7 @@ best_extension_by_limited_search(JOIN *join,
!check_interleaving_with_nj(s))
{
table_map real_table_bit= s->table->map;
- double current_record_count, current_read_time;
+ double current_record_count, current_read_time, original_record_count;
double partial_join_cardinality;
POSITION *position= join->positions + idx, *loose_scan_pos;
double pushdown_cond_selectivity;
@@ -10859,7 +10983,7 @@ best_extension_by_limited_search(JOIN *join,
loose_scan_pos= pos->position+1;
/* Compute the cost of the new plan extended with 's' */
- current_record_count= COST_MULT(record_count, position->records_read);
+ current_record_count= COST_MULT(record_count, position->records_out);
current_read_time= COST_ADD(read_time, position->read_time);
if (unlikely(trace_one_table.trace_started()))
@@ -10868,9 +10992,22 @@ best_extension_by_limited_search(JOIN *join,
add("rows_for_plan", current_record_count).
add("cost_for_plan", current_read_time);
}
+ original_record_count= current_record_count;
optimize_semi_joins(join, remaining_tables, idx, &current_record_count,
&current_read_time, loose_scan_pos);
-
+ if (position->sj_strategy != SJ_OPT_NONE)
+ {
+ /* Adjust records_out and current_record_count after semi join */
+ double ratio= current_record_count / original_record_count;
+ /* QQQ This is just to stop an assert later */
+ if (ratio < 1.0)
+ position->records_out*= ratio;
+ if (unlikely(trace_one_table.trace_started()))
+ {
+ trace_one_table.add("sj_rows_out", position->records_out);
+ trace_one_table.add("sj_rows_for_plan", current_record_count);
+ }
+ }
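The semi-join adjustment above scales `records_out` by the fanout reduction that `optimize_semi_joins()` achieved. A minimal sketch of that ratio logic (hypothetical helper, names are not from the patch):

```cpp
#include <cassert>

// Hypothetical sketch: if the semi-join strategy reduced the row
// count for the plan, scale this position's records_out by the same
// ratio (ratio >= 1.0 means no reduction, so leave it unchanged).
static double sj_adjusted_records_out(double records_out,
                                      double current_record_count,
                                      double original_record_count)
{
  double ratio= current_record_count / original_record_count;
  if (ratio < 1.0)
    records_out*= ratio;
  return records_out;
}
```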
/* Expand only partial plans with lower cost than the best QEP so far */
if (current_read_time + COST_EPS >= join->best_read)
{
@@ -10921,15 +11058,15 @@ best_extension_by_limited_search(JOIN *join,
if (best_record_count > current_record_count ||
best_read_time > current_read_time ||
(idx == join->const_tables && // 's' is the first table in the QEP
- s->table == join->sort_by_table))
+ s->table == join->sort_by_table))
{
/*
Store the current record count and cost as the best
possible cost at this level if the following holds:
- It's the lowest record number and cost so far
- - There is no remaing table that could improve index usage
- or we found an EQ_REF or REF key with less than 2
- matching records (good enough).
+ - There is no remaining table that could improve index usage
+ or we found an EQ_REF or REF key with less than 2
+ matching records (good enough).
*/
if (best_record_count >= current_record_count &&
best_read_time >= current_read_time &&
@@ -10981,17 +11118,26 @@ best_extension_by_limited_search(JOIN *join,
}
pushdown_cond_selectivity= 1.0;
- if (use_cond_selectivity > 1)
+ /*
+ TODO: When a semi-join strategy is applied (sj_strategy!=SJ_OPT_NONE),
+ we should account for selectivity from table_after_join_selectivity().
+ (Condition filtering is performed before the semi-join removes some
+ fanout so this might require moving the code around)
+ */
+ if (use_cond_selectivity > 1 && position->sj_strategy == SJ_OPT_NONE)
+ {
pushdown_cond_selectivity=
table_after_join_selectivity(join, idx, s,
- remaining_tables & ~real_table_bit);
+ remaining_tables & ~real_table_bit,
+ &position->records_out);
+ }
join->positions[idx].cond_selectivity= pushdown_cond_selectivity;
- partial_join_cardinality= (current_record_count *
- pushdown_cond_selectivity);
+ partial_join_cardinality= record_count * position->records_out;
- if (unlikely(thd->trace_started()) && pushdown_cond_selectivity < 1.0)
+ if (unlikely(thd->trace_started()) && pushdown_cond_selectivity < 1.0 &&
+ partial_join_cardinality < current_record_count)
trace_one_table
.add("selectivity", pushdown_cond_selectivity)
.add("estimated_join_cardinality", partial_join_cardinality);
@@ -11036,11 +11182,21 @@ best_extension_by_limited_search(JOIN *join,
{
/*
We may have to make a temp table, note that this is only a
- heuristic since we cannot know for sure at this point.
- Hence it may be wrong.
+ heuristic since we cannot know for sure at this point if we
+ are going to use addon fields or have to flush the sort to
+ disk. We also don't know whether the temporary table will be
+ in memory or on disk.
+ The following calculation takes a middle ground where we assume
+ we can sort the keys in memory but have to use a disk based
+ temporary table to retrieve the rows.
+ This cost is probably much bigger than it has to be...
*/
- trace_one_table.add("cost_for_sorting", current_record_count);
- current_read_time= COST_ADD(current_read_time, current_record_count);
+ double sort_cost;
+ sort_cost= (get_qsort_sort_cost((ha_rows)current_record_count,0) +
+ current_record_count *
+ DISK_TEMPTABLE_LOOKUP_COST(thd));
+ trace_one_table.add("cost_for_sorting", sort_cost);
+ current_read_time= COST_ADD(current_read_time, sort_cost);
}
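The new sorting-cost formula above combines an in-memory qsort component with one disk-based temporary-table lookup per row. A rough sketch, approximating `get_qsort_sort_cost()` as n*log2(n) comparisons — the constants and helper name here are illustrative, not MariaDB's:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical sketch of the middle-ground sort cost: sort the keys
// in memory (~ n*log2(n) comparisons at cmp_cost each), then fetch
// each row from a disk-based temporary table.
static double sort_cost_estimate(double rows,
                                 double cmp_cost,
                                 double temptable_lookup_cost)
{
  double qsort_cost= rows * std::log2(std::max(rows, 2.0)) * cmp_cost;
  return qsort_cost + rows * temptable_lookup_cost;
}
```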
if (current_read_time < join->best_read)
{
@@ -11375,11 +11531,8 @@ prev_record_reads(const POSITION *positions, uint idx, table_map found_ref)
is an inprecise estimate and adding 1 (or, in the worst case,
#max_nested_outer_joins=64-1) will not make it any more precise.
*/
- if (pos->records_read)
- {
- found= COST_MULT(found, pos->records_read);
- found*= pos->cond_selectivity;
- }
+ if (pos->records_out)
+ found= COST_MULT(found, pos->records_out);
}
}
return found;
@@ -11809,7 +11962,7 @@ bool JOIN::get_best_combination()
*/
SJ_MATERIALIZATION_INFO *sjm= cur_pos->table->emb_sj_nest->sj_mat_info;
j->records_read= (sjm->is_sj_scan? sjm->rows : 1.0);
- j->records_out= j->records_read;
+ j->records_init= j->records_out= j->records_read;
j->records= (ha_rows) j->records_read;
j->cond_selectivity= 1.0;
JOIN_TAB *jt;
@@ -11844,6 +11997,7 @@ bool JOIN::get_best_combination()
if (j->type == JT_SYSTEM)
goto loop_end;
+
if (!(keyuse= cur_pos->key))
{
if (cur_pos->type == JT_NEXT) // Forced index
@@ -11864,17 +12018,19 @@ bool JOIN::get_best_combination()
j->range_rowid_filter_info=
cur_pos->range_rowid_filter_info;
- loop_end:
- /*
+ /*
Save records_read in JOIN_TAB so that select_describe()/etc don't have
to access join->best_positions[].
*/
+ j->records_init= cur_pos->records_init;
j->records_read= cur_pos->records_read;
j->records_out= cur_pos->records_out;
+
+ loop_end:
j->cond_selectivity= cur_pos->cond_selectivity;
DBUG_ASSERT(j->cond_selectivity <= 1.0);
crash_if_first_double_is_bigger(j->records_out,
- j->records_read *
+ j->records_init *
(j->range_rowid_filter_info ?
j->range_rowid_filter_info->selectivity :
1.0));
@@ -12637,7 +12793,10 @@ make_outerjoin_info(JOIN *join)
{
if (embedding->is_active_sjm())
{
- /* We're trying to walk out of an SJ-Materialization nest. Don't do this. */
+ /*
+ We're trying to walk out of an SJ-Materialization nest.
+ Don't do this.
+ */
break;
}
/* Ignore sj-nests: */
@@ -12918,8 +13077,10 @@ make_join_select(JOIN *join,SQL_SELECT *select,COND *cond)
tab->use_quick=1;
tab->ref.key= -1;
tab->ref.key_parts=0; // Don't use ref key.
- join->best_positions[i].records_read= rows2double(tab->quick->records);
- /*
+ join->best_positions[i].records_read=
+ join->best_positions[i].records_out=
+ rows2double(tab->quick->records);
+ /*
We will use join cache here : prevent sorting of the first
table only and sort at the end.
*/
@@ -14964,14 +15125,14 @@ void JOIN_TAB::cleanup()
/**
Estimate the time to get rows of the joined table
- Updates found_records, records, cached_scan_time, cached_covering_key,
- read_time and cache_scan_and_compare_time
+ Updates found_records, records, cached_covering_key, read_time and
+ cache_scan_and_compare_time
*/
void JOIN_TAB::estimate_scan_time()
{
THD *thd= join->thd;
- double copy_cost= ROW_COPY_COST_THD(thd);
+ double copy_cost;
cached_covering_key= MAX_KEY;
if (table->is_created())
@@ -14982,6 +15143,7 @@ void JOIN_TAB::estimate_scan_time()
&startup_cost);
table->opt_range_condition_rows= records;
table->used_stat_records= records;
+ copy_cost= table->file->ROW_COPY_COST;
}
else
{
@@ -14995,21 +15157,38 @@ void JOIN_TAB::estimate_scan_time()
if (!table->covering_keys.is_clear_all() && ! table->no_keyread)
{
cached_covering_key= find_shortest_key(table, &table->covering_keys);
- read_time= table->file->ha_key_scan_time(cached_covering_key);
- copy_cost= KEY_COPY_COST_THD(thd);
+ read_time= table->file->ha_key_scan_time(cached_covering_key, records);
+ copy_cost= 0; // included in ha_key_scan_time
}
else
- read_time= table->file->ha_scan_time();
+ {
+ read_time= table->file->ha_scan_time(records);
+ copy_cost= 0;
+ }
}
}
else
{
+ /*
+ The following is same as calling
+ TABLE_SHARE::update_optimizer_costs, but without locks
+ */
+ if (table->s->db_type() == heap_hton)
+ memcpy(&table->s->optimizer_costs, &heap_optimizer_costs,
+ sizeof(heap_optimizer_costs));
+ else
+ memcpy(&table->s->optimizer_costs, &tmp_table_optimizer_costs,
+ sizeof(tmp_table_optimizer_costs));
+ table->file->set_optimizer_costs(thd);
+ table->s->optimizer_costs_inited= 1;
+
records= table->stat_records();
DBUG_ASSERT(table->opt_range_condition_rows == records);
- read_time= records ? (double) records: 10.0;// TODO:fix this stub
+ read_time= table->file->ha_scan_time(MY_MAX(records, 1000)); // Needs fix..
+ copy_cost= table->s->optimizer_costs.row_copy_cost;
}
+
found_records= records;
- cached_scan_time= read_time;
cached_scan_and_compare_time= (read_time + records *
(copy_cost + WHERE_COST_THD(thd)));
}
@@ -15054,7 +15233,7 @@ ha_rows JOIN_TAB::get_examined_rows()
}
}
else
- examined_rows= records_read;
+ examined_rows= records_init;
if (examined_rows >= (double) HA_ROWS_MAX)
return HA_ROWS_MAX;
@@ -18581,7 +18760,7 @@ table_map JOIN::get_allowed_nj_tables(uint idx)
first_alt TRUE <=> Use the LooseScan plan for the first_tab
no_jbuf_before Don't allow to use join buffering before this
table
- reopt_rec_count OUT New output record count
+ outer_rec_count OUT New output record count
reopt_cost OUT New join prefix cost
DESCRIPTION
@@ -18636,6 +18815,8 @@ void optimize_wo_join_buffering(JOIN *join, uint first_tab, uint last_tab,
table_map save_cur_sj_inner_tables= join->cur_sj_inner_tables;
join->cur_sj_inner_tables= 0;
+ double inner_fanout= 1.0;
+
for (i= first_tab; i <= last_tab; i++)
{
JOIN_TAB *rs= join->positions[i].table;
@@ -18648,31 +18829,43 @@ void optimize_wo_join_buffering(JOIN *join, uint first_tab, uint last_tab,
join->positions, i,
TRUE, rec_count,
&pos, &loose_scan_pos);
+ if ((i == first_tab && first_alt))
+ pos= loose_scan_pos;
}
else
pos= join->positions[i];
- if ((i == first_tab && first_alt))
- pos= loose_scan_pos;
-
reopt_remaining_tables &= ~rs->table->map;
- rec_count= COST_MULT(rec_count, pos.records_read);
cost= COST_ADD(cost, pos.read_time);
- //TODO: take into account join condition selectivity here
- double pushdown_cond_selectivity= 1.0;
- table_map real_table_bit= rs->table->map;
- if (join->thd->variables.optimizer_use_condition_selectivity > 1)
+
+ double records_out= pos.records_out;
+ /*
+ The (i != last_tab) is here to mimic what
+ best_extension_by_limited_search() does: do not call
+ table_after_join_selectivity() for the join_tab where the semi-join
+ strategy is applied
+ */
+ if (i != last_tab &&
+ join->thd->variables.optimizer_use_condition_selectivity > 1)
{
+ table_map real_table_bit= rs->table->map;
+ double __attribute__((unused)) pushdown_cond_selectivity;
pushdown_cond_selectivity=
table_after_join_selectivity(join, i, rs,
reopt_remaining_tables &
- ~real_table_bit);
+ ~real_table_bit, &records_out);
}
- (*outer_rec_count) *= pushdown_cond_selectivity;
- if (!rs->emb_sj_nest)
- *outer_rec_count= COST_MULT(*outer_rec_count, pos.records_read);
+ rec_count= COST_MULT(rec_count, records_out);
+ *outer_rec_count= COST_MULT(*outer_rec_count, records_out);
+ if (rs->emb_sj_nest)
+ inner_fanout= COST_MULT(inner_fanout, records_out);
}
+
+ /* Discount the fanout produced by the subquery */
+ if (inner_fanout > 1.0)
+ *outer_rec_count /= inner_fanout;
+
join->cur_sj_inner_tables= save_cur_sj_inner_tables;
*reopt_cost= cost;
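The fanout discount above divides out the row multiplication contributed by tables inside the semi-join nest, since those rows do not add distinct outer row combinations. A minimal sketch (hypothetical helper name):

```cpp
#include <cassert>

// Hypothetical sketch: rows produced inside the semi-join nest do
// not multiply the number of distinct outer row combinations, so
// divide their accumulated fanout back out.
static double discount_inner_fanout(double outer_rec_count,
                                    double inner_fanout)
{
  if (inner_fanout > 1.0)
    outer_rec_count/= inner_fanout;
  return outer_rec_count;
}
```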
@@ -20916,7 +21109,7 @@ TABLE *create_tmp_table_for_schema(THD *thd, TMP_TABLE_PARAM *param,
{
TABLE *table;
Create_tmp_table maker((ORDER *) NULL, false, false,
- select_options, HA_POS_ERROR);
+ select_options, HA_ROWS_MAX);
if (!(table= maker.start(thd, param, &table_alias)) ||
maker.add_schema_fields(thd, table, param, schema_table) ||
maker.finalize(thd, table, param, do_not_open, keep_row_order))
@@ -21096,7 +21289,6 @@ bool Virtual_tmp_table::sp_set_all_fields_from_item(THD *thd, Item *value)
return false;
}
-
bool open_tmp_table(TABLE *table)
{
int error;
@@ -21110,6 +21302,7 @@ bool open_tmp_table(TABLE *table)
}
table->db_stat= HA_OPEN_KEYFILE;
(void) table->file->extra(HA_EXTRA_QUICK); /* Faster */
+ table->file->set_optimizer_costs(table->in_use);
if (!table->is_created())
{
table->set_created();
@@ -24794,31 +24987,40 @@ ok:
@return
MAX_KEY no suitable key found
key index otherwise
+
+ @notes
+ We should not use keyread_time() as in the case of disk_read_cost= 0
+ all keys would be regarded as equal.
*/
uint find_shortest_key(TABLE *table, const key_map *usable_keys)
{
- double min_cost= DBL_MAX;
+ size_t min_length= INT_MAX32;
uint best= MAX_KEY;
- if (!usable_keys->is_clear_all())
+ uint possible_keys= usable_keys->bits_set();
+
+ if (possible_keys)
{
+ if (possible_keys == 1)
+ return usable_keys->find_first_bit();
+
for (uint nr=0; nr < table->s->keys ; nr++)
{
if (usable_keys->is_set(nr))
{
- double cost= table->file->ha_key_scan_time(nr);
- if (cost < min_cost)
+ size_t length= table->key_storage_length(nr);
+ if (length < min_length)
{
- min_cost= cost;
- best=nr;
+ min_length= length;
+ best= nr;
}
- DBUG_ASSERT(best < MAX_KEY);
}
}
}
return best;
}
+
/**
Test if a second key is the subkey of the first one.
@@ -28335,6 +28537,7 @@ bool JOIN_TAB::save_explain_data(Explain_table_access *eta,
// psergey-todo: data for filtering!
tracker= &eta->tracker;
jbuf_tracker= &eta->jbuf_tracker;
+ jbuf_unpack_tracker= &eta->jbuf_unpack_tracker;
/* Enable the table access time tracker only for "ANALYZE stmt" */
if (thd->lex->analyze_stmt)
@@ -28563,12 +28766,13 @@ bool JOIN_TAB::save_explain_data(Explain_table_access *eta,
ha_rows examined_rows= get_examined_rows();
eta->rows_set= true;
- eta->rows= examined_rows;
+ eta->rows= double_to_rows(examined_rows);
/* "filtered" */
float f= 0.0;
if (examined_rows)
{
+#ifdef OLD_CODE // QQQ
double pushdown_cond_selectivity= cond_selectivity;
if (pushdown_cond_selectivity != 1.0)
f= (float) (100.0 * pushdown_cond_selectivity);
@@ -28576,6 +28780,9 @@ bool JOIN_TAB::save_explain_data(Explain_table_access *eta,
f= (float) (100.0 * range_rowid_filter_info->selectivity);
else
f= (float) (100.0 * records_read / examined_rows);
+#else
+ f= (float) (100.0 * records_out / examined_rows);
+#endif
}
set_if_smaller(f, 100.0);
eta->filtered_set= true;
@@ -28971,9 +29178,9 @@ int JOIN::save_explain_data_intern(Explain_query *output,
continue;
}
-
Explain_table_access *eta= (new (output->mem_root)
- Explain_table_access(output->mem_root));
+ Explain_table_access(output->mem_root,
+ thd->lex->analyze_stmt));
if (!eta)
DBUG_RETURN(1);
@@ -30005,7 +30212,7 @@ void JOIN::cache_const_exprs()
- If there is no quick select return the full cost from
cost_for_index_read() (Doing a full scan with up to 'limit' records)
- @param pos Result from best_acccess_path(). Is NULL for
+ @param pos Result from best_access_path(). Is NULL for
single-table UPDATE/DELETE
@param table Table to be sorted
@param keynr Which index to use
@@ -30091,7 +30298,7 @@ static bool get_range_limit_read_cost(const POSITION *pos,
/*
Calculate the number of rows we have to check if we are
- doing a full index scan (as a suitabe range scan was not available).
+ doing a full index scan (as a suitable range scan was not available).
We assume that each of the tested indexes is not correlated
with ref_key. Thus, to select first N records we have to scan
@@ -30280,12 +30487,12 @@ test_if_cheaper_ordering(const JOIN_TAB *tab, ORDER *order, TABLE *table,
trace_cheaper_ordering.add_table_name(tab);
else
trace_cheaper_ordering.add_table_name(table);
- trace_cheaper_ordering
- .add("rows_estimation", rows_estimate)
- .add("read_cost", read_time)
- .add("filesort_cost", filesort_cost)
- .add("filesort_type", filesort_names[filesort_type].str)
- .add("fanout", fanout);
+ trace_cheaper_ordering.
+ add("rows_estimation", rows_estimate).
+ add("filesort_cost", filesort_cost).
+ add("read_cost", read_time).
+ add("filesort_type", filesort_names[filesort_type].str).
+ add("fanout", fanout);
}
Json_writer_array possible_keys(thd,"possible_keys");
diff --git a/sql/sql_select.h b/sql/sql_select.h
index f6c1ca63a99..02af1aecd09 100644
--- a/sql/sql_select.h
+++ b/sql/sql_select.h
@@ -309,6 +309,7 @@ typedef struct st_join_table {
Table_access_tracker *tracker;
Table_access_tracker *jbuf_tracker;
+ Time_and_counter_tracker *jbuf_unpack_tracker;
// READ_RECORD::Setup_func materialize_table;
READ_RECORD::Setup_func read_first_record;
@@ -341,6 +342,9 @@ typedef struct st_join_table {
*/
double read_time;
+ /* Copy of POSITION::records_init, set by get_best_combination() */
+ double records_init;
+
/* Copy of POSITION::records_read, set by get_best_combination() */
double records_read;
@@ -356,7 +360,6 @@ typedef struct st_join_table {
double partial_join_cardinality;
/* set by estimate_scan_time() */
- double cached_scan_time;
double cached_scan_and_compare_time;
double cached_forced_index_cost;
@@ -959,21 +962,44 @@ public:
/* The table that's put into join order */
JOIN_TAB *table;
+ /* number of rows that will be read from the table */
+ double records_init;
+
+ /*
+ Number of rows left after filtering, calculated in best_access_path()
+ In case of use_cond_selectivity > 1 it contains rows after the used
+ rowid filter (if such one exists).
+ If use_cond_selectivity <= 1 it contains the minimum rows of any
+ rowid filtering or records_init if no filter exists.
+ */
+ double records_after_filter;
+
/*
- The number of rows that will be read from the table
+ Number of expected rows before applying the full WHERE clause. This
+ includes rowid filter and table->cond_selectivity if
+ use_cond_selectivity > 1. See matching_candidates_in_table().
+ Should normally not be used.
*/
double records_read;
/*
- The "fanout": number of output rows that will be produced (after
+ The number of rows after applying the WHERE clause.
+
+ Same as the "fanout": number of output rows that will be produced (after
pushed down selection condition is applied) per each row combination of
previous tables.
- This takes into account table->cond_selectivity, the WHERE clause
- related to this table calculated in
- calculate_cond_selectivity_for_table(), and the used rowid filter but
- does not take into account the WHERE clause involving preceding tables
- calculated in table_after_join_selectivity().
+ In best_access_path() it is set to the minimum number of accepted rows
+ for any possible access method or filter:
+
+ records_out takes into account table->cond_selectivity, the WHERE clause
+ related to this table calculated in calculate_cond_selectivity_for_table(),
+ and the used rowid filter.
+
+ After best_access_path(), records_out does not yet take into
+ account the part of the WHERE clause involving preceding tables.
+ records_out is updated in best_extension_by_limited_search() to take these
+ tables into account by calling table_after_join_selectivity().
*/
double records_out;
diff --git a/sql/sql_show.cc b/sql/sql_show.cc
index 9510d74c86d..a716cae6955 100644
--- a/sql/sql_show.cc
+++ b/sql/sql_show.cc
@@ -50,6 +50,7 @@
#include "authors.h"
#include "contributors.h"
#include "sql_partition.h"
+#include "optimizer_defaults.h"
#ifdef HAVE_EVENT_SCHEDULER
#include "events.h"
#include "event_data_objects.h"
@@ -3668,6 +3669,9 @@ const char* get_one_variable(THD *thd,
/* 6 is the default precision for '%f' in sprintf() */
end= buff + my_fcvt(*value.as_double, 6, buff, NULL);
break;
+ case SHOW_OPTIMIZER_COST: // Stored in 1ms, displayed in us
+ end= buff + my_fcvt(*value.as_double*1000, 6, buff, NULL);
+ break;
case SHOW_LONG_STATUS:
value.as_char= status_var_value.as_char + value.as_intptr;
/* fall through */
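The new `SHOW_OPTIMIZER_COST` case reflects the unit convention this patch introduces: costs are stored internally in milliseconds but shown to the user in microseconds, hence the `*1000` before formatting (the same factor appears in `fill_optimizer_costs_tables()` below). A trivial sketch of the conversion (hypothetical helper name):

```cpp
#include <cassert>

// Hypothetical sketch: optimizer costs are stored in ms internally
// and displayed in microseconds, so scale by 1000 on output.
static double cost_ms_to_display_us(double cost_in_ms)
{
  return cost_in_ms * 1000.0;
}
```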
@@ -9196,6 +9200,49 @@ int fill_key_cache_tables(THD *thd, TABLE_LIST *tables, COND *cond)
}
+/* Ensure we return 'OPTIMIZER_COST_UNDEF' if cost < 0 */
+
+static double fix_cost(double cost)
+{
+ return cost < 0 ? OPTIMIZER_COST_UNDEF : cost;
+}
+
+static int run_fill_optimizer_costs_tables(const LEX_CSTRING *name,
+ const OPTIMIZER_COSTS *costs,
+ TABLE *table)
+{
+ THD *thd= table->in_use;
+ DBUG_ENTER("run_fill_optimizer_costs_tables");
+
+ restore_record(table, s->default_values);
+ table->field[0]->store(name->str, name->length, system_charset_info);
+ table->field[1]->store(fix_cost(costs->disk_read_cost*1000.0));
+ table->field[2]->store(fix_cost(costs->index_block_copy_cost*1000.0));
+ table->field[3]->store(fix_cost(costs->key_cmp_cost*1000.0));
+ table->field[4]->store(fix_cost(costs->key_copy_cost*1000.0));
+ table->field[5]->store(fix_cost(costs->key_lookup_cost*1000.0));
+ table->field[6]->store(fix_cost(costs->key_next_find_cost*1000.0));
+ table->field[7]->store(fix_cost(costs->disk_read_ratio));
+ table->field[8]->store(fix_cost(costs->row_copy_cost*1000.0));
+ table->field[9]->store(fix_cost(costs->row_lookup_cost*1000.0));
+ table->field[10]->store(fix_cost(costs->row_next_find_cost*1000.0));
+ table->field[11]->store(fix_cost(costs->rowid_cmp_cost*1000.0));
+ table->field[12]->store(fix_cost(costs->rowid_copy_cost*1000.0));
+
+ DBUG_RETURN(schema_table_store_record(thd, table));
+}
+
+
+int fill_optimizer_costs_tables(THD *thd, TABLE_LIST *tables, COND *cond)
+{
+ DBUG_ENTER("fill_optimizer_costs_tables");
+
+ int res= process_optimizer_costs(run_fill_optimizer_costs_tables,
+ tables->table);
+ DBUG_RETURN(res);
+}
+
+
namespace Show {
ST_FIELD_INFO schema_fields_info[]=
@@ -9824,6 +9871,25 @@ ST_FIELD_INFO keycache_fields_info[]=
};
+ST_FIELD_INFO optimizer_costs_fields_info[]=
+{
+ Column("ENGINE", Varchar(NAME_LEN),NOT_NULL),
+ Column("OPTIMIZER_DISK_READ_COST", Decimal(906), NOT_NULL),
+ Column("OPTIMIZER_INDEX_BLOCK_COPY_COST", Decimal(906), NOT_NULL),
+ Column("OPTIMIZER_KEY_COMPARE_COST", Decimal(906), NOT_NULL),
+ Column("OPTIMIZER_KEY_COPY_COST", Decimal(906), NOT_NULL),
+ Column("OPTIMIZER_KEY_LOOKUP_COST", Decimal(906), NOT_NULL),
+ Column("OPTIMIZER_KEY_NEXT_FIND_COST", Decimal(906), NOT_NULL),
+ Column("OPTIMIZER_DISK_READ_RATIO", Decimal(906), NOT_NULL),
+ Column("OPTIMIZER_ROW_COPY_COST", Decimal(906), NOT_NULL),
+ Column("OPTIMIZER_ROW_LOOKUP_COST", Decimal(906), NOT_NULL),
+ Column("OPTIMIZER_ROW_NEXT_FIND_COST", Decimal(906), NOT_NULL),
+ Column("OPTIMIZER_ROWID_COMPARE_COST", Decimal(906), NOT_NULL),
+ Column("OPTIMIZER_ROWID_COPY_COST", Decimal(906), NOT_NULL),
+ CEnd()
+};
+
+
ST_FIELD_INFO show_explain_tabular_fields_info[]=
{
Column("id", SLonglong(3), NULLABLE, "id"),
@@ -9962,6 +10028,8 @@ ST_SCHEMA_TABLE schema_tables[]=
OPTIMIZE_I_S_TABLE|OPEN_TABLE_ONLY},
{"OPEN_TABLES", Show::open_tables_fields_info, 0,
fill_open_tables, make_old_format, 0, -1, -1, 1, 0},
+ {"OPTIMIZER_COSTS", Show::optimizer_costs_fields_info, 0,
+ fill_optimizer_costs_tables, 0, 0, -1,-1, 0, 0},
{"OPTIMIZER_TRACE", Show::optimizer_trace_info, 0,
fill_optimizer_trace_info, NULL, NULL, -1, -1, false, 0},
{"PARAMETERS", Show::parameters_fields_info, 0,
diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy
index f0ed46646ae..b4d0c7eaa1f 100644
--- a/sql/sql_yacc.yy
+++ b/sql/sql_yacc.yy
@@ -8277,7 +8277,7 @@ assign_to_keycache_parts:
key_cache_name:
ident { $$= $1; }
- | DEFAULT { $$ = default_key_cache_base; }
+ | DEFAULT { $$ = default_base; }
;
preload:
diff --git a/sql/sys_vars.cc b/sql/sys_vars.cc
index e7b1e8e6d1d..0df627a640a 100644
--- a/sql/sys_vars.cc
+++ b/sql/sys_vars.cc
@@ -53,8 +53,9 @@
#include "debug_sync.h" // DEBUG_SYNC
#include "sql_show.h"
#include "opt_trace_context.h"
-
#include "log_event.h"
+#include "optimizer_defaults.h"
+
#ifdef WITH_PERFSCHEMA_STORAGE_ENGINE
#include "../storage/perfschema/pfs_server.h"
#endif /* WITH_PERFSCHEMA_STORAGE_ENGINE */
@@ -6977,68 +6978,111 @@ static Sys_var_ulong Sys_optimizer_max_sel_arg_weight(
SESSION_VAR(optimizer_max_sel_arg_weight), CMD_LINE(REQUIRED_ARG),
VALID_RANGE(0, ULONG_MAX), DEFAULT(SEL_ARG::MAX_WEIGHT), BLOCK_SIZE(1));
-/*
- We don't allow 100 for optimizer_cache_cost as there is always a small
- cost of finding the key, on cached pages, that we have to take into account.
-*/
-static bool update_optimizer_cache_hit_ratio(sys_var *self, THD *thd,
- enum_var_type type)
-{
- if (type == OPT_SESSION)
- thd->optimizer_cache_hit_ratio=
- cache_hit_ratio(thd->variables.optimizer_cache_hit_ratio);
- return 0;
-}
-
-static Sys_var_uint Sys_optimizer_cache_hit_ratio(
- "optimizer_cache_hit_ratio",
- "Expected hit rate of the row and index cache in storage engines. "
- "The value should be an integer between 0 and 99, where 0 means cache is "
- "empty and 99 means that value is almost always in the cache.",
- SESSION_VAR(optimizer_cache_hit_ratio), CMD_LINE(REQUIRED_ARG),
- VALID_RANGE(0, 99), DEFAULT(DEFAULT_CACHE_HIT_RATIO), 1, NO_MUTEX_GUARD,
- NOT_IN_BINLOG, ON_CHECK(0), ON_UPDATE(update_optimizer_cache_hit_ratio));
-
-static Sys_var_double Sys_optimizer_key_copy_cost(
+static Sys_var_engine_optimizer_cost Sys_optimizer_disk_read_ratio(
+ "optimizer_disk_read_ratio",
+ "Chance that we have to do a disk read to find a row or index entry from "
+ "the engine cache (cache_misses/total_cache_requests). 0.0 means that "
+ "everything is cached and 1.0 means that nothing is expected to be in the "
+ "engine cache.",
+ COST_VAR(disk_read_ratio),
+ CMD_LINE(REQUIRED_ARG, OPT_COSTS_DISK_READ_RATIO),
+ VALID_RANGE(0.0, 1.0), DEFAULT(DEFAULT_DISK_READ_RATIO), COST_ADJUST(1));
+
+static Sys_var_engine_optimizer_cost Sys_optimizer_key_lookup_cost(
+ "optimizer_key_lookup_cost",
+ "Cost for finding a key based on a key value",
+ COST_VAR(key_lookup_cost),
+ CMD_LINE(REQUIRED_ARG, OPT_COSTS_KEY_LOOKUP_COST),
+ VALID_RANGE(0, 1000), DEFAULT(DEFAULT_KEY_LOOKUP_COST), COST_ADJUST(1000));
+
+static Sys_var_engine_optimizer_cost Sys_optimizer_row_lookup_cost(
+ "optimizer_row_lookup_cost",
+ "Cost of finding a row based on a rowid or a clustered key.",
+ COST_VAR(row_lookup_cost),
+ CMD_LINE(REQUIRED_ARG, OPT_COSTS_ROW_LOOKUP_COST),
+ VALID_RANGE(0, 1000), DEFAULT(DEFAULT_ROW_LOOKUP_COST), COST_ADJUST(1000));
+
+static Sys_var_engine_optimizer_cost Sys_optimizer_disk_read_cost(
+ "optimizer_disk_read_cost",
+ "Cost of reading a block of IO_SIZE (4096) from a disk (in usec).",
+ COST_VAR(disk_read_cost),
+ CMD_LINE(REQUIRED_ARG, OPT_COSTS_DISK_READ_COST),
+ VALID_RANGE(0, 10000), DEFAULT(DEFAULT_DISK_READ_COST), COST_ADJUST(1000));
+
+static Sys_var_engine_optimizer_cost Sys_optimizer_key_copy_cost(
"optimizer_key_copy_cost",
- "Cost of finding the next key in the engine and copying it to the SQL layer.",
- SESSION_VAR(optimizer_key_copy_cost), CMD_LINE(REQUIRED_ARG),
- VALID_RANGE(0, 1), DEFAULT(DEFAULT_KEY_COPY_COST), NO_MUTEX_GUARD,
- NOT_IN_BINLOG);
+ "Cost of finding the next key in the engine and copying it to the SQL "
+ "layer.",
+ COST_VAR(key_copy_cost),
+ CMD_LINE(REQUIRED_ARG, OPT_COSTS_KEY_COPY_COST),
+ VALID_RANGE(0, 1000), DEFAULT(DEFAULT_KEY_COPY_COST), COST_ADJUST(1000));
-static Sys_var_double Sys_optimizer_index_block_copy_cost(
+static Sys_var_engine_optimizer_cost Sys_optimizer_index_block_copy_cost(
"optimizer_index_block_copy_cost",
- "Cost of copying a key block from the cache to intern storage as part of an "
- "index scan.",
- SESSION_VAR(optimizer_index_block_copy_cost), CMD_LINE(REQUIRED_ARG),
- VALID_RANGE(0, 1), DEFAULT(DEFAULT_INDEX_BLOCK_COPY_COST), NO_MUTEX_GUARD,
- NOT_IN_BINLOG);
-
-static Sys_var_double Sys_optimizer_key_next_find_cost(
+ "Cost of copying a key block from the cache to intern storage as part of "
+ "an index scan.",
+ COST_VAR(index_block_copy_cost),
+ CMD_LINE(REQUIRED_ARG, OPT_COSTS_INDEX_BLOCK_COPY_COST),
+ VALID_RANGE(0, 1000), DEFAULT(DEFAULT_INDEX_BLOCK_COPY_COST), COST_ADJUST(1000));
+
+static Sys_var_engine_optimizer_cost Sys_optimizer_row_next_find_cost(
+ "optimizer_row_next_find_cost",
+ "Cost of finding the next row when scanning the table.",
+ COST_VAR(row_next_find_cost),
+ CMD_LINE(REQUIRED_ARG, OPT_COSTS_ROW_NEXT_FIND_COST),
+ VALID_RANGE(0, 1000), DEFAULT(DEFAULT_ROW_NEXT_FIND_COST), COST_ADJUST(1000));
+
+static Sys_var_engine_optimizer_cost Sys_optimizer_key_next_find_cost(
"optimizer_key_next_find_cost",
"Cost of finding the next key and rowid when using filters.",
- SESSION_VAR(optimizer_key_next_find_cost), CMD_LINE(REQUIRED_ARG),
- VALID_RANGE(0, 1), DEFAULT(DEFAULT_KEY_NEXT_FIND_COST), NO_MUTEX_GUARD,
- NOT_IN_BINLOG);
+ COST_VAR(key_next_find_cost),
+ CMD_LINE(REQUIRED_ARG, OPT_COSTS_KEY_NEXT_FIND_COST),
+ VALID_RANGE(0, 1000), DEFAULT(DEFAULT_KEY_NEXT_FIND_COST), COST_ADJUST(1000));
-static Sys_var_double Sys_optimizer_row_copy_cost(
+static Sys_var_engine_optimizer_cost Sys_optimizer_row_copy_cost(
"optimizer_row_copy_cost",
"Cost of copying a row from the engine or the join cache to the SQL layer.",
- SESSION_VAR(optimizer_row_copy_cost), CMD_LINE(REQUIRED_ARG),
- VALID_RANGE(0, 1), DEFAULT(DEFAULT_ROW_COPY_COST), NO_MUTEX_GUARD,
- NOT_IN_BINLOG);
+ COST_VAR(row_copy_cost),
+ CMD_LINE(REQUIRED_ARG, OPT_COSTS_ROW_COPY_COST),
+ VALID_RANGE(0, 1000), DEFAULT(DEFAULT_ROW_COPY_COST), COST_ADJUST(1000));
-static Sys_var_double Sys_optimizer_where_cost(
- "optimizer_where_cost",
- "Cost of checking the row against the WHERE clause.",
- SESSION_VAR(optimizer_where_cost), CMD_LINE(REQUIRED_ARG),
- VALID_RANGE(0, 1), DEFAULT(DEFAULT_WHERE_COST), NO_MUTEX_GUARD,
- NOT_IN_BINLOG);
-
-static Sys_var_double Sys_optimizer_key_cmp_cost(
+static Sys_var_engine_optimizer_cost Sys_optimizer_key_cmp_cost(
"optimizer_key_compare_cost",
"Cost of checking a key against the end key condition.",
- SESSION_VAR(optimizer_key_cmp_cost), CMD_LINE(REQUIRED_ARG),
- VALID_RANGE(0, 1), DEFAULT(DEFAULT_KEY_COMPARE_COST), NO_MUTEX_GUARD,
- NOT_IN_BINLOG);
+ COST_VAR(key_cmp_cost),
+ CMD_LINE(REQUIRED_ARG, OPT_COSTS_KEY_CMP_COST),
+ VALID_RANGE(0, 1000), DEFAULT(DEFAULT_KEY_COMPARE_COST), COST_ADJUST(1000));
+
+static Sys_var_engine_optimizer_cost Sys_optimizer_rowid_cmp_cost(
+ "optimizer_rowid_compare_cost",
+ "Cost of comparing two rowid's",
+ COST_VAR(rowid_cmp_cost),
+ CMD_LINE(REQUIRED_ARG, OPT_COSTS_ROWID_CMP_COST),
+ VALID_RANGE(0, 1000), DEFAULT(DEFAULT_ROWID_COMPARE_COST), COST_ADJUST(1000));
+
+static Sys_var_engine_optimizer_cost Sys_optimizer_rowid_copy_cost(
+ "optimizer_rowid_copy_cost",
+ "Cost of copying a rowid",
+ COST_VAR(rowid_copy_cost),
+ CMD_LINE(REQUIRED_ARG, OPT_COSTS_ROWID_COPY_COST),
+ VALID_RANGE(0, 1000), DEFAULT(DEFAULT_ROWID_COPY_COST), COST_ADJUST(1000));
+
+/* The following costs are stored in THD and handler */
+
+static Sys_var_optimizer_cost Sys_optimizer_where_cost(
+ "optimizer_where_cost",
+ "Cost of checking the row against the WHERE clause. Increasing this will "
+ "have the optimizer to prefer plans with less row combinations.",
+ SESSION_VAR(optimizer_where_cost),
+ CMD_LINE(REQUIRED_ARG),
+ VALID_RANGE(0, 100000), DEFAULT(DEFAULT_WHERE_COST), COST_ADJUST(1000));
+
+static Sys_var_optimizer_cost Sys_optimizer_scan_cost(
+ "optimizer_scan_setup_cost",
+ "Extra cost added to TABLE and INDEX scans to get optimizer to prefer "
+ "index lookups.",
+ SESSION_VAR(optimizer_scan_setup_cost),
+ CMD_LINE(REQUIRED_ARG),
+ VALID_RANGE(0, 100000000), DEFAULT(DEFAULT_TABLE_SCAN_SETUP_COST),
+ COST_ADJUST(1000));
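
The COST_ADJUST(1000) plumbing above converts between the user-visible unit (microseconds) and the internal unit (milliseconds). A minimal sketch of the round trip, using hypothetical helper names (not the actual Sys_var machinery):

```cpp
#include <cassert>

// Hypothetical stand-ins for the Sys_var_optimizer_cost conversions:
// users give costs in microseconds, the server stores milliseconds.
static const double COST_ADJUST_FACTOR= 1000.0;

// SET @@optimizer_..._cost= value  (value given in usec, stored in ms)
double cost_from_user(double microseconds)
{ return microseconds / COST_ADJUST_FACTOR; }

// SELECT @@optimizer_..._cost  (stored ms shown back as usec)
double cost_to_user(double milliseconds)
{ return milliseconds * COST_ADJUST_FACTOR; }
```

The factor-of-1000 scaling is why the VALID_RANGE values above look large: a range of (0, 1000) in microseconds corresponds to at most 1 ms of internal cost.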
diff --git a/sql/sys_vars.inl b/sql/sys_vars.inl
index b1d7bc31255..5997446a61e 100644
--- a/sql/sys_vars.inl
+++ b/sql/sys_vars.inl
@@ -32,6 +32,7 @@
#include "rpl_mi.h" // For Multi-Source Replication
#include "debug_sync.h"
#include "sql_acl.h" // check_global_access()
+#include "optimizer_defaults.h" // create_optimizer_costs
/*
a set of mostly trivial (as in f(X)=X) defines below to make system variable
@@ -40,6 +41,7 @@
#define VALID_RANGE(X,Y) X,Y
#define DEFAULT(X) X
#define BLOCK_SIZE(X) X
+#define COST_ADJUST(X) X
#define GLOBAL_VAR(X) sys_var::GLOBAL, (((char*)&(X))-(char*)&global_system_variables), sizeof(X)
#define SESSION_VAR(X) sys_var::SESSION, offsetof(SV, X), sizeof(((SV *)0)->X)
#define SESSION_ONLY(X) sys_var::ONLY_SESSION, offsetof(SV, X), sizeof(((SV *)0)->X)
@@ -1048,7 +1050,7 @@ public:
/* If no basename, assume it's for the key cache named 'default' */
if (!base_name->length)
- base_name= &default_key_cache_base;
+ base_name= &default_base;
key_cache= get_key_cache(base_name);
@@ -1228,6 +1230,143 @@ public:
{ var->save_result.double_value= getopt_ulonglong2double(option.def_value); }
};
+
+/*
+ Optimizer costs
+ Stored as a cost factor (1 cost = 1 ms).
+ Given and displayed in microseconds (as most values are very small).
+*/
+
+class Sys_var_optimizer_cost: public Sys_var_double
+{
+public:
+ double cost_adjust;
+ Sys_var_optimizer_cost(const char *name_arg,
+ const char *comment, int flag_args, ptrdiff_t off, size_t size,
+ CMD_LINE getopt,
+ double min_val, double max_val, double def_val,
+ ulong arg_cost_adjust, PolyLock *lock=0,
+ enum binlog_status_enum binlog_status_arg=VARIABLE_NOT_IN_BINLOG,
+ on_check_function on_check_func=0,
+ on_update_function on_update_func=0,
+ const char *substitute=0)
+ :Sys_var_double(name_arg, comment, flag_args, off, size, getopt,
+ min_val, max_val, def_val, lock,
+ binlog_status_arg,
+ on_check_func,
+ on_update_func,
+ substitute)
+ {
+ if (arg_cost_adjust == 1000)
+ {
+ show_val_type= SHOW_OPTIMIZER_COST; // For select @@var
+ option.var_type|= GET_ADJUST_VALUE;
+ }
+ cost_adjust= (double) arg_cost_adjust;
+ global_var(double)= (double)option.def_value/cost_adjust; // usec to ms
+ }
+ bool session_update(THD *thd, set_var *var)
+ {
+ session_var(thd, double)= var->save_result.double_value/cost_adjust;
+ return false;
+ }
+ bool global_update(THD *thd, set_var *var)
+ {
+ global_var(double)= var->save_result.double_value/cost_adjust;
+ return false;
+ }
+ void session_save_default(THD *thd, set_var *var)
+ { var->save_result.double_value= global_var(double) * cost_adjust; }
+
+ void global_save_default(THD *thd, set_var *var)
+ {
+ var->save_result.double_value= getopt_ulonglong2double(option.def_value)*
+ cost_adjust;
+ }
+};
+
+
+/*
+ The class for optimizer costs with structured names, unique for each engine.
+ Used as 'engine.variable_name'
+
+ Class specific constructor arguments:
+ everything derived from Sys_var_optimizer_cost
+
+ Backing store: double
+
+ @note these variables can be only GLOBAL
+*/
+
+#define COST_VAR(X) GLOBAL_VAR(default_optimizer_costs.X)
+#define cost_var_ptr(KC, OFF) (((uchar*)(KC))+(OFF))
+#define cost_var(KC, OFF) (*(double*)cost_var_ptr(KC, OFF))
+typedef bool (*cost_update_function)(THD *, OPTIMIZER_COSTS *, ptrdiff_t,
+ double, double);
+
+static bool update_cost(THD *thd, OPTIMIZER_COSTS *costs,
+ ptrdiff_t offset, double new_value, double cost_adjust)
+{
+ cost_var(costs, offset)= new_value / cost_adjust;
+ return false;
+}
+
+
+class Sys_var_engine_optimizer_cost: public Sys_var_optimizer_cost
+{
+ cost_update_function cost_update;
+ public:
+ Sys_var_engine_optimizer_cost(const char *name_arg,
+ const char *comment, int flag_args, ptrdiff_t off, size_t size,
+ CMD_LINE getopt,
+ double min_val, double max_val, double def_val,
+ long cost_adjust, PolyLock *lock= 0,
+ cost_update_function on_update_func= update_cost,
+ const char *substitute=0)
+ : Sys_var_optimizer_cost(name_arg, comment, flag_args, off, size,
+ getopt, min_val, max_val, def_val, cost_adjust,
+ lock, VARIABLE_NOT_IN_BINLOG, 0,
+ 0, substitute),
+ cost_update(on_update_func)
+ {
+ option.var_type|= GET_ASK_ADDR;
+ option.value= (uchar**)1; // crash me, please
+ // Convert the offset from global_system_variables into an OPTIMIZER_COSTS offset
+ offset= global_var_ptr() - (uchar*) &default_optimizer_costs;
+ SYSVAR_ASSERT(scope() == GLOBAL);
+ }
+ bool global_update(THD *thd, set_var *var)
+ {
+ double new_value= var->save_result.double_value;
+ LEX_CSTRING *base_name= &var->base;
+ OPTIMIZER_COSTS *optimizer_costs;
+ bool res;
+
+ /* If no basename, assume it's for the default costs */
+ if (!base_name->length)
+ base_name= &default_base;
+
+ mysql_mutex_lock(&LOCK_optimizer_costs);
+ if (!(optimizer_costs= get_or_create_optimizer_costs(base_name->str,
+ base_name->length)))
+ {
+ mysql_mutex_unlock(&LOCK_optimizer_costs);
+ return true;
+ }
+ res= cost_update(thd, optimizer_costs, offset, new_value, cost_adjust);
+ mysql_mutex_unlock(&LOCK_optimizer_costs);
+ return res;
+ }
+ const uchar *global_value_ptr(THD *thd, const LEX_CSTRING *base) const
+ {
+ OPTIMIZER_COSTS *optimizer_costs= get_optimizer_costs(base);
+ if (!optimizer_costs)
+ optimizer_costs= &default_optimizer_costs;
+ return cost_var_ptr(optimizer_costs, offset);
+ }
+};
+
+
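
Sys_var_engine_optimizer_cost above resolves 'engine.variable_name' to a per-engine OPTIMIZER_COSTS slot, falling back to the defaults when no engine prefix is given. A rough model of that lookup (simplified and with illustrative names; the real code uses get_or_create_optimizer_costs() under LOCK_optimizer_costs):

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for OPTIMIZER_COSTS: one named cost slot per engine.
struct Costs { double row_copy_cost= 0.060; };  // illustrative default, in ms

static Costs default_costs;
static std::map<std::string, Costs> engine_costs;

// Mirrors global_update(): an empty base name updates the defaults,
// otherwise the engine-specific entry is created/updated.
void set_row_copy_cost(const std::string &engine, double usec)
{
  Costs &c= engine.empty() ? default_costs : engine_costs[engine];
  c.row_copy_cost= usec / 1000.0;          // usec -> ms, as COST_ADJUST(1000)
}

// Mirrors global_value_ptr(): an unknown engine falls back to the defaults.
double get_row_copy_cost(const std::string &engine)
{
  auto it= engine_costs.find(engine);
  return (it == engine_costs.end() ? default_costs : it->second).row_copy_cost;
}
```

This is why `SET GLOBAL innodb.optimizer_row_copy_cost= ...` and the plain `SET GLOBAL optimizer_row_copy_cost= ...` can coexist: the former writes an engine entry, the latter the shared defaults.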
/**
The class for the @max_user_connections.
It's derived from Sys_var_uint, but non-standard session value
diff --git a/sql/table.cc b/sql/table.cc
index bcd65835ffa..57844e7734c 100644
--- a/sql/table.cc
+++ b/sql/table.cc
@@ -2294,7 +2294,7 @@ int TABLE_SHARE::init_from_binary_frm_image(THD *thd, bool write,
share->keynames.count != keys))
goto err;
- /* Allocate handler */
+ /* Allocate handler */
if (!(handler_file= get_new_handler(share, thd->mem_root,
plugin_hton(se_plugin))))
goto err;
@@ -2792,6 +2792,8 @@ int TABLE_SHARE::init_from_binary_frm_image(THD *thd, bool write,
DBUG_ASSERT((null_pos + (null_bit_pos + 7) / 8) <= share->field[0]->ptr);
}
+ share->primary_key= MAX_KEY;
+
/* Fix key->name and key_part->field */
if (key_parts)
{
@@ -2923,6 +2925,11 @@ int TABLE_SHARE::init_from_binary_frm_image(THD *thd, bool write,
}
}
+ /* Primary key must be set early as the engine may use it in index_flags() */
+ share->primary_key= (primary_key < MAX_KEY &&
+ share->keys_in_use.is_set(primary_key) ?
+ primary_key : MAX_KEY);
+
key_first_info= keyinfo;
for (uint key=0 ; key < keys ; key++,keyinfo++)
{
@@ -3165,7 +3172,7 @@ int TABLE_SHARE::init_from_binary_frm_image(THD *thd, bool write,
if (primary_key < MAX_KEY &&
(share->keys_in_use.is_set(primary_key)))
{
- share->primary_key= primary_key;
+ DBUG_ASSERT(share->primary_key == primary_key);
/*
If we are using an integer as the primary key then allow the user to
refer to it as '_rowid'
@@ -3182,10 +3189,10 @@ int TABLE_SHARE::init_from_binary_frm_image(THD *thd, bool write,
}
}
else
- share->primary_key = MAX_KEY; // we do not have a primary key
+ {
+ DBUG_ASSERT(share->primary_key == MAX_KEY);
+ }
}
- else
- share->primary_key= MAX_KEY;
if (new_field_pack_flag <= 1)
{
/* Old file format with default as not null */
@@ -3413,6 +3420,27 @@ err:
}
+/*
+ Make a copy of the optimizer costs so that they can be accessed without
+ locks and so that the engine can update its costs.
+*/
+
+void TABLE_SHARE::update_optimizer_costs(handlerton *hton)
+{
+ if (hton != view_pseudo_hton && !(hton->flags & HTON_HIDDEN))
+ {
+ mysql_mutex_lock(&LOCK_optimizer_costs);
+ memcpy(&optimizer_costs, hton->optimizer_costs, sizeof(optimizer_costs));
+ mysql_mutex_unlock(&LOCK_optimizer_costs);
+ }
+ else
+ {
+ bzero(&optimizer_costs, sizeof(optimizer_costs));
+ MEM_UNDEFINED(&optimizer_costs, sizeof(optimizer_costs));
+ }
+}
+
+
static bool sql_unusable_for_discovery(THD *thd, handlerton *engine,
const char *sql)
{
@@ -5673,7 +5701,6 @@ void TABLE::init(THD *thd, TABLE_LIST *tl)
no_cache= false;
initialize_opt_range_structures();
-
/*
Update optimizer_costs to ensure that a SET STATEMENT of the
cost variables will work.
@@ -10438,10 +10465,10 @@ inline void TABLE::initialize_opt_range_structures()
}
-double TABLE::OPT_RANGE::index_only_fetch_cost(THD *thd)
+double TABLE::OPT_RANGE::index_only_fetch_cost(TABLE *table)
{
- return (index_only_cost + (double) rows *
- thd->variables.optimizer_key_copy_cost);
+ return (index_only_cost +
+ (double) rows * table->s->optimizer_costs.key_copy_cost);
}
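
index_only_fetch_cost() now reads key_copy_cost from the share's engine-specific cost copy instead of a session variable; the formula itself is unchanged. As a free-standing sketch:

```cpp
#include <cassert>

// index_only_fetch_cost(): cost of an index-only read plus the cost of
// copying each fetched key to the SQL layer (all costs in ms).
double index_only_fetch_cost(double index_only_cost, double rows,
                             double key_copy_cost)
{
  return index_only_cost + rows * key_copy_cost;
}
```

Because key_copy_cost now comes from TABLE_SHARE::optimizer_costs, two tables in different engines can price the same index-only plan differently.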
diff --git a/sql/table.h b/sql/table.h
index d94127ec486..edeeb6e6241 100644
--- a/sql/table.h
+++ b/sql/table.h
@@ -813,6 +813,7 @@ struct TABLE_SHARE
return is_view ? view_pseudo_hton :
db_plugin ? plugin_hton(db_plugin) : NULL;
}
+ OPTIMIZER_COSTS optimizer_costs; /* Copy of get_optimizer_costs() */
enum row_type row_type; /* How rows are stored */
enum Table_type table_type;
enum tmp_table_type tmp_table;
@@ -888,6 +889,7 @@ struct TABLE_SHARE
bool has_update_default_function;
bool can_do_row_logging; /* 1 if table supports RBR */
bool long_unique_table;
+ bool optimizer_costs_inited;
ulong table_map_id; /* for row-based replication */
@@ -1194,6 +1196,7 @@ struct TABLE_SHARE
void set_overlapped_keys();
void set_ignored_indexes();
key_map usable_indexes(THD *thd);
+ void update_optimizer_costs(handlerton *hton);
};
/* not NULL, but cannot be dereferenced */
@@ -1420,7 +1423,7 @@ public:
Cost of fetching keys with index only read and returning them to the
sql level.
*/
- double index_only_fetch_cost(THD *thd);
+ double index_only_fetch_cost(TABLE *table);
} *opt_range;
/*
Bitmaps of key parts that =const for the duration of join execution. If
@@ -1745,6 +1748,12 @@ public:
uint actual_n_key_parts(KEY *keyinfo);
ulong actual_key_flags(KEY *keyinfo);
int update_virtual_field(Field *vf, bool ignore_warnings);
+ inline size_t key_storage_length(uint index)
+ {
+ if (file->is_clustering_key(index))
+ return s->stored_rec_length;
+ return key_info[index].key_length + file->ref_length;
+ }
int update_virtual_fields(handler *h, enum_vcol_update_mode update_mode);
int update_default_fields(bool ignore_errors);
void evaluate_update_default_function();
diff --git a/sql/uniques.cc b/sql/uniques.cc
index a09655bcaca..8555fc21624 100644
--- a/sql/uniques.cc
+++ b/sql/uniques.cc
@@ -159,7 +159,7 @@ inline double log2_n_fact(double x)
total_buf_elems* log2(n_buffers) * ROWID_COMPARE_COST;
*/
-static double get_merge_buffers_cost(uint *buff_elems, uint elem_size,
+static double get_merge_buffers_cost(THD *thd, uint *buff_elems, uint elem_size,
uint *first, uint *last,
double compare_factor)
{
@@ -171,7 +171,8 @@ static double get_merge_buffers_cost(uint *buff_elems, uint elem_size,
size_t n_buffers= last - first + 1;
/* Using log2(n)=log(n)/log(2) formula */
- return (2*((double)total_buf_elems*elem_size) / IO_SIZE +
+ return (2*((double)total_buf_elems*elem_size) / IO_SIZE *
+ default_optimizer_costs.disk_read_cost +
total_buf_elems*log((double) n_buffers) * compare_factor / M_LN2);
}
@@ -185,6 +186,7 @@ static double get_merge_buffers_cost(uint *buff_elems, uint elem_size,
SYNOPSIS
get_merge_many_buffs_cost()
+ thd THD, used to get disk_read_cost
buffer buffer space for temporary data, at least
Unique::get_cost_calc_buff_size bytes
maxbuffer # of full buffers
@@ -203,7 +205,8 @@ static double get_merge_buffers_cost(uint *buff_elems, uint elem_size,
Cost of merge in disk seeks.
*/
-static double get_merge_many_buffs_cost(uint *buffer,
+static double get_merge_many_buffs_cost(THD *thd,
+ uint *buffer,
uint maxbuffer, uint max_n_elems,
uint last_n_elems, int elem_size,
double compare_factor)
@@ -231,13 +234,13 @@ static double get_merge_many_buffs_cost(uint *buffer,
uint lastbuff= 0;
for (i = 0; i <= (int) maxbuffer - MERGEBUFF*3/2; i += MERGEBUFF)
{
- total_cost+=get_merge_buffers_cost(buff_elems, elem_size,
+ total_cost+=get_merge_buffers_cost(thd, buff_elems, elem_size,
buff_elems + i,
buff_elems + i + MERGEBUFF-1,
compare_factor);
lastbuff++;
}
- total_cost+=get_merge_buffers_cost(buff_elems, elem_size,
+ total_cost+=get_merge_buffers_cost(thd, buff_elems, elem_size,
buff_elems + i,
buff_elems + maxbuffer,
compare_factor);
@@ -246,7 +249,7 @@ static double get_merge_many_buffs_cost(uint *buffer,
}
/* Simulate final merge_buff call. */
- total_cost += get_merge_buffers_cost(buff_elems, elem_size,
+ total_cost += get_merge_buffers_cost(thd, buff_elems, elem_size,
buff_elems, buff_elems + maxbuffer,
compare_factor);
return total_cost;
@@ -304,7 +307,7 @@ static double get_merge_many_buffs_cost(uint *buffer,
these will be random seeks.
*/
-double Unique::get_use_cost(uint *buffer, size_t nkeys, uint key_size,
+double Unique::get_use_cost(THD *thd, uint *buffer, size_t nkeys, uint key_size,
size_t max_in_memory_size,
double compare_factor,
bool intersect_fl, bool *in_memory)
@@ -312,7 +315,7 @@ double Unique::get_use_cost(uint *buffer, size_t nkeys, uint key_size,
size_t max_elements_in_tree;
size_t last_tree_elems;
size_t n_full_trees; /* number of trees in unique - 1 */
- double result;
+ double result, disk_read_cost;
max_elements_in_tree= ((size_t) max_in_memory_size /
ALIGN_SIZE(sizeof(TREE_ELEMENT)+key_size));
@@ -345,14 +348,15 @@ double Unique::get_use_cost(uint *buffer, size_t nkeys, uint key_size,
First, add cost of writing all trees to disk, assuming that all disk
writes are sequential.
*/
- result += DISK_SEEK_BASE_COST * n_full_trees *
- ceil(((double) key_size)*max_elements_in_tree / IO_SIZE);
- result += DISK_SEEK_BASE_COST * ceil(((double) key_size)*last_tree_elems / IO_SIZE);
+ disk_read_cost= DISK_READ_COST_THD(thd);
+ result += disk_read_cost * n_full_trees *
+ ceil(((double) key_size)*max_elements_in_tree / DISK_CHUNK_SIZE);
+ result += disk_read_cost * ceil(((double) key_size)*last_tree_elems / DISK_CHUNK_SIZE);
/* Cost of merge */
if (intersect_fl)
key_size+= sizeof(element_count);
- double merge_cost= get_merge_many_buffs_cost(buffer, (uint)n_full_trees,
+ double merge_cost= get_merge_many_buffs_cost(thd, buffer, (uint)n_full_trees,
(uint)max_elements_in_tree,
(uint)last_tree_elems, key_size,
compare_factor);
@@ -361,7 +365,8 @@ double Unique::get_use_cost(uint *buffer, size_t nkeys, uint key_size,
Add cost of reading the resulting sequence, assuming there were no
duplicate elements.
*/
- result += ceil((double)key_size*nkeys/IO_SIZE);
+ result+= (ceil((double)key_size*nkeys/IO_SIZE) *
+ default_optimizer_costs.disk_read_cost);
return result;
}
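
With the uniques.cc changes above, get_merge_buffers_cost() charges disk_read_cost per IO_SIZE block for the I/O term instead of counting raw blocks. The shape of the formula, with illustrative constants rather than the exact server values:

```cpp
#include <cassert>
#include <cmath>

// Sketch of get_merge_buffers_cost(): every element is read and written
// once (2 * bytes / IO_SIZE blocks, each charged disk_read_cost), plus
// total_elems * log2(n_buffers) key comparisons during the merge.
double merge_buffers_cost(double total_elems, double elem_size,
                          double n_buffers, double disk_read_cost,
                          double compare_factor)
{
  const double IO_SIZE= 4096.0;              // assumed block size
  return 2 * (total_elems * elem_size) / IO_SIZE * disk_read_cost +
         total_elems * std::log2(n_buffers) * compare_factor;
}
```

The second term matches the original code's `log(n_buffers) * compare_factor / M_LN2`, since log(n)/log(2) is log2(n).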
diff --git a/sql/uniques.h b/sql/uniques.h
index f4c45cde095..ecc49794efe 100644
--- a/sql/uniques.h
+++ b/sql/uniques.h
@@ -78,7 +78,7 @@ public:
return log((double) tree_elems) * compare_factor / M_LN2;
}
- static double get_use_cost(uint *buffer, size_t nkeys, uint key_size,
+ static double get_use_cost(THD *thd, uint *buffer, size_t nkeys, uint key_size,
size_t max_in_memory_size, double compare_factor,
bool intersect_fl, bool *in_memory);
inline static int get_cost_calc_buff_size(size_t nkeys, uint key_size,