author     Monty <monty@mariadb.org>  2021-11-09 14:45:53 +0200
committer  Sergei Krivonos <sergei.krivonos@mariadb.com>  2021-11-10 15:04:47 +0200
commit     aa9410206271df35e8b9bcc8b1c2769accc19171 (patch)
tree       df49882b217ea92edbeebc1bfb3e56cd2592c1ff
parent     38cf74dd29f7b1ae87cd665d37d225c54e594884 (diff)
download   mariadb-git-selectivity-10.7-rbz.tar.gz
Another fixup, to be combined with the previous commits (selectivity-10.7-rbz)
Fixed a lot of inconsistencies in optimizer cost calculation. The main
objective was to get cost calculations as similar (and accurate) as
possible, to make different plans more comparable.
- Replaced constant 2.0 with new define TABLE_SCAN_SETUP_COST.
- Added RECORD_COPY_COST, the cost of finding the next row and copying
  it to 'record' for table scans.
- Added INDEX_COPY_COST, the cost of finding the next key and copying it
  to 'record' for index scans.
- Added INDEX_NEXT_FIND_COST, the cost of finding the next index entry
  and checking it against a filter.
- Some scan cost estimates did not take TIME_FOR_COMPARE into account.
  Now all scan costs take this into account. (main.show_explain)
- Fixed that we don't calculate TIME_FOR_COMPARE twice for some plans,
  like in optimize_straight_join() and greedy_search().
- JOIN_TAB::scan_time() did not take index-only scans into account,
  which produced a wrong cost when an index scan was used. Fixed by
  adding support for covering keys. Also cached the calculated values to
  avoid future calls during the optimization phase.
- Fixed that most index cost calculations are done the same way and
  closer to the 'range' calculations. The effects of this change are:
  - The cost of an index scan is now lower than before, which causes
    some test results to change. (innodb.innodb, main.show_explain)
- Fixed that EQ_REF takes clustered and covered keys into account.
- Ensured that index_scan_cost() ==
  range(scan_of_all_rows_in_table_using_one_range) +
  MULTI_RANGE_READ_INFO_CONST. One effect of this is that if there is a
  choice between a full index scan and a range-index scan over almost
  the whole table, the index scan will be preferred (no range-read setup
  cost).
- Rowid filter setup cost and filter compare cost now take fetching and
  checking the rowid into account (INDEX_NEXT_FIND_COST).
  (main.partition_pruning heap.heap_btree main.log_state)
- Introduced ha_scan_time() that takes into account the CPU cost of
  finding the next row and copying it from the engine to 'record'. This
  causes the cost of table scans to increase slightly, and some tests to
  change their plan from ALL to RANGE or from ALL to ref.
  (innodb.innodb_mysql, main.select_pkeycache)
- Introduced ha_scan_and_compare_time(), which is like ha_scan_time()
  but also adds the cost of checking the WHERE clause
  (TIME_FOR_COMPARE).
- Cost of index scan was too low before, compared to anything else.
- Introduced ha_keyread_time(rows) that takes into account finding the
  next row and copying the key value to 'record' (INDEX_COPY_COST).
- Introduced ha_key_scan_time() for calculating an index scan over all
  rows.
- Added IDX_LOOKUP_COST to keyread_time() as a startup cost.
- Added index_only_fetch_cost() as a convenience function to OPT_RANGE.
- keyread_time() cost is slightly reduced to prefer shorter keys.
- All of the above caused some index_merge combinations to be rejected
  because of cost. (main.index_intersect)
- Added checking of the WHERE clause of the accepted rows to the ROR
  costs in get_best_ror_intersect().
- Fixed a bug in get_best_ror_intersect() where 'min_cost' was not
  updated, and the cost we compared with was not the one that was used.
- Removed '- 0.001' from 'join->best_read' and optimize_straight_join()
  to ensure that the 'Last_query_cost' status variable contains the same
  value as the one that was calculated by the optimizer.
- Extended get_range_limit_read_cost() to take cost_for_index_read()
  into consideration if there were no quick keys. This will reduce the
  computed cost for ORDER BY with LIMIT in some cases.
  (main.innodb_ext_key)
- Added INDEX_NEXT_FIND_COST to
  Range_rowid_filter_cost_info::lookup_cost to account for the time to
  find and check the next key value against the container.
- Changed JOIN_TAB::scan_time() to take clustered and covered keys into
  consideration. The values are now cached and we only have to call this
  function once; other calls are changed to use the cached values.
  Function renamed to JOIN_TAB::estimate_scan_time().

Other things:
- Added some 'if (thd->trace_started())' checks to speed up the code.
- Removed the unused function Cost_estimate::is_zero().
- Simplified the testing of HA_POS_ERROR in get_best_ror_intersect().
  (No cost changes)
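To make the relationships between the new cost helpers easier to follow, here is a minimal C++ sketch of how the sql-level wrappers compose the engine-level estimates. It is abridged from the sql/handler.h changes in the diff below; the constants (RECORD_COPY_COST, INDEX_COPY_COST, TIME_FOR_COMPARE) come from sql/sql_const.h:

    /* Engine-level estimate: raw cost of scanning the data file */
    virtual double scan_time();

    /* Table scan + copying each row to 'record' */
    inline double ha_scan_time(ha_rows records)
    {
      return scan_time() + (double) records * RECORD_COPY_COST;
    }

    /* Table scan + row copy + checking the WHERE clause for each row */
    inline double ha_scan_and_compare_time(ha_rows records)
    {
      return (scan_time() +
              (double) records * (RECORD_COPY_COST + 1/TIME_FOR_COMPARE));
    }

    /* Index read + copying each key value to 'record' */
    inline double ha_keyread_time(uint index, uint ranges, ha_rows rows)
    {
      return keyread_time(index, ranges, rows) + (double) rows * INDEX_COPY_COST;
    }

    /* Full index scan over all rows of the table */
    inline double ha_key_scan_time(uint index)
    {
      return ha_keyread_time(index, 1, records());
    }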
-rw-r--r--  mysql-test/include/last_query_cost.inc      5
-rw-r--r--  mysql-test/include/world.inc                 2
-rw-r--r--  mysql-test/main/costs.result                91
-rw-r--r--  mysql-test/main/costs.test                  75
-rw-r--r--  mysql-test/main/innodb_ext_key,off.rdiff    30
-rw-r--r--  mysql-test/main/innodb_ext_key.result       16
-rw-r--r--  mysql-test/main/join.result                 10
-rw-r--r--  sql/filesort.cc                              2
-rw-r--r--  sql/handler.cc                              18
-rw-r--r--  sql/handler.h                               64
-rw-r--r--  sql/multi_range_read.cc                     26
-rw-r--r--  sql/opt_range.cc                            69
-rw-r--r--  sql/opt_subselect.cc                         4
-rw-r--r--  sql/opt_subselect.h                          4
-rw-r--r--  sql/rowid_filter.cc                         11
-rw-r--r--  sql/sql_const.h                             13
-rw-r--r--  sql/sql_select.cc                          246
-rw-r--r--  sql/sql_select.h                            17
-rw-r--r--  sql/table.h                                 23
19 files changed, 500 insertions, 226 deletions
diff --git a/mysql-test/include/last_query_cost.inc b/mysql-test/include/last_query_cost.inc
new file mode 100644
index 00000000000..a18fd9e4c04
--- /dev/null
+++ b/mysql-test/include/last_query_cost.inc
@@ -0,0 +1,5 @@
+--disable_query_log
+--disable_column_names
+show status like 'last_query_cost';
+--enable_column_names
+--enable_query_log
diff --git a/mysql-test/include/world.inc b/mysql-test/include/world.inc
index 1e81c5c1aa7..a6f877ce0cd 100644
--- a/mysql-test/include/world.inc
+++ b/mysql-test/include/world.inc
@@ -4,6 +4,7 @@
# Table Country
+BEGIN;
INSERT IGNORE INTO Country VALUES
('AFG','Afghanistan',652090.00,22720000,1),
('NLD','Netherlands',41526.00,15864000,5),
@@ -5339,5 +5340,6 @@ INSERT INTO CountryLanguage VALUES
('CHN','Dong',0.2),
('RUS','Belorussian',0.3),
('USA','Portuguese',0.2);
+COMMIT;
ANALYZE TABLE Country, City, CountryLanguage;
diff --git a/mysql-test/main/costs.result b/mysql-test/main/costs.result
new file mode 100644
index 00000000000..11558419b37
--- /dev/null
+++ b/mysql-test/main/costs.result
@@ -0,0 +1,91 @@
+create table t1 (a int primary key, b int, c int, d int, e int, key ba (b,a), key bda (b,d,a), key cba (c,b,a), key cb (c,b), key d (d)) engine=aria;
+insert into t1 select seq,seq,seq,seq,seq from seq_1_to_10;
+insert into t1 values(20,2,2,2,2),(21,3,4,5,6);
+#
+# Get different scan costs
+#
+explain select sum(e) as "table_scan" from t1;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ALL NULL NULL NULL NULL 12
+Last_query_cost 5.700000
+explain select sum(a) as "index scan" from t1;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 index NULL PRIMARY 4 NULL 12 Using index
+Last_query_cost 2.677929
+#
+# Range scans should be used if we don't examine all rows in the table
+#
+explain select count(a) from t1;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Select tables optimized away
+Last_query_cost 0.000000
+explain select count(*) from t1 where a > 0;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 index PRIMARY PRIMARY 4 NULL 12 Using where; Using index
+Last_query_cost 2.677929
+explain select count(*) from t1 where a > 1;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 index PRIMARY PRIMARY 4 NULL 12 Using where; Using index
+Last_query_cost 2.677929
+explain select count(*) from t1 where a > 2;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 range PRIMARY PRIMARY 4 NULL 11 Using where; Using index
+Last_query_cost 2.485185
+#
+# Shorter indexes are preferred over longer indexes
+#
+explain select sum(a+b) from t1;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 index NULL ba 9 NULL 12 Using index
+Last_query_cost 2.679394
+explain select count(*) from t1 where b between 5 and 10;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 range ba,bda ba 5 NULL 6 Using where; Using index
+Last_query_cost 1.422197
+explain select sum(b+c) from t1 where b between 5 and 6 and c between 5 and 6;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 range ba,bda,cba,cb cb 10 NULL 2 Using where; Using index
+Last_query_cost 0.570781
+# Cost of 'd' should be slightly smaller as key 'ba' is longer than 'd'
+explain select count(*) from t1 where b > 6;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 range ba,bda ba 5 NULL 5 Using where; Using index
+Last_query_cost 1.209331
+explain select count(*) from t1 where d > 6;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 range d d 5 NULL 5 Using where; Using index
+Last_query_cost 1.208843
+#
+# Check covering index usage
+#
+explain select a,b,c from t1 where a=b;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 index NULL cba 14 NULL 12 Using where; Using index
+Last_query_cost 2.680859
+#
+# Prefer ref keys over ranges
+#
+explain select count(*) from t1 where b=2;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ref ba,bda ba 5 const 2 Using index
+Last_query_cost 0.550732
+explain select count(*) from t1 where b=2 and c=2;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 ref ba,bda,cba,cb cb 10 const,const 2 Using index
+Last_query_cost 0.550781
+explain select count(*) from t1 where b=3 and c between 3 and 4;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 range ba,bda,cba,cb cb 10 NULL 2 Using where; Using index
+Last_query_cost 0.570781
+#
+# Prefer eq keys over ref keys
+#
+explain select a,b,e from t1 where a=10 or a=11;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 range PRIMARY PRIMARY 4 NULL 2 Using index condition
+Last_query_cost 2.670488
+explain select a,b,e from t1 where d=10 or d=11;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 range d d 5 NULL 2 Using index condition
+Last_query_cost 2.670537
+drop table t1;
diff --git a/mysql-test/main/costs.test b/mysql-test/main/costs.test
new file mode 100644
index 00000000000..6dcc41b57a3
--- /dev/null
+++ b/mysql-test/main/costs.test
@@ -0,0 +1,75 @@
+#
+# Test of cost calculations. This test uses the Aria engine as the cost
+# calculations are stable for it.
+#
+--source include/have_sequence.inc
+
+create table t1 (a int primary key, b int, c int, d int, e int, key ba (b,a), key bda (b,d,a), key cba (c,b,a), key cb (c,b), key d (d)) engine=aria;
+insert into t1 select seq,seq,seq,seq,seq from seq_1_to_10;
+insert into t1 values(20,2,2,2,2),(21,3,4,5,6);
+
+--echo #
+--echo # Get different scan costs
+--echo #
+
+explain select sum(e) as "table_scan" from t1;
+--source include/last_query_cost.inc
+explain select sum(a) as "index scan" from t1;
+--source include/last_query_cost.inc
+
+--echo #
+--echo # Range scans should be used if we don't examine all rows in the table
+--echo #
+explain select count(a) from t1;
+--source include/last_query_cost.inc
+explain select count(*) from t1 where a > 0;
+--source include/last_query_cost.inc
+explain select count(*) from t1 where a > 1;
+--source include/last_query_cost.inc
+explain select count(*) from t1 where a > 2;
+--source include/last_query_cost.inc
+
+--echo #
+--echo # Shorter indexes are preferred over longer indexes
+--echo #
+explain select sum(a+b) from t1;
+--source include/last_query_cost.inc
+explain select count(*) from t1 where b between 5 and 10;
+--source include/last_query_cost.inc
+explain select sum(b+c) from t1 where b between 5 and 6 and c between 5 and 6;
+--source include/last_query_cost.inc
+
+--echo # Cost of 'd' should be slightly smaller as key 'ba' is longer than 'd'
+explain select count(*) from t1 where b > 6;
+--source include/last_query_cost.inc
+explain select count(*) from t1 where d > 6;
+--source include/last_query_cost.inc
+
+
+--echo #
+--echo # Check covering index usage
+--echo #
+explain select a,b,c from t1 where a=b;
+--source include/last_query_cost.inc
+
+--echo #
+--echo # Prefer ref keys over ranges
+--echo #
+
+explain select count(*) from t1 where b=2;
+--source include/last_query_cost.inc
+explain select count(*) from t1 where b=2 and c=2;
+--source include/last_query_cost.inc
+explain select count(*) from t1 where b=3 and c between 3 and 4;
+--source include/last_query_cost.inc
+
+--echo #
+--echo # Prefer eq keys over ref keys
+--echo #
+
+explain select a,b,e from t1 where a=10 or a=11;
+--source include/last_query_cost.inc
+explain select a,b,e from t1 where d=10 or d=11;
+--source include/last_query_cost.inc
+
+drop table t1;
diff --git a/mysql-test/main/innodb_ext_key,off.rdiff b/mysql-test/main/innodb_ext_key,off.rdiff
index ff3469082b9..f363a2f856b 100644
--- a/mysql-test/main/innodb_ext_key,off.rdiff
+++ b/mysql-test/main/innodb_ext_key,off.rdiff
@@ -1,19 +1,19 @@
---- main/innodb_ext_key.result 2021-11-02 14:21:53.618835348 +0200
-+++ main/innodb_ext_key,off.reject 2021-11-02 14:21:43.358869061 +0200
+--- main/innodb_ext_key.result 2021-11-08 20:14:49.411376232 +0200
++++ /home/my/maria-10.7/mysql-test/main/innodb_ext_key,off.reject 2021-11-08 20:19:02.426517119 +0200
@@ -9,7 +9,7 @@
explain
select count(*) from lineitem where l_orderkey=130 and l_shipdate='1992-07-01';
id select_type table type possible_keys key key_len ref rows Extra
--1 SIMPLE lineitem ref PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity i_l_shipdate 8 const,const 1 Using index
-+1 SIMPLE lineitem ref PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity i_l_shipdate 4 const 6 Using where; Using index
+-1 SIMPLE lineitem ref|filter PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity i_l_orderkey|i_l_shipdate 4|8 const 5 (0%) Using where; Using rowid filter
++1 SIMPLE lineitem index_merge PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity i_l_orderkey,i_l_shipdate 4,4 NULL 1 Using intersect(i_l_orderkey,i_l_shipdate); Using where; Using index
flush status;
select count(*) from lineitem where l_orderkey=130 and l_shipdate='1992-07-01';
count(*)
@@ -19,7 +19,7 @@
Handler_read_first 0
- Handler_read_key 1
+ Handler_read_key 2
Handler_read_last 0
--Handler_read_next 1
+-Handler_read_next 2
+Handler_read_next 6
Handler_read_prev 0
Handler_read_retry 0
@@ -95,13 +95,16 @@
where l_shipdate='1992-07-01' and l_orderkey=130;
id select_type table type possible_keys key key_len ref rows Extra
-1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Select tables optimized away
-+1 SIMPLE lineitem ref PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity i_l_shipdate 4 const 6 Using where; Using index
++1 SIMPLE lineitem index_merge PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity i_l_orderkey,i_l_shipdate 4,4 NULL 1 Using intersect(i_l_orderkey,i_l_shipdate); Using where; Using index
flush status;
select max(l_linenumber) from lineitem
where l_shipdate='1992-07-01' and l_orderkey=130;
-@@ -145,7 +145,7 @@
+@@ -143,9 +143,9 @@
+ show status like 'handler_read%';
+ Variable_name Value
Handler_read_first 0
- Handler_read_key 1
+-Handler_read_key 1
++Handler_read_key 2
Handler_read_last 0
-Handler_read_next 0
+Handler_read_next 6
@@ -265,7 +268,7 @@
drop table t0,t1,t2;
#
# MDEV-11196: Error:Run-Time Check Failure #2 - Stack around the variable 'key_buff'
-@@ -766,11 +766,12 @@
+@@ -766,13 +766,14 @@
"select_id": 1,
"table": {
"table_name": "t1",
@@ -279,8 +282,11 @@
+ "used_key_parts": ["f2"],
+ "ref": ["const"],
"rows": 1,
- "filtered": 100,
+- "filtered": 50,
++ "filtered": 100,
"index_condition": "t1.pk1 <= 5 and t1.pk2 <= 5 and t1.f2 = 'abc'",
+ "attached_condition": "t1.f1 <= '3'"
+ }
@@ -799,8 +800,8 @@
"access_type": "range",
"possible_keys": ["k1"],
@@ -290,5 +296,5 @@
+ "key_length": "3007",
+ "used_key_parts": ["pk1", "f2"],
"rows": 1,
- "filtered": 100,
+ "filtered": 50,
"index_condition": "t1.f2 <= 5 and t1.pk2 <= 5 and t1.pk1 = 'abc'",
diff --git a/mysql-test/main/innodb_ext_key.result b/mysql-test/main/innodb_ext_key.result
index 27c2299c2ec..a342b9cb78d 100644
--- a/mysql-test/main/innodb_ext_key.result
+++ b/mysql-test/main/innodb_ext_key.result
@@ -9,7 +9,7 @@ use dbt3_s001;
explain
select count(*) from lineitem where l_orderkey=130 and l_shipdate='1992-07-01';
id select_type table type possible_keys key key_len ref rows Extra
-1 SIMPLE lineitem ref PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity i_l_shipdate 8 const,const 1 Using index
+1 SIMPLE lineitem ref|filter PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity i_l_orderkey|i_l_shipdate 4|8 const 5 (0%) Using where; Using rowid filter
flush status;
select count(*) from lineitem where l_orderkey=130 and l_shipdate='1992-07-01';
count(*)
@@ -17,9 +17,9 @@ count(*)
show status like 'handler_read%';
Variable_name Value
Handler_read_first 0
-Handler_read_key 1
+Handler_read_key 2
Handler_read_last 0
-Handler_read_next 1
+Handler_read_next 2
Handler_read_prev 0
Handler_read_retry 0
Handler_read_rnd 0
@@ -385,8 +385,8 @@ SELECT a FROM t1 AS t, t2
WHERE c = a AND b IN (SELECT b FROM t1, t2 WHERE b = t.b);
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t index a,b b 7 NULL 10 Using index
-1 PRIMARY t1 ref b b 3 test.t.b 2 Using index; Start temporary
-1 PRIMARY t2 index NULL PRIMARY 4 NULL 11 Using index; End temporary; Using join buffer (flat, BNL join)
+1 PRIMARY t1 ref b b 3 test.t.b 2 Using index
+1 PRIMARY t2 index NULL PRIMARY 4 NULL 11 Using index; FirstMatch(t)
1 PRIMARY t2 eq_ref PRIMARY PRIMARY 4 test.t.a 1 Using index
SELECT a FROM t1 AS t, t2
WHERE c = a AND b IN (SELECT b FROM t1, t2 WHERE b = t.b);
@@ -645,7 +645,7 @@ a
2
explain select a from t2 where b is null order by a desc limit 2;
id select_type table type possible_keys key key_len ref rows Extra
-1 SIMPLE t2 range b b 9 NULL 3 Using where; Using filesort
+1 SIMPLE t2 index b PRIMARY 8 NULL 2 Using where
select a from t2 where b is null order by a desc limit 2;
a
3
@@ -772,7 +772,7 @@ EXPLAIN
"key_length": "3070",
"used_key_parts": ["f2", "pk1"],
"rows": 1,
- "filtered": 100,
+ "filtered": 50,
"index_condition": "t1.pk1 <= 5 and t1.pk2 <= 5 and t1.f2 = 'abc'",
"attached_condition": "t1.f1 <= '3'"
}
@@ -802,7 +802,7 @@ EXPLAIN
"key_length": "3011",
"used_key_parts": ["pk1", "f2", "pk2"],
"rows": 1,
- "filtered": 100,
+ "filtered": 50,
"index_condition": "t1.f2 <= 5 and t1.pk2 <= 5 and t1.pk1 = 'abc'",
"attached_condition": "t1.f1 <= '3'"
}
diff --git a/mysql-test/main/join.result b/mysql-test/main/join.result
index 6ea42ea6588..04fb014ac43 100644
--- a/mysql-test/main/join.result
+++ b/mysql-test/main/join.result
@@ -892,7 +892,7 @@ id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t1 ALL NULL NULL NULL NULL 10
show status like '%cost%';
Variable_name Value
-Last_query_cost 6.016090
+Last_query_cost 7.016090
select 'The cost of accessing t1 (dont care if it changes' '^';
The cost of accessing t1 (dont care if it changes
The cost of accessing t1 (dont care if it changes^
@@ -906,7 +906,7 @@ id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE B eq_ref PRIMARY PRIMARY 4 test.A.b 1
show status like '%cost%';
Variable_name Value
-Last_query_cost 34.016090
+Last_query_cost 35.016090
select '^^: The above should be ~= 20 + cost(select * from t1). Value less than 20 is an error' Z;
Z
^^: The above should be ~= 20 + cost(select * from t1). Value less than 20 is an error
@@ -1323,9 +1323,9 @@ FROM t4 JOIN
(t1 JOIN t3 ON t3.ref_t1=t1.c1 JOIN t2 ON t2.ref_t1=t1.c1)
ON t4.ref_t1=t1.c1;
id select_type table type possible_keys key key_len ref rows Extra
-1 SIMPLE t4 ALL NULL NULL NULL NULL 4
-1 SIMPLE t1 ALL PRIMARY NULL NULL NULL 3 Using where; Using join buffer (flat, BNL join)
-1 SIMPLE t3 ALL NULL NULL NULL NULL 2 Using where; Using join buffer (incremental, BNL join)
+1 SIMPLE t4 ALL NULL NULL NULL NULL 4 Using where
+1 SIMPLE t1 eq_ref PRIMARY PRIMARY 4 test.t4.ref_t1 1
+1 SIMPLE t3 ALL NULL NULL NULL NULL 2 Using where; Using join buffer (flat, BNL join)
1 SIMPLE t2 ALL NULL NULL NULL NULL 1 Using where; Using join buffer (incremental, BNL join)
EXPLAIN
SELECT *
diff --git a/sql/filesort.cc b/sql/filesort.cc
index 07194cb2e4f..fabe185ecfd 100644
--- a/sql/filesort.cc
+++ b/sql/filesort.cc
@@ -1601,7 +1601,7 @@ static bool check_if_pq_applicable(Sort_param *param,
(PQ_slowness * num_rows + param->max_keys_per_buffer) *
log((double) param->max_keys_per_buffer) / TIME_FOR_COMPARE_ROWID;
const double pq_io_cost=
- param->max_rows * table->file->scan_time() / 2.0;
+ param->max_rows * table->file->ha_scan_time(num_rows) / 2.0;
const double pq_cost= pq_cpu_cost + pq_io_cost;
if (sort_merge_cost < pq_cost)
diff --git a/sql/handler.cc b/sql/handler.cc
index 4d53c3e8424..2427ecfcefe 100644
--- a/sql/handler.cc
+++ b/sql/handler.cc
@@ -3208,6 +3208,9 @@ LEX_CSTRING *handler::engine_name()
by index 'keyno' of one range containing 'rows' key entries.
If ranges == 0 then the function returns only the cost of copying
those key entries into the engine buffers.
+
+ This function doesn't take into account copying the key to record
+ (INDEX_COPY_COST) or comparing the key to the where clause (TIME_FOR_COMPARE).
*/
double handler::keyread_time(uint index, uint ranges, ha_rows rows)
@@ -3220,8 +3223,19 @@ double handler::keyread_time(uint index, uint ranges, ha_rows rows)
if (ranges)
{
uint keys_per_block= (uint) (stats.block_size*3/4/len+1);
- ulonglong blocks= (rows+ keys_per_block- 1)/keys_per_block;
- cost+= blocks;
+ /*
+ We let the cost grow slowly in proportion to the number of rows to
+ promote indexes with fewer rows.
+ We do not calculate the exact number of block reads as then index
+ only reads will be more costly than normal reads, especially
+ compared to InnoDB clustered keys.
+
+ IDX_LOOKUP_COST is the cost of finding the first key in the
+ range. Finding the next one is usually a fast operation so we
+ don't count it here; it is taken into account in
+ ha_keyread_time().
+ */
+ cost+= (double) (rows / keys_per_block) + IDX_LOOKUP_COST;
}
return cost;
}
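As a rough numeric illustration of the block-count formula above (the block size, key entry length and row count below are hypothetical, not values from the patch):

    #include <cstdio>

    int main()
    {
      const double IDX_LOOKUP_COST= 1.0 / 8;   /* from sql/sql_const.h */
      unsigned block_size= 8192, len= 20;      /* hypothetical engine values */
      unsigned long long rows= 10000;          /* hypothetical row count */
      unsigned keys_per_block= block_size * 3 / 4 / len + 1;   /* = 308 */
      /* Cost grows slowly with the number of rows instead of charging
         one full read per index block */
      double cost= (double) (rows / keys_per_block) + IDX_LOOKUP_COST;
      printf("keys_per_block=%u cost=%.3f\n", keys_per_block, cost); /* 32.125 */
      return 0;
    }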
diff --git a/sql/handler.h b/sql/handler.h
index 544a2a2f21e..72995db0e18 100644
--- a/sql/handler.h
+++ b/sql/handler.h
@@ -2890,17 +2890,6 @@ public:
CPU_COEFF*idx_cpu_cost;
}
- /**
- Whether or not all costs in the object are zero
-
- @return true if all costs are zero, false otherwise
- */
- bool is_zero() const
- {
- return io_count == 0.0 && idx_io_count == 0.0 && cpu_cost == 0.0 &&
- import_cost == 0.0 && mem_cost == 0.0 && comp_cost;
- }
-
void reset()
{
avg_io_cost= 1.0;
@@ -3634,29 +3623,34 @@ public:
}
/*
- Time for a full table scan + fetching the rows from the table
- The +1 here is to ensure that we prefer unique keys over table scans
- for small tables
+ Time for a full table data scan. To be overridden by engines; should not
+ be used by the sql level.
*/
- double ha_scan_time()
+ virtual double scan_time()
{
- return scan_time() + (double) stats.records * RECORD_COPY_COST + 1;
+ return ((ulonglong2double(stats.data_file_length) / stats.block_size + 2) *
+ avg_io_cost());
}
/*
- Time for a full table scan of data file
+ Time for a full table scan + fetching the rows from the table
*/
- virtual double scan_time()
+ inline double ha_scan_time(ha_rows records)
{
- return ((ulonglong2double(stats.data_file_length) / stats.block_size + 2) *
- avg_io_cost());
+ return scan_time() + (double) records * RECORD_COPY_COST;
}
- virtual double key_scan_time(uint index)
+ /*
+ Time for a full table scan, fetching the rows from the table and comparing
+ the row with the where clause
+ */
+ inline double ha_scan_and_compare_time(ha_rows records)
{
- return keyread_time(index, 1, records());
+ return (scan_time() +
+ (double) records * (RECORD_COPY_COST + 1/TIME_FOR_COMPARE));
}
+
virtual double avg_io_cost()
{
return 1.0;
@@ -3677,6 +3671,7 @@ public:
virtual double read_time(uint index, uint ranges, ha_rows rows)
{ return rows2double(ranges+rows); }
+
/**
Calculate cost of 'keyread' scan for given index and number of records.
@@ -3686,6 +3681,31 @@ public:
*/
virtual double keyread_time(uint index, uint ranges, ha_rows rows);
+ /*
+ Calculate cost of 'keyread' scan for given index and number of records
+ including fetching the key to the 'record' buffer.
+ */
+
+ inline double ha_keyread_time(uint index, uint ranges, ha_rows rows)
+ {
+ return keyread_time(index, ranges, rows) + (double) rows * INDEX_COPY_COST;
+ }
+
+ /* Cost of doing a full index scan */
+ inline double ha_key_scan_time(uint index)
+ {
+ return ha_keyread_time(index, 1, records());
+ }
+
+ /*
+ Time for a full table index scan (without copy or compare cost).
+ To be overridden by engines; the sql level should use ha_key_scan_time().
+ */
+ virtual double key_scan_time(uint index)
+ {
+ return keyread_time(index, 1, records());
+ }
+
virtual const key_map *keys_to_use_for_scanning() { return &key_map_empty; }
/*
diff --git a/sql/multi_range_read.cc b/sql/multi_range_read.cc
index 3f8f2aea742..3409d5aa585 100644
--- a/sql/multi_range_read.cc
+++ b/sql/multi_range_read.cc
@@ -310,10 +310,18 @@ handler::multi_range_read_info_const(uint keyno, RANGE_SEQ_IF *seq,
if (!is_clustering_key(keyno))
{
cost->idx_io_count= (double) io_blocks;
- cost->idx_cpu_cost= (keyread_time(keyno, 0, total_rows) +
- n_ranges * IDX_LOOKUP_COST);
if (!(*flags & HA_MRR_INDEX_ONLY))
+ {
+ cost->idx_cpu_cost= (keyread_time(keyno, 1, total_rows) +
+ (n_ranges-1) * IDX_LOOKUP_COST);
cost->cpu_cost= read_time(keyno, 0, total_rows);
+ }
+ else
+ {
+ /* Index only read */
+ cost->idx_cpu_cost= (ha_keyread_time(keyno, 1, total_rows) +
+ (n_ranges-1) * IDX_LOOKUP_COST);
+ }
}
else
{
@@ -399,12 +407,20 @@ ha_rows handler::multi_range_read_info(uint keyno, uint n_ranges, uint n_rows,
index leaf blocks we have to read for finding n_rows.
*/
cost->idx_io_count= n_ranges;
- cost->idx_cpu_cost= (keyread_time(keyno, 0, n_rows) +
- n_ranges * IDX_LOOKUP_COST);
if (!(*flags & HA_MRR_INDEX_ONLY))
{
+ cost->idx_cpu_cost= (keyread_time(keyno, 1, n_rows) +
+ (n_ranges-1) * IDX_LOOKUP_COST);
cost->cpu_cost= read_time(keyno, 0, n_rows);
}
+ else
+ {
+ /*
+ Same as above, but taking into account copying the key to the upper level.
+ */
+ cost->idx_cpu_cost= (ha_keyread_time(keyno, 1, n_rows) +
+ (n_ranges-1) * IDX_LOOKUP_COST);
+ }
}
else
{
@@ -2012,7 +2028,7 @@ bool DsMrr_impl::get_disk_sweep_mrr_cost(uint keynr, ha_rows rows, uint flags,
cost->mem_cost= (double)rows_in_last_step * elem_size;
/* Total cost of all index accesses */
- index_read_cost= primary_file->keyread_time(keynr, 1, rows);
+ index_read_cost= primary_file->ha_keyread_time(keynr, 1, rows);
cost->add_io(index_read_cost, 1 /* Random seeks */);
return FALSE;
}
diff --git a/sql/opt_range.cc b/sql/opt_range.cc
index 6732f53eb28..1eb199da103 100644
--- a/sql/opt_range.cc
+++ b/sql/opt_range.cc
@@ -2675,7 +2675,6 @@ int SQL_SELECT::test_quick_select(THD *thd, key_map keys_to_use,
bool only_single_index_range_scan)
{
uint idx;
- double scan_time;
Item *notnull_cond= NULL;
TABLE_READ_PLAN *best_trp= NULL;
SEL_ARG **backup_keys= 0;
@@ -2701,20 +2700,25 @@ int SQL_SELECT::test_quick_select(THD *thd, key_map keys_to_use,
notnull_cond= head->notnull_cond;
if (head->force_index || force_quick_range)
- scan_time= read_time= DBL_MAX;
+ read_time= DBL_MAX;
else
{
- scan_time= rows2double(records) / TIME_FOR_COMPARE;
+ double cmp_time= rows2double(records) / TIME_FOR_COMPARE;
/*
- The 2 is there to prefer range scans to full table scans.
- This is mainly to make the test suite happy as many tests has
- very few rows. In real life tables has more than a few rows and the
- +2 has no practical effect.
+ The TABLE_SCAN_SETUP_COST is there to prefer range scans to full
+ table scans. This is mainly to make the test suite happy as
+ many tests have very few rows. In real life tables have more than
+ a few rows and the extra cost has no practical effect.
*/
- read_time= (double) head->file->scan_time() + scan_time + 2;
- if (limit < records && read_time < (double) records + scan_time + 1 )
+ read_time= ((double) head->file->ha_scan_time(records) + cmp_time +
+ TABLE_SCAN_SETUP_COST);
+ /*
+ The test for read_time is there only to avoid modifying read_time if
+ ha_scan_time() returned a really big value
+ */
+ if (limit < records && read_time < (double) records + cmp_time + 1)
{
- read_time= (double) records + scan_time + 1; // Force to use index
+ read_time= (double) records + cmp_time + 1; // Force to use index
notnull_cond= NULL;
}
}
@@ -2854,7 +2858,7 @@ int SQL_SELECT::test_quick_select(THD *thd, key_map keys_to_use,
if (!force_quick_range && !head->covering_keys.is_clear_all())
{
int key_for_use= find_shortest_key(head, &head->covering_keys);
- double key_read_time= (head->file->key_scan_time(key_for_use) +
+ double key_read_time= (head->file->ha_key_scan_time(key_for_use) +
rows2double(records) / TIME_FOR_COMPARE);
DBUG_PRINT("info", ("'all'+'using index' scan will be using key %d, "
"read time %g", key_for_use, key_read_time));
@@ -5878,7 +5882,7 @@ bool prepare_search_best_index_intersect(PARAM *param,
continue;
}
- cost= table->opt_range[(*index_scan)->keynr].index_only_cost;
+ cost= table->opt_range[(*index_scan)->keynr].index_only_fetch_cost();
idx_scan.add("cost", cost);
@@ -6625,7 +6629,7 @@ ROR_SCAN_INFO *make_ror_scan(const PARAM *param, int idx, SEL_ARG *sel_arg)
bitmap_set_bit(&ror_scan->covered_fields, key_part->fieldnr-1);
}
ror_scan->index_read_cost=
- param->table->file->keyread_time(ror_scan->keynr, 1, ror_scan->records);
+ param->table->file->ha_keyread_time(ror_scan->keynr, 1, ror_scan->records);
DBUG_RETURN(ror_scan);
}
@@ -6986,8 +6990,8 @@ static bool ror_intersect_add(ROR_INTERSECT_INFO *info,
each record of every scan. Assuming 1/TIME_FOR_COMPARE_ROWID
per check this gives us:
*/
- const double idx_cost= rows2double(info->index_records) /
- TIME_FOR_COMPARE_ROWID;
+ const double idx_cost= (rows2double(info->index_records) /
+ TIME_FOR_COMPARE_ROWID);
info->index_scan_costs+= idx_cost;
trace_costs->add("index_scan_cost", idx_cost);
}
@@ -7098,6 +7102,7 @@ TRP_ROR_INTERSECT *get_best_ror_intersect(const PARAM *param, SEL_TREE *tree,
uint idx;
double min_cost= DBL_MAX;
THD *thd= param->thd;
+ ha_rows best_rows;
DBUG_ENTER("get_best_ror_intersect");
DBUG_PRINT("enter", ("opt_range_condition_rows: %llu cond_selectivity: %g",
(ulonglong) param->table->opt_range_condition_rows,
@@ -7215,7 +7220,7 @@ TRP_ROR_INTERSECT *get_best_ror_intersect(const PARAM *param, SEL_TREE *tree,
/* Local minimum found, save it */
ror_intersect_cpy(intersect_best, intersect);
intersect_scans_best= intersect_scans_end;
- min_cost = intersect->total_cost;
+ min_cost= intersect->total_cost;
trace_idx.add("chosen", true);
}
else
@@ -7254,6 +7259,7 @@ TRP_ROR_INTERSECT *get_best_ror_intersect(const PARAM *param, SEL_TREE *tree,
if (ror_intersect_add(intersect, cpk_scan, &trace_cpk, TRUE) &&
(intersect->total_cost < min_cost))
{
+ min_cost= intersect->total_cost;
trace_cpk.add("clustered_pk_scan_added_to_intersect", true)
.add("cumulated_cost", intersect->total_cost);
intersect_best= intersect; //just set pointer here
@@ -7274,6 +7280,14 @@ TRP_ROR_INTERSECT *get_best_ror_intersect(const PARAM *param, SEL_TREE *tree,
}
trace_cpk.end();
+ /*
+ Adjust row count and add the cost of comparing the final rows to the
+ WHERE clause
+ */
+ best_rows= double2rows(intersect_best->out_rows);
+ set_if_bigger(best_rows, 1);
+ min_cost+= best_rows/TIME_FOR_COMPARE;
+
/* Ok, return ROR-intersect plan if we have found one */
TRP_ROR_INTERSECT *trp= NULL;
if (min_cost < read_time && (cpk_scan || best_num > 1))
@@ -7287,11 +7301,7 @@ TRP_ROR_INTERSECT *get_best_ror_intersect(const PARAM *param, SEL_TREE *tree,
memcpy(trp->first_scan, intersect_scans, best_num*sizeof(ROR_SCAN_INFO*));
trp->last_scan= trp->first_scan + best_num;
trp->is_covering= intersect_best->is_covering;
- trp->read_cost= intersect_best->total_cost;
- /* Prevent divisons by zero */
- ha_rows best_rows = double2rows(intersect_best->out_rows);
- if (!best_rows)
- best_rows= 1;
+ trp->read_cost= min_cost;
set_if_smaller(param->table->opt_range_condition_rows, best_rows);
trp->records= best_rows;
trp->index_scan_costs= intersect_best->index_scan_costs;
@@ -7548,19 +7558,19 @@ static TRP_RANGE *get_key_scans_params(PARAM *param, SEL_TREE *tree,
for_range_access, &mrr_flags,
&buf_size, &cost, &is_ror_scan);
- if (!for_range_access && !is_ror_scan &&
- !optimizer_flag(param->thd,OPTIMIZER_SWITCH_INDEX_MERGE_SORT_UNION))
+ if (found_records == HA_POS_ERROR ||
+ (!for_range_access && !is_ror_scan &&
+ !optimizer_flag(param->thd,OPTIMIZER_SWITCH_INDEX_MERGE_SORT_UNION)))
{
/* The scan is not a ROR-scan, just skip it */
continue;
}
-
- if (found_records != HA_POS_ERROR && tree->index_scans &&
+ found_read_time= cost.total_cost();
+ if (tree->index_scans &&
(index_scan= (INDEX_SCAN_INFO *)alloc_root(param->mem_root,
sizeof(INDEX_SCAN_INFO))))
{
Json_writer_array trace_range(thd, "ranges");
-
const KEY &cur_key= param->table->key_info[keynr];
const KEY_PART_INFO *key_part= cur_key.key_part;
@@ -7581,15 +7591,14 @@ static TRP_RANGE *get_key_scans_params(PARAM *param, SEL_TREE *tree,
.add("using_mrr", !(mrr_flags & HA_MRR_USE_DEFAULT_IMPL))
.add("index_only", read_index_only)
.add("rows", found_records)
- .add("cost", cost.total_cost());
+ .add("cost", found_read_time);
}
- if ((found_records != HA_POS_ERROR) && is_ror_scan)
+ if (is_ror_scan)
{
tree->n_ror_scans++;
tree->ror_scans_map.set_bit(idx);
}
- if (found_records != HA_POS_ERROR &&
- read_time > (found_read_time= cost.total_cost()))
+ if (read_time > found_read_time)
{
read_time= found_read_time;
best_records= found_records;
diff --git a/sql/opt_subselect.cc b/sql/opt_subselect.cc
index d91557c5be2..473a71e20f4 100644
--- a/sql/opt_subselect.cc
+++ b/sql/opt_subselect.cc
@@ -1435,9 +1435,9 @@ void get_delayed_table_estimates(TABLE *table,
/* Calculate cost of scanning the temptable */
double data_size= COST_MULT(item->jtbm_record_count,
hash_sj_engine->tmp_table->s->reclength);
- /* Do like in handler::scan_time() */
+ /* Do like in handler::ha_scan_time() */
*scan_time= ((data_size/table->file->stats.block_size+2) *
- table->file->avg_io_cost());
+ table->file->avg_io_cost()) + *out_rows * RECORD_COPY_COST;
}
diff --git a/sql/opt_subselect.h b/sql/opt_subselect.h
index abd37f1e98e..e246707ec29 100644
--- a/sql/opt_subselect.h
+++ b/sql/opt_subselect.h
@@ -233,8 +233,8 @@ public:
double records= rows2double(s->table->file->stats.records);
/* The cost is entire index scan cost (divided by 2) */
- double read_time= s->table->file->keyread_time(key, 1,
- (ha_rows) records);
+ double read_time= s->table->file->ha_keyread_time(key, 1,
+ (ha_rows) records);
/*
Now find out how many different keys we will get (for now we
diff --git a/sql/rowid_filter.cc b/sql/rowid_filter.cc
index 7f9179331c9..47ec943630f 100644
--- a/sql/rowid_filter.cc
+++ b/sql/rowid_filter.cc
@@ -22,14 +22,17 @@
#include "sql_select.h"
#include "opt_trace.h"
+/*
+ INDEX_NEXT_FIND_COST below is the cost of finding the next possible key
+ and calling handler_rowid_filter_check() to check it against the filter
+*/
double Range_rowid_filter_cost_info::
lookup_cost(Rowid_filter_container_type cont_type)
{
switch (cont_type) {
case SORTED_ARRAY_CONTAINER:
- /* The addition is here to take care of arrays with 1 element */
- return log(est_elements)*0.01+0.001;
+ return log(est_elements)*0.01+INDEX_NEXT_FIND_COST;
default:
DBUG_ASSERT(0);
return 0;
@@ -139,10 +142,10 @@ void Range_rowid_filter_cost_info::init(Rowid_filter_container_type cont_type,
double
Range_rowid_filter_cost_info::build_cost(Rowid_filter_container_type cont_type)
{
- double cost= 0;
+ double cost;
DBUG_ASSERT(table->opt_range_keys.is_set(key_no));
- cost+= table->opt_range[key_no].index_only_cost;
+ cost= table->opt_range[key_no].index_only_fetch_cost();
switch (cont_type) {
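For a sense of scale, a small sketch of the sorted-array lookup cost above; est_elements is a hypothetical filter size:

    #include <cmath>
    #include <cstdio>

    int main()
    {
      const double INDEX_NEXT_FIND_COST= 1.0 / 100.0; /* from sql/sql_const.h */
      double est_elements= 1000;                      /* hypothetical */
      /* Per-probe cost: binary search in the sorted array plus finding and
         checking the next index entry (INDEX_NEXT_FIND_COST) */
      double cost= std::log(est_elements) * 0.01 + INDEX_NEXT_FIND_COST;
      printf("lookup_cost=%.4f\n", cost);             /* ~0.0791 */
      return 0;
    }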
diff --git a/sql/sql_const.h b/sql/sql_const.h
index b7105d323df..ab6afd54d95 100644
--- a/sql/sql_const.h
+++ b/sql/sql_const.h
@@ -208,8 +208,21 @@
#define TIME_FOR_COMPARE_IDX 20.0
#define IDX_BLOCK_COPY_COST ((double) 1 / TIME_FOR_COMPARE)
+/* Time for finding the first key in a key scan */
#define IDX_LOOKUP_COST ((double) 1 / 8)
#define MULTI_RANGE_READ_SETUP_COST (IDX_BLOCK_COPY_COST/10)
+/* Extra cost for full table scan. Used to prefer range scans over table scans */
+#define TABLE_SCAN_SETUP_COST 2.0
+/* cost of finding the next row and copying it to record for table scans */
+#define RECORD_COPY_COST ((double) 1.0 / 40.0)
+/* cost of finding the next key and copying it to record for index scans */
+#define INDEX_COPY_COST ((double) 1.0 / 80.0)
+/*
+ Cost of finding the next index entry and checking it against the filter.
+ This cost is very low as it's done inside the storage engine.
+ */
+#define INDEX_NEXT_FIND_COST ((double) 1.0 / 100.0)
+
/*
The lower bound of accepted rows when using filter.
This is used to ensure that filters are not too agressive.
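Taken together, these constants express per-row CPU work in the same units as block reads. A minimal sketch of a full table-scan estimate using them; TIME_FOR_COMPARE is assumed here to be 5.0 (its definition sits just above this hunk in sql_const.h), and the engine scan time and row count are made up:

    #include <cstdio>

    int main()
    {
      const double TIME_FOR_COMPARE= 5.0;       /* assumed value */
      const double RECORD_COPY_COST= 1.0 / 40.0;
      const double TABLE_SCAN_SETUP_COST= 2.0;
      double engine_scan_time= 3.0;             /* hypothetical engine estimate */
      unsigned long long records= 1000;         /* hypothetical row count */

      /* ha_scan_and_compare_time(): scan + copy each row + check the WHERE
         clause for each row */
      double scan= engine_scan_time +
                   records * (RECORD_COPY_COST + 1 / TIME_FOR_COMPARE);
      /* test_quick_select() adds TABLE_SCAN_SETUP_COST on top, so that
         range scans win on very small tables */
      printf("table scan: %.2f (+%.1f setup = %.2f)\n",
             scan, TABLE_SCAN_SETUP_COST, scan + TABLE_SCAN_SETUP_COST);
      return 0;
    }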
diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 4b339e36a9c..ce9e82c674a 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -5716,17 +5716,11 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
"system");
continue;
}
- /* Approximate found rows and time to read them */
- if (s->table->is_filled_at_execution())
- {
- get_delayed_table_estimates(s->table, &s->records, &s->read_time,
- &s->startup_cost);
- s->found_records= s->records;
- s->table->opt_range_condition_rows= s->records;
- s->table->used_stat_records= s->records;
- }
- else
- s->scan_time();
+ /*
+ Approximate found rows and the time to read them.
+ Updates found_records, records, read_time and other scan related variables.
+ */
+ s->estimate_scan_time();
if (s->table->is_splittable())
s->add_keyuses_for_splitting();
@@ -7629,8 +7623,8 @@ static double matching_candidates_in_table(JOIN_TAB *s,
One main difference between the functions is that
multi_range_read_info_const() adds a very small cost per range
- (IDX_LOOKUP_COST) and also MULTI_RANGE_READ_SETUP_COST, to ensure that
- 'ref' is preferred slightly over ranges.
+ (MULTI_RANGE_READ_SETUP_COST), to ensure that 'ref' is preferred slightly over
+ ranges.
*/
INDEX_READ_COST cost_for_index_read(const THD *thd, const TABLE *table,
@@ -7645,9 +7639,13 @@ INDEX_READ_COST cost_for_index_read(const THD *thd, const TABLE *table,
if (file->is_clustering_key(key))
cost.read_cost= cost.index_only_cost= file->read_time(key, 1, records);
else if (table->covering_keys.is_set(key))
- cost.read_cost= cost.index_only_cost= file->keyread_time(key, 1, records);
+ cost.read_cost= cost.index_only_cost= file->ha_keyread_time(key, 1, records);
else
{
+ /*
+ The cost of finding the next row in the index. Used mainly for
+ checking against a filter.
+ */
cost.index_only_cost= file->keyread_time(key, 1, records);
cost.read_cost= (cost.index_only_cost +
file->read_time(key, 0, MY_MIN(records, worst_seeks)));
@@ -7689,12 +7687,11 @@ apply_filter(THD *thd, double *cost, double *records_arg, double *startup_cost,
/*
Calculate number of resulting rows after filtering
- Protect against too agressive filter, however allow it to
- be as low as records if records < MIN_ROWS_AFTER_FILTERING.
+ Here we trust the selectivity and will not adjust rows upwards even if
+ the end result is low. This means that new_records is allowed to
+ be < 1.0
*/
new_records= records * selectivity;
- new_records= MY_MAX(new_records, MY_MIN(records,
- MIN_ROWS_AFTER_FILTERING));
/*
Calculate the cost of the filter based on that we had originally
@@ -7708,7 +7705,7 @@ apply_filter(THD *thd, double *cost, double *records_arg, double *startup_cost,
with the WHERE clause.
*/
cost_of_accepted_rows= (*cost/records * new_records);
- cost_of_rejected_rows= index_only_cost/records*(records-new_records);
+ cost_of_rejected_rows= index_only_cost/records * (records-new_records);
new_cost= (cost_of_accepted_rows + cost_of_rejected_rows +
filter_lookup_cost);
@@ -7987,9 +7984,10 @@ best_access_path(JOIN *join,
}
else
{
- /* TODO: Adjust cost for covering and clustering key */
- tmp= table->file->avg_io_cost();
- index_only_cost= MY_MIN(IDX_LOOKUP_COST, tmp);
+ INDEX_READ_COST cost= cost_for_index_read(thd, table, key,
+ 1,1);
+ tmp= cost.read_cost;
+ index_only_cost= cost.index_only_cost;
}
/*
Calculate an adjusted cost based on how many records are read
@@ -8087,7 +8085,7 @@ best_access_path(JOIN *join,
}
}
}
- /* Limit the number of matched rows */
+ /* Calculate the cost of the index access */
INDEX_READ_COST cost= cost_for_index_read(thd, table, key,
(ha_rows) records,
(ha_rows) s->worst_seeks);
@@ -8333,11 +8331,19 @@ best_access_path(JOIN *join,
tmp= COST_MULT(tmp, record_count);
tmp= COST_ADD(tmp, startup_cost);
- if (tmp + 0.0001 < best_cost)
+ /*
+ The COST_EPS is here to ensure we use the first key if there are
+ two 'identical keys' that could be used.
+ */
+ if (tmp + COST_EPS < best_cost)
{
trace_access_idx.add("cost", tmp).add("chosen", true);
best_cost= tmp;
- best_records= records_after_filter;
+ /*
+ We use 'records' instead of 'records_after_filter' here as we want
+ to have EXPLAIN print the number of rows found by the key access.
+ */
+ best_records= records; // Records before filter!
best_key= start_key;
best_max_key_part= max_key_part;
best_ref_depends_map= found_ref;
@@ -8389,14 +8395,7 @@ best_access_path(JOIN *join,
tmp= s->quick->read_time;
}
else
- {
- tmp= s->scan_time();
- /*
- Cost of comparing the found row with the attached WHERE
- This is not part of scan_time()!
- */
- tmp= COST_ADD(tmp, s->records / TIME_FOR_COMPARE);
- }
+ tmp= s->cached_scan_and_compare_time;
/* We read the table as many times as join buffer becomes full. */
refills= (1.0 + floor((double) cache_record_length(join,idx) *
@@ -8474,7 +8473,7 @@ best_access_path(JOIN *join,
!(s->table->force_index && best_key && !s->quick) && // (4)
!(best_key && s->table->pos_in_table_list->jtbm_subselect)) // (5)
{ // Check full join
- double rnd_records;
+ double rnd_records, records_after_filter;
Range_rowid_filter_cost_info *filter= 0;
double startup_cost= s->startup_cost;
@@ -8506,7 +8505,7 @@ best_access_path(JOIN *join,
This is done to make records found comparable to what we get with
'ref' access.
*/
- rnd_records= s->found_records;
+ records_after_filter= rnd_records= s->found_records;
if (s->quick->get_type() == QUICK_SELECT_I::QS_TYPE_RANGE)
{
@@ -8514,8 +8513,15 @@ best_access_path(JOIN *join,
TABLE::OPT_RANGE *range= &s->table->opt_range[key_no];
double rows= record_count * range->rows;
+ /*
+ Ensure that 'range' and 's' are coming from the same source.
+ The complex 'double' comparison is there because floating point
+ registers cause complications when costs are calculated.
+ */
DBUG_ASSERT(range->rows == s->found_records);
- DBUG_ASSERT(s->quick->read_time == range->cost);
+ DBUG_ASSERT((range->cost == 0.0 && s->quick->read_time == 0.0) ||
+ (range->cost / s->quick->read_time <= 1.0000001 &&
+ range->cost / s->quick->read_time >= 0.9999999));
/*
The difference in cost between doing a fetch of the index entry
@@ -8533,7 +8539,7 @@ best_access_path(JOIN *join,
s->table->best_range_rowid_filter_for_partial_join(key_no, rows,
access_cost_factor);
if (filter)
- filter= filter->apply_filter(thd, &tmp, &rnd_records,
+ filter= filter->apply_filter(thd, &tmp, &records_after_filter,
&startup_cost,
range->fetch_cost,
range->index_only_cost,
@@ -8548,9 +8554,10 @@ best_access_path(JOIN *join,
}
else
{
- /* We will now calcualte cost of scan, with or without join buffer */
+ /* We will now calculate cost of scan, with or without join buffer */
rnd_records= matching_candidates_in_table(s, found_constraint,
use_cond_selectivity);
+ records_after_filter= rnd_records;
DBUG_ASSERT(rnd_records <= s->records);
/* Estimate cost of reading table. */
@@ -8565,7 +8572,7 @@ best_access_path(JOIN *join,
}
else // table scan
{
- tmp= s->scan_time();
+ tmp= s->cached_scan_time;
type= JT_ALL;
}
@@ -8628,8 +8635,16 @@ best_access_path(JOIN *join,
else
tmp+= startup_cost;
- trace_access_scan.add("rows", rnd_records).add("cost", tmp).
- add("startup_cost", startup_cost);
+ if (thd->trace_started())
+ {
+ trace_access_scan.add("rows", records_after_filter).add("cost", tmp).
+ add("startup_cost", startup_cost);
+ if (type == JT_ALL)
+ {
+ trace_access_scan.add("index_only",
+ (s->cached_covering_key != MAX_KEY));
+ }
+ }
if (tmp < best_cost)
{
@@ -8651,14 +8666,16 @@ best_access_path(JOIN *join,
join->outer_join)));
spl_plan= 0;
best_type= type;
+ trace_access_scan.add("chosen", true);
}
- trace_access_scan.add("chosen", best_key == NULL);
+ else
+ trace_access_scan.add("chosen", false);
}
else
{
- trace_access_scan.add("type", "scan");
- trace_access_scan.add("chosen", false);
- trace_access_scan.add("cause", "cost");
+ trace_access_scan.add("type", "scan").
+ add("chosen", false).
+ add("cause", "cost");
}
/* Update the cost information for the current partial plan */
@@ -9180,12 +9197,7 @@ optimize_straight_join(JOIN *join, table_map join_tables)
/* compute the cost of the new plan extended with 's' */
record_count= COST_MULT(record_count, position->records_read);
- const double filter_cmp_gain= position->range_rowid_filter_info
- ? position->range_rowid_filter_info->get_cmp_gain(record_count)
- : 0;
- read_time+= COST_ADD(read_time - filter_cmp_gain,
- COST_ADD(position->read_time,
- record_count / TIME_FOR_COMPARE));
+ read_time+= COST_ADD(read_time, position->read_time);
advance_sj_state(join, join_tables, idx, &record_count, &read_time,
&loose_scan_pos);
@@ -9204,7 +9216,7 @@ optimize_straight_join(JOIN *join, table_map join_tables)
memcpy((uchar*) join->best_positions, (uchar*) join->positions,
sizeof(POSITION)*idx);
join->join_record_count= record_count;
- join->best_read= read_time - 0.001;
+ join->best_read= read_time;
}
@@ -9376,9 +9388,7 @@ greedy_search(JOIN *join,
/* compute the cost of the new plan extended with 'best_table' */
record_count= COST_MULT(record_count, join->positions[idx].records_read);
- read_time= COST_ADD(read_time,
- COST_ADD(join->positions[idx].read_time,
- record_count / TIME_FOR_COMPARE));
+ read_time= COST_ADD(read_time, join->positions[idx].read_time);
remaining_tables&= ~(best_table->table->map);
--size_remain;
@@ -9486,9 +9496,7 @@ void JOIN::get_partial_cost_and_fanout(int end_tab_idx,
if (tab->records_read && (cur_table_map & filter_map))
{
record_count= COST_MULT(record_count, tab->records_read);
- read_time= COST_ADD(read_time,
- COST_ADD(tab->read_time,
- record_count / TIME_FOR_COMPARE));
+ read_time= COST_ADD(read_time, tab->read_time);
if (tab->emb_sj_nest)
sj_inner_fanout= COST_MULT(sj_inner_fanout, tab->records_read);
}
@@ -9503,7 +9511,7 @@ void JOIN::get_partial_cost_and_fanout(int end_tab_idx,
if (tab == end_tab)
break;
}
- *read_time_arg= read_time;// + record_count / TIME_FOR_COMPARE;
+ *read_time_arg= read_time;
*record_count_arg= record_count;
}
@@ -9532,9 +9540,8 @@ void JOIN::get_prefix_cost_and_fanout(uint n_tables,
record_count= COST_MULT(record_count, best_positions[i].records_read);
read_time= COST_ADD(read_time, best_positions[i].read_time);
}
- /* TODO: Take into account condition selectivities here */
}
- *read_time_arg= read_time;// + record_count / TIME_FOR_COMPARE;
+ *read_time_arg= read_time;
*record_count_arg= record_count;
}
@@ -10133,7 +10140,6 @@ best_extension_by_limited_search(JOIN *join,
{
double current_record_count, current_read_time;
double partial_join_cardinality;
- double filter_cmp_gain;
double pushdown_cond_selectivity;
POSITION loose_scan_pos, *position= join->positions + idx;
@@ -10150,14 +10156,7 @@ best_extension_by_limited_search(JOIN *join,
/* Compute the cost of extending the plan with 's' */
current_record_count= COST_MULT(record_count, position->records_read);
- filter_cmp_gain= position->range_rowid_filter_info ?
- position->range_rowid_filter_info->get_cmp_gain(current_record_count) :
- 0;
- current_read_time=COST_ADD(read_time,
- COST_ADD(position->read_time -
- filter_cmp_gain,
- current_record_count /
- TIME_FOR_COMPARE));
+ current_read_time=COST_ADD(read_time, position->read_time);
if (unlikely(thd->trace_started()))
{
@@ -10274,7 +10273,7 @@ best_extension_by_limited_search(JOIN *join,
memcpy((uchar*) join->best_positions, (uchar*) join->positions,
sizeof(POSITION) * (idx + 1));
join->join_record_count= partial_join_cardinality;
- join->best_read= current_read_time - 0.001;
+ join->best_read= current_read_time;
}
DBUG_EXECUTE("opt", print_plan(join, idx+1,
current_record_count,
@@ -12365,9 +12364,9 @@ make_join_select(JOIN *join,SQL_SELECT *select,COND *cond)
(sel->quick_keys.is_clear_all() ||
(sel->quick &&
sel->quick->read_time >
- tab->table->file->scan_time() +
- tab->table->file->stats.records/TIME_FOR_COMPARE
- ))) ?
+ tab->table->file->
+ ha_scan_and_compare_time(tab->table->file->
+ stats.records)))) ?
2 : 1;
sel->read_tables= used_tables & ~current_map;
sel->quick_keys.clear_all();
@@ -13851,7 +13850,10 @@ make_join_readinfo(JOIN *join, ulonglong options, uint no_jbuf_after)
tab->index= table->s->primary_key;
else
#endif
+ {
tab->index=find_shortest_key(table, & table->covering_keys);
+ DBUG_ASSERT(tab->index == tab->cached_covering_key);
+ }
}
tab->read_first_record= join_read_first;
/* Read with index_first / index_next */
@@ -14103,11 +14105,14 @@ void JOIN_TAB::cleanup()
Note that this doesn't take into account of checking the WHERE clause
for all found rows (TIME_FOR_COMPARE)
+
+ Updates found_records, records, cached_scan_time, cached_covering_key and
+ read_time
*/
-double JOIN_TAB::scan_time()
+void JOIN_TAB::estimate_scan_time()
{
- double res;
+ cached_covering_key= MAX_KEY;
if (table->is_created())
{
if (table->is_filled_at_execution())
@@ -14116,25 +14121,31 @@ double JOIN_TAB::scan_time()
&startup_cost);
found_records= records;
table->opt_range_condition_rows= records;
+ table->used_stat_records= records;
}
else
{
found_records= records= table->stat_records();
- read_time= table->file->scan_time();
/*
table->opt_range_condition_rows has already been set to
table->file->stats.records
*/
+ if (!table->covering_keys.is_clear_all() && ! table->no_keyread)
+ {
+ cached_covering_key= find_shortest_key(table, &table->covering_keys);
+ read_time= table->file->ha_key_scan_time(cached_covering_key);
+ }
+ else
+ read_time= table->file->ha_scan_time(records);
}
- res= read_time;
}
else
{
found_records= records=table->stat_records();
read_time= found_records ? (double)found_records: 10.0;// TODO:fix this stub
- res= read_time;
}
- return res;
+ cached_scan_time= read_time;
+ cached_scan_and_compare_time= read_time + found_records/TIME_FOR_COMPARE;
}
@@ -16044,7 +16055,7 @@ static COND *build_equal_items(JOIN *join, COND *cond,
table->on_expr= build_equal_items(join, table->on_expr, inherited,
nested_join_list, ignore_on_conds,
&table->cond_equal);
- if (unlikely(join->thd->trace_started()))
+ if (unlikely(thd->trace_started()))
{
const char *table_name;
if (table->nested_join)
@@ -17636,7 +17647,6 @@ void optimize_wo_join_buffering(JOIN *join, uint first_tab, uint last_tab,
reopt_remaining_tables &= ~rs->table->map;
rec_count= COST_MULT(rec_count, pos.records_read);
cost= COST_ADD(cost, pos.read_time);
- cost= COST_ADD(cost, rec_count / TIME_FOR_COMPARE);
//TODO: take into account join condition selectivity here
double pushdown_cond_selectivity= 1.0;
table_map real_table_bit= rs->table->map;
@@ -23752,7 +23762,7 @@ uint find_shortest_key(TABLE *table, const key_map *usable_keys)
{
if (usable_keys->is_set(nr))
{
- double cost= table->file->keyread_time(nr, 1, table->file->records());
+ double cost= table->file->ha_key_scan_time(nr);
if (cost < min_cost)
{
min_cost= cost;
@@ -28931,6 +28941,8 @@ void JOIN::cache_const_exprs()
- if there is a ref(const) access, we try to use it, too.
- quick and ref(const) use different cost formulas, so if both are possible
we should make a cost-based choice.
+ - If there is no quick select return the full cost from
+ cost_for_index_read() (Doing a full scan with up to 'limit' records)
rows_limit is the number of rows we would need to read when using a full
index scan. This is generally higher than the N from "LIMIT N" clause,
@@ -28942,23 +28954,17 @@ void JOIN::cache_const_exprs()
@param rows_limit See explanation above
@param read_time OUT Cost of reading using quick or ref(const) access.
-
- @return
- true There was a possible quick or ref access, its cost is in the OUT
- parameters.
- false No quick or ref(const) possible (and so, the caller will attempt
- to use a full index scan on this index).
+ @return
+ best cost found
*/
-static bool get_range_limit_read_cost(const JOIN_TAB *tab,
- const TABLE *table,
- ha_rows table_records,
- uint keynr,
- ha_rows rows_limit,
- double *read_time)
+static double get_range_limit_read_cost(const JOIN_TAB *tab,
+ const TABLE *table,
+ ha_rows table_records,
+ uint keynr,
+ ha_rows rows_limit)
{
- bool res= false;
- /*
+ /*
We need to adjust the estimates if we had a quick select (or ref(const)) on
index keynr.
*/
@@ -29049,10 +29055,11 @@ static bool get_range_limit_read_cost(const JOIN_TAB *tab,
*/
best_cost *= rows_limit_for_quick / best_rows;
}
- *read_time= best_cost;
- res= true;
+ return best_cost;
}
- return res;
+ return cost_for_index_read(table->in_use, table, keynr,
+ rows_limit,
+ (ha_rows) tab->worst_seeks).read_cost;
}
@@ -29161,13 +29168,12 @@ test_if_cheaper_ordering(const JOIN_TAB *tab, ORDER *order, TABLE *table,
}
}
else
- read_time= table->file->scan_time(); // TODO: Add TIME_FOR_COMPARE
-
+ read_time= table->file->ha_scan_and_compare_time(table->file->stats.records);
+
trace_cheaper_ordering.add("fanout", fanout);
/*
TODO: add cost of sorting here.
*/
- read_time += COST_EPS;
trace_cheaper_ordering.add("read_time", read_time);
/*
Calculate the selectivity of the ref_key for REF_ACCESS. For
@@ -29203,8 +29209,9 @@ test_if_cheaper_ordering(const JOIN_TAB *tab, ORDER *order, TABLE *table,
for (nr=0; nr < table->s->keys ; nr++)
{
int direction;
- ha_rows select_limit= select_limit_arg;
uint used_key_parts= 0;
+ ha_rows select_limit= select_limit_arg;
+ double range_scan_time;
Json_writer_object possible_key(thd);
possible_key.add("index", table->key_info[nr].name);
@@ -29364,23 +29371,16 @@ test_if_cheaper_ordering(const JOIN_TAB *tab, ORDER *order, TABLE *table,
to calculate the cost of accessing data rows for one
index entry.
*/
-#ifdef NEED_TESTING
- index_scan_time= (cost_for_index_read(table->in_use, table, nr,
- select_limit,
- (ha_rows) tab->worst_seeks) +
- select_limit / TIME_FOR_COMPARE);
-#else
index_scan_time= (select_limit/rec_per_key *
- MY_MIN(rec_per_key, table->file->scan_time()));
-#endif
- double range_scan_time;
- if (get_range_limit_read_cost(tab, table, table_records, nr,
- select_limit, &range_scan_time))
- {
- possible_key.add("range_scan_time", range_scan_time);
- if (range_scan_time < index_scan_time)
- index_scan_time= range_scan_time;
- }
+ MY_MIN(rec_per_key,
+ table->file->
+ ha_scan_time(select_limit)));
+ range_scan_time= get_range_limit_read_cost(tab, table, table_records, nr,
+ select_limit);
+ if (range_scan_time < index_scan_time)
+ index_scan_time= range_scan_time;
+ index_scan_time+= select_limit / TIME_FOR_COMPARE;
+
possible_key.add("index_scan_time", index_scan_time);
if ((ref_key < 0 && (group || table->force_index || is_covering)) ||
@@ -29440,20 +29440,20 @@ test_if_cheaper_ordering(const JOIN_TAB *tab, ORDER *order, TABLE *table,
possible_key.add("cause", cause);
}
}
- else
+ else if (unlikely(possible_key.trace_started()))
{
possible_key.add("usable", false);
possible_key.add("cause", "cost");
}
}
- else
+ else if (unlikely(possible_key.trace_started()))
{
possible_key.add("usable", false);
if (!group && select_limit == HA_POS_ERROR)
possible_key.add("cause", "order by without limit");
}
}
- else
+ else if (unlikely(possible_key.trace_started()))
{
if (keys.is_set(nr))
{
diff --git a/sql/sql_select.h b/sql/sql_select.h
index d4e06f42249..dbffc15f066 100644
--- a/sql/sql_select.h
+++ b/sql/sql_select.h
@@ -309,11 +309,6 @@ typedef struct st_join_table {
Table_access_tracker *tracker;
Table_access_tracker *jbuf_tracker;
- /*
- Bitmap of TAB_INFO_* bits that encodes special line for EXPLAIN 'Extra'
- column, or 0 if there is no info.
- */
- uint packed_info;
// READ_RECORD::Setup_func materialize_table;
READ_RECORD::Setup_func read_first_record;
@@ -357,8 +352,17 @@ typedef struct st_join_table {
double partial_join_cardinality;
+ /* set by estimate_scan_time() */
+ double cached_scan_time;
+ double cached_scan_and_compare_time;
+
table_map dependent,key_dependent;
/*
+ Bitmap of TAB_INFO_* bits that encodes special line for EXPLAIN 'Extra'
+ column, or 0 if there is no info.
+ */
+ uint packed_info;
+ /*
1 - use quick select
2 - use "Range checked for each record"
*/
@@ -370,6 +374,7 @@ typedef struct st_join_table {
uint index;
uint status; ///< Save status for cache
uint used_fields;
+ uint cached_covering_key; /* Set by estimate_scan_time() */
ulong used_fieldlength;
ulong max_used_fieldlength;
uint used_blobs;
@@ -637,7 +642,7 @@ typedef struct st_join_table {
{
return (is_hash_join_key_no(key) ? hj_key : table->key_info+key);
}
- double scan_time();
+ void estimate_scan_time();
ha_rows get_examined_rows();
bool preread_init();
diff --git a/sql/table.h b/sql/table.h
index 0caea757eb6..7d0640fbfbf 100644
--- a/sql/table.h
+++ b/sql/table.h
@@ -1356,16 +1356,31 @@ public:
uint key_parts;
uint ranges;
ha_rows rows;
- /* Cost of fetching and comparing the row aginst the WHERE clause */
+ /*
+ The full cost of using 'range'. Includes fetching the rows
+ through keys and comparing the rows against the WHERE clause
+ */
double cost;
- /* Cost of comparing row with WHERE clause. Included in 'cost' */
+ /*
+ Cost of fetching the rows based on rowids and comparing the rows with
+ the WHERE clause. Included in 'cost'
+ */
double fetch_cost;
/*
- If there is a range access by i-th index then the cost of
- index only access for it is stored in index_only_costs[i]
+ Cost of fetching the keys, not including copying the key to
+ record. Included in 'cost'. This is used with filtering.
*/
double index_only_cost;
bool first_key_part_has_only_one_value;
+
+ /*
+ Cost of fetching keys with an index only read and returning them to the
+ sql level.
+ */
+ double index_only_fetch_cost()
+ {
+ return index_only_cost + (double) rows * INDEX_COPY_COST;
+ }
} *opt_range;
/*
Bitmaps of key parts that =const for the duration of join execution. If