author: Aleksey Midenkov <midenok@gmail.com> 2021-11-30 01:46:27 +0300
committer: Aleksey Midenkov <midenok@gmail.com> 2021-12-10 13:41:49 +0300
commit: 111f58c9233c183f3a5cf2caac5dc26ad4e0cbc7 (patch)
tree: 04abd7d34fae9f0c56ace319429022c139f40f94 /sql/handler.h
parent: 80834a8f5d0bbaefd98baac6399321bb2e06e97c (diff)
MDEV-25292 Atomic CREATE OR REPLACE TABLE (branch: bb-10.7-midenok-MDEV-25292)
Atomic replace algorithm

Two DDL chains are used for CREATE OR REPLACE:
ddl_log_state_create (C) and ddl_log_state_rm (D).

1. (C) Write DDL_LOG_CREATE_TABLE_ACTION of TMP table (drops TMP table);
2. Create new table as TMP;
3. Do everything with TMP (like insert data);
4. (D) Write DDL_LOG_RENAME_TABLE_ACTION from ORIG to TMP (replays TMP -> ORIG);
5. (D) Write DDL_LOG_DROP_ACTION of ORIG;
6. (C) close chain;
7. (D) replay chain.

Temporary table for CREATE OR REPLACE

Before dropping the "old" table, CREATE OR REPLACE creates a "tmp" table. ddl_log_state_create holds the drop of the "tmp" table. When everything is OK (data is inserted, "tmp" is ready), ddl_log_state_rm is written to replace "old" with "tmp". Until ddl_log_state_create is closed, ddl_log_state_rm is not executed.

After the binlogging is done, ddl_log_state_create is closed. At that point ddl_log_state_rm is executed and "old" is replaced with "tmp". That is: the final rename is done by the DDL log.

Given that important role of the DDL log in the CREATE OR REPLACE operation, replay of ddl_log_state_rm must fail at the first error encountered and print the error message if possible. F.ex. a foreign key error is discovered at this phase: InnoDB refuses to drop the "old" table and returns the corresponding foreign key error code.

Additional notes

- CREATE TABLE without REPLACE is not affected by this commit.

- Engines having the HTON_EXPENSIVE_RENAME flag set are not affected by this commit.

- CREATE TABLE .. SELECT XID usage is fixed and now there is no need to log DROP TABLE via DDL_CREATE_TABLE_PHASE_LOG (see comments in do_postlock()). XID is now correctly updated so it disables DDL_LOG_DROP_TABLE_ACTION. Note that the binary log is flushed at the final stage when the table is ready. So if we have XID in the binary log we don't need to drop the table.

- Three variations of CREATE OR REPLACE are handled:

  1. CREATE OR REPLACE TABLE t1 (..);
  2. CREATE OR REPLACE TABLE t1 LIKE t2;
  3. CREATE OR REPLACE TABLE t1 SELECT ..;

- The test case uses 5 combinations for engines (aria, aria_notrans, myisam, ib, expensive_rename) and 2 combinations for binlog types (row, stmt). Combinations help to check differences between the results. Error failures are tested for the above three variations.

- The triggers mechanism is unaffected by this change. This is tested in create_replace.test.

- LOCK TABLES is affected. Lock restoration must be done after the "rm" chain is replayed.

Rename and drop via DDL log

We replay ddl_log_state_rm to drop the old table and rename the temporary table. In that case we must throw the correct error message if ddl_log_revert() fails (f.ex. on FK error).

If the table is deleted earlier and not via the DDL log, and a crash happens, the create chain is not closed. The linked drop chain is not executed and the new table is not installed. But the old table is already deleted.

ddl_log.cc changes

Now we can place an action before DDL_LOG_DROP_INIT_ACTION and it will be replayed after DDL_LOG_DROP_TABLE_ACTION.

The report_error parameter for ddl_log_revert() allows failing at the first error and printing the error message if possible. ddl_log_execute_action() can now print an error message.

Since we can now handle errors from ddl_log_execute_action() (in case of non-recovery execution), unconditionally setting "error= TRUE" is wrong (it was wrong anyway because it was overwritten at the end of the function).

On XID usage

Like with all other atomic DDL operations, XID is used to avoid inconsistency between master and slave in the case of a crash after the binary log is written and before ddl_log_state_create is closed. On recovery XIDs are taken from the binary log and the corresponding DDL log events get disabled. That is done by ddl_log_close_binlogged_events().

On linking two chains together

Chains are executed in ascending order of entry_pos of their execute entries. But the entry_pos assignment order is undefined: it may assign a bigger number to the first chain and then a smaller number to the second chain. The execution order in that case will be reversed: the second chain will be executed first.

To avoid that we link one chain to another. While the master chain is active the slave chain is not executed. That is: only one chain can be executed out of two linked chains.

The interface ddl_log_link_chains() was done in "MDEV-22166 ddl_log_write_execute_entry() extension".

Refactoring: moved select_field_count into Alter_info.

As atomic CREATE OR REPLACE .. SELECT now uses a temporary table, there is a need to have both C_ALTER_TABLE and select_field_count in one call. Semantically creation mode and field count are two different things. Packing the creation mode (negative constants) and the field count (a positive variable) into one parameter seems to be a lazy hack for not making the second parameter.

select_field_count does not make sense without alter_info->create_list, so the natural way is to hold it in Alter_info too.

More on CREATE OR REPLACE .. SELECT

We use create_and_open_tmp_table() like in ALTER TABLE to create a temporary TABLE object (tmp_table is (NON_)TRANSACTIONAL_TMP_TABLE). After we have created such a TABLE object we use create_info->tmp_table() instead of table->s->tmp_table when we need to check for a parser-requested tmp-table.

External locking is required for a temporary table created by create_and_open_tmp_table(). F.ex. that disables logging for Aria transactional tables, and without that (when no mysql_lock_tables() is done) it cannot work correctly.

For external locking we require the Aria table to work in non-transactional mode. That is usually done by ha_enable_transaction(false). But we cannot disable the transaction completely because:

1. binlog rollback removes pending row events (binlog_remove_pending_rows_event()). The row events are added during the CREATE .. SELECT data insertion phase.

2. the replication slave highly depends on the transaction and cannot work without it.

So we put the temporary Aria table into non-transactional mode with the "thd->transaction->on hack". See the comment for the on_save variable.

Note that an Aria table has internal_table mode. But we cannot use it because:

    if (!internal_table)
    {
      mysql_mutex_lock(&THR_LOCK_myisam);
      old_info= test_if_reopen(name_buff);
    }

For internal_table test_if_reopen() is not called and we get a new MARIA_SHARE for each file handler. In that case duplicate errors are missed because insert and lookup in CREATE .. SELECT are done via two different handlers (see create_lookup_handler()).

For a temporary table, before dropping TABLE_SHARE by drop_temporary_table() we must do ha_reset(). ha_reset() releases the storage share. Without that the share is kept and the second CREATE OR REPLACE .. SELECT fails with:

    HA_ERR_TABLE_EXIST (156): MyISAM table '#sql-create-b5377-4-t2' is in use (most likely by a MERGE table). Try FLUSH TABLES.

HA_EXTRA_PREPARE_FOR_DROP also removes MYISAM_SHARE, but that is not needed as ha_reset() does the job.

ha_reset() is usually done by mark_tmp_table_as_free_for_reuse(). But we don't need that mechanism for our temporary table.

Atomic_info in HA_CREATE_INFO

Many functions in CREATE TABLE pass the same parameters. These parameters are part of the table creation info and should be in HA_CREATE_INFO (or whatever). Passing parameters via a single structure is much easier for adding new data and refactoring.
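The interplay of the two linked chains described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `Chain` and `recover()` are hypothetical names that do not exist in the server, actions are plain strings, and the sketch assumes that entries of a chain are replayed in the reverse order of writing (as steps 4 and 5 of the algorithm suggest). It does not use the real ddl_log.cc interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model of a DDL log chain; not the server's DDL_LOG_STATE.
struct Chain
{
  bool active= true;             // an open (not closed) chain is replayed on recovery
  const Chain *master= nullptr;  // while the master chain is active, this chain is skipped
  std::vector<std::string> actions; // in the order they were written
};

// Returns the actions a recovery pass would replay in this toy model.
// Only one of the two linked chains is ever executed.
static std::vector<std::string> recover(const Chain &create, const Chain &rm)
{
  std::vector<std::string> done;
  for (const Chain *c : {&create, &rm})
  {
    if (!c->active)
      continue;                  // closed chain: nothing to replay
    if (c->master && c->master->active)
      continue;                  // linked slave chain waits for its master
    // Assumption of this sketch: replay is in reverse order of writing.
    done.insert(done.end(), c->actions.rbegin(), c->actions.rend());
  }
  return done;
}
```

In this model a crash before step 6 replays only the create chain (TMP is dropped, ORIG is untouched), while after step 6 only the rm chain runs (ORIG is dropped, then TMP is renamed to ORIG).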
Diffstat (limited to 'sql/handler.h')
-rw-r--r-- sql/handler.h 55
1 files changed, 50 insertions, 5 deletions
diff --git a/sql/handler.h b/sql/handler.h
index fe61666bf20..1a2156e7c46 100644
--- a/sql/handler.h
+++ b/sql/handler.h
@@ -35,6 +35,7 @@
#include "sql_array.h" /* Dynamic_array<> */
#include "mdl.h"
#include "vers_string.h"
+#include "backup.h"
#include "sql_analyze_stmt.h" // for Exec_time_tracker
@@ -1827,6 +1828,12 @@ handlerton *ha_default_tmp_handlerton(THD *thd);
*/
#define HTON_REQUIRES_NOTIFY_TABLEDEF_CHANGED_AFTER_COMMIT (1 << 20)
+/*
+ Indicates that rename table is expensive operation.
+ When set atomic CREATE OR REPLACE TABLE is not used.
+*/
+#define HTON_EXPENSIVE_RENAME (1 << 21)
+
class Ha_trx_info;
struct THD_TRANS
@@ -2303,8 +2310,7 @@ struct Table_scope_and_contents_source_st:
bool fix_period_fields(THD *thd, Alter_info *alter_info);
bool check_fields(THD *thd, Alter_info *alter_info,
const Lex_table_name &table_name,
- const Lex_table_name &db,
- int select_count= 0);
+ const Lex_table_name &db);
bool check_period_fields(THD *thd, Alter_info *alter_info);
bool vers_fix_system_fields(THD *thd, Alter_info *alter_info,
@@ -2312,9 +2318,33 @@ struct Table_scope_and_contents_source_st:
bool vers_check_system_fields(THD *thd, Alter_info *alter_info,
const Lex_table_name &table_name,
- const Lex_table_name &db,
- int select_count= 0);
+ const Lex_table_name &db);
+};
+typedef struct st_ddl_log_state DDL_LOG_STATE;
+
+struct Atomic_info
+{
+ TABLE_LIST *tmp_name;
+ DDL_LOG_STATE *ddl_log_state_create;
+ DDL_LOG_STATE *ddl_log_state_rm;
+ backup_log_info drop_entry;
+
+ Atomic_info() :
+ tmp_name(NULL),
+ ddl_log_state_create(NULL),
+ ddl_log_state_rm(NULL)
+ {
+ bzero(&drop_entry, sizeof(drop_entry));
+ }
+
+ Atomic_info(DDL_LOG_STATE *ddl_log_state_rm) :
+ tmp_name(NULL),
+ ddl_log_state_create(NULL),
+ ddl_log_state_rm(ddl_log_state_rm)
+ {
+ bzero(&drop_entry, sizeof(drop_entry));
+ }
};
@@ -2324,7 +2354,8 @@ struct Table_scope_and_contents_source_st:
parts are handled on the SQL level and are not needed on the handler level.
*/
struct HA_CREATE_INFO: public Table_scope_and_contents_source_st,
- public Schema_specification_st
+ public Schema_specification_st,
+ public Atomic_info
{
/* TODO: remove after MDEV-20865 */
Alter_info *alter_info;
@@ -2369,6 +2400,16 @@ struct HA_CREATE_INFO: public Table_scope_and_contents_source_st,
else
return table_options;
}
+ bool ok_atomic_replace() const
+ {
+ return !tmp_table() && !sequence &&
+ !(db_type->flags & HTON_EXPENSIVE_RENAME) &&
+ !DBUG_IF("ddl_log_expensive_rename");
+ }
+ bool handle_atomic_replace(THD *thd, const LEX_CSTRING &db,
+ const LEX_CSTRING &table_name,
+ const DDL_options_st options);
+ bool finalize_ddl(THD *thd);
};
@@ -2401,6 +2442,10 @@ struct Table_specification_st: public HA_CREATE_INFO,
HA_CREATE_INFO::options= 0;
DDL_options_st::init();
}
+ bool is_atomic_replace() const
+ {
+ return or_replace() && ok_atomic_replace();
+ }
};
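
The eligibility check added by this patch can be tried in isolation with a small stand-in. The sketch below is hypothetical: `Toy_create_info` and its plain boolean fields replace HA_CREATE_INFO, Table_specification_st and the handlerton, and the `DBUG_IF("ddl_log_expensive_rename")` debug hook is left out. Only the HTON_EXPENSIVE_RENAME value and the shape of the two predicates are taken from the diff above.

```cpp
#include <cassert>
#include <cstdint>

// Flag value copied from the diff; everything else is a hypothetical stand-in.
constexpr uint32_t HTON_EXPENSIVE_RENAME= 1u << 21;

struct Toy_create_info
{
  bool tmp_table= false;     // parser-requested temporary table
  bool sequence= false;      // the statement creates a sequence
  bool or_replace= false;    // OR REPLACE was given
  uint32_t engine_flags= 0;  // stands in for db_type->flags

  // Mirrors ok_atomic_replace() without the debug-only DBUG_IF() check.
  bool ok_atomic_replace() const
  {
    return !tmp_table && !sequence &&
           !(engine_flags & HTON_EXPENSIVE_RENAME);
  }

  // Mirrors is_atomic_replace(): atomic path only for OR REPLACE.
  bool is_atomic_replace() const
  {
    return or_replace && ok_atomic_replace();
  }
};
```

With these stand-ins, a plain CREATE OR REPLACE on an ordinary engine takes the atomic path, while a temporary table, a sequence, or an engine with HTON_EXPENSIVE_RENAME set does not.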