MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel

replication causing replication to fail. In parallel replication, we run transactions from the master in parallel, but force them to commit in the same order they did on the master. If we force T1 to commit before T2, but T2 holds eg. a row lock that is needed by T1, we get a deadlock when T2 waits until T1 has committed. Usually, we do not run T1 and T2 in parallel if there is a chance that they can have conflicting locks like this, but there are certain edge cases where it can occasionally happen (eg. MDEV-5914, MDEV-5941, MDEV-6020). The bug was that this would cause replication to hang, eventually getting a lock timeout and causing the slave to stop with error. With this patch, InnoDB will report back to the upper layer whenever a transactions T1 is about to do a lock wait on T2. If T1 and T2 are parallel replication transactions, and T2 needs to commit later than T1, we can thus detect the deadlock; we then kill T2, setting a flag that causes it to catch the kill and convert it to a deadlock error; this error will then cause T2 to roll back and release its locks (so that T1 can commit), and later T2 will be re-tried and eventually also committed. The kill happens asynchroneously in a slave background thread; this is necessary, as the reporting from InnoDB about lock waits happen deep inside the locking code, at a point where it is not possible to directly call THD::awake() due to mutexes held. Deadlock is assumed to be (very) rarely occuring, so this patch tries to minimise the performance impact on the normal case where no deadlocks occur, rather than optimise the handling of the occasional deadlock. Also fix transaction retry due to deadlock when it happens after a transaction already signalled to later transactions that it started to commit. In this case we need to undo this signalling (and later redo it when we commit again during retry), so following transactions will not start too early. Also add a missing thd->send_kill_message() that got triggered during testing (this corrects an incorrect fix for MySQL Bug#58933).
author: unknown <knielsen@knielsen-hq.org> 2014-06-03 10:31:11 +0200
committer: Kristian Nielsen <knielsen@knielsen-hq.org> 2014-06-03 10:31:11 +0200
commit: 629b822913348cec56ec7a80a236f0ba2e613585 (patch)
tree: 8c713a0ad975deb8b6764af03b1cca8e8cd195db /mysql-test/suite/rpl/t/rpl_mdev6020.test
parent: 787c470cef54574e744eb5dfd9153d837fe67e45 (diff)
download: mariadb-git-629b822913348cec56ec7a80a236f0ba2e613585.tar.gz
1 files changed, 70 insertions, 0 deletions
diff --git a/mysql-test/suite/rpl/t/rpl_mdev6020.test b/mysql-test/suite/rpl/t/rpl_mdev6020.test
new file mode 100644
index 00000000000..2fd342f5eda
--- /dev/null
+++ b/mysql-test/suite/rpl/t/rpl_mdev6020.test
@@ -0,0 +1,70 @@
+--source include/have_innodb.inc
+--source include/have_partition.inc
+--source include/have_binlog_format_mixed_or_row.inc
+--source include/master-slave.inc
+
+--connection slave
+--source include/stop_slave.inc
+
+--connection master
+--let $datadir= `SELECT @@datadir`
+
+--let $rpl_server_number= 1
+--source include/rpl_stop_server.inc
+
+--remove_file $datadir/master-bin.000001
+--remove_file $datadir/master-bin.state
+--copy_file $MYSQL_TEST_DIR/std_data/mdev6020-mysql-bin.000001 $datadir/master-bin.000001
+
+--let $rpl_server_number= 1
+--source include/rpl_start_server.inc
+
+--source include/wait_until_connected_again.inc
+
+--connection slave
+SET SQL_LOG_BIN=0;
+ALTER TABLE mysql.gtid_slave_pos ENGINE = InnoDB;
+SET SQL_LOG_BIN=1;
+SET @old_engine= @@GLOBAL.default_storage_engine;
+SET GLOBAL default_storage_engine=InnoDB;
+SET @old_parallel= @@GLOBAL.slave_parallel_threads;
+SET GLOBAL slave_parallel_threads=12;
+--replace_result $SERVER_MYPORT_1 SERVER_MYPORT_1
+eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$SERVER_MYPORT_1, master_user='root', master_log_file='master-bin.000001', master_log_pos=4;
+--source include/start_slave.inc
+
+--connection master
+SET SQL_LOG_BIN=0;
+ALTER TABLE mysql.gtid_slave_pos ENGINE = InnoDB;
+SET SQL_LOG_BIN=1;
+--save_master_pos
+
+--connection slave
+--sync_with_master
+
+SELECT @@gtid_slave_pos;
+CHECKSUM TABLE table0_int_autoinc, table0_key_pk_parts_2_int_autoinc, table100_int_autoinc, table100_key_pk_parts_2_int_autoinc, table10_int_autoinc, table10_key_pk_parts_2_int_autoinc, table1_int_autoinc, table1_key_pk_parts_2_int_autoinc, table2_int_autoinc, table2_key_pk_parts_2_int_autoinc;
+
+--source include/stop_slave.inc
+
+
+SET GLOBAL default_storage_engine= @old_engine;
+SET GLOBAL slave_parallel_threads=@old_parallel;
+SET sql_log_bin=0;
+DROP TABLE table0_int_autoinc;
+DROP TABLE table0_key_pk_parts_2_int_autoinc;
+DROP TABLE table100_int_autoinc;
+DROP TABLE table100_key_pk_parts_2_int_autoinc;
+DROP TABLE table10_int_autoinc;
+DROP TABLE table10_key_pk_parts_2_int_autoinc;
+DROP TABLE table1_int_autoinc;
+DROP TABLE table1_key_pk_parts_2_int_autoinc;
+DROP TABLE table2_int_autoinc;
+DROP TABLE table2_key_pk_parts_2_int_autoinc;
+SET sql_log_bin=1;
+
+--source include/start_slave.inc
+
+--connection master
+
+--source include/rpl_end.inc
author	unknown <knielsen@knielsen-hq.org>	2014-06-03 10:31:11 +0200
committer	Kristian Nielsen <knielsen@knielsen-hq.org>	2014-06-03 10:31:11 +0200
commit	629b822913348cec56ec7a80a236f0ba2e613585 (patch)
tree	8c713a0ad975deb8b6764af03b1cca8e8cd195db /mysql-test/suite/rpl/t/rpl_mdev6020.test
parent	787c470cef54574e744eb5dfd9153d837fe67e45 (diff)
download	mariadb-git-629b822913348cec56ec7a80a236f0ba2e613585.tar.gz