From ad0d203f2ec9b3c696e6c688fe9314f498efc232 Mon Sep 17 00:00:00 2001
From: Kristian Nielsen <knielsen@knielsen-hq.org>
Date: Wed, 18 Feb 2015 12:22:50 +0100
Subject: MDEV-6589: Incorrect relay log start position when restarting SQL
 thread after error in parallel replication

The problem occurs in parallel replication in GTID mode, when we are using
multiple replication domains. In this case, if the SQL thread stops, the
slave GTID position may refer to a different point in the relay log for each
domain.

The bug was that when the SQL thread was stopped and restarted (but the IO
thread was kept running), the SQL thread would resume applying the relay log
from the point of the most advanced replication domain, silently skipping all
earlier events within other domains. This caused replication corruption.

This patch solves the problem by storing the current GTID position when the
SQL thread stops with multiple parallel replication domains active.
Additionally, the current position in the relay logs is moved back to a point
known to be earlier than the current position of any replication domain.
Then, when the SQL thread restarts from the earlier position, each GTID
encountered is compared against the stored GTID position. Any GTID that was
already applied before the stop is skipped to avoid applying it twice.
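
The skip check described above can be illustrated with a minimal sketch. This
is not the code from the patch: the names Gtid, SavedPosition, record_applied
and already_applied are hypothetical, and the sketch assumes that sequence
numbers increase within each replication domain.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical, simplified GTID: domain, originating server, sequence number.
struct Gtid {
  uint32_t domain_id;
  uint32_t server_id;
  uint64_t seq_no;
};

// Hypothetical stand-in for the GTID position stored when the SQL thread
// stops: the highest applied sequence number per replication domain.
class SavedPosition {
public:
  // Remember that this GTID was applied before the stop.
  void record_applied(const Gtid &g) {
    auto it = last_applied_.find(g.domain_id);
    if (it == last_applied_.end() || it->second < g.seq_no)
      last_applied_[g.domain_id] = g.seq_no;
  }

  // True if the GTID is at or before the stored position of its domain,
  // meaning it was already applied and must be skipped after the restart.
  // Assumes seq_no is increasing within a domain.
  bool already_applied(const Gtid &g) const {
    auto it = last_applied_.find(g.domain_id);
    return it != last_applied_.end() && g.seq_no <= it->second;
  }

private:
  std::unordered_map<uint32_t, uint64_t> last_applied_;
};
```

With the relay log position moved back, events from the most advanced domain
are seen again and answer true to already_applied(), while events from the
domains that were behind answer false and are applied normally.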

This patch should have no effect if multi-domain GTID parallel replication is
not used. Similarly, if both the SQL and IO threads are stopped and restarted,
the patch has no effect, as in this case the existing relay logs are removed
and re-fetched from the master at the current global @@gtid_slave_pos.
---
 sql/rpl_gtid.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'sql/rpl_gtid.h')

diff --git a/sql/rpl_gtid.h b/sql/rpl_gtid.h
index 3e9e2fce25f..22771833845 100644
--- a/sql/rpl_gtid.h
+++ b/sql/rpl_gtid.h
@@ -235,6 +235,7 @@ struct rpl_binlog_state
   void reset();
   void free();
   bool load(struct rpl_gtid *list, uint32 count);
+  bool load(rpl_slave_state *slave_pos);
   int update_nolock(const struct rpl_gtid *gtid, bool strict);
   int update(const struct rpl_gtid *gtid, bool strict);
   int update_with_next_gtid(uint32 domain_id, uint32 server_id,
-- 
cgit v1.2.1


From 78c74dbe30d3a22feec5d069c7424d5a8a86ea4c Mon Sep 17 00:00:00 2001
From: Kristian Nielsen <knielsen@knielsen-hq.org>
Date: Wed, 4 Mar 2015 13:10:37 +0100
Subject: MDEV-6403: Temporary tables lost at STOP SLAVE in GTID mode if master
 has not rotated binlog since restart

The binlog contains specially marked format description events to mark
when a master restart happened (which could have caused temporary
tables to be silently dropped). Such events also cause the slave to
close its temporary tables.

However, there was a bug: if the slave afterwards re-connects to the
master in GTID mode, the master can send an old format description
event again. If temporary tables are closed when such an event is seen
for the second time, temporary tables created after that event might be
dropped, causing replication failure.

With this patch, the restart flag of the format description event is
cleared by the master when it is sent to the slave in a subsequent
connection, to avoid the erroneous temp table close.
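
As a rough illustration of the idea only (not the code from the patch), the
sketch below models the restart marker as a plain boolean. The names
FormatDescription, marks_master_restart, prepare_for_send and
slave_closes_temp_tables are hypothetical, as is the is_reconnect parameter
standing in for "sent in a subsequent connection".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified format description event.
struct FormatDescription {
  uint32_t binlog_version;
  bool marks_master_restart;  // assumed stand-in for the restart flag
};

// Master side: return the copy of the event that is sent to the slave.
// When the event is re-sent on a later connection, the restart marker is
// cleared so the slave does not treat it as a new master restart.
FormatDescription prepare_for_send(const FormatDescription &ev,
                                   bool is_reconnect) {
  FormatDescription out = ev;
  if (is_reconnect)
    out.marks_master_restart = false;
  return out;
}

// Slave side: temporary tables are closed only for a marked event.
bool slave_closes_temp_tables(const FormatDescription &ev) {
  return ev.marks_master_restart;
}
```

In this model the first delivery of a marked event still closes temporary
tables on the slave, while a repeated delivery after a reconnect does not.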
---
 sql/rpl_gtid.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'sql/rpl_gtid.h')

diff --git a/sql/rpl_gtid.h b/sql/rpl_gtid.h
index 22771833845..997540728a5 100644
--- a/sql/rpl_gtid.h
+++ b/sql/rpl_gtid.h
@@ -288,6 +288,7 @@ struct slave_connection_state
   int to_string(String *out_str);
   int append_to_string(String *out_str);
   int get_gtid_list(rpl_gtid *gtid_list, uint32 list_size);
+  bool is_pos_reached();
 };
 
 
-- 
cgit v1.2.1