Fix for BUG#2011 "rare race condition producing "binlog has bad magic number" error in slave".

The problem was that when the slave SQL thread reads a hot relay log (hot = the one being written to by the slave I/O thread), it must have the LOCK_log. It already took it for read_log_event(), but needs it also for check_binlog_magic(). This should fix all recently reported failures of the rpl_max_relay_size test in 4.1 and 5.0 (though the bug exists since 4.0, it showed up first in 5.0). sql/slave.cc: Fix for BUG#2011 "rare race condition producing "binlog has bad magic number" error in slave". The problem was that when the slave SQL thread reads a hot relay log (hot = the one being written to by the slave I/O thread), it must have the LOCK_log. It already took it for read_log_event(), but needs it also for check_binlog_magic().
author: unknown <guilhem@mysql.com> 2003-12-04 15:30:14 +0100
committer: unknown <guilhem@mysql.com> 2003-12-04 15:30:14 +0100
commit: 8479e5a379044acad53baa8f9b7a8cd9266c582a (patch)
tree: ebe7a79e0ce5811215ea0ef2a0b0c25db2c1e5b4 /sql/slave.cc
parent: e8fc6d460c379bbb7b5b4b75e9919ea226a62a43 (diff)
download: mariadb-git-8479e5a379044acad53baa8f9b7a8cd9266c582a.tar.gz
1 files changed, 26 insertions, 5 deletions
diff --git a/sql/slave.cc b/sql/slave.cc
index 6816d968007..5bc31fd6a21 100644
--- a/sql/slave.cc
+++ b/sql/slave.cc
@@ -3497,8 +3497,20 @@ rli->relay_log_pos=%s rli->pending=%lu",
 		sizeof(rli->relay_log_name)-1);
 	flush_relay_log_info(rli);
       }
-	
-      // next log is hot 
+
+      /*
+        Now we want to open this next log. To know if it's a hot log (the one
+        being written by the I/O thread now) or a cold log, we can use
+        is_active(); if it is hot, we use the I/O cache; if it's cold we open
+        the file normally. But if is_active() reports that the log is hot, this
+        may change between the test and the consequence of the test. So we may
+        open the I/O cache whereas the log is now cold, which is nonsense.
+        To guard against this, we need to have LOCK_log.
+      */
+
+      DBUG_PRINT("info",("hot_log: %d",hot_log));
+      if (!hot_log) /* if hot_log, we already have this mutex */
+        pthread_mutex_lock(log_lock);
       if (rli->relay_log.is_active(rli->linfo.log_file_name))
       {
 #ifdef EXTRA_DEBUG
@@ -3511,15 +3523,24 @@ rli->relay_log_pos=%s rli->pending=%lu",
 	  
 	/*
 	  Read pointer has to be at the start since we are the only
-	  reader
+	  reader.
+          We must keep the LOCK_log to read the 4 first bytes, as this is a hot
+          log (same as when we call read_log_event() above: for a hot log we
+          take the mutex).
 	*/
 	if (check_binlog_magic(cur_log,&errmsg))
+        {
+          if (!hot_log) pthread_mutex_unlock(log_lock);
 	  goto err;
+        }
+        if (!hot_log) pthread_mutex_unlock(log_lock);
 	continue;
       }
+      if (!hot_log) pthread_mutex_unlock(log_lock);
       /*
-	if we get here, the log was not hot, so we will have to
-	open it ourselves
+	if we get here, the log was not hot, so we will have to open it
+	ourselves. We are sure that the log is still not hot now (a log can get
+	from hot to cold, but not from cold to hot). No need for LOCK_log.
       */
 #ifdef EXTRA_DEBUG
       sql_print_error("next log '%s' is not active",
author	unknown <guilhem@mysql.com>	2003-12-04 15:30:14 +0100
committer	unknown <guilhem@mysql.com>	2003-12-04 15:30:14 +0100
commit	8479e5a379044acad53baa8f9b7a8cd9266c582a (patch)
tree	ebe7a79e0ce5811215ea0ef2a0b0c25db2c1e5b4 /sql/slave.cc
parent	e8fc6d460c379bbb7b5b4b75e9919ea226a62a43 (diff)
download	mariadb-git-8479e5a379044acad53baa8f9b7a8cd9266c582a.tar.gz