Bug#37780: Make KILL reliable (main.kill fails randomly)

- A prerequisite cleanup patch for making KILL reliable.

The test case main.kill did not work reliably.

The following problems have been identified:

1. A kill signal could be lost if it arrived shortly before a thread started reading on the client connection.

2. A kill signal could be lost if it arrived shortly before a thread started waiting on a condition variable.

These problems have been solved as follows. Please see the added code comments for more details.

1. There is no safe way to detect when a thread enters the blocking state of a read(2) or recv(2) system call, where it can be interrupted by a signal. Hence it is not possible to wait for the right moment to send a kill signal. It has been decided not to fix this in the code. Instead, the test case repeats the KILL statement until the connection terminates.

2. Before waiting on a condition variable, we register it together with a synchronizing mutex in THD::mysys_var. After this, we need to test THD::killed again. In some places we tested it only in a loop condition before the registration. When THD::killed was set between this test and the registration, we started waiting without noticing the killed flag. Additional checks have been introduced where required.

In addition to the above, the main.kill test case has been rewritten. All sleeps have been replaced by Debug Sync Facility synchronization. A couple of sync points have been added to the server code.

To avoid further problems if the test case fails in spite of the fixes, the test case has been added to the "experimental" list for now.

- Most of the work on this patch is authored by Ingo Struewing

mysql-test/t/kill.test:
  Re-wrote test case to use Debug Sync points instead of sleeps
sql/event_queue.cc:
  Fixed kill detection in Event_queue::cond_wait() by adding a check after enter_cond().
sql/lock.cc:
  Moved Debug Sync points after enter_cond(). Fixed comments.
sql/slave.cc:
  Fixed kill detection in start_slave_thread() by adding a check after enter_cond().
sql/sql_class.cc:
  Swapped order of kill and close in THD::awake(). Added comments.
sql/sql_class.h:
  Added a comment to THD::killed.
sql/sql_parse.cc:
  Added a sync point in do_command().
sql/sql_select.cc:
  Added a sync point in JOIN::optimize().
author: Davi Arnaut <davi.arnaut@oracle.com> 2010-10-22 09:58:09 -0200
committer: Davi Arnaut <davi.arnaut@oracle.com> 2010-10-22 09:58:09 -0200
commit: 2881b8014ca7101684358b25aaf54784c7f43613 (patch)
tree: 4571f70663dd1d045d339716fc55ff6c809fec4a /sql/slave.cc
parent: a776e5f3d297f45d63f48ad919ccd46307cddb30 (diff)
download: mariadb-git-2881b8014ca7101684358b25aaf54784c7f43613.tar.gz
1 files changed, 11 insertions, 3 deletions
diff --git a/sql/slave.cc b/sql/slave.cc
index ab8952069fb..a6313f0b850 100644
--- a/sql/slave.cc
+++ b/sql/slave.cc
@@ -721,9 +721,17 @@ int start_slave_thread(
     while (start_id == *slave_run_id)
     {
       DBUG_PRINT("sleep",("Waiting for slave thread to start"));
-      const char* old_msg = thd->enter_cond(start_cond,cond_lock,
-                                            "Waiting for slave thread to start");
-      mysql_cond_wait(start_cond, cond_lock);
+      const char *old_msg= thd->enter_cond(start_cond, cond_lock,
+                                           "Waiting for slave thread to start");
+      /*
+        It is not sufficient to test this at loop bottom. We must test
+        it after registering the mutex in enter_cond(). If the kill
+        happens after testing of thd->killed and before the mutex is
+        registered, we could otherwise go waiting though thd->killed is
+        set.
+      */
+      if (!thd->killed)
+        mysql_cond_wait(start_cond, cond_lock);
       thd->exit_cond(old_msg);
       mysql_mutex_lock(cond_lock); // re-acquire it as exit_cond() released
       if (thd->killed)
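
The hunk above re-tests the killed flag after enter_cond() and before the wait. The following is a minimal standalone sketch of that pattern in standard C++, not the server code. All names in it (Session, WaitSlot, kill_session, wait_for_start, before_register) are hypothetical stand-ins, and std::mutex / std::condition_variable are used in place of the mysql_mutex_* / mysql_cond_* wrappers. The before_register hook plays the role of a Debug Sync point so that a kill can be injected at exactly the racy moment.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>

// Hypothetical pairing of a condition variable with its mutex.
// It must outlive every Session that may register it.
struct WaitSlot {
  std::mutex mutex;
  std::condition_variable cond;
};

// Hypothetical stand-in for the parts of THD relevant here.
struct Session {
  std::atomic<bool> killed{false};
  std::atomic<WaitSlot *> waiting_on{nullptr};  // registered wait, if any
};

// Killer side: set the flag first, then wake the registered waiter (if any).
void kill_session(Session &s) {
  s.killed.store(true);
  if (WaitSlot *slot = s.waiting_on.load()) {
    // Taking the mutex ensures the waiter is either blocked in wait()
    // or has not yet tested the flag, so the wakeup cannot be missed.
    std::lock_guard<std::mutex> guard(slot->mutex);
    slot->cond.notify_all();
  }
}

// Waiter side. Returns true if the wait ended because of a kill,
// false if `started` (protected by slot.mutex) became true.
bool wait_for_start(Session &s, WaitSlot &slot, const bool &started,
                    const std::function<void()> &before_register) {
  std::unique_lock<std::mutex> lock(slot.mutex);
  while (!started) {
    before_register();            // test hook: a kill may arrive right here
    assert(lock.owns_lock());     // registration requires holding the mutex
    s.waiting_on.store(&slot);    // register, analogous to enter_cond()
    // Re-test after registering. A kill that arrived before the
    // registration found nothing to signal, so only the flag records it.
    if (!s.killed.load())
      slot.cond.wait(lock);
    s.waiting_on.store(nullptr);  // unregister, analogous to exit_cond()
    if (s.killed.load())
      return true;                // the pre-existing loop-bottom test
  }
  return false;
}
```

If kill_session() is called from the hook, the killer sees no registered slot and signals nothing. The re-test after registration is what lets wait_for_start() return instead of blocking; with only the loop-bottom test, the thread in this scenario would wait with nobody left to wake it.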