author | Davi Arnaut <davi.arnaut@oracle.com> | 2010-10-22 09:58:09 -0200 |
---|---|---|
committer | Davi Arnaut <davi.arnaut@oracle.com> | 2010-10-22 09:58:09 -0200 |
commit | 2881b8014ca7101684358b25aaf54784c7f43613 (patch) | |
tree | 4571f70663dd1d045d339716fc55ff6c809fec4a /sql/slave.cc | |
parent | a776e5f3d297f45d63f48ad919ccd46307cddb30 (diff) | |
Bug#37780: Make KILL reliable (main.kill fails randomly)
- A prerequisite cleanup patch for making KILL reliable.
The test case main.kill did not work reliably.
The following problems have been identified:
1. A kill signal could be lost if it arrived shortly before a
thread started reading from the client connection.
2. A kill signal could be lost if it arrived shortly before a
thread started waiting on a condition variable.
These problems have been solved as follows. Please see also
added code comments for more details.
1. There is no safe way to detect when a thread enters the
blocking state of a read(2) or recv(2) system call, where it
can be interrupted by a signal. Hence it is not possible to
wait for the right moment to send a kill signal. It has been
decided not to fix this in the code. Instead, the test case
repeats the KILL statement until the connection terminates.
2. Before waiting on a condition variable, we register it
together with a synchronizing mutex in THD::mysys_var. After
this, we need to test THD::killed again. In some places we
tested it only in a loop condition before the registration. When
THD::killed had been set between this test and the registration,
we entered waiting without noticing the killed flag. Additional
checks have been introduced where required.
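
The check-after-registration pattern from point 2 can be sketched with standard C++ primitives. This is only an illustration, not the server's implementation: `Session`, `wait_for_start` and `kill_session` are hypothetical names, `std::mutex` and `std::condition_variable` stand in for the server's own wrappers, and the sketch assumes the mutex and condition variable outlive every thread that uses them.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the per-connection state (not the real THD).
struct Session {
  std::atomic<bool> killed{false};
  std::mutex reg_lock;                           // guards the two pointers below
  std::mutex *wait_mutex = nullptr;              // mutex the session waits under
  std::condition_variable *wait_cond = nullptr;  // condition it waits on
};

// Waiter side. Returns true if `started` is set, false if the wait was
// abandoned because of a kill. `started` must only change while `m` is held.
bool wait_for_start(Session &s, std::mutex &m, std::condition_variable &cv,
                    const bool &started) {
  std::unique_lock<std::mutex> lk(m);
  while (!started && !s.killed) {
    {  // Register where we are about to wait (the role of enter_cond()).
      std::lock_guard<std::mutex> reg(s.reg_lock);
      assert(s.wait_cond == nullptr);  // one registered wait at a time
      s.wait_mutex = &m;
      s.wait_cond = &cv;
    }
    // Re-check after registering: a kill that arrived between the loop
    // test and the registration found nothing to notify.
    if (!s.killed)
      cv.wait(lk);
    {  // Unregister (the role of exit_cond()).
      std::lock_guard<std::mutex> reg(s.reg_lock);
      s.wait_mutex = nullptr;
      s.wait_cond = nullptr;
    }
  }
  return started;
}

// Killer side: set the flag first, then wake the registered waiter, if any.
void kill_session(Session &s) {
  s.killed = true;
  std::mutex *m = nullptr;
  std::condition_variable *cv = nullptr;
  {
    std::lock_guard<std::mutex> reg(s.reg_lock);
    m = s.wait_mutex;
    cv = s.wait_cond;
  }
  if (cv != nullptr) {
    // Taking the waiter's mutex orders the notify after the waiter has
    // either seen the flag or actually blocked in wait().
    std::lock_guard<std::mutex> lk(*m);
    cv->notify_all();
  }
}
```

Without the second test of `killed`, a kill that lands after the loop condition but before the registration sets the flag, finds no registered condition variable, and notifies nobody, so the waiter would block indefinitely.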
In addition to the above, a re-write of the main.kill test
case has been done. All sleeps have been replaced by Debug
Sync Facility synchronization. A couple of sync points have
been added to the server code.
To avoid further problems, if the test case fails in spite of
the fixes, the test case has been added to the "experimental"
list for now.
- Most of the work on this patch is authored by Ingo Struewing
mysql-test/t/kill.test:
Re-wrote test case to use Debug Sync points instead of sleeps
sql/event_queue.cc:
Fixed kill detection in Event_queue::cond_wait() by adding a check
after enter_cond().
sql/lock.cc:
Moved Debug Sync points behind enter_cond().
Fixed comments.
sql/slave.cc:
Fixed kill detection in start_slave_thread() by adding a check
after enter_cond().
sql/sql_class.cc:
Swapped order of kill and close in THD::awake().
Added comments.
sql/sql_class.h:
Added a comment to THD::killed.
sql/sql_parse.cc:
Added a sync point in do_command().
sql/sql_select.cc:
Added a sync point in JOIN::optimize().
Diffstat (limited to 'sql/slave.cc')
-rw-r--r-- | sql/slave.cc | 14 |
1 files changed, 11 insertions, 3 deletions
diff --git a/sql/slave.cc b/sql/slave.cc
index ab8952069fb..a6313f0b850 100644
--- a/sql/slave.cc
+++ b/sql/slave.cc
@@ -721,9 +721,17 @@ int start_slave_thread(
     while (start_id == *slave_run_id)
     {
       DBUG_PRINT("sleep",("Waiting for slave thread to start"));
-      const char* old_msg = thd->enter_cond(start_cond,cond_lock,
-                                            "Waiting for slave thread to start");
-      mysql_cond_wait(start_cond, cond_lock);
+      const char *old_msg= thd->enter_cond(start_cond, cond_lock,
+                                           "Waiting for slave thread to start");
+      /*
+        It is not sufficient to test this at loop bottom. We must test
+        it after registering the mutex in enter_cond(). If the kill
+        happens after testing of thd->killed and before the mutex is
+        registered, we could otherwise go waiting though thd->killed is
+        set.
+      */
+      if (!thd->killed)
+        mysql_cond_wait(start_cond, cond_lock);
       thd->exit_cond(old_msg);
       mysql_mutex_lock(cond_lock); // re-acquire it as exit_cond() released
       if (thd->killed)