author | sjaakola <seppo.jaakola@iki.fi> | 2020-05-19 11:12:26 +0300 |
---|---|---|
committer | Jan Lindström <jan.lindstrom@mariadb.com> | 2020-06-23 13:58:12 +0300 |
commit | 5463ad35a3d070a6aed8a266a0b834613dadfc29 (patch) | |
tree | 3c410f7cc6c5f9ebe0bf7f486ff6bc7556e49676 /sql/sql_parse.cc | |
parent | 37c88445e30d52c965bcb19b19fa710c3eb4fad9 (diff) | |
MDEV-21910 Deadlock between BF abort and manual KILL command (bb-10.1-MDEV-21910-2)
When a high-priority replication slave applier encounters a lock conflict in InnoDB,
it forces the conflicting lock holder transaction (the victim) to roll back.
This is required in the multi-master synchronous replication model to avoid cluster lock-up.
This high-priority victim abort (also known as "brute force" (BF) abort) is started
from the InnoDB lock manager while holding the global lock_sys->mutex and the victim
transaction's (trx) mutex.
Depending on the execution state of the victim transaction, the BF abort may
call THD::awake() to wake up the victim transaction for the rollback.
If the BF abort requires THD::awake() to be called, the applier thread follows
the locking protocol: lock_sys->mutex -> victim THD::LOCK_thd_data.
If, at the same time, another DBMS super user issues a KILL command to abort the same victim,
that command follows the locking protocol: victim THD::LOCK_thd_data -> lock_sys->mutex.
These two locking protocols acquire the mutexes in opposite order, so an unresolvable
mutex deadlock may occur.
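The opposite acquisition order can be illustrated with a small standalone C++ sketch. The two std::timed_mutex objects are hypothetical stand-ins for lock_sys->mutex and the victim's THD::LOCK_thd_data, and every name below is invented for illustration; none of it is MariaDB code. Each path takes its first mutex, waits until the other path holds its own first mutex, and then tries the second mutex with a timeout, so the sketch reports the blockage instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two mutexes named in the commit message.
struct LockOrderDemo {
  std::timed_mutex lock_sys_mutex;        // plays the role of lock_sys->mutex
  std::timed_mutex victim_lock_thd_data;  // plays the role of THD::LOCK_thd_data
  std::atomic<int> holding_first{0};      // paths that hold their first mutex
  std::atomic<int> attempts_done{0};      // paths that finished their attempt
  std::atomic<int> blocked{0};            // paths that could not get the second mutex

  // Acquire `first`, then try `second` once both paths hold their first mutex.
  void run_path(std::timed_mutex &first, std::timed_mutex &second) {
    std::lock_guard<std::timed_mutex> guard(first);
    holding_first.fetch_add(1);
    while (holding_first.load() < 2)
      std::this_thread::yield();

    // A plain lock() here would wait forever; the timeout makes that visible.
    if (second.try_lock_for(std::chrono::milliseconds(50)))
      second.unlock();
    else
      blocked.fetch_add(1);

    // Keep holding `first` until the other path has finished its attempt too.
    attempts_done.fetch_add(1);
    while (attempts_done.load() < 2)
      std::this_thread::yield();
  }
};

// Runs the applier-like and KILL-like paths concurrently and returns how many
// of them failed to acquire their second mutex.
int count_blocked_paths() {
  LockOrderDemo demo;
  std::thread applier([&demo] {
    demo.run_path(demo.lock_sys_mutex, demo.victim_lock_thd_data);
  });
  std::thread kill_cmd([&demo] {
    demo.run_path(demo.victim_lock_thd_data, demo.lock_sys_mutex);
  });
  applier.join();
  kill_cmd.join();
  return demo.blocked.load();
}
```

In this sketch count_blocked_paths() returns 2, because each path still holds the mutex the other one is waiting for. With untimed lock() calls, as in the server, neither thread would ever proceed.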
The fix in this commit adds a THD::wsrep_killed flag to synchronize who can kill the victim.
This flag is set both when a BF abort is requested from InnoDB and when a KILL command is issued.
Either victim-killing path bails out if the victim's wsrep_killed flag is already
set, to avoid mutex conflicts with the other aborter.
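The "first aborter wins" idea can be sketched in standalone C++ as follows. This is a simplified illustration under stated assumptions, not the server implementation: Victim, claim_kill() and abort_victim() are hypothetical names, and an atomic exchange stands in for setting wsrep_killed, which the server sets under its own mutexes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical, heavily simplified victim; this is not the real THD class.
struct Victim {
  std::atomic<bool> wsrep_killed{false};  // models the THD::wsrep_killed flag
  std::atomic<int> awake_calls{0};        // counts simulated THD::awake() calls
};

// Sets the flag and returns true only for the caller that set it first.
bool claim_kill(Victim &victim) {
  return !victim.wsrep_killed.exchange(true);
}

// Shared shape of both kill paths: bail out if another aborter already
// claimed the victim, otherwise go on to wake it up.
bool abort_victim(Victim &victim) {
  if (!claim_kill(victim))
    return false;                 // the other aborter is handling the victim
  victim.awake_calls.fetch_add(1);  // stands in for THD::awake()
  return true;
}

// Races a BF-abort-like path against a KILL-like path on the same victim and
// returns how many of them went on to wake the victim.
int race_two_aborters() {
  Victim victim;
  std::atomic<int> winners{0};
  auto path = [&victim, &winners] {
    if (abort_victim(victim))
      winners.fetch_add(1);
  };
  std::thread bf_abort(path);
  std::thread kill_cmd(path);
  bf_abort.join();
  kill_cmd.join();
  return winners.load();
}
```

Whichever thread runs first, race_two_aborters() returns 1: only one path gets as far as the simulated awake() call, so the second path never needs the mutexes the first one may be holding.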
A new test case was added in galera.galera_bf_kill.test for the scenario where
a wsrep applier thread and a manual KILL command try to kill the same idle victim.
Diffstat (limited to 'sql/sql_parse.cc')
-rw-r--r-- | sql/sql_parse.cc | 20 |
1 files changed, 18 insertions, 2 deletions
diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc
index 656da3b6a79..765cca31e81 100644
--- a/sql/sql_parse.cc
+++ b/sql/sql_parse.cc
@@ -8324,8 +8324,24 @@ kill_one_thread(THD *thd, longlong id, killed_state kill_signal, killed_type typ
           thd->security_ctx->user_matches(tmp->security_ctx)) &&
         !wsrep_thd_is_BF(tmp, false))
     {
-      tmp->awake(kill_signal);
-      error=0;
+#ifdef WITH_WSREP
+      DEBUG_SYNC(thd, "before_awake_no_mutex");
+
+      // Note that find_thread_by_id will lock tmp->LOCK_thd_data
+      if (wsrep_thd_set_wsrep_killed(tmp))
+      {
+        WSREP_DEBUG("Kill transaction thread %llu skipped due to wsrep_killed", tmp->thread_id);
+        error= 0;
+      }
+      else
+#endif /* WITH_WSREP */
+      {
+        WSREP_DEBUG("kill_one_thread %llu, victim: %llu wsrep_killed %d by signal %d",
+                    thd->thread_id, id, tmp->wsrep_killed, kill_signal);
+        tmp->awake(kill_signal);
+        WSREP_DEBUG("victim: %llu taken care of", id);
+        error= 0;
+      }
     }
     else
       error= (type == KILL_TYPE_QUERY ? ER_KILL_QUERY_DENIED_ERROR :