author Martin Schwenke <martin@meltin.net> 2015-05-04 15:27:19 +1000
committer Amitay Isaacs <amitay@samba.org> 2015-05-04 10:40:36 +0200
commit 20a7945a2695d7ed811237adde5af6549e53c6e9
tree d6e29f0d894b80fbf0e661030adc03d0548d13bf
parent 26ad4b368d9b7be12baa28ad62ae6346c4b907ee
Revert "ctdb-recoverd: Abort when daemon can take recovery lock during recovery"

This reverts commit 39d2fd330a60ea590d76213f8cb406a42fa8d680.

An election can occur in the middle of a recovery. During the election the recovery master can change. When a node loses a round of the election and stops being the recovery master it releases the recovery lock. Then at the end of the ongoing recovery all nodes are able to take the recovery lock so they will all abort.

The most likely cause for a change in recovery master is that several (all?) nodes are starting up and the "connected-ness" of each node is a primary factor in winning the election. In this situation the recovery master can bounce around the cluster.

The simplest solution is to revert this patch so that the recovery will fail. The new recovery master will then start a new recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon May 4 10:40:36 CEST 2015 on sn-devel-104

-rw-r--r-- ctdb/server/ctdb_recover.c 6
1 files changed, 2 insertions, 4 deletions
diff --git a/ctdb/server/ctdb_recover.c b/ctdb/server/ctdb_recover.c
index 7a684d55320..427d9858648 100644
--- a/ctdb/server/ctdb_recover.c
+++ b/ctdb/server/ctdb_recover.c
@@ -504,13 +504,11 @@ static void set_recmode_handler(struct event_context *ev, struct fd_event *fde,
 	*/
 	ret = sys_read(state->fd[0], &c, 1);
 	if (ret != 1 || c != 0) {
-		const char *msg = \
-			"Took recovery lock from daemon - probably a cluster filesystem lock coherence problem";
 		ctdb_request_control_reply(
 			state->ctdb, state->c, NULL, -1,
-			msg);
+			"Took recovery lock from daemon during recovery - probably a cluster filesystem lock coherence problem");
 		talloc_free(state);
-		ctdb_die(state->ctdb, msg);
+		return;
 	}
 	state->ctdb->recovery_mode = state->recmode;
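
The behavioural difference described in the commit message can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not CTDB code: `struct node`, `enum policy`, `end_recovery()` and `simulate()` are hypothetical names invented for illustration, and the cluster is reduced to three nodes that each probe the recovery lock once at the end of a recovery.

```c
#include <stdbool.h>

#define NUM_NODES 3

/* Hypothetical policy switch for illustration; not a real CTDB option. */
enum policy {
	POLICY_ABORT,         /* reverted behaviour: daemon dies */
	POLICY_FAIL_RECOVERY  /* behaviour after the revert: control fails */
};

/* Hypothetical, heavily simplified view of one cluster node. */
struct node {
	bool alive;       /* daemon is still running */
	bool in_recovery; /* recovery mode has not been cleared yet */
};

/*
 * Model of the end-of-recovery check.  If the probe manages to take the
 * recovery lock, nobody is holding it.  Returns 0 when recovery mode is
 * cleared and -1 when the request is refused.
 */
static int end_recovery(struct node *n, bool lock_is_free, enum policy p)
{
	if (lock_is_free) {
		if (p == POLICY_ABORT) {
			n->alive = false; /* stands in for the daemon aborting */
		}
		return -1;                /* node stays in recovery mode */
	}
	n->in_recovery = false;
	return 0;
}

/*
 * Run one end-of-recovery pass on every node and return how many
 * daemons are still alive afterwards.
 */
int simulate(enum policy p, bool lock_is_free)
{
	struct node nodes[NUM_NODES];
	int alive = 0;

	for (int i = 0; i < NUM_NODES; i++) {
		nodes[i].alive = true;
		nodes[i].in_recovery = true;
	}
	for (int i = 0; i < NUM_NODES; i++) {
		(void)end_recovery(&nodes[i], lock_is_free, p);
	}
	for (int i = 0; i < NUM_NODES; i++) {
		if (nodes[i].alive) {
			alive++;
		}
	}
	return alive;
}
```

In this model, a lock released mid-recovery (`lock_is_free == true`) leaves no node alive under `POLICY_ABORT`, whereas under `POLICY_FAIL_RECOVERY` every node survives and remains in recovery mode, so a new recovery master would be able to start another recovery.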