ctdb-tcp: Avoid orphaning the TCP incoming queue

CTDB's incoming queue handling does not check whether a queue already exists, so it can overwrite the pointer to the queue. This was harmless until commit c68b6f96f26664459187ab2fbd56767fb31767e0 changed the read callback to use a parent structure as the callback data. Instead of cleaning up an orphaned queue on disconnect, as before, the callback will now free the new queue.

At first glance it doesn't seem possible that 2 incoming connections from the same node could be processed before the intervening disconnect. However, the incoming connections and disconnect occur on different file descriptors.

The queue can become orphaned on node A when the following sequence occurs:

1. Node A comes up
2. Node A accepts an incoming connection from node B
3. Node B processes a timeout before noticing that the outgoing queue is writable
4. Node B tears down the outgoing connection to node A
5. Node B initiates a new connection to node A
6. Node A accepts an incoming connection from node B

Node A then processes the disconnect of the old incoming connection from (2) but tears down the new incoming connection from (6). This then recurs until the originally affected node is restarted.

However, due to the number of outgoing connection attempts and associated teardowns, this induces the same behaviour on the corresponding incoming queue on all nodes that node A attempts to connect to. Therefore, other nodes become affected and need to be restarted too.

As a result, the whole cluster probably needs to be restarted to recover from this situation.

The problem can occur any time CTDB is started on a node.

The fix is to avoid accepting new incoming connections when a queue for incoming connections is already present. The connecting node will simply retry establishing its outgoing connection.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit d0baad257e511280ff3e5c7372c38c43df841070)
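The sequence described in the commit message can be replayed with a small stand-alone model. This is only a sketch: struct sim_node, struct sim_queue, sim_accept(), sim_disconnect() and sim_replay() are hypothetical names invented for illustration and are not CTDB code. The model assumes only what the message states: the accept path stores the queue pointer in the per-node structure, and the disconnect callback receives that parent structure as its callback data.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for CTDB's per-node TCP state (not the real structs). */
struct sim_queue {
	int fd;
};

struct sim_node {
	struct sim_queue *in_queue; /* at most one incoming queue per peer */
};

/*
 * Accept an incoming connection. With guard == false the pointer is
 * overwritten unconditionally, as before the fix. With guard == true a
 * connection is rejected while a queue is already present.
 */
static bool sim_accept(struct sim_node *node, int fd, bool guard)
{
	struct sim_queue *q;

	if (guard && node->in_queue != NULL) {
		return false; /* rejected, the peer is expected to retry */
	}

	q = malloc(sizeof(*q));
	if (q == NULL) {
		return false;
	}
	q->fd = fd;
	node->in_queue = q; /* without the guard this orphans any old queue */
	return true;
}

/*
 * Disconnect handling that is given the parent node as callback data, so
 * it frees whatever node->in_queue points to at the time it runs.
 */
static void sim_disconnect(struct sim_node *node)
{
	free(node->in_queue);
	node->in_queue = NULL;
}

/*
 * Replay steps (2) and (6) followed by the late disconnect of (2).
 * Returns true if node A still has a live incoming queue afterwards.
 */
static bool sim_replay(bool guard)
{
	struct sim_node node = { NULL };
	struct sim_queue *first;
	bool second_accepted;
	bool alive;

	sim_accept(&node, 10, guard);                   /* step (2) */
	first = node.in_queue;

	second_accepted = sim_accept(&node, 11, guard); /* step (6) */
	sim_disconnect(&node);                          /* disconnect of (2) */
	if (!second_accepted) {
		sim_accept(&node, 12, guard);           /* peer retries */
	}

	alive = (node.in_queue != NULL);

	/* Clean up so the model itself does not leak. */
	free(node.in_queue);
	if (second_accepted) {
		free(first); /* the orphaned queue from step (2) */
	}
	return alive;
}
```

In this model, sim_replay(false) leaves node A without an incoming queue because the late disconnect frees the queue from step (6), while sim_replay(true) ends with a live queue once the peer's retry is accepted.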
author: Martin Schwenke <martin@meltin.net> 2019-10-29 15:29:11 +1100
committer: Karolin Seeger <kseeger@samba.org> 2019-11-20 11:15:25 +0000
commit: 14406d123ab4587715ca97114e933f3ae1e31c17 (patch)
tree: bf976c7aa94d1ac48a654620b1968a9c431b575d
parent: 20b823fc255e640a6bbf4debcc69c738b53c5229 (diff)
download: samba-14406d123ab4587715ca97114e933f3ae1e31c17.tar.gz
1 files changed, 7 insertions, 0 deletions
diff --git a/ctdb/tcp/tcp_connect.c b/ctdb/tcp/tcp_connect.c
index 66e10e841a1..a30ee23cf7c 100644
--- a/ctdb/tcp/tcp_connect.c
+++ b/ctdb/tcp/tcp_connect.c
@@ -312,6 +312,13 @@ static void ctdb_listen_event(struct tevent_context *ev, struct tevent_fd *fde,
 		return;
 	}
 
+	if (tnode->in_queue != NULL) {
+		DBG_ERR("Incoming queue active, rejecting connection from %s\n",
+			ctdb_addr_to_str(&addr));
+		close(fd);
+		return;
+	}
+
 	ret = set_blocking(fd, false);
 	if (ret != 0) {
 		DBG_ERR("Failed to set socket non-blocking (%s)\n",