path: root/ctdb
Commit message | Author | Age | Files | Lines
* ctdb-tcp: Close inflight connecting TCP sockets after fork | Volker Lendecke | 2019-11-20 | 1 | -0/+6

  Commit c68b6f96f26 changed the talloc hierarchy such that outgoing TCP sockets while sitting in the async connect() syscall are not freed via ctdb_tcp_shutdown() anymore, they are hanging off a longer-running structure. Free this structure as well.

  If an outgoing TCP socket leaks into a long-running child process (possibly the recovery daemon), this connection will never be closed as seen by the destination node. Because with recent changes incoming connections will not be accepted as long as any incoming connection is alive, with that socket leak into the recovery daemon we will never again be able to successfully connect to the node that is affected by this leak. Further attempts to connect will be discarded by the destination as long as the recovery daemon keeps this socket alive.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
  RN: Avoid communication breakdown on node reconnect
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit a6d99d9e5c5bc58e6d56be7a6c1dbc7c8d1a882f)
  Autobuild-User(v4-9-test): Karolin Seeger <kseeger@samba.org>
  Autobuild-Date(v4-9-test): Wed Nov 20 14:58:33 UTC 2019 on sn-devel-144
* ctdb-tcp: Drop tracking of file descriptor for incoming connections | Martin Schwenke | 2019-11-20 | 4 | -11/+0

  This file descriptor is owned by the incoming queue. It will be closed when the queue is torn down.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit bf47bc18bb8a94231870ef821c0352b7a15c2e28)
* ctdb-tcp: Avoid orphaning the TCP incoming queue | Martin Schwenke | 2019-11-20 | 1 | -0/+7

  CTDB's incoming queue handling does not check whether an existing queue exists, so can overwrite the pointer to the queue. This used to be harmless until commit c68b6f96f26664459187ab2fbd56767fb31767e0 changed the read callback to use a parent structure as the callback data. Instead of cleaning up an orphaned queue on disconnect, as before, this will now free the new queue.

  At first glance it doesn't seem possible that 2 incoming connections from the same node could be processed before the intervening disconnect. However, the incoming connections and disconnect occur on different file descriptors. The queue can become orphaned on node A when the following sequence occurs:

  1. Node A comes up
  2. Node A accepts an incoming connection from node B
  3. Node B processes a timeout before noticing that the outgoing queue is writable
  4. Node B tears down the outgoing connection to node A
  5. Node B initiates a new connection to node A
  6. Node A accepts an incoming connection from node B

  Node A then processes the disconnect of the old incoming connection from (2) but tears down the new incoming connection from (6). This then occurs until the originally affected node is restarted.

  However, due to the number of outgoing connection attempts and associated teardowns, this induces the same behaviour on the corresponding incoming queue on all nodes that node A attempts to connect to. Therefore, other nodes become affected and need to be restarted too.

  As a result, the whole cluster probably needs to be restarted to recover from this situation.

  The problem can occur any time CTDB is started on a node.

  The fix is to avoid accepting new incoming connections when a queue for incoming connections is already present. The connecting node will simply retry establishing its outgoing connection.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit d0baad257e511280ff3e5c7372c38c43df841070)
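The fix described in this commit can be pictured with a small model. The sketch below is not CTDB's C code: `TcpNode`, `accept_incoming()` and `incoming_disconnected()` are hypothetical names, and a connection is represented by a plain string. It only illustrates the rule "refuse a new incoming connection while an incoming queue already exists, and let the peer retry".

```python
class TcpNode:
    """Hypothetical stand-in for per-node TCP state; not CTDB's real struct."""

    def __init__(self, name):
        self.name = name
        self.in_queue = None  # set while an incoming connection is active


def accept_incoming(node, conn_id):
    """Accept conn_id unless the node already has an incoming queue."""
    if node.in_queue is not None:
        # Refuse: the connecting peer is expected to retry later.
        return False
    node.in_queue = conn_id
    return True


def incoming_disconnected(node):
    """Tear down the incoming queue once the old connection is seen to close."""
    node.in_queue = None


# Replay the sequence from the commit message, as seen by node A for peer B.
peer_b = TcpNode("B")
first = accept_incoming(peer_b, "conn-1")   # step 2: accepted
second = accept_incoming(peer_b, "conn-2")  # step 6: refused, conn-1 still present
incoming_disconnected(peer_b)               # disconnect of conn-1 is processed
retry = accept_incoming(peer_b, "conn-2")   # the peer's retry now succeeds
```

With the guard in place the late disconnect can only ever tear down the queue it belongs to, because a second queue is never created while the first one exists.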
* ctdb-tcp: Check incoming queue to see if incoming connection is up | Martin Schwenke | 2019-11-20 | 1 | -1/+1

  This makes it consistent with the reverse case. Also, in_fd will soon be removed.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit e62b3a05a874db13a848573d2e2fb1c157393b9c)
* ctdb-vacuum: Process all records not deleted on a remote node | Amitay Isaacs | 2019-10-16 | 1 | -1/+1

  This currently skips the last record.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14147
  RN: Avoid potential data loss during recovery after vacuuming error
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
  (cherry picked from commit 33f1c9d9654fbdcb99c23f9d23c4bbe2cc596b98)
* ctdb-tools: Stop deleted nodes from influencing ctdb nodestatus exit code | Martin Schwenke | 2019-09-20 | 1 | -1/+7

  Deleted nodes should simply be ignored.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14129
  RN: Stop deleted nodes from influencing ctdb nodestatus exit code
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 32b5ceb31936ec5447362236c1809db003561d29)
  Autobuild-User(v4-9-test): Karolin Seeger <kseeger@samba.org>
  Autobuild-Date(v4-9-test): Fri Sep 20 14:09:11 UTC 2019 on sn-devel-144
* ctdb: fix compilation on systems with glibc robust mutexes | Ralph Boehme | 2019-09-05 | 1 | -1/+1

  On older systems like SLES 11 without POSIX robust mutexes, but with glibc robust mutexes where all the functions are available but have a "_np" suffix, compilation fails in:

    ctdb/tests/src/test_mutex_raw.c.239.o: In function `worker':
    /root/samba-4.10.6/bin/default/../../ctdb/tests/src/test_mutex_raw.c:129: undefined reference to `pthread_mutex_consistent'
    ctdb/tests/src/test_mutex_raw.c.239.o: In function `main':
    /root/samba-4.10.6/bin/default/../../ctdb/tests/src/test_mutex_raw.c:285: undefined reference to `pthread_mutex_consistent'
    /root/samba-4.10.6/bin/default/../../ctdb/tests/src/test_mutex_raw.c:332: undefined reference to `pthread_mutexattr_setrobust'
    /root/samba-4.10.6/bin/default/../../ctdb/tests/src/test_mutex_raw.c:363: undefined reference to `pthread_mutex_consistent'
    collect2: ld returned 1 exit status

  This could be fixed by using libreplace system/threads.h instead of pthread.h directly, but as there has been a desire to keep test_mutex_raw.c standalone and compilable without external dependencies other than libc and libpthread, make the tool developer build only. This should get the average user over the cliff.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14038
  RN: Fix compiling ctdb on older systems lacking POSIX robust mutexes
  Signed-off-by: Ralph Boehme <slow@samba.org>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
  (cherry picked from commit f5388f97792ac2d7962950dad91aaf8ad49bceaa)
  Autobuild-User(v4-9-test): Karolin Seeger <kseeger@samba.org>
  Autobuild-Date(v4-9-test): Thu Sep 5 16:12:34 UTC 2019 on sn-devel-144
* ctdb-recoverd: Fix typo in previous fix | Martin Schwenke | 2019-09-03 | 1 | -1/+1

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
  Autobuild-Date(master): Tue Aug 27 15:29:11 UTC 2019 on sn-devel-184
  (cherry picked from commit 8190993d99284162bd8699780248bb2edfec2673)
* ctdb-tests: Clear deleted record via recovery instead of vacuuming | Martin Schwenke | 2019-09-03 | 1 | -11/+5

  This test has been flapping because sometimes the record is not vacuumed within the expected time period, perhaps even because the check for the record can interfere with vacuuming.

  However, instead of waiting for vacuuming the record can be cleared by doing a recovery. This should be much more reliable.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085
  RN: Fix flapping CTDB tests
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Martin Schwenke <martins@samba.org>
  Autobuild-Date(master): Wed Aug 21 13:06:57 UTC 2019 on sn-devel-184
  (backported from commit 71ad473ba805abe23bbe6c1a1290612e448e73f3)
  Signed-off-by: Martin Schwenke <martin@meltin.net>
* ctdb-tests: Strengthen volatile DB traverse test | Martin Schwenke | 2019-09-03 | 1 | -15/+52

  Check the record count more often, from multiple nodes. Add a case with multiple records.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit ca4df06080709adf0cbebc95b0a70b4090dad5ba)
* ctdb-recoverd: Only check for LMASTER nodes in the VNN map | Martin Schwenke | 2019-09-03 | 1 | -4/+10

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 5d655ac6f2ff82f8f1c89b06870d600a1a3c7a8a)
* ctdb-tests: Don't retrieve the VNN map from target node for notlmaster | Martin Schwenke | 2019-09-03 | 1 | -2/+1

  Use the VNN map from the node running node_has_status().

  This means that

    wait_until_node_has_status 1 notlmaster 10 0

  will run "ctdb status" on node 0 and check (for up to 10 seconds) if node 1 is in the VNN map.

  If the LMASTER capability has been dropped on node 1 then the above will wait for the VNN map to be updated on node 0. This will happen as part of the recovery that is triggered by the change of LMASTER capability. The next command will then only be able to attach to $TESTDB after the recovery is complete thus guaranteeing a sane state for the test to continue.

  This stops simple/79_volatile_db_traverse.sh from going into recovery during the traverse or at some other inconvenient time.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 53daeb2f878af1634a26e05cb86d87e2faf20173)
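The waiting behaviour described here is a bounded poll against the observing node's view of the cluster. The following is a rough model only: the real helper is a shell function in the CTDB test suite, while `wait_until()` and `node_is_notlmaster()` below are hypothetical names, and treating "notlmaster" as "PNN absent from the VNN map" is an assumption made for the sketch.

```python
import time


def wait_until(predicate, timeout, interval=1.0, sleep=time.sleep):
    """Poll predicate() until it is true or about `timeout` seconds have passed."""
    waited = 0.0
    while True:
        if predicate():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


def node_is_notlmaster(vnn_map, pnn):
    """Assumed meaning of 'notlmaster': the PNN no longer appears in the VNN map."""
    return pnn not in vnn_map


# Node 0's view of the VNN map; node 1 is still listed before recovery finishes.
vnn_map_on_node0 = [0, 1, 2]


def fake_sleep(_seconds):
    # Stand-in for real time passing: the recovery updates node 0's VNN map.
    if 1 in vnn_map_on_node0:
        vnn_map_on_node0.remove(1)


ok = wait_until(lambda: node_is_notlmaster(vnn_map_on_node0, 1),
                timeout=10, sleep=fake_sleep)
timed_out = wait_until(lambda: False, timeout=2, sleep=lambda _s: None)
```

The point of the commit is which map is polled: the one on the node running the check (node 0 here), so the wait only ends once that node has been through the recovery.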
* ctdb-tests: Handle special cases first and return | Martin Schwenke | 2019-09-03 | 1 | -31/+28

  All the other cases involve matching bits.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit bff1a3a548a2cace997b767d78bb824438664cb7)
* ctdb-tests: Inline handling of recovered and notlmaster statuses | Martin Schwenke | 2019-09-03 | 1 | -6/+12

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit bb59073515ee5f7886b5d9a20d7b2805857c2708)
* ctdb-tests: Drop unused node statuses frozen/unfrozen | Martin Schwenke | 2019-09-03 | 1 | -6/+2

  Silently drop unused local variable mpat.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 9b09a87326af28877301ad27bcec5bb13744e2b6)
* ctdb-tests: Reformat node_has_status() | Martin Schwenke | 2019-09-03 | 1 | -46/+48

  Re-indent and drop non-POSIX left-parenthesis from case labels.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 52227d19735a3305ad633672c70385f443f222f0)
* ctdb-daemon: Make node inactive in the NODE_STOP control | Martin Schwenke | 2019-08-28 | 1 | -0/+2

  Currently some of this is supported by a periodic check in the recovery daemon's main_loop(), which notices the flag change, sets recovery mode active and freezes databases. If STOP_NODE returns immediately then the associated recovery can complete and the node can be continued before databases are actually frozen.

  Instead, immediately do all of the things that make a node inactive.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087
  RN: Stop "ctdb stop" from completing before freezing databases
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
  Autobuild-Date(master): Tue Aug 20 08:32:27 UTC 2019 on sn-devel-184
  (cherry picked from commit e9f2e205ee89f4f3d6302cc11b4d0eb2efaf0f53)
  Autobuild-User(v4-9-test): Karolin Seeger <kseeger@samba.org>
  Autobuild-Date(v4-9-test): Wed Aug 28 12:04:13 UTC 2019 on sn-devel-144
* ctdb-daemon: Drop unused function ctdb_local_node_got_banned() | Martin Schwenke | 2019-08-28 | 2 | -25/+0

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 91ac4c13d8472955d1f04bd775ec4b3ff8bf1b61)
* ctdb-daemon: Switch banning code to use ctdb_node_become_inactive() | Martin Schwenke | 2019-08-28 | 1 | -1/+1

  There's no reason to avoid immediately setting recovery mode to active and initiating freeze of databases.

  This effectively reverts the following commits:

    d8f3b490bbb691c9916eed0df5b980c1aef23c85
    b4357a79d916b1f8ade8fa78563fbef0ce670aa9

  The latter is now implemented using a control, resulting in looser coupling.

  See also the following commit:

    f8141e91a693912ea1107a49320e83702a80757a

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 0f5f7b7cf4e970f3f36c5e0b3d09e710fe90801a)
* ctdb-daemon: Factor out new function ctdb_node_become_inactive() | Martin Schwenke | 2019-08-28 | 2 | -0/+45

  This is a superset of ctdb_local_node_got_banned() so will replace that function, and will also be used in the NODE_STOP control.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit a42bcaabb63722411bee52b80cbfc795593defbc)
* ctdb-tcp: Mark node as disconnected if incoming connection goes away | Martin Schwenke | 2019-08-28 | 2 | -2/+5

  To make it easy to pass the node data to the upcall, the private data for ctdb_tcp_read_cb() needs to be changed from tnode to node.

  RN: Avoid marking a node as connected before it can receive packets
  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Martin Schwenke <martins@samba.org>
  Autobuild-Date(master): Fri Aug 16 22:50:35 UTC 2019 on sn-devel-184
  (cherry picked from commit 73c850eda4209b688a169aeeb20c453b738cbb35)
* ctdb-tcp: Only mark a node connected if both directions are up | Martin Schwenke | 2019-08-28 | 1 | -3/+17

  Nodes are currently marked as up if the outgoing connection is established. However, if the incoming connection is not yet established then this node could send a request where the replying node can not queue its reply. Wait until both directions are up before marking a node as connected.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 8c98c10f242bc722beffc711e85c0e4f2e74cd57)
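The rule in this commit reduces to a conjunction over the two directions. As a minimal sketch, with `LinkState`, `out_up` and `in_up` being hypothetical names rather than fields of CTDB's structures:

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    """Hypothetical per-peer state: one flag per connection direction."""

    out_up: bool = False  # our outgoing connection to the peer is established
    in_up: bool = False   # the peer's connection to us has been accepted

    def connected(self):
        # Only report "connected" when replies can flow in both directions.
        return self.out_up and self.in_up


link = LinkState()
link.out_up = True
only_outgoing = link.connected()  # outgoing alone is not enough
link.in_up = True
both_directions = link.connected()
```

Under the older behaviour described above, the first check would already have reported the node as up.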
* ctdb-tcp: Create outbound queue when the connection becomes writable | Martin Schwenke | 2019-08-28 | 3 | -12/+25

  Since commit ddd97553f0a8bfaada178ec4a7460d76fa21f079 ctdb_queue_send() doesn't queue a packet if the connection isn't yet established (i.e. when fd == -1). So, don't bother creating the outbound queue during initialisation but create it when the connection becomes writable.

  Now the presence of the queue indicates that the outbound connection is up.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 7f4854d9643a096a6d8a354fcd27b7c6ed24a75e)
* ctdb-tcp: Use TALLOC_FREE() | Martin Schwenke | 2019-08-28 | 1 | -4/+2

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit d80d9edb4dc107b15a35a39e5c966a3eaed6453a)
* ctdb-tcp: Move incoming fd and queue into struct ctdb_tcp_node | Martin Schwenke | 2019-08-28 | 4 | -34/+61

  This makes it easy to track both incoming and outgoing connectivity states.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit c68b6f96f26664459187ab2fbd56767fb31767e0)
* ctdb-tcp: Rename fd -> out_fd | Martin Schwenke | 2019-08-28 | 3 | -49/+72

  in_fd is coming soon.

  Fix coding style violations in the affected and adjacent lines. Modernise some debug macros and make them more consistent (e.g. drop logging of errno when strerror(errno) is already logged).

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit c06620169fc178ea6db2631f03edf008285d8cf2)
* ctdb-daemon: Add function ctdb_ip_to_node() | Martin Schwenke | 2019-08-28 | 2 | -5/+21

  This is the core logic from ctdb_ip_to_pnn(), so re-implement that function using ctdb_ip_to_node().

  Something similar (ctdb_ip_to_nodeid()) was recently removed in commit 010c1d77cd7e192b1fff39b7b91fccbdbbf4a786 because it wasn't required. Now there is a use case.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 3acb8e9d1c854b577d6be282257269df83055d31)
* ctdb-daemon: Replace function ctdb_ip_to_nodeid() with ctdb_ip_to_pnn() | Martin Schwenke | 2019-08-28 | 4 | -20/+18

  Node ID is a poorly defined concept, indicating the slot in the node map where the IP address was found. This signed value also ends up compared to num_nodes, which is unsigned, producing unwanted warnings.

  Just return the PNN because this is what both callers really want.

  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 010c1d77cd7e192b1fff39b7b91fccbdbbf4a786)
* ctdb-config: depend on /etc/ctdb/nodes file | Rafael David Tinoco | 2019-08-08 | 1 | -0/+1

  CTDB should start as a disabled unit (systemd) in most distributions and, when trying to enable it for the first time, the user should get an "unconfigured" or similar error.

  Depending on the /etc/ctdb/nodes file gives the end user clear direction on what is needed to get the cluster up and running. It should work like the previous ENABLED=NO variables in SysV-like initialization scripts.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14017
  RN: ctdb.service should only start if /etc/ctdb/nodes is not empty
  Signed-off-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit c5803507df7def388edcd5b6cbfee30cd217b536)
* ctdb-scripts: Fix tcp_tw_recycle existence check | Rafael David Tinoco via samba-technical | 2019-06-21 | 1 | -2/+2

  net.ipv4.tcp_tw_recycle was removed in Linux 4.12, but it still makes sense to check for its existence. Unfortunately, the current check does not test whether the procfs file exists. This commit fixes the issue.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13984
  Signed-off-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
  Autobuild-Date(master): Tue Jun 4 23:31:24 UTC 2019 on sn-devel-184
  (cherry picked from commit 843fbb1207ee7ac84f3282974b66b9290d8da0ac)
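A sysctl such as net.ipv4.tcp_tw_recycle is backed by a file under /proc/sys, so "does the setting exist" can be answered by testing for that file before touching it. The sketch below shows the idea in Python rather than in the shell used by the CTDB scripts; `sysctl_exists()` is a hypothetical helper and the `proc_root` parameter exists only so the example can run against a temporary directory.

```python
import os
import tempfile


def sysctl_exists(name, proc_root="/proc/sys"):
    """Return True if the procfs file backing the dotted sysctl `name` exists."""
    return os.path.exists(os.path.join(proc_root, *name.split(".")))


# Demonstrate against a throwaway directory standing in for /proc/sys.
with tempfile.TemporaryDirectory() as fake_proc:
    missing = sysctl_exists("net.ipv4.tcp_tw_recycle", proc_root=fake_proc)
    os.makedirs(os.path.join(fake_proc, "net", "ipv4"))
    open(os.path.join(fake_proc, "net", "ipv4", "tcp_tw_recycle"), "w").close()
    present = sysctl_exists("net.ipv4.tcp_tw_recycle", proc_root=fake_proc)
```

On a kernel where the setting has been removed the check simply returns False, and the caller can skip the setting instead of failing.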
* ctdb-common: Fix memory leak in run_proc | Amitay Isaacs | 2019-05-17 | 1 | -2/+5

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13943
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
  Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
  Autobuild-Date(master): Tue May 14 08:59:03 UTC 2019 on sn-devel-184
  (cherry picked from commit b1f4c86eea022999d5439e4a6ef3494fe41479b6)
  Autobuild-User(v4-9-test): Karolin Seeger <kseeger@samba.org>
  Autobuild-Date(v4-9-test): Fri May 17 10:56:19 UTC 2019 on sn-devel-144
* ctdb-common: Fix memory leak | Martin Schwenke | 2019-05-17 | 1 | -1/+2

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13943
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 30bc6e2529cdd444d4ec7902844c3a6fb0858090)
* ctdb-recoverd: Fix memory leak | Martin Schwenke | 2019-05-17 | 1 | -1/+1

  state is always freed before exiting this function, so allocate fde off it instead of long-lived ctdb context.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13943
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 6a2941e2a9fd6ab2d5b8dbac042b61a7b1b0b914)
* ctdb:common: Do not print NULL if we don't get a sockpath | Andreas Schneider | 2019-05-17 | 1 | -1/+1

  sock_socket_start_recv() might not fill sockpath if we return early.

  Found by GCC 9.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13937
  Signed-off-by: Andreas Schneider <asn@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
  (cherry picked from commit 830cb7e67568de5f3ce359cb6af3be8ab545c824)
* ctdb-daemon: Never use 0 as a client ID | Martin Schwenke | 2019-05-17 | 1 | -1/+47

  ctdb_control_db_attach() and ctdb_control_db_detach() assume that any control with client ID 0 comes from another daemon and treat it specially.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13930
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 8663e0a64fbdb9ea16babbfe87d6f5d7a7b72bbd)
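Because 0 carries a special meaning ("this control came from another daemon"), an ID allocator for local clients has to skip it, including when its counter wraps around. The commit message does not say how CTDB achieves this, so the following is only one possible shape: `ClientIdAllocator` and its `max_id` parameter are hypothetical.

```python
class ClientIdAllocator:
    """Hand out client IDs in 1..max_id; 0 is reserved and never returned."""

    def __init__(self, max_id=0xFFFF):
        self.max_id = max_id
        self.next_id = 1
        self.in_use = set()

    def allocate(self):
        # Try each possible ID at most once, wrapping from max_id back to 1.
        for _ in range(self.max_id):
            candidate = self.next_id
            self.next_id = candidate + 1 if candidate < self.max_id else 1
            if candidate not in self.in_use:
                self.in_use.add(candidate)
                return candidate
        raise RuntimeError("no free client IDs")

    def release(self, client_id):
        self.in_use.discard(client_id)


alloc = ClientIdAllocator(max_id=3)
ids = [alloc.allocate() for _ in range(3)]
alloc.release(2)
reused = alloc.allocate()  # wraps past max_id, skips 0 and the busy ID 1
```

The wrap-around branch is the part that matters: a naive counter that wraps to 0 would eventually hand a local client the reserved value.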
* ctdb-tests: Fix logic error in simple ctdb reloadips test | Martin Schwenke | 2019-05-17 | 1 | -17/+20

  There is a chance that restoring IP addresses to the test node will result in different IP addresses being assigned to that node. Removing a single IP address may then fail (or be a no-op) if it is done after the restore.

  So, swap the single IP address removal to happen first, then restore, then remove all IP addresses.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit dc89db8ca6aadd4a9f7e8a85843c53709d04587c)
* ctdb-tests: Make ctdb reloadips tests more reliable | Martin Schwenke | 2019-05-17 | 2 | -7/+61

  ctdb reloadips will fail if it can't disable takeover runs. The most likely reason for this is that there is already a takeover run in progress. We can't predict when this will happen, so retry if this occurs.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 8be4ee1a28d5c037955832b6f827d40f28f02796)
* ctdb-tests: Capture output in $out on failure as well | Martin Schwenke | 2019-05-17 | 1 | -3/+5

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit cf00db40355b49443263187f9d97934f91287e51)
* ctdb-tests: Don't clean up test var directory in autotest target | Martin Schwenke | 2019-05-17 | 1 | -1/+1

  If the directory is always cleaned up then it is not possible to look at daemon logs to debug test failures.

  This target is only really used by autobuild.py, which (optionally) cleans up the parent directory anyway.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
  Autobuild-Date(master): Tue May 7 06:56:01 UTC 2019 on sn-devel-184
  (cherry picked from commit 5a9e338330fe136908a3a17a5df81c054c5cc5b0)
* ctdb-tests: Fix usage message | Martin Schwenke | 2019-05-17 | 1 | -1/+1

  Since commit 0e9ead8f28fced3ebfa888786a1dc5bb59e734a3 daemons have been shut down after each test, so this option no longer has anything to do with killing daemons.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit a2ab6485e027ebb13871c7d83b7626ac5c9b98c0)
* ctdb-tests: Wait to allow database attach/detach to take effect | Martin Schwenke | 2019-05-17 | 2 | -36/+66

  Sometimes the detach test fails:

    Check detaching single test database detach_test1.tdb
    BAD: database detach_test1.tdb is still attached
    Number of databases:4
    dbid:0x5ae995ee name:detach_test4.tdb path:tests/var/simple/node.0/db/volatile/detach_test4.tdb.0
    dbid:0xd84cc13c name:detach_test3.tdb path:tests/var/simple/node.0/db/volatile/detach_test3.tdb.0
    dbid:0x8e8e8cef name:detach_test2.tdb path:tests/var/simple/node.0/db/volatile/detach_test2.tdb.0
    dbid:0xc62491f4 name:detach_test1.tdb path:tests/var/simple/node.0/db/volatile/detach_test1.tdb.0
    Number of databases:3
    dbid:0x5ae995ee name:detach_test4.tdb path:tests/var/simple/node.1/db/volatile/detach_test4.tdb.1
    dbid:0xd84cc13c name:detach_test3.tdb path:tests/var/simple/node.1/db/volatile/detach_test3.tdb.1
    dbid:0x8e8e8cef name:detach_test2.tdb path:tests/var/simple/node.1/db/volatile/detach_test2.tdb.1
    Number of databases:4
    dbid:0x5ae995ee name:detach_test4.tdb path:tests/var/simple/node.2/db/volatile/detach_test4.tdb.2
    dbid:0xd84cc13c name:detach_test3.tdb path:tests/var/simple/node.2/db/volatile/detach_test3.tdb.2
    dbid:0x8e8e8cef name:detach_test2.tdb path:tests/var/simple/node.2/db/volatile/detach_test2.tdb.2
    dbid:0xc62491f4 name:detach_test1.tdb path:tests/var/simple/node.2/db/volatile/detach_test1.tdb.2
    *** TEST COMPLETED (RC=1) AT 2019-04-27 03:35:40, CLEANING UP...

  When issued from a client, the detach control re-broadcasts itself asynchronously to all nodes and then returns success. The controls to some nodes to do the actual detach may still be in flight when success is returned to the client. Therefore, the test should wait for a few seconds to allow the asynchronous controls to complete.

  The same is true for the attach control, so work around the problem in the attach test too.

  An alternative is to make the attach and detach controls synchronous by avoiding the broadcast and waiting for the results of the individual controls sent to the nodes. However, a simple implementation would involve adding new nested event loops.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 3cb53a7a05409925024d6a67bcfaeb962d896e0b)
* ctdb-tests: Avoid bulk output in $out, prefer $outfile | Martin Schwenke | 2019-05-17 | 38 | -198/+167

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 066cc5b0c561464ed08890d9aa1a1a55b545e9cc)
* ctdb-tests: Make try_command_on_node less error-prone | Martin Schwenke | 2019-05-17 | 1 | -8/+22

  This sometimes fails, apparently due to a cat process in onnode getting EAGAIN. The conclusion is that tests that process large amounts of output should not depend on a sub-shell delivering that output into a shell variable.

  Change try_command_on_node() to leave all of the output in file $outfile and just put the first 1KB into $out. $outfile is removed after each test completes.

  Change the implementation of sanity_check_output() to use $outfile instead of $out.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 9d02452a24625df5f62fd6d45a16effe2fa45fbe)
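The capture scheme described here (everything goes to a file, only a bounded prefix is kept in a variable) can be modelled in a few lines. This is a Python illustration of the idea, not the shell implementation in the test suite; `capture()` and `OUT_LIMIT` are hypothetical names, with the 1KB limit taken from the commit message.

```python
import os
import tempfile

OUT_LIMIT = 1024  # "the first 1KB", per the commit message


def capture(data, outfile, limit=OUT_LIMIT):
    """Write all of `data` to `outfile`; return only its first `limit` bytes."""
    with open(outfile, "wb") as f:
        f.write(data)
    with open(outfile, "rb") as f:
        return f.read(limit)


with tempfile.TemporaryDirectory() as tmpdir:
    outfile = os.path.join(tmpdir, "outfile")
    out = capture(b"x" * 5000, outfile)   # plays the role of $out
    full_size = os.path.getsize(outfile)  # $outfile keeps everything
```

Checks that need the complete output read the file, while quick checks and log messages can keep using the short prefix.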
* ctdb-tests: Change sanity_check_output() to internally use $out | Martin Schwenke | 2019-05-17 | 11 | -21/+15

  All callers are currently passed $out. Global variable $out is used in many other places so use it here to simplify the interface and make future changes simpler.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 7c3819d1ac264acf998f426e0cef7f6211e0ddee)
* ctdb-tests: Extend test to cover ctdb rddumpmemory | Martin Schwenke | 2019-05-17 | 1 | -2/+5

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13923
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 8108b3134c017c22d245fc5b2207a88d44ab0dd2)
* ctdb-tools: Fix ctdb dumpmemory to avoid printing trailing NUL | Martin Schwenke | 2019-05-17 | 1 | -4/+6

  Fix ctdb rddumpmemory too.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13923
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit f78d9388fb459dc83fafb4da6e683e3137ad40e1)
* ctdb-common: Avoid race between fd and signal events | Amitay Isaacs | 2019-04-15 | 1 | -0/+7

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13895

  In run_proc, there was an implicit assumption that when a process exits, fd event (pipe between parent and child) would be processed first and signal event (SIGCHLD for the child) would be processed later. However, that is not the case. SIGCHLD can be received asynchronously any time even when the pipe data has not fully been read. This causes run_proc to miss some of the output from child process in tests.

  When SIGCHLD is being processed, if the pipe between parent and child is still open, then do an explicit read from the pipe to ensure we read any data still in the pipe before closing the pipe.

  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
  Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
  Autobuild-Date(master): Fri Apr 12 08:19:29 UTC 2019 on sn-devel-144
  (cherry picked from commit 289201277cd983b27cdfd5376c607eab112b4082)
  Autobuild-User(v4-9-test): Karolin Seeger <kseeger@samba.org>
  Autobuild-Date(v4-9-test): Mon Apr 15 12:55:46 UTC 2019 on sn-devel-144
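The ordering problem and its fix can be reproduced with an ordinary pipe. In the sketch below, `drain_pipe()` and `on_child_exit()` are hypothetical names and the child is simulated by writing to the pipe and closing the write end; it only illustrates "on SIGCHLD, read whatever is still in the pipe before closing it", not run_proc's actual tevent-based code. It assumes a POSIX system, since it relies on `os.set_blocking()`.

```python
import os


def drain_pipe(fd, chunk_size=4096):
    """Read from a non-blocking fd until EOF or until no data is available."""
    chunks = []
    while True:
        try:
            data = os.read(fd, chunk_size)
        except BlockingIOError:
            break  # nothing more to read right now
        if not data:
            break  # EOF: the write end has been closed
        chunks.append(data)
    return b"".join(chunks)


def on_child_exit(state):
    """Exit handler: collect leftover output first, then close the pipe."""
    if state["fd"] is not None:
        state["output"] += drain_pipe(state["fd"])
        os.close(state["fd"])
        state["fd"] = None


# Simulate a child that wrote output and exited before the fd event ran.
read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)
os.write(write_fd, b"late output")
os.close(write_fd)

state = {"fd": read_fd, "output": b""}
on_child_exit(state)
```

Closing the pipe without the drain step would drop "late output", which matches the missing-output symptom described in the commit message.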
* ctdb-daemon: Revert "We can not assume that just because we could complete a TCP handshake" | Martin Schwenke | 2019-04-15 | 1 | -0/+3

  We also can not assume that nodes can be marked as connected via only the keepalive mechanism. Keepalives are not sent to disconnected nodes so, in the absence of other packets (e.g. broadcasts), 2 nodes may never become marked as connected to each other.

  Revert to marking nodes as connected in the TCP transport code. If a connection is to a non(-operational) ctdbd then it will revert to disconnected after a short while and may actually flap. This should be rare.

  This reverts commit 66919db3d7ab1e091223faf515b183af8bfddc83.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13888
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  (cherry picked from commit 38dc6d11a26c2e9a2cae7927321f2216ceb1c5ec)
* ctdb-scripts: Update statd-callout to try several configuration files | Martin Schwenke | 2019-04-12 | 1 | -1/+2

  The alternative seems to be to try something via CTDB_NFS_CALLOUT. That would be complicated and seems like overkill for something this simple.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13860
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@samba.org>
  (cherry picked from commit a2bd4085896804ee2da811e17f18c78a5bf4e658)
* ctdb-scripts: Allow load_system_config() to take multiple alternatives | Martin Schwenke | 2019-04-12 | 1 | -9/+10

  The situation for NFS config has got more complicated and is probably broken in statd-callout on Debian-like systems at the moment.

  Allow several alternative configuration names to be tried. Stop after the first that is found and loaded.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13860
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@samba.org>
  (cherry picked from commit 0d67ea5fcca766734ecc73ad6b0139f7c13a15c5)
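"Try several names, stop after the first that is found" is a first-match search. The sketch below models only the lookup, in Python rather than the shell of the real load_system_config(); `find_system_config()`, the candidate names and the directory are all hypothetical stand-ins, and the real function also loads the file it finds.

```python
import os
import tempfile


def find_system_config(names, config_dirs):
    """Return the path of the first candidate name that exists, else None."""
    for name in names:
        for directory in config_dirs:
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                return path  # stop at the first match
    return None


# A temporary directory stands in for the system's configuration directory.
with tempfile.TemporaryDirectory() as config_dir:
    open(os.path.join(config_dir, "nfs-common"), "w").close()
    open(os.path.join(config_dir, "rpcbind"), "w").close()
    found = find_system_config(["nfs", "nfs-common", "rpcbind"], [config_dir])
    found_name = os.path.basename(found)
    none_found = find_system_config(["nfs"], [config_dir])
```

The order of the candidate list sets the priority, which lets one caller cover distributions that name the same configuration file differently.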