Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Be a bit less verbose here: it's actually a very common case and perfectly normal; no need to dump state. | Simon MacMullen | 2014-10-14 | 1 | -2/+2 |
| | |||||
* | Check whether the cluster is fully connected before trying to autoheal, and ignore autoheal requests if it isn't. | Simon MacMullen | 2014-10-14 | 1 | -25/+41 |
| | |||||
* | Switch to having the winner inform the losers that they need to stop, rather than having the leader do it. This fixes the race where the leader tells them to stop before the partition has healed from the winner's POV. So it should be simpler and more correct. | Simon MacMullen | 2014-10-14 | 1 | -25/+8 |
| | |||||
* | In fact, that case can't happen since bug 26043, so let's simplify. | Simon MacMullen | 2014-10-14 | 1 | -7/+0 |
| | |||||
* | Distinguish between "already stopped" (fine, carry on) or "already down" (abort since we've lost contact). | Simon MacMullen | 2014-10-03 | 1 | -16/+26 |
| | |||||
* | Make sure we don't hang waiting for a node to go down if it went down before we became the winner. | Simon MacMullen | 2014-09-12 | 1 | -9/+28 |
| | |||||
* | Separate out responsibilities in the various node state detection functions. Only ping_all/0 is allowed to establish new tcp connections (and thus take significant time for them to time out if necessary). This removes a significant delay while waiting for pause_minority to start. (bug26225) | Simon MacMullen | 2014-06-27 | 1 | -0/+1 |
| | |||||
* | Update copyright for 2014 (bug25940) | Simon MacMullen | 2014-03-17 | 1 | -1/+1 |
| | |||||
* | Explain (bug26043) | Simon MacMullen | 2014-03-14 | 1 | -1/+3 |
| | |||||
* | Fix stupidity, and rename. | Simon MacMullen | 2014-03-14 | 1 | -4/+4 |
| | |||||
* | Get the leader to transition directly to winner or loser state if that's where it's going, or wait in a special lead_waiting status if neither, so that if we get any more autoheal requests we can ignore them. | Simon MacMullen | 2014-03-10 | 1 | -2/+17 |
| | |||||
* | Merge in default (bug26027) | Simon MacMullen | 2014-02-26 | 1 | -1/+12 |
|\ | |||||
| * | If we abandon autoheal while in winner_waiting then let the losing nodes know they can carry on. (bug26038) | Simon MacMullen | 2014-02-26 | 1 | -1/+12 |
| | | |||||
* | | Update comment. | Simon MacMullen | 2014-02-26 | 1 | -4/+7 |
| | | |||||
* | | Eliminate the node_stopped message, since it is possible that a badly-timed stop_app could lead to us missing it. Instead just go based on whether the rabbit stops - if it stops for any reason other than autoheal, we just send it a message it will ignore and continue. | Simon MacMullen | 2014-02-26 | 1 | -19/+17 |
|/ | |||||
* | Inform autoheal that a node is down on nodedown not rabbit app down; therefore stop ignoring nodedown in winner_waiting. (bug26006) | Simon MacMullen | 2014-02-17 | 1 | -2/+0 |
| | |||||
* | Refresh branch from stable | Emile Joubert | 2013-07-31 | 1 | -2/+1 |
|\ | |||||
| * | More sensible API for partitions, do not return errors. (bug25651) | Simon MacMullen | 2013-07-04 | 1 | -2/+1 |
| | | |||||
* | | s/VMware/GoPivotal/g | Simon MacMullen | 2013-07-01 | 1 | -2/+2 |
|/ | |||||
* | space-- (bug25560) | Simon MacMullen | 2013-05-20 | 1 | -1/+1 |
| | |||||
* | Ignore autoheal requests if we are already autohealing. | Simon MacMullen | 2013-05-20 | 1 | -0/+6 |
| | |||||
* | Remove obsolete and wrong comment. | Simon MacMullen | 2013-05-20 | 1 | -7/+2 |
| | |||||
* | Move those functions to their own place, and replace the autoheal all_nodes_up check with all_rabbit_nodes_up since it will depend on the rabbit application running to DTRT. | Simon MacMullen | 2013-04-22 | 1 | -1/+1 |
| | |||||
* | Have the leader decide what to do and then just tell other nodes (rather than have them request a winner). Substantially more reliable and shorter than previously. | Simon MacMullen | 2013-04-18 | 1 | -97/+58 |
| | |||||
* | Rename states to hopefully be clearer; add more comments. | Simon MacMullen | 2013-04-17 | 1 | -12/+42 |
| | |||||
* | First pass at splitting all the autoheal stuff out into a separate module. | Simon MacMullen | 2013-04-17 | 1 | -0/+208 |