path: root/src/rabbit_autoheal.erl
Commit message [branch]
  Author, Age, Files, Lines (-removed/+added)

* Be a bit less verbose here: it's actually a very common case and perfectly normal; no need to dump state.
| Simon MacMullen, 2014-10-14, 1 file, -2/+2
* Check whether the cluster is fully connected before trying to autoheal, and ignore autoheal requests if it isn't.
| Simon MacMullen, 2014-10-14, 1 file, -25/+41
* Switch to having the winner inform the losers that they need to stop, rather than having the leader do it. This fixes the race where the leader tells them to stop before the partition has healed from the winner's POV. So it should be simpler and more correct.
| Simon MacMullen, 2014-10-14, 1 file, -25/+8
* In fact, that case can't happen since bug 26043, so let's simplify.
| Simon MacMullen, 2014-10-14, 1 file, -7/+0
* Distinguish between "already stopped" (fine, carry on) or "already down" (abort since we've lost contact).
| Simon MacMullen, 2014-10-03, 1 file, -16/+26
* Make sure we don't hang waiting for a node to go down if it went down before we became the winner.
| Simon MacMullen, 2014-09-12, 1 file, -9/+28
* Separate out responsibilities in the various node state detection functions. Only ping_all/0 is allowed to establish new tcp connections (and thus take significant time for them to time out if necessary). This removes a significant delay while waiting for pause_minority to start. [bug26225]
| Simon MacMullen, 2014-06-27, 1 file, -0/+1
* Update copyright for 2014 [bug25940]
| Simon MacMullen, 2014-03-17, 1 file, -1/+1
* Explain [bug26043]
| Simon MacMullen, 2014-03-14, 1 file, -1/+3
* Fix stupidity, and rename.
| Simon MacMullen, 2014-03-14, 1 file, -4/+4
* Get the leader to transition directly to winner or loser state if that's where it's going, or wait in a special lead_waiting status if neither, so that if we get any more autoheal requests we can ignore them.
| Simon MacMullen, 2014-03-10, 1 file, -2/+17
* Merge in default [bug26027]
| Simon MacMullen, 2014-02-26, 1 file, -1/+12
|\
| * If we abandon autoheal while in winner_waiting then let the losing nodes know they can carry on. [bug26038]
| | Simon MacMullen, 2014-02-26, 1 file, -1/+12
* | Update comment.
| | Simon MacMullen, 2014-02-26, 1 file, -4/+7
* | Eliminate the node_stopped message, since it is possible that a badly-timed stop_app could lead to us missing it. Instead just go based on whether the rabbit stops - if it stops for any reason other than autoheal, we just send it a message it will ignore and continue.
| | Simon MacMullen, 2014-02-26, 1 file, -19/+17
|/
* Inform autoheal that a node is down on nodedown not rabbit app down; therefore stop ignoring nodedown in winner_waiting. [bug26006]
| Simon MacMullen, 2014-02-17, 1 file, -2/+0
* Refresh branch from stable
| Emile Joubert, 2013-07-31, 1 file, -2/+1
|\
| * More sensible API for partitions, do not return errors. [bug25651]
| | Simon MacMullen, 2013-07-04, 1 file, -2/+1
* | s/VMware/GoPivotal/g
| | Simon MacMullen, 2013-07-01, 1 file, -2/+2
|/
* space-- [bug25560]
| Simon MacMullen, 2013-05-20, 1 file, -1/+1
* Ignore autoheal requests if we are already autohealing.
| Simon MacMullen, 2013-05-20, 1 file, -0/+6
* Remove obsolete and wrong comment.
| Simon MacMullen, 2013-05-20, 1 file, -7/+2
* Move those functions to their own place, and replace the autoheal all_nodes_up check with all_rabbit_nodes_up since it will depend on the rabbit application running to DTRT.
| Simon MacMullen, 2013-04-22, 1 file, -1/+1
* Have the leader decide what to do and then just tell other nodes (rather than have them request a winner). Substantially more reliable and shorter than previously.
| Simon MacMullen, 2013-04-18, 1 file, -97/+58
* Rename states to hopefully be clearer; add more comments.
| Simon MacMullen, 2013-04-17, 1 file, -12/+42
* First pass at splitting all the autoheal stuff out into a separate module.
  Simon MacMullen, 2013-04-17, 1 file, -0/+208