summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorKarolos Antoniadis <karolos@gmail.com>2019-08-02 11:08:47 -0700
committerMichael Han <hanm@apache.org>2019-08-02 11:08:47 -0700
commit3882a0171f91280bf1adbbd4ffaeb17cb5131316 (patch)
tree80ce483b067249665419457fe28a76ad68ee3de6
parent104e78e008f84ab2d350b68db5a8a46297135faa (diff)
downloadzookeeper-3882a0171f91280bf1adbbd4ffaeb17cb5131316.tar.gz
ZOOKEEPER-3479: Logging false leader election times
See https://issues.apache.org/jira/browse/ZOOKEEPER-3479 here for the issue. The problem is that the logged leader election times are much smaller than in reality. For example, we have log files that contain the following lines: ``` 2019-07-31 00:57:31,852 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):QuorumPeer1318] - PeerState set to LOOKING 2019-07-31 00:57:31,853 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):QuorumPeer1193] - LOOKING 2019-07-31 00:57:31,853 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):FastLeaderElection885] - New election. My id = 1, proposed zxid=0x100000001 [...] 2019-07-31 00:57:32,272 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):Follower69] - FOLLOWING - LEADER ELECTION TOOK - 1 MS ``` Leader election took more than 400ms, not 1ms! The reason this logging bug exists has to do with this [line](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1402) in `QuorumPeer`. The computation of the leader election time starts inside [`FastLeaderElection.lookForLeader` ](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L915) where `start_fle` is set. After a server has found a leader, it repeats the `QuorumPeer` [loop](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1288) and resets the `start_fle` value. Hence, when, for example, a follower [logs](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Follower.java#L75) how much time it took for the leader election, it logs a much smaller time than it actually took for the leader election to complete. To be precise, it just logs the time it took to loop inside [`QuorumPeer`](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1288). Furthermore, if we remove the problematic [line](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1402), then there is no reason to check whether `start_fle == 0` in [`FastLeaderElection`](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L914), since `start_fle` is always `0` at that point. I'm not sure if this `if` statement is needed in the first place. Author: foo bar <karolos@gmail.com> Reviewers: Michael Han <hanm@apache.org>, Allan Lyu <fangmin@apache.org> Closes #1031 from insumity/ZOOKEEPER-3479
-rw-r--r--zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java5
-rw-r--r--zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java1
2 files changed, 2 insertions, 4 deletions
diff --git a/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java b/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java
index 6dacc49ab..42fed411e 100644
--- a/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java
+++ b/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java
@@ -911,9 +911,8 @@ public class FastLeaderElection implements Election {
LOG.warn("Failed to register with JMX", e);
self.jmxLeaderElectionBean = null;
}
- if (self.start_fle == 0) {
- self.start_fle = Time.currentElapsedTime();
- }
+
+ self.start_fle = Time.currentElapsedTime();
try {
Map<Long, Vote> recvset = new HashMap<Long, Vote>();
diff --git a/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java b/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
index 521f6d42b..ce794d1e5 100644
--- a/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
+++ b/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
@@ -1399,7 +1399,6 @@ public class QuorumPeer extends ZooKeeperThread implements QuorumStats.Provider
}
break;
}
- start_fle = Time.currentElapsedTime();
}
} finally {
LOG.warn("QuorumPeer main thread exited");