summaryrefslogtreecommitdiff
path: root/gear/__init__.py
diff options
context:
space:
mode:
authorJan Kundrát <jkt@kde.org>2014-11-28 01:48:58 +0100
committerJan Kundrát <jkt@kde.org>2014-11-28 02:00:21 +0100
commit77ba48998f1f1dd881bff852b7c07ef87998e359 (patch)
tree2621319f1867c494e79674aacba8c6eef594f817 /gear/__init__.py
parent7bad8c1b9275bfacc47e7d5b8ba5abb6204a0c21 (diff)
downloadgear-77ba48998f1f1dd881bff852b7c07ef87998e359.tar.gz
Fix race between stopWaitingForJobs() and getJob()
Turbo-Hipster's test suite was failing for me in a very non-deterministic manner -- sometimes the ZuulClient would get stuck in a call to gearman-Worker's getJob. It turned out that it was possible for the whole worker.stopWaitingForJobs to finish before a call to worker.getJob) gets scheduled. This meant that the stopWaitingForJobs' logic which tried hard to interrupt any pending getJob() calls failed. The fix is to let the getJob() check whether it missed it chance, i.e. whether the whole worker is not supposed to be running anymore. In order to guarantee thread safety, both setting of this variable and checking whether it's set should happen in a synchronized manner. Stuff gets messy here: both getJob() and stopWaitingForJobs() acquire a lock, which means that getJob() must *not* hold the lock while it blocks (otherwise, stopWaitingForJobs() won't be able to interrupt it because it would get deadlocked before it gets a chance to enqueue its Nones). It seems that there's one illusive race, though -- when thread B is executing getJob() and gets interrupted right after the try/finally terminates (and hence the lock is already released) and execution turns into stopWaitingForJobs, it's quite possible that the self.running gets unset after the getJob has already checked it. However, the lock also protects the self.waiting_for_jobs which means that either the stopWaitingForJobs() will see an increased waiting_for_jobs integer, or that the getJob() will notice an updated self.running. Change-Id: I51ec9cf06622d91ab22a4ff80630fae7913d4b5d
Diffstat (limited to 'gear/__init__.py')
-rw-r--r--gear/__init__.py19
1 files changed, 18 insertions, 1 deletions
diff --git a/gear/__init__.py b/gear/__init__.py
index fd8ad09..b266b31 100644
--- a/gear/__init__.py
+++ b/gear/__init__.py
@@ -1791,6 +1791,14 @@ class Worker(BaseClient):
if not job:
self._updateStateMachines()
+
+ # That variable get cleared during _shutdown(), before the
+ # stopWaitingForJobs() is called. The check has to happen with the
+ # self.job_lock held, otherwise there would be a window for race
+ # conditions between manipulation of "running" and
+ # "waiting_for_jobs".
+ if not self.running:
+ raise InterruptedError()
finally:
self.job_lock.release()
@@ -1837,7 +1845,16 @@ class Worker(BaseClient):
self.job_lock.release()
def _shutdown(self):
- super(Worker, self)._shutdown()
+ self.job_lock.acquire()
+ try:
+ # The upstream _shutdown() will clear the "running" bool. Because
+ # that is a variable which is used for proper synchronization of
+ # the exit within getJob() which might be about to be called from a
+ # separate thread, it's important to call it with a proper lock
+ # being held.
+ super(Worker, self)._shutdown()
+ finally:
+ self.job_lock.release()
self.stopWaitingForJobs()
def handleNoop(self, packet):