diff options
author | Jan Kundrát <jkt@kde.org> | 2014-11-28 01:48:58 +0100 |
---|---|---|
committer | Jan Kundrát <jkt@kde.org> | 2014-11-28 02:00:21 +0100 |
commit | 77ba48998f1f1dd881bff852b7c07ef87998e359 (patch) | |
tree | 2621319f1867c494e79674aacba8c6eef594f817 /gear | |
parent | 7bad8c1b9275bfacc47e7d5b8ba5abb6204a0c21 (diff) | |
download | gear-77ba48998f1f1dd881bff852b7c07ef87998e359.tar.gz |
Fix race between stopWaitingForJobs() and getJob()
Turbo-Hipster's test suite was failing for me in a very non-deterministic
manner -- sometimes the ZuulClient would get stuck in a call to
gearman-Worker's getJob. It turned out that it was possible for the
whole worker.stopWaitingForJobs to finish before a call to
worker.getJob) gets scheduled. This meant that the stopWaitingForJobs'
logic which tried hard to interrupt any pending getJob() calls failed.
The fix is to let the getJob() check whether it missed it chance, i.e.
whether the whole worker is not supposed to be running anymore. In order
to guarantee thread safety, both setting of this variable and checking
whether it's set should happen in a synchronized manner. Stuff gets
messy here: both getJob() and stopWaitingForJobs() acquire a lock, which
means that getJob() must *not* hold the lock while it blocks (otherwise,
stopWaitingForJobs() won't be able to interrupt it because it would get
deadlocked before it gets a chance to enqueue its Nones).
It seems that there's one illusive race, though -- when thread B is
executing getJob() and gets interrupted right after the try/finally
terminates (and hence the lock is already released) and execution turns
into stopWaitingForJobs, it's quite possible that the self.running gets
unset after the getJob has already checked it. However, the lock also
protects the self.waiting_for_jobs which means that either the
stopWaitingForJobs() will see an increased waiting_for_jobs integer, or
that the getJob() will notice an updated self.running.
Change-Id: I51ec9cf06622d91ab22a4ff80630fae7913d4b5d
Diffstat (limited to 'gear')
-rw-r--r-- | gear/__init__.py | 19 |
1 files changed, 18 insertions, 1 deletions
diff --git a/gear/__init__.py b/gear/__init__.py index fd8ad09..b266b31 100644 --- a/gear/__init__.py +++ b/gear/__init__.py @@ -1791,6 +1791,14 @@ class Worker(BaseClient): if not job: self._updateStateMachines() + + # That variable get cleared during _shutdown(), before the + # stopWaitingForJobs() is called. The check has to happen with the + # self.job_lock held, otherwise there would be a window for race + # conditions between manipulation of "running" and + # "waiting_for_jobs". + if not self.running: + raise InterruptedError() finally: self.job_lock.release() @@ -1837,7 +1845,16 @@ class Worker(BaseClient): self.job_lock.release() def _shutdown(self): - super(Worker, self)._shutdown() + self.job_lock.acquire() + try: + # The upstream _shutdown() will clear the "running" bool. Because + # that is a variable which is used for proper synchronization of + # the exit within getJob() which might be about to be called from a + # separate thread, it's important to call it with a proper lock + # being held. + super(Worker, self)._shutdown() + finally: + self.job_lock.release() self.stopWaitingForJobs() def handleNoop(self, packet): |