author | Jenkins <jenkins@review.openstack.org> | 2014-08-31 01:44:03 +0000 |
---|---|---|
committer | Gerrit Code Review <review@openstack.org> | 2014-08-31 01:44:03 +0000 |
commit | 205485eabdbfe8bd8afbae83e4495536658f3d70 (patch) | |
tree | dc06e90e9125ded73834679c100a750c3aa06637 | |
parent | 9c0329bcfa6e42e68200e2482ed3d47446f0c832 (diff) | |
parent | c2ec0b2e4980dee3d28134df34b1bc664c7a7cac (diff) | |
download | taskflow-205485eabdbfe8bd8afbae83e4495536658f3d70.tar.gz |
Merge "Add a conductor considerations section"
-rw-r--r-- | doc/source/conductors.rst | 34 |
-rw-r--r-- | doc/source/img/conductor_cycle.png | bin | 0 -> 36940 bytes |
-rw-r--r-- | doc/source/jobs.rst | 6 |
-rw-r--r-- | doc/source/workers.rst | 4 |
4 files changed, 38 insertions, 6 deletions
diff --git a/doc/source/conductors.rst b/doc/source/conductors.rst
index 4dfa3e3..25eb75c 100644
--- a/doc/source/conductors.rst
+++ b/doc/source/conductors.rst
@@ -24,9 +24,41 @@ They are responsible for the following:
 
 .. note::
 
-    They are inspired by and have similar responsiblities
+    They are inspired by and have similar responsibilities
     as `railroad conductors`_.
 
+Considerations
+==============
+
+Some usage considerations should be used when using a conductor to make sure
+it's used in a safe and reliable manner. Eventually we hope to make these
+non-issues but for now they are worth mentioning.
+
+Endless cycling
+---------------
+
+**What:** Jobs that fail (due to some type of internal error) on one conductor
+will be abandoned by that conductor and then another conductor may experience
+those same errors and abandon it (and repeat). This will create a job
+abandonment cycle that will continue for as long as the job exists in an
+claimable state.
+
+**Example:**
+
+.. image:: img/conductor_cycle.png
+    :scale: 70%
+    :alt: Conductor cycling
+
+**Alleviate by:**
+
+#. Forcefully delete jobs that have been failing continuously after a given
+   number of conductor attempts. This can be either done manually or
+   automatically via scripts (or other associated monitoring).
+#. Resolve the internal error's cause (storage backend failure, other...).
+#. Help implement `jobboard garbage binning`_.
+
+.. _jobboard garbage binning: https://blueprints.launchpad.net/taskflow/+spec/jobboard-garbage-bin
+
 Interfaces
 ==========
 
diff --git a/doc/source/img/conductor_cycle.png b/doc/source/img/conductor_cycle.png
new file mode 100644
index 0000000..b09d71b
--- /dev/null
+++ b/doc/source/img/conductor_cycle.png
Binary files differ
diff --git a/doc/source/jobs.rst b/doc/source/jobs.rst
index 05592b5..048a66e 100644
--- a/doc/source/jobs.rst
+++ b/doc/source/jobs.rst
@@ -214,7 +214,7 @@ the engine can immediately stop doing further work. The effect that this causes
 is that when a claim is lost another engine can immediately attempt to acquire
 the claim that was previously lost and it *could* begin working on the
 unfinished tasks that the later engine may also still be executing (since that
-engine is not yet aware that it has lost the claim).
+engine is not yet aware that it has *lost* the claim).
 
 **TLDR:** not `preemptable`_, possible to become aware of losing a claim
 after the fact (at the next state change), another engine could have acquired
@@ -235,8 +235,8 @@ the claim by then, therefore both would be *working* on a job.
 
 #. Delay claiming partially completed work by adding a wait period (to allow
    the previous engine to coalesce) before working on a partially completed job
-   (combine this with the prior suggestions and dual-engine issues should be
-   avoided).
+   (combine this with the prior suggestions and *most* dual-engine issues
+   should be avoided).
 
 .. _idempotent: http://en.wikipedia.org/wiki/Idempotence
 .. _preemptable: http://en.wikipedia.org/wiki/Preemption_%28computing%29
diff --git a/doc/source/workers.rst b/doc/source/workers.rst
index 6ed987b..9c2f2b9 100644
--- a/doc/source/workers.rst
+++ b/doc/source/workers.rst
@@ -7,8 +7,7 @@ Overview
 
 This is engine that schedules tasks to **workers** -- separate processes
 dedicated for certain atoms execution, possibly running on other machines,
-connected via `amqp`_ (or other supported `kombu
-<http://kombu.readthedocs.org/>`_ transports).
+connected via `amqp`_ (or other supported `kombu`_ transports).
 
 .. note::
 
@@ -18,6 +17,7 @@ connected via `amqp`_ (or other supported `kombu
     production ready.
 
 .. _blueprint page: https://blueprints.launchpad.net/taskflow?searchtext=wbe
+.. _kombu: http://kombu.readthedocs.org/
 
 Terminology
 -----------
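
The first mitigation listed in the conductors.rst change (forcefully deleting jobs that keep failing after a given number of conductor attempts) could be scripted roughly as below. This is only a minimal sketch of the counting logic: ``AbandonmentTracker``, ``jobs_to_delete`` and the ``max_attempts`` threshold are hypothetical names that do not appear in the change, and the sketch deliberately does not call any TaskFlow API. How abandonment events are observed, and how a job is actually removed from a jobboard, are assumed to be handled elsewhere.

```python
from collections import Counter


class AbandonmentTracker:
    """Counts how often each job was abandoned (hypothetical helper)."""

    def __init__(self, max_attempts=3):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.max_attempts = max_attempts
        self._abandoned = Counter()

    def record_abandonment(self, job_id):
        """Note one more abandonment; return True once the limit is reached."""
        self._abandoned[job_id] += 1
        return self._abandoned[job_id] >= self.max_attempts

    def forget(self, job_id):
        """Drop the counter once a job completes or has been deleted."""
        self._abandoned.pop(job_id, None)


def jobs_to_delete(abandonment_events, max_attempts=3):
    """Return ids of jobs abandoned at least max_attempts times.

    abandonment_events is an iterable of job ids, one entry per observed
    abandonment (an assumption of this sketch). Ids are returned in the
    order in which they first reached the limit.
    """
    tracker = AbandonmentTracker(max_attempts)
    doomed = []
    for job_id in abandonment_events:
        if tracker.record_abandonment(job_id) and job_id not in doomed:
            doomed.append(job_id)
    return doomed
```

A monitoring script could feed observed abandonments into ``record_abandonment`` and delete the job when it returns ``True``. Keeping the counter outside the conductors matters because, as the change describes, each conductor abandons the job independently and so cannot see the cycle on its own.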