author: Jenkins <jenkins@review.openstack.org> 2014-08-31 01:44:03 +0000
committer: Gerrit Code Review <review@openstack.org> 2014-08-31 01:44:03 +0000
commit: 205485eabdbfe8bd8afbae83e4495536658f3d70
tree: dc06e90e9125ded73834679c100a750c3aa06637
parent: 9c0329bcfa6e42e68200e2482ed3d47446f0c832
parent: c2ec0b2e4980dee3d28134df34b1bc664c7a7cac
download: taskflow-205485eabdbfe8bd8afbae83e4495536658f3d70.tar.gz
Merge "Add a conductor considerations section"
-rw-r--r-- doc/source/conductors.rst 34
-rw-r--r-- doc/source/img/conductor_cycle.png bin 0 -> 36940 bytes
-rw-r--r-- doc/source/jobs.rst 6
-rw-r--r-- doc/source/workers.rst 4
4 files changed, 38 insertions, 6 deletions
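The conductors.rst change in this commit suggests forcefully deleting jobs that keep failing after a given number of conductor attempts, possibly via scripts. A minimal sketch of that bookkeeping might look like the following. It does not use any TaskFlow API: ``AbandonmentTracker``, ``sweep`` and the ``delete_job`` callback are hypothetical names, and how abandonments are observed and how a job is actually removed from the jobboard is left to the caller.

```python
from collections import Counter


class AbandonmentTracker:
    """Counts how often each job has been abandoned (hypothetical helper)."""

    def __init__(self, max_attempts=3):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.max_attempts = max_attempts
        self._attempts = Counter()

    def record_abandon(self, job_id):
        """Record one abandonment; return True once the limit is reached."""
        self._attempts[job_id] += 1
        return self._attempts[job_id] >= self.max_attempts

    def forget(self, job_id):
        """Drop the counter for a job that no longer needs tracking."""
        self._attempts.pop(job_id, None)


def sweep(abandoned_job_ids, tracker, delete_job):
    """Feed observed abandonments to the tracker and delete jobs over the limit.

    ``delete_job`` is an assumed caller-supplied callable that removes the
    job from wherever it is posted.
    """
    deleted = []
    for job_id in abandoned_job_ids:
        if tracker.record_abandon(job_id):
            delete_job(job_id)
            tracker.forget(job_id)
            deleted.append(job_id)
    return deleted
```

A monitoring script could call ``sweep`` each time it notices jobs returning to a claimable state after a conductor gave up on them. Keeping the counter outside the conductors means the limit applies across all conductors instead of per conductor, which is what breaks the cycle.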
diff --git a/doc/source/conductors.rst b/doc/source/conductors.rst
index 4dfa3e3..25eb75c 100644
--- a/doc/source/conductors.rst
+++ b/doc/source/conductors.rst
@@ -24,9 +24,41 @@ They are responsible for the following:
.. note::
- They are inspired by and have similar responsiblities
+ They are inspired by and have similar responsibilities
as `railroad conductors`_.
+Considerations
+==============
+
+Some considerations should be kept in mind when using a conductor to make sure
+it's used in a safe and reliable manner. Eventually we hope to make these
+non-issues but for now they are worth mentioning.
+
+Endless cycling
+---------------
+
+**What:** Jobs that fail (due to some type of internal error) on one conductor
+will be abandoned by that conductor and then another conductor may experience
+those same errors and abandon it (and repeat). This will create a job
+abandonment cycle that will continue for as long as the job exists in a
+claimable state.
+
+**Example:**
+
+.. image:: img/conductor_cycle.png
+ :scale: 70%
+ :alt: Conductor cycling
+
+**Alleviate by:**
+
+#. Forcefully delete jobs that have been failing continuously after a given
+ number of conductor attempts. This can be either done manually or
+ automatically via scripts (or other associated monitoring).
+#. Resolve the internal error's cause (storage backend failure, other...).
+#. Help implement `jobboard garbage binning`_.
+
+.. _jobboard garbage binning: https://blueprints.launchpad.net/taskflow/+spec/jobboard-garbage-bin
+
Interfaces
==========
diff --git a/doc/source/img/conductor_cycle.png b/doc/source/img/conductor_cycle.png
new file mode 100644
index 0000000..b09d71b
--- /dev/null
+++ b/doc/source/img/conductor_cycle.png
Binary files differ
diff --git a/doc/source/jobs.rst b/doc/source/jobs.rst
index 05592b5..048a66e 100644
--- a/doc/source/jobs.rst
+++ b/doc/source/jobs.rst
@@ -214,7 +214,7 @@ the engine can immediately stop doing further work. The effect that this causes
is that when a claim is lost another engine can immediately attempt to acquire
the claim that was previously lost and it *could* begin working on the
unfinished tasks that the later engine may also still be executing (since that
-engine is not yet aware that it has lost the claim).
+engine is not yet aware that it has *lost* the claim).
**TLDR:** not `preemptable`_, possible to become aware of losing a claim
after the fact (at the next state change), another engine could have acquired
@@ -235,8 +235,8 @@ the claim by then, therefore both would be *working* on a job.
#. Delay claiming partially completed work by adding a wait period (to allow
the previous engine to coalesce) before working on a partially completed job
- (combine this with the prior suggestions and dual-engine issues should be
- avoided).
+ (combine this with the prior suggestions and *most* dual-engine issues
+ should be avoided).
.. _idempotent: http://en.wikipedia.org/wiki/Idempotence
.. _preemptable: http://en.wikipedia.org/wiki/Preemption_%28computing%29
diff --git a/doc/source/workers.rst b/doc/source/workers.rst
index 6ed987b..9c2f2b9 100644
--- a/doc/source/workers.rst
+++ b/doc/source/workers.rst
@@ -7,8 +7,7 @@ Overview
This is engine that schedules tasks to **workers** -- separate processes
dedicated for certain atoms execution, possibly running on other machines,
-connected via `amqp`_ (or other supported `kombu
-<http://kombu.readthedocs.org/>`_ transports).
+connected via `amqp`_ (or other supported `kombu`_ transports).
.. note::
@@ -18,6 +17,7 @@ connected via `amqp`_ (or other supported `kombu
production ready.
.. _blueprint page: https://blueprints.launchpad.net/taskflow?searchtext=wbe
+.. _kombu: http://kombu.readthedocs.org/
Terminology
-----------