===========
Usage Guide
===========

dogpile.core provides a locking interface around a "value creation" and
"value retrieval" pair of functions.

The primary interface is the :class:`.Lock` object, which provides for
the invocation of the creation function by only one thread and/or process at
a time, deferring all other threads/processes to the "value retrieval" function
until the single creation thread is completed.

Do I Need to Learn the dogpile.core API Directly?
=================================================

It's anticipated that most users of dogpile.core will be using it indirectly via the
`dogpile.cache <http://bitbucket.org/zzzeek/dogpile.cache>`_ caching
front-end.  If you fall into this category, then the short answer is no.

dogpile.core provides core internals to the
`dogpile.cache <http://bitbucket.org/zzzeek/dogpile.cache>`_
package, which provides a simple-to-use caching API, rudimentary
backends for Memcached and others, and easy hooks to add new backends.
Users of dogpile.cache
don't need to know or access dogpile.core's APIs directly, though a rough understanding
of the general idea is always helpful.

Using the core dogpile.core APIs described here directly implies you're building your own
resource-usage system outside, or in addition to, the one
`dogpile.cache <http://bitbucket.org/zzzeek/dogpile.cache>`_ provides.

Rudimentary Usage
==================

The primary API dogpile provides is the :class:`.Lock` object.   This object coordinates
three things supplied by the caller: a mutex, a value creation function, and a value
retrieval function.

.. versionchanged:: 0.4.0
    The :class:`.Dogpile` class is no longer the primary API of dogpile,
    replaced by the more straightforward :class:`.Lock` object.

An example usage is as follows::

  from dogpile.core import Lock, NeedRegenerationException
  import threading
  import time

  # store a reference to a "resource", some
  # object that is expensive to create.
  the_resource = [None]

  def some_creation_function():
      # call a value creation function
      value = create_some_resource()

      # get creationtime using time.time()
      creationtime = time.time()

      # keep track of the value and creation time in the "cache"
      the_resource[0] = tup = (value, creationtime)

      # return the tuple of (value, creationtime)
      return tup

  def retrieve_resource():
      # function that retrieves the resource and
      # creation time.

      # if no resource, then raise NeedRegenerationException
      if the_resource[0] is None:
          raise NeedRegenerationException()

      # else return the tuple of (value, creationtime)
      return the_resource[0]

  # a mutex, which needs here to be shared across all invocations
  # of this particular creation function
  mutex = threading.Lock()

  with Lock(mutex, some_creation_function, retrieve_resource, 3600) as value:
      # some function that uses
      # the resource.  Won't reach
      # here until some_creation_function()
      # has completed at least once.
      value.do_something()

Above, ``some_creation_function()`` will be called
when :class:`.Lock` is first invoked as a context manager.   The value returned by this
function is then passed into the ``with`` block, where it can be used
by application code.  Concurrent threads which
call :class:`.Lock` during this initial period
will be blocked until ``some_creation_function()`` completes.

Once the creation function has completed successfully the first time,
new calls to :class:`.Lock` will call ``retrieve_resource()``
in order to get the current cached value as well as its creation
time.  If the creation time is older than the current time minus
the expiration time of 3600 seconds, then ``some_creation_function()``
will be called again, but only by one thread/process, using the given
mutex object as a source of synchronization.  Concurrent threads/processes
which call :class:`.Lock` during this period will fall through
and not be blocked; instead, the "stale" value just returned by
``retrieve_resource()`` will continue to be returned until the creation
function has finished.
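
The behavior described above can be modeled in a few lines of plain Python.
The following is only a sketch of the decision logic, not dogpile.core's actual
implementation; ``SketchLock``, ``NeedRegeneration``, ``store``, ``create`` and
``retrieve`` are hypothetical names introduced for illustration.

```python
import threading
import time


class NeedRegeneration(Exception):
    """Stand-in for dogpile.core's NeedRegenerationException (assumption)."""


class SketchLock:
    """Illustrative model only; NOT the real dogpile.core Lock."""

    def __init__(self, mutex, creator, value_and_created_fn, expiretime):
        self.mutex = mutex
        self.creator = creator
        self.value_and_created_fn = value_and_created_fn
        self.expiretime = expiretime

    def _get(self):
        # Return (value, created), or (None, None) when no value exists.
        try:
            return self.value_and_created_fn()
        except NeedRegeneration:
            return None, None

    def _is_expired(self, created):
        return time.time() - created > self.expiretime

    def __enter__(self):
        value, created = self._get()
        if created is not None and not self._is_expired(created):
            return value  # fresh value: no locking needed

        if created is None:
            # No value exists yet: block until the mutex is available.
            self.mutex.acquire()
        elif not self.mutex.acquire(False):
            # Stale value and someone else is regenerating:
            # fall through with the stale value instead of blocking.
            return value

        try:
            # Re-check, since another thread may have finished while we waited.
            value, created = self._get()
            if created is not None and not self._is_expired(created):
                return value
            value, created = self.creator()
            return value
        finally:
            self.mutex.release()

    def __exit__(self, exc_type, exc, tb):
        return False


# A one-slot "cache" plus the creation / retrieval pair.
store = [None]


def create():
    store[0] = ("resource", time.time())
    return store[0]


def retrieve():
    if store[0] is None:
        raise NeedRegeneration()
    return store[0]
```

In this model, a caller that finds a stale value while the mutex is held by
another thread receives the stale value immediately, while a caller that finds
no value at all waits for the mutex.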

The :class:`.Lock` API is designed to work with simple cache backends
like Memcached.   It addresses such issues as:

* Values can disappear from the cache at any time, before our expiration
  time is reached.  The :class:`.NeedRegenerationException` class is used
  to alert the :class:`.Lock` object that a value needs regeneration ahead
  of the usual expiration time.
* There's no function in a Memcached-like system to "check" for a key without
  actually retrieving it.  The usage of the ``retrieve_resource()`` function
  allows that we check for an existing key and also return the existing value,
  if any, at the same time, without the need for two separate round trips.
* The "creation" function used by :class:`.Lock` is expected to store the
  newly created value in the cache, as well as to return it.   This is also
  more efficient than using two separate round trips to separately store,
  and re-retrieve, the object.
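
The round-trip argument in the list above can be made concrete with a small
sketch.  ``CountingCache`` is a hypothetical dict-backed stand-in for a
Memcached client that counts calls, and ``NeedRegeneration`` stands in for
:class:`.NeedRegenerationException`; neither is part of dogpile.core.

```python
import time


class NeedRegeneration(Exception):
    """Stand-in for dogpile.core's NeedRegenerationException (assumption)."""


class CountingCache:
    """Hypothetical in-memory stand-in for a Memcached client."""

    def __init__(self):
        self._data = {}
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data.get(key)

    def set(self, key, value):
        self.round_trips += 1
        self._data[key] = value

    def evict(self, key):
        # Simulates the server dropping a value before its expiration time.
        self._data.pop(key, None)


cache = CountingCache()


def retrieve(key):
    # A single get() both checks for the key and fetches (value, createdtime).
    value_plus_time = cache.get(key)
    if value_plus_time is None:
        raise NeedRegeneration()
    return value_plus_time


def create(key, fn):
    # Store the new value and also return it, so no follow-up get() is needed.
    value_plus_time = (fn(), time.time())
    cache.set(key, value_plus_time)
    return value_plus_time
```

Creating a value costs one ``set()``, and each later access costs one ``get()``,
including the access that discovers the value has been evicted.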

.. _caching_decorator:

Using dogpile.core for Caching
===============================

dogpile.core is part of an effort to "break up" the Beaker
package into smaller, simpler components (which also work better). Here, we
illustrate how to approximate Beaker's "cache decoration"
function, to decorate any function and store the value in
Memcached.  We create a Python decorator function called ``cached()`` which
will provide caching for the output of a single function.  It's given
the "key" which we'd like to use in Memcached, and internally it makes
use of :class:`.Lock`, along with a thread-based mutex (we'll see a distributed mutex
in the next section)::

    import pylibmc
    import threading
    import time
    from dogpile.core import Lock, NeedRegenerationException

    # pylibmc.Client expects a list of server addresses
    mc_pool = pylibmc.ThreadMappedPool(pylibmc.Client(["localhost"]))

    def cached(key, expiration_time):
        """A decorator that will cache the return value of a function
        in memcached given a key."""

        mutex = threading.Lock()

        def get_value():
            with mc_pool.reserve() as mc:
                value_plus_time = mc.get(key)
                if value_plus_time is None:
                    raise NeedRegenerationException()
                # return a tuple (value, createdtime)
                return value_plus_time

        def decorate(fn):
            def gen_cached():
                value = fn()
                with mc_pool.reserve() as mc:
                    # create a tuple (value, createdtime)
                    value_plus_time = (value, time.time())
                    mc.set(key, value_plus_time)
                return value_plus_time

            def invoke():
                with Lock(mutex, gen_cached, get_value, expiration_time) as value:
                    return value
            return invoke

        return decorate

Using the above, we can decorate any function as::

    @cached("some key", 3600)
    def generate_my_expensive_value():
        return slow_database.lookup("stuff")

The :class:`.Lock` object will ensure that only one thread at a time performs ``slow_database.lookup()``,
and only every 3600 seconds, unless Memcached has removed the value, in which case it will
be called again as needed.

In particular, dogpile.core's system lets us call the Memcached ``get()`` function at most
once per access, whereas Beaker's system calls it twice, and it doesn't make us call
``get()`` when we have just created the value.

For the mutex object, we keep a ``threading.Lock`` object that's local
to the decorated function, rather than using a global lock.   This confines
the in-process locking to this one decorated function.   In the next section,
we'll see the usage of a cross-process lock that accomplishes this differently.

Using a File or Distributed Lock with Dogpile
==============================================

The examples thus far use a ``threading.Lock()`` object for synchronization.
If our application uses multiple processes, we will want to coordinate creation
operations not just on threads, but on some mutex that other processes can access.

In this example
we'll use a file-based lock as provided by the `lockfile <http://pypi.python.org/pypi/lockfile>`_
package, which uses a Unix symlink concept to provide a filesystem-level lock (which also
has been made threadsafe).  Another strategy may base itself directly off the Unix ``fcntl.flock()``
call, or use an NFS-safe file lock like `flufl.lock <http://pypi.python.org/pypi/flufl.lock>`_,
and still another approach is to lock against a cache server, using a recipe
such as that described at `Using Memcached as a Distributed Locking Service <http://www.regexprn.com/2010/05/using-memcached-as-distributed-locking.html>`_.

What all of these locking schemes have in common is that unlike the Python ``threading.Lock``
object, they all need access to an actual key which acts as the symbol that all processes
will coordinate upon.   So here, we will also need to create the "mutex" which we
pass to :class:`.Lock` using the ``key`` argument::

    import lockfile
    import os
    from hashlib import sha1

    # ... other imports and setup from the previous example

    def cached(key, expiration_time):
        """A decorator that will cache the return value of a function
        in memcached given a key."""

        # sha1() requires bytes on Python 3, so encode the key first
        lock_path = os.path.join("/tmp", "%s.lock" % sha1(key.encode("utf-8")).hexdigest())

        # ... get_value() from the previous example goes here

        def decorate(fn):
            # ... gen_cached() from the previous example goes here

            def invoke():
                # create an ad-hoc FileLock
                mutex = lockfile.FileLock(lock_path)

                with Lock(mutex, gen_cached, get_value, expiration_time) as value:
                    return value
            return invoke

        return decorate

For a given key "some_key", we generate a hex digest of the key,
then use ``lockfile.FileLock()`` to create a lock against the file
``/tmp/53def077a4264bd3183d4eb21b1f56f883e1b572.lock``.   Any number of :class:`.Lock`
objects in various processes will now coordinate with each other, using this common
filename as the "baton" against which creation of a new value proceeds.

Unlike when we used ``threading.Lock``, the file lock ultimately locks
on a file, so multiple instances of ``FileLock()`` will all coordinate on
that same file.  File locks that rely upon ``flock()`` often
require non-threaded usage, so a unique filesystem lock per thread is often a good
idea in any case.
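
As a rough illustration of the ``flock()``-based alternative mentioned earlier
in this section, the sketch below builds a mutex on the standard library's
``fcntl.flock()``.  It is Unix-only, and it assumes the mutex only needs
``acquire()`` and ``release()`` methods in the style of ``threading.Lock``.
``FlockMutex`` and ``lock_path_for()`` are hypothetical names, not part of
dogpile.core or lockfile.

```python
import fcntl
import os
import tempfile
from hashlib import sha1


class FlockMutex:
    """Hypothetical mutex built on fcntl.flock() (Unix only)."""

    def __init__(self, path):
        self.path = path
        self._fd = None

    def acquire(self, wait=True):
        # Each instance opens its own descriptor; flock() locks conflict
        # between separate open file descriptions, even in one process.
        fd = os.open(self.path, os.O_CREAT | os.O_RDWR, 0o600)
        flags = fcntl.LOCK_EX if wait else fcntl.LOCK_EX | fcntl.LOCK_NB
        try:
            fcntl.flock(fd, flags)
        except BlockingIOError:
            # Non-blocking attempt failed: someone else holds the lock.
            os.close(fd)
            return False
        except BaseException:
            os.close(fd)
            raise
        self._fd = fd
        return True

    def release(self):
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None


def lock_path_for(key):
    # Derive a filesystem-safe lock file name from an arbitrary cache key.
    digest = sha1(key.encode("utf-8")).hexdigest()
    return os.path.join(tempfile.gettempdir(), "%s.lock" % digest)
```

An instance would be created per invocation, for example
``FlockMutex(lock_path_for(key))``, in the same place where the example above
creates ``lockfile.FileLock(lock_path)``.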