author     Mike Bayer <mike_mp@zzzcomputing.com>  2011-11-17 17:32:48 -0500
committer  Mike Bayer <mike_mp@zzzcomputing.com>  2011-11-17 17:32:48 -0500
commit     7e9d4bc787e70b62a64792d3c12feb200c82536a (patch)
tree       ca6ba3daa854dacabe9ad072f84c67aab9cb11a1
parent     918e6727e65087339ffa10e5ccdeebe8ffaf541a (diff)
download   dogpile-core-7e9d4bc787e70b62a64792d3c12feb200c82536a.tar.gz
finish examples
-rw-r--r--  docs/build/usage.rst  175
1 file changed, 130 insertions(+), 45 deletions(-)
diff --git a/docs/build/usage.rst b/docs/build/usage.rst
index 8b222bd..487f463 100644
--- a/docs/build/usage.rst
+++ b/docs/build/usage.rst
@@ -99,7 +99,7 @@ With the above pattern, :class:`.SyncReaderDogpile` will
 allow concurrent readers to read from the current version
 of the datafile as
 the ``create_expensive_datafile()`` function proceeds with its
-job of generating the information for a new version. 
+job of generating the information for a new version.
 When the data is ready to be written, the
 :meth:`.SyncReaderDogpile.acquire_write_lock` call
 will block until all current readers of the datafile have completed
@@ -161,6 +161,8 @@ Example::
 Note that ``get_value_from_cache()`` should not raise
 :class:`.NeedRegenerationException` a second time directly after
 ``create_and_cache_value()`` has been called.
 
+.. _caching_decorator:
+
 Using Dogpile for Caching
 --------------------------
@@ -220,6 +222,8 @@ In particular, Dogpile's system allows us to call the memcached get() function a
 once per access, instead of Beaker's system which calls it twice,
 and doesn't make us call get() when we just created the value.
 
+.. _scaling_on_keys:
+
 Scaling Dogpile against Many Keys
 ----------------------------------
@@ -254,14 +258,13 @@ constructed around the notion of a creation function that is only invoked
 as needed. We also will instruct the :meth:`.Dogpile.acquire` method
 to use a "creation time" value that we retrieve from the cache, via the
 ``value_and_created_fn`` parameter, which supercedes the
-``value_fn`` we used earlier to expect a function that will return a tuple
-of ``(value, created_at)``::
+``value_fn`` we used earlier.
+``value_and_created_fn`` expects a function that will return a tuple
+of ``(value, created_at)``, where it's assumed both have been retrieved from
+the cache backend::
 
     import pylibmc
     import pickle
-    import os
     import time
-    import sha1
     from dogpile import Dogpile, NeedRegenerationException, NameRegistry
 
     mc_pool = pylibmc.ThreadMappedPool(pylibmc.Client("localhost"))
@@ -271,51 +274,133 @@ of ``(value, created_at)``::
 
     dogpile_registry = NameRegistry(create_dogpile)
 
-    def cache(expiration_time):
-
-        def get_or_create(key):
-            def get_value():
-                with mc_pool.reserve() as mc:
-                    value = mc.get(key)
-                    if value is None:
-                        raise NeedRegenerationException()
-                    # deserialize a tuple
-                    # (value, createdtime)
-                    return pickle.loads(value)
-
-            dogpile = dogpile_registry.get(key, expiration_time)
+    def get_or_create(key, expiration_time, creation_function):
+        def get_value():
+            with mc_pool.reserve() as mc:
+                value = mc.get(key)
+                if value is None:
+                    raise NeedRegenerationException()
+                # deserialize a tuple
+                # (value, createdtime)
+                return pickle.loads(value)
+
+        def gen_cached():
+            value = creation_function()
+            with mc_pool.reserve() as mc:
+                # serialize a tuple
+                # (value, createdtime)
+                value = (value, time.time())
+                mc.set(key, pickle.dumps(value))
+            return value
+
+        dogpile = dogpile_registry.get(key, expiration_time)
+
+        with dogpile.acquire(gen_cached, value_and_created_fn=get_value) as value:
+            return value
+
+
+Stepping through the above code:
+
+* After the imports, we set up the memcached backend using the ``pylibmc`` library's
+  recommended pattern for thread-safe access.
+* We create a Python function that will, given a cache key and an expiration time,
+  produce a :class:`.Dogpile` object which will produce the dogpile mutex on an
+  as-needed basis. The function here doesn't actually need the key, even though
+  the :class:`.NameRegistry` will be passing it in. Later, we'll see the scenario
+  for which we'll need this value.
+* We construct a :class:`.NameRegistry`, using our dogpile creator function, that
+  will generate for us new :class:`.Dogpile` locks for individual keys as needed.
+* We define the ``get_or_create()`` function. This function will accept the cache
+  key, an expiration time value, and a function that is used to create a new value
+  if one does not exist or the current value is expired.
+* The ``get_or_create()`` function defines two callables, ``get_value()`` and
+  ``gen_cached()``. These two functions are exactly analogous to the
+  functions of the same name in :ref:`caching_decorator` - ``get_value()``
+  retrieves the value from the cache, raising :class:`.NeedRegenerationException`
+  if not present; ``gen_cached()`` calls the creation function to generate a new
+  value, stores it in the cache, and returns it. The only difference here is that
+  instead of storing and retrieving the value alone from the cache, the value is
+  stored along with its creation time; when we make a new value, we set this
+  to ``time.time()``. While the value and creation time pair are stored here
+  as a pickled tuple, it doesn't actually matter how the two are persisted;
+  only that the tuple value is returned from both functions.
+* We acquire a new or existing :class:`.Dogpile` object from the registry using
+  :meth:`.NameRegistry.get`. We pass the identifying key as well as the expiration
+  time. A new :class:`.Dogpile` is created for the given key if one does not
+  exist. If a :class:`.Dogpile` lock already exists in memory for the given key,
+  we get that one back.
+* We then call :meth:`.Dogpile.acquire` as we did in the previous cache examples,
+  except we use the ``value_and_created_fn`` keyword for our ``get_value()``
+  function. :class:`.Dogpile` uses the "created time" value we pull from our
+  cache to determine when the value was last created.
+
+An example usage of the completed function::
+
+    import urllib2
+
+    def get_some_value(key):
+        """retrieve a datafile from a slow site based on the given key."""
+        def get_data():
+            return urllib2.urlopen(
+                    "http://someslowsite.com/some_important_datafile_%s.json" % key
+                ).read()
+        return get_or_create(key, 3600, get_data)
+
+    my_data = get_some_value("somekey")
+
+Using a File or Distributed Lock with Dogpile
+----------------------------------------------
+
+The final twist on the caching pattern is to fix the issue of the Dogpile mutex
+itself being local to the current process. When a handful of threads all go
+to access some key in our cache, they will access the same :class:`.Dogpile` object,
+which internally can synchronize their activity using a Python ``threading.Lock``.
+But in this example we're talking to a Memcached cache. What if we have many
+servers which all access this cache? We'd like all of these servers to coordinate
+together so that we don't just prevent the dogpile problem within a single process,
+we prevent it across all servers.
+
+To accomplish this, we need an object that can coordinate processes. In this example
+we'll use a file-based lock as provided by the `lockfile <http://pypi.python.org/pypi/lockfile>`_
+package, which uses a unix-symlink concept to provide a filesystem-level lock (which also
+has been made threadsafe). Another strategy may base itself directly off the Unix ``os.flock()``
+call, and still another approach is to lock within Memcached itself, using a recipe
+such as that described at `Using Memcached as a Distributed Locking Service <http://www.regexprn.com/2010/05/using-memcached-as-distributed-locking.html>`_.
+The type of lock chosen here is based on a tradeoff between global availability
+and reliable performance. The file-based lock will perform more reliably than the
+memcached lock, but may be difficult to make accessible to multiple servers (with NFS
+being the most likely option, which would eliminate the possibility of the ``os.flock()``
+call). The memcached lock on the other hand will provide the perfect scope, being available
+from the same memcached server that the cached value itself comes from; however the lock
+may vanish in some cases, in which case we could still get a cache-regeneration pileup.
+
+What all of these locking schemes have in common is that unlike the Python ``threading.Lock``
+object, they all need access to an actual key which acts as the symbol that all processes
+will coordinate upon. This is where the ``key`` argument to our ``create_dogpile()``
+function introduced in :ref:`scaling_on_keys` comes in. The example can remain
+the same, except for the changes below to just that function::
+
+    import lockfile
+    import os
+    from hashlib import sha1
+
+    # ... other imports and setup from the previous example
+
-        return get_or_create
 
-Above, we use ``Dogpile.registry()`` to create a name-based "registry" of ``Dogpile``
-objects. This object will provide to us a ``Dogpile`` object that's
-unique on a certain name (or any hashable object) when we call the ``get()`` method.
-When all usages of that name are complete, the ``Dogpile``
-object falls out of scope. This way, an application can handle millions of keys
-without needing to have millions of ``Dogpile`` objects persistently resident in memory.
+    def create_dogpile(key, expiration_time):
+        lock_path = os.path.join("/tmp", "%s.lock" % sha1(key).hexdigest())
+        return Dogpile(
+                expiration_time,
+                lock=lockfile.FileLock(lock_path)
+            )
+
-The next part of the approach here is that we'll tell Dogpile that we'll give it
-the "creation time" that we'll store in our
-cache - we do this using the ``value_and_created_fn`` argument, which assumes we'll
-be storing and loading the value as a tuple of (value, createdtime). The creation time
-should always be calculated via ``time.time()``. The ``acquire()`` function
-returns the "value" portion of the tuple to us and uses the
-"createdtime" portion to determine if the value is expired.
+
+    # ... everything else from the previous example
+
+Where above, the only change is the ``lock`` argument passed to the constructor of
+:class:`.Dogpile`. For a given key "some_key", we generate a hex digest of it
+first as a quick way to remove any filesystem-unfriendly characters; we then use
+``lockfile.FileLock()`` to create a lock against the file
+``/tmp/53def077a4264bd3183d4eb21b1f56f883e1b572.lock``. Any number of :class:`.Dogpile`
+objects in various processes will now coordinate with each other, using this common
+filename as the "baton" against which creation of a new value proceeds.
+
-Using a File or Distributed Lock with Dogpile
-----------------------------------------------
-
-The example below will use a file-based mutex using `lockfile <http://pypi.python.org/pypi/lockfile>`_.
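The recipe this commit documents depends on a running memcached plus the ``dogpile``, ``pylibmc``, and ``lockfile`` packages. For readers following along without those installed, here is a stdlib-only sketch of the same ``get_or_create()`` pattern: a plain dict stands in for the memcached backend, and a per-key ``threading.Lock`` stands in for the per-key :class:`.Dogpile` mutex. All names here (``_cache``, ``_locks``, ``lock_path_for``) are illustrative inventions, not part of dogpile's API:

```python
import threading
import time
from hashlib import sha1

_cache = {}    # stands in for memcached: key -> (value, created_at)
_locks = {}    # stands in for the NameRegistry of per-key mutexes
_registry_lock = threading.Lock()


class NeedRegenerationException(Exception):
    """Raised when no usable value is present in the cache."""


def _get_lock(key):
    # analogous to NameRegistry.get(): one mutex per cache key,
    # created on first use
    with _registry_lock:
        return _locks.setdefault(key, threading.Lock())


def get_or_create(key, expiration_time, creation_function):
    def get_value():
        # analogous to the memcached get + pickle.loads in the docs:
        # returns a (value, created_at) tuple or raises
        try:
            return _cache[key]
        except KeyError:
            raise NeedRegenerationException()

    def gen_cached():
        # analogous to gen_cached() in the docs: create the value,
        # timestamp it, store the pair
        pair = (creation_function(), time.time())
        _cache[key] = pair
        return pair

    with _get_lock(key):
        try:
            value, created_at = get_value()
            if time.time() - created_at > expiration_time:
                # expired; treat the same as a missing value
                raise NeedRegenerationException()
        except NeedRegenerationException:
            value, created_at = gen_cached()
    return value


def lock_path_for(key):
    # the same filesystem-safe name mangling the FileLock example uses
    return "/tmp/%s.lock" % sha1(key.encode("utf-8")).hexdigest()


result = get_or_create("somekey", 3600, lambda: "the value")
```

Unlike the real :class:`.Dogpile`, this sketch holds the mutex for the whole read, so concurrent readers serialize; the point is only to show the shape of ``get_value()`` / ``gen_cached()`` and the ``(value, created_at)`` tuple convention.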