Diffstat (limited to 'docs/source'):

 docs/source/cloudwatch_tut.rst        |  85
 docs/source/index.rst                 |   1
 docs/source/ref/route53.rst           |  29
 docs/source/releasenotes/v2.30.0.rst  |  28
 docs/source/s3_tut.rst                |  55
 docs/source/sqs_tut.rst               |  44
 docs/source/swf_tut.rst               | 148
 7 files changed, 313 insertions(+), 77 deletions(-)
diff --git a/docs/source/cloudwatch_tut.rst b/docs/source/cloudwatch_tut.rst
index 37263a8d..027cd980 100644
--- a/docs/source/cloudwatch_tut.rst
+++ b/docs/source/cloudwatch_tut.rst
@@ -16,45 +16,39 @@ it does, you can do this::
>>> c = boto.ec2.cloudwatch.connect_to_region('us-west-2')
>>> metrics = c.list_metrics()
>>> metrics
- [Metric:NetworkIn,
- Metric:NetworkOut,
- Metric:NetworkOut(InstanceType,m1.small),
- Metric:NetworkIn(InstanceId,i-e573e68c),
- Metric:CPUUtilization(InstanceId,i-e573e68c),
- Metric:DiskWriteBytes(InstanceType,m1.small),
- Metric:DiskWriteBytes(ImageId,ami-a1ffb63),
- Metric:NetworkOut(ImageId,ami-a1ffb63),
- Metric:DiskWriteOps(InstanceType,m1.small),
- Metric:DiskReadBytes(InstanceType,m1.small),
- Metric:DiskReadOps(ImageId,ami-a1ffb63),
- Metric:CPUUtilization(InstanceType,m1.small),
- Metric:NetworkIn(ImageId,ami-a1ffb63),
- Metric:DiskReadOps(InstanceType,m1.small),
- Metric:DiskReadBytes,
+ [Metric:DiskReadBytes,
Metric:CPUUtilization,
- Metric:DiskWriteBytes(InstanceId,i-e573e68c),
- Metric:DiskWriteOps(InstanceId,i-e573e68c),
+ Metric:DiskWriteOps,
Metric:DiskWriteOps,
Metric:DiskReadOps,
- Metric:CPUUtilization(ImageId,ami-a1ffb63),
- Metric:DiskReadOps(InstanceId,i-e573e68c),
- Metric:NetworkOut(InstanceId,i-e573e68c),
- Metric:DiskReadBytes(ImageId,ami-a1ffb63),
- Metric:DiskReadBytes(InstanceId,i-e573e68c),
- Metric:DiskWriteBytes,
- Metric:NetworkIn(InstanceType,m1.small),
- Metric:DiskWriteOps(ImageId,ami-a1ffb63)]
+ Metric:DiskReadBytes,
+ Metric:DiskReadOps,
+ Metric:CPUUtilization,
+ Metric:DiskWriteOps,
+ Metric:NetworkIn,
+ Metric:NetworkOut,
+ Metric:NetworkIn,
+ Metric:DiskReadBytes,
+ Metric:DiskWriteBytes,
+ Metric:DiskWriteBytes,
+ Metric:NetworkIn,
+ Metric:NetworkIn,
+ Metric:NetworkOut,
+ Metric:NetworkOut,
+ Metric:DiskReadOps,
+ Metric:CPUUtilization,
+ Metric:DiskReadOps,
+ Metric:CPUUtilization,
+ Metric:DiskWriteBytes,
+ Metric:DiskWriteBytes,
+ Metric:DiskReadBytes,
+ Metric:NetworkOut,
+ Metric:DiskWriteOps]
+
The list_metrics call will return a list of all of the available metrics
that you can query against. Each entry in the list is a Metric object.
-As you can see from the list above, some of the metrics are generic metrics
-and some have Dimensions associated with them (e.g. InstanceType=m1.small).
-The Dimension can be used to refine your query. So, for example, I could
-query the metric Metric:CPUUtilization which would create the desired statistic
-by aggregating cpu utilization data across all sources of information available
-or I could refine that by querying the metric
-Metric:CPUUtilization(InstanceId,i-e573e68c) which would use only the data
-associated with the instance identified by the instance ID i-e573e68c.
+As you can see from the list above, some of the metrics are repeated. The
+repeated entries represent the same metric across different dimensions
+(per-instance, per-image, per-instance-type), which you can identify by
+looking at each metric's ``dimensions`` property.
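+
+If you already know which metric and dimension you care about, the connection
+can also filter on the server side. A minimal sketch, reusing the instance id
+that appears later in this tutorial (the filter values are placeholders)::
+
+    >>> c.list_metrics(metric_name='CPUUtilization',
+    ...                dimensions={'InstanceId': 'i-4ca81747'})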
Because for this example, I'm only monitoring a single instance, the set
of metrics available to me are fairly limited. If I was monitoring many
@@ -62,12 +56,21 @@ instances, using many different instance types and AMI's and also several
load balancers, the list of available metrics would grow considerably.
Once you have the list of available metrics, you can actually
-query the CloudWatch system for that metric. Let's choose the CPU utilization
-metric for our instance.::
+query the CloudWatch system for that metric.
+Let's choose the CPU utilization metric for one of the ImageIds.::
+ >>> m_image = metrics[7]
+ >>> m_image
+ Metric:CPUUtilization
+ >>> m_image.dimensions
+ {u'ImageId': [u'ami-6ac2a85a']}
+
+Let's choose another CPU utilization metric for our instance.::
- >>> m = metrics[5]
+ >>> m = metrics[20]
>>> m
- Metric:CPUUtilization(InstanceId,i-e573e68c)
+ Metric:CPUUtilization
+ >>> m.dimensions
+ {u'InstanceId': [u'i-4ca81747']}
The Metric object has a query method that lets us actually perform
the query against the collected data in CloudWatch. To call that,
@@ -87,8 +90,7 @@ values::
And Units must be one of the following::
- ['Seconds', 'Percent', 'Bytes', 'Bits', 'Count',
- 'Bytes/Second', 'Bits/Second', 'Count/Second']
+    ['Seconds', 'Microseconds', 'Milliseconds', 'Bytes', 'Kilobytes',
+     'Megabytes', 'Gigabytes', 'Terabytes', 'Bits', 'Kilobits', 'Megabits',
+     'Gigabits', 'Terabits', 'Percent', 'Count', 'Bytes/Second',
+     'Kilobytes/Second', 'Megabytes/Second', 'Gigabytes/Second',
+     'Terabytes/Second', 'Bits/Second', 'Kilobits/Second', 'Megabits/Second',
+     'Gigabits/Second', 'Terabits/Second', 'Count/Second', None]
The query method also takes an optional parameter, period. This
parameter controls the granularity (in seconds) of the data returned.
@@ -108,9 +110,8 @@ about that particular data point.::
>>> d = datapoints[0]
>>> d
- {u'Average': 0.0,
- u'SampleCount': 1.0,
- u'Timestamp': u'2009-05-21T19:55:00Z',
+ {u'Timestamp': datetime.datetime(2014, 6, 23, 22, 25),
+ u'Average': 20.0,
u'Unit': u'Percent'}
My server obviously isn't very busy right now!
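+
+For reference, a minimal sketch of the kind of call that can produce
+datapoints like the one above (the one-hour window and the variable names
+are illustrative assumptions, not part of the original tutorial)::
+
+    >>> import datetime
+    >>> end = datetime.datetime.utcnow()
+    >>> start = end - datetime.timedelta(hours=1)
+    # Average CPU utilization, in percent, over the last hour,
+    # one datapoint per five-minute period.
+    >>> datapoints = m.query(start, end, 'Average', 'Percent', period=300)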
diff --git a/docs/source/index.rst b/docs/source/index.rst
index c97d3919..c260822a 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -119,6 +119,7 @@ Release Notes
.. toctree::
:titlesonly:
+ releasenotes/v2.30.0
releasenotes/v2.29.1
releasenotes/v2.29.0
releasenotes/v2.28.0
diff --git a/docs/source/ref/route53.rst b/docs/source/ref/route53.rst
index f3a03bab..1d4af2c6 100644
--- a/docs/source/ref/route53.rst
+++ b/docs/source/ref/route53.rst
@@ -9,26 +9,47 @@ boto.route53.connection
-----------------------
.. automodule:: boto.route53.connection
- :members:
+ :members:
:undoc-members:
boto.route53.exception
----------------------
.. automodule:: boto.route53.exception
- :members:
+ :members:
+ :undoc-members:
+
+boto.route53.healthcheck
+------------------------
+
+.. automodule:: boto.route53.healthcheck
+ :members:
+ :undoc-members:
+
+boto.route53.hostedzone
+-----------------------
+
+.. automodule:: boto.route53.hostedzone
+ :members:
:undoc-members:
boto.route53.record
-------------------
.. automodule:: boto.route53.record
- :members:
+ :members:
+ :undoc-members:
+
+boto.route53.status
+-------------------
+
+.. automodule:: boto.route53.status
+ :members:
:undoc-members:
boto.route53.zone
-----------------
.. automodule:: boto.route53.zone
- :members:
+ :members:
:undoc-members:
diff --git a/docs/source/releasenotes/v2.30.0.rst b/docs/source/releasenotes/v2.30.0.rst
new file mode 100644
index 00000000..bf5ad6d0
--- /dev/null
+++ b/docs/source/releasenotes/v2.30.0.rst
@@ -0,0 +1,28 @@
+boto v2.30.0
+============
+
+:date: 2014/07/01
+
+This release adds new Amazon EC2 instance types, new regions for AWS CloudTrail and Amazon Kinesis, Amazon S3 presigning using signature version 4, and a number of documentation improvements and bugfixes.
+
+
+Changes
+-------
+* Add EC2 T2 instance types (:sha:`544f8925cb`)
+* Add new regions for CloudTrail and Kinesis (:sha:`4d67e19914`)
+* Fixed some code formatting and a typo in the SQS tutorial docs. (:issue:`2332`, :sha:`08c8fed`)
+* Documentation update -- Child workflows and poll API. (:issue:`2333`, :issue:`2063`, :issue:`2064`, :sha:`4835676`)
+* DOC Tutorial update for metrics and use of dimensions property. (:issue:`2340`, :issue:`2336`, :sha:`45fda90`)
+* Let people know only EC2 is supported for CloudWatch. (:issue:`2341`, :sha:`98f03e2`)
+* Add namespace to AccessControlPolicy xml representation. (:issue:`2342`, :sha:`ce07446`)
+* Make ip_addr optional in Route53 HealthCheck. (:issue:`2345`, :sha:`79c35ca`)
+* Add S3 SigV4 Presigning. (:issue:`2349`, :sha:`125c4ce`)
+* Add missing route53 autodoc. (:issue:`2343`, :sha:`6472811`)
+* Adds scan_index_forward and limit to DynamoDB table query count. (:issue:`2184`, :sha:`4b6d222`)
+* Add method TaggedEC2Object.add_tags(). (:issue:`2259`, :sha:`eea5467`)
+* Add network interface lookup to EC2. Add update/attach/detach methods to NetworkInterface object. (:issue:`2311`, :sha:`4d44530`)
+* Parse date/time in a locale independent manner. (:issue:`2317`, :issue:`2271`, :sha:`3b715e5`)
+* Add documentation for delete_hosted_zone. (:issue:`2316`, :sha:`a0fdd39`)
+* s/existance/existence/ (:issue:`2315`, :sha:`b8dfa1c`)
+* Add multipart upload section to the S3 tutorial. (:issue:`2308`, :sha:`99953d4`)
+* Only attempt shared creds load if path is a file. (:issue:`2305`, :sha:`0bffa3b`)
diff --git a/docs/source/s3_tut.rst b/docs/source/s3_tut.rst
index 9db92211..e5de8af9 100644
--- a/docs/source/s3_tut.rst
+++ b/docs/source/s3_tut.rst
@@ -161,6 +161,61 @@ exists within a bucket, you can skip the check for a key on the server.
>>> key_we_know_is_there = b.get_key('mykey', validate=False)
+Storing Large Data
+------------------
+
+At times the data you may want to store will be hundreds of megabytes or
+more in size. S3 allows you to split such files into smaller components.
+You upload each component in turn and then S3 combines them into the final
+object. While this is fairly straightforward, it requires a few extra steps
+to be taken. The example below makes use of the FileChunkIO module, so
+``pip install FileChunkIO`` if it isn't already installed.
+
+::
+
+ >>> import math, os
+ >>> import boto
+ >>> from filechunkio import FileChunkIO
+
+ # Connect to S3
+ >>> c = boto.connect_s3()
+ >>> b = c.get_bucket('mybucket')
+
+ # Get file info
+ >>> source_path = 'path/to/your/file.ext'
+ >>> source_size = os.stat(source_path).st_size
+
+ # Create a multipart upload request
+ >>> mp = b.initiate_multipart_upload(os.path.basename(source_path))
+
+ # Use a chunk size of 50 MiB (feel free to change this)
+ >>> chunk_size = 52428800
+ >>> chunk_count = int(math.ceil(source_size / float(chunk_size)))
+
+ # Send the file parts, using FileChunkIO to create a file-like object
+ # that points to a certain byte range within the original file. We
+ # set bytes to never exceed the original file size.
+ >>> for i in range(chunk_count):
+ ...     offset = chunk_size * i
+ ...     bytes = min(chunk_size, source_size - offset)
+ ...     with FileChunkIO(source_path, 'r', offset=offset,
+ ...                      bytes=bytes) as fp:
+ ...         mp.upload_part_from_file(fp, part_num=i + 1)
+
+ # Finish the upload
+ >>> mp.complete_upload()
+
+It is also possible to upload the parts in parallel using threads. The
+``s3put`` script that ships with Boto provides an example of doing so
+using a thread pool.
+
+Note that if you forget to call either ``mp.complete_upload()`` or
+``mp.cancel_upload()`` you will be left with an incomplete upload and
+charged for the storage consumed by the uploaded parts. A call to
+``bucket.get_all_multipart_uploads()`` can help you find such lost
+multipart upload parts.
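+
+As a rough sketch (treat this cleanup loop as an illustration rather than
+part of the original tutorial), you could list the incomplete uploads on the
+bucket and cancel the ones you no longer want::
+
+    >>> for upload in b.get_all_multipart_uploads():
+    ...     print upload.key_name
+    ...     # cancel_upload() discards the parts uploaded so far
+    ...     upload.cancel_upload()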
+
+
Accessing A Bucket
------------------
diff --git a/docs/source/sqs_tut.rst b/docs/source/sqs_tut.rst
index f86aa3e8..9b8e508c 100644
--- a/docs/source/sqs_tut.rst
+++ b/docs/source/sqs_tut.rst
@@ -22,7 +22,7 @@ The recommended method of doing this is as follows::
At this point the variable conn will point to an SQSConnection object in the
US-WEST-2 region. Bear in mind that just as any other AWS service, SQS is
region-specific. In this example, the AWS access key and AWS secret key are
-passed in to the method explicitely. Alternatively, you can set the environment
+passed in to the method explicitly. Alternatively, you can set the environment
variables:
* ``AWS_ACCESS_KEY_ID`` - Your AWS Access Key ID
@@ -116,17 +116,17 @@ values of the message that was written to the queue.
Arbitrary message attributes can be defined by setting a simple dictionary
of values on the message object::
->>> m = Message()
->>> m.message_attributes = {
- "name1": {
- "data_type": "String",
- "string_value": "I am a string"
- },
- "name2": {
- "data_type": "Number",
- "string_value": "12"
- }
-}
+ >>> m = Message()
+ >>> m.message_attributes = {
+ ... "name1": {
+ ... "data_type": "String",
+ ... "string_value": "I am a string"
+ ... },
+ ... "name2": {
+ ... "data_type": "Number",
+ ... "string_value": "12"
+ ... }
+ ... }
Note that by default, these arbitrary attributes are not returned when
you request messages from a queue. Instead, you must request them via
@@ -159,7 +159,7 @@ default boto Message object. To register your message class, you would::
where MyMessage is the class definition for your message class. Your
message class should subclass the boto Message because there is a small
-bit of Python magic happening in the __setattr__ method of the boto Message
+bit of Python magic happening in the ``__setattr__`` method of the boto Message
class.
Reading Messages
@@ -203,14 +203,14 @@ passing a num_messages parameter (defaults to 1) you can control the maximum
number of messages that will be returned by the method. To show this
feature off, first let's load up a few more messages.
->>> for i in range(1, 11):
-... m = Message()
-... m.set_body('This is message %d' % i)
-... q.write(m)
-...
->>> rs = q.get_messages(10)
->>> len(rs)
-10
+ >>> for i in range(1, 11):
+ ... m = Message()
+ ... m.set_body('This is message %d' % i)
+ ... q.write(m)
+ ...
+ >>> rs = q.get_messages(10)
+ >>> len(rs)
+ 10
Don't be alarmed if the length of the result set returned by the get_messages
call is less than 10. Sometimes it takes some time for new messages to become
@@ -275,5 +275,5 @@ messages in a queue to a local file:
>>> q.dump('messages.txt', sep='\n------------------\n')
This will read all of the messages in the queue and write the bodies of
-each of the messages to the file messages.txt. The option sep argument
+each of the messages to the file messages.txt. The optional ``sep`` argument
is a separator that will be printed between each message body in the file.
diff --git a/docs/source/swf_tut.rst b/docs/source/swf_tut.rst
index 68588265..ffbacfd2 100644
--- a/docs/source/swf_tut.rst
+++ b/docs/source/swf_tut.rst
@@ -1,5 +1,5 @@
.. swf_tut:
- :Authors: Slawek "oozie" Ligus <root@ooz.ie>
+ :Authors: Slawek "oozie" Ligus <root@ooz.ie>, Brad Morris <bradley.s.morris@gmail.com>
===============================
Amazon Simple Workflow Tutorial
@@ -60,7 +60,7 @@ Before workflows and activities can be used, they have to be registered with SWF
registerables = []
registerables.append(swf.Domain(name=DOMAIN))
- for workflow_type in ('HelloWorkflow', 'SerialWorkflow', 'ParallelWorkflow'):
+ for workflow_type in ('HelloWorkflow', 'SerialWorkflow', 'ParallelWorkflow', 'SubWorkflow'):
registerables.append(swf.WorkflowType(domain=DOMAIN, name=workflow_type, version=VERSION, task_list='default'))
for activity_type in ('HelloWorld', 'ActivityA', 'ActivityB', 'ActivityC'):
@@ -441,11 +441,11 @@ The decider schedules all activities at once and marks progress until all activi
import boto.swf.layer2 as swf
import time
-
+
SCHED_COUNT = 5
-
+
class ParallelDecider(swf.Decider):
-
+
domain = 'boto_tutorial'
task_list = 'default'
def run(self):
@@ -480,12 +480,12 @@ Again, the only bit of information a worker needs is which task list to poll.
# parallel_worker.py
import time
import boto.swf.layer2 as swf
-
+
class ParallelWorker(swf.ActivityWorker):
-
+
domain = 'boto_tutorial'
task_list = 'default'
-
+
def run(self):
"""Report current time."""
activity_task = self.poll()
@@ -517,7 +517,7 @@ Run two or more workers to see how the service partitions work execution in para
working on activity1
working on activity3
working on activity4
-
+
.. code-block:: bash
$ python -i parallel_worker.py
@@ -528,6 +528,136 @@ Run two or more workers to see how the service partitions work execution in para
As seen above, the work was partitioned between the two running workers.
+Sub-Workflows
+-------------
+
+Sometimes it is desirable or necessary to break a process up into multiple workflows.
+
+Since the decider is stateless, it's up to you to determine which workflow is being used and which action
+you would like to take.
+
+.. code-block:: python
+
+ import boto.swf.layer2 as swf
+
+ class SubWorkflowDecider(swf.Decider):
+
+ domain = 'boto_tutorial'
+ task_list = 'default'
+ version = '1.0'
+
+ def run(self):
+ history = self.poll()
+ events = []
+ if 'events' in history:
+ events = history['events']
+ # Collect the entire history if there are enough events to become paginated
+ while 'nextPageToken' in history:
+ history = self.poll(next_page_token=history['nextPageToken'])
+ if 'events' in history:
+ events = events + history['events']
+
+ workflow_type = history['workflowType']['name']
+
+ # Get all of the relevant events that have happened since the last decision task was started
+ workflow_events = [e for e in events
+ if e['eventId'] > history['previousStartedEventId'] and
+ not e['eventType'].startswith('Decision')]
+
+ decisions = swf.Layer1Decisions()
+
+ for event in workflow_events:
+ last_event_type = event['eventType']
+ if last_event_type == 'WorkflowExecutionStarted':
+ if workflow_type == 'SerialWorkflow':
+ decisions.start_child_workflow_execution('SubWorkflow', self.version,
+ "subworkflow_1", task_list=self.task_list, input="sub_1")
+ elif workflow_type == 'SubWorkflow':
+ for i in range(2):
+ decisions.schedule_activity_task("activity_%d" % i, 'ActivityA', self.version, task_list='a_tasks')
+ else:
+ decisions.fail_workflow_execution(reason="Unknown workflow %s" % workflow_type)
+ break
+
+ elif last_event_type == 'ChildWorkflowExecutionCompleted':
+ decisions.schedule_activity_task("activity_2", 'ActivityB', self.version, task_list='b_tasks')
+
+ elif last_event_type == 'ActivityTaskCompleted':
+ attrs = event['activityTaskCompletedEventAttributes']
+ activity = events[attrs['scheduledEventId'] - 1]
+ activity_name = activity['activityTaskScheduledEventAttributes']['activityType']['name']
+
+ if activity_name == 'ActivityA':
+ completed_count = sum([1 for a in events if a['eventType'] == 'ActivityTaskCompleted'])
+ if completed_count == 2:
+ # Complete the child workflow
+ decisions.complete_workflow_execution()
+ elif activity_name == 'ActivityB':
+ # Complete the parent workflow
+ decisions.complete_workflow_execution()
+
+ self.complete(decisions=decisions)
+ return True
+
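+To exercise this decider, you start the parent workflow in the usual way. A
+minimal sketch, assuming the domain, workflow types and task list registered
+at the beginning of this tutorial:
+
+.. code-block:: python
+
+    import boto.swf.layer2 as swf
+
+    # Start the parent workflow; the decider above reacts to the
+    # WorkflowExecutionStarted event by starting the 'SubWorkflow' child.
+    execution = swf.WorkflowType(name='SerialWorkflow', domain='boto_tutorial',
+                                 version='1.0', task_list='default').start()
+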
+Misc
+----
+
+Some of these things are not obvious from reading the API documents, so hopefully these notes
+help you avoid some time-consuming pitfalls.
+
+Pagination
+==========
+
+When the decider polls for new tasks, the maximum number of events it will return at a time is 100
+(configurable to a smaller number, but not larger). When running a workflow, this limit is quickly
+exceeded. When that happens, the decision task will contain a key ``nextPageToken``, which can be
+submitted to the ``poll()`` call to get the next page of events.
+
+.. code-block:: python
+
+ decision_task = self.poll()
+
+ events = []
+ if 'events' in decision_task:
+ events = decision_task['events']
+ while 'nextPageToken' in decision_task:
+ decision_task = self.poll(next_page_token=decision_task['nextPageToken'])
+ if 'events' in decision_task:
+ events += decision_task['events']
+
+Depending on your workflow logic, you might not need to aggregate all of the events.
+
+Decision Tasks
+==============
+
+When first running deciders and activities, it may seem that the decider gets called for every event that
+an activity triggers; however, this is not the case. More than one event can happen between decision tasks.
+The decision task will contain a key ``previousStartedEventId`` that lets you know the ``eventId`` of the
+last DecisionTaskStarted event that was processed. Your script will need to handle all of the events
+that have happened since then, not just the last activity.
+
+.. code-block:: python
+
+ workflow_events = [e for e in events if e['eventId'] > decision_task['previousStartedEventId']]
+
+You may also wish to filter out events whose type starts with 'Decision', or filter the list in
+some other way that fits your needs. You will then need to iterate over the ``workflow_events``
+list and respond to each event, as it may contain more than one.
+
+Filtering Events
+================
+
+When running many activities in parallel, a common need is to search through the history to see how
+many events of a particular activity type have started, completed, and/or failed. Some basic list
+comprehensions make this trivial.
+
+.. code-block:: python
+
+ def filter_completed_events(self, events, type):
+ completed = [e for e in events if e['eventType'] == 'ActivityTaskCompleted']
+ orig = [events[e['activityTaskCompletedEventAttributes']['scheduledEventId']-1] for e in completed]
+ return [e for e in orig if e['activityTaskScheduledEventAttributes']['activityType']['name'] == type]
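+
+For example, with this helper in place, the completed-count check in the
+sub-workflow decider above could be written as
+``len(self.filter_completed_events(events, 'ActivityA')) == 2``.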
+
.. _Amazon SWF API Reference: http://docs.aws.amazon.com/amazonswf/latest/apireference/Welcome.html
.. _StackOverflow questions: http://stackoverflow.com/questions/tagged/amazon-swf
.. _Miscellaneous Blog Articles: http://log.ooz.ie/search/label/SimpleWorkflow