<h2 id="making-stackstorm-fast"><a href="/2021/07/04/making-stackstorm-fast.html">Making StackStorm Fast</a></h2>
<p>In this post I will describe changes to the StackStorm database abstraction
layer which landed in <a href="https://stackstorm.com/2021/06/29/stackstorm-v3-5-0-released/">StackStorm v3.5.0</a>. Those changes will substantially
speed up action executions and workflow runs for most users.</p>
<div class="imginline">
<a href="http://www.stackstorm.com" target="_blank"><img src="/images/stackstorm-fast/st2_fast_2.png" class="inline" /></a>
</div>
<p>Based on the benchmarks and load testing we have performed, most actions
which return large results and workflows which pass large datasets around
should see speed ups in the range of 5-15x.</p>
<p>If you want to learn more about the details you can do that below. Alternatively
if you only care about the numbers, you can go directly to the
<a href="#numbers">Numbers, numbers, numbers</a> section.</p>
<h3 id="background-and-history">Background and History</h3>
<p>Today StackStorm is used for solving a very diverse set of problems – from IT
and infrastructure provisioning to complex CI/CD pipelines, automated remediation,
various data processing pipelines and more.</p>
<p>Solving a lot of those problems requires passing large datasets around – this
usually involves passing around large dictionary objects to the actions (which
can be in the range of many MBs) and then inside the workflow, filtering down
the result object and passing it to other tasks in the workflow.</p>
<p>This works fine when working with small objects, but it starts to break when
larger datasets are passed around (dictionaries over 500 KB).</p>
<p>In fact, passing large results around has been StackStorm’s Achilles’ heel for
many years now (see some of the existing issues -
<a href="https://github.com/StackStorm/st2/issues/3712">#3718</a>,
<a href="https://github.com/StackStorm/st2/issues/4798">#4798</a>,
<a href="https://github.com/StackStorm/st2web/issues/625">#625</a>). Things will still work, but
executions and workflows which handle large datasets will get progressively
slower and waste progressively more CPU cycles, and no one likes slow software
and wasted CPU cycles (looking at you, Bitcoin).</p>
<p>One of the more popular workarounds usually involves storing those larger
results / datasets in a 3rd party system (such as a database) and then querying
this system and retrieving the data inside the action.</p>
<p>There have been many attempts to improve that in the past (see
<a href="https://github.com/StackStorm/st2/pull/4837">#4837</a>,
<a href="https://github.com/StackStorm/st2/pull/4838">#4838</a>,
<a href="https://github.com/StackStorm/st2/pull/4846">#4846</a>) and we did make some smaller
incremental improvements over the years, but most of them were in the range of a
couple of 10% of an improvement maximum.</p>
<p>After an almost year-long break from StackStorm due to a busy work and life
situation, I used StackStorm again to scratch my own itch. I noticed the age-old
“large results” problem hadn’t been solved yet, so I decided to take a
look at the issue again and try to make more progress on the PR I originally
started more than a year ago (<a href="https://github.com/StackStorm/st2/pull/4846">#4846</a>).</p>
<p>It took many late nights, but I was finally able to make good progress on it.
This should bring substantial speed ups and improvements to all StackStorm
users.</p>
<h3 id="why-the-problem-exists-today">Why the problem exists today</h3>
<p>Before we look into the implemented solution, I want to briefly explain why
StackStorm today is slow and inefficient when working with large datasets.</p>
<p>The primary reason StackStorm is slow when working with large datasets is
that we utilize the <code class="language-plaintext highlighter-rouge">EscapedDictField()</code> and <code class="language-plaintext highlighter-rouge">EscapedDynamicField()</code>
mongoengine field types for storing execution results and workflow state.</p>
<p>Those field types seemed like good candidates when we started almost 7 years
ago (and they do work relatively OK for smaller results and other metadata-like
fields), but over the years, after people started to push more data
through the system, it turned out they are very slow and inefficient for storing
and retrieving large datasets.</p>
<p>The slowness boils down to two main reasons:</p>
<ul>
<li>Field keys need to be escaped. Since <code class="language-plaintext highlighter-rouge">.</code> and <code class="language-plaintext highlighter-rouge">$</code> are special characters
in MongoDB used for querying, they need to be escaped recursively in all the
keys of a dictionary which is to be stored in the database. This can get slow
with large and deeply nested dictionaries (see the sketch after this list).</li>
<li>The mongoengine ORM library we use to interact with MongoDB is known to be
very slow compared to using pymongo directly when working with large documents
(see <a href="https://github.com/MongoEngine/mongoengine/issues/1230">#1230</a> and
<a href="https://stackoverflow.com/questions/35257305/mongoengine-is-very-slow-on-large-documents-compared-to-native-pymongo-usage">https://stackoverflow.com/questions/35257305/mongoengine-is-very-slow-on-large-documents-compared-to-native-pymongo-usage)</a>.
This is mostly due to the complex and slow conversion of types mongoengine
performs when storing and retrieving documents.</li>
</ul>
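<p>To illustrate the first point, here is a minimal sketch of what such recursive
escaping involves. This is purely illustrative and not the actual StackStorm
implementation; the placeholder replacement characters are an assumption:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def escape_chars(value):
    # MongoDB doesn't allow "$" and "." inside document keys, so they
    # need to be replaced with placeholder sequences before saving and
    # swapped back after reading.
    if isinstance(value, dict):
        return {
            key.replace("$", "\uff04").replace(".", "\uff0e"): escape_chars(item)
            for key, item in value.items()
        }
    elif isinstance(value, list):
        return [escape_chars(item) for item in value]
    return value</code></pre></figure>
<p>For a deeply nested, multi-MB dictionary this walk has to touch every key and
allocate a new copy of the whole structure, once on every write and once (in
reverse) on every read.</p>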
<p>Those fields are also bad candidates for what we are using them for. Data we
are storing (results) is a more or less opaque binary blob to the database,
but we are storing it in a very rich field type which supports querying on
field keys and values. We don’t rely on any of that functionality and as
you know, nothing comes for free – querying on dictionary field values
requires more complex data structures internally in MongoDB and in some
cases also indexes. That’s wasteful and unnecessary in our case.</p>
<h3 id="solving-the-problem">Solving the Problem</h3>
<p>Over the years there have been many discussions on how to improve that. A
lot of users said we should switch away from MongoDB.</p>
<p>To begin with, I should say I’m not a big fan of MongoDB, but
the actual database layer itself is not the problem here.</p>
<p>If switching to a different database technology was justified (i.e. the
bottleneck was the database itself and not our code or the libraries we depend
on), then I might say go for it, but the reality is that even then, such a
rewrite is not even close to being realistic.</p>
<p>We do have abstractions / an ORM in place for working with the database
layer, but as anyone who has worked on a software project which has grown
organically over time knows, those abstractions get broken, misused or worked
around over time (for good or bad reasons; that’s not even important for
this discussion).</p>
<p>The reality is that moving to a different database technology would likely
require many person-months of work and we simply don’t have that. The
change would also be much more risky, very disruptive and would likely result
in many regressions and bugs – I have participated in multiple major
rewrites in the past and no matter how many tests you have, how good
the coding practices, the team, etc. are, there will always be bugs and
regressions. Nothing beats miles on the code, and with a rewrite you are
replacing all those miles and battle tested / hardened code with new code
which doesn’t have any of that.</p>
<p>Luckily after a bunch of research and prototyping I was able to come up with a
relatively simple solution which is much less invasive, fully backward
compatible and brings some serious improvements all across the board.</p>
<h3 id="implemented-approach">Implemented Approach</h3>
<p>Now that we know that using <code class="language-plaintext highlighter-rouge">DictField</code> and <code class="language-plaintext highlighter-rouge">DynamicField</code> is slow and
expensive, the challenge is to find a different field type which offers
much better performance.</p>
<p>After prototyping and benchmarking various approaches, I found that
using a binary data field type is the most efficient solution for our problem –
when using that field type, we can avoid all the escaping and, most importantly,
the very slow type conversions inside mongoengine.</p>
<p>This also works very well for us, since execution results, workflow results,
etc. are just an opaque blob to the database layer (we don’t perform any direct
queries on the result values or similar).</p>
<p>That’s all good, but in reality StackStorm results are JSON dictionaries
which can contain all the simple types (dicts, lists, numbers, strings,
booleans – and, as I recently learned, apparently even sets, even though that’s
not an official JSON type; mongoengine and some JSON libraries just
“silently” serialize it to a list). This means we still need to serialize the data
in some fashion which can be deserialized quickly and efficiently on retrieval
from the database.</p>
<p>Based on micro benchmark results, I decided to settle on JSON,
specifically the orjson library, which offers very good performance on large
datasets. So with the new field type changes, the execution result and various
other fields are now serialized as a JSON string and stored in the database as a
binary blob (well, we did add some sugar coating on top of JSON, just to make it
a bit more future proof and allow us to change the format in the future if needed,
and also to implement things such as per-field compression, etc.).</p>
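<p>A minimal sketch of the core idea, assuming mongoengine’s <code class="language-plaintext highlighter-rouge">BinaryField</code> as the
base class and orjson for serialization (the actual field type in StackStorm
also implements the header / versioning and compression support mentioned
above, and handles more edge cases):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import orjson
from mongoengine.fields import BinaryField


class JSONDictField(BinaryField):
    """Store a dictionary as an orjson-serialized binary blob.

    This sidesteps key escaping and mongoengine's per-item type
    conversion, since the database only ever sees an opaque chunk
    of bytes.
    """

    def to_mongo(self, value):
        # Serialize the whole dictionary to bytes in a single call
        return orjson.dumps(value)

    def to_python(self, value):
        # Values read back from the database arrive as bytes
        if isinstance(value, bytes):
            return orjson.loads(value)
        return value</code></pre></figure>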
<p>Technically using some kind of binary format (think Protobuf, msgpack,
flatbuffers, etc.) may be even faster, but those formats are primarily meant
for structured data (think all the fields and types are known up front) and
that’s not the case with our result and other fields – they can contain
arbitrary JSON dictionaries. While you can design a Protobuf structure which
would support our schemaless format, that would add a lot of overhead and very
likely in the end be slower than using JSON + orjson.</p>
<p>So even though the change sounds and looks really simple (remember – simple
code and designs are always better!), in reality it took a lot of time to get
everything to work and the tests to pass (there were a lot of edge cases, code
breaking abstractions, etc.), but luckily all of that is behind us now.</p>
<p>This new field type is now used for various models (execution, live action,
workflow, task execution, trigger instance, etc.).</p>
<p>Most improvements should be seen in the action runner and workflow engine
service layer, but secondary improvements should also be seen in st2api (when
retrieving and listing execution results, etc.) and rules engine (when
evaluating rules against trigger instances with large payloads).</p>
<h3 id="-numbers-numbers-numbers"><a name="numbers"></a> Numbers, numbers, numbers</h3>
<p>Now that we know how the new changes and field type works, let’s look at the
most important thing – actual numbers.</p>
<h4 id="micro-benchmarks">Micro-benchmarks</h4>
<p>I believe all decisions like that should be made and backed up with data, so I
started with some micro benchmarks for my proposed changes.</p>
<p>Those micro benchmarks measure how long it takes to insert and read a document
with a single large field from MongoDB, comparing the old and the new field types.</p>
<p>We also have micro benchmarks which cover more scenarios (think small values,
document with a lot of fields, document with single large field, etc.), but
those are not referenced here.</p>
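<p>To give a feel for the approach, here is a heavily simplified sketch which only
measures the serialization step in isolation (the actual benchmarks also measure
the full database round trip; the dictionary shape here is illustrative):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import json
import timeit

import orjson

# Build a dictionary which is roughly 4 MB when serialized
data = {"key_%s" % i: "x" * 1024 for i in range(4096)}

for name, dumps in [("json", json.dumps), ("orjson", orjson.dumps)]:
    duration = timeit.timeit(lambda: dumps(data), number=10)
    print("%s: %.4f seconds for 10 iterations" % (name, duration))</code></pre></figure>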
<p><strong>1. Database writes</strong></p>
<div class="imginline">
<a href="/images/stackstorm-fast/image1.png" target="_blank"><img src="/images/stackstorm-fast/image1.png" class="inline" /></a>
<span class="image-caption">This screenshot shows that the new field type (json dict field) is
~10x faster over EscapedDynamicField and ~15x over EscapedDictField when saving 4 MB field
value in the database.</span>
</div>
<p><strong>2. Database reads</strong></p>
<div class="imginline">
<a href="/images/stackstorm-fast/image6.png" target="_blank"><img src="/images/stackstorm-fast/image6.png" class="inline" /></a>
<span class="image-caption">This screenshot shows that the new field is about ~7x faster
over EscapedDynamicField and ~40x over EscapedDictField..</span>
</div>
<p>P.S. You should only look at the relative change and not the absolute numbers.
Those benchmarks ran on a relatively powerful server. On smaller VMs
you may see different absolute numbers, but the relative change should be about
the same.</p>
<p>Those micro benchmarks also run daily as part of our CI to prevent regressions
and similar, and you can view the complete results <a href="https://github.com/StackStorm/st2/actions/workflows/microbenchmarks.yaml">here</a>.</p>
<h4 id="end-to-end-load-tests">End to end load tests</h4>
<p>Micro benchmarks always serve as a good starting point, but in the end we care
about the complete picture.</p>
<p>Things never run in isolation, so we need to put all the pieces together and
measure how it performs in real-life scenarios.</p>
<p>To measure this, I utilized some synthetic and some more realistic actions
and workflows.</p>
<p><strong>1. Python runner action</strong></p>
<p>Here we have a simple Python runner action which reads a 4 MB JSON file from
disk and returns it as an execution result.</p>
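<p>The action used here is roughly of this shape (a sketch; the class name, file
path and parameter name are illustrative, not the actual test code):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import json

from st2common.runners.base_action import Action


class ReturnLargeResult(Action):
    def run(self, file_path="/opt/data/large_result_4mb.json"):
        # Read a ~4 MB JSON file from disk and return it as the
        # execution result
        with open(file_path, "r") as fp:
            return json.load(fp)</code></pre></figure>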
<p>Old field type</p>
<div class="imginline">
<a href="/images/stackstorm-fast/image2.png" target="_blank"><img src="/images/stackstorm-fast/image2.png" class="inline" /></a>
</div>
<p>New field type</p>
<div class="imginline">
<a href="/images/stackstorm-fast/image3.png" target="_blank"><img src="/images/stackstorm-fast/image3.png" class="inline" /></a>
</div>
<p>With the old field type it takes 12 seconds and with the new one it takes 1 second.</p>
<p>For the actual duration, please refer to the “log” field. Previous versions of
StackStorm contained a bug and didn’t accurately measure / report action run time –
the end_timestamp – start_timestamp difference only measures how long it took for the
action execution to complete, but it didn’t include the actual time it took to persist
the execution result in the database (and with large results the actual persistence
could easily take many tens of seconds) – and an execution is not actually
completed until the data is persisted in the database.</p>
<p><strong>2. Orquesta Workflow</strong></p>
<p>In this test I utilized an Orquesta workflow which runs a Python runner action
which returns ~650 KB of data; this data is then passed to other tasks in the workflow.</p>
<p>Old field type</p>
<div class="imginline">
<a href="/images/stackstorm-fast/image4.png" target="_blank"><img src="/images/stackstorm-fast/image4.png" class="inline" /></a>
</div>
<p>New field type</p>
<div class="imginline">
<a href="/images/stackstorm-fast/image5.png" target="_blank"><img src="/images/stackstorm-fast/image5.png" class="inline" /></a>
</div>
<p>Here we see that with the old field type it takes 95 seconds and with the new
one it takes 10 seconds.</p>
<p>With workflows we see even larger improvements. The reason is that the
workflow-related models utilize multiple fields of this type and also
perform many more database operations (reads and writes) compared to simple
non-workflow actions.</p>
<hr />
<p>You don’t need to take my word for it. You can download StackStorm v3.5.0 and
test the changes with your workloads.</p>
<p>Some of the early adopters have already tested those changes with their
workloads before StackStorm v3.5.0 was released, and so far the feedback has
been very positive - speed ups in the range of 5-15x.</p>
<h3 id="other-improvements">Other Improvements</h3>
<p>In addition to the database layer improvements, which are the star of the v3.5.0
release, I also made various performance improvements in other parts of the
system:</p>
<ul>
<li>Various API and CLI operations have been sped up by switching to orjson for
serialization and deserialization, and via various other optimizations.</li>
<li>Pack registration has been improved by reducing the number of redundant
queries and similar.</li>
<li>Various code which utilizes <code class="language-plaintext highlighter-rouge">yaml.safe_load</code> has been sped up by switching
to the C versions of those functions (see the sketch after this list).</li>
<li>ISO8601 / RFC3339 datetime string parsing has been sped up by switching to the
<code class="language-plaintext highlighter-rouge">udatetime</code> library.</li>
<li>Service start up time has been sped up by utilizing the <code class="language-plaintext highlighter-rouge">stevedore</code> library more
efficiently.</li>
<li>The WebUI has been substantially sped up - we won’t retrieve and display very
large results by default anymore. In the past, the WebUI would simply freeze the
browser window / tab when viewing the history tab. Do keep in mind that right
now only the execution part has been optimized; in some other scenarios the
WebUI will still try to syntax-highlight very large datasets, which
will result in the browser freezing.</li>
</ul>
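<p>For the <code class="language-plaintext highlighter-rouge">yaml.safe_load</code> item above, the pattern boils down to the following,
falling back to the pure Python loader when PyYAML is not built against libyaml:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import yaml

try:
    # C implementation, available when PyYAML is built against libyaml
    from yaml import CSafeLoader as SafeLoader
except ImportError:
    from yaml import SafeLoader


def fast_safe_load(stream):
    # Equivalent to yaml.safe_load(), but substantially faster when the
    # C loader is available
    return yaml.load(stream, Loader=SafeLoader)</code></pre></figure>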
<h3 id="conclusion">Conclusion</h3>
<p>I’m personally very excited about those changes and hope you are as well.</p>
<p>They help address one of StackStorm’s long-known pain points. And we are not
just talking about 10% here and there, but up to 10-15x improvements for
executions and workflows which work with larger datasets (> 500 KB).</p>
<p>That 10-15x speed up doesn’t just mean executions and workflows will complete
faster; it also means much lower CPU utilization and fewer wasted CPU cycles (as
described above, due to the various conversions, storing large fields in the
database, and to a lesser extent also reading them, was previously a very CPU
intensive task).</p>
<p>So in a sense, you can view those changes as getting additional resources /
servers for free – previously you might have needed to add new pods / servers
running StackStorm services, but with those changes you should be able to get
much better throughput (executions / second) with the existing resources
(you may even be able to scale down!). Hey, who doesn’t like free servers :)</p>
<p>This means many large StackStorm users will be able to save hundreds or even
thousands of dollars per month in infrastructure costs. If this change
benefits you and you can afford it, check the <a href="https://stackstorm.com/donate/">Donate</a> page to see how you can
help the project.</p>
<h3 id="thanks">Thanks</h3>
<p>I would like to thank everyone who has contributed to the performance
improvements in any way.</p>
<p>Thanks to everyone who helped review that massive PR with over 100
commits (Winson, Drew, Jacob, Amanda), and to @guzzijones and others who tested
the changes while they were still in development.</p>
<p>This also includes many of our long-term users such as Nick Maludy,
@jdmeyer3 and others who reported this issue a long time ago and worked
around the limitations when working with larger datasets in various different
ways.</p>
<p>Special thanks also to v3.5.0 release managers <a href="https://github.com/amanda11">Amanda</a> and <a href="https://github.com/winem">Marcel</a>.</p>
<h2 id="consuming-aws-eventbridge-events-inside-stackstorm"><a href="/2019/07/13/consuming-aws-eventbridge-events-in-stackstorm.html">Consuming AWS EventBridge Events inside StackStorm</a></h2>
<p>Amazon Web Services (AWS) recently launched a new product called <a href="https://aws.amazon.com/eventbridge/">Amazon
EventBridge</a>.</p>
<p>EventBridge has a lot of similarities to <a href="https://stackstorm.com">StackStorm</a>, a popular open-source
cross-domain event-driven infrastructure automation platform. In some ways, you
could think of it as a very lightweight and limited version of StackStorm
as a service (SaaS).</p>
<p>In this blog post I will show you how you can extend StackStorm functionality
by consuming thousands of different events which are available through Amazon
EventBridge.</p>
<h3 id="why">Why?</h3>
<p>First of all, you might ask why you would want to do that.</p>
<p><a href="https://exchange.stackstorm.org/">StackStorm Exchange</a> already offers many different packs which allows users
to integrate with various popular projects and services (including AWS). In fact,
StackStorm Exchange integration integration packs expose over 1500 different
actions.</p>
<div class="imginline">
<a href="" target="_blank"><img src="/images/2019-07-13-consuming-aws-eventbridge-events-in-stackstorm/exchange.png" class="inline" /></a>
<span class="image-caption">StackStorm Exchange aka Pack Marketplace.</span>
</div>
<p>Even though StackStorm Exchange offers integration with many different products
and services, those integrations are still limited, especially on the incoming
events / triggers side.</p>
<p>Since event-driven automation is all about the events which can trigger various
actions and business logic, the more events you have access to, the better.</p>
<p>Run a workflow which runs an Ansible provision, creates a CloudFlare DNS record,
adds the new server to Nagios and adds it to the load balancer when a new EC2
instance is started? Check.</p>
<p>Honk your Tesla Model S horn when your satellite passes and establishes a
contact with <a href="https://aws.amazon.com/ground-station/">AWS Ground Station</a>? Check.</p>
<p>Having access to many thousands of different events exposed through EventBridge
opens up almost unlimited automation possibilities.</p>
<p>For a list of some of the events supported by EventBridge, please refer to
<a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/event-types.html">their documentation</a>.</p>
<h3 id="consuming-eventbridge-events-inside-stackstorm">Consuming EventBridge Events Inside StackStorm</h3>
<p>There are many possible ways to integrate StackStorm and EventBridge and
consume EventBridge events inside StackStorm. Some more complex than others.</p>
<p>In this post, I will describe an approach which utilizes AWS Lambda function.</p>
<p>I decided to go with the AWS Lambda approach because it’s simple and straightforward.
It looks like this:</p>
<div class="imginline">
<a href="https://exchange.stackstorm.org/" target="_blank"><img src="/images/2019-07-13-consuming-aws-eventbridge-events-in-stackstorm/eventbridge_stackstorm.png" class="inline" /></a>
<span class="image-caption">AWS / partner event -> AWS EventBridge -> AWS Lambda Function -> StackStorm Webhooks API</span>
</div>
<ol>
<li>Event is generated by AWS service or a partner SaaS product</li>
<li>EventBridge rule matches an event and triggers AWS Lambda Function (rule target)</li>
<li>AWS Lambda Function sends an event to StackStorm using StackStorm Webhooks
API endpoint</li>
</ol>
<h4 id="1-create-stackstorm-rule-which-exposes-a-new-webhook">1. Create StackStorm Rule Which Exposes a New Webhook</h4>
<p>First we need to create a StackStorm rule which exposes a new <code class="language-plaintext highlighter-rouge">eventbridge</code>
webhook. This webhook will be available at the
<code class="language-plaintext highlighter-rouge">https://<example.com>/api/v1/webhooks/eventbridge</code> URL.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">wget https://gist.githubusercontent.com/Kami/204a8f676c0d1de39dc841b699054a68/raw/b3d63fd7749137da76fa35ca1c34b47fd574458d/write_eventbridge_data_to_file.yaml
st2 rule create write_eventbridge_data_to_file.yaml</code></pre></figure>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">name</span><span class="pi">:</span> <span class="s2">"</span><span class="s">write_eventbridge_data_to_file"</span>
<span class="na">pack</span><span class="pi">:</span> <span class="s2">"</span><span class="s">default"</span>
<span class="na">description</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Test</span><span class="nv"> </span><span class="s">rule</span><span class="nv"> </span><span class="s">which</span><span class="nv"> </span><span class="s">writes</span><span class="nv"> </span><span class="s">AWS</span><span class="nv"> </span><span class="s">EventBridge</span><span class="nv"> </span><span class="s">event</span><span class="nv"> </span><span class="s">data</span><span class="nv"> </span><span class="s">to</span><span class="nv"> </span><span class="s">file."</span>
<span class="na">enabled</span><span class="pi">:</span> <span class="no">true</span>
<span class="na">trigger</span><span class="pi">:</span>
<span class="na">type</span><span class="pi">:</span> <span class="s2">"</span><span class="s">core.st2.webhook"</span>
<span class="na">parameters</span><span class="pi">:</span>
<span class="na">url</span><span class="pi">:</span> <span class="s2">"</span><span class="s">eventbridge"</span>
<span class="na">criteria</span><span class="pi">:</span>
<span class="na">trigger.body.detail.eventSource</span><span class="pi">:</span>
<span class="na">pattern</span><span class="pi">:</span> <span class="s2">"</span><span class="s">ec2.amazonaws.com"</span>
<span class="na">type</span><span class="pi">:</span> <span class="s2">"</span><span class="s">equals"</span>
<span class="na">trigger.body.detail.eventName</span><span class="pi">:</span>
<span class="na">pattern</span><span class="pi">:</span> <span class="s2">"</span><span class="s">RunInstances"</span>
<span class="na">type</span><span class="pi">:</span> <span class="s2">"</span><span class="s">equals"</span>
<span class="na">action</span><span class="pi">:</span>
<span class="na">ref</span><span class="pi">:</span> <span class="s2">"</span><span class="s">core.local"</span>
<span class="na">parameters</span><span class="pi">:</span>
<span class="na">cmd</span><span class="pi">:</span> <span class="s2">"</span><span class="s">echo</span><span class="nv"> </span><span class="se">\"</span><span class="s">{{trigger.body}}</span><span class="se">\"</span><span class="nv"> </span><span class="s">>></span><span class="nv"> </span><span class="s">~/st2.webhook.out"</span></code></pre></figure>
<p>You can have as many rules as you want with the same webhook URL parameter.
This means you can utilize the same webhook endpoint to match as many
different events and trigger as many different actions / workflows as you want.</p>
<p>In the <code class="language-plaintext highlighter-rouge">criteria</code> field we filter on events which correspond to new EC2
instance launches (<code class="language-plaintext highlighter-rouge">eventName</code> matches <code class="language-plaintext highlighter-rouge">RunInstances</code> and <code class="language-plaintext highlighter-rouge">eventSource</code>
matches <code class="language-plaintext highlighter-rouge">ec2.amazonaws.com</code>). StackStorm <a href="https://docs.stackstorm.com/rules.html#critera-comparison">rule criteria comparison
operators</a> are quite expressive so you can also get more creative than that.</p>
<p>As this is just an example, we simply write the body of the matched event to
a file on disk (<code class="language-plaintext highlighter-rouge">/home/stanley/st2.webhook.out</code>). In a real life scenario,
you would likely utilize an <a href="https://github.com/StackStorm/orquesta">Orquesta workflow</a> which runs your more or less
complex business logic.</p>
<p>This could involve steps and actions such as:</p>
<ul>
<li>Add new instance to the load-balancer</li>
<li>Add new instance to your monitoring system</li>
<li>Notify Slack channel new instance has been started</li>
<li>Configure your firewall for the new instance</li>
<li>Run Ansible provision on it</li>
<li>etc.</li>
</ul>
<h4 id="2-configure-and-deploy-aws-lambda-function">2. Configure and Deploy AWS Lambda Function</h4>
<p>Once your rule is configured, you need to configure and deploy AWS Lambda
function.</p>
<p>You can find code for the Lambda Python function I wrote here -
<a href="https://github.com/Kami/aws-lambda-event-to-stackstorm">https://github.com/Kami/aws-lambda-event-to-stackstorm</a>.</p>
<p>I decided to use the Lambda Python environment, but the actual handler is very
simple, so I could just as easily have used the JavaScript / Node.js environment instead.</p>
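<p>The gist of the handler is just forwarding the received event to the StackStorm
webhook API. A minimal sketch, assuming the webhook URL and StackStorm API key
are passed in via environment variables (the actual function in the repository
linked above handles configuration and errors more carefully):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import os
import json
import urllib.request


def handler(event, context):
    # Forward the EventBridge event to the StackStorm webhook API,
    # e.g. https://example.com/api/v1/webhooks/eventbridge
    request = urllib.request.Request(
        os.environ["ST2_WEBHOOK_URL"],
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "St2-Api-Key": os.environ["ST2_API_KEY"],
        },
    )
    with urllib.request.urlopen(request) as response:
        return {"status": response.status}</code></pre></figure>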
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">git clone https://github.com/Kami/aws-lambda-event-to-stackstorm.git
<span class="nb">cd </span>aws-lambda-event-to-stackstorm
<span class="c"># Install python-lambda package which takes care of creating and deploying</span>
<span class="c"># Lambda bundle for your</span>
pip <span class="nb">install </span>python-lambda
<span class="c"># Edit config.yaml file and make sure all the required environment variables</span>
<span class="c"># are set - things such as StackStorm Webhook URL, API key, etc.</span>
<span class="c"># vim config.yaml</span>
<span class="c"># Deploy your Lambda function</span>
<span class="c"># For that command to work, you need to have awscli package installed and</span>
<span class="c"># configured on your system (pip install --upgrade --user awscli ; aws configure)</span>
lambda deploy
<span class="c"># You can also test it locally by using the provided event.json sample event</span>
lambda invoke</code></pre></figure>
<p>You can confirm that the function has been deployed by going to the AWS console
or by running the AWS CLI commands:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">aws lambda list-function
aws lambda get-function <span class="nt">--function-name</span> send_event_to_stackstorm</code></pre></figure>
<p>And you can verify that it’s running by tailing the function logs:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">LAMBDA_FUNCTION_NAME</span><span class="o">=</span><span class="s2">"send_event_to_stackstorm"</span>
<span class="nv">LOG_STREAM_NAME</span><span class="o">=</span><span class="sb">`</span>aws logs describe-log-streams <span class="nt">--log-group-name</span> <span class="s2">"/aws/lambda/</span><span class="k">${</span><span class="nv">LAMBDA_FUNCTION_NAME</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--query</span> logStreams[<span class="k">*</span><span class="o">]</span>.logStreamName | jq <span class="s1">'.[0]'</span> | xargs<span class="sb">`</span>
aws logs get-log-events <span class="nt">--log-group-name</span> <span class="s2">"/aws/lambda/</span><span class="k">${</span><span class="nv">LAMBDA_FUNCTION_NAME</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--log-stream-name</span> <span class="s2">"</span><span class="k">${</span><span class="nv">LOG_STREAM_NAME</span><span class="k">}</span><span class="s2">"</span></code></pre></figure>
<h4 id="2-create-aws-eventbridge-rule-which-runs-your-lambda-function">2. Create AWS EventBridge Rule Which Runs Your Lambda Function</h4>
<p>Now we need to create an AWS EventBridge rule which will match the events and
trigger the AWS Lambda function.</p>
<div class="imginline">
<a href="/images/2019-07-13-consuming-aws-eventbridge-events-in-stackstorm/eventbridge_rule.png" target="_blank"><img src="/images/2019-07-13-consuming-aws-eventbridge-events-in-stackstorm/eventbridge_rule.png" class="inline" /></a>
<span class="image-caption">AWS EventBridge Rule Configuration</span>
</div>
<p>As you can see in the screenshot above, I simply configured the rule to send
every event to the Lambda function.</p>
<p>This may be OK for testing, but for production usage, you should narrow this
down to the actual events you are interested in. If you don’t, you might get
surprised by your AWS Lambda bill - even on small AWS accounts, there are tons
of events being constantly generated by various services and account
actions.</p>
<h4 id="3-monitor-your-stackstorm-instance-for-new-aws-eventbridge-events">3. Monitor your StackStorm Instance For New AWS EventBridge Events</h4>
<p>As soon as you configure and enable the rule, new AWS EventBridge events
(trigger instances) should start flowing into your StackStorm deployment.</p>
<p>You can monitor for new instances using <code class="language-plaintext highlighter-rouge">st2 trace list</code> and
<code class="language-plaintext highlighter-rouge">st2 trigger-instance list</code> commands.</p>
<div class="imginline">
<a href="/images/2019-07-13-consuming-aws-eventbridge-events-in-stackstorm/st2_trace_list.png" target="_blank"><img src="/images/2019-07-13-consuming-aws-eventbridge-events-in-stackstorm/st2_trace_list.png" class="inline" /></a>
<span class="image-caption">AWS EventBridge event matched StackStorm rule
criteria and triggered an action execution.</span>
</div>
<p>And as soon as a new EC2 instance is launched, your action which was defined in
the StackStorm rule above will be executed.</p>
<h4 id="conclusion">Conclusion</h4>
<p>This post showed how easy it is to consume AWS EventBridge events inside
StackStorm and tie those two services together.</p>
<p>Gaining access to many thousands of different AWS and AWS partner events
inside StackStorm opens up many new possibilities and allows you to apply
cross-domain automation to many new situations.</p>
<h2 id="migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud"><a href="/2014/01/18/migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud.html">Migrating from Zerigo to Rackspace Cloud DNS using Libcloud</a></h2>
<p>In this blog post I’m going to describe how to migrate from <a href="http://www.zerigo.com/managed-dns">Zerigo DNS</a>
to <a href="http://www.rackspace.com/cloud/dns/">Rackspace Cloud DNS</a> using a ~80 lines long Python script which utilizes
<a href="https://libcloud.apache.org/">Libcloud</a>.</p>
<div class="imginline">
<a href="http://libcloud.apache.org" target="_blank">
<img src="/images/2013-12-11-libcloud-update-key-pair-management-methods-are-now-part-of-the-base-api/libcloud.png" class="inline" /></a>
</div>
<h3 id="background-and-motivation">Background and Motivation</h3>
<p>In September of last year, I wrote about how to <a href="/2013/09/07/exporting-libcloud-dns-zone-to-bind-zone-file-format-and-migrating-between-dns-providers.html">export a Libcloud zone to the
BIND zone format</a> and use the BIND zone file to migrate between DNS
providers.</p>
<p>At that time, my motivation for migrating away from Zerigo was mostly fueled
by a very unreliable service which was a consequence of DDoS attacks and less
than ideal service architecture.</p>
<p>I had a paid Zerigo plan, so back then I only migrated the most important
domains to a different provider. Not long after I had done this, Zerigo
announced that they had <a href="http://www.zerigo.com/article/akamai-dns-partnership">partnered with Akamai</a> and that going forward,
they would outsource the running of the DNS infrastructure to Akamai and, as such,
the service should be much more stable and reliable.</p>
<p>I thought great, I won’t need to migrate the rest of the domains away, but an
unpleasant surprise came earlier this month, when Zerigo announced pricing
changes (see <a href="http://www.zerigo.com/news/notice-zerigo-dns-change-of-plans">1</a>, <a href="http://www.zerigo.com/news/zerigo-price-increase-facts">2</a>, <a href="http://www.zerigo.com/news/on-grandfathering-pre-paid-dns-accounts">3</a> & <a href="https://gist.github.com/Kami/5199908f006383dbfdcc">4</a>).</p>
<p>Previously, I paid <strong>$19 per year</strong>, but with a new plan which
matches my current one, I would need to pay <strong>$25 per month</strong>. That’s with
an existing customer loyalty discount. New customers will need to pay
<strong>$38 per month</strong> (what a great deal: instead of paying 24 times more,
now I need to pay <strong>just</strong> 15 times more!). Yes, you have read this correctly:
that’s more than an order of magnitude more per year than I used to pay
before.</p>
<p>I honestly don’t mind paying for great software and services and I wouldn’t
mind paying a little more if the service improved, but that kind of price
increase is simply too much. That is especially true because all of the ~15
domains that I still have at Zerigo are used to host non-profit and community
websites, and paying $25 per month for them is simply too much.</p>
<h3 id="why-rackspace-cloud-dns">Why Rackspace Cloud DNS?</h3>
<p><em>Disclaimer: I used to work at Rackspace, but I don’t work there anymore and
I’m not affiliated with them in any way.</em></p>
<p>Before I dive further, let’s have a look at why you might want to use Rackspace
Cloud DNS.</p>
<p>The main reason for me to migrate to Rackspace is that they have a decent
API, they are supported in Libcloud and, best of all, the service is totally
free for existing cloud servers customers. On top of that, the service is
supposed to use Anycast.</p>
<p>All of that made it a good fit for hosting my non-profit domains there.</p>
<p>I also need to add that I haven’t used the service a lot before, so I can’t
really say much about the service reliability at this point. Only time and
monitoring will tell how reliable the service really is.</p>
<h3 id="migrating-from-zerigo-dns-to-rackspace-cloud-dns-using-libcloud">Migrating from Zerigo DNS to Rackspace Cloud DNS using Libcloud</h3>
<p>Instead of using Libcloud’s export to BIND zone file functionality, this script
works by talking directly to both of the provider APIs.</p>
<p>The reason for that is that this approach is more robust and makes
performing partial migrations and synchronizations easier. On top of that it
also works with other providers which don’t support importing a BIND zone file.</p>
<p>It’s also important to note that the script relies on some Libcloud fixes which
are currently only available in trunk. As such, you should use <code class="language-plaintext highlighter-rouge">pip</code> to
install latest version from Git inside a virtual environment:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">pip <span class="nb">install</span> <span class="nt">-e</span> git+https://github.com/apache/libcloud.git@trunk#egg<span class="o">=</span>libcloud</code></pre></figure>
<p>After you have done this, you can use the script below to migrate all of
your zones from Zerigo to Rackspace:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">hashlib</span>
<span class="kn">from</span> <span class="nn">libcloud.dns.types</span> <span class="kn">import</span> <span class="n">Provider</span><span class="p">,</span> <span class="n">RecordType</span>
<span class="kn">from</span> <span class="nn">libcloud.dns.providers</span> <span class="kn">import</span> <span class="n">get_driver</span>
<span class="n">ZERIGO_USERNAME</span> <span class="o">=</span> <span class="s">''</span>
<span class="n">ZERIGO_API_KEY</span> <span class="o">=</span> <span class="s">''</span>
<span class="n">RACKSPACE_USERNAME</span> <span class="o">=</span> <span class="s">''</span>
<span class="n">RACKSPACE_API_KEY</span> <span class="o">=</span> <span class="s">''</span>
<span class="n">CONTACT_EMAIL</span> <span class="o">=</span> <span class="s">''</span> <span class="c1"># Rackspace requires a valid email for every domain
</span>
<span class="n">ZONE_TTL</span> <span class="o">=</span> <span class="mi">30</span> <span class="o">*</span> <span class="mi">60</span> <span class="c1"># Default zone TTL (in seconds) which should be used
</span><span class="n">MIN_TTL</span> <span class="o">=</span> <span class="mi">300</span> <span class="c1"># Minim TTL supported by the target provider
</span><span class="n">IGNORED_RECORD_TYPES</span> <span class="o">=</span> <span class="p">[</span><span class="n">RecordType</span><span class="p">.</span><span class="n">NS</span><span class="p">,</span> <span class="n">RecordType</span><span class="p">.</span><span class="n">PTR</span><span class="p">]</span>
<span class="n">source_cls</span> <span class="o">=</span> <span class="n">get_driver</span><span class="p">(</span><span class="n">Provider</span><span class="p">.</span><span class="n">ZERIGO</span><span class="p">)(</span><span class="n">ZERIGO_USERNAME</span><span class="p">,</span> <span class="n">ZERIGO_API_KEY</span><span class="p">)</span>
<span class="n">destination_cls</span> <span class="o">=</span> <span class="n">get_driver</span><span class="p">(</span><span class="n">Provider</span><span class="p">.</span><span class="n">RACKSPACE</span><span class="p">)(</span><span class="n">RACKSPACE_USERNAME</span><span class="p">,</span>
<span class="n">RACKSPACE_API_KEY</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_record_hash</span><span class="p">(</span><span class="n">record</span><span class="p">):</span>
<span class="s">"""
Return a hash for the provided record. This is used to determine if the
record already exists.
"""</span>
<span class="n">record_hash</span> <span class="o">=</span> <span class="n">hashlib</span><span class="p">.</span><span class="n">md5</span><span class="p">(</span><span class="s">'%s-%s-%s'</span> <span class="o">%</span> <span class="p">(</span><span class="n">record</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">record</span><span class="p">.</span><span class="nb">type</span><span class="p">,</span>
<span class="n">record</span><span class="p">.</span><span class="n">data</span><span class="p">)).</span><span class="n">hexdigest</span><span class="p">()</span>
<span class="k">return</span> <span class="n">record_hash</span>
<span class="n">source_zones</span> <span class="o">=</span> <span class="n">source_cls</span><span class="p">.</span><span class="n">list_zones</span><span class="p">()</span>
<span class="n">destination_zones</span> <span class="o">=</span> <span class="n">destination_cls</span><span class="p">.</span><span class="n">list_zones</span><span class="p">()</span>
<span class="n">destination_domains</span> <span class="o">=</span> <span class="p">[</span><span class="n">zone</span><span class="p">.</span><span class="n">domain</span> <span class="k">for</span> <span class="n">zone</span> <span class="ow">in</span> <span class="n">destination_zones</span><span class="p">]</span>
<span class="c1"># 1. Create zones
</span><span class="k">for</span> <span class="n">zone</span> <span class="ow">in</span> <span class="n">source_zones</span><span class="p">:</span>
<span class="k">if</span> <span class="n">zone</span><span class="p">.</span><span class="n">domain</span> <span class="ow">in</span> <span class="n">destination_domains</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Zone "%s" already exists, skipping...'</span> <span class="o">%</span> <span class="p">(</span><span class="n">zone</span><span class="p">.</span><span class="n">domain</span><span class="p">))</span>
<span class="k">continue</span>
<span class="n">extra</span> <span class="o">=</span> <span class="p">{</span><span class="s">'email'</span><span class="p">:</span> <span class="n">CONTACT_EMAIL</span><span class="p">}</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Creating zone: %s'</span> <span class="o">%</span> <span class="p">(</span><span class="n">zone</span><span class="p">.</span><span class="n">domain</span><span class="p">))</span>
<span class="n">destination_cls</span><span class="p">.</span><span class="n">create_zone</span><span class="p">(</span><span class="n">domain</span><span class="o">=</span><span class="n">zone</span><span class="p">.</span><span class="n">domain</span><span class="p">,</span> <span class="n">ttl</span><span class="o">=</span><span class="n">ZONE_TTL</span><span class="p">,</span>
<span class="n">extra</span><span class="o">=</span><span class="n">extra</span><span class="p">)</span>
<span class="n">destination_zones</span> <span class="o">=</span> <span class="n">destination_cls</span><span class="p">.</span><span class="n">list_zones</span><span class="p">()</span>
<span class="n">supported_record_type</span> <span class="o">=</span> <span class="n">destination_cls</span><span class="p">.</span><span class="n">list_record_types</span><span class="p">()</span>
<span class="c1"># 2. Create records
</span><span class="k">for</span> <span class="n">source_zone</span> <span class="ow">in</span> <span class="n">source_zones</span><span class="p">:</span>
<span class="n">destination_zone</span> <span class="o">=</span> <span class="p">[</span><span class="n">zone</span> <span class="k">for</span> <span class="n">zone</span> <span class="ow">in</span> <span class="n">destination_zones</span>
<span class="k">if</span> <span class="n">zone</span><span class="p">.</span><span class="n">domain</span> <span class="o">==</span> <span class="n">source_zone</span><span class="p">.</span><span class="n">domain</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="n">source_records</span> <span class="o">=</span> <span class="n">source_zone</span><span class="p">.</span><span class="n">list_records</span><span class="p">()</span>
<span class="n">destination_records</span> <span class="o">=</span> <span class="n">destination_zone</span><span class="p">.</span><span class="n">list_records</span><span class="p">()</span>
<span class="k">for</span> <span class="n">source_record</span> <span class="ow">in</span> <span class="n">source_records</span><span class="p">:</span>
<span class="c1"># Rackspace doesn't have a special SPF record type
</span> <span class="k">if</span> <span class="n">source_record</span><span class="p">.</span><span class="nb">type</span> <span class="o">==</span> <span class="n">RecordType</span><span class="p">.</span><span class="n">SPF</span><span class="p">:</span>
<span class="n">source_record</span><span class="p">.</span><span class="nb">type</span> <span class="o">=</span> <span class="n">RecordType</span><span class="p">.</span><span class="n">TXT</span>
<span class="n">record_hash</span> <span class="o">=</span> <span class="n">get_record_hash</span><span class="p">(</span><span class="n">source_record</span><span class="p">)</span>
<span class="n">destination_record_hashes</span> <span class="o">=</span> <span class="p">[</span><span class="n">get_record_hash</span><span class="p">(</span><span class="n">record</span><span class="p">)</span> <span class="k">for</span> <span class="n">record</span>
<span class="ow">in</span> <span class="n">destination_records</span><span class="p">]</span>
<span class="k">if</span> <span class="n">source_record</span><span class="p">.</span><span class="n">name</span><span class="p">:</span>
<span class="n">fqdn</span> <span class="o">=</span> <span class="s">'%s.%s'</span> <span class="o">%</span> <span class="p">(</span><span class="n">source_record</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">source_zone</span><span class="p">.</span><span class="n">domain</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">fqdn</span> <span class="o">=</span> <span class="n">source_zone</span><span class="p">.</span><span class="n">domain</span>
<span class="k">if</span> <span class="n">record_hash</span> <span class="ow">in</span> <span class="n">destination_record_hashes</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Record "%s" already exists, skipping...'</span> <span class="o">%</span> <span class="p">(</span><span class="n">fqdn</span><span class="p">))</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">source_record</span><span class="p">.</span><span class="nb">type</span> <span class="ow">in</span> <span class="n">IGNORED_RECORD_TYPES</span><span class="p">:</span>
<span class="k">print</span><span class="p">((</span><span class="s">'Encountered ignored record type (type=%s,name=%s) '</span>
<span class="s">'skipping...'</span><span class="p">)</span> <span class="o">%</span> <span class="p">(</span><span class="n">source_record</span><span class="p">.</span><span class="nb">type</span><span class="p">,</span> <span class="n">fqdn</span><span class="p">))</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="nb">type</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">supported_record_type</span><span class="p">:</span>
<span class="k">print</span><span class="p">((</span><span class="s">'Encountered unsupported record type (type=%s,name=%s)'</span>
<span class="s">', skipping...'</span><span class="p">)</span> <span class="o">%</span> <span class="p">(</span><span class="n">source_record</span><span class="p">.</span><span class="nb">type</span><span class="p">,</span> <span class="n">fqdn</span><span class="p">))</span>
<span class="k">continue</span>
<span class="n">extra</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">ttl</span> <span class="o">=</span> <span class="n">source_record</span><span class="p">.</span><span class="n">extra</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'ttl'</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>
<span class="n">priority</span> <span class="o">=</span> <span class="n">source_record</span><span class="p">.</span><span class="n">extra</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'priority'</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>
<span class="k">if</span> <span class="n">ttl</span><span class="p">:</span>
<span class="k">if</span> <span class="n">ttl</span> <span class="o"><</span> <span class="n">MIN_TTL</span><span class="p">:</span>
<span class="n">ttl</span> <span class="o">=</span> <span class="n">MIN_TTL</span>
<span class="n">extra</span><span class="p">[</span><span class="s">'ttl'</span><span class="p">]</span> <span class="o">=</span> <span class="n">ttl</span>
<span class="k">if</span> <span class="n">priority</span><span class="p">:</span>
<span class="n">extra</span><span class="p">[</span><span class="s">'priority'</span><span class="p">]</span> <span class="o">=</span> <span class="n">priority</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">source_record</span><span class="p">.</span><span class="n">name</span>
<span class="nb">type</span> <span class="o">=</span> <span class="n">source_record</span><span class="p">.</span><span class="nb">type</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">source_record</span><span class="p">.</span><span class="n">data</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Creating a record: %s'</span> <span class="o">%</span> <span class="p">(</span><span class="n">fqdn</span><span class="p">))</span>
<span class="n">destination_zone</span><span class="p">.</span><span class="n">create_record</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">type</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">data</span><span class="p">,</span>
<span class="n">extra</span><span class="o">=</span><span class="n">extra</span><span class="p">)</span></code></pre></figure>
<p>Before proceeding it’s worth knowing that there are some differences between
the providers and some limitations you should be aware of:</p>
<ul>
<li>Zerigo supports more record types. If you use more advanced record types
which are not supported by Rackspace, then Rackspace might not be a good
fit for you.</li>
<li>Rackspace only allows you to create <code class="language-plaintext highlighter-rouge">PTR</code> records for resources (cloud
servers & load balancers) which are hosted in their data centers.</li>
<li>Rackspace doesn’t support the <code class="language-plaintext highlighter-rouge">SPF</code> record type. This is not a big deal since
this record type has been deprecated anyway and <code class="language-plaintext highlighter-rouge">TXT</code> can be used instead.
The script transparently handles the remapping of <code class="language-plaintext highlighter-rouge">SPF</code> to <code class="language-plaintext highlighter-rouge">TXT</code> for you.</li>
<li>The minimum TTL supported by Zerigo is <code class="language-plaintext highlighter-rouge">180</code> seconds and the minimum supported
TTL by Rackspace is <code class="language-plaintext highlighter-rouge">300</code> seconds. If during the migration the script
encounters a TTL smaller than 300 seconds, it simply uses the smallest
possible TTL, which is 300 seconds.</li>
</ul>
<p>To use it, simply plug in your API credentials and run it:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">python migrate_dns_providers.py</code></pre></figure>
<div class="imginline">
<a href="/images/2014-01-18-migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud/zerigo.png" class="fancybox"><img src="/images/2014-01-18-migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud/zerigo.png" class="inline" /></a>
<span class="image-caption">Zerigo control panel.</span>
</div>
<p>If the script for some reason fails half-way through (bad connectivity, API
issues, etc.), it’s safe to run it again since all the operations are
idempotent.</p>
<div class="imginline">
<a href="/images/2014-01-18-migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud/rax.png" class="fancybox"><img src="/images/2014-01-18-migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud/rax.png" class="inline" /></a>
<span class="image-caption">Rackspace Cloud DNS control panel after the migration.</span>
</div>
<p>After you have run the script, you should check if everything looks OK and if
it does, you can go ahead and change the nameservers for your domains to point
to the Rackspace Cloud DNS servers (<code class="language-plaintext highlighter-rouge">dns1.stabletransit.com</code> &
<code class="language-plaintext highlighter-rouge">dns2.stabletransit.com</code>).</p>
<h2 id="designing-a-server-side-application-for-secure-storage-of-access-tokens-and-other-secrets"><a href="/2013/12/27/designing-a-server-side-application-for-secure-storage-of-access-tokens-and-other-secrets.html">Designing a server-side application for secure storage of access tokens and other secrets</a></h2>
<p>One of the projects I’m currently working on is an augmented inbox service. The
primary goal of the service is to allow users to use email in a more efficient
manner and spend less time in their inbox.</p>
<p>It helps the user achieve that by overlaying the inbox with all kinds of important
contextual information about the sender or recipient. This overlay consists of
different insights, metrics and suggestions which are derived from the
historical usage data and real-time information obtained from social media
profiles. You can think of it as <a href="https://rapportive.com/">Rapportive</a> with contextual data.</p>
<div class="imginline">
<img src="/images/2013-12-27-designing-a-server-side-application-for-secure-storage-of-access-tokens-and-other-secrets/extension_prototype.png" class="inline" />
<span class="image-caption">Early prototype of the overlay served as a
Chrome Extension.</span>
</div>
<p>Historical data is obtained by analyzing the user’s inbox. The service works by
connecting to GMail’s IMAP servers using the SASL XOAUTH2 mechanism<sup id="fnref:fn1" role="doc-noteref"><a href="#fn:fn1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>To authenticate with the GMail IMAP servers, the service uses an access token
which is obtained from the Google authorization servers using a user-specific
refresh token and the OAuth 2.0 refresh token flow.</p>
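<p>As a sketch, the refresh token exchange looks roughly like this (using the
requests library and Google’s current OAuth 2.0 token endpoint; the credential
values are placeholders):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import requests


def refresh_access_token(client_id, client_secret, refresh_token):
    # Exchange the long-lived refresh token for a short-lived access
    # token using the OAuth 2.0 refresh token grant
    response = requests.post(
        "https://oauth2.googleapis.com/token",
        data={
            "client_id": client_id,
            "client_secret": client_secret,
            "refresh_token": refresh_token,
            "grant_type": "refresh_token",
        },
    )
    response.raise_for_status()
    return response.json()["access_token"]</code></pre></figure>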
<p>Since the service needs to periodically fetch the user’s emails, it
needs to securely persist the refresh token so it can be re-used
later on.</p>
<p>In this blog post I’m going to describe how I have approached and designed the
server-side application architecture to provide secure storage of refresh
tokens.</p>
<p>Keep in mind that you can use a similar approach to securely store other user
secrets (different service keys, access tokens, credentials, SSH keys and so
on).</p>
<h3 id="background--motivation">Background & Motivation</h3>
<p>It doesn’t really matter what kind of application you are working on; you
should always treat users’ privacy and security as a top priority. This is
especially important if you are handling sensitive, private or secret data
(like refresh tokens in this case).</p>
<p>This means you should dedicate sufficient time and resources to designing,
developing and reviewing your application and making sure it’s secure. Sadly a
lot of people and organizations don’t recognize that (or they are simply
ignorant). Because of that, incidents like the recent <a href="http://nakedsecurity.sophos.com/2013/11/04/anatomy-of-a-password-disaster-adobes-giant-sized-cryptographic-blunder/">Adobe breach</a> (Adobe
encrypted passwords using 3DES in ECB mode, seriously!) and the <a href="http://open.bufferapp.com/buffer-has-been-hacked-here-is-whats-going-on/">Buffer hack</a>
are a lot more catastrophic than they would have been if those companies had stored
credentials properly and in a secure manner.</p>
<h3 id="application-design--security-principles">Application Design & Security Principles</h3>
<p>Here are some of the main security principles I have adhered to while designing
and working on the application:</p>
<ul>
<li>Keep it simple.</li>
<li>Don’t roll your own crypto, use well known, researched and tested principles,
algorithms, methods and libraries.</li>
<li>Use a <a href="http://en.wikipedia.org/wiki/Layered_security">layered approach to security</a>.</li>
<li>Design the services to adhere to the <a href="http://en.wikipedia.org/wiki/Principle_of_least_privilege">principle of least privilege</a>.</li>
<li>To reduce the attack surface, design simple and small services.</li>
<li>Isolate different services and components.</li>
</ul>
<h3 id="quick-note-about-isolation">Quick Note About Isolation</h3>
<p>As noted above, I have used isolation and a layered approach to security. In
this case, the service isolation consists of the following layers:</p>
<ul>
<li>Virtualization (Xen)</li>
<li>Isolated private networks</li>
<li>Software firewall</li>
</ul>
<h3 id="architecture-overview">Architecture Overview</h3>
<p>This section contains a high-level application architecture overview and a
short description of the important services and their roles.</p>
<div class="imginline">
<a href="/images/2013-12-27-designing-a-server-side-application-for-secure-storage-of-access-tokens-and-other-secrets/arch_overview.png" class="fancybox" rel="post">
<img src="/images/2013-12-27-designing-a-server-side-application-for-secure-storage-of-access-tokens-and-other-secrets/arch_overview_thumb.png" class="inline" />
</a>
<span class="image-caption">High level architecture overview.</span>
</div>
<p>Note: Dashed line indicates a TLS connection and a red container indicates an
isolated private network.</p>
<p><strong>Cassandra</strong></p>
<p>The Cassandra cluster is used for storing metrics, insights and other information
about the email account.</p>
<p><strong>PostgreSQL</strong></p>
<p>PostgreSQL is used for storing account meta data. At the moment, this includes
user <-> email account mappings and time zone information for each email
account.</p>
<p><strong>Web Application</strong></p>
<p>Web Application is a simple Django application which is, at the moment,
only responsible for two things:</p>
<ol>
<li>Performing an initial OAuth 2 token exchange and retrieving the refresh
token from the Google authorization servers. This happens when the user
first registers and connects their Gmail account (a condensed sketch of this
exchange follows the list).</li>
<li>Logging the user in. This happens on subsequent requests after the user
has already connected their account.</li>
</ol>
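<p>Here is that condensed sketch of the code-for-token exchange against the Google
authorization servers. The client credentials and redirect URI are placeholders
and error handling is omitted - treat it as an illustration of the flow, not the
application’s actual code:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import requests

def exchange_authorization_code(code):
    # Google redirects back to us with a one-time authorization code which
    # we exchange for a refresh token (plus an initial access token)
    response = requests.post('https://accounts.google.com/o/oauth2/token', data={
        'code': code,
        'client_id': 'your-client-id',
        'client_secret': 'your-client-secret',
        'redirect_uri': 'https://example.com/oauth2/callback',
        'grant_type': 'authorization_code',
    })
    payload = response.json()
    # The refresh token is only returned on the initial authorization, so
    # this is the one chance we get to persist it
    return payload['refresh_token']</code></pre></figure>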
<p><strong>API Service</strong></p>
<p><a href="http://www.tornadoweb.org/en/stable/">Tornado</a> service which exposes a public API for retrieving metrics and
insights from the metrics database (Cassandra).</p>
<p><strong>Workers</strong></p>
<p>This service consists of <a href="http://www.celeryproject.org/">Celery</a> worker processes which run different
jobs (a stripped-down sketch follows the list):</p>
<ol>
<li>Retrieval job - This job fetches email messages from the IMAP servers,
parses them and stores email meta data<sup id="fnref:fn2" role="doc-noteref"><a href="#fn:fn2" class="footnote" rel="footnote">2</a></sup> in the database.</li>
<li>Aggregation jobs - Those jobs aggregate previously retrieved metrics for
the following periods: daily, weekly and monthly.</li>
<li>Processing Jobs - Those jobs process previously aggregated data and infer
all kinds of insights from it.</li>
</ol>
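<p>And here is the stripped-down sketch of how such jobs could be declared with
Celery. The task bodies, module layout and broker URL are made up for
illustration purposes:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from celery import Celery

app = Celery('workers', broker='amqp://guest@localhost//')

@app.task
def retrieve_messages(account_id):
    # 1. Fetch new messages over IMAP, parse them and store the meta data
    pass

@app.task
def aggregate_metrics(account_id, period):
    # 2. Aggregate previously retrieved metrics for the given period
    # ('daily', 'weekly' or 'monthly')
    pass

@app.task
def process_insights(account_id):
    # 3. Infer insights from the previously aggregated data
    pass</code></pre></figure>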
<p>To be able to authenticate and fetch email messages from the IMAP servers, this
service needs to have access to the access token for the email account in
question.</p>
<div class="imginline">
<a href="https://developers.google.com/accounts/docs/OAuth2WebServer" target="_blank"><img src="/images/2013-12-27-designing-a-server-side-application-for-secure-storage-of-access-tokens-and-other-secrets/oauth2_webflow.png" class="inline" /></a>
<span class="image-caption">OAuth2 web application flow. Source: https://developers.google.com/accounts/docs/OAuth2WebServer</span>
</div>
<p>The service obtains an access token by asking the token storage get
service for it (more on that below).</p>
<p>As such, this service is also the only one which has access to the token
storage get service and access tokens.</p>
<p><strong>Token Storage Service</strong></p>
<p>The token storage service actually consists of two separate services. The first one
is responsible solely for securely storing refresh tokens (“token storage set
service”) and the second one (“token storage get service”) is responsible for
retrieving the refresh token from the database, decrypting it using the private key
and using the decrypted refresh token to obtain an access token from the Google
authorization servers.</p>
<p>To reduce the attack surface area, both services are designed to be small and
simple. Both of them are simple Tornado services which expose an HTTP API with
a single method to the consumers.</p>
<p>On top of that, those services run in an isolated private network and only a
small set of services (two to be exact) have access to it. The web application
has access to the set service and the workers have access to the get service.</p>
<p>Authentication to those services is handled using certificates. Certificate
based authentication is not ideal because it adds a lot of overhead and
basically requires you to manage and run your own certificate authority, but
that’s a complex topic for a different post. For now it suffices to say that
we have a simple process in place which works fine for a small number of
certificates.</p>
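<p>As an illustration, requiring clients to present a certificate signed by our
internal CA in a Tornado service mostly boils down to passing the right
<code class="language-plaintext highlighter-rouge">ssl_options</code> to the HTTP server.
A minimal sketch with placeholder paths and a dummy handler:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import ssl

import tornado.httpserver
import tornado.ioloop
import tornado.web

class TokenHandler(tornado.web.RequestHandler):
    def get(self):
        self.write({'status': 'ok'})  # placeholder response

application = tornado.web.Application([(r'/token', TokenHandler)])

# cert_reqs=ssl.CERT_REQUIRED makes the TLS handshake fail for any client
# which doesn't present a certificate signed by our internal CA
server = tornado.httpserver.HTTPServer(application, ssl_options={
    'certfile': '/etc/pki/token-service.crt',
    'keyfile': '/etc/pki/token-service.key',
    'ca_certs': '/etc/pki/internal-ca.crt',
    'cert_reqs': ssl.CERT_REQUIRED,
})
server.listen(8443)
tornado.ioloop.IOLoop.instance().start()</code></pre></figure>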
<p><strong>Token Storage SET Service</strong></p>
<p>This service exposes a method for storing an encrypted refresh token in a local
token database. Refresh tokens are encrypted using public-key / asymmetric
cryptography (more on that below).</p>
<p>The refresh token is encrypted using a public key on the web server, which is
responsible for performing the initial OAuth 2.0 exchange and retrieving the
refresh token from the Google authorization servers.</p>
<div class="imginline">
<a href="/images/2013-12-27-designing-a-server-side-application-for-secure-storage-of-access-tokens-and-other-secrets/token_storage_set_service_flow.png" class="fancybox" rel="post">
<img src="/images/2013-12-27-designing-a-server-side-application-for-secure-storage-of-access-tokens-and-other-secrets/token_storage_set_service_flow_thumb.png" class="inline" />
</a>
<span class="image-caption">Token storage SET service work-flow.</span>
</div>
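<p>Conceptually, the core of the set service is just a couple of lines. Here is a
minimal sketch using KeyCzar (covered in more detail below); the keyset path and
the token database call are hypothetical and the HTTP handler is elided:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from keyczar import keyczar

# The keyset only contains the public half of the RSA key pair, so this
# service can encrypt refresh tokens but can never decrypt them
encrypter = keyczar.Encrypter.Read('/etc/keys/refresh-token-public')

def store_refresh_token(token_db, account_id, refresh_token):
    ciphertext = encrypter.Encrypt(refresh_token)
    token_db.save(account_id, ciphertext)  # hypothetical database call</code></pre></figure>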
<p><strong>Token Storage GET Service</strong></p>
<p>This service exposes a single method for retrieving an access token for an
email account. The service retrieves the access token by first
retrieving the encrypted refresh token from a local token database, decrypting it
using a private key and then using the decrypted refresh token to obtain a
temporary access token from the Google authorization servers.</p>
<p>As you can see above, this service needs to have access to the private key to
be able to decrypt the refresh token. As such, this is the only service which
has access to the private key and the ability to decrypt the refresh token.</p>
<div class="imginline">
<a href="/images/2013-12-27-designing-a-server-side-application-for-secure-storage-of-access-tokens-and-other-secrets/token_storage_get_service_flow.png" class="fancybox" rel="post">
<img src="/images/2013-12-27-designing-a-server-side-application-for-secure-storage-of-access-tokens-and-other-secrets/token_storage_get_service_flow_thumb.png" class="inline" /></a>
<span class="image-caption">Token storage GET service work-flow.</span>
</div>
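<p>The core of the get service is the mirror image and it is the only place in the
whole system where the private key is readable. Again a rough sketch, with a
hypothetical keyset path, database call and token exchange helper:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from keyczar import keyczar

# This keyset contains the private key - only the get service host has
# read access to it
crypter = keyczar.Crypter.Read('/etc/keys/refresh-token-private')

def get_access_token(token_db, account_id):
    ciphertext = token_db.load(account_id)  # hypothetical database call
    refresh_token = crypter.Decrypt(ciphertext)
    # Exchange the decrypted refresh token for a short-lived access token
    # using the OAuth 2.0 refresh token grant (hypothetical helper, see the
    # refresh token grant sketch earlier in the post)
    return exchange_refresh_token(refresh_token)</code></pre></figure>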
<h3 id="public-key-cryptography--keyczar">Public Key Cryptography & Keyczar</h3>
<p>As noted above, public-key cryptography is used to protect and securely store
the refresh tokens. More specifically, the <a href="http://en.wikipedia.org/wiki/RSA_(cryptosystem)">RSA algorithm</a> with a 4096 bit key
is used.</p>
<p>There are multiple ways to do public-key cryptography in Python (a big chunk of
the server side application is written in Python). Some of the more popular
choices include:</p>
<ul>
<li><a href="https://www.dlitz.net/software/pycrypto/">PyCrypto</a></li>
<li><a href="https://pypi.python.org/pypi/M2Crypto">M2Crypto</a></li>
<li><a href="http://www.keyczar.org/">KeyCzar</a></li>
<li>And it looks like in the near future we will have another option available -
<a href="https://cryptography.io/en/latest/">cryptography</a></li>
</ul>
<p>Because of my previous experience and other benefits which are mentioned later
on, I have decided to go with KeyCzar.</p>
<p>KeyCzar is an open source cryptographic toolkit developed by Google with
<a href="https://code.google.com/p/keyczar/wiki/CppTutorial">C++</a>, Java and <a href="https://code.google.com/p/keyczar/wiki/SamplePythonUsage">Python</a> bindings available. One of the main goals of
KeyCzar is to make it easier for developers to use cryptography safely. Unlike
other existing libraries mentioned above, it exposes a higher-level API with
more sane default values which make it harder for developers to use it in a
wrong or potentially harmful way.</p>
<p>On top of that, it also includes a <a href="https://code.google.com/p/keyczar/wiki/KeyczarTool">command-line tool</a> which allows users
to manage (create, rotate, revoke) key files.</p>
<h3 id="key-storage-and-management">Key Storage and Management</h3>
<p>Storage and management of the cryptographic keys is out of scope of this blog
post, but it’s worth noting that it’s also an important topic. All of the
effort you have put into designing and making your application secure doesn’t
matter if you don’t securely store cryptographic keys which are used to protect
your secrets.</p>
<p>If you are using Amazon EC2, you should have a
look at <a href="http://aws.amazon.com/cloudhsm/">CloudHSM</a>. On the other hand, if you are self-hosted, you should
have a look at <a href="https://www.yubico.com/products/yubihsm/">YubiHSM</a>, a secure and cost-effective alternative to other
usually more expensive hardware based security modules.</p>
<p>In the future, <a href="https://github.com/cloudkeep/barbican">Barbican</a> might also prove itself as a viable, lower
security software based alternative.</p>
<h3 id="conclusion">Conclusion</h3>
<p>This time I have mostly focused on the high-level server-side application
architecture, but in the future posts I plan to go into more details about
the following topics:</p>
<ul>
<li>how we handle key management</li>
<li>how we handle isolation via isolated private networks</li>
<li>how we handle client side security in the chrome extension</li>
</ul>
<p>Note: If you think I have done something wrong or something can be further
improved, don’t hesitate to <a href="/about.html#contact-info">contact me</a>.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:fn1" role="doc-endnote">
<p>In a nutshell, the <a href="https://developers.google.com/gmail/xoauth2_protocol">SASL XOAUTH2</a> mechanism allows clients to
authenticate using username + OAuth 2.0 access token instead of using a more
traditional approach of using a username + password. <a href="#fnref:fn1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:fn2" role="doc-endnote">
<p>In the first version, we don’t touch or store the message body. We just
store the following meta data items and header values (if available):
<code class="language-plaintext highlighter-rouge">Message UID</code>, <code class="language-plaintext highlighter-rouge">Subject</code>, <code class="language-plaintext highlighter-rouge">From</code>, <code class="language-plaintext highlighter-rouge">To</code>, <code class="language-plaintext highlighter-rouge">Date</code>, <code class="language-plaintext highlighter-rouge">X-Received</code>, <code class="language-plaintext highlighter-rouge">Received-SPF</code>,
<code class="language-plaintext highlighter-rouge">DKIM-Signature</code>, <code class="language-plaintext highlighter-rouge">Authentication-Results</code>, <code class="language-plaintext highlighter-rouge">Message-ID</code>, <code class="language-plaintext highlighter-rouge">In-Reply-To</code>. On
top of that, this “raw” data is only stored until it’s aggregated (for the
average case this means less than 24 hours) and at most one week. <a href="#fnref:fn2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Libcloud update - Key pair management methods are now part of the base APIhttps://www.tomaz.me/2013/12/11/libcloud-update-key-pair-management-methods-are-now-part-of-the-base-api.html2013-12-11T00:00:00+01:00<h2 id="libcloud-update---key-pair-management-methods-are-now-part-of-the-base-api"><a href="/2013/12/11/libcloud-update-key-pair-management-methods-are-now-part-of-the-base-api.html">Libcloud update - Key pair management methods are now part of the base API</a></h2>
<p>Yesterday I <a href="https://github.com/apache/libcloud/pull/189">merged a Libcloud pull request</a> which promotes SSH key
pair management methods to be part of the base <a href="http://libcloud.apache.org/">Libcloud</a> compute API.</p>
<div class="imginline">
<a href="http://libcloud.apache.org" target="_blank">
<img src="/images/2013-12-11-libcloud-update-key-pair-management-methods-are-now-part-of-the-base-api/libcloud.png" class="inline" /></a>
</div>
<p>In this post I’m going to talk a bit about the project history and evolution
and show how to utilize this new functionality.</p>
<h3 id="history-and-background">History and Background</h3>
<p>Libcloud was originally developed in 2009 at <a href="http://en.wikipedia.org/wiki/Cloudkick">Cloudkick</a> to solve the problem
of talking to multiple different cloud provider APIs.</p>
<p>Later that year, the project <a href="http://incubator.apache.org/projects/libcloud.html">joined the Apache Incubator</a> and in May of 2011
it graduated from the incubator to a top level project.</p>
<div class="imginline">
<a href="http://xkcd.com/927/" target="_blank"><img src="/images/2013-12-11-libcloud-update-key-pair-management-methods-are-now-part-of-the-base-api/standards.png" class="inline" /></a>
<span class="image-caption">An example of how Libcloud did
<strong>not</strong> come to be.</span>
</div>
<p>The first couple of versions were simple and only exposed a small API (~6 methods)
for managing cloud / virtual servers.</p>
<div class="imginline">
<img src="/images/2013-12-11-libcloud-update-key-pair-management-methods-are-now-part-of-the-base-api/libcloud_apis.png" class="inline" />
<span class="image-caption">A list of methods supported in the first few
versions of Libcloud. Source: <a href="http://paul.querna.org/slides/libcloud-2010-06.pdf" target="_blank">
Apache Libcloud @ Open Source Bridge</a> presentation.</span>
</div>
<p>Down the road, the pace of cloud evolution and competition increased and providers
started adding more and more features and new services. On top of that, demand
from our users also grew, so it made sense for us to start thinking about
increasing the project scope and adding support for other cloud services.</p>
<p>As such, <a href="http://mail-archives.apache.org/mod_mbox/libcloud-dev/201105.mbox/%3CBANLkTi%3DLqBidHLHUwAJSAWSzd-qSpad%2BdA%40mail.gmail.com%3E">version 0.5.0</a> was born in 2011. This version represented a very
important milestone for the project. It was the first release which moved away
from only supporting the compute API and added support for managing cloud load
balancers and object storage.</p>
<p>Not long afterwards, <a href="http://mail-archives.apache.org/mod_mbox/libcloud-dev/201111.mbox/%3CCAJMHEmKkRPVeLjJ%2BCeTFU0wrW2QbyOz2bd3HVLi3Ydw283oDKQ%40mail.gmail.com%3E">version 0.6.0</a> which added support for a brand new
DNS API was released.</p>
<p>Since then, we haven’t added support for any other new APIs, but have spent a
lot of time improving the existing functionality, adding new features, adding
support for new providers and improving the library all around.</p>
<p>If you are curious about what we have been working on lately, you should have
a look at the <a href="https://libcloud.readthedocs.org/en/latest/upgrade_notes.html#libcloud-0-14-0">Upgrade Notes</a> and <a href="https://github.com/apache/libcloud/blob/trunk/CHANGES#L3">Changelog</a> for Libcloud 0.14.0,
a new stable version which should be released some time in the next couple of
weeks.</p>
<h3 id="ssh-key-pair-management-methods-promotion">SSH key pair management methods promotion</h3>
<p>Functionality for managing key pairs had already been available in some drivers as
part of the extension methods and arguments for quite some time. In
Libcloud, extension methods expose provider specific functionality and usually
differ from one provider to another.</p>
<p>Not long ago, we spent some time unifying those arguments and methods, but
there were still some minor differences between providers which made
for a not so pleasant experience for our users.</p>
<p>Because of that, I have decided to again evaluate how much sense it makes for
us to promote those methods to be part of the base Libcloud compute API. It’s
important to keep in mind that Libcloud acts as a lowest common denominator
which means that only functionality which is exposed by the majority of providers
supported in Libcloud can be part of the base API.</p>
<p>It turned out that most of the providers we support offer key pair management
functionality which means those methods are indeed a good candidate to be part
of the base API. Because of that, I decided to <a href="http://mail-archives.apache.org/mod_mbox/libcloud-dev/201312.mbox/%3CCAJMHEmKOsFYJZDZQLb_Z2q1Rs8Ke%2B%2BxUNnqqEbPjyTccTgPYHQ%40mail.gmail.com%3E">write up a proposal</a>.</p>
<div class="imginline">
<a href="https://libcloud.readthedocs.org/en/latest/compute/key_pair_management.html" target="_blank"><img src="/images/2013-12-11-libcloud-update-key-pair-management-methods-are-now-part-of-the-base-api/docs.png" class="inline" /></a>
<span class="image-caption">Documentation for the new functionality.</span>
</div>
<p>After some feedback and tweaks to the proposed interface, I implemented
the proposed changes and updated the existing code. To ease the migration and
make it less painful for users who rely on the existing extension methods, I
have decided to deprecate those methods and leave them in place until the next
major release.</p>
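<p>One common way to implement such a soft deprecation (not necessarily verbatim
what Libcloud does internally) is to keep the old <code class="language-plaintext highlighter-rouge">ex_</code>
prefixed methods around as thin wrappers which emit a deprecation warning and
delegate to the new base API method. A rough sketch with made up method names:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import warnings

def deprecated(replacement):
    # Decorator which emits a DeprecationWarning pointing users at the
    # new base API method before delegating to the wrapped function
    def decorator(func):
        def wrapper(*args, **kwargs):
            warnings.warn('%s is deprecated, use %s instead' %
                          (func.__name__, replacement),
                          category=DeprecationWarning)
            return func(*args, **kwargs)
        return wrapper
    return decorator

class ExampleDriver(object):
    def create_key_pair(self, name):
        pass  # new base API method

    @deprecated(replacement='create_key_pair')
    def ex_create_keypair(self, name):
        # Old extension method kept around until the next major release
        return self.create_key_pair(name=name)</code></pre></figure>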
<h3 id="working-with-the-new-api">Working with the new API</h3>
<p>The example below demonstrates how to use the new SSH key pair management methods
which are now part of the base compute API.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">from</span> <span class="nn">pprint</span> <span class="kn">import</span> <span class="n">pprint</span>
<span class="kn">from</span> <span class="nn">libcloud.compute.types</span> <span class="kn">import</span> <span class="n">Provider</span>
<span class="kn">from</span> <span class="nn">libcloud.compute.providers</span> <span class="kn">import</span> <span class="n">get_driver</span>
<span class="n">cls</span> <span class="o">=</span> <span class="n">get_driver</span><span class="p">(</span><span class="n">Provider</span><span class="p">.</span><span class="n">EXOSCALE</span><span class="p">)</span>
<span class="n">driver</span> <span class="o">=</span> <span class="n">cls</span><span class="p">(</span><span class="s">'api key'</span><span class="p">,</span> <span class="s">'api secret key'</span><span class="p">)</span>
<span class="c1"># Create a new key pair. Most providers will return generated private key in
# the response which can be accessed at key_pair.private_key
</span><span class="n">key_pair</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">create_key_pair</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'test-key-pair-1'</span><span class="p">)</span>
<span class="n">pprint</span><span class="p">(</span><span class="n">key_pair</span><span class="p">)</span>
<span class="c1"># Import an existing public key from a file. If you have public key as a
# string, you can use import_key_pair_from_string method instead.
</span><span class="n">key_file_path</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">expanduser</span><span class="p">(</span><span class="s">'~/.ssh/id_rsa_test.pub'</span><span class="p">)</span>
<span class="n">key_pair</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">import_key_pair_from_file</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'test-key-pair-2'</span><span class="p">,</span>
<span class="n">key_file_path</span><span class="o">=</span><span class="n">key_file_path</span><span class="p">)</span>
<span class="n">pprint</span><span class="p">(</span><span class="n">key_pair</span><span class="p">)</span>
<span class="c1"># Retrieve information about previously created key pair
</span><span class="n">key_pair</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">get_key_pair</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'test-key-pair-1'</span><span class="p">)</span>
<span class="n">pprint</span><span class="p">(</span><span class="n">key_pair</span><span class="p">)</span>
<span class="c1"># Delete a key pair we have previously created
</span><span class="n">status</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">delete_key_pair</span><span class="p">(</span><span class="n">key_pair</span><span class="o">=</span><span class="n">key_pair</span><span class="p">)</span>
<span class="n">pprint</span><span class="p">(</span><span class="n">status</span><span class="p">)</span></code></pre></figure>
<p>As you can see, I have used the Exoscale provider in my example, but it should work
exactly the same with other providers which support this functionality.
Currently those providers are Amazon EC2, OpenStack (and other OpenStack based
providers such as Rackspace) and CloudStack (and other CloudStack based
providers such as Exoscale and Ikoula).</p>
<p>For a full list of providers which support this functionality, please refer to
the <a href="https://libcloud.readthedocs.org/en/latest/compute/supported_providers.html#supported-methods-key-pair-management">supported providers / methods page</a>.</p>
<h3 id="conclusion">Conclusion</h3>
<p>I hope the addition of the SSH key pair management methods to the base compute
API will make it even easier for our users to work with multiple providers and
pave the way for the promotion of other methods which will make Libcloud
more suitable for more complex / advanced use cases.</p>
Libcloud and the road to 1.0 releasehttps://www.tomaz.me/2013/10/28/libcloud-and-the-road-to-1-0-release.html2013-10-28T00:00:00+01:00<h2 id="libcloud-and-the-road-to-10-release"><a href="/2013/10/28/libcloud-and-the-road-to-1-0-release.html">Libcloud and the road to 1.0 release</a></h2>
<p>Back in September of 2011, I was a <a href="http://twit.tv/show/floss-weekly/181">guest on FLOSS Weekly</a> where I was
interviewed about <a href="http://libcloud.apache.org/">Libcloud</a>.</p>
<div class="imginline">
<img src="/images/2013-10-28-libcloud-and-the-road-to-1-0-release/libcloud.png" class="inline" />
</div>
<p>If you are not familiar with <a href="http://twit.tv/show/floss-weekly">FLOSS Weekly</a>, it’s a weekly podcast (hence the
name) about <a href="http://en.wikipedia.org/wiki/Gratis_versus_libre">free and libre</a> open source software. I have been listening
to it for a long time (it’s a great way to spend time while you run errands / shop
for groceries / bike to work) and one of my favorite things is that it covers a
very wide range of guests and topics. Guests range from newcomers to the open source
world to people with 15+ years of experience and a long open source contribution
history. Same goes for projects. They range from small hobby projects you might
never have heard of to popular projects with very large communities and
ecosystems such as Arduino and OpenStack Swift.</p>
<h3 id="the-road-to-10">The road to 1.0</h3>
<p>Anyway, let’s get back on topic.</p>
<p>One of the questions was how stable Libcloud is and when users can expect a
1.0 release. At that time, some of the APIs such as DNS and storage had been added
just recently, but the compute API had been stable for quite a while and was used in
production in multiple places.</p>
<p>My answer was something along the lines that for the main part, Libcloud
already is production ready, the 0.x versioning scheme is just an artifact
left over from the past and a 1.0 version should hopefully be released some
time next year.</p>
<p>It has been more than 2 years since then and we have made numerous releases
during that time, but a version 1.0 still hasn’t been released yet.</p>
<p>You might ask why. The closest reason for that is that the documentation was
lacking and we simply hadn’t made the switch yet.</p>
<div class="imginline">
<a href="https://libcloud.readthedocs.org"><img src="/images/2013-10-28-libcloud-and-the-road-to-1-0-release/documentation.png" class="inline" /></a>
<span class="image-caption">New documentation which is available at
<a href="https://libcloud.readthedocs.org">https://libcloud.readthedocs.org</a>.</span>
</div>
<p>The documentation situation has been <a href="https://libcloud.readthedocs.org/en/latest/">improving lately</a>, so a while back I
thought it’s finally time to start working on a 1.0 release.</p>
<h3 id="10-does-it-even-matter">1.0, does it even matter?</h3>
<p>Some of you might ask why switch to 1.0 and not just continue with the 0.x series?</p>
<p>There are multiple reasons for that:</p>
<ul>
<li>
<p>A <code class="language-plaintext highlighter-rouge">1.0</code> release indicates production readiness to a lot of people. I’m
personally more of a rolling release guy and believe that the whole
“production ready” concept is often misleading and the switch is usually made
purely for marketing and political reasons and not technical ones.
In any case, a lot of users still associate 1.0 with production readiness so
it makes sense for us to switch and indicate to them that Libcloud is safe to be
used in production.</p>
</li>
<li>
<p>Move to 1.0 will finally allow us to use <a href="http://semver.org/">semantic versioning</a>. As noted
above, the current versioning scheme is mostly an artifact from the past.
Using semantic versioning will make it easier for our users to understand
what is going on and know what to expect with each release. If you want to
know other reasons, see <a href="https://libcloud.readthedocs.org/en/latest/">this</a> mailing list thread.</p>
</li>
</ul>
<div class="imginline">
<img src="/images/2013-10-28-libcloud-and-the-road-to-1-0-release/semantic_versioning.png" class="inline" />
<span class="image-caption">An example of semantic versioning scheme. Source: http://www.aosabook.org/en/eclipse.html</span>
</div>
<p>Some of you might also say that this switch seems kind of arbitrary. That is true,
but as noted above, base APIs have been stable for a long time and at this
point, we should just do it.</p>
<p>On top of that, Linux kernel did a similar thing with <a href="http://arstechnica.com/information-technology/2011/07/linux-kernel-version-bumped-up-to-30-as-20th-birthday-approaches/">transition from 2.6 to
3.0</a> and if Linux kernel can do it, we can do it as well :P</p>
<h3 id="road-to-10">Road to 1.0</h3>
<p>I would say that at this point the road to 1.0 is pretty short and my goal is
to get the release out some time in the next couple of months.</p>
<p>We are currently working on a 0.14.0 release. This release includes some
pretty big changes and improvements and will most likely be the last release
with backward-incompatible changes before the 1.0 one.</p>
<p>One of the more important changes this release brings is improved support for
providers with multiple regions. For more, see <a href="https://libcloud.readthedocs.org/en/latest/upgrade_notes.html#libcloud-0-14-0">Upgrade notes</a> and
<a href="https://git-wip-us.apache.org/repos/asf?p=libcloud.git;a=blob;f=CHANGES#l3">CHANGES</a> file.</p>
<p>And as far as backward-incompatible changes go, we have been doing our best to
avoid them as much as possible, even before the 1.0 release. Sadly that is not
always possible and we did end up with some backward incompatible changes
in the past, but all of those changes were very small and non-invasive. We also
didn’t press users to update their code as soon as possible and we supported the
old (deprecated) way of doing things for a long time to make the transition
easier.</p>
<p>Another thing which I would also like to see done before 1.0 release is an
improved and more user-friendly website. It is something which has been on my
todo for a long time and I have even started to work on it in the past, but I
was always distracted by other things and never made much progress.</p>
<div class="imginline">
<img src="/images/2013-10-28-libcloud-and-the-road-to-1-0-release/new_website.png" class="inline" />
<span class="image-caption">Sneak peek of a new website. Keep in mind that
this is a very early draft which is likely to change in the near future.</span>
</div>
<p>In any case, Jerry recently started working on an improved design based on
Bootstrap 3. Once the new design is ready, it should be a fairly easy and smooth
ride from then on.</p>
<h3 id="how-can-i-help">How can I help?</h3>
<p>As always, any kind of help and contributions are welcome and appreciated. One
of the things we need the most help with at the moment is documentation and
testing of the 0.14.0 release once it becomes available.</p>
<p>For information on how to contribute, see <a href="https://libcloud.readthedocs.org/en/latest/development.html#contributing">this page</a>.</p>
<h3 id="so-when">So, when?</h3>
<p>As noted above, two main pre-requisites for the 1.0 release are a new website
and a 0.14 release. Both of those things should be available in the near
future.</p>
<p>We obviously do a lot of testing and have a fairly comprehensive test suite,
but nothing is perfect and sadly, some bugs almost always manage to get
overlooked.</p>
<p>I imagine this will also be the case with 0.14.0 release and even more so
since it includes a large number of changes and improvements.</p>
<p>Because of that, I want to give 0.14 enough time in the wild before preparing
a 1.0 release. This means you can expect 1.0 release around 6 - 12 weeks
after 0.14 becomes available.</p>
Migrating from epydoc to Sphinx style docstrings using sed and some command line fuhttps://www.tomaz.me/2013/09/28/migrating-from-epydoc-to-sphinx-style-docstrings-using-sed-and-some-command-line-fu.html2013-09-28T00:00:00+02:00<h2 id="migrating-from-epydoc-to-sphinx-style-docstrings-using-sed-and-some-command-line-fu"><a href="/2013/09/28/migrating-from-epydoc-to-sphinx-style-docstrings-using-sed-and-some-command-line-fu.html">Migrating from epydoc to Sphinx style docstrings using sed and some command line fu</a></h2>
<p>This post describes how to migrate Python API documentation which uses
<a href="http://epydoc.sourceforge.net/">epydoc</a> style docstrings to <a href="http://sphinx-doc.org/">Sphinx</a> format using sed and some command
line fu.</p>
<h3 id="motivation">Motivation</h3>
<p>After a gentle nudge by <a href="http://alexgaynor.net/">Alex Gaynor</a>, we have recently finally started
to work on a task which was long overdue - improving documentation for the
<a href="http://libcloud.apache.org/">Libcloud</a> project.</p>
<p>Improving and updating documentation has been on my todo for a long time, but
I was always too busy and / or had an excuse to work on code or some other
non-documentation related part of the project.</p>
<p>I know there is no good excuse or apology for that, but I don’t want to digress
too much from the original title of this post, so I plan to go into more
details in a separate blog post. For now it suffices to say that we have
already made quite a lot of progress and as always,
<a href="http://ci.apache.org/projects/libcloud/docs/development.html#contributing">your contributions are very much appreciated and welcome</a>.</p>
<div class="imginline">
<a href="https://libcloud.apache.org/docs/"><img src="/images/2013-09-28-migrating-from-epydoc-to-sphinx-style-docstrings-using-sed-and-some-command-line-fu/libcloud_docs.png" class="inline" /></a>
<span class="image-caption">New documentation already looks way better than
the old one.</span>
</div>
<p>This task included writing new documentation and moving existing regular and
API documentation to Sphinx.</p>
<p>Existing documentation was stored in subversion (using Apache CMS) in Markdown
format. The move to Sphinx and reStructuredText was performed manually. The
reason for that is that the existing documentation was pretty poor and lacking
and the move didn’t just involve changing the format, but it also involved
rewriting the text and filling the gaps.</p>
<p>Existing API documentation and docstrings used epytext markup. Unlike the regular
documentation, the API documentation didn’t need rewriting and we just wanted to
migrate to the Sphinx style docstring format so we could use the
<a href="http://sphinx-doc.org/ext/autodoc.html">autodoc extension</a>.</p>
<h3 id="migrating-from-epydoc-to-sphinx-style-docstring-format">Migrating from epydoc to Sphinx style docstring format</h3>
<p>There are multiple ways to approach this task:</p>
<ol>
<li>Write a Sphinx extension which converts epytext tags to Sphinx format on the
fly</li>
<li>Update all the epytext tags in the code</li>
</ol>
<p>I decided to go with #2 and automate it using some command line fu. The reason
for that is that on the fly translation slows things down and, moving forward,
you end up with two styles of docstrings in your code (epytext for old and Sphinx
for new code).</p>
<p>The only downside of the second approach is that it touches a lot of code and in
case you have a lot of open pull requests, this could result in a bunch of merge
conflicts down the road, so keep that in mind.</p>
<p>The script which I used for the migration can be found below:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c">#!/usr/bin/env bash</span>
<span class="c">#</span>
<span class="c"># Script for migrating from epydoc to Sphinx style docstrings.</span>
<span class="c">#</span>
<span class="c"># WARNING: THIS SCRIPT MODIFIES FILES IN PLACE. BE SURE TO BACKUP THEM BEFORE</span>
<span class="c"># RUNNING IT.</span>
<span class="nv">DIRECTORY</span><span class="o">=</span><span class="nv">$1</span>
<span class="nv">SED</span><span class="o">=</span><span class="sb">`</span>which gsed gnused <span class="nb">sed</span><span class="sb">`</span>
<span class="k">for </span>value <span class="k">in</span> <span class="nv">$SED</span>
<span class="k">do
</span><span class="nv">SED</span><span class="o">=</span><span class="k">${</span><span class="nv">value</span><span class="k">}</span>
<span class="nb">break
</span><span class="k">done
if</span> <span class="o">[</span> <span class="o">!</span> <span class="nv">$DIRECTORY</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
</span><span class="nb">echo</span> <span class="s2">"Usage: ./migrate_docstrings.sh <directory with your code>"</span>
<span class="nb">exit </span>1
<span class="k">fi
</span>OLD_VALUES[0]<span class="o">=</span><span class="s1">'@type'</span>
OLD_VALUES[1]<span class="o">=</span><span class="s1">'@keyword'</span>
OLD_VALUES[2]<span class="o">=</span><span class="s1">'@param'</span>
OLD_VALUES[3]<span class="o">=</span><span class="s1">'@return'</span>
OLD_VALUES[4]<span class="o">=</span><span class="s1">'@rtype'</span>
OLD_VALUES[5]<span class="o">=</span><span class="s1">'L{\([^}]\+\)}'</span>
OLD_VALUES[6]<span class="o">=</span><span class="s1">'C{\(int\|float\|str\|list\|tuple\|dict\|bool\|None\|generator\|object\)}'</span>
OLD_VALUES[7]<span class="o">=</span><span class="s1">'@\(ivar\|cvar\|var\)'</span>
NEW_VALUES[0]<span class="o">=</span><span class="s1">':type'</span>
NEW_VALUES[1]<span class="o">=</span><span class="s1">':keyword'</span>
NEW_VALUES[2]<span class="o">=</span><span class="s1">':param'</span>
NEW_VALUES[3]<span class="o">=</span><span class="s1">':return'</span>
NEW_VALUES[4]<span class="o">=</span><span class="s1">':rtype'</span>
NEW_VALUES[5]<span class="o">=</span><span class="s1">':class:`\1`'</span>
NEW_VALUES[6]<span class="o">=</span><span class="s1">'``\1``'</span>
NEW_VALUES[7]<span class="o">=</span><span class="s1">':\1'</span>
<span class="k">for</span> <span class="o">((</span> i <span class="o">=</span> 0 <span class="p">;</span> i < <span class="k">${#</span><span class="nv">OLD_VALUES</span><span class="p">[@]</span><span class="k">}</span> <span class="p">;</span> i++ <span class="o">))</span>
<span class="k">do
</span><span class="nv">old_value</span><span class="o">=</span><span class="k">${</span><span class="nv">OLD_VALUES</span><span class="p">[</span><span class="nv">$i</span><span class="p">]</span><span class="k">}</span>
<span class="nv">new_value</span><span class="o">=</span><span class="k">${</span><span class="nv">NEW_VALUES</span><span class="p">[</span><span class="nv">$i</span><span class="p">]</span><span class="k">}</span>
<span class="nv">cmd</span><span class="o">=</span><span class="s2">"find </span><span class="k">${</span><span class="nv">DIRECTORY</span><span class="k">}</span><span class="s2"> -name '*.py' -type f -print0 | xargs -0 </span><span class="k">${</span><span class="nv">SED</span><span class="k">}</span><span class="s2"> -i -e 's/</span><span class="k">${</span><span class="nv">old_value</span><span class="k">}</span><span class="s2">/</span><span class="k">${</span><span class="nv">new_value</span><span class="k">}</span><span class="s2">/g'"</span>
<span class="nb">echo</span> <span class="s2">"Migrating: </span><span class="k">${</span><span class="nv">old_value</span><span class="k">}</span><span class="s2"> -> </span><span class="k">${</span><span class="nv">new_value</span><span class="k">}</span><span class="s2">"</span>
<span class="nb">eval</span> <span class="s2">"</span><span class="nv">$cmd</span><span class="s2">"</span>
<span class="k">done</span></code></pre></figure>
<p>(script is also available as gist at
<a href="https://gist.github.com/Kami/6734885#file-migrate_docstrings-sh">https://gist.github.com/Kami/6734885</a>)</p>
<p>As you can see, the script is very simple and has some limitations (noted
below), but it worked very well for us. As usual, the <a href="http://en.wikipedia.org/wiki/Pareto_principle">80-20</a> rule also applies
in this case.</p>
<p>Limitations of this script:</p>
<ul>
<li>The script does a very simple search and replace and has no knowledge or context
of the surrounding code and text. This means that if you have some code which
looks like epytext docstrings, this script might unintentionally replace it.</li>
<li>I only added support for the tags we use. As such, the script doesn’t support
all the epytext tags. This shouldn’t be a big deal though. It’s fairly easy
to change it and add support for all of the tags. You can find a list
of all the available tags on <a href="http://epydoc.sourceforge.net/manual-fields.html">this page</a>.</li>
</ul>
135 days of commits and 50+ open source contributions laterhttps://www.tomaz.me/2013/09/26/135-days-of-commits-and-50-plus-open-source-contributions-later.html2013-09-26T00:00:00+02:00<h2 id="135-days-of-commits-and-50-open-source-contributions-later"><a href="/2013/09/26/135-days-of-commits-and-50-plus-open-source-contributions-later.html">135 days of commits and 50+ open source contributions later</a></h2>
<p>It has been more than 2 months since I left Rackspace to work on my own startup.
Those two months have been very busy and many different things have happened.</p>
<p>After a lot of development, brainstorming and working with our potential
customers, David and I have decided to stop working on a project we originally
started to work on after I left Rackspace (<a href="https://www.wadodo.com/presentation/">Wadodo</a>) and focus our effort on two
other projects.</p>
<p>I plan to go into more details why we have decided to do that in a future blog
post, but it suffices to say that we have decided to focus our efforts on
projects in other fields where we have more experience and connections
(personal training, big data and distributed systems).</p>
<div class="imginline">
<a href="https://www.wadodo.com/presentation/">
<img src="/images/2013-09-26-135-days-of-commits-and-50-plus-open-source-contributions-later/wadodo.png" class="inline" />
</a>
<span class="image-caption">Wadodo</span>
</div>
<p>The first of those projects has already been launched. It’s an online marketplace
for personal trainers and athletes called <a href="https://www.coachspree.com/">CoachSpree</a>. The second project is
well underway and should be launched in the near future.</p>
<p>To make those products a reality, I have spent a lot of time on coding and
non-coding (customer development and acquisition, …) tasks. In this post I’m
going to ignore other topics for a moment and focus solely on coding tasks, more
specifically on my open-source contributions.</p>
<p>Why, you might ask?</p>
<p>There are multiple reasons:</p>
<ol>
<li>It’s nice to look back and see things that have been accomplished.</li>
<li>To encourage more people to contribute to open source projects.</li>
<li>To show people there is always time to give back and contribute to open
source projects, even while working crazy hours on a startup.</li>
<li>To push myself to contribute even more in the future.</li>
</ol>
<h3 id="open-source-contributions-are-good-yo">Open source contributions are good, yo!</h3>
<p>During this period I have made more than 50 contributions to more than 30
different open source projects. A big chunk of changes I have contributed were
smaller bug fixes and feature additions, but there were also larger
contributions and feature additions.</p>
<div class="imginline">
<a href="https://github.com/Kami">
<img src="/images/2013-09-26-135-days-of-commits-and-50-plus-open-source-contributions-later/contributions_graph.png" class="inline" />
</a>
<span class="image-caption">Github activity graph (also includes commits to
private repositories)</span>
</div>
<p>It’s also important to keep in mind that the sheer contribution size and the
number of lines of code are not always a good indicator of how much time was actually
spent working on the issue.</p>
<p>Some of the smaller bug fixes I’ve contributed took substantially more time than
other larger contributions. The reason for that is that some of the smaller
contributions were fixes for some really nasty edge cases. And as it usually goes,
debugging nasty edge conditions can be very time consuming and many times it’s
really hard to write a test case which reproduces the issue.</p>
<p>Some projects I have contributed to are listed below. This list excludes
projects where I’m a primary author.</p>
<ul>
<li><a href="https://github.com/codasus/django-location-field">django-location-field</a></li>
<li><a href="https://github.com/jakewins/django-money">django-money</a></li>
<li><a href="https://github.com/jezdez/django-avatar">django-avatar</a></li>
<li><a href="https://github.com/senko/python-video-converter">python-video-converter</a></li>
<li><a href="https://github.com/jsocol/django-waffle">django-waffle</a></li>
<li><a href="https://bitbucket.org/ubernostrum/django-registration/">django-registration</a></li>
<li><a href="https://github.com/jorgebastida/django-dajax/">django-dajax</a></li>
<li><a href="https://github.com/BBC-News/wraith">wraith</a></li>
<li><a href="https://github.com/cyberdelia/django-pipeline">django-pipeline</a></li>
<li><a href="https://github.com/apache/cordova-android">cordova-android</a></li>
<li><a href="https://github.com/paulchakravarti/gmail-sender">gmail-sender</a></li>
<li><a href="https://github.com/paramiko/paramiko">paramiko</a></li>
<li><a href="https://github.com/racker/node-cassandra-client">node-cassandra-client</a></li>
<li><a href="https://github.com/datastax/python-driver">python-driver</a></li>
<li><a href="https://github.com/gotwarlost/istanbul">istanbul</a></li>
<li><a href="https://github.com/ICTO/ansible-jenkins">ansible-jenkins</a></li>
<li><a href="https://github.com/jorgebastida/django-dajax">django-dajax</a></li>
<li><a href="https://github.com/apache/libcloud">libcloud</a></li>
</ul>
<p>As you can see, there is a lot of Python, but there are also contributions in
other languages such as JavaScript (Node.js), Ruby and Bash.</p>
<h3 id="conclusion">Conclusion</h3>
<p>My goal is to continue and hopefully exceed this pace of open source contributions
and giving back to the community in the future.</p>
<p>I also hope this post will inspire other developers to contribute more. I would
especially like to see smaller web agencies here in Slovenia
contribute more.</p>
<p>I know a lot of such companies who rely on many open source projects and
technologies, but they run forks instead of contributing changes back upstream.</p>
<p>Let’s ignore the moral perspective of not giving back for a moment and focus solely
on the cost of forking. At a quick glance and with short-term thinking, forking
might seem like a time saving thing to do. It is true that it might save you
some time in the short term, but in most cases it’s going to result in a lot
of additional work and maintenance headaches in the future.</p>
<p>So keep that in mind next time you fork a project.</p>
10 secrets to sustainable open source communities; great presentation about open source communitieshttps://www.tomaz.me/2013/09/16/great-presentation-about-open-source-communities.html2013-09-16T00:00:00+02:00<h2 id="10-secrets-to-sustainable-open-source-communities-great-presentation-about-open-source-communities"><a href="/2013/09/16/great-presentation-about-open-source-communities.html">10 secrets to sustainable open source communities; great presentation about open source communities</a></h2>
<p>A while back I encountered a great presentation about open source
communities titled <a href="http://www.slideshare.net/eleddy/os-con2013">10 secrets to sustainable open source communities</a> which
was delivered by Elizabeth Leddy at OSCON 2013.</p>
<p>The presentation primarily talks about the author’s experience with the Plone
project and community, but the lessons and observations in it can be applied to
pretty much any open source project out there.</p>
<div class="imginline"><img src="/images/2013-09-16-great-presentation-about-open-source-communities/slide.png" class="inline" />
<span class="image-caption">One of the slides from the presentation</span></div>
<p>Why do I find this presentation so good, you might ask? Here are some highlights
from the presentation:</p>
<ul>
<li>Open source is more than just contributing code</li>
<li>In many cases open source projects outlast relationships and jobs</li>
<li>A community can act as your extended family</li>
<li>People move on (aka life happens) so you should plan accordingly</li>
<li>Measuring community success is important</li>
<li>Diversity is important</li>
<li>Communication is important</li>
<li>Transparency is important</li>
<li>In-person communication and meetups are important</li>
<li>Soft skills are important</li>
<li>Project governance is important</li>
</ul>
<p>For more, I encourage you to go <a href="http://www.slideshare.net/eleddy/os-con2013">check out the presentation</a>.</p>
Exporting Libcloud DNS zone to BIND zone file format and migrating between DNS providershttps://www.tomaz.me/2013/09/07/exporting-libcloud-dns-zone-to-bind-zone-file-format-and-migrating-between-dns-providers.html2013-09-07T00:00:00+02:00<h2 id="exporting-libcloud-dns-zone-to-bind-zone-file-format-and-migrating-between-dns-providers"><a href="/2013/09/07/exporting-libcloud-dns-zone-to-bind-zone-file-format-and-migrating-between-dns-providers.html">Exporting Libcloud DNS zone to BIND zone file format and migrating between DNS providers</a></h2>
<p>Because of the reliability issues and pretty much non-existent and
non-responsive customer service (if you don’t believe me, check <a href="https://twitter.com/search?q=zerigo&src=typd&mode=realtime">Twitter</a>,
which is full of complaints), I migrated some of my domains away from Zerigo to
a different DNS provider. To do that, I wrote a simple Python script which
allows you to export a <a href="http://libcloud.apache.org/">Libcloud</a> <a href="https://ci.apache.org/projects/libcloud/docs/dns/index.html">DNS zone</a> to the BIND zone file format.</p>
<h3 id="motivation--history">Motivation & History</h3>
<p>I have been a Zerigo user for almost 4 years now. One of the primary reasons
why I migrated most of my domains to Zerigo back then was that they
were one of the first DNS providers which offered a simple management REST API.</p>
<p>At the beginning things were working flawlessly, but after the <a href="http://investors.8x8.com/releasedetail.cfm?ReleaseID=585970">acquisition by
8x8</a> things started degrading. Things became especially
bad in the last year or so. During that time Zerigo has been a target of
multiple DDoS attacks which took down a majority or all of their DNS servers
(<a href="http://copperegg.com/zerigo-and-the-ddos-attack-of-july-23/">1</a>, <a href="http://news.softpedia.com/news/DNS-Provider-Zerigo-Hit-by-DDOS-Attack-362771.shtml">2</a>). Those attacks caused major disruptions for a lot of Zerigo
DNS users.</p>
<div class="imginline"><img src="/images/2013-09-07-exporting-libcloud-dns-zone-to-bind-zone-file-format-and-migrating-between-dns-providers/anycast.png" class="inline" />
<span class="image-caption">Anycast provides better resilience against DDoS attacks</span></div>
<p>Recently Zerigo <a href="http://www.zerigo.com/article/improving-zerigo-dns-server-reliability">posted an announcement</a> where they said that they have
implemented multiple measures to improve their DNS servers’ reliability. Sadly,
as demonstrated a couple of weeks ago when they were a target of another
DDoS attack, those measures didn’t help. The problem is that most of the
measures they have implemented are just small patches which don’t address
the root cause. To improve the reliability of their service, the first step they
would need to take is to move all of their DNS servers to <a href="http://en.wikipedia.org/wiki/Anycast">Anycast</a>. Anycast
has long been used by many commercial DNS providers to increase performance and
availability.</p>
<h3 id="exporting-libcloud-dns-zone-to-bind-zone-file-format">Exporting Libcloud DNS zone to BIND zone file format</h3>
<p>Yesterday I decided to migrate more of my domains to a different provider.
To expedite the migration I wrote a simple Python script which can take a
<a href="http://libcloud.apache.org/">Libcloud</a> <a href="https://ci.apache.org/projects/libcloud/docs/dns/index.html">DNS zone</a> and create a BIND zone file for it.</p>
<p>The advantage of this approach over writing a script which uses Libcloud to
directly re-create all the records under a different provider is that it’s more
efficient and as an output you get a file which you can use with any DNS
software or provider which supports the BIND zone file format.</p>
<p>Keep in mind that a similar thing can be achieved by using a <code class="language-plaintext highlighter-rouge">dig</code> tool (<code class="language-plaintext highlighter-rouge">dig
+nocmd example.com any +multiline +noall +answer</code>). The problem with the <code class="language-plaintext highlighter-rouge">dig</code>
approach is that unless zone transfers are enabled for your IP address, you
won’t receive all the records back.</p>
<p>Aforementioned script can be found on <a href="https://github.com/Kami/python-libcloud-dns-to-bind-zone">Github</a>.</p>
<h3 id="usage">Usage</h3>
<p>To use this script and create a BIND zone file, follow the steps below:</p>
<ol>
<li>Install the Python package</li>
</ol>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>pip <span class="nb">install</span> <span class="nt">-e</span> git+https://github.com/Kami/python-libcloud-dns-to-bind-zone@master#egg<span class="o">=</span>libcloud_to_bind</code></pre></figure>
<p>The script is so simple that I haven’t published it to PyPI yet. I plan to
start a discussion on the Libcloud mailing list and if more people find it
useful and are OK with that, I will include this functionality in the core.</p>
<ol>
<li>Take a look at <a href="https://ci.apache.org/projects/libcloud/docs/dns/supported_providers.html">example.py</a> and modify it to suit your needs. For
example:</li>
</ol>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">libcloud.dns.types</span> <span class="kn">import</span> <span class="n">Provider</span>
<span class="kn">from</span> <span class="nn">libcloud.dns.providers</span> <span class="kn">import</span> <span class="n">get_driver</span>
<span class="kn">from</span> <span class="nn">libcloud_to_bind</span> <span class="kn">import</span> <span class="n">libcloud_zone_to_bind_zone_file</span>
<span class="n">DOMAIN_TO_EXPORT</span> <span class="o">=</span> <span class="s">'example.com'</span>
<span class="n">Zerigo</span> <span class="o">=</span> <span class="n">get_driver</span><span class="p">(</span><span class="n">Provider</span><span class="p">.</span><span class="n">ZERIGO</span><span class="p">)</span>
<span class="n">driver</span> <span class="o">=</span> <span class="n">Zerigo</span><span class="p">(</span><span class="s">'email'</span><span class="p">,</span> <span class="s">'api key'</span><span class="p">)</span>
<span class="n">zones</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">list_zones</span><span class="p">()</span>
<span class="n">zone</span> <span class="o">=</span> <span class="p">[</span><span class="n">z</span> <span class="k">for</span> <span class="n">z</span> <span class="ow">in</span> <span class="n">zones</span> <span class="k">if</span> <span class="n">z</span><span class="p">.</span><span class="n">domain</span> <span class="o">==</span> <span class="n">DOMAIN_TO_EXPORT</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">libcloud_zone_to_bind_zone_file</span><span class="p">(</span><span class="n">zone</span><span class="o">=</span><span class="n">zone</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span></code></pre></figure>
<p>Keep in mind that you can replace Zerigo with any other provider <a href="https://ci.apache.org/projects/libcloud/docs/dns/supported_providers.html">supported by
Libcloud</a>.</p>
<ol>
<li>Run the code</li>
</ol>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>pypy example.py</code></pre></figure>
<p>(yes, Libcloud works just fine under <a href="http://pypy.org/">PyPy</a>)</p>
<p>Here is an example output for one of my domains:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="p">;</span> Generated by Libcloud v0.13.0 on 2013-09-07 00:09:09
<span class="nv">$ORIGIN</span> tomaz.me.
<span class="nv">$TTL</span> 900
tomaz.me. 900 IN TXT <span class="s2">"v=spf1 include:_spf.google.com ~all"</span>
tomaz.me. 900 IN A 207.97.227.245
tomaz.me. 900 IN MX 30 aspmx5.googlemail.com.
testsrv.tomaz.me. 900 IN SRV 10 10 333 google.com.
mail._domainkey.atlantis.tomaz.me. 900 IN TXT <span class="s2">"v=DKIM1; k=rsa; t=y; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQC6HeU4PBI+JuEWAe03Bzye1Gs+U2vXhbSloSNbXr9JDWMygyQtCjxN7brHahqFambBtmdQ5VmbukM+HFlKUoaNz7Q97KaKRQg8mDvSmLJkHmAw5PzZJXfzrfkoLmXhN6K4XnwLWJ0BFWPyEPdpwCX8v9v3kB0INJU4hNjwdy/+6wIDAQAB"</span>
www.tomaz.me. 900 IN CNAME kami.github.com.
tomaz.me. 900 IN MX 30 aspmx3.googlemail.com.
google._domainkey.tomaz.me. 900 IN TXT <span class="s2">"v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDNCHa8VeffMv+X/fRkPgHC9MN2Eh5vQqMkWy4e/YnFbWgF1JilL1Yn9nN54A5WV7lZpCTIvuOC2CrQrIcaBpfr+8SjYsjGO91dz8cwgqZkl7mAjKs7nz8U0PsstuI9i4V3LsHC4NVGOirAgnKA4HXVhxGRuyE94+tuNJ6XDLJoNQIDAQAB"</span>
tomaz.me. 900 IN MX 20 alt2.aspmx.l.google.com.
atlantis.tomaz.me. 900 IN TXT <span class="s2">"v=spf1 ip4:178.63.79.14 ip4:178.63.79.48 ip4:178.63.79.49 ip4:178.63.79.50 ip6:2a01:4f8:121:3121::2"</span>
atlantis.tomaz.me. 900 IN PTR atlantis.tomaz.me.
tomaz.me. 900 IN TXT google-site-verification<span class="o">=</span>Rgex8ShgIRWUlb9j0Ivw5uHllb0p9skEdJqkSMqvX_o
test5.tomaz.me. 900 IN AAAA 2620:0:1cfe:face:b00c::3
tomaz.me. 900 IN SPF <span class="s2">"v=spf1 include:_spf.google.com ~all"</span>
tomaz.me. 900 IN MX 20 alt1.aspmx.l.google.com.
atlantis.tomaz.me. 900 IN A 178.63.79.14
test5.tomaz.me. 900 IN A 127.0.0.1
ponies.tomaz.me. 900 IN A 86.58.76.208
tomaz.me. 900 IN MX 30 aspmx2.googlemail.com.
atlantis.tomaz.me. 900 IN AAAA 2a01:4f8:121:3121::2
secure.tomaz.me. 900 IN A 86.58.76.208
tomaz.me. 900 IN MX 10 aspmx.l.google.com.
tomaz.me. 900 IN MX 30 aspmx4.googlemail.com.</code></pre></figure>