Tomaz Muraus' personal blogcomputers, programming, startups and life.https://www.tomaz.me2022-11-26T21:25:27+01:00Tomaz Muraustomaz@tomaz.meMaking StackStorm Fasthttps://www.tomaz.me/2021/07/04/making-stackstorm-fast.html2021-07-04T00:00:00+02:00Tomaz Muraus<h2 id="making-stackstorm-fast"><a href="/2021/07/04/making-stackstorm-fast.html">Making StackStorm Fast</a></h2>
<p>In this post I will describe changes to the StackStorm database abstraction
layer which landed in <a href="https://stackstorm.com/2021/06/29/stackstorm-v3-5-0-released/">StackStorm v3.5.0</a>. Those changes will substantially
speed up action executions and workflow runs for most users.</p>
<div class="imginline">
<a href="http://www.stackstorm.com" target="_blank"><img src="/images/stackstorm-fast/st2_fast_2.png" class="inline" /></a>
</div>
<p>Based on the benchmarks and load testing we have performed, most actions
which return large results and workflows which pass large datasets around
should see speedups in the range of 5-15x.</p>
<p>If you want to learn more about the details you can do that below. Alternatively
if you only care about the numbers, you can go directly to the
<a href="#numbers">Numbers, numbers, numbers</a> section.</p>
<h3 id="background-and-history">Background and History</h3>
<p>Today StackStorm is used for solving a very diverse set of problems – from IT
and infrastructure provisioning to complex CI/CD pipelines, automated remediation,
various data processing pipelines and more.</p>
<p>Solving a lot of those problems requires passing large datasets around – this
usually involves passing around large dictionary objects to the actions (which
can be in the range of many MBs) and then inside the workflow, filtering down
the result object and passing it to other tasks in the workflow.</p>
<p>This works fine when working with small objects, but it starts to break when
larger datasets are passed around (dictionaries over 500 KB).</p>
<p>In fact, passing large results around has been StackStorm’s Achilles’ heel for
many years now (see some of the existing issues –
<a href="https://github.com/StackStorm/st2/issues/3712">#3718</a>,
<a href="https://github.com/StackStorm/st2/issues/4798">#4798</a>,
<a href="https://github.com/StackStorm/st2web/issues/625">#625</a>). Things will still work, but
executions and workflows which handle large datasets will get progressively
slower and waste progressively more CPU cycles – and no one likes slow software
and wasted CPU cycles (looking at you, bitcoin).</p>
<p>One of the more popular workarounds usually involves storing those larger
results / datasets in a third-party system (such as a database) and then querying
that system and retrieving the data inside the action.</p>
<p>There have been many attempts to improve that in the past (see
<a href="https://github.com/StackStorm/st2/pull/4837">#4837</a>,
<a href="https://github.com/StackStorm/st2/pull/4838">#4838</a>,
<a href="https://github.com/StackStorm/st2/pull/4846">#4846</a>) and we did make some smaller
incremental improvements over the years, but most of them yielded improvements
of a few tens of percent at most.</p>
<p>After an almost year-long break from StackStorm due to a busy work and life
situation, I used StackStorm again to scratch my own itch. I noticed the age-old
“large results” problem still hadn’t been solved, so I decided to take a
look at the issue again and try to make more progress on the PR I originally
started more than a year ago (<a href="https://github.com/StackStorm/st2/pull/4846">#4846</a>).</p>
<p>It took many late nights, but I was finally able to make good progress on it.
This should bring substantial speed ups and improvements to all StackStorm
users.</p>
<h3 id="why-the-problem-exists-today">Why the problem exists today</h3>
<p>Before we look into the implemented solution, I want to briefly explain why
StackStorm today is slow and inefficient when working with large datasets.</p>
<p>The primary reason StackStorm is slow when working with large datasets is
that we utilize the <code class="language-plaintext highlighter-rouge">EscapedDictField()</code> and <code class="language-plaintext highlighter-rouge">EscapedDynamicField()</code>
mongoengine field types for storing execution results and workflow state.</p>
<p>Those field types seemed like good candidates when we started almost 7 years
ago (and they do work relatively OK for smaller results and other metadata
like fields), but over the years after people started to push more data
through it, it turned out they are very slow and inefficient for storing and
retrieving large datasets.</p>
<p>The slowness boils down to two main reasons:</p>
<ul>
<li>Field keys need to be escaped. Since <code class="language-plaintext highlighter-rouge">.</code> and <code class="language-plaintext highlighter-rouge">$</code> are special characters
in MongoDB used for querying, they need to be escaped recursively in all the
keys of a dictionary which is to be stored in the database. This can get slow
with large and deeply nested dictionaries.</li>
<li>The mongoengine ORM library we use to interact with MongoDB is known to be
very slow compared to using pymongo directly when working with large documents
(see <a href="https://github.com/MongoEngine/mongoengine/issues/1230">#1230</a> and
<a href="https://stackoverflow.com/questions/35257305/mongoengine-is-very-slow-on-large-documents-compared-to-native-pymongo-usage">this Stack Overflow question</a>).
This is mostly due to the complex and slow conversion of types mongoengine
performs when storing and retrieving documents.</li>
</ul>
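The key-escaping overhead described in the first bullet point can be illustrated with a short sketch. The helper names below are hypothetical and the replacement characters are just one common convention; StackStorm's actual implementation differs in detail:

```python
# Illustrative sketch of the recursive key escaping described above.
# Helper names are hypothetical; StackStorm's real implementation differs.

def escape_chars(key):
    # "." and "$" are special characters in MongoDB queries, so they must
    # be replaced with placeholder characters before a document is stored
    return key.replace("$", "\uff04").replace(".", "\uff0e")

def escape_dict_keys(value):
    # Recurse into nested dicts and lists, escaping every dict key.
    # For large, deeply nested results this walks the entire structure,
    # which is one source of the slowness described above.
    if isinstance(value, dict):
        return {escape_chars(k): escape_dict_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [escape_dict_keys(item) for item in value]
    return value

result = escape_dict_keys({"user.name": {"a$b": [1, {"c.d": 2}]}})
```

Note that the same walk has to be performed in reverse (unescaping) on every read, so the cost is paid on both sides of the round trip.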
<p>Those fields are also bad candidates for what we are using them for. Data we
are storing (results) is a more or less opaque binary blob to the database,
but we are storing it in a very rich field type which supports querying on
field keys and values. We don’t rely on any of that functionality and as
you know, nothing comes for free – querying on dictionary field values
requires more complex data structures internally in MongoDB and in some
cases also indexes. That’s wasteful and unnecessary in our case.</p>
<h3 id="solving-the-problem">Solving the Problem</h3>
<p>Over the years there have been many discussions on how to improve that. A
lot of users said we should switch away from MongoDB.</p>
<p>To begin with, I should say that I’m not a big fan of MongoDB, but
the actual database layer itself is not the problem here.</p>
<p>If switching to a different database technology were justified (i.e. the
bottleneck was the database itself and not our code or the libraries we depend
on), then I might say go for it, but the reality is that even then, such a
rewrite is not even close to being realistic.</p>
<p>We do have abstractions / an ORM in place for working with the database
layer, but as anyone who has worked on a software project which has grown
organically over time knows, those abstractions get broken, misused or worked
around over time (for good or bad reasons; that’s not even important for
this discussion).</p>
<p>The reality is that moving to a different database technology would likely
require many person-months of work and we simply don’t have that. The
change would also be much more risky, very disruptive and likely result
in many regressions and bugs – I have participated in multiple major
rewrites in the past and no matter how many tests you have, how good the
coding practices, the team, etc. are, there will always be bugs and
regressions. Nothing beats miles on the code, and with a rewrite you are
replacing all those miles and battle tested / hardened code with new code
which doesn’t have any of that.</p>
<p>Luckily after a bunch of research and prototyping I was able to come up with a
relatively simple solution which is much less invasive, fully backward
compatible and brings some serious improvements all across the board.</p>
<h3 id="implemented-approach">Implemented Approach</h3>
<p>Now that we know that using <code class="language-plaintext highlighter-rouge">DictField</code> and <code class="language-plaintext highlighter-rouge">DynamicField</code> is slow and
expensive, the challenge is to find a different field type which offers
much better performance.</p>
<p>After prototyping and benchmarking various approaches, I found that
using a binary data field type is the most efficient solution for our problem –
when using that field type, we can avoid all the escaping and, most importantly,
the very slow type conversions inside mongoengine.</p>
<p>This also works very well for us, since execution results, workflow results,
etc. are just an opaque blob to the database layer (we don’t perform any direct
queries on the result values or similar).</p>
<p>That’s all good, but in reality StackStorm results are JSON dictionaries
which can contain all the simple types (dicts, lists, numbers, strings,
booleans – and as I recently learned, apparently even sets, even though that’s
not an official JSON type; mongoengine and some JSON libraries just
“silently” serialize them to a list). This means we still need to serialize the data
in some fashion which can be deserialized quickly and efficiently on retrieval
from the database.</p>
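The "silent" set-to-list coercion mentioned above can be reproduced with the `default` hook that JSON serializers expose. The sketch below uses the stdlib `json` module so it is self-contained; orjson offers the same `default=` callback mechanism:

```python
# Sketch: JSON has no set type, so a serializer either rejects sets or,
# via a "default" callback, silently coerces them to lists -- the same
# kind of silent serialization described above.
import json  # stdlib stand-in for orjson, which has the same "default" hook

def to_jsonable(obj):
    if isinstance(obj, set):
        return sorted(obj)  # coerce the set to a (sorted) list
    raise TypeError(f"unsupported type: {type(obj)!r}")

data = {"hosts": {"b", "a"}}
serialized = json.dumps(data, default=to_jsonable)
```

On deserialization the value comes back as a plain list, so the round trip is lossy with respect to the original Python type.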
<p>Based on micro benchmark results, I decided to settle on JSON,
specifically the orjson library, which offers very good performance on large
datasets. So with the new field type changes, the execution result and various
other fields are now serialized as a JSON string and stored in the database as a
binary blob (well, we did add a thin layer on top of JSON, just to make it
a bit more future proof and allow us to change the format in the future if needed,
and also to implement things such as per-field compression, etc.).</p>
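The "thin layer on top of JSON" idea can be sketched roughly as follows. The header markers and layout here are hypothetical, not StackStorm's actual wire format, and stdlib `json` again stands in for orjson:

```python
# Sketch of the "JSON blob with a small format header" idea described
# above. The header layout is hypothetical, not StackStorm's actual
# format; stdlib json stands in for orjson to keep the sketch runnable.
import json
import zlib

HEADER_JSON = b"j:"   # hypothetical marker: plain JSON payload
HEADER_ZJSON = b"z:"  # hypothetical marker: zlib-compressed JSON payload

def serialize_field(value, compress=False):
    # Serialize the value to JSON bytes and prefix a format marker so the
    # format (or per-field compression) can be changed later
    data = json.dumps(value).encode("utf-8")
    if compress:
        return HEADER_ZJSON + zlib.compress(data)
    return HEADER_JSON + data

def deserialize_field(blob):
    # Dispatch on the header to pick the right decoding path
    header, payload = blob[:2], blob[2:]
    if header == HEADER_ZJSON:
        payload = zlib.decompress(payload)
    return json.loads(payload)

blob = serialize_field({"result": [1, 2, 3]}, compress=True)
roundtripped = deserialize_field(blob)
```

To the database the whole blob is opaque binary data, which is exactly what we want given that we never query on the result's contents.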
<p>Technically using some kind of binary format (think Protobuf, msgpack,
flatbuffers, etc.) may be even faster, but those formats are primarily meant
for structured data (think all the fields and types are known up front) and
that’s not the case with our result and other fields – they can contain
arbitrary JSON dictionaries. While you can design a Protobuf structure which
would support our schemaless format, that would add a lot of overhead and very
likely in the end be slower than using JSON + orjson.</p>
<p>So even though the change sounds and looks really simple (remember – simple
code and designs are always better!) in reality it took a lot of time to get
everything to work and tests to pass (there were a lot of edge cases, code
breaking abstractions, etc.), but luckily all of that is behind us now.</p>
<p>This new field type is now used for various models (execution, live action,
workflow, task execution, trigger instance, etc.).</p>
<p>Most improvements should be seen in the action runner and workflow engine
service layer, but secondary improvements should also be seen in st2api (when
retrieving and listing execution results, etc.) and rules engine (when
evaluating rules against trigger instances with large payloads).</p>
<h3 id="-numbers-numbers-numbers"><a name="numbers"></a> Numbers, numbers, numbers</h3>
<p>Now that we know how the new changes and field type works, let’s look at the
most important thing – actual numbers.</p>
<h4 id="micro-benchmarks">Micro-benchmarks</h4>
<p>I believe all decisions like that should be made and backed up with data so I
started with some micro benchmarks for my proposed changes.</p>
<p>Those micro benchmarks measure how long it takes to insert and read a document
with a single large field from MongoDB, comparing the old and the new field types.</p>
<p>We also have micro benchmarks which cover more scenarios (think small values,
document with a lot of fields, document with single large field, etc.), but
those are not referenced here.</p>
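For illustration, the measurement pattern behind such benchmarks looks roughly like this. The real suite runs against MongoDB; this self-contained sketch only times JSON serialization of a large dict, so the absolute numbers are not comparable – only the "best of N repeats" pattern is:

```python
# Minimal sketch of the micro benchmark measurement pattern (not the
# actual st2 benchmark suite, which runs against MongoDB via pytest).
import json
import time

def best_of(fn, *args, repeat=3):
    # Take the best of several runs to reduce scheduling noise
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# A dict with many medium-sized string values, standing in for a large result
large_doc = {"key_%d" % i: "x" * 100 for i in range(1000)}
duration = best_of(json.dumps, large_doc)
```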
<p><strong>1. Database writes</strong></p>
<div class="imginline">
<a href="/images/stackstorm-fast/image1.png" target="_blank"><img src="/images/stackstorm-fast/image1.png" class="inline" /></a>
<span class="image-caption">This screenshot shows that the new field type (json dict field) is
~10x faster than EscapedDynamicField and ~15x faster than EscapedDictField when saving a 4 MB field
value in the database.</span>
</div>
<p><strong>2. Database reads</strong></p>
<div class="imginline">
<a href="/images/stackstorm-fast/image6.png" target="_blank"><img src="/images/stackstorm-fast/image6.png" class="inline" /></a>
<span class="image-caption">This screenshot shows that the new field is about ~7x faster
than EscapedDynamicField and ~40x faster than EscapedDictField.</span>
</div>
<p>P.S. You should only look at the relative change and not the absolute numbers.
Those benchmarks ran on a relatively powerful server. On smaller VMs
you may see different absolute numbers, but the relative change should be about
the same.</p>
<p>Those micro benchmarks also run daily as part of our CI to prevent regressions,
and you can view the complete results <a href="https://github.com/StackStorm/st2/actions/workflows/microbenchmarks.yaml">here</a>.</p>
<h4 id="end-to-end-load-tests">End to end load tests</h4>
<p>Micro benchmarks always serve as a good starting point, but in the end we care
about the complete picture.</p>
<p>Things never run in isolation, so we need to put all the pieces together and
measure how it performs in real-life scenarios.</p>
<p>To measure this, I utilized some synthetic and some more real-life like actions
and workflows.</p>
<p><strong>1. Python runner action</strong></p>
<p>Here we have a simple Python runner action which reads a 4 MB JSON file from
disk and returns it as an execution result.</p>
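An action along these lines can be sketched as follows. In a real pack the class would subclass `st2common.runners.base_action.Action`; a stand-in base class is used here (and the action is exercised against a small temporary file) so the sketch stays self-contained:

```python
# Sketch of a Python runner action like the one used in this load test.
import json
import os
import tempfile

class Action:  # stand-in for st2common.runners.base_action.Action
    def __init__(self, config=None):
        self.config = config or {}

class ReturnLargeResult(Action):
    def run(self, file_path):
        # The returned dict becomes the execution result, which StackStorm
        # must persist in the database before the execution is complete
        with open(file_path) as fp:
            return json.load(fp)

# Exercise the action against a small temporary JSON file
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fp:
    json.dump({"rows": list(range(5))}, fp)
    path = fp.name
result = ReturnLargeResult().run(file_path=path)
os.unlink(path)
```

With the old field type, persisting the returned dict dominated the execution time once the result grew to multiple megabytes.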
<p>Old field type</p>
<div class="imginline">
<a href="/images/stackstorm-fast/image2.png" target="_blank"><img src="/images/stackstorm-fast/image2.png" class="inline" /></a>
</div>
<p>New field type</p>
<div class="imginline">
<a href="/images/stackstorm-fast/image3.png" target="_blank"><img src="/images/stackstorm-fast/image3.png" class="inline" /></a>
</div>
<p>With the old field type it takes 12 seconds and with the new one it takes 1 second.</p>
<p>For the actual duration, please refer to the “log” field. Previous versions of
StackStorm contained a bug and didn’t accurately measure / report action run time –
end_timestamp – start_timestamp only measures how long it took for the action
execution to complete, but it didn’t include the actual time it took to persist the
execution result in the database (and with large results, actual persistence
could easily take many tens of seconds) – and an execution is not actually
completed until its data is persisted in the database.</p>
<p><strong>2. Orquesta Workflow</strong></p>
<p>In this test I utilized an orquesta workflow which runs a Python runner action
that returns ~650 KB of data; this data is then passed to other tasks in the workflow.</p>
<p>Old field type</p>
<div class="imginline">
<a href="/images/stackstorm-fast/image4.png" target="_blank"><img src="/images/stackstorm-fast/image4.png" class="inline" /></a>
</div>
<p>New field type</p>
<div class="imginline">
<a href="/images/stackstorm-fast/image5.png" target="_blank"><img src="/images/stackstorm-fast/image5.png" class="inline" /></a>
</div>
<p>Here we see that with the old field type it takes 95 seconds and with the new
one it takes 10 seconds.</p>
<p>With workflows we see even larger improvements. The reason is that the
workflow-related models utilize multiple fields of this type and also
perform many more database operations (reads and writes) compared to simple
non-workflow actions.</p>
<hr />
<p>You don’t need to take my word for it. You can download StackStorm v3.5.0 and
test the changes with your workloads.</p>
<p>Some of the early adopters have already tested those changes with their workloads
before StackStorm v3.5.0 was released, and so far the feedback has been very
positive – speedups in the range of 5-15x.</p>
<h3 id="other-improvements">Other Improvements</h3>
<p>In addition to the database layer improvements, which are the highlight of the v3.5.0
release, I also made various performance improvements in other parts of the
system:</p>
<ul>
<li>Various API and CLI operations have been sped up by switching to orjson for
serialization and deserialization, and through various other optimizations.</li>
<li>Pack registration has been improved by reducing the number of redundant
queries and similar.</li>
<li>Various code which utilizes <code class="language-plaintext highlighter-rouge">yaml.safe_load</code> has been sped up by switching
to the C versions of those functions.</li>
<li>ISO8601 / RFC3339 date time string parsing has been sped up by switching to
the <code class="language-plaintext highlighter-rouge">udatetime</code> library.</li>
<li>Service start up time has been sped up by utilizing the <code class="language-plaintext highlighter-rouge">stevedore</code> library more
efficiently.</li>
<li>The WebUI has been substantially sped up – we won’t retrieve and display very
large results by default anymore. In the past, the WebUI would simply freeze the
browser window / tab when viewing the history tab. Do keep in mind that right
now only the execution part has been optimized; in some other scenarios the
WebUI will still try to syntax highlight very large datasets, which
will result in the browser freezing.</li>
</ul>
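The yaml.safe_load item above refers to PyYAML's C-accelerated loader (backed by libyaml), which parses much faster than the pure-Python SafeLoader. The usual pattern, with a fallback for installations without the C extension, looks like this:

```python
# Sketch of the yaml.safe_load speedup mentioned above: use PyYAML's
# C-accelerated CSafeLoader when available, falling back to the
# pure-Python SafeLoader otherwise.
import yaml

try:
    from yaml import CSafeLoader as Loader  # C version, requires libyaml
except ImportError:
    from yaml import SafeLoader as Loader  # pure-Python fallback

def fast_safe_load(stream):
    # Equivalent to yaml.safe_load(), but with the fastest available loader
    return yaml.load(stream, Loader=Loader)

doc = fast_safe_load("name: st2\nenabled: true\n")
```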
<h3 id="conclusion">Conclusion</h3>
<p>I’m personally very excited about those changes and hope you are as well.</p>
<p>They help address one of StackStorm’s long-standing pain points. And we are not
just talking about 10% here and there, but up to 10-15x improvements for
executions and workflows which work with larger datasets (> 500 KB).</p>
<p>That 10-15x speed up doesn’t just mean executions and workflows will complete
faster, but also much lower CPU utilization and less wasted CPU cycles (as
described above, due to the various conversions, storing large fields in the
database and to a lesser extent also reading them, was previously a very CPU
intensive task).</p>
<p>So in a sense, you can view those changes as getting additional resources /
servers for free – previously you might have needed to add new pods / servers
running StackStorm services, but with those changes you should be able to get
much better throughput (executions / second) with the existing resources
(you may even be able to scale down!). Hey, who doesn’t like free servers :)</p>
<p>This means many large StackStorm users will be able to save hundreds
or even thousands of dollars per month in infrastructure costs. If this change
benefits you and you can afford it, check the <a href="https://stackstorm.com/donate/">Donate</a> page to see how you can
help the project.</p>
<h3 id="thanks">Thanks</h3>
<p>I would like to thank everyone who has contributed to the performance
improvements in any way.</p>
<p>Thanks to everyone who helped review that massive PR with over 100
commits (Winson, Drew, Jacob, Amanda), and to @guzzijones and others who tested
the changes while they were still in development.</p>
<p>This also includes many of our long-term users such as Nick Maludy,
@jdmeyer3 and others who reported this issue a long time ago and worked
around the limitations when working with larger datasets in various different
ways.</p>
<p>Special thanks also to v3.5.0 release managers <a href="https://github.com/amanda11">Amanda</a> and <a href="https://github.com/winem">Marcel</a>.</p>
Consuming AWS EventBridge Events inside StackStormhttps://www.tomaz.me/2019/07/13/consuming-aws-eventbridge-events-in-stackstorm.html2019-07-13T00:00:00+02:00Tomaz Muraus<h2 id="consuming-aws-eventbridge-events-inside-stackstorm"><a href="/2019/07/13/consuming-aws-eventbridge-events-in-stackstorm.html">Consuming AWS EventBridge Events inside StackStorm</a></h2>
<p>Amazon Web Services (AWS) recently launched a new product called <a href="https://aws.amazon.com/eventbridge/">Amazon
EventBridge</a>.</p>
<p>EventBridge has a lot of similarities to <a href="https://stackstorm.com">StackStorm</a>, a popular open-source
cross-domain event-driven infrastructure automation platform. In some ways, you
could think of it as a very lightweight and limited version of StackStorm
as a service (SaaS).</p>
<p>In this blog post I will show you how you can extend StackStorm functionality
by consuming thousands of different events which are available through Amazon
EventBridge.</p>
<h3 id="why">Why?</h3>
<p>First of all you might ask why you would want to do that.</p>
<p><a href="https://exchange.stackstorm.org/">StackStorm Exchange</a> already offers many different packs which allow users
to integrate with various popular projects and services (including AWS). In fact,
StackStorm Exchange integration packs expose over 1500 different
actions.</p>
<div class="imginline">
<a href="" target="_blank"><img src="/images/2019-07-13-consuming-aws-eventbridge-events-in-stackstorm/exchange.png" class="inline" /></a>
<span class="image-caption">StackStorm Exchange aka Pack Marketplace.</span>
</div>
<p>Even though StackStorm Exchange offers integration with many different products
and services, those integrations are still limited, especially on the incoming
events / triggers side.</p>
<p>Since event-driven automation is all about the events which can trigger various
actions and business logic, the more events you have access to, the better.</p>
<p>Run a workflow which runs an Ansible provision, creates a CloudFlare DNS record,
adds the new server to Nagios and adds it to the load balancer when a new EC2
instance is started? Check.</p>
<p>Honk your Tesla Model S horn when your satellite passes and establishes a
contact with <a href="https://aws.amazon.com/ground-station/">AWS Ground Station</a>? Check.</p>
<p>Having access to many thousands of different events exposed through EventBridge
opens up almost unlimited automation possibilities.</p>
<p>For a list of some of the events supported by EventBridge, please refer to
<a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/event-types.html">their documentation</a>.</p>
<h3 id="consuming-eventbridge-events-inside-stackstorm">Consuming EventBridge Events Inside StackStorm</h3>
<p>There are many possible ways to integrate StackStorm and EventBridge and
consume EventBridge events inside StackStorm. Some more complex than others.</p>
<p>In this post, I will describe an approach which utilizes AWS Lambda function.</p>
<p>I decided to go with the AWS Lambda approach because it’s simple and straightforward.
It looks like this:</p>
<div class="imginline">
<a href="https://exchange.stackstorm.org/" target="_blank"><img src="/images/2019-07-13-consuming-aws-eventbridge-events-in-stackstorm/eventbridge_stackstorm.png" class="inline" /></a>
<span class="image-caption">AWS / partner event -> AWS EventBridge -> AWS Lambda Function -> StackStorm Webhooks API</span>
</div>
<ol>
<li>Event is generated by AWS service or a partner SaaS product</li>
<li>EventBridge rule matches an event and triggers AWS Lambda Function (rule target)</li>
<li>AWS Lambda Function sends an event to StackStorm using StackStorm Webhooks
API endpoint</li>
</ol>
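Step 3 above can be sketched as a minimal Lambda handler. The real implementation lives in the aws-lambda-event-to-stackstorm repo linked below; in this sketch the `ST2_WEBHOOK_URL` / `ST2_API_KEY` environment variable names and the example URL are my own placeholders, while <code>St2-Api-Key</code> is StackStorm's documented API key header:

```python
# Minimal sketch of a Lambda handler which forwards an EventBridge event
# to StackStorm's webhooks API (env var names are placeholders).
import json
import os
import urllib.request

def build_webhook_request(url, api_key, event):
    # Construct the POST request for the StackStorm webhooks API endpoint.
    # "St2-Api-Key" is StackStorm's API key authentication header.
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json", "St2-Api-Key": api_key},
    )

def handler(event, context):
    # Entry point invoked by AWS Lambda for each matched EventBridge event
    request = build_webhook_request(
        os.environ["ST2_WEBHOOK_URL"], os.environ["ST2_API_KEY"], event
    )
    with urllib.request.urlopen(request) as response:
        return {"status": response.status}

# Example: build (but don't send) a request for a sample event
req = build_webhook_request(
    "https://st2.example.com/api/v1/webhooks/eventbridge",
    "secret-api-key",
    {"detail": {"eventName": "RunInstances"}},
)
```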
<h4 id="1-create-stackstorm-rule-which-exposes-a-new-webhook">1. Create StackStorm Rule Which Exposes a New Webhook</h4>
<p>First we need to create a StackStorm rule which exposes a new <code class="language-plaintext highlighter-rouge">eventbridge</code>
webhook. This webhook will be available through
<code class="language-plaintext highlighter-rouge">https://<example.com>/api/v1/webhooks/eventbridge</code> URL.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">wget https://gist.githubusercontent.com/Kami/204a8f676c0d1de39dc841b699054a68/raw/b3d63fd7749137da76fa35ca1c34b47fd574458d/write_eventbridge_data_to_file.yaml
st2 rule create write_eventbridge_data_to_file.yaml</code></pre></figure>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">name</span><span class="pi">:</span> <span class="s2">"</span><span class="s">write_eventbridge_data_to_file"</span>
<span class="na">pack</span><span class="pi">:</span> <span class="s2">"</span><span class="s">default"</span>
<span class="na">description</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Test</span><span class="nv"> </span><span class="s">rule</span><span class="nv"> </span><span class="s">which</span><span class="nv"> </span><span class="s">writes</span><span class="nv"> </span><span class="s">AWS</span><span class="nv"> </span><span class="s">EventBridge</span><span class="nv"> </span><span class="s">event</span><span class="nv"> </span><span class="s">data</span><span class="nv"> </span><span class="s">to</span><span class="nv"> </span><span class="s">file."</span>
<span class="na">enabled</span><span class="pi">:</span> <span class="no">true</span>
<span class="na">trigger</span><span class="pi">:</span>
<span class="na">type</span><span class="pi">:</span> <span class="s2">"</span><span class="s">core.st2.webhook"</span>
<span class="na">parameters</span><span class="pi">:</span>
<span class="na">url</span><span class="pi">:</span> <span class="s2">"</span><span class="s">eventbridge"</span>
<span class="na">criteria</span><span class="pi">:</span>
<span class="na">trigger.body.detail.eventSource</span><span class="pi">:</span>
<span class="na">pattern</span><span class="pi">:</span> <span class="s2">"</span><span class="s">ec2.amazonaws.com"</span>
<span class="na">type</span><span class="pi">:</span> <span class="s2">"</span><span class="s">equals"</span>
<span class="na">trigger.body.detail.eventName</span><span class="pi">:</span>
<span class="na">pattern</span><span class="pi">:</span> <span class="s2">"</span><span class="s">RunInstances"</span>
<span class="na">type</span><span class="pi">:</span> <span class="s2">"</span><span class="s">equals"</span>
<span class="na">action</span><span class="pi">:</span>
<span class="na">ref</span><span class="pi">:</span> <span class="s2">"</span><span class="s">core.local"</span>
<span class="na">parameters</span><span class="pi">:</span>
<span class="na">cmd</span><span class="pi">:</span> <span class="s2">"</span><span class="s">echo</span><span class="nv"> </span><span class="se">\"</span><span class="s">{{trigger.body}}</span><span class="se">\"</span><span class="nv"> </span><span class="s">>></span><span class="nv"> </span><span class="s">~/st2.webhook.out"</span></code></pre></figure>
<p>You can have as many rules as you want with the same webhook URL parameter.
This means you can utilize the same webhook endpoint to match as many
different events and trigger as many different actions / workflows as you want.</p>
<p>In the <code class="language-plaintext highlighter-rouge">criteria</code> field we filter on events which correspond to new EC2
instance launches (<code class="language-plaintext highlighter-rouge">eventName</code> matches <code class="language-plaintext highlighter-rouge">RunInstances</code> and <code class="language-plaintext highlighter-rouge">eventSource</code>
matches <code class="language-plaintext highlighter-rouge">ec2.amazonaws.com</code>). StackStorm <a href="https://docs.stackstorm.com/rules.html#critera-comparison">rule criteria comparison
operators</a> are quite expressive so you can also get more creative than that.</p>
<p>As this is just an example, we simply write the body of the matched event to
a file on disk (<code class="language-plaintext highlighter-rouge">/home/stanley/st2.webhook.out</code>). In a real life scenario,
you would likely utilize an <a href="https://github.com/StackStorm/orquesta">Orquesta workflow</a> which runs your complex or less
complex business logic.</p>
<p>This could involve steps and actions such as:</p>
<ul>
<li>Add new instance to the load-balancer</li>
<li>Add new instance to your monitoring system</li>
<li>Notify Slack channel new instance has been started</li>
<li>Configure your firewall for the new instance</li>
<li>Run Ansible provision on it</li>
<li>etc.</li>
</ul>
<h4 id="2-configure-and-deploy-aws-lambda-function">2. Configure and Deploy AWS Lambda Function</h4>
<p>Once your rule is configured, you need to configure and deploy the AWS Lambda
function.</p>
<p>You can find code for the Lambda Python function I wrote here -
<a href="https://github.com/Kami/aws-lambda-event-to-stackstorm">https://github.com/Kami/aws-lambda-event-to-stackstorm</a>.</p>
<p>I decided to use the Lambda Python environment, but the actual handler is very
simple so I could just as easily have used the JavaScript / Node.js environment instead.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">git clone https://github.com/Kami/aws-lambda-event-to-stackstorm.git
<span class="nb">cd </span>aws-lambda-event-to-stackstorm
<span class="c"># Install python-lambda package which takes care of creating and deploying</span>
<span class="c"># the Lambda bundle for you</span>
pip <span class="nb">install </span>python-lambda
<span class="c"># Edit config.yaml file and make sure all the required environment variables</span>
<span class="c"># are set - things such as StackStorm Webhook URL, API key, etc.</span>
<span class="c"># vim config.yaml</span>
<span class="c"># Deploy your Lambda function</span>
<span class="c"># For that command to work, you need to have awscli package installed and</span>
<span class="c"># configured on your system (pip install --upgrade --user awscli ; aws configure)</span>
lambda deploy
<span class="c"># You can also test it locally by using the provided event.json sample event</span>
lambda invoke</code></pre></figure>
<p>You can confirm that the function has been deployed by going to the AWS console
or by running AWS CLI command:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">aws lambda list-functions
aws lambda get-function <span class="nt">--function-name</span> send_event_to_stackstorm</code></pre></figure>
<p>And you can verify that it’s running by tailing the function logs:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">LAMBDA_FUNCTION_NAME</span><span class="o">=</span><span class="s2">"send_event_to_stackstorm"</span>
<span class="nv">LOG_STREAM_NAME</span><span class="o">=</span><span class="sb">`</span>aws logs describe-log-streams <span class="nt">--log-group-name</span> <span class="s2">"/aws/lambda/</span><span class="k">${</span><span class="nv">LAMBDA_FUNCTION_NAME</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--query</span> logStreams[<span class="k">*</span><span class="o">]</span>.logStreamName | jq <span class="s1">'.[0]'</span> | xargs<span class="sb">`</span>
aws logs get-log-events <span class="nt">--log-group-name</span> <span class="s2">"/aws/lambda/</span><span class="k">${</span><span class="nv">LAMBDA_FUNCTION_NAME</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--log-stream-name</span> <span class="s2">"</span><span class="k">${</span><span class="nv">LOG_STREAM_NAME</span><span class="k">}</span><span class="s2">"</span></code></pre></figure>
<h4 id="2-create-aws-eventbridge-rule-which-runs-your-lambda-function">3. Create AWS EventBridge Rule Which Runs Your Lambda Function</h4>
<p>Now we need to create an AWS EventBridge rule which will match the events and
trigger the AWS Lambda function.</p>
<div class="imginline">
<a href="/images/2019-07-13-consuming-aws-eventbridge-events-in-stackstorm/eventbridge_rule.png" target="_blank"><img src="/images/2019-07-13-consuming-aws-eventbridge-events-in-stackstorm/eventbridge_rule.png" class="inline" /></a>
<span class="image-caption">AWS EventBridge Rule Configuration</span>
</div>
<p>As you can see in the screenshot above, I simply configured the rule to send
every event to the Lambda function.</p>
<p>This may be OK for testing, but for production usage, you should narrow this
down to the actual events you are interested in. If you don’t, you might get
surprised by your AWS Lambda bill - even on small AWS accounts, there are tons
of events constantly being generated by various services and account
actions.</p>
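<p>For example, a rule which only fires when an EC2 instance enters the running state could use an event pattern along these lines (the pattern format is documented by AWS; the values below are illustrative):</p>

```json
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": ["running"]
  }
}
```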
<h4 id="3-monitor-your-stackstorm-instance-for-new-aws-eventbridge-events">3. Monitor your StackStorm Instance For New AWS EventBridge Events</h4>
<p>As soon as you configure and enable the rule, new AWS EventBridge events
(trigger instances) should start flowing into your StackStorm deployment.</p>
<p>You can monitor for new instances using <code class="language-plaintext highlighter-rouge">st2 trace list</code> and
<code class="language-plaintext highlighter-rouge">st2 trigger-instance list</code> commands.</p>
<div class="imginline">
<a href="/images/2019-07-13-consuming-aws-eventbridge-events-in-stackstorm/st2_trace_list.png" target="_blank"><img src="/images/2019-07-13-consuming-aws-eventbridge-events-in-stackstorm/st2_trace_list.png" class="inline" /></a>
<span class="image-caption">AWS EventBridge event matched StackStorm rule
criteria and triggered an action execution.</span>
</div>
<p>And as soon as a new EC2 instance is launched, the action defined in
the StackStorm rule above will be executed.</p>
<h4 id="conclusion">Conclusion</h4>
<p>This post showed how easy it is to consume AWS EventBridge events inside
StackStorm and tie those two services together.</p>
<p>Gaining access to many thousands of different AWS and AWS partner events
inside StackStorm opens up many new possibilities and allows you to apply
cross-domain automation to many new situations.</p>
Our airport setup (weather station, cameras and LiveATC audio feed)https://www.tomaz.me/2017/09/18/our-airport-setup-weather-station-cameras-liveatc-feed.html2017-09-18T00:00:00+02:00Tomaz Muraus<h2 id="our-airport-setup-weather-station-cameras-and-liveatc-audio-feed"><a href="/2017/09/18/our-airport-setup-weather-station-cameras-liveatc-feed.html">Our airport setup (weather station, cameras and LiveATC audio feed)</a></h2>
<p>In this post I describe how I set up a weather station, live camera feed and
liveatc.net audio feed for our little airstrip.</p>
<div class="imginline">
<a href="https://www.aeroklub-zagorje.si/" target="_blank"><img src="/images/2017-09-18-our-airport-setup-weather-station-cameras-liveatc-feed/ak_website.jpg" class="inline" /></a>
<span class="image-caption">Real-time camera feed and weather information displayed on the club website.</span>
</div>
<h3 id="background">Background</h3>
<p>A while back I moved from Ljubljana to a small town around 30 km East of
Ljubljana. Before I moved here, I used to fly small single engine planes (for
fun) out of <a href="https://en.wikipedia.org/wiki/Ljubljana_Jo%C5%BEe_Pu%C4%8Dnik_Airport">Ljubljana Airport</a> and <a href="https://en.wikipedia.org/wiki/Portoro%C5%BE_Airport">Portoroz Airport</a>.</p>
<p>This means that both of those airports are now too far away to fly out of
regularly. With no traffic, it would take around 1 hour to get to Ljubljana
Airport and around 2 hours to get to Portoroz Airport. Those two hours can
easily turn into 3 hours during peak summer time when the highway is full of
holiday travelers (done that once, no plans to repeat it).</p>
<p>Regular flying is very important to stay current and be a safe pilot so I needed
to find a new airport which is closer by. Luckily there are two airports in
the 30 minutes driving distance - <a href="https://www.aeroklub-zagorje.si/">Airport Zagorje ob Savi</a><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> and <a href="https://www.lk-sentvid.com/">Airport
Sentvid Pri Sticni</a>.</p>
<div class="imginline">
<a href="https://www.aeroklub-zagorje.si/" target="_blank"><img src="/images/2017-09-18-our-airport-setup-weather-station-cameras-liveatc-feed/zagorje_steza.png" class="inline" /></a>
<span class="image-caption">Zagorje ob Savi Airport (Vzletisce Ruardi).</span>
</div>
<p>Both of them are small, un-towered general aviation airports with a relatively
short grass runway.</p>
<p>For some reason, I decided to visit the Zagorje ob Savi Airport and its club
first. There I met a bunch of dedicated, friendly and welcoming people. They
were the main reason I decided to make this my new “home airport”.</p>
<p>Both of the airports I used to fly from were bigger, towered international
airports (by Slovenian standards - in the global scheme of things, they are still
really small and low-traffic airports). Compared to those airports, this small
airport also feels a lot more casual, social and homey[^2].</p>
<p>As a big DIY person, I really love connecting / automating / improving things
with technology, so I also wanted to bring some of those improvements to the
airport.[^7]</p>
<p>In this post you can learn a little about our setup - notably our weather
station, web camera feed and LiveATC audio feed.</p>
<h3 id="internet-and-networking-setup">Internet and Networking Setup</h3>
<p>To be able to add some technological improvements to the airport, we first
needed an internet connection. Sadly there is no fixed internet connection at
the airport, but there is good 3G / 4G coverage so I decided to re-purpose one
of my old Android phones to act as an access point.</p>
<p>I purchased a local prepaid plan with 50 GB monthly data cap and utilized
tethering functionality on an Android phone. So far, the whole setup has been
online for almost a month and we haven’t encountered any issues yet.</p>
<p>Based on the data usage calculations I did[^4], 50 GB should also be plenty.</p>
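<p>The estimate is easy to sanity-check with quick arithmetic. A sketch, assuming roughly one ~100 KB image upload every 10 seconds across the cameras plus the 16 kbps audio stream (my round numbers, not exact measurements):</p>

```python
# Back-of-the-envelope check of the data usage estimate.
SECONDS_PER_DAY = 24 * 60 * 60

image_uploads_per_day = SECONDS_PER_DAY // 10          # one upload per 10 s
image_bytes_per_day = image_uploads_per_day * 100_000  # ~100 KB per image

audio_bytes_per_day = (16_000 / 8) * SECONDS_PER_DAY   # 16 kbps -> 2 KB/s

total_gb_per_day = (image_bytes_per_day + audio_bytes_per_day) / 1e9
total_gb_per_month = total_gb_per_day * 31

print(f"{total_gb_per_day:.2f} GB/day, {total_gb_per_month:.1f} GB/month")
# → 1.04 GB/day, 32.1 GB/month
```

That comes in comfortably under the 50 GB monthly cap, with headroom for SSH traffic and the occasional burst.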
<p>Because the devices are behind NAT and all the incoming connections and ports
are blocked, I set up an SSH tunnel and port forwarding from a Raspberry Pi
which is behind NAT to an outside cloud server to which I have access. This way
I can tweak and change various settings, without needing to be physically
present at the airfield.</p>
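<p>The reverse tunnel itself boils down to a single <code>ssh</code> invocation which a small script (or a tool such as autossh) keeps alive. A sketch, with hypothetical host names and ports:</p>

```python
import subprocess
import time


def build_tunnel_cmd(remote_user: str, remote_host: str,
                     remote_port: int = 2222, local_port: int = 22) -> list:
    """Build the ssh command for a reverse tunnel: connections to
    remote_host:remote_port get forwarded back to this machine's SSH port."""
    return [
        "ssh", "-N",                     # no remote command, just forward ports
        "-o", "ServerAliveInterval=30",  # detect dead connections quickly
        "-o", "ExitOnForwardFailure=yes",
        "-R", f"{remote_port}:localhost:{local_port}",
        f"{remote_user}@{remote_host}",
    ]


def keep_tunnel_alive() -> None:
    """Restart the tunnel whenever it drops (e.g. when the 4G link flaps).

    Call this from e.g. a systemd service; it loops forever."""
    while True:
        subprocess.call(build_tunnel_cmd("tunnel", "cloud.example.com"))
        time.sleep(10)
```

With this in place, `ssh -p 2222 pi@cloud.example.com` on the cloud server reaches the Raspberry Pi at the airfield.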
<h3 id="weather-station">Weather Station</h3>
<p>Weather is one of the most important factors for flying (especially for small
single engine planes in VFR conditions) so getting a weather station was the
first thing I did.</p>
<p>It’s also worth noting that the airfield already had a windsock. Since
wind is a very important factor in flying, a windsock is one of the mandatory
things a place needs to be officially classified as an airfield or an airport.</p>
<div class="imginline">
<img src="/images/2017-09-18-our-airport-setup-weather-station-cameras-liveatc-feed/windsock.jpg" class="inline" />
</div>
<p>I did some research and I decided to purchase a <a href="http://www.foshk.com/Wifi_Weather_Station/WH2900.html">Fine Offset WH2900</a> weather
station.</p>
<p>Here is a list of the minimum features that a weather station needs to have
for me to consider the purchase:</p>
<ul>
<li>temperature, dew point data</li>
<li>wind data (direction, speed)</li>
<li>pressure data</li>
<li>rainfall data - nice to have, but not mandatory</li>
<li>ability to easily fetch data via USB or similar, or ability to automatically
send data to the internet</li>
</ul>
<div class="imginline">
<a href="http://www.foshk.com/Wifi_Weather_Station/WH2900.html" target="_blank"><img src="/images/2017-09-18-our-airport-setup-weather-station-cameras-liveatc-feed/WH2900-1.jpg" class="inline" /></a>
<span class="image-caption">Fine Offset WH2900 weather station, sold under various different brands and names.</span>
</div>
<p>This weather station offered all these features and, based on the specifications
and reviews I read online, the best price / performance ratio in this range
(there is of course no upper limit for the price, and you could easily spend more
than 1000$ on a more professional weather station).</p>
<p>It’s also worth noting that Fine Offset is a Chinese OEM and I purchased the station
from a German company on Amazon under a different name (there are many
companies which sell those weather stations under their own brand).</p>
<p>Luckily, the tower at the airport already had an antenna pole on which we could
install the station. The location of the weather station is one of the most
important factors if you want accurate data. Trees, houses and other obstacles
in the proximity can all affect the readings.</p>
<p>weathe station installing data tower</p>
<p>The pole itself offered a good location, because it’s quite high (around 6
meters) and relatively far from things which could affect the readings.</p>
<p>The setup itself was simple, because the weather station only needs to be
connected to the wireless access point and configured to send data to <a href="https://www.wunderground.com/personal-weather-station/dashboard?ID=IZAGORJE4">Weather
Underground</a>.</p>
<p>Sending data directly to WU is good because it’s easy to set up, but it also
means you are locked in to an online service - there is no official way for
the weather station to send data to your own server, or to grab data directly
from the station.</p>
<p>As a big open-source proponent and a believer that everyone should have access to
their own data, this was an especially big downside for me. This, along with the
lack of security, is also one of the main reasons why I’m so skeptical and worried
about the whole IoT movement.</p>
<p>Luckily, there is a great open-source project called <a href="http://www.weewx.com/">weewx</a> out there which
allows you to fetch data from the station. The project retrieves data from the
weather station by sniffing and reverse-engineering the network traffic the
station sends to WU.</p>
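<p>The interception idea is simple: this class of station uploads its readings as plain HTTP GET requests with the values encoded in the query string, so anything that can parse a query string can recover the data. A minimal sketch (the parameter names follow the Weather Underground PWS upload protocol; the sample values are made up):</p>

```python
from urllib.parse import urlparse, parse_qs

# The kind of request a WH2900-style station sends to Weather Underground
# (values here are invented for illustration):
sample_request = (
    "/weatherstation/updateweatherstation.php"
    "?ID=ISTATION1&PASSWORD=secret&dateutc=2017-09-18+10:00:00"
    "&tempf=59.4&dewptf=50.2&humidity=72"
    "&winddir=240&windspeedmph=4.5&baromin=29.92&rainin=0.0"
)


def parse_wu_upload(request_path: str) -> dict:
    """Extract weather readings from a WU-protocol upload request path."""
    params = parse_qs(urlparse(request_path).query)
    # parse_qs returns a list per key; flatten to single string values
    return {key: values[0] for key, values in params.items()}


readings = parse_wu_upload(sample_request)
print(readings["tempf"], readings["windspeedmph"])  # → 59.4 4.5
```

In practice weewx's driver does this (and much more) for you; the point is only that the protocol is trivially readable once you see the traffic.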
<p>Now that the station is set up, we can easily see the data in real time on
the LCD display (which also acts as the base station) and online on
Weather Underground.</p>
<div class="imginline">
<a href="https://www.wunderground.com/personal-weather-station/dashboard?ID=IZAGORJE4" target="_blank"><img src="/images/2017-09-18-our-airport-setup-weather-station-cameras-liveatc-feed/wu_data.png" class="inline" /></a>
<span class="image-caption">Real-time weather data as shown on Weather Underground.</span>
</div>
<p>It would also come in handy to have this data available in a concise form via text
message and over the phone, so I decided to implement a “poor man’s ATIS / AWOS”.</p>
<p>For that, I utilized the Weather Underground API, some Python and the <a href="https://www.plivo.com/">Plivo telephony
service</a>. A Python script fetches real-time data using the WU API, caches it
and processes it so it works with the Plivo text-to-speech API.</p>
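<p>The text-to-speech side is essentially a template over a handful of readings. A simplified sketch of that step (the field names and wording below are my own illustration, not the actual script):</p>

```python
def build_atis_text(weather: dict) -> str:
    """Turn raw readings into a sentence suitable for a TTS engine.

    Spelling numbers out digit by digit ("two four zero") tends to be
    much clearer over a phone line than letting TTS read "240"."""
    digits = lambda value: " ".join(str(value))  # 240 -> "2 4 0"
    return (
        f"Wind {digits(weather['wind_dir_deg'])} degrees "
        f"at {weather['wind_kph']} kilometers per hour. "
        f"Temperature {weather['temp_c']}, dew point {weather['dewpoint_c']}. "
        f"Pressure {digits(weather['pressure_hpa'])} hectopascals."
    )


text = build_atis_text({
    "wind_dir_deg": 240, "wind_kph": 7,
    "temp_c": 15, "dewpoint_c": 10, "pressure_hpa": 1013,
})
print(text)
```

The resulting string is what gets handed to the telephony provider's text-to-speech API when someone calls in.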
<p><span class="x-frame soundcloud" data-width="100%" data-height="166" data-video="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/342724385&color=%23ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false"></span></p>
<p>I originally tried to use <a href="https://www.twilio.com">Twilio</a> as I had experience using it in the past,
but I couldn’t get it to work with a Slovenian inbound phone number. I was
able to reach the number if I used a Skype call or a foreign phone number,
but none of the Slovenian telephony providers were able to route phone calls to
it.</p>
<p>In addition to this data (wind, pressure, temperature / dew point) being useful
for flying, we can now also observe various short and long term weather patterns
and trends. If you are a bit of a weather geek like myself, that’s quite cool.</p>
<div class="imginline">
<img src="/images/2017-09-18-our-airport-setup-weather-station-cameras-liveatc-feed/temp_graph.png" class="inline" />
<span class="image-caption">Very small temperature and dew point spread? Very likely there is fog and low clouds at the airfield.</span>
</div>
<h3 id="web-cameras">Web Cameras</h3>
<p>Ability to view real-time weather information is useful, but it would be even
better if you could also see in real-time what is going on at the airport.</p>
<p>And that’s where the web cameras come in.</p>
<p>In addition to allowing you to see what (if anything) is going on at the
airfield, cameras are a great addition to real-time weather information when
you are evaluating weather. They allow you to see real-life conditions and
things such as cloud coverage, fog / mist, etc.</p>
<p>I already had a bunch of <a href="https://foscam.com/C1.html">Foscam C1</a> wide-angle HD cameras at home so I
decided to lease / move two of them to the airfield.</p>
<div class="imginline">
<a href="https://foscam.com/C1.html" target="_blank"><img src="/images/2017-09-18-our-airport-setup-weather-station-cameras-liveatc-feed/foscam_c1.png" class="inline" /></a>
<span class="image-caption">Foscam C1. A relatively inexpensive IP camera which offers decent image quality.[^5]</span>
</div>
<p>Since the cameras are meant to be used indoors, we installed them inside the
tower to protect them from bad weather and vandalism. The picture is not crystal
clear because there is a privacy foil on the glass which makes the picture look
a bit darker than normal, and there is also some glare depending on the sun’s
position, but it’s good enough for now.</p>
<p>Now that the cameras were installed, they needed to be configured to stream /
send images to the internet. This camera, like many other IP cameras, allows you
to connect to it to view the stream in real time, or to periodically upload
images to an FTP server.</p>
<div class="imginline">
<img src="/images/2017-09-18-our-airport-setup-weather-station-cameras-liveatc-feed/webcam1.jpg" class="inline" />
<span class="image-caption">Example image from one of the cameras.</span>
</div>
<p>It’s worth noting that this and most other IP cameras are very insecure by
default and in many cases can’t be made more secure (no SSL, many security
vulnerabilities, etc.).</p>
<p>This is not so problematic if the camera is located in a public place, as is
the case here, but I still decided to make the setup as secure as possible.</p>
<p>The cameras are behind NAT and all the incoming connections and ports are
blocked, which solves part of the problem. The second part is solved by utilizing
the “periodically upload image to FTP” functionality. Sadly the camera only
supports FTP (insecure) and not FTPS (secure), so I needed to get a bit
creative.</p>
<p>I configured the cameras to send images to an FTP server which is running inside the
local network on a Raspberry Pi 3 based Linux server. The Raspberry Pi then
periodically sends those images to a cloud server over FTPS.</p>
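<p>The FTP → FTPS hop can be done with nothing but the Python standard library. A sketch of the upload step (host name, credentials and paths are placeholders, and the actual script on the Pi may well differ):</p>

```python
import os
from ftplib import FTP_TLS


def jpg_files(names) -> list:
    """Pick out camera images, oldest first (names embed the timestamp)."""
    return sorted(n for n in names if n.endswith(".jpg"))


def sync_images(local_dir: str, host: str, user: str, password: str,
                remote_dir: str = "ipcam") -> None:
    """Upload camera images over FTPS, then delete them locally."""
    ftps = FTP_TLS(host)
    ftps.login(user, password)
    ftps.prot_p()  # encrypt the data channel, not just the control channel
    ftps.cwd(remote_dir)
    for name in jpg_files(os.listdir(local_dir)):
        path = os.path.join(local_dir, name)
        with open(path, "rb") as fp:
            ftps.storbinary(f"STOR {name}", fp)
        os.remove(path)  # uploaded successfully; free space on the SD card
    ftps.quit()
```

Run from cron every minute or so, this keeps the Pi's SD card from filling up while the cloud server always has the latest frames.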
<p>This cloud server is also used to automatically process those images (create
thumbnails, optimize image size, create time-lapse gifs, add watermark, etc.)
and serve them over HTTPS.[^6]</p>
<p>Those real-time camera images can be accessed <a href="https://www.aeroklub-zagorje.me/ipcam/zagorje/">on the website</a>. In addition
to the real-time images, you can also view time-lapse videos for the past hour
on the website.</p>
<p>Time-lapse videos are generated from the raw camera images every 10 minutes,
covering the previous 60 minutes. Serving the time-lapse videos presents some
unique challenges. A lot of people will access the website through a phone which
can’t (easily) play videos, so instead of generating a video file, I decided to
generate an animated .gif image.</p>
<p>The problem with that is that an animated gif is a really poor storage format
for what is basically a video, so I needed to experiment a bit with ImageMagick
command line options to find the combination which offers the best quality / size
ratio.</p>
<p>By default, the generated .gif size was around 28-45 MB, but I managed to get it
down to 5-12 MB with reasonable quality.</p>
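<p>The gif assembly itself is a single ImageMagick invocation driven from a script. A sketch of how it can be built from Python; the option values below are illustrative starting points for the size / quality experiments, not the exact combination used:</p>

```python
import glob
import subprocess


def build_convert_cmd(frames: list, out_path: str) -> list:
    """Build an ImageMagick command line for an optimized animated gif."""
    return [
        "convert",
        "-delay", "15",         # frame delay in 1/100ths of a second
        "-loop", "0",           # loop forever
        *frames,
        "-resize", "640x",      # shrink frames; the biggest size win
        "-colors", "128",       # smaller palette -> smaller file
        "-layers", "Optimize",  # only store inter-frame differences
        out_path,
    ]


def make_timelapse_gif(frames_dir: str, out_path: str) -> None:
    # Frame file names embed the capture time, so sorting orders them.
    frames = sorted(glob.glob(f"{frames_dir}/*.jpg"))
    subprocess.check_call(build_convert_cmd(frames, out_path))
```

The `-resize`, `-colors` and `-layers Optimize` options are the main levers: between them they account for most of the drop from tens of megabytes to single digits.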
<p>In addition to the time-lapse videos which are generated automatically for the past
hour, the raw camera images can be used to generate time-lapse videos of
various interesting time frames (e.g. when a storm passes). An example
of such a time-lapse video can be found below.</p>
<p><span class="x-frame youtube" data-width="100%" data-height="315" data-caption="Timelapse video of storm passing the airport." data-video="https://www.youtube-nocookie.com/embed/jFD1rxMcOJE"></span></p>
<h3 id="liveatc-audio-feed">LiveATC audio feed</h3>
<p>Now that the weather station and cameras were in place, the only thing missing
was real-time audio feed from the airport.</p>
<p>The airport is un-towered which means there is no ATC, but we do have UNICOM
/ CTAF station on a local sport frequency of <code class="language-plaintext highlighter-rouge">123.500</code> MHz. This station is
active when there are multiple people flying and offers traffic and runway
condition advisories.[^3]</p>
<p>Nowadays it’s very easy to set up a software-based radio receiver by utilizing a
USB DVB-T TV tuner dongle based on the RTL2832U chipset. The chip covers a
very wide frequency range (24 - 1766 MHz), but if you want good signal and sound
quality you need to use an antenna specifically optimized for the frequency range
you want to receive.</p>
<p>In this case, that is the airband frequency range (108 - 140 MHz). I purchased
an inexpensive 30$ airband whip antenna which seems to be doing a good job of
receiving transmissions from the airplanes in the vicinity. It’s also worth
noting that VHF signals follow line-of-sight propagation, which means
the antenna itself should be mounted as high as possible and away from any
obstructions.</p>
<p>To receive the signal and stream it to the internet, I connected the USB dongle to
a Raspberry Pi model 3 and installed the <a href="http://sdr.osmocom.org/trac/wiki/rtl-sdr">rtl-sdr</a> library and <a href="https://github.com/szpajder/RTLSDR-Airband">RTLSDR-Airband</a>.</p>
<div class="imginline">
<img src="/images/2017-09-18-our-airport-setup-weather-station-cameras-liveatc-feed/rpi_stuff.jpg" class="inline" />
<span class="image-caption">Some of the equipment used.</span>
</div>
<p>RTLSDR-Airband is an open-source Linux program optimized for AM voice channel
reception and for sending the audio to online services such as liveatc.net.</p>
<p>Before setting up the Raspberry Pi, I connected the USB dongle to my computer and
used the open source <a href="http://gqrx.dk/">Gqrx SDR</a> program, which allows you to visualize the
received signal (strength, noise, etc.) and tweak various parameters.</p>
<div class="imginline">
<a href="http://gqrx.dk/" target="_blank"><img src="/images/2017-09-18-our-airport-setup-weather-station-cameras-liveatc-feed/gqrx.jpg" class="inline" /></a>
<span class="image-caption">Gqrx is a great open-source program for software defined radio.</span>
</div>
<p>This was an important step, because simply connecting the dongle to the Raspberry
Pi and setting the frequency in the rtl_airband config usually doesn’t yield good
results out of the box.</p>
<p>Gqrx allows you to find the optimal frequency and other settings (gain, squelch,
etc.) at which the signal is the strongest and the audio quality is the best.</p>
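<p>For reference, an RTLSDR-Airband configuration for a setup like this might look roughly as follows. This is a hedged sketch: key names and units vary between RTLSDR-Airband versions, so consult the project's documentation for the authoritative syntax, and treat all values (gain, frequency, server details) as placeholders to be tuned with Gqrx first:</p>

```text
# Illustrative RTLSDR-Airband config sketch - verify against the
# project's README for your version; all values are placeholders.
devices: ({
  type = "rtlsdr";
  index = 0;
  gain = 25;            # tune with Gqrx first
  centerfreq = 123.5;
  channels: ({
    freq = 123.5;       # local sport frequency, MHz
    outputs: ({
      type = "icecast";
      server = "icecast.example.com";
      port = 8000;
      mountpoint = "airfield.mp3";
      username = "source";
      password = "<password>";
    });
  });
});
```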
<p>In case the whole thing didn’t work out, I also had a backup plan: using a
USB sound card and connecting the headphones output of a handheld airband
radio / transceiver directly to the line-in on the sound card.</p>
<p>Once I found the best settings, I plugged the dongle into the Raspberry Pi,
configured the rtl_airband settings and set it up to stream audio to my
self-hosted Icecast server to test it out.</p>
<p>After I confirmed everything was working, I contacted people at <a href="https://www.liveatc.net/">LiveATC</a>
for settings for their Icecast server so our feed would also be included on
LiveATC.</p>
<p>There were some issues because our airfield was the first one without an ICAO
identifier, but they were very accommodating and in the end we found a good
compromise: listing our feed under the Ljubljana / LJLJ page.</p>
<div class="imginline">
<a href="https://www.liveatc.net/search/?icao=ljlj" target="_blank"><img src="/images/2017-09-18-our-airport-setup-weather-station-cameras-liveatc-feed/liveatc_feed.png" class="inline" /></a>
<span class="image-caption">Our feed (including the archives) is available on LiveATC.net.</span>
</div>
<h3 id="future-improvements">Future Improvements</h3>
<p>The whole setup has been running less than a month so far and I already have
some ideas for future improvements and additions.</p>
<h4 id="traffic-feed">Traffic Feed</h4>
<p>Now that we have a live audio feed, the only thing missing is a live traffic
feed - the ability to see which airplanes are flying in the area.</p>
<p>There are already services such as flightradar24 which allow you to track
aircraft movements in real time, but they mostly track commercial traffic via
ADS-B (they also cover some GA traffic, but because of their receiver
locations, they only cover airplanes flying at higher altitudes).</p>
<p>There are a couple of ways to track aircraft movements:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Automatic_dependent_surveillance_%E2%80%93_broadcast">ADS-B</a></li>
<li>Via transponder (<a href="https://en.wikipedia.org/wiki/Aviation_transponder_interrogation_modes#Mode_A_with_Mode_C">Mode C / Mode S</a>)</li>
<li><a href="https://en.wikipedia.org/wiki/FLARM">FLARM</a> (proprietary protocol, mostly used in gliders)</li>
</ul>
<p>ADS-B provides the most information, but it’s not widely used here in Slovenia
(yet).</p>
<p>Mode C and Mode S transponders are more common here. Mode C is not all that
useful because it only provides altitude, but Mode S also provides other data.</p>
<p>Both of those approaches use the same frequency (1090 MHz), so it’s possible to
receive those signals using the same kind of RTL-SDR USB dongle which is used to
receive the audio signal for the LiveATC feed.</p>
<p>So in theory, all it takes is a USB RTL-SDR dongle plugged into a Raspberry Pi,
an antenna optimized for this frequency range, and some software on the Raspberry
Pi which interacts with the RTL-SDR and processes the received signal.</p>
<p>In fact, there are already some projects such as <a href="http://www.pilotaware.com/">PilotAware</a> out there so
we might be able to utilize existing software and won’t need to write our own.</p>
<p>pilotaware radar screen / traffic screen</p>
<h4 id="power-bank-based-ups">Power Bank Based UPS</h4>
<p>All the devices are connected to the electrical grid and we haven’t had any
power outages yet, but outages can happen, so it would be good to be able to
connect all the devices to a UPS.</p>
<p>Luckily, all the devices consume very little power, so it should be possible to
make a simple UPS out of a USB power bank which supports pass-through charging
and has a capacity of at least 15,000 mAh.</p>
<h3 id="equipment-list-and-costs">Equipment List and Costs</h3>
<p>For reference, here is a full list of the physical equipment used:</p>
<ul>
<li>Fine Offset WH2900 Weather Station - ~200$</li>
<li>2x Foscam C1 - 2x ~70$ (used to be around 100$ when I first got it,
but now you can get it for much less)</li>
<li>Raspberry Pi model 3, enclosure, micro SD card - ~80$</li>
<li>Whip airband antenna - ~25$</li>
<li>NooElec USB RTL-SDR dongle - ~30$</li>
<li>USB sound card - ~30$ (optional and only needed if you want to use an airband
transceiver / scanner instead of the DVB-T tuner for receiving audio)</li>
<li>MCX to BNC adapter - ~10$</li>
</ul>
<p>In addition to that, other costs include 10$/month for a cloud server and
17$/month for a 4G data plan.</p>
<p>doesn’t meet all the local Civil Aviation Authority rules and standards for an
airport, most notably, the minimum runway length.
[^2]: “casual” and “homey” don’t need to mean unsafe. For me, safety is the
most important factor and always comes first so I never compromise on it.
[^3]: Being a non-ATC facility means it can only offers advisories and
recommendations, but no clearances or similar. Pilots perform their actions
based on their discretion at their own risk. It would be foolish to do so, but
pilots can also chose to ignore those recommendations if they wish so.
[^4]: New images are uploaded from both of the cameras every 10 seconds, and
image size is around 80-150 KB, the audio stream is 16kbps and weather station
uses negligible amount of data. For the worst case scenario, this gives a
consumption of less than 1.1 GB per day.
[^5]: As noted before, most of the popular IP cameras and other IoT devices have
a very poor security record so you should be very careful when you install them
in a private place and connect them to the internet. Ideally they should be deployed
in an isolated network which doesn’t have access to the internet.
[^6]: Free <a href="https://letsencrypt.org/">Let’s Encrypt</a> SSL certificate is used and CloudFlare is used
as a reverse proxy to act as a caching layer in-front of the webserver.
[^7]: Flying itself is rewarding and a lot of fun, but being a tech person myself,
working on side projects like this one is even more satisfactory and fun than
flying :)</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Well, technically it’s an airstrip (vzletisce) and not an airport. It <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Libcloud now supports OpenStack Identity (Keystone) API v3https://www.tomaz.me/2014/08/23/libcloud-now-supports-openstack-identity-keystone-api-v3.html2014-08-23T00:00:00+02:00Tomaz Muraus<h2 id="libcloud-now-supports-openstack-identity-keystone-api-v3"><a href="/2014/08/23/libcloud-now-supports-openstack-identity-keystone-api-v3.html">Libcloud now supports OpenStack Identity (Keystone) API v3</a></h2>
<p>I have recently pushed support for <a href="http://developer.openstack.org/api-ref-identity-v3.html">OpenStack Identity API v3</a> to Libcloud
trunk. In this blog post I’m going to have a look at the motivation for that,
changes which were involved and show some examples of how you can utilize those
changes and newly available features.</p>
<h3 id="what-is-openstack-keystone--identity-service">What is OpenStack Keystone / Identity service?</h3>
<p><a href="http://docs.openstack.org/developer/keystone/">OpenStack Keystone</a> is an OpenStack project that provides identity and
authentication related features to OpenStack projects such as Nova.</p>
<p>The project started as a simple service which only provided basic
authentication features, but it has since grown into a fully fledged and
powerful identity management service.</p>
<p>The latest version supports advanced user management, multiple projects,
complex ACLs and more.</p>
<p>A future release will also include a <a href="https://github.com/openstack/keystone-specs/blob/master/specs/juno/keystone-to-keystone-federation.rst">Keystone to Keystone federation</a> feature
which will make things such as seamless cross-cloud authorization possible.</p>
<h3 id="motivation">Motivation</h3>
<p>Support for OpenStack Nova was first added to Libcloud back in 2011. The first
version only included support for simple token-based authentication.</p>
<p>Since then a lot has changed and new (and more flexible) OpenStack Keystone
versions have been released. We have been pretty good at following those
changes and support for authenticating against Keystone API v2.0 has been
available in Libcloud for a long time.</p>
<p>Those changes worked fine, but the problem was that not much thinking went
into them and support for multiple Keystone versions was added after the fact.
This means that the code was hacky, inflexible, hard to re-use and extend.</p>
<p>Luckily, those things were (mostly) hidden from the end user who just wanted
to connect to the OpenStack installation. They only became apparent if you
wanted to talk directly to the Keystone service or do anything more complex
with it.</p>
<p>For one of the features we are working on at <a href="http://www.divvycloud.com/">DivvyCloud</a>, we needed support for
authenticating against and talking to the OpenStack Keystone API v3. Since Libcloud
didn’t include support for this version yet, I decided to go ahead and add it.</p>
<p>All of the “hackiness” of the existing code became very apparent when I
wanted to add support for API v3. Because of that, I decided to spend
more time on it, do it “the right way” and refactor the existing code to make
it more re-usable, extensible and maintainable.</p>
<h3 id="refactoring-the-existing-code">Refactoring the existing code</h3>
<p>Before my changes, all of the logic for talking to Keystone, handling of the
token expiration, re-authentication, etc. was contained in a single class
(<code class="language-plaintext highlighter-rouge">OpenStackAuthConnection</code>).</p>
<p>To authenticate, there was one method per Keystone API version (<code class="language-plaintext highlighter-rouge">authenticate_1_0</code>,
<code class="language-plaintext highlighter-rouge">authenticate_1_1</code>, <code class="language-plaintext highlighter-rouge">authenticate_2_0_with_apikey</code>,
<code class="language-plaintext highlighter-rouge">authenticate_2_0_with_password</code>). This means there was a lot of duplicated
code, the code was hard to extend, etc.</p>
<p>I went ahead and moved to a “base class with common functionality” + “one class
per Keystone API version” model. This approach has multiple advantages over the
old one:</p>
<ul>
<li>the code is easier to re-use, maintain and extend</li>
<li>version specific functionality is available via methods on the version
specific class</li>
<li>less coupling</li>
</ul>
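<p>The shape of that refactoring can be illustrated with a deliberately simplified sketch. The class names and payload details below are my illustration, not Libcloud's actual API (the real classes live in <code>libcloud.common.openstack_identity</code>), but the request bodies mirror the Keystone v2.0 and v3 authentication formats:</p>

```python
class BaseIdentityConnection:
    """Shared plumbing: HTTP handling, token caching, re-authentication.

    Version-specific behavior is reduced to a path and a payload builder."""
    auth_path = None  # filled in by version-specific subclasses

    def build_auth_payload(self, username: str, password: str) -> dict:
        raise NotImplementedError


class IdentityConnection_2_0(BaseIdentityConnection):
    auth_path = "/v2.0/tokens"

    def build_auth_payload(self, username, password):
        return {"auth": {"passwordCredentials":
                         {"username": username, "password": password}}}


class IdentityConnection_3_0(BaseIdentityConnection):
    auth_path = "/v3/auth/tokens"

    def build_auth_payload(self, username, password):
        # v3 scopes users to a domain; "default" is just a common choice.
        return {"auth": {"identity": {
            "methods": ["password"],
            "password": {"user": {"name": username,
                                  "domain": {"id": "default"},
                                  "password": password}}}}}
```

Adding a new Keystone version then means adding one subclass instead of threading another `authenticate_x_y` method through a single monolithic class.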
<p>Some other notable changes are described below.</p>
<h3 id="identity-related-code-has-been-moved-to-a-separate-independent-module">Identity related code has been moved to a separate (independent) module</h3>
<p>All of the identity related code has been moved from <code class="language-plaintext highlighter-rouge">libcloud.common.openstack</code> to
a new <code class="language-plaintext highlighter-rouge">libcloud.common.openstack_identity</code> module.</p>
<p>This module reduces coupling between general OpenStack and Identity related
code and makes code re-use and other things easier.</p>
<h3 id="improvements-in-the-service-catalog-related-code">Improvements in the service catalog related code</h3>
<p>Before my changes, parsed service catalog entries were stored in an
unstructured dictionary on the <code class="language-plaintext highlighter-rouge">OpenStackServiceCatalog</code> class. To make
things even worse, the structure and the contents of the dictionary differed
based on the Keystone API version.</p>
<p>The dynamic nature of Python can be a huge asset and can make development and
prototyping faster and easier. The problem is that when it’s abused or
overused, it makes code hard to use, maintain and reason about. Sadly, that’s
pretty common in the Python world - people often over-use dictionaries and
base their APIs around passing unstructured dictionaries around.</p>
<p>I refactored the code to store service catalog entries in a structured format
(a list of <code class="language-plaintext highlighter-rouge">OpenStackServiceCatalogEntry</code> and
<code class="language-plaintext highlighter-rouge">OpenStackServiceCatalogEntryEndpoint</code> objects).</p>
<p>Now only the code which parses the service catalog responses needs to know
about the response structure. The user doesn’t need to know anything about
the internal structure, and the code for retrieving entries from the
service catalog is API version agnostic.</p>
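<p>A simplified sketch of that idea (the actual Libcloud classes are the <code>OpenStackServiceCatalogEntry</code> and <code>OpenStackServiceCatalogEntryEndpoint</code> classes named above; the lookup helper, attribute names and URL here are illustrative):</p>

```python
class CatalogEndpoint:
    def __init__(self, region: str, url: str):
        self.region = region
        self.url = url


class CatalogEntry:
    def __init__(self, service_type: str, endpoints: list):
        self.service_type = service_type
        self.endpoints = endpoints


def get_endpoint(entries: list, service_type: str, region: str) -> str:
    """Version-agnostic lookup: callers never touch the raw response dict."""
    for entry in entries:
        if entry.service_type != service_type:
            continue
        for endpoint in entry.endpoints:
            if endpoint.region == region:
                return endpoint.url
    raise ValueError(f"no {service_type} endpoint in region {region}")


# Whichever Keystone version produced the response, consumers see one shape:
catalog = [
    CatalogEntry("compute",
                 [CatalogEndpoint("regionOne", "http://192.168.1.100:8774/v2")]),
]
print(get_endpoint(catalog, "compute", "regionOne"))
```

The parsing code for each Keystone version maps its own response format onto these objects, so the version differences stay in one place.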
<h3 id="addition-of-the-administrative-related-functionality">Addition of the administrative related functionality</h3>
<p>In addition to the changes mentioned above, <code class="language-plaintext highlighter-rouge">OpenStackIdentity_3_0_Connection</code>
class now also contains methods for performing different administrative related
tasks such as user, role, domain and project management.</p>
<h3 id="examples">Examples</h3>
<p>This section includes some examples which show how to use the newly available
functionality. For more information, please refer to the docstrings in the
<a href="https://github.com/apache/libcloud/blob/trunk/libcloud/common/openstack_identity.py">openstack_identity</a> module.</p>
<h3 id="authenticating-against-keystone-api-v3-using-the-openstack-compute-driver">Authenticating against Keystone API v3 using the OpenStack compute driver</h3>
<p>This example shows how to authenticate against Keystone API v3 using the
OpenStack compute driver (at the moment, the default auth version used
by the compute driver is 2.0).</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pprint</span> <span class="kn">import</span> <span class="n">pprint</span>
<span class="kn">from</span> <span class="nn">libcloud.compute.types</span> <span class="kn">import</span> <span class="n">Provider</span>
<span class="kn">from</span> <span class="nn">libcloud.compute.providers</span> <span class="kn">import</span> <span class="n">get_driver</span>
<span class="n">cls</span> <span class="o">=</span> <span class="n">get_driver</span><span class="p">(</span><span class="n">Provider</span><span class="p">.</span><span class="n">OPENSTACK</span><span class="p">)</span>
<span class="n">driver</span> <span class="o">=</span> <span class="n">cls</span><span class="p">(</span><span class="s">'<username>'</span><span class="p">,</span> <span class="s">'<password>'</span><span class="p">,</span>
<span class="n">ex_force_auth_version</span><span class="o">=</span><span class="s">'3.x_password'</span><span class="p">,</span>
<span class="n">ex_force_auth_url</span><span class="o">=</span><span class="s">'http://192.168.1.100:5000'</span><span class="p">,</span>
<span class="n">ex_force_service_type</span><span class="o">=</span><span class="s">'compute'</span><span class="p">,</span>
<span class="n">ex_force_service_region</span><span class="o">=</span><span class="s">'regionOne'</span><span class="p">,</span>
<span class="n">ex_tenant_name</span><span class="o">=</span><span class="s">'<my tenant>'</span><span class="p">)</span>
<span class="n">pprint</span><span class="p">(</span><span class="n">driver</span><span class="p">.</span><span class="n">list_nodes</span><span class="p">())</span></code></pre></figure>
<h3 id="obtaining-auth-token-scoped-to-the-domain">Obtaining auth token scoped to the domain</h3>
<p>This example shows how to obtain a token which is scoped to a domain rather
than to a project / tenant (the default).</p>
<p>Keep in mind that most OpenStack services don’t yet support tokens which
are scoped to a domain, so such tokens are of limited use right now.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pprint</span> <span class="kn">import</span> <span class="n">pprint</span>
<span class="kn">from</span> <span class="nn">libcloud.common.openstack_identity</span> <span class="kn">import</span> <span class="n">OpenStackIdentity_3_0_Connection</span>
<span class="kn">from</span> <span class="nn">libcloud.common.openstack_identity</span> <span class="kn">import</span> <span class="n">OpenStackIdentityTokenScope</span>
<span class="n">driver</span> <span class="o">=</span> <span class="n">OpenStackIdentity_3_0_Connection</span><span class="p">(</span><span class="n">auth_url</span><span class="o">=</span><span class="s">'http://<host>:<port>'</span><span class="p">,</span>
<span class="n">user_id</span><span class="o">=</span><span class="s">'admin'</span><span class="p">,</span>
<span class="n">key</span><span class="o">=</span><span class="s">'<key>'</span><span class="p">,</span>
<span class="n">token_scope</span><span class="o">=</span><span class="n">OpenStackIdentityTokenScope</span><span class="p">.</span><span class="n">DOMAIN</span><span class="p">,</span>
<span class="n">domain_name</span><span class="o">=</span><span class="s">'Default'</span><span class="p">,</span>
<span class="n">tenant_name</span><span class="o">=</span><span class="s">'admin'</span><span class="p">)</span>
<span class="n">driver</span><span class="p">.</span><span class="n">authenticate</span><span class="p">()</span>
<span class="n">pprint</span><span class="p">(</span><span class="n">driver</span><span class="p">.</span><span class="n">auth_token</span><span class="p">)</span></code></pre></figure>
<h3 id="talking-directly-to-the-openstack-keystone-api-v3">Talking directly to the OpenStack Keystone API v3</h3>
<p>This example shows how to talk directly to OpenStack Keystone API v3 and
perform administrative tasks such as listing users and roles.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pprint</span> <span class="kn">import</span> <span class="n">pprint</span>
<span class="kn">from</span> <span class="nn">libcloud.common.openstack_identity</span> <span class="kn">import</span> <span class="n">OpenStackIdentity_3_0_Connection</span>
<span class="kn">from</span> <span class="nn">libcloud.common.openstack_identity</span> <span class="kn">import</span> <span class="n">OpenStackIdentityTokenScope</span>
<span class="n">driver</span> <span class="o">=</span> <span class="n">OpenStackIdentity_3_0_Connection</span><span class="p">(</span><span class="n">auth_url</span><span class="o">=</span><span class="s">'http://<host>:<port>'</span><span class="p">,</span>
<span class="n">user_id</span><span class="o">=</span><span class="s">'admin'</span><span class="p">,</span>
<span class="n">key</span><span class="o">=</span><span class="s">'<key>'</span><span class="p">,</span>
<span class="n">token_scope</span><span class="o">=</span><span class="n">OpenStackIdentityTokenScope</span><span class="p">.</span><span class="n">PROJECT</span><span class="p">,</span>
<span class="n">tenant_name</span><span class="o">=</span><span class="s">'admin'</span><span class="p">)</span>
<span class="c1"># This call doesn't require authentication
</span><span class="n">pprint</span><span class="p">(</span><span class="n">driver</span><span class="p">.</span><span class="n">list_supported_versions</span><span class="p">())</span>
<span class="c1"># The calls below require authentication and admin access
# (depends on the ACL configuration)
</span><span class="n">driver</span><span class="p">.</span><span class="n">authenticate</span><span class="p">()</span>
<span class="n">users</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">list_users</span><span class="p">()</span>
<span class="n">roles</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">list_roles</span><span class="p">()</span>
<span class="n">pprint</span><span class="p">(</span><span class="n">users</span><span class="p">)</span>
<span class="n">pprint</span><span class="p">(</span><span class="n">roles</span><span class="p">)</span></code></pre></figure>
<h3 id="a-quick-note-on-backward-compatibility">A quick note on backward compatibility</h3>
<p>If you only use the OpenStack compute driver, those changes are fully backward
compatible and you aren’t affected.</p>
<p>If you use the <code class="language-plaintext highlighter-rouge">OpenStackAuthConnection</code> class to talk directly to a Keystone
installation, you need to update your code to use either the new
<code class="language-plaintext highlighter-rouge">OpenStackIdentityConnection</code> class or a version-specific class, since the
<code class="language-plaintext highlighter-rouge">OpenStackAuthConnection</code> class has been removed.</p>
Libcloud at ApacheCon NA, 2014 in Denver, Coloradohttps://www.tomaz.me/2014/03/07/libcloud-at-apachecon-na-2014.html2014-03-07T00:00:00+01:00Tomaz Muraus<h2 id="libcloud-at-apachecon-na-2014-in-denver-colorado"><a href="/2014/03/07/libcloud-at-apachecon-na-2014.html">Libcloud at ApacheCon NA, 2014 in Denver, Colorado</a></h2>
<p>This is just a quick heads up that I will be attending <a href="http://events.linuxfoundation.org/events/apachecon-north-america">ApacheCon North
America, 2014</a> in Denver, Colorado next month.</p>
<p>I will be giving two talks. The first one is titled <a href="http://events.linuxfoundation.org/events/apachecon-north-america">5 years of Libcloud</a>. This
is a retrospective talk in which I will tell the story of how Libcloud grew from
a small project, originally developed for the needs of the <a href="https://en.wikipedia.org/wiki/Cloudkick">Cloudkick</a> product,
into a fully fledged and relatively popular Apache project.</p>
<div class="imginline">
<a href="http://events.linuxfoundation.org/events/apachecon-north-america" target="_blank">
<img src="/images/apachecon_denver_2014.png" class="inline" /></a>
</div>
<p>I will go into details on some of the challenges we faced, what we learned
from them and how we grew the community and the project.</p>
<p>The second one is titled <a href="http://apacheconnorthamerica2014.sched.org/event/3dd818c4a382e5c32aee514f024b2e4e">Reducing barriers to contribution to an Apache
project</a>.</p>
<p>To give you some context, I first need to say that I’m a person who loves to
move fast and loves lean and efficient approaches and teams. On top of that,
I have zero tolerance for unnecessary processes and deep, mostly useless
hierarchies.</p>
<p>All of that means I don’t see myself as a big-company person, since in big
companies useless processes, which among many other things slow innovation
down, are usually the norm.</p>
<p>Apache is a relatively big organization, which means it has its own fair share
of (useless) processes. A lot of “new era” developers who grew up with GitHub
also consider Apache slow, inflexible and a place where projects go to die<sup id="fnref:fn1" role="doc-noteref"><a href="#fn:fn1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>In this talk I will go into detail on why this is not entirely true and how
Apache is (slowly) becoming more flexible and adopting those new work-flows.</p>
<p>On top of that, I will also give some examples on how you can adopt those new
work-flows, iterate fast and still receive all the benefits from being an
Apache project. Those examples will be taken directly from the things we have
learned at the <a href="https://libcloud.apache.org/">Apache Libcloud</a> project.</p>
<p>Depending on how many people attend the talk, I think it would also be
very interesting to turn it into a panel where other people can contribute
their ideas and we can discuss how to reduce barriers even further and make
Apache more attractive for “new-era projects”.</p>
<p>Besides my talks, <a href="https://twitter.com/sebgoa">Sebastien Goasguen</a>, a long time contributor to the
project who has <a href="https://libcloud.apache.org/blog/2014/02/17/sebastien-goasguen-joins-our-team.html">recently joined the PMC</a> is also giving a talk titled
<a href="http://apacheconnorthamerica2014.sched.org/event/d6f5edb63d65ca64923450993cb6c223">Apache Libcloud</a>.</p>
<p>If you are around, you should stop by to listen to those talks and contribute
your ideas to my second talk.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:fn1" role="doc-endnote">
<p>Citation needed. I’m kinda lazy and tired atm, but you can Google it up. <a href="#fnref:fn1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Libcloud Google Summer of Code 2014 Call for Participationhttps://www.tomaz.me/2014/02/11/libcloud-gsoc-cfp.html2014-02-11T00:00:00+01:00Tomaz Muraus<h2 id="libcloud-google-summer-of-code-2014-call-for-participation"><a href="/2014/02/11/libcloud-gsoc-cfp.html">Libcloud Google Summer of Code 2014 Call for Participation</a></h2>
<p>This is call for participation / proposals for all the students who are
interested in working on <a href="https://libcloud.apache.org">Apache Libcloud</a> project during the <a href="http://google-melange.appspot.com/gsoc/homepage/google/gsoc2014">Google
Summer of Code 2014 program</a>.</p>
<div class="imginline">
<a href="http://libcloud.apache.org" target="_blank">
<img src="/images/2013-12-11-libcloud-update-key-pair-management-methods-are-now-part-of-the-base-api/libcloud.png" class="inline" /></a>
</div>
<p>Before diving further, I just want to give a short disclaimer. We (Apache
Software Foundation and Libcloud as a project) haven’t been accepted into
Google Summer of Code 2014 yet, but we will apply, cross our fingers and
hope we get a spot.</p>
<p>We will know if we have been accepted on February 24th when Google publishes
a list of the accepted mentoring organizations.</p>
<h3 id="what-is-google-summer-of-code">What is Google Summer of Code?</h3>
<div class="imginline">
<a href="http://google-melange.appspot.com/gsoc/homepage/google/gsoc2014" target="_blank">
<img src="/images/2014-02-11-libcloud-gsoc-cfp/gsoc2014.jpg" class="inline" /></a>
</div>
<p><a href="http://google-melange.appspot.com/gsoc/homepage/google/gsoc2014">Google Summer of Code</a> is a program where Google sponsors students from
around the world to spend their summer working on open-source projects. Students
are paid 5500$ if they successfully complete all of their evaluations. More
information about the program can be found on the <a href="http://google-melange.appspot.com/gsoc/homepage/google/gsoc2014">project website</a>.</p>
<p>This year is also special, because it’s the tenth anniversary of the program.
To celebrate the anniversary, Google is, among other things, paying out 5500$
for successfully completed projects instead of the usual 5000$.</p>
<h3 id="google-summer-of-code-and-libcloud">Google Summer of Code and Libcloud</h3>
<p><a href="https://www.apache.org/">Apache Software Foundation</a> is not a stranger to Google Summer of Code
since it has already participated in this program <a href="https://community.apache.org/mentoring/mentoring/experiences.html">multiple times over the past
years</a>.</p>
<p>We (as in the Libcloud project) also participated, in GSoC 2012, with one
project. I mentored a student, Ilgiz Islamgulov from Russia, who worked on
and successfully completed (yeah, software is never really done, but in this
case completed refers to him passing all the GSoC evaluations) a project called
<a href="https://github.com/islamgulov/libcloud.rest">Libcloud REST interface</a>.</p>
<p>If you want to know more about his project and our GSoC 2012 participation, you
should check the following links:</p>
<ul>
<li><a href="https://libcloud.apache.org/gsoc-2012.html">Libcloud GSoC 2012 page</a></li>
<li><a href="https://github.com/islamgulov/libcloud.rest">Github repository</a></li>
<li><a href="https://docs.google.com/document/d/1P9fIxILn-WdgpkXDPydHB_dghGs-BYuoSmkFwh0Y36w">Strategic plan</a></li>
<li><a href="http://islamgulov.blogspot.com/2012/08/gsoc-experience.html">Ilgiz’s post about his GSoC experience</a></li>
</ul>
<h3 id="why-should-i-participate--what-do-i-get-out-of-participating-in-gsoc">Why should I participate / What do I get out of participating in GSoC?</h3>
<p>Before looking at our call for proposals, let’s have a look at why you might
want to participate in Google Summer of Code program.</p>
<p>For a moment, let’s forget about the money and have a look at a couple of other
reasons why participating in GSoC is great for you:</p>
<ul>
<li>Instead of spending your summer flipping burgers (or similar) you can spend
it flipping bits (fun!)</li>
<li>You will learn how open source projects and communities work</li>
<li>You will get experience with working in distributed / remote teams</li>
<li>You will get experience with working across different timezones</li>
<li>You will make new friends in the open source community</li>
<li>It’s a great thing for your C.V. You will be able to show potential
employers some concrete things you have worked on.</li>
</ul>
<h3 id="libcloud-google-summer-of-code-2014-call-for-proposals">Libcloud Google Summer of Code 2014 Call For Proposals</h3>
<p>This is the part where you, dear students, come in. We would like to invite
all the students who are interested in participating to start thinking about
the things they could work on and start reaching out to the community.</p>
<p>It doesn’t matter whether you are already using or contributing to the
project; everyone is welcome (people who have contributed or are already
contributing to the project do have a slight advantage though, since we already
know what they are capable of)!</p>
<p>The only prerequisites are a good knowledge of Python, HTTP and REST APIs, and
basic familiarity with cloud services such as Amazon EC2.</p>
<p>As noted in the opening paragraph, we haven’t been accepted yet and the
student applications will open in about a month. The reason I’m posting this
already is that we want to give potential candidates plenty of time to get
familiar with the project and the community.</p>
<p>If you would like to participate, now is the right time to start <a href="http://s.apache.org/lcgsoc2014">exploring the
existing ideas</a> (you are also more than welcome to propose your own), start
thinking about the things you could work on, get familiar with the code base
and start reaching out to the community.</p>
<p>Other links you might find useful:</p>
<ul>
<li><a href="http://google-melange.appspot.com/gsoc/homepage/google/gsoc2014">Official GSoC 2014 page</a></li>
<li><a href="http://en.flossmanuals.net/GSoCStudentGuide/">GSoC student guide</a></li>
<li><a href="https://libcloud.apache.org/gsoc-2012.html">Libcloud GSoC 2012 page</a></li>
<li><a href="https://libcloud.apache.org/gsoc-2014.html">Libcloud GSoC 2014 page</a></li>
<li><a href="http://s.apache.org/lcgsoc2014">Libcloud GSoC project ideas</a></li>
</ul>
New Libcloud website is now livehttps://www.tomaz.me/2014/01/26/new-libcloud-website-is-now-live.html2014-01-26T00:00:00+01:00Tomaz Muraus<h2 id="new-libcloud-website-is-now-live"><a href="/2014/01/26/new-libcloud-website-is-now-live.html">New Libcloud website is now live</a></h2>
<p>I’m happy to announce that a new Libcloud website is now live at
<a href="https://libcloud.apache.org">libcloud.apache.org</a>.</p>
<p>Design and layout wise, the previous website hadn’t really changed since 2009,
so a makeover was long overdue.</p>
<div class="imginline">
<a href="https://libcloud.apache.org" target="_blank">
<img src="/images/2014-01-26-new-libcloud-website-is-now-live/libcloud_website.png" class="inline" />
</a>
<span class="image-caption">New website.</span>
</div>
<p>The new website includes many new features and improvements. One of the more
important ones is a new and fully responsive design which means that
the content can now also more easily be consumed on devices with smaller
resolutions such as mobile phones and tablets.</p>
<p>On top of that the new website is now powered by Jekyll (same as my blog) which
makes adding content and many other things easier.</p>
<p>Without further ado, I encourage you to go <a href="https://libcloud.apache.org">check out the new website</a> and
read the <a href="https://libcloud.apache.org/blog/2014/01/23/welcome-to-the-new-website.html">announcement blog post</a>.</p>
Say hello to Živahttps://www.tomaz.me/2014/01/24/say-hello-to-ziva.html2014-01-24T00:00:00+01:00Tomaz Muraus<h2 id="say-hello-to-živa"><a href="/2014/01/24/say-hello-to-ziva.html">Say hello to Živa</a></h2>
<p>This week I made a big decision. I finally decided to get a dog. In this
blog post I’m going to introduce you to my new 24 kg / 52 lbs fluffy
meatball of a friend called Živa.</p>
<div class="imginline">
<a href="/images/2014-01-24-say-hello-to-ziva/IMG_20140120_125424.jpg" class="fancybox" rel="dog">
<img src="/images/2014-01-24-say-hello-to-ziva/IMG_20140120_125424_thumb.jpg" class="inline" />
</a>
<span class="image-caption">Živa on the way to her new home.</span>
</div>
<h3 id="why-get-a-dog-in-the-first-place">Why get a dog in the first place?</h3>
<p>First I need to say that I do like cats, but I was always more of a dog person.
I always wanted my own dog, but haven’t really had a chance before.</p>
<p>When I still lived at my parents’ place, I couldn’t do it because we lived in an
apartment and my parents didn’t allow it.</p>
<p>Later on, when I moved to the US, I was thinking of getting a dog again, but
after some more thought I decided it would be irresponsible of me to do it. I
did have a nice apartment with a patio, but the problem is that dogs need a lot
of attention and I wasn’t really planning on staying in the US for the long
term. I was working a lot, and leaving the dog at home by itself for half a day
or more just seemed irresponsible to me.</p>
<p>On top of that, finding an apartment and a landlord in San Francisco that allow
larger dogs is quite hard, which makes moving very painful.</p>
<p>Another opportunity presented itself recently when I moved back to Slovenia.
I decided that I want to stay in Europe for the foreseeable future and work
from home. On top of that, my landlord here in Ljubljana has no problem with me
having a dog inside the apartment.</p>
<p>Working from home means I can dedicate enough attention and love to the dog and
that is also one of the main reasons why I decided to get it.</p>
<h3 id="adopting-a-dog-from-a-shelter">Adopting a dog from a shelter</h3>
<p>A lot of people today buy puppies from local dog breeders. The problem is that
all puppies are cute and a lot of people don’t realize that dogs are a big
responsibility, especially once they grow up. Usually this leads to a lot of
dogs being dumped when they are no longer puppies.</p>
<p>I’m not like that, so I decided to adopt a slightly older dog. On top of that,
I believe that dogs living in shelters need more help, so I decided to adopt
one from a <a href="/2014/01/24/say-hello-to-ziva.html">local shelter</a>.</p>
<p>Before visiting the shelter in person, I visited its website and immediately
fell in love with a lovely young female called <a href="http://zavetisce-ljubljana.si/ziva-1301032106122013/default.aspx">Živa</a>. I know that dogs are
very similar to people and that looks aren’t everything, so I didn’t keep my
hopes too high.</p>
<p>Luckily, when I visited the shelter in person it turned out that she’s one of
the friendliest dogs there. Unlike other dogs, she was very friendly, didn’t
bark and simply stuck her nose through the fence to greet me.</p>
<p>After getting to know her a little better, I came back on Monday and decided
to adopt her.</p>
<h3 id="živa">Živa</h3>
<p>Without further ado, lets get to know this lovely (not so) little creature.</p>
<p>Živa is a mixed breed (she looks very much like a German shepherd), a roughly
10-month-old female which weighs around 24 kg / 52 lbs. She was found on the
street and arrived at the shelter about two months ago. Her name is Slovenian
and means lively / playful / cheery. She got this name at the shelter and
I decided to keep it since it describes her temperament really well.</p>
<div class="imginline">
<a href="/images/2014-01-24-say-hello-to-ziva/IMG_20140121_112244.jpg" class="fancybox" rel="dog">
<img src="/images/2014-01-24-say-hello-to-ziva/IMG_20140121_112244_thumb.jpg" class="inline" />
</a>
<span class="image-caption">Resting.</span>
</div>
<p>She is a very friendly, quiet and kind dog, but she also has almost unlimited
energy and needs a ton of play time.</p>
<div class="imginline">
<a href="/images/2014-01-24-say-hello-to-ziva/IMG_20140121_115810.jpg" class="fancybox" rel="dog">
<img src="/images/2014-01-24-say-hello-to-ziva/IMG_20140121_115810_thumb.jpg" class="inline" />
</a>
<span class="image-caption">Tummy rub time.</span>
</div>
<div class="imginline">
<a href="/images/2014-01-24-say-hello-to-ziva/IMG_20140121_120532.jpg" class="fancybox" rel="dog">
<img src="/images/2014-01-24-say-hello-to-ziva/IMG_20140121_120532_thumb.jpg" class="inline" />
</a>
<span class="image-caption">Tummy rub time.</span>
</div>
<p>Even though she is very friendly, she is currently still afraid of a lot of
things, including moving vehicles, bridges, stairs and so on. Interestingly
enough, she is not afraid of other people and dogs. She loves to jump on
and kiss people :-)</p>
<div class="imginline">
<a href="/images/2014-01-24-say-hello-to-ziva/vlcsnap-2014-01-22-20h28m23s241.png" class="fancybox" rel="dog">
<img src="/images/2014-01-24-say-hello-to-ziva/vlcsnap-2014-01-22-20h28m23s241_thumb.png" class="inline" />
</a>
<span class="image-caption">Dat look.</span>
</div>
<p>I don’t know her history, but being afraid of so many things probably means
that her previous owner didn’t socialize her enough.</p>
<h3 id="first-day-with-a-dog">First day with a dog</h3>
<p>The first day was very happy, but also very stressful for both of us. Neither
of us got a lot of sleep (she was constantly checking on me and I was
checking on her) and there were some potty accidents. The accidents were
mostly my fault because I didn’t recognize that she needed to go to the toilet
(I thought she just wanted to play).</p>
<div class="imginline">
<a href="/images/2014-01-24-say-hello-to-ziva/IMG_20140122_093219.jpg" class="fancybox" rel="dog">
<img src="/images/2014-01-24-say-hello-to-ziva/IMG_20140122_093219_thumb.jpg" class="inline" />
</a>
<span class="image-caption">Deer antler is nom nom.</span>
</div>
<div class="imginline">
<a href="/images/2014-01-24-say-hello-to-ziva/IMG_20140122_093227.jpg" class="fancybox" rel="dog">
<img src="/images/2014-01-24-say-hello-to-ziva/IMG_20140122_093227_thumb.jpg" class="inline" />
</a>
<span class="image-caption">Mmm, bone marrow...</span>
</div>
<h3 id="plans-for-the-future">Plans for the future</h3>
<p>The first couple of days were great, but there are still tons of things to do
in the future. She needs more socializing, I need to train her not to pull,
and she needs to be trained not to be so afraid of so many things.</p>
<p>As far as the pulling goes, I tried some manual approaches without much
success. I’ve ordered an anti-pull harness which goes around her front legs and
doesn’t hurt her. Hopefully that will help.</p>
<p>On top of that, I also plan to take her to dog school. The dog school is
actually more for me than for her (this is my first dog).</p>
Migrating from Zerigo to Rackspace Cloud DNS using Libcloudhttps://www.tomaz.me/2014/01/18/migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud.html2014-01-18T00:00:00+01:00Tomaz Muraus<h2 id="migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud"><a href="/2014/01/18/migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud.html">Migrating from Zerigo to Rackspace Cloud DNS using Libcloud</a></h2>
<p>In this blog post I’m going to describe how to migrate from <a href="http://www.zerigo.com/managed-dns">Zerigo DNS</a>
to <a href="http://www.rackspace.com/cloud/dns/">Rackspace Cloud DNS</a> using an ~80-line Python script which utilizes
<a href="https://libcloud.apache.org/">Libcloud</a>.</p>
<div class="imginline">
<a href="http://libcloud.apache.org" target="_blank">
<img src="/images/2013-12-11-libcloud-update-key-pair-management-methods-are-now-part-of-the-base-api/libcloud.png" class="inline" /></a>
</div>
<h3 id="background-and-motivation">Background and Motivation</h3>
<p>In September of the last year, I wrote how to <a href="/2013/09/07/exporting-libcloud-dns-zone-to-bind-zone-file-format-and-migrating-between-dns-providers.html">export a Libcloud zone to the
BIND zone format</a> and use the BIND zone file to migrate between DNS
providers.</p>
<p>At that time, my motivation for migrating away from Zerigo was mostly fueled
by a very unreliable service, which was a consequence of DDoS attacks and a
less than ideal service architecture.</p>
<p>I had a paid Zerigo plan, so back then I only migrated the most important
domains to a different provider. Not long after I had done this, Zerigo
announced that they had <a href="http://www.zerigo.com/article/akamai-dns-partnership">partnered with Akamai</a> and that, going forward,
they would outsource running the DNS infrastructure to Akamai and, as such,
the service should be much more stable and reliable.</p>
<p>I thought great, I won’t need to migrate the rest of the domains away. But an
unpleasant surprise came earlier this month, when Zerigo announced pricing
changes (see <a href="http://www.zerigo.com/news/notice-zerigo-dns-change-of-plans">1</a>, <a href="http://www.zerigo.com/news/zerigo-price-increase-facts">2</a>, <a href="http://www.zerigo.com/news/on-grandfathering-pre-paid-dns-accounts">3</a> & <a href="https://gist.github.com/Kami/5199908f006383dbfdcc">4</a>).</p>
<p>Previously, I paid <strong>19$ per year</strong>, but with the new plan which
matches my current one, I would need to pay <strong>25$ per month</strong>. That’s with
an existing customer loyalty discount; new customers need to pay
<strong>38$ per month</strong> (what a great deal: instead of paying 24 times more,
I need to pay <strong>just</strong> 15 times more!). Yes, you read that correctly,
that’s more than an order of magnitude more per year than I used to pay
before.</p>
<p>I honestly don’t mind paying for great software and services, and I wouldn’t
mind paying a little more if the service improved, but that kind of price
increase is simply too much. That is especially true because all of the ~15
domains that I still have at Zerigo are used to host non-profit and community
websites, and paying 25$ per month for that is simply too much.</p>
<h3 id="why-rackspace-cloud-dns">Why Rackspace Cloud DNS?</h3>
<p><em>Disclaimer: I used to work at Rackspace, but I don’t work there anymore and
I’m not affiliated with them in any way.</em></p>
<p>Before diving in, let’s have a look at why you might want to use Rackspace
Cloud DNS.</p>
<p>The main reason for me to migrate to Rackspace is that they have a decent
API, they are supported in Libcloud and, best of all, the service is totally
free for existing cloud servers customers. On top of that, the service is
supposed to use Anycast.</p>
<p>All of that made it a good fit for hosting my non-profit domains there.</p>
<p>I also need to add that I haven’t used the service much before, so I can’t
really speak to its reliability at this point. Only time and monitoring will
tell how reliable the service really is.</p>
<h3 id="migrating-from-zerigo-dns-to-rackspace-cloud-dns-using-libcloud">Migrating from Zerigo DNS to Rackspace Cloud DNS using Libcloud</h3>
<p>Instead of using Libcloud’s export to BIND zone file functionality, this script
works by talking directly to both of the provider APIs.</p>
<p>The reason for that is that this approach is more robust and makes
performing partial migrations and synchronizations easier. On top of that, it
also works with other providers which don’t support importing a BIND zone file.</p>
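<p>The synchronization idea can be sketched in a few lines of plain Python (no
Libcloud required): hash each record’s identifying fields on both sides, then
only create the records whose hash is missing at the destination. The record
format below is made up for illustration; the full script uses the same trick
via its <code class="language-plaintext highlighter-rouge">get_record_hash</code> helper.</p>

```python
import hashlib

def record_hash(name, rtype, data):
    # Hash the fields which identify a record so records from two different
    # providers can be compared, regardless of provider-assigned IDs.
    key = "%s-%s-%s" % (name, rtype, data)
    return hashlib.md5(key.encode("utf-8")).hexdigest()

# Made-up records as (name, type, data) tuples
source = [("www", "A", "1.2.3.4"),
          ("mail", "MX", "10 mx.example.com")]
destination = [("www", "A", "1.2.3.4")]

existing = {record_hash(*r) for r in destination}
to_create = [r for r in source if record_hash(*r) not in existing]
print(to_create)  # [('mail', 'MX', '10 mx.example.com')]
```

<p>Because the comparison is content based, re-running the migration is
effectively idempotent: records which already exist at the destination are
simply skipped, which is what makes partial migrations and later
synchronization runs cheap.</p>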
<p>It’s also important to note that the script relies on some Libcloud fixes which
are currently only available in trunk. As such, you should use <code class="language-plaintext highlighter-rouge">pip</code> to
install the latest version from Git inside a virtual environment:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">pip <span class="nb">install</span> <span class="nt">-e</span> git+https://github.com/apache/libcloud.git@trunk#egg<span class="o">=</span>libcloud</code></pre></figure>
<p>After you have done this, you can use the script below to migrate all of
your zones from Zerigo to Rackspace:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">hashlib</span>
<span class="kn">from</span> <span class="nn">libcloud.dns.types</span> <span class="kn">import</span> <span class="n">Provider</span><span class="p">,</span> <span class="n">RecordType</span>
<span class="kn">from</span> <span class="nn">libcloud.dns.providers</span> <span class="kn">import</span> <span class="n">get_driver</span>
<span class="n">ZERIGO_USERNAME</span> <span class="o">=</span> <span class="s">''</span>
<span class="n">ZERIGO_API_KEY</span> <span class="o">=</span> <span class="s">''</span>
<span class="n">RACKSPACE_USERNAME</span> <span class="o">=</span> <span class="s">''</span>
<span class="n">RACKSPACE_API_KEY</span> <span class="o">=</span> <span class="s">''</span>
<span class="n">CONTACT_EMAIL</span> <span class="o">=</span> <span class="s">''</span> <span class="c1"># Rackspace requires a valid email for every domain
</span>
<span class="n">ZONE_TTL</span> <span class="o">=</span> <span class="mi">30</span> <span class="o">*</span> <span class="mi">60</span> <span class="c1"># Default zone TTL (in seconds) which should be used
</span><span class="n">MIN_TTL</span> <span class="o">=</span> <span class="mi">300</span> <span class="c1"># Minimum TTL supported by the target provider
</span><span class="n">IGNORED_RECORD_TYPES</span> <span class="o">=</span> <span class="p">[</span><span class="n">RecordType</span><span class="p">.</span><span class="n">NS</span><span class="p">,</span> <span class="n">RecordType</span><span class="p">.</span><span class="n">PTR</span><span class="p">]</span>
<span class="n">source_cls</span> <span class="o">=</span> <span class="n">get_driver</span><span class="p">(</span><span class="n">Provider</span><span class="p">.</span><span class="n">ZERIGO</span><span class="p">)(</span><span class="n">ZERIGO_USERNAME</span><span class="p">,</span> <span class="n">ZERIGO_API_KEY</span><span class="p">)</span>
<span class="n">destination_cls</span> <span class="o">=</span> <span class="n">get_driver</span><span class="p">(</span><span class="n">Provider</span><span class="p">.</span><span class="n">RACKSPACE</span><span class="p">)(</span><span class="n">RACKSPACE_USERNAME</span><span class="p">,</span>
<span class="n">RACKSPACE_API_KEY</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_record_hash</span><span class="p">(</span><span class="n">record</span><span class="p">):</span>
<span class="s">"""
Return a hash for the provided record. This is used to determine if the
record already exists.
"""</span>
<span class="n">record_hash</span> <span class="o">=</span> <span class="n">hashlib</span><span class="p">.</span><span class="n">md5</span><span class="p">(</span><span class="s">'%s-%s-%s'</span> <span class="o">%</span> <span class="p">(</span><span class="n">record</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">record</span><span class="p">.</span><span class="nb">type</span><span class="p">,</span>
<span class="n">record</span><span class="p">.</span><span class="n">data</span><span class="p">)).</span><span class="n">hexdigest</span><span class="p">()</span>
<span class="k">return</span> <span class="n">record_hash</span>
<span class="n">source_zones</span> <span class="o">=</span> <span class="n">source_cls</span><span class="p">.</span><span class="n">list_zones</span><span class="p">()</span>
<span class="n">destination_zones</span> <span class="o">=</span> <span class="n">destination_cls</span><span class="p">.</span><span class="n">list_zones</span><span class="p">()</span>
<span class="n">destination_domains</span> <span class="o">=</span> <span class="p">[</span><span class="n">zone</span><span class="p">.</span><span class="n">domain</span> <span class="k">for</span> <span class="n">zone</span> <span class="ow">in</span> <span class="n">destination_zones</span><span class="p">]</span>
<span class="c1"># 1. Create zones
</span><span class="k">for</span> <span class="n">zone</span> <span class="ow">in</span> <span class="n">source_zones</span><span class="p">:</span>
<span class="k">if</span> <span class="n">zone</span><span class="p">.</span><span class="n">domain</span> <span class="ow">in</span> <span class="n">destination_domains</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Zone "%s" already exists, skipping...'</span> <span class="o">%</span> <span class="p">(</span><span class="n">zone</span><span class="p">.</span><span class="n">domain</span><span class="p">))</span>
<span class="k">continue</span>
<span class="n">extra</span> <span class="o">=</span> <span class="p">{</span><span class="s">'email'</span><span class="p">:</span> <span class="n">CONTACT_EMAIL</span><span class="p">}</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Creating zone: %s'</span> <span class="o">%</span> <span class="p">(</span><span class="n">zone</span><span class="p">.</span><span class="n">domain</span><span class="p">))</span>
<span class="n">destination_cls</span><span class="p">.</span><span class="n">create_zone</span><span class="p">(</span><span class="n">domain</span><span class="o">=</span><span class="n">zone</span><span class="p">.</span><span class="n">domain</span><span class="p">,</span> <span class="n">ttl</span><span class="o">=</span><span class="n">ZONE_TTL</span><span class="p">,</span>
<span class="n">extra</span><span class="o">=</span><span class="n">extra</span><span class="p">)</span>
<span class="n">destination_zones</span> <span class="o">=</span> <span class="n">destination_cls</span><span class="p">.</span><span class="n">list_zones</span><span class="p">()</span>
<span class="n">supported_record_type</span> <span class="o">=</span> <span class="n">destination_cls</span><span class="p">.</span><span class="n">list_record_types</span><span class="p">()</span>
<span class="c1"># 2. Create records
</span><span class="k">for</span> <span class="n">source_zone</span> <span class="ow">in</span> <span class="n">source_zones</span><span class="p">:</span>
<span class="n">destination_zone</span> <span class="o">=</span> <span class="p">[</span><span class="n">zone</span> <span class="k">for</span> <span class="n">zone</span> <span class="ow">in</span> <span class="n">destination_zones</span>
<span class="k">if</span> <span class="n">zone</span><span class="p">.</span><span class="n">domain</span> <span class="o">==</span> <span class="n">source_zone</span><span class="p">.</span><span class="n">domain</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="n">source_records</span> <span class="o">=</span> <span class="n">source_zone</span><span class="p">.</span><span class="n">list_records</span><span class="p">()</span>
<span class="n">destination_records</span> <span class="o">=</span> <span class="n">destination_zone</span><span class="p">.</span><span class="n">list_records</span><span class="p">()</span>
<span class="k">for</span> <span class="n">source_record</span> <span class="ow">in</span> <span class="n">source_records</span><span class="p">:</span>
<span class="c1"># Rackspace doesn't have a special SPF record type
</span> <span class="k">if</span> <span class="n">source_record</span><span class="p">.</span><span class="nb">type</span> <span class="o">==</span> <span class="n">RecordType</span><span class="p">.</span><span class="n">SPF</span><span class="p">:</span>
<span class="n">source_record</span><span class="p">.</span><span class="nb">type</span> <span class="o">=</span> <span class="n">RecordType</span><span class="p">.</span><span class="n">TXT</span>
<span class="n">record_hash</span> <span class="o">=</span> <span class="n">get_record_hash</span><span class="p">(</span><span class="n">source_record</span><span class="p">)</span>
<span class="n">destination_record_hashes</span> <span class="o">=</span> <span class="p">[</span><span class="n">get_record_hash</span><span class="p">(</span><span class="n">record</span><span class="p">)</span> <span class="k">for</span> <span class="n">record</span>
<span class="ow">in</span> <span class="n">destination_records</span><span class="p">]</span>
<span class="k">if</span> <span class="n">source_record</span><span class="p">.</span><span class="n">name</span><span class="p">:</span>
<span class="n">fqdn</span> <span class="o">=</span> <span class="s">'%s.%s'</span> <span class="o">%</span> <span class="p">(</span><span class="n">source_record</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">source_zone</span><span class="p">.</span><span class="n">domain</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">fqdn</span> <span class="o">=</span> <span class="n">source_zone</span><span class="p">.</span><span class="n">domain</span>
<span class="k">if</span> <span class="n">record_hash</span> <span class="ow">in</span> <span class="n">destination_record_hashes</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Record "%s" already exists, skipping...'</span> <span class="o">%</span> <span class="p">(</span><span class="n">fqdn</span><span class="p">))</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">source_record</span><span class="p">.</span><span class="nb">type</span> <span class="ow">in</span> <span class="n">IGNORED_RECORD_TYPES</span><span class="p">:</span>
<span class="k">print</span><span class="p">((</span><span class="s">'Encountered ignored record type (type=%s,name=%s) '</span>
<span class="s">'skipping...'</span><span class="p">)</span> <span class="o">%</span> <span class="p">(</span><span class="n">source_record</span><span class="p">.</span><span class="nb">type</span><span class="p">,</span> <span class="n">fqdn</span><span class="p">))</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">source_record</span><span class="p">.</span><span class="nb">type</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">supported_record_type</span><span class="p">:</span>
<span class="k">print</span><span class="p">((</span><span class="s">'Encountered unsupported record type (type=%s,name=%s)'</span>
<span class="s">', skipping...'</span><span class="p">)</span> <span class="o">%</span> <span class="p">(</span><span class="n">source_record</span><span class="p">.</span><span class="nb">type</span><span class="p">,</span> <span class="n">fqdn</span><span class="p">))</span>
<span class="k">continue</span>
<span class="n">extra</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">ttl</span> <span class="o">=</span> <span class="n">source_record</span><span class="p">.</span><span class="n">extra</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'ttl'</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>
<span class="n">priority</span> <span class="o">=</span> <span class="n">source_record</span><span class="p">.</span><span class="n">extra</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'priority'</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>
<span class="k">if</span> <span class="n">ttl</span><span class="p">:</span>
<span class="k">if</span> <span class="n">ttl</span> <span class="o">&lt;</span> <span class="n">MIN_TTL</span><span class="p">:</span>
<span class="n">ttl</span> <span class="o">=</span> <span class="n">MIN_TTL</span>
<span class="n">extra</span><span class="p">[</span><span class="s">'ttl'</span><span class="p">]</span> <span class="o">=</span> <span class="n">ttl</span>
<span class="k">if</span> <span class="n">priority</span><span class="p">:</span>
<span class="n">extra</span><span class="p">[</span><span class="s">'priority'</span><span class="p">]</span> <span class="o">=</span> <span class="n">priority</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">source_record</span><span class="p">.</span><span class="n">name</span>
<span class="nb">type</span> <span class="o">=</span> <span class="n">source_record</span><span class="p">.</span><span class="nb">type</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">source_record</span><span class="p">.</span><span class="n">data</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Creating a record: %s'</span> <span class="o">%</span> <span class="p">(</span><span class="n">fqdn</span><span class="p">))</span>
<span class="n">destination_zone</span><span class="p">.</span><span class="n">create_record</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">type</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">data</span><span class="p">,</span>
<span class="n">extra</span><span class="o">=</span><span class="n">extra</span><span class="p">)</span></code></pre></figure>
<p>Before proceeding, it’s worth knowing that there are some differences between
the providers and some limitations you should be aware of:</p>
<ul>
<li>Zerigo supports more record types. If you use more advanced record types
which are not supported by Rackspace, then Rackspace might not be a good
fit for you.</li>
<li>Rackspace only allows you to create <code class="language-plaintext highlighter-rouge">PTR</code> records for resources (cloud
servers & load balancers) which are hosted in their data centers.</li>
<li>Rackspace doesn’t support <code class="language-plaintext highlighter-rouge">SPF</code> record type. This is not a big deal since
this record type has been deprecated anyway and <code class="language-plaintext highlighter-rouge">TXT</code> can be used instead.
The script transparently handles the remapping of <code class="language-plaintext highlighter-rouge">SPF</code> to <code class="language-plaintext highlighter-rouge">TXT</code> for you.</li>
<li>The minimum TTL supported by Zerigo is <code class="language-plaintext highlighter-rouge">180</code> seconds, while the minimum
TTL supported by Rackspace is <code class="language-plaintext highlighter-rouge">300</code> seconds. If during the migration the script
encounters a TTL smaller than 300 seconds, it simply uses the smallest
possible TTL, which is 300 seconds.</li>
</ul>
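<p>The <code class="language-plaintext highlighter-rouge">SPF</code>-to-<code class="language-plaintext highlighter-rouge">TXT</code> remapping and TTL clamping described above can be expressed as a small standalone helper. This is just a sketch with hypothetical names which mirrors what the script does inline:</p>

```python
MIN_TTL = 300  # minimum TTL supported by Rackspace (in seconds)


def normalize_record(record_type, ttl):
    """Apply the provider differences described above to a single record."""
    # Rackspace has no SPF record type; SPF is deprecated anyway, so map
    # it to TXT
    if record_type == 'SPF':
        record_type = 'TXT'

    # Clamp TTLs below Rackspace's 300 second minimum (Zerigo allows 180)
    if ttl is not None and ttl < MIN_TTL:
        ttl = MIN_TTL

    return (record_type, ttl)
```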
<p>To use it, simply plug in your API credentials and run it:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">python migrate_dns_providers.py</code></pre></figure>
<div class="imginline">
<a href="/images/2014-01-18-migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud/zerigo.png" class="fancybox"><img src="/images/2014-01-18-migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud/zerigo.png" class="inline" /></a>
<span class="image-caption">Zerigo control panel.</span>
</div>
<p>If the script for some reason fails half-way through (bad connectivity, API
issues, etc.), it’s safe to run it again since all the operations are
idempotent.</p>
<div class="imginline">
<a href="/images/2014-01-18-migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud/rax.png" class="fancybox"><img src="/images/2014-01-18-migrating-from-zerigo-to-rackspace-cloud-dns-using-libcloud/rax.png" class="inline" /></a>
<span class="image-caption">Rackspace Cloud DNS control panel after the migration.</span>
</div>
<p>After you have run the script, you should check if everything looks OK and if
it does, you can go ahead and change the DNS records for your domains to point
to the Rackspace Cloud DNS servers (<code class="language-plaintext highlighter-rouge">dns1.stabletransit.com</code> &
<code class="language-plaintext highlighter-rouge">dns2.stabletransit.com</code>).</p>
Programmatically detecting type / platform of the Amazon Machine Imageshttps://www.tomaz.me/2014/01/12/programatically-detecting-type-platform-of-the-amazon-machine-images.html2014-01-12T00:00:00+01:00Tomaz Muraus<h2 id="programatically-detecting-type--platform-of-the-amazon-machine-images"><a href="/2014/01/12/programatically-detecting-type-platform-of-the-amazon-machine-images.html">Programmatically detecting type / platform of the Amazon Machine Images</a></h2>
<p>Yesterday I was talking with one of the Libcloud users on our IRC channel. The
user was trying to figure out if there is a programmatic way to detect the type of
the image used (also called a platform) by an EC2 instance (e.g. Linux, RHEL,
Windows, Windows with SQL server, etc.).</p>
<p>This information is important because the EC2 instance pricing depends on the
type of the image used (more on that below).</p>
<p>I was already looking into this in the past while trying to extend pricing
information which is available in Libcloud. I didn’t have much luck back then,
but I decided to look into it again and dig deeper this time.</p>
<p>After a lot of research and poking with the API, it turned out that there
still seems to be no reliable, programmatic way to determine this (if I
missed something, please let me know).</p>
<p>In this post I’m going to have a quick look at how EC2 instance pricing works
and at some of the less than ideal approaches which can be used to determine
the image type.</p>
<h3 id="how-ec2-instance-pricing-works">How EC2 instance pricing works</h3>
<p>First, let’s have a quick look at how EC2 instance pricing works.</p>
<p>Compared to a lot of other cloud providers, EC2 pricing is very complex and
depends on multiple factors:</p>
<ol>
<li>Region (us-east-1, us-west-1, eu-west-1, …)</li>
<li>Instance type (t1.micro, m1.small, m1.xlarge, …)</li>
<li>Image type (Linux, RHEL, SLES, Windows, Windows with SQL Server standard, …)</li>
<li>Whether the instance is EBS-optimized</li>
<li>Whether the instance is on-demand, reserved or spot</li>
<li>Volume discounts</li>
<li>Data transfer</li>
<li>Other resources associated with this instance (e.g. EBS volumes)</li>
</ol>
<p>If you want to calculate accurate instance pricing, you need to
take all of the factors mentioned above into account.</p>
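<p>To make the combination of factors concrete, here is a toy sketch of an on-demand price lookup keyed on the first three factors. The rates below are made up for illustration; the real numbers come from Amazon’s pricing data:</p>

```python
# Hypothetical hourly rates keyed by (region, instance type, image type).
# The numbers below are made up for illustration only.
HOURLY_RATES = {
    ('us-east-1', 'm1.small', 'linux'): 0.060,
    ('us-east-1', 'm1.small', 'windows'): 0.091,
}


def monthly_on_demand_cost(region, instance_type, image_type, hours=730):
    """Rough monthly cost for a non-EBS-optimized on-demand instance."""
    return round(HOURLY_RATES[(region, instance_type, image_type)] * hours, 2)
```

Note how the image type alone changes the price, which is why being unable to detect it programmatically is a real problem.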
<h3 id="amazon-ec2-pricing-information">Amazon EC2 pricing information</h3>
<p>Amazon offers all the pricing information in a human-readable format on their
<a href="http://aws.amazon.com/ec2/pricing/">pricing page</a>, but they don’t offer a documented API which could be used
to consume this information programmatically.</p>
<p>Luckily, the pricing page reads JSON files (e.g.
http://aws.amazon.com/ec2/pricing/json/linux-od.json) which can also be
consumed programmatically.</p>
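<p>For example, the on-demand Linux prices can be flattened out of that file with a few lines of Python. The structure shown here is an assumption based on what the file looked like at the time of writing; since it’s undocumented, it can change at any time:</p>

```python
# A trimmed-down example of the (undocumented) linux-od.json structure
sample = {
    'config': {
        'regions': [
            {
                'region': 'us-east-1',
                'instanceTypes': [
                    {
                        'sizes': [
                            {
                                'size': 'm1.small',
                                'valueColumns': [
                                    {'name': 'linux',
                                     'prices': {'USD': '0.060'}}
                                ]
                            }
                        ]
                    }
                ]
            }
        ]
    }
}


def extract_on_demand_prices(data):
    """Flatten the nested pricing JSON into {(region, size): usd_price}."""
    prices = {}
    for region in data['config']['regions']:
        for instance_type in region['instanceTypes']:
            for size in instance_type['sizes']:
                for column in size['valueColumns']:
                    prices[(region['region'], size['size'])] = \
                        float(column['prices']['USD'])
    return prices
```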
<div class="imginline">
<img src="/images/2014-01-12-programatically-detecting-type-platform-of-the-amazon-machine-images/bad_time.jpg" class="inline" />
</div>
<p>Those JSON files are undocumented, and the bad thing about any undocumented
feature is that it could be changed or removed at any time without any prior
notice.</p>
<p>Sadly, that’s the best we’ve got so far, so we need to stick with it for now.</p>
<h3 id="programatically-detecting-the-image-type--platform">Programatically detecting the image type / platform</h3>
<p>I’ve spent a bunch of time researching and poking with the API and the web
interface, but I had no luck with finding an API method which would return that
information.</p>
<p>The <a href="http://docs.aws.amazon.com/AWSEC2/latest/APIReference/ApiReference-query-DescribeImages.html">DescribeImages</a> API method does return a <code class="language-plaintext highlighter-rouge">platform</code> attribute, but only for
Windows based images. This means you still need to use a different approach
to detect RHEL, SLES and the different types of Windows images.</p>
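<p>In other words, the best the official API gives you is a boolean “is this Windows” signal. A sketch of that check, assuming the image attributes are available as a plain dict (as they roughly are in Libcloud’s <code class="language-plaintext highlighter-rouge">extra</code> attribute):</p>

```python
def is_windows_image(image_attributes):
    """
    DescribeImages only sets the "platform" attribute for Windows based
    images; its absence tells you nothing about RHEL vs. SLES vs. plain
    Linux, which is exactly the limitation described above.
    """
    return image_attributes.get('platform') == 'windows'
```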
<p>The EC2 API has some undocumented features, such as the <code class="language-plaintext highlighter-rouge">max-instances</code>,
<code class="language-plaintext highlighter-rouge">max-elastic-ips</code> and <code class="language-plaintext highlighter-rouge">vpc-max-elastic-ips</code> value for the <code class="language-plaintext highlighter-rouge">AttributeName</code>
filter used by the <a href="http://docs.aws.amazon.com/AWSEC2/latest/APIReference/ApiReference-query-DescribeAccountAttributes.html">DescribeAccountAttributes</a> API method. Because of that,
I also tried a bunch of undocumented things and filter values, but I had no
luck with retrieving a <code class="language-plaintext highlighter-rouge">platform</code> attribute for all the images or retrieving
only RHEL-based images.</p>
<p>The interesting thing is that the web interface does show an image type /
platform, but it seems to use a private method to obtain this information.</p>
<div class="imginline">
<a href="/images/2014-01-12-programatically-detecting-type-platform-of-the-amazon-machine-images/web_interface_platform.png" class="fancybox">
<img src="/images/2014-01-12-programatically-detecting-type-platform-of-the-amazon-machine-images/web_interface_platform_thumb.png" class="inline" />
</a>
<span class="image-caption">Image platform as displayed in the web
interface.</span>
</div>
<div class="imginline">
<a href="/images/2014-01-12-programatically-detecting-type-platform-of-the-amazon-machine-images/inspector_network_request.png" class="fancybox">
<img src="/images/2014-01-12-programatically-detecting-type-platform-of-the-amazon-machine-images/inspector_network_request_thumb.png" class="inline" />
</a>
<span class="image-caption">Web interface calls a private API method which
returns information which is not available via the public one.</span>
</div>
<h3 id="1-inferring-platform-from-the-image-details">1. Inferring platform from the image details</h3>
<p>Each image has a name, a description and a bunch of other attributes
associated with it.</p>
<p>This information can be used to infer the platform from it or to build a
static list which maps image id to a platform.</p>
<p>Inferring the platform from the name and description should work reasonably
well for the standard images, but it breaks down for private or copied
images with custom names and descriptions.</p>
<p>On the other hand, the problem with the static list approach is that it doesn’t
scale and it’s time-consuming and error-prone to keep it up to date.</p>
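<p>A rough sketch of the name-based heuristic. The keyword table below is illustrative rather than exhaustive, and as described above it will misclassify custom-named private or copied images:</p>

```python
# Keywords mapped to platforms; order matters (first match wins)
PLATFORM_KEYWORDS = [
    ('windows', 'windows'),
    ('red hat', 'rhel'),
    ('rhel', 'rhel'),
    ('suse', 'sles'),
    ('sles', 'sles'),
]


def infer_platform(name, description=''):
    """Guess the image platform from its name and description."""
    text = ('%s %s' % (name, description)).lower()

    for keyword, platform in PLATFORM_KEYWORDS:
        if keyword in text:
            return platform

    # Default assumption: a generic Linux image
    return 'linux'
```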
<h3 id="2-scrapping-the-cloud-market-website">2. Scraping The Cloud Market website</h3>
<p><a href="http://thecloudmarket.com/">The Cloud Market</a> website provides details (including platform / image
type) for every publicly available Amazon Machine Image.</p>
<p>This approach basically just builds on the static list approach, but instead of
putting the burden of keeping this list up to date on you, it puts it on the
Cloud Market team.</p>
<p>The Cloud Market website provides an API, but you can only retrieve details for
the images which you own. This means that to retrieve the platform for
a particular image, you need to scrape the website, which again is very hacky
and far from ideal.</p>
<h3 id="conclusion">Conclusion</h3>
<p>As you can see, all of the approaches I have described are hacky and far from
ideal, but sadly that’s the best we have so far.</p>
<p>Let’s just hope Amazon will get their act together and finally provide an
official API for this in the near future.</p>