Programmatically detecting type / platform of the Amazon Machine Images

Yesterday I was talking with one of the Libcloud users on our IRC channel. The user was trying to figure out if there is a programmatic way to detect the type of the image (also called a platform) used by an EC2 instance (e.g. Linux, RHEL, Windows, Windows with SQL Server, etc.).

This information is important because the EC2 instance pricing depends on the type of the image used (more on that below).

I had already looked into this in the past while trying to extend the pricing information which is available in Libcloud. I didn’t have much luck back then, but I decided to look into it again and dig deeper this time.

After a lot of research and poking at the API, it turned out that there still seems to be no reliable, programmatic way to determine the image type (if I missed something, please let me know).

In this post I’m going to have a quick look at how EC2 instance pricing works and at some of the less-than-ideal approaches which can be used to determine the image type.

How EC2 instance pricing works

First, let’s have a quick look at how the whole EC2 instance pricing works.

Compared to a lot of other cloud providers, EC2 pricing is very complex and depends on multiple factors:

  1. Region (us-east-1, us-west-1, eu-west-1, …)
  2. Instance type (t1.micro, m1.small, m1.xlarge, …)
  3. Image type (Linux, RHEL, SLES, Windows, Windows with SQL Server standard, …)
  4. Is the instance EBS optimized
  5. Is the instance on-demand, reserved or spot
  6. Volume discounts
  7. Data transfer
  8. Other resources associated with this instance (e.g. EBS volumes)

If you want to calculate accurate instance pricing information, you need to take into account all the factors mentioned above.

Amazon EC2 pricing information

Amazon offers all the pricing information in a human-readable format on their pricing page, but they don’t offer a documented API which could be used to consume this information programmatically.

Luckily, the pricing page reads JSON files (e.g. http://aws.amazon.com/ec2/pricing/json/linux-od.json) which can also be consumed programmatically.

Those JSON files are undocumented, and the problem with any undocumented feature is that it can be changed or removed at any time without prior notice.

Sadly, that’s the best we’ve got so far, so we need to stick with it for now.
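Because the files are undocumented, any code which consumes them should be defensive. Here is a minimal sketch using only the Python standard library. The function names are mine, and the check that the top-level value is a JSON object is an assumption, not something Amazon documents. The sketch deliberately doesn't touch anything inside the payload, because the layout could change at any time.

```python
import json
from urllib.request import urlopen

# URL taken from the pricing page; it is undocumented and may stop working.
PRICING_URL = "http://aws.amazon.com/ec2/pricing/json/linux-od.json"


def parse_pricing(body):
    """Decode a pricing file body (bytes). Return a dict, or None if the
    body is not valid JSON or is not a JSON object (assumed shape)."""
    try:
        data = json.loads(body.decode("utf-8"))
    except ValueError:
        # Covers both invalid JSON and invalid UTF-8.
        return None
    return data if isinstance(data, dict) else None


def fetch_pricing(url=PRICING_URL, timeout=10):
    """Download and decode one pricing file. Return None on any network
    or parsing failure instead of raising."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return parse_pricing(response.read())
    except OSError:
        # URLError and HTTPError are both subclasses of OSError.
        return None
```

Calling fetch_pricing() returns either the decoded data or None, so the caller can fall back to cached or bundled pricing data when the file moves or its format changes.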

Programmatically detecting the image type / platform

I’ve spent a bunch of time researching and poking at the API and the web interface, but I had no luck finding an API method which would return that information.

The DescribeImages API method does return a platform attribute, but only for Windows-based images. This means you still need to use a different approach to detect RHEL, SLES and the other types of Windows images (e.g. Windows with SQL Server).

The EC2 API has some undocumented features, such as the max-instances, max-elastic-ips and vpc-max-elastic-ips values for the AttributeName filter used by the DescribeAccountAttributes API method. Because of that, I also tried a bunch of undocumented parameters and filter values, but I had no luck retrieving a platform attribute for all the images or retrieving only RHEL-based images.
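To make the limitation concrete, here is a small sketch. It assumes the attributes of a single image from a DescribeImages response have already been parsed into a plain dict keyed by attribute name. The dict shape, the function name and the image ids are placeholders of mine, not part of any library.

```python
def reported_platform(image_attrs):
    """Return the platform reported by DescribeImages for one image.

    `image_attrs` is assumed to be a dict of already-parsed image
    attributes. The platform attribute is only present for Windows-based
    images, so None means "not reported", not "Linux".
    """
    value = image_attrs.get("platform")
    if not value:
        return None
    return value.strip().lower()


# Placeholder image ids, for illustration only.
windows_image = {"imageId": "ami-00000001", "platform": "windows"}
other_image = {"imageId": "ami-00000002"}

print(reported_platform(windows_image))
print(reported_platform(other_image))
```

The second call returns None, and that is all you get for a RHEL, SLES or plain Linux image. The attribute also doesn't distinguish plain Windows from Windows with SQL Server.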

The interesting thing is that the web interface does show an image type / platform, but it seems to use a private method to obtain this information.

Image platform as displayed in the web interface.
The web interface calls a private API method which returns information that is not available via the public one.

1. Inferring platform from the image details

Each image has a name, a description and a bunch of other attributes associated with it.

This information can be used to infer the platform or to build a static list which maps image IDs to platforms.

Inferring platform from the name and description should work reasonably well for the standard images, but it breaks down for private or copied images with custom names and descriptions.

On the other hand, the problem with the static list approach is that it doesn’t scale, and keeping it up to date is time-consuming and error-prone.
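Both ideas can be combined: check a static list first and fall back to keyword matching on the name and description. The sketch below is only an illustration. The image id, the platform labels and the keyword rules are all made up by me, and real image names may not follow these patterns.

```python
import re

# Hypothetical static overrides: image id -> platform (placeholder id).
KNOWN_IMAGES = {
    "ami-00000001": "rhel",
}

# Keyword rules, most specific first, so that "Windows with SQL Server"
# is not classified as plain Windows.
KEYWORD_RULES = [
    ("windows-sql", re.compile(r"windows.*sql|sql.*windows", re.I)),
    ("windows", re.compile(r"windows", re.I)),
    ("rhel", re.compile(r"rhel|red\s*hat", re.I)),
    ("sles", re.compile(r"\bsles|suse linux enterprise", re.I)),
]


def infer_platform(image_id, name="", description=""):
    """Guess the platform of an image. Return None if nothing matches."""
    if image_id in KNOWN_IMAGES:
        return KNOWN_IMAGES[image_id]
    # Search name and description together as a single string.
    text = " ".join(part for part in (name, description) if part)
    for platform, pattern in KEYWORD_RULES:
        if pattern.search(text):
            return platform
    return None
```

Returning None instead of defaulting to Linux is deliberate: a private image with a custom name simply can't be classified this way, and the caller should know that rather than silently get the wrong price.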

2. Scraping The Cloud Market website

The Cloud Market website provides details (including platform / image type) for every publicly available Amazon Machine Image.

This approach basically just builds on the static list approach, but instead of putting the burden of keeping this list up to date on you, it puts it on the Cloud Market team.

The Cloud Market website provides an API, but you can only retrieve details for the images which you own. This means that to retrieve the platform for any other image, you need to scrape the website, which again is very hacky and far from ideal.

Conclusion

As you can see, all of the approaches I have described are hacky and far from ideal, but sadly that’s the best we have so far.

Let’s just hope Amazon will get their act together and finally provide an official API for this in the near future.