Speeding up tests which talk to the Cassandra database

In 2011 I wrote a blog post on speeding up Django and Twisted tests titled Making Django and Twisted tests faster.

Today I’m going to show how to speed up tests which talk to the Cassandra database. Speedups will be achieved simply by tweaking the Cassandra configuration file.

Speeding up tests

There are many ways to speed up your application tests. The most common ones include parallelizing the tests and updating them so they don’t touch the disk (hey, disk is slow!).

How hard it is to implement those things depends on your application architecture, the programming language, the algorithms used and so on. In an ideal world you would use Erlang and all your problems would be embarrassingly parallel. Sadly, in many (most?) cases this is not true.

If your application and tests weren’t built with parallelization in mind, making your tests run in parallel will be very hard, and in many cases it’s not even worth the effort.

Today I’m going to ignore parallelization for a moment and focus on how to speed up tests which talk to the Cassandra database. I’ll focus on how to do this by simply tweaking the Cassandra configuration file.

The reason I’m focusing on this approach is that it takes very little effort and has the potential to offer substantial speedups (in other words, it offers the most bang for the buck).

Keep in mind that the same general approach also applies to other databases. If you Google around you can find many articles which show how to do that for MySQL, PostgreSQL and so on.

Speeding up tests which talk to Cassandra

This is a very generic guide for speeding up the tests. The actual speedup depends on many factors, and in some cases it will be very small or nonexistent (YMMV).

Some of the factors which affect the speedup are:

  • how many writes and reads your tests perform
  • amount of memory available to Cassandra
  • memtable flush setting
  • storage device used for sstable files
  • whether your Cassandra process is long-running or you spin up a new instance for every test run

1. Disable commit log

Cassandra provides write durability by appending writes to a commit log. How the commit log reaches the disk depends on the commitlog_sync option. In periodic mode, writes are acknowledged right away and the commit log is synced to disk periodically (every 10 seconds by default). In batch mode, Cassandra waits to acknowledge writes until the commit log has been fully flushed (fsynced) to disk.
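
To see which mode your test instance is using, you can look the options up in the configuration file. The helper below is only a sketch: the function name is made up for illustration, and the path to cassandra.yaml is something you have to supply yourself, since it differs between installations.

```shell
# Hypothetical helper: print the commit log sync settings found in the
# cassandra.yaml file given as the first argument. Commented-out lines
# (starting with "#") are ignored because the pattern is anchored.
show_commitlog_sync() {
    grep -E '^commitlog_sync(_[a-z_]+)?:' "$1"
}

# Example (the path is an assumption, adjust it to your installation):
# show_commitlog_sync /etc/cassandra/cassandra.yaml
```

Option names under commitlog_sync have changed between Cassandra releases, so treat whatever this prints for your version as the source of truth.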

The first and simplest way to speed things up is to disable the commit log. This can be done on a per-keyspace basis using the durable_writes option. For example:

    CREATE KEYSPACE "YourKeyspace"
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1}
        AND durable_writes = false;
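
If the keyspace already exists (for example because your test fixtures create it), the same option can be changed with ALTER KEYSPACE. Here is a small sketch of a shell helper that prints the statement so it can be handed to cqlsh. The function name is made up for illustration, and the example assumes cqlsh is on your PATH and can reach your test instance.

```shell
# Hypothetical helper: print a CQL statement that turns off durable writes
# (and with them the commit log) for the keyspace given as the first argument.
durable_writes_off_cql() {
    printf 'ALTER KEYSPACE "%s" WITH durable_writes = false;\n' "$1"
}

# Example (needs a running Cassandra, so it is left commented out):
# cqlsh -e "$(durable_writes_off_cql YourKeyspace)"
```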

2. Disable periodic saving of cache to disk

The second way to speed things up is to disable periodic saving of the key and row caches to disk.

This can be achieved by setting the key_cache_save_period and row_cache_save_period options to 0.

As noted above, your mileage may vary. If you are spinning up a new instance of Cassandra for every test run and your tests don’t run for a very long time (by default key cache is written to disk every 4 hours), this setting won’t bring you any noticeable speedups.
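
If you prefer to script the change, something along these lines could work. It is a rough sketch, not a robust YAML editor: the function name is made up, it assumes both options sit at the top level of the file as plain "name: value" lines, and it assumes your version accepts a bare 0 (some newer releases write these values with a unit suffix, so check your default file first).

```shell
# Hypothetical helper: set key_cache_save_period and row_cache_save_period
# to 0 in the cassandra.yaml file given as the first argument.
disable_cache_saving() {
    conf="$1"
    cp "$conf" "$conf.orig"    # keep a copy of the untouched file
    for opt in key_cache_save_period row_cache_save_period; do
        if grep -q "^${opt}:" "$conf"; then
            # Option is present: rewrite its value in place.
            sed "s/^${opt}:.*/${opt}: 0/" "$conf" > "$conf.tmp" &&
                mv "$conf.tmp" "$conf"
        else
            # Option is missing: append it to the end of the file.
            printf '%s: 0\n' "$opt" >> "$conf"
        fi
    done
}

# Example (the path is an assumption, adjust it to your installation):
# disable_cache_saving /etc/cassandra/cassandra.yaml
```

Cassandra reads the file at startup, so restart the instance after changing it.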

3. Use a ram disk for the data directory

The last and probably the most well-known option is telling your database to write data to RAM instead of a hard drive. Cassandra doesn’t allow you to fully turn off memtable flushing to sstables on disk, so the way to get there is to put its data directory on a ram drive.

To create a ram drive on a Linux distribution, you can run the following commands:

    $ mkdir /tmp/ramdisk
    $ chmod 777 /tmp/ramdisk
    $ sudo mount -t tmpfs -o size=384M tmpfs /tmp/ramdisk/
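
A ram drive disappears on reboot, so it can be handy to have the test runner check that the mount is really there before starting Cassandra. Otherwise the tests quietly fall back to writing to the real disk. A minimal sketch, assuming a Linux system with /proc/mounts; the function name is made up for illustration.

```shell
# Hypothetical helper: succeed only if the directory given as the first
# argument is a tmpfs mount point. Each line of /proc/mounts has the form
# "device mountpoint fstype options ...". The optional second argument
# replaces /proc/mounts, which is mainly useful for trying the function out.
is_tmpfs_mount() {
    awk -v dir="$1" '
        $2 == dir && $3 == "tmpfs" { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Example (pass the path without a trailing slash):
# is_tmpfs_mount /tmp/ramdisk || { echo "ram disk is not mounted" >&2; exit 1; }
```

When you are done, sudo umount /tmp/ramdisk releases the memory again.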

After the ram drive has been created, update your Cassandra config and redirect all writes to a directory in /tmp/ramdisk/. This can be achieved by updating the following options:

  • data_file_directories
  • commitlog_directory
  • saved_caches_directory

If you have followed the first two steps, updating the last two options is not necessary. They are included here for completeness.
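
As a rough illustration, the helper below creates one subdirectory per option on the ram drive and prints the matching configuration lines, which you can then use to replace the existing entries in your Cassandra config. The function name and the subdirectory names (data, commitlog, saved_caches) are my own choices for this sketch; any layout works as long as the config points at it.

```shell
# Hypothetical helper: create the directories under the ram drive root given
# as the first argument and print the corresponding config entries.
ramdisk_dirs_yaml() {
    root="$1"
    mkdir -p "$root/data" "$root/commitlog" "$root/saved_caches"
    cat <<EOF
data_file_directories:
    - $root/data
commitlog_directory: $root/commitlog
saved_caches_directory: $root/saved_caches
EOF
}

# Example:
# ramdisk_dirs_yaml /tmp/ramdisk
```

Since the ram drive starts out empty after every mount, the directories have to be recreated each time, which is why the helper creates them before printing anything.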