Migrating from epydoc to Sphinx style docstrings using sed and some command line fu

This post describes how to migrate Python API documentation which uses epydoc style docstrings to Sphinx format using sed and some command line fu.

Motivation

After a gentle nudge from Alex Gaynor, we recently started working on a long-overdue task: improving the documentation for the Libcloud project.

Improving and updating the documentation has been on my to-do list for a long time, but I was always too busy or found an excuse to work on code or some other non-documentation part of the project.

I know there is no good excuse for that, but I don’t want to digress too much from the topic of this post, so I plan to go into more detail in a separate blog post. For now, suffice it to say that we have already made quite a lot of progress and, as always, your contributions are very welcome.

The new documentation already looks way better than the old one.

This task included writing new documentation and moving existing regular and API documentation to Sphinx.

The existing documentation was stored in Subversion (using Apache CMS) in Markdown format. The move to Sphinx and reStructuredText was performed manually, because the existing documentation was poor and incomplete: the move involved not just changing the format, but also rewriting the text and filling in the gaps.

The existing API documentation and docstrings used epytext markup. Unlike the regular documentation, the API documentation didn’t need rewriting; we just wanted to migrate to the Sphinx docstring format so we could use the autodoc extension.
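To make the difference concrete, here is a small sketch of the same docstring written in both styles. The functions and the `Node` / `NodeSize` names are made up for illustration and are not copied from the Libcloud code base.

```python
def create_node_epytext(name, size=None):
    """
    Create a new node (epytext style, before the migration).

    @param name: Node name.
    @type name: C{str}

    @keyword size: Node size.
    @type size: L{NodeSize}

    @return: The created node.
    @rtype: L{Node}
    """
    # Body omitted on purpose; only the docstring matters here.


def create_node_sphinx(name, size=None):
    """
    Create a new node (Sphinx style, after the migration).

    :param name: Node name.
    :type name: ``str``

    :keyword size: Node size.
    :type size: :class:`NodeSize`

    :return: The created node.
    :rtype: :class:`Node`
    """
    # Body omitted on purpose; only the docstring matters here.
```

The field names stay the same and only the markup around them changes, which is why a plain search and replace gets you most of the way there.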

Migrating from epydoc to Sphinx style docstring format

There are multiple ways to approach this task:

  1. Write a Sphinx extension which converts epytext tags to Sphinx format on the fly
  2. Update all the epytext tags in the code

I decided to go with #2 and automate it using some command line fu. The reason is that on-the-fly translation slows things down and, moving forward, you end up with two styles of docstrings in your code (epytext for old code and Sphinx for new code).

The only downside of the second approach is that it touches a lot of code. If you have a lot of open pull requests, this could result in a bunch of merge conflicts down the road, so keep that in mind.

The script I used for the migration can be found below:

#!/usr/bin/env bash
#
# Script for migrating from epydoc to Sphinx style docstrings.
#
# WARNING: THIS SCRIPT MODIFIES FILES IN PLACE. BE SURE TO BACK THEM UP BEFORE
# RUNNING IT.

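DIRECTORY=$1

# Bail out early if no directory was given.
if [ -z "${DIRECTORY}" ]; then
    echo "Usage: ./migrate_docstrings.sh <directory with your code>"
    exit 1
fi

# The expressions below rely on GNU sed syntax (\+, \| and -i without a
# suffix). Prefer gsed / gnused when available and fall back to plain sed.
SED=$(which gsed gnused sed 2>/dev/null | head -n 1)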

OLD_VALUES[0]='@type'
OLD_VALUES[1]='@keyword'
OLD_VALUES[2]='@param'
OLD_VALUES[3]='@return'
OLD_VALUES[4]='@rtype'
OLD_VALUES[5]='L{\([^}]\+\)}'
OLD_VALUES[6]='C{\(int\|float\|str\|list\|tuple\|dict\|bool\|None\|generator\|object\)}'
OLD_VALUES[7]='@\(ivar\|cvar\|var\)'

NEW_VALUES[0]=':type'
NEW_VALUES[1]=':keyword'
NEW_VALUES[2]=':param'
NEW_VALUES[3]=':return'
NEW_VALUES[4]=':rtype'
NEW_VALUES[5]=':class:`\1`'
NEW_VALUES[6]='``\1``'
NEW_VALUES[7]=':\1'

for (( i = 0 ; i < ${#OLD_VALUES[@]} ; i++ ))
do
    old_value=${OLD_VALUES[$i]}
    new_value=${NEW_VALUES[$i]}

    cmd="find ${DIRECTORY} -name '*.py' -type f -print0 | xargs -0 ${SED} -i -e 's/${old_value}/${new_value}/g'"

    echo "Migrating: ${old_value} -> ${new_value}"
    eval "$cmd"
done

(The script is also available as a gist at https://gist.github.com/Kami/6734885)

As you can see, the script is very simple and has some limitations (noted below), but it worked very well for us. As usual, the 80-20 rule applies here as well.

Limitations of this script:

  • The script does a very simple search and replace and has no knowledge of the surrounding code and text. This means that if you have some code which looks like an epytext tag, this script might unintentionally replace it.
  • I only added support for tags we use. As such, the script doesn’t support all the epytext tags. This shouldn’t be a big deal though. It’s fairly easy to change it and add support for all of the tags. You can find a list of all the available tags on this page.
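One way to find out which tags still need support is to scan the code for leftover epytext-looking fields after running the script. The snippet below is a rough heuristic and not part of the migration script; the `leftover_tags` name and the regular expression are my own assumptions. It only looks at lines which start with `@tag:` or `@tag argument:`, so plain decorators such as `@property` are skipped.

```python
import re
from collections import Counter

# An epytext field looks like "@tag:" or "@tag argument:" at the start of a line.
FIELD_RE = re.compile(r"^\s*@(\w+)(?:\s+[^:\s]+)?:")


def leftover_tags(source):
    """Count epytext-looking field tags which are still present in ``source``."""
    counts = Counter()
    for line in source.splitlines():
        match = FIELD_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example: a partially migrated module (made-up content).
SAMPLE = '''
@property
def name(self):
    """
    :return: Node name.
    :rtype: ``str``

    @raise ValueError: If the name is missing.
    @note: Names are not unique.
    """
'''

print(leftover_tags(SAMPLE))
```

Every tag which shows up in the result is a candidate for another pair of `OLD_VALUES` / `NEW_VALUES` entries, for example `@raise` to `:raises`.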