Background processing with Celery

Ljubljana Python Meetup, October 2013

Tomaz Muraus / @kamislo

Agenda

  • Message Queues
  • Message Queue Use Cases
    • Interprocess communication
    • Background processing
  • Celery
    • Introduction
    • Example

Who am I?

  • Tomaz Muraus
  • Long time Pythonista
  • Apache Libcloud Committer & Project chair
  • Previously: Cloudkick, Rackspace
  • Currently: Hacking away on my startup
  • Likes: Distributed systems, big data, open standards and systems, startups, ponies
  • tomaz.me
  • github.com/Kami
  • @kamislo

Message Queues

  • First In First Out (FIFO)
  • Popular channel for communication between threads and processes
  • Queues are everywhere (kernel, networking - kqueue, epoll, TCP packet queues, pub/sub patterns in modern JavaScript frameworks, etc.)
  • Different patterns - fan out, pub/sub, worker queues, RPC, etc.

Terminology

  • Queue - buffer / data structure which stores messages
  • Producer - application that sends messages
  • Consumer - application that receives (consumes) messages
  • AMQP defines more (exchanges, bindings, routing keys, ...) - the spec is a monstrosity, you don't want to read it
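
The three terms above can be sketched with nothing but the standard library. This is a minimal in-process illustration, not how a real broker works; the SENTINEL marker and the function names are made up for the example.

```python
import queue
import threading

SENTINEL = None  # hypothetical marker telling the consumer to stop


def producer(q, messages):
    """Send each message to the queue, then signal completion."""
    for message in messages:
        q.put(message)
    q.put(SENTINEL)


def consumer(q, received):
    """Receive messages in arrival order until the sentinel shows up."""
    while True:
        message = q.get()
        if message is SENTINEL:
            break
        received.append(message)


q = queue.Queue()  # the buffer that stores messages (FIFO)
received = []
consumer_thread = threading.Thread(target=consumer, args=(q, received))
consumer_thread.start()
producer(q, ["first", "second", "third"])
consumer_thread.join()

print(received)  # ['first', 'second', 'third']
```

The producer never calls the consumer directly; the queue is the only thing they share.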

Why Queues?

  • Easier to scale - simply add more workers
  • Better resource utilization and faster task completion times
  • Easier to implement custom QoS logic and rate limiting (multiple queues, etc.)
  • Easier to distribute work across many computers
  • More responsive webapps and better user experience
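
One way to picture the "multiple queues" point is a consumer that always drains a high-priority queue before touching a low-priority one. A rough sketch, assuming two in-process queues and hypothetical task names; real brokers offer their own routing and priority features.

```python
import queue

high = queue.Queue()  # e.g. password reset emails
low = queue.Queue()   # e.g. weekly newsletters


def next_task(high, low):
    """Return the next task, preferring the high-priority queue."""
    for q in (high, low):
        try:
            return q.get_nowait()
        except queue.Empty:
            continue
    return None  # both queues are empty


low.put("newsletter-1")
high.put("password-reset-1")
low.put("newsletter-2")
high.put("password-reset-2")

order = []
while (task := next_task(high, low)) is not None:
    order.append(task)

print(order)
# ['password-reset-1', 'password-reset-2', 'newsletter-1', 'newsletter-2']
```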

Use Cases

  • Interprocess communication
  • Background processing (task / worker queues)

Interprocess communication

  • Asynchronous communication between processes
  • Similar to Erlang actors and message passing
  • Typically publish / subscribe pattern (message is delivered to multiple consumers)
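
Publish / subscribe can be sketched as a broker that copies every message into one queue per subscriber. The Broker class and the subscriber names below are hypothetical and exist only for this illustration.

```python
import queue


class Broker:
    """Hypothetical in-process broker: every subscriber gets its own queue."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        q = queue.Queue()
        self._subscribers.append(q)
        return q

    def publish(self, message):
        # Fan out: deliver a copy of the message to every subscriber.
        for q in self._subscribers:
            q.put(message)


broker = Broker()
audit_log = broker.subscribe()
mailer = broker.subscribe()

broker.publish("user.registered")

print(audit_log.get_nowait(), mailer.get_nowait())
# user.registered user.registered
```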

Background Processing / Task Queues

  • Perform (long running) tasks later in the background
  • Tasks get added to the queue
  • Worker process (consumer) takes tasks from the queue and processes them
  • Each task is only consumed by one worker
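
In contrast to publish / subscribe, a task queue hands each task to exactly one worker. A small sketch with threads standing in for worker processes; the squaring step is a stand-in for real (slow) work.

```python
import queue
import threading


def worker(name, tasks, done, lock):
    """Take tasks off the shared queue until it is empty."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        result = task * task  # stand-in for slow work
        with lock:
            done.append((name, task, result))


tasks = queue.Queue()
for n in range(10):
    tasks.put(n)

done, lock = [], threading.Lock()
workers = [
    threading.Thread(target=worker, args=(f"worker-{i}", tasks, done, lock))
    for i in range(3)
]
for t in workers:
    t.start()
for t in workers:
    t.join()

# Every task shows up exactly once, no matter which worker picked it up.
processed = sorted(task for _, task, _ in done)
print(processed == list(range(10)))  # True
```

Which worker handles which task varies between runs; only the "exactly once" property is stable.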

Examples of tasks where you should probably use a message queue

  • Email delivery
  • Notification delivery (email, SMS, ...)
  • Image resizing
  • Video conversion
  • Page scraping / crawling
  • "Heavy" calculation (e.g. calculate number of views for a thread)

Queuing Software

  • RabbitMQ (Erlang)
  • Beanstalk, Gearman
  • ActiveMQ (Java, enterprisey)
  • Comcast Cloud Message Bus
  • Redis (RPUSH, BRPOP)
  • MongoDB (don't use it)
  • Amazon SQS, OpenStack Marconi, IronMQ (SaaS)

Celery

celeryproject.org

  • Distributed Task Queue library written in Python
  • First version released in 2009
  • Simple, Highly available, Fast, Flexible
  • Nice API
  • Active community

Celery - Features

  • Background jobs
  • Periodic jobs (cron like syntax)
  • Support for persistent results storage
  • Monitoring, Rate limiting, ...
  • Django Integration
  • Pluggable architecture
  • Supports many different brokers (RabbitMQ, ActiveMQ, Redis, ZooKeeper, Amazon SQS, ...)
  • Supports multiple execution pool types (multiprocessing, eventlet, gevent)
  • Supports complex workflows (chains, groups, chords, ...)

Example - Send an email upon registration

  • Shows how to asynchronously send an activation / welcome email to the user upon registration

Example - Send an email upon registration (Celery task code)


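from celery import shared_task
from django.core.mail import send_mail


@shared_task
def send_welcome_email(email_address):
    """Send the welcome email from a worker, outside the request cycle."""
    # The sender address below is a placeholder.
    send_mail('Welcome to your site', 'message.', 'noreply@example.com',
              [email_address], fail_silently=False)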

Example - Send an email upon registration (Django code)


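from django.http import HttpResponse
from django.views.generic.base import View

from yourapp.tasks.misc import send_welcome_email


class UserRegistrationView(View):
    def post(self, request):
        # registration code
        email = request.POST['email']  # assumes the form has an 'email' field
        # Queue the task and return right away; a worker sends the email.
        send_welcome_email.delay(email_address=email)
        return HttpResponse(status=201)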

Questions?

Thanks

Message queues are a great and powerful abstraction, but like a hammer, they are not always the best or right tool for the job.

@KamiSLO / tomaz.me
