
Luke and the other bloggers on the Avid Life Media blog develop at Avid Life Media Inc. - a leading social entertainment company that operates some of the most vibrant dating brands and communities on the Internet. Luke is a DZone MVB and is not an employee of DZone and has posted 7 posts at DZone. You can read more from them at their website.

Several Solutions for Queue & Worker Systems

11.22.2012

Almost every dating site we’ve built has a job queue of some kind. We enqueue email sends, statistic updates, logging, fraud detection and more. Anything that doesn’t immediately impact the response we’re preparing for the user can be run in the background.

Over the years we’ve tried several different queue/worker systems and we’d like to share our findings to help others decide on a technology.

The solutions distinguish themselves in a few key areas:

Scalability

There are several separate scalability concerns to consider, such as the throughput of the queue, the number of workers it can support and the maximum queue size. Some solutions are poorly suited to large numbers of workers or large queues.

Priority

Sending a password reset email shouldn’t wait until a batch of hundreds of thousands of “Your Daily Picks” emails has been processed. The queue solution should therefore provide a way to give a job a numeric priority or to split jobs into multiple queues. Named “low_priority” and “high_priority” queues are often sufficient, with workers proceeding to low-priority work only when the high-priority queue is empty. For a time we dedicated each worker to a particular queue and tweaked the number of workers per priority as needed, but this required too much attention to keep running smoothly.
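The “drain the high priority queue first” rule can be sketched in a few lines of Python. This is an in-memory illustration only and not the API of any queue product; the queue names follow the text above, while enqueue and next_job are hypothetical helpers made up for the example.

```python
from collections import deque

# Queues listed from most to least urgent; the names follow the article.
QUEUE_ORDER = ["high_priority", "low_priority"]
queues = {name: deque() for name in QUEUE_ORDER}


def enqueue(queue_name, job):
    """Append a job to the named queue (hypothetical helper)."""
    queues[queue_name].append(job)


def next_job():
    """Return the oldest job from the most urgent non-empty queue, or None."""
    for name in QUEUE_ORDER:
        if queues[name]:
            return queues[name].popleft()
    return None


enqueue("low_priority", "daily_picks:1")
enqueue("low_priority", "daily_picks:2")
enqueue("high_priority", "password_reset:42")

# The password reset is served first even though it was enqueued last.
processed = []
while (job := next_job()) is not None:
    processed.append(job)
print(processed)
```

Because every worker runs the same loop, there is no per-queue worker count to tune: low priority work simply fills whatever capacity the high priority queue leaves idle.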

Administrative Tools

Some solutions provide web-based administration tools to check queue lengths, inspect the contents of the queue and resubmit failed jobs. Others have only a console interface and strictly enforce that only the top element of the queue can be examined, the remainder being opaque.

Scheduling

Not all jobs should be executed immediately. We might schedule fraud analysis for a user several hours after signup when there is more data to analyze, or schedule a retry for a credit card transaction. Not all solutions support scheduling tasks.

You might be tempted to rely on a series of cron jobs to handle this. It has been our experience that this scales poorly. You cannot easily run multiple workers, and you must guard against a long-running batch “running into” the next scheduled run of the cron job.
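As a rough sketch of what queue-side scheduling looks like, the example below keeps jobs in a min-heap ordered by their earliest start time and hands out only the jobs that are due. Everything here (the schedule and pop_due helpers, the job names, the timestamps) is hypothetical and in-memory; real queue servers persist this state and expose it through their own APIs.

```python
import heapq
import itertools

_counter = itertools.count()  # tie-breaker so equal run_at values stay FIFO
_schedule = []                # min-heap of (run_at, sequence, job)


def schedule(job, run_at):
    """Register a job that must not start before run_at (seconds)."""
    heapq.heappush(_schedule, (run_at, next(_counter), job))


def pop_due(now):
    """Remove and return every job whose run_at is <= now, earliest first."""
    due = []
    while _schedule and _schedule[0][0] <= now:
        _run_at, _seq, job = heapq.heappop(_schedule)
        due.append(job)
    return due


signup_time = 1_000  # hypothetical timestamp, in seconds
schedule("fraud_check:user7", run_at=signup_time + 3 * 3600)
schedule("retry_card:order9", run_at=signup_time + 600)
schedule("welcome_email:user7", run_at=signup_time)

first = pop_due(now=signup_time)        # only the job that is already due
later = pop_due(now=signup_time + 900)  # the card retry has become due
remaining = len(_schedule)              # the fraud check is still waiting
```

Unlike a cron batch, each job becomes ordinary queue work once it is due, so any number of workers can share it and a slow run cannot collide with the next one.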

Redundancy

The queue server can be a single point of failure. We need to be careful that we don’t lose jobs and can continue processing in the event of failure.

Beanstalk

Beanstalk is very fast and simple, and it has many great features not seen in other solutions. It has client libraries in many languages and makes it easy to write your own workers. It supports priorities and “burying” failed jobs, which sets them aside until they are explicitly kicked back into the queue. Jobs in progress aren’t actually popped off the queue; they are reserved by a worker. If a reserved job isn’t completed within a given amount of time, it is released and can be processed by another worker. Beanstalk behaves much like a transactional database in this regard. Jobs can also be delayed so they run in the future.
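The “marked as in progress, handed to another worker after a timeout” behaviour can be modelled with a toy in-memory class. This is not Beanstalk’s protocol or any client library’s API; the ReservingQueue class, its method names and the ttr (time to run) parameter are assumptions chosen for illustration, and time is passed in explicitly so the example is deterministic.

```python
class ReservingQueue:
    """Toy in-memory model of reserve / delete / bury with a time limit."""

    def __init__(self, ttr):
        self.ttr = ttr        # seconds a worker may hold a job
        self.ready = []       # job ids waiting for a worker, oldest first
        self.reserved = {}    # job id -> deadline for the current worker
        self.buried = []      # job ids set aside after a failure

    def put(self, job_id):
        self.ready.append(job_id)

    def _expire(self, now):
        # Jobs whose deadline has passed go back to the ready list.
        for job_id, deadline in list(self.reserved.items()):
            if deadline <= now:
                del self.reserved[job_id]
                self.ready.append(job_id)

    def reserve(self, now):
        """Hand the oldest ready job to a worker, or return None."""
        self._expire(now)
        if not self.ready:
            return None
        job_id = self.ready.pop(0)
        self.reserved[job_id] = now + self.ttr
        return job_id

    def delete(self, job_id):
        del self.reserved[job_id]      # the worker finished the job

    def bury(self, job_id):
        del self.reserved[job_id]      # the worker gave up on the job
        self.buried.append(job_id)


q = ReservingQueue(ttr=60)
q.put("email:1")
q.put("email:2")
a = q.reserve(now=0)    # first worker takes email:1 and then stalls
b = q.reserve(now=10)   # second worker takes email:2
q.delete(b)             # ...and completes it
c = q.reserve(now=30)   # nothing ready: email:1 is still reserved
d = q.reserve(now=61)   # the time limit passed, so email:1 is handed out again
q.bury(d)               # this time the worker sets it aside
```

The point of the model is that a job is only removed when a worker explicitly deletes it, so a crashed worker cannot silently lose work.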

We use Beanstalk on Ashley Madison and we’re very happy with it. Unfortunately, there isn’t a good redundancy solution. Without shared storage and a failover system of some sort you are looking at a single point of failure. There is also no official admin console, although there is a third-party project.

RabbitMQ

RabbitMQ is an Erlang implementation of the Advanced Message Queuing Protocol (AMQP) brought to us by VMware.

In theory, RabbitMQ should scale better and with better redundancy than any other option because Erlang can easily scale across multiple machines. The distributed database should prevent any data loss and provide high availability.

One downside is that it’s a “formal” queue. You can look at and pop the topmost job, but you can’t get a sense of what’s in your queue. AMQP doesn’t let you check the queue length, but there are RabbitMQ-specific admin consoles that work around the issue.

There’s no support for numeric priorities or delayed jobs, nor any transactional support. It’s up to the developer to elegantly handle failures and avoid losing jobs.

The biggest “gotcha” we’ve run into is that performance degrades very quickly as queue size increases. Memory use is extremely high, so it’s quite easy to get a large queue. In fact, memory use is often more than 10x the size of the same data in Beanstalk or Resque because Erlang doesn’t actually support strings. On a 64-bit machine, each character consumes 16 bytes. We found that if the queue size exceeded 1GB we were in for a world of pain. We no longer use RabbitMQ and don’t recommend it under any circumstances.

DelayedJob

DelayedJob is a Ruby library, so it’s only an option if you are using Ruby. DelayedJob stores its queue in a single table in MySQL. If you have an existing Ruby on Rails application using MySQL it’s extremely easy to add to your app. You get all the features of MySQL, specifically transactions and master/slave replication for redundancy. You can delay a job or have it run immediately and there’s a third party admin console.

If your queue shares a database with your application, large volumes of inserts and deletes to manage the queue will dominate the replication queue, causing delays on slaves. Also, the workers poll the database server for jobs, and each poll can involve a full table scan. With some schema changes this can be avoided; nevertheless, DelayedJob does not cope well with large volumes and many workers.
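To make the polling pattern concrete, here is a minimal sketch of a database-backed queue. It uses Python’s built-in sqlite3 module as a stand-in for MySQL, and the jobs table, its columns, the index and the claim_next helper are all hypothetical; this is not DelayedJob’s actual schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE jobs (
        id        INTEGER PRIMARY KEY,
        payload   TEXT    NOT NULL,
        priority  INTEGER NOT NULL DEFAULT 0,  -- lower runs first (assumption)
        run_at    INTEGER NOT NULL,            -- earliest start, epoch seconds
        locked_by TEXT                         -- NULL while unclaimed
    )
    """
)
# An index shaped like the poll's ORDER BY, intended to spare the poll a scan
# of the whole table (whether it is used is up to the query planner).
conn.execute("CREATE INDEX jobs_poll ON jobs (priority, run_at)")


def claim_next(worker, now):
    """Try to claim the most urgent due job; return (id, payload) or None."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE locked_by IS NULL AND run_at <= ? "
            "ORDER BY priority, run_at LIMIT 1",
            (now,),
        ).fetchone()
        if row is None:
            return None
        # Claim only if nobody else locked the row in the meantime.
        cur = conn.execute(
            "UPDATE jobs SET locked_by = ? WHERE id = ? AND locked_by IS NULL",
            (worker, row[0]),
        )
        return row if cur.rowcount == 1 else None


conn.executemany(
    "INSERT INTO jobs (payload, priority, run_at) VALUES (?, ?, ?)",
    [("daily_picks", 10, 0), ("password_reset", 0, 0), ("fraud_check", 0, 500)],
)
conn.commit()

first = claim_next("worker-1", now=100)   # most urgent job that is due
second = claim_next("worker-2", now=100)  # next most urgent due job
third = claim_next("worker-3", now=100)   # fraud_check is not due yet
```

Every idle worker repeats this select-and-update cycle, which is why the write and poll load grows with both job volume and worker count.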

While we no longer use DelayedJob due to our volume, it’s fine for small sites.

Resque

Resque was written by the folks at GitHub and is heavily inspired by DelayedJob. It’s a Ruby library, but there is a port for CoffeeScript. Resque stores its queue in Redis. While Beanstalk and RabbitMQ are not likely to see use in your app other than as a job store, Redis is a great key-value store that can be used for logging, session storage and more.

We found that Resque was the “thin edge of the wedge” of Redis for us. Once Redis was in our environment we began to use it for storing other non-relational data. Redis can be set up to do master-slave replication, so Resque has redundancy covered too.

Resque has an “official” admin console that is a pleasure to use. You can easily monitor queue length, view jobs and resubmit failed tasks.

Resque really shines with its plugins. They include support for priority queues, batching, forking and more. A few really stand out as “must haves.”

Resque-Scheduler

Resque-Scheduler not only allows jobs to be delayed until a specific time, it can also replace cron functionality. Recurring jobs are added to a queue of your choice and then executed as normal, making it easy to spread out cron work across multiple servers.

Resque-Mailer

For many small sites, the only use for a job queue is to send email. Resque-Mailer integrates with Rails’ ActionMailer and makes background email sends all but transparent.

In summary, if you need something lightweight with support for many languages and are willing to accept a single point of failure then Beanstalk is a great choice. If you are using Ruby, we highly recommend using Resque. The most important thing to remember is you do need a job queue. Any job queue is likely better than none at all.




Published at DZone with permission of Luke Galea, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Juho Juopperi replied on Thu, 2012/11/29 - 3:09am

We're now evaluating RabbitMQ and you seem to be talking about a different product. Have you checked your facts?

  • RabbitMQ has transactions (channel.txCommit()...)
  • Worker can see the queue length (getMessageCount())
  • Erlang's support of strings has nothing to do with the number of bytes that you put in a message. 1kB of ASCII text seems to take about 1.2kB on the persistent storage file. If you e.g. serialize a Java object with a string you'll get two-byte characters (because Java has two-byte characters). Encode the string into UTF-8 and you'll get mostly one-byte characters.

Michael N- replied on Sun, 2012/12/02 - 12:15pm

> "because Erlang doesn’t actually support strings. [...] don’t recommend [RabbitMQ] under any circumstances."

That's like saying C or C++ doesn't "support strings", is therefore inefficient, and shouldn't be used at all.
