The disk backed rabbitmq queue and you

RabbitMQ is still doing great but had one very bad flaw: more than 1.5m un-eaten messages on a single “small” type server (1-2GB RAM) would just take the whole thing down. And the current solution involves giving your machine more swap. When you’re not in a real time scenario (worker messaging on “fake M/R tasks” is ours), you want your messages to stick around and not break all the servers. A broken queue really bites. Fortunately, the LShift guys are on the case and today I got to try out the new disk-backed queue setup. It’s in as the bug20980 branch.

You’ll want to build rabbitmq yourself, not so bad….

# Get dependencies and checkout rabbitmq
apt-get install mercurial libncurses-dev
hg clone http://hg.rabbitmq.com/rabbitmq-server
hg clone http://hg.rabbitmq.com/rabbitmq-codegen

# Download erlang compiler & etc
curl -O "http://www.erlang.org/download/otp_src_R13B01.tar.gz"
tar xzvfย otp_src_R13B01.tar.gz
cdย otp_src_R13B01
./configure; make; make install

# Build rabbit from the "bug20980" branch which allows the disk-pin
cd ~/rabbitmq-server
hg update -C bug20980
make

# If you want your disk-backed queues to not go into /var/lib/rabbitmq
# (e.g. amazon machine!!) you'll want to do this (/vol is our EBS location)
export RABBITMQ_LOG_BASE="/vol/rabbitlog"
export RABBITMQ_MNESIA_BASE="/vol/mnesia"

# Start up rabbitmq
cd scripts
./rabbitmq-server

Now you’ve got two choices. You can have rabbit make the decision when to send your queues off to “disk mode,” or you can do it yourself. The yourself version is:

./rabbitmqctl pin_queue_to_disk queue_name_you_want_to_pin

You can check up on the status of your queues, including their mode, like so:

./rabbitmqctl list_queues name mode messages messages_ready messages_unacknowledged durable
Listing queues ...
memqueue	mixed	100	100	0	true
diskqueue	disk	2509876	2509876	0	true

If you let Rabbit make the choice, it seems to wait for your system RAM to start running out. Queues will stay in “mixed” mode until that happens (on my EC2 m1.large 8GB RAM machine w/ 4GB swap, this happened when we got to 75% via top – 5 million 1KB or so messages.) The disk mode kicks in automatically (and seems to stay there) – your RAM usage will just drop almost immediately. It won’t crash!ย 

Why not stay with disk mode full time? Performance, although on EC2 machines it’s kind of a wash anyway. We all know that IO and network throughput on these guys is … suboptimal … to put it lightly. With mixed mode on I am still getting roughly 2000 messages/second added and 25 messages/second to ack&read (both of these #s are from another EC2 instance on the same zone talking to the broker serially via py-amqplib.) Disk mode is slightly worse write speed (1900 m/s) and no noticeable difference is read speed (there is obviously something else blocking us there.)

Going to give this some real world stress tests over the next week, but so far no real problems.

Many thanks to LShiftย for the help.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s