
I've been a software engineer for about 7 years now. I started my career in the financial industry on Wall Street, working on overly complex systems that almost made me lose interest in the software craft. A couple of years later, I decided to move on to working with startups and haven't looked back since. I particularly enjoy working with Redis and using its simple data structures to efficiently solve complex problems. Santosh is a DZone MVB, is not an employee of DZone, and has posted 9 posts at DZone.

Watch Command for Maintaining Consistency of Redis Dataset

08.12.2012

As you start using Redis more, you soon find yourself delving into Redis transactions. A traditional RDBMS view of CAS (compare-and-set) transactions is:

  • - Lock down the entire database to writes, allowing just the one connection in the transaction to write
  • - Perform a query to figure out which things need to change (Compare Step)
  • - Change those things (Set step)
  • - Release the lock (this happens automatically as part of your transaction completing)

Redis, on the other hand, uses optimistic locking, which makes CAS transactions look like:

  • - Start tracking stuff you think could change while you are in your transaction
  • - Perform a query to figure out which things need to change (Compare Step)
  • - Execute the transaction to change those things (Conditional Set)
  • - Check to see if the transaction completed successfully
  • - Repeat from step 1 to re-run transaction or just abort
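
The steps above can be sketched without a Redis server. What follows is a minimal in-memory stand-in, not Redis itself: `TinyStore`, `watch`, and `exec_if_unchanged` are hypothetical names invented for illustration, and a per-key version counter plays the role of whatever bookkeeping Redis does for watched keys.

```ruby
# In-memory stand-in for optimistic locking. NOT Redis: every name here
# (TinyStore, watch, exec_if_unchanged) is hypothetical.
class TinyStore
  def initialize
    @data = {}
    @versions = Hash.new(0) # bumped on every write to a key
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value
    @versions[key] += 1
  end

  # Step 1: remember the current version of the key we care about.
  def watch(key)
    @versions[key]
  end

  # Steps 3 and 4: apply the write only if the key is unchanged since
  # watch. Returns true on success, nil if the write was discarded.
  def exec_if_unchanged(key, watched_version, new_value)
    return nil unless @versions[key] == watched_version
    set(key, new_value)
    true
  end
end

store = TinyStore.new
store.set("counter", 10)

# Uncontended run: watch, read (compare), conditional set.
version = store.watch("counter")
current = store.get("counter")
first_attempt = store.exec_if_unchanged("counter", version, current + 1)

# Contended run: another writer sneaks in between watch and exec.
version = store.watch("counter")
current = store.get("counter")
store.set("counter", 100) # simulated concurrent write
second_attempt = store.exec_if_unchanged("counter", version, current + 1)

puts first_attempt.inspect  # true
puts second_attempt.inspect # nil
puts store.get("counter")   # 100
```

The second attempt is discarded rather than blocked, which is the whole point: nobody waits on a lock, and the caller decides whether to go back to step 1 or give up.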

Step 1, where you start tracking stuff prior to doing anything, is where optimistic locking and the more traditional pessimistic locking diverge in a pretty big way.

Optimistic Locking

The general idea behind optimistic locking is that you need to know beforehand what you think might change while you are performing a transaction, and watch out for that. Pessimistic locking, on the other hand, is a more heavy-handed approach where you don't want anything to change while you are in the middle of a transaction. Pessimistic locking, as you might've guessed, is more punishing on performance, and for write-heavy datastores like Redis that need to maintain high performance, it just is not an option. The downside of optimistic locking, though, is that more of the heavy lifting falls on the engineer, who needs to put a little more thought into dealing with transactions. Redis ships with a watch command that lets you specify which keys you want to keep an eye on prior to running a multi-exec transaction.

An example

All this sounds great, but nothing beats a real-world example to see how to work with this and why it might be harder than you think. Recently, I was working on a task that required exactly this: when a user logs into our app, find all of that user's Facebook friends who are currently logged in to our app and send that list over.

The setup

The data in our app is structured in the following format:

  • - Users are hashes with keys of the format “user|<user-id>”
  • - All currently logged-in users have their Facebook IDs stored in a “logged_in_fb_ids” set
  • - There is a Facebook ID to user ID reverse lookup hash, “fb_id_to_user_id_hash”
  • - Every user has a set containing the Facebook IDs of the people they are friends with on Facebook: “fb_friend_ids_for_user|<user-id>”

So finding the name of someone given their Facebook ID would look something like:

fb_id = 123456
user_id = redis.hget("fb_id_to_user_id_hash", fb_id)
user_name = redis.hget("user|#{user_id}", "name")

First stab

Given the above structure of data in the Redis dataset, a first shot at fetching the user_ids of all my Facebook friends who are currently logged in to the app might look like:

def fetch_loggedin_fb_friends_for_user(user_id)
  fb_ids_of_my_friends_who_are_loggedin = redis.sinter("fb_friend_ids_for_user|#{user_id}", "logged_in_fb_ids")
  user_ids_of_my_fb_friends = redis.multi do |multi|
    fb_ids_of_my_friends_who_are_loggedin.each do |fb_id|
      multi.hget("fb_id_to_user_id_hash", fb_id)
    end
  end
  user_ids_of_my_fb_friends
end

Race condition

The race condition with this approach is that between the time I figure out the Facebook IDs of my Facebook friends who are currently logged in to my app:

fb_ids_of_my_friends_who_are_loggedin = redis.sinter("fb_friend_ids_for_user|#{user_id}", "logged_in_fb_ids")

and the time I look up the user_ids for those Facebook IDs, a user could have logged off, and I'd be incorrectly reporting someone as logged in when they are not. Granted, in this case, displaying stale data is not a big deal, but there is an easy fix.

What I really want is that between the time I start querying logged_in_fb_ids and the time I compute the user_ids for those people, no changes happen in the logged_in_fb_ids set. If no changes were made to this set while I was running my transaction, I know that the user_ids_of_my_fb_friends data is accurate.
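
To make the window concrete, here is a small deterministic simulation that uses plain Ruby `Set` and `Hash` objects in place of Redis. The IDs and the interleaved logout are invented for illustration; only the shape of the problem comes from the code above.

```ruby
require "set"

# Plain Ruby stand-ins for the Redis keys; all IDs are made up.
logged_in_fb_ids = Set.new([111, 222, 333])
my_fb_friend_ids = Set.new([222, 333, 444])
fb_id_to_user_id = { 111 => "u1", 222 => "u2", 333 => "u3", 444 => "u4" }

# Step A: the equivalent of SINTER, a snapshot of friends who are logged in.
snapshot = my_fb_friend_ids & logged_in_fb_ids

# Another client logs user 333 out before step B runs.
logged_in_fb_ids.delete(333)

# Step B: map the snapshot to user IDs, like the HGET calls in the transaction.
reported = snapshot.map { |fb_id| fb_id_to_user_id[fb_id] }

# What is actually true by the time step B finishes.
actual = (my_fb_friend_ids & logged_in_fb_ids).map { |fb_id| fb_id_to_user_id[fb_id] }

puts reported.sort.inspect # ["u2", "u3"]
puts actual.sort.inspect   # ["u2"]
```

User "u3" is reported as logged in even though the logout landed before the lookup completed. That gap between step A and step B is what needs guarding.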

Fix

The fix in this case is as simple as watching the logged_in_fb_ids set before kicking off the multi-exec transaction, and then re-running the transaction if something changed.

def fetch_loggedin_fb_friends_for_user(user_id)
  # Watch the set first: if it changes before EXEC, the transaction below is aborted
  redis.watch("logged_in_fb_ids")
  fb_ids_of_my_friends_who_are_loggedin = redis.sinter("fb_friend_ids_for_user|#{user_id}", "logged_in_fb_ids")
  # multi returns the array of replies, or nil if the transaction was aborted
  user_ids_of_my_fb_friends = redis.multi do |multi|
    fb_ids_of_my_friends_who_are_loggedin.each do |fb_id|
      multi.hget("fb_id_to_user_id_hash", fb_id)
    end
  end
  user_ids_of_my_fb_friends
end

failed_cnt = 0
RETRY = 3
my_loggedin_fb_friends = fetch_loggedin_fb_friends_for_user(my_user_id)
# A nil result means the transaction was aborted; retry at most RETRY times
while failed_cnt < RETRY && my_loggedin_fb_friends.nil?
  failed_cnt += 1
  my_loggedin_fb_friends = fetch_loggedin_fb_friends_for_user(my_user_id)
end

Here, we’ve chosen to retry 3 times; someone else might choose to abort the transaction right away. Your strategy for handling failed CAS transactions depends on the situation.
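
One way to keep the retry-or-abort decision out of the calling code is a small helper. This is only a sketch: `with_cas_retries` is a hypothetical name, and it assumes, as in the code above, that the block returns nil when the transaction was aborted. The blocks below are stand-ins for a real transaction so that the example runs without a Redis server.

```ruby
# Hypothetical helper: runs the block until it returns a non-nil value or
# the retries run out. Assumes the block returns nil for an aborted
# transaction, as the fetch method above does.
def with_cas_retries(max_retries)
  attempts = 0
  loop do
    result = yield
    return result unless result.nil?
    return nil if attempts >= max_retries # give up; the caller decides what abort means
    attempts += 1
  end
end

# Stand-in for a transaction that is aborted twice and then succeeds.
calls = 0
outcome = with_cas_retries(3) do
  calls += 1
  calls < 3 ? nil : ["u2", "u3"]
end

# Stand-in for a transaction that never succeeds.
never_calls = 0
gave_up = with_cas_retries(3) do
  never_calls += 1
  nil
end

puts outcome.inspect # ["u2", "u3"]
puts gave_up.inspect # nil
```

With this in place, the loop above would collapse to something like `with_cas_retries(3) { fetch_loggedin_fb_friends_for_user(my_user_id) }`, and passing 0 gives the abort-immediately strategy.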

TL;DR

Redis' watch command is what you should be using for CAS transactions.

Published at DZone with permission of Santosh Kumar, author and DZone MVB. (source)
