Performance Zone is brought to you in partnership with:

Developer with experience in a variety of different systems and technologies, with a customer focus and balance with business goals. Particularly interested in backend and large scale systems, and also interested in high level architecture, and API design. Always open to feedback in order to keep learning and improving as a professional. Rodrigo is a DZone MVB and is not an employee of DZone and has posted 39 posts at DZone. You can read more from them at their website. View Full User Profile

Understanding Vector Clocks with Riak

07.25.2012
| 5185 views |
  • submit to reddit
Riak is one databases that uses vector clocks for conflict resolution. I came across these two blog posts on Basho.com, company which develops Riak, and these posts are great at explaining the basics of Vector Clocks - definitely a must read if you're into distributed systems:

Why vector clocks are easy?

Why vector clocks are hard?

Voldermort DB (by LinkedIn) is another DB that uses Vector Clocks, as explained below. Not surprisingly, it also takes the idea from Amazon's Dynamo (like Riak):

The redundancy of storage makes the system more resilient to server failure. Since each value is stored N times, you can tolerate as many as N – 1 machine failures without data loss. This causes other problems, though. Since each value is stored in multiple places it is possible that one of these servers will not get updated (say because it is crashed when the update occurs). To help solve this problem Voldemort uses a data versioning mechanism called Vector Clocks that are common in distributed programming. This is an idea we took from Amazon’s Dynamo system. This data versioning allows the servers to detect stale data when it is read and repair it.

Voldermort's code in Java can be on code.google.com.

Finally, before I end this post, you may be asking "why complicate so much?" (if I could get a penny every time I heard that when discussing distributed systems... :-). But in this case, it's a good and typical question: can't we just use timestamp and last one wins? The problem, though, is that it requires times to be perfectly synchronized - which is very difficult and oftentimes impossible. By using vector clocks, you don't have this requirement on the system.
Published at DZone with permission of Rodrigo De Castro, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)