NoSQL Zone is brought to you in partnership with:

I am the founder and lead developer of Hibernate Envers, a Hibernate core module, which provides entity versioning/auditing capabilities. I am also one of the co-founders of SoftwareMill, a company specializing in delivering customized software solutions (http://softwaremill.com, "Extraordinary software as a standard"), based on Java and JBoss technologies. After work, apart from being involved in development of Envers, I work on several small open source projects, like ElasticMQ (simple message queue written in Scala with an SQS interface), projects around static analysis (using JSR 308 - Typestate Annotations/ Checkers Framework and FindBugs), and some CDI/Weld (not always portable) extensions, like autofactories or stackable security interceptors. I am also interested in new JVM-based languages, especially with functional elements (like Scala, JRuby) and frameworks built using them (like Lift), as well as improving the ways we use Dependency Injection. Adam is a DZone MVB and is not an employee of DZone and has posted 52 posts at DZone. You can read more from them at their website. View Full User Profile

Tryting to Understand CAP

01.13.2013
| 2939 views |
  • submit to reddit

The CAP theorem, stated by Brewer and proved by Gilbert and Lynch specifies a property of distributed systems. It states that such a system cannot guarantee at the same time Consistency, Availability and Partition tolerance. It is also often said as a catchy phrase:

Consistency, Availability, Partition Tolerance – pick any two

used mostly when talking about NoSQL databases and suggesting that a distributed system can be characterized as either CA, AP or CP (see e.g. here).

For some time I’ve been trying to understand the different combinations and what do they mean in practice; having some time at various airports I caught up on reading, and here’s what I came up with.

What’s C, A and P?

First let’s define how I understand the three guarantees, basing on some of the articles I’ve read.

Consistency is the easiest one. It roughly means that the clients get the same view of data. By saying that a system is consistent we often mean strong consistency, but it also can come in different flavors, e.g. casual.

Availability is a property saying that every request to a non-failing node will return a (meaningful) response. The response may not contain all data (so the harvest will not be 100%, see the appropriate section in [3]), but it should be useful for the client.

Partition tolerance means that the system will continue working even if any number of messages sent between nodes is lost. This can be e.g. a network failure between two datacenters, where nodes in each datacenter form a partition. Also note that a failure of any number of nodes forms a partition (it is not possible to distinguish between a network failure and a node failing and stopping to respond to messages).

The hardest part for me is understanding the difference between Availability and Partition tolerance. Also, the various articles don’t specify what they mean by saying that a system is “working” after being e.g. partitioned – does it mean that every request gets a response with useful data, or are responses “Sorry, I can’t give you data right now” acceptable also?

P+C?

Let’s assume that a system is partition tolerant and that it has more than one node. If a partition is formed, splitting the system in two, the system should continue working. Hence both partitions allow clients to write. But then, how to guarantee consistency? If one client writes to partition 1, and another to partition 2? Hence: P => ~C.

A+~P?

Suppose now that a system is available and that we have more than one node. As the system is available, it should respond to requests even if some nodes die. As noted above, some nodes dying are equivalent to a partition. So if the system is still working, we have partition tolerance. Hence: A => P.

A+C?

Summarizing the two implications above (A => P and P => ~C), we get: A => ~C, so that an available system cannot be consistent (if it has more than one node). In practice however, there are of course AC systems, e.g. single-node RDBMS. Or even master-slave/master-master replicated RDBMS, provided there’s a central router knowing which nodes live and directing client appropriately. Such a router is then a single point of failure (SPoF).

Relax?

I suspect that in reality, when e.g. NoSQL/NewSQL systems are characterized with the CAP properties, they assume some relaxed form of C/A/P. Unfortunately, all of the definitions flying around seem to be pretty vague and are more of hand-waving than proper, precise statements. I think it would be much easier to explore the ever-growing ecosystem of new datastores if they could be more easily characterized; maybe the CAP vocabulary is just not enough?

Please correct me if I’m wrong somewhere, and I probably am! :)

Adam

Some articles I used for my research:

[1] http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
[2] http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/
[3] http://codahale.com/you-cant-sacrifice-partition-tolerance/

Published at DZone with permission of Adam Warski, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)