
Doug has been engrossed in programming since his parents first bought him an Apple IIe computer in 4th grade. Throughout his early career, Doug proved his flexibility and ingenuity in crafting solutions in a variety of environments. Doug’s most recent work has been in the telecom industry developing tools to analyze large amounts of network traffic using C++ and Python. Doug loves learning and synthesizing this knowledge into code and blog articles. Doug is a DZone MVB and is not an employee of DZone and has posted 28 posts at DZone.

Why FoundationDB Might Be All It's Cracked Up To Be

03.08.2013

When I first heard about FoundationDB, I couldn’t imagine how it could be anything but vaporware. It seemed like unicorns crapping happy rainbows to solve all your problems. As I learn more about it, though, I realize it could actually be something groundbreaking.

NoSQL: Let's Review…

So, I need to step back and explain one reason NoSQL databases have been revolutionary. In the days of yore, we used to normalize all our data across multiple tables in a single database living on a single machine. Unfortunately, Moore’s law eventually crapped out and, maybe more importantly, hard drive space stopped increasing massively. Our data and the demands on it only kept growing. We needed to start distributing our databases across multiple machines.

Turns out, it’s hard to maintain transactionality in a distributed, heavily normalized SQL database. As such, a lot of NoSQL systems have emerged with simpler features, many promoting a model based around some kind of single row/document/value that can be looked up or inserted with a key. Transactionality in these systems is limited to a single key-value entry (a “row” in Cassandra/HBase or a “document” in Mongo/Couch; we’ll just call them rows here). A row is easily stored on a single node, although it can be replicated to multiple nodes. Even with replication, working transactionally with single rows in distributed NoSQL is easier than guaranteeing the transactionality of an SQL query that may visit many SQL tables.

There are deep design ramifications and limitations to the transactional nature of rows. First, you always try to cram a lot of data related to the row’s key into a single row, ending up with massive rows of hierarchical or flat data that all relates to the row key. This lets you cover as much data as possible under the row-based transactionality guarantee. Second, as the system gives you only a single key to use, you must choose very wisely what your key will be. You may need to think hard about how your data will be looked up through its whole life, because it can be hard to go back. Additionally, if you need to look up on a secondary value, you had better hope that your database is friendly enough to have a secondary key feature; otherwise you’ll need to maintain a secondary row for storing the relationship. Then you have the problem of working across two rows, which doesn’t fit in the transactionality guarantee. Third, you might lose the ability to perform a join across multiple rows. In most NoSQL data stores, joining is discouraged and denormalization into large rows is the encouraged best practice.
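To make the second point concrete, here is a minimal sketch of the two-row problem. It assumes a toy in-memory store (the hypothetical `RowStore`) where each single-key write is atomic and nothing else is; the key names and the `fail_between_writes` switch are made up for illustration and don't correspond to any real database’s API.

```python
class RowStore:
    """Toy store: each put() on one key is atomic, but nothing spans two keys."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


def save_user(store, user_id, email, fail_between_writes=False):
    # Write 1: the primary row, keyed by user id.
    store.put(("user", user_id), {"email": email})
    if fail_between_writes:
        # Simulates a client crash after the first write but before the second.
        raise RuntimeError("client crashed between the two writes")
    # Write 2: the hand-maintained secondary row, keyed by email.
    store.put(("user_by_email", email), user_id)


store = RowStore()
save_user(store, 1, "ann@example.com")
try:
    save_user(store, 2, "bob@example.com", fail_between_writes=True)
except RuntimeError:
    pass

# The primary row for user 2 exists, but its secondary row never got written.
primary_ok = store.get(("user", 2)) is not None
index_ok = store.get(("user_by_email", "bob@example.com")) is not None
```

With only per-row atomicity, nothing stops the store from ending up in this half-written state; cleaning it up becomes the application’s job.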

FoundationDB Is Different

FoundationDB is a distributed, sorted key-value store with support for arbitrary transactions across multiple key-values — multiple “rows” — in the database.
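The “sorted” part matters as much as the “transactions” part: when keys are kept in order, everything sharing a key prefix sits together and can be read as one range. Here is a rough in-memory sketch of that idea. `SortedKV` and `get_range` are hypothetical names of mine, plain Python tuples stand in for encoded keys, and none of this is FoundationDB’s actual implementation or API.

```python
import bisect


class SortedKV:
    """Toy sorted key-value store with tuple keys (illustration only)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def set(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_range(self, prefix):
        """Return (key, value) pairs whose key starts with prefix, in key order."""
        n = len(prefix)
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i][:n] == prefix:
            key = self._keys[i]
            out.append((key, self._values[key]))
            i += 1
        return out


kv = SortedKV()
kv.set(("class", "cs101"), 2)
kv.set(("attends", "bob", "cs101"), "")
kv.set(("attends", "alice", "math201"), "")
kv.set(("attends", "alice", "cs101"), "")

# All of alice's "attends" keys are adjacent, so one range read finds them.
alice_classes = [key[2] for key, _ in kv.get_range(("attends", "alice"))]
```

Because the lookup is a seek plus a short scan, “all the rows for student s” never requires touching unrelated keys.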

To understand the distinction, let me pilfer an example from their tutorial. Their tutorial models a university class signup system. You know, the same system every CS major has had to implement in their programming 101 class. Anyway, to demonstrate the potential power here, I just want to share a single function with you, the class signup function:

import fdb  # call fdb.api_version(...) once, with your client's API version, before using fdb

def attendsKey(s, c):
    """ Key for student(s) attending class(c)"""
    return fdb.tuple.pack(('attends', s, c))

def classKey(c):
    """ Key for num available seats in class"""
    return fdb.tuple.pack(('class', c))

@fdb.transactional
def signup(tr, s, c):
    rec = attendsKey(s, c) # key recording whether student s attends class c
    if tr[rec].present(): return # already signed up (step 3)

    seatsLeft = int(tr[classKey(c)]) # get the number of seats left in the class
    if not seatsLeft: raise Exception('no remaining seats') # (step 3)

    # range read over every ('attends', s, ...) key: all "attends" records for this student
    classes = tr[fdb.tuple.range(('attends', s))]
    if len(list(classes)) >= 5: raise Exception('too many classes') # (step 4)

    tr[classKey(c)] = str(seatsLeft-1) # decrement the available seats
    tr[rec] = '' # mark that this student attends this class

Okay, more than one function, but the other functions are just helpers to show you how keys are getting generated.

What matters here is that all work goes through signup’s first argument, tr, the transaction object. First we check for the existence of a special key that indicates whether student s is attending class c. Then, in the same transaction, we work on a completely different “row”: the number of seats left in the class. If we are able to, we update that count and then create a row to store the fact that the student attends that class. More important than what is actually happening here, FoundationDB is able to attempt to perform this transaction atomically across the entire cluster.
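If you want to poke at the all-or-nothing behavior without a cluster, here is a toy stand-in. It is only a sketch under heavy assumptions: `ToyDB`, `ToyTransaction`, `keys_with_prefix`, and this `transactional` decorator are hypothetical names of mine, not the fdb API, and the sketch buffers writes in memory while ignoring concurrency, conflict detection, and retries entirely.

```python
import functools


class ToyDB:
    def __init__(self):
        self.data = {}  # committed key -> value


class ToyTransaction:
    """Buffers writes; nothing reaches the database until commit()."""

    def __init__(self, db):
        self._db = db
        self._writes = {}

    def get(self, key):
        if key in self._writes:
            return self._writes[key]
        return self._db.data.get(key)

    def set(self, key, value):
        self._writes[key] = value

    def keys_with_prefix(self, prefix):
        merged = dict(self._db.data)
        merged.update(self._writes)
        return sorted(k for k in merged if k[:len(prefix)] == prefix)

    def commit(self):
        self._db.data.update(self._writes)


def transactional(func):
    @functools.wraps(func)
    def wrapper(db, *args):
        tr = ToyTransaction(db)
        result = func(tr, *args)  # an exception here means nothing is committed
        tr.commit()
        return result
    return wrapper


@transactional
def signup(tr, s, c):
    rec = ("attends", s, c)
    if tr.get(rec) is not None:
        return  # already signed up
    seats_left = tr.get(("class", c))
    if not seats_left:
        raise Exception("no remaining seats")
    if len(tr.keys_with_prefix(("attends", s))) >= 5:
        raise Exception("too many classes")
    tr.set(("class", c), seats_left - 1)  # first "row": the seat count
    tr.set(rec, "")                       # second "row": the attends record


db = ToyDB()
db.data[("class", "cs101")] = 1  # one seat available

signup(db, "alice", "cs101")     # takes the last seat
try:
    signup(db, "bob", "cs101")
    bob_error = None
except Exception as exc:
    bob_error = str(exc)
```

The seat count and the attends record either both land or neither does. The real system has to deliver that same guarantee across many machines and concurrent clients, which is the hard part this sketch skips.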

If this were a more traditional NoSQL store, we would have to take a more awkward tack to do this atomically. We’d have to choose either the class or the student as the row that we can work with atomically. Implicitly, our key would become either a lookup for a class or a lookup for a student. For the sake of discussion, let’s say we made our rows classes and simply stored the ids of all the students attending that class in that row. It’s trivial to work on classes to add or remove students. We simply look up a class and append the student id to sign them up.

Conceptually this model is pretty simple, but it’s lacking if we suddenly want to look up students in the database. What would that query look like? Can you do it atomically? You’ll need another type of row for students. Then you have two entities to work across, outside of the transactionality guarantees.
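Here is roughly what that class-as-row model looks like, sketched with a plain Python dict standing in for the store. The row layout and the function names are assumptions for illustration, not any particular database’s schema.

```python
# One row per class; the row holds everything we want to update atomically.
class_rows = {
    "cs101": {"seats": 30, "students": ["alice", "bob"]},
    "math201": {"seats": 25, "students": ["alice"]},
}


def signup_in_class_row(rows, class_id, student_id):
    # Touches a single row, so it fits the per-row atomicity guarantee.
    row = rows[class_id]
    if student_id not in row["students"]:
        row["students"].append(student_id)


def classes_for_student(rows, student_id):
    # No student row exists, so this has to scan every class row.
    return sorted(cid for cid, row in rows.items() if student_id in row["students"])


signup_in_class_row(class_rows, "math201", "carol")
alice_classes = classes_for_student(class_rows, "alice")
```

Signing up is a one-row operation, but the student lookup is a full scan of every class. Adding a student row to avoid the scan brings back the two-row update problem.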

FoundationDB == Unopinionated Transactions

A big reason that many NoSQL stores were simplified to the atomic row architecture is to get away from the forced large-scale transactionality (and performance hit) of SQL transactions. The solution was to go back to making everything a map and to make accesses to each entry/row/document a transaction. So we all bought into that and began working our schemas into that model.

However, at the end of the day, SQL and traditional NoSQL are both very opinionated about what a transaction should be. Despite the transaction manifesto, Foundation is completely unopinionated when it comes to how you define transactions. The same signup code above could easily be implemented as two or three transactions if that was truly what was called for.

This power is expressed in how you access Foundation. Foundation is exposed more as a library for defining transactions on an arbitrary key-value store. This narrower aim lets you write code in your own language, without being constrained to a second query language or awkwardly fitting your code to an ORM. Instead, you write natural code expressing the transactions that you want to perform over the key-value store. Pretty exciting stuff.

Whoa Whoa Whoa, Slow Your Roll Sparky, Looks Cool And All But Prove This Isn’t A Giant Boondoggle?

Okay: Foundation is new and unproven. There are plenty of unanswered questions about it. How does it perform vs {HBase/Cassandra/Mongo/Couch/…}? What is the cost of this transactionality? At what point does its transactional architecture stop scaling? What are the trade-offs? Etc., etc.

Yeah, yeah, so don’t start rewriting all your database code to use Foundation; that would be pretty crazy. Nevertheless, the unopinionated, highly client-controlled notion of transactionality is groundbreaking, obviously useful, and I’m hopeful it can be successful.

Published at DZone with permission of Doug Turnbull, author and DZone MVB. (source)
