NoSQL Zone is brought to you in partnership with:

Kristina Chodorow is a core contributor to MongoDB. She has written several O'Reilly books (MongoDB: The Definitive Guide, Scaling MongoDB, and 50 Tips and Tricks for MongoDB Developers) and has given talks at conferences around the world, including OSCON, FOSDEM, Latinoware, TEK·X, and YAPC. Her Twitter handle is @kchodorow. Kristina is a DZone MVB and is not an employee of DZone and has posted 52 posts at DZone. You can read more from them at their website. View Full User Profile

MongoDB: Sharding with the Fishes

05.03.2012
| 2425 views |
  • submit to reddit

Sharding is the not-so-revolutionary way that MongoDB scales writes (it’s very similar to techniques described in the Big Table paper and by PNUTS) but many people are unfamiliar with what it is and how it works.  If you’ve seen a talk on MongoDB or looked at the website, you’ve probably seen a diagram of sharding that looks like this:

…which probably looks a bit like “I hope I don’t have to understand that.”

However, it’s actually quite simple: it’s exactly how the Mafia is structured (or, at least, how The Godfather taught me it was):

  • The shards are the peons: someone tells them to do something (e.g., a query or an insert), they do it and report back.
  • The mongos is the godfather. It knows what data each peon has and gives them orders.  It’s basically a router for the requests.
  • The config server is the consigliere. It knows where all of the data is at any given time and lets the boss know so that he can focus on bossing. The consigliere keeps the organization running smoothly.

For a concrete example, let’s say we have a collection of blog posts.  You choose a “shard key,” which is the value Mongo will use to split the data across shards.  We’ll choose “date” as the shard key, which means it will be split up based on values in the “date” field.  If we have four shards, they might contain data something like:

  • shard #1: beginning of time up to June 2009
  • shard #2: July 2009 to November 2009
  • shard #3: December 2009 to February 2010
  • shard #4: March 2010 through the end of time

Now that we’ve got our peons set up, let’s ask the godfather for some favors.

Queries

Say you query for all documents created from the beginning of this year (January 1st, 2010) up to the present.  Here’s what happens:

  1. You (the client) send the query to the godfather.
  2. The godfather knows which shards contain the data you’re looking for, so he sends the query to shards #3 and #4.
  3. shard #3 and shard #4 execute the query and return the results to the godfather.
  4. The godfather puts together the results he’s received and sends them back to the client.

Note how all of the sharding stuff is done a layer away from the client, so your application doesn’t have to be sharding-aware, it can just query the godfather as though it were a normal mongod instance.

Inserts

Suppose you want to insert a new document with today’s date.  Here’s the sequence of events:

  1. You send the document to the godfather.
  2. It sees today’s date and routes it to shard #4.
  3. shard #4 inserts the document.

Again, identical to a single-server setup from the client’s point of view.

So where’s the consigliere?

Suppose you start getting millions of documents inserted with the date September 2009.  Shard #2 begins to swell up like a bloated corpse.  The consigliere will notice this and, when shard #2 gets too big it will split the data on shard #2 and migrate some of it to other, emptier shards.  Then it will let the godfather know that now shard #2 contains July 2009-September 15th 2009 and shard #3 contains September 16th 2009-February 2010.

The consigliere is also responsibly for figuring out what to do when you add a new shard to the cluster.  It figures out if it should keep the new shard in reserve or migrate some data to it right away.  Basically, it’s the brains of the operation.

Whenever the consigliere moves around data, it lets the godfather know what the final configuration is so that the godfather can continue routing requests correctly.

Leave the gun.  Take the cannolis.

This scaling deliciousness is, unfortunately, still very alpha.  You can help us out by telling us where our documentation sucks (specifics are better than “it sucks”), testing it out on your machine, and voting for features you’d like to see.

Published at DZone with permission of Kristina Chodorow, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)