Mitch Pronschinske is a Senior Content Analyst at DZone. That means he writes and searches for the finest developer content in the land so that you don't have to. He often eats peanut butter and bananas, likes to make his own ringtones, enjoys card and board games, and is married to an underwear model. Mitch is a DZone Zone Leader and has posted 2573 posts at DZone.

MongoDB 1.6 Arrives With Auto-Sharding and Replica Sets

08.04.2010

Companies like Foursquare, bit.ly, and BoxedIce have been running the beta version of MongoDB 1.6 in production for a few months, and they have been very impressed with the new functionality in today's release.  The two main new features in 1.6 are auto-sharding and replica sets.  Auto-sharding automates horizontal scaling for MongoDB across multiple nodes, and replica sets automate failover and redundancy so that a full copy of the data is kept on multiple servers, which can span multiple datacenters.  MongoDB is a NoSQL (non-relational) document store.

How to Make Replica Sets
Replica sets are a new method for database replication in MongoDB 1.6.  They support automatic failover, automatic recovery of nodes, and multi-datacenter operation.  A replica set can hold multiple members, and each one holds a full copy of the data.  One member acts as the primary server, which receives reads and writes, while the other members accept reads only.  If your primary fails, another member takes over automatically.  Once the replica set is in place, you can begin sharding.
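
As a rough illustration of what setting up a replica set involves, the Python sketch below builds the two pieces a setup typically needs: the command line for starting each mongod with the --replSet option, and the configuration document that the replSetInitiate admin command accepts. The set name "rs0", the host names, ports, and data paths are all placeholders invented for this example, and the helper functions are hypothetical, not part of MongoDB or any driver.

```python
import json


def mongod_command(set_name, port, dbpath):
    """Return the argv for one replica set member.

    --replSet, --port and --dbpath are mongod options; the values
    passed in are placeholders chosen by the caller.
    """
    return ["mongod", "--replSet", set_name,
            "--port", str(port), "--dbpath", dbpath]


def replica_set_config(set_name, hosts):
    """Build the document handed to the replSetInitiate admin command.

    Each member gets a numeric _id and a "host:port" string.
    """
    if not hosts:
        raise ValueError("a replica set needs at least one member")
    return {
        "_id": set_name,
        "members": [{"_id": i, "host": host}
                    for i, host in enumerate(hosts)],
    }


if __name__ == "__main__":
    # Hypothetical three-member set; replace with your own machines.
    hosts = ["db1.example.com:27017",
             "db2.example.com:27017",
             "db3.example.com:27017"]

    for index in range(len(hosts)):
        argv = mongod_command("rs0", 27017, "/data/rs0-%d" % index)
        print(" ".join(argv))

    # With a driver such as pymongo (an assumption, not shown in the
    # article) this document could be sent from one member with
    # client.admin.command("replSetInitiate", config).
    config = replica_set_config("rs0", hosts)
    print(json.dumps(config, indent=2))
```

The config document is kept separate from the process start-up because the set only needs to be initiated once, from a single member, after all the mongod processes are running.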

Setting up Replica Sets


How to use Auto-Sharding
Configuring auto-sharding in MongoDB 1.6 has three parts.  First, set up the shard servers by passing the --shardsvr parameter when starting each mongod instance.  Second, start the config servers with the --configsvr parameter; they store the metadata for the shard cluster.  The documentation states that “a production shard cluster will have three config server processes, each existing on a separate machine. Writes to config servers use a two-phase commit to ensure an atomic and replicated transaction of the shard cluster’s metadata.”  Finally, start the 'mongos' processes, which connect to the config servers and shard servers.  These are the processes your clients connect to, and they route queries to the appropriate shards.  They are self-contained and usually run on each of your app servers.
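
The three parts described above can be sketched as three command lines. The Python below only assembles them as argument lists so the shape of each one is easy to see; it does not start any processes. The host names, ports, and data paths are made-up placeholders, the helper functions are hypothetical, and the example assumes the three-config-server layout quoted from the documentation.

```python
def shard_server_command(port, dbpath):
    """argv for a mongod that acts as a shard server (--shardsvr)."""
    return ["mongod", "--shardsvr",
            "--port", str(port), "--dbpath", dbpath]


def config_server_command(port, dbpath):
    """argv for a mongod that stores shard cluster metadata (--configsvr)."""
    return ["mongod", "--configsvr",
            "--port", str(port), "--dbpath", dbpath]


def mongos_command(config_hosts, port=27017):
    """argv for a mongos router.

    --configdb takes the config servers as one comma-separated string.
    """
    if not config_hosts:
        raise ValueError("mongos needs at least one config server")
    return ["mongos", "--configdb", ",".join(config_hosts),
            "--port", str(port)]


if __name__ == "__main__":
    # Placeholder topology: two shards, three config servers, one router.
    commands = [
        shard_server_command(27018, "/data/shard0"),
        shard_server_command(27018, "/data/shard1"),
        config_server_command(27019, "/data/config0"),
        config_server_command(27019, "/data/config1"),
        config_server_command(27019, "/data/config2"),
        mongos_command(["cfg1.example.com:27019",
                        "cfg2.example.com:27019",
                        "cfg3.example.com:27019"]),
    ]
    for argv in commands:
        print(" ".join(argv))
```

Each line would be run on its own machine in a production layout; application clients then connect to the mongos port rather than to any individual shard.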



Sharding setup



Foursquare moved all check-ins and user data from PostgreSQL to MongoDB once a single physical machine was no longer going to be enough.  “With foursquare's growing popularity, we were fast approaching the point where it would no longer be feasible to store user checkins on a single physical machine,” said Harry Heymann, Engineering Lead at Foursquare. “MongoDB's auto-sharding capabilities enabled us to easily transition to a multi-node cluster that will enable us to continue to grow for the foreseeable future.”

Another MongoDB user is bit.ly, a service with 50 million users and 12.5 billion 'shortens' per month.  They store user history in MongoDB.  Other major production users of MongoDB include the New York Times, Etsy, SourceForge, and Shutterfly.  Downloads of the database exceeded 50,000 in the month of July.

For more info on how BoxedIce set up their auto-sharding, check out their recent blog entry.