
Moshe Kaplan constantly helps successful firms get to the next level, and he is thrilled to uncover some of his secrets. Mr. Kaplan is a seasoned project management and cloud technologies lecturer. He is also known as a cloud and SCRUM evangelist. Moshe is a dSero.com co-founder. He was an R&D Director at Essence Security, led RockeTier, and served as a board member in the IGT and as a department head at a top IDF IT unit. Moshe holds an M.Sc and a B.Sc from TAU. Moshe is a DZone MVB, is not an employee of DZone, and has posted 59 posts at DZone.

MongoDB Sharding

07.25.2013

If I had to make some safe bets on the near future, I would choose two: Hadoop and MongoDB.

There is huge demand for both technologies, and many players see them as a foundation for their future products.
Sharding?
MySQL sharding was a major issue for large-scale installations, and the same is true for large mongoDB installations.
Back to Basics
mongoDB is quite similar to a regular database, but it has two main advantages: 1) software engineers love it because it can easily be used for object persistence, and 2) it supports unstructured objects (documents), so a single collection can easily store different objects based on the same virtual class.
mongoDB terms

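  1. Database: the same concept as a database in a relational system.
  2. Collections: very similar to tables.
  3. Documents: very similar to rows. Yet, a document can be as flexible as a JSON document can be. For example, it may hold one-to-many fields (such as an array of values) in the document itself.
  4. mongod: the mongoDB server process. Each shard is a mongod instance (or a replica set of them).
  5. Chunk: a storage unit that holds documents, 64MB by default.
  6. Config database: the directory that maps chunks to shards. The mongos query routers read it to route requests.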
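To make the collection and document terms concrete, here is a minimal sketch in plain Python, with a list standing in for a collection and dicts standing in for documents. It does not talk to a mongoDB server, and the field names (account_id, amount, coupon, items) are hypothetical.

```python
import json

# A plain list stands in for a collection; each dict stands in for a document.
# The field names are hypothetical and not taken from any real schema.
invoices = [
    {"_id": 1, "account_id": 42, "amount": 19.90},
    {"_id": 2, "account_id": 42, "amount": 5.00, "coupon": "SPRING"},
    {"_id": 3, "account_id": 7, "amount": 120.00,
     "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-9", "qty": 1}]},
]


def field_names(collection):
    """Return the sorted top-level field names used across all documents."""
    names = set()
    for doc in collection:
        names.update(doc.keys())
    return sorted(names)


if __name__ == "__main__":
    # Documents in the same collection do not need to share the same fields.
    print(field_names(invoices))
    # The third document embeds a one-to-many relation as an array.
    print(json.dumps(invoices[2], indent=2))
```

The three documents live side by side even though only one has a coupon and only one embeds a list of items, which is the flexibility a table row does not offer.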
Why use sharding?
  1. Support large datasets using commodity servers.
  2. Support high IO requirements using commodity disks.
What are mongoDB sharding features?

  1. Range-based Data Partitioning: a method very similar to MySQL partitioning. You choose one or more fields (the shard key) that sharding will be based on. Choose the shard key according to the business logic, for example splitting by account id in a SaaS application.
  2. Automatic Data Volume Distribution: mongoDB takes care of balancing the shards by itself, according to the chosen shard key.
  3. Transparent Query Routing: when a query does not include the shard key, mongoDB by itself sends it to multiple shards and merges the results (very much like map reduce in Hadoop).
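Range-based partitioning and query routing can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not mongoDB's implementation: the chunk bounds, the shard names, and the functions shard_for and shards_for_query are all hypothetical, and the shard key is assumed to be an integer account id.

```python
from bisect import bisect_right

# Hypothetical chunk map: chunk i covers the shard key range
# [CHUNK_LOWER_BOUNDS[i], CHUNK_LOWER_BOUNDS[i + 1]) and lives on CHUNK_SHARDS[i].
# The last chunk is open-ended.
CHUNK_LOWER_BOUNDS = [0, 1000, 2000, 3000]
CHUNK_SHARDS = ["shard-a", "shard-b", "shard-a", "shard-c"]


def shard_for(account_id):
    """Return the shard holding the chunk whose range contains account_id."""
    if account_id < CHUNK_LOWER_BOUNDS[0]:
        raise ValueError("account_id is below the lowest chunk bound")
    index = bisect_right(CHUNK_LOWER_BOUNDS, account_id) - 1
    return CHUNK_SHARDS[index]


def shards_for_query(account_id=None):
    """Route a query: one shard when the shard key is known, else every shard."""
    if account_id is not None:
        return [shard_for(account_id)]
    return sorted(set(CHUNK_SHARDS))


if __name__ == "__main__":
    print(shards_for_query(account_id=1500))  # targeted query
    print(shards_for_query())                 # query without the shard key
```

The lookup table plays the role of the config database, and shards_for_query plays the role of the router: a query that carries the shard key touches one shard, while any other query has to be sent to all of them.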
Key Recommendations for mongoDB Sharding
  1. Sufficient Cardinality: choose a shard key with enough distinct values, so the data can later be split across more shards when the database grows too large. A chunk that holds a single key value cannot be split, even when it exceeds the chunk size.
  2. Uniform Distribution: choose a shard key whose values are spread uniformly, to avoid an unbalanced design.
  3. Distribute Write Operations: if you have a billing system, prefer to shard according to account id rather than according to billing month. Otherwise, on any given day, probably only a single shard will receive the writes.
  4. Query according to the shard key: if a query includes the shard key, it results in a single shard query. Otherwise, it generates N queries (one per shard).
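The billing example can be checked with a small simulation. This is a rough sketch, not mongoDB behavior: a modulo over a hypothetical four-shard cluster stands in for the chunk-to-shard assignment, and the bills, field names, and function names are made up for illustration.

```python
from collections import Counter

SHARDS = 4  # hypothetical cluster size


def shard_by_account(bill):
    """Place a bill by account id: accounts are spread over all shards."""
    return bill["account_id"] % SHARDS


def shard_by_month(bill):
    """Place a bill by billing month: all bills of one month share a shard."""
    return bill["month"] % SHARDS


def write_load(bills, placement):
    """Count how many writes each shard receives under a placement function."""
    return Counter(placement(bill) for bill in bills)


# One day of made-up bills: 1,000 accounts, all billed in July (month 7).
todays_bills = [{"account_id": n, "month": 7} for n in range(1000)]

if __name__ == "__main__":
    print("by account id:", dict(write_load(todays_bills, shard_by_account)))
    print("by month:     ", dict(write_load(todays_bills, shard_by_month)))
```

With account id as the key, each of the four shards takes a quarter of the day's writes. With billing month as the key, a single shard takes all of them while the other three sit idle.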
Technical Aspects for mongoDB Sharding
  1. Every sharded collection must have an index whose leading fields are the shard key (for an empty collection, shardCollection creates it).
  2. The default chunk size limit is 64MB.
  3. When a chunk reaches this limit, mongoDB splits it in two.
  4. If chunks are not distributed uniformly, mongoDB starts migrating chunks between shards.
  5. The cluster balancer takes care of this process.
  6. Balancing can cause performance issues and can therefore be restricted to off-peak hours (nights and weekends, for example) using balancing windows.
  7. The chunks-to-shards mapping is saved in the config database.
  8. Replication should be considered as well, as a complementary method.
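The split and balance steps can be sketched as a toy simulation. It is deliberately simplified and is not how mongoDB is implemented: chunks are represented only by their size in MB, a split cuts the size in half instead of splitting at a key, and the function names and the threshold parameter are assumptions made for the example.

```python
CHUNK_LIMIT = 64  # chunk size limit in MB, the default quoted in the list above


def split_oversized(chunk_sizes):
    """Split every chunk above CHUNK_LIMIT into two halves until all fit."""
    result = []
    pending = list(chunk_sizes)
    while pending:
        size = pending.pop()
        if size > CHUNK_LIMIT:
            pending.extend([size / 2, size / 2])
        else:
            result.append(size)
    return result


def balance(shards, threshold=1):
    """Move one chunk at a time from the fullest shard to the emptiest one
    until their chunk counts differ by at most `threshold`.

    `shards` maps a shard name to its list of chunks and is modified in place.
    Returns the number of migrations performed.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    migrations = 0
    while True:
        fullest = max(shards, key=lambda name: len(shards[name]))
        emptiest = min(shards, key=lambda name: len(shards[name]))
        if len(shards[fullest]) - len(shards[emptiest]) <= threshold:
            return migrations
        shards[emptiest].append(shards[fullest].pop())
        migrations += 1


if __name__ == "__main__":
    # A new cluster: all data starts on one shard, two shards are empty.
    cluster = {"shard-a": split_oversized([200, 100, 30]), "shard-b": [], "shard-c": []}
    moved = balance(cluster)
    print(moved, {name: len(chunks) for name, chunks in cluster.items()})
```

Every migration in the loop copies a chunk between servers, which is why the article suggests confining the balancer to off-peak hours.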
Bottom Line
mongoDB brings to the table an out-of-the-box sharding solution that can scale your operations. Now, you only need to analyze your needs and select the right solution for them.
Published at DZone with permission of Moshe Kaplan, author and DZone MVB. (source)
