Tip: How do you know when to shard your database
We at ScaleBase talk about sharding so much, it’s difficult for us to see why someone wouldn’t want to shard. But just because we’re so enthusiastic about our transparent sharding mechanism, it doesn’t mean we can’t understand the very basic question, “When do I shard?”
Well, it’s not the most difficult question to answer. I’ll keep it short: if your database exceeds the memory you have on a single machine, you should shard. If you hit I/O, your performance suffers, and sharding will assist.
Why? That’s easy to explain.
Databases in general (and MySQL is no exception) try to cache data. Because accessing memory is so much faster than accessing disk (even with SSDs), database providers have developed rather sophisticated caching algorithms. For instance, running a query caches the query and its results. Indexes are stored in memory so that, when running a query, the database doesn’t have to hit the disk twice.
But if the database is big, it won’t fit into memory. Sometime even the index won’t fit into memory. This is when you start seeing database performance degradation. So the best date to start sharding is when you can’t add more memory to your database server. This can come sooner rather than later. As we all know, data is booming, and if you’re running in the cloud there is only so much memory your cloud provider will give you. With sharding, every machine has its own data, which fits in RAM. And if you need more – just add an additional shard.
The other parameter is the number of concurrent connections. If you reach the limit of connections your machine can handle, it’s time to shard your database. Every sharded database gets less hits/second, requires fewer connections – and can work faster.
So, if your database does not fit in memory, or if you have too many concurrent users hitting your database – try out ScaleBase, for our transparent sharding solution.
To learn more, try out ScaleBase Transparent Sharding solution at www.scalebase.com
Published at DZone with permission of its author, Liran Zelkha. (2 votes)
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)





Comments
Nigel Eiland replied on Sun, 2011/11/20 - 10:50am
Roger Lindsjö replied on Mon, 2011/11/21 - 7:32am
Basically what you are saying is that if it is too big to fit in ram (and you can not get more ram), get more machines (with more ram).
Are you seriously saying that all data should be no further away than ram? That would increase the cost of datawahouses quite a lot I guess.
And if so, why should it only apply to databases? Why not the file system itself? I need to access logfiles from my runtimes, say a few hundred GBs of data. For maximum performance this should all of course be in ram, but that would not be cost effective. For every 1TB disk I buy I also buy 1TB of ram?
I think sharding is a great way to go (and your technology seems interresting), but a broad statement such as the one you did makes you look not very serious in my eyes.
Prem Kurian Philip replied on Wed, 2011/11/23 - 2:09pm
in response to:
Nigel Eiland
Ah! apparently not!