NoSQL Zone is brought to you in partnership with:

I have been in the Software business for close to 30 years now, and I have 25+ years with SQL-based relational databases. Although I have used Unix even longer than that, I am pretty much an operating system agnostic. Over the years, I have work in many positions, from support engineer to sales engineer and consultant. Anders is a DZone MVB and is not an employee of DZone and has posted 40 posts at DZone. You can read more from them at their website. View Full User Profile

The Simple Comparison of Key-Value Stores

08.07.2012
| 5397 views |
  • submit to reddit
Faster and faster, that is what we want from our databases. And the biggest roadblock for the MySQL Dragster is the speed of the hard disk, right? No, I'm not going to debate that, that is just the case. And how do you fix that then? Well, if what is limiting your dragster is a roadblock, then either you drive round the roadblock of you make it disappear faster, or in computer terms:
  • Avoid using the disk by instead putting as much data as you can in RAM
  • Use faster disks (like SSDs)
Now, to be honest, this analogy isn't that good because the performance limiting factor of the disk is so huge, and contrary to popular belief, it's not getting any better! But we have SSDs you say? Yes, that makes the hard drive faster, but the CPU and RAM are getting even faster! But let's assume that we have enough memory so we do not need the disk? Will just about everything go at the speed of light? Nope, what happens here is that stuff that wasn't even noticeable in terms of limiting performance when the disk was there, as disk-I/O is such a huge bottleneck, suddenly shows it's dirty face!

Like this:  As the CPU cores are getting faster, but not that much faster anymore, due to physical limitations, we have more and more of these CPU cores instead. And suddenly, any limitation in getting those CPUs to work well together suddenly turns into a major headache! Like a mutex shared by all threads. Like the Query Cache mutex in MySQL for example!

With this in mind I am now ready to commence with the benchmarks that I wrote about in may. Yes, it took some time to get the data loaded into MySQL and in the process I managed to create a new Open Source project for exporting and importing JSON data from and to MySQL.  With that in place I now have something that is sort-of real world data. I had to remove a few columns (or Attributes of you are a JSON taliban) to make MySQL Cluster work with this data, because MySQL Cluster stores VARCHAR data as fixed length data on disk which means a few things:
  • There is a lot more stuff to write to disk.
  • UTF-8 means that there is 3 times more data to write!

All this means that MySQL Cluster may well work in well as an alternative to some key-valuestore setups, but not all, and it depends what "value" means here. If "value" means "document" or "object", then we need to use a VARCHAR or something like that for the value, which will be real limiting in the case of MySQL Cluster. And I'll try to be real nice to MySQL Cluster here, so I end up with a really simple schema:

CREATE TABLE `keyvalue` (
  `id` bigint(20) NOT NULL,
  `value1` int(11) DEFAULT NULL,
  `value2` double DEFAULT NULL,
  PRIMARY KEY (`id`)
)

And in this table I load some 105.000.000 rows. Should be simple with MySQL Cluster, right? Except that MySQL Cluster will only accommodate 512 Mb of hash data per partition (this is a really, really silly limitation! Is 512 Mb the MySQL Cluster version of "640 K should be enough for anyone?"). But this fixable, and with 4 partitions, this works just fine.

As a side note, without disk data, MySQL Cluster feels MUCH more stable. The occasional data loss and other weirdness that I experienced what I tried to load the table with VARCHAR data are now completely gone. So not only does disk data limit you on in terms of data types (VARCHAR), it also seems to need some more development. And no, I wasn't in the mood of reproducing the errors I got.

Anyway, on my server here at home, having an 8-Core AMD CPU and 16 Gb RAM, waiting to run this benchmark. I'm testing MySQL with InnoDB, MySQL Cluster and MongoDB. The test program is the same in all cases, I read 1.000.000 rows 10 times distributed over 100 threads. To be fair to all, I made sure that what data I had would fit in memory and that it WAS in memory, so I did a couple of warmup runs first. In the case of NDB I used the MySQL API, not NDBAPI (I will try this eventually). The results I got was this:

  • MongoDB - 110.000 reads per second
  • MySQL with InnoDB - 30.000 reads per second
  • MySQL with NDB - 32.000 reads per second

In the case of NDB, I had these setting, beyond the standard stuff:

ndb_use_exact_count=0
ndb_force_send=0

And that makes one helluva difference, I can tell you that! Why this isn't by default I don not really know, I assume there is a good reason for it, but someone has to tell me what it is. Now, I did load data also, and the results there were similar, but as I was loading JSON and that is pretty native to MongoDB, that was expected, MongoDB was about 2.5 times faster than NDB / InnoDB, which were on par with each other. I will not give any exact numbers here as loading data depends on so much more in terms of tuning.

This is not the end of this story though, if we assume, as MySQL was way behind MongoDB in terms of performance, but InnoDB and NDB were on par with eachother, one might at least want to try the theory that it is the MySQL part that is slowing things down, and this can be tested by running MySQL / NDB with more than one mysqld, and that is the next thing to try. Then we have the HANDLER interface and proper NDBAPI also, the latter should be a lot faster, really. And yes, I really should see what MyISAM can do for me. And MariaDB.

And before I end, please MySQL Cluster developers, remove that silly 512 Mb HASH index size per partition limitation (see the manual here). As RAM get's less and less expensive, and if we, for example, wanted to avoid Network and Disk I/O as much as possible, and instead use RAM and CPU (like in an Amazon environment), then this turns into a rather sreious limitation. My tests above was on hard iron though.

Published at DZone with permission of Anders Karlsson, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)