NoSQL Zone is brought to you in partnership with:

Zardosht Kasheff has been a senior member of Tokutek’s engineering team since 2008, working on TokuDB and TokuMX, both leveraging Fractal Tree Indexing and designed for big data workloads. TokuMX is MongoDB with Fractal Tree indexing storage. TokuDB is a high performance storage engine for MySQL. Prior to Tokutek, Zardosht worked at Microsoft on remote terminal services. Zardosht holds B.S. and M.S. degrees in Computer Science from MIT, where he pursued research on cache-oblivious dynamic search trees under Bradley Kuszmaul at the Computer Science and Artificial Intelligence Laboratory. Zardosht is a DZone MVB and is not an employee of DZone and has posted 24 posts at DZone. You can read more from them at their website. View Full User Profile

Why Unique Indexes are Bad

  • submit to reddit

Before creating a unique index in TokuMX or TokuDB, ask yourself, “does my application really depend on the database enforcing uniqueness of this key?” If the answer is ANYTHING other than yes, do not declare the index to be unique. Why? Because unique indexes may kill your write performance. In this post, I’ll explain why.

Unique indexes are a strange beast: they have no impact on standard databases that use B-Trees, such as MongoDB and MySQL, but may be horribly painful for databases that use write optimized data structures, like TokuMX’s Fractal Tree(R) indexes. How? They essentially drag the Fractal Tree index down to the B-Tree’s level of performance.

When a user declares a unique index, the user tells the database, “please help me and enforce uniqueness on this index.” So, before doing any insertion into a unique index, the database must first verify that the key being inserted does not already exist. If the possible location of the key is not in memory, which may happen if the working set does not fit in memory, then the database MUST perform an I/O to bring into memory the contents of the potential location (be it a leaf node in a tree, or an offset into a memory mapped file), in order to check whether the key exists in that location.

I/Os are expensive. A hard disk has only so many I/O’s available per second, so we should be wary of ever using one when we don’t need to.

This is where the difference in impact that unique indexes have on B-Trees v. write optimized data structures comes into play. For the B-Tree, the I/O does not matter, because the subsequent insertion would perform the I/O anyway, to insert the key. But a Fractal Tree index does not require an I/O on insertion. For Fractal Tree indexes, this I/O is a big cost!

So essentially, unique keys drag the performance of Fractal Tree indexes down to a B-Tree’s level, eliminating one of the biggest innovations of TokuMX and TokuDB for MySQL.

So, the moral of the story is this: don’t create a unique index unless you REALLY REALLY have to. Your performance may suffer greatly. Don’t create unique indexes because you happen to know an index will be unique. Only do so if your application depends on the database enforcing uniqueness. And if it really does, you may want to try to find a way to change your application.

Published at DZone with permission of Zardosht Kasheff, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)