I specialise MySQL Server performance as well as in performance of application stacks using MySQL, especially LAMP. Web sites handling millions of visitors a day dealing with terabytes of data and hundreds of servers is king of applications I love the most. Peter is a DZone MVB and is not an employee of DZone and has posted 228 posts at DZone. You can read more from them at their website. View Full User Profile

The Doom of Multiple Storage Engines

05.10.2010
| 4634 views |
  • submit to reddit

One of the big “Selling Points” of MySQL is support for Multiple Storage engines, and from the glance view it is indeed great to provide users with same top level SQL interface allowing them to store their data many different way. As nice as it sounds the in theory this benefit comes at very significant cost in performance, operational and development complexity. Let me touch just on the few complications it brings in

Conversions – Each storage engine has its own data storage format so MySQL has to do a lot of copying and conversions while accessing the data, which significantly limits performance compared to basically “zero copy” design one may have streaming data from memory when it fits there.
Optimizer and Execution Storage engines are not created equal especially if you look at In Memory storage engines vs Distributed ones. Trying to handle all different cases in Optimizer – such as in memory nature of MEMORY storage engine, clustering by Primary key in Innodb and distributed nature of NDB makes it more complicated when it should be and still because it has to serve so many use cases it is not tuned to exploit full performance potential of any given storage engine.
Synchronization The top level on MySQL side, such as .frm files and binary log files has to be synchronized with storage engine transactions which require a lot of complications and performance overhead. MySQL has to do several fsync() calls per transaction commit to be on the safe side and still number of operations are not fully safe – you can for example get .frm out of sync with internal data dictionary if you crash at wrong time.
Transaction Support We have to deal with mix of locks on high level as well as on storage engine internally. We have to deal with different locking strategies storage engines may have and if we venture into using cross storage engine transactions we’re up for a lot of surprises.
Backup Cross storage engine backup is a pain because they are so different. Some of them are distributed and some in memory so even OS level approaches such as LVM backup may not work in all cases. We also get into synchronization aspects making backup which prevents from having hot backup in many cases.
Replication Support for different storage engines is a pain in replication. To start we have to write second binary log file when all transaction engines have their own !
Quality Assurance Multiple storage engines are much more complicated to test. And I believe multi-storage engine use is really not tested very deeply. I’ve seen some tests being run for different storage engines, but really this is not enough as MySQL allows you using multiple storage engines in the query you need to test all kinds of combination such as running multi-table update joining PBXT and Innodb tables.
Operations Things like backup and monitoring becomes a lot more complicated plus you have to deal with balancing – allocating resources to different storage engines and performance tuning becomes more complicated.

Now what is interesting for probably 95% of applications single storage engine would be good enough. In fact people already do not love to mix multiple storage engines very actively because of potential complications involved.

Now lets think what we could have if we have a version of MySQL Server which drops everything but Innodb Storage engine (it could be NDB, PBXT etc, it does not matter).

We could save a lot of CPU cycles by having storage format same as processing format. We could tune Optimizer to handle Innodb specifics well. We could get rid of SQL level table locks and using Innodb internal data dictionary instead of Innodb files. We would use Innodb transactional log for replication (which could be extended a bit for this purpose). Finally backup can be done in truly hot way without nasty “FLUSH TABLE WITH READLOCK” and hoping nobody is touching “mysql” database any more. Single Storage Engine server would be also a lot easier to test and operate.

This also would not mean one has to give up flexibility completely, for example one can imagine having Innodb tables which do not log the changes, hence being faster for update operations. One could also lock them in memory to ensure predictable in memory performance.

References
Published at DZone with permission of Peter Zaitsev, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Nicolas Jouve replied on Mon, 2010/05/10 - 11:20am

I do agree completely with this.

And infact, I even think that "multiple storage engines" is not a selling point but has been instead a big marketing error , as the image people have of MySQL is very confused due to the fact that people do not necessarily talk about the "same MySQL".

Michael Eric replied on Wed, 2012/09/26 - 4:00pm

I think this is exactly the reasoning why PostgreSQL only has one storage engine to rule them all.

It seems like the Drizzle project is trying to address a lot of these types of performance issues, but they are keeping support for pluggable storage engines (MyISAM notably absent) and even extending it. I wonder what their take on those issues is, or if they are able to get around some of them (perhaps by rigidly defining how a storage engine must behave/store data to avoid the copying issues?). If anyone has some insight I’d love to hear it.

ubuntu 

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.