Performance Zone is brought to you in partnership with:

I specialise MySQL Server performance as well as in performance of application stacks using MySQL, especially LAMP. Web sites handling millions of visitors a day dealing with terabytes of data and hundreds of servers is king of applications I love the most. Peter is a DZone MVB and is not an employee of DZone and has posted 229 posts at DZone. You can read more from them at their website. View Full User Profile

Enabling Crash-safe Slaves with MySQL 5.6

09.17.2013
| 1252 views |
  • submit to reddit

This post comes from  at the MySQL Performance blog.

Being able to configure slaves to be crash-safe is one of the major improvements of MySQL 5.6 with regards to replication. However, we noticed some confusion on how to enable this feature correctly, so let’s clarify how it should be done.

In Short

1. Stop MySQL on slave
2. Add relay_log_info_repository = TABLE and relay_log_recovery = ON in my.cnf
3. Restart MySQL and relax

The Gory Details

To fully understand why you should change the above settings if you want crash-safe slaves, let’s first look at the reasons why replication can break when a slave crashes.

On a slave, replication involves two threads: the I/O thread, which copies the binary log of the master to a local copy called the relay log, and the SQL thread, which then executes the queries written in the relay log. The current position of each thread is stored in a file: master.info for the I/O thread and relay-log.info for the SQL thread.

So far, so good. The first problem is that these files are not synced to disk each time they are written to; whenever there is a crash, positions that are stored are likely to be incorrect. MySQL 5.5 has a fix for this: you can set sync_master_info = 1 and sync_relay_log_info = 1 to make sure both files are written and synced to disk after each transaction. Syncing is not free, of course, but if you have write-back cache, these settings can be valuable.

But wait, even with sync_master_info = 1 and sync_relay_info = 1, bad things can happen. The reason is that replication information is written after the transaction is committed. So if a crash occurs after the transaction is committed and before the replication information is updated, replication information will be wrong when the server restarts and a transaction could be executed twice. The effect will depend on the transaction; replication may still run fine, it may be broken, or inconsistencies can even be silently created.

MySQL 5.6 tackles this problem by letting us store replication information in tables instead of files (a mysql.slave_relay_log_info table is created when relay_log_info_repository = TABLE and the mysql.slave_master_info table is created with master_info_repository = TABLE). The idea is simple: we can include the update of the replication information inside the transaction, making sure it is always in sync with the data.

In pseudo-code, instead of having:

START TRANSACTION;
-- Statement 1
-- ...
-- Statement N
COMMIT;
-- Update replication info files

The server now behaves as if we had:

START TRANSACTION;
-- Statement 1
-- ...
-- Statement N
-- Update replication info
COMMIT;

Unfortunately, this is not as straightforward as it may appear. For the SQL thread, it works well because the server can update the slave_relay_info_info table at the same time it commits a transaction. However, for the I/O thread, the update of the table is not related to any transaction execution, so how can the server know when to update the table?

The answer is: it is controlled by sync_master_info. The default is 10,000, meaning that the I/O thread position is only updated every 10,000 transactions. This is obviously not good to make the slave crash-safe. One solution is to set sync_master_info = 1, but as I mentioned, it may have a performance impact (this is why 1 is not the default setting).

However, there is a more elegant solution -- using relay_log_recovery = ON, which will require a MySQL restart. This setting makes sure that when the server starts up, the position for the I/O thread is recovered from the slave_relay_log_info table, which is always up-to-date. Thus, you do not even need to store I/O thread information in a table for the slave to be crash-safe. In other words, setting master_info_repository = TABLE is not necessary.

As a final side-note, once relay_log_info_repository = TABLE, the sync_relay_log_info setting becomes irrelevant, since the table will always be updated at each transaction commit no matter the value of the setting. So, you can safely remove it from your configuration file.

I hope this post will help you benefit from this great feature!



Published at DZone with permission of Peter Zaitsev, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)