Performance Zone is brought to you in partnership with:

I specialise MySQL Server performance as well as in performance of application stacks using MySQL, especially LAMP. Web sites handling millions of visitors a day dealing with terabytes of data and hundreds of servers is king of applications I love the most. Peter is a DZone MVB and is not an employee of DZone and has posted 272 posts at DZone. You can read more from them at their website. View Full User Profile

ALTER TABLE Performance with Fast Index Creation

06.28.2012
| 5163 views |
  • submit to reddit

Today I was looking at the ALTER TABLE performance with fast index creation and without it with different buffer pool sizes. Results are pretty interesting. I used modified Sysbench table for these tests because original table as initially created only has index on column K which initially contains only zeros, which means index is very small and also very fast to build by insertion as insertions happen in the “end” of index tree. I’ve updated column to have bunch of long random strings update sbtest set c=concat(sha1(rand()),’-',sha1(rand()),’-',sha1(rand()),’-',sha1(rand()),’-',sha1(rand())); and added key on column C: alter table sbtest add key c(c); The box I’m using for test is rather old box with 8GB of RAM and 4*SATA disks in RAID10. I used 10mil row table which would look as following in terms of data and index size:

mysql> show table status like "sbtest" \G
*************************** 1. row ***************************
           Name: sbtest
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 10000060
 Avg_row_length: 224
    Data_length: 2247098368
Max_data_length: 0
   Index_length: 1460322304
      Data_free: 7340032
 Auto_increment: 10000001
    Create_time: 2012-06-27 13:04:56
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

So we have about 3.7GB total size out of which 1.4GB is the Index.

First Lets look at results of Adding and Dropping column for 128M buffer pool, sized to illustrate the case of index being significantly larger than amount of memory.

Blue Columns in this case is the default configuration, second is with expand_fast_index_creation=1 which builds table with primary key only when creating a table and when builds all indexes by sort. The speedup we can observe is over 20 times. Note this benefit only applies to the case when you’re adding/dropping/changing columns as if you just add index it will be done by sort even without this additional Percona Server feature.

Lets now see with 6GB buffer pool which is large enough to fit complete table with index (but is a bit too small to keep 2 copies) In this case we’ll see the following picture:

Comparing these two graphs we can see few different observations:

Buffer Pool Size Matters There is 10x difference between the case when index fits in memory to when it does not. The difference can be even larger in the real world. Working with indexing, such as inserting data in random order can become very slow when index does not fit to memory any more.

Creating Index by sort is great Creating index by sort can provide over 20x performance improvement to the speed of ALTER TABLE, as this is not only step which ALTER TABLE does the improvement to the index creation speed itself has to be even larger. It is also offers better performance for in memory case, even though in this case the gain is just about 2x for this workload. Still substantial.

Sorting needs OS Cache It is not clearly seen from the graphs but with index built by sort enabled the performance of ALTER TABLE with 128MB was better than with 6GB. It looks like this does not make sense but in fact it does if you think about building index stage. On this box with 8GB of memory allocating 6GB to buffer pool made very little memory available for OS cache, so index build process required more IO than if only 128MB were used for buffer pool. The difference was rather small – less than 10% though, which is because disks are not that bad with sequential IO, which is what file merge process uses. This is important illustration as in many cases when people see performance reduction with large buffer pool they forget what bottleneck might be somewhere else where memory resources might be just needed more.

Take Away: If you’re having large tables and need to run ALTER TABLE which rebuilds the table or OPTIMIZE TABLE do not forget to enable expand_fast_index_creation it can make a whole process a lot faster.

P.S The tests were done with Percona Server 5.5.24-26.0

 

 

Published at DZone with permission of Peter Zaitsev, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)