This post comes from Vadim Tkachenko at the MySQL Performance Blog.
After compiling Percona Server with TokuDB, of course I wanted to compare InnoDB performance vs TokuDB.
I have a particular workload I’m interested in testing – it is an insert-intensive workload (which is TokuDB’s strong suit) with some roll-up aggregation, which should produce updates in-place (I will use INSERT .. ON DUPLICATE KEY UPDATE statements for that), so it will produce all good amount of reads.
few words about the hardware: I am going to use new the Dell PowerEdge
R420 with two Intel(R) Xeon(R) CPU E5-2450 0 @ 2.10GHz, 48GB of RAM and
SATA SSD: Kingston HyperX 3K 240 GB.
Workload: I will use two different schemas. The first schema is from sysbench, and the table looks like:
CREATE TABLE sbtest$I ( id BIGINT UNSIGNED NOT NULL, k INTEGER UNSIGNED DEFAULT '0' NOT NULL, c CHAR(120) DEFAULT '' NOT NULL, pad CHAR(60) DEFAULT '' NOT NULL, PRIMARY KEY (id) ) CREATE INDEX k on sbtest$I(k)
and tables sbtest$I_r10, sbtest$I_r100, sbtest$I_r1000, with roll-up sum for 10, 100, 1000 records in the main table.
and transactions for this workload are:
$ID=monotonically increasing ID $K=rand(0,10000) // distributed by pareto distribution $C, $PAD = random_string() BEGIN INSERT INTO sbtest (id, k, c, pad) VALUES ($ID, $K, $C, $PAD); INSERT INTO sbtest_r10 (id, k) VALUES ($ID/10, $K) ON DUPLICATE KEY UPDATE k=VALUES(k)+k; INSERT INTO sbtest_r100 (id, k) VALUES ($ID/100, $K) ON DUPLICATE KEY UPDATE k=VALUES(k)+k; INSERT INTO sbtest_r1000 (id, k) VALUES ($ID/1000, $K) ON DUPLICATE KEY UPDATE k=VALUES(k)+k; COMMIT;
So this workload produces SEQUENTIAL inserts into Primary Key, that
this is quite suitable for InnoDB, and in it we have random inserts into
SECONDARY KEYS (k),
which is not so good for InnoDB.
So let’s see what results we have. The results are in TPS (more is better) and we start with empty tables.
Now, before looking at the graph, please do not jump to conclusions, as the graph is MISLEADING.
we see that InnoDB performance steadily declines from 24000 tps to
18000 tps, but InnoDB can’t make 5h run. After 3h the disk is full, and
InnoDB data size is about 210GB with 234.238.440 inserted records.
While TokuDB averages around 14000 tps mark with some periodical drops into 10000 tps area.
TokuDB datasize after 5h of run is about 50GB with 276.934.863 records.
So why do I say that the graph is misleading?
Obviously we can say that InnoDB is faster, but you should look into the steady decline of InnoDB throughput. Eventually it will drop to the level 14000 tps and below. I do not have enough space on this SSD to run this experiment that long. So there we see the strong side of TokuDB: it has more than 4x data compression on this dataset.
We can easily fill TokuDB tables with 1bln of rows on this SSD, and projected InnoDB performance on this size will be the same or worse, but will require 1TB in size.
Now to see this point of intersection, let’s review different workload (which actually is closer to what I need).
Tables looks like:
CREATE TABLE `sbtest1` ( `hid` int(10) unsigned NOT NULL DEFAULT '0', `mid` int(10) unsigned NOT NULL DEFAULT '0', `id` bigint(20) unsigned NOT NULL, `k` int(10) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`hid`,`mid`,`id`) )
and transactions are:
$HID=rand(0,10000) // distributed by uniform distribution $MID=rand(0,10000) // distributed by uniform distribution $ID=monotonically non-decreasing ID $K=rand(0,10000) // distributed by pareto distribution BEGIN INSERT INTO sbtest (hid, mid, id, k) VALUES ($HID, $MID, $ID, $K); INSERT INTO sbtest_r10 (hid, mid, id, k) VALUES ($HID, $MID, $ID/10, $K) ON DUPLICATE KEY UPDATE k=VALUES(k)+k; INSERT INTO sbtest_r100 (hid, mid, id, k) VALUES ($HID, $MID, $ID/100, $K) ON DUPLICATE KEY UPDATE k=VALUES(k)+k; INSERT INTO sbtest_r1000 (hid, mid, id, k) VALUES ($HID, $MID, $ID/1000, $K) ON DUPLICATE KEY UPDATE k=VALUES(k)+k; COMMIT;
That is, our PRIMARY KEY is not sequential anymore, which is bad for InnoDB, but this is what I need for my setup (I still can have synthetic auto_inc PK, but in that case I still will need SECONDARY KEY (hid,mid,id) ).
Also please note the transaction produces 4 INSERTs and workload is very write intensive.
So what are results in this case:
InnoDB gradually declines as data growth (which is expected) and by the end of 5 hours averages at 2700 tps.
With TokuDB we also see a drop, and by end of 5 hours the average throughput is 7800 tps.
Something to take into account: TokuDB results are not quite stable, that is why I also show 5-minute moving averages to TokuDB.
So TokuDB shows about 2.8x better throughput, and on data size:
- InnoDB table: 58GB and 244.980.192 records
- TokuDB table: 15GB and 232.927.460 records
TokuDB looks better in this workload, however the sparse throughput is
worrisome to me. Let’s zoom in to 10 min intervals and see throughput:
We can see periodical drops, which I believe are related to 60-sec checkpoint interval, as TokuDB does time-based checkpoints.
These drops are quite concerning, and it might be a problem for some users.
Now, I understand that my PRIMARY KEY (
id is sequential, and
mid is low selectivity is not good for fast inserts, but it is suitable for range selects by
id. However it will interesting how both InnoDB and TokuDB performs if PK is (
mid). This also will affect select performance, so we will need to measure that also.
And, if you want to repeat this benchmark, the sysbench code is on Launchpad
command line to run:
sysbench --test=insert_roll.lua --oltp-table-size=10000 --mysql-user=root --oltp-tables-count=32 --mysql_table_engine=tokudb --oltp_auto_inc=on --max-time=18000 --report-interval=10 --max-requests=0 --num-threads=32 --rand-type=pareto run
and InnoDB options are:
[mysqld_safe] numa-interleave malloc-lib=/usr/local/Percona-Server-5.5.30-rel30.2-500.Linux.x86_64/lib/mysql/libjemalloc.so [mysqld] gdb datadir=/mnt/data/mysql #for SSD innodb_flush_neighbor_pages = none innodb_adaptive_flushing_method = keep_average innodb_file_per_table = true innodb_data_file_path = ibdata1:100M:autoextend innodb_flush_method = O_DIRECT innodb_log_buffer_size = 256M innodb_flush_log_at_trx_commit = 1 innodb_buffer_pool_size = 36G innodb_log_file_size = 4G innodb_log_files_in_group = 2 innodb_log_block_size=4096 #####plugin options innodb_read_io_threads = 16 innodb_write_io_threads = 4 innodb_io_capacity = 4000 innodb_use_native_aio=0 #not innodb options (fixed) port = 3306 back_log = 50 max_connections = 2000 max_prepared_stmt_count=500000 max_connect_errors = 10 table_open_cache = 2048 max_allowed_packet = 16M binlog_cache_size = 16M max_heap_table_size = 64M sort_buffer_size = 4M join_buffer_size = 4M thread_cache_size = 1000 query_cache_size = 0 query_cache_type = 0 ft_min_word_len = 4 #default_table_type = InnoDB thread_stack = 192K tmp_table_size = 64M server-id = 10 #*** MyISAM Specific options key_buffer_size = 8M read_buffer_size = 1M read_rnd_buffer_size = 4M bulk_insert_buffer_size = 8M myisam_sort_buffer_size = 8M myisam_max_sort_file_size = 10G #myisam_max_extra_sort_file_size = 10G myisam_repair_threads = 1 myisam_recover socket=/var/lib/mysql/mysql.sock user=root skip-grant-tables
TokuDB-related options are all defaults, as I understand from documentation TokuDB comes with good settings out-of-box, but I am ready to tune something if there are suggestions.