Performance Zone is brought to you in partnership with:

I specialise MySQL Server performance as well as in performance of application stacks using MySQL, especially LAMP. Web sites handling millions of visitors a day dealing with terabytes of data and hundreds of servers is king of applications I love the most. Peter is a DZone MVB and is not an employee of DZone and has posted 195 posts at DZone. You can read more from them at their website. View Full User Profile

Virident vCache vs. FlashCache: Part 2

05.17.2013
| 994 views |
  • submit to reddit
This post comes from  at the MySQL Performance Blog.

This is the second part in a two-part series comparing Virident’s vCache to FlashCache. The first part was focused on usability and feature comparison; in this post, we’ll look at some sysbench test results.

Disclosure: The research and testing conducted for this post were sponsored by Virident.

First, some background information. All tests were conducted on Percona’s Cisco UCS C250 test machine, and both the vCache and FlashCache tests used the same 2.2TB Virident FlashMAX II as the cache storage device. EXT4 is the filesystem, and CentOS 6.4 the operating system, although the pre-release modules I received from Virident required the use of the CentOS 6.2 kernel, 2.6.32-220, so that was the kernel in use for all of the benchmarks on both systems. The benchmark tool used was sysbench 0.5 and the version of MySQL used was Percona Server 5.5.30-rel30.1-465. Each test was allowed to run for 7200 seconds, and the first 3600 seconds were discarded as warmup time; the remaining 3600 seconds were averaged into 10-second intervals. All tests were conducted with approximately 78GiB of data (32 tables, 10M rows each) and a 4GiB buffer pool. The cache devices were flushed to disk immediately prior to and immediately following each test run.

With that out of the way, let’s look at some numbers.

vCache vs. vCache – MySQL parameter testing

The first test was designed to look solely at vCache performance under some different sets of MySQL configuration parameters. For example, given that the front-end device is a very fast PCIe SSD, would it make more sense to configure MySQL as if it were using SSD storage or to just use an optimized HDD storage configuration? After creating a vCache device with the default configuration, I started with a baseline HDD configuration for MySQL (configuration A, listed at the bottom of this post) and then tried three additional sets of experiments. First, the baseline configuration plus:

innodb_read_io_threads = 16
innodb_write_io_threads = 16

We call this configuration B. The next one contained four SSD-specific optimizations based partially on some earlier work that I’d done with this Virident card (configuration C):

innodb_io_capacity = 30000
innodb_adaptive_flushing_method = keep_average
innodb_flush_neighbor_pages=none
innodb_max_dirty_pages_pct = 60

And then finally, a fourth test (configuration D) which combined the parameter changes from tests B and C. The graph below shows the sysbench throughput (tps) for these four configurations:
vcache_trx_params
As we can see, all of the configuration options produce numbers that, in the absence of outliers, are roughly identical, but it’s configuration C (shown in the graph as the blue line – SSD config) which shows the most consistent performance. The others all have assorted performance drops scattered throughout the graph. We see the exact same pattern when looking at transaction latency; the baseline numbers are roughly identical for all four configurations, but configuration C avoids the spikes and produces a very constant and predictable result.

vcache_response_params

vCache vs. FlashCache – the basics

Once I’d determined that configuration C appeared to produce the most optimal results, I moved on to reviewing FlashCache performance versus that of vCache, and I also included a “no cache” test run as well using the base HDD MySQL configuration for purposes of comparison. Given the apparent differences in time-based flushing in vCache and FlashCache, both cache devices were set up so that time-based flushing was disabled. Also, both devices were set up such that all IO would be cached (i.e., no special treatment of sequential writes) and with a 50% dirty page threshold. Again, for comparison purposes, I also include the numbers from the vCache test where the time-based flushing is enabled.

vcache_fcache_trx_params

As we’d expect, the HDD-only solution barely registered on the graph. With a buffer pool that’s much smaller than the working set, the no-cache approach is fairly crippled and ineffectual. FlashCache does substantially better, coming in at an average of around 600 tps, but vCache is about 3x better. The interesting item here is that vCache with time-based flushing enabled actually produces better and more consistent performance than vCache without time-based flushing, but even at its worst, the vCache test without time-based flushing still outperforms FlashCache by over 2x, on average.

Looking just at sysbench reads, vCache with time-based flushing consistently hit about 27000 per second, whereas without time-based flushing it averaged about 12500. FlashCache came in around 7500 or so. Sysbench writes came in just under 8000 for vCache + time-based flushing, around 6000 for vCache without time-based flushing, and somewhere around 2500 for FlashCache.

vcache_fcache_read_write

We can take a look at some vmstat data to see what’s actually happening on the system during all these various tests. Clockwise from the top left in the next graph, we have “no cache”, “FlashCache”, “vCache with no time-based flushing”, and “vCache with time-based flushing.” As the images demonstrate, the no-cache system is being crushed by IO wait. FlashCache and vCache both show improvements, but it’s not until we get to vCache with the time-based flushing that we see some nice, predictable, constant performance.

cpu-usage-all

So why is it the case that vCache with time-based flushing appears to outperform all the rest? My hypothesis here is that time-based flushing allows the backing store to be written to at a more constant and, potentially, submaximal, rate compared to dirty-page-threshold flushing, which kicks in at a given level and then attempts to flush as quickly as possible to bring the dirty pages back within acceptable bounds. This is, however, only a hypothesis.

vCache vs. FlashCache – dirty page threshold

Finally, we examine the impact of a couple of different dirty-page ratios on device performance, since this is the only parameter which can be reliably varied between the two in the same way. The following graph shows sysbench OLTP performance for FlashCache vs. vCache with a 10% dirty threshold versus the same metrics at a 50% dirty threshold. Time-based flushing has been disabled. In this case, both systems produce better performance when the dirty-page threshold is set to 50%, but once again, vCache at 10% outperforms FlashCache at 10%.

vcache-dirty_trx_params

The one interesting item here is that vCache actually appears to get *better* over time; I’m not entirely sure why that’s the case or at what point the performance is going to level off since these tests were all run for 2 hours anyway, but I think the overall results still speak for themselves, and even with a vCache volume where the dirty ratio is only 10%, such as might be the case where a deployment has a massive data set size in relation to both the working set and the cache device size, the numbers are encouraging.

Conclusion

Overall, the I think the graphs speak for themselves. When the working set outstrips the available buffer pool memory but still fits into the cache device, vCache shines. Compared to a deployment with no SSD cache whatsoever, FlashCache still does quite well, massively outperforming the HDD-only setup, but it doesn’t even really come close to the numbers obtained with vCache. There may be ways to adjust the FlashCache configuration to produce better or more consistent results, or results that are more inline with the numbers put up by vCache, but when we consider that overall usability was one of the evaluation points and combine that with the fact that the best vCache performance results were obtained with the default vCache configuration, I think vCache can be declared the clear winner.

Base MySQL & Benchmark Configuration

All benchmarks were conducted with the following:

sysbench ­­--num­-threads=32 ­­--test=tests/db/oltp.lua ­­--oltp_tables_count=32 \
--oltp­-table­-size=10000000 ­­--rand­-init=on ­­--report­-interval=1 ­­--rand­-type=pareto \
--forced­-shutdown=1 ­­--max­-time=7200 ­­--max­-requests=0 ­­--percentile=95 ­­\
--mysql­-user=root --mysql­-socket=/tmp/mysql.sock ­­--mysql­-table­-engine=innodb ­­\
--oltp­-read­-only=off run
The base MySQL configuration (configuration A) appears below:
#####fixed innodb options
innodb_file_format = barracuda
innodb_buffer_pool_size = 4G
innodb_file_per_table = true
innodb_data_file_path = ibdata1:100M
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 128M
innodb_flush_log_at_trx_commit = 1
innodb_log_file_size = 1G
innodb_log_files_in_group = 2
innodb_purge_threads = 1
innodb_fast_shutdown = 1
#not innodb options (fixed)
back_log = 50
wait_timeout = 120
max_connections = 5000
max_prepared_stmt_count=500000
max_connect_errors = 10
table_open_cache = 10240
max_allowed_packet = 16M
binlog_cache_size = 16M
max_heap_table_size = 64M
sort_buffer_size = 4M
join_buffer_size = 4M
thread_cache_size = 1000
query_cache_size = 0
query_cache_type = 0
ft_min_word_len = 4
thread_stack = 192K
tmp_table_size = 64M
server­id = 101
key_buffer_size = 8M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
bulk_insert_buffer_size = 8M
myisam_sort_buffer_size = 8M
myisam_max_sort_file_size = 10G
myisam_repair_threads = 1
myisam_recover
Published at DZone with permission of Peter Zaitsev, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)