How to Test & Benchmark CDNs?
I recently came across a very interesting post about CDN testing by Jonathan Klein, someone I respect tremendously in the Web Performance world. During his tests with Webpagetest he was unable to “see” the value provided by the CDN on his site.
I wanted to follow up on his blog post and share some of the knowledge we have acquired, both from our experience at DoubleClick (building our own CDN in 1997, then using various commercial CDNs like Speedera and Akamai, then using a combination of our internal CDN and Akamai until Google’s acquisition and the switch to the Google stack) and now at Catchpoint, where we constantly monitor CDNs.
The CDN Promise
The basic promise of the traditional CDN product: Better / Faster Web Performance for your End-Users by serving static assets from the Edge! (Note: CDN companies offer additional products like Site Acceleration, which go beyond serving static files.)
In theory they can deliver on this promise because:
- They have bigger network pipes than you do.
- They have a lot more servers than you can ever ask for.
- They have more network peering than you.
- They know more about network peering, DNS, servers…
- They have more engineers dedicated to optimization of various stacks to deliver things faster…
- They can deliver content close to the end-user because they have servers in major cities.
- They have teams monitoring this stuff 24/7.
- They can handle huge spikes.
In other words they leverage economies of scale to provide you with a service that is faster, better, and more cost-effective than what it would be if you relied on your own infrastructure.
So how do you test and benchmark CDNs to make sure they deliver on their promise? This is less about the toolset (some will say that Backbone monitoring is not accurate) and more about the methodology. A benchmark is a benchmark: if there are issues, adjust the lens and zoom in, and you will see the differences and problems.
The testing period is very important: test for at least 1 to 2 weeks, and test as frequently as possible! (I was recently involved in a multi-CDN benchmark where a company used more than 1 million data points from Catchpoint alone, and other tools were used as well.)
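Before you start your tests, please make sure the DNS TTL is the same for all hostnames being benchmarked. For example, you cannot have a TTL of 60 seconds for cdn.test.com, a TTL of 3,600 seconds for origin.test.com and a TTL of 300 seconds for cdn2.test.com. With different TTLs, the hostnames hit resolver caches at different rates, which skews the DNS times you are comparing.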
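The TTL check can be automated once the TTL values have been collected. The sketch below is only an illustration: it assumes the TTLs were gathered separately (ideally from the authoritative name servers, since a caching resolver reports the remaining TTL rather than the configured one), and the function names are hypothetical. The hostnames and values are the ones from the example above.

```python
from collections import defaultdict


def ttls_are_consistent(ttls):
    """Return True when every hostname has the same TTL (in seconds)."""
    return len(set(ttls.values())) <= 1


def group_by_ttl(ttls):
    """Group hostnames by TTL so the odd ones out are easy to spot."""
    groups = defaultdict(list)
    for hostname, ttl in ttls.items():
        groups[ttl].append(hostname)
    return dict(groups)


# Values assumed to have been collected by hand before the benchmark.
observed = {
    "cdn.test.com": 60,
    "origin.test.com": 3600,
    "cdn2.test.com": 300,
}

if not ttls_are_consistent(observed):
    for ttl, hostnames in sorted(group_by_ttl(observed).items()):
        print(f"TTL {ttl}s: {', '.join(hostnames)}")
```

With the example values, all three hostnames end up in different groups, so the benchmark should not start until the TTLs are aligned.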
Please make sure you are comparing the same file size.
The testing methodology I have used and have seen others use relies on a 2-phase approach:
Phase 1: Static Content Performance Testing
Use an external performance tool (Synthetic Backbone or Last Mile, RUM, WPT…) to load pairs of files of various sizes (5 KB, 10 KB, 50 KB, 100 KB, 500 KB…): each file is loaded from both the CDN URL and the Origin URL.
Phase 1 Key Takeaways:
- You want to make sure that the CDN is faster than your Origin.
- You want the CDN to be good at delivering a 5 KB file but also a larger payload.
- One of the goals of CDNs is to minimize the geographical impact by reducing the latency to fetch all the bits necessary to make your page load faster. So make sure you understand the performance impact by geography, comparing either before vs. after or Origin URL vs. CDN URL.
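For readers who want to try Phase 1 by hand before setting up a monitoring tool, here is a minimal sketch of the pairing idea. It is not a substitute for a real monitoring service: it measures from a single location, while the point above is to compare across geographies. The function names and the shape of the `samples` dictionary are assumptions made for this example.

```python
import time
import urllib.request
from statistics import median


def fetch_seconds(url, timeout=10.0):
    """Download one URL in full and return the elapsed time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def compare(samples):
    """Compare median load times per file size.

    samples maps a file-size label to {"cdn": [seconds...], "origin": [seconds...]}.
    """
    report = {}
    for size, by_source in samples.items():
        cdn = median(by_source["cdn"])
        origin = median(by_source["origin"])
        report[size] = {"cdn": cdn, "origin": origin, "cdn_faster": cdn < origin}
    return report
```

To use it, call `fetch_seconds` repeatedly over the test period for each CDN/Origin pair, append the results to the matching list, and run `compare` on the collected samples. The median is used here instead of the average so that a few slow outliers do not hide the typical result.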
Phase 2: Web Page Performance Test
Create a custom HTML page based on your existing web pages. Include CSS files, JS files and images, something that matches your setup, but NO third parties (ads, widgets, tracking, etc.): your CDN will make your site faster but cannot make the ads load any faster. As in the previous phase, Page A will hit the CDN and Page B will hit your Origin. Monitor and measure both pages using the same tools as in Phase 1.
Phase 2 Key Takeaways:
- The page with the CDN should be faster; look at render start time, document complete and total web page response.
- Geographical performance improvements.
Metrics & Data to look for
- DNS time: Some CDNs have a more complex DNS setup than others, which can slow things down. I have seen cases where the time gained in Wait time was diluted by slower DNS response time.
Keep in mind that DNS performance from the last mile (the end user) is quite different from tests run on the backbone. End users rely on the DNS resolvers of their ISPs or on public resolvers. Backbone monitoring relies on resolvers that are very close to the machine running the tests.
- Connect time: This is to make sure your CDN has great network connectivity, low latency and no packet loss. Additionally, you want to make sure it does not get slower during peak hours and that they are routing you through the right network peering. For example, if an end-user is on Verizon FIOS, there is no reason to go through 5 different backbone networks because that CDN does not have direct peering with Verizon.
- Wait time: This metric is important when comparing CDNs; it helps you see whether your content is hot on the edge or whether the edge needs to fetch it from the Origin servers. The Wait time is also an indicator of potential capacity issues or bad configuration at the CDN level or on the Origin server (for example, setting the cache expiration in the past). A CDN will deliver different performance for an asset that is hot (requested 100,000+ times in the past hour) vs. one requested a few times an hour. A CDN is a shared environment where more popular items are faster to deliver than others: if something is in memory it’s fast; if it has to hit a spinning disk it’s a different story. Thus I would personally consider Solid State Drives as a criterion in my CDN selection.
- Throughput: Make sure that the throughput of the CDN test is higher than the origin no matter what the file size is!
- Traceroutes! You need to run traceroutes from where you are monitoring to make sure you are not mapped to the wrong place. Many CDNs use commercial geo-mapping databases, and the data for the IP could be wrong. From my Time Warner home connection in Los Angeles, some CDNs sent requests to the UK (at times).
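To make the metrics above concrete, the following sketch splits one plain-HTTP request into DNS, Connect, Wait and receive phases using only the standard library. It is an illustration, not how any monitoring product measures these values. It only handles `http://` on port 80, the DNS time depends on whatever the local resolver has cached, and the "throughput" definition (all bytes received, headers included, divided by the time from connect to last byte) is an assumption made for this example.

```python
import socket
import time


def throughput_bytes_per_second(total_bytes, seconds):
    """Return bytes per second, or None when the duration is not positive."""
    if seconds <= 0:
        return None
    return total_bytes / seconds


def time_phases(host, path="/", port=80, timeout=10.0):
    """Time the phases of a single plain-HTTP GET request."""
    started = time.perf_counter()
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    resolved = time.perf_counter()  # end of DNS lookup

    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(address)
        connected = time.perf_counter()  # end of TCP connect

        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunk = sock.recv(65536)
        first_byte = time.perf_counter()  # end of Wait time

        total_bytes = 0  # counts response headers as well as the body
        while chunk:
            total_bytes += len(chunk)
            chunk = sock.recv(65536)
        finished = time.perf_counter()  # last byte received

    return {
        "dns": resolved - started,
        "connect": connected - resolved,
        "wait": first_byte - connected,
        "receive": finished - first_byte,
        "bytes": total_bytes,
        "throughput": throughput_bytes_per_second(total_bytes, finished - connected),
    }
```

Running `time_phases` against the CDN hostname and the Origin hostname for the same file gives two dictionaries that can be compared phase by phase, which shows where the CDN wins or loses time.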
Other things to keep an eye on
- Most CDNs will give you access to a control panel, so make sure you monitor your Cache Hit / Miss ratio. How often do they have to come back to the Origin? A good CDN architecture should not come back to the Origin often. We disqualified various CDNs at DoubleClick because they would not agree to our miss ratio SLAs. You also have to ask questions: What happens when an edge server does not have the content? How long does it take to purge a file? How long does it take to load a file to the edge? How long before a CNAME is active?
- How well does the CDN handle public Name Resolvers such as OpenDNS, Dyn and Google? These companies are carrying more and more of the DNS traffic, and this could impact certain CDNs’ geo-load balancing algorithms.
- Are the metrics from the CDN consistent? Look at DNS, Connect, Wait and Response (please do not just look at averages). Remember, great performance is more than speed; it’s reliability, or the ability to deliver a consistent experience.
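The warning about averages is easy to show with numbers. The sketch below is a small illustration with made-up sample data: it summarizes a list of response times with the mean, the median, a 95th percentile (nearest-rank method, chosen here for simplicity) and the standard deviation. The function names are hypothetical.

```python
import math
from statistics import mean, median, pstdev


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples_ms):
    """Summarize response times given in milliseconds."""
    return {
        "mean": mean(samples_ms),
        "median": median(samples_ms),
        "p95": percentile(samples_ms, 95),
        "stdev": pstdev(samples_ms),
    }


# Made-up data: 18 fast responses and 2 very slow ones.
samples = [100] * 18 + [2000] * 2
stats = summarize(samples)
print(stats)
```

With this data the median is 100 ms, but the mean is 290 ms and the 95th percentile is 2,000 ms. The average alone describes a response time that no user actually experienced, and it hides the fact that one request in ten was twenty times slower.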
So after doing all these tests, the simple questions that must be answered are:
- Did the CDN improve my performance in key markets where we lack physical server presence?
- By how much? Is it a 5% improvement, 20% or 200%?
- Will this translate into a revenue increase? Put in place a way to monitor the long-term impact of the CDN on your revenue. The reason I say long-term is that you have probably lost those users due to slow performance, so you have to give them time to come back! (Are users in California placing more orders since a CDN was introduced and reduced response time by 2 seconds in that market?)
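When answering "by how much?", it helps to state which percentage is being quoted, because the same measurements give very different numbers. The sketch below shows two common definitions; which one a given report uses is an assumption you should check, and the function names are made up for this example.

```python
def time_reduction_pct(origin_seconds, cdn_seconds):
    """Percentage of the Origin's response time that the CDN removed."""
    return (origin_seconds - cdn_seconds) / origin_seconds * 100


def speedup_pct(origin_seconds, cdn_seconds):
    """How much faster the CDN is, relative to its own response time."""
    return (origin_seconds / cdn_seconds - 1) * 100


# Example: the Origin takes 3.0 seconds and the CDN takes 1.0 second.
print(round(time_reduction_pct(3.0, 1.0), 1))
print(speedup_pct(3.0, 1.0))
```

For an Origin at 3.0 seconds and a CDN at 1.0 second, the response time is reduced by about 66.7%, while the speedup is 200%. A time reduction can never exceed 100%, so a figure like "200% improvement" only makes sense under the second definition.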
Now, besides speed, CDNs do bring other benefits that are not measured in seconds:
- By offloading your static files, you free up servers, bandwidth and personnel to deal with more important things than serving static files.
- They can help you handle DDoS attacks.
- They are better prepared to handle seasonal traffic.
And once you have selected a CDN and are up and running on its platform, keep an eye on it: always monitor a file from the CDN and the same file from the Origin. Another observation I can share with you is that a CDN is not a fire & forget technology. You have to stay on top of it and make sure the configuration is up to date, that GZIP is always on… I have been on many interesting calls where, unfortunately, I heard a Sales Engineer from a CDN company say “oops, we forgot to turn that on after our last release” or “we need to tweak your map”.
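A check like "is GZIP still on?" is simple enough to script and run on a schedule. The sketch below is one possible way to do it with the Python standard library: it requests a URL while advertising gzip support and looks at the `Content-Encoding` response header. The function names are hypothetical, and the URL to test is whatever text asset (CSS, JS) you serve through the CDN.

```python
import urllib.request


def is_gzip_encoded(headers):
    """Return True when the Content-Encoding header lists gzip."""
    value = headers.get("Content-Encoding", "")
    return "gzip" in [token.strip().lower() for token in value.split(",")]


def check_gzip(url, timeout=10.0):
    """Request the URL with gzip allowed and report whether the response used it."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_gzip_encoded(response.headers)
```

Running `check_gzip` against both the CDN URL and the Origin URL for the same file makes it easy to spot a release that silently turned compression off on one side only.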
I welcome comments, suggestions and tips to help create a common knowledge base about CDN benchmarking. I am also looking forward to the RUM data that Jonathan is going to publish!
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)