Eric lives in Chapel Hill, NC. By night, he writes and edits science fiction. On weekends, he spends too much time making plumbers hop on things. Eric has published 249 posts at DZone.

Etsy Engineer: "Whom the Gods Would Destroy, They First Give Real-Time Analytics"

01.11.2013
Prognosticating analysts suggest that 2013 will be the year of real-time analytics. But Dan McKinley, Principal Engineer at Etsy.com, asks that we all hold on a sec. "Whom the gods would destroy," he writes, "they first give real-time analytics."


...There are many ways to screw yourself with real-time analytics. I will endeavor to list a few.


The first and most fundamental way is to disregard statistical significance testing entirely. This is a rookie mistake, but it's one that's made all of the time. Let's say you're testing a text change for a link on your website. Being an impatient person, you decide to do this over the course of an hour. You observe that 20 people in bucket A clicked, but 30 in bucket B clicked. Satisfied, and eager to move on, you choose bucket B. There are probably thousands of people doing this right now, and they're getting away with it.

This is a mistake because there's no measurement of how likely it is that the observation (20 clicks vs. 30 clicks) was due to chance. Suppose that we weren't measuring text on hyperlinks, but instead we were measuring two quarters to see if there was any difference between the two when flipped. As we flip, we could see a large gap between the number of heads received with either quarter. But since we're talking about quarters, it's more natural to suspect that that difference might be due to chance. Significance testing lets us ascertain how likely it is that this is the case.
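(An illustration added here, not part of McKinley's post.) One common way to put a number on "how likely is it that this was chance" is a two-proportion z-test. The sketch below applies it to the 20-versus-30 example. The excerpt does not say how many visitors each bucket received, so the figure of 1,000 per bucket is purely an assumption, as is the helper name `two_proportion_p_value`.

```python
import math

def two_proportion_p_value(clicks_a, n_a, clicks_b, n_b):
    """Two-sided p-value for the difference between two click-through rates.

    Uses the pooled two-proportion z-test with a normal approximation.
    Assumes at least one click and at least one non-click overall.
    """
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (clicks_b / n_b - clicks_a / n_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical traffic: 1,000 visitors per bucket (not stated in the article).
p = two_proportion_p_value(20, 1000, 30, 1000)
print(f"p-value: {p:.3f}")
```

Under that assumed traffic, the p-value comes out at roughly 0.15, well above the conventional 0.05 threshold, so the apparent win for bucket B could quite plausibly be noise.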

A subtler error is to do significance testing, but to halt the experiment as soon as significance is measured. This is always a bad idea, and the problem is exacerbated by trying to make decisions far too quickly. Funny business with timeframes can coerce most A/B tests into statistical significance.
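(Another illustration, not from McKinley's post.) The cost of stopping at the first significant reading can be seen in a small simulation of A/A tests, where both buckets share the same true click rate and any "winner" is a false positive. Every parameter below (400 experiments, 20 looks, 50 visitors per bucket per look, a 10% click rate) is an arbitrary assumption chosen to keep the sketch fast, and the function names are hypothetical.

```python
import math
import random

def p_value(clicks_a, clicks_b, n):
    """Two-sided pooled z-test p-value for two buckets of n visitors each."""
    pooled = (clicks_a + clicks_b) / (2 * n)
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if std_err == 0:
        return 1.0  # no variation observed, so no evidence of a difference
    z = (clicks_b - clicks_a) / n / std_err
    return math.erfc(abs(z) / math.sqrt(2))

def simulate(experiments=400, looks=20, batch=50, rate=0.1, alpha=0.05, seed=1):
    """Return (false-positive rate with peeking, false-positive rate at the end)."""
    rng = random.Random(seed)
    flagged_by_peeking = flagged_at_end = 0
    for _ in range(experiments):
        a = b = 0
        ever_significant = False
        for look in range(1, looks + 1):
            # Both buckets draw from the same true rate: there is no real effect.
            a += sum(rng.random() < rate for _ in range(batch))
            b += sum(rng.random() < rate for _ in range(batch))
            p = p_value(a, b, look * batch)
            if p < alpha:
                ever_significant = True  # a peeker would have stopped here
        flagged_by_peeking += ever_significant
        flagged_at_end += p < alpha  # single test on the full sample
    return flagged_by_peeking / experiments, flagged_at_end / experiments

peeking_rate, final_rate = simulate()
print(f"false positives when peeking: {peeking_rate:.1%}")
print(f"false positives at the planned end: {final_rate:.1%}")
```

Testing once at a planned sample size should flag about 5% of these no-difference experiments, while stopping at the first significant look tends to flag several times that share.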


McKinley's post is not a jeremiad against real-time analytics tools, but rather an appeal to use the right tools mindfully, without regard to buzz cycles. The full article is well worth a read.

Published at DZone with permission of its author, Eric Gregory.
