
Jenny is a DZone MVB and is not an employee of DZone.

DevOps and More Better Analytics

06.28.2013

This post, shared by Jenny Yang, was written by Toufic Boubez.



At DevOpsDays Silicon Valley last week I gave a talk titled “Beyond pretty charts, Analytics for the rest of us”. The talk was inspired by an Open Space discussion that started at DevOpsDays Austin. It was a really good discussion, with a lot of interest, so I wanted to dive deeper. Foolishly, Patrick (@patrickdebois) and John (@lusis) gave me the opportunity to stand on my REST box (sorry) and wave my arms around in front of everyone. I think it went well :) .

The main premise of the talk and the Austin session was that current monitoring tools are clearly reaching the limit of their capabilities. That’s because they make some fundamental assumptions that are no longer true. Mainly, they assume that the systems being monitored are relatively static, and therefore that their behavioural limits can be defined and surrounded by static rules and thresholds. It has become clear, however, that we have moved beyond static monitoring, and interest in applying analytics and machine learning to detect anomalies in dynamic web environments is gaining steam. The hard part is working out which algorithms to use to identify and predict anomalies accurately within all the data we generate.

I spent time talking about the importance of knowing your data and its characteristics, so that you can pick the appropriate analytics techniques. For example, techniques such as the three-sigma rule or the Grubbs score (check out Kale, the most excellent tool introduced by @abestanway at Etsy) are only meaningful if your data is approximately normally distributed. I also covered the importance of context, and described some simple data transformations that can give you powerful results. One of them is the simple act of looking at a histogram of your data, because regular time series plots can only give you so much insight.
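To make the point about distributions concrete, here is a minimal sketch (mine, not taken from the talk or from Kale) that applies the three-sigma rule to two synthetic data sets: one roughly normal, and one with a long right tail of the kind latency metrics often have. The function names, sample sizes and distribution parameters are all assumptions chosen for illustration. The crude text histogram stands in for the "just look at a histogram" step.

```python
import random
import statistics


def three_sigma_outliers(values):
    """Return the values lying more than three standard deviations from the mean."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > 3 * sigma]


def text_histogram(values, bins=10, width=40):
    """Bucket values into equal-width bins; return (counts, printable lines)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, count in enumerate(counts):
        left_edge = lo + span * i / bins
        bar = "#" * round(width * count / peak)
        lines.append(f"{left_edge:10.1f} | {bar}")
    return counts, lines


# Synthetic data: parameters are illustrative assumptions, not real measurements.
rng = random.Random(42)
normal_like = [rng.gauss(100, 10) for _ in range(10_000)]
skewed = [rng.lognormvariate(4.0, 1.0) for _ in range(10_000)]  # long right tail

normal_flagged = three_sigma_outliers(normal_like)
skewed_flagged = three_sigma_outliers(skewed)

print("flagged in normal-like data:", len(normal_flagged))
print("flagged in skewed data:     ", len(skewed_flagged))

# The histogram makes the long tail obvious in a way a time series plot may not.
_, lines = text_histogram(skewed)
print("\n".join(lines))
```

On the normal-like data the rule flags only a small fraction of points, as expected. On the skewed data it flags noticeably more, all of them on the high side, even though nothing anomalous was injected: the threshold is simply a poor fit for the shape of the data.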

I ended up straying a bit from my original topic, mostly due to a really good Open Space the day before about self-healing systems. I’d given a talk about control theory at the Bay LISA meetup a couple of nights before, and the excitement at both events was too contagious to resist, so we got into a discussion about open and closed loop systems, which I had mentioned in a previous blog post. Finally, I broached the seemingly taboo topic of how much data you really need. Of course, if you know me, this led to the Nyquist-Shannon sampling theorem :) . Oh yeah, there was also mention of cats and Disaster Girl.
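As a rough illustration of why the sampling theorem matters for the "how much data" question, the sketch below polls a made-up metric that oscillates with a 60-second period. The theorem says the sampling rate must exceed twice the highest frequency present, so here the polling interval must be shorter than 30 seconds. The metric, the 50-second interval and the helper names are all assumptions for the example, not something from the talk.

```python
import math


def metric(t, period=60.0):
    """Hypothetical metric: a pure oscillation with a 60-second period."""
    return math.sin(2 * math.pi * t / period)


def sample(signal, interval, duration):
    """Poll `signal` every `interval` seconds (integers) for `duration` seconds."""
    return [(t, signal(t)) for t in range(0, duration, interval)]


def max_interval(period):
    """Nyquist bound: the polling interval must be shorter than half the period."""
    return period / 2


def alias_period(period, interval):
    """Apparent period of a sinusoid of `period` when polled every `interval`."""
    f = 1 / period
    fs = 1 / interval
    f_alias = abs(f - round(f / fs) * fs)
    return math.inf if f_alias == 0 else 1 / f_alias


fine = sample(metric, 10, 600)    # 10 s < 30 s: the oscillation is captured
coarse = sample(metric, 50, 600)  # 50 s > 30 s: the oscillation is aliased

print("interval must be below:", max_interval(60.0), "seconds")
print("apparent period at 10 s polling:", alias_period(60.0, 10))
print("apparent period at 50 s polling:", alias_period(60.0, 50))

# The coarse samples match an inverted wave with a 300-second period.
for t, value in coarse:
    slow_wave = -math.sin(2 * math.pi * t / 300.0)
    print(f"t={t:3d}  sampled={value:+.3f}  slow wave={slow_wave:+.3f}")
```

Polled every 50 seconds, the 60-second cycle is indistinguishable from a much slower 300-second one, so collecting data too sparsely can show a pattern that is not really there. Polling much faster than the bound adds volume without adding information about that cycle.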

Check it out. The video and slides are embedded above. Please feel free to send questions or comments. I might even reply! In the next post, I’ll look at some more analytics algorithms. Yes, I promise (this is reminiscent of the “free beer tomorrow” sign I see at some pubs).

Published at DZone with permission of Jenny Yang, author and DZone MVB. (source)
