
John Cook is an applied mathematician working in Houston, Texas. His career has been a blend of research, software development, consulting, and management. John is a DZone MVB and is not an employee of DZone and has posted 175 posts at DZone. You can read more from them at their website.

Outliers and Kettlebells


When you reject a data point as an outlier, you’re saying that the point is unlikely to occur again, despite the fact that you’ve already seen it. This puts you in the curious position of believing that some values you have not seen are more likely than one of the values you have in fact seen.
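To make the tension concrete, here is a minimal Python sketch. It assumes a normal model fitted to the remaining points and a hypothetical rejection threshold of 1e-3; the measurements are made up for illustration. The rule flags the suspect value because the model assigns it a tail probability of less than one in a million, even though it makes up one in ten of the values actually observed.

```python
from statistics import NormalDist, mean, stdev

def outlier_check(data, suspect, threshold=1e-3):
    """Compare a suspect point's empirical frequency with the tail
    probability that a normal model, fitted to the other points, assigns it.
    The normal model and the threshold are illustrative assumptions."""
    others = list(data)
    others.remove(suspect)  # fit the model without the suspect point
    model = NormalDist(mean(others), stdev(others))
    z = (suspect - model.mean) / model.stdev
    tail = NormalDist().cdf(-abs(z))  # one-sided tail probability P(Z >= |z|)
    empirical = data.count(suspect) / len(data)  # how often we actually saw it
    return {"z": z, "model_tail": tail, "empirical": empirical,
            "rejected": tail < threshold}

# Hypothetical sample: nine ordinary measurements and one unusual one.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 11.0]

result = outlier_check(data, 11.0)
print(f"z = {result['z']:.2f}, model tail = {result['model_tail']:.1e}, "
      f"empirical frequency = {result['empirical']}, rejected = {result['rejected']}")
```

Rejecting the point amounts to trusting the model's tiny tail probability over the 1-in-10 frequency the data actually show.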

Maybe you believe that you did not actually see the outlier. If you’re looking at a set of human heights, and one of the values is 61 feet, it is more plausible that you’ve seen a transcription error than that you’ve encountered a person an order of magnitude taller than average.

But if you believe that a data point is real yet unlikely to recur, you are placing more weight on subjective belief than on data, which may or may not be appropriate.

Here’s a personal example. This weekend I bought a kettlebell. As I was waiting in line to check out, I struck up a conversation with the man in line behind me. His right leg was in a cast and resting on a scooter. He told me that he broke his foot in two places by dropping a kettlebell on it! My immediate thought was that this was a fluke, an outlier. My second thought was that according to the only data I have, kettlebells are quite dangerous.
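The "only data I have" remark can be put in numbers. Below is a minimal Python sketch that assumes a binomial model with a Beta prior on the injury rate. This is an illustrative choice, not something from the post, and the skeptical prior's parameters are hypothetical. With one observed owner and one injury, the data-only estimate is 100%. A uniform prior pulls it to 2/3 (Laplace's rule of succession). A prior with mean 1% pulls it down to about 2%.

```python
from fractions import Fraction

def posterior_mean(injuries, users, a=1, b=1):
    """Posterior mean of an injury rate under a Beta(a, b) prior and a
    binomial likelihood (Beta-binomial conjugacy): (injuries + a) / (users + a + b)."""
    return Fraction(injuries + a, users + a + b)

observed_injuries, observed_users = 1, 1  # the one kettlebell owner met in line

mle = Fraction(observed_injuries, observed_users)            # data alone
laplace = posterior_mean(observed_injuries, observed_users)  # uniform Beta(1, 1) prior
# Hypothetical skeptical prior Beta(1, 99), whose mean is 1/100.
skeptical = posterior_mean(observed_injuries, observed_users, a=1, b=99)

print(mle, laplace, skeptical)  # prints: 1 2/3 2/101
```

The stronger the prior, the less a single dramatic observation moves the estimate. That is the trade-off between subjective belief and data described above.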

Perhaps the rational decision would have been to leave the store immediately, but I bought the kettlebell anyway. Still, the fellow behind me made an impression. I will think of him every time I work out with the kettlebell and be more careful than I would have been otherwise. Kettlebells are probably more dangerous than I’d like to believe, but so is a sedentary life.

Published at DZone with permission of John Cook, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)