Reimagining the way work is done through big data, analytics, and event processing, Chris is the cofounder of Successful Workplace. He believes there’s no end to what we can change and improve. Chris is a marketing executive and flew for the US Navy before finding a home in technology 17 years ago. An avid outdoorsman, Chris is also passionate about technology and innovation and speaks frequently about creating great business outcomes at industry events. As well as being a contributor for The TIBCO Blog, Chris contributes to the Harvard Business Review, Venture Beat, Forbes, and the PEX Network. Christopher is a DZone MVB and is not an employee of DZone and has posted 249 posts at DZone.

Big Data and Analytics as Both Hero and Villain

06.11.2013
As the NSA PRISM debacle continues to unfold and spread across continents, it’s probably good to stop and think about the technology and philosophy behind it all. This is big data and analytics in its most potent and controversial form, and it’s certainly not the last time we’ll see it hit the headlines.

The NSA built Accumulo, a database whose design is based on Google’s Bigtable. Accumulo is written in Java; the NSA contributed it to the Apache Software Foundation, and in 2012 it was promoted from incubation to a top-level project. Right now the Agency is harvesting petabytes of data in Accumulo, a staggering amount that grows daily. But the cleverest part of all of this is in the analytics: Accumulo extends the Bigtable concept to analyze trillions of data points, creating intelligence that can detect the connections between those points and the strength of those connections.

If you thought you had it sussed with something like LinkedIn’s InMaps (my network is too large to generate one, funnily enough) or Facebook’s Social Graph, think again: through Accumulo the NSA can find out who you are, where you are, who you know and why you know them. It’s graph analysis on steroids, and graph analysis is a hot topic right now for making sense of large datasets, primarily by understanding how tightly different data points are related or how similar they are to each other.
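To make "how tightly related" and "how similar" a little more concrete, here is a minimal Python sketch of the idea on a toy graph. It is not how Accumulo or the NSA does it; the names, the interaction records, and the choice of Jaccard similarity over shared contacts are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical contact records: each tuple is one observed interaction.
interactions = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("bob", "dave"),
]

def edge_strengths(records):
    """Count how often each unordered pair interacts (connection strength)."""
    return Counter(frozenset(pair) for pair in records)

def neighbours(records):
    """Map each person to the set of people they have interacted with."""
    graph = {}
    for a, b in records:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def jaccard(graph, a, b):
    """Similarity of two people: shared contacts / all contacts of either."""
    na, nb = graph[a] - {b}, graph[b] - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

strengths = edge_strengths(interactions)
graph = neighbours(interactions)

# Strength of a direct connection, and similarity of two people
# who have never interacted directly but share the same contacts.
print(strengths[frozenset(("alice", "bob"))])
print(jaccard(graph, "alice", "dave"))
```

In this toy data alice and dave never talk to each other, yet they score as highly similar because they know exactly the same people. Scale that intuition up to trillions of data points and you get a sense of why graph analysis is so potent.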

To put this in perspective in terms of the amount of data needed to make this happen: a few years back Yahoo was operating a roughly 42,000-node Hadoop environment holding hundreds of petabytes, and Facebook users are generating more than 500 terabytes of data every day. So the Agency has some infrastructure in place. Just what is that infrastructure built on? I’m sure there are vendors out there keeping a very tight lip. But rest assured, if they are that advanced they’ll have already considered or even built a cloud infrastructure, potentially hybrid, to stay ahead.

But behind all the noise and crowd baiting from the Washington Post, the exposure raises more questions about the power of big data and analytics and just how large and powerful it gets. Think about all the hype generated about consumer privacy and about enterprises collating and analyzing information for a more targeted and personal experience, customer segmentation and demographics, and location-based and real-time marketing. What the NSA exposure has taught us is that there really is no privacy in the 21st century and we should just get used to it. Our data is anonymized unless it’s being used specifically for our purpose and benefit, but the fact is we are happily generating it for them to use in any case.

But Big Data is no longer creepy. Sorry, but it’s not. You must live in painful ignorance if you think that every nuance of a digital interaction hasn’t been collected by someone and analyzed. What’s clear is that analytics and big data seem to be labelled as only for marketers to hound us with or for banks to sell us more debt-laden products. We forget, for example, about the medical and scientific boundaries being broken with the help of data analytics and human-generated information.

At some point there will be consumer-based tools affordable enough for people to make sense of the data they generate themselves, and why not? It’s all part of the equation. Personal graph analysis will become a reality just as much as its parent will be wielded by enterprises.

So, you see, we have heroes and villains even in data analytics, but it’s all a matter of perspective. The NSA is deemed evil for breaching our liberties and analyzing data without our consent to understand terrorist activities, and medical science is a force for good for helping us cure diseases using data sourced from all manner of places.

And to paraphrase the Man of Steel trailer -

“The analysts believed that if the world found out what big data really was, they’d reject me. They were convinced that the world wasn’t ready. What do you think?”

Published at DZone with permission of Christopher Taylor, author and DZone MVB. (source)