Reimagining the way work is done through big data, analytics, and event processing, Chris is the cofounder of Successful Workplace. He believes there’s no end to what we can change and improve. Chris is a marketing executive and flew for the US Navy before finding a home in technology 17 years ago. An avid outdoorsman, Chris is also passionate about technology and innovation and speaks frequently about creating great business outcomes at industry events. As well as being a contributor for The TIBCO Blog, Chris contributes to the Harvard Business Review, Venture Beat, Forbes, and the PEX Network. Christopher is a DZone MVB and is not an employee of DZone and has posted 305 posts at DZone.

Big Data Piled Up So High It Reaches the Cloud

“Big data is any data that when you pile it up reaches into the Cloud.” This was the opening statement from Jack Norris, CMO of MapR, at the Cloud Connect Conference in Santa Clara today. He was paraphrasing the analysts, but it was the ideal framing for the Big Data Track at a Cloud conference.

A new paradigm

According to Norris, big data and Cloud are a paradigm shift and an architectural change that involves putting data and computing power together as a massive processing unit.

Norris drilled into this by describing the challenge facing today’s enterprise: as data grows, keeping data and computing separate makes processing take longer and longer. More and more, organizations need to:

  • Process more quickly – Things are moving faster every day and competitive businesses need to keep up
  • Combine multiple data sources – Organizations need to blend data to gain insights. That data can’t be stored in one place and can even be outside the organization (such as in the cloud)
  • Expand analysis – There are limits on traditional systems and organizations need to go beyond the traditional SQL-based analysis of the past

These needs led organizations like Google and others to grow their own tools, which now form a big data ecosystem. Norris used a picture of Apache projects to describe this ecosystem.

Hadoop in the Cloud

The most interesting part of the big data story for this setting was how Hadoop and big data are used in the Cloud. For many companies going in this direction, Hadoop in the Cloud is a very flexible infrastructure. While we often hear about performance questions with Cloud, Norris brought up the current MinuteSort record of 1.5 TB, set by Google working with MapR, as proof that Cloud performance is less and less of a question.

It takes more data

Norris made some of his strongest points with his contention that greater volumes of data are now filling in the gaps where we used to use complex algorithms. Many things, like human behavior, have been deemed too complex to understand completely. Norris pointed to use cases like fraud detection, flu trends, and the Netflix recommendation engine to show that even the most complex behavior becomes predictable when enough data comes to the table.
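To make the "more data, simpler algorithm" idea concrete, here is a minimal Python sketch. It is not from Norris's talk: the page names, the simulated user behavior, and every function name are hypothetical, chosen only for illustration. The "algorithm" is nothing more than counting which page most often follows another, and the sketch measures how its accuracy changes as the amount of training data grows.

```python
import random
from collections import Counter, defaultdict

# Hypothetical site pages and a hidden behavior pattern (assumptions for this sketch).
PAGES = ["home", "search", "product", "cart", "help"]
FAVORITE_NEXT = {
    "home": "search",
    "search": "product",
    "product": "cart",
    "cart": "home",
    "help": "home",
}


def simulate_clicks(n, rng):
    """Generate n (current_page, next_page) pairs from the hidden behavior."""
    pairs = []
    for _ in range(n):
        current = rng.choice(PAGES)
        if rng.random() < 0.6:
            nxt = FAVORITE_NEXT[current]  # the usual next step
        else:
            nxt = rng.choice(PAGES)       # noise: any page at all
        pairs.append((current, nxt))
    return pairs


def train(pairs):
    """Simple algorithm: remember the most frequent successor of each page."""
    counts = defaultdict(Counter)
    for current, nxt in pairs:
        counts[current][nxt] += 1
    return {page: c.most_common(1)[0][0] for page, c in counts.items()}


def accuracy(model, pairs):
    """Fraction of pairs whose next page the model predicts correctly."""
    hits = sum(1 for current, nxt in pairs if model.get(current) == nxt)
    return hits / len(pairs)


def accuracy_by_training_size(sizes, seed=0):
    """Train on increasing amounts of data and score on one shared test set."""
    rng = random.Random(seed)
    test = simulate_clicks(5000, rng)
    return {n: accuracy(train(simulate_clicks(n, rng)), test) for n in sizes}


if __name__ == "__main__":
    for n, acc in accuracy_by_training_size([5, 50, 500, 50000]).items():
        print(n, round(acc, 3))
```

With only a handful of observations the counts are dominated by noise. With tens of thousands, the same counting approach reliably recovers the dominant pattern, which is the point Norris was making about data volume standing in for algorithmic complexity.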

Simple algorithms

If this concept is true, then the ability to add data in cost-effective ways becomes one of the most important enterprise strategies available. It’s easy to see where the Cloud plays a critical role in providing a place for that data to be sought, reached, and incorporated efficiently.

Norris provided the following examples of where Hadoop is being used in the Cloud:

  • Targeted advertising/clickstream analysis
  • Security for anti-virus, fraud detection, and image recognition
  • Pattern matching/recommendations
  • Data warehousing/BI
  • Bio-informatics like genome analysis
  • Financial simulation like Monte Carlo
  • File processing like image resizing and video encoding
  • Web indexing
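
Several of the uses above, clickstream analysis in particular, fit the map, shuffle, and reduce pattern that Hadoop's MapReduce model is built around. The sketch below imitates that pattern in plain Python on a single machine. It does not use any Hadoop API, and the tab-separated log format and function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_clicks(log_line):
    """Map step: turn one 'user<TAB>page' log line into (page, 1) pairs."""
    _user, page = log_line.split("\t")
    return [(page, 1)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(page, values):
    """Reduce step: collapse the values for one page into a total."""
    return page, sum(values)


def count_page_views(log_lines):
    """Run map, shuffle, and reduce in sequence over an iterable of log lines."""
    mapped = chain.from_iterable(map_clicks(line) for line in log_lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(page, values) for page, values in grouped.items())


if __name__ == "__main__":
    sample_log = ["u1\t/home", "u2\t/home", "u1\t/cart"]
    print(count_page_views(sample_log))
```

On a real cluster the map and reduce steps would run in parallel on the nodes that hold the data, which is the "data and computing power together" idea described earlier. Here everything runs in one process to keep the shape of the computation visible.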

Big Data lessons

This was a very comprehensive talk and drew a sizable crowd for the last day of the event. Norris closed with “Big Data Lessons from the Cloud”:

  • Big Data requires a new approach
  • Hadoop is a paradigm shift
  • Easy to get started with Hadoop in the Cloud
  • Scale clusters up and down in the Cloud
  • Only pay for what you use
  • Expand data for analysis
  • Combine data sources
  • New applications from new data sources
  • New analytics
  • Wide variety of applications appropriate for Hadoop

Published at DZone with permission of Christopher Taylor, author and DZone MVB. (source)