The Big Data Analytics Landscape
As a technologist and evangelist working in the big data marketplace, I'm excited by the new products my company brings
to market, and by how this functionality helps to bridge the gap
to enterprise adoption. It's also a bit surreal seeing the number of
blogs and tweets on Big Data rise. There seems to be a new big data
conference every week.
It's interesting to monitor other vendors in the marketplace and how
they position their offerings. There is certainly a lot of clever
marketing going on (which I believe will in time show a lack of
substance), and some innovation too. Some vendors, and you know who you are, have simply jumped on the bandwagon. Just because you might have Hadoop and your
database + analytics within the same rack, that doesn't mean they're
integrated.
You see a lot of fervor from people discussing open
source products. Those who know me know that I'm a long-time
UNIX guru, with over 20 years of experience, from the early BSD distributions to working with
UNIX on mainframes, to even getting MINIX to work on PCs before Linux
came along. Fun times, but I was young, free, and single, and enjoyed the
technical challenge. I was also working in a research department in a
university. The argument that Hadoop is free, easy to
implement, and will one day replace data warehousing doesn't ring true
for me. It certainly has a place and provides value, but
it doesn't come at zero cost. Hortonworks and Cloudera provide
distributions that reduce the installation, configuration, and
management effort, but there are now multiple distributions, and they are
starting to go in different directions, MapR for example.
How many enterprises really want to get that involved in running and
maintaining this infrastructure? Surely they should be focused on
identifying new insights that provide business benefits or give greater
competitive advantage. IT has an important role to play, but ultimately it
will be the business users who need to leverage the platform to gain
these insights.
It is no use gaining insights if you don't take action on them.
Insight gained from big data analytics should be fed into your existing EDW
(if you have one), so that it enhances what you already have. The EDW
also provides you with a better means of operationalizing the results.
To those people who think Hive is a replacement for SQL, I say: not
yet it ain't. It doesn't provide the completeness or performance that a
pure SQL engine can provide. You don't replace 30+ years of R&D that
quickly...
To the NoSQL folks: this debate takes
on religious fervour at times. NoSQL has a role, but I don't see it
replacing the relational database overnight either.
In a previous role I managed a complex DB environment that included a
Big Data platform for a company that operated in the online gaming
marketplace, very much a 24x7 environment with limited downtime. It
was the bleeding edge at times, and growing very fast. If we had had Teradata
Aster 5.0 then, my life would have been so much easier. We had an
earlier release, but we learned a lot. We proved the value of SQL
combined with the MapReduce programming paradigm. We saw the ease of
scaling and the reliability. We delivered important insights into various
types of fraud and took action on them, which earned kudos
for the company and increased player trust, which is very important in
an online marketplace. We were also able to leverage the platform for a
novel ODS requirement, and had both executing simultaneously along with
various ad-hoc queries. I was also lucky, then and since, to meet real
visionaries like Mayank and Tasso, which gives you confidence in the
approach and the future direction.
When you think of big data analytics, it is not just about
multi-structured data or new data sources. Using SQL/MR, for example, may be the
most performant way to yield new insights from existing relational data.
Also consider what 'grey data' already exists within your
organisation; it may be easier to tap into that first, before sourcing
new data feeds. The potential business value should drive that decision
though.
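To make the idea of combining SQL with a MapReduce-style step a little more concrete, here is a minimal sketch in Python. It is not Aster SQL/MR syntax; it only illustrates the general pattern of letting SQL select and order existing relational data, then applying procedural per-key logic that is awkward in plain SQL (here, splitting each user's clicks into sessions). The clicks table, its columns, the sample rows, and the 30-minute timeout are all assumptions made up for the illustration.

```python
import sqlite3
from itertools import groupby


def build_demo_db():
    """Create a small in-memory table of hypothetical click events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (user_id TEXT, ts INTEGER)")
    conn.executemany(
        "INSERT INTO clicks VALUES (?, ?)",
        [
            ("alice", 0), ("alice", 600), ("alice", 5000),
            ("bob", 100), ("bob", 200),
        ],
    )
    return conn


def sessionize(rows, timeout=1800):
    """Yield (user_id, ts, session_id) for rows sorted by user, then time.

    A new session starts when the gap since the user's previous event
    is greater than `timeout` seconds.
    """
    for user, events in groupby(rows, key=lambda r: r[0]):
        session_id, last_ts = 0, None
        for _, ts in events:
            if last_ts is not None and ts - last_ts > timeout:
                session_id += 1
            yield user, ts, session_id
            last_ts = ts


def sessions_per_user(conn, timeout=1800):
    """SQL does the selection and ordering; Python does the per-user logic."""
    rows = conn.execute("SELECT user_id, ts FROM clicks ORDER BY user_id, ts")
    counts = {}
    for user, _, session_id in sessionize(rows, timeout):
        counts[user] = max(counts.get(user, 0), session_id + 1)
    return counts


if __name__ == "__main__":
    print(sessions_per_user(build_demo_db()))
```

In a real discovery platform the per-key step would run in parallel inside the database rather than in a single Python process; the sketch only shows how the two styles divide the work.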
Don't underestimate the importance of having a discovery platform as
you tackle these new Big Data challenges. Yes, you will probably need
new people or, even better, to train existing analysts to take on these new
skills and grow your own data scientists. How easy this approach is
will depend on how feature-rich your discovery platform is: how many
useful built-in analytical functions it provides to get you started,
before you have to develop specific ones of your own.
Maybe I'm rambling with these comments, but help is at hand! We recently put
together a short webinar, about 20 minutes in duration:
The Big Data Analytics Landscape: Trends, Innovations and New
Business Value, featuring Gartner Research Vice President Merv Adrian
and Teradata Aster Co-President Tasso Argyros. In the video, Merv and
Tasso answer the following questions and more, including how organizations can
find the right solution to make smarter decisions, take calculated
risks, and gain deeper insights than their industry peers:
- How do you cost-effectively harness and analyze new big data sources?
- How does the role of a data scientist differ from other analytic professionals?
- What skills does the data scientist need?
- What are the differences between Hadoop, MapReduce, and a Data Discovery Platform?
- How are these new sources of big data and analytic techniques and technology helping organizations find new truths and business opportunities?
What do you think?
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)





