Real-Time Hadoop Queries Will Be a Reality in 2013
Real-time Hadoop queries will be a reality in 2013 thanks to two new projects from Cloudera: Impala and Trevni.
Impala is an open source query engine inspired by Dremel, Google’s proprietary big data query solution. A first beta is available, and the production release is expected in Q1 2013.
Impala lets you run real-time queries on top of Hadoop’s HDFS, HBase and Hive. No data migration is necessary.
However, the real revolution will come when Trevni, from Doug Cutting (the creator of Lucene, Hadoop, etc.), is integrated into Impala. Trevni is a new columnar storage format that promises superior performance when reading large data sets stored column by column.
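To illustrate why a columnar layout can speed up analytical queries, here is a minimal Python sketch. It is not Trevni's actual file format or API; the sample records, column names, and function names are hypothetical. It simply counts how many stored values a row-oriented layout and a column-oriented layout touch when summing a single column.

```python
from typing import Dict, List, Tuple

# Hypothetical sample records: (user_id, country, bytes_sent)
ROWS: List[Tuple[int, str, int]] = [
    (1, "BE", 120),
    (2, "US", 300),
    (3, "BE", 80),
    (4, "DE", 500),
]
COLUMN_NAMES: Tuple[str, ...] = ("user_id", "country", "bytes_sent")


def to_columnar(rows: List[Tuple]) -> Dict[str, list]:
    """Pivot row-oriented records into one contiguous list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMN_NAMES)}


def sum_row_oriented(rows: List[Tuple], column: str) -> Tuple[int, int]:
    """Sum one column from a row store; returns (total, values_read)."""
    idx = COLUMN_NAMES.index(column)
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)  # a row store reads each record in full
        total += row[idx]
    return total, values_read


def sum_columnar(columns: Dict[str, list], column: str) -> Tuple[int, int]:
    """Sum one column from a column store; returns (total, values_read)."""
    data = columns[column]  # only the requested column is touched
    return sum(data), len(data)


if __name__ == "__main__":
    print(sum_row_oriented(ROWS, "bytes_sent"))           # (1000, 12)
    print(sum_columnar(to_columnar(ROWS), "bytes_sent"))  # (1000, 4)
```

Summing `bytes_sent` over the four sample rows reads 12 values from the row layout but only 4 from the columnar one. The gap grows with the number of columns a table has and a query ignores; real columnar formats also gain from per-column compression, which this sketch leaves out.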
Together, Impala and Trevni promise real-time big data queries with multiple joins, with performance on par with Google’s Dremel but with more functionality.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments
Peter ___ replied on Fri, 2012/12/07 - 10:58am
Well, either Impala is not really active (any more) or Cloudera is hiding the activity: https://github.com/cloudera/impala/commits/master
Bertrand Dechoux replied on Sun, 2012/12/16 - 7:59am
in response to:
Peter ___
Actually, it is the Apache Drill project that is an implementation based on Dremel.
http://wiki.apache.org/incubator/DrillProposal
Impala is based on Google's private F1, for which there is no public documentation or publication.
http://www.wired.com/wiredenterprise/2012/10/cloudera-impala-hadoop/
But the title holds true. Although the meaning of 'real time' could lead to a pointless semantic fight, tools for interactively exploring data within 'Hadoop' are definitely coming next year. I would advise anybody interested in the subject to watch Ted Dunning's presentations and to join their local Hadoop User Group, as Cloudera will very likely also talk about Impala there.
http://wiki.apache.org/hadoop/HadoopUserGroups
Bertrand Dechoux replied on Sun, 2012/12/16 - 10:51am
in response to:
Peter ___
@Peter Kar: Cloudera has published a post addressing your point.
http://blog.cloudera.com/blog/2012/12/whats-next-for-cloudera-impala/
They are planning to "provide more transparency".