Disruptive Innovator and Senior Executive with a passion for transforming industries by applying cutting-edge technologies (Cloud, Big Data, M2M, Distributed Machine Learning, Open Source Hardware, etc.) and business innovations (Freemium, Gamification, Social, Long Tail, Pay per Use, SaaS Subscriptions, Design Thinking, Blue Ocean Strategy, Lean Startup, etc.) to generate new revenues. Maarten is a DZone MVB, is not an employee of DZone, and has posted 35 posts at DZone. You can read more from them at their website.

Real-Time Hadoop Queries Will Be a Reality in 2013

12.06.2012

Real-time Hadoop queries will be a reality in 2013 thanks to two new projects from Cloudera: Impala and Trevni.

Impala is the open source version of Dremel, Google’s proprietary big data query solution. A first beta is available, and the production version is expected in Q1 2013.

Impala allows you to run real-time queries on top of Hadoop’s HDFS, HBase and Hive. No migrations are necessary.

However, the real revolution will come when Trevni, from Doug Cutting (the creator of Lucene, Hadoop, etc.), is integrated into Impala. Trevni is a new columnar data storage format that promises superior performance when reading large data sets stored by column.
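To make the columnar idea concrete, here is a minimal sketch in Python. It is not Trevni's actual file format or API; the table, the column names and the helper functions are all hypothetical. It only illustrates why storing each column contiguously can help a query that reads a single column.

```python
import struct

# Hypothetical three-column table: (id, name, amount).
rows = [
    (1, "alice", 30.5),
    (2, "bob", 12.0),
    (3, "carol", 7.25),
]


def to_columns(table):
    """Transpose a list of row tuples into one list per column."""
    return [list(column) for column in zip(*table)]


def encode_doubles(values):
    """Pack one numeric column contiguously as little-endian 8-byte doubles."""
    return struct.pack(f"<{len(values)}d", *values)


def sum_column(blob):
    """Aggregate a packed column without touching any other column's bytes."""
    count = len(blob) // 8
    return sum(struct.unpack(f"<{count}d", blob))


ids, names, amounts = to_columns(rows)
amount_blob = encode_doubles(amounts)  # only this blob is read by the query
total = sum_column(amount_blob)
print(total)
```

In a row-oriented layout the same sum would have to read past every id and name on the way to each amount. With one blob per column, a query that needs only the amounts reads only those bytes.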

Impala + Trevni promises real-time big data queries with multiple joins that are on par with Google’s Dremel in performance but offer more functionality…

Published at DZone with permission of Maarten Ectors, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Peter Karussell replied on Fri, 2012/12/07 - 10:58am

Well, either Impala is not really active (any more) or Cloudera is hiding the activity: https://github.com/cloudera/impala/commits/master

Bertrand Dechoux replied on Sun, 2012/12/16 - 7:59am in response to: Peter Karussell

Actually, it is the Apache Drill project that is an implementation based on Dremel.

http://wiki.apache.org/incubator/DrillProposal

Impala is based on (Google's private) F1, for which there is no public documentation/publication.

http://www.wired.com/wiredenterprise/2012/10/cloudera-impala-hadoop/

But the title holds true. Although the meaning of 'real time' could lead to a meaningless semantic fight, the tools for interactively exploring the data within 'Hadoop' are definitely coming next year. I would advise anybody interested in the subject to watch Ted Dunning's presentations and to join their local Hadoop User Group, as it is very likely Cloudera will also talk about Impala there.

http://wiki.apache.org/hadoop/HadoopUserGroups




Bertrand Dechoux replied on Sun, 2012/12/16 - 10:51am in response to: Peter Karussell

 @Peter Kar : Cloudera has published a post answering your point.

http://blog.cloudera.com/blog/2012/12/whats-next-for-cloudera-impala/

They are planning to "provide more transparency".
