
Eric is living in Chapel Hill, NC. By night, he writes and edits science fiction. On weekends, he spends too much time making plumbers hop on things. Eric has posted 249 posts at DZone.

Getting Hadoop, Hive and HBase Up and Running in Less than 15 Minutes


Note: This tutorial comes from guest writer Mark Grover. Enjoy.


If you have delved into Apache Hadoop and related projects, you know that installing and configuring Hadoop is hard. Often, a minor mistake made while installing or configuring from messy tarballs lurks unnoticed for a long time, until some otherwise innocuous change to the system or workload brings it to the surface. Moreover, there is little to no integration testing among the different projects in the ecosystem (e.g. Hadoop, Hive, HBase, ZooKeeper). Apache Bigtop is an open source project aimed at bridging exactly those gaps by:

1. Making it easier for users to deploy and configure Hadoop and related projects on their bare-metal or virtualized clusters.

2. Performing integration testing among various components in the Hadoop ecosystem.

More about Apache Bigtop

The primary goal of Apache Bigtop is to build a community around the packaging and interoperability testing of Hadoop-related projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc.), developed by a community that focuses on the system as a whole rather than on individual projects.

The latest released version of Apache Bigtop is Bigtop 0.5, which integrates the latest versions of various projects, including Hadoop, Hive, HBase, Flume, Sqoop, Oozie and many more. The supported platforms are CentOS/RHEL 5 and 6, Fedora 16 and 17, SUSE Linux Enterprise 11, openSUSE 12.2, Ubuntu Lucid and Precise (both LTS releases), and Ubuntu Quantal.

Who uses Bigtop?

Bigtop users fall into two broad categories: those who use Bigtop as the basis for their own Hadoop distributions, and those who use it to deploy Hadoop.

Notable users, in alphabetical order, include:

Cloudera bases CDH (Cloudera’s Distribution Including Apache Hadoop), its 100% open source Hadoop distribution, on Apache Bigtop.

EMC/Greenplum uses Bigtop extensively as a build framework for their 1000-node Analytics Workbench Cluster.

Juju Charms for Hadoop, HBase, Hive and ZooKeeper, and the associated packages for Ubuntu, are derived from Apache Bigtop.

Magna Tempus Group provides a ready-to-use, well-integrated open source stack for intensive, high-performance in-memory data analysis, based on widely adopted technologies such as Bigtop, Hadoop, HBase, Hive and many others.

Trend Micro uses Bigtop as the basis for their internal custom distribution of Hadoop, which starts with Bigtop but then pulls features from different upstream versions and includes Apache licensed non-core contributions as their platform needs dictate.

Uniting Data’s 100% open source platform is a Hadoop distribution based on Apache Bigtop.

WANdisco bases its 100% open source distro, WANdisco Distro (WDD), on Apache Bigtop.

Using Bigtop

Whether or not you have dabbled with Hadoop before, Apache Bigtop can make your life much easier by providing infrastructure for easy deployment, along with the latest Debian and RPM packages for the various projects. Moreover, these packages have been integration tested, so you can rely on a trustworthy, cutting-edge distribution of Hadoop and related projects on your cluster. You can follow the instructions on the Bigtop wiki to set up a pseudo-distributed cluster in no time, or use the Puppet recipes to set up a fully distributed cluster. You can also make use of the soon-to-be-introduced Bigtop integration with Apache Whirr.
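As a small illustration of how the Bigtop 0.5 platform list maps onto the two package formats Bigtop ships, here is a Python sketch that tells you which kind of package (RPM or Debian) applies to a given platform. The table and the `bigtop_package_format` helper are hypothetical, written only for this example and not part of Bigtop itself; they cover only the platforms named in this article.

```python
from typing import Optional

# Hypothetical lookup table (not part of Bigtop): the Bigtop 0.5 platforms
# listed in this article, mapped to the package format used on each.
# CentOS/RHEL, Fedora, SLES and openSUSE use RPM; Ubuntu uses .deb packages.
SUPPORTED_PLATFORMS = {
    ("centos", "5"): "rpm",
    ("centos", "6"): "rpm",
    ("rhel", "5"): "rpm",
    ("rhel", "6"): "rpm",
    ("fedora", "16"): "rpm",
    ("fedora", "17"): "rpm",
    ("sles", "11"): "rpm",
    ("opensuse", "12.2"): "rpm",
    ("ubuntu", "lucid"): "deb",
    ("ubuntu", "precise"): "deb",
    ("ubuntu", "quantal"): "deb",
}


def bigtop_package_format(distro: str, release: str) -> Optional[str]:
    """Return 'rpm' or 'deb' for a listed platform, or None if it is not listed.

    Matching ignores case and surrounding whitespace.
    """
    key = (distro.strip().lower(), release.strip().lower())
    return SUPPORTED_PLATFORMS.get(key)


if __name__ == "__main__":
    print(bigtop_package_format("Ubuntu", "Precise"))
    print(bigtop_package_format("CentOS", "6"))
    print(bigtop_package_format("Debian", "6"))
```

For example, `bigtop_package_format("Ubuntu", "Precise")` returns `"deb"`, while a platform that is not on the list, such as Debian, returns `None`, a hint that you would be outside the set Bigtop 0.5 tests against.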

If you are a novice and would like to learn more about how you can use Apache Bigtop to quickly deploy Hadoop on your laptop and give it a test drive, or if you are a veteran and are curious to find out how Apache Bigtop can make your cluster more robust and easier to deploy, drop by my talk on Apache Bigtop at ApacheCon NA 2013 on February 26th, 2013.

Published at DZone with permission of its author, Eric Gregory.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Laurent Cohen replied on Fri, 2013/02/15 - 7:48am

That Hadoop requires another project just to make its installation easier says a lot about how it was engineered. Geez, can we just drop the buzz about Hadoop and start seeing it for what it is: a powerful framework, and a big pain in the butt to install, configure and use.

If you have any choice at all, please choose anything else.

