

Cloud Deployments: Using Hadoop on Clouds

Packt Publishing has provided Chapter 10 of its forthcoming Hadoop MapReduce Cookbook for DZone readers, covering Hadoop and Amazon Elastic MapReduce. The chapter explores:

  1. Running WordCount using Amazon Elastic MapReduce (EMR)
  2. Saving money using Amazon EC2 Spot Instances to execute EMR job flows
  3. Executing a Hive script using EMR
  4. Creating an Amazon EMR job flow using the command-line interface
  5. Using Apache Whirr to deploy an Apache Hadoop cluster in the EC2 cloud environment
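As a taste of the first four recipes, a job flow can be created from the command line with the Amazon EMR Ruby CLI (`elastic-mapreduce`). The sketch below is illustrative only: it assumes the CLI is installed and configured with your AWS credentials, and it runs Amazon's sample streaming WordCount; the output bucket name is a placeholder you would replace with your own.

```shell
# Create an EMR job flow running the sample streaming WordCount.
# s3n://my-output-bucket/wordcount/ is a placeholder bucket.
elastic-mapreduce --create \
  --name "WordCount job flow" \
  --num-instances 3 \
  --instance-type m1.small \
  --stream \
  --input  s3n://elasticmapreduce/samples/wordcount/input \
  --mapper s3n://elasticmapreduce/samples/wordcount/wordSplitter.py \
  --reducer aggregate \
  --output s3n://my-output-bucket/wordcount/
```

The same CLI can bid for Spot Instances instead of on-demand capacity (the recipe on saving money); the exact flags and prices below are assumptions for illustration, e.g. adding worker nodes with `--instance-group core --instance-count 2 --instance-type m1.small --bid-price 0.02`.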
Computing clouds provide on-demand, horizontally scalable computing resources with no upfront capital investment, making them an ideal environment for occasional large-scale Hadoop computations. In this chapter, we explore several mechanisms to deploy and execute Hadoop MapReduce and Hadoop-related computations in cloud environments.

This chapter discusses how to use Amazon Elastic MapReduce (EMR), a hosted Hadoop infrastructure, to execute traditional MapReduce computations as well as Hive computations on the Amazon EC2 cloud infrastructure. We will also use Apache Whirr, a cloud-neutral library for deploying services on cloud environments, to provision an Apache Hadoop/HBase cluster in the cloud.
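With Whirr, a cluster is described in a properties file and launched with a single command. The fragment below is a minimal sketch, assuming AWS credentials are exported as environment variables; the cluster name, instance counts, and hardware ID are placeholder choices, not values from the chapter.

```
whirr.cluster-name=hadoopcluster
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.hardware-id=m1.small
```

Saving this as `hadoop.properties`, the cluster is provisioned with `bin/whirr launch-cluster --config hadoop.properties` and torn down with `bin/whirr destroy-cluster --config hadoop.properties`.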

You can download the full chapter (in .doc format) here.

Published at DZone with permission of its author, Eric Gregory.
