By Mitch Pronschinske, Senior Content Analyst at DZone

Adobe Contributes Puppet Modules for Hadoop

06.27.2010
Managing Hadoop servers can be a pain, but Puppet makes it easier.  The open source server configuration automation software streamlines the sysadmin's job, even with technologies like Hadoop, which is expanding its enterprise reach.  One of Puppet's great strengths is the open source community around it, which contributes recipes and modules for all kinds of configuration automation.  Recently, Adobe added to this pool of resources by releasing the Puppet modules it uses to manage its own internal Hadoop servers.  DZone interviewed Luke Kanies, the creator of Puppet and CEO of Puppet Labs (the commercial sponsor of Puppet), about the announcement and the next release of Puppet.

Puppet is one of the most acclaimed configuration management tools, and it's used by numerous high-profile companies for environment automation.  Google, Adobe, Twitter, Oracle, the NY Stock Exchange, and Bank of America all use Puppet.  The software makes configuration quick and repeatable, sparing you tedious manual system setups and the worry of forgetting a security upgrade or some other configuration step.  Puppet enables an agile approach to infrastructure, which is unfortunately not yet as widespread as Agile development.
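To illustrate the declarative style Puppet uses, here is a minimal hypothetical manifest (not taken from Adobe's modules) that keeps a package patched and its service running, so that a security upgrade is never forgotten:

```puppet
# Hypothetical sketch: keep OpenSSH up to date and its service running.
# Puppet converges the node to this state on every run, whatever the
# starting state, so the upgrade step can't be forgotten.
package { 'openssh-server':
  ensure => latest,
}

service { 'sshd':
  ensure  => running,
  enable  => true,
  require => Package['openssh-server'],
}
```

The `require` metaparameter orders the resources, so the service is only managed after the package is in place.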

For some setups, it helps to have a prior case to look at so that you don't have to "re-invent the wheel."  Configuring and deploying Hadoop across clusters is a complex task, but thanks to the experts at Adobe, there is an excellent blueprint available.  Cristian Ivascu blogged about the steps his team follows when deploying hstack to a cluster using Puppet and Hudson:

  1. Trigger a build in Hudson for Hadoop, HBase, or anything else we want to deploy.
  2. Click a link next to the newly completed build to push the resulting archives to the Puppet Master repository.
  3. Using ssh, we start up the Puppet clients on each machine. In the future we want to keep them running at all times, but since we’re doing development on the current cluster, a running Puppet tends to mess with our tests (restarting daemons when we kill them for testing is an example).
  4. The last step is to trigger the change in the configuration. The Puppet Master pulls the configuration from a git repository, so we just have to change the version number and push the new file back to source control.
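A Hadoop module driven by a workflow like this might boil down to a resource chain along the following lines.  This is a hypothetical sketch, not Adobe's actual code; the class name, `$hadoop_version` parameter, and file paths are assumptions for illustration:

```puppet
# Hypothetical sketch of a Hadoop datanode module.
# $hadoop_version would come from the git-managed configuration,
# so bumping the version number and pushing (step 4) rolls out a new build.
class hadoop::datanode ($hadoop_version) {
  package { 'hadoop':
    ensure => $hadoop_version,
  }

  file { '/etc/hadoop/conf/hdfs-site.xml':
    ensure  => file,
    source  => 'puppet:///modules/hadoop/hdfs-site.xml',
    require => Package['hadoop'],
  }

  service { 'hadoop-datanode':
    ensure    => running,
    subscribe => [ Package['hadoop'], File['/etc/hadoop/conf/hdfs-site.xml'] ],
  }
}
```

Note that `subscribe` restarts the daemon whenever the package or configuration changes, and `ensure => running` brings a killed daemon back on the next run; this is exactly the behavior that interferes with failure testing, which is why the team starts the Puppet clients on demand (step 3) rather than leaving them running.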

You can find the Adobe recipes for Hadoop, HBase, high availability, and Zookeeper here.

Kanies also revealed to DZone that the beta release for the next version of Puppet (2.6) would arrive sometime next week or shortly thereafter.  The previous release number was 0.25, but the Puppet committers decided that the 0.x numbering was out-of-date and made the project look immature (it's actually been around for five years).  He said the unofficial tagline for the release would be "It's eleventy times better."

For a deep dive into the questions surrounding Puppet, check out this interview.