
Developer with experience in a variety of systems and technologies, with a customer focus balanced against business goals. Particularly interested in backend and large-scale systems, as well as high-level architecture and API design. Always open to feedback in order to keep learning and improving as a professional. Rodrigo is a DZone MVB and is not an employee of DZone.

How HDFS Does Replication

08.14.2012

As I learned about HBase and HDFS, I wanted to understand how HDFS actually does its replication: whether it is synchronous, and what the flow looks like. As it turns out, it wasn't easy to find the answers to these questions, but I ran into a very good page on how HDFS works internally that is worth sharing:

Understanding Hadoop Clusters and the Network

Another notable aspect of HBase and HDFS is the ingenuity of building a filesystem that is good enough for Hadoop and then building a database on top of it, without reinventing the wheel. Kudos to Google for their work on GFS, MapReduce, and BigTable, which inspired the open source implementations.

Published at DZone with permission of Rodrigo De Castro, author and DZone MVB. (source)
