
Gonzalo Ayuso is a web architect with more than 10 years of experience in web development, specializing in open source technologies. He is experienced in delivering scalable, secure and high-performing web solutions to large-scale enterprise clients. He blogs at gonzalo123.com. Gonzalo is a DZone MVB, is not an employee of DZone, and has posted 46 posts at DZone.

Multi-Master File-System Replication with CouchDB

04.09.2012

When we work with a cloud or a cluster, one of the most common problems is the file system. To scale horizontally (scale out), every node needs access to the same files. We can share them through a file server (Samba, for example), but that puts a single point of contention in front of the whole application and quickly becomes a bottleneck. There are several distributed file systems that address this, such as the one in Apache Hadoop (a project inspired by Google’s MapReduce and Google File System). In this post we’re going to build a really simple distributed storage system based on NoSQL. Let’s start.

NoSQL databases (one of our latest hypes) normally allow us to store large files. MongoDB, for example, has GridFS, but in this post we’re going to do it with CouchDB. With CouchDB we can attach files to the documents in our database, just like attachments in an email.

CouchDB’s API for uploading an attachment is very simple: it’s a pure REST API.
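As a rough sketch of what those requests look like, the helpers below wrap CouchDB’s attachment endpoint (`/{db}/{docid}/{attachment}`) with curl. The host `host1`, the database name `fs`, the function names and the document ids are only assumptions for illustration; they are not taken from the library mentioned in this post.

```shell
#!/bin/sh
# Assumed values for illustration; adjust them to your own setup.
COUCH_URL="http://host1:5984"
DB="fs"

# Build the URL of an attachment: /{db}/{doc id}/{attachment name}.
# Both arguments are assumed to be URL-safe already (no spaces or slashes).
attachment_url() {
    printf '%s/%s/%s/%s\n' "$COUCH_URL" "$DB" "$1" "$2"
}

# Upload a local file as an attachment of a document.
# $1 = doc id, $2 = attachment name, $3 = local file,
# $4 = current revision (only needed when the document already exists).
put_attachment() {
    url=$(attachment_url "$1" "$2")
    if [ -n "${4:-}" ]; then
        url="$url?rev=$4"
    fi
    curl -s -X PUT "$url" \
        -H "Content-Type: application/octet-stream" \
        --data-binary "@$3"
}

# Download an attachment to a local file.
# $1 = doc id, $2 = attachment name, $3 = destination file.
get_attachment() {
    curl -s -o "$3" "$(attachment_url "$1" "$2")"
}
```

With these in place, `put_attachment report report.pdf ./report.pdf` would store the file and `get_attachment report report.pdf ./copy.pdf` would read it back. When the document already exists, CouchDB expects its current revision, so it has to be passed as the fourth argument.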

Instead of issuing the HTTP requests by hand with curl commands, we can use a library to automate the process. For example, we can use the simple library shown in a previous post. If we inspect its code, we will see that it creates a PUT request to store the file in our CouchDB database.

Another cool thing we can do with PHP is to create a stream wrapper, so that we can use the standard filesystem functions to read, write and delete files. We can see a post about this here.

As we can see, it is very easy to use CouchDB as a file system. But we can also replicate our CouchDB databases. Imagine that we have two CouchDB servers (host1 and host2). Each host has one database called fs. If we run the following command:

curl -X POST -H "Content-Type: application/json" http://host1:5984/_replicate -d '{"source":"fs","target":"http://host2:5984/fs","continuous":true}'

Our database will be replicated from host1 to host2 in continuous mode. That means that every time we create or delete anything on host1, CouchDB will replicate the change to host2. A simple master-slave replica.

Now if we execute the same command on host2, with the two hosts swapped:

curl -X POST -H "Content-Type: application/json" http://host2:5984/_replicate -d '{"source":"fs","target":"http://host1:5984/fs","continuous":true}'

We now have a multi-master replica system that is cheap and easy to implement. As we can see, we only need to install CouchDB on each node and enable replication in both directions, and that’s all. Pretty straightforward, isn’t it?
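The two commands above can be wrapped in a small script so that both directions are started together. This is only a sketch under the same assumptions as the post (two hosts listening on port 5984, each with a database called fs); the function names are hypothetical.

```shell
#!/bin/sh
# Database name used on every node (taken from the example in this post).
DB="fs"

# JSON body for a continuous replication from the local database
# to the remote database URL given in $1.
replication_body() {
    printf '{"source":"%s","target":"%s","continuous":true}\n' "$DB" "$1"
}

# Ask host $1 to push its database continuously to host $2.
start_replication() {
    curl -s -X POST -H "Content-Type: application/json" \
        "http://$1:5984/_replicate" \
        -d "$(replication_body "http://$2:5984/$DB")"
}

# Start replication in both directions between two hosts.
setup_multi_master() {
    start_replication "$1" "$2"
    start_replication "$2" "$1"
}
```

Calling `setup_multi_master host1 host2` sends the same two requests shown earlier. Afterwards, `curl http://host1:5984/_active_tasks` should list the running replication on that node, although depending on the server configuration that endpoint may require admin credentials.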

Published at DZone with permission of Gonzalo Ayuso, author and DZone MVB. (source)
