
Cody Powell (@codypo) is the cofounder and CTO of Famigo. Famigo's main offering is a cross-platform recommendation engine for mobile content, helping families find things like the best Android apps, best iPad apps, and free apps. He's a graduate of Trinity University, an ardent supporter of the Texas Rangers, and he makes a mean mojito. Cody is a DZone MVB, is not an employee of DZone, and has posted 25 posts at DZone.

The Joys of Map Reduce Thanks to CouchDB

03.22.2012

I was tasked with tracking everything a user could do with any of our products. Easy enough, right? I basically only needed to track every page view on our website, every gameplay call to our API, certain JavaScript function calls, all attempts to call incomplete but still publicly visible web and API features, and then random user actions that are arbitrarily interesting. Oh, and let's not forget the part where we turn that big blob of crazy data into something meaningful for me and my cofounders. Ohhhhhh-kay.

How would I approach that problem in a relational database world? Well, step 1 would probably be to start crying, followed quickly by step 2, wetting my pants. I'd embarrass myself like so because the data that I described above is all different. Each class of data has different things we care about. For example, if it's a page view on the website, where did they come from? If it's an API call, who was making the call? If it's a random user action like a successful upload of a video, what'd they upload? The data is more different than it is similar.

With the data being that disparate, I could've explored a few different relational approaches. I could've come up with a table for each one of these user accesses, with maybe a base UserAccess table that I joined against, and then had big switch statements for determining what I inserted and selected from. Or, I could've just had one mega table with 9 gagillion nullable columns. Perhaps I would've gone with a simple, completely generic table structure that stored all of the interesting parts of the data in XML, and then reached for the Wild Turkey when it came time to query that. I've tried all of these approaches before; each one always seemed harder than it should've been, and each resulted in something that was very difficult to maintain.

Fortunately, I didn't have to engage in any of that idiocy because I have a little friend named CouchDB. As you might well know, CouchDB is non-relational and schema-free; it's one of those wacky NoSQL databases. It happens to serve us particularly well for great big blobs of data with differing structures, much like all of this user access data.
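To make "differing structures" a bit more concrete, here's a rough sketch in Python of what three user access documents might look like. CouchDB stores documents as JSON, so plain dictionaries are a fair stand-in. Every field name here (`type`, `referrer`, `caller`, `filename`, and so on) is a made-up assumption that mirrors the examples above, not our actual schema.

```python
import json

# Hypothetical user access documents. Every field name is an assumption
# for illustration; none of this is a real production schema.
page_view = {"type": "page_view", "user": "alice", "path": "/apps",
             "referrer": "https://example.com/"}
api_call = {"type": "api_call", "user": "bob", "endpoint": "/gameplay",
            "caller": "android-client"}
upload = {"type": "video_upload", "user": "alice", "filename": "clip.mp4",
          "succeeded": True}

documents = [page_view, api_call, upload]

# Each document serializes to JSON on its own; no shared schema is needed.
serialized = [json.dumps(doc, sort_keys=True) for doc in documents]

# The fields that every document happens to share. Everything else is
# specific to one kind of access.
common_fields = set.intersection(*(set(doc) for doc in documents))
```

In this sketch only two fields overlap across all three documents, which is the "more different than similar" problem in miniature.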

It's one thing to be adept at storing wacky data like this, but the hard part is really the analysis portion, where you select the data in such a way that sense can be made out of the whole thing. Fortunately for me and everyone who must put up with me, this is made easy via some wonderful tools borrowed from the world of functional programming. I'm speaking specifically of map and reduce. (If it's been a while since you brushed up on what map and reduce do, pay a visit to Mr. Wikipedia.)
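If a quick refresher helps, here's a tiny sketch of the two ideas in plain Python, outside of any database. The session lengths are invented numbers, purely for illustration.

```python
from functools import reduce

# Hypothetical session lengths in seconds (made-up data).
session_seconds = [30, 90, 120]

# map: apply the same function to every element, independently of the others.
session_minutes = list(map(lambda s: s / 60, session_seconds))

# reduce: fold the mapped values down into a single result.
total_minutes = reduce(lambda acc, m: acc + m, session_minutes, 0.0)
```

Map transforms each item on its own; reduce combines the transformed items into one answer.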

Since each user access, no matter what kind, gets stored as a unique document inside of CouchDB, it's simple for me to write a map function that goes through each document and emits the fields I'm interested in. It's similarly simple to write a reduce function that accumulates all of that data and does something interesting with it, whether it's summing, averaging, or some funnel analysis. (I should note one awesome aspect of CouchDB - its native support for viewing your data via maps and reduces.)
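Here's a minimal sketch of that flow, simulated in Python so it runs anywhere. In CouchDB itself, views are typically written as JavaScript functions where the map calls `emit(key, value)`; the generator below stands in for that. The documents, field names, and the `run_view` helper are all hypothetical, not our actual functions and not part of any CouchDB API.

```python
from collections import defaultdict

# Hypothetical user access documents (invented for illustration).
docs = [
    {"type": "page_view", "user": "alice", "path": "/apps"},
    {"type": "api_call", "user": "bob", "endpoint": "/gameplay"},
    {"type": "page_view", "user": "bob", "path": "/apps"},
    {"type": "page_view", "user": "alice", "path": "/about"},
]

def map_page_views(doc):
    """Map step: emit (path, 1) for page views and ignore other documents."""
    if doc.get("type") == "page_view":
        yield doc["path"], 1

def reduce_count(values):
    """Reduce step: sum the emitted values for a single key."""
    return sum(values)

def run_view(documents, map_fn, reduce_fn):
    """Hypothetical helper: group emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}

page_view_counts = run_view(docs, map_page_views, reduce_count)
```

The map function simply skips the API call document, so documents of different shapes can live side by side without the query caring. Swapping `reduce_count` for an averaging function, or changing what the map emits, gives a different report over the same data.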

Once I had all of my data stored, I was able to produce a pretty impressive dashboard of exactly what our users were doing via 7 or 8 different map and reduce functions. The functions themselves were quite simple too, just two or three lines of code each. Could I have recreated those same results in standard SQL? Sure. Would I have wanted to? Hells to the no. Relational databases are great for certain problems, but for flexibly structured data, I encourage everyone to dip their toes into the deep end of NoSQL and map reduce.


Published at DZone with permission of Cody Powell, author and DZone MVB. (source)
