NoSQL Zone is brought to you in partnership with:

Mitch Pronschinske is a Senior Content Analyst at DZone. That means he writes and searches for the finest developer content in the land so that you don't have to. He often eats peanut butter and bananas, likes to make his own ringtones, enjoys card and board games, and is married to an underwear model. Mitch is a DZone Zone Leader and has posted 2573 posts at DZone. You can read more from them at their website. View Full User Profile

How I MapReduced a Neo4j Store w/ Hadoop

09.02.2013
| 3460 views |
  • submit to reddit

When exploring very large raw datasets containing massive interconnected networks, it is sometimes helpful to extract your data, or a subset thereof, into a graph database like Neo4j. This allows you to easily explore and visualize networked data to discover meaningful patterns.

When your graph has 100M+ nodes and 1000M+ edges, using the regular Neo4j import tools will make the import very time-intensive (as in many hours to days).

In this talk, I'll show you how we used Hadoop to scale the creation of very large Neo4j databases by distributing the load across a cluster and how we solved problems like creating sequential row ids and position-dependent records using a distributed framework like Hadoop.

About the speaker:

Kris Geusebroek is a developer with a passion for combining technologies to create new possibilities for the people around him. Coming from a Java and GIS background and being a fan of open source software, Kris started working with distributed systems and graph databases in the last couple of years. He's currently working on visualizing Big Data with the help of Hadoop and Neo4j. Kris has spoken at in-house knowledge sharing events and several local meetups. He also has experience doing handson training sessions at the dutch java user group.

--YouTube Page