NoSQL Zone is brought to you in partnership with:

I am a Webscience PhD student at the university of Koblenz and the Founder of http://www.metalcon.de Social news streams are my research interest. René is a DZone MVB and is not an employee of DZone and has posted 36 posts at DZone. You can read more from them at their website. View Full User Profile

Is Neo4j 377 Times Faster than MySQL?

01.06.2013
| 4502 views |
  • submit to reddit

Curator's Note: The content of this article was originally written back in 2011.  Are these claims still valid?  Please give us your thoughts by commenting below.

Over the last weeks I did some more work on neo4j. And I am ready to present some more results on the speed (In my use case neo4j outperformed MySQL by a factor of 377 ! That is more than two magnitudes). As known one part of my PhD thesis is to create a social newsstream application around my social networking site metalcon.de. It is very obvious that a graph structure for social newsstreams are very natural: You go to a user. Travers to all his friends or objects of interest and then traverse one step deeper to the newly created content items. A problem with this kind of application is the sorting by Time or relvance of the content items. But before I discuss those problems I just want to present another comparission between MySQL and neo4j.

Setting:

I took a fairly small dataset with the following properties:

  • 14986 content items (entries in a forum of a band)
  • 1391 Bands
  • 854 user having at leas one of those bands in their list of fav bands
  • a bipartit graph of fan relations between users and bands

For every User I wanted to select all content items from his favourite bands. I know this is far away from the social newsstream application I am about to build but I used it as a first test to see weather graph databases really are the more natural setting for my questions.

Results:

In MySQL this would look something like this:
for all (User_ID in Users){

SELECT ce.Text
FROM Entry e
JOIN Topic t ON t.ID = e.Topic_ID
JOIN UserBand ub ON ct.Band_ID = ub.Band_ID
WHERE ub.User_ID = User_ID

}

Even though we have all relevant colums indexed those joins are expensive. Especially because the Entry Table is much bigger than 14986 Items.

Using MySQL It took 152 Seconds = 2:32 Minutes to create the interesting newsstreams for all 854 users or 177 ms per user

Let’s look at neo4j:

Not using any traverser but just some hacked in code I have something like the following

or all (user in index){

for (Relationship rel: user.getRelationships(DynamicRelationshipType.withName(“FAN”), Direction.BOTH)){

Node n1 = rel.getOtherNode(user);
for (Relationship r2: n1.getRelationships(DynamicRelationshipType.withName(“CONTENTITEM”), Direction.BOTH)){
Node n2 = r2.getOtherNode(n1)
edgecnt++;
}
}
}

Even thogh we only have 854 users and 1391 bands we end up with  1368270 relations that we have traversed to retrieve all content items for all favourite bands of each user.

Using neo4j it took  3,4 Seconds in doing this or 4 ms per user

That is about 44 times faster than MySQL

After warming the caches.

When repeating this experiment MySQL does not get faster. I know they have a query cache but I guess it gets emptied before the first queries are run again. In neo4j though this result gets even better. Each time I repeated this experiment the runtime decreased until I came to something like 0,4 Seconds for the job which now is 377 times faster than MySQL. The best part of this is scaling. When my Social graph grows the search in neo4j stays a local search. more users and more discussions does not mean more edges to traverse from the user to his favourite bands. in MySQl though the cost of these heavy joins will just explode.

Yes I know in MySQL I would denormalize my tables to create such an application in order to gain speed. But denormalizing means more redundancy and again a graph is a much more natural structure for a social application.

Open Questions! Feel free to discuss them (-:

There is an very important open question though and that is sorting. A social newsstream of course should be sorted by time. in MySQL it is easy to create an Index over a colum with timestamps on the Contentitems. Sorting my resultset by time will in this case not increase the runtime.

In a graph database with this particular schema of my graph I am not quite sure how to retrieve the results in a sorted way. (Anyone has some ideas?) There needs to be further testing to see if sorting after retrieving still makes my solution faster than MySQL or (my prefered solution) if there is some way of designing the graph in a way that for any user with any set of favourite bands there is a quick way of traversing through the content items and receiving them in an ordered way. I even guess this is already possible in neo4j using traversers with bredth first search and tell them in wich order to traverse relations. Just have too look deeper into this and i will keep you updated.

I am happy and open for comments and suggestions! Oh and could anyone suggest a nice syntax highlightning plugin for wordpress?

Published at DZone with permission of René Pickhardt, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Silvio Bierman replied on Sun, 2013/01/06 - 12:46am

Although you left out most of the significant details about the MySQL tables it seems like a very inefficient SQL query you are using looping over all users and then filtering on that user. If you are doing an apples versus oranges comparison then you should do it right.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.