
Max De Marzi is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies. He can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby gem, a REST API wrapper for the Neo4j graph database. He is addicted to learning new things, loves a challenge, and enjoys finding pragmatic solutions. Max is very easy to work with, stays focused under pressure, and has the patience of a rock. Max is a DZone MVB, is not an employee of DZone, and has posted 60 posts at DZone.

Using Gremlin with Neography

04.19.2012

Gremlin is a domain-specific language for traversing property graphs. Neo4j is one of the databases that can speak Gremlin, and as promised I’ll show you how you can use it to implement friend recommendations as well as degrees of separation.

We can send any Gremlin script to Neo4j through the REST API, using neography’s execute_script command. Let’s implement suggestions_for so it sends a Gremlin script to the server:

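def suggestions_for(node)
  # The node's "self" URL ends in its numeric id, e.g. ".../node/1"
  node_id = node["self"].split('/').last.to_i
  # node_id inside the script is bound from the params hash on the server;
  # it is not interpolated into the string here
  @neo.execute_script("g.v(node_id).
                         in('friends').
                         in('friends').
                         dedup.
                         filter{it != g.v(node_id)}.
                         name", {:node_id => node_id})
end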

puts "Johnathan should become friends with #{suggestions_for(johnathan).join(', ')}"

# RESULT
# Johnathan should become friends with Mary, Phil
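
The first line of suggestions_for leans on the shape of the node hash, so here is a minimal sketch of that step on its own. The helper name node_id_of and the example URL are made up for illustration; I am only assuming that the "self" value ends in the node's numeric id, which is what the split in the code above relies on.

```ruby
# Hypothetical helper (not part of neography): read the numeric id
# from the last segment of a node's "self" URL.
def node_id_of(node)
  # Integer() raises ArgumentError if the last segment is not a number,
  # which surfaces a malformed URL early instead of silently returning 0.
  Integer(node["self"].split('/').last)
end

# Assumed shape of a node hash; the host and id are invented.
example_node = { "self" => "http://localhost:7474/db/data/node/1" }
puts node_id_of(example_node)
```

Using a number rather than the raw string matters once the id is compared against vertex ids inside a script.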

Let’s go through the Gremlin steps:

g

is our graph.

v(node_id)

is the vertex whose id is node_id, a value we pass to the script as a parameter. In Gremlin a node is a vertex, and a relationship is an edge.

in('friends')

tells Gremlin we want to traverse incoming relationships of type “friends”. We do this twice because we want friends of friends.

dedup

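removes any duplicate nodes we found along the way… you know, the popular kids.

filter{it != g.v(node_id)}

drops the starting node from the results, so nobody is suggested as their own friend.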

name

grabs the name property of the nodes we found.

Note that node_id is passed as a parameter rather than pasted into the script text. You want to parameterize your scripts when possible to avoid re-parsing and improve performance.
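
If the pipeline is hard to picture, the same steps can be sketched in plain Ruby over a toy graph. Everything here is an assumption for illustration: the hash, the friendships in it, and the friends_of_friends name are invented, and the nodes are just name strings, so there is no separate name step.

```ruby
# Toy graph, invented for illustration. Each key maps to the people who have
# a "friends" relationship pointing at that key (the incoming side).
TOY_INCOMING_FRIENDS = {
  "Johnathan" => ["Mark"],
  "Mark"      => ["Johnathan", "Mary", "Phil"],
  "Mary"      => ["Mark", "Phil"],
  "Phil"      => ["Mark", "Mary"]
}

# Hypothetical pure-Ruby stand-in for the Gremlin pipeline above.
def friends_of_friends(graph, start)
  hop1 = graph.fetch(start, [])                          # in('friends')
  hop2 = hop1.flat_map { |name| graph.fetch(name, []) }  # in('friends')
  hop2.uniq.reject { |name| name == start }              # dedup, then filter
end

puts friends_of_friends(TOY_INCOMING_FRIENDS, "Johnathan").join(', ')
# prints: Mary, Phil
```

This is only a model of what the traversal computes; the real work still happens on the server.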

How about degrees of separation?

def degrees_of_separation(start_node, destination_node)
  # Convert the ids to integers so the id comparisons inside the script
  # compare numbers with numbers rather than a number with a string
  start_node_id = start_node["self"].split('/').last.to_i
  destination_node_id = destination_node["self"].split('/').last.to_i
  @neo.execute_script("g.v(start_node_id).
                         as('x').
                         in.loop('x'){it.loops <= 4 &&
                                      it.object.id != destination_node_id}.
                         simplePath.
                         filter{it.id == destination_node_id}.
                         paths{it.name}", {:start_node_id => start_node_id,
                                           :destination_node_id => destination_node_id })
end

degrees_of_separation(johnathan, mary).each do |path|
  puts "#{(path.size - 1 )} degrees: " + path.join(' => friends => ') 
end

# RESULT
# 3 degrees: Johnathan => friends => Mark => friends => Phil => friends => Mary
# 2 degrees: Johnathan => friends => Mark => friends => Mary

This one is a bit more tricky. We once again start with our graph

g

then go to the start node (or vertex) 

v(start_node_id)

we are going to name this step as x with the 

as('x')

command because we’ll want to come back here later.

in.loop('x')

tells Gremlin to traverse incoming relationships and keep looping back to x for as long as the condition that follows is true. Once the condition becomes false, the traversal moves on to the next step. Gremlin keeps track of the number of loops you’ve taken and here we tell it to stop at 4

it.loops <= 4

and stop if we reach our destination along the path

it.object.id != destination_node_id

We avoid repeating elements in the path with

simplePath

and we filter the results that end in our destination node with

filter{it.id == destination_node_id}

and get the names of the nodes in the path with 

paths{it.name}
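
To make the loop, simplePath and filter steps concrete, here is a rough pure-Ruby sketch that does the same kind of search over a toy graph. The graph, its friendships and the simple_paths name are invented for illustration, and the limit of four hops is my reading of the loop condition above, so treat it as an approximation rather than a port of the Gremlin script.

```ruby
# Toy graph, invented for illustration: who each person can reach in one hop.
TOY_FRIENDS = {
  "Johnathan" => ["Mark"],
  "Mark"      => ["Johnathan", "Mary", "Phil"],
  "Mary"      => ["Mark", "Phil"],
  "Phil"      => ["Mark", "Mary"]
}

# Hypothetical helper: depth-first search for every path from the end of
# `path` to `goal` that repeats no node and uses at most `max_hops` hops.
def simple_paths(graph, goal, max_hops, path)
  return [path] if path.last == goal          # stop once we reach the destination
  return [] if path.size - 1 >= max_hops      # give up after max_hops hops
  graph.fetch(path.last, []).flat_map do |neighbor|
    next [] if path.include?(neighbor)        # like simplePath: no repeated nodes
    simple_paths(graph, goal, max_hops, path + [neighbor])
  end
end

simple_paths(TOY_FRIENDS, "Mary", 4, ["Johnathan"]).each do |path|
  puts "#{path.size - 1} degrees: " + path.join(' => friends => ')
end
```

The order in which paths come back depends on how the neighbors are listed, so it will not necessarily match the order the server returns.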

We then pass our parameters

{:start_node_id => start_node_id,
 :destination_node_id => destination_node_id }
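
It can help to see roughly what travels over the wire. As far as I understand it, execute_script sends the script text and the parameter hash together as one JSON document; the sketch below only builds such a body and does not talk to a server. The gremlin_payload helper is hypothetical, and the "script" and "params" key names are an assumption on my part, so check them against your server's documentation.

```ruby
require 'json'

# Hypothetical helper: bundle a Gremlin script and its parameters into a
# JSON request body. The key names are assumed, not taken from the article.
def gremlin_payload(script, params = {})
  { :script => script, :params => params }.to_json
end

# The script text stays identical from call to call; only the params change.
script = "g.v(start_node_id).in.name"
puts gremlin_payload(script, :start_node_id => 1)
puts gremlin_payload(script, :start_node_id => 2)
```

Because the script string is the same on every call, the server has a chance to reuse its parsed form, which is the benefit of parameterizing mentioned earlier.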

Gremlin is very powerful and is aided by the whole TinkerPop stack. You’ll need some time and a little something to get you in the right state of mind, but if you can grok it then reading the Gremlin wiki will expand your mind. If you prefer to watch instead of read, this video will give you a pretty good introduction to the Gremlin language.

Published at DZone with permission of Max De Marzi, author and DZone MVB. (source)
