
Max De Marzi is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies. He can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby Gem, a REST API wrapper for the Neo4j Graph Database. He is addicted to learning new things, loves a challenge and enjoys finding pragmatic solutions. Max is very easy to work with, focuses under pressure and has the patience of a rock. Max is a DZone MVB, is not an employee of DZone, and has posted 60 posts at DZone.

Facebook Graph Search with Cypher and Neo4j

01.28.2013

[Image: neo_graph_search_screen_shot]

Facebook Graph Search has given the Graph Database community a simpler way to explain what it is we do and why it matters. I wanted to drive the point home by building a proof of concept of how you could do this with Neo4j. However, I don’t have six months or much experience with NLP (natural language processing). What I do have is Cypher. Cypher is Neo4j’s graph language and it makes it easy to express what we are looking for in the graph. I needed a way to take “natural language” and create Cypher from it. This was going to be a problem.

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

It’s an old programmer joke, but that is what came to mind. Some kind of fuzzy regular expressions. In the iPhone world, we usually hear people say “There’s an App for that”. In the Ruby world, we go with “There’s a Gem for that”… so I asked Google for some help and came upon Semr.

Semr is the gateway drug framework to supporting natural language processing in your application. Its goal is to follow the 80/20 rule where 80% of what you want to express in a DSL is possible in a familiar way to how developers normally solve solutions. (Note: There are other more flexible solutions but they also come with a higher learning curve, e.g. Treetop)

Awesome, a ray of light to solve my problem… but the Gem is 4 years old. I could not get it to install. Bummer… Wait, what was that about Treetop?

Treetop is a language for describing languages. Combining the elegance of Ruby with cutting-edge parsing expression grammars, it helps you analyze syntax with revolutionary ease.

Score! Now I had no idea how to write a proper language grammar, but that’s never stopped anyone before. Someone who has more than a couple hours of experience with Treetop is going to laugh at this but I’ll show you part of what I did:

rule friends
  "friends" <Friends>
end

rule likes
  "who like" <Likes>
end

rule likeand
  likes space thing space "and" space thing <LikeAnd>
end

rule thing
  [a-zA-Z0-9]+ <Thing>
end

I am creating some rules for things, for the likes relationship, and also for the idea of “likes this and that”.
The “natural language” is run through these rules and a syntax tree is generated from the rules that match. The nodes of that tree are then turned into hashes representing pieces of Cypher. Looking at the code above and below you can see how “friends who like Neo4j” gets parsed into Friends, Likes, Thing.

class Friends < Treetop::Runtime::SyntaxNode
  def to_cypher
      return {:start  => "me = node({me})", 
              :match  => "me -[:friends]-> people",
              :return => "people",
              :params => {"me" => nil }}
  end 
end

class Likes < Treetop::Runtime::SyntaxNode
  def to_cypher
      return {:match => "people -[:likes]-> thing"}
  end 
end

class Thing < Treetop::Runtime::SyntaxNode
  def to_cypher
      return {:start  => "thing = node:things({thing})",
              :params => {"thing" => "name: " + self.text_value } }
  end 
end
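To make the fragments concrete, here is a minimal sketch in plain Ruby of what those three nodes would hand back for “friends who like Neo4j”. It does not use Treetop at all: `fragments_for` is a hypothetical stand-in that recognizes only this one phrase shape with a regular expression, so treat it as an illustration of the data rather than as the parser itself.

```ruby
# Hypothetical stand-in for the Treetop parser (not part of the original code).
# It recognizes only "friends who like <thing>" and returns the same hashes
# that the Friends, Likes and Thing nodes produce in their to_cypher methods.
def fragments_for(phrase)
  match = /\Afriends who like ([a-zA-Z0-9]+)\z/.match(phrase)
  return [] unless match

  [
    # Friends: start from the current user and follow the friends relationship
    { :start  => "me = node({me})",
      :match  => "me -[:friends]-> people",
      :return => "people",
      :params => { "me" => nil } },
    # Likes: those people must like the thing
    { :match => "people -[:likes]-> thing" },
    # Thing: look the thing up by name in the "things" index
    { :start  => "thing = node:things({thing})",
      :params => { "thing" => "name: " + match[1] } }
  ]
end

fragments = fragments_for("friends who like Neo4j")
fragments.each { |fragment| puts fragment.inspect }
```

Note that the "me" parameter is still nil at this point, just as in the Friends class; it has to be filled in with the current user before the query can run.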

Then these hashes are combined and turned into a proper Cypher string:

class Expression < Treetop::Runtime::SyntaxNode
  def to_cypher
    # The first child returns the combined fragments: each key holds an array
    cypher_hash =  self.elements[0].to_cypher
    cypher_string = ""
    # START is always present; duplicate clauses are dropped before joining
    cypher_string << "START "   + cypher_hash[:start].uniq.join(", ")
    # MATCH is optional
    cypher_string << " MATCH "  + cypher_hash[:match].uniq.join(", ") unless cypher_hash[:match].empty?
    cypher_string << " RETURN DISTINCT " + cypher_hash[:return].uniq.join(", ")
    # Collapse the array of single-entry param hashes into one hash
    params = cypher_hash[:params].empty? ? {} : cypher_hash[:params].uniq.inject {|a,h| a.merge(h)}
    return [cypher_string, params].compact
  end
end
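The step that merges the per-node hashes into one hash of arrays is not shown above, so here is one way it could look. This is a sketch in plain Ruby without Treetop; `merge_fragments` and `build_cypher` are hypothetical helper names, and the input fragments are copied from the Friends, Likes and Thing classes for the phrase “friends who like Neo4j”.

```ruby
# Hypothetical helper: collect every fragment's values under its key,
# giving a hash of arrays (the shape that Expression#to_cypher reads from).
def merge_fragments(fragments)
  merged = { :start => [], :match => [], :return => [], :params => [] }
  fragments.each do |fragment|
    fragment.each { |key, value| merged[key] << value }
  end
  merged
end

# Hypothetical helper mirroring the string building in Expression#to_cypher.
def build_cypher(merged)
  cypher = "START " + merged[:start].uniq.join(", ")
  cypher << " MATCH " + merged[:match].uniq.join(", ") unless merged[:match].empty?
  cypher << " RETURN DISTINCT " + merged[:return].uniq.join(", ")
  # Fold the array of small param hashes into a single hash
  params = merged[:params].inject({}) { |all, h| all.merge(h) }
  [cypher, params]
end

# Fragments as produced by Friends, Likes and Thing for "friends who like Neo4j"
fragments = [
  { :start  => "me = node({me})",
    :match  => "me -[:friends]-> people",
    :return => "people",
    :params => { "me" => nil } },
  { :match => "people -[:likes]-> thing" },
  { :start  => "thing = node:things({thing})",
    :params => { "thing" => "name: Neo4j" } }
]

cypher, params = build_cypher(merge_fragments(fragments))
puts cypher
puts params.inspect
```

Keeping each clause in an array until the very end is what lets rules such as “likes this and that” contribute several START and MATCH pieces, with `uniq` removing any that repeat.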

Finally, I built a Sinatra web application that imports your data from Facebook and provides a search page so you can try this out for yourself. As always, the code is available on GitHub, and the application is hosted on Heroku.

While reproducing a “kinda” Facebook Graph Search is interesting, what would be more interesting is seeing other people use this idea on their own data. If you would like to know more about this proof of concept, contact me or come to the Neo4j Meetups in Virginia (Feb 26th) or in Boston (Feb 28th) or in Chicago (TBD) and somewhere near you.


Published at DZone with permission of Max De Marzi, author and DZone MVB.
