
Max De Marzi is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies. He can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby Gem, a REST API wrapper for the Neo4j Graph Database. He is addicted to learning new things, loves a challenge and enjoys finding pragmatic solutions. Max is very easy to work with, focuses under pressure and has the patience of a rock. Max is a DZone MVB, is not an employee of DZone, and has posted 59 posts at DZone.

Knowledge Bases in Neo4j

05.15.2013



From the second we are born we are collecting a wealth of knowledge about the world. This knowledge is accumulated and interrelated inside our brains and it represents what we know. If we could export this knowledge and give it to a computer, it would look like ConceptNet. ConceptNet is a semantic network that…

…is built from nodes representing concepts, in the form of words or short phrases of natural language, and labeled relationships between them. These are the kinds of things computers need to know to search for information better, answer questions, and understand people’s goals.


I wrote a little Ruby script to import ConceptNet5 into Neo4j and it gives us a nice graph (243MB) to work with. ConceptNet5 as presented in CSV files is actually a hypergraph, where each assertion also carries the sources that give a reason for it:

/a/[/r/NotHasProperty/,/c/en/old_map/,/c/en/very_accurate/]	/r/NotHasProperty	/c/en/old_map	/c/en/very_accurate	/ctx/all	-1	/s/activity/omcs/vote,/s/contributor/omcs/PJ	/e/e529e3a070783cbbe212bc5e721b6938c0a6df6b	/d/conceptnet/4/en	[[Old maps]] are not [[very accurate]]
/a/[/r/NotHasProperty/,/c/en/old_map/,/c/en/very_accurate/]	/r/NotHasProperty	/c/en/old_map	/c/en/very_accurate	/ctx/all	-1	/s/activity/omcs/vote,/s/contributor/omcs/aghanford	/e/a8ecaed55f5ffba88b6d02da99ecf3fe42bffe55	/d/conceptnet/4/en	[[Old maps]] are not [[very accurate]]

Here two contributors let us know that old maps are not very accurate. That’s great to know, but we don’t really need to represent this twice in our graph. So instead we detect and skip duplicate relationships by using a Bloom filter to check whether we have already seen them.

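require 'bloomfilter-rb'

# Bloom filter that remembers every relationship key seen so far.
@edge_bf = BloomFilter::Native.new(:size => 212000000, :hashes => 23, :bucket => 8, :raise => false)

# Returns true the first time a (from, to, rel) triple is seen, false afterwards.
def is_unique_rel(from,to,rel)
  key = "#{from}-#{to}-#{rel}"
  return false if @edge_bf.include?(key)
  @edge_bf.insert(key)
  true
end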
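
To see where that check fits, here is a minimal sketch of the parse-and-deduplicate step. It is not the actual import script: the column names in FIELDS are my own labels guessed from the sample rows, the fields are assumed to be tab-separated, and a plain Ruby Set stands in for the Bloom filter so the example runs without the gem.

```ruby
require 'set'

# Column order assumed from the sample rows; the names are illustrative only.
FIELDS = %i[uri rel start finish context weight sources id dataset text].freeze

# Split one tab-separated line into a hash keyed by the assumed field names.
def parse_row(line)
  FIELDS.zip(line.chomp.split("\t")).to_h
end

# Keep the first row for each (start, end, relation) triple and drop repeats.
# A Set is exact but holds every key in memory; a Bloom filter trades a small
# false-positive rate for a fixed memory footprint.
def unique_rows(lines)
  seen = Set.new
  lines.map { |line| parse_row(line) }
       .select { |row| seen.add?("#{row[:start]}-#{row[:finish]}-#{row[:rel]}") }
end

# The two sample assertions, which differ only in their sources.
common = ['/a/[/r/NotHasProperty/,/c/en/old_map/,/c/en/very_accurate/]',
          '/r/NotHasProperty', '/c/en/old_map', '/c/en/very_accurate',
          '/ctx/all', '-1']
lines = [
  (common + ['/s/activity/omcs/vote,/s/contributor/omcs/PJ']).join("\t"),
  (common + ['/s/activity/omcs/vote,/s/contributor/omcs/aghanford']).join("\t")
]

puts unique_rows(lines).length
```

Both rows share the same start, end and relation, so only the first one survives.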

Once all is said and done, we end up with about 2.5 million nodes and 7.5 million relationships.
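
Those counts give a rough way to sanity-check the Bloom filter sizing. A Bloom filter can wrongly report a new key as already seen, which here would mean skipping a relationship that is actually unique. The sketch below uses the standard textbook approximation for that false-positive rate. It assumes that :size is the number of slots and that the roughly 7.5 million relationships are the distinct keys inserted; neither is confirmed against the gem's internals.

```ruby
# Approximate false-positive probability of a Bloom filter with
# m slots and k hash functions after n distinct keys have been inserted:
#   p ~= (1 - e^(-k * n / m))^k
def bloom_false_positive_rate(m, k, n)
  (1.0 - Math.exp(-k.to_f * n / m))**k
end

# m and k come from the filter settings, n from the relationship count.
rate = bloom_false_positive_rate(212_000_000, 23, 7_500_000)
puts format('%.2e', rate)
```

Under these assumptions the rate comes out on the order of one in a million, so only a handful of the 7.5 million relationships would be expected to be dropped by mistake.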


For example, let’s see everything ConceptNet5 knows about Sushi:


START sushi=node:Concepts(id="/c/en/sushi")
MATCH sushi-[r]-other_concepts
RETURN TYPE(r), other_concepts.id

We imported all of the concepts into a “Concepts” index to make the graph easy to work with.
Here we are asking for all other concepts connected to the sushi concept, and asking the graph to tell us what type of relationship exists between them.
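
The pattern has no arrow on the relationship, so it matches edges pointing into the concept as well as edges pointing out of it. As a rough illustration of that behaviour (not of how Neo4j executes the query), here is a tiny in-memory model. The EDGES list and the neighbours helper are hypothetical, and the edge directions are assumed for the sake of the example.

```ruby
# Each edge is [from, type, to]. Directions are assumed for illustration.
EDGES = [
  ['/c/en/sushi', 'MadeOf', '/c/en/raw_fish'],
  ['/c/en/sushi', 'IsA', '/c/en/japanese_food'],
  ['/c/en/sushi', 'HasProperty', '/c/en/delicious'],
  ['/c/en/marmite', 'HasProperty', '/c/en/delicious']
].freeze

# Mimics MATCH concept-[r]-other: with no arrow, an edge matches whether
# the concept sits at its start or at its end.
def neighbours(edges, id)
  edges.each_with_object([]) do |(from, type, to), found|
    found << [type, to] if from == id
    found << [type, from] if to == id
  end
end

neighbours(EDGES, '/c/en/delicious').each do |type, other|
  puts "#{type} #{other}"
end
```

Looking up "/c/en/delicious" returns sushi and marmite even though, in this toy data, both edges point towards it.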

==> +--------------------------------------------------------+
==> | TYPE(r)           | other_concepts.id                  |
==> +--------------------------------------------------------+
==> | "MadeOf"          | "/c/en/raw_fish"                   |
==> | "MotivatedByGoal" | "/c/en/eat_in_restaurant"          |
==> | "AtLocation"      | "/c/en/japanese_restaurant"        |
==> | "HasProperty"     | "/c/en/delicious"                  |
==> | "HasProperty"     | "/c/en/japanese_in_origin"         |
==> | "IsA"             | "/c/en/asian_food"                 |
==> | "IsA"             | "/c/en/from_japan"                 |
==> | "IsA"             | "/c/en/japanese_food"              |
==> | "IsA"             | "/c/en/food"                       |
==> | "IsA"             | "/c/en/fish"                       |
==> | "NotIsA"          | "/c/en/raw_fish"                   |
==> | "CapableOf"       | "/c/en/consist_mainly_of_raw_fish" |
==> | "ReceivesAction"  | "/c/en/eat_by_many_westerner"      |
==> +--------------------------------------------------------+

The results are quite interesting. Our graph knows sushi is made of raw fish and eaten in a restaurant, specifically a Japanese restaurant (hard to find sushi at an Italian or Indian restaurant). The graph thinks sushi is delicious (I would agree, but some folks would violently disagree). Notice also that it has a “NotIsA” link to raw_fish and a “CapableOf” link to consist_mainly_of_raw_fish, so our graph is smart enough to know that some sushi is not raw.

If you ever happen to stop by the Neo4j office in San Mateo, CA, you’ll want to go to Sushi Sam’s for the best sushi in San Mateo. Let’s see what else the graph thinks is delicious:

START delicious=node:Concepts(id="/c/en/delicious")
MATCH delicious-[r]-other_concepts
RETURN TYPE(r), other_concepts.id

==> +------------------------------------+
==> | TYPE(r)       | other_concepts.id  |
==> +------------------------------------+
==> | "IsA"         | "/c/en/single"     |
==> | "NotIsA"      | "/c/en/nutricious" |
==> | "HasProperty" | "/c/en/ice_cream"  |
==> | "HasProperty" | "/c/en/atangerine" |
==> | "HasProperty" | "/c/en/banana"     |
==> | "HasProperty" | "/c/en/chicken"    |
==> | "HasProperty" | "/c/en/chocolate"  |
==> | "HasProperty" | "/c/en/beef"       |
==> | "HasProperty" | "/c/en/fruit"      |
==> | "HasProperty" | "/c/en/butter"     |
==> | "HasProperty" | "/c/en/meat"       |
==> | "HasProperty" | "/c/en/cake"       |
==> | "HasProperty" | "/c/en/sushi"      |
==> | "HasProperty" | "/c/en/marmite"    |
==> | "HasProperty" | "/c/en/cheese"     |
==> | "HasProperty" | "/c/en/tortilla"   |
==> +------------------------------------+

According to the graph, delicious is not “nutricious” (they probably meant nutritious). I agree with most other things on here… but marmite? Seriously.


If you want to tackle something a bit bigger, you can look at the YAGO Knowledge Base, which has 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities.



Published at DZone with permission of Max De Marzi, author and DZone MVB. (source)
