
Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j, building sophisticated solutions to challenging data problems. When he's not with customers, Mark is a developer on Neo4j and writes about his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB, is not an employee of DZone, and has posted 544 posts at DZone.

Neo4j: Make Properties Relationships

03.07.2013

I spent some of the weekend working my way through Jim, Ian & Emil’s book ‘Graph Databases’, and one of the things they emphasise is that graphs allow us to make relationships first-class citizens in our model.

Looking back on a couple of the graphs that I modelled last year, I realise that I didn’t quite get this: although those graphs had some relationships, a lot of the time I was defining things as properties on nodes.

While it’s fine to do this, I think we lose some of the power of a graph, and it’s not necessarily obvious what we’ve lost until we model a property as a relationship and see what possibilities open up.

For example, in my football graph I wanted to record the date of matches. I initially stored this as a property on the match, before realising that modelling it as a relationship might open up some interesting queries.
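For comparison, here is a rough sketch of what a month-based question looks like when the date lives only in a property. It assumes game.date is stored as a string in the "YYYY-MM-DD hh:mm:ss +zzzz" shape, and it reuses the players index and played_in relationship from the rest of this post; the regular expression is my own illustration rather than something from the original data set.

```cypher
// Sketch: find a player's September matches using only the date property.
// Assumes game.date is a string such as "2012-09-16 16:00:00 +0100".
START player = node:players('name:"Gareth Bale"')
MATCH player-[:played_in]-game
// The month has to be picked out of the date string on every candidate game
WHERE game.date =~ "[0-9]{4}-09-.*"
RETURN game.name, game.date
```

This works, but the month is buried inside a string, so there is no month node to start from or to traverse through.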

I created this relationship between a match and the month that the match took place in:

MATCH-[:in_month]->MONTH
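One way such a relationship could be added with Cypher is sketched below. This is an assumption about how you might do it, not a record of how the data set was actually loaded: the games index is a hypothetical name, while the months index is the one used in the queries in this post.

```cypher
// Sketch: link one match to its month, creating the relationship only if it is missing.
// 'games' is a hypothetical index name; 'months' is the index queried elsewhere in the post.
START game = node:games('name:"Reading vs Tottenham Hotspur"'),
      month = node:months('name:September')
CREATE UNIQUE game-[:in_month]->month
RETURN game.name, month.name
```

CREATE UNIQUE only creates the relationship when it doesn’t already exist, so running the statement again should not add a duplicate.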

As a result of having this relationship, I can now easily find out which matches Gareth Bale played in during September, for example:

START player = node:players('name:"Gareth Bale"'), month=node:months('name:September')
MATCH player-[:played_in]-game
WHERE game-[:in_month]-month
RETURN game.name, game.home_goals + "-" +game.away_goals AS score, game.date
+----------------------------------------------------------------------------------+
| game.name                                  | score | game.date                   |
+----------------------------------------------------------------------------------+
| "Reading vs Tottenham Hotspur"             | "1-3" | "2012-09-16 16:00:00 +0100" |
| "Tottenham Hotspur vs Norwich City"        | "1-1" | "2012-09-01 15:00:00 +0100" |
| "Tottenham Hotspur vs Queens Park Rangers" | "2-1" | "2012-09-23 16:00:00 +0100" |
| "Manchester United vs Tottenham Hotspur"   | "2-3" | "2012-09-29 17:30:00 +0100" |
+----------------------------------------------------------------------------------+

Or we could find all the matches in December where one of the teams won by more than 2 goals:

START month=node:months('name:December')
MATCH month-[:in_month]-game
WHERE ABS(game.home_goals - game.away_goals) > 2
RETURN game.name, game.home_goals + "-" +game.away_goals AS score, game.date
+----------------------------------------------------------------------------+
| game.name                            | score | game.date                   |
+----------------------------------------------------------------------------+
| "Sunderland vs Reading"              | "3-0" | "2012-12-11 19:45:00 +0000" |
| "Reading vs Arsenal"                 | "2-5" | "2012-12-17 20:00:00 +0000" |
| "Newcastle United vs Wigan Athletic" | "3-0" | "2012-12-03 20:00:00 +0000" |
| "Fulham vs Tottenham Hotspur"        | "0-3" | "2012-12-01 15:00:00 +0000" |
| "Liverpool vs Fulham"                | "4-0" | "2012-12-22 17:30:00 +0000" |
| "Chelsea vs Aston Villa"             | "8-0" | "2012-12-23 16:00:00 +0000" |
| "Aston Villa vs Tottenham Hotspur"   | "0-4" | "2012-12-26 17:30:00 +0000" |
| "Aston Villa vs Wigan Athletic"      | "0-3" | "2012-12-29 15:00:00 +0000" |
| "Arsenal vs Newcastle United"        | "7-3" | "2012-12-29 17:30:00 +0000" |
| "Queens Park Rangers vs Liverpool"   | "0-3" | "2012-12-30 16:00:00 +0000" |
+----------------------------------------------------------------------------+

There are certainly other things that we can find out now that the relationship between matches and months is explicit, but it’s not only dates where this idea comes in useful.
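As one sketch of those other things, the query below starts from every month and aggregates over the matches attached to it. It uses only the months index, the in_month relationship and the home_goals/away_goals properties already shown above, and it assumes those goal properties are numeric, as the ABS query suggests.

```cypher
// Sketch: number of matches and total goals per month.
START month = node:months('name:*')
MATCH month-[:in_month]-game
RETURN month.name,
       COUNT(game) AS matches,
       SUM(game.home_goals + game.away_goals) AS goals
ORDER BY goals DESC
```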

I already had players modelled in the data set but I thought it’d be interesting to find out more about the data set based on where players came from.

I therefore added the following relationships:

PLAYER-[:comes_from]->COUNTRY-[:is_in]->CONTINENT

We can now find, for example, the top scorers in the Premiership who come from South America (accurate up to, but not including, last weekend’s matches):

START continent = node:continents('name:"South America"')
MATCH continent-[:is_in]-country-[:comes_from]-player-[:played|subbed_on]-stats-[:in]-game
WHERE player-[:scored_in]-game
RETURN player.name, country.name, player.team, SUM(stats.goals) AS goals
ORDER BY goals DESC
LIMIT 5
+--------------------------------------------------------------+
| player.name       | country.name | player.team       | goals |
+--------------------------------------------------------------+
| "Luis Suárez"     | "Uruguay"    | "Liverpool"       | 18    |
| "Sergio Agüero"   | "Argentina"  | "Manchester City" | 9     |
| "Carlos Tevez"    | "Argentina"  | "Manchester City" | 8     |
| "Franco Di Santo" | "Argentina"  | "Wigan Athletic"  | 5     |
| "Ramires"         | "Brazil"     | "Chelsea"         | 4     |
+--------------------------------------------------------------+

Or we could find out how many goals have been scored by players from each continent:

START continent = node:continents('name:*')
MATCH continent-[:is_in]-country-[:comes_from]-player-[:played|subbed_on]-stats-[:in]-game
WHERE player-[:scored_in]-game
RETURN continent.name, SUM(stats.goals) AS goals
ORDER BY goals DESC
+-------------------------+
| continent.name  | goals |
+-------------------------+
| "Europe"        | 569   |
| "Africa"        | 73    |
| "South America" | 62    |
| "North America" | 22    |
| "Asia"          | 3     |
| "Oceania"       | 3     |
+-------------------------+

I don’t think every property needs to be a relationship, but it’s certainly useful to consider it, because doing so can suggest interesting queries that you may not have thought of before.

As an aside, I’m working on putting this data set somewhere so that people can play around with Cypher queries on it. If you’d be interested, let me know.

Published at DZone with permission of Mark Needham, author and DZone MVB. (source)
