NoSQL Zone is brought to you in partnership with:

Andreas Kollegger is a leading speaker and writer on graph databases and Neo4j and the bridge between community and developer efforts. He works actively in the community, speaking around the world and promoting the larger Neo4j ecosystem of projects. Author of Fair Trade Software, and the lead for Neo4j in the cloud, Andreas plays a valuable role for progressive happenings within Neo4j. Andreas is a DZone MVB and is not an employee of DZone and has posted 70 posts at DZone. You can read more from them at their website. View Full User Profile

Using an In-Graph-Alcohol-Percentage-Index

05.20.2013
| 2039 views |
  • submit to reddit

Curator's Note: The content of this article was written by Rik Van Bruggen over at the Neo4j blog. 

What happened before

As  you may remember, I created a little beer graph some time ago to experiment and have fun with beer, and graphs. And yes, I have been having LOTS of fun with it - using it to explain graph concepts to lots of not-so-technical folks, like myself. Many people liked it, and even more people had some questions about it - started thinking in graphs, basically. Which is way more than what I ever hoped for - so that's great!

One of the questions that people always asked me was about the model. Why did I model things the way I did? Are there no other ways to model this domain? What would be the *best* way to model it? All of these questions have somewhat vague answers, because as a rule, there is no *one way* to model a graph. The data does not determine the model - it's the QUERY that will drive the modelling decisions.

One of the things that spurred the discussion was - probably not coincidentally - the AlcoholPercentage. Many people were expecting that to be a *property* of the Beerbrand - but instead in my beergraph, I had "pulled it out". The main reason at the time was more coincidence than anything else, but when you think of it - it's actually a fantastic thing to "pull things out" and normalise the data model much further than you probably would in a relational model. By making the alcoholpercentage a node of its own, it allowed me to do more interesting queries and pathfinding operations - which led to interesting beer recommendations. Which is what this is all about, right?

Taking the AlcholPercentage to the next level

So in my new version of my beergraph, I have done something different. I used the example of Peter to create an in-graph index of AlcoholPercentages - a bit like the picture of the new model that you see here.
Essentially what I am doing is I am connecting all the alcohol-percentages into a chain of alcholpercentages - using the [:PRECEDES] relationship. In Cypher-style ascii-art that would be something like... -(alcperc-0.2)-[:PRECEDES]->(alcperc-0.1)-[:PRECEDES]->(alcperc)-[:PRECEDES]->(alcperc+0.1)-[:PRECEDES]->(alcperc+0.2)- ...To do this, I of course did have to modify my beer-spreadsheet a little bit. You can find the new version over here. But from the screenshot below you can see that all I did was create another tab that had all the alcoholpercentages and that "PRECEDES" relationship between them. Easy peasy.Nice. So what? The resulting dataset is very similar to what we had before - it's just a little bit richer. You immediately notice it as you start "walking" the graph on the WebUI: the links to the AlcoholPercentage-chain gives me a new and interesting way to explore the graph.But what else what can we do with this? Well, querying it is the obvious answer. Let me give you a couple of examples:
  • how can I find beers that have the same beertype and a "same or similar" alcoholprecentage (let's say + or - 1%) as a beer that I really like (Orval). That's now become very easy:

start    orval=node:node_auto_index(name="Orval")match   orval-[:IS_A]-beertype,   orval-[:HAS_ALCOHOL_PERCENTAGE]-alcperc,   alcperc-[:PRECEDES*0..10]-otheralcperc,   otherbeer-[:HAS_ALCOHOL_PERCENTAGE]-otheralcperc,   otherbeer-[:IS_A]-beertype,   otherbeer-[:BREWS]-otherbreweryreturnotherbeer.name, beertype.name, otherbrewery.name;Or another example:
  • how can I find other beers from the same brewery that have a similar AlcoholPercentage as a beer that I also like (Duvel)
start    duvel=node:node_auto_index(name="Duvel")match   duvel-[:BREWS]-brewery,   duvel-[:IS_A]-beertype,   duvel-[:HAS_ALCOHOL_PERCENTAGE]-alcperc,   alcperc-[:PRECEDES*1..10]-otheralcperc,   otherbeer-[:HAS_ALCOHOL_PERCENTAGE]-otheralcperc,   otherbeer-[:IS_A]-otherbeertype,   otherbeer-[:BREWS]-breweryreturn   otherbeer.name, otherbeertype.name, brewery.name,     otheralcperc.nameorder by    otherbeer.name;

Both of the queries above gave me some new, interesting insights that I did not know before, allowing me to discover even more and nicer Belgian beers. But what's important is of course that these in-graph indexes are fantastically interesting. By "pulling the data out", normalising even further, and then indexing the normalised data as a subgraph of it's own, we can much more easily derive new and interesting insights. And that, my dear friends, is what graphs are all about :) ...

Hope this was useful. If you like this post and want to discuss more about graphs and beer, please come to our Graph Café in June in Antwerp or Amsterdam - or at a pub near you?

Published at DZone with permission of Andreas Kollegger, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)