
Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j, building sophisticated solutions to challenging data problems. When he's not with customers, Mark is a developer on Neo4j and writes about his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB, is not an employee of DZone, and has posted 522 posts at DZone.

Model to Answer Your Questions Rather Than Modelling Reality

08.26.2013

On the recommendation of Ian Robinson I’ve been reading the 2nd edition of William Kent’s ‘Data and Reality’, and the author makes an interesting observation at the end of the first chapter which resonated with me:

Once more: we are not modelling reality, but the way information about reality is processed, by people.


It reminds me of similar advice in Eric Evans’ Domain-Driven Design, and it’s advice which I believe is helpful when designing a model in a graph database.

Last year I wrote a post explaining how I’d been using an approach of defining the questions I wanted to ask before modelling my data, and in Neo4j land we can do this by writing Cypher queries up front.
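A minimal sketch of writing the questions first, assuming a toy in-memory graph in Python rather than a real Neo4j database. The people, companies, the `WORKS_AT` relationship type and the `colleagues_of` helper are all hypothetical. In Neo4j the same question would be a Cypher query along the lines of `MATCH (p {name: 'alice'})-[:WORKS_AT]->(c)<-[:WORKS_AT]-(other) RETURN other`; here it is a plain function checked against a few hand-made facts before any more of the model is drawn.

```python
from collections import defaultdict

# Hypothetical toy graph: relationships stored as (start, type, end) triples.
edges = [
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
    ("carol", "WORKS_AT", "globex"),
]

def outgoing(edges, rel_type):
    """Index relationships of one type by their start node."""
    index = defaultdict(set)
    for start, kind, end in edges:
        if kind == rel_type:
            index[start].add(end)
    return index

def colleagues_of(edges, person):
    """Hypothetical question: who works at the same company as `person`?"""
    works_at = outgoing(edges, "WORKS_AT")
    companies = works_at.get(person, set())
    return {other for other, theirs in works_at.items()
            if other != person and theirs & companies}

# Writing the expected answers down first tells us whether the model can answer them.
assert colleagues_of(edges, "alice") == {"bob"}
assert colleagues_of(edges, "carol") == set()
```

If an assertion like these can’t be written against the model, that is an early sign the model is missing something the question needs.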

We can then experiment with growing our data set in different ways to check that our queries still perform well, and tweak our model if necessary.
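A rough sketch of that check, again assuming a hypothetical in-memory graph in Python rather than Neo4j: `make_graph` generates synthetic `WORKS_AT` data at several sizes (the sizes, company count and random seed are arbitrary assumptions), and the same question is timed at each size using the best of a few runs.

```python
import random
import time
from collections import defaultdict

def make_graph(num_people, num_companies, seed=42):
    """Hypothetical synthetic data: each person WORKS_AT one random company."""
    rng = random.Random(seed)
    return [(f"person{i}", "WORKS_AT", f"company{rng.randrange(num_companies)}")
            for i in range(num_people)]

def colleagues_of(edges, person):
    """Who works at the same company as `person`? Scans every relationship."""
    works_at = defaultdict(set)
    for start, kind, end in edges:
        if kind == "WORKS_AT":
            works_at[start].add(end)
    companies = works_at.get(person, set())
    return {other for other, theirs in works_at.items()
            if other != person and theirs & companies}

def timed(fn, *args):
    """Wall-clock seconds for a single call."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

def time_query(sizes, num_companies=100, repeats=3):
    """Return (size, best-of-N seconds) for the question at each data set size."""
    results = []
    for size in sizes:
        edges = make_graph(size, num_companies)
        best = min(timed(colleagues_of, edges, "person0") for _ in range(repeats))
        results.append((size, best))
    return results

for size, seconds in time_query([1_000, 10_000, 100_000]):
    print(f"{size:>7} people: {seconds * 1000:.2f} ms")
```

Because this version of the question scans every relationship, its cost grows with the whole data set rather than with the size of the answer, which is exactly the kind of signal that suggests tweaking the model.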

For example, one simple optimisation would be to run an offline query that makes implicit relationships explicit.
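A minimal sketch of that optimisation, assuming the same kind of hypothetical in-memory graph. Here "works at the same company" is the implicit relationship, and an offline job writes it out as explicit `COLLEAGUE_OF` edges (a hypothetical relationship type) so the online question becomes a single lookup. In Neo4j this would more likely be a batch Cypher statement that `MATCH`es the two-hop pattern and `MERGE`s a direct relationship.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy graph: relationships stored as (start, type, end) triples.
edges = [
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
    ("dave", "WORKS_AT", "acme"),
    ("carol", "WORKS_AT", "globex"),
]

def materialize_colleagues(edges):
    """Offline job: turn the implicit 'same employer' link into explicit edges."""
    employees = defaultdict(set)
    for start, kind, end in edges:
        if kind == "WORKS_AT":
            employees[end].add(start)
    explicit = set()
    for people in employees.values():
        # One edge in each direction for every pair of people at the same company.
        for a, b in combinations(sorted(people), 2):
            explicit.add((a, "COLLEAGUE_OF", b))
            explicit.add((b, "COLLEAGUE_OF", a))
    return sorted(explicit)

def build_index(edges, rel_type):
    """Adjacency index so the online question is a single dictionary lookup."""
    index = defaultdict(set)
    for start, kind, end in edges:
        if kind == rel_type:
            index[start].add(end)
    return index

colleague_index = build_index(materialize_colleagues(edges), "COLLEAGUE_OF")
print(sorted(colleague_index["alice"]))  # ['bob', 'dave']
```

The trade-off is that the explicit edges have to be refreshed when employment changes, which is why running the job offline rather than at query time is attractive.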

Although graphs are very whiteboard-friendly and it can be tempting to design our whole model before writing any queries, this often causes problems later on.

When we eventually get to asking questions of our data we may find that we’ve modelled some things unnecessarily or have designed the model in a way that leads to inefficient queries.

I’ve found an effective approach is to keep the feedback loop tight by minimising the amount of time between drawing parts of our model on a whiteboard and writing queries against it.

If you’re interested in learning more, Ian has a slide deck from a talk he gave at JAX 2013 which covers this idea and others relevant to building graph database applications.

Published at DZone with permission of Mark Needham, author and DZone MVB.
