Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j, building sophisticated solutions to challenging data problems. When he's not with customers, Mark is a developer on Neo4j and writes about his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB and is not an employee of DZone and has posted 523 posts at DZone.

Neo4j/Cypher: Keep Longest Path When Finding Taxonomy

05.21.2013

I’ve been playing around with modelling a product taxonomy, and one thing I wanted to do was find the full path from the root of the tree down to the category a product sits under.

I created a simple data set to show the problem:

CREATE (cat { name: "Cat" })
CREATE (subcat1 { name: "SubCat1" })
CREATE (subcat2 { name: "SubCat2" })
CREATE (subsubcat1 { name: "SubSubCat1" })
CREATE (product1 { name: "Product1" })
CREATE (cat)-[:CHILD]->(subcat1)-[:CHILD]->(subsubcat1)
CREATE (product1)-[:HAS_CATEGORY]->(subsubcat1)

I wanted to write a query that would return ‘product1’ and the tree ‘Cat -> SubCat1 -> SubSubCat1’, and initially wrote the following:

START product=node:node_auto_index(name="Product1") 
MATCH product-[:HAS_CATEGORY]-category, taxonomy=category<-[:CHILD*1..]-parent 
RETURN product, EXTRACT(n IN NODES(taxonomy): n.name)

which returns:

==> +--------------------------------------------------------------------+
==> | product                    | EXTRACT(n IN NODES(taxonomy): n.name) |
==> +--------------------------------------------------------------------+
==> | Node[888]{name:"Product1"} | ["SubSubCat1","SubCat1"]              |
==> | Node[888]{name:"Product1"} | ["SubSubCat1","SubCat1","Cat"]        |
==> +--------------------------------------------------------------------+
==> 2 rows
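
The two rows come from the variable-length pattern [:CHILD*1..], which matches once for every ancestor of the category. As a rough illustration in the same Cypher dialect as the query above, returning the end node and LENGTH(taxonomy) for each match makes that visible (the hops alias is my own addition, not part of the original query):

```cypher
START product=node:node_auto_index(name="Product1")
MATCH product-[:HAS_CATEGORY]-category, taxonomy=category<-[:CHILD*1..]-parent
// one row per matched path: the ancestor it ends at and its length in relationships
RETURN parent.name, LENGTH(taxonomy) AS hops
```

With the data set above this should give one row ending at SubCat1 (1 hop) and one ending at Cat (2 hops).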

I didn’t want to return the first row, since it isn’t the full tree. Andres suggested keeping only the paths whose last node has no incoming CHILD relationship, i.e. the paths that end at the root of the taxonomy:

START product=node:node_auto_index(name="Product1") 
MATCH product-[:HAS_CATEGORY]-category, 
      taxonomy=category<-[:CHILD*1..]-parent 
WHERE NOT parent<-[:CHILD]-() 
RETURN product, EXTRACT(n IN NODES(taxonomy): n.name)

==> +--------------------------------------------------------------------+
==> | product                    | EXTRACT(n IN NODES(taxonomy): n.name) |
==> +--------------------------------------------------------------------+
==> | Node[888]{name:"Product1"} | ["SubSubCat1","SubCat1","Cat"]        |
==> +--------------------------------------------------------------------+
==> 1 row
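
Another way to keep only the longest path, sketched here on the assumption that each product hangs off a single chain of categories, is to sort the matches by path length and keep the first. This is not the approach Andres suggested, just an alternative in the same dialect:

```cypher
START product=node:node_auto_index(name="Product1")
MATCH product-[:HAS_CATEGORY]-category, taxonomy=category<-[:CHILD*1..]-parent
RETURN product, EXTRACT(n IN NODES(taxonomy): n.name)
// longest path first, then discard the rest
ORDER BY LENGTH(taxonomy) DESC
LIMIT 1
```

Note that LIMIT 1 applies to the whole result, so this only makes sense when the query starts from a single product; the WHERE NOT filter has no such restriction.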

If we want to reverse the taxonomy so it reads from the root down, we can follow Wes Freeman’s advice from a Stack Overflow thread and use REDUCE to prepend each name to an accumulator:

START product=node:node_auto_index(name="Product1") 
MATCH product-[:HAS_CATEGORY]-category, taxonomy=category<-[:CHILD*1..]-parent 
WHERE NOT parent<-[:CHILD]-() 
RETURN product, 
       REDUCE(acc=[], cat IN EXTRACT(n IN NODES(taxonomy): n.name): cat + acc) AS taxonomy

==> +-------------------------------------------------------------+
==> | product                    | taxonomy                       |
==> +-------------------------------------------------------------+
==> | Node[888]{name:"Product1"} | ["Cat","SubCat1","SubSubCat1"] |
==> +-------------------------------------------------------------+
==> 1 row
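
The queries in this post use the Cypher syntax that was current when it was written. As a sketch only, assuming a more recent Neo4j release where a list comprehension replaces EXTRACT and reverse() accepts a list, the same result might be written as below. Looking the product up by its name property instead of node_auto_index, and the names alias, are my assumptions:

```cypher
MATCH (product {name: "Product1"})-[:HAS_CATEGORY]-(category),
      taxonomy = (category)<-[:CHILD*1..]-(parent)
// keep only paths that end at a root, i.e. a node with no incoming CHILD
WHERE NOT (parent)<-[:CHILD]-()
// collect the names along the path, then flip them to read root-first
RETURN product, reverse([n IN nodes(taxonomy) | n.name]) AS names
```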



Published at DZone with permission of Mark Needham, author and DZone MVB. (source)
