NoSQL Zone is brought to you in partnership with:

Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j building sophisticated solutions to challenging data problems. When he's not with customers Mark is a developer on Neo4j and writes his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB and is not an employee of DZone and has posted 492 posts at DZone. You can read more from them at their website. View Full User Profile

neo4j/cypher/Lucene: Dealing with special characters

06.18.2013
| 3376 views |
  • submit to reddit

neo4j uses Lucene to handle indexing of nodes and relationships in the graph but something that can be a bit confusing at first is how to handle special characters in Lucene queries.

For example let’s say we set up a database with the following data:

CREATE ({name: "-one"})
CREATE ({name: "-two"})
CREATE ({name: "-three"})
CREATE ({name: "four"})

And for whatever reason we only wanted to return the nodes that begin with a hyphen.

A hyphen is a special character in Lucene so if we forget to escape it we’ll end up with an impressive stack trace:

START p = node:node_auto_index("name:-*") RETURN p;
==> RuntimeException: org.apache.lucene.queryParser.ParseException: Cannot parse 'name:-*': Encountered " "-" "- "" at line 1, column 5.
==> Was expecting one of:
==>     <BAREOPER> ...
==>     "(" ...
==>     "*" ...
==>     <QUOTED> ...
==>     <TERM> ...
==>     <PREFIXTERM> ...
==>     <WILDTERM> ...
==>     "[" ...
==>     "{" ...
==>     <NUMBER> ...
==>

So we change our query to escape the hyphen:

START p = node:node_auto_index("name:\-*") RETURN p;

which results in the following exception:

==> SyntaxException: invalid escape sequence
==> 
==> Think we should have better error message here? Help us by sending this query to cypher@neo4j.org.
==> 
==> Thank you, the Neo4j Team.
==> 
==> "START p = node:node_auto_index("name:\-*") RETURN p"
==>

The problem is that the cypher parser also treats ‘\’ as an escape character so we need to use two of them to make our query do what we want:

START p = node:node_auto_index("name:\\-*") RETURN p;
==> +------------------------+
==> | p                      |
==> +------------------------+
==> | Node[4]{name:"-one"}   |
==> | Node[5]{name:"-two"}   |
==> | Node[6]{name:"-three"} |
==> +------------------------+
==> 3 rows

Alternatively, as Chris pointed out, we could make use of parameters in which case we don’t need to worry about how the cypher parser handles escaping:

require 'neography'
 
neo = Neography::Rest.new
query = "START p = node:node_auto_index({query}) RETURN p"
result = neo.execute_query(query, { :query => 'name:\-*'})
 
p result["data"].map { |x| x[0]["data"] }
$ bundle exec ruby params.rb
[{"name"=>"-one"}, {"name"=>"-two"}, {"name"=>"-three"}]



















Published at DZone with permission of Mark Needham, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)