
Romiko Derbynew is an independent consultant focusing on integration solutions. Romiko is passionate about future technologies relating to Cloud Computing, Graph Databases, Messaging and Integration. Romiko spends his spare time in the performing arts, surfing, paragliding and finding new adventures. Romiko strives to contribute to the community; check out rangerrom.com if you have kids who love animals, and get his latest book, "Ranger Rom saves Rhino Reth". Romiko is a DZone MVB and is not an employee of DZone and has posted 18 posts at DZone. You can read more from them at their website.

Lucene Full Text Indexing with Neo4j

03.23.2012

I spent some time working on full text search for Neo4j. The basic goals were as follows:

    • Control which node each index entry points to
    • Full-text search
    • All operations are done via REST
    • Can create an index when creating a node
    • Can update an index
    • Can check whether an index exists
    • Run index checks when bootstrapping Neo4j in the cloud
    • Query an index using the Lucene full-text query language
Download: This is based on Neo4jClient: http://nuget.org/List/Packages/Neo4jClient
Source code at: http://hg.readify.net/neo4jclient/

Introduction

With the above objectives in mind, I decided to go with manual indexing. The main reason is that I can add an index entry that points to node A based on values in node B.

Imagine the following.

You have node A with the names Surname, FirstName and MiddleName. However, node A also has a relationship to node B, which holds other names: perhaps display names, avatar names and AKAs.

With manual indexing, all of these name entries, from both node A and node B, can point to node A only.
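
A minimal sketch of the idea in plain C#, with no Neo4j calls: names from node A and from its related node B are collected into entries that all target node A's id. `PersonNode`, `NamesNode`, `ManualIndexSketch` and the `Name` key are hypothetical stand-ins, not Neo4jClient types.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical node shapes (not Neo4jClient types).
// Node A holds the person's own names.
public class PersonNode
{
    public long Id;
    public string Surname, FirstName, MiddleName;
}

// Node B, related to node A, holds display names, avatar names and AKAs.
public class NamesNode
{
    public List<string> OtherNames = new List<string>();
}

public static class ManualIndexSketch
{
    // Returns (key, value, target) entries for a "Name" key. Every entry
    // targets node A, even when the value came from node B.
    public static List<(string Key, string Value, long TargetNodeId)> EntriesFor(PersonNode a, NamesNode b)
    {
        return new[] { a.Surname, a.FirstName, a.MiddleName }
            .Concat(b.OtherNames)
            .Where(n => !string.IsNullOrEmpty(n))
            .Select(n => (Key: "Name", Value: n, TargetNodeId: a.Id))
            .ToList();
    }
}
```

The point of the design is that the target id never comes from node B, so a search on any of these names returns node A.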

In Fiddler, a REST call to the Neo4j server looks something like this:

[Screenshot: the index REST request as captured in Fiddler]

Notice the following:

Url: http://localhost:7474/db/data/index/node/{IndexName}/{Key}/{Value}

So, if we were adding three names for the SAME client from two different nodes, we would use the same IndexName and Key each time, with a different Value in the URL. The node pointer in the request body is then the address of the node the entry points to.
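
To illustrate the URL pattern, the sketch below builds one request per name for the same client. It assumes the pattern shown above and that the body is node A's address sent as a JSON string; `IndexEntryRequests` is a hypothetical helper, and percent-encoding each path segment is an addition in this sketch rather than something the client code below does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexEntryRequests
{
    // Builds {baseUrl}/index/node/{IndexName}/{Key}/{Value}. Each segment is
    // percent-encoded so a value with spaces or slashes stays one segment.
    public static string EntryUrl(string baseUrl, string indexName, string key, string value)
    {
        var segments = new[] { indexName, key, value }.Select(s => Uri.EscapeDataString(s));
        return baseUrl.TrimEnd('/') + "/index/node/" + string.Join("/", segments);
    }

    // Same IndexName and Key for every name; only the Value changes.
    // The body is always node A's address, serialized as a JSON string.
    public static List<(string Url, string Body)> RequestsFor(
        string baseUrl, string indexName, string key, long nodeAId, IEnumerable<string> values)
    {
        var nodeAddress = baseUrl.TrimEnd('/') + "/node/" + nodeAId;
        var body = "\"" + nodeAddress + "\"";
        return values.Select(v => (Url: EntryUrl(baseUrl, indexName, key, v), Body: body)).ToList();
    }
}
```

For example, `RequestsFor("http://localhost:7474/db/data", "clients", "Name", 42, new[] { "Bob", "Bobby", "Bobs" })` yields three POSTs that differ only in their last URL segment and share the body `"http://localhost:7474/db/data/node/42"`.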

Neo4jClient Nuget Package

I have updated Neo4jClient, which is on NuGet, to support:

  • Creating an exact or full-text index on its own, so that it simply exists
  • Creating exact or full-text indexes when creating a node; the node reference is calculated automatically
  • Updating an index
  • Deleting entries from an index

[Figure: Class diagram for the indexing solution in Neo4jClient]

RestSharp

The Neo4jClient package uses RestSharp, which makes all the index calls a trivial task. Let's have a look at some of the code inside the client to see how to consume the manual index API from .NET, and then in the next section we'll look at how to consume this code from another application.

public Dictionary<string, IndexMetaData> GetIndexes(IndexFor indexFor)
       {
           CheckRoot();

           string indexResource;
           switch (indexFor)
           {
               case IndexFor.Node:
                   indexResource = RootApiResponse.NodeIndex;
                   break;
               case IndexFor.Relationship:
                   indexResource = RootApiResponse.RelationshipIndex;
                   break;
               default:
                   throw new NotSupportedException(string.Format("GetIndexes does not support indexfor {0}", indexFor));
           }

           var request = new RestRequest(indexResource, Method.GET)
           {
               RequestFormat = DataFormat.Json,
               JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
           };

           var response =  client.Execute<Dictionary<string, IndexMetaData>>(request);

           if (response.StatusCode != HttpStatusCode.OK)
               throw new NotSupportedException(string.Format(
                   "Received an unexpected HTTP status when executing the request.\r\n\r\n\r\nThe response status was: {0} {1}",
                   (int)response.StatusCode,
                   response.StatusDescription));

           return response.Data;
       }

       public bool CheckIndexExists(string indexName, IndexFor indexFor)
       {
           CheckRoot();

           string indexResource;
           switch (indexFor)
           {
               case IndexFor.Node:
                   indexResource = RootApiResponse.NodeIndex;
                   break;
               case IndexFor.Relationship:
                   indexResource = RootApiResponse.RelationshipIndex;
                   break;
               default:
                   throw new NotSupportedException(string.Format("IndexExists does not support indexfor {0}", indexFor));
           }

           var request = new RestRequest(string.Format("{0}/{1}",indexResource, indexName), Method.GET)
           {
               RequestFormat = DataFormat.Json,
               JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
           };

           var response = client.Execute<Dictionary<string, IndexMetaData>>(request);

           return response.StatusCode == HttpStatusCode.OK;
       }

       void CheckRoot()
       {
           if (RootApiResponse == null)
               throw new InvalidOperationException(
                   "The graph client is not connected to the server. Call the Connect method first.");
       }

       public void CreateIndex(string indexName, IndexConfiguration config, IndexFor indexFor)
       {
           CheckRoot();

           string nodeResource;
           switch (indexFor)
           {
               case IndexFor.Node:
                   nodeResource = RootApiResponse.NodeIndex;
                   break;
               case IndexFor.Relationship:
                   nodeResource = RootApiResponse.RelationshipIndex;
                   break;
               default:
                   throw new NotSupportedException(string.Format("CreateIndex does not support indexfor {0}", indexFor));
           }

           var createIndexApiRequest = new
               {
                   name = indexName.ToLower(),
                   config
               };

           var request = new RestRequest(nodeResource, Method.POST)
               {
                   RequestFormat = DataFormat.Json,
                   JsonSerializer = new CustomJsonSerializer {NullHandling = JsonSerializerNullValueHandling}
               };
           request.AddBody(createIndexApiRequest);

           var response = client.Execute(request);

           if (response.StatusCode != HttpStatusCode.Created)
               throw new NotSupportedException(string.Format(
                   "Received an unexpected HTTP status when executing the request.\r\n\r\nThe index name was: {0}\r\n\r\nThe response status was: {1} {2}",
                   indexName,
                   (int) response.StatusCode,
                   response.StatusDescription));
       }

       public void ReIndex(NodeReference node, IEnumerable<IndexEntry> indexEntries)
       {
           CheckRoot();

           var nodeAddress = string.Join("/", new[] {RootApiResponse.Node, node.Id.ToString()});

           var updates = indexEntries
               .SelectMany(
                   i => i.KeyValues,
                   (i, kv) => new {IndexName = i.Name, kv.Key, kv.Value});

           foreach (var update in updates)
           {
               // Skip entries that have no value and carry on with the rest
               if (update.Value == null)
                   continue;

               string indexValue;
               if(update.Value is DateTimeOffset)
               {
                   indexValue = ((DateTimeOffset) update.Value).UtcTicks.ToString();
               }
               else if (update.Value is DateTime)
               {
                   indexValue = ((DateTime)update.Value).Ticks.ToString();
               }
               else
               {
                   indexValue = update.Value.ToString();
               }

               AddNodeToIndex(update.IndexName, update.Key, indexValue, nodeAddress);
           }
       }

       public void DeleteIndex(string indexName, IndexFor indexFor)
       {
           CheckRoot();

           string indexResource;
           switch (indexFor)
           {
               case IndexFor.Node:
                   indexResource = RootApiResponse.NodeIndex;
                   break;
               case IndexFor.Relationship:
                   indexResource = RootApiResponse.RelationshipIndex;
                   break;
               default:
                   throw new NotSupportedException(string.Format("DeleteIndex does not support indexfor {0}", indexFor));
           }

           var request = new RestRequest(string.Format("{0}/{1}", indexResource, indexName), Method.DELETE)
           {
               RequestFormat = DataFormat.Json,
               JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
           };

           var response = client.Execute(request);

           if (response.StatusCode != HttpStatusCode.NoContent)
               throw new NotSupportedException(string.Format(
                   "Received an unexpected HTTP status when executing the request.\r\n\r\nThe index name was: {0}\r\n\r\nThe response status was: {1} {2}",
                   indexName,
                   (int)response.StatusCode,
                   response.StatusDescription));
       }

       void AddNodeToIndex(string indexName, string indexKey, string indexValue, string nodeAddress)
       {
           var nodeIndexAddress = string.Join("/", new[] { RootApiResponse.NodeIndex, indexName, indexKey, indexValue });
           var request = new RestRequest(nodeIndexAddress, Method.POST)
           {
               RequestFormat = DataFormat.Json,
               JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
           };
           request.AddBody(string.Join("", client.BaseUrl, nodeAddress));

           var response = client.Execute(request);

           if (response.StatusCode != HttpStatusCode.Created)
               throw new NotSupportedException(string.Format(
                   "Received an unexpected HTTP status when executing the request.\r\n\r\nThe index name was: {0}\r\n\r\nThe response status was: {1} {2}",
                   indexName,
                   (int)response.StatusCode,
                   response.StatusDescription));
       }

       public IEnumerable<Node<TNode>> QueryIndex<TNode>(string indexName, IndexFor indexFor, string query)
       {
           CheckRoot();

           string indexResource;

           switch (indexFor)
           {
               case IndexFor.Node:
                   indexResource = RootApiResponse.NodeIndex;
                   break;
               case IndexFor.Relationship:
                   indexResource = RootApiResponse.RelationshipIndex;
                   break;
               default:
                   throw new NotSupportedException(string.Format("QueryIndex does not support indexfor {0}", indexFor));
           }

           var request = new RestRequest(indexResource + "/" + indexName, Method.GET)
               {
                   RequestFormat = DataFormat.Json,
                   JsonSerializer = new CustomJsonSerializer {NullHandling = JsonSerializerNullValueHandling}
               };

           request.AddParameter("query", query);

           var response = client.Execute<List<NodeApiResponse<TNode>>>(request);

           if (response.StatusCode != HttpStatusCode.OK)
               throw new NotSupportedException(string.Format(
                   "Received an unexpected HTTP status when executing the request.\r\n\r\nThe index name was: {0}\r\n\r\nThe response status was: {1} {2}",
                   indexName,
                   (int) response.StatusCode,
                   response.StatusDescription));

           return response.Data == null
               ? Enumerable.Empty<Node<TNode>>()
               : response.Data.Select(r => r.ToNode(this));
       }
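
Each method above repeats the same switch to pick the node or relationship index root. One way to factor that out is sketched below; `IndexResources.Resolve` is a hypothetical helper, and the `IndexFor` and `RootApiResponse` types here are simplified stand-ins for the client's own.

```csharp
using System;

// Simplified stand-ins for the client's types (hypothetical).
public enum IndexFor { Node, Relationship }

public class RootApiResponse
{
    public string NodeIndex { get; set; }
    public string RelationshipIndex { get; set; }
}

public static class IndexResources
{
    // Maps IndexFor to the index root advertised by the server; the
    // operation name is only used to keep each method's error message.
    public static string Resolve(RootApiResponse root, IndexFor indexFor, string operation)
    {
        switch (indexFor)
        {
            case IndexFor.Node:
                return root.NodeIndex;
            case IndexFor.Relationship:
                return root.RelationshipIndex;
            default:
                throw new NotSupportedException(
                    string.Format("{0} does not support indexfor {1}", operation, indexFor));
        }
    }
}
```

Inside the client, each switch could then shrink to a single line such as `var indexResource = IndexResources.Resolve(RootApiResponse, indexFor, "QueryIndex");`.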

 

Using the Neo4jClient from within an application

Create an Index and check if it exists

This is useful when bootstrapping Neo4j, to see if there are any indexes that SHOULD be there and are not, so that you can enumerate all the nodes for that index and add entries.

public void CreateIndexesForAgencyClients()
        {
            var agencies = graphClient
                .RootNode
                .Out<Agency>(Hosts.TypeKey)
                .ToList();

            foreach (var agency in agencies)
            {
                var indexName = IndexNames.Clients(agency.Data);
                var indexConfiguration = new IndexConfiguration
                    {
                        Provider = IndexProvider.lucene,
                        Type = IndexType.fulltext
                    };

                if (!graphClient.CheckIndexExists(indexName, IndexFor.Node))
                {
                    Trace.TraceInformation("CreateIndexIfNotExists {0} for Agency Key {1}", indexName, agency.Data.Key);
                    graphClient.CreateIndex(indexName, indexConfiguration, IndexFor.Node);
                    PopulateAgencyClientIndex(agency.Data);
                }
            }
        }

Create an Index Node Entry when creating a node

var indexEntries = GetIndexEntries(agency.Data, client, clientViewModel.AlsoKnownAses);

var clientNodeReference = graphClient.Create(
                client,
                new[] {new ClientBelongsTo(agencyNode.Reference)}, indexEntries);

public IEnumerable<IndexEntry> GetIndexEntries(Agency agency, Client client, IEnumerable<AlsoKnownAs> alsoKnownAses)
        {
            var indexKeyValues = new List<KeyValuePair<string, object>>
            {
                new KeyValuePair<string, object>(AgencyClientIndexKeys.Gender.ToString(), client.Gender)
            };

            if (client.DateOfBirth.HasValue)
            {
                var dateOfBirthUtcTicks = client.DateOfBirth.Value.UtcTicks;
                indexKeyValues.Add(new KeyValuePair<string, object>(AgencyClientIndexKeys.DateOfBirth.ToString(), dateOfBirthUtcTicks));
            }

            var names = new List<string>
            {
                client.GivenName,
                client.FamilyName,
                client.PreferredName,
            };

            if (alsoKnownAses != null)
            {
                names.AddRange(alsoKnownAses.Where(a => !string.IsNullOrEmpty(a.Name)).Select(aka => aka.Name));
            }

            indexKeyValues.AddRange(names.Select(name => new KeyValuePair<string, object>(AgencyClientIndexKeys.Name.ToString(), name)));

            return new[]
            {
                new IndexEntry
                {
                    Name = IndexNames.Clients(agency),
                    KeyValues = indexKeyValues.Where(v => v.Value != null)
                }
            };
        }
		
Reindex a node

Notice the call to PopulateAgencyClientIndex in the code above. It runs in our bootstrap to ensure the indexes are always there as expected; if for some reason they are not, they are created and then populated using the ReIndex feature.

void PopulateAgencyClientIndex(Agency agency)
        {
            var clients = graphClient
                .RootNode
                .Out<Agency>(Hosts.TypeKey, a => a.Key == agency.Key)
                .In<Client>(ClientBelongsTo.TypeKey);

            foreach (var client in clients)
            {
                var clientService = clientServiceCallback();
                var akas = client.Out<AlsoKnownAs>(IsAlsoKnownAs.TypeKey).Select(a => a.Data);
                var indexEntries = clientService.GetIndexEntries(agency, client.Data, akas);
                graphClient.ReIndex(client.Reference, indexEntries);
            }
        }
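
ReIndex flattens each IndexEntry into (index, key, value) triples and turns dates into tick counts before posting them. The same step can be sketched as a pure function that is easy to unit test without a server. `IndexEntrySketch` and `ReIndexSketch` are hypothetical names, and this version skips null values and keeps going.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical, simplified stand-in for the client's IndexEntry type.
public class IndexEntrySketch
{
    public string Name;
    public IEnumerable<KeyValuePair<string, object>> KeyValues;
}

public static class ReIndexSketch
{
    // Dates become tick counts (UTC for DateTimeOffset); anything else uses ToString().
    public static string ToIndexValue(object value)
    {
        if (value is DateTimeOffset dto) return dto.UtcTicks.ToString(CultureInfo.InvariantCulture);
        if (value is DateTime dt) return dt.Ticks.ToString(CultureInfo.InvariantCulture);
        return value.ToString();
    }

    // Flattens entries into (index, key, value) triples, dropping null values.
    public static List<(string Index, string Key, string Value)> Flatten(IEnumerable<IndexEntrySketch> entries)
    {
        return entries
            .SelectMany(e => e.KeyValues, (e, kv) => (Index: e.Name, Key: kv.Key, Raw: kv.Value))
            .Where(t => t.Raw != null)
            .Select(t => (Index: t.Index, Key: t.Key, Value: ToIndexValue(t.Raw)))
            .ToList();
    }
}
```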
		

Querying a full text search index using Lucene

Below is sample code that queries a full-text index. First, consider the index entries for a person with:

Name: Bob, Surname: Van de Builder, Aka1: Bobby, Aka2: Bobs, PreferredName: Bob The Builder

The index entries need to look like this:

Key: Value
Name: Bob
Name: Van
Name: de
Name: Builder
Name: Bobby
Name: Bobs

Remember, the full-text index uses a whitespace analyser, so each word of a name that contains spaces MUST become its own index entry. What we do is split names on whitespace, and the resulting words become our collection of IndexEntries. All of this applies to the full-text search context.

Note: if you use an EXACT index, a name with multiple words needs a single composite entry, since you are no longer using Lucene's full-text search capabilities, e.g.

Name: Bob The Builder

This is good to know, because searches such as postal code or gender, where exact matches are required, do not need full-text indexes.
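
A sketch of both strategies, assuming names arrive as plain strings: `FullText` splits each name on whitespace into separate `Name` entries, while `Exact` keeps each name as one composite entry. `NameIndexEntries` is a hypothetical helper, and the case-insensitive de-duplication is a choice made here (assuming the full-text index lower-cases terms), not something the article's code does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameIndexEntries
{
    static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Full-text index: one "Name" entry per whitespace-separated word,
    // de-duplicated case-insensitively so a repeated word is indexed once.
    public static List<KeyValuePair<string, string>> FullText(IEnumerable<string> names)
    {
        return names
            .Where(n => !string.IsNullOrWhiteSpace(n))
            .SelectMany(n => n.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Select(word => new KeyValuePair<string, string>("Name", word))
            .ToList();
    }

    // Exact index: the whole (trimmed) name is one composite entry.
    public static List<KeyValuePair<string, string>> Exact(IEnumerable<string> names)
    {
        return names
            .Where(n => !string.IsNullOrWhiteSpace(n))
            .Select(n => new KeyValuePair<string, string>("Name", n.Trim()))
            .ToList();
    }
}
```

With the exact variant, `Exact(new[] { "Bob The Builder" })` produces the single entry `Name: Bob The Builder`.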

Let's check out an example of querying an index.

[Test]
public void VerifyWhenANewClientIsCreateThatPartialNameCanBeFuzzySearchedInTheFullTextSearchIndex()
{
    using (var agency = Data.NewTestAgency())
    using (var client = Data.NewTestClient(agency, c =>
    {
        c.Gender = Gender.Male;
        c.GivenName = "Joseph";
        c.MiddleNames = "Mark";
        c.FamilyName = "Kitson";
        c.PreferredName = "Joey";

        c.AlsoKnownAses = new List<AlsoKnownAs>
            {
               new AlsoKnownAs {Name = "J-Man"},
               new AlsoKnownAs {Name = "J-Town"}
            };
    }
        ))
    {
        var indexName = IndexNames.Clients(agency.Agency.Data);
        const string partialName = "+Name:Joe~ +Name:Kitson~";
        var result = GraphClient.QueryIndex<Client>(indexName, IndexFor.Node, partialName);
        Assert.AreEqual(client.Client.Data.UniqueId, result.First().Data.UniqueId);
    }
}
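
The test hard-codes its query string. If the words come from user input, a small builder like the hypothetical `FuzzyNameQuery` below can add the `+` (required) and `~` (fuzzy) operators to each word. It assumes the server parses the query with Lucene's classic query parser, so characters with special meaning in that syntax are backslash-escaped.

```csharp
using System;
using System.Linq;
using System.Text;

public static class FuzzyNameQuery
{
    // Characters with special meaning in Lucene's classic query syntax.
    const string Special = "\\+-!():^[]\"{}~*?|&/";

    public static string Escape(string term)
    {
        var sb = new StringBuilder(term.Length);
        foreach (var c in term)
        {
            if (Special.IndexOf(c) >= 0) sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Builds "+Key:word~ +Key:word~ ..." so every word must fuzzily match
    // an entry, e.g. ("Name", "Joe Kitson") -> "+Name:Joe~ +Name:Kitson~".
    public static string Build(string key, string userInput)
    {
        var words = userInput.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Select(w => "+" + key + ":" + Escape(w) + "~"));
    }
}
```

Escaping matters for names such as the AKA "J-Man", which is sent as `J\-Man`.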

Dates

You may have noticed in some of the code that when I store dates in the index, I store them as ticks, i.e. as long numbers. This is great, as it gives you the raw power of searching dates with range queries over those longs.

[Test]
       public void VerifyWhenANewClientIsCreateThatTheDateOfBirthCanBeRangeSearchedInTheFullTextSearchIndex()
       {
           // Arrange
           const long dateOfBirthTicks = 634493518171556320;
           using (var agency = Data.NewTestAgency())
           using (var client = Data.NewTestClient(agency, c =>
           {
               c.Gender = Gender.Male;
               c.GivenName = "Joseph";
               c.MiddleNames = "Mark";
               c.FamilyName = "Kitson";
               c.PreferredName = "Joey";
               c.DateOfBirth = new DateTimeOffset(dateOfBirthTicks, new TimeSpan());
               c.CurrentAge = null;
               c.AlsoKnownAses = new List<AlsoKnownAs>
                   {
                      new AlsoKnownAs {Name = "J-Man"},
                      new AlsoKnownAs {Name = "J-Town"}
                   };
           }
               ))
           {
               // Act
               var indexName = IndexNames.Clients(agency.Agency.Data);
               var partialName = string.Format("DateOfBirth:[{0} TO {1}]", dateOfBirthTicks - 5, dateOfBirthTicks + 5);
               var result = GraphClient.QueryIndex<Client>(indexName, IndexFor.Node, partialName);
               // Assert
               Assert.AreEqual(client.Client.Data.UniqueId, result.First().Data.UniqueId);
           }
       }
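
Because the dates are stored as UTC ticks, a range query is just two numbers. The hypothetical `DateRangeQuery` helper below builds the same kind of query as the test, plus a whole-day variant.

```csharp
using System;
using System.Globalization;

public static class DateRangeQuery
{
    // Builds "Key:[lower TO upper]" over UTC tick values, widening the range
    // by `tolerance` on each side (the test above uses +/- 5 ticks).
    public static string Around(string key, DateTimeOffset date, TimeSpan tolerance)
    {
        long lower = date.UtcTicks - tolerance.Ticks;
        long upper = date.UtcTicks + tolerance.Ticks;
        return string.Format(CultureInfo.InvariantCulture, "{0}:[{1} TO {2}]", key, lower, upper);
    }

    // Whole-day range, assuming utcDay represents a UTC calendar day.
    public static string OnUtcDay(string key, DateTime utcDay)
    {
        long lower = utcDay.Date.Ticks;
        long upper = utcDay.Date.AddDays(1).Ticks - 1;
        return string.Format(CultureInfo.InvariantCulture, "{0}:[{1} TO {2}]", key, lower, upper);
    }
}
```

One caveat worth checking against your Neo4j version: unless the values are indexed as numeric fields, Lucene compares range bounds as text. That works here because any date between about AD 320 and AD 3160 has an 18-digit tick count, so textual and numeric order agree.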
	
Summary

Well, I hope you found this post useful. Neo4jClient is on NuGet, so have a bash at using it; I would love to hear your feedback.

Download the NuGet package: http://nuget.org/List/Packages/Neo4jClient
Source code at: http://hg.readify.net/neo4jclient/

Cheers

 

Published at DZone with permission of Romiko Derbynew, author and DZone MVB. (source)
