Coming from a background of Aerospace Engineering, John soon discovered that his true interest lay at the intersections of information technology and entrepreneurship (and when applicable - math). In early 2011, John stepped away from his day job to take up software consulting. Finally John found permanent employment at Opensource Connections where he currently consults large enterprises about full-text search and Big Data applications. Highlights to this point have included prototyping the future of search with the US Patent and Trademark Office, implementing the search syntax used by patent examiners, and building a Solr search relevancy tuning framework called SolrPanl. John is a DZone MVB and is not an employee of DZone and has posted 23 posts at DZone.

Building Suggest-As-You-Type With Carrot2 Clustering

02.25.2013

The first interaction that a customer has with your e-commerce web site is with the search box itself. So it is of utmost importance to make the user experience here as clean and positive as possible. One good way of doing this is to provide useful suggestions to the user as they type. This behavior is called Suggest-As-You-Type.

I recently discovered a clever new approach to Suggest-As-You-Type which makes use of Carrot2 Clustering. The basic idea is to populate a list of suggestions in an independent Suggest-As-You-Type Solr core.

Let’s walk through the basic idea using an example. Pretend that our online store (a grocery store) has several departments, each with its own sub-departments. The goal for our Suggest-As-You-Type is to provide reasonable recommendations about which department the customer should visit based upon their current search string. However, a common problem in e-commerce shops is that products may be sourced from several vendors, and the information that the vendors provide about these products may be incomplete, inconsistent, or just wrong. So how can the Suggest-As-You-Type know that a search for “brocoli” belongs in the “Produce/Vegetable” department? And how can Suggest-As-You-Type know that a search for “India Pale Ale” belongs in the “Adult Beverage/Beer” department? As I found out, Carrot2 Clustering is how!

Here’s roughly how it could work. Presumably, we already have one Solr that is serving up product searches. Let’s restart that Solr and enable the clustering component:

java -Dsolr.clustering.enabled=true -jar start.jar

Now we can make queries with the clustering request handler that will look something like this:

http://localhost:8983/solr/clustering?q=*:*&rows=1000&carrot.title=ProductName&carrot.snippet=ProductDescription

And the response will in turn look like this:

<arr name="clusters">
  <lst>
    <arr name="labels">
      <str>Delicious</str>
    </arr>
    <double name="score">3.1654221261111397</double>
    <arr name="docs">
      <str>474523</str>
      <str>234263</str>
      <!-- snip -->
      <str>553285</str>
    </arr>
  </lst>
  <lst>
    <arr name="labels">
      <str>On Sale</str>
    </arr>
    <!-- snip -->
  </lst>
  <lst>
    <arr name="labels">
      <str>Frozen</str>
    </arr>
    <!-- snip -->
  </lst>
  <lst>
    <arr name="labels">
      <str>Milk</str>
    </arr>
    <!-- snip -->
  </lst>
  <!-- snip -->
</arr>
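A script that consumes this response mostly needs the label strings. Here is a minimal sketch of pulling them out with Python's standard library. It assumes the clusters array sits under a root `<response>` element (the root is not shown in the excerpt above), and `cluster_labels` is a hypothetical helper name, not part of Solr or Carrot2.

```python
import xml.etree.ElementTree as ET


def cluster_labels(response_xml):
    """Return the label strings of every cluster in a clustering response."""
    root = ET.fromstring(response_xml)
    labels = []
    # Each cluster is a <lst> inside <arr name="clusters">;
    # its labels live in a nested <arr name="labels">.
    for cluster in root.findall(".//arr[@name='clusters']/lst"):
        for label in cluster.findall("arr[@name='labels']/str"):
            labels.append(label.text)
    return labels


# Trimmed-down sample shaped like the response above (assumed <response> root).
sample = """<response>
  <arr name="clusters">
    <lst>
      <arr name="labels"><str>Delicious</str></arr>
      <double name="score">3.1654221261111397</double>
      <arr name="docs"><str>474523</str><str>234263</str></arr>
    </lst>
    <lst>
      <arr name="labels"><str>On Sale</str></arr>
    </lst>
  </arr>
</response>"""

print(cluster_labels(sample))
```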

As you can see, the clustering query is returning a set of tagged clusters, but the tags themselves (Milk, Frozen, On Sale, Delicious, etc.) are quite scattered and are not very helpful. Let’s tighten the query up a bit by only looking in the Produce/Fruit department:

http://localhost:8983/solr/clustering?q=%2BDepartment:Produce+%2BSubDepartment:Fruit&rows=1000&carrot.title=ProductName&carrot.snippet=ProductDescription
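One detail is easy to trip over when building these URLs by hand: in a URL query string a literal + is decoded as a space, so the Lucene "required" operator has to travel as %2B. Letting a library do the encoding avoids the problem. The sketch below is one way to do it. `clustering_url` is a hypothetical helper, the host and field names are taken from the example in this post, and the values are wrapped in double quotes on the assumption that department names may contain spaces (it does not escape quotes inside a name).

```python
from urllib.parse import urlencode


def clustering_url(department, sub_department,
                   base="http://localhost:8983/solr/clustering", rows=1000):
    """Build a clustering request restricted to one department/sub-department."""
    params = {
        # Both clauses are required (+); quoting keeps multi-word names intact.
        "q": f'+Department:"{department}" +SubDepartment:"{sub_department}"',
        "rows": rows,
        "carrot.title": "ProductName",
        "carrot.snippet": "ProductDescription",
    }
    # urlencode turns "+" into %2B and spaces into "+".
    return f"{base}?{urlencode(params)}"


print(clustering_url("Produce", "Fruit"))
```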

The cluster tags here will be much more fruit oriented: Apple, Orange, Pear, Red Delicious, Juicy, Citrus, etc. The tags we see here are perfect for building our Suggest-As-You-Type Solr core because when the user types any of these terms, they will likely be thinking about fruit.

So let’s do just that; let’s build our suggest core. First we need to define the appropriate fields (in schema.xml):

<field name="DisplayText" type="string" stored="true"/>
<field name="TagWords" type="ignored" multiValued="true"/>
<field name="Text" type="text_general_edge_ngrammed" indexed="true" stored="true" multiValued="true"/>
<!-- snip -->
<copyField source="DisplayText" dest="Text"/>
<copyField source="TagWords" dest="Text"/>

Here we store the DisplayText so that it can be displayed later. The TagWords field, on the other hand, can be of type ignored because it serves only as a source whose contents are copied into the Text field. The Text field, then, is of type text_general_edge_ngrammed. (So, the same as text_general, but with edge n-grams added for faster partial-word matches. We can get a lot more clever here, but text analysis is not the focus of this post.)
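To see why edge n-grams help with partial words, it can be useful to look at what gets indexed for a single token. The sketch below imitates the idea in plain Python; it is not Solr's analysis chain. The function name and the `min_gram`/`max_gram` defaults are assumptions chosen for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the leading-edge prefixes of a token, shortest first."""
    token = token.lower()
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("citrus", min_gram=2))
# ['ci', 'cit', 'citr', 'citru', 'citrus']
```

Because every prefix is already in the index, a partial query such as “citr” becomes an ordinary exact term match rather than a wildcard scan.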

Now that the schema is set up, all we need are documents! We need one document for every possible Department/SubDepartment present in our grocery store. The department name goes into the DisplayText field, and, as you might have guessed, for the TagWords field we run a side job that collects the Carrot2 cluster names for each of these departments. (That’s the key part. Read it again.)
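The side job could be sketched roughly as follows. This is only an outline under stated assumptions. `build_suggest_docs` and `fake_fetch_labels` are hypothetical names, and the label lookup is passed in as a function so that the sketch runs without a live Solr. In practice that function would issue the clustering query for the department and return the cluster labels from the response.

```python
def build_suggest_docs(departments, fetch_labels):
    """Build one suggest-core document per (department, sub_department) pair."""
    docs = []
    for department, sub_department in departments:
        labels = fetch_labels(department, sub_department)
        docs.append({
            "DisplayText": f"{department}/{sub_department}",
            # Drop duplicate labels while keeping their original order.
            "TagWords": list(dict.fromkeys(labels)),
        })
    return docs


def fake_fetch_labels(department, sub_department):
    """Stand-in for the clustering query; returns canned labels for the demo."""
    canned = {("Produce", "Fruit"): ["Apple", "Orange", "Pear", "Apple"]}
    return canned.get((department, sub_department), [])


docs = build_suggest_docs([("Produce", "Fruit")], fake_fetch_labels)
print(docs)
```

The resulting dictionaries map directly onto the DisplayText and TagWords fields defined in the schema and could then be posted to the suggest core by whatever indexing client you already use.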

Once the indexing of the Suggest-As-You-Type core is complete, we can then take the partial searches of our customers and help direct them to the department that they are seeking. If they are looking for “crispy …” we will direct them to Produce/Fruit, and Produce/Vegetable, and Snacks/Chips. But we do not direct them to Home Supplies/Cleaners! If they are looking for “microwa…” then we direct them to Frozen Foods, but not Adult Beverages.
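The lookup behavior can be imitated in memory to make the idea concrete. The sketch below is a toy stand-in for querying the suggest core, not a Solr client. It matches every typed token as a prefix of some word in a department's DisplayText or TagWords and ignores relevance ranking entirely. The function name, the sample tag words, and the “Frozen Foods/Meals” sub-department are invented for illustration.

```python
def suggest_departments(partial, docs):
    """Return DisplayText of docs where every typed token prefixes some word."""
    tokens = partial.lower().split()
    if not tokens:
        return []
    hits = []
    for doc in docs:
        words = []
        for text in [doc["DisplayText"], *doc["TagWords"]]:
            words.extend(text.lower().replace("/", " ").split())
        if all(any(word.startswith(token) for word in words) for token in tokens):
            hits.append(doc["DisplayText"])
    return hits


# Hypothetical suggest-core contents.
suggest_docs = [
    {"DisplayText": "Produce/Fruit", "TagWords": ["Apple", "Crispy", "Juicy"]},
    {"DisplayText": "Snacks/Chips", "TagWords": ["Crispy", "Salty"]},
    {"DisplayText": "Home Supplies/Cleaners", "TagWords": ["Bleach", "Spray"]},
    {"DisplayText": "Frozen Foods/Meals", "TagWords": ["Microwave"]},
]

print(suggest_departments("crispy", suggest_docs))
print(suggest_departments("microwa", suggest_docs))
```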

Now this does raise a question, and this is the question that I asked my e-commerce client: Just because we can understand which department the searcher belongs in, is it really helpful to direct them to those departments? Think about that for a second… maybe not! For instance, if a user knows that they really, really want the “new york chocolate cheese cake with strawberries on top”, then by suggesting that they visit the Bakery/Cakes department you are actually asking the customer to generalize their search. They came to your search box intent upon buying a New York chocolate cheese cake with strawberries on top, and you said “Nah, why don’t you just look at all our cakes. Maybe you’ll find something there.”

So… I think that this usage of clustering is a great example of the power of Carrot2, and I have been impressed with the semantic clarity of the tags that Carrot2 provides for the clusters that it finds. However, building a good user experience for Suggest-As-You-Type is not as easy as it might first seem because there are so many different types of customers. For customers who come to your e-commerce site simply to look around, it’s probably a good idea to suggest departments that they might be interested in. But for more serious customers, you might want to provide suggestions based upon previous customer searches. And for those customers who know exactly what they want, placing products directly in the Suggest-As-You-Type response is a good idea. Ideally, a good Suggest-As-You-Type user experience would include all three of these aspects.

Published at DZone with permission of John Berryman, author and DZone MVB. (source)
