
Rafał Kuć is a team leader and software developer. He currently works as a software architect and Solr and Lucene specialist. He is mainly focused on Java, but open to any tool or programming language that helps him reach his goals faster and more easily. Rafał is also one of the founders of the solr.pl site, where he shares his knowledge and helps people with their problems. Rafał is a DZone MVB, is not an employee of DZone, and has posted 75 posts at DZone.

Autocomplete on Multivalued Fields Using Faceting

03.26.2013

In the previous blog post about autocomplete on multi-valued fields, we discussed how highlighting can help us get the information we are interested in. We also promised to return to the topic and show how to achieve similar functionality using Solr's faceting capabilities. So, let's do it.

Before we start

Because this post is more or less a continuation of what we've written earlier about autocomplete on multi-valued fields, we recommend reading “Autocomplete on multivalued field using highlighting” before the rest of this entry. We would also like to note that the method shown here is very similar to the one shown in the “Solr and autocomplete (part 1)” post, but we wanted to refresh that topic and show an example using multi-valued fields.

Configuration

As in the previous post, we will start with the Solr configuration.

Index structure

The structure of our index is exactly the same as the one shown previously, but let's recall it. Remember that we want autocomplete to work on a multi-valued field. This field is called features, and the complete field configuration of our index looks like this:

<fields>
 <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
 <field name="features" type="string" indexed="true" stored="true" multiValued="true"/>
 <field name="features_autocomplete" type="text_autocomplete" indexed="true" stored="true" multiValued="true"/>

 <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>

To get the autocomplete values, we will use the features_autocomplete field.

Copy field

Of course, we don't want to change our indexer; instead, we want Solr to copy the data from the features field to the features_autocomplete field automatically. That's why we add the following copyField definition to the schema.xml file:

<copyField source="features" dest="features_autocomplete"/>
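To make the copyField behaviour concrete, here is a minimal Python sketch that mimics what happens at index time. It is our own illustration, not Solr code: every raw value of the source field is copied unchanged into the destination field, and only afterwards does each field's own analyzer run. The dictionary-based document and the `apply_copy_field` helper are hypothetical.

```python
def apply_copy_field(doc: dict[str, list[str]], source: str, dest: str) -> dict[str, list[str]]:
    """Return a copy of `doc` in which every raw value of `source` is also added to `dest`.

    Mirrors <copyField source="features" dest="features_autocomplete"/>:
    the original input values are copied before any analysis takes place.
    """
    # Copy the lists so the input document is left untouched.
    result = {name: list(values) for name, values in doc.items()}
    result.setdefault(dest, []).extend(doc.get(source, []))
    return result


doc = {"id": ["1"], "features": ["Multiple windows", "Single door"]}
print(apply_copy_field(doc, "features", "features_autocomplete"))
```

Note that both values of the multi-valued features field are copied, so features_autocomplete is multi-valued too, which is why it is declared with multiValued="true".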

Our text_autocomplete field type

This brings us to the first difference: the text_autocomplete field type. This time it looks like this:

<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
 <analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

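Because we will use faceting, we combine solr.KeywordTokenizerFactory with solr.LowerCaseFilterFactory so that each value in our field is indexed as a single, lowercased token. Faceting returns indexed terms, so keeping each value as one token means each suggestion is a whole phrase such as "single door" rather than a single word.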
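The analysis chain above can be approximated in a few lines of Python. This is only an illustrative sketch, not Lucene's implementation: the keyword tokenizer emits the whole input as one token and the lowercase filter lowercases it, so each value becomes exactly one term. Python's `str.lower` stands in for Lucene's lowercasing and may differ for some non-ASCII characters; the `text_autocomplete_analyze` name is hypothetical.

```python
def text_autocomplete_analyze(value: str) -> list[str]:
    """Approximate the text_autocomplete analyzer for a single field value."""
    # KeywordTokenizerFactory: the entire input is one token (no splitting on whitespace).
    tokens = [value]
    # LowerCaseFilterFactory: lowercase every token.
    return [token.lower() for token in tokens]


for value in ["Multiple windows", "Single door"]:
    print(value, "->", text_autocomplete_analyze(value))
```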

Example data

Our example data is identical to what we used before, but let's recall it for clarity:

<add>
 <doc>
  <field name="id">1</field>
  <field name="features">Multiple windows</field>
  <field name="features">Single door</field>
 </doc>
 <doc>
  <field name="id">2</field>
  <field name="features">Single window</field>
  <field name="features">Single door</field>
 </doc>
 <doc>
  <field name="id">3</field>
  <field name="features">Multiple windows</field>
  <field name="features">Multiple doors</field>
 </doc>
</add>
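If you want to check what will end up in features_autocomplete for these three documents without starting Solr, the following sketch parses the update XML above with Python's standard xml.etree.ElementTree module and applies the same copy-then-lowercase steps. It is a local approximation that assumes the schema is exactly the one shown earlier; the `indexed_terms` helper is our own.

```python
import xml.etree.ElementTree as ET

UPDATE_XML = """<add>
 <doc>
  <field name="id">1</field>
  <field name="features">Multiple windows</field>
  <field name="features">Single door</field>
 </doc>
 <doc>
  <field name="id">2</field>
  <field name="features">Single window</field>
  <field name="features">Single door</field>
 </doc>
 <doc>
  <field name="id">3</field>
  <field name="features">Multiple windows</field>
  <field name="features">Multiple doors</field>
 </doc>
</add>"""


def indexed_terms(update_xml: str) -> dict[str, set[str]]:
    """Map each document id to the terms indexed in features_autocomplete."""
    terms_by_id: dict[str, set[str]] = {}
    for doc in ET.fromstring(update_xml).iter("doc"):
        doc_id = doc.find("field[@name='id']").text
        # copyField copies every raw 'features' value; the keyword tokenizer plus
        # the lowercase filter then turn each value into one lowercased term.
        terms_by_id[doc_id] = {f.text.lower() for f in doc.findall("field[@name='features']")}
    return terms_by_id


for doc_id, terms in indexed_terms(UPDATE_XML).items():
    print(doc_id, sorted(terms))
```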

Query with faceting

Let's see what our query looks like when we use faceting instead of highlighting.

Full query

When using faceting, our query should look more or less like this:

q=*:*&rows=0&facet=true&facet.field=features_autocomplete&facet.prefix=sing

A few words about the parameters:

  • q=*:* – we match all documents, so the suggestions are calculated over the whole index,
  • rows=0 – we tell Solr that we don't want the matching documents returned in the results,
  • facet=true – we tell Solr that we want to use faceting,
  • facet.field=features_autocomplete – we specify the field on which faceting will be calculated,
  • facet.prefix=sing – with this parameter we pass the autocomplete query, i.e. what the user has typed so far; only terms starting with this prefix are returned.
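For completeness, here is one way to build the same request from a client. The sketch uses only Python's urllib.parse; the base URL http://localhost:8983/solr/collection1/select is an assumption (the default Solr port and a hypothetical core name), so adjust it to your installation. urlencode percent-encodes the `*:*` query as `%2A%3A%2A`, which Solr decodes as usual.

```python
from urllib.parse import urlencode

# Assumed endpoint: default Solr port and a hypothetical core called "collection1".
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def autocomplete_url(prefix: str, field: str = "features_autocomplete") -> str:
    """Build a faceting-based autocomplete request for the given prefix."""
    params = {
        "q": "*:*",              # match all documents
        "rows": 0,               # we only need facet counts, not documents
        "facet": "true",         # enable faceting
        "facet.field": field,    # field holding the single-token, lowercased values
        "facet.prefix": prefix,  # what the user has typed so far
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(autocomplete_url("sing"))
```

Spaces in the prefix are encoded as `+`, so a user who has typed "single d" can be served by the same function.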

Query results

Solr returns the following results for the above query:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="facet">true</str>
    <str name="q">*:*</str>
    <str name="facet.prefix">sing</str>
    <str name="facet.field">features_autocomplete</str>
    <str name="rows">0</str>
  </lst>
</lst>
<result name="response" numFound="3" start="0">
</result>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="features_autocomplete">
      <int name="single door">2</int>
      <int name="single window">1</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
</response>

As you can see, the facet_fields section contains the phrases we were interested in, along with the number of documents each of them appears in.
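On the client side, the suggestions can be read straight from the facet_fields section. The following sketch parses a trimmed copy of the response above with xml.etree.ElementTree (assuming the XML response format) and returns (phrase, count) pairs in the order Solr sent them. The `parse_suggestions` name is our own.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown above (header and empty sections omitted).
RESPONSE_XML = """<response>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="features_autocomplete">
      <int name="single door">2</int>
      <int name="single window">1</int>
    </lst>
  </lst>
</lst>
</response>"""


def parse_suggestions(response_xml: str, field: str = "features_autocomplete") -> list[tuple[str, int]]:
    """Extract (phrase, document count) pairs for one facet field."""
    root = ET.fromstring(response_xml)
    path = f"./lst[@name='facet_counts']/lst[@name='facet_fields']/lst[@name='{field}']"
    facet_field = root.find(path)
    if facet_field is None:
        return []  # the field was not faceted in this response
    # Each child element carries the phrase in its name attribute and the count as text.
    return [(item.get("name"), int(item.text)) for item in facet_field]


print(parse_suggestions(RESPONSE_XML))
```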

What to remember

The crucial thing to remember is that the value of the facet.prefix parameter is not analyzed. Because the indexed terms are lowercased, passing Sing instead of sing would return no results.
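The effect is easy to reproduce with a small in-memory model of field faceting. In the sketch below, which is a simplification and not Solr's implementation, the indexed terms are the lowercased features values of the example documents, the prefix is matched as a plain, case-sensitive string, and results are ordered by descending count with alphabetical tie-breaking, similar to Solr's default count sort. Lowercasing the user's input on the client before sending it is one simple way to avoid the problem. The `facet_prefix_counts` helper is hypothetical.

```python
from collections import Counter

# Terms indexed in features_autocomplete for documents 1, 2 and 3
# (one lowercased term per original 'features' value).
INDEXED_TERMS = [
    {"multiple windows", "single door"},
    {"single window", "single door"},
    {"multiple windows", "multiple doors"},
]


def facet_prefix_counts(docs: list[set[str]], prefix: str) -> list[tuple[str, int]]:
    """Count documents per term, keeping only terms that start with `prefix`.

    Like facet.prefix, the prefix is NOT analyzed: matching is a plain,
    case-sensitive string comparison against the indexed (lowercased) terms.
    """
    counts = Counter(term for terms in docs for term in terms)
    matching = [(term, n) for term, n in counts.items() if term.startswith(prefix)]
    # Highest count first; ties broken alphabetically.
    return sorted(matching, key=lambda item: (-item[1], item[0]))


print(facet_prefix_counts(INDEXED_TERMS, "Sing"))          # [] because "Sing" is not lowercased
print(facet_prefix_counts(INDEXED_TERMS, "Sing".lower()))  # [('single door', 2), ('single window', 1)]
```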

A short summary

This entry showed the second method of implementing autocomplete on multi-valued fields. Of course, we haven't covered everything about the topic and we will get back to it someday, but that's all for now. We hope you will find it useful :)

Published at DZone with permission of Rafał Kuć, author and DZone MVB.
