
Robin Bramley is a hands-on Architect who has spent the last decade working with Java, mobile & Open Source across sectors including Financial Services & High Growth / start-ups. Prior to that he helped UK Police Forces with MIS/reporting & intelligence systems. He has contributed to a wide variety of Open Source projects, including adding OpenID support to Spring Security. Robin is a DZone MVB, is not an employee of DZone, and has posted 23 posts at DZone.

How to Get Apache Mahout Recommenders into Grails

07.07.2012

Apache Mahout is a scalable machine learning framework that can be used to create intelligent applications. In this article we’ll see how Mahout can be used to create personalised recommendations within a Grails application.

This article originally appeared in the February 2012 edition of GroovyMag.

About Mahout

Mahout started off as a sub-project of Apache Lucene. The name is a Hindi word referring to an elephant driver: portions of Mahout are built on top of Hadoop, which was named after a stuffed toy elephant owned by the son of Doug Cutting, who started that project.

Hadoop was extracted from Nutch, the Lucene web-crawler sub-project, and provides a scalable data processing framework using Map-Reduce on top of a distributed file system (HDFS). The use of Hadoop is beyond the scope of this introductory article.

Mahout has three primary areas of machine learning functionality: classification, clustering and recommendations.

Classification can be used to determine how similar an item is to previously encountered items or patterns and whether it belongs to the same category.

For instance, Mahout supports Naive Bayes classification; its ‘hello world’ sample trains a classifier and then classifies posts from the 20 Newsgroups data set (a common reference data set for machine learning research).

Clustering groups items together based on their similarity; this can be observed, for example, in Google search results.

We’re going to focus on recommendations, sometimes referred to as collaborative filtering. Amazon's suggested "other people also bought" products are a well-known example.

For further information on Mahout and clustering or classification I suggest you read Mahout in Action (see what I did there?).

Recommendations

Recommendations provide a discovery mechanism that introduces users to new items that may be of interest to them; they are usually associated with cross-selling tactics (e.g. ‘Also bought’) on retail e-commerce sites.

Mahout provides a set of components to enable the construction of customised recommendation engines.

Firstly, there is the Recommender, which produces recommendations based on a DataModel.

A user-based recommender looks for similar preferences or behaviour patterns between users (UserSimilarity) and then groups the most similar users into a neighbourhood (UserNeighborhood), e.g. the nearest 10 users to the specified user. The chosen algorithm then selects recommendations of new items from within that neighbourhood.
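To make the user-based flow concrete, here is a minimal, stdlib-only Java sketch of those three steps: Pearson correlation as the user similarity, a nearest-N neighbourhood, and a similarity-weighted average of the neighbours' ratings as the estimate for each unseen item. It is not Mahout's implementation; the class and method names and the sample ratings are hypothetical, and keeping only positively correlated users as neighbours is a simplifying assumption.

```java
import java.util.*;

/** Hypothetical sketch of user-based collaborative filtering; not Mahout's API. */
public class UserBasedSketch {

    /** Pearson correlation over the items both users rated; NaN when undefined. */
    static double pearson(Map<Long, Float> a, Map<Long, Float> b) {
        List<Long> common = new ArrayList<>();
        for (Long item : a.keySet()) {
            if (b.containsKey(item)) common.add(item);
        }
        int n = common.size();
        if (n < 2) return Double.NaN;
        double meanA = 0, meanB = 0;
        for (Long item : common) { meanA += a.get(item); meanB += b.get(item); }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (Long item : common) {
            double da = a.get(item) - meanA, db = b.get(item) - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return (varA == 0 || varB == 0) ? Double.NaN : cov / Math.sqrt(varA * varB);
    }

    /** Recommends up to howMany unseen items for userId from its nearest-N neighbourhood. */
    static List<Long> recommend(Map<Long, Map<Long, Float>> prefs, long userId,
                                int neighbourhoodSize, int howMany) {
        Map<Long, Float> mine = prefs.get(userId);
        // 1. User similarity: score every other user, keeping only positive correlations.
        List<Map.Entry<Long, Double>> similar = new ArrayList<>();
        for (Map.Entry<Long, Map<Long, Float>> other : prefs.entrySet()) {
            if (other.getKey() == userId) continue;
            double s = pearson(mine, other.getValue());
            if (s > 0) similar.add(Map.entry(other.getKey(), s));
        }
        // 2. Neighbourhood: keep the N most similar users.
        similar.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Map.Entry<Long, Double>> neighbourhood =
                similar.subList(0, Math.min(neighbourhoodSize, similar.size()));
        // 3. Estimate each unseen item as a similarity-weighted average of neighbours' ratings.
        Map<Long, double[]> totals = new HashMap<>(); // item -> {weighted sum, sum of weights}
        for (Map.Entry<Long, Double> neighbour : neighbourhood) {
            for (Map.Entry<Long, Float> pref : prefs.get(neighbour.getKey()).entrySet()) {
                if (mine.containsKey(pref.getKey())) continue; // user already rated it
                double[] t = totals.computeIfAbsent(pref.getKey(), k -> new double[2]);
                t[0] += neighbour.getValue() * pref.getValue();
                t[1] += neighbour.getValue();
            }
        }
        List<Long> items = new ArrayList<>(totals.keySet());
        items.sort(Comparator.comparingDouble((Long item) -> totals.get(item)[0] / totals.get(item)[1]).reversed());
        return items.subList(0, Math.min(howMany, items.size()));
    }

    /** Hypothetical ratings on a 1 to 5 scale: user -> (item -> preference). */
    static Map<Long, Map<Long, Float>> sampleData() {
        return Map.of(
                1L, Map.of(101L, 5.0f, 102L, 3.0f, 103L, 2.5f),
                2L, Map.of(101L, 4.5f, 102L, 3.5f, 103L, 2.0f, 104L, 4.0f),
                3L, Map.of(101L, 2.0f, 102L, 3.0f, 103L, 5.0f, 105L, 5.0f),
                4L, Map.of(101L, 5.0f, 102L, 2.5f, 103L, 3.0f, 106L, 3.5f));
    }

    public static void main(String[] args) {
        // Users 2 and 4 correlate positively with user 1; user 3 correlates negatively.
        System.out.println(recommend(sampleData(), 1L, 2, 10)); // [104, 106]
    }
}
```

With a neighbourhood of 2, item 105 never appears because the only user who rated it is negatively correlated with user 1; shrinking the neighbourhood to 1 leaves only item 106.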

An item-based recommender works on similarity between items (ItemSimilarity).
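An item-based recommender flips the question around: rather than finding similar users, it estimates how much a user would like an item from how similar that item is to the items the user has already rated. The hypothetical Java sketch below uses plain cosine similarity over co-rating users as the item similarity; that choice is an assumption for illustration (Mahout offers several ItemSimilarity implementations), and the names and sample data are made up.

```java
import java.util.*;

/** Hypothetical sketch of item-based collaborative filtering; not Mahout's API. */
public class ItemBasedSketch {

    /** Cosine similarity between two items, computed over the users who rated both. */
    static double cosine(Map<Long, Map<Long, Float>> prefs, long itemA, long itemB) {
        double dot = 0, normA = 0, normB = 0;
        for (Map<Long, Float> userPrefs : prefs.values()) {
            Float a = userPrefs.get(itemA), b = userPrefs.get(itemB);
            if (a == null || b == null) continue; // user did not rate both items
            dot += a * b;
            normA += a * a;
            normB += b * b;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Similarity-weighted average of the ratings userId gave to other items. */
    static double estimate(Map<Long, Map<Long, Float>> prefs, long userId, long item) {
        double weighted = 0, weights = 0;
        for (Map.Entry<Long, Float> rated : prefs.get(userId).entrySet()) {
            double s = cosine(prefs, item, rated.getKey());
            weighted += s * rated.getValue();
            weights += s;
        }
        return weights == 0 ? Double.NaN : weighted / weights;
    }

    /** Ranks items the user has not rated by their estimated preference. */
    static List<Long> recommend(Map<Long, Map<Long, Float>> prefs, long userId, int howMany) {
        Set<Long> candidates = new TreeSet<>();
        for (Map<Long, Float> userPrefs : prefs.values()) candidates.addAll(userPrefs.keySet());
        candidates.removeAll(prefs.get(userId).keySet()); // only unseen items
        Map<Long, Double> scores = new HashMap<>();
        for (Long item : candidates) {
            double e = estimate(prefs, userId, item);
            if (!Double.isNaN(e)) scores.put(item, e);
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparingDouble((Long item) -> scores.get(item)).reversed());
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    /** Hypothetical ratings: user -> (item -> preference). */
    static Map<Long, Map<Long, Float>> sampleData() {
        return Map.of(
                1L, Map.of(101L, 5.0f, 102L, 1.0f),
                2L, Map.of(101L, 5.0f, 102L, 1.0f, 103L, 5.0f, 104L, 1.0f),
                3L, Map.of(101L, 4.0f, 102L, 2.0f, 103L, 4.0f, 104L, 2.0f));
    }

    public static void main(String[] args) {
        // 103 is rated like 101 (which user 1 loves), 104 like 102 (which user 1 dislikes).
        System.out.println(recommend(sampleData(), 1L, 10)); // [103, 104]
    }
}
```

Raw cosine on ratings is deliberately simple here; centring each user's ratings first (as Pearson correlation does) usually separates items more sharply.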

The DataModel has a very simple representation of the Preference data to work with: a long user ID, a long item ID and a float preference value. Keeping each preference to this minimal tuple reduces memory consumption and therefore increases scalability. DataModel implementations are available for files, MySQL, generic JDBC and Postgres.
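As a rough illustration of why such a small tuple matters, the hypothetical Java sketch below parses userID,itemID,value lines (comma or tab separated, the layout Mahout's file-based data model reads) and then packs each user's preferences into parallel long[] and float[] arrays instead of holding one object per preference. Mahout has its own preference classes along similar lines; the names here are illustrative only, and the parser assumes every line carries all three fields.

```java
import java.util.*;

/** Hypothetical sketch of a compact (userID, itemID, value) preference store; not Mahout's classes. */
public class PreferenceTuples {

    /** A single preference: just two longs and a float. */
    static final class Preference {
        final long userId;
        final long itemId;
        final float value;

        Preference(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    /** One user's preferences stored as parallel primitive arrays. */
    static final class UserPreferences {
        final long[] itemIds;
        final float[] values;

        UserPreferences(long[] itemIds, float[] values) {
            this.itemIds = itemIds;
            this.values = values;
        }
    }

    /** Parses "userID,itemID,value" lines (comma or tab separated); blank lines are skipped. */
    static List<Preference> parse(List<String> lines) {
        List<Preference> prefs = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            String[] fields = line.trim().split("[,\t]");
            if (fields.length < 3) throw new IllegalArgumentException("Expected userID,itemID,value: " + line);
            prefs.add(new Preference(Long.parseLong(fields[0].trim()),
                    Long.parseLong(fields[1].trim()), Float.parseFloat(fields[2].trim())));
        }
        return prefs;
    }

    /** Groups tuples by user and packs each group into primitive arrays. */
    static Map<Long, UserPreferences> byUser(List<Preference> prefs) {
        Map<Long, List<Preference>> grouped = new TreeMap<>();
        for (Preference p : prefs) {
            grouped.computeIfAbsent(p.userId, k -> new ArrayList<>()).add(p);
        }
        Map<Long, UserPreferences> packed = new TreeMap<>();
        for (Map.Entry<Long, List<Preference>> e : grouped.entrySet()) {
            List<Preference> list = e.getValue();
            long[] ids = new long[list.size()];
            float[] values = new float[list.size()];
            for (int i = 0; i < list.size(); i++) {
                ids[i] = list.get(i).itemId;
                values[i] = list.get(i).value;
            }
            packed.put(e.getKey(), new UserPreferences(ids, values));
        }
        return packed;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1,101,5.0", "1,102,3.0", "2\t101\t2.0", "");
        Map<Long, UserPreferences> data = byUser(parse(lines));
        System.out.println(data.keySet()); // [1, 2]
        System.out.println(Arrays.toString(data.get(1L).itemIds)); // [101, 102]
    }
}
```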

https://cwiki.apache.org/MAHOUT/recommender-documentation.html has lots more useful information and further links.

Grails recommendations

Lim Chee Kin has been hard at work and released a Grails Mahout-Recommender plugin in December 2011. The plugin is intended to let you evaluate different recommenders without needing to write any code; once you have selected a recommender, it can be enabled using configuration.

We’ll start by looking at the packaged libimseti sample application. It is based on code from chapter 5 of Mahout in Action and uses more than 17 million profile ratings from a Czech dating site to recommend compatible user profiles.

Libimseti

Getting set up

I created a clean Grails 2.0.0 application and installed the plugin (version 0.5.1 at the time of writing) using the commands in Listing 1. The plugin adds some new scripts, including the ability to install a sample application, which we can do with the grails install-libimseti-sample-app command.

grails create-app GroovyMagMahout
cd GroovyMagMahout
grails install-plugin mahout-recommender

Listing 1: application creation and plugin installation

Sample data

The Libimseti sample data is not redistributable, so you’ll need to download the zip from http://www.occamslab.com/petricek/data/libimseti-complete.zip and then extract the .dat files to grails-app/conf.

Configuration

There are two more things we need to do before we can run the Grails application. Firstly, we need to add the settings from Listing 2 to the plugin configuration in Config.groovy. Secondly, we need to give Grails more memory for the hungry algorithm, to prevent an "OutOfMemoryError: GC overhead limit exceeded"; Listing 3 shows how.

mahout.recommender.hasPreference = false
mahout.recommender.data.file = 'ratings.dat'

Listing 2: Additional plugin configuration

export GRAILS_OPTS="-XX:MaxPermSize=256m -Xmx1024M -server"

Listing 3: Increasing the memory allocated to Grails

Libimseti sample usage

When we execute grails run-app and browse to http://localhost:8080/GroovyMagMahout, we will see a single recommender controller listed, as shown in Figure 1.

Figure 1: Application home page

Selecting the controller will bring us to the settings form shown in Figure 2.

Figure 2: Recommender settings

Enter user ID ‘133’ and submit the form. After some time (the algorithm is tuned for better matching rather than performance) you’ll see the recommendations shown in Figure 3, where a higher score means a better match.

Figure 3: Recommendations

What next?

The libimseti sample application is good for proving that the theory works on a reasonably sized data set, but few of us will want to curate our data to match an input file. More realistically, we’ll have an application that has associations between users and items or allows users to rate items.

We’ll build a simple Grails implementation – the source code is available on GitHub from https://github.com/rbramley/GroovyMagMahout

The data model

As a simplification for this exercise, we will use a single preference table. This represents the link (many-to-many join) table between a user and an item, as illustrated in ERD notation in Figure 4.

Figure 4: Sample ERD

This table will have a composite primary key comprising the user and item IDs, plus a value rating the strength of the preference. For the preference we’ll use a 1 to 5 range, as this may be represented by a rating widget (such as the one provided by the Grails Rich UI plugin).

We’ll create a domain class (using grails create-domain-class com.rbramley.mahout.Preference) and specify it as per Listing 4. Note that we’ve used a composite key to satisfy Mahout’s needs. This exposes a few minor Grails issues: the default controller and views generated by grails generate-all com.rbramley.mahout.Preference are not composite-key aware. This shouldn’t affect you unless you want to edit or delete preference records.

package com.rbramley.mahout

import org.apache.commons.lang.builder.HashCodeBuilder

// One row per (user, item) pair: the link table Mahout reads its preferences from.
// Composite keys need Serializable plus equals/hashCode over the key fields.
class Preference implements Serializable {
    long userId
    long itemId
    float prefValue

    static constraints = {
        userId()
        itemId()
        prefValue range: 1.0f..5.0f // the 1 to 5 rating scale described above
    }

    boolean equals(other) {
        if (!(other instanceof Preference)) {
            return false
        }

        other.userId == userId && other.itemId == itemId
    }

    int hashCode() {
        def builder = new HashCodeBuilder()
        builder.append userId
        builder.append itemId
        builder.toHashCode()
    }

    static mapping = {
        id composite: ['userId', 'itemId'] // primary key is (user_id, item_id)
        version false                      // no optimistic-locking column
    }
}

Listing 4: Domain class

We’ll use MySQL for the database, as the Grails plugin supports that Mahout DataModel implementation (Mahout also has Postgres and generic JDBC implementations).

Firstly we need to create the target schema in MySQL (Listing 5).

mysql -u root -p
mysql> create database recommender;
mysql> grant all on recommender.* to recommender@localhost identified by 'mahoutdemo';

Listing 5: MySQL commands

With that done we can uncomment the runtime mysql-connector-java dependency in BuildConfig.groovy and then configure DataSource.groovy accordingly (Listing 6).

    development {
        dataSource {
            driverClassName = "com.mysql.jdbc.Driver"
            dbCreate = "create-drop" // one of 'create', 'create-drop', 'update', 'validate'
            url = "jdbc:mysql://localhost:3306/recommender"
            username = "recommender"
            password = "mahoutdemo"
        }
    }

Listing 6: Development data source configuration

Reconfiguring the plugin

We now need to reconfigure Config.groovy to tell the plugin which recommender and similarity algorithms to use and where to obtain the data from. This is achieved using the settings in Listing 7.


mahout.recommender.mode = 'config'  // 'input', 'config' or 'class'
mahout.recommender.hasPreference = true
mahout.recommender.selected = 1 // user-based
mahout.recommender.similarity = 'PearsonCorrelation'
mahout.recommender.withWeighting = false
mahout.recommender.neighborhood = 2
mahout.recommender.data.model = 'mysql'
mahout.recommender.preference.table = 'preference'
mahout.recommender.preference.valueColumn = 'pref_value'

Listing 7: Plugin configuration

Providing data

We can now run the application, select the new com.rbramley.mahout.PreferenceController and enter some values for our data.

If you enter the data set shown in Figure 5 and then use the recommender controller to obtain recommendations for user ID 1, you should get recommendations for items 104 and 106, as shown in Figure 6.

Figure 5: Sample data values

Figure 6: Recommendations for User Id 1

Alternatively there is a SQL script within the project on GitHub that can be run to seed the preferences table with similar data (based on listing 2.1 from Mahout in Action).

Evaluating recommenders

The Grails plugin features a built-in recommender evaluator based on average difference. In our case we can access it at http://localhost:8080/GroovyMagMahout/recommender/evaluator and click on the ‘Run Evaluator’ link; sample output is shown in Figure 7.
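The idea behind an average-difference evaluation can be sketched in a few lines of Java: hold back a random share of the known preferences, ask the recommender to estimate them from the rest, and average the absolute gaps between estimate and actual value. This is a hypothetical illustration, not the plugin's or Mahout's evaluator; the Estimator interface, the training fraction and the per-user mean baseline used in main are all assumptions.

```java
import java.util.*;

/** Hypothetical sketch of a hold-out, average-absolute-difference evaluation. */
public class EvaluatorSketch {

    /** Estimates a preference from training data; returns NaN when no estimate is possible. */
    interface Estimator {
        double estimate(Map<Long, Map<Long, Float>> training, long userId, long itemId);
    }

    static double averageAbsoluteDifference(Map<Long, Map<Long, Float>> prefs,
                                            double trainingFraction, long seed, Estimator estimator) {
        Random random = new Random(seed);
        Map<Long, Map<Long, Float>> training = new TreeMap<>();
        Map<Long, Map<Long, Float>> test = new TreeMap<>();
        // Split every known preference into training or test data (sorted for repeatability).
        for (Map.Entry<Long, Map<Long, Float>> user : new TreeMap<>(prefs).entrySet()) {
            for (Map.Entry<Long, Float> pref : new TreeMap<>(user.getValue()).entrySet()) {
                Map<Long, Map<Long, Float>> target = random.nextDouble() < trainingFraction ? training : test;
                target.computeIfAbsent(user.getKey(), k -> new TreeMap<>()).put(pref.getKey(), pref.getValue());
            }
        }
        // Compare each held-back preference with the estimate made from the training data.
        double totalDifference = 0;
        int count = 0;
        for (Map.Entry<Long, Map<Long, Float>> user : test.entrySet()) {
            for (Map.Entry<Long, Float> pref : user.getValue().entrySet()) {
                double estimated = estimator.estimate(training, user.getKey(), pref.getKey());
                if (Double.isNaN(estimated)) continue; // no estimate possible for this pair
                totalDifference += Math.abs(estimated - pref.getValue());
                count++;
            }
        }
        return count == 0 ? Double.NaN : totalDifference / count;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Float>> prefs = Map.of(
                1L, Map.of(101L, 5.0f, 102L, 4.0f, 103L, 1.0f, 104L, 4.5f),
                2L, Map.of(101L, 2.0f, 102L, 3.0f, 103L, 4.0f, 104L, 2.5f));
        // Baseline estimator: the user's mean rating in the training data.
        Estimator userMean = (training, userId, itemId) -> {
            Map<Long, Float> mine = training.get(userId);
            if (mine == null || mine.isEmpty()) return Double.NaN;
            double sum = 0;
            for (float v : mine.values()) sum += v;
            return sum / mine.size();
        };
        System.out.println(averageAbsoluteDifference(prefs, 0.7, 42L, userMean));
    }
}
```

A real recommender should beat a baseline like the per-user mean; if it does not, that is a hint to revisit the similarity or neighbourhood settings.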

A lower difference is better, so you may want to experiment with changing the mahout.recommender.similarity property that we set in Listing 7. Valid values are ‘PearsonCorrelation’, ‘EuclideanDistance’, ‘LogLikelihood’ and ‘TanimotoCoefficient’.

Figure 7: Recommender evaluation
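To give a feel for how two of those similarity options differ, here is a hypothetical Java sketch of a Euclidean-distance similarity and the Tanimoto coefficient. The Euclidean version compares rating values on co-rated items and maps the distance d to 1 / (1 + d), one common mapping (Mahout's exact formula may differ between versions), while Tanimoto ignores values entirely and only compares which items each user has expressed a preference for. The method names and sample users are illustrative.

```java
import java.util.*;

/** Hypothetical sketch of two user-similarity measures; not Mahout's classes. */
public class SimilaritySketch {

    /** Euclidean distance over co-rated items, mapped into (0, 1] with 1 / (1 + d). */
    static double euclideanSimilarity(Map<Long, Float> a, Map<Long, Float> b) {
        double sumSq = 0;
        int n = 0;
        for (Map.Entry<Long, Float> e : a.entrySet()) {
            Float other = b.get(e.getKey());
            if (other == null) continue; // only items both users rated
            double d = e.getValue() - other;
            sumSq += d * d;
            n++;
        }
        return n == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    /** Tanimoto (Jaccard) coefficient: size of intersection divided by size of union. */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? Double.NaN : (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        Map<Long, Float> userA = Map.of(101L, 5.0f, 102L, 3.0f, 103L, 1.0f);
        Map<Long, Float> userB = Map.of(101L, 4.0f, 102L, 3.0f, 104L, 2.0f);
        System.out.println(euclideanSimilarity(userA, userB)); // 0.5
        System.out.println(tanimoto(userA.keySet(), userB.keySet())); // 0.5
    }
}
```

Because Tanimoto only looks at which items are present, it suits data without meaningful preference values, whereas the Euclidean and Pearson measures need ratings to compare.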

Likewise you may want to modify other properties such as applying weighting or adjusting the size of the neighbourhood – in any case please refer to the configuration section of the plugin manual at http://limcheekin.github.com/mahout-recommender/docs/manual/guide/configuration.html

Summary

This article has introduced Apache Mahout, an open source scalable machine learning framework, and shown how you can utilise it to provide personal recommendations within a Grails application. We’ve seen custom recommendations for the libimseti sample data files and recommendations based on user similarity on top of a Grails domain class. In practice these recommenders would ideally be invoked asynchronously, particularly for large data sets; this could be achieved using AJAX techniques.
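On the server side, one way to support that AJAX approach is to start the computation in the background and let the page poll for the result. The following Java sketch is an assumption of how this could look using CompletableFuture; the class, its polling contract and the stand-in recommender function are all hypothetical.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.LongFunction;

/** Hypothetical sketch: compute recommendations off the request thread and poll for them. */
public class AsyncRecommendations {
    private final Map<Long, CompletableFuture<List<Long>>> jobs = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final LongFunction<List<Long>> recommender;

    AsyncRecommendations(LongFunction<List<Long>> recommender) {
        this.recommender = recommender;
    }

    /** Starts a background computation for the user (or reuses one in flight) and returns at once. */
    CompletableFuture<List<Long>> request(long userId) {
        return jobs.computeIfAbsent(userId,
                id -> CompletableFuture.supplyAsync(() -> recommender.apply(id), pool));
    }

    /** What a polling AJAX endpoint could return: the recommendations if ready, otherwise empty. */
    Optional<List<Long>> poll(long userId) {
        CompletableFuture<List<Long>> job = jobs.get(userId);
        if (job == null || !job.isDone() || job.isCompletedExceptionally()) return Optional.empty();
        return Optional.of(job.join());
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        // Stand-in for a slow recommender call; the fixed result is purely illustrative.
        AsyncRecommendations service = new AsyncRecommendations(userId -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return List.of(104L, 106L);
        });
        service.request(1L);
        System.out.println("ready immediately? " + service.poll(1L).isPresent());
        System.out.println(service.request(1L).join()); // [104, 106]
        service.shutdown();
    }
}
```

Caching the future per user means repeated AJAX calls while a computation is running do not start duplicate work.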

Have fun integrating stylised recommendations into your application, but remember it’s good to allow the users to give feedback on the relevancy of the recommendations!

Published at DZone with permission of Robin Bramley, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)