Enthusiastic Java, Scala and Haskell programmer with a long history of large and successful systems. Known author, speaker, motivator and coach. Jan is a DZone MVB and is not an employee of DZone and has posted 26 posts at DZone.

How to Persist Instances in Cassandra Using Hector and Scala

12.15.2012

I am going to show you how I went about persisting instances in Cassandra, using Hector, through a single insert(instance) function in Scala. To work out how these instances are going to be persisted, I shall use type classes. More than just the code, I will explain and show every step of my design.

Crash course in Cassandra

Cassandra is a schema-free database; to understand it, here are the most important concepts and their loose mapping to their relational-database counterparts:

  • keyspace – schema; database
  • column family – table, with key and rows
  • key – primary key
  • row – collection of columns; the rows in the column family may have completely different columns
  • column – column
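
One way to picture this mapping is as nested maps. The sketch below is only a mental model built from plain Scala collections; it is not how Cassandra or Hector represent data, and the type aliases, keys and values are made up for illustration.

```scala
object DataModelSketch {
  // Hypothetical aliases, for illustration only.
  type Columns      = Map[String, String]       // column name -> value
  type ColumnFamily = Map[String, Columns]      // row key -> columns of that row
  type Keyspace     = Map[String, ColumnFamily] // column family name -> column family

  val keyspace: Keyspace = Map(
    "user" -> Map(
      "key-1" -> Map("username" -> "janm", "firstName" -> "Jan"),
      // A second row in the same column family with different columns.
      "key-2" -> Map("username" -> "guest")
    )
  )

  // Names of the columns stored under the given key, or an empty set.
  def columnNames(columnFamily: String, key: String): Set[String] =
    keyspace.get(columnFamily)
      .flatMap(_.get(key))
      .map(_.keySet)
      .getOrElse(Set.empty[String])
}
```

Note how the two rows of the same column family carry different sets of columns, which is the point of the "row" entry in the list above.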

When inserting data into Cassandra, we must be able to serialise the data to be inserted. To do that, we must know the key type and the names and types of all columns.

Back to Scala

Let’s turn back to our insert(instance) function. Intuitively, we would understand what should happen if we were to insert a simple case class:

case class User(username: String,
                password: String,
                firstName: String,
                lastName: String,
                id: UUID)

Calling insert(User("janm", "yeah right, like I'd tell ya!", "Jan", "Machacek", UUID.randomUUID)), we would expect a new row whose key is the value of id, with four columns (username -> janm, password -> ..., …) in the column family user.

It seems that I need to be able to do the following (ignoring setting the column values, which I shall leave as an exercise for the reader):

  • extract column family, given the instance
  • extract the key, given the instance
  • obtain the serializer for the extracted key
  • set the columns, given the extracted key, column family and Mutator

Let’s turn this into Scala code:

package object hector {
  type KeySerializer[K] = () => Serializer[K]
  type KeyExtractor[A, K] = A => K
  type ColumnFamilyExtractor[A] = A => String
  type ColumnExtractor[A, K] = A => (K, String, Mutator[K]) => Unit
}

Now that I have these types, let’s have them given to the insert function implicitly:

trait Hector {

  def keyspace: Keyspace

  def insert[A, Key](instance: A)
               (implicit keySerializer: KeySerializer[Key],
                keyExtractor: KeyExtractor[A, Key],
                columnFamilyExtractor: ColumnFamilyExtractor[A],
                columnExtractor: ColumnExtractor[A, Key]) {
    val mutator = HFactory.createMutator(keyspace, keySerializer())
    val key = keyExtractor(instance)
    val columnFamily = columnFamilyExtractor(instance)
    columnExtractor(instance)(key, columnFamily, mutator)

    mutator.execute()
  }

}

The insert function in the Hector trait uses the Hector Java API for Cassandra; the Keyspace instance is a reference to the keyspace in Cassandra.
In the first line of the insert function, we create a Mutator[K] for the keyspace, supplying the Serializer[K] we obtained from the keySerializer. Next, we use the keyExtractor to extract the value of the key K from the instance; then we extract the name of the column family from the same instance. Finally, we obtain a function from the columnExtractor and apply it to the key, the column family and the mutator; this is where addInsertion is called on the Mutator[K]. We complete the body by executing the queued insertions.

Simples!
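
If you want to try the flow of insert without a Cassandra cluster, here is a minimal sketch under a few assumptions. RecordingMutator is a made-up stand-in for Hector's Mutator[K] that only records what it is given, the key serializer is left out, and Note is a hypothetical case class. I have also swapped the function type aliases for small traits here, so that the implicit instances are not picked up as implicit conversions as well.

```scala
import java.util.UUID
import scala.collection.mutable.ListBuffer

object InsertSketch {

  // Hypothetical stand-in for Hector's Mutator[K]: it only records insertions.
  final class RecordingMutator[K] {
    val queued: ListBuffer[(K, String, String, String)] = ListBuffer.empty
    def addInsertion(key: K, columnFamily: String, name: String, value: String): Unit = {
      queued += ((key, columnFamily, name, value))
      ()
    }
  }

  // Trait-based variants of the type classes (an assumption of this sketch).
  trait KeyExtractor[A, K] { def key(instance: A): K }
  trait ColumnFamilyExtractor[A] { def columnFamily(instance: A): String }
  trait ColumnExtractor[A, K] {
    def addColumns(instance: A, key: K, columnFamily: String,
                   mutator: RecordingMutator[K]): Unit
  }

  // Same sequence of steps as the insert function in the Hector trait.
  def insert[A, K](instance: A)(implicit
      keyExtractor: KeyExtractor[A, K],
      columnFamilyExtractor: ColumnFamilyExtractor[A],
      columnExtractor: ColumnExtractor[A, K]): RecordingMutator[K] = {
    val mutator = new RecordingMutator[K]
    val key = keyExtractor.key(instance)
    val columnFamily = columnFamilyExtractor.columnFamily(instance)
    columnExtractor.addColumns(instance, key, columnFamily, mutator)
    mutator // the real insert would call mutator.execute() at this point
  }

  // Hypothetical type to insert; its instances live in the companion object.
  case class Note(id: UUID, text: String)

  object Note {
    implicit val noteKey: KeyExtractor[Note, UUID] =
      new KeyExtractor[Note, UUID] {
        def key(instance: Note): UUID = instance.id
      }
    implicit val noteFamily: ColumnFamilyExtractor[Note] =
      new ColumnFamilyExtractor[Note] {
        def columnFamily(instance: Note): String = "note"
      }
    implicit val noteColumns: ColumnExtractor[Note, UUID] =
      new ColumnExtractor[Note, UUID] {
        def addColumns(instance: Note, key: UUID, columnFamily: String,
                       mutator: RecordingMutator[UUID]): Unit =
          mutator.addInsertion(key, columnFamily, "text", instance.text)
      }
  }
}
```

Calling InsertSketch.insert(InsertSketch.Note(UUID.randomUUID, "hello")) should leave a single queued insertion in the returned mutator; the key type is inferred from the KeyExtractor instance the compiler finds.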

The type classes

To use the Hector trait to insert some instances, I need instances of the type classes KeySerializer[K], KeyExtractor[A, K], ColumnFamilyExtractor[A] and ColumnExtractor[A, K] for the types that I am inserting. Eh–what?

case class User(username: String,
                password: String,
                firstName: String,
                lastName: String,
                id: UUID) 

object Main extends App with Hector {
  def keyspace = // connect to the keyspace

  insert(User("janm", 
              "yeah right, like I'd tell ya!", 
              "Jan", 
              "Machacek", 
              UUID.randomUUID))
}

Without the instances of the type classes, the code will not compile: there are no implicit values that are assignable to the implicit parameters of the insert function.

Home for the type classes

Before we jump into implementing the instances of the type classes, we must decide where they “live”. Because we want our code to be flexible, a good place for the type class instances is in traits that you can mix in wherever you use the Hector trait. After all, in one case you may be inserting a case class with an id: UUID field; in another, a class with an @Id key: String getter. I would like to have the flexibility to write:

object Main extends App 
            with Hector 
            with UUIDKeySerializer 
            with UUIDIdKeyExtractor 
            with ... {
}

Or, in the second case:

object Main extends App 
            with Hector 
            with StringKeySerializer 
            with StringAnnotatedKeyExtractor 
            with ... {
}

So, let’s implement instances of these type classes.

Instance of the KeySerializer[K]

We shall implement a KeySerializer[K] for UUID as K:


trait UUIDKeySerializer {

  implicit object UUIDKeySerializer extends KeySerializer[UUID] {
    def apply() = UUIDSerializer.get()
  }

}

So, for a key of type UUID, the compiler can find an implicit value of KeySerializer[UUID] and supply it to the call of the insert function. The next task is to be able to extract the value of the key.

Instance of the KeyExtractor[A, K]

We are calling insert(User(..., UUID.randomUUID)), and the key type is UUID. We shall implement an instance of KeyExtractor[A <: {def id: K}, K], which extracts a key of type K from some type A that has a getter called id returning K. We then further specialise the type class instance into KeyExtractor[A <: {def id: UUID}, UUID] to match our case class.

trait IdKeyExtractor {

  class IdKeyExtractor[A <: {def id: K}, K] 
    extends KeyExtractor[A, K] {

    def apply(value: A) = value.id
  }

}

trait UUIDIdKeyExtractor extends IdKeyExtractor {

  implicit def UUIDIdKeyExtractor[A <: {def id: UUID}] = 
    new IdKeyExtractor[A, UUID]

}

Excellent! When I call insert(User(..., UUID.randomUUID)), the compiler will find that the only possible type class instance for the KeyExtractor[A, K] is UUIDIdKeyExtractor[User], which is an IdKeyExtractor[User, UUID] (meaning that the type of the K is now UUID). This in turn means that the only applicable type class instance for KeySerializer[K] is the UUIDKeySerializer. Onwards!

Instance of the ColumnFamilyExtractor[A]

Before we can insert rows (with keys and columns), we must know the name of the column family. For simplicity, let’s take an approach similar to JPA and use the [simple] type name of the instances we’re inserting as the column family name. In our case, we’re inserting instances of User, so the column family should be user. The instance of the ColumnFamilyExtractor[A] is therefore:

trait TypeNameColumnFamilyExtractor {

  implicit object TypeNameColumnFamilyExtractor 
    extends ColumnFamilyExtractor[AnyRef] {
    def apply(v1: AnyRef) = v1.getClass.getSimpleName.toLowerCase
  }

}

So, the compiler knows how to get its hands on instances of KeyExtractor[A, K], KeySerializer[K] and ColumnFamilyExtractor[A], where A is User and K is UUID. Now we have to set the column values.
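
It may not be obvious why an instance declared for AnyRef is accepted when A is User. The sketch below, which needs nothing from Hector, shows the reason: ColumnFamilyExtractor[A] is just a function A => String, and functions are contravariant in their argument type. The names byTypeName and forUser, and the cut-down User, are mine, for illustration only.

```scala
object ColumnFamilySketch {
  type ColumnFamilyExtractor[A] = A => String

  // Cut-down, hypothetical version of the User case class.
  case class User(username: String)

  // Same body as the TypeNameColumnFamilyExtractor instance, as a plain value.
  val byTypeName: ColumnFamilyExtractor[AnyRef] =
    v => v.getClass.getSimpleName.toLowerCase

  // Function1 is contravariant in its argument, so an extractor for AnyRef
  // can be used wherever an extractor for User is expected.
  val forUser: ColumnFamilyExtractor[User] = byTypeName

  // Column family name derived from the runtime class of the instance.
  val userColumnFamily: String = forUser(User("janm"))
}
```

Keep in mind that getSimpleName works on the runtime class, so anonymous or oddly nested classes may produce names you do not want as column family names.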

Instance of the ColumnExtractor[A, K]

I will simply outline the implementation and leave the details to the curious readers; not because the implementation is difficult, but because this blog post is getting rather too long. Anyway, a skeleton of an instance of ColumnExtractor[A, K] for case classes is:

trait ProductColumnExtractor {

  implicit def ProductColumnExtractor[K] = new ColumnExtractor[Product, K] {
    def apply(value: Product) = {
      (key: K, columnFamily: String, mutator: Mutator[K]) =>
        // TODO: extract the values and serialize them
        for-all-fields {
          val fieldValue = ///
          val fieldName = ///

          // as an example for String columns, you could call
          mutator.addInsertion(key, columnFamily,
            HFactory.createStringColumn(fieldName, fieldValue))
        }
        ()
    }
  }

}
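
For readers who want a head start on the for-all-fields part, here is one possible sketch of pairing field names with values. It assumes Scala 2.13 or later, where Product.productElementNames is available, and it renders every value with toString, which is a simplification; stringColumns and the cut-down User are hypothetical names of mine.

```scala
object ProductColumnsSketch {

  // Pairs each field name of a case class instance with its value rendered
  // as a String. Assumes Scala 2.13+ (Product.productElementNames) and
  // non-null field values.
  def stringColumns(value: Product): List[(String, String)] =
    value.productElementNames
      .zip(value.productIterator)
      .map { case (fieldName, fieldValue) => fieldName -> fieldValue.toString }
      .toList

  // Cut-down, hypothetical version of the User case class.
  case class User(username: String, firstName: String, lastName: String)
}
```

Each pair could then be handed to mutator.addInsertion as in the skeleton above. This sketch returns every field, so a key field such as id would need to be filtered out if you do not want it stored as a column too.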

Usage

This completes the instances of the type classes I need to insert the User instances; all I need to do is to mix in the appropriate traits that contain the correct type class instances.

case class User(username: String,
                password: String,
                firstName: String,
                lastName: String,
                id: UUID) 

object Main extends App 
  with Hector 
  with UUIDIdKeyExtractor 
  with UUIDKeySerializer
  with TypeNameColumnFamilyExtractor 
  with ProductColumnExtractor {

  def keyspace = // connect to the keyspace

  insert(User("janm", 
              "yeah right, like I'd tell ya!", 
              "Jan", 
              "Machacek", 
              UUID.randomUUID))
}

What has all this achieved? Well, I have compile-time verification of all types I am inserting; and if I decide to insert a value for which I have no instances of the type classes, I will get a compiler error! This is much better than discovering that something fails at runtime.

Parting gift

Naturally, this code will make its way to my GitHub account at https://github.com/janm399 in the next few days, but I shall give you an example of where I have used this very code in an Akka actor (with the Configuration Akka pattern):

class UserActor extends Actor 
  with Configured 
  with Hector 
  with UUIDIdKeyExtractor 
  with UUIDKeySerializer
  with TypeNameColumnFamilyExtractor 
  with ProductColumnExtractor {

  def keyspace = configured[Keyspace]

  protected def receive = {
    case Register(user) =>
      // business logic left to readers' imagination!

      insert(user)
    
  }
}

Finally, because I have many actors that want the same instances of the type classes, I have a DefaultHector trait:

trait DefaultHector extends Hector
  with UUIDIdKeyExtractor 
  with UUIDKeySerializer
  with TypeNameColumnFamilyExtractor 
  with ProductColumnExtractor

And it is the DefaultHector trait that I mix into my actors… But that’s for another blog post!


Published at DZone with permission of Jan Machacek, author and DZone MVB. (source)
