
Max De Marzi is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies. He can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby gem, a REST API wrapper for the Neo4j graph database. He is addicted to learning new things, loves a challenge, and enjoys finding pragmatic solutions. Max is very easy to work with, focuses under pressure and has the patience of a rock. Max is a DZone MVB, is not an employee of DZone, and has posted 60 posts at DZone.

Neo4j and Gatling Sitting in a Tree, Performance T-E-S-T-ING

02.15.2013


I was introduced to the open-source performance testing tool Gatling a few months ago by Dustin Barnes and fell in love with it. It has an easy-to-use DSL, and even though I don’t know a lick of Scala, I was able to figure out how to use it. It creates pretty awesome graphics and takes care of a lot of work for you behind the scenes. The project has great documentation and a pretty active Google group where newbies and questions are welcome.

It ships with Scala, so all you need to do is create your tests and run them from the command line. I’ll show you how to do a few basic things: first we’ll test that everything is working, then we’ll create nodes and relationships, and finally we’ll query those nodes.

We start things off with the import statements:

import com.excilys.ebi.gatling.core.Predef._
import com.excilys.ebi.gatling.http.Predef._
import akka.util.duration._
import bootstrap._

Then we start right off with our simulation. For this first test, we are just going to get the root node via the REST API. We specify our Neo4j server; in this case I am testing on localhost (you’ll want to run your test code and Neo4j server on different machines when doing this for real). Next we specify that we accept JSON in the response. In our test scenario, for a duration of 10 seconds, we’ll get “/db/data/node/0” and check that Neo4j returns the HTTP status code 200 (meaning everything is OK). We’ll pause between 0 and 5 milliseconds between calls to simulate actual users, and in our setup we’ll specify that we want 100 users.

class GetRoot extends Simulation {
  val httpConf = httpConfig
    .baseURL("http://localhost:7474")
    .acceptHeader("application/json")

  val scn = scenario("Get Root")
   .during(10) {
     exec(
       http("get root node")
         .get("/db/data/node/0")
         .check(status.is(200)))
     .pause(0 milliseconds, 5 milliseconds)
   }

  setUp(
    scn.users(100).protocolConfig(httpConf)
  )
}

We’ll call this file “GetRoot.scala” and put it in the user-files/simulations/neo4j directory:

gatling-charts-highcharts-1.4.0/user-files/simulations/neo4j/

We can run our code with:

~$ bin/gatling.sh

We’ll get a prompt asking us which test we want to run:

GATLING_HOME is set to /Users/maxdemarzi/Projects/gatling-charts-highcharts-1.4.0
Choose a simulation number:
     [0] GetRoot
     [1] advanced.AdvancedExampleSimulation
     [2] basic.BasicExampleSimulation

Choose the number next to GetRoot and press enter.

Next you’ll get prompted for an id, or you can just go with the default by pressing enter again:

Select simulation id (default is 'getroot'). Accepted characters are a-z, A-Z, 0-9, - and _

If you want to add a description, you can:

Select run description (optional)

Finally it starts for real:

================================================================================
2013-02-14 17:18:03                                                  10s elapsed
---- Get Root ------------------------------------------------------------------
Users  : [#################################################################]100%
          waiting:0     / running:0     / done:100  
---- Requests ------------------------------------------------------------------
> get root node                                              OK=58457  KO=0     
================================================================================

Simulation finished.
Simulation successful.
Generating reports...
Reports generated in 0s.
Please open the following file : /Users/maxdemarzi/Projects/gatling-charts-highcharts-1.4.0/results/getroot-20130214171753/index.html
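
As a rough back-of-the-envelope check on the console summary above, the OK count can be turned into a throughput figure. This is only a sketch: the object and value names are made up for illustration, and it assumes all 100 users were active for the full 10 seconds, which ignores startup and shutdown time.

```scala
object ThroughputSketch {
  val okRequests = 58457    // OK count from the console summary
  val durationSeconds = 10  // length of the during(10) loop
  val users = 100           // users configured in setUp

  // Overall rate, assuming the whole run took about durationSeconds.
  val requestsPerSecond: Double = okRequests.toDouble / durationSeconds

  // Average rate each simulated user achieved.
  val perUserPerSecond: Double = requestsPerSecond / users

  def main(args: Array[String]): Unit =
    println(f"$requestsPerSecond%.1f req/s overall, $perUserPerSecond%.1f req/s per user")
}
```

That works out to a little under 6,000 requests per second overall, which should be in the same ballpark as the requests-per-second chart in the generated report.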

The progress bar measures the number of users who have completed their task, not how much of the simulation is done, so don’t worry if it stays at zero for a long while and then jumps quickly to 100%. You can also see the OK (requests passed) and KO (requests failed) counts. Lastly, it creates a great HTML-based report for us. Let’s take a look:

[Screenshot: Gatling report]

Here you can see statistics about the response times as well as the requests per second. So that’s great, we can get the root node, but that’s not very interesting. Let’s create some nodes:

class CreateNodes extends Simulation {
  val httpConf = httpConfig
    .baseURL("http://localhost:7474")
    .acceptHeader("application/json")

  val createNode = """{"query": "create me"}"""

  val scn = scenario("Create Nodes")
    .repeat(1000) {
    exec(
      http("create node")
        .post("/db/data/cypher")
        .body(createNode)
        .asJSON
        .check(status.is(200)))
      .pause(0 milliseconds, 5 milliseconds)
  }


  setUp(
    scn.users(100).ramp(10).protocolConfig(httpConf)
  )
}

In this case, we are setting 100 users to create 1000 nodes each, with a ramp time of 10 seconds. We’ll run this simulation just like before, but choose CreateNodes. Once it’s done, take a look at the report, and scroll down a bit to see the chart of the Number of Requests per Second:

[Screenshot: Number of Requests per Second chart]

You can see the number of users ramp up over the first 10 seconds and fade at the end. Let’s go ahead and connect some of these nodes together.

We’ll add JSONObject to the import statements, and since I want to see which nodes we are linking together, we’ll print the details of each request. I am randomly choosing two ids and passing them to a Cypher query to create the relationships:

import com.excilys.ebi.gatling.core.Predef._
import com.excilys.ebi.gatling.http.Predef._
import akka.util.duration._
import bootstrap._
import util.parsing.json.JSONObject


class CreateRelationships extends Simulation {
  val httpConf = httpConfig
    .baseURL("http://localhost:7474")
    .acceptHeader("application/json")
    .requestInfoExtractor(request => {
      println(request.getStringData)
      Nil
    })


  val rnd = new scala.util.Random
  val chooseRandomNodes = exec((session) => {
    session.setAttribute("params", JSONObject(Map("id1" -> rnd.nextInt(100000),
                                                  "id2" -> rnd.nextInt(100000))).toString())
  })

  val createRelationship = """START node1=node({id1}), node2=node({id2}) CREATE UNIQUE node1-[:KNOWS]->node2"""
  val cypherQuery = """{"query": "%s", "params": %s }""".format(createRelationship, "${params}")


  val scn = scenario("Create Relationships")
    .during(30) {
    exec(chooseRandomNodes)
      .exec(
        http("create relationships")
          .post("/db/data/cypher")
          .header("X-Stream", "true")
          .body(cypherQuery)
          .asJSON
          .check(status.is(200)))
      .pause(0 milliseconds, 5 milliseconds)
  }

  setUp(
    scn.users(100).ramp(10).protocolConfig(httpConf)
  )
}

When you run this, you’ll see a stream of the request bodies we sent in our POST requests:

{"query": "START node1=node({id1}), node2=node({id2}) CREATE UNIQUE node1-[:KNOWS]->node2", "params": {"id1" : 98468, "id2" : 20147} }
{"query": "START node1=node({id1}), node2=node({id2}) CREATE UNIQUE node1-[:KNOWS]->node2", "params": {"id1" : 83557, "id2" : 26633} }
{"query": "START node1=node({id1}), node2=node({id2}) CREATE UNIQUE node1-[:KNOWS]->node2", "params": {"id1" : 22386, "id2" : 99139} }
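
The request bodies above are built in two steps: format() fills in the query once, and the ${params} placeholder is filled in per request from the session. A minimal standard-library sketch of that idea follows. The render and paramsJson helpers are hypothetical stand-ins written for illustration, not Gatling's actual implementation.

```scala
object PayloadSketch {
  // Cypher statement copied from the simulation above.
  val createRelationship =
    "START node1=node({id1}), node2=node({id2}) CREATE UNIQUE node1-[:KNOWS]->node2"

  // Step 1 (once, when the simulation is built): format() fills in the query
  // and leaves a literal ${params} placeholder in the body.
  val template: String =
    """{"query": "%s", "params": %s }""".format(createRelationship, "${params}")

  // Hypothetical helper: the JSON text stored under the "params" session key.
  def paramsJson(id1: Int, id2: Int): String =
    s"""{"id1" : $id1, "id2" : $id2}"""

  // Step 2 (per request): hypothetical stand-in for the session substitution,
  // replacing each ${key} with the matching session value.
  def render(body: String, session: Map[String, String]): String =
    session.foldLeft(body) { case (acc, (key, value)) =>
      acc.replace("${" + key + "}", value)
    }

  def main(args: Array[String]): Unit =
    println(render(template, Map("params" -> paramsJson(98468, 20147))))
}
```

Run with the ids 98468 and 20147, the sketch reproduces the first request body in the stream above.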

You can turn this request logging off, but I just wanted to make sure the ids were random, and it helps when debugging. Now we can query the graph. For this next simulation, I want to see the answers returned from Neo4j, and I want to see the nodes related to 10 random nodes passed in as a JSON array. Notice it’s a bit different from before, and we are also checking to see if we got “data” back in our response.

import com.excilys.ebi.gatling.core.Predef._
import com.excilys.ebi.gatling.http.Predef._
import akka.util.duration._
import bootstrap._
import util.parsing.json.JSONArray


class QueryGraph extends Simulation {
  val httpConf = httpConfig
    .baseURL("http://localhost:7474")
    .acceptHeader("application/json")
    .responseInfoExtractor(response => {
      println(response.getResponseBody)
      Nil
    })
    .disableResponseChunksDiscarding

  val rnd = new scala.util.Random
  val nodeRange = 1 to 100000
  val chooseRandomNodes = exec((session) => {
    session.setAttribute("node_ids", JSONArray(List.fill(10)(nodeRange(rnd.nextInt(nodeRange.length)))).toString())
  })

  val getNodes = """START nodes=node({ids}) MATCH nodes -[:KNOWS]-> other_nodes RETURN ID(other_nodes)"""
  val cypherQuery = """{"query": "%s", "params": {"ids": %s}}""".format(getNodes, "${node_ids}")

  val scn = scenario("Query Graph")
    .during(30) {
    exec(chooseRandomNodes)
      .exec(
        http("query graph")
          .post("/db/data/cypher")
          .header("X-Stream", "true")
          .body(cypherQuery)
          .asJSON
          .check(status.is(200))
          .check(jsonPath("data")))
      .pause(0 milliseconds, 5 milliseconds)
  }

  setUp(
    scn.users(100).ramp(10).protocolConfig(httpConf)
  )
}
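
The random id selection in chooseRandomNodes can also be sketched with only the standard library, which may be handy if scala.util.parsing.json is not available in your Scala version. This is an illustrative sketch, not the simulation's own code: the object and method names are made up, and the upper bound assumes the 100 users x 1000 nodes created by the CreateNodes simulation.

```scala
import scala.util.Random

object NodeIdsSketch {
  // 100 users x 1000 created nodes each, per the CreateNodes simulation.
  val nodeCount: Int = 100 * 1000

  // Picks `howMany` ids in 1..nodeCount (duplicates are possible) and
  // renders them as a JSON array of numbers.
  def randomIdsJson(rnd: Random, howMany: Int = 10): String =
    List.fill(howMany)(1 + rnd.nextInt(nodeCount)).mkString("[", ", ", "]")

  def main(args: Array[String]): Unit =
    println(randomIdsJson(new Random()))
}
```

The resulting string has the same shape as the value stored under "node_ids" in the session, so it could be dropped into the "ids" parameter of the Cypher request body.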

If we take a look at the details tab for this simulation we see a small spike in the middle:

[Screenshot: Gatling details tab]

This is a tell-tale sign of a JVM garbage collection taking place, and we may want to look into that. Edit your neo4j/conf/neo4j-wrapper.conf file and uncomment the garbage collection logging line, then add timestamps to gain better visibility into the issue:

# Uncomment the following line to enable garbage collection logging
wrapper.java.additional.4=-Xloggc:data/log/neo4j-gc.log
wrapper.java.additional.5=-XX:+PrintGCDateStamps

Neo4j performance tuning deserves its own blog post, but at least now you have a great way of testing your performance as you tweak JVM, cache, hardware, load balancing, and other parameters. Don’t forget that while testing Neo4j directly is pretty cool, you can also use Gatling to test your whole web application and measure end-to-end performance.


 

Published at DZone with permission of Max De Marzi, author and DZone MVB. (source)
