
Max De Marzi is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies. He can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby Gem, a REST API wrapper for the Neo4j Graph Database. He is addicted to learning new things, loves a challenge and enjoys finding pragmatic solutions. Max is very easy to work with, focuses under pressure and has the patience of a rock. Max is a DZone MVB, is not an employee of DZone, and has posted 60 posts at DZone.

Visualizing a Set of Hiveplots with Neo4j

03.29.2012



What should a graph look like and how can I tell two graphs apart?


These are questions Martin Krzywinski (Genome Sciences Center, Vancouver, BC) has been asking. Take a look at the picture below:

It’s the same graph, the same data, visualized 8 different ways. Which is the right way? What advantage does one layout give over the other? Can you tell it’s the same network? I can’t.

Eight layouts might be too much, so let’s just look at one on the next picture:

Martin took the spring embedded visualization and tweaked it around. Can you tell it’s the same graph, the same data underneath? I can’t.

To tackle this problem, Martin invented the Hive Plot, a perceptually uniform and scalable layout visualization for network visual analytics.

If you want to learn more about Hive Plots, take a look at his website and this presentation (it is quite large at 20 MB). I cannot do it justice in this short blog post and, in all honesty, haven’t had the time to study it properly.

Today I just want to give you a little taste of Hive Plots. I am going to visualize the GitHub graphs of nine languages you might not have heard of: Boo, Dylan, Factor, Gosu, Mirah, Nemerle, Nu, Parrot, Self. I’m not going to show you how to create the graph this time, because we are using real data. You can take a look at it in the data folder on GitHub.

The graph is basically: (Language)–(Repository)–(User). There are two relationships between Repository and User, wrote and forked.

I’ll show you how to get the data out and into our visualization.

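require 'neography'

# Returns a map of repository name => names of the users who wrote it,
# for every repository of the given language.
# The Gremlin script finds the language node, steps to its repositories,
# and fills the map m with the 'wrote' users of each one.
def wroterepos(language)
  neo = Neography::Rest.new
  neo.execute_script("m = [:]
                      g.V.filter{it.type == 'language' && it.name == '#{language}'}
                       .in.transform{m[it.name] = it.in('wrote').gather{it.name}.next()}
                       .iterate()
                      m")
end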

We do the same thing for forked. This may seem a bit strange, but what I am doing is roughly the Gremlin equivalent of a SQL LEFT OUTER JOIN.

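# Same query as wroterepos, but following the 'forked' relationship:
# returns a map of repository name => names of the users who forked it.
def forkedrepos(language)
  neo = Neography::Rest.new
  neo.execute_script("m = [:]
                      g.V.filter{it.type == 'language' && it.name == '#{language}'}
                       .in.transform{m[it.name] = it.in('forked').gather{it.name}.next()}
                       .iterate()
                      m")
end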
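To make the LEFT OUTER JOIN comparison concrete, here is a minimal plain-Ruby sketch of the shape these two functions are described as returning. It does not talk to Neo4j: the edge list, the repository names and the `users_by_repo` helper are all hypothetical, made up for illustration. The point is only that every repository keeps its key, even when nobody forked it.

```ruby
# Hypothetical in-memory data, standing in for the graph:
# each edge is [user, relationship, repository].
edges = [
  ["alice", "wrote",  "boo-lang"],
  ["bob",   "forked", "boo-lang"],
  ["carol", "wrote",  "boo-tools"]
]
repo_names = ["boo-lang", "boo-tools"]

# Return { repository => [users] } for one relationship type.
# Every repository gets a key, even when no user matches,
# which is the LEFT OUTER JOIN-like behaviour.
def users_by_repo(repo_names, edges, relationship)
  repo_names.each_with_object({}) do |repo, map|
    map[repo] = edges
      .select { |_user, rel, target| rel == relationship && target == repo }
      .map(&:first)
  end
end

wrote  = users_by_repo(repo_names, edges, "wrote")
forked = users_by_repo(repo_names, edges, "forked")

p wrote   # {"boo-lang"=>["alice"], "boo-tools"=>["carol"]}
p forked  # {"boo-lang"=>["bob"], "boo-tools"=>[]}
```

Note that "boo-tools" still shows up in the forked map with an empty list, just as a LEFT OUTER JOIN keeps rows that have no match on the right-hand side.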

Now we do some Ruby magic to put our data into the JSON format the visualization wants.

require 'sinatra'
require 'json'

get '/hive/:name' do
  repos        = []
  writers      = []
  forkers      = []
  temp_forkers = []
  temp_writers = []

  # One node per repository; the first 'wrote' user is treated as its writer.
  wroterepos(params[:name]).each_pair do |key, value|
    repos << {"name" => key, "imports" => value, "node_type" => "repo"}
    temp_writers << { "name" => value[0] }
  end

  # Add the forkers to each repository and to its writer.
  # This relies on both queries returning the repositories in the same order.
  i = 0
  forkedrepos(params[:name]).each_pair do |key, value|
    repos[i]["imports"] = repos[i]["imports"] + value
    temp_writers[i]["imports"] = value
    temp_forkers << value
    i += 1
  end

  # Merge writers who own several repositories into a single node each.
  temp_writers.group_by {|writer| writer["name"]}.each do |w, f|
    writers << {"name" => w,
                "imports" => f.collect{|y| y["imports"]}.flatten.uniq,
                "node_type" => "writer"}
  end

  # Forkers who are not also writers get their own nodes.
  temp_forkers.flatten.uniq.delete_if{|x| writers.collect{|y| y["name"]}.include?(x)}.each do |f|
    forkers << {"name" => f,
                "imports" => [],
                "node_type" => "forker"}
  end

  (repos + writers + forkers).to_json
end
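The route above pairs the two query results by position. As a sketch of an alternative, the same node list can be built by looking up each repository's forkers by name, which also makes the logic easy to try without a database. The `hive_nodes` function and the sample data are hypothetical, not part of the original code, and unlike the route it skips repositories that have no writer instead of creating a writer with a nil name.

```ruby
require "json"

# Build the hive plot node list from two hashes shaped like the results
# of wroterepos and forkedrepos: { repo_name => [user_names] }.
def hive_nodes(wrote, forked)
  repos = []
  forkers_by_writer = Hash.new { |hash, name| hash[name] = [] }
  all_forkers = []

  wrote.each do |repo, authors|
    repo_forkers = forked.fetch(repo, []) # look up by name, not by position
    repos << { "name" => repo, "imports" => authors + repo_forkers, "node_type" => "repo" }
    author = authors.first
    forkers_by_writer[author].concat(repo_forkers) if author
    all_forkers.concat(repo_forkers)
  end

  # A writer "imports" everyone who forked one of their repositories.
  writers = forkers_by_writer.map do |name, names|
    { "name" => name, "imports" => names.uniq, "node_type" => "writer" }
  end

  # Forkers who are not also writers get their own nodes.
  forkers = (all_forkers.uniq - forkers_by_writer.keys).map do |name|
    { "name" => name, "imports" => [], "node_type" => "forker" }
  end

  repos + writers + forkers
end

# Hypothetical sample data; note the two hashes are in different orders.
wrote  = { "boo-lang" => ["alice"], "boo-tools" => ["carol"] }
forked = { "boo-tools" => [], "boo-lang" => ["bob", "carol"] }

puts JSON.pretty_generate(hive_nodes(wrote, forked))
```

With this sample, alice and carol come out as writers and bob as the only forker, because carol already has a writer node.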

The blue nodes are our repositories, the yellow nodes are our writers, and the green nodes are our forkers. The 12 o’clock axis (the top) shows nodes with only outgoing relationships. The bottom-left axis shows nodes with only incoming relationships. These are the writers without any forks, and the forkers who never started their own public projects. The remaining nodes in the bottom-right have both incoming and outgoing relationships. These are the repository writers who created projects other people found worth forking.
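The axis rule described above can be sketched in a few lines of Ruby. This is an illustration only, not the code the visualization runs: it assumes that a node's "imports" list holds its outgoing links, and that a node has an incoming link whenever its name appears in another node's "imports". The `axes_for` function, the axis labels and the sample nodes are all hypothetical.

```ruby
# Classify each node by the direction of its links.
# Assumption: "imports" lists a node's outgoing links.
def axes_for(nodes)
  imported = nodes.flat_map { |node| node["imports"] }.uniq

  nodes.each_with_object({}) do |node, axes|
    outgoing = !node["imports"].empty?
    incoming = imported.include?(node["name"])

    axes[node["name"]] =
      if outgoing && incoming then :both           # bottom-right axis
      elsif outgoing          then :outgoing_only  # 12 o'clock axis
      elsif incoming          then :incoming_only  # bottom-left axis
      else                         :isolated       # no links (not covered in the post)
      end
  end
end

# Hypothetical nodes in the JSON shape produced by the route.
nodes = [
  { "name" => "boo-lang",  "imports" => ["alice", "bob"], "node_type" => "repo" },
  { "name" => "boo-tools", "imports" => ["carol"],        "node_type" => "repo" },
  { "name" => "alice",     "imports" => ["bob"],          "node_type" => "writer" },
  { "name" => "carol",     "imports" => [],               "node_type" => "writer" },
  { "name" => "bob",       "imports" => [],               "node_type" => "forker" }
]

axes_for(nodes).each { |name, axis| puts "#{name}: #{axis}" }
```

Under these assumptions the repositories land on the top axis, alice (whose project was forked) lands on the bottom-right axis, and carol and bob land on the bottom-left axis.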

The graphs are laid out row by row, left to right, as follows:

  • Boo, Dylan, Factor
  • Gosu, Mirah, Nemerle
  • Nu, Parrot, Self

Can you see the similarities between Boo, Factor and Nemerle? See how different they are from Gosu and Self? What does the hive plot tell you about these languages’ GitHub repositories?

 

You can try a live version at hiveplot.herokuapp.com/index.html and, as always, the code is available on GitHub.

Our visualization was done by Rich Morin and Mike Bostock with D3.js. It is a hot-off-the-press work in progress. You can follow the action on this D3.js Google Group thread.

Published at DZone with permission of Max De Marzi, author and DZone MVB. (source)
