Big Data/Analytics Zone is brought to you in partnership with:

Big Data Graphs: Loopy Lattices

08.27.2012
| 4644 views |
  • submit to reddit

 The content of this article was originally co-authored by Marko Rodriguez and Bobby Norton over at the Aurelius blog

A lattice is a graph that has a particular, well-defined structure. An nxn lattice is a 2-dimensional grid where there are n edges along its x-axis and n edges along its y-axis. An example 20x20 lattice is provided in the two images above. Note that both images are the “same” 20x20 lattice. Irrespective of the lattice being “folded,” both graphs are isomorphic to one another (i.e. the elements are in one-to-one correspondence with each other). As such, what is important about a lattice is not how it is represented on a 2D plane, but what its connectivity pattern is. Using the R statistics language, some basic descriptive statistics are computed for the 20x20 lattice named g.

~$ r

R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
...
> length(V(g))
[1] 441
> length(E(g))
[1] 840
> hist(degree(g), breaks=c(0,2,3,4), freq=TRUE, xlab='vertex degree', ylab='frequency', cex.lab=1.25, main='', col=c('gray10','gray40','gray70'), labels=TRUE, axes=FALSE, cex=2)

The degree statistics of the 20x20 lattice can be analytically determined. There must exist 4 corner vertices each having a degree of 2. There must be 19 vertices along every side that each have a degree of 3. Given that there are 4 sides, there are 76 vertices with degree 3 (19 x 4 = 76). Finally, their exists 19 rows of 19 vertices in the inner-square of the lattice that each have a degree of 4 and therefore, there are 361 degree 4 vertices (19 x 19 = 361). The code snippet above plots the 20x20 lattice’s degree distribution–confirming the aforementioned derivation. The 20x20 lattice has 441 vertices and 840 edges. In general, the number of vertices in an nxn lattice will be (n+1)(n+1) and the number of edges will be 2((nn) + n).

Traversals Through a Directed Lattice

Suppose a directed lattice where all edges either point to the vertex right of or below the tail vertex. In such a structure, the top-left corner vertex has only outgoing edges. Similarly, the bottom-right corner vertex has only incoming edges. An interesting question that can be asked of a lattice of this form is:

“How many unique paths exist that start from the top-left vertex and end at the bottom-right vertex?”

For a 1x1 lattice, there are two unique paths.

0 -> 1 -> 3
0 -> 2 -> 3

As diagrammed above, these paths can be manually enumerated by simply drawing the paths from top-left to bottom-right without drawing the same path twice. When the lattice becomes too large to effectively diagram and manually draw on, then a computational numerical technique can be used to determine the number of paths. It is possible to construct a lattice using BlueprintsTinkerGraph and traverse it using Gremlin. In order to do this for a lattice of any size (any n), a function is defined named generateLattice(int n).

def generateLattice(n) {
  g = new TinkerGraph()
  
  // total number of vertices
  max = Math.pow((n+1),2)
  
  // generate the vertices
  (1..max).each { g.addVertex() }
    
  // generate the edges
  g.V.each {
    id = Integer.parseInt(it.id)

    right = id + 1
    if (((right % (n + 1)) > 0) && (right <= max)) {
      g.addEdge(it, g.v(right), '')
    }

    down = id + n + 1
    if (down < max) {
      g.addEdge(it, g.v(down), '')
    }
  }
  return g
}

An interesting property of the “top-to-bottom” paths is that they are always the same length. For the 1x1 lattice previously diagrammed, this length is 2. Therefore, the bottom right vertex can be reached after two steps. In general, the number of steps required for an nxn lattice is 2n.

gremlin> g = generateLattice(1) 
==>tinkergraph[vertices:4 edges:4]
gremlin> g.v(0).out.out.path    
==>[v[0], v[2], v[3]]
==>[v[0], v[1], v[3]]
gremlin> g.v(0).out.loop(1){it.loops <= 2}.path
==>[v[0], v[2], v[3]]
==>[v[0], v[1], v[3]]

A 2x2 lattice is small enough where its paths can also be enumerated. This enumeration is diagrammed above. There are 6 unique paths. This can be validated in Gremlin.

gremlin> g = generateLattice(2)               
==>tinkergraph[vertices:9 edges:12]
gremlin> g.v(0).out.loop(1){it.loops <= 4}.count()
==>6
gremlin> g.v(0).out.loop(1){it.loops <= 4}.path
==>[v[0], v[3], v[6], v[7], v[8]]
==>[v[0], v[3], v[4], v[7], v[8]]
==>[v[0], v[3], v[4], v[5], v[8]]
==>[v[0], v[1], v[4], v[7], v[8]]
==>[v[0], v[1], v[4], v[5], v[8]]
==>[v[0], v[1], v[2], v[5], v[8]]

If a 1x1 lattice has 2 paths, a 2x2 6 paths, how many paths does a 3x3 lattice have? In general, how many paths does an nxn lattice have? Computationally, with Gremlin, these paths can be traversed and counted. However, there are limits to this method. For instance, try using Gremlin’s traversal style to determine all the unique paths in a 1000x1000 lattice. As it will soon become apparent, it would take the age of the universe for Gremlin to realize the solution. The code below demonstrates Gremlin’s calculation of path counts up to lattices of size 10x10.

gremlin> (1..10).collect{ n ->
gremlin>   g = generateLattice(n)
gremlin>   g.v(0).out.loop(1){it.loops <= (2*n)}.count()
gremlin> }
==>2
==>6
==>20
==>70
==>252
==>924
==>3432
==>12870
==>48620
==>184756

A Closed Form Solution and the Power of Analytical Techniques

In order to know the number of paths through any arbitrary nxn lattice, a closed form equation must be derived. One way to determine the closed form equation is to simply search for the sequence on Google. The first site returned is the Online Encyclopedia of Integer Sequences. The sequence discovered by Gremlin is called A000984 and there exists the following note on the page:

“The number of lattice paths from (0,0) to (n,n) using steps (1,0) and (0,1). [Joerg Arndt, Jul 01 2011]“

The same page states that the general form is “2n choose n.” This can be expanded out to its factorial representation (e.g. 5! = 5 * 4 * 3 * 2 * 1) as diagrammed below.


Given this closed form solution, an explicit graph structure does not need to be traversed. Instead, a combinatoric equation can be evaluated for any n. A directed 20x20 lattice has over 137 billion unique paths! This number of paths is simply too many for Gremlin to enumerate in a reasonable amount of time.

> n = 20
> factorial(2 * n) / factorial(n)^2
[1] 137846528820

A question that can be asked is: “How does 2n choose 2 explain the number of paths through an nxn lattice?” When counting the number of paths from vertex (0,0) to (n,n), where only down and right moves are allowed, there have to be n moves down and n moves right. This means there are 2n total moves, and as such, there are n choices (as the other n “choices” are forced by the previous n choices). Thus, the total number of moves is “2n choose n.” This same integer sequence is also found in another seemingly unrelated problem (provided by the same web page).

“Number of possible values of a 2*n bit binary number for which half the bits are on and half are off. – Gavin Scott, Aug 09 2003″

Each move is a sequence of letters that contains n Ds and n Rs, where down twice then right twice would be DDRR. This maps the “lattice problem” onto the “binary string of length 2n problem.” Both problems are essentially realizing the same behavior via two different representations.

Plotting the Growth of a Function

It is possible to plot the combinatorial function over the sequence 1 to 20 (left plot below). What is interesting to note is that when the y-axis of the plot is set to a log-scale, the plot is a straight line (right plot below). This means that the number of paths in a directed lattice grows exponentially as the size of the lattice grows linear.

> factorial(2 * seq(1,n)) / factorial(seq(1,n))^2
 [1]            2            6           20           70          252          924
 [7]         3432        12870        48620       184756       705432      2704156
[13]     10400600     40116600    155117520    601080390   2333606220   9075135300
[19]  35345263800 137846528820

> x <- factorial(2 * seq(1,n)) / factorial(seq(1,n))^2
> plot(x, xlab='lattice size (n x n)', ylab='total number of paths', cex.lab=1.4, cex.axis=1.6, lwd=1.5, cex=1.5, type='b')
> plot(x, xlab='lattice size (n x n)', ylab="total number of paths", cex.lab=1.4, cex.axis=1.6, lwd=1.5, cex=1.5, type='b' log='y')

Conclusion

It is wild to think that a 20x20 lattice, with only 441 vertices and 840 edges, has over 137 billion unique directed paths from top-left to bottom-right. It’s this statistic that makes it such a loopy lattice! Anyone using graphs should take heed. The graph data structure is not like its simpler counterparts (e.g. the list, map, and tree). The connectivity patterns of a graph can yield combinatorial explosions. When working with graphs, it’s important to understand this behavior. It’s very easy to run into situations, where if all the time in the universe doesn’t exist, then neither does a solution.

Acknowledgments

This exploration was embarked on with Dr. Vadas Gintautas. Vadas has published high-impact journal articles on a variety of problems involving biological networks, information theory, computer vision, and nonlinear dynamics. He holds a Ph.D. in Physics from the University of Illinois at Urbana Champaign.

Finally, this post was inspired by Project Euler. Project Euler is a collection of math and programming challenges. Problem 15 asks, “How many routes are there through a 20x20 grid?”

 

 

 

 

 

 

Published at DZone with permission of Marko Rodriguez, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)