NoSQL Zone is brought to you in partnership with:

Loopy Lattices Redux

06.19.2013
| 1827 views |
  • submit to reddit


Loopy Lattices Reduxgraph can be realized as a tessellated surface. Each vertex (or point) on the surface has a set of neighbors. Two vertices are neighbors if there is an edge that connects them. For a discrete traverser (e.g. a marble), movement in a space requires a decision at each vertex. The traverser must decide whether to go left, up, east, right, west, or down — i.e., it must choose one and only one edge emanating from its current vertex. The edge chosen leads to yet another vertex. At which point, another choice must be made ad infinitum.

k-Regular Graphs: k-Ary String Generators

Binary State Machinedirected graph is k-regular if each vertex has k adjacent vertices. Starting from any vertex, there are kn possible n-length paths in a k-regular graph. 2-Regular GraphFor a 2-regular graph (k=2), the number of unique paths is equivalent to the number of objects that can be represented by abinary string. For instance, there are 4,294,967,296 length 32 paths in a 2-regular graph (232). This relationship is apparent if every vertex’s incident edges are labeled with either a 0 or 1. If the traversers print the edge label at each step in an n-step walk, then all possiblen-length binary strings will be generated.

A graph with 2-degree vertices is rare in natureBinary TreeA graph with 10+ degree vertices is more common, though such k-regularity is uncommon. A 10-regular graph has 100,000,000,000,000,000,000,000,000,000,000 unique 32-length paths (1032).Tiny Binary MachineCombinatorial explosions are readily apparent whencomputing on natural graphs. Such truths effect the way in which technology/algorithms should be implemented when solving real-world graph problems.

Lattice Graphs: Binary Strings with an Equal Number of 0s and 1s

Lattice Graphlattice is a planar graph with a rectangular tessellation. The “inside” of a finite lattice is regular in that each vertex connects to the same number of vertices, but the “outside” is not regular in that there are no neighbors beyond the border vertices. A 20x20 directed lattice has 441 vertices and 840 edges. As shown analytically in the original Loopy Latticespost, there are 137,846,528,820 (137.85 billion) unique paths from the top-left vertex to the bottom-right vertex. A discrete traverser must take 40 steps to make the journey. Of those 40 steps, an equal number of 1s and 0s will be “printed.” Thus, the problem of determining how many unique paths there are in a directed 20x20lattice is a question of how many unique 40-length binary strings exist such that there is an equal number of 1s and 0s. This constraint yields a number that is much smaller than kn. In the previous post, Gremlin (in its depth-first, enumerative form) could not calculate the answer due to the explosion of possibilities. Therefore, to answer the question, the closed form solution below was provided. The solution says “2n choose n” or, in particular, “40 choose 20.” The 20 chosen 1 slots in the 40-length binary string forces the remaining 20 positions to be 0.

Titan vs. Faunus: The Enumerative/Counting Distinction

graph database such as Titan can be used to store a 20x20 lattice. While a 840 edge graph is extremely small for a “database,” it is necessary for the experiment to follow. The Gremlin/Groovy code to create a 20x20 lattice in Titan is provided below.

12345678910111213141516171819202122232425def generateLattice(n) {g = TitanFactory.open('bin/cassandra.local')// total number of verticesmax = Math.pow((n+1),2)// generate the verticesvertices = [];(1..max).each { vertices.add(g.addVertex()) }// generate the edgesvertices.eachWithIndex { v, i ->right = i + 1if (((right % (n + 1)) > 0) && (right <= max)) {v.addEdge('link', vertices[right])}down = i + n + 1if (down < max) {v.addEdge('link', vertices[down])}}g.commit();return g}

Traversing a Lattice with Titan

 Distributed Graph DatabaseWith Titan/Gremlin, it is possible to count the number of 1-, 2-, 3-, …, 40-length paths in the 20x20lattice. A traversal of length 40 is a full traversal of the lattice from the top-left to the bottom-right. The traversal code and the runtimes are provide below. An exponentiating runtime is realized: 10.9 minutes (28 steps), 2.3 hours (29 steps), and 9.8 hours (30 steps) traversals. The calculation was halted on the 31st step. The number of 40-length paths could not be reasonably calculated with Titan/Gremlin. By exhaustively enumerating paths and with the number of paths growing exponentially as the path length increases, a calculation of this form is doomed for large values of n.

/usr/local/titan-all-0.3.1$ bin/gremlin.sh
 
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = generateLattice(20)
==>titangraph[cassandrathrift:127.0.0.1]
gremlin> g.V.count()
==>441
gremlin> g.E.count()
==>840
gremlin> (1..40).each{ l ->
  t = System.currentTimeMillis();
  c = g.v(4).out.loop(1){it.loops <= l}.count()
  t = System.currentTimeMillis() - t;
  println(l + ":" + c + ":" + t)
}


path lengthpath counttraversal time (ms)path lengthpath counttraversal time (ms)
129242
3814162
53236645
712810825617
95123610102462
1120487412409696
13819249141638461
1532768881665536163
1713107232518262144646
1952428812962010485762573
212097150517222419425810306
23838805420659241677256641555
2533523880829932666941500166828
2713342254033195428265069020654070
29523921830828856630102781365035512124

Traversing a Lattice with Faunus

 Graph Analytics EngineFaunus is a graph analytics engine that leverages Hadoop and a breadth-first implementation ofGremlin. With Faunus, paths can be enumerated in a similar fashion to Titan/Gremlin, but if only counts are required (destinations vs. paths), then it is more efficient to propagate and sum counters.Pascal's TriangleInstead of explicitly storing each and every Gremlin at every vertex, Faunus stores the number of Gremlins at each vertex. This is the difference between representing a list of length m and a long with value m. A consequence of this model is that it is possible to efficiently compute the number of 40-length paths in a 20x20 lattice with Faunus. This counter propagation mechanism is analogous to the mechanical technique for computing binomial coefficientsas proposed by Blaise Pascal via Pascal’s triangle (in the Western world). It is important to note the breadth-first requirement of this computation.

Enumeration vs. Counting

g = FaunusFactory.open('bin/titan-cassandra-input.properties')
t = System.currentTimeMillis()
// As of Faunus 0.3.1, the loop() construct is not supported
g.v(4).out.out.out.out.count() // 4-steps
t = System.currentTimeMillis() - t

The number of paths of length n in a 20x20 lattice is plotted below in both y-linear (left) and y-logarithmic (right) form. In short, the number of paths grows exponentially as the path length increases. What is interesting to note in the following table of Faunus counts and runtimes is that Faunus is able to compute the total number of unique 40-length paths in the lattice in 10.53 minutes — 137,846,528,820.

20x20 Lattice Path Count

path lengthpath counttraversal time (ms)path lengthpath counttraversal time (ms)
12327812448071
386353741678575
53294078664109246
71281245748256139850
9512156223101024171549
112048186049124096201111
1381922184171416384232642
15327682480191665536263355
1713107227868518262144296912
19524288308225201048576324440
212097150340823224194258355349
2383880543715462416772566385755
25335238804021892666941500416868
2713342254043391728265069020448150
29523921830462522301027813650478595
311995537270493224323821729910508492
3371918741405247913413237415400539216
35236908795205561083640885872720568512
376715600122058669738102501265020601196
39137846528820617152

Titan’s runtime grows exponentially, in proportion to the number of paths computed. On the other hand, Faunus’ computation time grows linearly when computing an exponential path count. At step 28, Faunus executes the path count faster than Titan. This does not mean that Titan is inherently less efficient at such computations, it is simply a function of the depth-first, enumerative nature of Gremlin/Titan vs. the breadth-first, counter nature of Gremlin/Faunus. Implementing Faunus’ Gremlin engine over Titan would enable Titan to compute such path counts effectively. However, the purpose of Faunus is to serve as the global batch processer to Titan.

Titan and Faunus Lattice Traversal Times

Conclusion

20x20 LatticeA graph is a data structure composed of vertices and edges. The natural interpretation of a computation on a graph is a traversal — i.e., a directed walk over the vertices by means of chosen incident edges. An exhaustive exploration of all paths within a graph is typically not feasible because the number of paths grows exponentially as a function of the path length and the graph’s branch factor. As demonstrated with Titan and Faunus, the goal of the traversal and the choice of the traversal engine ultimately determines the feasibility of the computation. Once again, the loopy lattice exposes a simple truth in the theory and practice of graphs.






Published at DZone with permission of Marko Rodriguez, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)