
Graph Theory and Network Science: The Basics

08.12.2012

Graph theory and network science are two related academic fields that have found application in numerous commercial industries. The terms ‘graph’ and ‘network’ are synonymous and one or the other is favored depending on the domain of application. A Rosetta Stone of terminology is provided below to help ground the academic terms in familiar, real-world structures.

graph     network   brain     knowledge   society   circuit    web
vertices  nodes     neurons   concepts    people    elements   pages
edges     links     axons     relations   ties      wires      hrefs
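
To make the mapping concrete, here is a minimal sketch (mine, not from the original article) of a graph as a Python adjacency structure; the vertex names and edges are purely illustrative.

    # A tiny directed graph stored as an adjacency map: each vertex maps to
    # the set of vertices it points to. Rename "vertex" to "node", "neuron",
    # or "person" and the structure stays the same; only the domain changes.
    graph = {
        "marko": {"gremlin", "lop"},
        "gremlin": {"lop"},
        "lop": set(),
    }

    def out_degree(vertex):
        """Number of edges emanating from a vertex."""
        return len(graph[vertex])

    print(out_degree("marko"))  # 2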

Graph theory is a branch of discrete mathematics concerned with proving theorems and developing algorithms for arbitrary graphs (e.g. random graphs, lattices, hierarchies). For example, can a graph with four vertices and seven edges, structured according to the landmasses and bridges of Königsberg, have each of its edges traversed once and only once? From such problems, the field of graph theory has developed numerous algorithms (graph search, shortest paths, spanning trees, and the like) that can be applied to any graphical structure irrespective of the domain it represents (i.e. irrespective of what the graph models in the real world).
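
As a small illustration (mine, not part of the original article), the sketch below encodes the seven bridges of Königsberg as a multigraph and applies Euler's observation: a connected multigraph can be walked using every edge exactly once only if zero or two of its vertices have odd degree.

    from collections import Counter

    # The four landmasses (A, B, C, D) and the seven bridges as an edge list;
    # parallel entries represent the pairs of bridges joining the same landmasses.
    bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
               ("A", "D"), ("B", "D"), ("C", "D")]

    degree = Counter()
    for u, v in bridges:
        degree[u] += 1
        degree[v] += 1

    odd_vertices = [v for v, d in degree.items() if d % 2 == 1]
    # All four landmasses have odd degree, so no walk can cross every bridge
    # exactly once.
    print(odd_vertices)                 # ['A', 'B', 'C', 'D']
    print(len(odd_vertices) in (0, 2))  # False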

In the domain of network science, researchers don’t study networks in the abstract; instead, they study numerous real-world representations in order to understand the universal properties of networks. Examples of such networks include social networks, transportation networks, gene regulatory networks, knowledge networks, scholarly networks, etc. Network science is a relatively new discipline that has only been able to blossom because of computer technologies. With computers, scientists are able to analyze large-scale networks such as the World Wide Web, estimated at roughly 500 billion nodes. Due to their size, such structures tend to be studied from a statistical perspective.

  • Degree distribution: If a node is randomly selected from the network, what is the probability that it has X edges emanating from it? This statistic has implications for understanding how disease spreads through a social network and how communication networks can be sabotaged by directed attacks (see the sketch after this list).
  • Assortative mixing: Do nodes with characteristic A tend to connect to nodes with characteristic B? Such information is useful as a descriptive statistic as well as for inferring future connectivity patterns in the network.
  • Growth models: Do all real-world networks grow according to a similar rule? Network growth models have implications for designing learning systems and understanding the future statistics of a fledgling network.
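
To ground the first bullet above, here is a small Python sketch (my own, with made-up data) that computes the empirical degree distribution of a toy undirected network.

    from collections import Counter

    # An undirected toy network as an edge list; the names are illustrative only.
    edges = [("ann", "bob"), ("ann", "cam"), ("ann", "dan"),
             ("bob", "cam"), ("dan", "eve")]

    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Degree distribution: P(k) is the fraction of nodes with exactly k edges.
    n = len(degree)
    histogram = Counter(degree.values())
    for k in sorted(histogram):
        print(f"P({k}) = {histogram[k] / n:.2f}")
    # P(1) = 0.20, P(2) = 0.60, P(3) = 0.20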

Perhaps the crowning achievement of network science is the realization that most “real-world” networks have a similar structure. Such networks are called scale-free networks, and their degree distribution follows a power law rather than the rapid, exponential-style decay seen in randomly wired networks. What this means is that real-world networks tend to have a few nodes with numerous links and numerous nodes with few links. Prior to this work, most people assumed that networks were randomly connected. In the late 90s and early 00s, a mass of scholarly articles was published demonstrating the prevalence of scale-free networks in nearly every domain imaginable.
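
One widely studied growth model that produces such heavy-tailed degree distributions is preferential attachment (the Barabási–Albert model); the sketch below is a deliberately simplified illustration of that idea, not a construction taken from the article.

    import random
    from collections import Counter

    # Preferential attachment: each new node links to one existing node chosen
    # with probability proportional to its degree, so hubs keep gaining links.
    random.seed(42)
    edges = [(0, 1)]
    pool = [0, 1]  # node i appears in the pool once per edge it touches

    for new_node in range(2, 1000):
        old_node = random.choice(pool)      # degree-proportional choice
        edges.append((new_node, old_node))
        pool.extend([new_node, old_node])

    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    print(degree.most_common(3))                      # a few heavily linked hubs
    print(sum(1 for d in degree.values() if d == 1))  # many nodes with a single link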

Interestingly, because most natural networks have this connectivity pattern, the processes that are evaluated on such networks are nearly the same; the only difference is the semantics defined by the domain. For example, recommending products is analogous to determining the flow within an electrical circuit or determining how sensory data propagates through a neural network. Finding friends in a social network is analogous to routing packets in a communication network or determining the shortest route on a transportation network. Ranking web pages is analogous to determining the most influential people in a social network or finding the most relevant concepts in a knowledge network. Finally, all of these problems are variations of one general process: graph traversal. A graph traversal is the simple process of moving from vertex to vertex over the edges of the graph, either mutating the structure or collecting bits of information along the way. The result of a traversal is either an evolution of the graph or a statistic about the graph.
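
As one concrete instance (a sketch over assumed, illustrative data), a breadth-first traversal of a friendship graph implements the "finding friends" example above by collecting the people exactly two hops away, i.e. friends of friends.

    from collections import deque

    # An undirected friendship graph as an adjacency map; the names are made up.
    friends = {
        "ann": {"bob", "cam"},
        "bob": {"ann", "dan"},
        "cam": {"ann", "dan"},
        "dan": {"bob", "cam", "eve"},
        "eve": {"dan"},
    }

    def friends_of_friends(start):
        """Breadth-first traversal collecting vertices exactly two hops away."""
        seen, queue, result = {start}, deque([(start, 0)]), set()
        while queue:
            person, hops = queue.popleft()
            if hops == 2:
                result.add(person)
                continue  # do not expand beyond two hops
            for neighbor in friends[person]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
        return result

    print(friends_of_friends("ann"))  # {'dan'}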

The tools and techniques developed by graph theorists and network scientists have an astounding number of practical applications. Interestingly enough, once one has a general understanding of graph theory and network science, the world’s problems start to be seen as one and the same problem.

Related Material

Rodriguez, M.A., Neubauer, P., “The Graph Traversal Pattern,” Graph Data Management: Techniques and Applications, eds. S. Sakr, E. Pardede, IGI Global, ISBN:9781613500538, August 2011.

Watkins, J.H., M.A. Rodriguez, “A Survey of Web-based Collective Decision Making Systems,” Studies in Computational Intelligence: Evolution of the Web in Artificial Intelligence Environments, Springer-Verlag, pages 245-279, 2008.

Rodriguez, M.A., “A Graph-Based Movie Recommender Engine,” Marko A. Rodriguez Blog, 2011.

Rodriguez, M.A., “Graphs, Brains, and Gremlin,” Marko A. Rodriguez Blog, 2011.

Published at DZone with permission of Marko Rodriguez, author and DZone MVB.
