
Multitenant Graph Applications

09.19.2012

Editor's Note: This article was co-written by Marko Rodriguez and Stephen Mallette.

 

A multitenant software system is a system that supports any number of customers within a single application instance. Typically, that single instance makes use of a shared data set, where each customer’s data is properly separated from every other customer’s. While data separation is a crucial aspect of a multitenant application, there may be system-wide (i.e. global) computations that require the consumption of all customer data (or some subset thereof). If no such global operations are required, then the application could instead be deployed as a multi-instance application, where each customer’s data is contained in its own isolated silo. A few example multitenant applications are itemized below.

  • A company’s confidential reports (e.g. market strategies or financial information) in a Business Intelligence system are isolated from competitors within the same application. However, public data (e.g. census, market, tax data) is shared amongst and linked to by the various tenant data sets. As such, the public data helps to enhance the usefulness of each company’s respective private data.
  • A social network service guarantees user privacy while allowing users, in an access control list (ACL) fashion, to share their data with other trusted users (e.g. friends) in the system.
  • Patient records in a multitenant electronic health record system can be separated to ensure patient confidentiality. However, collective statistics can be gleaned from the global data set in order to allow data analysts/scientists to study population-wide health concerns.

Blueprints and PartitionGraph

TinkerPop’s Blueprints 1.2+ makes it easy to build multitenant, graph-based applications. Blueprints is a graph database interface similar to the JDBC of the relational database community. Blueprints is supported by various graph databases including TinkerGraph, Neo4j, OrientDB, DEX, and InfiniteGraph. In addition to providing a standard graph interface, Blueprints includes a collection of graph wrappers. A graph wrapper takes an existing graph implementation, such as Neo4jGraph, and decorates it with new features. For example, wrapping a graph implementation with ReadOnlyGraph prevents graph mutations.
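The wrapper idea can be sketched outside of Blueprints. The following Python fragment is a toy model and not the Blueprints API: the classes ToyGraph and ReadOnlyWrapper and their method names are hypothetical, chosen only for illustration. It shows the general pattern, in which a wrapper delegates reads to the graph it decorates and refuses mutations.

```python
class ToyGraph:
    """Hypothetical in-memory graph; stands in for a real graph implementation."""

    def __init__(self):
        self._vertices = {}  # vertex id -> property dict

    def add_vertex(self, vertex_id, **properties):
        self._vertices[vertex_id] = dict(properties)
        return vertex_id

    def vertices(self):
        return sorted(self._vertices)


class ReadOnlyWrapper:
    """Hypothetical wrapper: reads are delegated, mutations are refused."""

    def __init__(self, base_graph):
        self._base = base_graph

    def add_vertex(self, vertex_id, **properties):
        raise PermissionError("graph is wrapped as read-only")

    def vertices(self):
        return self._base.vertices()


base = ToyGraph()
base.add_vertex("gremlin")
read_only = ReadOnlyWrapper(base)

try:
    read_only.add_vertex("rexster")
    mutation_blocked = False
except PermissionError:
    mutation_blocked = True
```

Because the wrapper exposes the same methods as the graph it decorates, calling code does not need to know whether it holds the raw graph or a wrapped one.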

The graph wrapper that enables multitenancy is called PartitionGraph. PartitionGraph separates the underlying graph into different partitions/buckets. However, edges can link vertices in two separate partitions. In this way, multitenancy is clearly realized, where a partition serves as the location for a single tenant’s data. Moreover, data “cross-fertilization” is possible through appropriately constrained inter-partition linking. The design of PartitionGraph borrows heavily from the Named Graph data architecture popularized by the Web of Data/Linked Data community. The remainder of this post will demonstrate graph-based multitenancy by means of an Electronic Health Records (EHR) system example using PartitionGraph, the graph traversal/query language Gremlin, and the colorful characters of TinkerPop.

Intra-Partition Electronic Health Records

The following code snippet demonstrates how PartitionGraph solves the multitenancy problem. First, a new graph is constructed and wrapped in a PartitionGraph with an initial write partition of pgp (Pipes General Practice). The graph used is the in-memory TinkerGraph. The write partition is the partition that newly created data is written to. When patient Gremlin goes to Pipes General Practice and TinkerPop Medical Center, two vertices are written to the pgp and tmc partitions, respectively.

~$ gremlin

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = new PartitionGraph(new TinkerGraph(), '_partition', 'pgp')
==>partitiongraph[tinkergraph[vertices:0 edges:0]]
gremlin> g.getPartitionKey() 
==>_partition
gremlin> g.getReadPartitions()    
==>pgp
gremlin> g.getWritePartition()
==>pgp
gremlin> gremlinPgp = g.addVertex('gremlin@pipesgeneralpractice')
==>v[gremlin@pipesgeneralpractice]
gremlin> g.setWritePartition('tmc')
gremlin> gremlinTmc = g.addVertex('gremlin@tinkerpopmedicalcenter')
==>v[gremlin@tinkerpopmedicalcenter]

The following diagram shows what has been established thus far. There are two partitions in the same multitenant graph (pgp and tmc). Gremlin has visited both facilities and has two different medical histories as denoted by the vertices and edges within each partition. Note that the generation of those medical histories is not demonstrated in the code fragment above. For the sake of clarity, imagine that a medical history includes a patient’s current conditions, lab results, vitals (such as height, weight, and blood pressure), allergies, current medications, etc.

When a physician at Pipes General Practice checks patient records (where PartitionGraph has its read partition set to pgp), the physician will only see Pipes General Practice data. Moreover, if the current read partition is removed and a new one is added, then only the data in the newly added partition is visible.

gremlin> g.V
==>v[gremlin@pipesgeneralpractice]
gremlin> g.removeReadPartition('pgp')
gremlin> g.addReadPartition('tmc')
gremlin> g.V
==>v[gremlin@tinkerpopmedicalcenter]
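The behavior seen in the two console sessions above can be approximated with a small toy model. The Python sketch below assumes one general technique (tagging each element with a partition property and filtering on read); it is not a description of how Blueprints implements PartitionGraph, and the class ToyPartitionGraph and its attribute names are hypothetical.

```python
class ToyPartitionGraph:
    """Hypothetical model: each vertex carries a partition tag under a key."""

    def __init__(self, partition_key, initial_partition):
        self.partition_key = partition_key
        self.write_partition = initial_partition
        self.read_partitions = {initial_partition}
        self._vertices = {}  # vertex id -> property dict

    def add_vertex(self, vertex_id):
        # New data is tagged with the current write partition.
        self._vertices[vertex_id] = {self.partition_key: self.write_partition}
        return vertex_id

    def vertices(self):
        # Only vertices whose tag is among the read partitions are visible.
        return [
            vertex_id
            for vertex_id, props in self._vertices.items()
            if props[self.partition_key] in self.read_partitions
        ]


g = ToyPartitionGraph("_partition", "pgp")
g.add_vertex("gremlin@pipesgeneralpractice")
g.write_partition = "tmc"
g.add_vertex("gremlin@tinkerpopmedicalcenter")

visible_at_pgp = g.vertices()

# Swap the read partition, mirroring the console session.
g.read_partitions.discard("pgp")
g.read_partitions.add("tmc")
visible_at_tmc = g.vertices()
```

In this model both vertices live in the same underlying store; only the set of read partitions decides which of them a given caller can see.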

At this point, the example has shown how to firewall customer data with PartitionGraph. Next, it is possible to go beyond simply separating graph elements into partitions. Edges may be either intra-partition or inter-partition: they can connect vertices in the same partition or vertices in two different partitions. In this way, it is possible to introduce global data that can be shared amongst all customers.

Inter-Partition Electronic Health Records

The following code fragment introduces a new snomed partition, where snomed refers to the publicly available SNOMED-CT clinical terms data set. Example terms include pneumonia, common cold, acute nasal catarrh, etc. Vertices and edges are added to the snomed partition that represent the SNOMED-CT concept hierarchy. Note that in practice, the full SNOMED-CT data set would be parsed into the partition, but for this simple example, two clinical terms and their subsumption relationship are written.

gremlin> g.setWritePartition('snomed')
gremlin> painInRightLeg = g.addVertex('snomed:287048003', [name:'Pain in right leg (finding)'])
==>v[snomed:287048003]
gremlin> painInLowerLimb = g.addVertex('snomed:10601006', [name:'Pain in lower limb (finding)'])
==>v[snomed:10601006]
gremlin> g.addEdge(painInRightLeg, painInLowerLimb, 'broader')
==>e[0][snomed:287048003-broader->snomed:10601006]

When patient Gremlin complains of an injured leg at both Pipes General Practice and TinkerPop Medical Center, edges are added that connect the patient vertex to the respective clinical term vertex in the snomed partition. These complainedOf edges are denoted by the dashed lines in the diagram below.

gremlin> g.setWritePartition('pgp')
gremlin> g.addEdge(gremlinPgp, painInRightLeg, 'complainedOf')
==>e[1][gremlin@pipesgeneralpractice-complainedOf->snomed:287048003]
gremlin> g.setWritePartition('tmc')
gremlin> g.addEdge(gremlinTmc, painInRightLeg, 'complainedOf')
==>e[2][gremlin@tinkerpopmedicalcenter-complainedOf->snomed:287048003]

With respect to the diagram below, assume that both Rexster and Frames are new patients at TinkerPop Medical Center who have also complained of limb pain. A limb pain specialist at TinkerPop Medical Center can query the tmc partition to see which patients have a lower limb issue. The traversal in line 2 walks the SNOMED-CT hierarchy in order to find all patients in the tmc partition who have complained of anything related to lower limb pain (e.g. right leg pain). Given a more complex hierarchy, various lower limb ailments and the patients suffering from such ailments would be exposed by this graph traversal.

gremlin> g.addReadPartition('snomed')
gremlin> painInLowerLimb.in('broader').loop(1){true}{it.object.in('complainedOf').count() > 0}.in('complainedOf')
==>v[gremlin@tinkerpopmedicalcenter]
==>v[rexster@tinkerpopmedicalcenter]
==>v[frames@tinkerpopmedicalcenter]
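For readers unfamiliar with Gremlin, the logic of the traversal can be restated in plain Python. This is a sketch under assumptions, not Gremlin or Blueprints code: the edge tuples, the helper names incoming and patients_of_narrower_terms, and the rule that an edge is visible only when its own partition is readable are all choices made for this illustration. The sample edges mirror the ones in the running example.

```python
# Each edge: (source vertex, label, target vertex, partition of the edge).
EDGES = [
    ("snomed:287048003", "broader", "snomed:10601006", "snomed"),
    ("gremlin@pipesgeneralpractice", "complainedOf", "snomed:287048003", "pgp"),
    ("gremlin@tinkerpopmedicalcenter", "complainedOf", "snomed:287048003", "tmc"),
    ("rexster@tinkerpopmedicalcenter", "complainedOf", "snomed:287048003", "tmc"),
    ("frames@tinkerpopmedicalcenter", "complainedOf", "snomed:287048003", "tmc"),
]


def incoming(vertex, label, read_partitions):
    """Sources of readable edges with the given label that point at vertex."""
    return [
        source
        for source, edge_label, target, partition in EDGES
        if target == vertex and edge_label == label and partition in read_partitions
    ]


def patients_of_narrower_terms(term, read_partitions):
    """Follow 'broader' edges backwards from term, collecting complaining patients."""
    patients = []
    seen = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for narrower in incoming(current, "broader", read_partitions):
            if narrower not in seen:
                seen.add(narrower)
                frontier.append(narrower)
                patients.extend(incoming(narrower, "complainedOf", read_partitions))
    return patients


tmc_patients = patients_of_narrower_terms("snomed:10601006", {"tmc", "snomed"})
pgp_patients = patients_of_narrower_terms("snomed:10601006", {"pgp", "snomed"})
```

With the read partitions limited to tmc and snomed, the Pipes General Practice record never enters the result, even though it points at the same clinical term.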

Over a rich EHR data set, various other types of graph queries can be enacted. A few examples are itemized below.

  • Determine what treatments were used on patients suffering from the same lower limb ailment as Gremlin.
  • Correlate the personal medical histories of all lower limb patients to see if there is a relationship amongst them (e.g. smoking, obesity, medical prescriptions, etc.).
  • Find related clinical terms in SNOMED-CT and locate other patients that have similar problems (e.g. numbness of the leg, sciatica, etc.). Determine what treatments were successful for those related patients.
  • Connect patient Gremlin’s records at both Pipes General Practice and TinkerPop Medical Center in order to create a unified perspective of Gremlin’s medical history via a sameAs edge (represented by the dash-dotted line in the diagram above).

Conclusion

The benefit of PartitionGraph is that global data neither adds significant complexity to the programming model nor exposes firewalled partitions to risk. Moreover, it is possible to enact graph-wide analyses that span all partitions. This can be a compelling advantage for Business Intelligence applications. Given the running example, by simply making all partitions readable, it is possible to analyze the medical histories across all medical facilities and leverage the SNOMED-CT data as the bridge between these seemingly disparate partitioned data sets.

gremlin> g.addReadPartition('pgp')
gremlin> g.addReadPartition('tmc')
gremlin> g.addReadPartition('snomed')
gremlin> g.V
==>v[rexster@tinkerpopmedicalcenter]
==>v[snomed:287048003]
==>v[frames@tinkerpopmedicalcenter]
==>v[snomed:10601006]
==>v[gremlin@tinkerpopmedicalcenter]
==>v[gremlin@pipesgeneralpractice]
gremlin> g.E
==>e[3][rexster@tinkerpopmedicalcenter-complainedOf->snomed:287048003]
==>e[2][gremlin@tinkerpopmedicalcenter-complainedOf->snomed:287048003]
==>e[1][gremlin@pipesgeneralpractice-complainedOf->snomed:287048003]
==>e[0][snomed:287048003-broader->snomed:10601006]
==>e[4][gremlin@tinkerpopmedicalcenter-sameAs->gremlin@pipesgeneralpractice]
==>e[5][frames@tinkerpopmedicalcenter-complainedOf->snomed:287048003]
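A graph-wide analysis over the listing above could be as simple as counting complaints per facility. The Python sketch below is illustrative only: the COMPLAINTS tuples restate the complainedOf edges from the example, and the function complaints_per_partition is a hypothetical helper, not part of Blueprints or Gremlin.

```python
from collections import Counter

# (patient vertex, clinical term, partition) for each complainedOf edge.
COMPLAINTS = [
    ("gremlin@pipesgeneralpractice", "snomed:287048003", "pgp"),
    ("gremlin@tinkerpopmedicalcenter", "snomed:287048003", "tmc"),
    ("rexster@tinkerpopmedicalcenter", "snomed:287048003", "tmc"),
    ("frames@tinkerpopmedicalcenter", "snomed:287048003", "tmc"),
]


def complaints_per_partition(term, read_partitions):
    """Count readable complainedOf edges that point at term, grouped by partition."""
    return Counter(
        partition
        for _patient, complained_term, partition in COMPLAINTS
        if complained_term == term and partition in read_partitions
    )


all_facilities = complaints_per_partition("snomed:287048003", {"pgp", "tmc", "snomed"})
tmc_only = complaints_per_partition("snomed:287048003", {"tmc", "snomed"})
```

The shared clinical term is what makes the two facilities comparable: each count is keyed by partition, while the term identifier is the same on both sides.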

PartitionGraph presents interesting opportunities for analyses that mix and match different partitions in a traversal space. To conclude, a collection of more complex EHR use cases is presented, each of which can be conveniently facilitated by PartitionGraph.

  • A cross medical facility (i.e. partition) analysis shows that the percentage of patients with HIV/AIDS who are prescribed an antiretroviral drug is below standards set forth by the Centers for Medicare and Medicaid Services’ (CMS) quality measures. This analysis prompts healthcare providers and administrators to consider changes to treatment protocols and drug formularies.
  • Physician communities of practice can be identified by analyzing patient visit and treatment patterns. Given those communities, it is possible for pharmaceutical companies to predict the influencers within the greater physician social network in order to yield insight into potential drug adoption patterns.
  • Patient records across all partitions provide the foundational data set for a population-wide analysis within a clinical decision support system.

 

 

Published at DZone with permission of Marko Rodriguez, author and DZone MVB. (source)
