William Vambenepe works are architect in the Application and Middleware management group at Oracle. He blogs at http://stage.vambenepe.com/ William has posted 55 posts at DZone. View Full User Profile

With M (Oslo), is Microsoft on the path to reinventing RDF?

06.12.2009
| 2841 views |
  • submit to reddit

I have given up, at least for now, on understanding what Microsoft wants Oslo (and more specifically the “M” part) to be. I used to pull my hair reading inconsistent articles and interviews about what M tries to be (graphical programming! DSL! IT models! generic parser! application components! workflow! SOA framework! generic data layer! SQL/T-SQL for dummies! JSON replacement! all of the above!). Douglas Purdy makes a valiant 4-part effort (1, 2, 3, 4) but it’s still not crisp enough for my small brain. Even David Chapell, explainer extraordinaire, seems to throw up his hands (“a modeling platform that can be applied in lots of different ways”, which BTW is the most exact, if vague, description I’ve heard). Rather than articles, I now mainly look at the base specifications and technical documents that show what it actually is. That’s what I did  when the Oslo SDK first came out last year. A new technical document came out recently, an update to the MGraph Object Model so I took another a look.

And it turns out that MGraph is… RDF. Or rather, “RDF minus entailment”. And with turtle as the base representation rather than an add-on.

Look at section 3 (”RDF concepts”) in this table of content from W3C. It describes the core RDF concepts. Keep the first five concepts (sections 3.1 to 3.5) and drop the last one (”3.6: Entailment”). You have MGraph, a graph-oriented object model.

On top of this, the RDF community adds reasoning capabilities with RDF entailment, RDFS, OWL, SWRL, SPIN, etc and a variety of engines that implement these different levels of reasoning.

Microsoft, on the other hand, seems to ignore that direction. Instead, it focuses on creating a good mapping from this graph object model to programming languages. In two directions:

  • from programming languages to the graph model: they make it easy for you to create a domain-specific language (DSL) that can easily be turned into M instances.
  • from the graph model to programming languages: they make it easy for you to work on these M instances (including storing them) using the .NET technology stack.

So, if Microsoft is indeed reinventing RDF as the title of this entry provocatively suggests, then they are taking an interesting detour on the way. Rather than going straight to “model-based inferencing”, they are first focusing on mapping the core MGraph concepts to programming (by regular developers) and user interactions (with regular users). Something that for a long time had not gotten much attention in the RDF world beyond pointing developers to Jena (though it seemed to have improved over the last few years with companies like TopQuadrant; ironically, the Oslo model browser/editor is code-named “Quadrant”).

Whether the Oslo team sees the inferencing fun as a later addition or something that’s not needed is another question, on which I don’t see any hint at this point.

I hope they eventually get to it. But I like the fact that they cleanly separate the ability to represent and manipulate the graph model from the question of whether instances can be inferred. We could use such a reusable graph representation mechanism. Did CMDBf, for example, really have to create a new graph-oriented metamodel and query language? I failed to convince the group to adopt RDF/SPARQL, but I may have been more successful if there had been a cleanly-separated “static” version of RDF/SPARQL, a way to represent and query a graph independently of whether the edges and nodes in the graph (and their types) are declared or inferred. Instead, the RDF stack has entailment deeply embedded and that’s very scary to many.

But as much as I like this separation, I can’t help squirming when I see the first example in the MGraph document:

// Populate a small village with some people
Villagers => {
Jenn => Person { Name => 'Jennifer', Age => 28, Spouse => Rich },
Rich => Person { Name => 'Richard', Age => 26, Spouse => Jenn },
Charly => Person { Name => 'Charlotte', Age => 12 }
},
HaveSpouses => { Villagers.Rich, Villagers.Jenn }

 

That last line is an eyesore to anyone that has been anywhere near RDF. I have just declared that Rich and Jenn are one another’s spouse, why do I have to add a line that says that they have spouses? What I want is to say that participation in a “Spouse” relationship entails membership in the “hasSpouse” class. And BTW, I also want to mark the “Spouse” relationship as symmetric so I only have to declare it one way and the inverse can be inferred.

So maybe I don’t really know what I want on this. I want the graph model to be separated from the inference logic and yet I want the syntactic simplicity that derives from base entailments like the example above. Is June too early to start a Christmas wish list?

While I am at it, can we please stop putting people’s ages in the model rather than their dates of birth? I know it’s just an example, but I see it over and over in so many modeling examples. And it’s so wrong in 99% of cases. It just hurts.

There are other things about MGraph edges that look strange if you are used to RDF. For example, edges can be labeled or not, as illustrated on this first example of the graph model:

In this example, “Age” is a labeled edge that points to the atomic node “42″, while the credit score is modeled as a non-atomic node linked from the person via an unlabeled edge. Presumably the “credit score” node is also linked to an atomic node (not shown) that contains the actual score value (e.g. “800″). I can see why one would want to call out the credit score as a node rather than having an edge (labeled “credit score”) that goes to an atomic node containing the actual credit score value (similar to how “age” is handled). For one thing, you may want to attach additional data to that “credit score” node (when was it calculated, which reporting agency provided it, etc) so it helps to have it be a node. But making this edge unlabeled worries me. Originally you may only think of one possible relationship type between a person and a credit score (the person has a credit score). But other may pop up further down the road, e.g. the person could be a loan agent who orders the credit score but the score is about a customer. So now you create a new edge label (”orders”) to link the loan agent person to the credit score. But what happens to all the code that was written previously and navigates the relationship from the person to the score with the expectation that the score is about the person. Do you think that code was careful to only navigate “unlabeled” edges? Unlikely. Most likely it just grabbed whatever credit score was linked to the person. If that code is applied to a person who happens to also be a loan agent, it might well grab a credit score about other people which happened to be ordered by the loan agent. These unlabeled edges remind me of the practice of not bothering with a “version” field in the first version of your work because, hey, there is only one version so far.

The restriction that a node can have at most one edge with a given label coming out of it is another one that puzzles me. Though it may explain why an unlabeled edge is used for the credit score (since you can get several credit scores for the same person, if you ask different rating agencies). But if unlabeled edges are just a way to free yourself from this restriction then it would be better to remove the restriction rather than work around it. Let’s take the “Spouse” label as an example. For one thing in some countries/cultures having more than one such edge might be possible. And having several ex-spouses is possible in many places. Why would the “ex-spouse” relationship have to be defined differently from “spouse”? What about children? How is this modeled? Would we be forced to have a chain of edges from parent to 1st child to next sibling to next sibling, etc? Good luck dealing with half-siblings. And my model may not care so much about capturing the order (especially if the date of birth is already captured anyway). This reminds me of how most XML document formats force element order in places where it is not semantically meaningful, just because of XSD’s bias towards “sequence”.

Having started this entry by declaring that I don’t understand what M tries to be, I really shouldn’t be criticizing its design choices. The “weird” aspects I point out are only weird in the context of a certain usage but they may make perfect sense in the usage that the Oslo team has in mind. So I’ll stop here. The bottom line is that there are traces, in M, of a nice, reusable, graph-oriented data model with strong bridges (in both directions) to programming languages and user interfaces. That is appealing to me. There are also some strange restrictions that puzzle me. We’ll see where this goes (hopefully this article, “Designing Domains and Models Using M” will soon contain more than “to be submitted” and I can better understand the M approach). In any case, kuddos to the team for being so open about their work and the evolution of their design.

References
Published at DZone with permission of its author, William Vambenepe. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)