NoSQL Zone is brought to you in partnership with:

Chris Travers is the most active developer of LedgerSMB and have been working with PostgreSQL since 1999. For the last five years he has spent extensive time with the more advanced features of this database management system. At the same time, PostgreSQL has continued to grow in these areas. Chris is a DZone MVB and is not an employee of DZone and has posted 30 posts at DZone. You can read more from them at their website. View Full User Profile

Object-Relational Algebra 3: The Series Join Function

02.23.2013
| 1088 views |
  • submit to reddit
I am back from a break due to working hard on getting LedgerSMB 1.4 closer to release (beta 2 has been released and we are working hard on moving towards beta 3).  Anyway, to finish up the series on object-relational algebra.

The second important addition I would make to relational algebra to give it object-relational capabilities is what I call a "series join" operation.  A series join only makes sense in object-relational approaches because the output can be minimal and yet certain functional dependencies on that output can be readily defined.

I use the capital sigma Σ to refer to a series join, acknowledging that the lower case sigma refers to the select operation and so there is a partial conflict.

A series join takes a theta condition much like any other join, but this theta condition operates on the input relation to the series join.  The ut set is joined to itself in the first iteration and then to the output set of the previous iteration in subsequent iterations.  This is repeated until the output set does not change with subsequent iterations.  in a finite data set, the mappings will also be finite.  An optional subscript provides a starting set of values in the input relation and an optional superscript provides a maximum iteration depth.

The output is set of tuples of (o1, o2) where o1 is the initial object's key or reference and o2 is the linked to object's key or reference.  From here the following functional dependencies arise:  path can be used to trace a path (how we get from one record to another) and depth (how many iterations we have to go to reach the destintion record) are two obvious ones.

A series join provides a useful way to express transitive operations.  This allows, for example, binary transitive closure to be expressed and tested because one can generate all possible paths from one node on a graph up until the point where they loop back on themselves.

Series join operations in the SQL world are roughly approximated by the CONNECT BY and WITH RECURSIVE constructs, both of which are typically used to construct a finite series of self-joins.  However there are key differences too.  In my model we are less worried about the tuple membership than what we can clearly derive from a series join.

Please pardon my layout since I am not quite sure how to make mathematical equations display perfectly in blogger.

Suppose we have a relation employee.

We might have reports = employee Σ13 with a theta condition of id θ manager and this would provide a list of the direct reports to this employee, their direct reports, and their direct reports.  We can also express certain functional dependencies, such as depth(reports), which is between 0 and 3, and path(reports) which will show the chain of command between the report and the employee.  If reports = employee Σ1 and employee 1 is the CEO, then we get the entire organizational chart of the business.

Series joins allow us to do graph traversal, hierarchical searches, and more in an OR database, and approximations of this have been implemented in the relational model.  They are mathematically clear, clean, avoiding magic operations and solve a great number of problems.

 

Published at DZone with permission of Chris Travers, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)