MORE ON TREES AND XML
with Fabian Pascal

 

 

 

From: GW

To: Editor

Date: 27 Jul 2004

 

I know you may be getting tired of the "Trees" issue, but here goes anyway! The posting of 7/23/2004 On Climbing Trees in SQL and Its Implementations led me to think about Chris Date's original proposal for an EXPLODE and IMPLODE operator. I fully realize that his paper was written early on, and that his thoughts on some of the issues have changed. However, it is a seminal paper on the subject and worth re-reading. The part of it I don't understand concerns the idea of "essential ordering".

 

For reference, here is the tree Mr. Date uses as an example:

 

P1

|

+ -----------+-----------+

|            |           |

P2           P3          P4

| \          / \         /  \

P3 P5      P5  P6      P3    P6

|                      |

/ \                    / \

P5   P6                P5   P6

 

Take the Parts tree as an example. As you and Mr. Date have pointed out, any structure where an element or group can occur more that once must have SOME kind of additional information to maintain the identity of the node in order to avoid duplication.

 

As I understand it, Date's proposal was to extend the SQL SELECT operation with an EXPLODE operator which would expose a relvar consisting of two attributes, the MAJORP# and a relation-valued attribute containing all which would create a ordered list and transform it into a relation.

 

What I don't understand is how such an operation would obtain the appropriate ordering criteria from the PP table. To wit, if the immediate dependents of a root P1 are P2, P3 and P4, then Date's proposal would rely on sorting the part numbers to obtain the ordering. Since one cannot assume that part numbers are in any order, it would seem that this should not be used.

 

The question then is, where does the initial ordering come from? The only thing I can think of is that the ordering should be explicitly represented in the relvar with a "sequence" attribute, or as the writer of the posting suggested, a "node id".

 

That being the case, the user would in any case be responsible for specifying the structure. The DBMS could manage the user specified composition of any given part, but wouldn't it have to rely on the ordering provided by the user, thus introducing concepts like "the first node", "the next node", etc.?

 

 

Fabian Pascal Responds: The suggested EXPLODE operator is part of a rather old treatment of the subject; it was conceptual in nature and outlined a way in which a true RDBMS might go about it; extending SQL would require major surgery. The old material is now superseded by a PRACTICAL DATABASE FOUNDATIONS two-part paper entitled Tree-Structured Data, Part 1: A Relational Perspective. Part 2: Three SQL Solutions will assess the DB2, Oracle, and SQL Server tree operations, relative to the relational solution. It is forthcoming next month.

 

 

From: GW

To:  Editor

 

Yes, your chapter kicked off a flurry of considerations on what exactly it means to represent a hierarchy, how that would be done relationally, and what the predicates would be. It's a hot issue these days, especially when the more extreme XML advocates are trying to present all data as purely hierarchical, both in the way it is represented and the way it is manipulated.

 

So instead of considering why they wish to represent data hierarchically in their "data model" (rather than, say, reporting on it hierarchically), they simply rely on the argument that "that's the way the world is structured".

 

So basically I am trying to think about the best way to model sequential structures before just throwing it into a tree because XML says to.

 

 

From: Fabian Pascal

To GW

 

Gee, and I thought that XML was essentially for documents.

 

In our PRACTICAL DATABASE FOUNDATIONS papers #1, What First Normal Form Really Means, and #2, What First Normal Form Means Not we explain that any data can be perceived and represented hierarchically or relationally. We contend that inherently hierarchic structures (e.g. organization and par-subpart assemblies), are less frequent than what's being claimed, and for the majority of cases that are not, hierarchies must be forced on the data. On the other hand, any data, whether inherently hierarchic or not, can be represented relationally, and there are all sorts of practical advantages in integrity and manipulation relative to a direct hierarchic representation. The fact that current products don't do it properly, or at all, is a defect of SQL and its commercial dialects, and should not be interpreted as some sort of relational weakness. The purpose of the above-mentioned forthcoming paper is to demonstrate that.

 

 

Posted 10/29/04