Saturday, February 19, 2022


Note: To demonstrate the correctness and stability offered by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references, which I enclose in square brackets).

A 2001 review of my third book triggered an exchange on SlashDot. This six-part series comprises my debunking at the time of both the review and the exchange in the chronological (slightly out of the)  order of the original publication.
Part 1: Clarifications on a Review of My Book Part 1
Part 2: Slashing a SlashDot Exchange Part 1
Part 3: Slashing a SlashDot Exchange Part 2
Part 4: Slashing a SlashDot Exchange Part 3
Part 5: Slashing a SlashDot Exchange Part 4
Part 6: Clarifications on a Review of My Book Part 2


(originally posted 2/21/01)

In Part 1 debunked a review of my book @Slashdot.Org. In parts 2-5 I tackled the discussion generated there by the review. In this last part I focus on the discussion of data hierarchies covered in chapter 7 of my book [the in-vogue re-emergent graph fad].

“Chapter 7 discusses data hierarchies and trees. In a nutshell: there are no trees in SQL. The author is distressed by this. Given that a foreign key is basically a pointer, you can store trees in databases, it might not be pretty and there may not be easy way to read them and it might not be a good thing to do - but if you feel the need then get right in there. Of course I could be totally wrong about this.”
Confusing keys with pointers is one of the major errors many practitioners make ]. One intentional core advantage of the RDM is precisely that it prohibits pointers -- both physical and, as in object-orientation, logical. Exposing pointers to users has caused many unnecessary problems and complications, but offered no benefit (Don't Mix Pointers and Relations and Don't Mix Pointers and Relations - Please! in Date's RELATIONAL DATABASE WRITINGS 1994-1997). There is an easy way to demonstrate that relational keys are not, like object IDs (OID), pointers, but values: they represent uniquely identifying names/attributes of rel world entities. Pointers are system-generated internals and have no real world counterpart. The desirability of a data model that produces logical models that are faithful representations of the real world, without adding artifacts of their own. Indeed, as Date points out in Why The Object Model' is Not a Data Model in his above-mentioned book, the fact that "in the object world all the references to objects are by means of their corresponding OIDs explains why -- as is well known -- OO systems typically provide (a) two different equality comparison operators, equal OID vs. equal value and (b) two different assignment operators, assign OID vs. assign value.  Note the added complication -- what is the benefit?


DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.


02/13 OBG No Understanding without Foundation Knowledge Part 5: Debunking an Online Exchange 4

02/04 OBG No Understanding without Foundation Knowledge Part 5: Debunking an Online Exchange 4

01/30 TYFK Nobody Understands What a Data Model Is

01/21 Read My Lips: If There's NULLs, It's Not Relational

01/16 OBG: No Understanding without Foundation Knowledge Part 4 -- Debunking an Online Exchange 3

- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.


Incidentally, SQL:1999 does support recursive queries (see below), [but poorly, and it surely should not have].

“It's not hard to create a binary tree structure in a database without use of foreign keys. Just make a parent and two child fields.”
“Yes, you can represent a tree in a relational database, but there is no mechanism in SQL to query on or return trees, and there is no tree data type. I believe that's the author's point.”
As I demonstrate in the chapter, hierarchies are, indeed, representable and queriable relationally, [but that defeats the whole purpose of the RDM, which was introduced expressly to avoid the problems of directed graph (hierarchic and network) technologies that preceded it. Codd explicitly reserved the RDM for what he called "non-network applications", for which the RDM provides core advantages that graph technologies not only do not, but severely complicate needlessly. SQL was purportedly designed as -- and is universally believed to be -- a relational data sublanguage, but as I argued before it is neither structured, not just for queries, and not a real language. Its authors did not understand the RDM and there is hardly a better proof of that than the addition of (very poor] hierarchic functionality to it.] -- a "tree data type" which "returns trees" makes no sense.

“Oracle: Definition of table MYTREETAB: ID, NAME, PARENTID
SELECT name FROM mytreetab
START WITH id = [Insert ID for rootnode here]
CONNECT BY PRIOR id = parentid
Very useful. Depth first. Loop detection included.”
“Also, the CONNECT BY predicate is an Oracle extension, rather than part of the SQL standard, so for many of us, is not a solution. Don't get me wrong, it is definitely possible to deal with trees in SQL (I do it all the time), it simply requires a lot of overhead that could be nicely handled behind the scenes, thus improving performance, making code easier to read, standardizing behavior, et al.”

“I think you mean CONNECT BY. Unfortunately, Oracle's CONNECT BY isn't very useful -- it only works if you have only one table in a query -- it doesn't allow you to join the table containing the parent/child pointers to other tables ... In any case, it only works for Oracle. All other systems have to use client-side code or make use of temp tables or cursors to achieve the same result, both of which are not a part of standard SQL, which always gives academician's (like this book's author) fits, but is rarely a problem in the real world (unless you have to migrate to a new RDBMS).”
[This is, likely, dated -- I have no idea what hierarchic functionality SQL supports today -- but that is not the point. Such functionality does not belong in a relational data sublanguage and it is one of the reasons SQL cannot be considered such, industry ignorance notwithstanding.] Aside from being limited to very simple tree structures and not extendable to more realistic ones, the (proprietary) Oracle CONNECT BY clause violates relational closure, by generating tables that (a) contain duplicates and have essential ordering and, therefore, (b) are likely to lose information when queried further--which, as I demonstrate in the chapter, is exactly the sort of thing users need for hierarchic cases. The true relational solution outlined in the chapter would obviate the SQL difficulties described, but SQL would require prohibitive major surgery to add the kind of tree support suggested.
“It is extremely expensive when you try to traverse/search the tree that is built in such a way. As you have to make query, get the results, and based on this results, make another query for the child node, and iteration after iteration. The SQL was created to query relational databases. And relational databases were created to manage tabular data. Such databases basically are not built to handle hierarchical (tree-like) structures, and as such they handle it poorly. Oracle's solution (proprietary, not adopted by other products), is just a shortcut, but it has unnecessary high performance hit, comparing to what can be done with the hierarchical (or object) databases.”
“I didn't say it was good or efficient, merely that it can be done.”
“Disclaimer: I have not read the book, so I'm commenting based solely on the review and the comments. Trees can be implemented in databases, although, as L points out, it might not necessary be pretty. However, the review says there are no trees in *SQL*, which is the language used to get data out of databases. Even if you implement a tree in your database, the usual way to access the data is by reading the "root" record, then using some programming language to follow the pointers to other records. You almost never try to follow the pointers using SQL, because that quickly becomes difficult. SQL is designed to grab sets of records out of tables. It's good at doing set operations such as union, intersection, and so forth. A tree structure isn't really a set operation. It is specialized for operations like that. It is not the right tool for implementing trees. That's disappointing, because you very often want to use trees and similar data structures. It means you need more than just SQL to implement a useful database system, but as long as you have those other tools at your disposal, things usually work out OK.”
Another of the major fallacies prevalent in the industry -- one that makes it practically impossible for true solutions to emerge -- is that to most practitioners, SQL DBMSs are RDBMSs. Consequently, problems experienced with SQL -- and there are plenty of them -- are blamed on its being relational and set oriented. But, in fact, SQL and its commercial dialects are so far from what a relational data sublanguage (and from a well designed language) should and could have been, that SQL DBMSs cannot, for all practical purposes, be considered relational. In their THIRD MANIFESTO book, Date and Darwen emphatically reject SQL (I dare anybody to find a single reference to the term relational in the whole ANSI SQL standard. This was the committee's conscious decision). As I point out in the chapter, a true and correct relational DBMS would be able to handle recursive queries, [but that is absurdly contradictory to and defeat the relational purpose. If you have network applications use a directed graph data model (if only a sound one were developed and formalized!) and GDBMS (if there were one implementing it, not the RDM (if only a sounde one were developed and formalized] and a true RDBMS (if only there were one implementing it). Don't practitioners say so often "The right tool for the right job?].
“For info on how SQL3 handles recursive queries, IBM has a good tutorial. Trees are a set-based operation, but it requires a recursive reference in the set definition, e.g. X = {the root node} union {y: y is a child of some node in X}. This is perfectly well-defined; it's just a matter of getting SQL to handle it, as is done in SQL3.”
[No, it's certainly not] From what I've seen [at the time of the original post] of the IBM's DB2 support of hierarchies, it is hardly the correct solution referred in the comment [part of the reason being the very complexity that the RDM was intended to protect non-network applications from]. In fact, In Chapter 7 I demonstrate this with an example of IBM's extension to DB2's SQL dialect for recursive queries and as far as I know, some version of it has been incorporated in SQL3.
“While it's true that creating a tree structure is not difficult using SQL, it is rather difficult to maintain the integrity of the data, and also to manipulate the data or even query it in useful ways. Some examples:
  • A tree should have one root level node (a node with no parents). This means that the column that holds the parent keys must allow nulls, but only in one row. A trigger or constraint must be employed to ensure this.
  • A tree cannot have any cycles. Let's say that you're using a tree to represent an organizational chart. Bill reports to Jenny. Jenny reports to Frank. You need to add a constraint to ensure that Frank can never report to Bill (because then he would be reporting indirectly to Jenny, his own employee). Again, this can be done, but requires triggers, and some crafty data design.
  • Just one more, I promise: A common kind of query for a tree is to return all the rows that have the same ultimate parent. For example, you want to know what the total payroll is for everyone that reports to Jenny or her subordinates. Not easy to do unless you go to special lengths in the data design.
An excellent treatment of working with trees and graphs in SQL is presented in Joe Celko's book: SQL For Smarties. This is an EXCELLENT resource. Trees are a pervasive structure in the real world, the org chart is the most common example, but once you start looking, you'll find them everywhere. It would be a great blessing to see extensions to the SQL standard to handle trees more gracefully.”
[Bingo: another major industry fallacy is the exclusive focus on data structure, ignoring the two other components of a data model: the integrity and manipulation of that structure, which is precisely where severed complexity and disadvantages pile up in hierarchies, but the RDM shines. There may not be a way around accepting them for network applications, but why incurring them in non-network applications?]

Points arising:

  • The correct relational design does not require NULLs, which have some horrendous problems of their own [The Last NULL in the Coffin: A Relational Solution to Missing Data].
  • True hierarchies (in fact, graph structures in general), indeed, do not have cycles by definition. With the correct table design, there is no need of additional constraints of the kind suggested here (in other words, structures with cycles are not true hierarchies and require a different design and integrity constraints), but this has nothing to do with the RDM.
  • In Chapter 7 I clearly show how a true RDBMS (not a SQL one!) and a correctly designed database could handle the kind of explosion operations suggested, but why?
  • I urge practitioners to do themselves a favor and stay away from Joe Celko. I have never encountered anything said or written by him that is not confused, irrelevant, misleading, or riddled through with errors. He simply does not understand the RDM and, therefore, whatever he understands about SQL is highly suspect.
  • Hierarchies are common, but nowhere near as common as application programmers think and users think. Hierarchic structures are frequently forced onto representations of the real world (XML is a case in point), increasing complexity considerably and unnecessarily. [Furthermore, when true hierarchies -- particularly cyclic -- truly exist in the real world, there will be network applications, which can be supported by directed graph technology (if it ever becomes formally sound) at the unavoidable complexity and lack of system-guaranteed correctness. Relational technology was developed as an alternative to it for non-network applications, for which graph technology was discarded decades ago. We don't want them back through the back door, but unfortunately that's what we get with graph re-emergence and XML.



Posted 02/21/01

No comments:

Post a Comment

View My Stats