Saturday, September 17, 2022

NEW "DATA MODELS" PART 1 (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to date with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

 

“Codd's aim was to free programmers from having to know the physical structure of data. Our aim is to free them in addition from having to know its logical structure.”

                                                          --Simon Williams, LazySoft

This series is a re-publication of several DBDebunk 2001 posts in response to Simon Williams' so-called "Associative Model of Data", academic claims of superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

Part 1 is the email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Williams' claims that started the whole thing.


Then: ON RESPECTED TECHNICAL ANALYSTS (July 2001)

LF:
Ha. They use the same quote we used:

“INDUSTRY INFLUENCER EVENING: The decline and eventual demise of the Relational Model of Data. A presentation and discussion for journalists, analysts, influencers and special interest groups. Presenter: Simon Williams, Inventor of the Associative Model of Data.
The Relational Model of Data, now over thirty years old, is the foundation of almost every commercial database today. The challenge of the Object Model of Data has faded and the economic case for adopting the hybrid Object/Relational technology is unproven. So is the Relational Model the last word in database architecture? Simon Williams will present the case that the Relational Database is fundamentally unsuited to the Internet age and has begun an inevitable decline. Every new relational database application needs a new set of programs written from scratch. This is expensive and labor-intensive and unmaintainable in the growing skills shortage. Simon invented the Associative Model of Data, which is claimed to be the first major advance beyond the Relational Model. This promises to be a debate of great interest for those of you following the fortunes of the database market. Simon is a highly knowledgeable and entertaining presenter. Copies of his recently published book, the Associative Model of Data, will also be available on the day.”

FP:
Another one who thinks he can whip up a "better" data model than RDM without understanding either the RDM or what a data model is. I featured Simon Williams in our Quote-of-the-Week last month (QotWs are pronouncements by database vendors, practitioners or "experts" exhibiting a lack of foundation knowledge). Williams is no exception (though he might well be entertaining). If the guy wants to make a fool of himself, let him.

He has no data model and what he is proposing is sheer nonsense.

LF:
This is a response by Williams to the message I posted for you.
“Hi Lee, it's good to make your acquaintance. I'd like you to accept a complimentary copy of my book "The Associative Model of Data", and then invite you to debate my knowledge of data models and relational theory, and whether or not what I'm proposing is complete nonsense, in this public forum. Please email me your mailing address and I'll send the book right away. Meanwhile, in my defense pending a fuller debate could I offer quotations from the UK's two most respected technical analysts:
“We like the Associative Model of Data. While Object database vendors used to claim that objects were more intuitive than relational tables, the associative approach is arguably more intuitive than either. The Associative Model potentially offers huge advantages when it comes to both application reuse and application maintenance.”
--Bloor Research
“It is eminently possible that the long-awaited replacement for the RDBMS may have just appeared on the market. That this will be a long process is an indisputable fact, but those with good memories will also be able to recall that the relational model was not universally accepted from its inception. It is also possible to recall that those organizations that took on board the relational model from an early stage were able to gain some large measure of competitive advantage both in terms of pure data management and with respect to developing powerful business-based applications. Butler Group believes that the new data model proposed by Lazy Software could also have a similar impact for early adopters.”
--Butler Group

FP:
Williams could not offer better evidence that I am right than quoting Bloor and Butler. They may be "respected technical analysts", but they understand what a data model and the RDM are as well as Williams does, namely, zilch. I recall I critiqued at least one DBMS article they wrote in 1992 in which they were proclaiming "the end of RDBMS" at the hands of ODBMS [Ed. Note: Chris Date critiqued it too -- see below]. Now [9 years later] they are claiming the same for Williams' so-called AMD. What fad next? The industry is chock-full of "respected analysts" who know very little. You can get analysts to say anything you want them to say these days -- which is easy if you know nothing.

Williams invites people to debate his knowledge of data models and RDM, but the state of that knowledge comes through loud and clear from his public pronouncements, even if he is, obviously, oblivious to it. Consider the following:
“The most visible limitation of the relational model has been its inability to handle multimedia files, but the importance of this has been overstated. In fact, the relational model has some far more significant limitations that have not yet been challenged:

  • Every new relational application needs a new set of programs developed from scratch, which is labour-intensive, expensive and wasteful.
  • Relational applications cannot be readily tailored to the needs of large numbers of individual users, which is an issue for ASPs.
  • Relational applications cannot record a piece of information about an individual thing that is not relevant to every other thing of the same type. This limits our ability to continually improve customer service levels.
  • Information about identical things in the real world is structured differently in every relational database, so it is difficult and expensive to amalgamate two databases.”

What the hell does this mean? Such vague language has no place in the technical database field and should itself be a red flag. Incidentally, one of the main advantages of relational databases -- and an explicit objective of the RDM -- was precisely the ability to merge databases, something practically impossible in non-relational systems.

The amount of nonsense squeezed into these short paragraphs is impressive. Anybody who does not discern it is in trouble too. So let's keep matters simple: if Williams knows what a data model is, let him provide the definition and then explain how his AMD satisfies it.

CJD:
The following material is excerpted from an appendix in my book RELATIONAL DATABASE WRITINGS 1994-1997. It's edited slightly here. It begins with a quote from an article by Robin Bloor entitled "The End of Relational?" (DBMS, No. 7, July 1992).
“Relational databases can handle most varieties of structured data, but when it comes to text, compound documents, vector graphics, bit-mapped images, and so on, relational technology is out of its depth ... Those of you who attended training sessions on the relational theory of data can be forgiven for wondering why relational databases cannot adequately handle such data. After all, your instructor probably told you that the relational view was mathematically correct, provably correct, or something similar, but also far more flexible than anything that preceded it. That explanation is fairly simple and perhaps a little embarrassing for the computer world, because the relational theory of data is wrong. Data cannot always be represented in terms of entities, attributes, and relationships.”
Wow! ... so the relational theory of data is wrong, eh? Maybe Bloor thinks predicate logic is wrong, too? After all, the relational model is essentially just an applied form of predicate logic. If Bloor thinks he's found a bug in predicate logic, I look forward very much to hearing about it ASAP. I expect a lot of logicians and mathematicians would be pretty interested, too. By the way, I'd also love to see some data that can't be represented "in terms of entities, attributes, and relationships."
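
The remark that the relational model is an applied form of predicate logic can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not code from any DBMS: the predicates WORKS_IN and LOCATED, the attribute names and the sample data are all hypothetical. A relation is treated as the set of tuples for which its predicate is true, a join plays the role of logical AND, and a projection plays the role of existential quantification.

```python
# Hypothetical predicates (illustration only):
#   WORKS_IN(emp, dept): "employee emp works in department dept"
#   LOCATED(dept, city): "department dept is located in city"
# Each relation is the set of tuples that make its predicate true.
works_in = {("Ann", "Sales"), ("Bob", "Sales"), ("Cy", "R&D")}
located = {("Sales", "London"), ("R&D", "Leeds")}


def join(r, s):
    """Logical AND: (emp, dept, city) where WORKS_IN(emp, dept) AND LOCATED(dept, city)."""
    return {(e, d, c) for (e, d) in r for (d2, c) in s if d == d2}


def project_emp_city(r):
    """Existential quantification over dept: (emp, city) such that some dept exists."""
    return {(e, c) for (e, _dept, c) in r}


# "emp works in some department that is located in city"
emp_city = project_emp_city(join(works_in, located))
print(sorted(emp_city))
```

Every tuple in `emp_city` corresponds to a proposition that follows logically from the propositions recorded in the two base relations, which is the sense in which querying amounts to logical inference.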
There's quite a bit more from the same source (unfortunately):
“So what is going on with normalization? By physically storing data as two-dimensional tables, relational databases encourage you to store your data in an atomic manner. This means that every time you wish to process an object, you must first assemble it ... It is as though you took your car apart to put it into the garage and had to reassemble it before driving it out.”
I am so tired of this stupid car analogy ... It crops up all over the place. It stems from a failure to understand the relational model, of course, and in particular a failure to understand the true nature of domains -- though in a way I can sympathize with this latter failure somewhat, since SQL has never supported domains. But this is a classic example of the relational model being criticized for not having been implemented! See my various presentations on this subject (especially those having to do with THE THIRD MANIFESTO). [Ed. Note: Getting Chris Date to say 'stupid' is really something.]
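
The point about domains can be sketched in a few lines. This is a toy illustration under stated assumptions, not the language of THE THIRD MANIFESTO or of any product: the `Car` type, the GARAGE relation and the data are hypothetical. It shows an attribute defined over a domain of arbitrarily complex values, so that a whole car is a single attribute value and nothing is "taken apart" to store it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    """A hypothetical user-defined domain: one Car is a single value."""
    make: str
    wheels: int
    parts: tuple  # immutable, so Car values are hashable and can sit in a set


# A relation GARAGE {slot, car}: a set of (slot, car value) tuples.
garage = {
    (1, Car("Mini", 4, ("engine", "gearbox"))),
    (2, Car("Reliant", 3, ("engine",))),
}


def restrict(relation, predicate):
    """Relational restriction: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


# The car comes back as one value; no reassembly step is involved.
four_wheelers = restrict(garage, lambda t: t[1].wheels == 4)
for slot, car in four_wheelers:
    print(slot, car.make, car.parts)
```

How a value of such a domain is physically stored is a separate matter from how it appears in the relation.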
Anyway, Bloor continues:
“In order to support this form of storage, relational databases provide performance-hungry mechanisms -- foreign keys that increase data volumes and disk I/O, and optimizers that knit together the data that may never need to have been stored separately. Some databases [sic] even allow you to configure two [stored] tables to share the same ... [disk] page in an attempt to provide a back-door way of implementing an NF2 model -- a late, inefficient, and untidy mechanism that (text missing?) the promoters of the relational way have got it wrong.”
This is really confused ... By the way, note too that here we run into one of my pet peeves: the "database vs. DBMS" terminology issue. The point is: If we call the DBMS a database, then what do we call the database? Very common offense!
Bloor goes on:
“I do not want to go too far in my criticism. Normalization is an excellent technique for analyzing data, even if it is an abysmal technique for physically designing databases.”
By "physically designing databases," he means "designing physical databases" (I presume). Anyway, his remarks are quite absurd. Normalization was never intended as a basis for physical design. (Though in fairness perhaps I should say that the problem here is -- again -- partly caused by the SQL vendors, who failed to give us as much physical data independence in their products as they should have done. As a result, normalizing at the logical level -- where it belongs -- often does have the side-effect of "normalizing" at the physical level too.)
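
Physical data independence can be sketched as follows. This is a toy model under stated assumptions: the classes `RowStore` and `ColumnStore` and the function `restrict` are hypothetical names, not any product's API. The same logical relation is held in two different storage layouts, and a query written against the logical level gives the same result for both.

```python
class RowStore:
    """Hypothetical physical layout 1: one stored record per tuple."""

    def __init__(self, heading, tuples):
        self.heading = tuple(heading)
        self._rows = [tuple(t) for t in tuples]

    def tuples(self):
        return set(self._rows)


class ColumnStore:
    """Hypothetical physical layout 2: one stored list per attribute."""

    def __init__(self, heading, tuples):
        self.heading = tuple(heading)
        rows = [tuple(t) for t in tuples]
        self._cols = [[r[i] for r in rows] for i in range(len(self.heading))]

    def tuples(self):
        return set(zip(*self._cols))


def restrict(store, attr, value):
    """A logical-level query: it never looks at how the data is stored."""
    i = store.heading.index(attr)
    return {t for t in store.tuples() if t[i] == value}


data = [("Ann", "Sales"), ("Bob", "Sales"), ("Cy", "R&D")]
as_rows = RowStore(("emp", "dept"), data)
as_cols = ColumnStore(("emp", "dept"), data)

# Same logical relation, different storage, same answer.
print(restrict(as_rows, "dept", "Sales") == restrict(as_cols, "dept", "Sales"))
```

In this sketch the logical design (the heading and the tuples) says nothing about the stored layout, which is the separation the paragraph above says SQL products failed to provide in full.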

FP [Ed. note]:
For more on this logical/physical confusion see the last two articles in the Against the Grain series, my article "Denormalizing for Performance: Et Tu Academia?" and Chapters 5 and 8 in my book PRACTICAL ISSUES IN DATABASE MANAGEMENT.

Finally:
“Although it is now certain that the next generation of databases will be object databases [oh yeah?], we cannot predict with any confidence which the dominant products will be ... one thing we can be sure of: They won't be relational at the physical level.”

FP [Ed. note]: Now, this is really stupid!

By the way, I should tell you that I've crossed swords with Bloor before, in one of my earlier "relational misconceptions" articles (Relational Database: Further Misconceptions Number Three, in my book RELATIONAL DATABASE WRITINGS 1989-1991). He was claiming in 1990 that SQL products lost updates, and further that the reason they did so was a flaw in SQL per se (i.e., non-SQL products didn't have the problem). The claim was utter nonsense, of course. It was also very badly expressed! For example (from Robin Bloor, "SQL Compromises Integrity", Daemon 1, No. 1, ButlerBloor Ltd., Milton Keynes, England, June 1990):
“[Cursors] may be implemented as pointers or as direct copies of data ... where the cursor is more complex it is likely that the cursor will be held as an actual copy of information from the buffer.”

I'm tempted to offer a small prize to anybody who identifies the most errors in this particular quote. But it's so galling -- the sloppiness of expression, I mean. As I wrote at the time:

"I have two broad problems with [Bloor's article]: its overall message on the one hand, and the quite extraordinarily imprecise language in which that message is expressed on the other ... It is very distressing to find such sloppiness in publications dealing with relational technology of all things, given that one of the objectives of the relational model was precisely to introduce some sorely needed precision and clarity of thinking into the database field."
Anyway, back to the stuff about the relational model being wrong etc.: Not very surprisingly, Ted Codd responded to Bloor's article (in the October 1992 issue of DBMS). He referred to "the two mysterious assertions" in the final paragraph, and asked (very reasonably, in my opinion!):

  • Where are "object databases" precisely defined?
  • What is the meaning of "relational at the physical level"?

Bloor replied a month later (DBMS, November 1992), in an article entitled "In Response to Dr. Codd". As far as I can see, he didn't answer either of Ted's questions. But he did say:

“In [my original] article I commented on the diminishing influence of the relational model of data in commercial databases.”
Words fail me. What can I say? (Actually he might be right to say the relational model's influence is diminishing ... but if he is, it's the industry's loss, and it's partly the fault of certain "experts" who ought not to be working in a field they don't seem to have even the most elementary understanding of.)

Now (Comments on Republication)

  • Where are object DBMSs? Better still, where is AMD?
  • Even with their poor relational fidelity, acceptance of SQL DBMSs can be considered universal. Unfortunately, no true RDBMS exists due to the very lack of foundation knowledge that Williams, Butler and Bloor exhibit (e.g., they don't understand physical independence!). Hence the proliferation of "better data models" such as the "object model" and AMD contributing to the "diminishing influence of the RDM".


(Continued in Part 2)

 

 

 

 
