Sunday, May 28, 2023

INTENSION, EXTENSION AND R-TABLES (t&n)



Note: "Then & Now" (t&n) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation over the industry's fad-driven "cookbook" practices, as well as the disregarded evolution and progress of the RDM, I am revisiting my old debunkings and bringing them up to the current state of knowledge. This will enable you to judge how well the arguments have held up and to realize the increasing gap between industry stagnation and scientific progress.

THEN: THE IMPORTANCE OF RELATIONAL TERMINOLOGY (t&n)

(email exchange with a reader originally published September 2002)

“Saw your latest and once again I think you have hit one of the many protruding nails on the head. Understanding one's data is so central and so crucial and yet so often ignored.

All this talk (not from you, I note) of silver bullets. Nothing new and I wonder if the paying customers and the big-ticket so-called technology strategy companies will ever wise up. Edward de Bono wrote of 'porridge words' that distract thought from the matter at hand. When used sparingly, they can facilitate new lines of thought but when, as they are in this field, they are used so casually and often they blur the real issues. All this technicalese of XML etcetera has this effect on me.

During one of the few times an employer allowed me to help people with logical design, I was having difficulty because the customer's IT staff knew very little English and had perhaps even less database background. I hit on the idea of explaining tables as relations and relations as sentences - sentences that must have the same 'size and shape'. Their faces seemed to light up and when they agreed that they had overloaded some of their tables, I was very pleased with myself. I felt vindicated a few weeks later when I read an article about predicates and propositions that Hugh Darwen had written in the now defunct DBPD magazine, which put these thoughts much more precisely than I could. Of course, the changes created new problems because the database product, like so many others, gave precious few ways to map the logical design to the physical one. But I regarded these as preferable problems since the staff was much more interested in the more concrete physical optimization techniques.

Without any disrespect to Dr. Codd (who I once met but was too awe-struck to ask any questions of), I have often thought that the language used by everybody in the field, with words such as "tables", nearly always brings connotations of physical arrangements to the mind of anybody who has done traditional programming. This seems unfortunate to me. Especially after I read Mr. McGoveran's proposals for results that might embody more than one table. (I wonder if these might not be part of the key for much better physical integration of databases with their visualization for users, not to mention smarter engines.)

I came across a site https://www.mcjones.org/System_R/ the other day, where a bunch of the System R people reminisced about its development on the occasion of, I think, the 25th anniversary of one of Codd's early papers. Presumably Mr. Date was absent from this gathering so that he could write his own most interesting history, which I remember reading five or six years ago. Anyway, I was struck again by how often their design decisions were either determined or distorted by physical considerations. And now, when many of the obstacles have been overcome courtesy of Moore's and other laws, some of those clever people seem regretful.

Also, please let me submit an historical, non-technical 'nit' to Mr. Date - I remember him writing that Codd did not coin the database term 'normalization of relations' as a result of R.M. Nixon's foreign policy excursion with China. But I also remember reading what I recall was an original interview with Dr. Codd in the DBMS magazine where he stated that this was the case. It's not really important, perhaps I'm just sensitive to it because I live in a country that established relations with modern China a year earlier!”

------------------------------------------------------------------------------------------------------------------


Using complicated or pompous language is an old trick for obscuring the absence of substance and impressing the uninformed. The lack of foundation knowledge and the "silver bullet" mentality are ingrained in the culture of this society. That culture is anti-intellectual and counts on the lack of adequate education to push all sorts of silver bullets.

Explaining the RDM to the industry was not Codd's forte, but his contribution is major enough that he should be excused for and from that (although it would have helped had he been better at it). To his credit, however, he introduced the terms relation and tuple precisely in order to distinguish them from the physical file and record concepts. He was more or less forced into tables, but the simplicity of and universal familiarity with tables were too useful to give up, which is why I prefix the term with an R, to express that R-tables are a special kind of table, purely logical in character.
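
The file/record versus relation/tuple distinction can be illustrated with a minimal Python sketch. It rests on assumptions of my choosing: the supplier rows are hypothetical, a list stands in for a physical file (where position and repetition are observable), and a frozenset stands in for a relation (where they are not). It is an illustration only, not a definition of R-tables.

```python
# Hypothetical rows as two different displays (or files) might present them.
display_a = [("S1", "London"), ("S2", "Paris"), ("S1", "London")]
display_b = [("S2", "Paris"), ("S1", "London")]


def as_relation(rows):
    """Treat rows as a set of tuples: order and repetition carry no meaning."""
    return frozenset(rows)


# As lists (physical view) the two displays differ;
# as sets of tuples (logical view) they are the same relation.
differ_physically = display_a != display_b
same_logically = as_relation(display_a) == as_relation(display_b)
```

The point of the sketch is only that whatever distinguishes the two displays is physical arrangement, which is exactly what the terms relation and tuple were meant to leave out.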

AND NOW

Einstein famously said that "everything should be as simple as possible, but not simpler". Hugh Darwen's article is a classic example and a must-read for any practitioner who wants to understand the RDM.
To this I add that true understanding means proper use of terminology. An indicator that something is seriously amiss in the industry is that most practitioners misuse and abuse relational terms without understanding them.

OTOH, tables were initially an effective means of communicating the RDM: without them we would not have had even SQL and the little RDM knowledge there is. Unfortunately, poor understanding led to their being equated with relations, which inflicted major damage on understanding. It not only induced and reinforced the logical-physical confusion (LPC) that you mention, but also drew attention away from the intension of relations (the semantic constraints), which is not visible in tables, and towards the extension (the table body) and the normal forms. That obscured the need for database relations to be in 5NF by definition, and kept the relational algebra (RA) from having 5NF closure and integrating the semantic constraints, resulting in the semantic loss misleadingly referred to as "update anomalies". Otherwise put, the industry's current little "understanding", such as it is, is not of the RDM but of a misinterpretation of Codd's work.
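
To make the intension/extension distinction concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration of my choosing: the EMPLOYEE relation, its predicate wording, and the two constraints (a key on emp_no and a positive salary) are assumptions, not taken from any paper or product. It shows only that the table body (extension) can be displayed on its own, while the meaning and constraints (intension) live outside it.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    emp_no: int
    name: str
    salary: int


# Intension (hypothetical): what each tuple asserts about the real world...
PREDICATE = "Employee {emp_no} is named {name} and earns {salary}."


# ...and the constraints every admissible body must satisfy.
def satisfies_constraints(body) -> bool:
    """Illustrative constraints: emp_no is a key and salary is positive."""
    emp_nos = [t.emp_no for t in body]
    return len(emp_nos) == len(set(emp_nos)) and all(t.salary > 0 for t in body)


# Extension: the set of tuples recorded at this moment (the "table body").
body = frozenset({Employee(1, "Ann", 5000), Employee(2, "Bob", 4000)})


def propositions(body):
    """Instantiate the predicate with each tuple: one 'sentence' per row."""
    return sorted(PREDICATE.format(**t._asdict()) for t in body)


# A body that looks like a perfectly ordinary table but violates the key.
bad_body = body | {Employee(1, "Cid", 3000)}
```

Both `body` and `bad_body` can be drawn as tables of the same "size and shape"; only the constraints, which no table displays, tell them apart.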

SQL is another testimony to poor understanding, and a factor in the failure and absence of truly relational DBMSs. A whipped-up prototype for the System R research project, it was rushed to market and confused with a true relational data sublanguage, and the rest is, sadly, history.

RELATIONS, DATABASE RELATIONS AND TABLES

DATABASE RELATIONS, TABLES AND SEMANTIC CONSISTENCY

 

 

 

 
