Sunday, December 5, 2021

TYFK: How Not to Explain the Relational Model



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“The key idea is "Parent-Child" relationship. Entities ~ Relations ~ Tables (tilde stands for "more or less like"). Concept of a Table resonates with most of the people just as everybody intuitively grasps a concept of "rows and columns” but might struggle with "tuples and attributes". Explain relations and relationships, 1:1, 1:N, N:N etc. Explain rationale for this way of collecting and storing data, touch upon data normalization, and tell a few anecdotes about cost of storage back in 1970 and Y2K problem it have caused; add that we have inadvertently created Y10K problem while fixing it (not exactly true but not wrong either). Show an ERD diagram, trace the relationships, introduce SQL, maybe run a few simple SELECT queries to help your listeners visualize it, including equijoin and ORDER BY. Save other JOIN types, data types and other, more advanced topics, and for the next encounter.”
--Quora.com

 An excellent example that validates my claim of lack of foundation knowledge in the industry: most "explainers" of RDM have acquired relational jargon, but do not know or understand it at all.

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

LATEST POSTS

- 11/25 Nobody Understands the Relational Model: Semantics, Closure and Database Correctness Part 3

- 11/19 OBG: The Fate of Fads -- XML DBMS

- 11/11 Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 2

- 11/05 OBG: Database Consistency and Physical Truth

- 10/27 Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 1

LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)
- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

------------------------------------------------------------------------------------------------------------------

Misconceptions

  • "Parent-Child" is not the key relational idea (but the exact opposite);
  • Relations do not represent entities;
  • Tables, rows and columns are not "more or less like" relations, tuples and attributes;
  • 1:1, M:1 and M:N relationships are not the only type of relationship represented by the RDM in databases, and are not the feature to which the RDM owes its name;
  • The relational idea has intentionally and explicitly nothing to do with data storage;
  • Proper database design does not require (it obviates) normalization and further normalization;
  • SQL is not a proper data sublanguage, let alone truly relational and its teaching should not be syntactic;
  • ORDER BY is a non-relational, non-database operator;
  • Neither equi-joins, nor joins are the only relational operations;
  • Programming data types are not part of the RDM;
  • The chance that such explanations will lead to understanding of the RDM is nil.

Debunking

"Parent-child" is the key idea behind hierarchic DBMSs that preceded the RDM, which was introduced precisely to address the complexity and rigidity issues thereof.

Database relations are mathematical relations constrained to represent formally in databases groups of entities of a single type, for which reason they are by design in both 1NF and 5NF; it's their tuples that represent individual entities.

R-tables, columns and rows should not be confused with database relations, attributes and tuples -- they only visualize them on physical media and play no role in the RDM.

Conventional wisdom notwithstanding, M:1 and M:N relationships between entity groups are not the only ones represented in relational databases -- there are several other types of relationships, all of which are formally represented by constraints in databases.

Contrary to the logical-physical confusion (LPC) rampant in the industry, one of the core objectives of the RDM is physical independence (PI) -- the insulation of applications and users from implementation details.

For correctness, database relations are by definition -- and thus must be designed to be -- in 1NF and 5NF and the relational algebra (RA) must be revised for 5NF closure, in which case explicit normalization (to 1NF) and further normalization (to 5NF) is required only to repair bad designs.

SQL's introduction and adoption as a relational data sublanguage is primarily responsible for the lack of true implementations of the RDM. SQL DBMSs have always ben confused with RDBMSs while in reality SQL is not a proper data sublanguage, let alone truly relational.

Because physical order plays no part in the RDM, ordering of attributes and tuples in results of RA operations is not relational and, therefore, an application, not DBMS function.

RA consists of several primitive set operations that are combined into derived operations, one of which is join, a combination of cross-product, restriction and projection. As surrently defined, RA operations, including join, lack 5NF closure and, thus, correctness is not guaranteed.

Notwithstanding conventional wisdom, relational domains should not be confused with programming data types, from which they are critically distinct -- the reason Codd used a different term.





No comments:

Post a Comment

View My Stats