Saturday, July 10, 2021

Relational Misconceptions Part 2: RDM is Applied Theory

In Part 1 we showed (yet again) how even those with their heart in the right (relational) place can't help being affected by the common and entrenched industry misconceptions, in this case about relationships, relations and tables. More often than not authors exhibit the very misconceptions they try to debunk.

We left the author distinguishing sets (with unordered, unique elements) from tables (lists of ordered, possibly duplicate rows). 


DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

-05/09/21 Re-posted the 
FUNDAMENTALS page, the content of which had mysteriously disappeared.

-03/15/21: Pruned the POSTS page

-12/26/20: Added “Mathematics, machine learning and Wittgenstein to LINKS page

- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- The PostWest blog for monthly samples of global Antisemitism – the only universally acceptable hatred left – as the (traditional) response to the existential crisis of decadence and decline of Western  civilization (including the US).
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.


Logical Validity and Semantic Consistency

“The most important consequence of the fact that we are dealing with sets is that we have the entire set theory at our disposal. If we have sets, then we can have subsets, unions, intersections and — wait for it— Cartesian products!”
IOW, the importance is access to set operations with which sets are manipulable mathematically to derive other sets (just like numbers are by arithmetics). One of those operations, Cartesian product (Cp), when applied to two sets produces a set, a subset of which is a relation. If a database consists of relations, a RDBMS can manipulate them mathematically and produce relations as results system-guaranteed to be logically valid.

But a mathematical relation is an abstract set -- it represents nothing in the real world:

  • Is defined on "arbitrary" (i.e., mathematically set) nameless domains domains/attributes identifiable by order;
  • Has tuples that are unique by mathematical definition: Cp tuples are unique combinations of the values of all its attributes).

Database relations are not abstract, they represent entity groups and must be constrained (restricted) for semantic consistency -- faithfulness to the conceptual model represented by the database, without loss of mathematical properties and logical validity of results. The RDM is an adaptation of the mathematical theory of relations for database management. PKs and insignificant order are only two of of several restrictions necessary.

  • Domains are named and constrained to represent properties;
  • Attributes are named and constrained to represent properties in specific group contexts;
  • Tuples are PK-constrained to represent (facts about) entities that are distinguishable in the real world;
  • Relations are constrained

- individually to represent within-group relationships between properties and entities; and,
- collectively to represent cross-group relationships.

Note: We usually do not represent in databases all the properties of entities that jointly render them unique, for which reason and more convenient identification we assign entities unique names represented by PKs.

The relational algebra/calculus are also adaptations of mathematical set operations to database management. For example, join is a combination of primitive Cp, selection and projection that is useful for querying databases and is defined to work correctly with relations that represent groups of entities of a single type (i.e., are by definition in 5NF).

System-guaranteed logical validity and by-design semantic consistency jointly comprise correctness of inferences from databases (i.e., query results) that the RDM confers, together with decidability of data sublanguage, physical and logical independence and more.

We prefer the term R-table for a table that visualizes a relation, a data body of which (relations don't have headers) obey all semantic constraints on the relation. Note that the constraints are not visible in a R-table.

Missing Data and NULLs

“Now, what about the NULL value that is present in relational databases? The NULL which means a lack of value can be thought of as an implicit member of every domain, unless you have a NOT NULL restriction. We won’t talk about NULL in the next examples for the sake of simplicity and decency.”
A lack of value is represented by a NULL ... value? This language common in the industry reveals the confusion about missing data. Relational databases do not have NULLs, SQL databases do, and they should not be confused either.

Note: In fact, Date did propose a missing data scheme whereby default values were part of every domain, but he renounced it. For a proposed relational scheme without NULLs see The Last NULL in the Coffin.

Relational and Computational Completeness

“Most people who think in terms of tables think about it imperatively ... But is it the right way to think about that problem? We’re not using the relational model to our advantage. We need to think declaratively instead of imperatively. We need to explain what we want, not how to get it. But it’s hard to think declaratively if we ignore all of the relational model and set theory because we don’t really have good words to explain the result that we want to get. We don’t have the right language.”
Let me rephrase this: R-tables per se are not the main inducer of imperative (i.e., procedural) thinking, programming is. It also contributes to the failure to understand and appreciate the important difference between a relationally complete data sublanguage and a computationally complete programming language.

“In my opinions there are three main subjects that need to be known to understand relational databases:

  • set theory
  • predicate calculus
  • relational algebra

Without it you will never truly appreciate the relational model.”

Could not agree more, but while necessary, they are not sufficient. What is also crucial to understand, but practitioners do not, is that the RDM is applied theory -- an adaptation of the theory to database practice. And understanding of neither the theory, nor its adaptations are achieved via experience with industry tools -- certainly not SQL, let alone non-SQL DBMSs -- but foundational education, which practitioners have long stopped undergoing. Academic courses have become industry tool trainers and even when the rare one covers the theory it does not convey the adaptations and their advantages.
Here's an excellent example.





No comments:

Post a Comment

View My Stats