Friday, April 23, 2021

Relational Misconceptions Part 1: Relationships and Tables



Amid the plethora of industry misconceptions, an article titled "What if I told you there are no tables in relational databases?" is surprising. That it starts with:

“I’ve seen one sentence about relational databases repeated on the Web many, many times. I’ve seen it in countless comments, I’ve seen it in few articles. Recently I’ve even seen it in one book — which finally made me write this article. The sentence in question goes like this: "Ironically, relational databases deal poorly with relationships". Read it carefully. Think about it for a moment. I’m sure it must sound perfectly reasonable — for anyone who doesn’t understand the meaning of both "relational" and "irony".”

is practically shocking. Be that as it may, my regular readers know that I refer to pronouncements on the RDM as "heart in the right place": correct statements are not a guarantee of full grasp thereof, or effective explanations to practitioners lacking the necessary foundation knowledge. Hence this two part debunking.

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

LATEST UPDATES
-03/15/21: Pruned the
POSTS page

-12/26/20: Added “Mathematics, machine learning and Wittgenstein to LINKS page

LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)
- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- The PostWest blog for monthly samples of global Antisemitism – the only universally acceptable hatred left – as the (traditional) response to the existential crisis of decadence and decline of Western  civilization (including the US).
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

------------------------------------------------------------------------------------------------------------------

Relations and Relationships

“The word "relational" in a "relational database" has nothing to do with relationships. It’s about relations from relational algebra. So what is a relation, then? Relation is a set of tuples ... Tuples are finite ordered lists of elements. Every tuple in a relation has the same number of elements — it’s an n-tuple in general and specifically a 1‑tuple is called a singleton, a 2‑tuple is called an ordered pair and a 3‑tuple is a triple or triplet — and the elements on the same position are members of the same domains. A domain is a set of every possible values that a given element of the tuple can have. A domain, for example, can be a set of integers, or a set of two values "yes" and "no". So relation is a set of tuples. but it’s not just any set — it is [crucially] a subset of a Cartesian product of the domains ... From [this] we can derive almost everything else — which we’ll do in a moment. But first we need to understand few misconceptions.”

Note very carefully that this a description of mathematical relations and, as I reiterated so many times, database relations are adaptations thereof that preserve mathematical properties but are distinct -- if you don't know/understand the differences, you don't understand the RDM. For example, database relations have named unordered attributes and tuples are unique by virtue of PKs.

"Relational has nothing to do with relationships" refers, of course, to the correct origin of the term relational. But in the absence of foundation knowledge 'relational' can easily be misinterpreted as referring to the RDM and, thus, induce/reinforce the very misconception the author tries to dispel.

A relation is, indeed, a set of tuples, but by virtue of being a relationship among domains -- a subset of their Cartesian product. Unlike their abstract mathematical counterparts, database relations represent jointly at the logical level a conceptual model consisting of a collection of related entity groups -- a metagroup -- where entities are primitive and groups and the metagroup are compound objects with properties and relationships among properties, entities and groups. To that end relations are individually and collectively constrained semantically to be consistent with the model -- relationships are represented in the database by constraints. Practitioners are familiar with FK constraints that represent intergroup relationships (1:1,M:1,M;N), but largely unaware of those among properties and among entities within a group. For example, few realize that a PK constraint represents a relationship -- uniqueness -- among entities within a group.

In other words, database relations (1) are relationships at the logical level and (2) are constrained to represent relationships at the conceptual level. Indeed, if the RDM is about anything, it is about relationships. The misconception the author refers to involves the (2) relationships and is due primarily to poor grasp -- if any -- of the RDM. Practitioners miss (intragroup) relationships and do not realize that that is what constraints represent. SQL DBMSs reinforce it by not supporting constraints for all relationships.

Note: We recommend reserving the term relationship for the conceptual level and using constraint at the logical level to avoid the common and entrenched conceptual-logical conflation (CLC), but practitioners should understand that at the logical level a relation is a relationship.

Tables and Relations

“Relations are what most of people mistake for "tables". But there are no tables in the relational model. Relational databases are not spreadsheets, even though people often think about them as such and that leads to countless problems ... of course when you get the data out of a relational database it is usually presented as a table, with rows and columns, as in a spreadsheet. Also, the internal representation of data can use certain lookup tables and other table-like structures. But that is irrelevant, that is just an implementation detail. All we care about is that conceptually we are dealing with relations, just like when we are thinking about factorial it’s important to know that we are dealing with a function, no matter how it is implemented or stored in memory — as an array of opcodes, an abstract syntax tree or an s-expression.

Note: Sometimes the keywords in programming languages can be misleading and we have to deal with it. But even if the names are confusing it doesn’t mean that we should be confused. We need to understand the underlying concepts as an abstract meaning of particular names and symbols that have been chosen (often poorly) to represent some ideas.”

Presented as a table, exactly! It has been extremely helpful that relations can be visualized in tabular form, but the form does not play any part in the RDM and in the absence of foundation knowledge this has contributed to the confusion. I have always preferred to refer to a table that visualizes a properly constrained relation as a R-table, but it still does not have the mathematical properties of a relation. Note that it's impossible to determine whether a table is a R-table by visual inspection, without reference to the conceptual model (i.e., entity groups, properties and relationships) and the corresponding constraints, neither of which are visible in the table.
“At this point you may ask — if there are no tables in a relational database then what do I create with the "create table" command? You create relations. With a "table" keyword. Yes, it’s confusing. Think of it this way: The "table" keyword in SQL doesn’t create tables but relations, just like the "class" keyword in ECMAScript 6 doesn’t create classes but prototypes. There are no classes in JavaScript even though we have a "class" keyword and similarly there are no tables in relational databases even though we have a "table" keyword.”
I could not agree more with the author's note above. But CREATE TABLE does not create relations because, as explained so many times, SQL tables are not necessarily relations (why?). Users can design them so that they are, but even if they do, SQL DBMSs do not treat them consistent with RDM requirements. SQL was sold as a relational data sublanguage and accepted as such in an industry devoid of foundation knowledge, which is why the author, in an attempt to dispel a confusion, reinforces another: neither tables, nor SQL tables should be confused with relations. Incidentally, tables are not an implementation detail and SQL DBMSs may not be so considered either, as they don't implement (i.e., store) their data as such, as so many practitioners seem to think (neither should SQL tables be confused with tables that visualize relations).
“So, people say "tables" to mean "relations" — what’s the big deal? ... Table is a list of rows ... Relation is a set of tuples ... And a set is not a list. A set doesn’t have an order. And in a set no element can exist more than once. Every element of a set has to be distinct. And here you go, from the very first word of a definition we already see two important and yet obvious features of relations, that would seem awkward if we thought about tables ... It’s a set of something. And a set is not a list. A set doesn’t have an order. And in a set no element can exist more than once. Every element of a set has to be distinct ...”

Indeed, as we already stated above, but there is more to relations than insignificant (i.e., not meaningful) order and PKs, so stay tuned for Part 2.

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


Further Reading

Relationships and the RDM series

Levels of Representation Relationships, Rules, Relations and Constraints

What Is A Database Relationship 

Relationships and Relations

Semantics, Relations and the Missed Link: Constraints 

Understanding Relations Part 1: Tables So What

What Relations Really Are and Why They Are Important 

 

 

 

 

No comments:

Post a Comment

View My Stats