Monday, April 25, 2022

SMS: Relational Database and Set Theory

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “To what extent is relational database theory related to set theory?”

A: “Relational database theory is indeed closely derived from set theory. Many operations in relational data are directly related to common operations one does with sets. In fact, SQL has keywords for them that should sound familiar to someone who has just taken a class in Discrete Mathematics:
  • DIFFERENCE (called MINUS in Oracle)
Even the structure of a table is set-oriented. A table is a set of rows, and a row is a set of columns, and those columns must match the set of columns defined in the table's header.”


DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.


04/10 SMS: Quota Queries

03/25 SMS: Keys and Indexes

03/18 SMS: Entities and Records

- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my AllAnalytics columns no longer work. I re-published only the 2017 columns @dbdebunk, and within them links to sources external to AllAnalytics may or may not work.

I deleted my Facebook account. You can follow me @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.


A relation is a special type of set that can be manipulated mathematically. Relational database theory (aka the RDM) is an adaptation and application of that part of simple set theory that is expressible in first order predicate logic (SST/FOPL) to database management: database relations are mathematical relations constrained to represent in the database groups of entities. The relational algebra (RA) is the mathematical manipulation component of the SST adapted for making database inferences: derive from database relations new relations that are logical implications thereof.

The SST/FOPL foundation confers the following advantages on database practice:

  • Soundness (i.e., correctness):

- system-guaranteed logical validity;
- by-design semantic consistency;

  • Data independence;
  • Data sublanguage:

- decidability;
- declarativity

and more.

The current RA -- Codd's revision of the 1969-70 original -- consists of some SST operations (product, union, intersect, difference), some new primitive set operations (restrict, project), as well as some derived from them (join = product+restrict+project) that are useful for database management. But according to McGoveran's new understanding of Codd's work, the relational benefits materialize if and only if:

  • Database relations are in 5NF (i.e., each represents a group of entities of a single type); and,
  • The RA:

- is closed over 5NF relations;
- supports multi-relation results;

which is not the case for the current RA and must be revised accordingly -- David's endeavor in progress.

A R-table visualizes a database relation on some physical medium -- tuples display as rows, columns as attributes -- but the row-column arrangement plays no formal role in the RDM. In fact, only the body of the table (data) visualizes the relation, the header holds meta-data, which belongs in the database catalog.


Not only doesn't the version of the RDM in current practice mandate 5NF relations and 5NF closure of RA, but SQL DBMSs do not even mandate non-5NF relations. In other words, set keywords notwithstanding,
SQL is not a relational (SST/FOPL) data sublanguage and, thus, cannot and does not guarantee relational advantages, because:

  • While the operations of the RA as defined by Codd were derived from SST, the RA in use is not a proper algebra because it suffers from "anomalies" due to its operation on non-5NF relations and lack of closure over 5NF relations. 
  • "Set oriented tabular structure" induces/reinforces confusion of tables with db relations. A R-table is a special kind of table that visualizes -- but is not! -- a db relation and RA  operates on 5NF relations, not tables. Contrary to Date and Darwen's interpretation of the RDM (to the extent that even it is understood in the industry), db relations do not have headers.

Associating current practice and tools with SST/FOPL, misleads and contributes to poor foundation knowledge.

No comments:

Post a Comment

View My Stats