Monday, May 2, 2022

SMS: "Relation Proliferation"?

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “How do I avoid too many relations in databases?”

A: “You don’t. Every relation is there to store meaningful data, hopefully you do not define database relations for data that are not to be stored in your database.”

A: “By following proper design principles. Normalization, standard data patterns, and progressing from logical to physical always. Never denormalize (or avoid normalizing in the first place) because performance never trumps accuracy. It really doesn't matter how fast you get the wrong answer.”


DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.


04/25 SMS: Relational Database and Set Theory

04/10 SMS: Quota Queries

03/25 SMS: Keys and Indexes

- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my AllAnalytics columns no longer work. I re-published only the 2017 columns @dbdebunk, and within them links to sources external to AllAnalytics may or may not work.

I deleted my Facebook account. You can follow me @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.


Conceptual modeling organizes reality of interest as a multigroup -- a collection of related entity groups, each a set of entities of a single type. A relational database is a formal logical representation of the multigroup: a collection of base relations, each constrained to represent a group in the database. The number of base relations is determined by (1) the entity groups identified as meaningful to applications during conceptual modeling and (2) database design, which assigns a base relation to each.

The theoretical foundation of the RDM (SST/FOPL) guarantees correctness iff database design adheres to three formal principles:

  • Principle of Expressive Completeness (POEC): all relations meaningful to applications are derivable with the relational algebra (RA) from base relations;
  • Principle of Representational Parsimony (PORP): no base relation is superfluous;
  • Principle of Orthogonal Design (POOD): no base relation is derivable, in whole or partially, from the others.

According to a McGoveran conjecture, joint adherence to the three principles implies full normalization (but not vice-versa!): joint adherence to the three principles produces base relations in 5NF -- a formal way of saying that each represents a group of entities of a single type --  and obviates explicit normalization to 1NF or further normalization to 5NF. As I wrote before Normalization and Further Normalization: If You Need Them, You're Doing It Wrong

Note: "Not vice-versa" means that full normalization does not necessarily imply joint adherence to all three principles, because 5NF is a property of a single relation POOD applies to the (mulltirelation) database.


It is, thus, true that (per first answer), you don't [have to] avoid "too many relations" if you follow (per second answer) "proper design principles", provided those principles are the three formal ones above -- it's they that guarantee neither too many, nor too few base relations. If that were industry practice, the above question would not even come up. That it does -- and often! -- indicates that this is not the case which, unfortunately, not only the question, but both answers make clear.

With respect to the first answer:

  • "Too many" base relations can be due not only to "data that is not meaningful", but also to violations of the three design principles that "split" meaningful data about single entity group into multiple relations ("too few" relations are the result of the opposite -- "bundling" meaningful data about multiple entity groups into single relations -- denormalization).
  • Note the common confusion of levels of representation: relations are logical and physical true RDBMSs support physical independence, insulating them from how their data is physically stored and accessed. Language like "stored relations" induce and reinforce the confusion. It is to preempt it that proposed the three-fold terminology conceptual modeling -> logical database design -> physical implementation.
With respect to the second answer:
  • "Proper design principles", whatever they are, are not the three formal ones.
  • "Standard data patterns", whatever they are, are ad-hoc, not grounded in theory.
  • Adherence to the three design principles produces 5NF relations and obviates explicit normalization (to 1NF) or further normalization (to 5NF) (the two are distinct but usually lumped together as normalization). 
  • SQL DBMSs falsely sold as relational and poor database design and implementation practices force users into a false choice of correctness vs. performance, which they erroneously blame on the RDM.





No comments:

Post a Comment

View My Stats