Sunday, August 28, 2022

NOBODY UNDERSTANDS DATABASE DESIGN 1 (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

In a previous SMS post I debunked an attempt to express something important about database practice that was handicapped by lack of foundation knowledge. Here is another example.

“This Codd guy might have been onto something. Unfortunately, normalization is usually taught in a somewhat backwards, overly technical way. If you start with concepts, connections between them and details about them, you usually are already at a fairly high normal form without going through any formal normalization steps.”
--LinkedIn.com

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

LATEST POSTS

08/25 NOTHING TO DO WITH RELATIONAL (t&n)

08/20 DATABASE RELATIONS, DATABASE DESIGN & CORRECTNESS (sms)

08/14 THE VOCIFEROUS IGNORANCE HALL OF SHAME (t&n)

UPDATES

08/20 Added Logic and databases course to LINKS page.

LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)
- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my AllAnalytics columns no longer work. I re-published only the 2017 columns @dbdebunk, and within them links to sources external to AllAnalytics may or may not work.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
------------------------------------------------------------------------------------------------------------------

Misconceptions

If only normalization (a term misused) were taught "in a technical way" -- it is taught wrong!

  • "Concepts, connections and details" is vague language.
  • Normal forms are features of logical, not conceptual model.
  • Given a proper conceptual model, proper design does not produce "fairly high normal form", but fully normalized (5NF) databases.

Fundamentals

Given a conceptual model of some segment of reality of interest as a multigroup -- a collection of related groups of entities, each of which has entities of a single type as members -- if a database is designed such that every such group is represented by a database relation, then all relations are in both 1NF (normalized) and in 5NF (fully normalized). McGoveran had the insight that, formally, this means adherence to three formal database design principles:

  • Principle of Expressive Completeness (POEC);
  • Principle of Representational Parsimony (PORP);
  • Principle of Orthogonal Design (POOD).

In a 5NF relation the only dependencies that hold are functional dependencies (FD) of the non-key attributes on the PK (i.e., there is a 1:1 relationship between every PK value and each corresponding non-key value, but not vice-versa). This theoretical requirement of the RDM ensures correctness: design for semantic consistency and system-guaranteed logical validity, as well as other advantages. This is why we say that every database relation is in 5NF by definition.

Since non-1NF ("nested relations") design is, for all practical purposes, not practiced in the industry, the term normalization (to 1NF) is generally misused to mean further normalization (to 5NF) -- the two should not be confused and lumped together as 'normalization'.

Contrary to the industry practice, you don't start with some arbitrary 1NF design and pass it sequentially through each of the normal forms; rather, you design 5NF relations. Explicit further normalization is necessary only to repair non-5NF designs due to either poor conceptual modeling, or non-adherence to the three design principles, or both.

Notes:

  • If you think about FDs, you'll see that, logically, they are an implication of every group having entities of a single type: every descriptive entity property is 'about' the identifying property (name) for all entities in the group (i.e. the PK value).
  • The 5NF requirement is part of the current understanding of Codd's work (McGoveran interpretation), not the the traditional (Date) interpretation underlying industry practice. 
 

Setting Matters Straight

We revise the above comment as follows:

Practically all treatments of relational database design are incorrect or misleading. If you (1) model reality as a multigroup consisting of related groups of entities of a single type (i.e., that share the same properties and within-group relationships) and the groups share inter-group relationships and (2) adhere to the three design principles, you produce a 5NF database without any further normalization.

 

Further reading

Database Design: What It Is and Isn't

Normalization and Further Normalization Part 1: Databases Representing ... What?

Normalization and Further Normalization Part 2: If You Need Them, You're Doing It Wrong

Normalization and Further Normalization Part 3 Understanding Database Design

Normalized, Fully Normalized, Non-Normalized, Denormalized: Clearing the Mess

DATABASE RELATIONS, DATABASE DESIGN & CORRECTNESS

 

No comments:

Post a Comment

View My Stats