Sunday, October 23, 2022

NOBODY UNDERSTANDS NORMALIZATION 2 (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

(Continued from Part 1)

What's right/wrong about this database picture?

“So, what is this theory of normal forms? It deals with the mathematical construct of relations (which are a little bit different from relational database tables). The normalization process consists of modifying the design through different stages, going from an unnormalized set of relations (tables), to the first normal form, then to the second normal form, and then to the third normal form.”
--Vertabelo.com

Misconceptions

  • All database relations are, mathematically, relations, but not all mathematical relations are database relations.
  • The tabular structure play practically no role in RDM.
  • In practice there is no normalization (to 1NF) and there should not be further normalization (to 5NF).
  • Further normalization does not go from 2NF sequentially through 3NF and 4NF to 5NF.

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

LATEST POSTS

10/16 NEW "DATA MODELS" 2 (t&n)

10/08 NOBODY UNDERSTANDS NORMALIZATION 1 (sms)

09/17 NEW "DATA MODELS" PART 1 (t&n)

UPDATES

08/20 Added Logic and databases course to LINKS page.

LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)
- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my AllAnalytics columns no longer work. I re-published only the 2017 columns @dbdebunk, and within them links to sources external to AllAnalytics may or may not work.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
------------------------------------------------------------------------------------------------------------------

Fundamentals

Mathematical relations are a special kind of sets -- abstractions that have no meaning (i.e., they represent nothing in the real world). Database relations are intended to represent entity groups defined by shared entity properties (some of which are relationships) and, thus, are semantically constrained: semantic constraints ensure that database relations are consistent with (i.e., faithfully represent) the properties and relationships defining the entity groups they correspond to.

According to a McGoveran conjecture, if database design adheres to three core principles (i.e., each database relation represents a group of entities of a single type), then relations are in both 1NF and 5NF and, thus, neither normalization (to 1NF), nor further normalization (to 5NF) are necessary.

In a 5NF relation the only dependencies that hold are functional dependencies (FD) of each of the non-key attributes on the primary key (PK) (i.e., for each PK value there is exactly one value of every non-key attribute). Then the only constraint to be declared to and enforced by the DBMS is the uniqueness constraint on the PK attribute.

If, on the other hand, the design principles are violated (i.e., multiple entity groups  are "bundled" into single relations), additional dependencies on multiple keys hold and, thus, additional additional constraints must be declared to and enforced by the DBMS, unnecessarily increasing the complexity of the design, structure and understanding of the database, as well as of query expressions and their results. This is why in RDM database relations are in 5NF by definition (i.e., they must be so designed), otherwise they are mathematically relations, but not database relations and all bets are off.

There are three ways to violate design principles (i.e., to bundle groups) that yield relations in 2NF, 3NF and 4NF (we ignore the 1NF because nobody designs non-1NF relations anymore, which obviates normalization). Note very carefully that, industry fallacy notwithstanding, it is not the case that database design starts with 2NF relations and proceeds with further normalization stepwise -- through 3NF and 4NF -- to 5NF! Rather, you design 5NF relations and no further normalization is necessary. If and only if you encounter poorly designed relations in one of the lower NFs, you further normalize them from either 2NF, 3NF, or 4NF directly to 5NF in one step.

Note: This stepwise fallacy is one of the main reason practitioners fail to understand  further normalization and get stuck trying to perform it (example: DATABASE DESIGN: THE STATE OF KNOWLEDGE IN THE INDUSTRY). My paper makes this clear: it does not take a 2NF example through all the "intermedia ry" NFs, but rather three 2NF, 3NF, and 4NF examples to 5NF.

As a set, a database relation that represents a group of entities of a single type, has an intension, the definition of the type of object the entities are (the shared properties that qualify them as set members); and an extension, the member entities that satisfy the definition (i.e., share those properties). The extension, but not the intension, can be visualized on physical media as the body of a table. Without the intension -- the constraints -- a table is an incomplete description of a database relation: RA operations, correctly defined, are not applicable to tables.

Setting Matters Straight

We revise the above comment as follows:

The theory of normal forms deals with the mathematical construct of relations (different from tables, the body of which only visualizes the extension of relations) as applied to database management. Database relations are in 5NF by definition. Normalization to 1NF and further normalization repair relations in any of the four different states of poor design (1NF-4NF) directly to 5NF.

(Continued in Part 3)




No comments:

Post a Comment

View My Stats