Tuesday, August 31, 2021

TYFK: Normalized, Fully Normalized, Non-Normalized, Denormalized -- Clearing the Mess



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“A non-normalized database is a disorganized one, where nobody has bothered to work out where the facts should be stored. It is like a stack of paper files that has been tossed down the stairs. We are not interested in non-normalized databases.

A normalized database has been organized so that each fact is stored in exactly one place (2nf and greater) and no more than one fact is stored in each place (1nf). In a normalized database there is a place for everything and everything is in its place.

A denormalized database is a normalized database that has had redundancies deliberately re-introduced for some practical gain. Most denormalizing means adding columns to tables that provide values you would otherwise have to calculate as needed. Values are copied from table to table, calculations are made within a row, and totals, averages and other aggregrations are made between child and parent tables.”
--database-programmer.blogspot.com

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

LATEST POSTS

- 08/13 OBG: The Myth of Market Based Education

- 08/05 TYFK:Facts, Properties, Relationships, Domains, Relations, Tuples

- 07/22 Documents and Databases

LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)
- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- The PostWest blog for monthly samples of global Antisemitism – the only universally acceptable hatred left – as the (traditional) response to the existential crisis of decadence and decline of Western  civilization (including the US).
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

------------------------------------------------------------------------------------------------------------------

Misconceptions

Note: Level of normalization (normal form) is a property of individual relations. When associated with databases it refers collectively to their relations (e.g., normalized database means all its relations are normalized).

  • A non-normalized database:

- is not "disorganized";
- has nothing to do with storage (facts are not directly stored, the data of tuples that represent them is).

  • A normalized database:

- is not "organized so that each fact is stored in exactly one place";
- is not "in 2NF and greater";
- should not be confused with a fully normalized database.  

  • Denormalized is ambiguous in use, but:

- it has nothing to do with "calculated columns added to tables" (entire paragraph is nonsense).
- it is not a "normalized database that has redundancies deliberately reintroduced".

Fundamentals

"Disorganized database" is a contradiction in terms -- a database is an organized collection of data, regardless of how -- relationally or non-relationally.

Database relations are adaptations of mathematical relations to database management. They are a subset of the latter that retain mathematical properties, but are constrained for correctness in database practice (by-design semantic consistency and system-guaranteed logical validity). To that end the theoretical foundation of the RDM (SST/FOPL) requires adherence to three core database design principles:

  • Principle of Expressive Completeness (POEC);
  • Principle of Representational Parsimony (PORP); and,
  • Principle of Orthogonal Design (POOD).

According to a (yet unproven) McGoveran conjecture joint adherence to the principles implicitly produces databases consisting of relations that are both:

  • In 1NF (normalized) -- have attributes defined on simple domains the values of which are treated as atomic by the data sublanguage; and,
  • In 5NF (fully normalized) -- the only dependencies that hold in them are those of non-PK attributes on the PK.

Thus, while there are mathematical relations that are not in 1NF or 5NF, to ensure correctness there are no such database relations. To avoid common confusion we shall prefix the latter with DB. When we say that non-5NF relations are not relations we mean they are not DB relations.

The two conditions above hold for DB relations because each represents a group of entities of a single type (why?). There are two ways to violate the principles by bundling multiple groups into a single relation  -- nesting (non-1NF relations) and merging (non-5NF relations) -- which render databases non-relational. Adhering the principles produces DB relations and obviates explicit normalization to 1NF and further normalization to 5NF, relegating them to repair of designs that violate the principles.

Debunking

Unfortunately, there is rampant confusion, distortion, imprecision and ambiguity about data and relational fundamentals in the industry. Most practitioners:

  • Believe that database design is explicit stepwise "normalization" (e.g., "Here's some tables, how do I normalize them to 3NF? [sic]", where "normalize" is ambiguous: to 1NF, to 5NF, both, or neither.
  • Associate normalization levels -- which are specific to relational databases -- with non-relational databases. While the latter can be designed in a similar manner,  all relational advantages will not accrue.
  • Confuse levels of representation (e.g., "denormalization for performance"). Logical design has nothing to do with performance, which is determined exclusively by physical implementation.

The above definitions are a case in point -- compare them with the following.

  • A normalized database consists of 1NF relations with attributes defined on simple domains.
  • A non-normalized database contains one or more non-1NF relations with attributes defined on non-simple domains. This does not produce redundancies.
  • A fully normalized database consists of relations in which only dependencies of non-PK attributes on the PK hold.
  • A not fully normalized database contains one or more non-5NF relations with dependencies other than of non-PK attributes on the PK. This produces redundancies.

"Denormalized" is inherently ambiguous, which is exacerbated by how it is used in the industry. It could be viewed as (1) a normalized (1NF) database to which some non-1NF relations are introduced (no redundancies), but it commonly refers to (2) a fully normalized (5NF) database where one or more non-5NF relations are introduced (producing redundancies). It is necessary to review specific designs to determine which it is.

If you don't know all this you don't understand the RDM.






 


No comments:

Post a Comment

View My Stats