Monday, February 1, 2021

Normalization -- Will They Ever Learn?

“To Normalize or not to Normalize? that really isn't a question. few things to consider:
Normalization is supposed to protect from data anomalies, but not prevent us from using data. encapsulation is the magic trick that allows you to do what you want without breaking rules. what are your experiences with normalization?”

This is a question that at this point need not -- and should not -- be asked anymore, and the fact that it still is, is one confirmation -- among many -- that there is no progress in data management. According to the current understanding of the RDM:

  • Database relations are both normalized (in 1NF) and fully normalized (in 5NF) by definition, otherwise they are not relations and the relational algebra (RA) does not work;
  • Adherence to three database design principles produces 1NF and 5NF relational databases;
  • Consequently, there should not be such a thing as "doing" normalization (to 1NF) and further normalization (to 5NF) except to repair databases that are non-relational due to failure to adhere to the principles.
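To make the full-normalization point concrete, here is a minimal sketch (hypothetical table and column names, using Python's sqlite3 purely for illustration) of a design that bundles two independent multivalued facts -- an employee's skills and languages -- in one table, and its decomposition into projections whose join loses nothing:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Undernormalized design: skills and languages are independent facts,
# so every skill must be paired with every language (redundancy).
cur.execute("CREATE TABLE emp_skill_lang (emp TEXT, skill TEXT, lang TEXT)")
rows = [("Ann", s, l) for s in ("SQL", "Python") for l in ("EN", "FR")]
cur.executemany("INSERT INTO emp_skill_lang VALUES (?, ?, ?)", rows)

# Fully normalized design: one projection per independent fact.
cur.execute("CREATE TABLE emp_skill (emp TEXT, skill TEXT, PRIMARY KEY (emp, skill))")
cur.execute("CREATE TABLE emp_lang  (emp TEXT, lang  TEXT, PRIMARY KEY (emp, lang))")
cur.execute("INSERT INTO emp_skill SELECT DISTINCT emp, skill FROM emp_skill_lang")
cur.execute("INSERT INTO emp_lang  SELECT DISTINCT emp, lang  FROM emp_skill_lang")

# The join of the projections recovers the original: a lossless decomposition.
joined = cur.execute("""
    SELECT s.emp, s.skill, l.lang
    FROM emp_skill s JOIN emp_lang l ON s.emp = l.emp
    ORDER BY 1, 2, 3""").fetchall()
original = cur.execute(
    "SELECT emp, skill, lang FROM emp_skill_lang ORDER BY 1, 2, 3").fetchall()
print(joined == original)  # True: no information is lost
```

Had the design principles been adhered to, the two independent facts would have been represented as separate relations in the first place and no such repair would arise.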

Note: The three design principles are fundamental to the SST/FOPL foundation of the RDM, but have never been understood even by relational proponents. I do not know what encapsulation has to do with any of this.




Unfortunately, not only is the question still asked, but it is also answered with the same decades-old misconceptions. Consider the following:

“every database designer MUST know how to normalize AND denormalize deliberately. Context is what i find important. back in the 80's, we learned how to normalize any data to at least 3NF. beyond that depends on what you need the system to do. Personally i have never gone through a "normalization" process, I end up at 3NF by default whenever designing data models. After that, the bit that today's students seem to skip is the denormalization process. Once you have a logical normalized model, you then trip through multiple denormalization steps to arrive at a physical model that will work for YOUR application.  e.g., If I have a classic Orders and Order Details E/R, I will typically write DB triggers that automatically SUM the line values to the parent record. (dont even think of challenging what I do, I have used these 35 years or more, they work and add major performance value to any system). there's about half a dozen other denormalization steps i work through when designing a physical database schema. Once upon a time, we were taught all these steps - entity life history, access path analysis, and more so that one could translate a logical business model into a highly performant software application.”
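For readers unfamiliar with the technique the commenter describes, it amounts to maintaining a redundant, derived total in the parent row, kept in sync by a trigger. A minimal sqlite3 sketch (hypothetical names; a real system would also need UPDATE and DELETE triggers):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total NUMERIC DEFAULT 0);
CREATE TABLE order_details (
    order_id INTEGER REFERENCES orders,
    line_no  INTEGER,
    amount   NUMERIC,
    PRIMARY KEY (order_id, line_no));

-- Redundant derived value maintained by a trigger (insert case only).
CREATE TRIGGER sum_lines AFTER INSERT ON order_details
BEGIN
    UPDATE orders SET total = total + NEW.amount
    WHERE order_id = NEW.order_id;
END;
""")
con.execute("INSERT INTO orders (order_id) VALUES (1)")
con.executemany("INSERT INTO order_details VALUES (1, ?, ?)",
                [(1, 10.0), (2, 5.5)])
total = con.execute("SELECT total FROM orders WHERE order_id = 1").fetchone()[0]
print(total)  # 15.5
```

Note that the stored total is derivable on demand (SELECT SUM(amount) ...); storing it introduces controlled redundancy whose consistency the trigger must guarantee.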

I responded that there are too many misconceptions to debunk in a LI exchange, and that I will do it in a post, so here goes.

  • Normalization is the repair of a non-1NF database; since the RDM does not permit relation-valued domains (RVDs), in practice this is never called for;
  • The repair of 2NF-4NF databases is referred to as further normalization (i.e., beyond 1NF);
  • I don't know what "beyond 3NF depends on what you need the system to do" means. The only reason for the arbitrary stop at 3NF is lack of understanding of what non-relations in 4NF are and how to bring them to 5NF, all of which is unnecessary if the three design principles are adhered to;
  • You don't "design data models" -- you use a data model (e.g., the RDM) to formalize conceptual models as logical models for database representation and implementation;
  • Logical design has nothing to do with performance, which is determined exclusively at the physical level.
  • To the extent that a SQL DBMS (which is not relational!) exhibits poor performance, this is due to implementation issues by either DBMS vendors or users;
  • Both further normalization and denormalization are logical and independent of physical implementation. It is impossible, therefore, to "arrive at a physical model via denormalization steps", or "work through denormalization steps when designing a physical database schema" -- this is logical-physical confusion (LPC).

Note: Because SQL DBMSs have limited physical options (storage and access methods) and weak support of physical independence (logical designs are tied to particular physical implementations), logical denormalization may be associated with physical changes that improve performance, creating the impression that the denormalization is responsible and reinforcing the LPC.

“personally I hope to never end up with 5NF physical schemas. they are usually logically 'perfect' but end up costing way more in code and in performance. once one has normalized, the essential next step is to DEnormalize for the specific application needs to support what you already know will be needed functionally. this trades off against the future. a more perfect logical implementation will tend to support future changes more easily. a more compromised and denormalized implementation will tend to be harder to extend or adapt later. weigh this against most business software has maybe a 5 year lifespan when choosing.”

This paragraph is an excellent example of the use of relational jargon without a good grasp of the fundamentals.

  • There is no such thing as a "5NF physical schema" (LPC again) -- see note above;
  • Logical design is not a matter of "perfection", but of correctness!
  • The whole point of physical and logical independence is an application-neutral (not biased!) database that can be optimized and re-optimized physically for all applications without impairing them;
  • Upside down and backwards: that's precisely why relational (i.e., 5NF) databases are more resilient to change than non-relational denormalized ones;
  • If SQL DBMSs "cost code", that's because SQL is a poorly designed data language with weak relational fidelity, and application programmers often do not use it properly;
  • "Denormalized implementation" induces/reinforces LPC, which is why I recommend a three-fold terminology of conceptual modeling, logical design and physical implementation;
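The application-neutrality point can be illustrated: applications that want a joined ("denormalized") perception can be given one via a view over fully normalized base relations, leaving the logical design intact. A sqlite3 sketch with hypothetical names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     cust_id  INTEGER REFERENCES customers,
                     amount   NUMERIC);

-- A view gives applications the joined perception without
-- denormalizing the underlying logical design.
CREATE VIEW customer_orders AS
    SELECT c.cust_id, c.name, o.order_id, o.amount
    FROM customers c JOIN orders o ON o.cust_id = c.cust_id;
""")
con.execute("INSERT INTO customers VALUES (1, 'Acme')")
con.execute("INSERT INTO orders VALUES (100, 1, 42.0)")
row = con.execute("SELECT name, amount FROM customer_orders").fetchone()
print(row)  # ('Acme', 42.0)
```

How the join is computed (indexes, clustering, storage) is a physical matter the view's users never see, which is the point of logical and physical independence.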

Note: Another participant commented that he “overdid it once, trying 6nf for stuff that was best in about four tables. That's abnormalization.” It would, of course, be nonsensical to go over 5NF, but he is referring to 5NF as "over-normalization" -- his four tables being under 5NF.


Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


Normalization and Further Normalization Parts 1-3

Data Independence and Physical Denormalization

Denormalization for Performance: Don't Blame the Relational Model

What Is a Relational Schema

Database Design and Guaranteed Correctness Parts 1-2

Levels of Representation Conceptual Modeling, Logical Design and Physical Implementation
