Sunday, April 23, 2023

THE DENORMALIZATION ILLUSION (t&n)



Note: "Then & Now" (t&n) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the disregarded evolution/progress of RDM, I am re-visiting my old debunkings, bringing them up to the current state of knowledge. This will enable you to judge how well arguments have held up and realize the increasing gap between industry stagnation --  and scientific progress.

DENORMALIZATION, PERFORMANCE & INTEGRITY

(Email exchange with reader originally published August 2002)

Then ...

“I'd like to comment on your other recent articles: on denormalization. Of course you prove that denormalization does not improve performance, because you pay for it by maintaining integrity. But, when people say that de-normalization improves performance, they usually mean just on one side. For example, I can merge DEPT and EMP tables into a third table DE and achieve a better query performance by replacing a join by a simple select from the new table. If this is the most frequent and most important operation in my application (vs. updates, inserts, deletes), then my overall performance will be improved (and that's what usually happens in DW). But if the opposite is true, then performance will suffer. I didn't see these considerations in your articles ...

Many people, yes, but not nobody. I always considered the cost of denormalization. I know many people in this field that do the same; however, I do agree with you that many people, especially those "younger" ones learning from more "modern" books on database design, especially those in the OO field, are not aware, and what's worse, don't even want to be aware.

That's exactly how I always thought and when I had discussions with people, that's what I always said to them (not that it made a big difference in their thinking). However, when I read your articles on this topic, I had another thought. As you always say (and again, I fully agree with you on this), we must always separate logical and physical. I always considered denormalization as one of the things done at the physical level. So, denormalization shouldn't even be your concern, because it has nothing to do with the relational model. The rule I always follow is that whatever I do at the physical level, it should not destroy my logical model, which must stay normalized. If I denormalize to achieve some performance gains for a selected set of functions, then I do pay for it by writing additional logic to preserve the integrity and by creating views that represent the entities on my logical model, which I had to "destroy". So as long as I separate these two levels, I don't think I'm in any conflict with the relational model. Of course if DBMS gave me more options in physical design while protecting the integrity of my logical model, I wouldn't have to do this myself.

Theoretically, I think the way you do, and that's why I enjoy reading your columns. But I also have to deliver practical results to my users. Unfortunately, I can't go to my users and tell them that their response time is slow because of Oracle's technology. And I don't believe screaming at Oracle will do me any good either (and yes I know what you will say to this). So until that mysterious technology you mentioned many times is implemented, I have to do what I can.”

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

LATEST POSTS

04/16 RELATIONSHIPS & THE RDM V2 PART 2: INTRA-GROUP RELATIONSHIPS

04/08 MISSING DATA: RDM VS SQL -- A REAL WORLD COMPARISON (t&n)

03/31 I Left Ceausescu's Romania for AI Algorithms??? At Least Him I Understood!!

03/26 RELATIONSHIPS AND THE RDM V2 PART 1: RELATIONS & DATABASE RELATIONS

UPDATES

04/23 Added The Story of Mathematical Proof to LINKS page

04/03 Added First OrderLogic to LINKS page

04/03 Added Mathematical Logic - Reasoning in First Order Logic to LINKS page

03/26 Added Modeling of Integrity Constraints Dependencies to LINKS page

03/14 Added Russell’s On Denoting to LINKS page

03/14 Added Russell’s Paradox to LINKS page.

LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)

08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my AllAnalytics columns no longer work. I re-published only the 2017 columns @dbdebunk, and within them links to sources external to AllAnalytics may or may not work.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
------------------------------------------------------------------------------------------------------------------

You may get better performance with denormalized databases, but that is due (1) exclusively to implementation factors and (2) only if you ignore integrity -- you trade it off for the performance. Now, given existing products, you may have to accept the risk knowingly; but the problem is that data practitioners are not even aware of the risk -- they are oblivious to the cost of denormalization and blame the RDM and full normalization, which is absurd.

Performance [is determined exclusively by physical implementation (DBMS optimizer, hardware, network load and so on). If you denormalize and -- all else being constant!!!! -- you get better performance, it only means that commercially available SQL DBMSs (which are not relational) fail to perform well with the correctly designed (i.e., fully normalized) database. But instead of drawing this conclusion and pressuring vendors for better implementations (even if in some long run), users accept denormalization as if it were a solution], which explains why there are no true RDBMSs almost six decades since the RDM was introduced. The point is that if users understood the RDM, they would stop confusing SQL with relational products and levels of representation. As long as they think the way you do, there will be no progress in database management.

No, it is not [that important]. The really important points are to realize that (a) it's physical implementation and not logical design that's the source of trouble; and (b) implementations forces you to trade integrity for performance. As it is, they think it's logical normalization that's the source of the problem and they are unaware of the cost and do not correct solution. Emphasizing the point you insist on would not achieve the objective of getting them to understand the real problem. I want people to realize that they are forced by vendors to trade integrity for performance and that the problem is products, not normalization or relational theory.

The vast majority doesn’t know and I have demonstrated this over and over again where I quote both users and vendors (watch this space for a recent exchange that proves it yet again). Moreover, I have challenged in writings, lectures and seminars to specify the additional constraints [necessary to protect semantic consistency in denormalized databases] and nobody has ever done so. They have no clue, because they don’t understand RDM. Were you able to come up with the join constraint in the article before you saw my explanation and constraint formulation?

Data independence -- [both physical and logical] has been a main objective of RDM, but that is another thing practitioners are ignorant of, because SQL products don't support it very well and induces confusion of the levels of representation without realizing it.

You are correct in principle, however, I would not use the term "denormalize at the physical level": Normalization is a purely logical concept -- this is another way to induce level confusion [-- which is why we proposed the three-fold terminology of conceptual modeling, logical database design and physical implementation]. Simply say that you must have freedom to choose any physical implementation and change it at will to maximize performance without bastardizing your logical design and incur integrity risk. To the extent that you must do the latter, know that it's the product and other implementation details that are the culprits.

You cannot generalize from yourself. You are in a minuscule minority.

... and Now (comments on re-publication)

At the time of the exchange the understanding was that database relations are at least in 1NF (normalized), but preferably also in 5NF (fully normalized). We now understand that database relations are by definition and design in 5NF and relational algebra has 5NF, not just 1NF closure (which the current RA version does not), otherwise all bets are off.

There is still utter confusion of:

  • Normalization (to 1NF);
  • Further normalization (from any of 1NF-4NF) to 5NF
  • Full normalization (5NF)*
  • Denormalization (from 5NF to any of 1NF-4NF)






No comments:

Post a Comment

View My Stats