Thursday, March 25, 2021

OBG: Don't Confuse Levels of Representation Part 1

Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.

This is an email exchange with readers in response to my article Normalization and Performance: Never the Twain Shall Meet.


DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

-03/15/21: Pruned the
POSTS page

-12/26/20: Added “Mathematics, machine learning and Wittgenstein to LINKS page

- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- The PostWest blog for monthly samples of global Antisemitism – the only universally acceptable hatred left – as the (traditional) response to the existential crisis of decadence and decline of Western  civilization (including the US).
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.


On Normalization, Performance and Database Correctness

(originally posted 03/22/2001)

“Let us assume that database designers followed this premise and only "correctly designed" databases from day one. We would never be having this discussion. The concept of a database would only exist in the world of theory -- because, at the point of design, the limitations of the physical technology at the time must be brought into play or no one would ever buy the "correctly designed" database. Nothing happens until a sale--and to make the sale the product had better perform at a price the customer is willing to spend. There is a balance to be achieved between the theoretically pure and the realistic. Normalize to the point that it hurts--and then back off. Just my two cents.”

FP: Two cents indeed. You are simply missing the point. You don't appreciate the practical implications of DBMSs and databases that do not guarantee correctness and make you work your you know what off for no good reason. You take the poor state of products as a given and say "We must accept all these products from vendors, no matter how many problems they impose on us, so please don't bother me with the correct way of doing things such that it would minimize problems and maximize performance". How in the world will you ever get good products with this attitude?

If you were to build bridges, we pretty much know the consequences were we "realistic/practical" and ignored the laws of physics -- we see them when contractors do. This is exactly what you are proposing for databases with respect to laws of logic, except the consequences are somewhat harder to associate with their cause. Replace the word theory with science and you should understand what I mean. One of the most common mistakes by database practitioners (and I would venture to say, American culture in general) is a failure to understand the difference between "just theory" and "sound theoretical foundation".

Here is what I suggest: read [my paper on normalization] and then tell me that (1) you were aware of the integrity costs of denormalization and either (2) you enforced the additional constraints necessary, or  (3) you deem the consequences of not adding such constraints unimportant. Otherwise, we're grinding water.
“I've been in the computer business for over 20 years and fully realize the fallacy in the idea that any database system and/or computer system can "guarantee correctness." It is beyond the function of any database system to guarantee correctness. That being said, the issue comes down to correcting errors in the database. Now we get to the down and dirty, the consequence of errors in the database and the complexity of finding and correcting consequential errors in the database.”

FP: 1st, experience is not [substitute for] foundation knowledge. The database field is chockful of people with lots of experience but no knowledge of fundamentals. In fact, this is one of the biggest problems in the field, which I am trying to address. 2nd, there is a tendency to take my comments to an unintended extreme and then shoot that extreme down. When I said "guarantee correctness" ... I meant [something] very specific: enforcement of business rules. Are you telling me that DBMSs cannot and ought not guarantee it? That users should do that in applications? Because that is the implication of what you are saying, even though you may not realize it. 3rd, I gather that you have no intention to read the references I suggested and tell me what exactly is wrong with my arguments. If that is so, we won't get anywhere.
“If you had stated that database design should enforce business rules you never would have heard from me -- unless perhaps you made such an eloquent and emphatic statement that I was compelled to write AMEN! That is not what you said in your article. Nor did you say that in response to my e-mail. Rather, you responded, "You just don't appreciate the practical implications of products and databases which fail to guarantee correctness..." No product or database can guarantee correctness. I am not twisting what you said or taking it to an extreme. I am simply observing that your statement is flawed. Perhaps if I read your book I would have other opinions but my comments are based on what I was presented with. Trying to stay up to date in the computer business keeps me continually exercising my mind and thinking young. Which brings me to "is not everything." I don't have all necessary knowledge. That is a moving target and if I ever think I have it then I'm done in the computer business.”

FP: In my article that you responded to I state explicitly and clearly that denormalization [creates redundancies that] require additional integrity constraints that are either prohibitive, or not supported by products and are [almost never] enforced ... opening them to corruption (i.e., correctness is not guaranteed). And I referred you to the book, because I have examples there. I also point out in the article that denormalization per se cannot possibly affect performance. You ignore that and continue to maintain that normalization is not practical. That is upside down and backwards: it's DBMSs and denormalized designs that are impractical, practitioners just don't understand it. Unfortunately, what my website, books, articles and seminars demonstrate with evidence is that 80-90% of the material the industry generates on database management is non-sense. That is the source of your knowledge and it shows.

Comments on republication

In fairness, my reply about correctness was a bit of a gloss. Correctness is comprised of

  • Semantic consistency guaranteed (a) by users via database design and constraint declaration and (b) by the DBMS via constraint enforcement; and,
  • Logical validity guaranteed by the RDBMS via the relational algebra/calculus.

Semantic constraints are formalizations of business rules. They are declared by the DBA to the DBMS, which enforces them.

Performance is determined exclusively at the physical implementation level. If performance is not satisfactory with fully normalized databases, it is due to various implementation factors that inhibit it, such as poor support of physical independence. For example, if a SQL DBMS stores each logical table in a physical file, denormalization from two tables to one forces one file instead of two and it is the physical change that affects performance, while creating the misperception of "denormalization for performance".
The redundancy produced by redundancy requires additional constraints to protect integrity, which, if added, would cancel out the performance gains if any, but they are never added, which reinforces the misperception -- in effect integrity is traded for performance.

Note: We ignore, for the purpose of the example, that SQL DBMSs are not relational and SQL tables are not relations.


(Cont'd in Part 2)


Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.





No comments:

Post a Comment

View My Stats