Sunday, April 7, 2013

More on Relational Denial

Note: What follows are my comments on a LinkedIn exchange, So What is a 'Large Database'? Minor edits of the online comments for grammatical, clarity, precision and coherence purposes are within square brackets.
PS: No doubt Oracle/SQL Server/etc are designed and optimized to deal with normalized data. That's where the power lies. They're like Sirens though ... those who don't respect them with proper designs are destined to have performance crashes (bear with me on this metaphor will ya? :)

This is difficult to interpret:
  • Power lies not in normalization, but in predicate logic and set mathematics, closure, operation nestability and optimizability and data independence.
  • What is "proper design": 'normalized' as in 1NF, or "fully normalized" as in 5NF? Which of the two leads to "performance crashes" if it is not respected?
FP: Actually that is a misconception. Many believe that all the benefits from the relational model are in database design, namely ["fully"] normalized [5NF] databases. But without a true RDBMS--which SQL DBMSs are not--[to exploit the design] the relational advantage is drastically curtailed.
In other words, SQL DBMSs are not relational and, therefore, cannot take full advantage of relational databases, be they in 1NF (the minimal relational requirement), or in 5NF (to minimize integrity risks).
PS: I disagree with your assertion that it's a misconception. When the rubber hits the road, it's about logical I/O's - whether it's backed by a physical I/O or not. Normalization takes you down this path .. whether you realize it or not. On the CPU side, performant code is set-based rather than procedural. Certainly there are other areas one could continue to delve into but for your biggest bang for the buck, the above is where you'll see the most benefits. Edit: I have many years of /solving/ database performance issues, I've also worked on customer benchmarks and assisted on TPC-C's.
I am still not clear on what PS means by "Normalization takes you down this path". Suppose you have a choice between two design options:

1. One 1NF R-table:


that represents assignment data of employees to departments.

2. Two 5NF R-tables that represent department and employee data separately and assignments by a referential relationship:


The first design is normalized and the second is "fully normalized". Which of the two does PS mean by 'normalized' and which one takes you down what path?

More importantly, how do you know the performance effect from just the number of logical R-tables? Don't you need additional information to determine that (if so, what information?) Of course, we mean realistic R-tables, not this simple example, but it's the principle involved that counts.
FP: It is your prerogative to disagree, but that does not invalidate my argument. I am not questioning your experience, but your foundation knowledge, which is separate [from and] cannot be acquired by sheer experience. In fact, it's experience [that often] inhibits acquiring foundation knowledge. A driver who does not know the traffic laws, or how to drive, won't benefit much from how the rubber hits the road.
EM: What exactly did PS write that caused you to think he lacks specific foundational knowledge? I would also be curious to know what you think the "true RDBMS" products on the marketplace are?
In my comment I expressed discomfort with, first, the notion that SQL systems can be considered "optimized for normalized databases"; and second, with the performance focus on (logical) levels of normalization, rather than (physical) implementation factors. If these are not PS's misconceptions, they certainly are reinforcing the ones common in the industry. As explained in depth in The Costly Illusion: Normalization, Integrity and Performance, if you denormalize a 5NF SQL database and obtain performance gains--which is hardly guaranteed, certainly not for any and all applications--the gains do not come from denormalization per se, but from trading off integrity for them.
FP: Not many [are TRDBMSs]. You can find out by checking the list on my home page. The problem is precisely that the lack of foundation knowledge causes too many professionals to confuse SQL [and its commercial implementations] with the relational model and to blame the latter [including normalization] for poor performance and various other deficiencies. That would require to define here what foundation knowledge is, which is not possible to do justice in an online exchange. That is what I do in my courses and writings and it's easy to check on my site.
It is indeed, sad, that four decades after the publication of the relational model data professionals still cannot assess whether and which products are truly relational and are inclined to believe that I have my own personal criteria for such, which is what theory eliminated.

Moreover, [primary, if not exclusive] focus on physical aspects, without conceptual and logical considerations, is an indicator of poor foundation knowledge.
EM: I question how someone can assert from a few lines that another person writes how they lack foundational knowledge on the relational model? Especially when his initial comments had nothing whatsoever to do with the relational model. [You are] unable to explain how he is able to assert how someone lacks knowledge on the "relational model" when there is nothing in the discourse that has anything to do with the relational model.

1 or 2 simple questions: Do you believe all problems are ultimately relational in nature and can be solved by following the relational model? If yes to above, do you believe that a structured database can also be used to solve problems that a non-relational in nature? A simple yes or no will suffice. It will also help me to understand why you think you have actually answered my question regarding Pablo's and others "lack" of foundational knowledge regarding the relational model.
I leave it to the reader to judge whether my online response was adequate. Here I choose to amplify on it:
  • Indeed, sadly, very few can detect poor foundation knowledge, precisely because very few possess it. Having spent decades on the subject, I would like to hope I am one of those few. A pronouncement can be inconsistent with foundation knowledge regardless of its length.
  • The relational model is not all there is to foundation knowledge.
  • It would be nice if everything could be reduced to yes/no (it is precisely because more often than not reality is not that simple that the legal system forces this oversimplification if cases are to be won).
  • Indeed, "[A]ll problems are ultimately relational in nature and can be solved by following the relational model" in itself is a hint of poor grasp of what a data model in general and the relational data model in particular are. There is no such thing as "relational problems", or "relational data". Any segment of reality can be represented relationally. To the extent that the informational/analytical objective is to make logical inferences--i.e. derive facts that are logical implications of facts recorded in a database--then there is no alternative superior to the relational model, because it is nothing but logic applied to database management (see Business Modeling for Database Design for a set of criteria  for assessing whether a database technology's underlying data model--if at all well-defined--is superior to the relational model).
EM: You sir have been the only one talking about mistakes throughout this thread ... engage others to learn and grow. It is not yet clear to me what your motives are. If you expect to influence others I have found that generally it is a good idea to start by being civil and respectful towards them irrespective of whether you perceive them to be right or wrong. I suspect the next time we are discussing something that has nothing to do with the relational model and you try to steer us off course by engaging us in the sins of our lack of foundational knowledge on the relational model we will know that it is best to just ignore you and carry on the conversation "at our own peril".
FP: Comments about normalization are certainly within the relational framework and confusing levels of representation is certainly a mistake ... and I thought that one learns from mistakes if they are pointed out.
We relational proponents are criticized for being "purists". But precision and consistency of definitions and terminology are crucial in a field predicated on logic. For an example of what happens in their absence see Different types of DBMSs.

No comments:

Post a Comment

View My Stats