ON THE STATE OF THE INDUSTRY AND THE RELATIONAL SPIRIT
with Fabian Pascal

 

 

From: PL

To: Editor

 

I have followed both yours and Chris' writings for many years now. In fact I once arranged a lecture by Chris in Denmark, and most successful it was too.

 

But to the point, sadly, your observations of the state of the industry continue to be correct, and in my experience (20+ years now) continue to decline.  At my most recent client engagement, I spent almost two years arguing for the removal of code necessary to check/maintain integrity from applications for replacement by declarative integrity.  My task was impeded at every turn by the same illogical and provably inaccurate statements regarding performance and its relationship to data normalization and integrity definition that you have often written about.

 

Interestingly, many of the individuals so opposed to my proposals were those who should have known better, and despite being able to empirically demonstrate that their arguments were fallacious and logically invalid, my success was severely limited.  It was rather like being back in 1980 when almost no one knew any better.  Sadly, this was 2002, and the rise of the "I can program anything in C" brigade has meant that theory and knowledge has given way to the loudest voices with the cheapest hourly rate - despite the negative cost outcomes resulting therefrom.  In the battle between 20 "C" programmers with zero DB knowledge vs. 1 DBA with very good theoretical and practical knowledge, the winner is a foregone conclusion. Remarkable but true!

 

This was all rather sad as the database in question Oracle Rdb is probably the database technically best able to implement a completely normalized model with complete integrity declarations.  It has excellent SQL conformance, an outstanding optimizer and very, very sophisticated physical options that make it ideal for doing it right!  (It also does NULLs properly with all the good and bad that that entails - but one is not forced to use them).

 

Which leads me to observe that even you, Fabian, on occasion make general comments regarding physical relational DBMS implementations as if all databases are implemented that way.  This you rightly criticize in others, so I point it out for your own internal consistency.  For example, "R-tables are supposed to be sets, and sets have no order. Physical records – be they representing single base rows, partitioned base rows or joined base rows – have an order that a DBMS must preserve and rely on, and this limits optimization (sic). In other words, it is the order of records and fields in files, not the 1:1 relationship between logical rows and physical records and/or tables and files per se, that's the crux here, which is what I should have written in the article"

 

I should point out that as far as I am aware, Oracle Rdb does not have row order, either physically or logically.  If you want order, it must be specified in the query. It is common for output order to occur if the optimizer chosen access method is via a sorted index or projection is part of the query, but that is a by-product of the physical retrieval path chosen.  The internal storage techniques are very sophisticated and there is no reliance upon physical sequence of rows in a table.  There is of course a reliance on the physical order of columns within the table, so that structure can correctly be applied to the raw bits when realizing a query or validating a constraint, but this is almost certainly a forgivable fact of implementation.

 

Actually, Oracle Rdb (latest version 7.1) is a very sophisticated database engine with a remarkable array of physical deployment possibilities that are quite independent of the logical data description: co-incident row storage, vertical & horizontal partitioning and much more.  On the logical side, the concept of a domain has existed since 1984, as have constraints (assertions) and the killer IMHO, transactional DDL with meta-data versioning (until you have used a database with this feature, it is hard to understand how truly important this is).

 

As a demonstration for the programming hordes, I once demonstrated using Oracle Rdb, that complete database integrity (Primary and foreign keys, domains, cascading update/delete/null, business integrity rules etc) could be described and implemented independent of physical structure - no indexes.  They were aghast, as they were laboring under misapprehensions such that a primary key and a unique index were the same thing (too much MySQL, Access, SQL Server in their pasts I suspect).  Performance was a bit sluggish :-), but it was logically sound and the DBMS engine performed all operations correctly.

 

Might I humbly suggest that both you and Chris could improve your knowledge of "current products" by closely investigating this product?  I am sure that the engineers at Oracle Rdb engineering would be glad to provide you with the necessary software/hardware environment, documentation etc for this purpose.

 

So much for that, keep up the good work, though my experience suggests that you are fighting a battle that has been lost.  I think the reasons for this have nothing to do with your position, but with the elevation of poor thinking to "industry standard" and the general "dumbing down" of the population.  I am not sure to whom I should attribute blame for this sad state of affairs, but we could start with the universities, followed by the industry trade press and then vendors.

 

 

To: PL

From: Fabian Pascal

 

Good for you.

 

You are not telling me anything I don't know. It is rather depressing, but systemic problems cannot be resolved at the individual level. My efforts are not likely to change the industry, but they make it easier for those few independent, critical thinkers to cope with reality, by knowing that they are not wrong and alone. And to ridicule "pillars of industry" who are ignorant at best and/or dishonest at worst. That's all.

 

As I often say, the system not only does not reward good thinkers and knowledge, it actually punishes them; so what you describe is pretty predictable and would be, indeed, shocking, if reality were not what you describe.

 

By the way, this is not just an IT problem, but also a more profound societal problem: independent critical thinking is dangerous, which is why those who own the country make sure the system stifles it. They could not have gotten away, for example, with the last "election", if people could think independently and critically. The situation used to be better in Europe, but with the fall of the soviet system, there is ample Americanization everywhere, and it's only bound to get worse.

 

Well, a DBMS can either be SQL or relational, but not both. Rdb may have had advantages, but it was not the real R solution. SQL and the current implementation paradigm won't do. There's no way to "do NULLs properly" -- they violate 2VL on which R is based [see Chapter 10 in PRACTICAL ISSUES IN DATABASE MANAGEMENT]. Rdb may have had less implementation problems than other SQL systems, but not the logically correct solution described in my book.

 

You are confusing user-level order and system-level order. I was referring to the latter. Even though SQL DBMSs do not expose order to users, they are limited by their reliance on the physical order of columns internally, which inhibits optimization. The point I was making is that if DBMS were implemented in the true R spirit, they would be really set-oriented, which would lead to better physical data independence, which would, in turn, allow for better optimization and performance. This is exactly what the new implementation technology I have been alluding to does.

 

Perhaps, but [Rdb's way] it's not the ultimate way, only better than current products.

 

One of the most common fallacies in the industry, induced by lack of education and products that instill wrong practices.

 

I am sure Chris knows about Rdb. And in fact, I included some info on it in an old article on quota queries.

 

You are, of course, preaching to the choir. See above on the objective. I'd rather be doing this than do what everybody else is doing "more effectively", although I could do with more support.

 

 

From: PL

 

And some of the writers I have seen debunked by you are just that - ignorant (and some of them very arrogant as well).

 

Did you mean SQL or relational?

 

Well, I did not say is was R, just that it was " the database technically best able to implement a completely normalized model with complete integrity declarations" and that it had "excellent SQL conformance".  This makes it an SQL DBMS.

 

Declarative integrity definition is a defining feature of R databases and in this area  Oracle Rdb offers very good support.

 

As for NULLs, by "properly" I meant by SQL standard definition, something not true for many other databases that attempt to implement Nulls.  I also noted that "one is not forced to use them".

 

Oracle Rdb is an SQL database.  Since I have to make a living (I am unemployed now btw), I am happy enough to work with the less than ideal, especially since the ideal does not exists, the "ideal" being as it is, incompletely defined.

 

NULLs were for a long time part R.  I remember hearing Chris say once, that "Nulls created more problems than they were worth" and that the "R model would be better off without them because of all the extra complexity that impose on the query language" which at that time still included languages other than SQL.  What he did not say at the time was "that Null was logically incompatible with R".

 

As they say in Denmark "One has a position, until one takes a new".

 

I do not think I am confused, unless we are confusing order (the opposite of chaos) and order (sequence).  Oracle Rdb has internal order in the sense that it is not chaotic and all elements can be addressed, independently of each other.  However, there is no order (sequence) in the manner that table rows are stored.  Further, the internal addresses can and do change (for a variety of reasons) independently and/or because of the data values of the rows.  I do not wish to go into an internals discussion, but there is no sequence of rows involved for storage.  For example, there is no requirement to be at row (n) in order to find the location of row (n+1), which seems to me to be what you are implying (correct me if I am wrong).  Actually, it appears to me that there is a very high level of physical data independence in Oracle Rdb.  Perhaps you might explain to me what Oracle Rdb might lack in this area, as this is really one of the strong points of the product.

 

Agreed [on lack of education].

 

Is your [article on quota queries] online?

 

 

From: Fabian Pascal

 

It's not entirely the [writers'] problem, really; it's the system. If the top people in the field are ignorant and arrogant, it cannot be just their fault. We are talking US culture here, much more profound problems than IT.

 

I meant SQL.

 

It is possible to implement a normalized database in any system, including SQL. How it performs, that's a completely different issue.

 

You mean DBMSs [not databases]. No, it is a defining feature of DBMSs; if there is no integrity, it's not a DBMS; and if it's not declarative, it's more difficult and expensive to handle and ensure correctness.

 

The problem with NULLs is fundamental. SQL simply piles implementation problems on top of that. And it does so precisely because of the fundamental problem, which will unavoidably result in implementation errors.

 

Oracle Rdb is an SQL database.  Since I have to make a living (I am unemployed now btw), I am happy enough to work with the less than ideal, especially since the ideal does not exists, the "ideal" being as it is, incompletely defined.

 

That's a completely separate issue.

 

Nulls were for a long time part R.  I remember hearing Chris say once, that "Nulls created more problems than they were worth" and that the "R model would be better off without them because of all the extra complexity that impose on the query language" which at that time still included languages other than SQL.  What he did not say at the time was "that Null was logically incompatible with R".

 

You are right, but unlike others, Chris does adjust his thinking when he realizes it's necessary. He has even revised his position on the primacy of PKs. The fundamental mistake was actually Codd's -- probably the only major one he made -- but can be forgiven given his contribution. BTW, he still thinks I-marks and A-marks are OK, but even so he does not mean SQL NULLs.

 

As they say in Denmark "One has a position, until one takes a new".

 

As far as I know that has always been the basis for science; or at least, it ought to be. The GROUNDING of positions also matters.

 

Then you were confusing in that sense, because sequence is the only meaning pertinent here. And that is what I meant.

 

Unlikely [to be online]. It's in my book.

 

Posted 05/01/03

 

 

 

[ABOUT] [QUOTES] [LINKS]