From: PL
To: Editor
I have followed both yours and Chris' writings for many years
now. In fact I once arranged a lecture by Chris in Denmark, and most successful
it was too.
But to the point, sadly, your observations of the state of
the industry continue to be correct, and in my experience (20+ years now)
continue to decline. At my most recent
client engagement, I spent almost two years arguing for the removal of code
necessary to check/maintain integrity from applications for replacement by
declarative integrity. My task was
impeded at every turn by the same illogical and provably inaccurate statements
regarding performance and its relationship to data normalization and integrity
definition that you have often written about.
Interestingly, many of the individuals so opposed to my
proposals were those who should have known better, and despite being able to
empirically demonstrate that their arguments were fallacious and logically
invalid, my success was severely limited.
It was rather like being back in 1980 when almost no one knew any
better. Sadly, this was 2002, and the
rise of the "I can program anything in C" brigade has meant that
theory and knowledge has given way to the loudest voices with the cheapest
hourly rate - despite the negative cost outcomes resulting therefrom. In the battle between 20 "C"
programmers with zero DB knowledge vs. 1 DBA with very good theoretical and
practical knowledge, the winner is a foregone conclusion. Remarkable but true!
This was all rather sad as the database in question Oracle
Rdb is probably the database technically best able to implement a completely
normalized model with complete integrity declarations. It has excellent SQL conformance, an
outstanding optimizer and very, very sophisticated physical options that make
it ideal for doing it right! (It also
does NULLs properly with all the good and bad that that entails - but one is
not forced to use them).
Which leads me to observe that even you, Fabian, on occasion
make general comments regarding physical relational DBMS implementations as if
all databases are implemented that way.
This you rightly criticize in others, so I point it out for your own
internal consistency. For example, "R-tables
are supposed to be sets, and sets have no order. Physical records – be they
representing single base rows, partitioned base rows or joined base rows – have
an order that a DBMS must preserve and rely on, and this limits optimization
(sic). In other words, it is the order of records and fields in files, not the
1:1 relationship between logical rows and physical records and/or tables and
files per se, that's the crux here, which is what I should have written in the
article"
I should point out that as far as I am aware, Oracle Rdb does
not have row order, either physically or logically. If you want order, it must be specified in the query. It is
common for output order to occur if the optimizer chosen access method is via a
sorted index or projection is part of the query, but that is a by-product of
the physical retrieval path chosen. The
internal storage techniques are very sophisticated and there is no reliance
upon physical sequence of rows in a table.
There is of course a reliance on the physical order of columns within
the table, so that structure can correctly be applied to the raw bits when
realizing a query or validating a constraint, but this is almost certainly a
forgivable fact of implementation.
Actually, Oracle Rdb (latest version 7.1) is a very
sophisticated database engine with a remarkable array of physical deployment
possibilities that are quite independent of the logical data description:
co-incident row storage, vertical & horizontal partitioning and much more. On the logical side, the concept of a domain
has existed since 1984, as have constraints (assertions) and the killer IMHO,
transactional DDL with meta-data versioning (until you have used a database
with this feature, it is hard to understand how truly important this is).
As a demonstration for the programming hordes, I once
demonstrated using Oracle Rdb, that complete database integrity (Primary and
foreign keys, domains, cascading update/delete/null, business integrity rules
etc) could be described and implemented independent of physical structure - no
indexes. They were aghast, as they were
laboring under misapprehensions such that a primary key and a unique index were
the same thing (too much MySQL, Access, SQL Server in their pasts I
suspect). Performance was a bit sluggish
:-), but it was logically sound and the DBMS engine performed all operations
correctly.
Might I humbly suggest that both you and Chris could improve
your knowledge of "current products" by closely investigating this
product? I am sure that the engineers
at Oracle Rdb engineering would be glad to provide you with the necessary
software/hardware environment, documentation etc for this purpose.
So much for that, keep up the good work, though my experience
suggests that you are fighting a battle that has been lost. I think the reasons for this have nothing to
do with your position, but with the elevation of poor thinking to
"industry standard" and the general "dumbing down" of the
population. I am not sure to whom I
should attribute blame for this sad state of affairs, but we could start with
the universities, followed by the industry trade press and then vendors.
To: PL
From: Fabian Pascal
Good for you.
You are not telling me anything I don't know. It is rather
depressing, but systemic problems cannot be resolved at the individual level.
My efforts are not likely to change the industry, but they make it easier for
those few independent, critical thinkers to cope with reality, by knowing that
they are not wrong and alone. And to ridicule "pillars of industry"
who are ignorant at best and/or dishonest at worst. That's all.
As I often say, the system not only does not reward good
thinkers and knowledge, it actually punishes them; so what you describe is
pretty predictable and would be, indeed, shocking, if reality were not what you
describe.
By the way, this is not just an IT problem, but also a more
profound societal problem: independent critical thinking is dangerous, which is
why those who own the country make sure the system stifles it. They could not
have gotten away, for example, with the last "election", if people
could think independently and critically. The situation used to be better in
Europe, but with the fall of the soviet system, there is ample Americanization
everywhere, and it's only bound to get worse.
Well, a DBMS can either be SQL or relational, but not both.
Rdb may have had advantages, but it was not the real R solution. SQL and the
current implementation paradigm won't do. There's no way to "do NULLs
properly" -- they violate 2VL on which R is based [see Chapter 10 in PRACTICAL ISSUES IN
DATABASE MANAGEMENT]. Rdb may have had less implementation problems
than other SQL systems, but not the logically correct solution described in my
book.
You are confusing user-level order and system-level order. I
was referring to the latter. Even though SQL DBMSs do not expose order to
users, they are limited by their reliance on the physical order of columns
internally, which inhibits optimization. The point I was making is that if DBMS
were implemented in the true R spirit, they would be really set-oriented, which
would lead to better physical data independence, which would, in turn, allow
for better optimization and performance. This is exactly what the new
implementation technology I have been alluding to does.
Perhaps, but [Rdb's way] it's not the ultimate way, only
better than current products.
One of the most common fallacies in the industry, induced by
lack of education and products that instill wrong practices.
I am sure Chris knows about Rdb. And in fact, I included some
info on it in an old article on quota queries.
You are, of course, preaching to the choir. See above on the
objective. I'd rather be doing this than do what everybody else is doing
"more effectively", although I could do with more support.
From: PL
And some of the writers I have seen debunked by you are just
that - ignorant (and some of them very arrogant as well).
Did you mean SQL or relational?
Well, I did not say is was R, just that it was " the
database technically best able to implement a completely normalized model with
complete integrity declarations" and that it had "excellent SQL
conformance". This makes it an SQL
DBMS.
Declarative integrity definition is a defining feature of R
databases and in this area Oracle Rdb
offers very good support.
As for NULLs, by "properly" I meant by SQL standard
definition, something not true for many other databases that attempt to
implement Nulls. I also noted that
"one is not forced to use them".
Oracle Rdb is an SQL database. Since I have to make a living (I am unemployed now btw), I am
happy enough to work with the less than ideal, especially since the ideal does
not exists, the "ideal" being as it is, incompletely defined.
NULLs were for a long time part R. I remember hearing Chris say once, that "Nulls created more
problems than they were worth" and that the "R model would be better
off without them because of all the extra complexity that impose on the query
language" which at that time still included languages other than SQL. What he did not say at the time was
"that Null was logically incompatible with R".
As they say in Denmark "One has a position, until one
takes a new".
I do not think I am confused, unless we are confusing order
(the opposite of chaos) and order (sequence).
Oracle Rdb has internal order in the sense that it is not chaotic and
all elements can be addressed, independently of each other. However, there is no order (sequence) in the
manner that table rows are stored.
Further, the internal addresses can and do change (for a variety of
reasons) independently and/or because of the data values of the rows. I do not wish to go into an internals
discussion, but there is no sequence of rows involved for storage. For example, there is no requirement to be
at row (n) in order to find the location of row (n+1), which seems to me to be
what you are implying (correct me if I am wrong). Actually, it appears to me that there is a very high level of
physical data independence in Oracle Rdb.
Perhaps you might explain to me what Oracle Rdb might lack in this area,
as this is really one of the strong points of the product.
Agreed [on lack of education].
Is your [article on quota queries] online?
From: Fabian Pascal
It's not entirely the [writers'] problem, really; it's the
system. If the top people in the field are ignorant and arrogant, it cannot be
just their fault. We are talking US culture here, much more profound problems
than IT.
I meant SQL.
It is possible to implement a normalized database in any
system, including SQL. How it performs, that's a completely different issue.
You mean DBMSs [not databases]. No, it is a defining feature
of DBMSs; if there is no integrity, it's not a DBMS; and if it's not
declarative, it's more difficult and expensive to handle and ensure
correctness.
The problem with NULLs is fundamental. SQL simply piles
implementation problems on top of that. And it does so precisely because of the
fundamental problem, which will unavoidably result in implementation errors.
Oracle Rdb is an SQL database. Since I have to make a living (I am unemployed now btw), I am
happy enough to work with the less than ideal, especially since the ideal does
not exists, the "ideal" being as it is, incompletely defined.
That's a completely separate issue.
Nulls were for a long time part R. I remember hearing Chris say once, that "Nulls created more
problems than they were worth" and that the "R model would be better
off without them because of all the extra complexity that impose on the query
language" which at that time still included languages other than SQL. What he did not say at the time was
"that Null was logically incompatible with R".
You are right, but unlike others, Chris does adjust his
thinking when he realizes it's necessary. He has even revised his position on
the primacy of PKs. The fundamental mistake was actually Codd's -- probably the
only major one he made -- but can be forgiven given his contribution. BTW, he
still thinks I-marks and A-marks are OK, but even so he does not mean SQL
NULLs.
As they say in Denmark "One has a position, until one
takes a new".
As far as I know that has always been the basis for science;
or at least, it ought to be. The GROUNDING of positions also matters.
Then you were confusing in that sense, because sequence is
the only meaning pertinent here. And that is what I meant.
Unlikely [to be online]. It's in my book.
Posted
05/01/03
[ABOUT]
[QUOTES]
[LINKS]