From: AD
To: Editor
First off, I want to congratulate all of you at dbdebunk.com for providing an
invaluable service. (Valuable at least to those who can appreciate it!) I
myself, having come from an application programming background, was a little
rusty on the true power of relational theory, especially given that the
existing implementations (i.e., SQL DBMSs) leave so much to be desired. In
fact, connecting databases to set theory was an "Aha!" moment that should have
occurred to me years ago. Now, after several months of consuming both the
website content and several books (while simultaneously developing a
data-intensive application at work), I'm feeling much better about my own
understanding of relational theory.
However, there are some points on which I find myself in disagreement. I'm not
convinced that the distinction between variables and values is as pronounced
as has been claimed. For example, the statement:
x := 2
is logically equivalent to a statement like:
UPDATE storage
SET value = 2
WHERE location = x
I.e., variable update is synonymous with tuple update. The fact that locations
may happen to map to memory addresses is purely a physical implementation
detail, and has nothing to do with the logical meaning. In fact, variable
assignment means exactly this: remove one association from the system and add
another. And the association itself is a value. Perhaps the only thing that
could uniquely be considered a variable is the whole system.
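To make the claimed equivalence concrete, here is a minimal Python sketch using the standard sqlite3 module. The storage table and its location and value columns come from the UPDATE statement above; the helper names assign and read are hypothetical, and modeling assignment as a DELETE followed by an INSERT is only one reading of "remove one association and add another."

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for "the whole system".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE storage (location TEXT PRIMARY KEY, value INTEGER)")


def assign(conn, location, value):
    """Model `location := value` as: remove one association, add another."""
    conn.execute("DELETE FROM storage WHERE location = ?", (location,))
    conn.execute(
        "INSERT INTO storage (location, value) VALUES (?, ?)",
        (location, value),
    )


def read(conn, location):
    """Return the value currently associated with a location, or None."""
    row = conn.execute(
        "SELECT value FROM storage WHERE location = ?", (location,)
    ).fetchone()
    return None if row is None else row[0]


assign(conn, "x", 1)
assign(conn, "x", 2)  # plays the role of: x := 2
```

After the second call, storage holds a single (location, value) tuple for x. Note that the sketch itself still relies on one updatable thing, the table, which is consistent with the remark that the whole system may be the only true variable.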
So if we extend this to more complex objects, we can map the following:
anEmployee.setDepartment(aDepartment)
to
UPDATE employee
SET department_id = aDepartment
WHERE employee_id = anEmployee
Claims to the contrary
notwithstanding, it is clear that the simplest, most direct, and most complete
mapping between Object and Relational systems is:
· class = relvar
· instance = tuple
· field = attribute
· object id = primary key
· pointer = foreign key
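The proposed mapping can be illustrated with a short Python sketch over sqlite3. The employee table and the employee_id and department_id columns come from the UPDATE statement above; the department table, the name columns, the sample rows, and both helper functions are assumptions added purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce references
conn.executescript(
    """
    -- class = relvar: one table per class
    CREATE TABLE department (
        department_id INTEGER PRIMARY KEY,   -- object id = primary key
        name          TEXT NOT NULL          -- field = attribute
    );
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        -- pointer = foreign key
        department_id INTEGER REFERENCES department(department_id)
    );
    -- instance = tuple (hypothetical sample data)
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO employee VALUES (1, 'Ann', 10);
    """
)


def set_department(conn, an_employee, a_department):
    """Relational counterpart of anEmployee.setDepartment(aDepartment)."""
    conn.execute(
        "UPDATE employee SET department_id = ? WHERE employee_id = ?",
        (a_department, an_employee),
    )


def department_name_of(conn, an_employee):
    """Follow the 'pointer' with a join rather than by chasing a reference."""
    row = conn.execute(
        """
        SELECT d.name
        FROM employee AS e
        JOIN department AS d ON d.department_id = e.department_id
        WHERE e.employee_id = ?
        """,
        (an_employee,),
    ).fetchone()
    return None if row is None else row[0]


set_department(conn, 1, 20)
```

The join in department_name_of is the kind of query mechanism that, on this view, a pure object system lacks.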
To repeat: the fact that locations may happen to map to memory addresses is
purely a physical implementation detail, and has nothing to do with the
logical meaning. The biggest failure of object systems is, in fact, the utter
lack of a query mechanism, and this is the source of all the pointer chasing.
(Please note: nothing written here should be misconstrued as an argument for
Objects instead of Relational, or any similar nonsense. I am a firm proponent
of relational systems. My point is that the gulf is not nearly so wide as it
is sometimes portrayed.)
Further, I would argue that introducing domains to relational theory is, in
fact, a serious misstep. The relational model needs no extension. (Or
perversion!) The clearest indications of this are:
· any complex entity can be represented EITHER by a relation OR by a domain
  (duplication of functionality)
· any read-only operator (i.e., a function) can be replaced by an equivalent
  relation and a join (duplication of functionality)
· subtyping by specialization can easily be defined via views
Note that some relations may be virtual and predefined (relconsts). I.e.,
integer(x) could be a predefined relation. So could greater_than(x, y), which
could also include the built-in constraint greater_than(x, y) -> integer(x)
AND integer(y) (or not, if a polymorphic relation were desired).
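As a rough illustration of replacing an operator with a relation and a join, the Python sketch below represents integer and greater_than as sets of tuples. A real relconst would be virtual and unbounded; the small finite DOMAIN, the sample relation pairs, and the variable names are all assumptions made so the example can actually run.

```python
from itertools import product

# Assumption: a tiny finite stand-in for a virtual, predefined relation.
DOMAIN = range(5)

# Predefined relations ("relconsts") written out as sets of tuples.
integer = {(x,) for x in DOMAIN}
greater_than = {(x, y) for x, y in product(DOMAIN, repeat=2) if x > y}

# Built-in constraint: greater_than(x, y) -> integer(x) AND integer(y)
constraint_holds = all(
    (x,) in integer and (y,) in integer for (x, y) in greater_than
)

# A hypothetical user relation with heading (a, b).
pairs = {(3, 1), (1, 3), (2, 2), (4, 0)}

# "Restrict pairs to a > b" expressed as a join with greater_than on
# a = x and b = y, instead of calling a comparison operator per tuple.
selected = {
    (a, b)
    for (a, b) in pairs
    for (x, y) in greater_than
    if a == x and b == y
}
```

Here selected contains exactly the tuples of pairs that also appear in greater_than. The sketch still uses Python's own > to populate the stand-in relation, which is one place where the question of what the system must provide as primitive resurfaces.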
So, there are (at most) only two domains that are required: ID and STRING.
(The second one is only really necessary for a text-based interactive user
session; in application environments, the system can map other types to and
from STRING externally.) And there is one constraint needed: primary keys on
type tables are unique across all type tables. (This is how strong typing is
achieved; it could be relaxed slightly to allow inheritance.)
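One possible reading of this constraint, sketched in Python: each type table is a set of ID values, and the sets must be pairwise disjoint so that every ID belongs to exactly one type. The table names, the sample IDs, and both helper functions are hypothetical; the letter does not specify how the constraint would be declared or enforced.

```python
from itertools import combinations

# Assumption: each "type table" is modeled as the set of its primary-key IDs.
type_tables = {
    "integer": {101, 102},
    "employee": {201, 202},
    "department": {301},
}


def ids_are_disjoint(tables):
    """True if no ID appears as a primary key in more than one type table."""
    return all(a.isdisjoint(b) for a, b in combinations(tables.values(), 2))


def type_of(tables, an_id):
    """Return the single type table that owns an ID, or None otherwise."""
    owners = [name for name, ids in tables.items() if an_id in ids]
    return owners[0] if len(owners) == 1 else None


# A violating state: ID 201 would be both an employee and a manager.
violating = dict(type_tables, manager={201})
```

With disjoint tables, type_of always yields at most one answer, which is the sense in which the constraint provides strong typing. Relaxing it for inheritance would mean permitting certain overlaps, such as manager IDs also appearing in employee.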
I think adopting a
pure relational model is so much cleaner than rehashing all the same issues
with domains. (In fact, I couldn't even
get all the way through THE THIRD MANIFESTO, because I so vehemently
disagree with the initial premises.)
I would very much appreciate feedback on these points. I think the database
industry is very much in flux right now, and it is imperative that energies be
focused in the right places to effect the right kind of change.
Chris Date Responds: I found this message something of a curate's egg: right
on several counts, and yet very wrong on several counts too. I don't think I
want to get into a blow-by-blow response here; instead let me just offer the
following comments.
Overall, I think Dommasch is thinking "too close to the metal." It's true,
close to the metal, that we have just storage. It's also true, close to the
metal, that whether we treat a particular piece of storage as containing the
same thing at all times ("constants") or different things at different times
("variables") is up to us (how we manage that storage). And so on. (See the
article "Why 'the Object Model' Is Not a Data Model" in my book RELATIONAL
DATABASE WRITINGS 1994-1997.)
But: It seems to me that Dommasch's whole argument is like the argument that
says all we need is bits: We can do everything we want in terms of bits.
Ultimately, of course, this argument is correct, as is demonstrated by the
very fact that digital computers exist at all. But what we do in practice is
group useful bundles of bit-based concepts to form higher-level abstractions
(e.g., character strings, decimal numbers). Then we group those again to form
still higher-level abstractions (e.g., records, files). Then we group those
again to form still higher-level abstractions (e.g., directories, databases).
Then ... but you get the picture. Thus, I contend that:
· Types, values, variables, and operators are useful high-level abstractions.
  We learned this with Fortran and it's still true today, nearly 50 years
  later.
· In particular, relation types, relation values, relation variables, and
  relational operators are very useful abstractions. We learned this with the
  relational model and it's still true today, over 30 years later. [Ed.
  Comment: Weeellll, most have not learned this, even though it's still true
  today].
High-level abstractions (at least, good ones) are useful because they raise
the level of discourse, they allow us to focus on problems more closely
related to the real problems of the real world without getting bogged down in
irrelevant details, and (just incidentally) they are probably easier to
implement efficiently. The entire history of the computing field could be
characterized as a search for ever higher, ever more useful abstractions.
PS: One assertion of Dommasch's that I disagree with quite
strongly is that relations and domains represent "duplication of
functionality." See Appendix C of FOUNDATION FOR
FUTURE DATABASE SYSTEMS: THE THIRD MANIFESTO, where this nonissue is discussed
in some detail.
Posted
10/04/02