From: RM
To: Editor
Date: 16 Mar 2004
Research papers abound from the Semantic, Topic Map and RDF
camps claiming that binary relational database design is a superset of n-ary
design, or even the true relational model.
The only reference to binary relations that I could find on
your site was a passing mention by C.J. Date. He noted that Codd showed
that n-ary and even 0-ary relations have unique and important properties, which
would in fact make n-ary the superset.
Could this be a good topic for a Foundation Paper?
I, and maybe others, would benefit greatly
from a short description and example of what 0-ary, unary, binary, etc. mean
and what each looks like as implemented in a TRDBMS schema.
More important, however, would be an explanation of the
(hopefully overarching) principle of relational algebra and predicate logic
establishing that the root of this discipline is in the "ary" and not
in the "n" or the "bi".
That is to say, relations are what is fundamental, not the number
of attributes involved.
I am sure that you can understand my desire to know exactly
where ground zero lies and to be sure that I am standing firmly on it.
From: Fabian Pascal
To: RM
I think you've got the gist of it yourself. The issue is not
which is the superset, but whether binary is the most convenient/useful
representation. Codd concluded that it is not.
Anyway, I would not consider the sources you mention worthy
of attention. They don't even know what data fundamentals are, let
alone what a superset is.
I will forward your message to Chris and let him decide
whether a paper is warranted.
From: RM
I agree that convenience and ease of use are important.
However, I suspect that just as there is a
clear separation of physical and logical concerns, there is a clear separation of
logical data structures and useful representation; hence the wide use of tables.
That being said, I think n-ary is indeed the superset, for
obvious reasons. Here is the
punch line from a paper pitting binary against n-ary:
(s,j), (s,p), (p,j) != (s,p,j)
The details are not significant, but the argument is that the
binaries on the left carry more detail than the n-ary relation on the right.
It is true that the binary design is more granular: it has more relations, and
so more predicates.
However, this seems to be a question of schema design, not of the general
data model. I can think of cases where
the fine detail on the left fits the requirements. What I feel is being
ignored is the many, many times when an n-ary relation is the
only thing that is useful, and even right.
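The paper's details are not reproduced here, so what follows is only a minimal Python sketch with made-up data: a hypothetical supplier/part/project relation and illustrative helpers `rel`, `project` and `join` (none of them taken from the paper). Modeling a relation as a set of tuples, it shows one concrete sense in which the two sides of the punch line differ: the three binary projections of a ternary relation need not join back to that relation.

```python
from itertools import product


def rel(heading, rows):
    """Relation body as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(heading, row)) for row in rows}


def project(r, attrs):
    """Projection: keep only the named attributes; duplicates collapse because r is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def join(r1, r2):
    """Natural join: merge every pair of tuples that agree on their shared attributes."""
    out = set()
    for t1, t2 in product(r1, r2):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
            out.add(frozenset({**d1, **d2}.items()))
    return out


# Hypothetical sample: "supplier S supplies part P to project J".
spj = rel(("S", "P", "J"), [("s1", "p1", "j2"), ("s1", "p2", "j1"), ("s2", "p1", "j1")])

# The three binary projections: (S,P), (P,J), (J,S).
sp = project(spj, {"S", "P"})
pj = project(spj, {"P", "J"})
js = project(spj, {"J", "S"})

# Join the binaries back together and report tuples the ternary never asserted.
rejoined = join(join(sp, pj), js)
for t in rejoined - spj:
    d = dict(t)
    print(d["S"], d["P"], d["J"])  # prints: s1 p1 j1
```

With this sample, rejoining the binaries yields an extra tuple, (s1, p1, j1), that the ternary relation does not assert; from the binaries alone one cannot tell whether s1 really supplies p1 to j1, so the two sides are not interchangeable.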
Anyone unfortunate enough to have used Sentences'
"Associative Model of Data" will know the most basic problem with a
pure binary system. Call it the Person
problem. A binary relation pairs a person with just one value. So what does a person
look like?
Option 1) (person, "Richard R. McKinley"). One
is then forced to parse whenever first, last or middle names are needed
individually.
Option 2) (person, "Richard"), (person, last_name), (last_name, "McKinley"),
(person, middle_name), (middle_name, "R."). While this schema is
certainly more granular, it is wrong.
To borrow from object-oriented terminology, when we map to
the real world the binary design does not hold up in this case. To properly identify
an individual human being by name, we very much think and reason in terms of
person (first_name, middle_name, last_name, etc.). In a small group of friends
one may say, "John? Which? John Smith or John Jones?"
In the requirements of most applications, however, a person’s various
names are important both taken as a whole and available separately.
They relate to one predicate, if I understand the term correctly, but are
available for individual inspection and consideration.
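A minimal Python sketch of the Person problem, under stated assumptions: the ID attribute, the helper names, and the second person's name are all hypothetical, and the binary side is a keyed variant of Option 2 rather than how Sentences actually stores data. It shows the n-ary relation delivering the names both as a whole and individually, while the binary design has to be rejoined to recover a whole name.

```python
from itertools import product


def rel(heading, rows):
    """Relation body as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(heading, row)) for row in rows}


def project(r, attrs):
    """Keep only the named attributes."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def join(r1, r2):
    """Natural join: merge every pair of tuples that agree on shared attributes."""
    out = set()
    for t1, t2 in product(r1, r2):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
            out.add(frozenset({**d1, **d2}.items()))
    return out


# n-ary design: one predicate, "person ID is named FIRST MIDDLE LAST".
# The ID attribute and the second row are hypothetical sample data.
person = rel(("ID", "FIRST", "MIDDLE", "LAST"),
             [(1, "Richard", "R.", "McKinley"),
              (2, "John", "A.", "Smith")])

# Each name part is available on its own, by projection...
last_names = {dict(t)["LAST"] for t in project(person, {"LAST"})}

# ...and the whole name is available directly from each tuple.
for t in sorted(person, key=lambda t: dict(t)["ID"]):
    d = dict(t)
    print(d["FIRST"], d["MIDDLE"], d["LAST"])
# prints:
# Richard R. McKinley
# John A. Smith

# Binary design (a keyed variant of Option 2): one relation per name part.
first_name = project(person, {"ID", "FIRST"})
middle_name = project(person, {"ID", "MIDDLE"})
last_name = project(person, {"ID", "LAST"})

# Recovering the whole name from the binaries takes two joins.
rebuilt = join(join(first_name, middle_name), last_name)
```

Because ID identifies each person here, the rejoined binaries equal the n-ary relation exactly; the difference lies in which form states the proposition about a person's name directly.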
I believe this shows that n-ary matters.
There is nothing stopping one from using
binary relations for a portion of the schema in a TRDBMS. In some cases that may
be what works best. In many
cases, however, binary is wrong and does not represent the universe of discourse
in which we do our thinking and understanding.
If I'm wrong, then I'll start all introductions with: "This is
Matt. Matt has a last name. That last name is Brown. Matt has a middle name..."
Sentences indeed! If a data model forces us to record information in a format more
difficult to use than human language, then perhaps it is not a good choice.
You are right about that.
Every time one reads such papers, the mental red pen comes out to make
all the corrections just to be able to understand what they are trying to say.
From: FP
This is not a "separation", but rather certain
details made explicit. If you violate the former, you get drawbacks/costs (in
large part because you contaminate logic with irrelevant considerations); if you
do the latter, you get benefits and no costs.
Precisely. We are already on record that which propositions
are axioms (the database) and which are theorems (derived) is essentially arbitrary;
there is no theoretical criterion by which to decide.
Incidentally, my next paper on normalization and possibly a
later one by Chris may be of interest.
As to Sentences, the few pieces I published on Williams and
his so-called AMD rather speak for themselves. That's what happens when
products are developed by simple minds, without a grasp of data fundamentals.
Codd was just "a tad" more knowledgeable and smarter than Williams,
wouldn't you say?
C. J. Date Responds: Herewith a very brief response to
your question regarding n-ary vs. binary (etc.) relations. I think there are at
least two distinct issues here that often get confused:
1. There’s no question that we need n-ary relations in general
(for arbitrary nonnegative integer n). In particular, even if all base
relations are binary—see point 2 following—as soon as we join two of them
together, we get a ternary result; if we join that result to another binary
relation, we get a result of degree four, and so on. (Please understand that I
am speaking pretty loosely here.) Likewise, if we project one of those binary
relations over one attribute, we get a result of degree one, and so on. Thus,
the relational model quite rightly deals with n-ary relations, not just binary
ones.
2. The foregoing point notwithstanding, there are actually some
pretty strong arguments in favor of making all base relations binary. Well, not
binary exactly, but irreducible, rather; an n-ary relation is
irreducible if it can’t be non-loss decomposed into two or more projections
each of degree less than n. (In practice, irreducible relations often are
binary, a fact that might account for part of the confusion I mentioned;
however, some irreducible relations are not binary and some binary relations
are not irreducible.) But the question
of whether base relations should be irreducible is a database design
question, not a relational model question.
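Date's first point, that joins and projections of binary relations yield relations of other degrees, can be made concrete with a small Python sketch (hypothetical relation names and sample data; the helpers are illustrative, not any product's API). Here a relation carries its heading explicitly, so its degree is simply the number of attributes. Extending the projection step down to zero attributes also touches the 0-ary part of the original question: there are exactly two relations of degree zero, known as TABLE_DEE (one tuple, the empty tuple) and TABLE_DUM (no tuples).

```python
from itertools import product
from typing import NamedTuple


class Relation(NamedTuple):
    heading: frozenset  # attribute names; len(heading) is the degree
    body: frozenset     # tuples, each a frozenset of (attribute, value) pairs


def rel(heading, rows):
    """Build a relation from a heading and rows given in heading order."""
    return Relation(frozenset(heading),
                    frozenset(frozenset(zip(heading, row)) for row in rows))


def project(r, attrs):
    """Keep only attrs (assumed to be a subset of r.heading); duplicates collapse."""
    attrs = frozenset(attrs)
    return Relation(attrs, frozenset(
        frozenset((a, v) for a, v in t if a in attrs) for t in r.body))


def join(r1, r2):
    """Natural join on the attributes the two headings share."""
    common = r1.heading & r2.heading
    body = set()
    for t1, t2 in product(r1.body, r2.body):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in common):
            body.add(frozenset({**d1, **d2}.items()))
    return Relation(r1.heading | r2.heading, frozenset(body))


def degree(r):
    return len(r.heading)


# Hypothetical binary base relations.
sp = rel(("S", "P"), [("s1", "p1"), ("s2", "p1")])
pj = rel(("P", "J"), [("p1", "j1")])
jc = rel(("J", "CITY"), [("j1", "London")])

spj = join(sp, pj)            # ternary result
spjc = join(spj, jc)          # degree four
s_only = project(sp, {"S"})   # unary result
nothing = project(sp, set())  # projecting over no attributes: degree zero

print(degree(spj), degree(spjc), degree(s_only), degree(nothing))  # prints: 3 4 1 0

# The only two relations of degree zero:
table_dee = nothing                           # exactly one tuple, the empty tuple
table_dum = project(rel(("S", "P"), []), ())  # no tuples at all
```

The two 0-ary relations act rather like true and false: joining any relation with TABLE_DEE returns it unchanged, while joining it with TABLE_DUM returns an empty relation with the same heading.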
Database design theory—the stuff about join dependencies and further
normalization, for example [Ed. Note:
See the forthcoming DATABASE
FOUNDATIONS paper #6 The Costly Illusion: Further
Normalization, Integrity and Performance]—is a separate theory that builds
on top of the relational model but is not itself part of that model.
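Date's definition of irreducibility can likewise be tried out on small samples. The Python sketch below (the function name `is_irreducible` and all data are hypothetical) relies on one observation: if a relation of degree n can be non-loss decomposed at all, then it is already the join of its n projections of degree n - 1, because rejoining larger projections can only produce the same or fewer spurious tuples. It checks a single relation value; whether a base relation should actually be decomposed is, as Date says, a design question, settled by the dependencies that must hold for every value rather than by one sample.

```python
from functools import reduce
from itertools import product
from typing import NamedTuple


class Relation(NamedTuple):
    heading: frozenset  # attribute names
    body: frozenset     # tuples, each a frozenset of (attribute, value) pairs


def rel(heading, rows):
    return Relation(frozenset(heading),
                    frozenset(frozenset(zip(heading, row)) for row in rows))


def project(r, attrs):
    attrs = frozenset(attrs)
    return Relation(attrs, frozenset(
        frozenset((a, v) for a, v in t if a in attrs) for t in r.body))


def join(r1, r2):
    common = r1.heading & r2.heading
    body = set()
    for t1, t2 in product(r1.body, r2.body):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in common):
            body.add(frozenset({**d1, **d2}.items()))
    return Relation(r1.heading | r2.heading, frozenset(body))


def is_irreducible(r):
    """True if r cannot be rebuilt by joining projections of lower degree.

    Sketch: for degree n >= 2 it suffices to try the n projections of
    degree n - 1; relations of degree 0 or 1 are treated as irreducible.
    """
    if len(r.heading) < 2:
        return True
    projections = [project(r, r.heading - {a}) for a in r.heading]
    return reduce(join, projections) != r


# Hypothetical binary relation that is a Cartesian product of its projections.
product_ab = rel(("A", "B"), [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")])
# Hypothetical binary relation that is not.
plain_ab = rel(("A", "B"), [("a1", "b1"), ("a2", "b2")])
# Hypothetical ternary relation whose binary projections rejoin with an extra tuple.
abc = rel(("A", "B", "C"), [("a1", "b1", "c2"), ("a1", "b2", "c1"), ("a2", "b1", "c1")])

print(is_irreducible(product_ab), is_irreducible(plain_ab), is_irreducible(abc))
# prints: False True True
```

The three results line up with Date's parenthetical remark: some binary relations are not irreducible, many are, and some irreducible relations are not binary.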
Hope this helps. I might try to write a short paper soon
elaborating on these issues.
Posted 05/14/04