A while ago Jonathan Leffler brought to my attention an
exchange about multi-value (MV) DBMSs in the newsgroup comp.databases.pick,
where in response to a reference to my debunkings of MV technology (On "Multivalue"
Technology, MultiValue Lacks Value, More on "Multivalue" Databases), Dawn Wolthuis
(DW) boasted:
I know a bit about relational theory. I once bought into it. I've learned a bit more. I'm not as gullible now.
So I asked Jonathan:
Do me a favor and
post a message to that group and say I am asking her to post the link to our
exchange from 2003, and that I am still waiting for her “mathematical”
proof that MVDBs are relational, which she promised then.
You can also add
that I said Dawn is another Celko, throwing all sorts of fuzzy comments about
mathematics around to impress the uninformed, without anything to back her up.
Known technique,
and I am not gullible anymore either.
The following email exchange between DW and myself ensued.
She had posted the link to our dialog, but the dialog is no longer available on
her site. (DW’s attachment is in the sidebar below, and the reader can form
her/his own opinion on whether it constitutes any proof for the relational
fidelity of MVDBMSs, let alone a mathematical one.)
From: DW
Attached is what I attempted to send last year, but sent it
to the old address and it was returned.
I made a couple of quick changes today after re-reading it, but I'll
just send it as is, including the note I had at the top asking if you could
help me get it to a final draft (and I hope you will take me up on that).
Sorry I didn't get it to you last year, but after it bounced,
I tried it again a few days later and it bounced again, so I guessed that I had
made it to your "don't accept e-mail from this chick" list.
From: FP
I have a web site with contact info. All you had to do is
check it out [to find the new address].
[To recall from my writings:] The material I read on
multi-value systems focused almost exclusively on the structural aspects of MV
systems and said essentially nothing regarding the manipulative or integrity
aspects.
Sorry, but [your paper does the same, which] is a show
stopper, I don't need to read more. The whole point of structure is
integrity and manipulation. Anything that leaves that out does not merit
attention. That is precisely how MVDBMS proponents can ever come to the
conclusion that those systems are "better" than relational (they are
certainly not relational, and those who claim so do not understand the model,
nor the concept of relation-valued attributes, or RVAs).
Just FYI, check out a more realistic assessment by a MVDB
practitioner to be posted next week on my site.
From: DW
OK, then if you would kindly pass my responses as they are to
Chris Date, assuming he really wanted answers to his questions when he wrote
them.
From: FP
Sure. But knowing him, he won't read much either.
Is our old exchange still posted on the Net? I would like to
send it to him too.
From: DW
1. You let me know "in public" that I had not
passed you something I said I would.
2. When I got your new e-mail address and sent you the
materials I had tried to send last year, that you had said you would send to
Date in response to his paper, you let me know that you could not proofread
them for me. Fair enough - you are a
busy man and you do not need to help me out.
3. I asked you to pass them along to the author "as
is". You responded that he would
likely not read them either. He
published these questions in a document I paid for and he is likely unwilling
to read the free responses I am providing - is that correct?
Now what, pray tell, was the purpose of this little game Mr.
Pascal? It might surprise you to know
that I am also very busy, that I am not a mental peasant, and that I do have
areas of expertise, even if this is not yet one of them. I've been very, even excessively, patient
with your replies in the past in my attempt to learn from a master in the
field. Your responses seem to include
little substance nor assistance.
While you owe me nothing, I am one of your readers, so to the
extent that you have any interest at all in those who buy your books or read
your web site, you could at least put a better spin on blowing me off. I can accept that you are busy, but to
encourage me to send you something only to tell me that neither you nor Date
will read my response seems like a taunt and not a civil exchange, don't you
think?
I'll attach what I sent you earlier only for Jonathan's
benefit. I do so only so that a
seemingly "regular person" who might know how to treat me as if I,
too, were a human being, reads this exchange.
[I'll also ask you, Jonathan, to recognize that I did my best in
response to your group posting--you do not need to do so, but if you are
so inclined, you could let the audience know that I have not been negligent.]
If I have misunderstood your responses or over-reacted to
them (yes, I'm ticked off about this!), please correct my interpretation.
From: FP
Actually I really did not want you to pass me anything,
because I knew it would not be worth reading. The only reason I said anything
is because somebody sent me a reference to your exchange, where you were making
the same unsupported claims as before, and you had not produced what you had
promised to produce.
I help only when my effort can be helpful. I don't
think it will be in this case. Over the years I developed an eye for what's
worth getting into and what not. Sorry.
I was guessing, based on my knowledge of him. It's entirely
up to him and he may well read it, but how succinct he will be in his answer I
don't know. He tends to be much more forthcoming than I am, even when he does
not have different opinions than me. So disregard my guess. If he responds, I
will post his reply on the site.
That was your choice. Nobody has forced you to put any effort
into it, and nobody promised a response, or help. There is no game.
We judge somebody's mental quality not by
their opinion of themselves, but by [their pronouncements and] the material
they produce.
We are interested but, unfortunately, the quality of thinking
and knowledge in our area of expertise is at such low levels that if we
responded to all we get, we wouldn't be able to do anything else, and
maintaining the site is not readily justifiable even at a much lower effort,
let alone at a higher one. [Ed. Note: That’s
precisely why we produce books and papers.]
If for any reason you deem it not worthy to read the site or
purchase papers, that's entirely your prerogative.
From: FP
Here's Date’s response. It validates my initial opinion that
you don't know what you're talking about, and the briefness of his reply is
telling. Demanding that we read this nonsense, while you refuse to read the
material that would obviate the need for us to bother, is unreasonable. I think
that Chris was, as usual, unjustifiably polite.
I thank Dawn Wolthuis for her attempts to educate me on the
topic of multivalue systems. However, I regret to say my position on the topic
remains unchanged; none of the questions I raised in a paper on DATABASE
DEBUNKINGS—What First
Normal Form Really Means—were answered satisfactorily. It does not seem
worth attempting a blow-by-blow response here. There's just one point: DW
claims that multivalue systems are "more relational" than relational
systems because their files, like relations in mathematics (and like SQL
tables, incidentally), have a left-to-right ordering to their attributes. But
relations in the relational model deliberately don't have such ordering. See
Codd's 1970 paper A Relational Model of Data for Large Shared Data Banks, CACM
13, No. 6, June 1970; see also another of my papers on the site, A Sweet Disorder.
Ed. Comments: It so
happens that I have recently re-read Codd’s first two papers for an analysis
that will be published as a first paper in the newly re-launched PRACTICAL DATABASE FOUNDATIONS series (Truly
Relational: What It Really Means, Codd’s 1969-70 Papers). And I would like
to expand on Date’s comment regarding order, and debunk the claim that MV files
are relations.
Relations and attribute ordering: DW is correct in
that mathematical relations are ordered domain-wise. In his first (internal
IBM) paper in 1969 Codd preserved that order for database relations
because (a) he initially labeled attributes by their domain names, and he
foresaw the possibility of identical labels within the same relation (what we
today would refer to as attributes defined over the same domain) and (b) he
envisioned the array, rather than the table, as the structure representing a
relation in the database. The order was just a kind of labeling to guarantee logical
access to attributes. What he did not envision was the ensuing semantic
use of that order, which he did not mean to imply (and of which MV technology
is guilty). When he realized that, he explicitly dropped the order in
his 1970 paper, which was actually the first public expounding of the
relational idea. As we quote him in our above-mentioned forthcoming paper:
“Users should not
normally be burdened with remembering the domain ordering of any relation …
Accordingly, we propose that users deal not with relations [the mathematical
concept] which are domain-ordered, but with relationships which are
their domain-unordered counterparts. To accomplish this, domains must be
uniquely identifiable at least within any given relation, without using
position. Thus, where there are two or more identical domains, we require in
each case that the domain name be qualified by a distinctive role name,
which serves to identify the role played by that domain in the given relation …
To sum up, it is proposed that most users should interact with a relational
model of the data consisting of a collection of time-varying relationships
(rather than relations) … [but] we shall not bother to distinguish between
relations and relationships…”
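The difference between a domain-ordered relation and its domain-unordered counterpart can be illustrated with a minimal Python sketch. The supply data, the attribute names, and the `relationship` helper are hypothetical, introduced only for illustration; they come from neither Codd's paper nor any MV product.

```python
# Domain-ordered (mathematical) relation: each tuple is identified by position.
supply_ordered = {
    ("S1", "P1", 300),
    ("S2", "P1", 200),
}

# The same facts with the first two positions swapped.
supply_swapped = {(part, supplier, qty) for (supplier, part, qty) in supply_ordered}


def relationship(attribute_names, ordered_tuples):
    """Pair every value with an attribute name, so position no longer matters."""
    return frozenset(
        frozenset(zip(attribute_names, values)) for values in ordered_tuples
    )


r1 = relationship(("supplier", "part", "qty"), supply_ordered)
r2 = relationship(("part", "supplier", "qty"), supply_swapped)

# Positional tuples: reordering the domains yields a different relation.
print(supply_ordered == supply_swapped)  # False

# Named attributes: the two are the same set of tuples.
print(r1 == r2)  # True
```

Where two attributes draw on the same domain (say, two part numbers in a bill of materials), distinct names such as `sub_part` and `super_part` would play the part of Codd's role names.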
Incidentally, we have no idea what “function” in DW’s paper
has to do with anything.
First Normal Form (1NF): In his 1969 paper Codd
allowed for domains that had relations as values, or what we refer to as relation-valued
domains/attributes (RVD/RVA), “nested relations” for short. He considered RVDs
the relational equivalent of “repeating groups” in hierarchic/network systems,
which he deemed unnecessary complications. At the time Codd thought that RVDs
would necessitate second- rather than first-order logic as a
basis for the data language, which (a) is problematic, and (b) would complicate
implementation of relational systems (see our forthcoming FOUNDATIONS paper for
details). So in his 1970 paper he introduced the idea of eliminating nested
relations through a process of normalization, for which he lists some
benefits. The result is a collection of relations in their normal form,
what we today call 1NF. Otherwise put, Codd’s initial position was that
relations can be either defined over RVDs, or in 1NF (no
RVD/RVA).
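The normalization step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the employee data is hypothetical, relations are approximated by lists of dictionaries, and the `normalize` helper is our own naming. It is not Codd's procedure verbatim.

```python
# Unnormalized form: "jobhistory" is a relation-valued domain (a nested relation).
employee = [
    {"emp_no": 1, "name": "Jones", "jobhistory": [
        {"jobdate": "1968-01", "title": "clerk"},
        {"jobdate": "1969-06", "title": "analyst"},
    ]},
    {"emp_no": 2, "name": "Blake", "jobhistory": [
        {"jobdate": "1969-03", "title": "clerk"},
    ]},
]


def normalize(rows, key, nested):
    """Remove the nested relation and propagate the parent key into it."""
    parent = [{k: v for k, v in row.items() if k != nested} for row in rows]
    child = [{key: row[key], **inner} for row in rows for inner in row[nested]]
    return parent, child


employee_nf, jobhistory_nf = normalize(employee, "emp_no", "jobhistory")

for row in employee_nf:
    print(row)
for row in jobhistory_nf:
    print(row)
```

The result is two relations without nesting, related through `emp_no`, in place of one relation defined over an RVD.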
As we explain in What First Normal Form
Really Means, it later turned out that RVDs/RVAs can be
supported within first-order logic and that, consequently, 1NF essentially
means that at every intersection of a tuple and attribute there is exactly
one value, which can be anything, including a relation. In this sense, a
relation is by definition in 1NF (otherwise put: there is no such thing
as an unnormalized relation). Note very carefully, however, that (a) values
must be atomic with respect to the operators defined for the respective
domains, whether they are relations or not; (b) 1NF is only one
requirement of a relation; and (c) there are integrity and manipulation
components to the relational model.
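A minimal Python sketch of this reading of 1NF follows. The `rel` and `restrict` helpers and the sample data are hypothetical, and the sketch models structure only, not the integrity or manipulation components of the model. Each tuple holds exactly one value per attribute, and the value of `jobs` happens to be a relation, which the comparison operator treats as a single value.

```python
def rel(*rows):
    """Build a relation value: a set of tuples, each a set of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)


jobs_jones = rel({"title": "clerk"}, {"title": "analyst"})
jobs_blake = rel({"title": "clerk"})

# One value at every tuple/attribute intersection; for "jobs" that value is a relation.
employee = rel(
    {"emp_no": 1, "jobs": jobs_jones},
    {"emp_no": 2, "jobs": jobs_blake},
)


def restrict(relation, attribute, value):
    """Keep the tuples whose attribute equals the given value, compared as a whole."""
    return frozenset(t for t in relation if dict(t)[attribute] == value)


# The relation-valued attribute is compared as one value, not piece by piece;
# the order in which its tuples were written is irrelevant.
match = restrict(employee, "jobs", rel({"title": "analyst"}, {"title": "clerk"}))
print(len(match))
```

Equality is the only operator exercised here; a fuller treatment would define the other operators applicable to the `jobs` domain.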
MV proponents (including DW) often claim that because
RVDs/RVAs are compatible with the relational model, MV files are relations and,
therefore, MVDBMSs are relational. But as we demonstrate in What First Normal Form
Means Not, this is, of course, false on its face because:
· MV files violate 1NF properly defined, and they do not
possess the other relation requirements either
· true domains are not supported
· neither the integrity component (if any) nor the
manipulation component of MVDBMSs is relational
What we also argue in our two papers is what Codd already
foresaw in 1970: their later-found compatibility with the model
notwithstanding, RVAs add complexity, but no power (except perhaps some
convenience in rare, specialized cases). Otherwise put, the model does not
prohibit RVAs, but they are not a very good idea. This, however, has nothing to
do with MVDBMSs, because they are not relational in the first place, and they
don’t support RVAs in the second.
So her claim notwithstanding, DW does not really
understand relational theory and, having bought into MV products, is even more
gullible than she thinks.
=========================================================================
SIDEBAR: Dawn Wolthuis’ “Proof” that MVDBMSs Are Relational
This is a draft
of responses to Date. I quickly typed
in (and didn’t proofread) the questions from his paper and did not put them in
double quotes – I simply highlighted my responses in blue like this. If you are willing, I would like your help
fixing this response so it makes sense to you and so you think it will make
sense to Date – in other words, would you be willing to proofread this for me
and point out anything I should write more clearly or precisely, for
example? If you can help me not sound
too ignorant or stupid, I would appreciate that since I don’t need to have both
Pascal AND Date know my name and think I lack intelligence ;-) After all, I think you guys are very bright,
even if occasionally incorrect, and usually arrogant. smiles. --dawn
APPENDIX D: SO WHAT ABOUT “MULTI-VALUE SYSTEMS”?
Note: Some of the
points raised in this appendix are discussed further in the companion paper by
Fabian Pascal.
In the body of this paper, in the section entitled
“Relation-Valued Attributes in Base Tables?—The Bad News,” I observed that some
people might try to claim that multi-value
systems effectively support relations with RVAs [relation-valued
attributes], and hence that such systems are really just relational systems
after all (except that those same people would probably also claim that such
systems are somehow “better” than relational systems, or at least SQL systems,
precisely because of that RVA support).
In this appendix, I’d like to respond to such claims. I’d like
to, I say—but I can’t, not really, because my attempts to educate myself
regarding multi-value systems from material available on the Web were an utter
failure (I had more questions when I’d finished reading than I did when I
started). So all I can do here is
sketch my limited understanding of what multi-value systems are, and then raise
(but not answer) what seem to me to be some pertinent questions. Note: For
the remainder of this appendix, I’ll use MVS
as an abbreviation for “multi-value system.”
First of all, then, an MVS database (also called a file)
Let me break in
a second and note that MV users typically use the word file for what would be similar to a relational database table. A database
in a MultiValue system is a set of files, typically all files within the same
namespace are considered to be in a single database, although if there are
multiple software applications running against that namespace, each with its
own files where there are no overlaps, then a site might suggest they have
multiple databases within that same namespace.
Typically an “account” in MultiValue terminology is similar to a
namespace and to a single schema in an RDBMS.
It is identified by a single VOC (Vocabulary, term is used by those from
the Prime Information flavors) or MD (Master Dictionary, term is used by more
traditional PICK flavors).
consists of a collection of records (also called items). Each record contains two or more field values (also called attribute values)—two or more, because
the system automatically prefixes every record as presented to it by the user
with an item-ID (unique at least
within the file), which becomes the first field value in the record, and it’s
my belief that records as presented by the user must already contain at least
one field value.
I can see where
you might have guessed such from various descriptions, but this is not the
case. The item-ID for a record or item
is completely within the control of the user (the software developer). The storage and retrieval of data are
decoupled from user-defined-constraint handling in the MultiValue model. This has pros and cons. But if you think that the system is doing
something automatically that is unrelated to physical storage and retrieval of
data, it is likely an incorrect assumption.
In that respect, the database portion of the MultiValue system could
almost more accurately be classified as a file system. However, there are too many dbms features
that a file system lacks that do exist in MultiValue systems. When I first saw a PICK-like database, which
was after I saw Oracle and IMS, for example, I said something like “that’s not
a dbms!”
In typical
computer-speak, a developer would think of the record ID as the zeroth field
value in the record. Alternatively,
sometimes the record is considered field values 1 … n and the item-ID is not
always considered as part of the record.
As for records
having one field value, the only value required for storing a record is the
record-ID (item-ID). The rest of the
record may be missing or have NULL values.
I say “NULL value”, rather than “NULL” because PICK uses a 2VL and in a
MultiValue system, a NULL is a value – it can be logically modeled as a NULL
SET. If you compare two null values,
they are equal. So, minimally, an item
contains a non-null value for the item-ID (as specified by the user) and a
value for the (rest of the) record which may be NULL, as in “null set”. There are some implementations that permit a
NULL value as the item-ID, but it would never be considered a good practice to
use that “feature”.
The following points arise immediately:
· As the phrase “the first field value” suggests, MVS
fields are ordered left to right (and so MVS files are certainly not relations,
and the system is certainly not relational).
As you define
“relational” they would not match the definition. But if by “relational” you mean that one would use relations in
the logical model when using such systems, then they are relational. I have seen many mathematical definitions of
relations and they all define a relation as a set of ordered tuples. What mathematical (not derived database
jargon) definition of relation is there that claims that a mathematical
relation MUST NOT BE a set of ordered tuples rather than that it IS a set of
ordered tuples? It seems like the
definition of a mathematical relation has been completely confused in the
relational database community. So, in
this particular aspect, I would suggest that MultiValue files (which are
mathematical functions and thus relations) are MORE relational than those
implementations that have unordered tuples.
For the most part, the “location” of an attribute in the tuple is a way
of having a fixed name for that attribute.
Users may give a particular “location” new names and definitions – as
many logical definitions for that one attribute as they would like, but it
retains its location. Also, there are
operations, such as a matrix read, that do require the ordering. So, by your definition of “relational” as
well as by Codd's and others who have written on relational theory, the MultiValue
model does not meet those definitions – it meets the mathematical definition
instead.
· Whether item-ID values are visible to the user is
unclear (if they’re not, then MVS files are certainly not relations, see
Appendix B, and the system is certainly not relational).
· Whether item-ID fields can be updated by the user is
unclear.
Again, I’m
happy to agree that the term “relational database management system” really
doesn’t fit PICK, but item-IDs are completely visible to and maintained by the
user. Other than hashcodes and
information used for storage of data, there is no information that is not
available for CRUD access by the user.
· Two records can be duplicates as far as the user is
concerned and yet distinct as far as the system is concerned (because they’re
given distinct item-ID values). The
full implications of this state of affairs are unclear.
Nope – nothing
so fancy or confusing as this is the case, but this must stem from the use of
the term “record” in multiple ways – when it is used to describe the “range” of
the function and exclude the domain (key), then yes, you could map two keys to
the same “record” in that sense. I
rarely hear the term “record” used that way by practitioners, unless they are using
what in at least one PICK implementation is referred to as the @ID and @RECORD
components – the first being the domain of the function (relation) and the
latter being the range. You can have 2
@RECORDs that are equal while their @IDs are different. That is like saying that you could have two
different values for a candidate key where the rest of the tuple is the same.
· Item-ID values are hashed to determine where records
are physically stored (note the mixture of logical and physical considerations
here!). Whether records can be accessed
sequentially, using either physical or logical sequence, is unclear.
The hashed
business is something the database management system does – it is not something
the user concerns themselves with. So,
it is not part of the logical model at all and there is no reason to bring the
physical storage into the mix – it is under the covers. As for ordering of “reads” it is the case
that a user can specify an ordering for reads on the item-IDs, but not based on
any physical ordering of records. The
user typically reads without concern for the order and simply orders the
output, but alternatively can “select” item-IDs, prepare an order for them and
then read records in that order.
· Does the collection of records in a given file
constitute a set? Or a bag? Or an array? Or a sequence? Or
what? Note that if it’s anything other
than a set, then the file is certainly not a relation, and the system is
certainly not relational.
A
function. Each file can be modeled as a
mathematical function, which by definition is therefore a relation, which is
therefore a set. And as mentioned
before, it is closer to the look of a mathematical relation than are the tables
in an rdbms.
· What file-level operators exist? E.g., is there anything
analogous to join or union? Note that if
there are no such operators, then the system is certainly not relational.
For all
practical purposes, it is useful to think of all pre-packaged functions
(operators) as operating on sets of strings.
Files are one example of a set of strings in PICK, as are sets of files,
but most operators act on strings that are at the item/record level and on down
the tree, rather than at the file/function level or above that. I think it is accurate to suggest that
relation level operators are only a subset of those available. The model here is of functions (see these as
mathematical relations with designated keys) in a di-graph – that is, a web of
functions -- that provide us with trees of data.
Next, a given
MVS field can be “multi-valued”; that is, a given record can contain any number N of values of the same type in a given
field position, where N is either a
positive or non-negative integer (whether N
can be zero is unclear). In other
words, MVS fields can apparently contain “repeating groups.” Also, individual values can have
“subvalues,” which I think just means the values in question don’t have to be
scalar but can be what some languages call structs
(though the material I read was extremely unclear on this point; in fact, it
was almost certainly incorrect). Questions:
· How do you do MVS database design?
· MVS apparently allows one level of nesting. Does it allow two or more? If not, the system is certainly not
relational.
· Can a given “multi-valued field value” be operated upon
as a single value, or does it always have to be operated on piece by piece, one
component value at a time? If the
latter, then the system is certainly not relational.
· Is a given “multi-valued field value” a set of component values? Or a bag?
Or an array? Or a sequence? Or what?
To sum up: The
material I read on multi-value systems focused almost exclusively on the
structural aspects of such systems and said essentially nothing regarding the
manipulative or integrity aspects. And
even on the structural aspects, the material was very incomplete—not to say
confused. Thus, the only thing I feel
comfortable in saying is that, based on what I’ve seen so far, the chance of
multi-value systems being “truly relational,” let alone being “more relational
than SQL systems,” is vanishingly small.
In fact, it looks to me as if such systems will almost certainly be more
complex—complex for the user, that is, and probably for the DBA as well—than
true relational systems (or even SQL systems) should ever be.
Which is a
reason I tackled this – from what I had learned regarding relational database
theory and from what I have seen in practice, it was hard for me to believe
that the PICK model yielded such higher developer productivity as I was
experiencing with my teams. And, yet, I
am quite convinced that if a PICK developer and an Oracle, DB2, or SQL Server
developer were given the same task and the resulting system needed to work well
for the user, then the seasoned PICK developer would likely produce a system
sooner (for less cost) and that the resulting system would have a lower
maintenance cost as the business introduces changes over time. I’m trying to tap into just what it is about this platform that has, in the past, yielded more bang for the buck so that as
the industry advances we do not lose some of the agility present in PICK. I also suspect, but have not proven, that the underlying data model is key to this bigger bang for the buck. What I mean by “data model” is not the conceptual nor physical, but the logical – the mathematical representation of the predicates for which propositions will be stored.
Posted 4/22/05