DAWN WOLTHUIS’ “PROOF”
by Fabian Pascal

 

 

 

A while ago Jonathan Leffler brought to my attention an exchange about multi-value (MV) DBMSs in the newsgroup comp.databases.pick, where, in response to a reference to my debunkings of MV technology (On "Multivalue" Technology, MultiValue Lacks Value, More on "Multivalue" Databases), Dawn Wolthuis (DW) boasted:

 

I know a bit about relational theory.  I once bought into it. I've learned a bit more.  I'm not as gullible now.

 

So I asked Jonathan:

 

   Do me a favor and post a message to that group and say I am asking her to post the link to our exchange from 2003, and that I am still waiting for her “mathematical” proof that MVDBs are relational, which she promised then.

   You can also add that I said Dawn is another Celko, throwing all sorts of fuzzy comments about mathematics around to impress the uninformed, without anything to back her up.

   Known technique, and I am not gullible anymore either.

 

The following email exchange between DW and me ensued. She had posted the link to our dialog, but the dialog is no longer available on her site. DW’s attachment is in the sidebar below, and the reader can form her/his own opinion on whether it constitutes any proof of the relational fidelity of MVDBMSs, let alone a mathematical one.

 

 

From: DW

 

Attached is what I attempted to send last year, but sent it to the old address and it was returned.  I made a couple of quick changes today after re-reading it, but I'll just send it as is, including the note I had at the top asking if you could help me get it to a final draft (and I hope you will take me up on that).

 

Sorry I didn't get it to you last year, but after it bounced, I tried it again a few days later and it bounced again, so I guessed that I had made it to your "don't accept e-mail from this chick" list.

 

 

From: FP

 

I have a web site with contact info. All you had to do is check it out [to find the new address].

 

[To recall from my writings:] The material I read on multi-value systems focused almost exclusively on the structural aspects of MV systems and said essentially nothing regarding the manipulative or integrity aspects. 

 

Sorry, but [your paper does the same, which] is a show stopper, I don't need to read more. The whole point of structure is integrity and manipulation. Anything that leaves that out does not merit attention. That is precisely how MVDBMS proponents can ever come to the conclusion that those systems are "better" than relational (they are certainly not relational, and those who claim so do not understand the model, nor the concept of relation-valued attributes, or RVAs).

 

Just FYI, check out a more realistic assessment by a MVDB practitioner to be posted next week on my site.

 

 

From: DW

 

OK, then if you would kindly pass my responses as they are to Chris Date, assuming he really wanted answers to his questions when he wrote them.

 

 

From: FP

 

Sure. But knowing him, he won't read much either.

 

Is our old exchange still posted on the Net? I would like to send it to him too.

 

 

From: DW

 

1. You let me know "in public" that I had not passed you something I said I would. 

 

2. When I got your new e-mail address and sent you the materials I had tried to send last year, that you had said you would send to Date in response to his paper, you let me know that you could not proofread them for me.  Fair enough - you are a busy man and you do not need to help me out.

 

3. I asked you to pass them along to the author "as is".  You responded that he would likely not read them either.  He published these questions in a document I paid for and he is likely unwilling to read the free responses I am providing - is that correct?

 

Now what, pray tell, was the purpose of this little game Mr. Pascal?  It might surprise you to know that I am also very busy, that I am not a mental peasant, and that I do have areas of expertise, even if this is not yet one of them.  I've been very, even excessively, patient with your replies in the past in my attempt to learn from a master in the field.  Your responses seem to include little substance nor assistance. 

 

While you owe me nothing, I am one of your readers, so to the extent that you have any interest at all in those who buy your books or read your web site, you could at least put a better spin on blowing me off.  I can accept that you are busy, but to encourage me to send you something only to tell me that neither you nor Date will read my response seems like a taunt and not a civil exchange, don't you think?

 

I'll attach what I sent you earlier only for Jonathan's benefit.  I do so only so that a seemingly "regular person" who might know how to treat me as if I, too, were a human being, reads this exchange.  [I'll also ask you, Jonathan, to recognize that I did my best in response to your group posting--you do not need to do so, but if you are so inclined, you could let the audience know that I have not been negligent.]

 

If I have misunderstood your responses or over-reacted to them (yes, I'm ticked off about this!), please correct my interpretation.

 

 

From: FP

 

Actually I really did not want you to pass me anything, because I knew it would not be worth reading. The only reason I said anything is because somebody sent me a reference to your exchange, where you were making the same unsupported claims as before, and you had not produced what you had promised to produce.

 

I help only when my effort can be helpful. I don't think it will be in this case. Over the years I developed an eye for what's worth getting into and what not. Sorry.

 

I was guessing, based on my knowledge of him. It's entirely up to him and he may well read it, but how succinct he will be in his answer I don't know. He tends to be much more forthcoming than I am, even when he does not have different opinions than me. So disregard my guess. If he responds, I will post his reply on the site.

 

That was your choice. Nobody has forced you to put any effort into it, and nobody promised a response, or help. There is no game.

 

We judge somebody's mental quality not by their opinion of themselves, but by [their pronouncements and] the material they produce.

 

We are interested but, unfortunately, the quality of thinking and knowledge in our area of expertise is at such low levels that if we responded to all we get, we wouldn't be able to do anything else, and maintaining the site is not readily justifiable even at a much lower effort, let alone at a higher one. [Ed. Note: That’s precisely why we produce books and papers.]

 

If for any reason you deem it not worthy to read the site or purchase papers, that's entirely your prerogative.

 

 

From: FP

 

Here's Date’s response. It validates my initial opinion that you don't know what you're talking about, and the briefness of his reply is telling. Demanding that we read this nonsense, while you refuse to read the material that would obviate the need for us to bother, is unreasonable. I think that Chris was, as usual, unjustifiably polite.

 

I thank Dawn Wolthuis for her attempts to educate me on the topic of multivalue systems. However, I regret to say my position on the topic remains unchanged; none of the questions I raised in a paper on DATABASE DEBUNKINGS (What First Normal Form Really Means) were answered satisfactorily. It does not seem worth attempting a blow-by-blow response here. There's just one point: DW claims that multivalue systems are "more relational" than relational systems because their files, like relations in mathematics (and like SQL tables, incidentally), have a left-to-right ordering to their attributes. But relations in the relational model deliberately don't have such ordering. See Codd's 1970 paper A Relational Model of Data for Large Shared Data Banks, CACM 13, No. 6, June 1970; see also another of my papers on the site, A Sweet Disorder.

 

Ed. Comments: It so happens that I have recently re-read Codd’s first two papers for an analysis that will be published as a first paper in the newly re-launched PRACTICAL DATABASE FOUNDATIONS series (Truly Relational: What It Really Means, Codd’s 1969-70 Papers). And I would like to expand on Date’s comment regarding order, and debunk the claim that MV files are relations.

 

Relations and attribute ordering: DW is correct in that mathematical relations are ordered domain-wise. In his first (internal IBM) paper in 1969 Codd preserved that order for database relations because (a) he initially labeled attributes by their domain names, and he foresaw the possibility of identical labels within the same relation (what we today would refer to as attributes defined over the same domain) and (b) he envisioned the array, rather than the table, as the structure representing a relation in the database. The order was just a kind of labeling to guarantee logical access to attributes. What he did not envision was the ensuing semantic use of that order, which he did not mean to imply (and of which MV technology is guilty). When he realized that, he explicitly dropped the order in his 1970 paper, which was actually the first public expounding of the relational idea. As we quote him in our above-mentioned forthcoming paper:

 

“Users should not normally be burdened with remembering the domain ordering of any relation … Accordingly, we propose that users deal not with relations [the mathematical concept] which are domain-ordered, but with relationships which are their domain-unordered counterparts. To accomplish this, domains must be uniquely identifiable at least within any given relation, without using position. Thus, where there are two or more identical domains, we require in each case that the domain name be qualified by a distinctive role name, which serves to identify the role played by that domain in the given relation … To sum up, it is proposed that most users should interact with a relational model of the data consisting of a collection of time-varying relationships (rather than relations) … [but] we shall not bother to distinguish between relations and relationships…”
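Ed. Note: The distinction Codd draws can be illustrated with a minimal sketch. It is ours, not Codd's, and it assumes a hypothetical relation over two PART domains and one QUANTITY domain; the role names sub_part and super_part and the helper tup are invented for the illustration. With positional tuples, permuting the columns yields a different set; with named attributes, the order in which they are written is irrelevant.

```python
# Illustrative only: relation contents, attribute names and the helper
# `tup` are hypothetical, chosen to mirror the quoted passage.

# Domain-ordered (mathematical) relation: meaning depends on position.
ordered = {("P1", "P2", 4), ("P2", "P3", 1)}

# Swapping the two PART columns yields a different set of tuples.
swapped = {(b, a, q) for (a, b, q) in ordered}


def tup(**attrs):
    """A tuple as an unordered set of (attribute name, value) pairs."""
    return frozenset(attrs.items())


# Domain-unordered "relationship": the role names sub_part and super_part
# distinguish two attributes defined over the same PART domain.
r1 = {
    tup(sub_part="P1", super_part="P2", quantity=4),
    tup(sub_part="P2", super_part="P3", quantity=1),
}

# The same tuples written with the attributes in a different order.
r2 = {
    tup(quantity=4, super_part="P2", sub_part="P1"),
    tup(super_part="P3", quantity=1, sub_part="P2"),
}

positional_differs = ordered != swapped  # position carries meaning
named_equal = r1 == r2                   # names carry meaning, order does not
```

Here positional_differs and named_equal both evaluate to True: the user of the named form never has to remember which PART column comes first.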

 

Incidentally, we have no idea what “function” in DW’s paper has to do with anything.

 

First Normal Form (1NF): In his 1969 paper Codd allowed for domains that had relations as values, or what we refer to as relation-valued domains/attributes (RVD/RVA), “nested relations” for short. He considered RVDs the relational equivalent of “repeating groups” in hierarchic/network systems, which he deemed unnecessary complications. At the time Codd thought that RVDs would necessitate second- rather than first-order logic as a basis for the data language, which (a) is problematic, and (b) would complicate implementation of relational systems (see our forthcoming FOUNDATIONS paper for details). So in his 1970 paper he introduced the idea of eliminating nested relations through a process of normalization, for which he lists some benefits. The result is a collection of relations in their normal form, what we today call 1NF. Otherwise put, Codd’s initial position was that relations can be either defined over RVDs, or in 1NF (no RVD/RVA).

 

As we explain in What First Normal Form Really Means, it later turned out that RVDs/RVAs can be supported within first-order logic and that, consequently, 1NF essentially means that at every intersection of a tuple and attribute there is exactly one value, which can be anything, including a relation. In this sense, a relation is by definition in 1NF (otherwise put: there is no such thing as an unnormalized relation). Note very carefully, however, that (a) values must be atomic with respect to the operators defined for the respective domains, whether they are relations or not; (b) 1NF is only one requirement of a relation; and (c) there are integrity and manipulation components to the relational model.
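Ed. Note: A rough sketch of "exactly one value at every tuple/attribute intersection, which can itself be a relation" follows. It rests on our own assumptions: the supplier/parts data, the heading, and the names tup, relation, has_one_value_per_attribute and ungroup are all hypothetical, and nothing here describes how any DBMS is implemented. It covers only the structural point; it says nothing about the integrity and manipulation components.

```python
# Illustrative only: data, heading and helper names are hypothetical.

def tup(**attrs):
    """A tuple as an unordered set of (attribute name, value) pairs."""
    return frozenset(attrs.items())


def relation(*tuples):
    """A relation body as a set of tuples (no duplicates, no order)."""
    return frozenset(tuples)


HEADING = {"supplier", "parts"}

# `parts` is relation-valued: each tuple holds exactly one value for it,
# and that one value happens to be a relation (possibly an empty one).
suppliers = relation(
    tup(supplier="S1", parts=relation(tup(part="P1"), tup(part="P2"))),
    tup(supplier="S2", parts=relation()),
)


def has_one_value_per_attribute(rel, heading):
    """True if every tuple has each heading attribute exactly once."""
    return all(
        {name for name, _ in t} == heading and len(t) == len(heading)
        for t in rel
    )


def ungroup(rel):
    """Flatten the relation-valued attribute into a relation without RVAs."""
    return relation(*(
        tup(supplier=dict(t)["supplier"], part=dict(p)["part"])
        for t in rel
        for p in dict(t)["parts"]
    ))


flat = ungroup(suppliers)
```

In this sketch the nested value is compared and stored as a single value, and ungroup produces the RVA-free form. Note that S2, whose parts relation is empty, has no tuple in flat.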

 

MV proponents (including DW) often claim that because RVDs/RVAs are compatible with the relational model, MV files are relations and, therefore, MVDBMSs are relational. But as we demonstrate in What First Normal Form Means Not, this is, of course, false on its face because:

 

·   MV files violate 1NF properly defined, and they do not satisfy the other requirements of a relation either

·   true domains are not supported

·   neither the integrity component (if any) nor the manipulation component of MVDBMSs is relational

 

What we also argue in our two papers is what Codd already foresaw in 1970: their later-found compatibility with the model notwithstanding, RVAs add complexity, but no power (except perhaps some convenience in rare, specialized cases). Otherwise put, the model does not prohibit RVAs, but they are not a very good idea. This, however, has nothing to do with MVDBMSs, because they are not relational in the first place, and they don’t support RVAs in the second.

 

So her claim notwithstanding, DW does not really understand relational theory and, having bought into MV products, is even more gullible than she thinks.

 

 

=========================================================================

SIDEBAR: Dawn Wolthuis’ “Proof” that MVDBMSs Are Relational

 

This is a draft of responses to Date.  I quickly typed in (and didn’t proofread) the questions from his paper and did not put them in double quotes – I simply highlighted my responses in blue like this.  If you are willing, I would like your help fixing this response so it makes sense to you and so you think it will make sense to Date – in other words, would you be willing to proofread this for me and point out anything I should write more clearly or precisely, for example?  If you can help me not sound too ignorant or stupid, I would appreciate that since I don’t need to have both Pascal AND Date know my name and think I lack intelligence ;-)  After all, I think you guys are very bright, even if occasionally incorrect, and usually arrogant.   smiles.  --dawn

 

APPENDIX D: SO WHAT ABOUT “MULTI-VALUE SYSTEMS”?

Note: Some of the points raised in this appendix are discussed further in the companion paper by Fabian Pascal.

 

In the body of this paper, in the section entitled “Relation-Valued Attributes in Base Tables?—The Bad News,” I observed that some people might try to claim that multi-value systems effectively support relations with RVAs [relation-valued attributes], and hence that such systems are really just relational systems after all (except that those same people would probably also claim that such systems are somehow “better” than relational systems, or at least SQL systems, precisely because of that RVA support).  In this appendix, I’d like to respond to such claims.  I’d like to, I say—but I can’t, not really, because my attempts to educate myself regarding multi-value systems from material available on the Web were an utter failure (I had more questions when I’d finished reading than I did when I started).  So all I can do here is sketch my limited understanding of what multi-value systems are, and then raise (but not answer) what seem to me to be some pertinent questions.  Note: For the remainder of this appendix, I’ll use MVS as an abbreviation for “multi-value system.”

 

First of all, then, an MVS database (also called a file)

 

Let me break in a second and note that MV users typically use the word file for what would be similar to a relational database table.  A database in a MultiValue system is a set of files, typically all files within the same namespace are considered to be in a single database, although if there are multiple software applications running against that namespace, each with its own files where there are no overlaps, then a site might suggest they have multiple databases within that same namespace.  Typically an “account” in MultiValue terminology is similar to a namespace and to a single schema in an RDBMS.  It is identified by a single VOC (Vocabulary, term is used by those from the Prime Information flavors) or MD (Master Dictionary, term is used by more traditional PICK flavors).

 

consists of a collection of records (also called items).  Each record contains two or more field values (also called attribute values); two or more, because the system automatically prefixes every record as presented to it by the user with an item-ID (unique at least within the file), which becomes the first field value in the record, and it’s my belief that records as presented by the user must already contain at least one field value.

 

I can see where you might have guessed such from various descriptions, but this is not the case.  The item-ID for a record or item is completely within the control of the user (the software developer).  The storage and retrieval of data are decoupled from user-defined-constraint handling in the MultiValue model.  This has pros and cons.  But if you think that the system is doing something automatically that is unrelated to physical storage and retrieval of data, it is likely an incorrect assumption.  In that respect, the database portion of the MultiValue system could almost more accurately be classified as a file system.  However, there are too many dbms features that a file system lacks that do exist in MultiValue systems.  When I first saw a PICK-like database, which was after I saw Oracle and IMS, for example, I said something like “that’s not a dbms!” 

 

In typical computer-speak, a developer would think of the record ID as the zeroth field value in the record.  Alternatively, sometimes the record is considered field values 1 … n and the item-ID is not always considered as part of the record.

 

As for records having one field value, the only value required for storing a record is the record-ID (item-ID).  The rest of the record may be missing or have NULL values.  I say “NULL value”, rather than “NULL” because PICK uses a 2VL and in a MultiValue system, a NULL is a value – it can be logically modeled as a NULL SET.  If you compare two null values, they are equal.  So, minimally, an item contains a non-null value for the item-ID (as specified by the user) and a value for the (rest of the) record which may be NULL, as in “null set”.  There are some implementations that permit a NULL value as the item-ID, but it would never be considered a good practice to use that “feature”.

 

The following points arise immediately:

 

·   As the phrase “the first field value” suggests, MVS fields are ordered left to right (and so MVS files are certainly not relations, and the system is certainly not relational).

 

As you define “relational” they would not match the definition.  But if by “relational” you mean that one would use relations in the logical model when using such systems, then they are relational.  I have seen many mathematical definitions of relations and they all define a relation as a set of ordered tuples.  What mathematical (not derived database jargon) definition of relation is there that claims that a mathematical relation MUST NOT BE a set of ordered tuples rather than that it IS a set of ordered tuples?  It seems like the definition of a mathematical relation has been completely confused in the relational database community.  So, in this particular aspect, I would suggest that MultiValue files (which are mathematical functions and thus relations) are MORE relational than those implementations that have unordered tuples.  For the most part, the “location” of an attribute in the tuple is a way of having a fixed name for that attribute.  Users may give a particular “location” new names and definitions – as many logical definitions for that one attribute as they would like, but it retains its location.  Also, there are operations, such as a matrix read, that do require the ordering.  So, by your definition of “relational” as well as by Codd's and others who have written on relational theory, the MultiValue model does not meet those definitions – it meets the mathematical definition instead.

 

·   Whether item-ID values are visible to the user is unclear (if they’re not, then MVS files are certainly not relations, see Appendix B, and the system is certainly not relational).

·   Whether item-ID fields can be updated by the user is unclear.

 

Again, I’m happy to agree that the term “relational database management system” really doesn’t fit PICK, but item-IDs are completely visible to and maintained by the user.  Other than hashcodes and information used for storage of data, there is no information that is not available for CRUD access by the user.

 

·   Two records can be duplicates as far as the user is concerned and yet distinct as far as the system is concerned (because they’re given distinct item-ID values).  The full implications of this state of affairs are unclear.

 

Nope –nothing so fancy or confusing as this is the case, but this must stem from the use of the term “record” in multiple ways – when it is used to describe the “range” of the function and exclude the domain (key), then yes, you could map two keys to the same “record” in that sense.  I rarely hear the term “record” used that way by practitioners, unless they are using what in at least one PICK implementation is referred to as the @ID and @RECORD components – the first being the domain of the function (relation) and the latter being the range.  You can have 2 @RECORDs that are equal while their @IDs are different.  That is like saying that you could have two different values for a candidate key where the rest of the tuple is the same.

 

·   Item-ID values are hashed to determine where records are physically stored (note the mixture of logical and physical considerations here!).  Whether records can be accessed sequentially, using either physical or logical sequence, is unclear.

 

The hashed business is something the database management system does – it is not something the user concerns themselves with.  So, it is not part of the logical model at all and there is no reason to bring the physical storage into the mix – it is under the covers.  As for ordering of “reads” it is the case that a user can specify an ordering for reads on the item-IDs, but not based on any physical ordering of records.  The user typically reads without concern for the order and simply orders the output, but alternatively can “select” item-IDs, prepare an order for them and then read records in that order.

 

·   Does the collection of records in a given file constitute a set?  Or a bag?  Or an array?  Or a sequence?  Or what?  Note that if it’s anything other than a set, then the file is certainly not a relation, and the system is certainly not relational.

 

A function.  Each file can be modeled as a mathematical function, which by definition is therefore a relation, which is therefore a set.  And as mentioned before, it is closer to the look of a mathematical relation than are the tables in an rdbms.

 

·   What file-level operators exist? E.g. is there anything analogous to join or union?  Note that if there are no such operators, then the system is certainly not relational.

 

For all practical purposes, it is useful to think of all pre-packaged functions (operators) as operating on sets of strings.  Files are one example of a set of strings in PICK, as are sets of files, but most operators act on strings that are at the item/record level and on down the tree, rather than at the file/function level or above that.  I think it is accurate to suggest that relation level operators are only a subset of those available.  The model here is of functions (see these as mathematical relations with designated keys) in a di-graph – that is, a web of functions -- that provide us with trees of data.

 

Next, a given MVS field can be “multi-valued”; that is, a given record can contain any number N of values of the same type in a given field position, where N is either a positive or non-negative integer (whether N can be zero is unclear).  In other words, MVS fields can apparently contain “repeating groups.”  Also, individual values can have “subvalues,” which I think just means the values in question don’t have to be scalar but can be what some languages call structs (though the material I read was extremely unclear on this point; in fact, it was almost certainly incorrect).  Questions:

 

·   How do you do MVS database design?

·   MVS apparently allows one level of nesting.  Does it allow two or more?  If not, the system is certainly not relational.

·   Can a given “multi-valued field value” be operated upon as a single value, or does it always have to be operated on piece by piece, one component value at a time?  If the latter, then the system is certainly not relational.

·   Is a given “multi-valued field value” a set of component values?  Or a bag?  Or an array?  Or a sequence?  Or what?

 

To sum up:  The material I read on multi-value systems focused almost exclusively on the structural aspects of such systems and said essentially nothing regarding the manipulative or integrity aspects.  And even on the structural aspects, the material was very incomplete, not to say confused.  Thus, the only thing I feel comfortable in saying is that, based on what I’ve seen so far, the chance of multi-value systems being “truly relational,” let alone being “more relational than SQL systems,” is vanishingly small.  In fact, it looks to me as if such systems will almost certainly be more complex (complex for the user, that is, and probably for the DBA as well) than true relational systems (or even SQL systems) should ever be.

 

Which is a reason I tackled this – from what I had learned regarding relational database theory and from what I have seen in practice, it was hard for me to believe that the PICK model yielded such higher developer productivity as I was experiencing with my teams.  And, yet, I am quite convinced that if a PICK developer and an Oracle, DB2, or SQL Server developer were given the same task and the resulting system needed to work well for the user, then the seasoned PICK developer would likely produce a system sooner (for less cost) and that the resulting system would have a lower maintenance cost as the business introduces changes over time.  I’m trying to tap into just what it is about this platform that has, in the past, yielded more bang for the buck so that as the industry advances we do not lose some of the agility present in PICK.  I also suspect, but have not proven, that the underlying data model is key to this bigger bang for the buck.  What I mean by “data model” is not the conceptual nor physical, but the logical – the mathematical representation of the predicates for which propositions will be stored.

 

 

Posted 4/22/05