Monday, December 5, 2016

Domain vs. (Data) Type, Class vs. Relation (UPDATE)

A rewrite of this revision was posted on 10/1/17 to bring it into line with McGoveran's formal exposition and interpretation of Codd's true RDM.

Class, Type, Relation and Domain in Database Management



  1. Codd avoided relvars by using the term "time-varying relations" instead.

    Fabian: bunkum! McG is entitled to object to relvars because programming language variables have no place in set theory. (A set does not have persistent identity.) Then equally (as he says) there can be no place for a database variable.

    Then there can be no "relation" with a persistent identity that could be "time-varying". The best we could say is: look at these two database values, taken at two different times; they both have a relation with such-and-such a relation predicate (RP); then we might mentally construct a persistent entity which is time-varying. But that's (convenient) mythology, not licensed by set theory.

    Equally, we could describe that situation by naming a relation; and adopting a convenient mythology that the name identifies a programming language variable.

    The problem with identifying a relation by RP is with schema evolution: we might have two database values with two slightly different RPs. We cannot say that is one time-varying relation. (The whole idea of RPs differing "slightly" is more mental fairy-tale.)

    So I do not see why McG is so critical of relvars. (He as good as admits he's being over-precious.) We can regard a database value as a set of (name, relation) pairs, where the name fills the role of a programming variable.
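    The "set of (name, relation) pairs" reading above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `relation`, `database` and `lookup`, and the `EMP` data, are hypothetical and come from neither Codd nor McGoveran. Every value is immutable, so a "later" database is a different value rather than a mutated one.

```python
# Hypothetical helper names; a sketch of the idea, not a formal definition.

def relation(*rows):
    """Build a relation value: an immutable set of tuples,
    each tuple being an immutable set of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)

def database(**named):
    """Build a database value: an immutable set of (name, relation) pairs."""
    return frozenset(named.items())

def lookup(db, name):
    """Return the relation paired with `name` in this database value."""
    return dict(db)[name]

# Database value at one point in time.
db_t1 = database(
    EMP=relation({"emp": "Ann", "dept": "Sales"}),
)

# Database value at a later time: a new value was built, nothing was mutated.
db_t2 = database(
    EMP=relation({"emp": "Ann", "dept": "Sales"},
                 {"emp": "Bob", "dept": "IT"}),
)

print(len(lookup(db_t1, "EMP")), len(lookup(db_t2, "EMP")))
```

    In this sketch the name `EMP` is just data inside each database value; whether that amounts to a variable is exactly the point in dispute in this thread.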

  2. At this time David McGoveran offers only this reaction and defers any further discussion until his formal exposition of the RDM is published in the book he's currently working on.

    I appreciate the fact that Clayden approves of my objection to relvars on the basis of set theory. On the other hand, he seems to completely ignore the problem relvars introduce into the language vis-à-vis computational completeness when he says he doesn't understand why I am so critical. BTW, I'm not sure what word was intended where he uses "precious", but it made me smile.

    I also don't understand his comment regarding schema evolution, especially inasmuch as his example seems only to reinforce my position that relation predicates (RP) do accurately identify relations (which is not only consistent with set theory, BTW, but with EFC as early as 1969--see "set specification"). That said, I've always said that relational theory has not addressed so-called "schema evolution" from any theoretical basis. I've also said it needs to be done properly.

    As to RPs differing only slightly being "mental fairy tale", I suggest that you can only make such judgments if you both understand how to write RPs in formal detail and define what it means for them to differ or be related, slightly or otherwise. I've defined such differences in terms of the deductive apparatus of FOPL and you can't get much more mechanical than that.

    1. "over-precious" is what I intended. I mean: yes the RM needs no more than set theory and FOPL for a database value to capture a world situation as at a point in time. That does not mean we have to preciously restrict ourselves to set theory when we come to the pragmatics of a DBMS as a Management System whose role within an enterprise is to express persistence of the enterprise's assets.

      I do not understand why McG says relvars (or programming language variables in general) get in the way of computational completeness: a variable stands for a value; just replace the variable with its (current) value in any computation.

      Or isomorphically: map a named relation's schema to a schema with an extra attribute, whose value is the relvar name.
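      One way to read the mapping just suggested is sketched below. The functions `tag` and `untag` and the attribute name `relvar` are assumptions made for illustration; the comment itself gives no construction.

```python
# Hypothetical construction: extend each tuple with an attribute
# whose value is the name the relation was known by.

def relation(*rows):
    """Immutable set of tuples; each tuple is a set of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)

def tag(name, rel, attr="relvar"):
    """Return a relation whose tuples carry one extra attribute holding `name`."""
    return frozenset(t | {(attr, name)} for t in rel)

def untag(tagged, name, attr="relvar"):
    """Recover the tuples that were tagged with `name`, dropping the extra attribute."""
    return frozenset(t - {(attr, name)} for t in tagged if (attr, name) in t)

emp = relation({"emp": "Ann", "dept": "Sales"},
               {"emp": "Bob", "dept": "IT"})

emp_tagged = tag("EMP", emp)

# Round trip: the original relation is recovered from the tagged one.
assert untag(emp_tagged, "EMP") == emp
```

      Note one limit of this sketch: an empty relation tags to the empty set, so its name cannot be recovered from the tagged value alone.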

    2. I figured that's what you intended.

      >"I do not understand..." This seems to be the problem, ain't it?

      First, restriction to FOPL is only for the DATA component and USER MODEL, such that no damage is done to the benefits from the RDM. Computational completeness (CC) is what hosting of the data sublanguage is for, but this must be done carefully in order to avoid the damage.

      I have no idea what "relvars get in the way of CC" means--they get in the way of the RDM, not CC.

    3. David McGoveran's reply:

      1. I do not concern myself with pragmatics. If you don't have the theory right, pragmatics are premature and likely to lead you to false conclusions or worse, actions that contradict the foundations on which your system is built.

      2. I have not said that relvars get in the way of computational completeness. Computational completeness disables data independence because computationally complete systems are never decidable, and relvars have no meaning outside computationally complete systems.

      3. There is no over importance that can be asserted regarding set theory. RM is a kind of set theory, namely that portion which has a representation in first order predicate logic. If you step away from this, you are no longer talking about RM. And I am.

      4. You should expound upon your last assertion or drop it. The correspondence you suggest does not explain how it pertains to relvars - at least as you've expressed it.

    4. I'd like to unpack David McGoveran's point 2, particularly "and relvars have no meaning outside computationally complete systems."

      This seems to be a different objection to relvars from the one I've seen before, namely that they offend against the Information Principle.

      A programming model can be computationally complete without using variables. (For example the SKI Combinator Calculus.)

      A programming language can use variables without being computationally complete. (For example, 1980s-vintage COBOL.)

      To remain first-order, variables must range only over individuals, not over predicates or functions. Relvars range over sets of tuples. What's not first-order about them?

      If we take Codd's RA (that is, without transitive closure): that is not computationally complete; it has variables standing for relations. Are those variables not in effect relvars?
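      The earlier remark about the SKI Combinator Calculus can be made concrete with a small term reducer. This is a minimal sketch: the term encoding (strings for atoms, pairs for application) and the function names `app`, `step` and `normalize` are assumptions chosen for illustration. The host language obviously has variables; the calculus being reduced has only the three combinators and application.

```python
# Terms: an atom is a string; an application (f x) is the pair (f, x).

def app(*terms):
    """Left-associative application: app(a, b, c) means ((a b) c)."""
    result = terms[0]
    for t in terms[1:]:
        result = (result, t)
    return result

def step(term):
    """Do one leftmost-outermost reduction step; return (term, changed)."""
    if not isinstance(term, tuple):
        return term, False
    f, x = term
    if f == "I":                                  # I x -> x
        return x, True
    if isinstance(f, tuple) and f[0] == "K":      # K a b -> a
        return f[1], True
    if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == "S":
        a, b = f[0][1], f[1]                      # S a b c -> (a c) (b c)
        return ((a, x), (b, x)), True
    new_f, changed = step(f)                      # otherwise reduce inside f
    if changed:
        return (new_f, x), True
    new_x, changed = step(x)                      # then inside x
    return (f, new_x), changed

def normalize(term, limit=1000):
    """Reduce until no rule applies, giving up after `limit` steps."""
    for _ in range(limit):
        term, changed = step(term)
        if not changed:
            return term
    raise RuntimeError("no normal form within step limit")

# S K K behaves like the identity combinator on the opaque constant "x".
assert normalize(app("S", "K", "K", "x")) == "x"
# K discards its second argument.
assert normalize(app("K", "a", "b")) == "a"
```

      The atoms `"x"`, `"a"` and `"b"` are opaque constants of the term language, not variables that any rule binds or assigns.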

  3. For years the fuss about 'time-varying relations' has seemed incomprehensible to me. Surely Codd meant nothing more than references to different extensions at different times, without dictating any particular implementation language.

    Also, the comment that "a set does not have persistent identity" is nonsense. The situations that relational sets represent might come and go but sets of sentences about the situations are as everlasting as anything can be. Language devices that replace the sets under consideration don't change that. And the set ops MINUS and UNION used for updating make it perfectly clear which tuples/rows 'persist'.
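    The point about MINUS and UNION can be illustrated with plain set operations. This is a sketch under assumptions: the relation contents and the names `old`, `deleted`, `inserted` and `persisting` are invented for the example.

```python
def relation(*rows):
    """Immutable set of tuples; each tuple is a set of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)

old = relation({"emp": "Ann", "dept": "Sales"},
               {"emp": "Bob", "dept": "IT"})
deleted = relation({"emp": "Bob", "dept": "IT"})
inserted = relation({"emp": "Cy", "dept": "IT"})

# The update expressed as MINUS followed by UNION; `old` itself is untouched.
new = (old - deleted) | inserted

# The tuples that 'persist' are exactly those in both the old and new sets.
persisting = old & new

assert persisting == relation({"emp": "Ann", "dept": "Sales"})
assert len(old) == 2  # the input set still exists as a value
```

    Nothing here discards `old`, `deleted` or `inserted`; whether a DBMS keeps them around is a separate implementation decision.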

  4. (I hope my comments won't disagree with anything David McGoveran has written, his explanations being usually much deeper/more fundamental than I could manage.)

    It should be obvious that language devices such as relvars/tables often represent more than one relation, for example a relation that is a join will often have proper subsets that are also joins. Those subsets would have distinct predicate extensions and therefore different intensions from those of the value of a relvar or table. The most obvious exception would be in systems where every relvar reference specifies a key value, so that every referenced relation is a singleton and so has no subsets. (I'd guess anybody who met Codd would know what a stickler he was for keys.)

    Beyond that, tuples of some relations such as joins can be projections of others.

    When a relvar has only one predicate (supposedly) but can represent multiple relations, each with a different predicate, it looks to be very tricky, if it's possible at all, to use relvars or tables to explain relational theory, which is to say to explain database behaviour, which is to say the meaning of schemas. In other words, it means using an implementation to explain everything else rather than using the theory to explain an implementation. (This hasn't stopped SQL systems from forcing users to use base tables for all updates, even when the reference manuals call such updates 'view updates'! The SQL updating situation is exactly the same as it was/is in file systems: every file update must be individually specified by users before the system can be 'correct'. So SQL systems should more accurately be called advanced file systems. It also didn't stop one of the System R developers from claiming in 1981 that data independence had been achieved!) The result is not only not relational but ignores the overall database meaning and consistency.

  5. Regarding there being no 'license' for mythology/fairy tales to vary relations, this could be a typical coder's view or even a physical view. Set operations allow differences between relations to be expressed, therefore they allow the expression of output relations that vary from the input relations. Most language implementations discard the inputs and differences after the expressions have been evaluated. McGoveran's view updating chapter gives a concise update definition/vary definition expressed as equations using set operators on inputs and differences/'transforms' and doesn't dictate anything about inputs or differences being discarded. (Any discarding would be an implementation choice, not a definition choice.)