Sunday, August 12, 2012

Type vs. Domain and Class

BS: I wish to interject a few specific points, mainly in response to the writings of Date.

1. Our terminology is broken beyond repair.

It seems that Date is largely in agreement with me about this.  Now, to make myself a pedantic nuisance, I shall point out some problems with Date's use of terminology, specifically in two cases.

"type" = "domain"

I fully understand why one might equate "type" and "domain", but there are two problems with this.  Firstly, according to their original mathematical definitions, "type" is very different from "domain". Specifically, the old-school mathematical use of "type" corresponds more to the modern term "function signature" than anything else. Secondly, in today's programming practice, "type" and "domain" are quite different.  The word "type" is largely tied to system-level (or "physical"-level) definitions of data, while a "domain" is thought of as an abstract set of acceptable values.

"class" != "relvar"

Historically, the term "class" was created to deal with the problem of "over-comprehensive sets" (e.g., Russell's Paradox).  In simple terms, the word "class" applied to a collection of values allowed by a predicate, regardless of whether such a collection could actually exist.  Every set has a corresponding class, although a class may have no corresponding set.  The relevance of these historical facts is that according to the terminology of mathematical logic, a "relation" *is* a "class" (and trivially also a "set"), which contributes to confusion.

In modern programming parlance "class" is generally distinguished from "type" *only* in that "type" refers to "primitive" (system-defined) data definitions while "class" refers to higher-level (user-defined) data definitions. This distinction is almost arbitrary, and in some contexts, "type" and "class" are actually synonymous.

I am not claiming that Date is wrong in his use of the terms--he's completely justified.  I merely wish to point out that most of the terms he uses are tangled in so much confusion that it will always be difficult to avoid disputes over whatever Date says, since somebody can always produce an argument against him (reasonable or otherwise) on plausible semantic grounds.  This is certainly not Date's fault, but it is something that he (as well as all of us) has to cope with.

Sadly, I have no solution for this problem except maybe that we should invent new, unambiguous words (like "relvar") to replace the words "type", "domain", "class", "object", "function", etc.  In the meantime, I recommend a little less assertion of the idea that "type" = "domain", because it can be both completely correct and completely incorrect depending on the understanding of each term, so the phrasing of this assertion confuses many readers and distracts from important concepts.

C. J. Date: This is good stuff!  Proper discussion of all the points Stevens raises requires more space and time than I have available (I agree with Stevens on this issue, too), so I'll limit my responses to a few specific issues and points.  Section titles and numbering follow Stevens's original.

1. Our terminology is broken beyond repair.

This might be true.  The cause of understanding and communication isn't helped, either, when people take well-understood terms and change their meaning ... I was recently informed that a certain well-known commercial SQL product uses the term "declarative" (as in, e.g., "declarative integrity constraint") to mean "stated but not enforced"!

That said, we're at liberty (following Chesterton, I think it was) to use terms any way we like, just so long as we're clear as to what we mean by them.  And that said too, it's better to follow rather than to flout convention in our usage ... I can't agree that "type" and "domain" refer to such different things "in today's programming practice," as Stevens claims; at least I don't make the distinction that Stevens alludes to, and I know others who don't either.  (Even SQL doesn't!--though appealing to SQL as a precedent for anything is usually contraindicated.)  As for "class" and "relvar," Stevens says a relation is a class (at least in the mathematical sense, if not the OO sense).  But I didn't claim otherwise; what I claimed was that a relvar wasn't a class.  A relvar is a variable, a class (at least in the OO sense) is a type, and variables aren't types.
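Date's point that a relvar is a variable, while a class (in the OO sense) is a type, can be made concrete with a small Python sketch. The `Relation` and `RelVar` names below are hypothetical and are not taken from Date or from any DBMS. A relation is modeled as an immutable value (a heading plus a set of tuples), and a relvar as a named holder whose value is replaced wholesale by assignment.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Relation:
    """A relation *value*: a heading plus an immutable set of tuples
    (hypothetical minimal representation, for illustration only)."""
    heading: Tuple[str, ...]
    body: FrozenSet[Tuple[object, ...]]

    def __post_init__(self) -> None:
        # Every tuple must have one component per attribute in the heading.
        for t in self.body:
            if len(t) != len(self.heading):
                raise ValueError(f"tuple {t!r} does not match heading {self.heading!r}")


class RelVar:
    """A relation *variable*: its value changes over time, but the relvar
    itself is not a type."""

    def __init__(self, name: str, heading: Tuple[str, ...]) -> None:
        self.name = name
        self.heading = heading
        self.value = Relation(heading, frozenset())  # starts out empty

    def assign(self, new_value: Relation) -> None:
        # Relational assignment: replace the whole value with another
        # relation value of the same heading.
        if new_value.heading != self.heading:
            raise TypeError("value's heading differs from the relvar's heading")
        self.value = new_value


# Usage: one relvar holds successive relation values.
emp = RelVar("EMP", ("ENO", "ENAME"))
v1 = Relation(("ENO", "ENAME"), frozenset({(1, "Smith")}))
emp.assign(v1)
v2 = Relation(("ENO", "ENAME"), frozenset({(1, "Smith"), (2, "Jones")}))
emp.assign(v2)  # v1 itself is unchanged; only the relvar's value changed
```

In Python terms, `Relation` is a class and therefore a type, whereas the `RelVar` instance `emp` merely holds successive relation values and is not itself a type.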

Overall, I think all Stevens is doing in this section is pointing out how difficult it all is.  I agree.


UPDATED (7/13/12)

It so happens that I recently revised my paper Business Modeling for Database Design, and the issues of (1) type vs. domain and (2) type vs. class came up.

Business modeling suffers from some linguistic limitations that constrain modeling terminology and make it necessary to distinguish it from logical database design terminology. I use property, attribute, entity and class as real-world business concepts, and domain, column, row and R-table as database concepts, albeit relational versions of these concepts. Type has a dual use. And since "we're at liberty to use terms any way we like, just so long as we're clear as to what we mean by them", I think I define them as clearly as possible, given that the business concepts are linguistic primitives (another limitation of terminology).

I consider logical-to-business 1:1 conceptual mapping (formalizing the informal) very important for understanding and practicing a sound modeling and design methodology.

Regarding (1), Codd used the term domain explicitly to distinguish it from programming type. In his book THE RELATIONAL MODEL VERSION 2, one of the several reasons he gave was, if I recall correctly, that domains are under DBMS control, whereas programming types are under application/programmer control. He gave several others, which I don't recall.

Furthermore, as David McGoveran points out, there are two distinct type concepts: (a) a type that defines what is permissible for a typed object, and (b) a type that defines the typed object itself. I use the term in the former sense; C. J. Date uses it in the latter sense. In my sense every set has a type, defined by the specification of the criterion for membership of an element in the set. Domains are sets of values; therefore, to belong to a domain's value set, values must be of a type.
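Sense (a), a type as the specification of a membership criterion with the domain as the set of values meeting it, can be sketched in a few lines of Python. `MembershipType`, `make_domain` and the PERCENT criterion are illustrative assumptions, not taken from McGoveran, Codd or any DBMS, and sense (b), Date's type as the typed object itself, is not modeled here.

```python
from typing import Callable, FrozenSet, Iterable


class MembershipType:
    """Sense (a): a type as a named specification of the criterion a value
    must satisfy to be admissible (hypothetical representation)."""

    def __init__(self, name: str, criterion: Callable[[object], bool]) -> None:
        self.name = name
        self.criterion = criterion

    def admits(self, value: object) -> bool:
        return self.criterion(value)


def make_domain(t: MembershipType, candidates: Iterable[object]) -> FrozenSet[object]:
    # A domain as a set of values: exactly those candidates the type admits.
    return frozenset(v for v in candidates if t.admits(v))


# Illustrative criterion (an assumption, not from the source): whole-number
# percentages from 0 to 100. bool is excluded because it subclasses int.
percent = MembershipType(
    "PERCENT",
    lambda v: isinstance(v, int) and not isinstance(v, bool) and 0 <= v <= 100,
)

# Enumerate a finite pool of candidates; only the admitted ones form the domain.
PERCENT_DOMAIN = make_domain(percent, list(range(-10, 111)) + [50.5, "50"])
```

The criterion exists independently of any enumeration. The finite candidate pool is only there so the resulting value set can be materialized.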

As to (2), quoting from a good old logic book (emphasis mine):
The distinction between properties and classes, however, is somewhat artificial and tenuous, since every property defines a class—namely, the set of individuals possessing that property—whereas every class is a class simply by virtue of the fact that its members have common defining properties.--Olson, R. G., MEANING AND ARGUMENT: ELEMENTS OF LOGIC (Harcourt, Brace & World, 1969)
Tenuous, but nevertheless a distinction, which is why there are two concepts.
So if we, loosely (and to keep it simple), consider the possession of common properties to be the criterion for membership of entities in a class--i.e., the entity type--then that criterion is distinct from the class, the set of entities itself. In this sense an R-table represents a class by virtue of its rows representing a set of entities (or true propositions about them) of the same type.
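The distinction between the entity type (the membership criterion) and the class (the set of entities meeting it) can be illustrated with a short Python sketch. The entities, the `EMPLOYEE_TYPE` criterion and the helper names are hypothetical. As in the text, "type" is reduced loosely to possession of a common set of defining properties.

```python
from typing import Dict, FrozenSet, List, Tuple

Entity = Dict[str, object]

# Hypothetical entities, each described by its properties (name -> value).
entities: List[Entity] = [
    {"id": "E1", "name": "Smith", "salary": 50000},
    {"id": "E2", "name": "Jones", "salary": 42000},
    {"id": "S1", "name": "Acme", "city": "Paris"},
]

# Entity type: the criterion for class membership, taken loosely here as
# possession of a common set of defining properties.
EMPLOYEE_TYPE: FrozenSet[str] = frozenset({"id", "name", "salary"})


def has_type(entity: Entity, entity_type: FrozenSet[str]) -> bool:
    # The entity is of the type if it has every defining property.
    return entity_type.issubset(entity.keys())


def members_of(entity_type: FrozenSet[str], population: List[Entity]) -> List[Entity]:
    # The class: the entities in the population that satisfy the criterion.
    return [e for e in population if has_type(e, entity_type)]


def r_table(entity_type: FrozenSet[str], population: List[Entity]) -> FrozenSet[Tuple[object, ...]]:
    # An R-table representing the class: one row per member, one column per
    # defining property (columns in sorted order, an arbitrary choice here).
    columns = sorted(entity_type)
    return frozenset(tuple(e[c] for c in columns) for e in members_of(entity_type, population))


EMPLOYEES = r_table(EMPLOYEE_TYPE, entities)
```

`EMPLOYEE_TYPE` (the criterion) and `EMPLOYEES` (the rows standing for the member entities) are separate objects, mirroring the type/class distinction. Using a frozenset for the rows also reflects that a relation has no duplicate rows.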

(Originally posted on 2/11/05)
