Sunday, August 12, 2012

Domain vs. Type, Class vs. Relation

What's wrong with last week's picture (update of August 2012)
"Our terminology is broken beyond repair. [Let me] point out some problems with Date's use of terminology, specifically in two cases.
  1. "type" = "domain": I fully understand why one might equate "type" and "domain", but ... in today's programming practice, "type" and "domain" are quite different. The word "type" is largely tied to system-level (or "physical"-level) definitions of data, while a "domain" is thought of as an abstract set of acceptable values.
  2. "class" != "relvar": In simple terms, the word "class" applies to a collection of values allowed by a predicate, regardless of whether such a collection could actually exist. Every set has a corresponding class, although a class may have no corresponding set ... in mathematical logic, a "relation" is a "class" (and trivially also a "set"), which contributes to confusion.
In modern programming parlance "class" is generally distinguished from "type" only in that "type" refers to "primitive" (system-defined) data definitions while "class" refers to higher-level (user-defined) data definitions. This distinction is almost arbitrary, and in some contexts, "type" and "class" are actually synonymous."
With respect to 1, well, yes, they are distinct, but not for the stated reason. With respect to 2, well, no, at least insofar as "programming parlance" goes. The terminology introduced by Codd was explicitly intended to distinguish formal concepts drawn from set theory and first order predicate logic from the terminology used in programming practice.

1. Domain vs. (Data) Type

"The theory behind data types in most programming languages is based on abstract data types, but programmers hardly ever use the term in this way and languages are rarely strong in this regard. The need for a formal theory (of abstract data) and the semantics of types was not addressed by either Codd or the current RDM interpretation. Codd's treatment of types was greatly simplified and its understanding in the current interpretation of the RDM is at best simplistic. An adequate treatment of the subject is beyond the scope of this discussion and will be addressed in Part III of LOGIC FOR SERIOUS DATABASE FOLKS". --David McGoveran
For our purposes here suffice it to say that type is used in two senses:
(a) Extensionally i.e., type denotes a specific set of typed object(s), which define the type;
(b) Intensionally i.e., type defines what is and is not permissible for a typed object.
Both relational domains (which Codd called "extended types") and programming data types are types in the (a) sense: sets of values within a specified range to which certain operations are applicable. In his book THE RELATIONAL MODEL VERSION 2, Codd lists several differences between them: domains represent entity properties in the real world and are under DBMS control, while programming data types are under programmer/application control and do not necessarily represent anything in reality.
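
The two senses of type in (a) and (b) can be made concrete with a small sketch. Everything in it is hypothetical and chosen only for illustration: the "day of week number" type, the names DAY_OF_WEEK_EXT and is_day_of_week, and the finite universe of candidate values. It is not how any DBMS or language defines types, just one way to see the extensional and intensional views side by side.

```python
# Hypothetical example: a "day of week number" type, described two ways.

# (a) Extensional: the type is given by enumerating its values.
DAY_OF_WEEK_EXT = frozenset({1, 2, 3, 4, 5, 6, 7})


# (b) Intensional: the type is given by a rule stating what is permissible.
def is_day_of_week(value):
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return 1 <= value <= 7


# Over a finite universe of candidate values, the rule picks out a set
# that can be compared with the enumerated one.
candidates = range(-3, 12)
induced = frozenset(v for v in candidates if is_day_of_week(v))

print(induced == DAY_OF_WEEK_EXT)
```

The comparison holds here only because the assumed universe is finite and can be examined value by value.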

2. Relation vs. Class

"Whatever type and class are in "modern programming parlance", the meanings of class in set theory (vs. any other usages) should not be confused with how it is popularly used in programming or--for that matter--in the database literature (class vs. type is another good example of such confusion).

The distinctions between class and set vary with the specific version of set theory. To avoid problems, we will use the most broadly applicable definitions that will still apply to usages relevant to relational database theory and will try to:
1. be precise about how we use the terms;
2. identify the subject areas to which the definitions do not apply." --David McGoveran
In the real world
"...every property defines a class--namely, the set of [entities] possessing that property--whereas every class is a class simply by virtue of the fact that its members have common defining properties."--MEANING AND ARGUMENT: ELEMENTS OF LOGIC
In other words, entities are members of a class by virtue of common properties, and when we say they are of the same type, we use type in the (b) sense.
"The definition of a class is intensional--it is a statement of the properties that distinguish members of the class from non-members. When applied to a particular universe of entities, a class definition selects out those that are members of the class. If the universe is well defined--a collection of entities in which each can, in principle though perhaps not in practical terms, be examined--the result is a set. Mathematicians say that a class over a universe "induces" a set. If one defines a class, one must then "compute" the set that is induced when that class definition is applied to a particular universe." --LOGIC FOR SERIOUS DATABASE FOLKS
At the class level, by properties we mean:
  • Individual properties shared by entities that are class members;
  • Properties arising from relationships between individual properties;
  • Properties arising from relationships among all class members collectively.
There are also multi-class properties arising from relationships among two or more classes.

Note that while this seems to contradict the claim quoted above that a class exists "regardless of whether such a collection could actually exist", it does not, because of the caveat regarding a "well defined" universe: if the collection could not actually exist, the universe is not well defined as required.

Conceptual modeling consists of specifying these relationships in natural language as informal business rules. Those rules correspond to a formal predicate that expresses the class i.e., they comprise the intensional definition of each class of interest. When applied to a universe of entities, the class induces a set of class members, facts about which are to be recorded in the database.
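
As a rough illustration of a class "inducing" a set, the sketch below applies an intensional definition to a small, fully examinable universe. The Entity structure, the sample business rule ("employed by Acme with a positive salary") and the function names are all assumptions made up for this example; real business rules would of course be richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    # Hypothetical properties of entities in the universe.
    name: str
    salary: int
    employer: str


def is_acme_employee(e):
    """Intensional class definition: the properties that members share."""
    return e.employer == "Acme" and e.salary > 0


def induce(class_predicate, universe):
    """Apply a class definition to a well defined universe, yielding a set."""
    return frozenset(e for e in universe if class_predicate(e))


# A well defined universe: every entity can be examined.
universe = [
    Entity("Ann", 5000, "Acme"),
    Entity("Bob", 4000, "Initech"),
    Entity("Cy", 0, "Acme"),
]

members = induce(is_acme_employee, universe)
print(sorted(e.name for e in members))  # ['Ann']
```

The class definition itself never changes; applying it to a different universe would "compute" a different set.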

A relation is, thus, a set of tuples that represent in the database facts about the set of entities induced by the class. Every relation is associated with a relation predicate (RP)--the conjunction of integrity constraints that represent the business rules in the database. The RP represents formally in the database the intensional class definition (that was informally expressed by the business rules). When applied to a universe of entities, that RP induces the relation and serves as its membership function. The relation's tuples--its extension--satisfy that RP. This is another way of saying the tuples in a relation represent facts about a set of entities of the same type i.e., an RP is a relation type and a tuple type specification statement.
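
A minimal sketch of an RP as a conjunction of constraints acting as a membership function follows. The attribute names (emp_id, salary, bonus) and the three constraints are hypothetical, chosen to mirror the three kinds of class-level properties listed earlier: an individual property, a relationship between properties, and a property of the members collectively. Tuples are modeled as plain dictionaries purely for brevity.

```python
# Each constraint takes (tuple, relation) so that it can express both
# individual properties and properties of the members collectively.

def salary_in_domain(t, relation):
    # Individual property: salary is drawn from a domain of positive integers.
    return isinstance(t["salary"], int) and t["salary"] > 0


def bonus_within_salary(t, relation):
    # Relationship between two properties of the same entity.
    return t["bonus"] <= t["salary"]


def emp_id_is_key(t, relation):
    # Collective property: no two tuples share an emp_id.
    return sum(1 for u in relation if u["emp_id"] == t["emp_id"]) == 1


CONSTRAINTS = (salary_in_domain, bonus_within_salary, emp_id_is_key)


def relation_predicate(t, relation):
    """The RP: the conjunction of all constraints, evaluated for one tuple."""
    return all(c(t, relation) for c in CONSTRAINTS)


def satisfies_rp(relation):
    """True when every tuple in the extension satisfies the RP."""
    return all(relation_predicate(t, relation) for t in relation)


employees = [
    {"emp_id": 1, "salary": 5000, "bonus": 500},
    {"emp_id": 2, "salary": 4000, "bonus": 0},
]
print(satisfies_rp(employees))  # True

# Adding a tuple with a duplicate emp_id violates the key constraint.
bad = employees + [{"emp_id": 2, "salary": 3000, "bonus": 100}]
print(satisfies_rp(bad))  # False
```

In an actual DBMS these rules would be declared as integrity constraints in its data language rather than coded procedurally; the sketch only shows the logical shape of the conjunction.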

Note very carefully that:

"Translating business rules into a formal first order predicate (let alone expressing it as integrity constraints in any DBMS-specific data language) is a big step that casts the die. There is no way to know you've done it incorrectly, except that you decide you are unhappy with the results--that the formalism doesn't produce something you think it should produce, or produces something you think it should not (usually detected by translating the constraints backwards and comparing to reality). We can minimize the likelihood of a bad modeling effort by following a careful methodology, but we must not confuse the conceptual with its formal representation, the former being the choice of subject matter and latter being the result of a choice of formalism." --LOGIC FOR SERIOUS DATABASE FOLKS
I shudder at comparing database practice to this recommendation.

Note also that, following Codd, we refer to relations rather than relvars.

"...set semantics do not have the concept of a computer variable to which values can be destructively assigned (or "updated") ... [such] variables can be expressed in certain systems of logic, but they cannot be expressed in elementary set theory, or first order predicate logic. Other, more expressively powerful systems are required. Unfortunately, such powerful formal systems do violence to the relational data model and its intended benefits." --LOGIC FOR SERIOUS DATABASE FOLKS
which is perhaps why Codd avoided relvars by using the term "time-varying relations" instead. His choice seems to skirt the need for such powerful formal systems, while relvars--which introduce the semantics of computationally complete programming languages and the higher logic that they entail--embrace it.
