Saturday, February 16, 2019

Class, Type, Set, Relvar, and Relation




Note: This is a rewrite of a part of an older post (now redirecting here), to bring into line with McGoveran's formalization, re-interpretation, and extension of Codd's RDM[1] (the rewrite of the other part was posted last week).
“[According to Date] relvar ≠ class. [But i]n simple terms, class applies to a collection of values allowed by a predicate, regardless of whether such a collection could actually exist. Every set has a corresponding class, although a class may have no corresponding set ... in mathematical logic, a relation is a class (and trivially also a set), which contributes to confusion.”

“In modern programming parlance, class is generally distinguished from type only in that the latter refers to primitive (system-defined) data definitions, while class refers to higher-level (user-defined) data definitions. This distinction is almost arbitrary, and in some contexts, type and class are actually synonymous.”
Class, type, and set are often used interchangeably in the industry. Relations are neither class, nor type, and Date's relvars must be placed properly in their formal context. While details regarding these concepts vary with the flavor of set theory, they are sufficiently well defined to be distinguishable in each of the three formal foundations of the RDM, simple set theory (SST), mathematical relation theory, and first order predicate logic (FOPL).



------------------------------------------------------------------------------------------------------------------
SUPPORT THIS SITE 

Up to 2018, DBDebunk maintained and kept free with the proceeds from my @AllAnalitics column. In 2018 the website was has been discontinued. The content is not available anywhere else, and if you deem it useful, particularly if you are a regular reader, please ensure its continuation and free availability by supporting it with as much as you can afford via purchases of publications, or donations. Thank you.

LATEST PUBLICATIONS 



·    Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING OF THE REAL RDM series, is available for ordering here.

·    Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING OF THE REAL RDM series, is available for ordering here.

SOCIAL MEDIA 

I deleted my Facebook account. You can follow me on Twitter:

  • @dbdebunk: will contain links to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture, which I am bringing back.
  • @ThePostWest: will contain evidence for, and my take on the spike in Anti-semitism that usually accompanies existential crises. The current one is due to the decadent decline of the West and the corresponding breakdown of the world order.
HOUSEKEEPING

  • NEW! Updated the LINKS page. NEW!
  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 
------------------------------------------------------------------------------------------------------------------

Class, Type, and Set


“... every property defines a class -- namely, the set of individuals possessing that property -- whereas every class is a class simply by virtue of the fact that its members have common defining properties.”[2]

Given a universe of objects, the definition of a class specifies the properties that distinguish objects that are class members (i.e., that have those properties) from those that are not (i.e., that do not). When applied to a particular universe, a class definition selects out exactly those members, which satisfy that definition. If the universe is well defined (i.e., each object can, in principle, if not in practical terms, be examined), the result is a set (i.e., class membership is derived rather than defined). For such a class the set can be computed -- mathematicians say that a class over a universe induces a set. Hence, every set is a class, but every class is not necessarily a set (e.g., if the universe is not well defined, the set cannot be computed).[1]

Formally, a set can be defined:

  • Intensionally, by specifying the properties required for membership (i.e., the definition of the class inducing it);
  • Extensionally, by enumeration of the members as computed (induced by the class), or designated.

Note: Designation determines a  set, but the class definition is not computable except in the sense that a member either is or is not in the list of designated members (i.e., logically, only yes/no, designated or not, is or isn't a member).
 

The properties required for the corresponding set membership define the type of object the members are. In other words, the intensional definition of a set is a type specification statement.

Note: Type in this sense should not be confused with data type discussed in previous post (see below). Object is not used here in the object-oriented sense.


Entities and Groups


A conceptual model structures a segment of reality as a multigroup of related groups of entities with properties specified by business rules (BR). Entity group members share (1) as individuals, first (1OP) and second order properties (2OP), and (2) collectively, third order properties (3OP) required for membership (i.e., are of the same type)[3]. The 1OPs, 2OPs, and 3OPs required for membership are the defining properties of a group. By virtue of sharing defining properties, entity members of a group are of the same type.

Note: We ignore fourth order properties (4OP) arising from relationships among groups in the multigroup for the purposes of this discussion.

At any time, for a type defined by a class there can be, in a universe of entities, more than one collection thereof that share the defining 1OPs, 2OPs, and 3OPs -- the entities in those collections are "potential" group members, one of which is designated as the "actual" group[4]. Membership of the designated group varies over time, but members always share the defining properties. Otherwise put, those properties define a class of entities, and every designation induces a time-specific set thereof.

For example, an enterprise has established some properties that programmers must have, individually and collectively, to qualify for employment. Out of a universe of programmers available for hire, there may be more than one collection thereof that satisfies the requirements, one of which is hired (i.e., designated as a set of employees). A designation occurs each time employed programmers leave, either because (1) they do no longer meet defining properties (lack of continuing education, bad moral behavior, insubordination, etc.), or (2) lose the designation (are fired without cause); or new ones are hired.



Class = Relation Variable?


Database relations are mathematical relations adapted for, and applied to database management[5]. A database relation is a set of tuples with an interpretation assigned by the conceptual model it represents -- a set of facts about (individual and collective properties of) entity members of a group specified by the model's BRs. Every relation has an associated relation predicate (RP) -- a formalized expression of the BRs -- which, when expressed in a DBMS-specific FOPL based data sublanguage, is enforceable by the DBMS to constrain the relation for consistency with its interpretation. 1OPiCs (i.e., 1OPs associated with entities of a type) are represented by attributes defined on domains, 2OPs and 3OPs by constraints[6,7].

In the temporal context, however, each relation is, in effect, a time-series of relations of the same type, (i.e., constrained by the same RP), each representing a time-specific designated entity group. On the one hand, the RP represents formally in the database a class definition, while on the other it constrains all time-specific relations, which can be viewed as a sequence of time-ordered values of a relation variable (relvar) that has those relations as values. It is in this sense that a correspondence of class and relvar may be interpreted. Nonetheless, given the requirement that the successive values of a relvar may have to satisfy transition constrains, if any, this interpretation is still a bit of a gloss, so strictly speaking Date is correct.


However, Codd never used anything like relvars, instead introducing the necessarily informal notion of "time-varying relations":
“A database so structured will then consist of two parts: a regular part consisting of a collection of time-varying relations of assorted degree (sometimes called the extension) and an irregular part consisting of predicate logic formulas that are relatively stable over time (this is sometimes called the intension, although it may not be what the logicians Russell and Whitehead originally intended by this word).”[8]
We contend that Codd recognized that set semantics does not have the concept of a variable with values that can be updated (i.e., destructively assigned) -- such variables cannot be expressed in SST or FOPL. More expressively powerful formal systems are necessary that require computationally complete languages (CCL), which:
  • Do not support data independence (physical and logical)[9];
  • Do not support system-guaranteed correctness -- logical validity and semantic consistency[10]; and,
  • Are procedural and, thus, not decidable.

While more expressively powerful than FOPL, every algorithm in a CCL must be separately tested for correctness and termination, since no general algorithm for correctness or termination can exist. Thus, a CCL would do violence to the RDM, and rob it of key advantages. Codd skirted these issues with the informal term, but in so doing he assumed that the time variance could be handled "under the covers" using set semantics -- a formally unproven assumption.


“The concept of an updatable relation is a little odd. A relation is a set. Relational theory, and so the RDM, is a particular expression of standard finite set theory and first order predicate logic. These two formal systems are compatible in the sense that standard finite set theory, its axioms and its theorems can be expressed using the language of first order predicate logic. The semantics (i.e., the valid interpretations of the theory) do not include any concept of an object than can change, nor can any symbol in the object language be given a time-varying instantiation.

“Yet the RDM would be of little practical use if we had no notion of updating a relation. So how do we resolve this apparent inconsistency? The answer requires understanding that the notion of an updatable relation is a simplifying gloss that hides considerable set theoretic detail. Unfortunately, the informal terminology in common use -- and introduced by Codd -- suggests formal expressibility (expressive power of the formal language) of which relational theory per se is simply incapable. Yet we must be careful not to be led down the primrose path suggested by the terminology, and import non-first order languages into the RDM.”[1]

Hence FOPL-based relational data sublanguages responsible strictly for data manipulation and constraint expression, devoid of explicit relvar semantics, and properly hosted by CCLs -- properly meaning no FOPL violations [11,12,13].

Note: Updatability in the relational context is well beyond the scope of this discussion; the reader is referred to [1]. Here's (slightly rearranged) the relational concept of an update, from chapter 20 (draft):

“Practitioners tend to think of, rather informally and imprecisely, of an update as an input set and output set, or even more commonly as the initial state of the relation and the final state of the relation. But in set theory and, therefore, in a relational database, updates are set transformations. That is to say, what we informally call an update is actually a way of deriving (i.e., selecting) a new set from an existing (i.e., given) set. The new set does not replace the old set -- concepts such as "replacement" or "assignment" lie outside standard set theory. Put another way, an update is realized in set theory as a mapping between two [pre-existing] sets.”

The details are technical. The book (and the chapter in particular) is a must read to understand that any relationship between industry's "understanding of the RDM" and the real thing is purely coincidental.

Type and Data Type


There are two distinct type concepts:
  • A type that defines what is permissible for a typed object: in this sense, every object has a type defined by the defining properties;
  • A type that defines the typed object itself: a data type (discussed in last week's post) is a type in this sense.

They should not be confused, as in the second paragraph of the above quote.



Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.



References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming (draft chapters).

[2] Olson, R., MEANING AND ARGUMENT.

[3] Pascal, F., Data and Meaning Part 2: Types of Business Rules.

[4] Pascal, F., Designation Property and Assertion Predicate.

[5] Pascal, F., What Relations Really Are, and Why They Are Important.

[6] Pascal, F., Data and Meaning Parts 1-3.

[7] Pascal, F., What Meaning Means: Business Rules, Predicates, Integrity Constraints and Database Consistency.

[8] Codd, E.F., RM/T: Extending the Relational Model to Capture More Meaning.

[9] Pascal, F., Physical Independence Parts 1-2.

[10] Pascal, F., Logical Validity and Semantic Correctness.

[11] Pascal, F., Logical Symmetric Access, Data Sublanguage, Kinds of Relations, Database Redundancy and Consistency.

[12] Pascal, F., Data Sublanguages, Programming, and Data Integrity.

[13] Natural, Programming, and Data Languages.



No comments:

Post a Comment

View My Stats