Conventional Wisdom and True Relational Features

Sunday, December 10, 2017

Here's what's wrong with last week's picture, namely:

"Per Date’s AN INTRODUCTION TO DATABASE SYSTEMS, Date & Darwen’s DATABASES, TYPES, AND THE RELATIONAL MODEL, and related references, the features of a relational database are values, types, attributes, tuples, relations, relation-valued variables, operators, and constraints.

A type is a set of values and related operators.

An attribute is a name, value, type triple.

A tuple is a set of attributes.

A relation is a set of tuples with a given heading.

A relation-valued variable (known as a relvar) is a persistent variable whose time-varying value is a relation." --Dave Voorhis, Computer scientist; lead developer of Rel, a true relational database system, Quora.com

This is more or less the conventional wisdom, which is nothing like the true relational data model (RDM) envisioned by Codd [1].

Domains, Not [Data] Types

Relations are defined on domains, not types. Codd introduced domains expressly to distinguish them from programming data types (PDTs). Domains are not types; like everything else they have types, and they:

Represent real-world properties of all possible members of an object group (i.e., a class);
Are simple (i.e., have no components meaningful to applications, and have values treated as atomic by the data language);
Are created by the database designer and are under DBMS control;

whereas PDTs do not represent real-world properties, are not necessarily simple, and are created by the application developer and are under application control. [2,3,4]
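For readers who think in code, here is a loose Python sketch of the distinction. It is an illustration under stated assumptions, not Codd's formalism: the Domain class, the CATALOG dictionary and the declare_domain function are hypothetical stand-ins for a DBMS catalog. What it shows is that a domain has a type (here Decimal) but is not that type: a designer-declared predicate narrows the type to the values that represent a real-world property, and membership is checked on the DBMS side rather than by application code.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable


@dataclass(frozen=True)
class Domain:
    """Hypothetical DBMS-side domain: a named set of permissible values.

    The domain HAS a type (value_type) but is not that type: the
    predicate restricts the type to values representing a real-world
    property of an object group.
    """
    name: str
    value_type: type
    predicate: Callable[[object], bool]

    def contains(self, value: object) -> bool:
        # A value belongs to the domain only if it is of the underlying
        # type AND satisfies the designer-declared predicate.
        return isinstance(value, self.value_type) and self.predicate(value)


# Hypothetical catalog: domains are declared by the database designer and
# registered with (and enforced by) the DBMS, not by application code.
CATALOG: dict[str, Domain] = {}


def declare_domain(domain: Domain) -> None:
    if domain.name in CATALOG:
        raise ValueError(f"domain {domain.name} already declared")
    CATALOG[domain.name] = domain


# Assumed rule for illustration: compensation is a non-negative amount.
declare_domain(Domain("COMPENSATION", Decimal, lambda v: v >= 0))
```

With this sketch, CATALOG["COMPENSATION"].contains(Decimal("1500.00")) is true, while a negative amount, or a float such as 1500.0 that is not of the underlying type, is rejected.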

Attributes Are Representations of Domains

An attribute is a subset of a domain that constitutes the range of a simple function (i.e., a function with an inverse, a 1:1 mapping) on that domain. It is interpreted as a representation of the domain in a specific real-world context, namely a first order property (1OP) of actual members of an object group (e.g., attributes SALARY and COMMISSION are distinct representations of the domain COMPENSATION). This obviates the need for Date's possible representations (possreps): e.g., if COMPENSATION is a domain derived from the primitive domain MONEY defined in US$, the two attributes can be representations in other currencies. [5]
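To make "range of a simple function" concrete, here is a minimal Python sketch. Assumptions: MONEY is modeled as exact US$ amounts using fractions.Fraction, EUR_PER_USD is a made-up fixed rate, and all names are hypothetical. An attribute represented in EUR is the image of domain values under usd_to_eur; because the function has an inverse, no information about the underlying domain value is lost, which is the property the post relies on instead of possreps.

```python
from fractions import Fraction
from typing import Callable, Iterable

# Hypothetical fixed rate, for illustration only (not real market data).
EUR_PER_USD = Fraction(9, 10)


def usd_to_eur(amount_usd: Fraction) -> Fraction:
    """Simple function from the primitive domain MONEY (US$) to a EUR representation."""
    return amount_usd * EUR_PER_USD


def eur_to_usd(amount_eur: Fraction) -> Fraction:
    """Inverse of usd_to_eur; exact because Fraction arithmetic is exact."""
    return amount_eur / EUR_PER_USD


def has_inverse_on(f: Callable[[Fraction], Fraction],
                   f_inv: Callable[[Fraction], Fraction],
                   values: Iterable[Fraction]) -> bool:
    # If f_inv(f(x)) == x for every x, then f is 1:1 on these values, so the
    # representation loses no information about the underlying domain value.
    return all(f_inv(f(x)) == x for x in values)


# Some COMPENSATION values in US$ (hypothetical sample data).
compensation_usd = {Fraction(50000), Fraction(1234, 100), Fraction(0)}

# An attribute represented in EUR: the range of usd_to_eur over these values.
compensation_eur = {usd_to_eur(x) for x in compensation_usd}
```

Fraction is used rather than float so that the round trip is exact; with binary floating point it could fail because of rounding.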

A tuple is a set of values of attributes defined on domains.

Relations Don't Have Headings

Headings are not an element of relations, but of the tabular visualization of relations as R-tables on physical media. Only the bodies of R-tables visualize relations; the headings visualize metadata, not data. [6]
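A minimal Python sketch of the distinction, assuming a tuple is modeled as an unordered set of (attribute, value) pairs and a relation as a set of such tuples; the function names are hypothetical. The heading line only appears when render_r_table lays the relation out for display, from column metadata supplied by the caller, and different layouts of the same relation still compare equal as relations.

```python
from typing import Iterable


def make_tuple(**attrs: object) -> frozenset:
    """A tuple as an unordered set of (attribute, value) pairs."""
    return frozenset(attrs.items())


def make_relation(tuples: Iterable[frozenset]) -> frozenset:
    """A relation as a set of tuples: no row order, no duplicates, no heading."""
    return frozenset(tuples)


def render_r_table(relation: frozenset, columns: list[str]) -> str:
    """Visualize a relation as an R-table.

    The heading line and the column order come from metadata passed in
    by the caller; they belong to this visualization, not to the relation.
    """
    heading = " | ".join(columns)
    # Rows are sorted only to make the output deterministic;
    # the relation itself has no row order.
    body = sorted(" | ".join(str(dict(t)[c]) for c in columns) for t in relation)
    return "\n".join([heading, *body])


emp = make_relation([
    make_tuple(EMP_NO=1, SALARY=50000),
    make_tuple(EMP_NO=2, SALARY=42000),
])
```

Here render_r_table(emp, ["EMP_NO", "SALARY"]) yields the lines "EMP_NO | SALARY", "1 | 50000" and "2 | 42000". Rendering with the columns in the other order produces a different R-table, yet emp is unchanged, and listing its tuples in another order, or twice, yields the same relation value.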

Time-varying Relations, Not Relvars

Date's introduction of explicit relvar semantics notwithstanding, and although there is something like relation variables "under the covers", none of the theoretical foundations of the RDM has variables whose values can be destructively assigned ("updated").

"Set semantics do not have the concept of a variable with values that can be updated (i.e., destructively assigned). Such variables can be expressed in certain systems of logic, but they cannot be expressed in elementary set theory, or first order predicate logic (FOPL). Other, more expressively powerful formal systems are required. Unfortunately, such systems do violence to the RDM and rob it of its advantages and benefits." --David McGoveran

This is probably why Codd did not include explicit relvar semantics in the data language, and used the informal notion of "time-varying relations" instead. Doing so skirted the more powerful formal systems that would introduce the semantics of computationally complete languages (CCLs), which:

Are undecidable and, therefore, cannot be declarative;
Do not support physical and logical independence [7];
Do not guarantee logical validity and semantic correctness [8];
Are more powerful, but every algorithm must be tested for correctness and termination.
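The contrast between a time-varying relation and a destructively assigned relvar can be illustrated, loosely, in Python. This is a sketch under stated assumptions, not Codd's data language or any DBMS's API: Emp, insert and delete are hypothetical names, and the "time-varying relation" is modeled as an immutable, time-indexed sequence of relation values. Each operation returns a new relation value; no earlier value is ever overwritten.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Emp:
    """Hypothetical tuple type; frozen, so tuples are immutable values."""
    emp_no: int
    salary: int


def insert(r: frozenset, t: Emp) -> frozenset:
    """Return a new relation value containing t; r itself is not modified."""
    return r | {t}


def delete(r: frozenset, pred: Callable[[Emp], bool]) -> frozenset:
    """Return a new relation value without the tuples satisfying pred."""
    return frozenset(t for t in r if not pred(t))


# A "time-varying relation" as a time-indexed sequence of relation values.
s0 = frozenset({Emp(1, 50000)})
s1 = insert(s0, Emp(2, 42000))
s2 = delete(s1, lambda t: t.emp_no == 1)
history = (s0, s1, s2)
```

After both operations, s0 still holds its original single tuple. Whether and how a DBMS keeps such states internally is an implementation matter the sketch does not address.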

References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F., Class, Type, Relation and Domain in Database Management.

[3] Pascal, F., Simple Domains and Value Atomicity.

[4] Pascal, F., First Normal Form in Theory and Practice, Parts I, II, III.

[5] Pascal, F., To Really Understand Integrity, Don't Start with SQL.

[6] Pascal, F., What Relations Really Are and Why They Are Important.

[7] Date, C. J. and McGoveran, D., On View Updating.

[8] Pascal, F., Object Orientation, Relational Database Design, Logical Validity and Semantic Correctness.

6 comments:

Unknown, December 10, 2017 at 11:14 PM
Thanks for your interest in Rel!

You can read more or download it from https://reldb.org
Alain Pereira Toledo, December 12, 2017 at 9:23 AM
A value is an individual constant. Values can be scalar or nonscalar. A scalar value has no user-visible component parts. By definition, a value can't be updated. In the RDM, tuples and relations are nonscalar values.
Every value is of some type. A type is a named set of values, and can be either scalar or nonscalar. So, if there are tuple and relation values, then there are tuple types and relation types.
A variable is a holder for a representation of a value. Variables have a location in time and space and can be updated (the current value of a variable can be replaced by another value). Every variable is of some type at some moment in time. Again, if there are tuple values and relation values, then there are tuple variables and relation variables (relvars).
Alain Pereira Toledo, December 12, 2017 at 12:45 PM
In the RDM, a domain is nothing but a set on which a relation is defined. It is a mathematical concept. Domains are not types, but they do not have types either. If you say that domains have types, then you are saying that sets (domains) have named sets of values (types).
A language (from a computational perspective) must be based on values, types and variables. That is why a data sublanguage must be based on values, types and variables too. In particular, a relational data sublanguage is based on relation values, relation types and relation variables (relvars).
The RDM is a mathematical representation ("relational view of data", as E. F. Codd wrote) of real-world facts. But we need to translate it into a computational representation.
What do you mean by "under the covers"? That is not a formal definition.