Sunday, December 10, 2017

Conventional Wisdom and True Relational Features



Here's what's wrong with last week's picture, namely:
"Per Date’s AN INTRODUCTION TO DATABASE SYSTEMS, Date & Darwen’s DATABASES, TYPES, AND THE RELATIONAL MODEL, and related references, the features of a relational database are values, types, attributes, tuples, relations, relation-valued variables, operators, and constraints.
  • A type is a set of values and related operators.
  • An attribute is a name, value, type triple.
  • A tuple is a set of attributes.
  • A relation is a set of tuples with a given heading.
  • A relation-valued variable (known as a relvar) is a persistent variable whose time-varying value is a relation." --Dave Voorhis, Computer scientist; lead developer of Rel, a true relational database system, Quora.com

This is more or less the conventional wisdom, which is nothing like the true relational data model (RDM) envisioned by Codd [1].



Domains, Not [Data] Types


Relations are defined on domains, not types. Codd introduced domains expressly to distinguish them from programming data types (PDT). Domains are not types; like everything else, they have types, and they:
  1. Represent real world properties of all possible members of an object group (i.e., a class);
  2. Are simple (i.e., have no components meaningful to applications and have values treated as atomic by the data language);
  3. Are created by the database designer and are under DBMS control;
while PDTs (1) do not, (2) are not necessarily, and (3) are created by the application developer and are under application control. [2,3,4]
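
As a rough illustration only (this is not Codd's or McGoveran's formalism), the contrast can be sketched in Python. The `Domain` class, its field names, and the non-negativity rule for COMPENSATION are all assumptions made for the sketch. The point is that a domain has a type and adds a meaning declared once by the designer, whereas a bare programming type such as `Decimal` carries neither.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Callable


@dataclass(frozen=True)
class Domain:
    """Hypothetical model of a domain: it HAS a type, but is not one."""
    name: str                          # real world property represented
    base_type: type                    # the type that the domain's values have
    is_member: Callable[[Any], bool]   # rule declared once by the designer

    def contains(self, value: Any) -> bool:
        # Values are treated as atomic: only membership is checked.
        return isinstance(value, self.base_type) and self.is_member(value)


# Assumed example rule: compensation amounts are non-negative money values.
COMPENSATION = Domain("COMPENSATION", Decimal, lambda v: v >= 0)

print(COMPENSATION.contains(Decimal("5000")))
print(COMPENSATION.contains(Decimal("-1")))
print(COMPENSATION.contains(5000))  # an int is not a Decimal
```

In this sketch the membership rule lives in one place, which loosely mirrors "under DBMS control"; an application that only used `Decimal` would have to repeat the rule itself.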

Attributes Are Representations of Domains


An attribute is a subset of a domain that constitutes the range of a simple function (i.e., a function with an inverse, a 1:1 mapping) on a domain. It is interpreted as a representation of the domain in a specific real world context -- a first order property (1OP) of actual members of an object group (e.g., attributes SALARY and COMMISSION are distinct representations of the domain COMPENSATION). This obviates the need for Date's possible representations (possreps) (e.g., if COMPENSATION is a domain derived from the primitive domain MONEY defined in US$, the two attributes can be representations in other currencies). [5]
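
The 1:1 mapping idea can be pictured with a small Python sketch. Everything named here is an assumption for illustration: the exchange rate, the function names `to_eur` and `to_usd`, and the sample values are not from the post or its references. The sketch only shows that an invertible function lets an attribute's values be a faithful representation of domain values.

```python
from fractions import Fraction

# Assumed, illustrative exchange rate (EUR per US$); not from the post.
EUR_PER_USD = Fraction(17, 20)


def to_eur(usd: Fraction) -> Fraction:
    """Map a COMPENSATION value in US$ to its EUR representation."""
    return usd * EUR_PER_USD


def to_usd(eur: Fraction) -> Fraction:
    """Inverse mapping; it exists because to_eur is 1:1."""
    return eur / EUR_PER_USD


# Made-up sample of domain values, in US$.
compensation_usd = {Fraction(1000), Fraction(2500), Fraction(4000)}

# The attribute's values are the range of the function over those values.
salary_eur = {to_eur(v) for v in compensation_usd}

# Round-tripping recovers every domain value, so nothing is lost.
assert {to_usd(v) for v in salary_eur} == compensation_usd
```

Exact fractions are used instead of floats so that the round trip holds exactly rather than approximately.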

A tuple is a set of values of attributes defined on domains.


Relations Don't Have Headings


Headings are not an element of relations, but of the tabular visualization of relations as R-tables on physical media. Only the bodies of R-tables visualize relations; the headings visualize meta-data, not data. [6]
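
One way to picture this separation, as a sketch under stated assumptions rather than a formal rendering of the RDM: in the Python below, the relation is only a set of tuples, and the heading is a separate piece of meta-data used when drawing the R-table. The names `employees`, `heading`, and `render_r_table`, and the sample values, are all hypothetical.

```python
# Relation body: a set of tuples (data). Sample values are made up.
employees = frozenset({
    ("Smith", 5000),
    ("Jones", 4200),
})

# Heading: kept apart from the relation, as meta-data about it.
heading = ("NAME", "SALARY")


def render_r_table(heading, body):
    """Return a tabular visualization: a heading line plus one line per tuple."""
    lines = [" | ".join(heading)]
    # Sorting is for display only; the set itself has no row order.
    for row in sorted(body):
        lines.append(" | ".join(str(v) for v in row))
    return "\n".join(lines)


print(render_r_table(heading, employees))
```

The heading appears only in the rendered text; it is never a member of `employees`.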


Time-varying Relations, Not Relvars


While there is something like relation variables "under the covers", Date's introduction of explicit relvar semantics notwithstanding, in none of the theoretical foundations of the RDM are there variables with values that can be destructively assigned ("updated").
"Set semantics do not have the concept of a variable with values can be updated (i.e., destructively assigned). Such variables can be expressed in certain systems of logic, but they cannot be expressed in elementary set theory, or first order predicate logic (FOPL). Other, more expressively powerful formal systems are required. Unfortunately, such systems do violence to the RDM and rob it of its advantages and benefits." --David McGoveran
This is probably why Codd did not include explicit relvar semantics in the data language, using the informal notion of "time-varying relations" instead. That choice skirted the more powerful formal systems that would introduce the semantics of computationally complete languages (CCL), which:
  • Are undecidable and, therefore, cannot be declarative;
  • Do not support physical and logical independence [7];
  • Do not guarantee logical validity and semantic correctness [8];
  • Are more powerful, but every algorithm must be tested for correctness and termination. 
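
The difference between a time-varying relation and a destructively assigned variable can be loosely imitated in Python. This is only an analogy under assumptions: Python is itself a computationally complete language, and the names `with_tuple`, `without_tuple`, and `history` are hypothetical. The sketch treats each relation as an immutable value and each change as a set operation that yields a new value, so that "time-varying" means a succession of values rather than one value being overwritten.

```python
from typing import FrozenSet, Tuple

Relation = FrozenSet[Tuple[str, int]]


def with_tuple(r: Relation, t: Tuple[str, int]) -> Relation:
    """Return a new relation value (set union); r itself is left unchanged."""
    return r | {t}


def without_tuple(r: Relation, t: Tuple[str, int]) -> Relation:
    """Return a new relation value (set difference); r is left unchanged."""
    return r - {t}


# Made-up sample data.
r0: Relation = frozenset({("Smith", 5000)})
r1 = with_tuple(r0, ("Jones", 4200))
r2 = without_tuple(r1, ("Smith", 5000))

# Each point in time has its own value; earlier values are not modified.
history = [r0, r1, r2]
```

Here union and difference are ordinary set operations, which is the part expressible in elementary set theory; the bookkeeping of which value is current is what the post places "under the covers".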

References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F., Class, Type, Relation and Domain in Database Management.

[3] Pascal, F., Simple Domains and Value Atomicity.

[4] Pascal, F., First Normal Form in Theory and Practice, Parts I, II, III.

[5] Pascal, F., To Really Understand Integrity, Don't Start with SQL.

[6] Pascal, F., What Relations Really Are and Why They Are Important.

[7] Date, C. J. and McGoveran, D., On View Updating.

[8] Pascal, F., Object Orientation, Relational Database Design, Logical Validity and Semantic Correctness.




6 comments:

  1. Thanks for your interest in Rel!

    You can read more or download it from https://reldb.org

  2. A value is an individual constant. Values can be scalar or nonscalar. A scalar value has no user-visible component parts. By definition, a value can't be updated. In the RDM, tuples and relations are nonscalar values (not scalar values).
    Every value is of some type. A type is a named set of values, and can be either scalar or nonscalar. So, if there are tuple and relation values, then there are tuple types and relation types.
    A variable is a holder for a representation of a value. Variables have a location in time and space and can be updated (the current value of a variable can be replaced by another value). Every variable is of some type at some moment in time. Again, if there are tuple values and relation values, then there are tuple variables and relation variables (relvars).

    Replies
    1. I suggest you re-read the post more carefully. It clearly says that:

      1. Domains HAVE types, but ARE NOT types. So domain values do have types.

      2. There are relvars UNDER THE COVERS, but there must not be EXPLICIT RELVAR SEMANTICS in the data language.

  3. In the RDM, a domain is nothing but a set on which a relation is defined. It is a mathematical concept. Domains are not types, but do not have types either. If you say that domains have types, then you are saying that sets (domains) have named set of values (types).
    A language (from a computational perspective) must be based on values, types and variables. That is why, a data sublanguage must be based on values, types and variables too. In particular, a relational data sublanguage is based on relation values, relation types and relation variables (relvars).
    The RDM is a mathematical representation ("relational view of data", as E. F. Codd wrote) of real-world facts. But we need to translate it into a computational representation.
    What do you mean by "under the covers"? That is not a formal definition.

    Replies
    1. >If you say that domains have types, then you are saying that sets (domains) have named set of values (types).

      Again, read more carefully: there is a difference between DATA TYPE and type. I am using the term in a 2nd sense, not in the 1st.

      >That is why, a data sublanguage must be based on values, types and variables too.

      If it is a CCL, yes. There is a DATA LANGUAGE, and it has a FOPL-based RELATIONAL DATA SUB-LANGUAGE component that expresses only the DATA MGMT FUNCTIONS of the DBMS -- manipulation and integrity. The other DBMS functions are not expressed in the data sublanguage, but in the data language, which is not limited to FOPL. Insofar as the sublanguage is concerned there are no variables, because with them it would no longer be a FOPL-only relational language.

      For the formal exposition of Codd's RDM and its correct interpretation -- as distinct from the current "understanding" in the industry -- you will have to wait for David McGoveran's book.


      A relational data sub-language implements data management functions and

    2. BTW, the RDM is an INTERPRETED formal system. It's ADAPTED set theory and FOPL -- which are purely abstract -- to make it applicable to the real world that databases represent (e.g., math relations do not have keys, domains are ordered, etc.) So you gotta be careful when you use purely math arguments for the RDM -- it is APPLIED, not pure theory.
