Saturday, March 17, 2018

Physical Independence Part 2: Logical-physical Confusion

Note: This is a rewrite of older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.

Revised 3/17/18

(Continued from Part 1)

This is the second part of my response today to an old DBDebunk query:

"You constantly remind us that the relational model is a logical model having no connection to any physical model (so I infer). You also indicate how no commercial product fully implements the relational model. Therefore, how do we make use of the relational model when dealing with the physical constructs of a commercial database program (Oracle, Access, DB2, etc.)?"
In Part 1 I explained physical independence (PI) and claimed that the industry has failed to internalize its importance. Here I provide evidence to that effect and discuss some consequences.

Data Modeling and NoSQL

"To the question “How relevant is data modeling in the world of NoSQL?” I give the following answer.

The main purpose of data modeling is to understand the business, some application domain, some users world. The model becomes a representation of that world -- the "things" in it, the relationships among those things and any constraints on those things or relationships. A secondary purpose is to build a database to contain information which pertains to and describes that domain."

"Generally we speak of the model coming first, then the implementation, and finally, the data gets collected and stored according to the model. Hence, the business data model should not be concerned with issues of physical stored representation, or the transformations/manipulations/constraints which are imposed to facilitate implementation in some data (storage) management system. That could be a relational DBMS, or a NoSQL tool".

" ... increasingly the data already exists in some form. Which leaves us with the task of figuring out what it means, what it represents -- that is, understanding the data as it represents some user domain. NoSQL tools are often designed to deal with existing data and to process it more efficiently (that may be an oversimplification!). Either way, you must understand the business in order to make sense of the data." 

--Gordon Everest

I have written extensively on the three levels of representation and four types of model, and I won't repeat that here -- readers can refresh their memory if necessary [1,2]. Everest's comments are at best ambiguous with respect to the levels and models (e.g., by data modeling he means business modeling, and his "business data model" lumps together business model and data model). It is to avoid such ambiguities and the resulting confusion that I recommend the three-fold terminology of conceptual modeling, logical database design and physical implementation, eschewing data modeling [3]. Here I will rely on my earlier writings to address strictly the issue of data modeling in the NoSQL context raised by Everest.

Monday, March 5, 2018

Physical Independence Part 1: Don't Mix Model with Implementation

Note: This is a rewrite of several older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.

"You constantly remind us that the relational model is a logical model having no connection to any physical model (so I infer). You also indicate how no commercial product fully implements the relational model. Therefore, how do we make use of the relational model when dealing with the physical constructs of a commercial database program (Oracle, Access, DB2, etc.)?" query

Monday, February 26, 2018

Relationships and the Relational Model

Note: This is a rewrite of several older posts (which now link here) to bring them in line with McGoveran's formalization and interpretation of Codd's true RDM [1].

Revised 5/4/18.
"William Kent confesses (in my words) that he can not distinguish between "relationships" and "attributes" ... the later might be completely redundant ... the notion of an attribute presumes a relationship, so we must define that first ... All of this is handled explicitly and correctly in ORM -- we model objects (each one appears only once in a data model diagram) and relationships. There are no attributes ... an attribute is an object playing a role in a relationship with another object."
"... we are not modeling objects/entities/attribute ... at all in the relational model, [but] a bunch of relationships ... hence perhaps Codd was correct in calling it a "relation", a bunch of relationships ... Interesting that most people think of relationships as being the distinguishing characteristic of a relational model and it is not ... [it] has no relationships since Codd decreed that all relationships must be represented by foreign keys, which are exactly the same as "attributes ..."
"... isn't it funny, that the term relation is implicitly mapped (in our minds) to a table of a database? If (loosely speaking) a relationship in our conventional data modeling is represented by a foreign key in a table (and combining both points together) -- should a table (relation) consists only of foreign keys? ... What [other] type(s) of relationships can be explicitly and formally defined in a relational data model? Of course there are many other relationships which can be inferred, such as between an attribute and an entity identifier. Please give me a precise reference to where Codd spoke of relationships [differently than i]n his 1985 piece published in ComputerWorld, [where] he said that the only way to represent a relationship (between relations) was through explicitly stored values (i.e., attributes, foreign keys)."

The lack of foundation knowledge and the inability to reason never cease to amaze me. So, in the RDM "we model a bunch of relationships", but "it has no relationships" because "all relationships must be represented by foreign keys"?

It says something about the state of knowledge in the industry [2] that five decades after the RDM was introduced many data professionals still (1) do not know that relational derives not from "relationships between tables", but from mathematical relations, which contributes to the misconception that (2) "the relational model does not have any other relationships".
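
To make point (1) concrete, here is a minimal Python sketch of a mathematical relation as a subset of the Cartesian product of domains. The domains, names and tuples (EMP_NAMES, DEPT_NAMES, works_in, is_relation) are hypothetical, chosen only for illustration; they are not taken from Codd or McGoveran, and a Python set of positional tuples is only a rough stand-in for a database relation.

```python
from itertools import product

# Hypothetical domains, for illustration only.
EMP_NAMES = {"Smith", "Jones"}
DEPT_NAMES = {"Sales", "R&D"}

# The Cartesian product is the set of all possible (name, dept) pairs.
cartesian = set(product(EMP_NAMES, DEPT_NAMES))

# A relation is any subset of that product: here, the pairs asserted to be true.
works_in = {("Smith", "Sales"), ("Jones", "R&D")}


def is_relation(candidate, *domains):
    """Return True if candidate is a subset of the product of the domains."""
    return set(candidate) <= set(product(*domains))
```

Nothing in the sketch involves a "relationship between tables": the relation itself is the set of related values, which is where the term relational comes from.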

Sunday, February 18, 2018

This Week

1. Database Truth

"The RDM is a formal system. It has two parts. Semantics is outside the formal language (which is Deductive Subsystem), but not outside the interpretation (i.e., application) of that language (Interpretation Subsystem). Without an Interpretation Subsystem there is no possibility of applying the formal system and it remains an abstract game of symbols."

"Codd's 1979 paper described a way to "capture" semantics using the relational formalism. That formalism doesn't tell you how to discover semantics, but if you have them, then he showed (at least to some degree) how to express those semantics relationally."

"Semantics is about applying the RDM to some subject. In effect, what you do is restrict the power of the abstract formalism so that it is more closely aligned with your intended use. In my terminology, that means you:

  • Create axioms (expressed as constraints), limiting the vocabulary to the subject matter (and making it finite and usually fairly small); and,
  • Restrict the possible interpretations that can be used consistently with the resulting subset of the formalism."
--David McGoveran

2. Do You Know What's Wrong With This Picture?

"When someone refers to a relation in a database course, what does that mean?"

"It means that it is time to go to Wikipedia."

"A relation in the context of modeling a problem will include the fields and possibly the identification of fields which have relationships with other relations."

"A relation is a data structure which consists of a heading and an unordered set of tuples which share the same type."

"I was trying to show that, in SQL, a relation is more than just a table. Queries return relations. And within a query, relational math is happening, with many intermediate results, that themselves are relations."

"A relation is an abstract structure which contains a set of attributes, and a relvar is the dataset status in a particular moment of this relation. The first one can be considered as the table definition with columns, and the second one is dataset in this table." 

"Tuples need not have a key (or any way of locating them?) Having tried to answer this question so that I could explain it to my students, I am forced to the conclusion that the theory has nothing whatsoever to do with "data" in the usual sense. Perhaps information Theory would have been a better basis for data systems, rather than Mathematics? Computer applications are notably different from Physics, which math was created to model, and its child, Engineering. I think data was never intended to be "true", it must be useful."
--What is a relation in database terminology?

Sunday, February 11, 2018

The Key to Relational Keys - A New Understanding

Version 3 of paper #4 in the PRACTICAL DATABASE FOUNDATION series is now available to order here. The Key To Relational Keys: A New Understanding is a completely new, re-written version that incorporates David McGoveran's formalization and interpretation of Codd's true RDM and introduces a new perspective. It is distinct from and supersedes all previous versions.


The dual theoretical foundation of the RDM -- simple set theory (SST) expressible in first order predicate logic (FOPL) -- is applicable to database management because it is a theory of unique objects, and objects are unique in the real world that databases represent. In the real world objects are uniquely identified by (1) a combination of one or more defining properties and/or (2) names assigned to them as members of various groups. A database relation is a formal representation of an object group, with tuples representing (facts about) object members and attributes representing their properties and/or names. Relational keys are attributes that represent formally in the database those identifying properties and names.
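
As a rough illustration of the uniqueness idea, the following Python sketch checks whether a combination of attributes distinguishes every tuple in a given set of tuples. The data and the names (EMPLOYEES, is_unique) are hypothetical and are not taken from the paper. Note the assumption built into the sketch: it inspects a single set of tuples, whereas a key is a constraint that must hold for every permissible set, so a passing check here is necessary but not sufficient for an attribute combination to be a key.

```python
def is_unique(tuples, attrs):
    """Return True if no two tuples agree on all of the given attributes."""
    seen = set()
    for t in tuples:
        values = tuple(t[a] for a in attrs)
        if values in seen:
            return False  # two tuples share the same identifying values
        seen.add(values)
    return True


# Hypothetical tuples, each a mapping from attribute name to value.
EMPLOYEES = [
    {"emp_no": 1, "name": "Smith", "dept": "Sales"},
    {"emp_no": 2, "name": "Jones", "dept": "Sales"},
    {"emp_no": 3, "name": "Smith", "dept": "R&D"},
]
```

With this data, emp_no alone distinguishes the tuples and name alone does not, while the combination of name and dept happens to do so for these three tuples only.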

Their fundamental database role notwithstanding, relational keys are poorly understood. 70% of hits are about keys, and misconceptions abound about their necessity, what kinds of keys are relational, their functions, their selection, and so on.

This paper defines and explains:

  • The relational key concept
  • The kinds of relational keys, their properties and functions
  • The formal PK mandate
  • PK selection
  • RDBMS key support

and debunks common misconceptions.

Sunday, January 28, 2018

This Week

1. Database truth of the week

"Relvars introduce a concept of assignment, which has no counterpart in either FOPL or set theory. If you add it to those formalisms you introduce computational completeness, which destroys both decidability (the existence of a general algorithm by which you can determine if an expression is or is not logically valid) and the guarantee that there exists a (query) evaluation procedure that will halt (the existence of a general algorithm by which you can evaluate the truth or falsity of every instantiated predicate expression given those instantiations from any given database). Therefore we must forbid relvars." --David McGoveran

2. What's wrong with this database picture?

"Many data and information modelers talk about all kinds of keys (or identifiers. I'll forego the distinction for now). I hear them talk about primary keys, alternate keys, surrogate keys, technical keys, functional keys, intelligent keys, business keys (for a Data Vault), human keys, natural keys, artificial keys, composite keys, warehouse keys or Dimensional Keys (or Data Warehousing) and whatnot. Then a debate rises on the use (and misuse) of all these keys ... The foremost question we should actually ask ourselves: can we formally disambiguate kinds of keys (at all)? Of all kinds of key, the primary key and the surrogate key gained the most discussion."
"If we take a look at the relational model we only see of one or more attributes that are unique for each tuple in a relation -- no other formal distinction is possible. When we talk about different kinds of keys we base our nomenclature on properties and behavior of the candidate keys. We formally do not have a primary key, it is a choice we make and as such we might treat this key slightly different from all other available keys in a relation. The discussion around primary keys stems more from SQL NULL problems, foreign key constraints and implementing surrogate keys." --Martijn Evers, Kinds of Keys: On the Nature of Key Classifications