Sunday, July 30, 2017

Integrity Is Not Only Referential: DBMS vs Application Enforced Constraints

Note: This is a 07/30/17 rewrite of an 11/11/12 post to bring it in line with McGoveran's formal exposition of Codd's real RDM [1] and its interpretation.

 

There is nothing wrong per se with the question in last week's picture, namely:
"Can I ask whether people make use of the functionality provided by the database to ensure adequate data quality. Secondly do people apply this retrospectively as seems quite reasonable to me when a data problem is identified and the source data cleaned up--to do so could prevent future errors. There appears to be a tension between this sort of implementation and at least a perception of flexibility as database changes would be required should additional allowable values be required." --LinkedIn.com
except that by now such questions should no longer need to be asked. Unfortunately, they are evidence of a lack of foundation knowledge that has persisted in the industry for more than five decades; such knowledge would have obviated them.
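
The perceived tension in the question between DBMS enforcement and flexibility can be made concrete with a minimal sketch. It is not taken from the original post: it uses Python's built-in sqlite3 module as a stand-in SQL DBMS, and the table, column and function names (order_status, orders, build_db, allow_status, try_insert) are hypothetical. The assumption illustrated is that allowable values are themselves recorded as data in a referenced table, so the DBMS enforces the constraint declaratively while a new allowable value is added by an insert rather than by a change to the database design.

```python
import sqlite3


def build_db():
    """Create an in-memory database whose allowable values are stored as data."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this is switched on per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE order_status (status TEXT PRIMARY KEY);
        INSERT INTO order_status VALUES ('open'), ('shipped');
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            status   TEXT NOT NULL REFERENCES order_status (status)
        );
        """
    )
    return conn


def allow_status(conn, status):
    """Add an allowable value: a data change, not a schema change."""
    with conn:
        conn.execute("INSERT INTO order_status VALUES (?)", (status,))


def try_insert(conn, order_id, status):
    """Return True if the DBMS accepts the row, False if it rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, status))
        return True
    except sqlite3.IntegrityError:
        # The constraint is enforced by the DBMS, not by application code.
        return False
```

With this sketch, an order with status 'cancelled' is rejected until allow_status(conn, 'cancelled') has been called, after which the same insert is accepted. No application has to be modified for the rule to hold for every user of the database.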

Sunday, July 23, 2017

This Week

1. Database Truth of the Week

"And [AI] weaknesses there are. Watson requires many months of laborious training, as experts must feed vast quantities of well-organized data into the platform for it to be able to draw any useful conclusions. And then it can only draw conclusions based upon the body of data, or ‘corpus’ (plural: ‘corpora’) that it has been trained on. The ‘well-organized’ requirement is especially challenging for Watson, as unprepared data sets are typically insufficient. As a result, Watson customers must hire teams of expert consultants to prepare the data sets, a time-consuming and extraordinarily expensive process." --Is IBM's Watson a Joke?

Sunday, July 16, 2017

Relations and Relationships Part II

Note: This is a 6/21/17 rewrite of a 4/21/13 post, to bring it in line with McGoveran's interpretation [1] of the true RDM envisioned by Codd. It is the second part of a debunking of a LinkedIn thread (the first part of which was debunked two weeks ago).


Here's what's wrong with last week's picture, namely:
"A conceptual model has no rigorous definition? It is like a sketch of a picture yet to be completed? Or like an outline to a paper to be written or fleshed out? And once the model is rigorously defined, the ad hoc, informal model must be precisely consistent with the underlying model in all its semantics. Are you suggesting that a conceptual model is a precursor to a defined logical (relational) model? Then after the relational model is defined, the conceptual model needs to be a consistent abstraction of the formal logical model. 

What other type(s) of relationships can be explicitly and formally defined in a relational data model? Of course there are many other relationships which can be inferred, such as between an attribute and an entity identifier. Please give me a precise reference to where Codd spoke of relationships [differently than i]n his 1985 piece published in ComputerWorld, [where] he said that the only way to represent a relationship (between entity tables or relations) was through explicitly stored values (i.e., attributes, foreign keys).

What do you mean "Attributes are subsets of domains"? An attribute only exists in the context of a relationship. Something (a domain) is a descriptor of (i.e., is related to) something else (another domain).

What is an "R-table"? What do you mean by a "PICTURE [of a relation]"? There are things and there are views or manifestations/presentations of things. There is the model, and there are various presentations of that model. Is that what you are getting at?"--Gordon Everest, LinkedIn.com

"Do you mean that...relations are defined over types (also known as domains); a type is basically a conceptual pool of values from which actual attributes in actual relations take their actual values. (taken from the SQL AND RELATIONAL THEORY [2009] by Chris Date). I am also not sure about "pointers". Can I define a domain of pointers? There might be an interesting relation over such domain.In addition, what will happen if I define a relation over a set of types, each of which is (another) relation? Lets say that a relation is either defined over types (domains), or defined over a "heading" (or a "definition") of other relations ... and I also try to eliminate identifiers completely". --AT, LinkedIn.com

Sunday, July 9, 2017

This Week

1. Database Truth of the Week

"For the operations of a formal system to have inverses within some specific use of that system (like a specific application):
  • The basic elements must be orthogonal (independent), hence the Principle of Orthogonal Design;
  • The combination of basis elements and operations must be expressive enough to represent every aspect of the subject matter, hence the Principle of Expressively Complete Design;
  • And, at the same time, not so expressive that there is more than one way to express each aspect of the subject matter, hence the Principle of Representation Minimality Design.
The basic elements of a relational database is the relation. Adherence to these principles ensure that there is a unique relational expression for every aspect of the subject matter--either a base relation or a derived relation--and if there are two ways to derive a derived relation, then those two expressions are provably equivalent (i.e., the differences are merely syntax and never meaningful)." --David McGoveran

Monday, July 3, 2017

New Paper: Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency

NEW!!! Paper #2 in the Understanding the Real RDM series. NEW!!!

 

The data management field cannot and will not progress without educated and informed data professionals and users. UNDERSTANDING THE REAL RDM is a new series of papers that offers informal access to the forthcoming McGoveran interpretation of the formal real RDM envisioned by E. F. Codd (EFC), contrasts it with the current understanding that emerged after EFC's death and demonstrates the practical implications of the differences.

If you are a thinking data professional interested in understanding the scientific foundations of data management, as opposed to the "cookbook" industry practices, these unique papers are a must-read. They give you new insights into the RDM--which I call the real data science--and the practical benefits that fail to materialize due to its misuse and abuse promulgated by DBMS vendors, the trade media, "experts" and poor SQL implementations, all of which ignore the formal theoretical grounding from which all its practical benefits derive. You will also learn to minimize the consequences of SQL's deficiencies and to optimize its use.

The series starts with an analysis of the evolution of E. F. Codd's thinking between 1969 and 1980, which will then be used to show how McGoveran's interpretation follows from Codd's RDM vision, and with what consequences. Paper #1, The Interpretation and Representation Of Database Relations, covers the structural component of the RDM as introduced in EFC's two initial 1969-70 papers.

Paper #2 covers the sections of those papers not discussed in Paper #1 (except for the relational algebra, which is deferred to a future paper), as follows:

Table of Contents

Acknowledgement

Preface

Introduction

1. Logical Symmetric Access

2. Universal Data Sub-language
2.1. FOPL vs. SOPL
2.2. Relational Completeness
2.3. Computational Completeness and Hosting

3. Kinds of Relations
3.1. Expressible and Named Relations
3.2. Derived Relations
3.3. Relations with Stored Data

4. Derived Relations and Redundancy
4.1. Database Consistency

5. Database Catalog

Conclusion

References

Both papers are available to order from the PAPERS page.