Sunday, July 23, 2017

This Week

1. Database Truth of the Week

"And [AI] weaknesses there are. Watson requires many months of laborious training, as experts must feed vast quantities of well-organized data into the platform for it to be able to draw any useful conclusions. And then it can only draw conclusions based upon the body of data, or ‘corpus’ (plural: ‘corpora’) that it has been trained on. The ‘well-organized’ requirement is especially challenging for Watson, as unprepared data sets are typically insufficient. As a result, Watson customers must hire teams of expert consultants to prepare the data sets, a time-consuming and extraordinarily expensive process." --Is IBM's Watson a Joke?

Sunday, July 16, 2017

Relations and Relationships Part II

This is a 6/21/17 rewrite of a 4/21/13 post, to bring it in line with McGoveran's interpretation of the real RDM envisioned by Codd. It is the second part of a debunking of a LinkedIn thread; the rewrite of the first part was posted two weeks ago.

The misconception that the RDM represents relationships of only one type--referential--most likely originates with the E/R conceptual modeling approach. That approach assumes an "absolute" distinction between entities (objects) and relationships. The distinction, however, is in the "eye of the modeler": objects, properties and object groups are all, in fact, relationships labeled differently as a matter of subjective, pragmatic convenience. All those relationships expressed as business rules comprising a conceptual model are expressible in a relationally complete FOPL-based data language as integrity constraints enforceable by an RDBMS for consistency with the rules. That neither SQL nor any other current data language can express--nor can the DBMSs based on them enforce--all of them is a deficiency of their implementation, not an RDM weakness.
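To make the point about non-referential relationships concrete, here is a minimal sketch, assuming Python's standard sqlite3 module and SQLite's dialect of SQL. The supplier, part and supply tables and their columns are hypothetical, chosen only for illustration. The sketch declares one referential constraint and one non-referential constraint (a quantity must be positive), and the DBMS rejects rows that violate either. It shows only the small subset of business rules that SQL can declare, not the full range discussed above.

```python
import sqlite3


def build_db():
    """Create an in-memory database with hypothetical supplier/part/supply tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE supplier (sno TEXT PRIMARY KEY, city TEXT NOT NULL);
        CREATE TABLE part (pno TEXT PRIMARY KEY, weight REAL NOT NULL);
        CREATE TABLE supply (
            sno TEXT NOT NULL REFERENCES supplier(sno),  -- referential rule
            pno TEXT NOT NULL REFERENCES part(pno),      -- referential rule
            qty INTEGER NOT NULL CHECK (qty > 0),        -- non-referential rule
            PRIMARY KEY (sno, pno)
        );
        """
    )
    return conn


def violates(conn, sql, params):
    """Return True if the DBMS rejects the statement on integrity grounds."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    db = build_db()
    db.execute("INSERT INTO supplier VALUES (?, ?)", ("S1", "London"))
    db.execute("INSERT INTO part VALUES (?, ?)", ("P1", 12.0))
    insert = "INSERT INTO supply VALUES (?, ?, ?)"
    print(violates(db, insert, ("S1", "P1", 300)))  # consistent with both rules
    print(violates(db, insert, ("S9", "P1", 100)))  # no such supplier
```

Both rules are declared once, in the schema, and enforced by the DBMS on every update, rather than being re-coded in each application.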
 
Read it all.


Due to a glitch, an earlier revision of an older post seems to have gotten lost. If you have not seen it, read it here: To Really Understand Integrity, Don't Start with SQL.







Sunday, July 9, 2017

This Week

1. Database Truth of the Week

"For the operations of a formal system to have inverses within some specific use of that system (like a specific application):
  • The basic elements must be orthogonal (independent), hence the Principle of Orthogonal Design;
  • The combination of basis elements and operations must be expressive enough to represent every aspect of the subject matter, hence the Principle of Expressively Complete Design;
  • And, at the same time, not so expressive that there is more than one way to express each aspect of the subject matter, hence the Principle of Representation Minimality Design.
The basic element of a relational database is the relation. Adherence to these principles ensures that there is a unique relational expression for every aspect of the subject matter--either a base relation or a derived relation--and if there are two ways to derive a derived relation, then those two expressions are provably equivalent (i.e., the differences are merely syntax and never meaningful)." --David McGoveran
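The last point in the quote, two derivations of the same derived relation, can be illustrated with a small sketch. It assumes relations modeled as Python frozensets of tuples, with each tuple a frozenset of attribute/value pairs. The helper names (row, restrict, natural_join) and the sample suppliers and supplies data are hypothetical. Note that the check below compares the two expressions on one sample database only; it is an illustration of equivalence, not the proof the quote refers to.

```python
def row(**attrs):
    """Build a tuple of a relation as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())


def restrict(rel, pred):
    """Keep only the tuples for which the predicate holds."""
    return frozenset(t for t in rel if pred(dict(t)))


def natural_join(r, s):
    """Combine tuples of r and s that agree on all attributes they share."""
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return frozenset(out)


# Hypothetical sample data.
suppliers = frozenset({row(sno="S1", city="London"), row(sno="S2", city="Paris")})
supplies = frozenset({
    row(sno="S1", pno="P1", qty=300),
    row(sno="S1", pno="P2", qty=200),
    row(sno="S2", pno="P1", qty=100),
})


def in_london(t):
    return t["city"] == "London"


def join_then_restrict():
    """Derivation 1: join first, then restrict to London suppliers."""
    return restrict(natural_join(suppliers, supplies), in_london)


def restrict_then_join():
    """Derivation 2: restrict suppliers to London first, then join."""
    return natural_join(restrict(suppliers, in_london), supplies)


if __name__ == "__main__":
    print(join_then_restrict() == restrict_then_join())
```

Because relations are sets, comparing the two results with == tests exactly what matters here: whether both expressions yield the same set of tuples, regardless of how each was written.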

Monday, July 3, 2017

New Paper: Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency

NEW!!! Paper #2 in the Understanding the Real RDM series. NEW!!!

 

The data management field cannot and will not progress without educated and informed data professionals and users. UNDERSTANDING THE REAL RDM is a new series of papers that offers informal access to the forthcoming McGoveran interpretation of the formal real RDM envisioned by E. F. Codd (EFC), contrasts it with the current understanding that emerged after EFC's death and demonstrates the practical implications of the differences.

If you are a thinking data professional interested in understanding the scientific foundations of data management, as opposed to the "cookbook" industry practices, these unique papers are a must read. They give you new insights into the RDM--which I call the real data science--and the practical benefits that fail to materialize due to its misuse and abuse promulgated by DBMS vendors, the trade media, "experts" and poor SQL implementations, all of which ignore its formal theoretical grounding from which all the practical benefits derive. You will also learn to minimize the consequences of SQL's deficiencies and to optimize its use.

The series starts with an analysis of the evolution of E. F. Codd's thinking between 1969 and 1980, which will then be used to show how McGoveran's interpretation follows from his RDM vision and with what consequences. Paper #1, The Interpretation and Representation Of Database Relations, covers the structural component of the RDM as introduced in EFC's two initial 1969-70 papers.

Paper #2 covers the sections not discussed in paper #1 (net of the relational algebra, which is deferred to a future paper), as follows:

Table of Contents

Acknowledgement

Preface

Introduction

1. Logical Symmetric Access

2. Universal Data Sub-language

2.1. FOPL vs. SOPL
2.2. Relational Completeness
2.3. Computational Completeness and Hosting

3. Kinds of Relations
3.1. Expressible and Named Relations
3.2. Derived Relations
3.3. Relations with Stored Data


4. Derived Relations and Redundancy
4.1. Database Consistency

5. Database Catalog

Conclusion

References

Both papers are available to order from the PAPERS page.




Friday, June 30, 2017

Data Meaning: Analytics vs. Data Mining


My July post @All Analytics

Data mining is distinct from analytics. The former is aimed at ‘finding’ meaningful data patterns—i.e., knowledge ‘discovery’—while the latter derives new knowledge from ‘existing’ knowledge—i.e., deduction (see Data, Information, Knowledge Discovery, and Knowledge Representation). ‘Sensible’ querying of databases to retrieve data for analytic applications and correct interpretation of results without a good grasp of data meaning is a fool's errand. Yet current database practices are extremely deficient in this respect.

Sunday, June 25, 2017

Relations & Relationships Part I


This is a 06/18/17 rewrite of a 04/14/13 post to bring it in line with the McGoveran interpretation of Codd's RDM[1].

Similarly, as we have explained, some object group properties arise from relationships among individual properties of its members and/or among the members themselves[4]. And a relationship can be modeled as an object, with properties of its own. For example, the relationship between a supplier and a part can be modeled as an object--a relationship object, with quantity as a property. On the other hand, a relationship can also be perceived as a property. For example, the relationships between supplies and suppliers and parts can be modeled with referencing properties. The distinction that the Entity/Relationship Model (E/RM) approach makes is unnecessary and contributes to the confusion[5]. Therefore, Kent should not be interpreted to mean that modeling choices, once made, are confusing, but rather that they are "in the eye of the beholder", so to speak--there is usefulness, not correctness. (Kent's book is a must-read Recommended Book). Otherwise put, make the most useful choice and strive for parsimony, explicitness, well-definedness, and consistency.

Read it all 



Sunday, June 18, 2017

This Week

1. Database Truth of the Week

"The RDM is a formal system. It has two parts.
  • The Deductive Subsystem: the formal language
  • The Interpretation Subsystem--i.e., the application--of that language
Without an interpretation subsystem there is no possibility of applying the formal system and it remains an abstract game of symbols.
Semantics is about applying the RDM to some subject. In effect, what you do is restrict the power of the abstract formalism so that it is more closely aligned with your intended use. That means you are using constraints to limit the vocabulary to the subject matter (and making it finite and usually fairly small) and restricting the possible interpretations that can be used consistently with the resulting subset of the formalism." --David McGoveran