Friday, December 29, 2017

DBMS for Analytics: Risky Business Without Foundation Knowledge, Part 2


Note: This was originally posted at AllAnalytics, which no longer exists, so some links to other posts there no longer work, but I left them in to alert the reader that I have written on those specific subjects. Other links work.  

In this two-part series I alert analysts that correct interpretation and assessment of media/industry claims without being misled requires a good grasp of data fundamentals. In Part 1, I discussed the logical-physical confusion and the erroneous missclassification of DBMSs as relational and non-relational underlying the argument that the latter are superior to the former for analytics applications. In Part 2, I discuss a third misconception behind the claim.

Thursday, December 21, 2017

Three Re-writes with Season's Greetings

The following posts have been re-written to bring in line with the McGoveran formalization and interpretation of Codd's true RDM. Re-reading is strongly recommended.
"But the core Information Principle (IP) of the RDM mandates that all information in a relational database be represented explicitly and in exactly one way -- as values of relation attributes defined on domains. The difference between relation names is, thus, meaningful information, the representation of which violates the IP and the RDM, for which reason it is inaccessible to the DBMS: consider the candidate tuple {v1,v2} -- it is impossible for the DBMS to know to which relation it belongs based on the relation and attribute names because it does not understand semantics!" Database Design Relation Predicates and “Identical Relations”
"Some set defining properties are formed as the disjunction of two or more properties (a kind of relationship between two common properties). These disjuncts, taken together, are meaning criteria. Each meaning criterion (an individual disjunct) induces a partitioning of a set into two subsets, those that meet the criterion and those that do not. Alternatively, we can say that each meaning criterion serves to differentiate a possible subset of a set from other subsets of the set (some of the possible subsets will be disjoint, while others not). Each of the possible subsets of the set is then defined by (“inherits”): The defining properties of the set conjoined with at least one meaning criterion (that or those becoming the defining property, or properties, respectively, specific to the proper subset)." Meaning Criteria and Entity Supertype-Subtypes
"Although they are no longer used, inquiries about them persist and with the current proliferation of non-relational products (e.g., NoSQL, graph DBMSs) there is value in understanding them. The closest the industry came to implementing the RDM is SQL which, despite its poor relational fidelity, proved much superior relative to the complexity and inflexibility of preceding DBMSs. But the rules still expose poor relational fidelity of SQL DBMS's that have not been addressed for four decades, while new RDM violations were introduced.

We offer here our clarifications on the rules. For each rule, we:
  • Explain its intended objective;
  • Offer clarifications, some of which reflect our current understanding of the RDM -- distinct from conventional wisdom -- based on its dual theoretical foundation and a careful analysis of Codd's work;" --Interpreting Codd's 12 Rules



 




Sunday, December 17, 2017

This Week


1. Database truth of the week

"Within the database field, it is common to refer to three “level” of description: conceptual, logical, and physical. Both the logical level and the physical level are formal systems. By contrast, the conceptual level is typically an informal system and refers to the subject of the database.

The conceptual language is a subject language, in the terminology of formal systems. The conceptual level identifies the concepts to be formally represented by the logical and physical levels, and how users think and talk about those concepts. This level corresponds only informally to the so-called “conceptual schema” of earlier approaches to information management, which emphasized the capture of conceptual information using various techniques including diagrams and documentation having various degrees of formality, but not forming a strictly formal system themselves."
-- David McGoveran

2. What's wrong with this database picture?


I re-wrote two older debunkings to bring them in line with the McGoveran formalization and interpretation of Codd's true RDM. Re-reads are recommended.
"Can you have 2 tables, VIEWS and DOWNLOADS, with identical structure in a good DB schema (item_id, user_id, time). Some of the records will be identical but their meaning will be different depending on which table they are in. The "views" table is updated any time a user views an item for the first time. The "downloads" table is updated any time a user downloads an item for the first time. Both of the tables can exist without the other ..."
"I have a database for a school ... [with] are numerous tables obviously but consider these:
CONTACT - all contacts (students, faculty) has fields such as LAST, FIRST, MI, ADDR, CITY, STATE, ZIP, EMAIL;
FACULTY - hire info, login/password for electronic timesheet login, foreign key to CONTACT;
STUDENT - medical comments, current grade, foreign key to CONTACT.
Do you think it is a good idea to have a single table hold such info? Or, would you have had the tables FACULTY and STUDENT store LAST, FIRST, ADDR and other fields? ..."

Sunday, December 10, 2017

Conventional Wisdom and True Relational Features

Here's what's wrong with last week's picture, namely:
"Per Date’s AN INTRODUCTION TO DATABASE SYSTEMS, Date & Darwen’s DATABASES, TYPES, AND THE RELATIONAL MODEL, and related references, the features of a relational database are values, types, attributes, tuples, relations, relation-valued variables, operators, and constraints.
  • A type is a set of values and related operators.
  • An attribute is a name, value, type triple.
  • A tuple is a set of attributes.
  • A relation is a set of tuples with a given heading.
  • A relation-valued variable (known as a relvar) is a persistent variable whose time-varying value is a relation." --Dave Voorhis, Computer scientist; lead developer of Rel, a true relational database system, Quora.com

This is more or less the conventional wisdom, which is nothing like the true RDM envisioned by Codd [1].


Sunday, December 3, 2017

DBMS for Analytics: Risky Business Without Foundation Knowledge (Part 1)


Note: This was originally a post at AllAnalytics, which is no longer exists, so some links to other posts there no longer work, but I left them in to alert the reader that I have written on those specific subjects. Other links work.
 

A new study finding that "non-relational database management systems now comprising 70% of analytics data sources" attributes their popularity to "superiority" over RDBMSs in satisfying analytics needs. There are good reasons to be skeptical of such findings, but even if this one were true, the arguments advanced in support of the claim are rooted in the usual misconceptions due to poor foundation knowledge debunked by this blog. Let's see.

This Week


1. Database Truth of the Week

"A formal system is a systematic way of representing something. We call that something the subject and often refer to it as the “subject system”, although it might not be “systematic” in any sense at all. By contrast we call the formal system the object system -- a system of abstract “objects”.

Representing a subject with a formal system allows us to reason about it without getting trapped in ambiguities, or circular arguments. The formal system becomes a theory about that portion of the subject that has been represented and when that portion is faithfully represented by the theory, we say that portion is a model of the theory.
There are at least three distinct uses of languages necessary to use or apply any formal system that correspond to the three distinct ways in which we need to discuss and use formal systems. It is standard to refer to these three uses by different names and as if they were distinct languages: subject, object and meta-language." --David McGoveran

Ed. Note: In the database context, the subject language expresses the conceptual model; the object language expresses the logical model; the data model is the meta-language that expresses the relationship between object and subject.




2. What's Wrong With This Database Picture?

"Per Date’s AN INTRODUCTION TO DATABASE SYSTEMS, Date & Darwen’s DATABASES, TYPES AND THE RELATIONAL MODEL and related references, the features of a relational database are values, types, attributes, tuples, relations, relation-valued variables, operators, and constraints.
  • A type is a set of values and related operators.
  • An attribute is a name, value, type triple.
  • A tuple is a set of attributes.
  • A relation is a set of tuples with a given heading.
  • A relation-valued variable (known as a relvar) is a persistent variable whose time-varying value is a relation." --Dave Voorhis (Computer scientist; lead developer of Rel, a true relational database system), --What are the features of a relational database?, Quora.com