Sunday, September 17, 2017

Database Management: No Progress Without Data Fundamentals

I have recently -- yet again -- been accused in a LinkedIn exchange of "gibberish without any evidence" and of claiming that "nobody know what they're doing" with databases. I will leave it to readers to judge whether (1) five decades' worth of writings and teaching is "no evidence" and (2) my comments in the exchange are gibberish. I dare anybody to find a claim to the effect that nobody knows what they are doing in any of my pronouncements. What I have said, do say and will continue to say is that most data professionals do not know and understand data and relational fundamentals -- an incontrovertible fact demonstrated not just by me[1], but also by others[2,3] -- and that this inhibits real progress in database management.

As I wrote two weeks ago:
"The RDM put database management on a formal, scientific foot. Consequently, tool experience and relational terminology are insufficient -- foundation knowledge is necessary. Unfortunately, most data professionals do not possess it, in part because they have been misled by the industry and in part because few go through an education -- as distinct from training -- program that teaches the RDM and teaches it correctly. Consequently, even those with the heart in the right place defend the RDM without a full understanding, their views distorted by what passes for it (stay tuned for a debunking of such a recent example)."
I will now fulfill the promise by debunking just such a "heart-in-the-right-place" defense of the RDM. 

Sunday, September 10, 2017

This Week

1. Database Truth of the Week

“A network is a directed acyclic graph (the "direction" of the transitive relationship) and, thus, amenable to transitive closure (TC). In the Relational Data Model (RDM) that usually means the smallest set that includes all the members that satisfy the transitive relationship in question (for the count of each object type the closure is computed and the count ignores level). While the Relational Data Model (RDM) can handle an important subset of graph theory via special graph domain operators and extensions to the original relational operators, which could be made efficient, it is a very difficult problem. Certain computations on finite sets such as TC are not in general computable in a language based on first order predicate logic (FOPL) that is declarative, decidable and supports physical independence (PI) -- a core relational objective. They require a computationally complete language (CCL) that is imperative and recursive.
A ‘TC function’ can be implemented using a host CCL that returns its result in the form of a relation; then a symbol (i.e., pure syntax) of type relation can be defined in relational algebra that references/invokes that function. From within the algebra it appears to be just a relation and is up to the user to understand what the value of the returned relation means --i.e., that it represents the TC. That understanding/interpretation is outside the algebra and passed to users only via documentation (e.g., some meta-language).” --David McGoveran
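To make the idea of a 'TC function' concrete, here is a minimal sketch in Python standing in for the host CCL. The function name `transitive_closure`, the sample relation `reports_to` and its values are hypothetical and used only for illustration, and a loop that iterates to a fixed point stands in for recursion. The function takes a binary relation as a set of pairs and returns its closure as another set of pairs, so to the caller the result is just a relation.

```python
def transitive_closure(edges):
    """Return the transitive closure of a binary relation.

    `edges` is a set of (source, target) pairs; the result is also a
    set of pairs, i.e., it has the form of a relation.
    """
    closure = set(edges)
    while True:
        # Compose the closure with itself: (a, b) and (b, d) imply (a, d).
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        }
        # Stop once a full pass adds nothing new (fixed point reached).
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


# Hypothetical sample data: a three-link chain.
reports_to = {("ann", "bob"), ("bob", "cy"), ("cy", "dee")}
tc = transitive_closure(reports_to)
```

Nothing in the returned set says "this is a transitive closure"; as the quote notes, that interpretation has to reach the user through documentation.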


2. What's Wrong With This Database Picture?

"I don’t like talking about the relational theory of data. It is absolutely fundamental to any deep understanding of data, but most practitioners get along fine without it. It’s more the implementers of database management systems (DBMSs) who need to understand relational theory, so teaching relational theory to ordinary practitioners is a bit like tormenting people with irrelevant theory before you let them get on with the business at hand. Moreover, some of those who understand relational theory use their knowledge to beat other people over the head with it. I don’t want to be associated with that high-handed approach to this important theory.

But I’ve been goaded. Google made me do it. My attention was drawn to a video put out by some folks at Google, Data Modeling for BigQuery. The video is fine for the most part, but it makes some misstatements about relational theory that just drive me crazy. They repeat commonly accepted misconceptions about relational databases—misconceptions that, unfortunately, have driven some of the “advances” we’ve seen of late in the realm of database technology. There have definitely been some true advances, but some new technology is merely different without being better.
If you’re a practitioner, designing, implementing, and using databases, whether SQL or NoSQL, this won’t matter much to you, although it never hurts to learn a little more about the theory of data. However, if you are a programmer who might be the one who builds the next NoSQL mega-star that will replace decades-old technology, you need to know this, because this knowledge will enable you to blind-side every established DBMS vendor, whether SQL or NoSQL." --Ted Hills, Understand Relational to Understand the Secrets of Data

Friday, September 1, 2017

Don't Confuse/Conflate Database Consistency with Truth

My September post @All Analytics.

Ideally only true axioms should be represented in the database. But while a DBMS can enforce declared constraints for database consistency, it cannot ascertain truth (e.g., that there is an actual employee with specific property values in the real world). The facts recorded in the database are, thus, not statements of objective truth about the world -- they are assumed to be true only because they were asserted as such by trusted authorized users, and they are only as reliable as the trust placed in those users is justified.
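The distinction can be illustrated with a small sketch using Python's standard sqlite3 module. The table `employee`, its columns, the helper `try_insert` and the sample rows are all assumptions made up for the example. The DBMS rejects a row that violates a declared constraint, yet accepts a row that is consistent even if, as we suppose here, no such employee exists in the real world.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary INTEGER NOT NULL CHECK (salary > 0))"
)


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Inconsistent with the declared CHECK constraint: rejected.
rejected = try_insert((1, "Pat Doe", -500))

# Consistent with every declared constraint: accepted, even though we
# assume for the example that no such employee actually exists.
accepted = try_insert((2, "Nobody Real", 90000))
```

The second insert succeeds because the DBMS can only check the row against what was declared; whether the assertion is true of the world rests on the user who made it.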

Read it All.



Sunday, August 27, 2017

Object Orientation, Relational Database Design, Logical Validity and Semantic Correctness

Here's what's wrong with last week's picture, namely:
"In my experience, using an object model in both the application layer and in the database layer results in an inefficient system. This are my personal design goals:
- Use a relational data model for storage
- Design the database tables using relational rules including 3rd normal form
- Tables should mirror logical objects, but any object may encompass multiple tables
- Application objects, whether you are using an OO language or a traditional language using structured programming techniques should parallel application needs which most closely correspond to individual SQL statements than to tables or "objects". --LinkedIn.com
Read it all



Sunday, August 20, 2017

This Week

1. Database Truth of the Week

“... [one] limitation imposed by set semantics is the inability to express the concept of a computer variable to which values can be destructively assigned (or "updated") ... variables can be expressed in logic, but they cannot be expressed in elementary set theory, or first order predicate logic (FOPL) -- the foundations of the RDM. Other, more expressively powerful systems are required. Unfortunately, such powerful formal systems do violence to the RDM and its intent.” --David McGoveran


2. What's Wrong With This Database Picture?

"In my experience, using an object model in both the application layer and in the database layer results in an inefficient system. This are my personal design goals:
- Use a relational data model for storage.
- Design the database tables using relational rules including 3rd normal form
- Tables should mirror logical objects but any object may encompass multiple tables
- Application objects, whether you are using an OO language or a traditional language using structured programming techniques should parallel application needs which most closely correspond to individual SQL statements than to tables or "objects". --LinkedIn.com

Sunday, August 13, 2017

Relational Fidelity, Cursors and ORDER BY

Here's what's wrong with last week's database picture, namely:
"In a book I am reading (QUERYING SQL SERVER 2012) the author talks about theory of how databases work. He mentions relations, attributes and tuples etc. He frequently stresses the fact that some aspect of T-SQL is not relational. Like in the following excerpt:
"T-SQL also supports an object called a cursor that is defined based on a result of a query, and that allows fetching rows one at a time in a specified order. You might care about returning the result of a query in a specific order for presentation purposes or if the caller needs to consume the result in that manner through some cursor mechanism that fetches the rows one at a time. But remember that such processing isn’t relational. If you need to process the query result in a relational manner--for example, define a table expression like a view based on the query--the result will need to be relational. Also, sorting data can add cost to the query processing. If you don’t care about the order in which the result rows are returned, you can avoid this unnecessary cost by not adding an ORDER BY clause."
I would like to know, since every implementation of SQL pretty much has an ORDER BY clause which makes it non-relational, why does it even matter that (the set after ORDER BY is used) its not relational anymore since its like that everywhere? I can understand if he said it was non-standard, for example using != instead of <> for inequality because that affects portability etc., but I do not understand why something is better being relational. Please enlighten." --stackoverflow.com
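One way to see the point of the quoted book excerpt is a small sketch in Python, with a set of tuples standing in for a relation. The rows and variable names are hypothetical, and a Python set only approximates a relation. A set has no order, so two listings of the same rows are the same set; sorting produces a list, which is a different kind of object on which set operators are no longer defined.

```python
# The same rows written in two different orders are one and the same set.
r1 = {("bob", 30), ("ann", 25)}
r2 = {("ann", 25), ("bob", 30)}
same_relation = (r1 == r2)

# Sorting (the analogue of ORDER BY) yields a list, not a set.
by_name = sorted(r1)
by_age_desc = sorted(r1, key=lambda row: row[1], reverse=True)

# The two lists hold the same rows yet are different sequences.
same_sequence = (by_name == by_age_desc)

# Converting back to sets discards the order and restores equality.
same_rows = (set(by_name) == set(by_age_desc))
```

In this sketch the ordered results can be displayed or consumed one row at a time, but they have to be turned back into sets before set operators such as union or difference apply to them.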

Saturday, August 5, 2017

This Week

1. Database Truth of the Week

"Semantic correctness: every interpretation of the symbols (meaning assignment and truth value assignment) that makes the axioms true, makes the theorems true. When we extend a logical data model with semantics (specific to the subject matter and its "business" rules) via constraints, those constraints become axioms that must be true." --David McGoveran