Friday, September 22, 2017

This Week

1. Database Truth of the Week

“If the data sub-language ... has the power of second order predicate logic (SOPL), expressions are possible that cannot be evaluated (for example, self-referencing expressions) and the formal language is then undecidable, an algorithm to implement a declarative query language is impossible and all hope of physical independence is lost." --David McGoveran

2. What's Wrong With This Database Picture?

"Our terminology is broken beyond repair. [Let me] point out some problems with Date's use of terminology, specifically in two cases.
"type" = "domain": I fully understand why one might equate "type" and "domain", but ... in today's programming practice, "type" and "domain" are quite different. The word "type" is largely tied to system-level (or "physical"-level) definitions of data, while a "domain" is thought of as an abstract set of acceptable values.

"class" != "relvar": In simple terms, the word "class" applies to a collection of values allowed by a predicate, regardless of whether such a collection could actually exist. Every set has a corresponding class, although a class may have no corresponding set ... in mathematical logic, a "relation" is a "class" (and trivially also a "set"), which contributes to confusion.
In modern programming parlance "class" is generally distinguished from "type" only in that "type" refers to "primitive" (system-defined) data definitions while "class" refers to higher-level (user-defined) data definitions. This distinction is almost arbitrary, and in some contexts, "type" and "class" are actually synonymous."

Sunday, September 17, 2017

Database Management: No Progress Without Data Fundamentals

I have recently -- yet again -- been accused in a LinkedIn exchange  of "gibberish without any evidence" and of claiming that "nobody know what they're doing" with databases. I will leave it to readers to judge whether (1) five decades worth of writings and teaching is "no evidence" and (2) my comments in the exchange are gibberish. Here I would like to dare anybody to find claims to that effect in any of my pronouncements. What I did, do and will say is that most data professionals do not know and understand data and relational fundamentals -- an incontrovertible fact proved not just by me[1], but also by others[2,3] and that this inhibits real progress in database management. 

As I wrote two weeks ago:
"The RDM put database management on a formal, scientific foot. Consequently, tool experience and relational terminology are insufficient -- foundation knowledge is necessary. Unfortunately, most data professionals do not possess it, in part because they have been misled by the industry and in part because few go through an education -- as distinct from training -- program that teaches the RDM and teaches it correctly. Consequently, even those with the heart in the right place defend the RDM without a full understanding, their views distorted by what passes for it (stay tuned for a debunking of such a recent example)."
I will now fulfill the promise by debunking just such a "heart-in-the-right-place" defense of the RDM. 

Sunday, September 10, 2017

This Week

1. Database Truth of the Week

“A network is a directed acyclic graph (the "direction" of the transitive relationship) and, thus, amenable to transitive closure (TC). In the Relational Data Model (RDM) that usually means the smallest set that includes all the members that satisfy the transitive relationship in question (for the count of each object type the closure is computed and the count ignores level). While the Relational Data Model (RDM) can handle an important subset of graph theory via special graph domain operators and extensions to the original relational operators, which could be made efficient, it is a very difficult problem. Certain computations on finite sets such as TC are not in general computable in a language based on first order predicate logic (FOPL) that is declarative, decidable and supports physical independence (PI) -- a core relational objective. They require a computationally complete language (CCL) that is imperative and recursive.
A ‘TC function’ can be implemented using a host CCL that returns its result in the form of a relation; then a symbol (i.e., pure syntax) of type relation can be defined in relational algebra that references/invokes that function. From within the algebra it appears to be just a relation and is up to the user to understand what the value of the returned relation means --i.e., that it represents the TC. That understanding/interpretation is outside the algebra and passed to users only via documentation (e.g., some meta-language).” --David McGoveran

2. What's Wrong With This Database Picture?

"I don’t like talking about the relational theory of data. It is absolutely fundamental to any deep understanding of data, but most practitioners get along fine without it. It’s more the implementers of database management systems (DBMSs) who need to understand relational theory, so teaching relational theory to ordinary practitioners is a bit like tormenting people with irrelevant theory before you let them get on with the business at hand. Moreover, some of those who understand relational theory use their knowledge to beat other people over the head with it. I don’t want to be associated with that high-handed approach to this important theory.

But I’ve been goaded. Google made me do it. My attention was drawn to a video put out by some folks at Google, Data Modeling for BigQuery. The video is fine for the most part, but it makes some misstatements about relational theory that just drive me crazy. They repeat commonly accepted misconceptions about relational databases—misconceptions that, unfortunately, have driven some of the “advances” we’ve seen of late in the realm of database technology. There have definitely been some true advances, but some new technology is merely different without being better.
If you’re a practitioner, designing, implementing, and using databases, whether SQL or NoSQL, this won’t matter much to you, although it never hurts to learn a little more about the theory of data. However, if you are a programmer who might be the one who builds the next NoSQL mega-star that will replace decades-old technology, you need to know this, because this knowledge will enable you to blind-side every established DBMS vendor, whether SQL or NoSQL." --Ted Hills, Understand Relational to Understand the Secrets of Data

Friday, September 1, 2017

Don't Confuse/Conflate Database Consistency with Truth

Disregard for foundation knowledge and failure to learn from past mistakes by even data professionals deemed experts inhibit progress in data management and bring back problems already resolved that should be of foremost concern to data analysts. Consider the following:

"Above all else, we count on databases to reflect the truth consistently, or at least to reflect the table data perfectly. The database cannot be blamed when an application (or the end users of an application) place inaccurate data in its tables, but a database must accurately report the data it holds. Therefore, bugs are not all created equal; there are bugs, and there are wrong-rows bugs, bugs that silently misrepresent the data that the tables hold. Even the craziest, most obscure corner case that potentially misrepresents your data should rightly bring a loud chorus: "The emperor has no clothes!" We depend on the database, above all, not to lie."

View My Stats