Tuesday, October 17, 2017


Given the ample misuse and abuse of terminology, a rigorous and comprehensive data fundamentals dictionary is long overdue. I have tentatively committed to one that is consistent with Codd's true RDM and its McGoveran interpretation -- as distinct from what passes for it in the industry -- to include (a) informal conceptual terms used properly and consistently and (b) accurate formal logical terms. The project will have two phases:

1. Expansion of this blog's search beyond the current Blogger label limitations;
2. Addition of term definitions and publication of a full-fledged desk dictionary.

Search Improvement (Phase I)

Since its inception, Google's Blogger -- the platform for this blog -- has had a 200-character limit on the set of labels a post can be tagged with. This significantly constrains the number of labels per post. Moreover, there are many more fundamental terms than can practically be included in the label list.
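
To get a feel for how tight that budget is, here is a minimal sketch in Python. It assumes the limit applies to the labels joined into one comma-separated string; exactly how Blogger counts characters is an assumption here, and the function names are hypothetical.

```python
LABEL_LIMIT = 200  # per-post character limit mentioned above


def label_string_length(labels, separator=", "):
    """Length of the labels joined into one string (assumed counting rule)."""
    return len(separator.join(labels))


def labels_that_fit(labels, limit=LABEL_LIMIT, separator=", "):
    """Return the longest leading run of `labels` that stays within `limit`."""
    kept = []
    for label in labels:
        # Stop at the first label that would push the joined string over the limit.
        if label_string_length(kept + [label], separator) > limit:
            break
        kept.append(label)
    return kept
```

Under that counting rule, labels as long as "logical-physical confusion" (26 characters) would exhaust the budget after seven of them, which is why the length of each label matters.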

I looked for a widget or programmatic solution and failed to find one, so the only way around these limitations is (a) to use acronyms for some of the basic, frequently searched fundamental terms included in the label list and (b) to use the Blogger search feature for terms not on the list.

  • A TERMINOLOGY page listing fundamental terms, some with acronyms, will be added to the site's top menu (this will be the first, online component of the dictionary). Both acronyms and terms without acronyms included in the label list will be marked in bold.
  • Any reference in any post to a fundamental term will also include the acronym, if any (e.g., logical-physical confusion (LPC)). 
Because there is no way in Blogger to document in the label list the terms that acronyms stand for, some searches will be multi-step. The process will work as follows:

  • Is the term you want to search by, or its acronym -- if you recognize it -- on the label list?
      • If yes, search by it.
      • If not, use the browser's "Find on this page" feature to check whether it is on the TERMINOLOGY page. Is your term on it?
          • If yes, is its acronym marked as a label?
              • If yes, search by its acronym label;
              • If not, use the Blogger site search feature to search by the full term.
          • If not, contact me via email to determine whether it is a significant fundamental term that should be added to the list.
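
The steps above can be sketched as a small decision function. This is only an illustration in Python, not something Blogger provides: `choose_search`, `SearchAction`, and the two inputs (a set of labels and a dict mapping each TERMINOLOGY term to its acronym, or to `None` if it has none) are hypothetical names.

```python
from enum import Enum


class SearchAction(Enum):
    LABEL_SEARCH = "search by label"
    BLOGGER_SEARCH = "use the Blogger site search"
    EMAIL_AUTHOR = "email the author about adding the term"


def choose_search(query, label_list, terminology):
    """Pick a search route for `query` (a term or an acronym).

    label_list:  set of labels usable for a label search.
    terminology: dict of TERMINOLOGY-page terms -> acronym, or None.
    Returns (action, text to search by, or None when emailing).
    """
    # Step 1: the term or acronym is itself on the label list.
    if query in label_list:
        return SearchAction.LABEL_SEARCH, query

    # Step 2: look for it on the TERMINOLOGY page, as a term or as an acronym.
    if query in terminology:
        acronym = terminology[query]
    elif query in terminology.values():
        acronym = query
    else:
        # Not listed at all: ask whether it should be added.
        return SearchAction.EMAIL_AUTHOR, None

    # Step 3: listed; use the acronym label if there is one, else site search.
    if acronym is not None and acronym in label_list:
        return SearchAction.LABEL_SEARCH, acronym
    return SearchAction.BLOGGER_SEARCH, query
```

With `label_list = {"LPC"}` and a terminology dict containing "logical-physical confusion" (LPC) and "relation predicate" (RP), the first term routes to a label search by LPC and the second to a Blogger site search.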

Note very carefully, however:

  • For label searches, the results are determined by my choice of labels, based on my judgment of relevance/significance (I may assign a label to a post even if the term is not referred to explicitly in the text, if I deem it implicitly significant). So you will end up with results "curated" (so to speak) by me.
  • For Blogger searches, subject to how the Blogger algorithm works, you may end up with all the posts with explicit references to a term/acronym, regardless of its significance.
The Blogger search option was always there, but now there is also the correct terminology to guide searches and serve as a learning resource, another idea behind this project.

Examples may help. Say you are looking for posts about logical-physical confusion, the acronym for which, LPC, is on the label list. If you know what LPC stands for, you can search by it. If you don't, you 

(1) go to the TERMINOLOGY page 
(2) use the browser's "Find on this page" feature and find your term listed with the acronym marked as a label
(3) use the LPC label to search. 

Suppose now you're looking for posts about relation predicate (RP). Neither it nor its acronym is on the label list, so you

(1) go to the TERMINOLOGY page 
(2) use the "Find on this page" feature and find it listed, but not marked as a label
(3) use Blogger search by either the term or its acronym.

It's a bit cumbersome, I know, but it's the best possible given Blogger's limitations, and the expanded search guided by the dictionary justifies the inconvenience.

A TERMINOLOGY draft page has been added to the site's top menu. Please check it out and provide feedback -- opinions, suggestions, corrections, ideas are all welcome -- via email. This is an opportunity to test your knowledge of fundamentals against their corruption in the industry.

After the page is finalized, the label list will be revised to be consistent with it.
All forthcoming and possibly some of the most recent posts will also abide by the described system. Time permitting, I may go back and gradually revise older posts.

Full-Fledged Desk Dictionary (Phase II)

After the above system is implemented and works, the intention is, time permitting, to add term definitions and publish THE DBDEBUNK DICTIONARY OF DATA FUNDAMENTALS - A DESK REFERENCE FOR THE THINKING DATA PROFESSIONAL AND USER, similar to THE DBDEBUNK GUIDE TO MISCONCEPTIONS.

Monday, October 9, 2017

This Week

1. Database Truth of the Week

“A DBMS using the RDM for all its functionality would be very limited. The RDM only requires that the declarative data sub-language employed by users for data manipulation -- has power not more expressive than first order predicate logic (FOPL), which implies acceptance of certain limitations on what users can do directly in the language, in return for:
  • Language declarativity and decidability;
  • Semantic correctness and system-guaranteed logical validity;
  • Physical and logical independence.”
                                                  --David McGoveran

2. What's Wrong With This Database Picture?

"The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system(DBMS).

The process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must:

  • Determine the data to be stored in the database.
  • Determine the relationships between the different data elements.
  • Superimpose a logical structure upon the data on the basis of these relationships.
Within the relational model the final step above can generally be broken down into two further steps, that of determining the grouping of information within the system, generally determining what are the basic objects about which information is being stored, and then determining the relationships between these groups of information, or objects." 
                             --Halil Lacevic, What is a Relational Database?, Quora.com

Monday, October 2, 2017

Understanding the Division of Labor between Analytics Applications and DBMS

 My October post @All Analytics

"I am coming across, on the one hand, instructions on how to do "analytics with SQL" and, on the other, tools purporting to enable "analytics without SQL." They are an umpteenth iteration of essentially similar ideas during my 30-plus years in data management and reflect common and entrenched fundamental misconceptions that I have documented and analyzed the costly consequences of in my writings and teachings. They will keep repeating, inhibiting genuine progress, as long as data fundamentals are ignored or dismissed. One of the least understood is the distinction between DBMS and application functions."

Read it all.


Sunday, October 1, 2017

Class, Type, Relation and Domain in Database Management

This is a 10/01/17 re-rewrite of a 08/12/12 post revised on 12/05/16 to bring it in line with David McGoveran's formal exposition and interpretation[1] of Codd's RDM (as distinct from its common "understanding" in the industry).

Here's what's wrong with last week's picture, namely:

"Our terminology is broken beyond repair. [Let me] point out some problems with Date's use of terminology, specifically in two cases.
  • type = domain: I fully understand why one might equate type and domain, but ... in today's programming practice, type and domain are quite different. The word type is largely tied to system-level (or physical-level) definitions of data, while a domain is thought of as an abstract set of acceptable values.
  • class != relvar: In simple terms, the word class applies to a collection of values allowed by a predicate, regardless of whether such a collection could actually exist. Every set has a corresponding class, although a class may have no corresponding set ... in mathematical logic, a relation is a class (and trivially also a set), which contributes to confusion.
In modern programming parlance class is generally distinguished from type only in that type refers to primitive (system-defined) data definitions while class refers to higher-level (user-defined) data definitions. This distinction is almost arbitrary, and in some contexts, type and class are actually synonymous."
There is, indeed, a huge mess. And, as always, it is rooted in poor foundation knowledge[2], to which the comment itself is not immune.