Sunday, January 28, 2018

This Week




1. Database truth of the week

"Relvars introduce a concept of assignment, which has no counterpart in either FOPL or set theory. If you add it to those formalisms you introduce computational completeness, which destroys both decidability (the existence of a general algorithm by which you can determine if an expression is or is not logically valid) and the guarantee that there exists a (query) evaluation procedure that will halt (the existence of a general algorithm by which you can evaluate the truth or falsity of every instantiated predicate expression given those instantiations from any given database). Therefore we must forbid relvars." --David McGoveran


2. What's wrong with this database picture?


"Many data and information modelers talk about all kinds of keys (or identifiers. I'll forego the distinction for now). I hear them talk about primary keys, alternate keys, surrogate keys, technical keys, functional keys, intelligent keys, business keys (for a Data Vault), human keys, natural keys, artificial keys, composite keys, warehouse keys or Dimensional Keys (or Data Warehousing) and whatnot. Then a debate rises on the use (and misuse) of all these keys ... The foremost question we should actually ask ourselves: can we formally disambiguate kinds of keys (at all)? Of all kinds of key, the primary key and the surrogate key gained the most discussion."
"If we take a look at the relational model we only see of one or more attributes that are unique for each tuple in a relation -- no other formal distinction is possible. When we talk about different kinds of keys we base our nomenclature on properties and behavior of the candidate keys. We formally do not have a primary key, it is a choice we make and as such we might treat this key slightly different from all other available keys in a relation. The discussion around primary keys stems more from SQL NULL problems, foreign key constraints and implementing surrogate keys. --Martijn Evers, Kinds of Keys: On the Nature of Key Classifications, dm-unseen.blogspot.com

Sunday, January 21, 2018

How to Think (and Not to Think) During Database Design



"I have to maintain some lists in DB (SQLServer, Oracle, DB2, Derby), I have 2 options to design underlying simple table:

"1st:
 NAME   VALUE
=================
 dept   HR
 dept   fin
 role   engineer
 role   designer
-----------------
UNIQUE CONSTRAINT (NAME, VALUE) and some other columns like auto generated ID, etc.
"2nd:
 NAME  VALUE_JSON_CLOB
==================================
dept   {["HR", "fin"]}
role   {["engineer", "designer"}]
----------------------------------
UNIQUE CONSTRAINT (NAME) and some other columns like auto generated ID, etc.
"There is no DELETE operation, only SELECT and INSERT/UPDATE. In first advantage is only INSERT is required but SELECT (fetch all values for a given NAME) will be slow. In second SELECT will be fast but UPDATE will be slow. By considering there could be 10000s of such lists with 1000s for possible values in the system with frequent SELECTs and less INSERTs, which TABLE design will be good in terms of select/insert/update performance." --SQL TABLE to store lists of strings, StackOverflow.com

Using a relational database to "maintain lists" probably does not merit attention and I actually considered canceling the debunking of this example. But it provides an opportunity to demonstrate the gap between conventional wisdom, database practice and SQL DBMSs and
Codd's true RDM, as formalized and interpreted by McGoveran [1]. Such use is induced by lack of foundation knowledge, so for the purpose of this discussion I treat the example as a case of "how not to think when performing database design".

Note: Certainly logical database design should not be contaminated with physical implementation considerations such as performance [2].


Monday, January 15, 2018

This Week and The End of Empire




1. Database truth of the week

"ALL names are human created, either by non-algorithmic assignment, or via some algorithm. We ONLY know that two types of objects are distinct because they have different sets of defining properties and, for a given object type, we ONLY know that two objects are distinct because the values (observed or measured) of that object type's defining properties are distinct. Names (of objects of some type) allow us to distinguish two such entities ONLY when they are 1:1 with the values of the object defining properties. Two sets of names (whether human assigned or machine generated) consistently identify the same set of entities ONLY when they are 1:1." --David McGoveran


2. What's wrong with this database picture?

"I have to maintain some lists in DB (SQLServer, Oracle, DB2, Derby), I have 2 options to design underlying simple table:

"1st:
 NAME   VALUE
=================
 dept   HR
 dept   fin
 role   engineer
 role   designer
-----------------
UNIQUE CONSTRAINT (NAME, VALUE) and some other columns like auto generated ID, etc.
"2nd:
 NAME  VALUE_JSON_CLOB
==================================
dept   {["HR", "fin"]}
role   {["engineer", "designer"}]
----------------------------------
UNIQUE CONSTRAINT (NAME) and some other columns like auto generated ID, etc.
"There is no DELETE operation, only SELECT and INSERT/UPDATE. In first advantage is only INSERT is required but SELECT (fetch all values for a given NAME) will be slow. In second SELECT will be fast but UPDATE will be slow. By considering there could be 10000s of such lists with 1000s for possible values in the system with frequent SELECTs and less INSERTs, which TABLE design will be good in terms of select/insert/update performance." --SQL TABLE to store lists of strings, StackOverflow.com

View My Stats