Sunday, January 21, 2018

How to Think (and Not to Think) During Database Design

"I have to maintain some lists in DB (SQLServer, Oracle, DB2, Derby), I have 2 options to design underlying simple table:

"1st:

 NAME   VALUE
=================
 dept   HR
 dept   fin
 role   engineer
 role   designer
-----------------
UNIQUE CONSTRAINT (NAME, VALUE) and some other columns like auto generated ID, etc.
"2nd:

 NAME  VALUE_JSON_CLOB
==================================
dept   {["HR", "fin"]}
role   {["engineer", "designer"}]
----------------------------------
UNIQUE CONSTRAINT (NAME) and some other columns like auto generated ID, etc.
"There is no DELETE operation, only SELECT and INSERT/UPDATE. In first advantage is only INSERT is required but SELECT (fetch all values for a given NAME) will be slow. In second SELECT will be fast but UPDATE will be slow. By considering there could be 10000s of such lists with 1000s for possible values in the system with frequent SELECTs and less INSERTs, which TABLE design will be good in terms of select/insert/update performance." --SQL TABLE to store lists of strings, StackOverflow.com

Using a relational database to "maintain lists" probably does not merit attention and I actually considered canceling the debunking of this example. But it provides an opportunity to demonstrate the gap between conventional wisdom, database practice and SQL DBMSs and
Codd's true RDM, as formalized and interpreted by McGoveran [1]. Such use is induced by lack of foundation knowledge, so for the purpose of this discussion I treat the example as a case of "how not to think when performing database design".

Note: Certainly logical database design should not be contaminated with physical implementation considerations such as performance [2].

Monday, January 15, 2018

This Week and The End of Empire


1. Database truth of the week

"ALL names are human created, either by non-algorithmic assignment, or via some algorithm. We ONLY know that two types of objects are distinct because they have different sets of defining properties and, for a given object type, we ONLY know that two objects are distinct because the values (observed or measured) of that object type's defining properties are distinct. Names (of objects of some type) allow us to distinguish two such entities ONLY when they are 1:1 with the values of the object defining properties. Two sets of names (whether human assigned or machine generated) consistently identify the same set of entities ONLY when they are 1:1." --David McGoveran


2. What's wrong with this database picture?

"I have to maintain some lists in DB (SQLServer, Oracle, DB2, Derby), I have 2 options to design underlying simple table:

"1st:
 NAME   VALUE
=================
 dept   HR
 dept   fin
 role   engineer
 role   designer
-----------------
UNIQUE CONSTRAINT (NAME, VALUE) and some other columns like auto generated ID, etc.
"2nd:
 NAME  VALUE_JSON_CLOB
==================================
dept   {["HR", "fin"]}
role   {["engineer", "designer"}]
----------------------------------
UNIQUE CONSTRAINT (NAME) and some other columns like auto generated ID, etc.
"There is no DELETE operation, only SELECT and INSERT/UPDATE. In first advantage is only INSERT is required but SELECT (fetch all values for a given NAME) will be slow. In second SELECT will be fast but UPDATE will be slow. By considering there could be 10000s of such lists with 1000s for possible values in the system with frequent SELECTs and less INSERTs, which TABLE design will be good in terms of select/insert/update performance." --SQL TABLE to store lists of strings, StackOverflow.com

Sunday, January 7, 2018

Understanding Relational Keys - A New Perspective: Primary Keys

Note: Rewritten 1/1/18. This is one of three re-writes of older posts on keys to bring them in line with the McGoveran formalization and interpretation [1] of Codd's true RDM (see second [2] and third [3]). They are abbreviated extracts of the forthcoming rewrite of [4], which proposes a new perspective on relational keys in depth and more detail, distinct from the conventional wisdom of the last five decades. Re-reads are strongly recommended.

Revised: 1/14/18.

"A key is a column or columns that together have no duplicate values across rows. Also the columns must be irreducibly unique, meaning no subset of the columns has this uniqueness ... In most databases primary keys have survived as a vestige, and nowadays merely provide some conveniences rather than reflecting or determining physical layout. For instance declaring a primary key includes a NOT NULL constraint automatically, and defines the default foreign key target in a PostgreSQL table. Primary keys also give a hint that their columns are preferred for joins." --Joe Nelson, SQL Keys in Depth, begriffs.com

Q: "My understanding has always been that a primary key should be immutable, and my searching since reading this answer has only provided answers which reflect the same as a best practice. Under what circumstances would a primary key value need to be altered after the record is created?"

A: "When a primary key is chosen that is not immutable?"
--Why should a primary key change, StackExchange.com

There is a general and persistent lack of foundation knowledge in the industry [5] and keys are not an exception. 70% of searches hitting this site are about keys, just one indication that this fundamental relational feature is poorly understood decades after the RDM. "wading through sixty-four articles, skimming sections in five books, and asking questions on IRC and StackOverflow" to "put the pieces together"? And then what he put together is conventional wisdom (besides, SQL is the last source to go to for really understanding anything in depth, let alone relational features [6].

Keys can only be understood within the RDM, which is simple set theory (SST) expressible in first order predicate logic (FOPL) adapted and applied to database management -- which SQL authors never understood -- that is disregarded in the industry. For a proper understanding of keys, read my trilogy of posts, [4] for the in-depth treatment,  then compare to Nelson's take and your SQL DBMS support of keys.