April 2018 - DATABASE DEBUNKINGS

Sunday, April 29, 2018

A New Understanding of Keys Part 3: Surrogate Key Illusions

Follow @DBDebunk Follow @ThePostWest

Note: This the third of three re-writes of older posts to bring them in line with McGoveran's formalization and interpretation[1] of Codd's true RDM. They are short extracts from a completely rewritten paper #4 in the PRACTICAL DATABASE FOUNDATIONS series[2] that provides a new perspective on relational keys, distinct from the conventional wisdom of the last five decades.

(Continued from Part 2)

"When defining a surrogate primary key for a [SQL Server] table, two options are the most common: Integer and UniqueIdentifier (aka Globally Unique Identifiers, or GUID's) ... Historically, Integer has been the logical choice. It’s human-readable, requires minimal storage, and can be set as an identity (auto-incrementing) to prevent the need for additional application logic. UniqueIdentifier comes with significant disadvantages. The most immediately noticeable is that it’s user-unfriendly. You’ll never hear a user or developer ask you about record “A78383A3-4AB1-42CF-B3FC-A4A23AD10398”. With high availability and replication becoming highly prevalent, UniqueIdentifier is being chosen more often, but has caveats that mean it isn’t always the optimal solution."

--Jeffrey J. Keller, Vertabelo.com

As we explained in Parts 1 and 2, keys can be properly understood only within the RDM. We revealed a new perspective on keys, discussed relationally valid kinds of keys, and revised definitions of natural (NK) and surrogate keys (SK).

As we have seen, the formal PK mandate is distinct from PK selection, which may be pragmatic. A PK must represent a name -- either pre-assigned, or generated only when there is no simple name CK. Generated keys must ensure entity integrit and are managed by the DBMS transparently to users.

All this is absent from conventional wisdom and database practice, as the above example illustrates: generated SKs are overused for the wrong reasons, the most common being emulation of OIDs (a SK -- often database-wide and, so, unique across relationsn), followed by performance.

Note: While OIDs have unique values, they often also have some physical significance.

A New Understanding of Keys Part 2: Kinds of Keys

Follow @DBDebunk Follow @ThePostWest

Note: This the second of three re-writes of older posts to bring them in line with McGoveran's formalization and interpretation[1] of Codd's true RDM. They are short extracts from a completely rewritten paper #4 in the PRACTICAL DATABASE FOUNDATIONS series[2] that provides a new perspective on relational keys, distinct from the conventional wisdom of the last five decades.

(Continued from Part 1)

"Many data and information modelers talk about all kinds of keys (or identifiers. I'll forego the distinction for now). I hear them talk about primary keys, alternate keys, surrogate keys, technical keys, functional keys, intelligent keys, business keys (for a Data Vault), human keys, natural keys, artificial keys, composite keys, warehouse keys or Dimensional Keys (or Data Warehousing) and whatnot. Then a debate rises on the use (and misuse) of all these keys ... The foremost question we should actually ask ourselves: can we formally disambiguate kinds of keys (at all)? Of all kinds of key, the primary key and the surrogate key gained the most discussion."

"If we take a look at the relational model we only see of one or more attributes that are unique for each tuple in a relation -- no other formal distinction is possible. When we talk about different kinds of keys we base our nomenclature on properties and behavior of the candidate keys. We formally do not have a primary key, it is a choice we make and as such we might treat this key slightly different from all other available keys in a relation. The discussion around primary keys stems more from SQL NULL problems, foreign key constraints and implementing surrogate keys."

--Martijn Evers,dm-unseen.blogspot.com

I've deplored the misuse and abuse of terminology due a general lack of foundation knowledge in the industry [3] for longer than I care to remember, and keys are not an exception. If "the discussion around primary keys stems more from SQL NULL problems, foreign key constraints and implementing surrogate keys", then there is no understanding of relational keys whatsoever: whatever it is, a data structure that contains NULLs is not a relation, one reason for which SQL tables are not relations, SQL databases are not relational and SQL DBMSs are not RDBMSs (for a relational solution to missing data without NULLs see[4]).

We sure can disambiguate, but the key (pun intended) to keys is that they are a relational feature and, thus, can only be properly understood within the dual theoretical foundation of the RDM, which is an adaptation and application of simple set theory (SST) expressible in first order predicate logic (FOPL) to database management. Thus, their "nomenclature on properties and behavior" should reflect what from the real world they represent, and what function they fulfill in the RDM. Which is precisely what the industry disregards.

A New Understanding of Keys Part 1: Primary Key Formal Mandate and Pragmatic Selection

Follow @DBDebunk Follow @ThePostWest

Note: This the first of three re-writes of older posts to bring them in line with McGoveran's formalization and interpretation[1] of Codd's true RDM. They are short extracts from a completely rewritten paper #4 in the PRACTICAL DATABASE FOUNDATIONS series[2] that provides a new perspective on relational keys, distinct from the conventional wisdom of the last five decades.

"The Internet is full of dogmatic commandments for choosing and using keys in relational databases. At times it verges on a holy war: should you use natural or artificial keys? Auto-incrementing integers, UUIDs? After wading through sixty-four articles, skimming sections in five books, and asking questions on IRC and StackOverflow I think I’ve put the pieces together and have a recommendation to harmonize the various camps. Many arguments about keys boil down to false dichotomies and failures to acknowledge other points of view."

--Joe Nelson, begriffs.com

As a relational feature, keys can only be properly understood within the formal foundation of the RDM, which is simple set theory (SST) expressible in first order predicate logic (FOPL), adapted and applied to database management. Yet that is precisely what is ignored and dismissed in the industry -- including by the authors of SQL. Dogma and holy war are products of ignorance. What Nelson did "piece together" from "sixty-four articles, five books and asking questions" is conventional wisdom, which cannot produce understanding because it has been off for decades.

Name the Relational Violation Part 2: Self-defeating Constraint

Follow @DBDebunk Follow @ThePostWest

Note: This two part series is a rewrite of of an older post (which now links here), to bring it into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.

(Continued from Part 1)

In Part 1 I how several data practitioners failed to pinpoint the relational violation by a a conditional uniqueness constraint that should have been obvious with foundation knowledge. The closest one came was "more than one kind of business entity here [that] share the same properties (not attributes)", but still missed the implications.

POSTS

Sunday, April 29, 2018

A New Understanding of Keys Part 3: Surrogate Key Illusions

Sunday, April 22, 2018

A New Understanding of Keys Part 2: Kinds of Keys

Sunday, April 15, 2018

A New Understanding of Keys Part 1: Primary Key Formal Mandate and Pragmatic Selection

Saturday, April 7, 2018

Name the Relational Violation Part 2: Self-defeating Constraint