Saturday, April 7, 2018

Name the Relational Violation Part 2: Self-defeating Constraint

Note: This two part series is a rewrite of of an older post (which now links here), to bring it into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.

(Continued from Part 1)

In Part 1 I how several data practitioners failed to pinpoint the relational violation by a a conditional uniqueness constraint that should have been obvious with foundation knowledge. The closest one came was "more than one kind of business entity here [that] share the same properties (not attributes)", but still missed the implications. 



I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so let's take advantage of his generosity. Purchasing my papers and books will also help. Thank you. 



  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 


  • The Dystopia of Western Decadence, the Only Acceptable Racism, and the Myth of a “Palestinian nation”

Key Fundamentals

Databases represents facts about entities that are distinguishable in the real world because they are unique. The RDM is applicable to databases that represent the real world because it is a theory of unique objects. Conventional wisdom notwithstanding, primary keys (PK) -- not just candidate keys (CK) -- are mandatory for theoretical (as well as pragmatic) reasons[2].

Other than uniqueness, a PK must satisfy several other formal requirements, one of which is that it must represent a name, not properties[1]. If there are multiple CKs that represent names used in the real world to identify entities, the simplest is designated PK. If there is no name CK (or no simple name CK), a simple surrogate key (SK) is generated that is managed by the DBMS[1]. However, to qualify as PK, a SK must ensure entity integrity (i.e., every tuple represents exactly one entity). A SK guarantees that iff there is a 1:1 relationship between its values and some CK that represents either a set of properties, or possibly another, less simple, name used to identify entities in the real world (i.e., there is a reliable procedure to associate every SK value with one of the CK)[2].

No "Partial Uniqueness"

The conditional uniqueness constraint is a formalization of a business rule to the effect that the two properties represented by the composite attribute (A,B) uniquely identify some, but not all of the entities represented by TABLEX. In other words, (A,B) is not a CK!

Ah, some say, but there is a SK. Ah, but to qualify as PK, the SK must have the 1:1 relationship with some CK that represents an entity identifier, and if (A,B) is not, the SK cannot guarantee entity integrity.

In other words, TABLEX has no valid PK and is not a relation

Database Consistency

As one respondent suggested, if there are two types of entities, active and inactive, there are two options: "One could simply go ahead and delete the 'inactive [tuples], [or] move the inactive [tuples] to a separate history [relation]." But the conditional key constraint wouldn't have come up if the inactive tuples were to be deleted, so we assume that they are retained. 

While "bundling" distinct types of entities into one relation is always poor design, denormalization usually bundles types that have distinct properties (and relationships, which, as it turns out, are properties too[3]). But active and inactive entities share the same properties.
So while they do need to reside in distinct relations, inactive tuples cannot just be dumped "as is" into an INACTIVE relation -- the proper design depends on the conceptual context (see one example in[4]).


Nobody really understands the RDM. The little relational understanding that does exist derives from conventional wisdom which, as McGoveran is proving, is flawed.

As we have explained, the PK mandate is a formal requirement of first order predicate logic (FOPL) -- without PKs the RDM would not applicable to database management, which is why Codd added PKs to relations
[1,2]. But according to the conventional understanding of the RDM, the PK mandate has no theoretical basis and, therefore, while PKs are "a good idea", CKs are sufficient; many data professionals question even the necessity of CKs. 

This probably plays a part in obscuring the self-defeating nature of a conditional key constraint.


[1] McGoveran, D., LOGIC FOR SERIOUS FOR DATABASE FOLK, forthcoming.

[2] Pascal, F., The Key To Relational Keys: A New Understanding.

[3] Pascal, F., Conceptual Business Modeling for Database Design: A New Perspective, forthcoming.

[4] Pascal, F., Relation Predicates and Identical Relations.

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.

No comments:

Post a Comment

View My Stats