Saturday, November 16, 2013

Theory Applied Right: N-ary R-tables, NULLs and Keys

In a comment to one of my posts E writes:
E: Codd has based his model on n-ary relations and that is the key mistake he has made ; that leads to complex structure (absolutely not necessary) and situations where there are no values known and as a consequence the need of the concept we know too well -> the null pointers; bin-ary/2-ary relations (smallest possible) are sufficient to express any predicate/sententional formula and there is no possibility to have something like null; if a value is unknow then we do not know it thus it is not a fact for us thus it is not in our database; function is a special case of a relation...
Not accurate.

  • In fact, Codd initially based the RM on binary relations, but realized subsequently that it was not a practical idea. The only way to perceive binary relations a good idea is by ignoring the integrity implications: consider what happens to the number and complexity of constraints when relations are exclusively binary. And as soon as relational operations are applied, the results are n-ary relations anyway. Besides, n-ary relations can be viewed as representing entities or, more accurately, propositions about real world entities; what exactly in the real world do binary relations represent?
  • Codd has a lot to answer for SQL's NULLs, but not for the reasons advanced by E. He advocated a four-valued logic (4VL) based on an I-Mark (inapplicable value) and an A-mark (unknown value). They served as inspiration for IBM's one mark in SQL--NULL--a disaster in both concept and implementation. For why both Codd and IBM were mistaken see my The Last NULL In the Coffin: A Relational Solution to Missing Data. The existence of NULL has nothing to do with n-ary relations. It originate in a theoretical error and many implementation mistakes due to it. As I demonstrate in my paper (1) a correct conceptual and logical representation of inapplicable data (entity supertype-subtypes) and (2) a relational solution to unknown/missing data obviate the need for NULLs.
E: ...and I would suggest to make a little step further: deploy a functional approach to data management; then we will avoid all the problems and drawbacks we now face: normalization and most importantly the null pointers; and as PK has written: "....PKs have no theoretical foundation ......" and he is right; with functional approach PKs have; Codd wrote in his 1990 book "...The fact that relations can be perceived as tables ... breeds the false assumption that the freedom of actions permitted [with] tables .....must also be permitted when manipulating relations..." and for the sake of God stop calling relations tables or R-tables: a relation is not a table and will never be.
I can't say I understand how a "functional" approach relates to the cardinality of relations and what "null pointers" have to do with NULLs, but insofar as normalization and PKs are concerned, see my previous posts on the subject:
  • In Business Modeling For Database Design I show that if my approach is followed for both, the results are implicitly fully normalized (5NF) databases. Explicit further normalization is necessary only as a design repair procedure, if/when both representations are messed up.
  • I agree that one should not arbitrarily use the term table to mean relation. But I disagree that relation is a database construct: it is a mathematical abstraction which carries no real world meaning and represents nothing. Codd applied relational theory to databases, which represent aspects of the real world. Theories often require adjustments to be applicable to the real world. An R-table is a subset of a relation applied to database management and I have been very careful in how I define it and its manipulation to preserve theoretical rigor (see Truly Relational: What It Really Means). Keys are another such adjustment (see my modeling paper for an explanation; I note, in passing, Date's revised notion that the critical concept is the candidate key and that selecting one as primary key is a pragmatic matter, mainly for referential integrity purposes (see my forthcoming paper on keys).

No comments:

Post a Comment