Monday, November 7, 2016

The Principle of Orthogonal Database Design Part II

Revised: 11/11/16

(Continued from Part I)

To recall, adherence to the POOD means independent base relations--no base facts are derivable from the other base facts--that have unique relation predicates (RP) i.e., relations are uniquely constrained.

Going back to the employees example in Part 1, because salary and commission are mutually exclusive properties of employees, salaried and commissioned employees are subtypes of the employee supertype. Entity subtypes inherit the properties and constraints of the supertype.

Entity Supertype-subtype Relationships

The three base relations
EMPS (EMP#,ENAME,HIREDATE)
SAL_EMPS (EMP#,ENAME,HIREDATE,SALARY)
COMM_EMPS (EMP#,ENAME,HIREDATE,COMMISSION)
represent the entity supertype and two subtypes. This design avoid NULLs, but note that aside from redundancy, the relations are not independent--supertype facts are derivable from subtype facts--in violation of the POOD.

First, with a DBMS that is not relational and does not support ESS relationships and constraint inheritance (e.g., a SQL DBMS), application/users:

  • Must formulate disjunctive integrity constraints on SAL_EMPS and COMM_EMPS relations to ensure their mutual exclusivity;
  • Must ensure that candidate tuples are inserted into both EMPS and one of the subtype relations;
But into which of the two? The DBMS could guarantee that the tuple is inserted in the correct subtype relation if and only if it satisfied only one of the subtype relation predicates (RP) i.e., if the two subtype relations were uniquely constrained, such that the tuple would satisfy only one.

Some data professionals contend that the RPs are unique by virtue of the distinct relation and column names (an example will be presented in Part III). Thus, for the tuple

{EMP#="emp#",ENAME="ename",HIREDATE=date,SALARY=salary}:
BEGIN TRANSACTION
 INSERT INTO emps (EMP#="emp#",ENAME="ename",HIREDATE=date)
 INSERT INTO sal_emps (EMP#="emp#",ENAME="ename",HIREDATE=date,SALARY=salary}
END TRANSACTION
But consider what happens if SALARY and COMMISSION are defined on the same domain and have overlapping value ranges and one application/user specifies the wrong name: because the relations are not uniquely constrained, the update will render the database inconsistent (without anybody being aware of it!).

This is, of course, always the likely consequence of application/user-enforced integrity. For DBMS-enforced integrity, relation and attribute names are just references to the corresponding RPs, namely the conjunctions of

  • Domain constraints
  • Attribute constraints
  • Tuple constraints
  • Multi-tuple constraints
If they are not unique in the sense that the DBMS "knows" whether and into which relation a tuple belongs (another way to say POOD adherence) the DBMS cannot guarantee database consistency as it should. Thus, wherever the DBMS encounters a name, it should substitute for it the corresponding constraint.

Note: After-the-insert relations must also satisfy, of course, any database (multi-relation) constraints to which it is subject.

POOD, ESS and View Updatability

Let's now comply with POOD by making EMPS a union view (for simplicity, assume the data of the two subtype base relations are stored). ESS relationships generally result in POOD designs where views represent particular entity supertypes or subtypes, so applications will be built on top of views, consistent with the general recommendation that database access should be via views to benefit from logical data independence (LI)--the insulation of applications from logical database reorganizations. For this to work, views must be updatable: the DBMS must know how to propagate view updates to the base relations from which they are--directly or indirectly--derived.

As we explained in Part I, a true RDBMS supports constraint inheritance and it can trace back the base RPs from the cataloged derived view RPs, which, courtesy of the POOD-compliant design, are unique: the base relations are uniquely constrained. When tuples are presented for insertion into the view, a RDBMS aware of the ESS relationship (i.e., it is expressed in the data language as part of the RP), it can:

  • Generate the EMPS union view on its own;
  • Generate and enforce the disjunctive constraints on the base relations;
  • Properly propagate the view update to the (only possible) base relation;
without the inconsistency risk associated with non-DBMS enforced integrity.
Note: Even if the EMPS union view is defined with derived attribute INCOME (salary or commission), the RDBMS can still push the correct value to the correct base relation, because each of the update values are uniquely constrained to be one of SALARY or COMMISSION.

Unfortunately, SQL DBMSs are not relational. Stay tuned for Part III.



Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

17 comments:

  1. Maybe that's just semantics and I'm not very good at those.

    That was in reply to someone not making the imo proper distinction between an application programmer and a db designer, claiming the db designer "is also a programmer, namely of what the DBMS is to do". And it was mostly ironical.

    "My DBMS is sufficiently expressive, thank you."

    That was in reply to a remark that if one's DBMS cannot express the predicates, one should switch DBMS. My DBMS has structural built-in support for expressing all relevant predicates. That is, it has support for keeping a record of what the precise business meaning is of the tuples recorded in [a relvar in] the db. And it has support for expressing (AND enforcing !!!) all applicable constraints. That which Toon in his book claimed no one knew how to do.

    In that light, I don't feel I have any pondering to do let alone any revisiting and what I said remains what I said.

    Also in that light, and regarding

    "What happens when users are not sophisticated enough to "read the mind" of the designer and have either incompatible interpretations or -- just as common -- interpretations that vary by day, mood, focus, etc.? What if two applications have incompatible interpretations?"

    With my DBMS, users are not required to read any designer's mind, they only need to query the catalog. The risk of "Incompatible interpretations" can never be entirely eliminated when the predicate is expressed in NL (typical for external predicates defining business meaning) but can indeed be eliminated if the predicate is expressed as a mathematical formula (typical for definitions of constraints), which in my DBMS it is.

    ReplyDelete
    Replies
    1. I was as ironic as you were. Note my comment in the article "I am not worried about Erwin". As to pondering, it's up to you.

      As you will see, a true RDBMS according to our RDM interpretation should document semantics and provide it on demand to users. However, this does not mean that reliance on the DBMS should not preferred to human intervention, as you say you agree.

      Delete
  2. (I'm going over this finish to start this time)

    "any error in the name specification will result in the tuple being inserted in the wrong relation. But if the design adheres to the POOD the DBMS can decide on its own, based on which RP the tuple satisfies,"

    If a user can make a mistake when trying to pick the right relvar/relation name, he can also make a mistake when trying to pick the right attribute name. The effect stays just the same.

    ReplyDelete
    Replies
    1. Absolutely. Which is why update decisions should not be made based on attribute names either. Names are only references to the integrity constraints. An attribute name is a reference to the attribute constraint.

      Delete
  3. "However, the meaning -- captured by the business rules -- is represented in the database by the formal relation predicate (RP) -- a set of integrity constraints that the DBMS can enforce -- which is the criterion that the DBMS uses to decide whether the tuple belongs in the relation or not. And we should rely as much as possible on the DBMS, not the user, to make that decision."

    If this is your way of saying that all applicable constraints should be definable to the DBMS in such a way that the DBMS, and the DBMS alone, will be able to derive/determine how to actually enforce them, and effectively do that, I agree wholeheartedly, and I'm aware of LOTS of participants on various db discussion fora who feel EXACTLY the same.

    That all those people, when using SQL, have no other option than to hand-code lots of enforcement strategies (in SPROCS, in the business app itself, wherever) is not their fault. Ref. the remark you quoted from Toon&Lex book.

    ReplyDelete
    Replies
    1. Then why do you insist that decisions should be based on names such that users must "help the DBMS" when adherence to the POOD would obviate the need for such help?

      Delete
    2. Because I still have no idea what alternatives are available in your proposed data language for identifying the target relation(s) of a given update.

      Or rather, because I have no idea how "adherence to the POOD" can be enforced unless by relying on ... attribute names.

      Delete
    3. I hope you are not asking for syntax -- we are not designing a data language. We are talking formulation of constraints such that the set of values that comprise a candidate tuple will satisfy only one RP (conjunction of constraints). Then the name is just a reference to the RP and if there is only one that is satisfied, then using the wrong name won't insert the tuple in the wrong relation. Duplicate RPs can't prevent it.

      Orthogonality is a design discipline, same as full normalization (excepting normalization to 1NF which can be enforced), as I explained in a previous article.

      I also mentioned a McGoveran (unproven) conjecture that the POFN implies the other 3 principles, not the other way around.





      The semantics

      Delete
    4. No I'm not asking for syntax. But you have eschewed relvar (names) and attribute names as a means to express updates. We can't express tuple content as merely a bunch of values, because a given tuple type might have more than one String attribute or more than one Int, etc.

      Both you [Pascal] and McGoveran expend considerable verbiage telling us what update is not. I've seen nothing saying what it is. Not in enough detail to design a database that is updatable (orthogonally or otherwise).

      I have by now (after a long time of waiting) come to the conclusion the reason so little is forthcoming is because the quest for POOD is like trying to square the circle.

      Delete
    5. I suggest you re-read the article because you apparently did not understand it. See also my last reply to Erwin.

      As to how to handle updates of relational databases given that SET THEORY HAS NO CONCEPT OF VARIABLES (whether you like it or not), you'll have to have patience for David's book. That work is much more profound and difficult than what passes for relational theory these days. What I am offering is just some basic ideas here and there that will be fully developed in the book.

      In the meantime, why don't you refresh your knowledge -- assuming you have some -- of set theory and predicate logic, to verify that my claim is correct.

      That you dk or understand something does not mean it does not hold.

      FYI: Principle of axiomatic independence is basic in axiomatic theory, of which set theory is one. Designing independent base relations such the facts they represent are not derivable is a clear design principle and it was even alluded to by Codd's first 1969 paper when he discussed weak and strong redundancy. I suggest you re-read that too.

      SO please, we're not gonna spoon-feed you on fundamentals -- go educate yourself and spare me.

      Delete
  4. "In order for this to work, however, every RP must be unique -- another way of saying that the database design adheres to the POOD. Erwin's claim -- assumed by many -- is that RPs are unique even if only relation (and attribute) names are distinct".

    I have no idea what could possibly be meant by "every RP must be unique". Perhaps this problem could be resolved once and for all if you could exemplify two RP's that are "not unique".

    If relvar/relation names are used as the identifier for all update and query operations, then uniqueness is indeed guaranteed by virtue of the name having to be able to act as the identifier. I don't see the problem. (You could theoretically have two distinct relvars/relations/tables in a db, or in distinct db's, representing the very same business meaning. That is more a cultural/organizational problem than it is one that is intrinsic to RM, imo.)

    ReplyDelete
    Replies
    1. Insofar as a tuple insertion is concerned, name uniqueness ***means nothing*** to the DBMS -- it's just 2 distinct names x,y. Uniqueness of RP (conjunction of constraints on relation), OTOH, makes a single relation the recipient of the tuple without the DBMS having to understand meaning -- it "just computes" (somebody used this term).

      Delete
  5. "Of course, declarative constraints are only as good as how efficient CPU-wise and concurrency-wise the underlying implementation code that actually validates the constraint during transactions is, which should be the purview of the DBMS designer. This is why the real solution are shorthands like the ones for key and referential constraints, which are more reliable and relieve developers from the drudgery. They would also help vendors: they can develop shorthand-specific validation code, which is much easier to do than accept developer formulated ASSERTIONs, parse and deep-analyze them."

    In this context, maybe I should once more point to my article in the NOCOUG journal on this very subject (NOV 2013 issue). And the one by Toon published in the preceding issue (AUG 2013) to which mine was a response.

    ReplyDelete
    Replies
    1. Are they online? If so, can you provide the links?

      Delete
    2. Toon's : https://nocoug.files.wordpress.com/2014/08/nocoug_journal_201308.pdf

      Mine : https://nocoug.files.wordpress.com/2014/08/nocoug_journal_201311.pdf

      Delete
  6. hmm... all attempts at defining formal systems are doomed to be incomplete. Doesn't mean we try to define all the rules and boundaries, etc., but it doesn't take too many user interactions in said system for holes, inconsistencies and outright breaks in said system can get found out by simply blind and stupid user interaction.

    "you're not holding it right"...

    ReplyDelete
    Replies
    1. And your point is? That systems not fomally defined are better, or at least as reliable as formal ones?

      One of the points of a well-designed formal system is limit as much as possible "blind and stupid interaction" without affecting the meaningful functionality.

      Delete