Wednesday, May 3, 2017

The Necessity of Foreign Keys

Last month I showed how poor grasp of data fundamentals makes it difficult to understand different types of key and the necessity of primary keys (PK). There's another type of key that is poorly understood for the same reason: 
"... [we] wish to make a point. There is something which is bad design/good design/mandatory/optional. Please stop insisting that Primary and Foreign keys are mandatory. They are good design habits but by no means mandatory."
Really? FKs were introduced as a relational solution to the problems of hierarchic database technology of the late 60s. 


--------------------------------------------------------------------------------
I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so please take advantage of his generosity. Thanks.
---------------------------------------------------------------------------------

Consider the actual example E.F. Codd used in 1970 -- a four-object hierarchy -- when he introduced the concept:
In the hierarchic databases of the time there would be four "segments" -- record types -- connected physically by pointers that would have to be navigated for data access. For example, SALARYHISTORY data would be accessible only via EMPLOYEE records and pointer navigation through the JOBHISTORY records. In other words, information about relationships between objects of different types was represented by the 'physical organization' of the data. Aside from the complexity, applications accessing data for analytical purposes had to reference them explicitly and any reorganization would impair applications, necessitating prohibitive maintenance.

Relational systems support 'physical independence' (PI) -- .e., applications are insulated from physical details. Data access is based exclusively on information content -- data values -- not machine internals that are unknown and irrelevant to application developers and users. That is why the core relational Information Principle (IP) mandates 'all' information in a relational database -- including about relationships -- be represented in 'only one way': as domain values in relations.

If a relation is substituted for each record type:
JOBHISTORY, SALARYHISTORY and CHILDREN are attributes defined on non-simple domains -- specifically, 'relation-valued domains' (RVD) -- i.e., that have relations as values. As I explained in Data Sublanguages, Programming, and Data Integrity, relations "nested" within relations -- sets of sets -- require logic higher than first-order predicate logic (FOPL), the formal foundation of RDM, which robs relational databases of their advantages and makes them not less complicated than the hierarchic predecessors they replaced.

Consequently, the information about the relationships between objects of different types should be represented by values rather than pointers, in compliance with the IP. If relations are defined only on 'simple domains' with atomic values -- are in what we call their 'normal form' -- FOPL and its advantages are preserved. The RVDs can be converted to relations "in their own right" and the pointers with FKs. A FK is a combination of one or more attributes of one relation that is not the PK, whose values match those of the PK of some other relation:
where JOBHISTORY.MAN# and SALARYHISTORY.JOBDATE are FKs.

Thus, FKs are not just "a good database design habit", but "a user-oriented means" -- as Codd referred to them -- to represent information about relationships between objects of different types in relational databases. The DBMS enforces the matching via a 'referential constraint' on the relations representing the related object types. With relations in normal form and FKs, FOPL and the soundness and simplicity of relational databases are preserved.
Lately graph databases -- of which hierarchic databases are one type -- are reinvented as a superior alternative to relational databases. Data professionals young enough to have not been around the old versions to experience what FKs replaced should check them out (e.g., A few queries on IMS-DB) before they believe any arguments such as the one above. Those who forget the past ...

As to other constraints, stay tuned.



Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

No comments:

Post a Comment