Sunday, June 17, 2018

Foreign Keys Part 2: Beware of Misconceptions




Note: This is the second part of a multipart re-write of several older posts to bring them into line with the McGoveran formalization and re-interpretation of Codd's real RDM, including revisions, refinements, and extesions of his own[1].

(Continued from Part 1)

Part 1 started with an online exchange triggered by the question “Do I Have to Use Foreign Keys? If I am already manipulating data properly, are foreign keys required? Do they have another purpose that I’m just not aware of?” Both the question and the replies exhibit misconceptions about FKs (there are misconceptions about almost everything in the RDM[2]) rooted in lack of foundation knowledge, so we provided some FK fundamentals. We are now in a position to debunk the replies.



------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE 

I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so let's take advantage of his generosity. Purchasing my papers and books will also help. Thank you. 

NEW PUBLICATIONS 

NEW: SOCIAL MEDIA 

I deleted my Facebook account. You can follow me on Twitter:


The DBDebunk page will contain links to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture, which I am bringing back. I created a #RelModel hashtag for this purpose.

The PostWest pages will contain links to evidence for, and my take on Dystopian Western Decadence, The Only Acceptable Racism Left, and The Weaponized Myth of a "Palestinian Nation" for which the anti-semitic world fell lock stock and barrel. I will create a #PostWest hash tag for this purpose.

HOUSEKEEPING

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 
------------------------------------------------------------------------------------------------------------------
“... [we] wish to make a point. There is something which is good/bad/design, and mandatory/optional. Please stop insisting that Primary and Foreign keys are mandatory. They are good design habits but by no means mandatory. However, life is much more complex than a Normalized DB structure. This includes tables serving as event logs; tables, serving as User maintained materialized query tables, tables, serving as supporting structures, reflecting state of complex transactional databases; persistent tables serving as Result Set or Session keepers.”

“And I personally believe that if [FKs] were truly mandatory, Sybase, Oracle, SQL Server, Ingres, DB2, etc. would require them. Oh, sorry, forgot the SQL standard itself. This is not the relational model we're talking about. These are commercially available RDBMSs which, not surprisingly, DO tend to listen to their customers. If they didn't, they wouldn't be in business!! Since Sybrand is unlikely to get FKs required by the SQL standard or the major RDBMS vendors, it seems that mandatory means that his answer to the question "Do I have to use foreign keys?" is "You would if you worked in my shop!". I'm inclined to agree with that.”

As we have explained in Part 1, FKs are part of enforcement of referential constraints among the relations that represent in the database related entity groups forming a multi-group. If the constraints are not declared to, and enforced by the DBMS, model representation is incomplete (the relationships among the groups are unknown to the DBMS) and, consequently, data integrity is at risk (see [3,4,5] for more in-depth treatment). It is left to the reader to judge whether this is an option.

Note: Relational databases consist of relations, not tables (which are just one way to visualize relations on some physical medium), and confusing the two is also rooted in lack of foundation knowledge[6]. Database user relations are in 5NF by definition[7] -- they represent entity groups with members of the same type. Otherwise, they are not relations, databases are not relational, and the advantages conferred on relational databases by the formal logic and mathematics underlying the RDM, semantic correctness and system-guaranteed logical validity of query results, are lost[8]. There are actually three core principles of relational database design, and according to an (yet unproven) McGoveran conjecture, if they are all adhered to, databases will comply with the Principle of Full Normalization (POFN), but not the other way around[9]. Any segment of reality, no matter how complex, can be represented by a 5NF database.

The second paragraph is upside down and backwards. Instead of holding products to sound theoretical foundations such as the RDM,  in the absence of knowledge and appreciation of those foundations, practitioners assume that products support them. But, first, vendors are equally ignorant of those foundations, so this assumption is not justified. Second, even if vendors were knowledgeable, why should they bother to develop true and full RDBMSs, if users confuse them with SQL DBMSs?

“Databases can work with or without primary keys and foreign keys. The choice is yours... However ... enforcing referential integrity can be done by many methods ... TMTOWTDT = There is more than one way to do this ... It all depends on your approach... In the last ten years... every one is enforcing referential integrity with help of primary and foreign keys but before this ... a lot of applications were working without primary and foreign keys to enforce referential integrity and to avoid orphaned rows/avoid duplicate records.”

This comment seems to arise from a conflation of the simplistic notion of a "database" with a shared database founded on a logical data model and implemented by a shared DBMS. It is fundamentally impossible to achieve consistency and concurrency using application code which, of necessity, has a limited view of concurrent activities.  Are databases without guaranteed consistency, concurrency control, data integrity, and result correctness "working"?  Is this really the choice you intend to make by renouncing constraints? Those not old enough to have experienced application-enforced integrity are unaware that it is the hopeless difficulties of that approach that compelled the industry to adopt DBMS-enforced integrity, referential included.

Note: Some of the costs of application enforced integrity, which is so prohibitive that is almost never undertaken, are database inconsistency, logical invalidity and semantic incorrectness of query results, hidden information inaccessible to the DBMS, or applications, multiple sources of corruption, burden of checks and cleanups, development/maintenance burden, inhibited performance optimization.

“We don't have every possible logical relationship enforced by the database. Sometimes you have to compromise for performance reasons, as too many foreign key validations can slow down high volume inserts. Other times you have to create breakpoints just to keep the web of relationships from becoming too tangled and connecting hundreds or thousands of tables.”
In Part 1 we identified several types of relationships that the RDM supports, and without enforcement of the constraints corresponding to them there would really be no databases to speak of -- practice would revert to pre-database application-based, but rarely enforced integrity, the problem database management was introduced to address. Practitioners should understand that (1) what they really need are true RDBMSs implemented to perform well without giving up integrity and (2) they cannot expect them if they consider and accept SQL DBMSs and trading off integrity for performance as the solution.
“I think it is preferable to have FK constraints as an additional security layer and they can be disable[d] during loading if required; however, you need to be 100% certain your ETL is enforcing the constraints. It is best to do both -- have the ETL reject records which fail FK checks and report on these whilst also enforcing FK intergrity on insert/update, if appropriate. The only additional thing I can add is -- when you delete from a FK enabled DB, make sure you do it in the correct order.”
The "ETL layer" is an application. With properly enforced referential constraints, violating tuples are rejected by the DBMS and there is nothing to report, which renders application enforcement unnecessary, so why defeat the purpose by bringing back application-enforced integrity? Particularly when performance is a concern (and SQL DBMS' support of shorthands for key constraints).
“I think, you have to learn about data structures and logical data design (not only database, which is nowadays interpreted mainly as only RDBMS), to be clear about usage primary, alternate, and foreign keys, normal forms, data integrity-and database integrity, because your database will work suboptimally without these knowledge if it will work at all.”
I don't know what the difference between "data" and "database" design is, but amen to the rest.


References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F., THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS.

[3] Pascal, F., Relationships and the RDM Parts 1,2,3.

[4] Pascal, F., Conceptual Modeling for Database Design, forthcoming.

[5] Pascal, F., Database Design: Formalizing the Informal, forthcoming.

[6] Pascal, F., Flat Tables and Multidimensional Relations.

[7] Pascal, F., What Relations Really Are and Why They Are Important.

[8] Pascal, F., Object Orientation, Logical Database Design, Logical Validity, and Semantic Correctness.

[9] Pascal, F., Database Design: What It Is and Isn't.


 

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.





No comments:

Post a Comment