Sunday, June 17, 2018

Foreign Keys Part 2: Beware of Misconceptions

Note: This is the second part of a multipart re-write of several older posts to bring them into line with the McGoveran formalization and re-interpretation of Codd's real RDM, including revisions, refinements, and extesions of his own[1].

(Continued from Part 1)

Part 1 started with an online exchange triggered by the question “Do I Have to Use Foreign Keys? If I am already manipulating data properly, are foreign keys required? Do they have another purpose that I’m just not aware of?” Both the question and the replies exhibit misconceptions about FKs (there are misconceptions about almost everything in the RDM[2]) rooted in lack of foundation knowledge, so we provided some FK fundamentals. We are now in a position to debunk the replies.

Sunday, June 10, 2018

Foreign Keys Part 1: Understanding the Fundamentals

Note: This is the first part of a two-part re-write of several earlier posts, to bring them into line with McGoveran's formalization and re-interpretation of Codd's true RDM, which includes his own corrections, refinements and extensions[1]. For a more in-depth treatment see the series of papers available here.

“Do I Have to Use Foreign Keys? If I am already manipulating data properly, are foreign keys required? Do they have another purpose that I’m just not aware of? I appreciate the guidance!”
“... [we] wish to make a point. There is something which is bad design/good design/mandatory/optional. Please stop insisting that Primary and Foreign keys are mandatory. They are good design habits but by no means mandatory. However, life is much more complex than a Normalized DB structure. This includes tables serving as event logs; tables, serving as User maintained materialized query tables, tables, serving as supporting structures, reflecting state of complex transactional databases; persistent tables serving as Result Set or Session keepers. And I personally believe that if they were truly mandatory, Sybase, Oracle, SQL Server, Ingres, DB2, etc. would require them. Oh, sorry, forgot the SQL standard itself. This is not the relational model we're talking about. These are commercially available RDBMSs which, not surprisingly, DO tend to listen to their customers. If they didn't, they wouldn't be in business!! Since Sybrand is unlikely to get FKs required by the SQL standard or the major RDBMS vendors, it seems that mandatory means that his answer to the question "Do I have to use foreign keys?" is "You would if you worked in my shop!". I'm inclined to agree with that.”
“Databases can work with or without primary keys and foreign keys. The choice is yours... However ... enforcing referential integrity can be done by many methods ... TMTOWTDT = There is more than one way to do this ... It all depends on your approach... In the last ten years... every one is enforcing referential integrity with help of primary and foreign keys but before this ... a lot of applications were working without primary and foreign keys to enforce referential integrity and to avoid orphaned rows/avoid duplicate records.”
“We don't have every possible logical relationship enforced by the database. Sometimes you have to compromise for performance reasons, as too many foreign key validations can slow down high volume inserts. Other times you have to create breakpoints just to keep the web of relationships from becoming too tangled and connecting hundreds or thousands of tables.”
“I think it is preferable to have FK constraints as an additional security layer and they can be disable[d] during loading if required; however, you need to be 100% certain your ETL is enforcing the constraints. It is best to do both - have the ETL reject records which fail FK checks and report on these whilst also enforcing FK intergrity on insert/update, if appropriate. The only additional thing I can add is - when you delete from a FK enabled DB, make sure you do it in the correct order.”
“I think, you have to learn about data structures and logical data design (not only database, which is nowadays interpreted mainly as only RDBMS), to be clear about usage primary, alternate, and foreign keys, normal forms, data integrity-and database integrity, because your database will work suboptimally without these knowledge if it will work at all.”
Data practitioners have high levels of tool knowledge, but lack a good grasp of fundamentals, for which reason they cannot be considered data professionals. Now, do not get me wrong: I do not mean that good knowledge of tools is unimportant -- if you work with them you gotta know them real well -- but the ability to fully assess them, use them optimally, and compensate for any shortcomings is limited in the absence of foundation knowledge. So let's have some before tackling the exchange.

Friday, June 8, 2018

DBDebunk on Social Media

I have deleted my Facebook account and am focusing my presence on social media on two Twitter pages, DBDebunk, and The PostWest.


On DBDebunk  I will tweet links to new posts on this site, To Laugh or Cry? and What's Wrong with This Picture? that I am bringing back there, and occasional links to and comments on items of interest. I have created a #RelModel hashtag for use with it.

On ThePostWest I will tweet links to evidence for, and my take on (1) Dystopian Western Decadence and Decline, including (2) the equivalence of The Only Acceptable Racism Left and The Weaponized Myth of a "Palestinian Nation". I will create a #ThePostWest later on for use with it.

Please make a note of it, disseminate, and follow.

Saturday, June 2, 2018

Relationships and the RDM Part 3: Beware the Misconceptions

Note: This is the last part of a three-part re-write of two earlier posts, to bring them into line with McGoveran's formalization and re-interpretation[1] of Codd's true RDM, including his own corrections, refinements and extensions. For an in-depth treatment see [2,3].

(Continued from Part 2)

I started Part 1 with a bunch of comments revealing misconceptions (among the many rampant in the industry[4]) about RDM poor support of relationships, if any. Debunking them requires foundation knowledge that is lacking in the industry, so in Part 1 I documented the types of relationships that need to be supported by the RDM, and in Part 2 I showed that they are supported via relational integrity constraints by the RDM (though not necessarily by SQL DBMSs that are confused with RDBMSs), and pinpointed factors that, in the absence of foundation knowledge, contribute to the misconceptions.

Armed with this information I will now debunk the comments with which I started. They all exhibit confusion of levels of representation -- conceptual-logical conflation (CLC) and/or logical-physical confusion (LPC)[5] -- a common source of problems in database practice.