Tuesday, September 18, 2018

Don't Conflate/Confuse Primary Keys, PK Constraints, and Indexes




“What is the difference between an index and a key? How are they related?”

“There seams to be some confusion between what a Primary Key is, and what an Index is and how they are used. The Primary Key is a logical object. By that I mean that is simply defines a set of properties on one column or a set of columns to require that the columns which make up the primary key are unique and that none of them are null. Because they are unique and not null, these values (or value if your primary key is a single column) can then be used to identify a single row in the table every time. In most if not all database platforms the Primary Key will have an index created on it. An index on the other hand doesn’t define uniqueness. An index is used to more quickly find rows in the table based on the values which are part of the index. When you create an index within the database, you are creating a physical object which is being saved to disk.”

“A primary key by default creates a clustered index. A unique constraint/key by default creates a non-clustered index.”

“An index is a (logically) ordered list of rows. For example, an index on LastName means all values are already sorted in LastName order. Usually index rows contain far fewer columns in them than the table itself (except the clustered index, which is the table). A key is a column or columns that defines the order of an index. For example, on an index ordered by (LastName,FirstName), then LastName and FirstName are the keys. Btw, a primary key is a physical object, not a logical one. The db engine needs physical rows in order to insure unique values in the index.”
--Difference between an index and a key?, SQLTeam.com
I have recently published a paper[1], and posted a multipart series[2] on relational keys. In the latter I stated as follows:
"As a relational feature, keys can only be properly understood within the formal foundation of the RDM, which is simple set theory (SST) expressible in first order predicate logic (FOPL) adapted and applied to database management. Yet that is precisely what is ignored and dismissed in the industry -- including by the authors of SQL[3]."
I have also written extensively on widespread logical-physical confusion (LPC)[4], recently specifically in the key-index context[5]. The replies above are examples -- if any more were needed -- that validate my repeated claim of lack of foundation knowledge in the industry -- can you tell what's wrong with, and what's correct in, them?

------------------------------------------------------------------------------------------------------------------
SUPPORT THIS SITE 

I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so let's take advantage of his generosity. Purchasing my papers and books will also help. Thank you. 

NEW PUBLICATIONS 

NEW: The Key to Relational Keys: A New Perspective

NEW: SOCIAL MEDIA 

I deleted my Facebook account. You can follow me on Twitter:

  • @dbdebunk: will contain links to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture, which I am bringing back.
  • @ThePostWest: will contain evidence for, and my take on the spike in Anti-semitism that usually accompanies existential crises. The current one is due to the decadent decline of the West and the corresponding breakdown of the world order.
HOUSEKEEPING

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 
------------------------------------------------------------------------------------------------------------------


Names and Primary Keys, Relationships and Constraints


The first reply is an example of what I usually refer to as "heart in the right place", but insufficient grasp of data fundamentals results, as usual, in failure to express things clearly and correctly[6].

First, there is the common confusion of database relations, tuples and attributes with their presentation as tables, columns, and rows on some physical medium (paper, screen)[7];

Second, a PK is a collection of one or more attributes that represents formally in the database a name assigned to an entity type. Its values represent names of specific entities of that type, and there is a 1:1 relationship between the name of an entity and the value of each of its properties that is not a name[1]. This relationship is one of the third order properties (3OP) of the entity group represented by the relation in the database, and is represented by a functional dependency (FD) constraint on the relation -- for each PK value there is exactly one value of each non-key attribute[8] -- often referred to as entity integrity;

Third, uniqueness is a relationship among all the members of the entity group, another 3OP of the group, which is represented by a uniqueness constraint on the relation, commonly referred to as a PK constraint (not by a "property on the PK attribute(s)"). Note that (1) a PK (attribute) is distinct from a PK constraint and (2) by enforcing the PK constraint, a DBMS implicitly also enforces the FD constraint[8,9];

Fourth, because they violate two-valued logic (2VL)[10] and the Closed World Assumption (CWA)[11], the core Information Principle (IP) of the RDM[12] prohibits missing data. Thus, NULL is a non-relational SQL feature, and a NOT NULL constraint would be superfluous in a relational database managed by a true RDBMS[13].


PKs and Indexes


It follows that the second and third replies are examples of LPC --  practitioners commonly conflate a PK -- component of the logical model -- and the index on it -- component of the physical implementation[14]. PKs are most often used to access data and are normally indexed for performance. This common PK constraint implementation renders the constraint and PK dependent on the index (dropping the latter also drops the former two), and induces the misconception that they are one and the same thing, or conceptually related in some way ("a key is a column or columns that defines the order of an index").

Note: Points arising:

  • Actually, (Lastname,Firstname) is the PK, not "keys";
  • The definition of a PK requires a uniqueness constraint on the relation, which is implemented with an index on the PK;
  • The dependence of PKs on indexes violates physical independence (PI)[15] and the RDM. This does not mean that PKs should not be indexed for performance, but that a logical feature should not be dependent on a specific physical implementation feature.
  • I have no idea what "the db engine needs physical rows in order to insure unique values in the index" means.

Proper knowledge of the RDM would prevent all of this.


References

[1] Pascal, F., The Key to Relational Keys: A New Understanding.

[2] Pascal, F., A New Understanding of Keys Parts 1,2,3.

[3] Darwen, H., Why Are There No Relational DBMSs.

[4] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion.

[5] Pascal, F., No Such Thing As "Primary Key Tuning".

[6] [8] Pascal, F., THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS - A DESK REFERENCE FOR THE THINKING DATA PROFESSIONAL AND USER.

[7] Pascal, F., Understanding Relations Part 1: Tables? So What?

[8] Pascal, F., Relationships and the RDM Parts 1,2,3.

[9] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLKS, forthcoming.

[10] Pascal, F., The Final NULL in the Coffin: A Relational Solution to Missing Data.

[11] Pascal, F., The Clouding Syndrome.

[12] Pascal, F., Interpreting Codd: The 12 Rules.

[13] Pascal, F., "NULL Value" is a Contradiction in Terms.

[14] Pascal, F., Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation.

[15] Pascal, F., Physical Independence Parts 1,2.





Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.




No comments:

Post a Comment

View My Stats