"My understanding has always been that a primary key should be immutable, and my searching since reading this answer has only provided answers which reflect the same as a best practice. Under what circumstances would a primary key value need to be altered after the record is created?"This is, a more detailed treatment of keys undertaken in:
"When a primary key is chosen that is not immutable?" --Why should a primary key change, StackExchange.com
 On Kinds of Keys: Natural, Primary and Surrogate Are Sufficientwhich I have just revised. It is the result of several discussions I had with David McGoveran as part of my familiarization with his formal interpretation of Codd's relational data model (RDM), which he'll introduce in his forthcoming book LOGIC FOR SERIOUS DATABASE FOLKS. My own forthcoming book--THE DBDEBUNK GUIDE TO FUNDAMENTAL DATA MANAGEMENT MISCONCEPTIONS--incorporates some of his thoughts in an informal form accessible to the thinking data professional. As you shall see, it diverges in some important ways from the current understanding of the RDM (well, whatever little understanding there is).
 Surrogate Key Illusions
 Duplicates: Stating the Same Fact More Than Once Doesn't Make It Truer, Only Redundant
Entity-Defining Identifiers and Natural KeysRecall that a database relation represents a set of (facts about) property-sharing entities in the real world. A property is an observable/measurable set of values obtained by a well-defined procedure (e.g., fingerprinting, weighing, checking documents). Entities are distinguishable by definition--otherwise we could not tell them apart, or even count. They have an entity-defining identifier (EDI)--a set of inherent defining properties by which we recognize them--whether explicitly or implicitly--that uniquely identifies them and distinguishes each from the others (e.g., fingerprints for people, ownership and a set of books for companies and so on).
Natural keys (NK) are relation attributes at the logical level that represent EDI's at the conceptual level.
"If there exists in the real world a naturally occurring set of entity-defining properties thatNK's are so labeled because they represents inherent entity properties. Immutability is an assumption based on belief (from evidence, not whim) about the real world--(1) that the EDI properties will be stable during the intended life of the conceptual and corresponding logical model and, therefore, the database and applications that use it and (2) there is no evidence to the contrary--belief which is incorporated in the conceptual model. We can be wrong and if we are, there is no way around database redesign (see below). Otherwise put, NK's change only if and when the entities themselves change.
that property set is represented by a natural key (NK) in the database." --D. McGoveran
- Distinguishes each entity from the others and uniquely identifies them (i.e., the properties have unique values in combination);
- Is immutable (i.e., can be used repeatedly to identify an entity for the intended life of the conceptual model in the context of the universe of discourse)
Because they represent real world properties, NK's are used for meaningful logical data access.
Primary KeysCodd mandated a primary key (PK) for each relation as the designated identifier of entities and--where appropriate--target for references by foreign keys (FK) of other relations. When there is more than one candidate key (CK), one would be selected as PK, or a PK would be created.
The criteria for choosing a PK are both pragmatic--simplicity,familiarity/naturalness (the DBMS has to do a lot more work and use a lot more storage to translate among multiple CK's)--and formal-- stability and uniqueness.
Codd's seems to have thought of PK's as what he later called "permanent surrogates for entities", but he failed to make clear the formal motivation for them.
Every base tuple corresponds to some entity (possibly abstract) that is identifiable in the real world. In set theory, entity identity is determined by testing for properties and their values (observation/measurements in the real world). A unique set of properties having specific values determines a specific entity--the defining properties for that type of entity.
To make simple set theory (and the RDM) applicable to the real world, we assume some properties identify an entity permanently while others (e.g., location) change over time. This is the only basis for the claim that an entity observed at one time is the same as an entity observed at another time. Without that, simple set theory and the RDM--which relies on it as its foundation--simply cannot be applied to the real world, nor can first order predicate logic (FOPL), as it requires a predicate variable over which predicates range and in FOPL as it appears in RDM that is a tuple variable. Modeling is not possible without permanent identifiers.
Codd did recognize that users might have to change the values of "user controlled keys" (e.g., companies merging) and his own relational data language ALPHA allowed such updates via delete and insert. In the 1979 paper he discussed this issue and introduced system assigned surrogate keys (SSK) that, unlike those promoted in the industry
- would not change in any way visible to the user;
- if changed by the system, "tuple identity" (i.e., correspondence to some unique entity in the "real world") would be preserved.
He also introduced a "coalesce" operator to handle the case of two entities turning out to be one. --David McGoveran
Entity Names and Surrogate Keys
Consider SSN's. They were introduced as a simple numeric name to stand in for a complex set of properties that the SSA used to classify US citizens and alien residents as eligible to earn income in the US (i.e., it had some well-defined procedure by which it associated a SSN value with a set of distinct value of the EDI consisting of those properties). Although in time the SSN became a recognizable identifier, it's not the SSN that defines US eligible citizens and residents--the set of properties for which it stands do--it is just a set of simple artificial entity names that substitutes for them. Very often when we encounter multiple CK's for a relation, it is because the entity type it represents has multiple names. Such assigned names are represented by surrogate keys (SK) in the database.
"A SK assigns an entirely artificial name or label to every entity, such thatNote: SK's are often deployed for the wrong reasons (e.g., to emulate object-id's, or to improve performance). In the former case they serve no function. In the latter case logical design is contaminated with physical implementation considerations--a violation of physical independence (PI)--and can actually defeat the purpose. SQL DBMS's implement keys by indexing and each SK adds not only data, but also an index that must be managed. Arbitrary SK's cannot be used for meaningful logical data access and if they are visible to users, the need to translate to and from a meaningful/familiar key imposes joins that would otherwise be unnecessary. Query formulation is more complex and DBMS optimization is inhibited. Ami Levin warns that "magical performance improvement myths that are commonly associated with SK's should, like all myths, not be taken for granted and be critically investigated. In many cases it is the NK--if a simple and stable one can be identified--that wins in terms of performance, consistency, and maintenability" .
There are three cases in which a SK might be used:
- There is a well-defined procedure that ensures a 1:1 relationship between SK values and NK values (however abstract, complex, or difficult to measure/observe the NK values may be);
- It is defined such that all attributes representing descriptive entity properties are functionally dependent on it.
In all these cases, once we identify a SK as the PK, it is just a synonym for any other entity name.
- As a substitute for a (possibly complex) NK;
- As a substitute for other, pre-existing names for the entity;
- As a substitute for an unspecified NK (e.g., when we distinguish entities without knowing exactly the properties we use implicitly for that purpose).
If the values of the defining properties represented by the presumed NK change within the intended life of the conceptual model, but still serve to uniquely identify the entity in the real world, the "changing NK" fails to capture this identification. If a SK is added to the relation and the properties represented by the presumed NK are treated as ordinary attributes, the SK enables tracking the relationship between the old and new NK property values, while recognizing that only one entity is being identified. NK's simply cannot suffice for this situation.
If entities themselves mutate (e.g., companies involved in mergers, acquisitions, spinoffs), NK's (which represent inherent entity defining properties) are almost certain to fail. In this case, the conceptual model does no longer represent reality faithfully and the database needs to be redesigned. That said, SK's can track how entities are related over time (entities combine to form a new entity, an entity divides into multiple entities, an entity is created or destroyed, etc).
If an EDI cannot be identified in the real world (e.g., clouds, sub-atomic particles, anonymous poems), the assignment of SK values to entities is inherently arbitrary and they are then just as good as the ability to somehow recognize individual entities implicitly within a class systematically." --D. McGoveran
- represent EDI's;
- are used for meaningful logical data access.
- as designated entity identifiers;
- referential targets.
- serve as simple and stable PK's;
- that help track changes when immutability assumptions fail, or entities mutate.
Do you like this post? Please link back to this article by copying one of the codes below.URL: HTML link code: BB (forum) link code: