p class=BodyCopy>
From: BF
To: Editor
I have a question regarding the use of surrogate keys and its
effect on the ability to enforce referential integrity in a relational database
using Microsoft SQL Server. The quickest way I can describe my question is with
an example:
1.
A Company has one or more Employees
2.
A Company has one or more retirement Plans
3.
Employees are Participants in one or more Plans offered by
their Company
Using surrogate keys the database looks like this:
Company: CompanyID,CompanyName
Plan: PlanID,CompanyID(FK),PlanName
Employee: EmployeeID,CompanyID(FK),SSN
Participant: ParticipantID,PlanID(FK),EmployeeID(FK)
My problem is this: I can declaratively enforce the fact that
a Participant references a valid Plan and a valid Employee, but I cannot
enforce the fact that an Employee can only participate in a Plan offered by the
Company he/she works for! The only way (using Microsoft SQL Server) I can
enforce this fact is by using triggers and writing code. I realize that this
problem is not a shortcoming of the relational model or even surrogate keys,
but a shortcoming of the product, but assuming that limitation...
I have another way to structure the database that solves this
type of problem:
Company: CompanyID,CompanyName
Plan: CompanyID,PlanID,PlanName
Employee: CompanyID,EmployeeID,SSN
Participant: CompanyID,PlanID,EmployeeID
Now I can declaratively enforce that a participant references
an Employee and Plan for the same Company using foreign keys and the problem is
solved. Note that ID fields are assigned incrementally within the parent so for
example, the Plan table might look like this:
CompanyID PlanID PlanName
====================================
1
1 XYZ Executive Plan
1 2
XYZ Directors Plan
2
1 ABC Executive Plan
2
1 ABC Directors Plan
====================================
There are at least two other benefits (over standard
surrogate keys) to my approach that I can see:
1
It is much easier to clone "objects" such a Company
or Plan by copying "owned" rows and changing the foreign key ID to
the cloned ID.
2
For large tables that need to be summed, data for a given
Company, Plan, Employee, Account etc. is naturally clustered on its primary key.
The biggest problem I see with this approach is that if/when
a design mistake is made defining the primary key for some entity, that mistake
is propagated throughout the database in the foreign keys. So my question is
this: Is there anything fundamentally wrong with my approach and is it a wise
tradeoff over using triggers to enforce referential integrity?
From: Fabian Pascal
To: BF
Specific design advice via email is a bad idea and I urge you
to be very careful with any advice offered this way. I usually refrain from it
because it always requires more knowledge about the business reality than I
possess and acquiring the necessary knowledge to be able to make intelligent
suggestions requires time and effort that are difficult to justify.
I can offer only general comments:
Based on your three business rules, establish your entity
types and the relationships between the entities of those types correctly,
then map the entity types and M:M relationships to tables and the 1:M
relationships to FKs. This process should result in the correct integrity
constraints, including referential ones.
Entity types that do not have natural keys and require surrogate
keys often tend to indicate that they were improperly defined and need be
reconsidered.
The way you consider different tables suggests that you have
not derived your tables from your entity types, but you are trying to design
them to satisfy some key issues, which may lead you astray.
Check out the chapter on keys in my book, it may
be of help.
From: BF
Thanks for your quick reply. I understand you cannot give
specific advice and I thank you for your general comments. I don't think the
designs I suggested REQUIRE surrogate keys--I was just trying to follow
something I saw in one of Chris' books suggesting the surrogates are Ok and
have many benefits.
I am trying to figure out your comment regarding properly
identifying entity types and the relationships. Are you suggesting that it is
possible for this type of referential integrity problem to be declaratively
enforced within the limits of Microsoft SQL Server and using surrogate keys? Or
are you suggesting that the "correct" solution will force me to use
composite keys where necessary.
From: Fabian Pascal
I was stating the general principle of selecting keys: you
use surrogates only when there are no natural identifiers, or there are
composite keys which complicate access and referential integrity.
I am not familiar with specific products' support of keys and
referential integrity, so I cannot speak to that, although I know enough about
vendors' attitude towards fundamentals to be reasonably sure that support is
usually weak, incomplete, or too complicated. I am only telling you what are
the database principles involved. You have to figure out whether the product
does or does not support them, and how fully or correctly. If they don't, you
will probably have to bastardize design or use application code to work around
them.
I recommend you read the chapter on keys in my book, figure
out the correct design and then whether the product support that or not.
Fabian Pascal Responds: Starting with table examples
is quickest, but surest way to be led astray. You must formulate the
business rules precisely, clearly and completely before you design any tables.
That means, informally, establishing your entity types and the relationships
between them (formally, specifying the table predicates). Otherwise table examples
are meaningless, because, essentially, you have not determined what exactly
you want them to represent in the database (see chapters 5 and 8 in PRACTICAL ISSUES
IN DATABASE MANAGEMENT). This is why I am reluctant to give remote
design advice via email -- the chance of figuring out from emails what is to be
represented is very slim.
The business rules provided are a case in point:
·
The specification of the relationship between Companies
and Plans is incomplete: you don't say how many plans per company. I am
assuming one or more, which means the relationship is M:M.
·
Rule 3 is ambiguous: do you mean (a) each employee
participates in one or more company plans, or (b) each employee participates in
one company plan? In both cases "Employees are Participants in one
or more Plans offered by their Company".
Based on my assumption 1 there are three entity types:
Company, Plan and Employee. Rule 3 establishes an associative entity type that I
shall call Company-Plan. In case (a), the relationship between employees and
company-plans is M:M, which establishes another associative entity type that I
shall call Participant. The business (conceptual) model is shown in Figure
1.

Figure 1: The Business Model
Entity types map to tables and 1:M relationships map to
referential constraints. Thus, the logical design is (PKs bold, FKs
underlined):
COMPANIES: {CompanyID,CompanyName}
EMPLOYEES: {EmployeeID,CompanyID}
PLANS: {PlanID,PlanName}
COMPANY-PLANS: {CPlanID,PlanID,CompanyID}
PARTICIPANTS: {EmployeeID,CPlanID}
I have defined a simple surrogate key CPlanID for the
COMPANY-PLANS table in order to simplify the enforcement of the referential
constraint highlighted in red, which a composite key would complicate (see
chapter on keys in PRACTICAL ISSUES IN
DATABASE MANAGEMENT and Date's article on composite keys in his RELATIONAL DATABASE
WRITINGS 1989-91).
In case (b), where each employee participates in only one
company plan, the relationship between employees and company plans is M:1 and
there is no entity type associating them. The design can be simplified to only
four tables:
COMPANIES: {CompanyID,CompanyName}
PLANS: {PlanID,PlanName}
COMPANY-PLANS: {CPlanID,PlanID,CompanyID}
EMPLOYEES: {EmployeeID,CompanyID,CPlanID}
Note: Under certain special data
administration circumstances it may be possible to further reduce the design to
three tables by unifying COMPANY-PLANS and EMPLOYEES into one table, which
would require the exercise of care.
Posted
06/28/02
[ABOUT]
[QUOTES]
[LINKS]