ON DATABASE DESIGN
with Fabian Pascal

 



p class=BodyCopy> 

From: BF

To: Editor

 

I have a question regarding the use of surrogate keys and its effect on the ability to enforce referential integrity in a relational database using Microsoft SQL Server. The quickest way I can describe my question is with an example:

 

1.      A Company has one or more Employees

2.      A Company has one or more retirement Plans

3.      Employees are Participants in one or more Plans offered by their Company

 

Using surrogate keys the database looks like this:

 

Company: CompanyID,CompanyName 
Plan: PlanID,CompanyID(FK),PlanName
Employee: EmployeeID,CompanyID(FK),SSN 
Participant: ParticipantID,PlanID(FK),EmployeeID(FK)

 

My problem is this: I can declaratively enforce the fact that a Participant references a valid Plan and a valid Employee, but I cannot enforce the fact that an Employee can only participate in a Plan offered by the Company he/she works for! The only way (using Microsoft SQL Server) I can enforce this fact is by using triggers and writing code. I realize that this problem is not a shortcoming of the relational model or even surrogate keys, but a shortcoming of the product, but assuming that limitation...

 

I have another way to structure the database that solves this type of problem:

 

Company: CompanyID,CompanyName 
Plan: CompanyID,PlanID,PlanName 
Employee: CompanyID,EmployeeID,SSN 
Participant: CompanyID,PlanID,EmployeeID

 

Now I can declaratively enforce that a participant references an Employee and Plan for the same Company using foreign keys and the problem is solved. Note that ID fields are assigned incrementally within the parent so for example, the Plan table might look like this:

 

CompanyID PlanID PlanName

==================================== 
1         1      XYZ Executive Plan 
1         2      XYZ Directors Plan 
2         1      ABC Executive Plan 
2         1      ABC Directors Plan 

====================================

 

There are at least two other benefits (over standard surrogate keys) to my approach that I can see:

 

1        It is much easier to clone "objects" such a Company or Plan by copying "owned" rows and changing the foreign key ID to the cloned ID.

2        For large tables that need to be summed, data for a given Company, Plan, Employee, Account etc. is naturally clustered on its primary key.

 

The biggest problem I see with this approach is that if/when a design mistake is made defining the primary key for some entity, that mistake is propagated throughout the database in the foreign keys. So my question is this: Is there anything fundamentally wrong with my approach and is it a wise tradeoff over using triggers to enforce referential integrity?
 
 

From: Fabian Pascal
To: BF

 

Specific design advice via email is a bad idea and I urge you to be very careful with any advice offered this way. I usually refrain from it because it always requires more knowledge about the business reality than I possess and acquiring the necessary knowledge to be able to make intelligent suggestions requires time and effort that are difficult to justify.

 

I can offer only general comments:

 

Based on your three business rules, establish your entity types and the relationships between the entities of those types correctly, then map the entity types and M:M relationships to tables and the 1:M relationships to FKs. This process should result in the correct integrity constraints, including referential ones.

 

Entity types that do not have natural keys and require surrogate keys often tend to indicate that they were improperly defined and need be reconsidered.

 

The way you consider different tables suggests that you have not derived your tables from your entity types, but you are trying to design them to satisfy some key issues, which may lead you astray.

 

Check out the chapter on keys in my book, it may be of help.
 
 

From: BF 
 

Thanks for your quick reply. I understand you cannot give specific advice and I thank you for your general comments. I don't think the designs I suggested REQUIRE surrogate keys--I was just trying to follow something I saw in one of Chris' books suggesting the surrogates are Ok and have many benefits.

 

I am trying to figure out your comment regarding properly identifying entity types and the relationships. Are you suggesting that it is possible for this type of referential integrity problem to be declaratively enforced within the limits of Microsoft SQL Server and using surrogate keys? Or are you suggesting that the "correct" solution will force me to use composite keys where necessary.
 
 

From: Fabian Pascal
 

I was stating the general principle of selecting keys: you use surrogates only when there are no natural identifiers, or there are composite keys which complicate access and referential integrity.

 

I am not familiar with specific products' support of keys and referential integrity, so I cannot speak to that, although I know enough about vendors' attitude towards fundamentals to be reasonably sure that support is usually weak, incomplete, or too complicated. I am only telling you what are the database principles involved. You have to figure out whether the product does or does not support them, and how fully or correctly. If they don't, you will probably have to bastardize design or use application code to work around them.

 

I recommend you read the chapter on keys in my book, figure out the correct design and then whether the product support that or not.
 
 

Fabian Pascal Responds: Starting with table examples is quickest, but surest way to be led astray. You must formulate the business rules precisely, clearly and completely before you design any tables. That means, informally, establishing your entity types and the relationships between them (formally, specifying the table predicates). Otherwise table examples are meaningless, because, essentially, you have not determined what exactly you want them to represent in the database (see chapters 5 and 8 in PRACTICAL ISSUES IN DATABASE MANAGEMENT). This is why I am reluctant to give remote design advice via email -- the chance of figuring out from emails what is to be represented is very slim. 

The business rules provided are a case in point:

 

·   The specification of the relationship between Companies and Plans is incomplete: you don't say how many plans per company. I am assuming one or more, which means the relationship is M:M.

·   Rule 3 is ambiguous: do you mean (a) each employee participates in one or more company plans, or (b) each employee participates in one company plan? In both cases "Employees are Participants in one or more Plans offered by their Company".
 

Based on my assumption 1 there are three entity types: Company, Plan and Employee. Rule 3 establishes an associative entity type that I shall call Company-Plan. In case (a), the relationship between employees and company-plans is M:M, which establishes another associative entity type that I shall call Participant. The business (conceptual) model is shown in Figure 1. 

 

Figure 1: The Business Model

 

 

Entity types map to tables and 1:M relationships map to referential constraints. Thus, the logical design is (PKs bold, FKs underlined):

 

COMPANIES: {CompanyID,CompanyName}
EMPLOYEES: {EmployeeID,CompanyID}
PLANS: {PlanID,PlanName}
COMPANY-PLANS: {CPlanID,PlanID,CompanyID}
PARTICIPANTS: {EmployeeID,CPlanID}

 

I have defined a simple surrogate key CPlanID for the COMPANY-PLANS table in order to simplify the enforcement of the referential constraint highlighted in red, which a composite key would complicate (see chapter on keys in PRACTICAL ISSUES IN DATABASE MANAGEMENT and Date's article on composite keys in his RELATIONAL DATABASE WRITINGS 1989-91).

 

In case (b), where each employee participates in only one company plan, the relationship between employees and company plans is M:1 and there is no entity type associating them. The design can be simplified to only four tables:

 

COMPANIES: {CompanyID,CompanyName}
PLANS: {PlanID,PlanName}
COMPANY-PLANS: {CPlanID,PlanID,CompanyID}
EMPLOYEES: {EmployeeID,CompanyID,CPlanID}

 

Note: Under certain special data administration circumstances it may be possible to further reduce the design to three tables by unifying COMPANY-PLANS and EMPLOYEES into one table, which would require the exercise of care.

 

 

Posted 06/28/02

 

 

 

[ABOUT] [QUOTES] [LINKS]