Sunday, November 25, 2012

Databases Representing ... What?

A database is the formal representation (logical model) of an informal business model via the relational data model. Informal business models have meaning understood only by humans: computer systems do not and cannot know what entities, properties, attributes, classes and business rules are. DBMSs "understand" only their formal (logical) database representations--domains, R-tables, integrity constraints--and even that understanding is algorithmic, not semantic. It is, therefore, impossible to either design a database, or assess accuracy of representation without detailed knowledge and understanding of the informal reality that the database is supposed to represent.

A request for design advice that presents one or more tables without sufficient information about what they represent in the real world should induce both doubts about their design and great care in offering advice based on assumptions and guessing. A good reference (like Business Modeling for Database Design, for example) would have a much more beneficial effect.

Take the following example:
I am working on Data Modelling [sic] for a database (DB2). I need help on deciding keys for the table.

Table T1 has columns Col1 Col2 Col3 Col4 Col5, etc. The natural key is a composite key (Col1, Col2, Col3, Col4). In most of the scenarios, Col1,2 and 3 can be the same, and they can be distinguished apart by Col4. So in order to normalize the table, I split the tables as

Table T1 --> Col1,Col2, Col3 P.Key-->Surrogate Key, make the combination unique(Unique Key/index)
Table T2--> T1.Surrogate KEy, Col4, Col5, etc P.Key-->(T1.SK, Col4)

My questions are

1: Is Surrogate Keys better [sic] than Natural PK(Composite PK) in this scenario?

2: Will index rebuilding/maintenance become a major issue, because this is a transactional table, and will be hit a lot of times during the day?

3: This table will be joined with multiple tables, so will Surrogate keys cause overheads while hashing?

4: Which one is better for joins in this case? Surrogate Key as a PK or composite natural key as PK?

Please advise which is a better approach. Composite PK or Surrogate Key for this scenario.
--primary key- Data Modelling DB2 Index and Surrogate Keys,
Points arising:
  • Without knowledge of what attributes the columns represent, the relationship (dependencies) between those attributes, we don't know what and how many entity classes are there and, therefore, what and how many tables should be in the database.
  • Normalization is a design repair procedure. During the business modeling and database design process we ensure that each entity class is represented by exactly one R-table, which implicitly yields fully normalized designs. It's only if and when multiple entity classes are "bundled" into a single table that explicit normalization is necessary, to avoid several drawbacks that such bundling causes (see The Costly Illusion: Normalization, Integrity and Performance).
Given the information provided, can the asked questions be answered and the correct design determined?

No business model is provided, so how many entity classes are there? The composite natural key exacerbates the uncertainty: is there (a) one associative entity class, or (b) multiple entity classes?

{COL1, COL2, COL3, COL4} is a natural key if and only if it represents an identifier of one entity class--case (a). In that case a two-table design would split one entity class into two tables, neither of which has a natural key, an indication that this does not make sense. Generating the same surrogate key values for the two tables would produce unnecessary redundancy and require joins for no purpose.

As explained in On Kinds of Keys, the four criteria for key selection are
  • familiarity
  • irreducibility
  • simplicity
  • stability
If there is, indeed, only one entity class and one R-table, then a four-column natural key would justify a simple surrogate primary key for referential purposes. But the natural composite key would remain an alternate key to provide familiarity for queries.

Consider now the following exchange in this context: Many-to-many question.

No comments:

Post a Comment