ON PHYSICAL CONSTRAINTS IN SQL
with Fabian Pascal

 

 

  

From: IH
To: Editor

 

I have just stumbled across your site and read a couple of articles. With respect to the article The Logical-Physical Confusion and just to be pedantic, I have to disagree with a couple of points made by Fabian Pascal, although, in general, I find myself in agreement with his views.

 

There appears to be statement that all/most practical R(?)DBMSs are in fact SQL DBMSs, whatever that is, and not RDBMSs. There is then an implication that all(?) such ?DBMSs implement a 1:1 relationship between data tables and physical disk files. I would respectfully suggest that it is very rarely the case. dBase, FoxPro and Paradox, as almost extinct desktop systems use this model, as does mySQL, although that may not yet fully qualify anyway. The others do not, allowing, preferring and in some cases even forcing all tables into a single datafile.

 

If I have understood your point, then logical design involves turning some business requirement/model into a set of normalised tables, while physical design specifies how these are arranged in files and on disks. Good physical database design will have tables, indexes, etc. distributed between files and across disks, to minimise predictable bottlenecks and will happen after the logical design (normalisation, tables, etc.).

Personally I hate the idea of denormalisation and for every instance there must be a procedure checking on the lost integrity, however, the claim that denormalisation can improve performance should be verifiable; observed read performance that is. Consider a simple case of a person having several telephone numbers. This has then been normalised into a person table and a numbers table. To retrieve the name and all numbers from the normalised tables, whether in 1 data file or 2, the system must either scan table 1 or lookup an index for table 1 and then read table 1. Having extracted the required text (name) from the data blocks, the system then either reads all of table 2, or reads an index of table 2 and then reads the appropriate data.

 

Each time a read is made, the disk heads are moved and a block of data is read. This data is at least the size of an OS block, usually 512B or 1024B or a database block, which may be up to 32/64k. Denormalising to a single table means that all of the data is read from table 1, because it is contiguous, so the system has no need to read table 2 at all. This is the point, not the number of datafiles, but the separation of the data. Disk performance is many times better than it used to be, but is still one of the slowest links in the chain, so cutting it in half should produce a benefit.

 

The system will need to spend time checking on the lost integrity, which may impinge on the general performance of the database.

 

The use of complex data types within a RDBMS can/may be a way of implementing "denormalisation", without the loss of integrity, if done carefully and if the RDBMS implements it adequately. This, however, requires more careful attention to detail, not less.

 

 

Fabian Pascal Responds: Thanks for making me aware that with respect to physical implementation of SQL RDBMSs, I did not state properly what I meant to say. I should have said physical records instead of files. SQL implementations maintain a 1:1 relationship between logical rows and physical records. That means that there is an inherent physical order for rows and columns on which SQL relies and, thus, must be preserved by the DBMS, limiting optimization options. Relieving DBMSs from this constraint would enable them to optimize performance in ways that are currently not possible with SQL DBMSs (and is exactly what the new implementation technology I alluded to in several of my writings does).

 

As to denormalization and verifiability, I explain in various writings (see, for example, The Dangerous Illusion, Part 1 and Part 2) that even if denormalization does increase performance, that is entirely due to ignoring the additional integrity checks (which, by the way, should not be procedural). Adding one such check for each denormalized table would wipe out performance gains from denormalization, if any.

 

 

Posted 07/28/02

 

 

 

[ABOUT] [QUOTES] [LINKS]