From: IH
To: Editor
I have just stumbled across your site and read a couple of
articles. With respect to the article The Logical-Physical
Confusion and just to be pedantic, I have to disagree with a couple of
points made by Fabian Pascal, although, in general, I find myself in agreement
with his views.
There appears to be statement that all/most practical
R(?)DBMSs are in fact SQL DBMSs, whatever that is, and not RDBMSs. There is
then an implication that all(?) such ?DBMSs implement a 1:1 relationship
between data tables and physical disk files. I would respectfully suggest that
it is very rarely the case. dBase, FoxPro and Paradox, as almost extinct
desktop systems use this model, as does mySQL, although that may not yet fully
qualify anyway. The others do not, allowing, preferring and in some cases even
forcing all tables into a single datafile.
If I have understood your point, then logical design involves
turning some business requirement/model into a set of normalised tables, while
physical design specifies how these are arranged in files and on disks. Good
physical database design will have tables, indexes, etc. distributed between
files and across disks, to minimise predictable bottlenecks and will happen
after the logical design (normalisation, tables, etc.).
Personally I hate the idea of denormalisation and for every
instance there must be a procedure checking on the lost integrity, however, the
claim that denormalisation can improve performance should be verifiable;
observed read performance that is. Consider a simple case of a person having
several telephone numbers. This has then been normalised into a person table
and a numbers table. To retrieve the name and all numbers from the normalised
tables, whether in 1 data file or 2, the system must either scan table 1 or
lookup an index for table 1 and then read table 1. Having extracted the
required text (name) from the data blocks, the system then either reads all of
table 2, or reads an index of table 2 and then reads the appropriate data.
Each time a read is made, the disk heads are moved and a
block of data is read. This data is at least the size of an OS block, usually
512B or 1024B or a database block, which may be up to 32/64k. Denormalising to
a single table means that all of the data is read from table 1, because it is
contiguous, so the system has no need to read table 2 at all. This is the
point, not the number of datafiles, but the separation of the data. Disk
performance is many times better than it used to be, but is still one of the
slowest links in the chain, so cutting it in half should produce a benefit.
The system will need to spend time checking on the lost integrity,
which may impinge on the general performance of the database.
The use of complex data types within a RDBMS can/may be a way
of implementing "denormalisation", without the loss of integrity, if
done carefully and if the RDBMS implements it adequately. This, however,
requires more careful attention to detail, not less.
Fabian Pascal Responds: Thanks for making me aware
that with respect to physical implementation of SQL RDBMSs, I did not state
properly what I meant to say. I should have said physical records
instead of files. SQL implementations maintain a 1:1 relationship between logical
rows and physical records. That means that there is an inherent
physical order for rows and columns on which SQL relies and, thus, must be
preserved by the DBMS, limiting optimization options. Relieving DBMSs from this
constraint would enable them to optimize performance in ways that are currently
not possible with SQL DBMSs (and is exactly what the new implementation
technology I alluded to in several of my writings does).
As to denormalization and verifiability, I explain in various
writings (see, for example, The Dangerous Illusion, Part 1
and Part 2)
that even if denormalization does increase performance, that is entirely due to
ignoring the additional integrity checks (which, by the way, should not be
procedural). Adding one such check for each denormalized table would wipe out
performance gains from denormalization, if any.
Posted
07/28/02
[ABOUT]
[QUOTES]
[LINKS]