From: AE
To: Editor
I am an application developer who is competent in SQL and
have designed some small-scale schemas that have gotten the job done. I came to
your website seeking enlightment on the issue of whether it is better to use
strings to represent attributes (I believe these are called "speaking
keys"), or to use integers and key them to a list of strings. I see that
you are avidly pro-normalization, but I'm not even sure if this is a
normalization issue. Here is a quote from a web site that captures my gut
feeling. I hope you can comment or point me to an article that presents your
view of the matter.
"Many popular database tools make it very easy to create
some sort of arbitrary numeric sequence for your primary keys. Be careful not
to use these to excess. If you have are designing a music catalog system, it is
definitely not useful to create a lot of arbitrary codes for music
classification. There is unlikely to be much benefit in designating
"1" to stand in for "Jazz", 2 for "Rock", 3 for
"Classical" and 4 for "Latin Jazz" in an application. Why
not just use the word "Jazz" for "Jazz" and use
"Rock" to stand for "Rock"? It is certainly easier to
remember. It also means that you won't have to lookup the value of 1 to see
that it stands for "Jazz".
From: Fabian Pascal
To: AE
We do not believe it is possible to be a good developer of
databased applications without first educating yourself on database
fundamentals. Such fundamentals cannot be captured via questions to a web site
each time you have a specific question, but rather via reading basic material
in sources, then, if there are details on which you are not clear on, or
think you disagree with, you can try to have those clarified this way. There
are no shortcuts and beware of anybody who claims otherwise.
The question you ask is part of the fundamentals that you
need to know prior to involving yourself in practice. I would suspect that this
is not the only fundamental knowledge you lack. I suggest the sources on our Books page,starting with Date's
INTRODUCTION TO DATABASE
SYSTEMS and my PRACTICAL ISSUES IN DATABASE MANAGEMENT. Both these
sources as well Date's DATABASE WRITINGS series of books have sections
about keys. Chapter 3 in my book is on keys, where I say that key selection is
a logical endeavor, which should not be contaminated with physical
performance considerations.
From: AE
OK, Debunkers, I ordered PRACTICAL ISSUES from Amazon
(yes I clicked through dbdebunk.com to get to it). I read it cover to cover.
There was no discussion at all on the subject of speaking keys, although I
could see that most of the keys in the examples were speaking keys (e.g. in
many examples, "SUPP" in the "Activity" domain means
"support"). There was also a "note" saying that some people
like populating their tables with synthetic keys at every opportunity, but this
isn't necessary, nor is it always recommended. I would have liked some more
discussion of this issue.
So I ask my question again with some background information.
Feel free to use it for your quote of the week.
I am an application developer who had developed an
application and a database schema to go with it, and was handing the schema off
to a DBA so he could complete the work. I originally had an "action"
domain that was defined as the varchar values "add",
"delete", and "move". He changed the schema to have
"action_id" columns, taking system-generated integer values, which
referred to an action lookup table with action_id and name columns. He did the
same thing for other domains, ripping out varchars and replacing them with
integers referenced to lookup tables. He said that this would ensure
"flexibility"--what if I wanted to change one of the names?
I then and especially now (after having read PRACTICAL
ISSUES) believe that what he did nothing to do with normalization, and
instead was an empty exercise in indirection and obfuscation. What's more, due
to the lack of true domains in our DBMS, it makes it possible for a programmer
to put an 'action_id' value in, say, a 'source_application_id' column, since
they're both integers. To make matters worse, this DBA eschews views, which
makes constructing queries very cumbersome. He suggested defining mnemonic
constants in my code that would correspond to the appropriate integer values,
but it seemed to me that that was just more work for me, and the ensuing
"flexibility" of the system left more opportunities for error. I
agreed to the changes mostly because it was going to be his schema to work
with.
Shortly after I rewrote my application to accommodate the
changes, the DBA left us for greener pastures, and I'm stuck with this (in my
opinion) ugly schema. My first impulse was to revert the schema to the old
design, but I wanted to check with people with more experience in these areas,
to see if there was any advice on this common, non-vendor-specific issue.
So my question remains: was I right to use sets of mnemonic
character strings for these small, stable domains? Is it only a matter of taste? What are the
practical
considerations when making such a decision? I could not find any discussion of
this matter on the Database Debunking website or in the PRACTICAL ISSUES book.
I understand you are busy, professional consultants, and
don't really expect a personal reply. I'm just suggesting that you address this
"practical issue" somewhere on your website, or in a future edition
of the book.
From: Fabian Pascal
Everything that is needed to understand and define
keys is in the chapter in my book. If it does not address your question, it
must mean that the issue is not a formal one, but rather an arbitrary
one outside of the data model.
In the chapter I say that the relational model only insists
that every table has a key, but it has nothing to say about key selection--that
is a database-specific issue, which only those familiar with it (in this case,
you) can address. The choice of keys is made on pragmatic grounds, where
4 criteria, familiarity, irreducibility, simplicity, stability, must be
traded off, depending on how users use the specific database. I'm afraid there
is nothing that can be added to that, it is up to you to decide on which
tradeoff is satisfactory in your particular case.
From: AE
Status: U
X-UIDL:
PRjKFNHkIfVw3AE
"Because the type
definition does not place any restriction on type representations, a data model
worthy of its name should permit and work with any type, regardless of
representation and complexity. In the relational model, this is achieved by the
use of DOMAINS, which are nothing but data types of arbitrary complexity."
--popular book on database management
Comment: If a domain is "nothing but" a data type
of arbitrary complexity, then it is nothing but a data type, period. Nothing
but fuzzy thinking is "achieved" by using two different terms to mean
the exact same thing.
From: Fabian Pascal
You got it somewhat backwards. The industry has been
corrupting terms since time immaterial, and that includes data types, which
were known to programmers, but not to database practitioners. It was in order
to avoid confusion with the corrupted terms that Codd was forced to use
different terminology for the relational model. Yes, it would have been nice if
he didn't have to do that, but you should place the blame where it really
belongs: on the industry, not on relational proponents.
It’s always a good idea to acquire some background on
subjects before you comment on them.
Posted
09/13/02
[ABOUT]
[QUOTES]
[LINKS]