ON KEYS
with Fabian Pascal

 

 

 

From: AE

To: Editor

 

I am an application developer who is competent in SQL and have designed some small-scale schemas that have gotten the job done. I came to your website seeking enlightment on the issue of whether it is better to use strings to represent attributes (I believe these are called "speaking keys"), or to use integers and key them to a list of strings. I see that you are avidly pro-normalization, but I'm not even sure if this is a normalization issue. Here is a quote from a web site that captures my gut feeling. I hope you can comment or point me to an article that presents your view of the matter.

 

"Many popular database tools make it very easy to create some sort of arbitrary numeric sequence for your primary keys. Be careful not to use these to excess. If you have are designing a music catalog system, it is definitely not useful to create a lot of arbitrary codes for music classification. There is unlikely to be much benefit in designating "1" to stand in for "Jazz", 2 for "Rock", 3 for "Classical" and 4 for "Latin Jazz" in an application. Why not just use the word "Jazz" for "Jazz" and use "Rock" to stand for "Rock"? It is certainly easier to remember. It also means that you won't have to lookup the value of 1 to see that it stands for "Jazz".

 

 

From: Fabian Pascal

To: AE

 

We do not believe it is possible to be a good developer of databased applications without first educating yourself on database fundamentals. Such fundamentals cannot be captured via questions to a web site each time you have a specific question, but rather via reading basic material in sources, then, if there are details on which you are not clear on, or think you disagree with, you can try to have those clarified this way. There are no shortcuts and beware of anybody who claims otherwise.

 

The question you ask is part of the fundamentals that you need to know prior to involving yourself in practice. I would suspect that this is not the only fundamental knowledge you lack. I suggest the sources on our Books page,starting with Date's INTRODUCTION TO DATABASE SYSTEMS and my PRACTICAL ISSUES IN DATABASE MANAGEMENT. Both these sources as well Date's DATABASE WRITINGS series of books have sections about keys. Chapter 3 in my book is on keys, where I say that key selection is a logical endeavor, which should not be contaminated with physical performance considerations.

 

From: AE

 

OK, Debunkers, I ordered PRACTICAL ISSUES from Amazon (yes I clicked through dbdebunk.com to get to it). I read it cover to cover. There was no discussion at all on the subject of speaking keys, although I could see that most of the keys in the examples were speaking keys (e.g. in many examples, "SUPP" in the "Activity" domain means "support"). There was also a "note" saying that some people like populating their tables with synthetic keys at every opportunity, but this isn't necessary, nor is it always recommended. I would have liked some more discussion of this issue.

 

So I ask my question again with some background information. Feel free to use it for your quote of the week.

 

I am an application developer who had developed an application and a database schema to go with it, and was handing the schema off to a DBA so he could complete the work. I originally had an "action" domain that was defined as the varchar values "add", "delete", and "move". He changed the schema to have "action_id" columns, taking system-generated integer values, which referred to an action lookup table with action_id and name columns. He did the same thing for other domains, ripping out varchars and replacing them with integers referenced to lookup tables. He said that this would ensure "flexibility"--what if I wanted to change one of the names?

 

I then and especially now (after having read PRACTICAL ISSUES) believe that what he did nothing to do with normalization, and instead was an empty exercise in indirection and obfuscation. What's more, due to the lack of true domains in our DBMS, it makes it possible for a programmer to put an 'action_id' value in, say, a 'source_application_id' column, since they're both integers. To make matters worse, this DBA eschews views, which makes constructing queries very cumbersome. He suggested defining mnemonic constants in my code that would correspond to the appropriate integer values, but it seemed to me that that was just more work for me, and the ensuing "flexibility" of the system left more opportunities for error. I agreed to the changes mostly because it was going to be his schema to work with.

 

Shortly after I rewrote my application to accommodate the changes, the DBA left us for greener pastures, and I'm stuck with this (in my opinion) ugly schema. My first impulse was to revert the schema to the old design, but I wanted to check with people with more experience in these areas, to see if there was any advice on this common, non-vendor-specific issue.

 

So my question remains: was I right to use sets of mnemonic character strings for these small, stable domains?  Is it only a matter of taste? What are the practical considerations when making such a decision? I could not find any discussion of this matter on the Database Debunking website or in the PRACTICAL ISSUES book.

 

I understand you are busy, professional consultants, and don't really expect a personal reply. I'm just suggesting that you address this "practical issue" somewhere on your website, or in a future edition of the book.

 

 

From: Fabian Pascal

 

Everything that is needed to understand and define keys is in the chapter in my book. If it does not address your question, it must mean that the issue is not a formal one, but rather an arbitrary one outside of the data model.

 

In the chapter I say that the relational model only insists that every table has a key, but it has nothing to say about key selection--that is a database-specific issue, which only those familiar with it (in this case, you) can address. The choice of keys is made on pragmatic grounds, where 4 criteria, familiarity, irreducibility, simplicity, stability, must be traded off, depending on how users use the specific database. I'm afraid there is nothing that can be added to that, it is up to you to decide on which tradeoff is satisfactory in your particular case.

 

 

From: AE

Status: U

X-UIDL: PRjKFNHkIfVw3AE

 

"Because the type definition does not place any restriction on type representations, a data model worthy of its name should permit and work with any type, regardless of representation and complexity. In the relational model, this is achieved by the use of DOMAINS, which are nothing but data types of arbitrary complexity." --popular book on database management

 

Comment: If a domain is "nothing but" a data type of arbitrary complexity, then it is nothing but a data type, period. Nothing but fuzzy thinking is "achieved" by using two different terms to mean the exact same thing.

 

 

From: Fabian Pascal

 

You got it somewhat backwards. The industry has been corrupting terms since time immaterial, and that includes data types, which were known to programmers, but not to database practitioners. It was in order to avoid confusion with the corrupted terms that Codd was forced to use different terminology for the relational model. Yes, it would have been nice if he didn't have to do that, but you should place the blame where it really belongs: on the industry, not on relational proponents.

 

It’s always a good idea to acquire some background on subjects before you comment on them.

 

 

Posted 09/13/02

 

 

 

[ABOUT] [QUOTES] [LINKS]