July 2023 - DATABASE DEBUNKINGS

Monday, July 31, 2023

ON RELATIONAL KEYS (& DOMAINS) (t&n)

Note: "Then & Now" (t&n) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation over the industry's fad-driven "cookbook" practices, as well as the disregarded evolution and progress of RDM, I am revisiting my old debunkings and bringing them up to the current state of knowledge. This will enable you to judge how well the arguments have held up and to appreciate the widening gap between industry stagnation and scientific progress.

Then

ON KEYS (originally published September 2002)

“I am an application developer who is competent in SQL and have designed some small-scale schemas that have gotten the job done. I came to your website seeking enlightenment on the issue of whether it is better to use strings to represent attributes (I believe these are called "speaking keys"), or to use integers and key them to a list of strings. I see that you are avidly pro-normalization, but I'm not even sure if this is a normalization issue. Here is a quote from a web site that captures my gut feeling. I hope you can comment or point me to an article that presents your view of the matter.

Many popular database tools make it very easy to create some sort of arbitrary numeric sequence for your primary keys. Be careful not to use these to excess. If you are designing a music catalog system, it is definitely not useful to create a lot of arbitrary codes for music classification. There is unlikely to be much benefit in designating "1" to stand in for "Jazz", 2 for "Rock", 3 for "Classical" and 4 for "Latin Jazz" in an application. Why not just use the word "Jazz" for "Jazz" and use "Rock" to stand for "Rock"? It is certainly easier to remember. It also means that you won't have to look up the value of 1 to see that it stands for "Jazz".”
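To make the two designs the reader contrasts concrete, here is a minimal sketch using Python's built-in sqlite3 module. All table names, column names and sample rows are hypothetical and chosen only for illustration. It takes no position on which design is preferable; it only shows that the integer-code design needs a join to recover the genre name, while the "speaking key" design does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- Design 1 (hypothetical): the genre name itself is the key.
CREATE TABLE genre_by_name (name TEXT PRIMARY KEY);
CREATE TABLE album_by_name (
    title TEXT PRIMARY KEY,
    genre TEXT NOT NULL REFERENCES genre_by_name(name)
);
INSERT INTO genre_by_name VALUES ('Jazz'), ('Rock');
INSERT INTO album_by_name VALUES ('Album A', 'Jazz');

-- Design 2 (hypothetical): an arbitrary integer code keyed to the name.
CREATE TABLE genre_by_id (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE album_by_id (
    title    TEXT PRIMARY KEY,
    genre_id INTEGER NOT NULL REFERENCES genre_by_id(id)
);
INSERT INTO genre_by_id VALUES (1, 'Jazz'), (2, 'Rock');
INSERT INTO album_by_id VALUES ('Album A', 1);
""")

# Design 1: the genre is readable directly from the album table.
direct = conn.execute("SELECT title, genre FROM album_by_name").fetchall()

# Design 2: the code must be resolved through the lookup table.
joined = conn.execute(
    "SELECT a.title, g.name "
    "FROM album_by_id AS a JOIN genre_by_id AS g ON g.id = a.genre_id"
).fetchall()

print(direct)
print(joined)
```

Both queries return the same rows; the difference is only in where the name is stored and how it is retrieved.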

Tuesday, July 25, 2023

NULL & THE "2ND ADDRESS LINE" (sms)


Note: In "Setting Matters Straight" posts I debunk online pronouncements on fundamentals that I have first posted on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, in which we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “Does anyone have an alternative solution to the issue where multiple nulls are allowed in the same column for a unique index containing that column? I have a number of such nullable columns and unique index combinations in my database schema and I'm trying to avoid having to create additional generated as columns for each case.”

A: “null<>null so the problem's built into unique columns, perhaps the columns need to become non-nullable?”
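The behavior behind this question and answer can be reproduced with a minimal sketch, here using SQLite through Python's built-in sqlite3 module. The table and index names are hypothetical. Note two assumptions: SQL products differ on how unique indexes treat NULLs, so this shows SQLite's behavior only; and in SQL's three-valued logic both NULL = NULL and NULL <> NULL evaluate to unknown (returned as NULL) rather than true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, code TEXT)")
conn.execute("CREATE UNIQUE INDEX t_code_uq ON t (code)")  # hypothetical name

# Two rows with NULL in the uniquely indexed column: SQLite accepts both.
conn.execute("INSERT INTO t (code) VALUES (NULL)")
conn.execute("INSERT INTO t (code) VALUES (NULL)")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM t WHERE code IS NULL"
).fetchone()[0]

# Two rows with the same non-NULL value: the second is rejected.
conn.execute("INSERT INTO t (code) VALUES ('A')")
try:
    conn.execute("INSERT INTO t (code) VALUES ('A')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Neither comparison yields true; both come back as NULL (None in Python).
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
null_ne_null = conn.execute("SELECT NULL <> NULL").fetchone()[0]

print(null_rows, duplicate_rejected, null_eq_null, null_ne_null)
```

Because the comparison of two NULLs never evaluates to true, the index has no basis for treating the two NULL rows as duplicates, which is the effect the question describes.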

“Some attributes in a tuple are not meant to be part of any relationship so people are safe to populate such relationshipless attributes with a Null when needed. Just an example, how about the famous MailAddressSecondLine attribute describing the optional second line of a mail address? You can populate it if needed or you can leave it with no value a.k.a. Null if you have nothing to put in there.”

“So, when YOU create an address table, do you have a "second address" line or do you normalize it? Enquiring minds want to know.”

“You are asking how I would implement optional parts of an address within the limitations of a fixed set of numbered address line attributes? If an address has fewer lines than there are attributes then I'd populate the extra lines with zero-length strings. Null wouldn't be appropriate because all the parts of the address are known and present and using null would give undesirable results. For example if you compare all the parts of an address in a join or subquery you ought to expect two identical addresses to be returned as equivalent, but that wouldn't work if you start adding nulls into addresses.”
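The comparison problem described in this comment can be sketched as follows, again with SQLite via Python's sqlite3 module. The tables addr_a and addr_b, their columns, the helper function matches and the sample address are all hypothetical. The sketch stores the same address in both tables and counts the rows that an equality join on both lines returns, once with NULL in the second line and once with a zero-length string. It illustrates the commenter's point only and is not a recommendation of either representation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addr_a (line1 TEXT NOT NULL, line2 TEXT);
CREATE TABLE addr_b (line1 TEXT NOT NULL, line2 TEXT);
""")

def matches(second_line):
    """Store one identical address in both tables and count join matches."""
    conn.execute("DELETE FROM addr_a")
    conn.execute("DELETE FROM addr_b")
    conn.execute("INSERT INTO addr_a VALUES ('1 Main St', ?)", (second_line,))
    conn.execute("INSERT INTO addr_b VALUES ('1 Main St', ?)", (second_line,))
    return conn.execute(
        "SELECT COUNT(*) FROM addr_a AS a JOIN addr_b AS b "
        "ON a.line1 = b.line1 AND a.line2 = b.line2"
    ).fetchone()[0]

with_null = matches(None)   # second line stored as NULL
with_empty = matches("")    # second line stored as a zero-length string

print(with_null, with_empty)
```

With NULL in the second line the join finds no match, because a.line2 = b.line2 does not evaluate to true; with the zero-length string the two addresses match.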

We have discussed extensively the problems with and solution to the treatment of missing data in relational databases: