ON DATABASE DESIGN
with Fabian Pascal

 

 

 

From: Tim Armstrong

Date: 23 Dec 2005

 

I was hoping you might help me solve a relational modeling problem. I have a friend who collects LP records and sells them on eBay. The LP's consist of mostly music (80%) and the other 20% consists of very off the wall stuff like "Ventriloquism for Dummies", etc... His current database (if you could call it that) is essentially a few "flat tables" with several entities bundled together throughout. I'm a student and am trying to learn database design and thought this LP collection would present a good challenge. As it turns out, the problem at hand is much more difficult than I thought it would be. In any case, I've been pecking away at this problem for awhile and would like to know how well I've done (or not as the case may be). Forgetting for the moment the oddball LP's and concentrating on the music LP's still presents a bit of a challenge:

 

·   an artist may have one-to-many titles

·   a title may feature one-to-many artists

·    the entire LP collection may contain multiple instances of a particular title

·    these instances will generally differ by condition

 

Is this design proper? Any advice would be appreciated. Thanks.

 

 

From: Fabian Pascal

 

I generally refrain from giving design advice via email, because I can never know enough about the reality to be modeled to make intelligent comments. What is more, neither can I rely on the information provided on that business by others.

 

I don't give hungry people a fish, I teach them how to fish. That's why I wrote Conceptual Modeling and Database Design. I recommend that you read it and if you understand it you won't have to ask me or others again in your next project. Generally, there are two design principles to be followed:

 

·         Principle of Full Normalization (POFN): Informally, exactly one R-table per entity set;

·         Principle of Orthogonal Design (POOD): Informally, exactly one entity set per R-table;

 

Tables are not "flat". They are multidimensional objects, each column representing a dimension. I take bundled to mean multiple entity sets are merged together in single tables, resulting in less than fully normalized databases.

 

Then educate yourself on fundamentals, don't take the usual cookbook approach by asking ask for advice by email (btw, papers are deeply discounted for students).

 

It almost always is; most practitioners underestimate the task and overestimate their abilities in the absence of foundational knowledge.

 

Don't understand: What is condition, how is it defined?

 

This precedes design. These are some business rules (correct? complete set?) that make up a conceptual model of the business. They must be mapped to a logical model for representation in the. It's that latter stage that's the design.

 

I strongly urge you not to follow the cookbook approach and educate yourself in the fundamentals before you undertake design.

 

 

From: Tim Armstrong

 

I understand your reluctance to offer advice regarding design questions in general and my ill-posed one in particular. With that said, I would like to respond.

 

Regarding the giving of fish vs. learning to fish – I’ve been trying to learn how to fish for awhile now. I’ve taken a couple of database classes, but unfortunately they’ve not been very helpful. Very little time is spent on the fundamentals and too much time is spent learning SQL or the particulars of a RBDBMS like Oracle or SQL Server (yes I realize they’re not very good implementations of the relational model, although I don’t really know why). I’ve tried augmenting my classroom instruction by reading books, papers, etc… I haven’t read your paper, Conceptual Modeling and Database Design yet, but I plan to. I’m currently reading Date’s book, AN INTRODUCTION TO DATABASE SYSTEMS, Seventh Edition, and have purchased your paper, The Costly Illusion: Normalization, Integrity, and Performance. The problem is that reading hasn’t resulted in full comprehension. If I read something in the text that I don’t understand, who do I ask for clarification?

 

You pointed out two general design principles to follow:

 

·   The Principle of Full Normalization

·   The Principle of Orthogonal Design (I guess I need to read your paper on this)

 

I won’t pretend to understand the second principle except to say that it seems to be the equivalent to the first principle. As for the first principle, I’ve read and understand your arguments for full normalization, but the problem is how to achieve this. In the case of the LP collection (catalog, inventory, or what have you), my questions are:

 

·   Have I identified the entities correctly?

·   Are they related to each other properly?

·   Have I fully normalized?

 

As you alluded to in your response, these questions are impossible to answer in a meaningful way unless one knows the nature of that which is being modeled (the domain?). In this particular case, I thought it was relatively easy, not to model, but to articulate the thing or things to be modeled. Unfortunately I didn’t do that.

 

I’m not sure what you mean by cookbook approach, but I will say that it would be very helpful to see the application of the principles you outlined by way of this particular example. I don’t mean helpful in the sense that the problem is solved, rather in how it is solved (learning how to fish).

 

In answer to your question regarding what was meant by “condition”, perhaps I should better explain what it is that I’m trying to represent.

 

Consider a collection, or inventory if you will, of new and used LP analog records. An LP in the real world has a title, an artist or artists who are featured on it, a label that releases it, a general condition (good, fair, poor, or perhaps 1,2,3,4 where the integer stands for a pre-defined condition), and finally track information (track title, track number, duration, and composer or composers). This inventory of LP’s can contain multiple copies or instances of a particular title. For instance, I currently have three copies of the title “One Size Fits All” by Frank Zappa. Since these are used copies, one of them might be in excellent condition, while another may be in poor condition (scratches, messed up cover, etc...). There are formal descriptions of these conditions that collectors generally agree to, but I can’t tell you what they are off the top of my head.

 

Since the end user (the collector) doesn’t care about track or composer information, I assume that it doesn’t get modeled. Is this so? In any case, it seemed pragmatic to leave it out of the model.

 

I’m particularly interested in seeing how something like this gets modeled as it must be a fairly common situation. A video rental store comes to mind. They have titles for rent and it wouldn’t make sense to represent each and every copy of a particular movie repeatedly – too much redundancy.

 

 

From: Fabian Pascal

 

Stay away from them, they will mislead you. The bad courses are given because that's what uneducated users want and uneducated teachers can give. That is why I give my seminars and write my papers. Unfortunately, there is no demand for fundamentals.

If you don't understand my papers or Date's book, you need to start with something more basic first. Try my book UNDERSTANDING RELATIONAL DATABASES. It's out of print, so you should look for it at a library.

If you are a student you should ask those who are paid to teach you. You do realize that I cannot teach you for free if they can't. Education is how I make a living. And I gotta eat once in a while.

Anyway, it's one think to ask clarifications on fundamentals, and another on specific design issues.

Re POOD, it is not the equivalent to POFN, but I would have to explain them for you to understand why, and I cannot do that by email, and it's not the place for it.

Only somebody who knows the business reality well can answer the first 2 questions, not somebody by email. And if you conceptualized your business rules correctly, and you design your tables based on the 2 principles, you will end up with a fully normalized database. The only time you need to normalize is when you mess up your design and you need to fix it.

It is very hard and questionable to design via email. I have no way to know whether your assumptions are correct/complete and whatever I say can be wrong and will mislead you.

I get many questions such as yours. Imagine what would happen if I tried to satisfy all of them. It's neither advisable, nor practical. That's what courses and papers are for.

See what I mean? Now I would have to read these lengthy explanations, comprehend them, ask you questions, etc. Not possible.

I know you're interested. However, I simply cannot tutor personally people via email. In fact, beware of anybody who will give you such advice online—that means they don't sufficiently appreciate the nature of design.

 

 

Posted 3/3/06

© Fabian Pascal 2006 All Rights Reserved