From: Tim Armstrong
Date: 23 Dec 2005
I was hoping you might help me solve a relational modeling
problem. I have a friend who collects LP records and sells them on eBay. The
LP's consist of mostly music (80%) and the other 20% consists of very off the
wall stuff like "Ventriloquism for Dummies", etc... His current
database (if you could call it that) is essentially a few "flat
tables" with several entities bundled together throughout. I'm a student
and am trying to learn database design and thought this LP collection would
present a good challenge. As it turns out, the problem at hand is much more
difficult than I thought it would be. In any case, I've been pecking away at
this problem for awhile and would like to know how well I've done (or not as
the case may be). Forgetting for the moment the oddball LP's and concentrating
on the music LP's still presents a bit of a challenge:
·
an artist may have one-to-many titles
·
a title may feature one-to-many artists
·
the entire LP collection may contain multiple instances
of a particular title
·
these instances will generally differ by condition
Is this design proper? Any advice would be appreciated.
Thanks.
From: Fabian Pascal
I generally refrain from giving design advice via email,
because I can never know enough about the reality to be modeled to make
intelligent comments. What is more, neither can I rely on the information
provided on that business by others.
I don't give hungry people a fish, I teach them how to fish.
That's why I wrote
Conceptual
Modeling and Database Design. I recommend that you read it and if you
understand it you won't have to ask me or others again in your next project.
Generally, there are two design principles to be followed:
·
Principle of Full Normalization (POFN):
Informally, exactly one R-table per entity set;
·
Principle of Orthogonal Design (POOD):
Informally, exactly one entity set per R-table;
Tables are not "flat". They are multidimensional
objects, each column representing a dimension. I take bundled to mean multiple
entity sets are merged together in single tables, resulting in less than fully
normalized databases.
Then educate yourself on fundamentals, don't take the usual
cookbook approach by asking ask for advice by email (btw, papers are deeply
discounted for students).
It almost always is; most practitioners underestimate the
task and overestimate their abilities in the absence of foundational knowledge.
Don't understand: What is condition, how is it defined?
This precedes design. These are some business rules (correct?
complete set?) that make up a conceptual model of the business. They must be
mapped to a logical model for representation in the. It's that latter stage that's
the design.
I strongly urge you not to follow the cookbook approach and
educate yourself in the fundamentals before you undertake design.
From: Tim Armstrong
I understand your reluctance to offer advice regarding design
questions in general and my ill-posed one in particular. With that said, I
would like to respond.
Regarding the giving of fish vs. learning to fish – I’ve been
trying to learn how to fish for awhile now. I’ve taken a couple of database
classes, but unfortunately they’ve not been very helpful. Very little time is
spent on the fundamentals and too much time is spent learning SQL or the
particulars of a RBDBMS like Oracle or SQL Server (yes I realize they’re not
very good implementations of the relational model, although I don’t really know
why). I’ve tried augmenting my classroom instruction by reading books, papers,
etc… I haven’t read your paper, Conceptual Modeling and Database Design
yet, but I plan to. I’m currently reading Date’s book, AN INTRODUCTION TO DATABASE SYSTEMS,
Seventh Edition, and have purchased your paper, The Costly Illusion:
Normalization, Integrity, and Performance. The problem is that reading
hasn’t resulted in full comprehension. If I read something in the text that I
don’t understand, who do I ask for clarification?
You pointed out two general design principles to follow:
· The
Principle of Full Normalization
· The
Principle of Orthogonal Design (I guess I need to read your paper on this)
I won’t pretend to understand the second principle except to
say that it seems to be the equivalent to the first principle. As for the first
principle, I’ve read and understand your arguments for full normalization, but
the problem is how to achieve this. In the case of the LP collection
(catalog, inventory, or what have you), my questions are:
· Have
I identified the entities correctly?
· Are
they related to each other properly?
· Have
I fully normalized?
As you alluded to in your response, these questions are
impossible to answer in a meaningful way unless one knows the nature of that which
is being modeled (the domain?). In this particular case, I thought it was
relatively easy, not to model, but to articulate the thing or things to be
modeled. Unfortunately I didn’t do that.
I’m not sure what you mean by cookbook approach, but I will
say that it would be very helpful to see the application of the
principles you outlined by way of this particular example. I don’t mean helpful
in the sense that the problem is solved, rather in how it is solved (learning
how to fish).
In answer to your question regarding what was meant by
“condition”, perhaps I should better explain what it is that I’m trying to
represent.
Consider a collection, or inventory if you will, of new and
used LP analog records. An LP in the real world has a title, an artist or
artists who are featured on it, a label that releases it, a general condition
(good, fair, poor, or perhaps 1,2,3,4 where the integer stands for a
pre-defined condition), and finally track information (track title, track
number, duration, and composer or composers). This inventory of LP’s can
contain multiple copies or instances of a particular title. For instance, I
currently have three copies of the title “One Size Fits All” by Frank Zappa.
Since these are used copies, one of them might be in excellent condition, while
another may be in poor condition (scratches, messed up cover, etc...). There
are formal descriptions of these conditions that collectors generally agree to,
but I can’t tell you what they are off the top of my head.
Since the end user (the collector) doesn’t care about track
or composer information, I assume that it doesn’t get modeled. Is this so? In
any case, it seemed pragmatic to leave it out of the model.
I’m particularly interested in seeing how something like this
gets modeled as it must be a fairly common situation. A video rental store
comes to mind. They have titles for rent and it wouldn’t make sense to
represent each and every copy of a particular movie repeatedly – too much
redundancy.
From: Fabian Pascal
Stay away from them, they will mislead you. The bad courses
are given because that's what uneducated users want and uneducated teachers can
give. That is why I give my seminars and write my papers. Unfortunately, there
is no demand for fundamentals.
If you don't understand my papers or Date's book, you need to
start with something more basic first. Try my book UNDERSTANDING RELATIONAL DATABASES.
It's out of print, so you should look for it at a library.
If you are a student you should ask those who are paid to teach you. You do
realize that I cannot teach you for free if they can't. Education is how I make
a living. And I gotta eat once in a while.
Anyway, it's one think to ask clarifications on fundamentals, and another on
specific design issues.
Re POOD, it is not the equivalent to POFN, but I would have to explain
them for you to understand why, and I cannot do that by email, and it's not the
place for it.
Only somebody who knows the business reality well can answer the first 2
questions, not somebody by email. And if you conceptualized your business rules
correctly, and you design your tables based on the 2 principles, you will end
up with a fully normalized database. The only time you need to normalize is when
you mess up your design and you need to fix it.
It is very hard and questionable to design via email. I have no way to know
whether your assumptions are correct/complete and whatever I say can be wrong
and will mislead you.
I get many questions such as yours. Imagine what would happen if I tried to
satisfy all of them. It's neither advisable, nor practical. That's what courses
and papers are for.
See what I mean? Now I would have to read these lengthy explanations,
comprehend them, ask you questions, etc. Not possible.
I know you're interested. However, I simply cannot tutor personally people via
email. In fact, beware of anybody who will give you such advice online—that
means they don't sufficiently appreciate the nature of design.
Posted 3/3/06
© Fabian Pascal 2006 All Rights Reserved