Can anyone tell me where I can find guides on making a good
table? Usually I go to Access Help.
--dbForums.com
Please provide me with some good
design document/ link.
--dbForums.com
We select presentations based on information gathered from our
members. Popular topics are performance, security, new features.
Less requested are modeling topics.
--Oracle
User Group
Adam Machanic:
First
rule of DW is forget everything you think you know about data modeling... :)
--www.mcse.ms/~
Project Id: 123015, PHP3/4 + mySQL customer database
Qualifications and Experience Required Primary skill: Web
Development/Programming Secondary skills: PHP, MySQL Type of website: Retail -
Consumer Focused (B2C) Additional skills required: must be capable of creating
a well organized mySQL database that will never need to be re-written or
re-organized
--Job ad, guru.com
A few years ago a French diet allowing adherents to eat whatever
they wanted in any quantity became very popular. A US nutritionist interviewed
about it on TV commented that the diet's author learned what Americans have
known for a long time: you can become popular and rich telling people what they
want to hear.
Responding to an ITToolbox blog
with challenges to his opinion article in Computerworld, Time for a
New View of Data Management, Curt Monash stated:
Unlike Computerworld, ITToolbox seems to let anybody who wants
to blog for them--without screening for objectivity, reasoning ability, or
willingness to use a spell-checker. One of their bloggers is a fellow named Alf
Pederson. On the average, Alf rants once every few weeks decrying anything that
purports to be better than a theoretically pure relational database, preferably
one that is carefully designed by an expensive professional such as
himself. (Apparently he has somewhere
between 15 and 28 years of experience in exploiting the complexities of database
software, and is personally terrified any time somebody suggests a
technological architecture or strategy that would make it less necessary to
keep paying guys like him.)
Criticizing the incompetence of others—without anything to
back it up—is a common and well known technique to distract from one’s own
incompetence. In fact, it is Computerworld's judgment that is in
question here, for publishing Monash’s nonsense, and then censoring reactions
that exposes it for what it is (see On What They Don’t
Publish, What They Do, and What They Cut Out). But being Unskilled and
Unaware of It, Monash is oblivious to his Orwellian NewSpeak: exactly
the opposite of what he claims about himself, others, and database management
is true.
An informed reading of his “opinion” (as well as the two
threads he opened to complain about the challenges to it) reveals his
pronouncements to be so flawed as to not really merit commenting, and my
initial instinct was to ignore them. There is too much utter nonsense being
disseminated these days, and life is too short. But after two long exchanges in
which he did not address even one of the challenges to his arguments, perhaps
just a short debunking is in order, if only to let the reader ponder how low a
level of discourse the industry has descended to.
Says Monash:
Database management is in a crisis, one that's only partly
recognized. The horrors of data integration may be well known, but they're only
the tip of a much larger iceberg: schema complexity. Programmers, system
architects, and database administrators focusing on design and operation
alike--all their jobs are made immeasurably harder by the boggling complexity
of relational schemas. As schema diversity explodes, the pure relational model
is collapsing under its own weight.
There is, indeed, a huge crisis in data management,
and one that’s not recognized at all. But it is hardly rooted in
“relational schema complexity”. Rather, it is due to the lack of foundation
knowledge and inability to reason rampant in the industry (and incidentally, in
society as a whole), of which Monash exhibits plenty. Consequently, his claim
that "We must replace [the relational model] with a radically different
view of data management, which I'm calling DBMS2, for database management
system services" is impossible to take seriously: can a term devoid of any
substance and a few confused paragraphs be a superior replacement for thousands
of years of logic and mathematics and decades of data management experience?
Only in America can this be suggested, and published in 2005.
There is confusion right in the first paragraph. When Monash
says schema, he does not really mean schema—the metadata, the sum-total of
integrity constraints on the data (including table headers), the
DBMS-"understood meaning of the data, and the best approximation a DBMS
can have of the user-understood meaning, or interpretation of the data—the
business rules (see Business
Modeling for Database Design, A Foundation Framework for Data Management),
but rather fully normalized designs.
Now, if a database structure (or even schema) is complex, the
question is whether the complexity is (a) inherent, that is, the schema
represents a complex reality, or (b) due to data management methods and tools
employed. In the former case, whether we like it or not, there is nothing much
we can do: if the world is complex, so will be its representation in the
database. In the latter case, however, we should choose methods and tools that
do not add unnecessary complexity on top of that inherent in the
real world. And if Monash had any clue about the relational model and database
design, he would know that they were invented to address this very problem.
Ø
The argument against normalization for performance
reasons has long been thoroughly debunked, but Monash’s contention is that
fully normalized databases are also complex to the point of being unusable.
That is utterly false: what full normalization means, loosely, is that every
real-world entity type is represented in the database by exactly one table, and
that, of course
·
makes databases easier to understand
·
minimizes what otherwise would be a prohibitive
integrity burden
·
simplifies the formulation of queries issued by
multiple applications that need different views of the data
This is explained in detail in The Costly Illusion:
Normalization, Integrity, and Performance.
Incidentally, the relational model
only requires database tables to be in first normal form (1NF), or normalized
(see What First
Formal Form Really Means, and What First Normal Form
Means Not). Further/full
normalization (2NF-5NF) is not an integral component of the model per se.
Rather, it was devised for the practical purposes listed above, Monash’s
fallacious dismissal of them as “theological purity” notwithstanding.
Ø
One
of RM's main objectives was to simplify database management relative to
the hierarchic and network data models, the faults of which were inflexibility
and unnecessary complexity! So if RM and normalization are to be replaced, how
can XML, which is tree-structured, be an improvement?
Ø
The
relational model (RM) is nothing but the application of predicate logic and
mathematics to database management. Given that the purpose of a DBMS is to make
inferences from database data, what, pray, other than logic should govern
their operation? What does a view "radically different" than logic
and math mean, anyway?
Ø
None
of the DBMSs currently available commercially is a true implementation of RM,
so how could a model that has not been implemented and used, "collapse
under its own weight"? (Actually, there is one commercial product,
Alphora's Dataphor,
based more closely on RM, but when asked, Monash admitted he is not familiar
with it, yet dismissed it as a "toy").
All this throws into question Monash's knowledge and
understanding of the subject he opines on, hence my instinctive initial
reluctance to bother with it.
For his argument to stand, Monash must offer evidence that
(1) current database complexity is due to full normalization and (2) DBMS2
(whatever that is) is superior to fully normalized designs on that score (the
deficiencies addressed by full normalization do not go away just because a data
model other than RM is employed!)
The problem with (1) is that in reality, very few databases
are properly and fully normalized (even if those who design them erroneously
believe that they are). That is due partly to poor to inexistent design
knowledge, and indoctrination with the fallacy that normalization slows
performance; and partly to flawed SQL implementations which sometimes
(though not always) penalize normalized designs. So if any complexities exist,
they are not due to normalization, but to the lack thereof.
But does Monash at least prove (2), namely that his
approach—to the extent that he has one—is superior with respect to complexity,
even if we ignore, like he does, the many other criteria on which
database technologies, products, and practices, must be assessed—for example,
what about data independence?
Here are "the key aspects of DBMS2".
• Task-appropriate data managers. Just use whatever is cheapest
and simplest for each set of applications. Possible choices include but are not
limited to cheap online transaction processing DBMSs, high-end OLTP DBMSs, data
warehouse appliances, XML-based document stores, highly distributed and/or
small-footprint DBMSs, in-memory systems without their own persistent storage,
or cross-corpus indexers without their own storage.
Any knowledgeable and experienced database
professional (one must be both) would dismiss this instinctively as nonsense.
What Monash advocates here is application-managed data and the anarchy
of disparate technologies and products that has been haunting users at huge
great cost, part of it being the very thing Monash deplores: complexity. The
notion that his recommendation simplifies matters boggles the mind.
·
On exactly what sound grounds should task-appropriateness
be evaluated and tools selected?
·
If multiple different applications, including unknown
future ones, need to use the same data in different ways, should the data be
duplicated for each of the “task-appropriate” tools, in which case what about
consistency?
·
What about learning, using, and supporting all the
different systems?
·
Should data be managed by DBMSs, or task-appropriate
applications? Weren't DBMSs in general—not just the relational ones—invented
precisely to avoid these very problems?
·
What if the "simplest and cheapest" can't do
the job, particularly over time?
In fact, another objective of RM was to minimize the
proliferation of "task appropriate" systems which became
unmanageable, and relational proponents have argued and demonstrated extensively
that it was the failure to implement RM that led to such proliferation in the
first place.
Be that as it may, what does this have to do with logical
database design, and how does the multi-tool approach recommended by Monash
ensure that databases are not complex when they are used?!?
•
Drastic limitations on relational schema complexity. Relational schemas
shouldn't go far beyond two simple models: master-detail for transactions, and
hypercubes/star schemas for analytics. Anything inherently more complex is,
with rare exceptions, better handled via the schema flexibility of XML. If you
need to access data from a legacy application that violates these precepts, do
so via XML-based Web services.
It is hardly surprising that Monash displays more confusion,
which here is double fold.
·
Confusion of levels of representation: With respect
with “master-details”, aha, so that’s how DBMS2 simplifies designs: by
arbitrarily forcing simplicity on conceptual models of reality to
be represented in databases! If that is
the simplicity that you want to impose, what does it have to do with the
databases being relational, or fully normalized? Indeed, what prevents anybody
from doing this with even current SQL systems? (is it, perhaps, the
simplistic,
rather than simple, nature of the approach, which does not satisfy real-world
needs?)
·
Confusion of shared databases with application views of
data: Hypercubes/star schemas are the latter. They are ad-hoc, arbitrary
designs for databases without any justification, but there is nothing to
prevent making such views of the data available to applications that need them.
Indeed, logical data independence is one of the major advantages of RM.
“Better handled with the schema flexibility of XML”? What in
the world does that mean?? There is currently nothing that can really
be called an XML schema, as properly defined above. If (and when) XML
supports schemas with all types of integrity constraints necessary, they
will not be more simple or flexible than relational schemas, but rather much
more complex, because the XML data structure is the tree, and the relational
model was invented to get rid of the complexities and inflexibilities of
hierarchic structures with respect to integrity and manipulation—the
main purposes of database management—which Monash completely ignores.
Note: Lee Fesperman points out that while the primary
structure of XML is a physical tree as in the hierarchical data model, it also
allows external pointers that can be used to form a network model structure (albeit
clumsily) [which further complicates matters]. It is funny that Monash says
that RDBMSs should only cover “master-detail and hypercube/star schemas while
XML should handle more 'complex' schemas”, since XML's tree structure is just
an elaboration of master-detail. Exactly how could XML handle complex schemas,
except badly?
More likely, what Monash refers to as “XML schema
flexibility” is XML’s lack of schema, which also means very poor
semantics and, therefore, poor system knowledge of what the data means. I’ve
written extensively about this fallacy, and my articles are available for free
on the Internet.
•
Both XML-based and relational information integration. Eventually, most DBMS2
data integration will be done via XML. But relational enterprise information
integration will long have a role to play, such as connecting core OLTP and
data warehouse systems.
Another major objective of RM, explicitly stated by Codd, is
facilitation (courtesy of the very mathematics that Monash summarily dismisses)
of integration, in particular the consistent segmentation and merging of
databases. What is Monash's evidence that data integration is better done in
XML, or even can be done in XML? In fact, what exactly does he mean by
integration? I doubt Monash knows, but let me be generous and make an
assumption. If you really believe that the way to solve current data management
problems is to generate lots of disparate applications, each with its own data,
and then integrate by shipping data in XML format around between those
disparate applications, I suggest you read
The Myth ofSelf-describing XML, and if you still
believe that this relieves you from proper database design (which Monash seems
to suggest), I have a bridge in Brooklyn to sell you.
If you know and understand data fundamentals, you
observe current practice, and you can and are willing to think, you know
what the problem is: the average practitioner does not possess the necessary
foundation knowledge for proper conceptual modeling and database design, and
finds thinking hard (often, does not even know enough about the business to be
modeled). He is self taught, usually trained in a programming language and, at
best, a DBMS product, and operates in “cookbook mode”, without a real
understanding of what he does and why (my writings are chockfull of examples,
and it is impossible to keep up). This is due to the collapse of the
educational system, now being a “market-driven” training platform for software
tool vendors (see the Myth
of Market-based Education). The solution, for which there are no
substitutes or workarounds, is a rehabilitation of education to include
foundation knowledge and the history of the field. That, however, is not what
the industry wants to hear, and you cannot be a profitable consultant if you
insist on it.
But if you lack the necessary knowledge yourself, you will
confuse current SQL DBMSs (designed by people who don't know RM either, see If You Liked SQL You’ll
Love XQuery) with the model, will blame problems on what is actually
the solution, and will tell people what they want to hear: that they don't have
to learn/know anything beyond tools, don't have to think much, all they need to
do is buy whatever cheap and simple packages are sold, and not to worry, the
latest fad pushed by the industry (this time, XML) will be the magic wand that
solves everything.
In other words, do more of what created the crisis in the
first place, but give it some new label like DBMS2, and hype the latest
industry fad. Customers will love you, vendors will love you, and since the
vendors advertise, the trade media will love and publicize you. None of that
will happen if you expose the ignorance at the root of bad practices, and
advise people to go learn fundamentals before they get involved in data
management. Don’t bother them with theory, they have practical things to do.
Monash’s “new way” is no way at all. DBMS2 is DBMSless
data management. It is hardly a radically new approach to data management, but
a throwback decades, to the old days of application-managed data (which
database management was invented to eliminate). It's what you would do if you
had no clue what data management is, which is exactly what should be expected
in an industry that ignores fundamentals and its own history, and does not,
therefore, learn from experience.
Let me make a prediction: several years from now, the
Monashes in the industry will peddle DBMS3, a “new radical view of data
management” that will solve all the problems caused by DBMS2 and XML. The
constant moves from DBMSn to DBMSn+1, DBMSn+2,… is highly profitable for their
peddlers and the industry, but hugely costly to users and the public. And
without proper education, they will continue to pay increasingly more.
See also C. J. Date’s Commentary on Certain
Remarks by Curt Monash
Posted 9/3/05
© Fabian Pascal 2006 All Rights Reserved