MONASH BALDERDASH
by Fabian Pascal

 

 

 

Can anyone tell me where I can find guides on making a good table? Usually I go to Access Help.

--dbForums.com

 

Please provide me with some good design document/ link.
--dbForums.com

 

We select presentations based on information gathered from our members. Popular topics are performance, security, new features.  Less requested are modeling topics.

--Oracle User Group

 

Adam Machanic: First rule of DW is forget everything you think you know about data modeling... :)

--www.mcse.ms/~

 

Project Id: 123015, PHP3/4 + mySQL customer database Qualifications and Experience Required Primary skill: Web Development/Programming Secondary skills: PHP, MySQL Type of website: Retail - Consumer Focused (B2C) Additional skills required: must be capable of creating a well organized mySQL database that will never need to be re-written or re-organized

--Job ad, guru.com

 

A few years ago a French diet allowing adherents to eat whatever they wanted in any quantity became very popular. A US nutritionist interviewed about it on TV commented that the diet's author learned what Americans have known for a long time: you can become popular and rich telling people what they want to hear.

 

 

Responding to an ITToolbox blog with challenges to his opinion article in Computerworld, Time for a New View of Data Management, Curt Monash stated:

 

Unlike Computerworld, ITToolbox seems to let anybody who wants to blog for them--without screening for objectivity, reasoning ability, or willingness to use a spell-checker. One of their bloggers is a fellow named Alf Pederson. On the average, Alf rants once every few weeks decrying anything that purports to be better than a theoretically pure relational database, preferably one that is carefully designed by an expensive professional such as himself.  (Apparently he has somewhere between 15 and 28 years of experience in exploiting the complexities of database software, and is personally terrified any time somebody suggests a technological architecture or strategy that would make it less necessary to keep paying guys like him.)

 

Criticizing the incompetence of others—without anything to back it up—is a common and well known technique to distract from one’s own incompetence. In fact, it is Computerworld's judgment that is in question here, for publishing Monash’s nonsense, and then censoring reactions that exposes it for what it is (see On What They Don’t Publish, What They Do, and What They Cut Out). But being Unskilled and Unaware of It, Monash is oblivious to his Orwellian NewSpeak: exactly the opposite of what he claims about himself, others, and database management is true.

 

An informed reading of his “opinion” (as well as the two threads he opened to complain about the challenges to it) reveals his pronouncements to be so flawed as to not really merit commenting, and my initial instinct was to ignore them. There is too much utter nonsense being disseminated these days, and life is too short. But after two long exchanges in which he did not address even one of the challenges to his arguments, perhaps just a short debunking is in order, if only to let the reader ponder how low a level of discourse the industry has descended to.

    

Says Monash:

 

Database management is in a crisis, one that's only partly recognized. The horrors of data integration may be well known, but they're only the tip of a much larger iceberg: schema complexity. Programmers, system architects, and database administrators focusing on design and operation alike--all their jobs are made immeasurably harder by the boggling complexity of relational schemas. As schema diversity explodes, the pure relational model is collapsing under its own weight.

 

There is, indeed, a huge crisis in data management, and one that’s not recognized at all. But it is hardly rooted in “relational schema complexity”. Rather, it is due to the lack of foundation knowledge and inability to reason rampant in the industry (and incidentally, in society as a whole), of which Monash exhibits plenty. Consequently, his claim that "We must replace [the relational model] with a radically different view of data management, which I'm calling DBMS2, for database management system services" is impossible to take seriously: can a term devoid of any substance and a few confused paragraphs be a superior replacement for thousands of years of logic and mathematics and decades of data management experience? Only in America can this be suggested, and published in 2005.

 

There is confusion right in the first paragraph. When Monash says schema, he does not really mean schema—the metadata, the sum-total of integrity constraints on the data (including table headers), the DBMS-"understood meaning of the data, and the best approximation a DBMS can have of the user-understood meaning, or interpretation of the data—the business rules (see Business Modeling for Database Design, A Foundation Framework for Data Management), but rather fully normalized designs.

 

Now, if a database structure (or even schema) is complex, the question is whether the complexity is (a) inherent, that is, the schema represents a complex reality, or (b) due to data management methods and tools employed. In the former case, whether we like it or not, there is nothing much we can do: if the world is complex, so will be its representation in the database. In the latter case, however, we should choose methods and tools that do not add unnecessary complexity on top of that inherent in the real world. And if Monash had any clue about the relational model and database design, he would know that they were invented to address this very problem.

 

Ø       The argument against normalization for performance reasons has long been thoroughly debunked, but Monash’s contention is that fully normalized databases are also complex to the point of being unusable. That is utterly false: what full normalization means, loosely, is that every real-world entity type is represented in the database by exactly one table, and that, of course

 

·   makes databases easier to understand

·   minimizes what otherwise would be a prohibitive integrity burden

·   simplifies the formulation of queries issued by multiple applications that need different views of the data

 

This is explained in detail in The Costly Illusion: Normalization, Integrity, and Performance.

 

Incidentally, the relational model only requires database tables to be in first normal form (1NF), or normalized (see What First Formal Form Really Means, and What First Normal Form Means Not). Further/full normalization (2NF-5NF) is not an integral component of the model per se. Rather, it was devised for the practical purposes listed above, Monash’s fallacious dismissal of them as “theological purity” notwithstanding.

 

Ø       One of RM's main objectives was to simplify database management relative to the hierarchic and network data models, the faults of which were inflexibility and unnecessary complexity! So if RM and normalization are to be replaced, how can XML, which is tree-structured, be an improvement?

 

Ø       The relational model (RM) is nothing but the application of predicate logic and mathematics to database management. Given that the purpose of a DBMS is to make inferences from database data, what, pray, other than logic should govern their operation? What does a view "radically different" than logic and math mean, anyway?

 

Ø       None of the DBMSs currently available commercially is a true implementation of RM, so how could a model that has not been implemented and used, "collapse under its own weight"? (Actually, there is one commercial product, Alphora's Dataphor, based more closely on RM, but when asked, Monash admitted he is not familiar with it, yet dismissed it as a "toy").

 

All this throws into question Monash's knowledge and understanding of the subject he opines on, hence my instinctive initial reluctance to bother with it.

 

For his argument to stand, Monash must offer evidence that (1) current database complexity is due to full normalization and (2) DBMS2 (whatever that is) is superior to fully normalized designs on that score (the deficiencies addressed by full normalization do not go away just because a data model other than RM is employed!)

 

The problem with (1) is that in reality, very few databases are properly and fully normalized (even if those who design them erroneously believe that they are). That is due partly to poor to inexistent design knowledge, and indoctrination with the fallacy that normalization slows performance; and partly to flawed SQL implementations which sometimes (though not always) penalize normalized designs. So if any complexities exist, they are not due to normalization, but to the lack thereof.

 

But does Monash at least prove (2), namely that his approach—to the extent that he has one—is superior with respect to complexity, even if we ignore, like he does, the many other criteria on which database technologies, products, and practices, must be assessed—for example, what about data independence?

 

Here are "the key aspects of DBMS2".

 

• Task-appropriate data managers. Just use whatever is cheapest and simplest for each set of applications. Possible choices include but are not limited to cheap online transaction processing DBMSs, high-end OLTP DBMSs, data warehouse appliances, XML-based document stores, highly distributed and/or small-footprint DBMSs, in-memory systems without their own persistent storage, or cross-corpus indexers without their own storage.

 

Any knowledgeable and experienced database professional (one must be both) would dismiss this instinctively as nonsense. What Monash advocates here is application-managed data and the anarchy of disparate technologies and products that has been haunting users at huge great cost, part of it being the very thing Monash deplores: complexity. The notion that his recommendation simplifies matters boggles the mind.

 

·    On exactly what sound grounds should task-appropriateness be evaluated and tools selected?

·    If multiple different applications, including unknown future ones, need to use the same data in different ways, should the data be duplicated for each of the “task-appropriate” tools, in which case what about consistency?

·    What about learning, using, and supporting all the different systems?

·    Should data be managed by DBMSs, or task-appropriate applications? Weren't DBMSs in general—not just the relational ones—invented precisely to avoid these very problems?

·    What if the "simplest and cheapest" can't do the job, particularly over time?

 

In fact, another objective of RM was to minimize the proliferation of "task appropriate" systems which became unmanageable, and relational proponents have argued and demonstrated extensively that it was the failure to implement RM that led to such proliferation in the first place.

 

Be that as it may, what does this have to do with logical database design, and how does the multi-tool approach recommended by Monash ensure that databases are not complex when they are used?!?

 

• Drastic limitations on relational schema complexity. Relational schemas shouldn't go far beyond two simple models: master-detail for transactions, and hypercubes/star schemas for analytics. Anything inherently more complex is, with rare exceptions, better handled via the schema flexibility of XML. If you need to access data from a legacy application that violates these precepts, do so via XML-based Web services.

 

It is hardly surprising that Monash displays more confusion, which here is double fold.

 

·    Confusion of levels of representation: With respect with “master-details”, aha, so that’s how DBMS2 simplifies designs: by arbitrarily forcing simplicity on conceptual models of reality to be represented in databases!  If that is the simplicity that you want to impose, what does it have to do with the databases being relational, or fully normalized? Indeed, what prevents anybody from doing this with even current SQL systems? (is it, perhaps, the simplistic, rather than simple, nature of the approach, which does not satisfy real-world needs?)

 

·    Confusion of shared databases with application views of data: Hypercubes/star schemas are the latter. They are ad-hoc, arbitrary designs for databases without any justification, but there is nothing to prevent making such views of the data available to applications that need them. Indeed, logical data independence is one of the major advantages of RM.

 

“Better handled with the schema flexibility of XML”? What in the world does that mean?? There is currently nothing that can really be called an XML schema, as properly defined above. If (and when) XML supports schemas with all types of integrity constraints necessary, they will not be more simple or flexible than relational schemas, but rather much more complex, because the XML data structure is the tree, and the relational model was invented to get rid of the complexities and inflexibilities of hierarchic structures with respect to integrity and manipulation—the main purposes of database management—which Monash completely ignores.

 

Note: Lee Fesperman points out that while the primary structure of XML is a physical tree as in the hierarchical data model, it also allows external pointers that can be used to form a network model structure (albeit clumsily) [which further complicates matters]. It is funny that Monash says that RDBMSs should only cover “master-detail and hypercube/star schemas while XML should handle more 'complex' schemas”, since XML's tree structure is just an elaboration of master-detail. Exactly how could XML handle complex schemas, except badly?

 

More likely, what Monash refers to as “XML schema flexibility” is XML’s lack of schema, which also means very poor semantics and, therefore, poor system knowledge of what the data means. I’ve written extensively about this fallacy, and my articles are available for free on the Internet.

 

• Both XML-based and relational information integration. Eventually, most DBMS2 data integration will be done via XML. But relational enterprise information integration will long have a role to play, such as connecting core OLTP and data warehouse systems.

 

Another major objective of RM, explicitly stated by Codd, is facilitation (courtesy of the very mathematics that Monash summarily dismisses) of integration, in particular the consistent segmentation and merging of databases. What is Monash's evidence that data integration is better done in XML, or even can be done in XML? In fact, what exactly does he mean by integration? I doubt Monash knows, but let me be generous and make an assumption. If you really believe that the way to solve current data management problems is to generate lots of disparate applications, each with its own data, and then integrate by shipping data in XML format around between those disparate applications, I suggest you read The Myth ofSelf-describing XML, and if you still believe that this relieves you from proper database design (which Monash seems to suggest), I have a bridge in Brooklyn to sell you.

 

If you know and understand data fundamentals, you observe current practice, and you can and are willing to think, you know what the problem is: the average practitioner does not possess the necessary foundation knowledge for proper conceptual modeling and database design, and finds thinking hard (often, does not even know enough about the business to be modeled). He is self taught, usually trained in a programming language and, at best, a DBMS product, and operates in “cookbook mode”, without a real understanding of what he does and why (my writings are chockfull of examples, and it is impossible to keep up). This is due to the collapse of the educational system, now being a “market-driven” training platform for software tool vendors (see the Myth of Market-based Education). The solution, for which there are no substitutes or workarounds, is a rehabilitation of education to include foundation knowledge and the history of the field. That, however, is not what the industry wants to hear, and you cannot be a profitable consultant if you insist on it.

 

But if you lack the necessary knowledge yourself, you will confuse current SQL DBMSs (designed by people who don't know RM either, see If You Liked SQL You’ll Love XQuery) with the model, will blame problems on what is actually the solution, and will tell people what they want to hear: that they don't have to learn/know anything beyond tools, don't have to think much, all they need to do is buy whatever cheap and simple packages are sold, and not to worry, the latest fad pushed by the industry (this time, XML) will be the magic wand that solves everything.

 

In other words, do more of what created the crisis in the first place, but give it some new label like DBMS2, and hype the latest industry fad. Customers will love you, vendors will love you, and since the vendors advertise, the trade media will love and publicize you. None of that will happen if you expose the ignorance at the root of bad practices, and advise people to go learn fundamentals before they get involved in data management. Don’t bother them with theory, they have practical things to do.

 

Monash’s “new way” is no way at all. DBMS2 is DBMSless data management. It is hardly a radically new approach to data management, but a throwback decades, to the old days of application-managed data (which database management was invented to eliminate). It's what you would do if you had no clue what data management is, which is exactly what should be expected in an industry that ignores fundamentals and its own history, and does not, therefore, learn from experience.

 

Let me make a prediction: several years from now, the Monashes in the industry will peddle DBMS3, a “new radical view of data management” that will solve all the problems caused by DBMS2 and XML. The constant moves from DBMSn to DBMSn+1, DBMSn+2,… is highly profitable for their peddlers and the industry, but hugely costly to users and the public. And without proper education, they will continue to pay increasingly more.

 

See also C. J. Date’s Commentary on Certain Remarks by Curt Monash

 

 

Posted 9/3/05

© Fabian Pascal 2006 All Rights Reserved