Saturday, September 11, 2021

OBG: Data Warehouses Are Non-Relational Application Views

Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old (2000-06), Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).


(originally posted 11/10/2001)

“It is dawning on me that data warehouse techniques (as advocated by Ralph Kimball) are a response to perceived SQL DBMS performance weaknesses and nothing more. Dimensional modeling is a physical implementation design to deal with what may already be out of date performance tests to back up said design. I agree that where data warehouse logical design deviates from the relational data model there will be trouble down the road. I can attest to the high cost involved in maintaining a data warehouse. I am now questioning three purported benefits of using the current popular data warehouse design techniques.

Physical DBMS designs like the star schema produce faster more predictable query results than a normalized one (I realize normalization is a strictly logical concept but it does appear to have its direct physical mappings in current DBMS systems). Well, I am about to find out. We will be performing some benchmarks. What really are the query times performed on our OLTP system vs. a Star schema for a few relevant reports.

Data warehouses offload query processing from the OLTP system. This is true, but may not be necessary. One needs to thoroughly analyze the traffic on the OLTP system to see if offloading is necessary. We are looking into simply replicating the database (or part of it) for reporting purposes. Replication is far simpler to maintain than a data warehouse.

Data warehouse designs are simpler to understand than a relational one, thereby query construction is easier. I think this is more due to the fact that designers have had a bad habit of throwing up on a wall a full ER chart of their systems. Saying in effect Look how great I am, no one will ever understand this!  Creating ER diagrams of subsystems to describe important characteristics of a database can also be simple to understand.

Item #3 brings up a question. What do you think of OLAP tools such a Microsoft s Analysis Services? We have discovered you can use the tool without redesigning your database. We have seen some pretty fast query times if we take the option of allowing the tool to store data extracted from the production database into its own proprietary format. The opposition to its adoption here is the learning curve to master the MDX query language."

Fabian Pascal: The fact is that so-called "dimensional modeling" is logical, not physical modeling. Kimball does not present his "techniques" as just performance-maximization and, what is more, they won’t necessarily yield better performance. He also seems to believe, erroneously, that star-schemas are fully normalized designs. The real solution to performance problems is true RDBMSs with better implementations, not ad-hoc logical designs. Like so many, Kimball simply does not understand relational technology and confuses levels of representation.

I consider warehouses a regression to the good old days of application-biased files -- which we discarded for application-neutral and DBMSs, because they did not prove cost-effective. They are application-specific views of databases that are not derived via relational algebra from relational databases, but are rather non-relational SQL tables. The industry has a long and profitable history of recycling relabeled old failures, witness XML and graph throwbacks to hierarchic databases.

Aside from being necessary for soundness, fully normalized relational databases are the easiest to understand if (1) one thoroughly knows and understands the segment of reality to be represented in the database -- the business, that is -- and (2) data fundamentals. Arbitrary designs such as star-schemas are cope-outs, due to poor knowledge and understanding of the latter.

I am not familiar with the product, but I am generally wary of what vendors do.

Note on re-publication: See also Data Warehouses and the Logical-Physical Confusion.





No comments:

Post a Comment

View My Stats