“UNIVERSAL DATA MODELS” AND IMPORTANCE OF THINKING PRECISELY
by Fabian Pascal

 

 

 

In an interview, Len Silverstone, the proponent of Universal Data Models, argues as follows:

 

Regarding improving data modeling practices, this is something that is critical. Our track record for data modeling has not been great. Many data modeling efforts have struggled because they have cost more and taken longer than the associated perceived business value. I believe the answer is not to stop doing data modeling (or to stop gathering information requirements) but that we need to do it better, in less time, with better tools and methods, such as using re-usable components as we do in other aspects of systems development.

 

Needless to say, we agree. A major reason for the poor track record is insufficient knowledge and understanding of data fundamentals. As long as it persists, there will be only data muddling, not modeling.

 

One of the most common deficiencies in data management practice is confusion of levels of representation. When lower level considerations contaminate higher level ones, they practically guarantee flawed modeling.

 

There are three distinct levels of representation:

 

·         Conceptual level

·         Logical level

·         Physical level

 

The first is the business level, the other two are database levels. (We will not be concerned with the physical level here, but The Logical-Physical Confusion is widespread in the industry and responsible for many flawed practices; see also The Costly Illusion: Normalization, Integrity and Performance).

 

Conceptual models (also referred to as business, or E-R models) are informal: based on subjective perceptions of reality, they are collections of business rules expressed in real world terms (customers, departments, projects, and so on); they don’t contain the facts to be recorded in the database. Because of their informality, direct representation of conceptual models in computerized databases would not be very useful: databases are for mechanized inferences from data (which represents the facts), and computerization is possible only with formally structured data, to which logic and mathematics can be applied (see Business Modeling for Database Design). Therefore, conceptual models must be converted to formal representations that can be recorded in databases and lend themselves to mechanized inference by DBMSs. This is what logical models are, and to serve their purpose they must capture as much as possible of the meaning in the conceptual models they represent.

 

Mapping conceptual to logical models requires a “translation” mechanism that provides formal data constructs to represent informal business constructs. Ted Codd, the inventor of the relational model, introduced the data model as such a mechanism, and defined it as follows:

 

[A data model] is a combination of three components:

·    a collection of data structure[s]…

·    a collection of operators or inferencing rules, which can be applied to any valid instances of the [pertinent structures] listed in 1, to retrieve or derive data from any parts of those structures in any combinations desired;

·    a collection of general integrity constraints, which implicitly or explicitly define the set of consistent database states or changes of states, or both…

--E. F. Codd, Data Models in Database Management, IBM Research Laboratory, 1980

 

Codd’s relational data model has the two desired properties mentioned above (formality and real world interpretation) and provides three components to map business models to. The structure is the time-varying relation, which can be presented to users and applications as a (special kind of) table (R-table); the operators on relations are restrict, project, join, union and so on; and there are four types of integrity constraint: domain, column, single-table and multi-table constraints.
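
To make the three components concrete, here is a minimal sketch in Python. It is only an illustration under our own assumptions, not Codd's formalism or any product's API: a relation is approximated as a set of rows, three of the operators are written as plain functions, and a single-table (key) constraint is checked. All names (relation, restrict, project, join, satisfies_key) and the sample data are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

# Structure: a row is a set of (column, value) pairs; a relation is a set of rows.
# Using sets means duplicate rows cannot exist and row order carries no meaning.
Row = FrozenSet[Tuple[str, object]]
Relation = FrozenSet[Row]


def relation(*rows: dict) -> Relation:
    """Build a relation from dictionaries (a convenience for this sketch)."""
    return frozenset(frozenset(r.items()) for r in rows)


# Operators: each takes relations and returns a relation.
def restrict(rel: Relation, predicate: Callable[[dict], bool]) -> Relation:
    """Keep only the rows for which the predicate is true."""
    return frozenset(r for r in rel if predicate(dict(r)))


def project(rel: Relation, columns: Iterable[str]) -> Relation:
    """Keep only the named columns; duplicates collapse automatically."""
    wanted = set(columns)
    return frozenset(frozenset((c, v) for c, v in r if c in wanted) for r in rel)


def join(left: Relation, right: Relation) -> Relation:
    """Natural join: combine rows that agree on all common columns."""
    result = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[c] == rd[c] for c in ld.keys() & rd.keys()):
                result.add(frozenset({**ld, **rd}.items()))
    return frozenset(result)


# Integrity constraint (single-table): no two rows share the same key value.
def satisfies_key(rel: Relation, key_columns: Iterable[str]) -> bool:
    key = list(key_columns)
    values = [tuple(dict(r)[c] for c in key) for r in rel]
    return len(values) == len(set(values))


# Hypothetical sample data.
departments = relation(
    {"dept_no": 1, "dept_name": "Sales"},
    {"dept_no": 2, "dept_name": "Research"},
)
employees = relation(
    {"emp_no": 10, "emp_name": "Ann", "dept_no": 1},
    {"emp_no": 11, "emp_name": "Bob", "dept_no": 2},
    {"emp_no": 12, "emp_name": "Cy", "dept_no": 1},
)

# Names of employees in the Sales department, derived purely by the operators.
sales_names = project(
    restrict(join(employees, departments), lambda r: r["dept_name"] == "Sales"),
    ["emp_name"],
)
```

Every operator returns a relation, so operators can be nested arbitrarily; that closure property is what allows data to be derived "from any parts of those structures in any combinations desired".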

 

Logical models are expressed in (formal) database terms (domains, R-tables, columns, rows and so on). Note also that while conceptual and logical models are enterprise-specific—they are informal and formal representations, respectively, of particular enterprises—a data model is, well, universal (see next). As Chris Date puts it, a data model is to logical models what a programming language is to programs; that is, it can be used to generate many logical models from conceptual models of specific enterprises.
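
Date's analogy can be sketched in Python as follows. This is a hedged illustration, not anything taken from Date or Codd: the generic functions stand in for the (universal) data model, while the two dictionaries stand in for enterprise-specific logical models built from the same constructs. The names key_holds, reference_holds, is_consistent, and both sample enterprises are our own assumptions; rows are plain dictionaries for brevity.

```python
# Generic part: nothing here mentions customers, policies, or any enterprise.
def key_holds(rows, key):
    """True if no two rows agree on all key columns."""
    values = [tuple(row[c] for c in key) for row in rows]
    return len(values) == len(set(values))


def reference_holds(referencing, referenced, columns):
    """True if every referencing row matches some referenced row on `columns`."""
    targets = {tuple(row[c] for c in columns) for row in referenced}
    return all(tuple(row[c] for c in columns) in targets for row in referencing)


def is_consistent(database, model):
    """Check a database state against the constraints of one logical model."""
    keys_ok = all(key_holds(database[t], k) for t, k in model["keys"].items())
    refs_ok = all(
        reference_holds(database[a], database[b], cols)
        for a, b, cols in model["references"]
    )
    return keys_ok and refs_ok


# Enterprise-specific part: two different logical models, same generic constructs.
retailer_model = {
    "keys": {"customer": ["cust_no"], "order": ["order_no"]},
    "references": [("order", "customer", ["cust_no"])],
}
insurer_model = {
    "keys": {"policy": ["policy_no"], "claim": ["claim_no"]},
    "references": [("claim", "policy", ["policy_no"])],
}

retailer_db = {
    "customer": [{"cust_no": 1, "name": "Acme"}],
    "order": [{"order_no": 100, "cust_no": 1}],
}
insurer_db = {
    "policy": [{"policy_no": "P1"}],
    # This claim refers to a policy that does not exist.
    "claim": [{"claim_no": "C1", "policy_no": "P2"}],
}

retailer_ok = is_consistent(retailer_db, retailer_model)
insurer_ok = is_consistent(insurer_db, insurer_model)
```

The generic functions are written once; only the model dictionaries change from one enterprise to the next.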

 

In the interview Silverstone defines a Universal Data Model (UDM) as follows:

 

… a template or re-usable data model that is generally applicable and that can be used by a great number of organizations to save time and effort while offering holistic perspectives.

 

This creates the impression that Silverstone is referring to the data model as defined above. But then consider the following:

 

For instance, when building a product pricing data model, the modeler may model a PRODUCT PRICE COMPONENT entity not realizing that these price components apply not only to the base price of a product but also to other things, for example, discounts or surcharges that are based on geography, or by agreement, or based on the type of customer. Therefore setting up a more generic PRICE COMPONENT entity offers a more re-usable and holistic approach instead of having multiple entities to maintain pricing structures. Likewise, when developing a CRM application instead of adding fields to the CUSTOMER entity, Universal Data Models can offer alternatives illustrating that maybe the name or contact information should be associated with a PERSON, ORGANIZATION or PARTY. This way the party’s information is consistent when this same party is involved in another role, for example as a PROSPECT or WEB SITE VISITOR [emphasis added].

 

This is not good writing but, more to the point, a “pricing data model” cannot possibly be a data model, not in the Codd sense. The terminology is clearly that of business, not of logical modeling, and ‘entity’ is an informal conceptual (E-R) construct. On the other hand, ‘field’ is neither a business nor a formal logical construct, but rather a physical implementation detail at the application level.

 

Here’s more evidence suggesting that Silverstone’s UDM is at the business level:

 

Universal Data Models include common data constructs applying to most organizations as well as industry specific data constructs. For example, common data constructs that apply to most organizations would include data models for information about people, organizations, roles, relationships between people and organizations, contact information, products, services, inventory, pricing, requirements, quotes, orders, agreements, shipments, projects, invoicing, payments, budgeting and accounting. [emphasis added]

 

There are Universal Data Models for many industries that build upon these common constructs and offer additional extensions that may only be applicable to a certain industry. For example, a manufacturing Universal Data Model includes many of previously mentioned common data constructs but also includes additional data constructs such as design engineering models. Likewise, the insurance Universal Data Models includes additional common constructs for claims processing, which are actually an extension of the invoicing models since they both represent a request for reimbursement. [emphasis added]

 

These are real-world business constructs, not “data constructs”. Indeed, the distinction between a conceptual and a data model is precisely that the former is a model of a business and the latter a general model of data.

 

And when he says:

 

I also believe that we shouldn’t just stop there, but as a mature industry, we should have universal models for all cells in the Zachman framework. Why not have various template models for all aspects of systems development?

 

Silverstone cannot possibly mean data models, because there isn’t one for each of the “cells in the Zachman framework”, whatever level that “framework” is at.

 

But then he adds:

 

Additionally, there are also data warehouse Universal Data Models offering common ways of modeling data warehouse and star schema constructs for example regarding sales analysis, human resource analysis or financial analysis. [emphasis added]

 

Well, a star schema is already at the logical level (it uses tables, albeit not relationally), so “Universal Data Models offering common ways of modeling data warehouse and star schema constructs” can only mean logical models, not a data model.

 

Silverstone should be clear in his mind as to what level he is on—conceptual or logical—but one thing is clear: a UDM is not a data model (there cannot be many of them anyway).

 

Most of my clients implementing Universal Data Models are in an object oriented environment. Very often, the Universal Data Models serve as the foundation for a relational database and then an object oriented class structure is superimposed on the Universal Data Models to allow object oriented programmatic access. Sometimes a relational database is not even involved and the Universal Data Model (customized to the organization) is used as a basis for the object class structures in object oriented programs. Other clients use the Universal Data Models as a “universal” method for passing data, for example via XML.

 

As far as we know, object-orientation (OO) is a programming approach. It is essentially a set of guidelines that, if adhered to, leads to application programs that have certain advantageous properties. As such, OO is not a data modeling/management paradigm, but an application development approach. If UDMs are conceptual models or logical models they precede—as they certainly should—application development, so we do not understand how “an object-oriented class structure is superimposed on the Universal Data Model”, which suggests the reverse sequence.

 

Note: At the database level (Silverstone means SQL, not relational) the class structure is “imposed” perhaps on SQL databases, not on conceptual or logical models. Anyway, the fact that many OO-leaning practitioners skip databases altogether and implement all data management in applications strongly supports our contention that OO [is] For Application Development, Not Data Management. A truly relational DBMS (TRDBMS) would, of course, support user-defined domains of arbitrary complexity, so there would be no need to impose anything OO on relational databases. Even SQL products provide some such support, but in very limited and flawed ways.

 

… the models I am providing are NOT the only one right answer for a subject data area, or even for a very specific data construct! In my opinion, there is no one right answer, especially when offering “universal” constructs that can be generally applied to different situations. In my books, I sometimes show alternatives to modeling various structures and point out the pros and cons of each. When consulting, I will often provide models that are not what I have in my Universal Data Model repository but they are variations of the Universal Data Models based upon the specific needs of the client. [emphasis added]

 

So whatever they are, Silverstone’s models are not really universal after all. That is true of conceptual and logical models by definition (because they are enterprise-specific, they vary with the “specific needs of the client”), but not of data models (e.g., relational, hierarchic, network), which are intended to be universal.

 

I also believe that knowing and understanding various perspectives and possibilities is very powerful. When data modelers have differences of opinions, I will usually ask them to model the data requirements as they see it and very often several excellent models emerge. From these various alternatives, along with their pros and cons, an informed decision can be made.

 

Conceptual models are based on subjective perceptions of reality, hence the multiple perspectives. There is no scientific basis for preferring one to another. Once a conceptual model is properly specified and agreed on, there may be more than one logical model to map it to, but whichever one is chosen, relational mapping is guided by universal formal principles (e.g., the Principle of Full Normalization, POFN, and the Principle of Orthogonal Design, POOD). So if his models are conceptual, Silverstone’s position makes sense. If, however, they are logical models, design is much more constrained, and there is no place for “opinions”.
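
To illustrate how formal principles constrain logical design, here is a minimal Python sketch that takes full normalization as the example. It makes two simplifying assumptions that should be kept in mind: the functional dependencies are taken as given (they come from the agreed business rules, not from the code), and the check is for Boyce-Codd normal form only, which is a weaker stand-in for full normalization. The function names and the order-line example are hypothetical.

```python
def closure(attributes, fds):
    """All attributes determined by `attributes` under the given dependencies.

    `fds` is a list of (determinant, dependent) pairs of attribute tuples.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(heading, fds):
    """Non-trivial dependencies whose determinant is not a superkey of `heading`."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and not set(heading) <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


# One table mixing facts about order lines with facts about products.
mixed_heading = {"order_no", "product_no", "product_name", "qty"}
mixed_fds = [
    (("order_no", "product_no"), ("qty",)),
    (("product_no",), ("product_name",)),  # product_no alone is not a key here
]
mixed_result = bcnf_violations(mixed_heading, mixed_fds)

# The same business rules mapped to two tables, one per kind of fact.
order_line_result = bcnf_violations(
    {"order_no", "product_no", "qty"},
    [(("order_no", "product_no"), ("qty",))],
)
product_result = bcnf_violations(
    {"product_no", "product_name"},
    [(("product_no",), ("product_name",))],
)
```

Given the same dependencies, the check returns the same verdict no matter who runs it: the single mixed table is flagged and the two-table design is not. That is the sense in which logical design is not a matter of opinion once the business rules are fixed.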

 

There is similar confusion elsewhere.

 

I have attended Karen Lopez’s (moderator of the Data Modeling List) outstanding conference session entitled “Data Modeling Contention Issues”. She brings up various issues in data modeling (such as abstract versus specific modeling, use of surrogate keys, use of a conceptual data model, and even the idea of using template models) to a group of experienced modelers and has participants publicly rate their responses on a scale from 1 (strongly agree) versus 5 (strongly disagree). What I loved about attending this session is that even though there are near-religious debates about what participa[nts] believe is the “right” way, she constantly brings awareness that “the most successful discussions are ones where both sides learn something new about the others’ viewpoint”.

 

The broad term “data modeling” lumps together both conceptual and logical aspects. It is not very clear at what level “abstract vs. specific modeling” and “template models” are, although we suspect conceptual. Surrogate keys, on the other hand, are a logical database construct.

 

Confusing levels of representation does not lead to good modeling and design, a problem that Silverstone himself deplores. A while ago Chris Date wrote a multi-part article entitled Why Is It Important to Think Precisely? (in RELATIONAL DATABASE WRITINGS 1994-1997), which all modelers should read. He mentions that after spending considerable time in one of his seminars explaining the necessity of precise thinking in database management, one of the participants asked, “Yes, but why is it important to think precisely?” Date was taken aback and, having thought for a few seconds, replied, “I don’t know why”. The reader is invited to draw his/her own conclusions.

 

(Thanks to Chris Date for useful comments on a draft of this article.)

 

 

Posted 3/26/04

© Fabian Pascal 2006 All Rights Reserved