In an interview, Len Silverstone, the proponent of Universal
Data Models, argues as follows:
Regarding improving data modeling practices, this is something
that is critical. Our track record for data modeling has not been great. Many
data modeling efforts have struggled because they have cost more and taken
longer than the associated perceived business value. I believe the answer is
not to stop doing data modeling (or to stop gathering information requirements)
but that we need to do it better, in less time, with better tools and methods,
such as using re-usable components as we do in other aspects of systems
development.
Needless to say, we agree. A major reason for the poor track
record is insufficient knowledge and understanding of data fundamentals. As
long as it persists, there will be only data muddling, not modeling.
One of the most common deficiencies in data management
practice is confusion of levels of representation. When lower level
considerations contaminate higher level ones, they practically guarantee flawed
modeling.
There are three distinct levels of representation:
· Conceptual level
· Logical level
· Physical level
The first is the business level, the other two are database
levels. (We will not be concerned with the physical level here, but The Logical-Physical
Confusion is widespread in the industry and responsible for many flawed
practices; see also The
Costly Illusion: Normalization, Integrity and Performance).
Conceptual models (also referred to as business, or
E-R models) are informal: based on subjective perceptions of reality,
they are collections of business rules expressed in real world terms
(customers, departments, projects, and so on); they don’t contain the facts to
be recorded in the database. Because they are informal, conceptual models
cannot usefully be represented directly in computerized databases: databases
exist for mechanized inference from data (which represents the facts), and
computerization is possible only with formally structured data, to which logic
and mathematics can be applied (see Business Modeling for Database Design).
Therefore, conceptual models must be converted to formal
representations that can be recorded in databases and lend themselves to
mechanized inference by DBMSs. This is what logical models are, and to
serve their purpose they must capture as much of the meaning in the
conceptual models they represent as possible.
Mapping conceptual to logical models requires a “translation”
mechanism that provides formal data constructs to represent informal business
constructs. Ted Codd, the inventor of the relational model, introduced the data
model as such a mechanism, and defined it as follows:
[A data model] is a combination of three components:
· a collection of data structure[s]…
· a collection of operators or inferencing rules,
which can be applied to any valid instances of the [pertinent structures]
listed in 1, to retrieve or derive data from any parts of those structures in
any combinations desired;
· a collection of general integrity constraints,
which implicitly or explicitly define the set of consistent database states or
changes of states, or both…
--E. F. Codd, Data Models in Database Management, IBM
Research Laboratory, 1980
Codd’s relational data model has the two desired
properties mentioned above—formality and real world interpretation—and
provides three components to map business models to. The structure is the
time-varying relation, which can be presented to users and
applications as a (special kind of) table (R-table); the operators on
relations are restrict, project, join, union and so
on; and there are four types of integrity constraint: domain, column,
single-table and multi-table constraints.
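The three components can be made concrete with a small sketch. What follows is only an illustration in Python, not Codd's formalism and not how any DBMS is implemented: it omits domains, typed headings and the other constraint types, and every name in it (relation, restrict, project, join, references, and the employees and departments data) is hypothetical. It assumes a relation can be approximated as a set of rows, so that duplicates cannot occur and row order carries no meaning.

```python
from typing import Callable, FrozenSet, Tuple

# STRUCTURE (assumption of this sketch): a row is an immutable set of
# (column, value) pairs and a relation is a set of such rows.
Row = FrozenSet[Tuple[str, object]]
Relation = FrozenSet[Row]


def relation(*rows: dict) -> Relation:
    """Build a relation from dict-style rows (hypothetical helper)."""
    return frozenset(frozenset(r.items()) for r in rows)


# OPERATORS: each takes relations and returns a relation.
def restrict(rel: Relation, predicate: Callable[[dict], bool]) -> Relation:
    """Keep only the rows for which the predicate holds."""
    return frozenset(row for row in rel if predicate(dict(row)))


def project(rel: Relation, *columns: str) -> Relation:
    """Keep only the named columns; duplicate rows collapse automatically."""
    return frozenset(
        frozenset((c, v) for c, v in row if c in columns) for row in rel
    )


def join(left: Relation, right: Relation) -> Relation:
    """Natural join: combine rows that agree on all shared columns."""
    result = set()
    for left_row in left:
        for right_row in right:
            ld, rd = dict(left_row), dict(right_row)
            shared = ld.keys() & rd.keys()
            if all(ld[c] == rd[c] for c in shared):
                result.add(frozenset({**ld, **rd}.items()))
    return frozenset(result)


# INTEGRITY: one multi-table constraint, checked as a plain predicate.
def references(child: Relation, parent: Relation, column: str) -> bool:
    """True if every child value of `column` also appears in the parent."""
    parent_values = {dict(r)[column] for r in parent}
    return all(dict(r)[column] in parent_values for r in child)


# An enterprise-specific logical model expressed with the generic constructs.
departments = relation(
    {"dept_no": "D1", "dept_name": "Sales"},
    {"dept_no": "D2", "dept_name": "Research"},
)
employees = relation(
    {"emp_no": "E1", "dept_no": "D1", "salary": 50000},
    {"emp_no": "E2", "dept_no": "D1", "salary": 50000},
    {"emp_no": "E3", "dept_no": "D2", "salary": 65000},
)

sales_staff = restrict(employees, lambda r: r["dept_no"] == "D1")
salaries = project(employees, "salary")   # two rows: duplicates collapse
staffing = join(employees, departments)   # one row per employee
consistent = references(employees, departments, "dept_no")
```

The point of the sketch is the division of labor: restrict, project, join and references know nothing about employees or departments, which is what makes them part of a data model, whereas the two relations at the end belong to one particular enterprise's logical model.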
Logical models are expressed in (formal) database terms
(domains, R-tables, columns, rows and so on). Note also that while conceptual
and logical models are enterprise-specific—they are informal and formal
representations, respectively, of particular enterprises—a data model is, well,
universal (see next). As Chris Date puts it, a data model is to logical models
what a programming language is to programs; that is, it can be used to generate
many logical models from conceptual models of specific enterprises.
In the interview Silverstone defines a Universal Data Model
(UDM) as follows:
… a template or re-usable data model that is generally
applicable and that can be used by a great number of organizations to save time
and effort while offering holistic perspectives.
This creates the impression that Silverstone is referring to
the data model as defined above. But then consider the following:
For instance, when building a product pricing data model,
the modeler may model a PRODUCT PRICE COMPONENT entity not realizing
that these price components apply not only to the base price of a product but
also to other things, for example, discounts or surcharges that are based on
geography, or by agreement, or based on the type of customer. Therefore setting
up a more generic PRICE COMPONENT entity offers a more re-usable
and holistic approach instead of having multiple entities to maintain
pricing structures. Likewise, when developing a CRM application instead of
adding fields to the CUSTOMER entity, Universal Data Models can
offer alternatives illustrating that maybe the name or contact information
should be associated with a PERSON, ORGANIZATION or PARTY. This
way the party’s information is consistent when this same party is involved in
another role, for example as a PROSPECT or WEB SITE VISITOR [emphasis added].
Not good writing; more importantly, a “pricing data model” cannot possibly
be a data model, not in the Codd sense. The terminology is clearly one of
business, not logical; and ‘entity’ is an informal conceptual (E-R) construct.
On the other hand, ‘field’ is neither a business, nor a formal logical
construct, but rather a physical implementation detail at the application
level.
Here’s more evidence suggesting that Silverstone’s UDM is at
the business level:
Universal Data Models include common data constructs applying to
most organizations as well as industry specific data constructs. For example,
common data constructs that apply to most organizations would include data
models for information about people, organizations, roles, relationships
between people and organizations, contact information, products, services,
inventory, pricing, requirements, quotes, orders, agreements, shipments,
projects, invoicing, payments, budgeting and accounting. [emphasis added]
There are Universal Data Models for many industries that build
upon these common constructs and offer additional extensions that may only be
applicable to a certain industry. For example, a manufacturing Universal Data
Model includes many of previously mentioned common data constructs but
also includes additional data constructs such as design engineering
models. Likewise, the insurance Universal Data Models includes additional
common constructs for claims processing, which are actually an extension of the
invoicing models since they both represent a request for reimbursement. [emphasis added]
These are real-world business constructs, not “data
constructs”. Indeed, the distinction between a conceptual and a data model is
precisely that the former is a model of a business and the latter
a general model of data.
And when he says:
I also believe that we shouldn’t just stop there, but as a
mature industry, we should have universal models for all cells in the Zachman
framework. Why not have various template models for all aspects of systems
development?
Silverstone cannot possibly mean data models, because there
isn’t one for each “cell in the Zachman framework”, whatever level that
“framework” is at.
But then he adds:
Additionally, there are also data warehouse Universal Data
Models offering common ways of modeling data warehouse and star schema
constructs for example regarding sales analysis, human resource analysis or
financial analysis. [emphasis added]
Well, a star schema is already at the logical level (it uses tables, albeit
not relationally), so “UDMs offering common ways of modeling data warehouse
and star schema constructs” can only mean logical models, not a
data model.
Silverstone should be clear in his mind as to what level he
is on—conceptual or logical—but one thing is clear: a UDM is not a data model
(there cannot be many of them anyway).
Most of my clients implementing Universal Data Models are in an
object oriented environment. Very often, the Universal Data Models serve as the
foundation for a relational database and then an object oriented class
structure is superimposed on the Universal Data Models to allow object oriented
programmatic access. Sometimes a relational database is not even involved and
the Universal Data Model (customized to the organization) is used as a basis
for the object class structures in object oriented programs. Other clients use
the Universal Data Models as a “universal” method for passing data, for example
via XML.
As far as we know, object-orientation (OO) is a programming
approach. It is essentially a set of guidelines that, if adhered to, leads to
application programs that have certain advantageous properties. As such, OO is
not a data modeling/management paradigm, but an application development
approach.
If UDMs are conceptual models or logical models they precede—as they
certainly should—application development, so we do not understand how “an
object-oriented class structure is superimposed on the Universal Data Model”,
which suggests the reverse sequence.
Note: At the database level (Silverstone means SQL, not relational) the
class structure is perhaps “imposed” on SQL databases, not on conceptual or
logical models. In any case, the fact that many OO-leaning practitioners skip
databases altogether and implement all data management in applications strongly
supports our contention that OO [is] For Application Development, Not Data
Management. A truly relational DBMS (TRDBMS) would, of
course, support user-defined domains of arbitrary complexity, so there would be
no need to impose anything OO on relational databases. Even SQL products
provide some such support, but in very limited and flawed ways.
… the models I am providing are NOT the only one right answer
for a subject data area, or even for a very specific data construct! In my
opinion, there is no one right answer, especially when offering “universal”
constructs that can be generally applied to different situations. In my books,
I sometimes show alternatives to modeling various structures and point out the
pros and cons of each. When consulting, I will often provide models that are
not what I have in my Universal Data Model repository but they are variations
of the Universal Data Models based upon the specific needs of the client. [emphasis added]
So whatever they are, Silverstone’s models are not really
universal after all. Which is true of conceptual and logical models by
definition—because they are enterprise-specific, they vary with the “specific
needs of the client”—but not of data models e.g. relational, hierarchic,
network, which are intended to be universal.
I also believe that knowing and understanding various
perspectives and possibilities is very powerful. When data modelers have
differences of opinions, I will usually ask them to model the data requirements
as they see it and very often several excellent models emerge. From these
various alternatives, along with their pros and cons, an informed decision can
be made.
Conceptual models are based on subjective perceptions of
reality, hence multiple perspectives. There is no scientific basis for
preferring one to another. But once a conceptual model is properly specified
and agreed on, there may be more than one logical model to map it to, but
whichever one is chosen, relational mapping is guided by universal formal
principles (e.g. POFN and POOD). So if his models are conceptual, Silverstone’s
position makes sense. If, however, they are logical models, design is much more
constrained, and there is no place for “opinions”.
There is similar confusion elsewhere.
I have attended Karen Lopez’s (moderator of the Data Modeling
List) outstanding conference session entitled “Data Modeling Contention
Issues”. She brings up various issues in data modeling (such as abstract versus
specific modeling, use of surrogate keys, use of a conceptual data model, and
even the idea of using template models) to a group of experienced modelers and
has participants publicly rate their responses on a scale from 1 (strongly
agree) versus 5 (strongly disagree). What I loved about attending this session
is that even though there are near-religious debates about what participa[nts]
believe is the “right” way, she constantly brings awareness that “the most
successful discussions are ones where both sides learn something new about the
others’ viewpoint”.
The broad term “data modeling” lumps together both conceptual
and logical aspects. It is not very clear at what level “abstract vs. specific
modeling” and “template models” are, although we suspect conceptual. Surrogate
keys, on the other hand, are a logical database construct.
Confusing levels of representation does not lead to good
modeling and design, a problem that Silverstone himself deplores. A while ago
Chris Date wrote a multi-part article entitled Why Is It Important to Think
Precisely? (in RELATIONAL DATABASE WRITINGS 1994-1997), which all modelers
should read. He mentions
that after spending a considerable time in one of his seminars explaining the
necessity of precise thinking in database management, one of the participants
asked, “Yes, but why is it important to think precisely?” Date was taken
aback and, having thought for a few seconds, replied, “I don’t know why”. The
reader is invited to draw his/her own conclusions.
(Thanks to Chris Date for useful comments on a draft of this
article.)
Posted 3/26/04
© Fabian Pascal 2006 All Rights Reserved