It’s always the intelligent people who are left wing.
--Alan Bennett
I object to being labeled right wing.
--Chris Date
In DATA MODELING, LEFT AND RIGHT, Phillip Engle focuses
on what may well be the issue in data management: capturing the meaning
of modeled reality in computerized systems. He frames it in terms of two
modeling approaches:
Left-wing: “Data modeling as it is practiced by AI experts, KM
experts, object-oriented data analysts, and “semantic web” ontology advocates
involves the specification of deeply layered type hierarchies, including
multiple inheritance for entities and (often) multiple values per-attribute
per-entity … [the concern is] to capture logical rules (business rules) that
are semantically dependent on the particular subject-area being studied, in
addition to the logical/business rules that are independent of subject-matter.”
Right-wing: “… relational data modelers and other devotees of
sorted, two-valued, first-order predicate logic hate hierarchies …”
Frankly, we find the application of the two labels to the
subject matter at hand rather absurd, but will ignore that.
Let me, first, point out that the notion that relational
proponents “hate hierarchies” is rather silly. Engle does not seem to
differentiate between type hierarchies and the hierarchic data model,
which are distinct.
· With respect to the latter, where the real world is inherently
hierarchic—e.g. organizational structures, or bill-of-material part
assemblies—the information about those structures should, of course, be
captured by modeling. The question is what is the best representation of
hierarchic data for the purpose of least complex, most flexible computerized
inferences. As we stated elsewhere (see MultiValue Lacks Value,
What First Normal Form Really Means, and
What First Normal Form Means Not), proponents of
hierarchic modeling focus on structure and mostly ignore its
purpose—manipulation (and integrity).
· The former (type hierarchies) are orthogonal to the data model.
Note: Chris Date points out that, interestingly,
bill-of-materials part assemblies are very difficult to represent in
IMS- or XML-style hierarchies because (a) they have a recursive structure and (b)
they’re usually M:N (networks) anyway, not 1:N (true hierarchies). He likes
type hierarchies, but objects to force-fitting data into IMS- or XML-style
structures.
The fact is that hierarchic information can be represented
relationally, which is significantly more advantageous from the perspective of
data manipulation and integrity (see chapter 7 in PRACTICAL ISSUES
IN DATABASE MANAGEMENT). Moreover, where the world is not
inherently hierarchic (a majority of cases, we would argue), imposing hierarchy
makes no sense, yet a hierarchic approach does just that.
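As a rough illustration of the point about bill-of-materials structures, the following minimal Python sketch holds a part structure as a single set of tuples (a relation) rather than as a nested tree. The part names and quantities, and the helper names `explode` and `where_used`, are hypothetical and chosen only for illustration; the sketch also assumes the structure contains no cycles.

```python
from collections import defaultdict

# Hypothetical part-structure relation. Each tuple asserts:
# "ASSEMBLY directly contains QTY units of COMPONENT".
PART_STRUCTURE = {
    ("bicycle", "wheel", 2),
    ("bicycle", "frame", 1),
    ("wheel", "spoke", 32),
    ("wheel", "bolt", 4),
    ("frame", "bolt", 6),  # "bolt" has two parents: M:N, not a strict tree
}


def explode(part, structure=PART_STRUCTURE):
    """Total quantity of every direct or indirect component of `part`.

    Assumes the structure is acyclic (an illustrative simplification).
    """
    totals = defaultdict(int)

    def visit(assembly, multiplier):
        for parent, component, qty in structure:
            if parent == assembly:
                totals[component] += multiplier * qty
                visit(component, multiplier * qty)

    visit(part, 1)
    return dict(totals)


def where_used(component, structure=PART_STRUCTURE):
    """Assemblies that directly contain `component` (the inverse question)."""
    return {parent for parent, child, _ in structure if child == component}


print(explode("bicycle"))
print(where_used("bolt"))
```

Both the "parts explosion" and the inverse "where used" question are answered from the same flat set of tuples, and a component with several parents needs no duplication, which is the kind of symmetry a nested hierarchy makes awkward.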
Engle claims that given the R-table
EMPLOYEES {EMPLOYEEID, NAME, ADDRESS, SALARY}
Date and Darwen would say that this relation is a predicate
(i.e., a truth-valued function), which might be expressed as follows:
An EMPLOYEE has a unique EmployeeID, is called by a certain Name, lives
at a certain Address, and makes a certain yearly Salary.
but that what the relational model actually captures is not this, but
rather
Each particular EMPLOYEE is associated with one-and-only one
EmployeeID (drawn from a specified domain) which uniquely identifies that tuple
within the relation. Furthermore, each particular EMPLOYEE is also associated
with one-and-only-one Name (drawn from a specified domain), with
one-and-only-one Address (drawn from a specified domain), and with one-and-only-one
Salary (drawn from a specified domain).
and, thus:
What’s missing is all of the semantically particular verbs in
the first formulation … From the point of view of relational database theory,
the information represented by the verbs underlined above must be stored
outside the relational system (perhaps in “system documentation” or just in
“peoples’ heads”).
But the relation EMPLOYEES is not just the predicate
in the first formulation! Rather, the relation’s heading corresponds to
an internal predicate whose user interpretation is the external
predicate to which that formulation corresponds. To understand the
significance of the difference, here’s a quote from Date’s INTRODUCTION TO DATABASE
SYSTEMS 8th Ed. (emphasis in the original):
The first and most significant point is that, while internal
predicates are a formal construct, external predicates are an informal
construct merely. Internal predicates are (loosely) what the data means to
the system; external predicates, by contrast, are what the data means to
the user. Of course, users have to understand the internal predicates as well
as the external ones, but to repeat, the system has to understand—indeed, can
only understand—the internal ones. In fact, we may say, loosely, that a given
internal predicate is the system’s approximation to the corresponding
external predicate.
Note very carefully the emphasis on the system’s ability to
understand only the formal. Engle is, in fact, aware of this critical
aspect of Date and Darwen’s position when he quotes them (emphasis mine):
In an ideal world ... the DBMS would know the [full] meaning
of every relation, so that it could deal correctly with all possible updates.
But, of course, that’s impossible. There’s no way it can know those meanings
exactly. For example, there’s no way the DBMS can know what it means for a
certain supplier to be “in” a certain city or to “have” a certain status; these
concepts are outside the system – they’re understood by users, not by the DBMS.
More precisely, they’re part of what logicians call the interpretation (of the
[relvar] in question).
Otherwise put, no matter how desirable it is for computerized
systems to fully understand user meaning (the interpretation expressed by
external predicates), unless those predicates can be directly formalized,
internal predicates (and thus predicate logic and the relational model) are the
best we can currently do to guarantee consistent mechanized inferences.
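The distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any DBMS is implemented: the attribute types, the choice of key, and the function name `satisfies_internal_predicate` are all hypothetical. The external predicate appears as a plain string that the checking code never inspects, while the formally declared parts (attribute names, types, key uniqueness) are all the "system" can verify.

```python
# Hypothetical heading for EMPLOYEES: attribute name -> type (assumed types).
HEADING = {"EMPLOYEEID": int, "NAME": str, "ADDRESS": str, "SALARY": float}
KEY = ("EMPLOYEEID",)  # assumed key

# External predicate: informal and meaningful only to users.
# Nothing below reads or interprets this string.
EXTERNAL_PREDICATE = (
    "An EMPLOYEE has a unique EmployeeID, is called by a certain Name, "
    "lives at a certain Address, and makes a certain yearly Salary."
)


def satisfies_internal_predicate(tuples):
    """Check only what is formally declared: attributes, types, and key."""
    seen_keys = set()
    for t in tuples:
        if set(t) != set(HEADING):  # exactly the declared attributes
            return False
        if not all(isinstance(t[a], typ) for a, typ in HEADING.items()):
            return False  # a value is outside its declared type
        key = tuple(t[a] for a in KEY)
        if key in seen_keys:  # key uniqueness violated
            return False
        seen_keys.add(key)
    return True


employees = [
    {"EMPLOYEEID": 1, "NAME": "Smith", "ADDRESS": "London", "SALARY": 50000.0},
    {"EMPLOYEEID": 2, "NAME": "Jones", "ADDRESS": "Paris", "SALARY": 45000.0},
]
print(satisfies_internal_predicate(employees))
print(satisfies_internal_predicate(employees + [dict(employees[0])]))
```

The check can reject a duplicate EMPLOYEEID or a mistyped value, but it has no way to tell whether Smith actually "lives at" London; that remains part of the users' interpretation.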
Logicians make this quite clear (emphasis
added):
… mathematical or symbolic logic has two aspects. On the one
hand it is logic—it is an analytical theory of the art of reasoning whose goal
is to systematize and codify principles of valid reasoning. It has
emerged from a study of the use of language in argument and persuasion and it
is based on the identification and examination of those parts of language,
which are essential for these purposes. It is formal in the sense that it
lacks reference to meaning. Thereby, it achieves versatility: it may be
used to judge the correctness of a chain of reasoning (in particular, a
“mathematical proof”) solely on the basis of the form (and not the content)
of the sequence of statements which make up the chain.
--Robert Stoll, SET THEORY AND
LOGIC, Dover Publications, 1963
Yet Engle seems to ignore this when he states:
… first-order predicate logic … is “semantically complete” only
with respect to the identity predicate, the truth-functional operators, and the
quantifiers expressible within that system.
Consequently, the many additional business rules within a subject-area
of interest that are dependent on the particular semantics of that subject-area
(especially the verbs within that subject-area) are simply lost if that subject-area
is modeled exclusively in relational terms.
and quotes:
“The fact is that the meanings of the predicate letters of
[basic first-order] predicate logic vary from problem to problem: Unlike
quantifiers, truth-functional operators, and the identity predicate, they do
not have fixed meanings. Consequently [basic first-order] predicate logic
provides no rules to account for the semantics of specific predicates – with
the sole exception of the identity predicate. It is therefore insensitive to
validity generated by their distinctive semantics.”
He takes Date to task for claiming that “… domains (or types)
and relations are together both necessary and sufficient to represent
absolutely any data whatsoever …” and concludes:
… it is clear that he rejects the idea that semantically
dependent logic (e.g., verb-dependent logic) can (or should?) be formally
represented within computer-based business systems. In effect, this leaves many
“business rules” entirely outside the scope of formal business systems!
What is clear, however, is that domains and relations are
necessary and sufficient for representing any data formally and Date’s
position should be taken in this context.
Engle’s position derives from his declared optimism regarding
the ability of nonrelational approaches to achieve fuller capturing of user
meaning. But while I may agree with Engle on the desirability of such an
objective, optimism is no substitute for scientific knowledge. I am unaware of
any formal approach that currently does a better job than the relational
model and the logic underlying it, and Engle does not offer any.
Neither do I share his optimism regarding the approaches he
refers to: object orientation and many-valued logic have been around no less
time than the relational model, with little to show for it. We are on the public
record with amply documented reasons for that, not just at DATABASE DEBUNKINGS
but also in our other writings, books, seminars, and so on, so there is no
point in repeating those reasons here. It is incumbent on those who are
optimistic about those approaches either to demonstrate that we are wrong about
the fundamental flaws we document, or show why those flaws are no good reason
to drop the optimism.
We relational proponents do not “reject the idea that
semantically dependent logic (e.g., verb-dependent logic) can (or should?) be
formally represented within computer-based business systems”, as he claims.
Rather, we are not prepared to give up a sound foundation until such time as
the better one that Engle hopes for emerges. To the extent that this “leaves
many “business rules” entirely outside the scope of formal business
systems!”—and we are not convinced this is the case—well, that is the best science
can currently do, and that is no reason to drop the scientific approach
altogether. If there is anything that causes the IT industry to regress rather
than progress, it is the flouting of sound foundations.
Engle accuses relational proponents in general, and us in
particular, of “Stak[ing] out a position on the … far-right and demoniz[ing] the
other side” (making the absurdity of the labels even more obvious):
For example, the often-excellent “Database Debunkings” web site
(www.dbdebunk.com) maintained by Fabian Pascal and C.J. Date is marred by
attacks on non-relational data modeling that border on ad hominem. This
approach does not seem to me to be productive, since I see significant value on
both the left and the right.
Let me correct Engle here. We do not attack serious research,
or those with sufficient knowledge to undertake it who withhold claims until
they have persuasive results. We do and will take on, however, unfounded
criticism of the relational model or absurd nonrelational claims based on
ignorance and/or stupidity, which are causing serious damage. Big difference,
and we invite Engle to prove otherwise.
Posted 10/3/03
© Fabian Pascal 2006 All Rights Reserved