MEANING: THE DESIRABLE AND THE POSSIBLE
by Fabian Pascal

 

 

It’s always the intelligent people who are left wing.

--Alan Bennett

 

I object to being labeled right wing.

--Chris Date

 

 

 

In DATA MODELING, LEFT AND RIGHT, Phillip Engle focuses on what may well be the issue in data management: capturing the meaning of modeled reality in computerized systems. He frames it in terms of two modeling approaches:

 

Left-wing: “Data modeling as it is practiced by AI experts, KM experts, object-oriented data analysts, and “semantic web” ontology advocates involves the specification of deeply layered type hierarchies, including multiple inheritance for entities and (often) multiple values per-attribute per-entity … [the concern is] to capture logical rules (business rules) that are semantically dependent on the particular subject-area being studied, in addition to the logical/business rules that are independent of subject-matter.”

 

Right-wing: “… relational data modelers and other devotees of sorted, two-valued, first-order predicate logic hate hierarchies …”

 

Frankly, we find the application of the two labels to the subject matter at hand rather absurd, but will ignore that.

 

Let me, first, point out that the notion that relational proponents “hate hierarchies” is rather silly. Engle does not seem to differentiate between type hierarchies and the hierarchic data model, which are distinct.

 

·   With respect to the latter, where the real world is inherently hierarchic (e.g., organizational structures or bill-of-material part assemblies), the information about those structures should, of course, be captured by modeling. The question is which representation of hierarchic data yields the least complex and most flexible computerized inferences. As we stated elsewhere (see MultiValue Lacks Value, What First Normal Form Really Means, and What First Normal Form Means Not), proponents of hierarchic modeling focus on structure and mostly ignore its purpose: manipulation (and integrity).

 

·   The former (type hierarchies) are orthogonal to the data model.

 

Note: Chris Date points out that, interestingly, bill-of-materials part assemblies are very difficult to represent in IMS- or XML-style hierarchies because (a) they have a recursive structure and (b) they’re usually M:N (networks) anyway, not 1:N (true hierarchies). He likes type hierarchies, but objects to force-fitting data into IMS- or XML-style structures.

 

The fact is that hierarchic information can be represented relationally, which is significantly more advantageous from the perspective of data manipulation and integrity (see chapter 7 in PRACTICAL ISSUES IN DATABASE MANAGEMENT). Moreover, where the world is not inherently hierarchic—a majority of cases we would argue—imposing hierarchy makes no sense, yet a hierarchic approach does just that.
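
To make the point concrete, here is a minimal sketch of a bill-of-materials structure held in two SQL tables and queried recursively. It uses Python's standard sqlite3 module and an SQL DBMS only as a convenient approximation of a relational system. The table names (part, part_structure), the helper functions, and the sample parts are hypothetical, invented purely for illustration. Note that the bolt is a component of two different assemblies, which is exactly the M:N case that a strict 1:N hierarchy cannot hold without duplication.

```python
import sqlite3


def build_bom():
    """Create an in-memory database holding a small, made-up parts structure."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE part (part_id TEXT PRIMARY KEY);
        CREATE TABLE part_structure (
            assembly_id  TEXT NOT NULL REFERENCES part(part_id),
            component_id TEXT NOT NULL REFERENCES part(part_id),
            quantity     INTEGER NOT NULL CHECK (quantity > 0),
            PRIMARY KEY (assembly_id, component_id),
            CHECK (assembly_id <> component_id)
        );
    """)
    conn.executemany(
        "INSERT INTO part VALUES (?)",
        [(p,) for p in ("bike", "wheel", "frame", "spoke", "bolt")],
    )
    conn.executemany(
        "INSERT INTO part_structure VALUES (?, ?, ?)",
        [
            ("bike", "wheel", 2),
            ("bike", "frame", 1),
            ("wheel", "spoke", 32),
            ("wheel", "bolt", 4),   # bolt is used in the wheel ...
            ("frame", "bolt", 6),   # ... and in the frame (M:N)
        ],
    )
    conn.commit()
    return conn


def explode(conn, root):
    """Return every part contained in `root`, at any depth, without duplicates."""
    rows = conn.execute(
        """
        WITH RECURSIVE reachable(part_id) AS (
            SELECT component_id FROM part_structure WHERE assembly_id = ?
            UNION
            SELECT s.component_id
            FROM part_structure AS s
            JOIN reachable AS r ON s.assembly_id = r.part_id
        )
        SELECT part_id FROM reachable ORDER BY part_id
        """,
        (root,),
    ).fetchall()
    return [row[0] for row in rows]


def where_used(conn, component):
    """Return the assemblies that directly contain `component`."""
    rows = conn.execute(
        "SELECT assembly_id FROM part_structure "
        "WHERE component_id = ? ORDER BY assembly_id",
        (component,),
    ).fetchall()
    return [row[0] for row in rows]
```

The same table answers both the "explosion" question (what does a bike contain?) and the "where used" question (which assemblies contain a bolt?), and the key and CHECK constraints are enforced by the system rather than by application code. A tree-shaped structure would privilege one of the two directions.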

 

Engle claims that given the R-table

 

EMPLOYEES {EMPLOYEEID,NAME,ADDRESS,SALARY}

 

Date and Darwen would say that this relation is a predicate (i.e., a truth-valued function), which might be expressed as follows:

 

An EMPLOYEE has a unique EmployeeID, is called by a certain Name, lives at a certain Address, and makes a certain yearly Salary.

 

but that the relational model does not capture this; rather, it captures only

 

Each particular EMPLOYEE is associated with one-and-only one EmployeeID (drawn from a specified domain) which uniquely identifies that tuple within the relation. Furthermore, each particular EMPLOYEE is also associated with one-and-only-one Name (drawn from a specified domain), with one-and-only-one Address (drawn from a specified domain), and with one-and-only-one Salary (drawn from a specified domain).

 

and, thus:

 

What’s missing is all of the semantically particular verbs in the first formulation … From the point of view of relational database theory, the information represented by the verbs underlined above must be stored outside the relational system (perhaps in “system documentation” or just in “peoples’ heads”).

 

But the relation EMPLOYEES is not just the predicate in the first formulation! Rather, the relation’s heading corresponds to an internal predicate whose user interpretation is the external predicate to which that formulation corresponds. To understand the significance of the difference, here’s a quote from Date’s INTRODUCTION TO DATABASE SYSTEMS 8th Ed. (emphasis in the original):

 

The first and most significant point is that, while internal predicates are a formal construct, external predicates are an informal construct merely. Internal predicates are (loosely) what the data means to the system; external predicates, by contrast, are what the data means to the user. Of course, users have to understand the internal predicates as well as the external ones, but to repeat, the system has to understand—indeed, can only understand—the internal ones. In fact, we may say, loosely, that a given internal predicate is the system’s approximation to the corresponding external predicate.

 

Note very carefully the emphasis on the system’s ability to understand only the formal. Engle is, in fact, aware of this critical aspect of Date and Darwen’s position when he quotes them (emphasis mine):

 

In an ideal world ... the DBMS would know the [full] meaning of every relation, so that it could deal correctly with all possible updates. But, of course, that’s impossible. There’s no way it can know those meanings exactly. For example, there’s no way the DBMS can know what it means for a certain supplier to be “in” a certain city or to “have” a certain status; these concepts are outside the system – they’re understood by users, not by the DBMS. More precisely, they’re part of what logicians call the interpretation (of the [relvar] in question).

 

Otherwise put, no matter how desirable it is for computerized systems to fully understand user meaning (the interpretation expressed by external predicates), unless those predicates can be directly formalized, internal predicates, and thus predicate logic and the relational model, are the best we can currently do to guarantee consistent mechanized inferences. Logicians make this quite clear (emphasis added):

 

… mathematical or symbolic logic has two aspects. On the one hand it is logic—it is an analytical theory of the art of reasoning whose goal is to systematize and codify principles of valid reasoning. It has emerged from a study of the use of language in argument and persuasion and it is based on the identification and examination of those parts of language, which are essential for these purposes. It is formal in the sense that it lacks reference to meaning. Thereby, it achieves versatility: it may be used to judge the correctness of a chain of reasoning (in particular, a “mathematical proof”) solely on the basis of the form (and not the content) of the sequence of statements which make up the chain.

--Robert Stoll, SET THEORY AND LOGIC, Dover Publications, 1963

 

Yet Engle seems to ignore this when he states:

 

… first-order predicate logic … is “semantically complete” only with respect to the identity predicate, the truth-functional operators, and the quantifiers expressible within that system.  Consequently, the many additional business rules within a subject-area of interest that are dependent on the particular semantics of that subject-area (especially the verbs within that subject-area) are simply lost if that subject-area is modeled exclusively in relational terms.

 

and quotes:

 

“The fact is that the meanings of the predicate letters of [basic first-order] predicate logic vary from problem to problem: Unlike quantifiers, truth-functional operators, and the identity predicate, they do not have fixed meanings. Consequently [basic first-order] predicate logic provides no rules to account for the semantics of specific predicates – with the sole exception of the identity predicate. It is therefore insensitive to validity generated by their distinctive semantics.”

 

He takes Date to task for claiming that “… domains (or types) and relations are together both necessary and sufficient to represent absolutely any data whatsoever …” and concludes:

 

… it is clear that he rejects the idea that semantically dependent logic (e.g., verb-dependent logic) can (or should?) be formally represented within computer-based business systems. In effect, this leaves many “business rules” entirely outside the scope of formal business systems!

 

What is clear, however, is that domains and relations are necessary and sufficient for representing any data formally and Date’s position should be taken in this context.
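
A minimal sketch may help show where the line between the formal and the informal falls. It again uses Python's standard sqlite3 module, with an SQL table standing in (imperfectly) for the EMPLOYEES relation. The column types, the positive-salary rule, the helper names, and the sample rows are all assumptions made for illustration; none of them comes from Engle or from Date and Darwen. The external predicate is just a string for human readers. What the system can check is the internal predicate, approximated here by the declared types, the key, and the constraints.

```python
import sqlite3

# External predicate: informal, addressed to users. The DBMS never interprets it.
EXTERNAL_PREDICATE = (
    "The employee identified by EMPLOYEEID is called NAME, "
    "lives at ADDRESS, and makes SALARY per year."
)


def make_db():
    """Create the table; its declarations approximate the internal predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE employees (
            employeeid INTEGER PRIMARY KEY,
            name       TEXT    NOT NULL,
            address    TEXT    NOT NULL,
            salary     INTEGER NOT NULL CHECK (salary > 0)  -- hypothetical rule
        )
        """
    )
    return conn


def try_insert(conn, row):
    """Return True if the DBMS accepts the tuple, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employees VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

A tuple that violates the key or the salary constraint is rejected by the system. A tuple that is formally valid but factually wrong (say, an address the employee never lived at) is accepted, because that part of the meaning belongs to the interpretation, not to anything the system can check. A business rule that can be stated formally, such as the salary check above, sits inside the system as a declared constraint.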

 

Engle’s position derives from his declared optimism regarding the ability of nonrelational approaches to achieve fuller capturing of user meaning. But while I may agree with Engle on the desirability of such an objective, optimism is no substitute for scientific knowledge. I am unaware of any formal approach that currently does a better job than the relational model and the logic underlying it, and Engle does not offer any.

 

Neither do I share his optimism regarding the approaches he refers to: object orientation and many-valued logic have been around at least as long as the relational model, with little to show for it. We are on the public record with amply documented reasons for that (not just at DATABASE DEBUNKINGS, but also in our other writings, books, seminars, and so on), so there is no point in repeating those reasons here. It is incumbent on those who are optimistic about those approaches either to demonstrate that we are wrong about the fundamental flaws we document, or to show why those flaws are no good reason to drop the optimism.

 

We relational proponents do not “reject the idea that semantically dependent logic (e.g., verb-dependent logic) can (or should?) be formally represented within computer-based business systems”, as he claims. Rather, we are not prepared to give up a sound foundation until such time as the better one that Engle hopes for emerges. To the extent that this “leaves many “business rules” entirely outside the scope of formal business systems!”—and we are not convinced this is the case—well, that is the best science can currently do, and that is no reason to drop the scientific approach altogether. If there is anything that causes the IT industry to regress rather than progress, it is the flouting of sound foundations.

 

Engle accuses relational proponents in general, and us in particular, of “Stak[ing] out a position on the … far-right and demoniz[ing] the other side” (making the absurdity of the labels even more obvious):

 

For example, the often-excellent “Database Debunkings” web site (www.dbdebunk.com) maintained by Fabian Pascal and C.J. Date is marred by attacks on non-relational data modeling that border on ad hominem. This approach does not seem to me to be productive, since I see significant value on both the left and the right.

 

Let me correct Engle here. We do not attack serious research, and those with sufficient knowledge to undertake it, who withhold claims until they have persuasive results. We do and will take on, however, unfounded criticism of the relational model or absurd nonrelational claims based on ignorance and/or stupidity, which are causing serious damage. Big difference, and we invite Engle to prove otherwise.

 

 

Posted 10/3/03

© Fabian Pascal 2006 All Rights Reserved