MEANING AND THE DATA MODEL

Recently I participated in a LinkedIn exchange initiated by the following statement:

“First-Order Predicate Logic is quietly breaking enterprise semantics. FOPL assumes meaning can be expressed as predicates applied to entities:

subject → predicate → object

That works beautifully in theory. It fails in practice. Because the predicate is where ambiguity hides. So you can build vast semantic graphs where:
– entities carry multiple meanings
– predicates encode multiple dimensions
– relationships look precise but aren’t disjoint

It scales connections. Not clarity.” –Robert Vane


Behind Vane’s statement is an old criticism of the relational data model (RDM), namely that it is “semantically weak” (it does not capture enough meaning) and therefore ambiguous. It reminded me of an old online article:

“If we step back and look at what RDBMS is, we’ll no doubt be able to conclude that, as its name suggests (i.e., Relational Database Management System), it is a system that specializes in managing the data in a relational fashion. Nothing more. Folks, it’s important to keep in mind that it manages the data, not the MEANING of the data! ... So who is then responsible for managing the meaning of the words? It’s the author, who else? Why should we tolerate RDBMS opinions on our data? We’re the masters, RDBMS is the servant, it should shut up and serve. End of discussion.” --Alex Bunardzik, Should Database Manage The Meaning?

Aside from the silliness (an RDBMS has no “opinions on our data”), the two arguments contradict each other: on the one hand, RDM does not capture enough meaning; on the other, even a little meaning is too much. Settling the contradiction requires foundation knowledge, which is scarce in the industry.

In database management, the source of meaning (semantics) is conceptual models (CM) of information: collections of business rules (BR), statements in natural language that describe properties by specifying their permissible values, and object types (entities, groups, and multigroups) by specifying their defining properties. CMs are informal, however, and not “computable”, due in no small part to the ambiguity and vagueness of the natural language in which BRs express meaning.

Being informal, CMs are inaccessible to DBMSs. By assigning the meaning of informal terms in CMs to non-logical symbols of a formal logical theory, logical database design (LDD) produces formal logical models (LM) of the theory, each an application of the theory to the information modeled by CMs. The symbols of the theory acquire specific interpretations (the meaning in the CMs) while preserving their mathematical properties, and thus become accessible to database representation and to DBMS algebraic manipulation for inferencing purposes: inferring new information that is a logical implication of the information recorded in the database.
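To illustrate what inferring new information by algebraic manipulation amounts to, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the relations EMPLOYEES and DEPARTMENTS, their attributes, and the data are hypothetical, and Python merely stands in for a data sublanguage.

```python
# Hypothetical relations, each a set of tuples.
# EMPLOYEES(emp, dept) is interpreted as "employee emp is assigned to department dept".
EMPLOYEES = {("Ann", "D1"), ("Bob", "D2")}
# DEPARTMENTS(dept, city) is interpreted as "department dept is located in city".
DEPARTMENTS = {("D1", "Paris"), ("D2", "Rome")}


def join_project(employees, departments):
    """Join the two relations on dept, then project on (emp, city)."""
    return {
        (emp, city)
        for (emp, e_dept) in employees
        for (d_dept, city) in departments
        if e_dept == d_dept
    }


# Derived facts, "employee emp works in city": implied by the recorded
# facts, but not themselves recorded in the database.
WORKS_IN = join_project(EMPLOYEES, DEPARTMENTS)
```

The operation itself knows nothing about employees or cities. It manipulates sets, and the result is meaningful only under the interpretation assigned to those sets.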

In 1969-70 Codd introduced such a formal logical data theory: a combination of mathematical relation theory and simple set theory, expressible in first-order predicate logic (MRT/SST/FOPL), that he adapted and applied to database management. Its symbols stand for sets. In 1980 he formally defined such a theory as having three components: structure, manipulation, and integrity. Much of the semantic weakness argument was induced by Codd’s (1) labeling the RDM component integrity instead of what it really is, semantics, and (2) failure to integrate it into manipulation (relational algebra).

Predicates are formal expressions in FOPL of BRs in CMs; BRs are the source of meaning. The semantic component of RDM consists of constraints, which are predicates expressed in a FOPL-based data sublanguage (no DBMS “speaks FOPL”). Constraints constrain the symbolized sets of the theory for consistency with their interpretations in LMs (i.e., semantic consistency)—the meaning assigned them by the corresponding CMs. Which is why they are semantic constraints.
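A minimal sketch of a business rule formalized as a predicate may help here. Everything in it is an assumption made for illustration: the rule ("every employee earns a positive salary of at most 100,000 and is assigned to an existing department"), the names, and the use of Python in place of a FOPL-based data sublanguage.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    """One tuple of a hypothetical EMPLOYEES relation."""
    emp: str
    dept: str
    salary: int


# Hypothetical set of existing department identifiers.
DEPT_IDS = {"D1", "D2"}


def salary_in_range(employees):
    # For all e in EMPLOYEES: 0 < e.salary <= 100000
    return all(0 < e.salary <= 100_000 for e in employees)


def dept_exists(employees, dept_ids):
    # For all e in EMPLOYEES there exists d in DEPARTMENTS: e.dept = d
    return all(any(e.dept == d for d in dept_ids) for e in employees)


def is_consistent(employees, dept_ids):
    """True if the set satisfies both constraints, i.e. is consistent
    with the meaning the (hypothetical) business rule assigns to it."""
    return salary_in_range(employees) and dept_exists(employees, dept_ids)
```

A set such as {Employee("Bob", "D9", 50000)} is perfectly valid as an abstract set of tuples. It is rejected only because it is inconsistent with the interpretation, which is what makes the constraint semantic.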

Note: Not all constraints are expressible declaratively as predicates in FOPL; those that are not must be implemented in a computationally complete programming language (CCPL) and invoked from the data sublanguage.
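One well-known case, offered here as my own illustration, is a constraint that depends on transitive closure, which first-order logic cannot express in general. The sketch below assumes a hypothetical relation of (employee, manager) pairs and the rule "nobody is, directly or indirectly, their own manager"; Python plays the role of the CCPL.

```python
def has_cycle(reports_to):
    """reports_to: set of (employee, manager) pairs.

    Returns True if somebody is, directly or transitively, their own manager.
    """
    closure = set(reports_to)
    while True:
        # Compose the closure with itself to find longer management chains.
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        } - closure
        if not new_pairs:
            break  # fixpoint reached: nothing more to add
        closure |= new_pairs
    return any(a == b for (a, b) in closure)
```

The loop iterates to a fixpoint whose number of steps depends on the data, which is why such a rule calls for a procedure rather than a single first-order predicate.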

In 1979 Codd tried to address the “semantic weakness” criticism of RDM by extending it semantically and, in doing so, he pointed out some, well, weaknesses of the semantic weakness argument.

“During the last few years numerous investigations have been aimed at capturing (in a reasonably formal way) more of the meaning of the data, while preserving independence of implementation. This activity is sometimes called semantic data modeling. Actually, the task of capturing the meaning of data is a never-ending one. So the label “semantic” must not be interpreted in any absolute sense. Moreover, database models developed earlier (and sometimes attacked as “syntactic”) were not devoid of semantic features … there is a strong emphasis on structural aspects, sometimes to the detriment of manipulative aspects. Structure without corresponding operators or inferencing techniques is rather like anatomy without physiology ... It should be remembered, however, that the [semantic] extensions in RM/T are primarily intended for the minority consisting of database designers and sophisticated users; most users will probably prefer the simplicity of the basic relational model.” --E. F. Codd, RM/T: Extending the Database Relational Model to Capture More Meaning

Emphasis ours.

· Users understand semantically, DBMSs “understand” algorithmically (which is implicit, if poorly expressed, in Bunardzik’s argument). But not all semantics can be captured in a “reasonably formal way”, which is a necessity for database representation and algebraic manipulation.

· By its nature, semantics produces structural elements which in the theory may be devoid of manipulation, limiting inferencing and thus usefulness (“structure without corresponding operators or inferencing techniques is rather like anatomy without physiology”).

· There is no scientific (theoretical) basis for how much meaning is enough to capture (“the task of capturing the meaning of data is a never-ending one. So the label “semantic” must not be interpreted in any absolute sense”). In other words, it is a pragmatic choice.

· Extending a theory semantically makes it more complex and less versatile (general).

This compelled some to take Bunardzik’s argument to its absurd extreme:

“The strength of the relational model lies in its total abandonment of meaning. BTW, I detest the term "semantic constraint" that some people use, imagining that some constraints are to do with meaning and others are not. They are all just constraints. The word "meaning" is bandied about sometimes in ways that make me wonder if some people don't know what it means, whether they spell it that way or the posh way, "semantics".” --H. Darwen

But as we have seen, constraints constrain the otherwise abstract sets of the theory for semantic consistency.

“Were logic (and RDM) to totally abandon meaning, it would be unusable! Logic is only the formal language "essential for argument and persuasion" (i.e., deduction). It merely “lacks reference to meaning” and thereby "achieves versatility", it does not abandon it! This applies only to the formal deductive part of logic, not to its equally important methods of formal interpretation (semantics)—which are essential to its application—the RDM is abstract formal theory applied to database management.” --David McGoveran

In the LinkedIn exchange I said that the current ontologic fad reminds me of the OO fad, in which the industry was “objectifying” itself to death, just like it is “ontologizing” itself to death now.

The takeaway should be that how much semantics to capture is not a matter of maximization (Vane), or abandonment (Bunardzik, Darwen), but of pragmatic optimization of tradeoffs (semantics vs. computability, simplicity, and generality).

"We argue that systems of operations on data are most effective when they are formalisms, in which semantic considerations are unimportant until the formalism is applied to some specific application. In this way, database processing can join the ranks of successful mathematical abstractions.

Differential equations, for instance, can be applied to situations ranging from orbit calculations to the quantum mechanics of the atom. The semantics of each application is unique to that application, but the formalism of differential equations is common. The power of the formalism lies in its abstraction from issues of meaning." --T.H. Merrett, Extending the Relational Algebra to Capture Less Meaning

Abstraction for deductive purposes; not abandonment for interpretation purposes.

Friday, May 22, 2026
