Wednesday, June 3, 2026

MEANING: IT’S NOT IN THE NAMES!



Note: In MEANING AND THE DATA MODELI mentioned a recent LinkedIn exchange that prompted that post. It also reminded me of an older post, which had triggered two exchanges: one @DBDebunk, and another @https://news.ycombinator.com/item?id=12437389&goto=news in which I participated at the time. I decided to update the older post and I strongly recommend the  YCombinator exchange, particularly comments by catnaroek.

  "Can you have 2 tables, VIEWS and DOWNLOADS, with identical structure in a good DB schema (item_id, user_id, time). Some of the records will be identical but their meaning will be different depending on which table they are in. The "views" table is updated any time a user views an item for the first time. The "downloads" table is updated any time a user downloads an item for the first time. Both of the tables can exist without the other."

---------------------------------------------------------------------

SUPPORT THIS SITE
The content here is not available anywhere else,  except in regurgitations and hallucinations of  LLMs, potentially mixed with garbage.  If you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing papers, donating, or contact me for online seminars/consulting.

 

USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the SEARCH  page. For detailed instructions on how to understand and use the labels in conjunction with that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
 

SOCIAL MEDIA
You can follow me @LinkedIn.

          ---------------------------------------------------------------------

In MEANING AND THE DATA MODEL we explained that conceptual models (CM) consisting of business rules (BRs) are the source of meaning of data in databases. Logical database design (LDD) assigns the meaning of terms in CMs to non-logical symbols of a formal logical theory (mis-labeled data model) to produce logical models (LM) of the theory, each an application of the theory to the reality modeled by a CM. If the theory is RDM, the symbols stand for sets, which acquire an interpretation--the meaning of terms in a CM.

Database relations (not tables, the bodies of which depict only the extension of relations) are sets symbolized in the theory assigned the meaning of groups of entities of a single type in a CM. In our example the entities are operations of two types—views and downloads—hence two groups and, therefore two relations;

VIEWS (USER_ID,ITEM_ID,TIME)

DOWNLOADS (USER_ID,ITEM_ID,TIME)

Straightforward, right?

First, the database relations are identically structured, but they have different meaning. Meaning is assigned in the CM, which is expressed at the logical level by semantic constraintspredicates in FOPL expressed in a FOPL-based data sublanguage. Can you suggest two distinct relation predicates/constraints that would express the different meanings of the two relations?

In fact, the only difference between the two relations is in names of the relations and attributes, which are neither predicates, nor constraints. The following was a to response my original post.

 ”The [relation] name itself is semantic content. The argument presented here treats the table name as something other than semantic content, which is incorrect. If it is necessary to encode table names and relationships into the database explicitly, then one should do so.”

Of course names have semantic content—the question is it accessible to a DBMS? Responders could not conceive of the negative—the distinct names "are there", how could they possibly be inaccessible?

It is this stance that prompted the title of this post. This requires understanding what meaning means in a database context. Say the tuple {A0035,file150,14:30} is candidate for insertion into the database. The way all commercial DBMSs work is that a user authorized to submit tuples for insertion of tuples and specifies (possibly via an application) the appropriate relation by name. That is because users do have access to the semantic content of names--they understand what the names mean. Could the DBMS decide? Only if it had access to the semantic content of the names--such that it could infer from the name where a tuple belongs. But it can’t—it has access to the names, but not to what they mean in this context! Names are just meaningless symbols to DBMSs.

When there is no predicate/constraint difference that the DBMS can rely on indicates poor database design, possibly due to poor conceptual modeling.

Ask yourself why are there two identifiers in the design: the entities (operations) are not unitary, but associative—they associate user with file entities. A group of associative entities is represented by an associative relation—TIME and TYPE (V/D) are properties of associations.

OPERATIONS (OP_ID, USER_ID, ITEM_ID, TIME, TYPE)

which merits a simple surrogate PK representing a Name. 

While users understand names semantically, a DBMS cannot--names are only referents to predicates/constraints. A DBMS does not have access to the semantic content of attribute names either--they reside in the "heading", which is a table, not relation feature.

This is one of the reasons why the core Information Principle (IP)--Codd’s Rule 0--mandates that “all information in a relational database be represented explicitly, and in exactly one way: as attribute values in relations” (i.e., in the extension) without reference to meaning for inferencing purposes. This too evoked vehement protestations.

In the original post I argued that the two group design violates the IP, rejected by the respondents:

 “The entire article and its claims are predicated on a very specific interpretation of the Information Principle (that what Date calls an external relvar predicate is itself information too and must therefore be accessible to the DBMS, with all that that entails). With no agreement on that interpretation, surely there's no point in debating/discussing.
The "external relvar predicate" way of working is known to work well. It was the way of working even in the pre-database era : records in the file named X represent meaning Y. What the alternative looks & feels like, I think nobody knows that yet, including DMG himself.”

 In RDM there is a formal relation predicate (RP) which, when expressed in some specific FOPL-based data sublanguage, is a conjunct of semantic constraints that a RDBMS enforces for consistency of the relation extension (data) with its intension (meaning). What Date calls “external predicate” is the informal user understanding of what a relation means, its human interpretation--the semantics in the CM. In general, Date eschews the conceptual level, which is too informal for his taste, and calls what are essentially business rules, with which he is more comfortable, “external predicates” to make them sound formal. Incidentally, McGoveran (DMG) documented the frequent changes over time in Date's definitions of kinds of predicates, probably for this very reason. Names are only shorthand for (referent to) the formal RPs. 

I do not know what “works well” means here, but I do know that “relvars” are inconsistent with set theory and FOPL—the theoretical foundations of RDM.

The external predicate differs from the internal one in that it contains the part that cannot be DBMS-understood, which is why it is called "external", as opposed to the internal one, which is entirely DBMS-understood, which is why it is called "internal.”

Bingo! The external predicate is the “user semantic understanding” of the relation, its interpretation (assigned meaning), the conjunct of BRs in the CM that jointly describe the group of entities that a relation represents at the logical level. The DBMS “understands” algorithmically constraints, that are formal expressions of BRs in a FOPL-based data sublanguage. It has access to the names, not to their semantic content. It does not really understand anything in a semantic/human sense, even those parts that Date thinks it does.  

 


No comments:

Post a Comment

View My Stats