Note: In MEANING AND THE DATA MODEL I mentioned a recent LinkedIn exchange that prompted that post. It also reminded me of an older post, which had triggered two exchanges: one @DBDebunk, and another @https://news.ycombinator.com/item?id=12437389&goto=news, in which I participated at the time. I decided to update the post, including the former exchange. I strongly recommend the latter exchange, particularly comments by catnaroek.
"Can you have 2 tables, VIEWS and DOWNLOADS, with identical structure in a good DB schema (item_id, user_id, time). Some of the records will be identical but their meaning will be different depending on which table they are in. The "views" table is updated any time a user views an item for the first time. The "downloads" table is updated any time a user downloads an item for the first time. Both of the tables can exist without the other."
-----------------------------------------------------------------------------------------------------------------------------------
SUPPORT
THIS SITE
The content here is not available anywhere else, except in regurgitations and hallucinations of LLMs, potentially mixed with garbage. If you deem it useful, particularly if you
are a regular reader, please help upkeep it by purchasing papers, donating, or contact me for online
seminars/consulting.
USING
THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or
acronyms of the terms listed on the SEARCH page. For detailed
instructions on how to understand and use the labels in conjunction with that
page, see the ABOUT page.
The 2017 and 2016 posts, including earlier posts rewritten in 2017 were
relabeled accordingly. As other older posts are rewritten, they will also be
relabeled. For all other older posts use Blogger search.
SOCIAL
MEDIA
You can follow me @LinkedIn.
-----------------------------------------------------------------------------------------------------------------------------------
In MEANING AND THE DATA MODEL we explained that conceptual models (CM) consisting of business rules (BRs) are the source of meaning of data in databases. Logical database design (LDD) assigns the meaning of terms in CMs to non-logical symbols of a formal logical theory (mislabeled data model) to produce logical models (LM) of the theory, each an application of the theory to the reality modeled by a CM. If the theory is RDM, the symbols stand for sets, which acquire an interpretation.
Database relations (not tables, the bodies of which depict only the extension of relations) are sets symbolized in the theory assigned the meaning of groups of entities of a single type in a CM. In our example the entities are operations of two types—views and downloads—hence two groups and, therefore two relations;
VIEWS (USER_ID, ITEM_ID, TIME)
DOWNLOADS (USER_ID,ITEM_ID, TIME)
Straightforward, right?
First, the database relations are identically structured, but they have different meaning. We have seen that meaning is assigned in the conceptual model, which is expressed at the logical level by semantic constraints—predicates in FOPL expressed in a FOPL-based data sublanguage. Can you suggest two distinct relation predicates/constraints that would express the different meanings of the two relations?
In fact, the only difference between the two relations is in their names, which are neither predicates, nor constraints. It is this point that triggered to the original post.
”The [relation] name itself is semantic content. The argument presented here treats the table name as something other than semantic content, which is incorrect. If it is necessary to encode table names and relationships into the database explicitly, then one should do so.”
Of course names have semantic content—the question is it accessible to a DBMS? Responders could not conceive of the negative—the distinct names are “there”, how could it possibly not be accessible?
It is this stance that prompted the title of this post. This requires understanding what meaning means. Say the tuple {A0035,file150,14:30} is candidate for insertion into the database. The way all commercial DBMSs work is that a user authorized to submit tuples knows the type (view or download) of tuples and specifies (possibly via an application) the name of the relation for each. That is because users do have access to the semantic content of names—the DBMS decides nothing. But could the DBMS decide the target relation? To say that it has access to the semantic content of the names means that it could infer from them where the tuple belongs, just like humans do. But it can’t—it has access to the names, but not to what they mean in this context! Names are just meaningless variables (X,Y,Z) to DBMSs.
Meaning is in the BRs in a conceptual model, expressed formally in LMs by RPs/constraints. That there is no predicate/constraint difference that the DBMS can rely on indicates poor database design, possibly due to poor conceptual modeling.
Ask yourself why are there two IDs in the design: the entities (operations) are not unitary, but associative—associate user with file entities. A group of associative entities is represented by an associative relation—time is a property of associations, and so is the type—(V/D):
OPERATIONS (OP_ID, USER_ID, ITEM_ID, TIME, TYPE)
which merits a simple PK representing a Name. Note that in this design TYPE is a shared property (entities differ in its values). Groups types differ in properties, not the values of a property.
While users understand semantically and can infer the proper relation from its name, a DBMS can only decide based on either on intension (RP), or extension (data). Names are only referents to predicates/constraints.
Note: A database relation consists of an extension (the body of a table that depicts it) and an intension (constraints). A heading (with attribute names) is a table, not relation feature.
This is one of the reasons why RDM’s Information Principle (IP), Codd’s Rule 0, mandates that “all information in a relational database be represented explicitly, and in exactly one way: as attribute values in relations” (i.e., as part of the extension, without reference to meaning). This too evoked vehement protestations.
In the original post I argued that the two group design violates
“The entire article and its claims are predicated on a very specific
interpretation of the Information Principle (that what Date calls an external
relvar predicate is itself information too and must therefore be accessible to
the DBMS, with all that that entails). With no agreement on that
interpretation, surely there's no point in debating/discussing.
The "external relvar predicate" way of working is known to work well.
It was the way of working even in the pre-database era : records in the file
named X represent meaning Y. What the alternative looks & feels like, I
think nobody knows that yet, including DMG himself.”
In RDM there is only a formal relation predicate (RP) which, when expressed in some specific FOPL-based data sublanguage, is a conjunct of semantic constraints that a RDBMS enforces for consistency of the relation extension (data) with its intension (meaning). What Date calls “external predicate” is the informal user understanding of what a relation means, its human interpretation--usually the semantics of the conceptual model. In general, Date eschews the conceptual level which is too informal for his taste, and calls “external predicates” what are essentially business rules, with which he is more comfortable. Incidentally, McGoveran (DMG) documented the frequent changes over time in Date's definitions of kinds of predicates, probably for this very reason. Names are only shorthand for (referent to) the formal relation predicates. The attribute names are in the heading, which is a table, not a relation feature.
I do not know what “works well” means here, but I do know that “relvars” are inconsistent with set theory and FOPL—the theoretical foundations of RDM.
”The external predicate differs from the internal one in that it contains the part that cannot be DBMS-understood, which is why it is called "external", as opposed to the internal one, which is entirely DBMS-understood, which is why it is called "internal.”
Bingo! The external predicate is the “user semantic understanding” of the relation, its interpretation (assigned meaning), which is very close to the conjunct of BRs in the CM that jointly describe the group of entities at the conceptual level that a relation represents at the logical level. The DBMS “understands” algorithmically constraints, that are expressions of RPs—formal expressions in FOPL of BRs—in a FOPL-based data sublanguage. It has access to the names, not to their semantic content. It does not really understand anything in a semantic sense, even those parts that Date thinks it does.
No comments:
Post a Comment