Sunday, July 29, 2012

Meaning and Database Management

DT: The author of the following quote resists the idea that a primary function of an RDBMS is indeed to manage the meaning of the data, through constraints.
If we step back and look at what RDBMS is, we’ll no doubt be able to conclude that, as its name suggests (i.e. Relational Database Management System), it is a system that specializes in managing the data in a relational fashion. Nothing more. Folks, it’s important to keep in mind that it manages the data, not the MEANING of the data! And if you really need a parallel, RDBMS is much more akin to a word processor than to an operating system. ...Why should we tolerate RDBMS opinions on our data? We’re the masters, RDBMS is the servant, it should shut up and serve. End of discussion.

Hugh Darwen: Couldn't agree more.  The strength of the relational model lies in its total abandonment of meaning. The word "meaning" is bandied about sometimes in ways that make me wonder if some people don't know what it means, whether they spell it that way or the posh way, "semantics".

The meaning of relation SP as understood by the user is:
Supplier S# supplies part P# in a quantity of QTY
We can express the relevant constraints (a key and two foreign keys) informally, thus:
If Supplier S# supplies part P# in a quantity of QTY, then there exists a supplier with supplier number S# and there exists a part with part number P# and there does not exist a quantity QTY2 such that QTY<>QTY2 and S# supplies P# in a quantity of QTY2.
From the long conditional sentence I have just written, we can derive the following extended predicate for SP (as Chris does):
Supplier S# supplies part P# in a quantity of QTY and there exists a supplier with supplier number S# and there exists a part with part number P# and there does not exist a quantity QTY2 such that QTY<>QTY2 and S# supplies P# in a quantity of QTY2.
But what is gained by writing that extended predicate?  The essential meaning is in the first conjunct. The rest is just consequential. The DBMS kind-of understands the consequential stuff but does not understand the essential bit (thank goodness). And, yes, the verb is the important bit. I accept that the DBMS understands that S# must look like a supplier number, P# a part number and QTY a quantity, but those are all consequences too.

BTW, I detest the term "semantic constraint" that some people use, imagining that some constraints are to do with meaning and others are not. They are all just constraints.


UPDATE (7/29/12)

What Hugh refers to as constraints I prefer to call business rules, because they are expressed in real-world terms (suppliers, parts and so on). Integrity constraints would be their representatives in the database and are expressed in database terms (tables, columns, rows). Incidentally, the table constraint Hugh specified subsumes not only the PK and FK constraints, but also the attribute constraints.
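To make the mapping from business rules to integrity constraints concrete, here is a minimal sketch using Python's standard sqlite3 module. SQL tables are only an approximation of R-tables, and the names S, P, SP, sno, pno, qty, make_db and try_insert are my own illustrative choices, not anything from Hugh's message. The qty >= 0 check is a hypothetical attribute constraint added purely for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with hypothetical S, P and SP tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE S (sno TEXT PRIMARY KEY);
        CREATE TABLE P (pno TEXT PRIMARY KEY);
        CREATE TABLE SP (
            sno TEXT NOT NULL REFERENCES S(sno),   -- there exists such a supplier
            pno TEXT NOT NULL REFERENCES P(pno),   -- there exists such a part
            qty INTEGER NOT NULL CHECK (qty >= 0), -- assumed attribute constraint
            PRIMARY KEY (sno, pno)                 -- at most one qty per (sno, pno)
        );
        INSERT INTO S VALUES ('S1');
        INSERT INTO P VALUES ('P1');
        INSERT INTO P VALUES ('P2');
    """)
    return conn


def try_insert(conn, sno, pno, qty):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO SP VALUES (?, ?, ?)", (sno, pno, qty))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(try_insert(conn, "S1", "P1", 300))  # consistent with all the rules
    print(try_insert(conn, "S1", "P1", 400))  # second quantity for the same pair
    print(try_insert(conn, "S9", "P1", 100))  # no such supplier
    print(try_insert(conn, "S1", "P2", -5))   # fails the assumed quantity check
```

Nothing in the declarations tells the DBMS what "supplies" means. It only checks that each proposed row is consistent with the declared constraints, which is all the consequential part of the predicate asks of it.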

It is indeed the case that the relational model achieves its versatility for database management by being formal and independent of meaning. Thus, an RDBMS does not care what R-tables mean; it just manipulates them mathematically as sets and produces R-tables as results. The meaning of the tables (our interpretation) consists of the business rules and propositions which, verbs included, we understand, but the DBMS does not.
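As a rough illustration of manipulation that is indifferent to meaning, the sketch below treats a relation as a plain Python set of rows and implements natural join and projection purely in terms of attribute names and values. The helper names row, natural_join and project, and the city attribute, are hypothetical and chosen only for this example. It is a toy, not a description of how any actual DBMS is implemented.

```python
def row(**attrs):
    """Build one row as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())


def natural_join(r, s):
    """Combine every pair of rows that agree on all attributes they share."""
    result = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            common = dt.keys() & du.keys()
            if all(dt[a] == du[a] for a in common):
                result.add(frozenset({**dt, **du}.items()))
    return result


def project(r, attrs):
    """Keep only the named attributes; duplicate rows collapse because r is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


if __name__ == "__main__":
    sp = {row(sno="S1", pno="P1", qty=300), row(sno="S2", pno="P1", qty=200)}
    s = {row(sno="S1", city="London")}
    for t in natural_join(sp, s):
        print(sorted(t))
    print(project(sp, {"pno"}))
```

The functions would behave identically if sno, pno and qty were renamed a, b and c. The operations take sets in and give sets out, and whatever the rows say about suppliers and parts is supplied entirely by the reader.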

The quote should be understood in the context of the author's argument that "integrity is a myth" and, therefore, constraints mean nothing and are unnecessary. So while I agree that the DBMS does not understand the integrity constraints in the sense that we do, we refer to them as the best approximation to the user-understood meaning of the database that the DBMS can have. By that we mean nothing more than algorithmic checking.

The point here is that constraints are critical and cannot be dismissed as a myth just because the DBMS cannot understand the user interpretation. It "understands" enough to guarantee the maximum possible consistency with the business rules.

I agree that this must be expressed carefully, as I tried to do, so as not to mislead. That's why I usually put "understands" in quotes.

Glossing over the fact that integrity constraints represent business rules contributes to poor understanding of and confusion about the RM, and to the notion that it's "just theory and not practical". I've argued about this with Chris too, and about the tendency to stay away from business modeling concepts such as properties, entities, classes and so on, because they are informal, imprecise, loaded, etc. They are all that, but precisely because of that it is important to make very clear the link between the formal database representation and the informal world it represents. In the absence of that link we get arguments like the quote author's that integrity is a myth.

Codd applied theory to the real world. The usefulness of the relational model lies precisely in its provision of formal constructs that have an informal interpretation. My forthcoming new version of the Business Modeling for Database Design paper makes a point of mapping every database construct to a corresponding real-world counterpart.

(Originally posted on 12/10/05)
