Thursday, June 10, 2021

Entities, Properties and Codd's Sleight of Hand

A recent LinkedIn post tried to illustrate in graphic form some old bunk that had been debunked to death decades ago: the purported inferiority of relational relative to directed graph database management. The plethora of "Great viewpoint!", "great representation", "love this!", "like", "Nice!", "Very clever!" reactions it triggered confirms the lack of foundation knowledge and of familiarity with the history of the field in the industry, without which there cannot be any progress. I will not waste any time debunking it. Instead, I will use a quote from a Date and Darwen (D&D) book, posted by a participant in the exchange, to bring up an important insight by David McGoveran.

“Overall, we believe the most appropriate design will emerge if careful consideration is given to the distinction between (a) declarative sentences in natural language, on the one hand, and (b) the vocabulary used in the construction of such sentences on the other. As we showed in Chapter 2 (but simplifying slightly here), it is unencapsulated tuples in relations that stand for those sentences, and it is encapsulated domain values in attributes in those tuples that stand for particular elements — typically nouns — in that vocabulary. To say it slightly differently (and to repeat what we said in Chapter 2, albeit in different words): Domains, or types, give us values that represent things we might wish to make statements about; relations give us ways of making those statements. Consider once again the EMP relvar of Design R. Suppose that relvar includes the tuple:


The existence of this tuple in the relvar means, by definition, that the database includes something that asserts that the following declarative sentence (statement) is true:

Employee E7, named Amy, is assigned to department D5 and earns a salary of $60,000."




Let's cast the example in a more accurate form. The statement in specialized natural language:

Employee identified by employee# (EMP#) with name (ENAME) is assigned to department identified by department# (DEPT#) and earns salary (SALARY).

formalizes symbolically as a predicate in FOPL. If the four values for the specific employee are substituted for the parenthesized terms, the statement reduces to the proposition (fact):

Employee identified by employee# E7, named Amy, is assigned to department identified by department# D5 and earns a salary of $60,000.

which, if asserted as true by an authorized user, is represented in the database by the tuple {E7,Amy,D5,60000} in relation EMP {EMP#,ENAME,DEPT#,SALARY}; the tuple has four attribute values drawn from domains.
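The step from predicate to proposition to tuple can be sketched in a few lines of Python. This is only an illustration, not a relational implementation: the names `Emp` and `emp_predicate` and the Python-safe field names standing in for EMP#, ENAME, DEPT# and SALARY are assumptions made for the sketch.

```python
from typing import NamedTuple


class Emp(NamedTuple):
    """Heading of EMP; field names are hypothetical stand-ins for EMP#, ENAME, DEPT#, SALARY."""
    emp_no: str
    ename: str
    dept_no: str
    salary: int


def emp_predicate(t: Emp) -> str:
    """Substitute one tuple's values into the predicate, yielding a proposition."""
    return (
        f"Employee identified by employee# {t.emp_no}, named {t.ename}, "
        f"is assigned to department identified by department# {t.dept_no} "
        f"and earns a salary of ${t.salary:,}."
    )


# The relation body is modeled as a set of tuples; each tuple stands for
# one proposition that has been asserted as true.
emp = {Emp("E7", "Amy", "D5", 60000)}

for t in emp:
    print(emp_predicate(t))
```

Using a set rather than a list mirrors the fact that a relation holds no duplicate tuples: asserting the same fact twice adds nothing.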

We model (structure) reality as objects with properties and relationships. As we explained, in conceptual modeling for database design:

  • Entities are primitive objects (e.g., employees), with individual properties:

- direct properties (e.g., salary); and possibly,
- indirect properties that are relationships among direct properties (say, between department assignment and salary);

  • Entity groups and the multigroup they form are compound objects (e.g., employee and department groups) where:

- within-group relationships among their entity members (e.g., uniqueness) are collective group properties; and,
- between-groups relationships (e.g., employees and department groups) are collective multigroup properties.

The RDM is simple set theory expressible in first-order predicate logic (SST/FOPL) adapted for and applied to database management and is used to formalize conceptual models as logical models for database representation. A relational domain represents a property, and an attribute defined on it represents a property of entities of a specific type (i.e., a property in a group context). Thus, if the domain MONEY represents the property $Amount, the attribute SALARY represents that property in the context of the employees group.
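The domain/attribute distinction can be made concrete with a small Python sketch. It is a loose illustration only: the `Domain` and `Attribute` classes, the `describe` helper and the BUDGET attribute are hypothetical, and using Python classes here is a convenience of the sketch, not a claim that domains are programming data types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A domain represents a property independent of any entity group."""
    name: str
    represents: str


@dataclass(frozen=True)
class Attribute:
    """An attribute is a domain used in the context of one entity group."""
    name: str
    domain: Domain
    group: str


def describe(a: Attribute) -> str:
    return f"{a.name} represents {a.domain.represents} in the context of the {a.group} group"


MONEY = Domain("MONEY", "$Amount")
SALARY = Attribute("SALARY", MONEY, "employees")
# Hypothetical second attribute on the same domain, in a different group context.
BUDGET = Attribute("BUDGET", MONEY, "departments")

print(describe(SALARY))
print(describe(BUDGET))
```

The point of the second attribute is that one domain (one property) can back several attributes, each of which places that property in a different group context.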

D&D say "Domains, or types, give us values that represent things we might wish to make statements about; relations give us ways of making those statements" (we stick to relations rather than relvars and contend that domains are distinct from programming data types for reasons we explained elsewhere, but this is not important for the purposes of this discussion). "Things we might wish to make statements about" are entities and values are those of entity direct properties represented by attributes. Thus, the statements -- the propositions (facts) represented by tuples -- specify the property values that comprise the objects (entities).

In FOPL a predicate only captures relationships among sets of objects -- properties can only be expressed as predicates among sets of objects and so only indirectly. For example, the predicate "has salary X" (property) is not captured in FOPL directly as such, but rather as something like "the member of the set of employee objects in the universe that is also a member of the set of objects with salary X in the universe". Defining a relation is like constructing a Venn diagram -- for each tuple in the relation we start with a universe of entities, then carve out the intersection of all employees by some specific employee#, then by specific name, then by departmental assignment, then by salary, and so on. Each employee tuple defined by specific property values is the intersection of multiple sets of employees, each of which has one of those values. Visually, in an R-table both the rows and columns ought to represent sets of entities, but in Codd's RDM columns represent attributes -- properties -- (i.e., "has salary X") rather than "the set of entities with salary X". Since properties in FOPL are themselves predicates, it follows that the relation predicate is a predicate of predicates, requiring at least second-order logic (SOL), or some logic higher than FOPL. But the relational advantages for database practice -- soundness, declarativity, decidability, data independence -- would then be lost, which is why, having started originally with SOL, Codd had to switch to FOPL to avoid their loss.
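The Venn-diagram reading can be sketched with plain Python sets. Everything below is made up for illustration: the entity labels e1 to e4 and the membership of each subset are assumptions, chosen so that exactly one entity survives the intersection.

```python
# Hypothetical universe of entities, identified by opaque labels.
universe = {"e1", "e2", "e3", "e4"}

# Under the FOPL reading, each "property value" is a subset of the universe.
employees = {"e1", "e2", "e3"}
has_emp_no_E7 = {"e1"}
has_name_Amy = {"e1", "e3"}
in_dept_D5 = {"e1", "e2", "e3"}
has_salary_60000 = {"e1", "e2", "e4"}

subsets = [employees, has_emp_no_E7, has_name_Amy, in_dept_D5, has_salary_60000]

# The entity represented by the tuple {E7,Amy,D5,60000} is whatever remains
# after carving out the intersection of all these subsets.
entity = set.intersection(*subsets)

print(entity)
```

Note that nothing in this sketch says the entity "has" a salary; it is merely a member of the set of things with that salary, which is the indirect expression of properties described above.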

Codd glossed over this sleight of hand -- allowing domains and attributes that are incorrectly interpreted as sets of property values instead of entities -- probably not noticing the implications. Consequently, while in FOPL an entity represented by a tuple is an intersection of different subsets of the universe of entities with pre-defined values, users intuitively think of it as having property values, which is one important reason they often get FOPL expressions of relations and tuples wrong.

“As we explained elsewhere,  conceptual modeling to date assumes the Ontological Commitment to Objects (OCO) where the world is comprised of a fixed number of immutable objects. This imposes some limitations: in database terms we cannot have dynamic, ever evolving schemas, and are limited to either simple schema extensions or periodic redesign (again, better than non-RDM alternatives, especially pre-RDM). This problem cannot be addressed by data independence alone -- it requires an Ontological Commitment to Properties (OCP). Neither any existing predicate logic, nor any existing set theory of which I am aware (and I've examined lots of both) can express OCP concepts. To take advantage of OCP requires modifying them, so that properties can be treated as primitive objects, and relationships among properties (instead of objects) can be expressed and deduced. I labor in the belief that OCP and building on and learning from Codd's RDM will lead to the creation of a powerful formal language, capable of capturing much more meaning, and extending the relational algebra to handle extremely complex and dynamic models uniformly and consistently.”
                                                                         --David McGoveran

To be used for formalization of conceptual models that assume the OCP, the RDM (and SST/FOPL) must be revised and extended to treat the tuples of a relation as representing the instantiations of a predicate that has arguments taking property values. Much of McGoveran's published work to date has been about refining Codd's RDM based on OCO-FOPL. His work in progress -- yet unpublished -- will introduce a version of the RDM grounded in OCP-extended FOPL, while also incorporating his semantic theory of types -- a rather tall order. The posts here are intended to convey only some flavor thereof.
