Saturday, November 3, 2018

Understanding Conceptual vs. Data Modeling Part 4: Property-Entity Modeling

In Part 1 and Part 2 we explained that when the RDM (1969-70) and the E/RM (1976) were introduced, no clear distinction was made between an informal conceptual level as we now understand it, and a formal logical level. In 1980 Codd gave the first definition of a formal data model, and in the later 1980s the conceptual-logical-physical distinction of levels of representation emerged. If the definition is applied to the E/RM and the RDM, only the latter explicitly satisfies it at the logical level. In Part 3 we presented a typical case of conflation, common in the industry, of the conceptual and logical levels, and confusion of types of model (conceptual, logical, physical, and data).

While the E/RM can be used for conceptual modeling, its weaknesses as such have been thoroughly discussed elsewhere[1], and we will not repeat them here. As promised, we outline a new conceptual modeling approach that makes a different ontological commitment than all modeling to date, and requires RDM extensions for consistency with it.

Groups and Relationships As Properties

The E/RM divides the world into entities with properties (called attributes), and relationships among entities of distinct types, or of the same type playing distinct roles. There are sets of entities of the same type (sharing common properties), and relationship sets. To keep levels of representation distinct, we shall use the informal group instead of the formal set at the conceptual level.

During his work formalizing Codd's RDM, David McGoveran had the insight that (1) relationships exist not just among entities, but also among properties of entities, among all entity members of a group, and among groups, and (2) they are actually properties of entities and groups. Specifically, if we refer to the properties of entities as first order properties (1OP), then:
  • Relationships among 1OPs are second order properties (2OP) of entities;
  • Relationships among all entity members of groups are third order properties (3OP) of groups;
  • Relationships among groups (whether among their individual members, or collectively as groups) are fourth order properties (4OP) of a multigroup formed by related groups[2,3].

Thus, at the conceptual level the objects and properties are:

  • Entities with 1OPs and 2OPs;
  • Groups with 3OPs;
  • Multigroups with 4OPs.

Note: E/RM's relationships are the 4OPs arising from relationships among groups due to relationships among individual members of those groups. There are also 4OPs that arise from relationships among groups collectively.
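
The four orders of properties can be illustrated with a minimal sketch. It assumes a toy world of employees and departments; every name, value, and rule below is a hypothetical illustration chosen for this example, not part of McGoveran's formalism.

```python
# Hypothetical sample data: two groups of entities, each entity a bundle of 1OPs.
employees = [
    {"emp_id": 1, "salary": 5000, "bonus": 500, "dept": "D1"},
    {"emp_id": 2, "salary": 4000, "bonus": 0, "dept": "D2"},
]
departments = [
    {"dept": "D1", "budget": 6000},
    {"dept": "D2", "budget": 5000},
]

# 1OP: a property of a single entity.
def salary(entity):
    return entity["salary"]

# 2OP: a relationship among 1OPs of the same entity.
def bonus_within_salary(entity):
    return entity["bonus"] <= entity["salary"]

# 3OP: a relationship among all entity members of one group.
def ids_unique(group):
    ids = [entity["emp_id"] for entity in group]
    return len(ids) == len(set(ids))

# 4OP arising from relationships among individual members of the groups
# (the kind of relationship the E/RM recognizes).
def assigned_to_existing_dept(emps, depts):
    known = {d["dept"] for d in depts}
    return all(e["dept"] in known for e in emps)

# 4OP arising from a relationship among the groups collectively.
def payroll_within_budget(emps, depts):
    return sum(e["salary"] for e in emps) <= sum(d["budget"] for d in depts)
```

In this sketch each order is a check over a progressively larger object: one entity, one group, then a multigroup formed by the two related groups.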

Object Ontological Commitment

In E/RM, one starts with entities that are "identifiable things" -- directly observable objects -- which is another way of saying that E/R modeling makes an ontological commitment to objects (OCO).

“Under the OCO, objects are "first class" concepts. A property exists only in association with objects of some type, and a procedure is applied to each object to determine whether it has the property (i.e., if the property exists, or has a value). The object is concluded to have the property by virtue of the property being observed or measured in association with the object, specifically vis-à-vis the result of the procedure. The OCO philosophically and formally treats the concept of object as primary, and that of property as secondary, consequential, or derived from the primary concept of object.”[2]
Note that in E/RM under the OCO:
  • How objects are recognized, and distinguished from one another is not specified;
  • Properties are consequential to independently existing objects.

Property Ontological Commitment

But while we tend to break up the world intuitively into objects, we actually do not observe them directly, but infer them from observable properties:

“We see contrast, color, hue, etc.; we touch/push/pull to observe smoothness, roughness, sharpness, resistance, weight, mass, density, etc.; we hear pitch, volume, etc. -- there is no way these can sense "objectness" directly. Instead, when a co-occurrence of properties is persistent in time and space (i.e., doesn't appear and disappear erratically), it is human nature to give the collection a name. Once it is named, and the name has become commonplace, it is used as an object type -- an abstraction (i.e., shorthand) for the collection. In other words, a collection of properties is assigned object type status by being named, or at least "recognized" as having been previously observed. Those properties are then understood as the defining properties of objects of the type.”
                               --David McGoveran
For example, the collection of properties including job title, salary, departmental assignment, and so on, observed repeatedly to occur together, might be perceived as a type of objects named "employee", from which point on objects of that type -- specific employees -- are inferred whenever these properties are observed to co-occur.
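
As a rough sketch of this inference, an object type can be treated as a named collection of defining properties, with an object of the type inferred whenever all of those properties are observed together. The type definitions, property names, and the `infer_types` helper are assumptions made for illustration only.

```python
# Hypothetical object types: each is just a name given to a collection of
# properties that have been observed to co-occur.
OBJECT_TYPES = {
    "employee": frozenset({"job_title", "salary", "department"}),
    "department": frozenset({"department", "budget"}),
}

def infer_types(observed):
    """Return the names of all types whose defining properties co-occur
    in the observation (a mapping from property name to observed value)."""
    present = set(observed)
    return sorted(name for name, props in OBJECT_TYPES.items() if props <= present)

# Only properties are observed; the "employee" object is inferred from them.
observation = {"job_title": "analyst", "salary": 5000, "department": "D1"}
inferred = infer_types(observation)
```

Here `inferred` is `["employee"]`: nothing called "employee" was observed directly, only the co-occurrence of its defining properties.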

McGoveran refers to this non-OCO perspective, which gives first class status to properties and recognizes the derived status of objects, as the ontological commitment to properties (OCP).

“Under the OCP, objects have meaning only in consequence of defining properties: object types are named or otherwise referenceable collections of properties that are observed to occur together, and that the objects of those types have been defined to "have". The OCP philosophically and formally treats the concept of object as secondary, consequential, or derived from the primary concept of property.”[2]
In other words, objects are consequential to independently existing properties that are directly observable, from the co-occurrence of which we infer objects of types defined to have those properties.

The OCP is more realistic (i.e., in line with human sense perception) and has certain advantages, which makes it an empirical basis from which to develop an alternative approach to conceptual modeling.

Properties and the RDM

The OCP (and, therefore, P/E modeling) raises some important issues with the traditional interpretation of the formal underpinnings of the RDM. As introduced by Codd, the RDM is rooted in simple set theory (SST) -- which assumes the OCO -- expressible in first order predicate logic (FOPL)[4]. It is this formal theoretical foundation that confers the practical advantages of the RDM.

In FOPL, properties can only be expressed as predicates among objects, and so only indirectly. Each symbolic argument in a predicate is instantiated from a set of objects. A predicate corresponds to the set of propositions which are its possible instantiations (represented by tuples), so that a predicate only captures relationships between sets of entities. For example, the predicate "the lipstick is red" (where redness is a property) actually captures something like "the member of the set of lipstick objects in the universe that is also a member of the set of red objects in the universe", not "the lipstick object has the redness property".

“Visually, think of both the tuples and attributes of a relation as representing sets of entities, and of the relation as constructing a Venn diagram -- for each tuple in the relation, we start with a universe of entities (say employees), then carve out the intersection of all employees by some specific gender, then by some specific salary, then by some specific title value, specific employee ID, and so on. The final intersection of these sets has to be a single entity represented by a tuple defined by all the corresponding attribute values. Codd glossed over the counterintuitive nature of this, allowing domains and attributes that are incorrectly interpreted as values of properties.”
                                      --David McGoveran
Otherwise put, both the tuples and attributes of a relation ought to represent sets of entities. But Codd's RDM attributes represent properties (i.e., redness rather than FOPL's set of red things). Since FOPL properties must themselves be predicates, it follows that a predicate representing a relation is a predicate of predicates, and so requires at least second order predicate logic (SOPL), or some higher logic than FOPL. Using SOPL would sacrifice advantages of the FOPL-based RDM. Moreover, practitioners intuitively think of the entity represented by a tuple as having properties with values, rather than being some intersection of different subsets of the universe of entities with pre-defined values, which is one important reason practitioners often get FOPL expressions of relations and tuples wrong. Codd glossed over the implications of his sleight of hand, and probably did not notice the problem.
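
The two readings of a tuple can be contrasted in a small sketch. It assumes a toy universe of four employee entities labeled `e1` to `e4`; the labels, attribute values, and the `tuple_as_intersection` helper are hypothetical and serve only to illustrate the Venn diagram reading described above.

```python
# Hypothetical universe of employee entities, identified by opaque labels.
universe = {"e1", "e2", "e3", "e4"}

# Set-based (FOPL) reading: each attribute value denotes a SET of entities.
gender_f = {"e1", "e2"}        # entities in the set of female things
salary_5000 = {"e2", "e3"}     # entities in the set of things paid 5000
title_analyst = {"e2", "e4"}   # entities in the set of analysts
emp_id_2 = {"e2"}              # entities in the set of things with ID 2

def tuple_as_intersection(universe, *value_sets):
    """Carve the universe down by intersecting it with each value set."""
    result = set(universe)
    for value_set in value_sets:
        result &= value_set
    return result

# The final intersection must be the single entity the tuple represents.
entity = tuple_as_intersection(universe, gender_f, salary_5000, title_analyst, emp_id_2)

# Intuitive reading used by practitioners: an entity HAVING property values.
tuple_as_properties = {"emp_id": 2, "gender": "F", "salary": 5000, "title": "analyst"}
```

In the first reading `entity` is `{"e2"}`, obtained purely by set membership; in the second, the same tuple is read as one entity carrying property values, which is the interpretation the traditional FOPL foundation does not directly support.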

Property-Entity Modeling

A formal data model that satisfies the Codd definition and is used to formalize conceptual models as computable logical models for database representation[5] must be consistent with the ontological commitment of the conceptual modeling approach producing those models. Consequently, to be used for data modeling (aka logical database design) in conjunction with conceptual modeling that assumes the OCP rather than the OCO, the RDM (and SST-FOPL) must be extended (1) to treat the tuples of a relation as representing the instantiations of a predicate that has arguments taking property values, and (2) to make the OCP.

“As it stands, the formal foundations of the RDM -- though far better than alternatives -- have some limitations: we are in a world with a fixed number of immutable objects, rather than a dynamic world in which new objects arise and objects can morph. In database terms, this means we cannot have dynamic, ever-evolving schemas, and are limited to either simple schema extensions or periodic redesign (again, better than alternatives, especially pre-RDM). This problem cannot be addressed by data independence alone. Neither any existing predicate logic, nor any existing set theory of which I am aware (and I've examined lots of both) can express OCP concepts. To take advantage of OCP would require modifying them so that properties can be treated as primitive objects, and relationships among properties (instead of objects) can be expressed and deduced. I labor in the belief that OCP and building on and learning from Codd's RDM will lead to the creation of a powerful formal language, capable of capturing much more meaning, and extending the relational algebra to handle extremely complex models uniformly and consistently.”
                                    -- David McGoveran
For convenience, let's define Property-Entity Modeling (P/EM) for now as a modeling approach that (1) incorporates McGoveran's insight, and (2) assumes the OCP. In P/EM:
  • Only properties are directly observable;
  • Objects are consequential to, and identifiable/distinguishable by their defining properties;
  • Relationships among entity properties, among entity members of a group, and among groups may be understood as properties of entities, groups, and multigroups, respectively.

Note: We recommend the term "modeling" instead of "model" (e.g., P/E modeling) to avoid the common confusion of the enterprise-specific conceptual models with the modeling approach that produces them.

Much of McGoveran's published work to date has been about refining Codd's RDM -- grounded in the OCO -- in light of FOPL. His as yet unpublished work in progress also introduces a new version of the RDM grounded in the more dynamic OCP and extended FOPL, while also incorporating his semantic theory of types[2] (early drafts of selected chapters may be found here). This is a tedious, long-term effort, but stay tuned.


[1] Nijssen, G.M., Duke, D.J., Twine, S.M., The Entity-Relationship Data Model Considered Harmful.

[2] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[3] Pascal, F., Relationships and the RDM Part 1: Kinds of Relationships.

[4] Pascal, F., What Is a True Relational System (and What It Is Not).

[5] Pascal, F., Data Model: Neither Conceptual, Nor Logical, Nor Physical Model.
