Saturday, September 29, 2018

Conceptual Modeling, Ontological Commitment, and the Data Model




Revised: 10/1/18

We are culturally and linguistically conditioned to view the world as consisting of objects with properties[1]. Objects in a universe that share common properties are of the same type and form a class, which distinguishes them from the objects that do not share those properties. Applying a class definition to the universe (i.e., selecting out the objects with the common properties) produces a group of objects of that type.
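
As a minimal sketch of applying a class definition to a universe, assume a toy universe in which each object is described by the properties it has. The labels o1, o2, o3, the property names, and the function apply_class_definition are all made up for illustration:

```python
# Hypothetical toy universe: label -> properties the object has.
# The labels are only handles for this sketch, not properties.
universe = {
    "o1": {"salary": 50000, "job_title": "clerk"},
    "o2": {"salary": 70000, "job_title": "analyst"},
    "o3": {"wheels": 4},
}

def apply_class_definition(universe, defining_properties):
    """Select the labels of objects that have every defining property."""
    return [label for label, props in universe.items()
            if all(p in props for p in defining_properties)]

# Assumed class definition: anything with a salary and a job title.
EMPLOYEE = {"salary", "job_title"}
employee_labels = apply_class_definition(universe, EMPLOYEE)
print(employee_labels)  # ['o1', 'o2']
```

The result is the group of objects of the type; o3 lacks the defining properties and is left out.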

Ontology is the philosophical study of concepts that directly relate to being, existence, reality, as well as the basic categories of being, and their relationships -- questions concerning what entities exist or may be said to exist and how such entities may be grouped, related, and subdivided according to similarities and differences[2].

Note: We use 'object' in the general, not OO sense. Philosophical ontology should not be confused with "computer science ontology", whereby the term ontology was usurped, and was used by programmers to mean a conceptual graph of directed relationships among objects (and only sometimes among object types).

Conceptual modeling (1) identifies types of real world objects and (2) formulates business rules that specify their properties and relationships[3]. It thus makes an ontological commitment with which the data model used to formalize conceptual models as logical models for computable database representation must be consistent. Unfortunately, due to lack of foundation knowledge in the industry[4], practitioners are largely unaware of the ontological underpinnings of modeling and their implications for database practice, which is one reason why that practice has not progressed in the last five decades.



Conceptual Modeling: Chen vs. Codd/McGoveran


Even if not exactly as defined by Chen, one form or another of the Entity-Relationship Model (E/RM) has dominated modeling practice for the last fifty years.
“... we consider entities and relationships. An entity is a "thing" which can be distinctly identified. A specific person, company, or event is an example of an entity. Entities are classified into different entity [groups]  such as EMPLOYEE, PROJECT, and DEPARTMENT ... if entity employee denotes an entity which exists in our minds, there is a test associated with each entity [group as to] whether an entity belongs to it. For example, if we know an entity is in the entity [group] EMPLOYEE, then we know that it has the properties common to the other entities in the [group].”

“A relationship is an association among entities. For instance, "father-son" is a relationship between two "person" entities. A relationship set is a mathematical relation among n entities, each taken from an entity [group]:

{[e1, e2, ..., en] | e1 ϵ E1, e2 ϵ E2, ..., en ϵ En}
and [each association of] entities, [e1, e2, ..., en] is a relationship. Note that the sets Ei in the above definition may not be distinct. For example, a "marriage" is a relationship between two entities in the entity set PERSON.”

“The information about an entity or a relationship ... is expressed by a set of attribute-value pairs ... An attribute is a function which maps from an entity set or a relationship set into a value set or a Cartesian product of value sets:

f: Ei or Ri →  Vi or Vi1 x Vi2 x ... x Vin.”
“Note that relationships also have attributes. Consider the relationship set PROJECT-WORKER. The set PERCENTAGE-OF-TIME, which is the portion of time a particular employee is committed to a particular project, is an attribute defined on the relationship set PROJECT-WORKER. It is neither an attribute of EMPLOYEE nor an attribute of PROJECT, since its meaning depends on both the employee and project involved.”[5]
where ϵ symbolizes "belongs to" or "is a member of".
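
Chen's definitions above can be sketched with plain Python sets. This is only an illustration: the entity labels (e1, p1, ann, bob) and the percentage values are hypothetical, and the attribute function is represented extensionally as a dictionary:

```python
from itertools import product

# Hypothetical entity sets (labels are illustrative, not from Chen's paper).
EMPLOYEE = {"e1", "e2"}
PROJECT = {"p1", "p2"}

# A relationship set is a subset of the Cartesian product of the entity sets.
PROJECT_WORKER = {("e1", "p1"), ("e1", "p2"), ("e2", "p1")}
assert PROJECT_WORKER <= set(product(EMPLOYEE, PROJECT))

# An attribute is a function from a relationship set into a value set,
# written out here as a dict from relationships to values.
PERCENTAGE_OF_TIME = {
    ("e1", "p1"): 60,
    ("e1", "p2"): 40,
    ("e2", "p1"): 100,
}

def percentage_of_time(employee, project):
    """Attribute of the relationship, not of either entity alone."""
    return PERCENTAGE_OF_TIME[(employee, project)]

# The entity sets need not be distinct: "marriage" relates PERSON to PERSON.
PERSON = {"ann", "bob"}
MARRIAGE = {("ann", "bob")}
assert MARRIAGE <= set(product(PERSON, PERSON))
```

Note that percentage_of_time needs both arguments; neither the employee nor the project alone determines the value, which is Chen's point about PERCENTAGE-OF-TIME.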

Note: The various weaknesses of the E/RM have been criticized elsewhere, including closeness to implementation and terminology overload [6], and confusion with a data model[7]. For the purposes of this discussion we note that it suffers from conceptual-logical conflation (CLC)[8]. Set and attribute are formal logical concepts corresponding to the informal conceptual concepts entity group and property in context[9] (a property in context is to a property at the conceptual level what an attribute is to a domain at the logical level, but in what follows we drop 'in-context' for simplicity).

If entities are "identifiable things" (i.e., objects), so are collections and associations thereof. So while for Chen relationships are distinct from entities, one can't help but conclude that they are "derived compound entities" (i.e., also objects). Thus, in E/RM the objects are entities and entity groups. But note that:
  • As entities that are associations of entities, relationships share the combined properties of the associated entities (which can be of the same type in different roles, as in the marriage example), and can also have properties of their own (as in the projects and workers example);
  • Entity groups are objects devoid of properties.

Codd provided only a sparse outline of the conceptual model to which the RDM was intended to correspond at the logical level. The objects are entities, entity groups (collections of entities), and multigroups (collections of groups)[10]. In his effort to formalize Codd's RDM, McGoveran had the insight that, distinct from the E/RM, (1) all three types of objects have properties and (2) relationships are properties, not entities[9]:
  • A relationship among first order properties (1OP) of entities is a second order property (2OP) of those entities;
  • A relationship among all entity members of a group is a third order property (3OP) of the group;
  • A relationship among group members of a multigroup is a fourth order property (4OP) of the multigroup[9].
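
To make the orders of properties concrete, here is a rough sketch. The specific constraints chosen for each order (commission not exceeding salary, unique employee numbers, departments that must exist) are illustrative assumptions, not examples taken from [9], and all names and values are hypothetical:

```python
# Hypothetical groups of entities; all names and values are illustrative.
# salary, commission, etc. play the role of first order properties (1OP).
employees = [
    {"emp_no": 1, "salary": 5000, "commission": 500, "dept_no": 10},
    {"emp_no": 2, "salary": 4000, "commission": 0, "dept_no": 20},
]
departments = [{"dept_no": 10}, {"dept_no": 20}]

def second_order(entity):
    """2OP: a relationship among first order properties of one entity
    (assumed example: commission does not exceed salary)."""
    return entity["commission"] <= entity["salary"]

def third_order(group):
    """3OP: a relationship among all entity members of a group
    (assumed example: employee numbers are unique within the group)."""
    numbers = [e["emp_no"] for e in group]
    return len(numbers) == len(set(numbers))

def fourth_order(emp_group, dept_group):
    """4OP: a relationship among the groups of a multigroup
    (assumed example: every employee's department is in the department group)."""
    known = {d["dept_no"] for d in dept_group}
    return all(e["dept_no"] in known for e in emp_group)

checks = (
    all(second_order(e) for e in employees),
    third_order(employees),
    fourth_order(employees, departments),
)
print(checks)  # (True, True, True)
```

In this reading, each order of property is checked at a different level: the single entity, the group as a whole, and the multigroup.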


The Object Ontological Commitment


In E/RM, objects -- entities and groups -- are "identifiable things" (i.e., they exist independently and are directly observable). Another way of saying it is that E/R modeling makes an object ontological commitment (OCO).
“Under the object ontological commitment (OCO) objects are "first class" concepts. A property exists only in association with some object, and a procedure is applied to it to determine whether the object has the property (i.e., if it exists or is a property of the object) by virtue of being observed or measured in association with the object vis-à-vis the result of the procedure. The OCO philosophically and formally treats the concept of object as primary, and that of property as secondary, consequential, or derived from the primary concept of object.”[9]
But while we tend to break up the world intuitively into objects, we do not observe them directly, we infer them from properties:
“We see contrast, color, hue, etc.; we touch/push/pull to observe smoothness, roughness, sharpness, resistance, weight, mass, density, etc.; we hear pitch, volume, etc. -- there is no way these can sense "objectness" directly, a well-known neurological fact! Instead, when a co-occurrence of properties is persistent in time and space (i.e., doesn't appear and disappear erratically), it is human nature to give the collection a name. Once it is named, and the name has become commonplace, it is used as an object type -- an abstraction (shorthand) for the collection. In other words, a collection of properties is assigned object type status by being named, at which point the properties in the collection are associated with objects of that type, and become their defining properties.”
--David McGoveran
For example, the collection of properties including job title, salary, departmental assignment, and so on, observed repeatedly to occur together, is asserted as an object type named employee, from which point on objects of that type -- specific employees -- are inferred whenever these properties are observed to co-occur.
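
The employee example can be sketched as follows, assuming that an observation is nothing more than a set of property names seen together. The observations, the property names, and the function infer_types are hypothetical:

```python
# Assumed object type definitions: a name given to a collection of properties.
object_types = {
    "employee": frozenset({"job_title", "salary", "department"}),
}

# Hypothetical observations: each is just the set of properties seen together.
observations = [
    {"job_title", "salary", "department"},
    {"color", "weight"},
    {"job_title", "salary", "department", "hire_date"},
]

def infer_types(observed, types):
    """Return the names of the types whose defining properties all co-occur."""
    return [name for name, defining in types.items() if defining <= observed]

inferred = [infer_types(obs, object_types) for obs in observations]
print(inferred)  # [['employee'], [], ['employee']]
```

Nothing in the sketch observes an "employee" directly; an employee is inferred wherever the named collection of properties is observed to co-occur.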

Note: We make an important distinction between properties and assigned names (e.g., employee name)[11].

A property ontological commitment (OCP) is, thus, more realistic (i.e., in line with human sense perceptions).
“Under the property ontological commitment, properties are "first class" concepts. Object types are named collections of properties that are observed to occur together, and that the objects of those types have been defined to "have", and have meaning only in consequence of those defining properties. The OCP philosophically and formally treats the concept of object as secondary, consequential, or derived from the primary concept of property.”[9]
In other words, properties exist independently and are the only result of direct observation, from which we infer objects of a type defined in terms of those properties.

Note very carefully that even though in E/RM under the OCO objects are identifiable, how they are identified and what distinguishes one from the others is not specified. Under the OCP, objects -- entities, entity groups, and multigroups -- are identified and distinguished by their defining properties.


For convenience, let's for now call the Codd/McGoveran approach, which makes the OCP, Property-Entity Modeling (P/EM), and Codd's version, which (like the E/RM) makes the OCO, Entity-Property Modeling (E/PM).

The OCP, P/EM, and the RDM


“The RDM as introduced by Codd is rooted in simple set theory (SST) -- which assumes the OCO -- expressible in first order predicate logic (FOPL), forcing properties to be expressed only in terms of sets of objects -- and hence by  predicates. In FOPL, predicates can only be about objects: a value really represents a subset of the objects in the universe that have that value. Visually, think of a relation as constructing a Venn diagram -- for each tuple in the relation, we start with a universe of entities (say employees), then carve out the intersection of all employees by some specific gender value, then by some specific salary, then by some title, employee ID, and so on. The final intersection of these groups has to be a single entity represented by a tuple defined by all the corresponding attribute values (which is why people get the logic wrong)! By contrast with this traditional approach, we want to think of the entity represented by the tuple as HAVING properties, not being some intersection of different subsets of the universe of entities.”
--David McGoveran

Otherwise put, properties are not directly expressible in FOPL, but only indirectly by an expression involving the sets of objects that have them. A predicate corresponding to a set of propositions (tuples) only captures relationships between sets of entities. For example, the predicate "the lipstick is red" (where redness is a property) actually captures something like "the member of the set of lipstick objects in the universe that is also a member of the set of red things in the universe", not "the lipstick object has the redness property".
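
The Venn diagram reading described in the quote can be sketched with set intersections. The entity labels and the subsets below are hypothetical and serve only to show the shape of the argument:

```python
# Hypothetical universe of entities, identified only by labels for the sketch.
universe = {"a", "b", "c", "d"}

# Under the OCO reading, each attribute value stands for the subset of the
# universe that has that value.
employees = {"a", "b", "c"}
gender_f = {"a", "c", "d"}
salary_50k = {"a", "b"}
title_analyst = {"a", "d"}
assert employees <= universe

# A tuple then corresponds to the intersection of those subsets,
# which has to come down to a single entity.
tuple_entity = employees & gender_f & salary_50k & title_analyst
print(tuple_entity)  # {'a'}

# The lipstick example, read the same way: membership in two sets.
lipsticks = {"x", "y"}
red_things = {"y", "z"}
red_lipstick = lipsticks & red_things
```

Here redness never appears as a property of anything; it shows up only as the set red_things, which is the limitation the paragraph above describes.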

A data model used to formalize conceptual models as computable logical models for database representation must be consistent with the ontological commitment of the conceptual models it formalizes. P/E modeling, which makes the more realistic OCP, requires a version of the RDM that makes the same commitment.

“While standard predicate logics (of all orders) consider properties and relationships to be predications about objects, I consider objects to be the result of predications (a propositional function) of properties. What is needed is an OCP foundation for set theory (and the RDM) that gives properties "first class" status, both conceptually and formally. We can then reason about properties without having to bootstrap a definition from an arbitrary and indefinable universe of a priori differentiated objects. We are then no longer locked into a world of a fixed number of immutable objects, but can define a dynamic world in which new objects arise and objects can morph. In database terms, this means we can have dynamic, ever evolving schemas, rather than being limited to either simple extensions or periodic redesign, a problem that cannot be addressed by data independence alone. This lets us create a much more powerful formal language, capable of capturing much more meaning, and extending the relational algebra to handle extremely complex models uniformly and consistently.”
--David McGoveran
Much of David's published work to date has been about refining and correcting Codd's RDM -- grounded in the OCO -- in light of FOPL. His as yet unpublished work in progress is about further refining and extending an RDM grounded in the more dynamic OCP, which promotes properties to first class concepts, and demonstrating the advantages thereof. This is a tedious, long term effort, but stay tuned.


References

[1] Philosophical ontology, Wikipedia.

[2] Olson, R., MEANING AND ARGUMENT.

[3] Pascal, F., Logical Symmetric Access, Data Sublanguage, Kinds of Relations, Redundancy, and Consistency.

[4] Pascal, F., THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS - A DESK REFERENCE FOR THE THINKING DATA PROFESSIONAL AND USER.

[5] Chen, P., The Entity-Relationship Model - Toward a Unified View of Data.

[6] Nijssen, G.M., Duke, D.J., Twine, S.M., The Entity-Relationship Data Model Considered Harmful.

[7] Pascal, F., Data Model: The RDM Is, the E/RM Isn't.

[8] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion.

[9] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[10] Conceptual Modeling for Database Design, forthcoming.

[11] Pascal, F., The Key to Relational Keys: A New Understanding.












