Saturday, November 3, 2018

Understanding Conceptual vs. Data Modeling Part 4: Properties-object Modeling



Revised 6/26/19.

In Part 1 and Part 2 we explained that when the RDM (1969-70) and the E/RM (1976) were introduced, there was no distinction between a conceptual and a logical level -- the conceptual-logical-physical distinction of levels of representation emerged in the mid-1980s. Only in 1980 did Codd specify the three components of a formal data model -- structure, integrity, and manipulation. While the RDM satisfies the specification, the E/RM does not: it is a conceptual modeling approach, the weaknesses of which have been elaborated elsewhere[1]. In Part 3 we presented a common example of conceptual-logical conflation (CLC) and the corresponding confusion of types of model (conceptual, logical, physical, and data).

As promised, here we outline a new conceptual modeling approach derived by David McGoveran from his work formalizing Codd's RDM. It makes an ontological commitment different from that made by conventional modeling, which requires revision and extension of the RDM -- an objective of David's effort.




Objects, Properties, and Relationships


Conceptual modeling divides the world into objects with properties, and relationships among them (e.g., in traditional E/R modeling, entities with attributes, and relationships among (sets of) entities of different types). Codd distinguished "network applications", which focus on relationships among individual entities, from non-network applications, which focus on relationships among groups of entities, and introduced the RDM explicitly for the benefit of the latter.

Note: To avoid conceptual-logical confusion/conflation (CLC), we reserve set (of tuples) for the logical level, and use group (of entities) at the conceptual level.

During his work formalizing Codd's RDM, David McGoveran had the insight that relationships describe how objects at one level of abstraction are combined/related at a higher level into new, more complex objects. How that happens depends on which objects are defined as "primitive" and the order in which we define compound objects. A relationship among primitive objects (e.g., entities, entity groups) is just a property of the compound object for which they are qualified members (group, multigroup respectively): from a component object's perspective it is a relationship, but from the perspective of the compound object it is a property[2].

Ontological Commitment


Traditional modeling (e.g., E/RM) makes an ontological commitment to objects (OCO):
“Under the OCO, objects are "first class" concepts. A property exists only in association with objects of some type, and a procedure is applied to each object to determine whether it has the property (i.e., if the property exists, or has a value). The object is concluded to have the property by virtue of the property being observed or measured in association with the object, specifically vis-à-vis the result of the procedure. The OCO philosophically and formally treats the concept of object as primary, and that of property as secondary, consequential, or derived from the primary concept of object.”
--David McGoveran

We shall refer to conceptual modeling that assumes the OCO as Object-properties Modeling (OpM).

While we tend to break up the world intuitively into objects, we actually do not observe them directly, but infer them from observable properties:
“We see contrast, color, hue, etc.; we touch/push/pull to observe smoothness, roughness, sharpness, resistance, weight, mass, density, etc.; we hear pitch, volume, etc. -- there is no way these can sense "objectness" directly. Instead, when a co-occurrence of properties is persistent in time and space (i.e., doesn't appear and disappear erratically), it is human nature to give the collection a name. Once it is named, and the name has become commonplace, it is used as an object type -- an abstraction (i.e., shorthand) for the collection. In other words, a collection of properties is assigned object type status by being named, or at least "recognized" as having been previously observed. Those properties are then understood as the defining properties of objects of the type.”
--David McGoveran

For example, the collection of properties including job title, salary, departmental assignment, and so on, observed repeatedly to occur together, might be perceived as a type of object named "employee", from which point on objects of that type -- specific employees -- are inferred whenever these properties are observed to co-occur.

McGoveran refers to this perspective, which gives properties first class status and recognizes the derived status of objects, as ontological commitment to properties (OCP).

“Under the OCP, objects have meaning only in consequence of defining properties: object types are named or otherwise referenceable collections of properties that are observed to occur together, and that the objects of those types have been defined to "have". The OCP philosophically and formally treats the concept of object as secondary, consequential, or derived from the primary concept of property.”
--David McGoveran

Note that under the OCO, how objects are recognized and distinguished from one another is not specified: it is assumed that objects exist independently of properties and are somehow directly observable -- properties are consequential. Under the OCP, properties exist independently of objects and are directly observable/measurable; objects -- essentially named collections of co-occurring properties -- are consequential, and assumed to "have" those properties.
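
The employee example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not McGoveran's formalism: the names ObjectType and infer_types, the property names, and the sample values are all hypothetical. An object type is modeled as nothing more than a named collection of properties, and an object of that type is inferred whenever all of its defining properties are observed together.

```python
from dataclasses import dataclass


# Hypothetical illustration: none of these names come from the source.
@dataclass(frozen=True)
class ObjectType:
    """A named collection of properties observed to occur together."""
    name: str
    defining_properties: frozenset


def infer_types(observed: dict, known_types: list) -> list:
    """Return the names of the types whose defining properties all co-occur
    in the observation. Only properties are observed; objects are inferred."""
    seen = frozenset(observed)  # the property names present in the observation
    return [t.name for t in known_types if t.defining_properties <= seen]


# The "employee" type is just a name given to a recurring set of properties.
EMPLOYEE = ObjectType(
    "employee", frozenset({"job_title", "salary", "department"})
)

# Invented sample observations.
observation = {"job_title": "analyst", "salary": 50000, "department": "finance"}
partial = {"job_title": "analyst"}

print(infer_types(observation, [EMPLOYEE]))  # all defining properties co-occur
print(infer_types(partial, [EMPLOYEE]))      # too few properties to infer an employee
```

Note that the sketch never starts from an "employee object" and asks what properties it has; it starts from observed properties and only then concludes that an employee is present.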

The OCP is more realistic (i.e., in line with human sense perceptions), and has certain advantages for data modeling, which makes it an empirical basis from which to develop an alternative approach to conceptual modeling, to which we will refer as Properties-object Modeling (PoM).
 

PoM and the RDM


A formal data model is used to formalize conceptual models as logical models for database representation[3]. The OCP (and, therefore, PoM) raises some important issues with the traditional interpretation of the formal underpinnings of the RDM.

As introduced by Codd, the RDM is rooted in simple set theory (SST) expressible in first-order predicate logic (FOPL) -- the dual formal theoretical foundation responsible for its advantages: declarative, decidable data sublanguages[4], physical[5] and logical data independence[6], semantic correctness and system-guaranteed logical validity[7], and a favorable power-to-simplicity ratio[8]. In FOPL, properties can be expressed only as predicates among objects, and so only indirectly -- a predicate captures only relationships among sets of entities. For example, the predicate "the lipstick is red" (where redness is a property) actually captures something like "the member of the set of lipstick objects in the universe that is also a member of the set of red objects in the universe", not "the lipstick object has the redness property".

“Visually, think of both the tuples and attributes of a relation as representing sets of entities, and of the relation as constructing a Venn diagram -- for each tuple in the relation, we start with a universe of entities (say employees), then carve out the intersection of all employees by some specific gender, then by some specific salary, then by some specific title value, specific employee ID, and so on. The final intersection of these sets has to be a single entity represented by a tuple defined by all the corresponding attribute values.”
--David McGoveran

Otherwise put, both the tuples and attributes of a relation ought to represent sets of entities, but in Codd's RDM attributes represent properties (i.e., "redness", rather than FOPL's "set of red things")[9]. Since FOPL properties must themselves be predicates, it follows that a relation predicate is a predicate of predicates, requiring at least second-order logic (SOL), or some logic higher than FOPL. But having started originally with SOL[10], Codd switched to FOPL[11] precisely to avoid losing the relational advantages.
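
The Venn-diagram reading quoted above can be made concrete with ordinary Python sets. This is a minimal sketch, assuming an invented universe of four entities labeled e1 to e4 and invented attribute values; the function name carve is hypothetical. Each attribute value denotes the set of entities that satisfy it, and a tuple corresponds to the intersection of those sets.

```python
# Hypothetical universe of entities, identified by arbitrary labels.
universe = {"e1", "e2", "e3", "e4"}

# Under the set reading, each attribute value denotes a set of entities,
# not a property "possessed" by an entity. All values here are invented.
gender_f = {"e1", "e2", "e4"}       # entities of some specific gender
salary_50k = {"e1", "e2"}           # entities with some specific salary
title_analyst = {"e1", "e2", "e3"}  # entities with some specific title
emp_id_102 = {"e2"}                 # entities with some specific employee ID


def carve(universe, *value_sets):
    """Intersect the universe with each attribute-value set in turn."""
    result = set(universe)
    for value_set in value_sets:
        result &= value_set
    return result


# Without the identifying attribute, the intersection still holds two entities.
print(carve(universe, gender_f, salary_50k, title_analyst))

# With all attribute values, the final intersection is a single entity:
# the one the tuple represents.
entity = carve(universe, gender_f, salary_50k, title_analyst, emp_id_102)
print(entity)
```

Nowhere in the sketch does an entity "have" a salary or a title; it is merely a member of several sets, which is exactly the reading that practitioners find counterintuitive.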

Moreover, practitioners intuitively think of an entity represented by a tuple as having properties with values, rather than as some intersection of different subsets of the universe of entities with pre-defined values, which is one important reason they often get FOPL expressions of relations and tuples wrong. Codd glossed over the counterintuitive nature of his sleight of hand -- allowing domains and attributes to be interpreted, incorrectly, as values of properties rather than as entities -- probably without noticing the implications.

“As it stands, the formal foundations of the RDM -- though far better than alternatives -- have some limitations: we are in a world with a fixed number of immutable objects, rather than a dynamic world in which new objects arise and objects can morph. In database terms, this means we cannot have dynamic, ever evolving schemas, and are limited to either simple schema extensions or periodic redesign (again, better than non-RDM alternatives, especially pre-RDM). This problem cannot be addressed by data independence alone. Neither any existing predicate logic, nor any existing set theory of which I am aware (and I've examined lots of both) can express OCP concepts. To take advantage of OCP requires modifying them, so that properties can be treated as primitive objects, and relationships among properties (instead of objects) can be expressed and deduced. I labor in the belief that OCP and building on and learning from Codd's RDM will lead to the creation of a powerful formal language, capable of capturing much more meaning, and extending the relational algebra to handle extremely complex models uniformly and consistently.”
--David McGoveran

To be used for formalization of conceptual models that assume the OCP, the RDM (and SST-FOPL) must be revised and extended to treat the tuples of a relation as representing the instantiations of a predicate that has arguments taking property values. Much of McGoveran's published work to date has been about refining Codd's RDM in light of FOPL. His work in progress -- as yet unpublished -- will introduce a revised version of the RDM grounded in extended FOPL, while also incorporating his semantic theory of types -- a rather tall order. The posts here are intended to convey some flavor thereof.


Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


References

[1] Nijssen, G.M., Duke, D.J., Twine, S.M., The Entity-Relationship Data Model Considered Harmful.

[2] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK (draft chapters), forthcoming.

[3] Pascal, F., What Is a Data Model, and What It Is Not.

[4] Pascal, F., Natural, Programming, and Data Language.

[5] Pascal, F., Physical Independence Parts 1-3.

[6] McGoveran, D., Date, C.J., On View Updating.

[7] Pascal, F., Logical Validity and Semantic Correctness.

[8] Pascal, F., Simplicity: Forgotten, Misunderstood, Underrated Relational Objective.

[9] Pascal, F., What Relations Really Are and Why They Are Important.

[10] Codd, E. F., Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks.

[11] Codd, E.F., A Relational Model of Data for Large Shared Data Banks.







