Sunday, February 15, 2015

The Conceptual-Logical Conflation and the Logical-Physical Confusion (UPDATED)

GE: The future in data modeling is Object Role Modeling (ORM). It is a far superior way to approach data modeling (compared to any record-based methods such as relational) that avoids all the pitfalls of "Table Think" and the necessity of normalization.

Big data or any other kind of data--you still need to know your data and what it represents. That is the myth in big data--that you don't need a schema, i.e., knowledge of what the data means. True, you may not need a SQL schema in Oracle, but you do need to know your data. You need to have names for things (that is the vocabulary) and their relationships.

Points arising:
  • ORM is conceptual, not data modeling. It originates in NIAM, whose author named his book CONCEPTUAL SCHEMA AND RELATIONAL DATABASE DESIGN--clearly recognizing that business models are not represented directly in databases and must be mapped to logical models. A data model is the "mapping mechanism", so to speak. Being rooted in linguistics, NIAM is completely consistent with predicate logic and the RDM. Objects and roles are not mandatory--NIAM did not require them--they just add complexity.
  • Referring to the RDM as "record-based" reveals the logical-physical confusion (LPC). Records are an implementation aspect and the RDM is purely logical--it imposes no record-based physical storage on the DBMS.
  • I have no idea what "the pitfalls of table-think" are, but as I show in paper #1, conceptual modeling and its mapping to logical models can yield fully normalized databases. Further normalization is necessary only for design repairs.
  • Yes, indeed, you most certainly need to know what the data means (see my columns @AllAnalytics). That is why the RDM has a dual theoretical foundation--predicate logic and set theory. The former provides the interpretation--meaning--of the R-tables of the latter.
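The dual foundation in the last point can be sketched in a few lines of Python. This is a minimal illustration, not anything from the post: the "Employee" predicate, its attribute names, and the sample tuples are all hypothetical. Set theory supplies the relation (a set of tuples); predicate logic supplies its meaning (each tuple asserts a true instantiation of the predicate); an integrity constraint is a logical condition every tuple must satisfy.

```python
# Hypothetical predicate: "Employee named NAME works in DEPT for SALARY."
# Predicate logic provides the interpretation--the meaning of each tuple.
def employee_proposition(name, dept, salary):
    return f"Employee named {name} works in {dept} for {salary}."

# Set theory provides the relation: the set of tuples for which the
# predicate is asserted true (sample data is invented for illustration).
employees = {
    ("Alice", "Sales", 50000),
    ("Bob", "R&D", 60000),
}

# An integrity constraint is a logical condition on every tuple,
# e.g. "salary is positive".
assert all(salary > 0 for (_, _, salary) in employees)

# Relational restriction is just set comprehension over the relation.
sales = {t for t in employees if t[1] == "Sales"}
```

The point of the sketch is that nothing physical--records, files, pointers--appears anywhere: the relation, its interpretation, and its constraints are all purely logical objects.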

DT: ... When we model “reality” directly in relational notation, we are forced into that same “conceptual time” classification of every “fact”. Is it an attribute of a table or a foreign key between tables? This is the same problem that the E-R model presents to us – but without the formal recognition provided by Chen’s “Shift” operator.
I suspect the most important thing is to maintain awareness of conceptual versus logical/physical modeling. It is the old “confusing the map for the territory” problem yet again.

As to relations versus graphs, my own “take” on the “superiority” of relational representation versus graph notation is probably one of perception. I suspect that, as the expressiveness of the logic captured in relational algebra versus a NIAM graph is equivalent, the real difference is psychological rather than formal.

Points arising:
  • There is no "forcing". We do not model reality directly in relational notation; that is the common conceptual-logical conflation (CLC)--the very error deemed important to avoid! Conceptual modeling must precede data modeling. Once we are clear about the reality to be represented, in terms of properties and classes of entities with attributes as expressed by business rules, we use the RDM to map them to domains and R-tables constrained by integrity constraints--the schema--for database representation.
  • NIAM has a graphic--not graph!--notation, which is precisely why it cannot be represented directly in the database: what manipulation and integrity can a DBMS apply to it? The purpose of a database is manipulation and integrity. To that end, the superiority of the RDM is hardly one of just perception: it is the only theoretically sound, complete, most general and simplest data model.
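The conceptual-to-logical mapping described above can be sketched as follows. This is a toy illustration under invented assumptions--the "Product" entity class, its attributes, and the business rule are all hypothetical, and Python types stand in for domains--showing how a class of entities with attributes and a business rule maps to a table whose integrity constraint the "DBMS" enforces on every insert.

```python
# Conceptual level (hypothetical): entity class "Product" with two
# attributes and one business rule ("price is a positive amount").
conceptual_product = {
    "attributes": {"name": str, "price": float},   # Python types stand in for domains
    "business_rule": lambda row: row["price"] > 0,
}

def insert(table, schema, row):
    """Logical level: reject any row violating a domain or the constraint."""
    for attr, domain in schema["attributes"].items():
        if not isinstance(row[attr], domain):
            raise TypeError(f"{attr} violates its domain")
    if not schema["business_rule"](row):
        raise ValueError("integrity constraint violated")
    table.append(row)

products = []
insert(products, conceptual_product, {"name": "Widget", "price": 9.99})
```

A row with a negative price would raise `ValueError` rather than enter the table--the business rule, captured at the conceptual level, becomes an integrity constraint enforced at the logical level.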
