Sunday, June 12, 2016

Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation



From last week:

What's wrong with this picture? (Kinds of Data Models, LinkedIn.com)

David Hay: "Part of the ... confusion as to what exactly was meant by “data modeling”--conceptual, logical or physical--is that most data modeling activities seem to focus on achieving good relational database designs ... my approach is the portrayal of the underlying structure of an enterprise’s data--without regard for any technology that might be used to manage it ... a “conceptual data model” ... that represents the business."
Nothing raises uncertainty whether to laugh or cry better than attempts to dispel confusion which suffer from the very confusion they purport to dispel.
It's not clear what's special about Hay's approach, but conceptual modeling structures an enterprise--not its data--as classes of property-sharing entities, which is precisely why it is technology-agnostic by definition. However, a DBMS can only manipulate abstract symbols mathematically, so to be represented in a database, the model must undergo a symbolized formalization as a logical model. It's the logical model that structures data as a collection of true facts of different types about the conceptualized entities in the corresponding classes. A data model provides the abstract structure and integrity constraints that formally represent the classes and the business rules in the database and the manipulation operations for querying (e.g., relation, several types of constraints and relational algebra in the relational case). A conceptual model can be formalized via either a relational, hierarchic, or network data model, except that we tried the latter two and effectively discarded them decades ago because they proved prohibitively complex and inflexible.

Note: There are domain, attribute, multi-attribute, tuple, multi-tuple and multi-relation relational constraints.

Referring to the conceptual, logical and physical endeavors as "data modeling" reflects and reinforces confusion of levels of representation--the conceptual-logical conflation (CLC) and the logical-physical confusion (LPC). The terms conceptual modeling, logical design and physical implementation help avoid it and also stop obscuring the difference between themselves and between them--which are enterprise-specific--and a data model, which is not. Think of the conceptual model as territory, the logical model as map, the map medium (paper, plastic) as the implementation and the data model as the map legend.

Incidentally, it's the DBMS that manages the data, not the data model.

Nigel Higgs: "... many folks do not get the difference between the Barker entity relationship style of modeling and the relational style of modeling ... [because] the modeling conventions are very similar and the former [is always] a precursor to RDBMS design."

The E/RM is to enterprise-specific conceptual models what a data model (e.g., RDM) is to enterprise specific logical models. Given their levels of representation, the former "is a precursor to database (not RDBMS) design" by definition (duh!), but not necessarily relational design. There must be a 1:1 correspondence between a conceptual and logical representation, but what does "the modeling conventions are very similar" mean? Incidentally, if Barker's E/RM model anything like Chen's, it lacks the manipulation component--an important difference from a data model.

Clifford Heath: "Any terminology for models must project three aspects of intention: (a) audience, (b) level of detail and (c) purpose. These three variables are sufficient to discriminate all the main kinds of models in use. The traditional terms of "conceptual/logical/physical" are manifestly inadequate."

The three levels of representation are not inadequate, they just are. And, in fact, they do project three aspects of intention: they are different levels of abstraction, which differ precisely in level of detail and purpose and are intended for different audiences.
Remy Fannader: "Models are meant to describe sets of instances (objects or behaviors)."

It is the nature of object terminology to obscure rather than enlighten. How exactly does this statement help understanding?

Be that as it may, object orientation is, at its root, a programming, not data paradigm. There is no well-defined, formal "object data model" in the sense in which the RDM is one (if there is, what are--exactly, please--its structure, integrity and manipulation components?).


Note: Object databases have classes--the equivalent of relational domains--but not an equivalent of relations, for which reason the collection had to be invented, defying the insistence that everything is an object. Furthermore, domains encapsulate like classes, but relations intentionally do not, to permit ad-hoc queries, which object-orientation intended to avoid).



Quote of the Week (Jnan Dash, RDBMS vs. NoSQL: How do you pick?)
"The first consideration that needs to be made when selecting a database is the characteristics of the data you are looking to leverage. If the data has a simple tabular structure, like an accounting spreadsheet, then the relational model could be adequate. Data such as geo-spatial, engineering parts, or molecular modeling, on the other hand, tends to be very complex. It may have multiple levels of nesting and the complete data model can be complicated. Such data has, in the past, been modeled into relational tables, but has not fit into that two-dimensional row-column structure naturally." 
Dash belabors under the common misconception that structure is an inherent property of data that must be discovered and accommodated by data management technology. In fact, the structure of factual information is a matter of choice that conceptual modeling and database design make in order to optimize the cost-effectiveness of integrity enforcement and data manipulation.
“[The problem with any arbitrarily structured data] is not that the knowledge (useful facts) contained therein cannot be given a representation in RDM, but that data modelers choose not to analyze it. Documents are not unstructured--rather, they are very highly structured and come in many types. Few implementers are willing to take the time to model their content, often because of resource constraints, but just as often out of ignorance about how to use the RDM.” --David McGoveran
Which, of course, shifts tackling complexity from modeling and design (structure) to programmatic manipulation, at the expense of integrity and all practical benefits of the RDM.

Dash is a "tech visionary/executive consultant in Silicon Valley who spent 10 years at Oracle Corporation and 16 years at IBM in various database leadership positions." If this is his understanding of data fundamentals, what can we expect of the average practitioner?



Speaking of attempts to clarify that further muddy the waters, I just came across the following:
Gordon Everest: "I guess to start we need to know what you mean by a "conceptual model" and a "logical model" and also the difference between an ER model and a relational model. I would also argue that an ORM Fact model is NOT really the same thing. In ORM there is no concept of an entity record (tuple), although relational tables can be automatically generated from an ORM model (furthermore, guaranteed to be fully normalized). The inverse mapping from a relational model to an ORM model is incomplete and ambiguous (unless you assume the relational model is fully normalized, however, it is impossible for that to be known to the system). Thus they are not really the same thing. I would argue that even the act of forming records (i.e., clustering "attributes" into entity records) is a step toward physical implementation." --I just want to know what are different approaches you follow for designing a conceptual, logical and physical model, LinkedIn.com

Points arising:

  • ORM is a conceptual modeling approach with graphic presentation;
  • "Entity record (tuple)" is a three-fold confusion: entity is conceptual, tuple is logical and represents a proposition (fact) about an entity, record is one possible physical implementation of a tuple; 
  • The RDM is based on relations, tables are only one way to picture them on some medium for visualization; the two should not be confused; 
  • Inverse mapping from a (relationally designed) logical model to a conceptual model might be possible with some effort from a complete FOPL specification of its relations and integrity constraints, but not from just tables picturing the relations and SQL constraints;
  • "Forming records" is physical implementation;
  • Attributes (actually, values thereof) are "clustered" into tuples, properties into entities and fields into records.



No comments:

Post a Comment

View My Stats