Saturday, November 2, 2013

More on E/RM: Still Not a Data Model


In a previous site update I linked to three online exchanges on my post about E/RM and considered a response. Here are some thoughts.
MQ: I find this a strange discussion - is there value other than in the realm of philosophy? If ERMs are used by IT professionals across the world to direct the design and build of the majority of applications guided by standard methodologies, is the view of this argument that these were all built wrongly? Regardless of success? Out of interest, is there a common Relational Modelling tool, that is not also an ERM tool, that models the full Codd definition? Is the inferred conclusion that only the RM models data, and ERM, BM, OOM, BOM, plus any other techniques do not? I think that is a little limiting.


By the by, I measure success for ER modelling and RM modelling quite easily - I do both and believe they reinforce each other. I have watched people fail by trying to implement RM perfectly as many times as I have watched people fail through not doing it. I have a bad memory of arguing with a purist that, in banking, you should not model a person's name with relations of Letter, Word, Word-Letter-Sequence, Name, Name-Word-Role, Involved-Party-Name, Involved-Party-Name-Type (whereas there are applications in Law Enforcement where this type of structure may have some use).
This comment is strange in so many ways:
  • My post claimed that E/RM is not a data model (in the sense in which RM is one). That is a fact, so what exactly is "philosophical" about it?
  • There are no "E/RMs" (plural), but one E/RM, singular. As is so common in the industry, by data model MQ means, erroneously, either a conceptual or a logical model, and more likely both interchangeably. The confusion is evident: how could one use two data models simultaneously, and how exactly could they "complement" each other?
  • If the "standard methodologies" used by IT professionals are based on a confusion of levels of representation, does their widespread use make them "right"?
  • The existence or absence of modeling tools is irrelevant to whether E/RM is a data model. If E/RM were a data model, it could be used to map conceptual models to their logical database representations: structure, manipulation and integrity. But it models reality, not data, and it lacks a manipulation component and most of the integrity component of a true data model.
  • I never claimed that "only RM models data". In fact, the hierarchic and network data models also do, but were discarded decades ago because they were, for various reasons, inferior to the RM. Other than those two, I am unaware of any other proposed data model that is formal and complete.
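To make the three components of a data model (structure, integrity, manipulation) a little more concrete, here is a toy Python sketch. It assumes a deliberately simplified representation: a relation is a set of tuples, each tuple a set of (attribute, value) pairs. It is an illustration only, not a DBMS and not the author's own formalism; the names `make_relation`, `check_key`, `restrict` and `project` are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

# Hypothetical toy representation (not any DBMS's internals):
# a tuple is a frozenset of (attribute, value) pairs, and a relation is a
# frozenset of such tuples. Being a set, a relation cannot hold duplicate tuples.
Tup = FrozenSet[Tuple[str, object]]
Relation = FrozenSet[Tup]


def make_relation(rows: Iterable[dict]) -> Relation:
    """Structure: build a relation from dicts; duplicate rows collapse into one tuple."""
    return frozenset(frozenset(row.items()) for row in rows)


def check_key(rel: Relation, key: Tuple[str, ...]) -> bool:
    """Integrity: True if no two tuples share the same values for the key attributes."""
    seen = set()
    for t in rel:
        row = dict(t)
        k = tuple(row[a] for a in key)
        if k in seen:
            return False
        seen.add(k)
    return True


def restrict(rel: Relation, pred: Callable[[dict], bool]) -> Relation:
    """Manipulation: keep only the tuples that satisfy the predicate."""
    return frozenset(t for t in rel if pred(dict(t)))


def project(rel: Relation, attrs: Tuple[str, ...]) -> Relation:
    """Manipulation: keep only the named attributes; the result is again a set."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in rel)


emp = make_relation([
    {"emp": "E1", "dept": "D1"},
    {"emp": "E2", "dept": "D1"},
    {"emp": "E1", "dept": "D1"},  # duplicate row, absorbed by set semantics
])
print(len(emp))                   # 2
print(check_key(emp, ("emp",)))   # True

# A second tuple for E1 with a different dept violates the key on "emp".
emp_bad = emp | make_relation([{"emp": "E1", "dept": "D2"}])
print(check_key(emp_bad, ("emp",)))  # False

print(project(emp, ("dept",)) == make_relation([{"dept": "D1"}]))  # True
print(len(restrict(emp, lambda r: r["emp"] == "E2")))              # 1
```

All three parts are defined over one construct, the relation; it is the operators and constraints, not just the structure, that make it a data model in the sense used here.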
The last part of the second paragraph is sheer nonsense.
MQ: I understand you want better foundation knowledge for others - but what foundation knowledge? RM only with other aspects taking a lesser place? While Codd's foundations are still a strong backbone, the body of knowledge has progressed and modern architectures need to address these changes in knowledge to satisfy business requirements.
Not only do I, but I want at least some such knowledge, because very often there is none. This very comment is an excellent example: what "body of knowledge" has "progressed"? The use of vague terms like "modern architectures" and "business requirements", which RM supposedly cannot "satisfy" for unspecified reasons, is so much arm-waving. In my paper Business Modeling for Database Design I offer well-defined criteria by which any proposed data model, and any claim of its superiority over RM, should be assessed. Can MQ offer anything similar in support of his argument?
MQ: I would still be interested if you think RM is the only correct method of modelling data - I raised a couple of questions earlier that need your thoughts if you want to persuade readers to your point of view - and there are quite a few of us who do have the fundamentals you are referring to. Please debunk/agree. I do believe that if the professional body (those with good fundamentals and techniques) are successfully building systems then there is value to be gleaned from this by all of us.
I have just rejected the "only" claim put in my mouth. As to "correct", it depends on how one uses the term. For the informational purpose that RM serves, namely inferring facts that are logical implications of facts represented in databases, the RM is superior: it is the simplest way to guarantee results that are logically correct with respect to the real world, and it has the highest scope-to-simplicity ratio, representing any reality with the fewest and simplest constructs. Why use something inferior?
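As a toy illustration of deriving facts that are logical implications of stored facts, here is a Python sketch. It assumes two hypothetical base relations, each a set of tuples whose positions carry fixed meanings, and hand-codes a join followed by a projection. A relational DBMS would express the same derivation declaratively as a query, so treat this only as a sketch of the idea, not of any product.

```python
# Hypothetical base relations; each tuple asserts a fact about the real world.
# works_in(emp, dept): "employee emp works in department dept"
works_in = {("Smith", "Sales"), ("Jones", "Research"), ("Blake", "Sales")}
# located_in(dept, city): "department dept is located in city"
located_in = {("Sales", "London"), ("Research", "Paris")}


def works_in_city(works, located):
    """Join on dept, then project onto (emp, city).

    Each result tuple is a fact implied by two stored facts:
    if emp works in dept and dept is located in city, then emp works in city.
    """
    return {(emp, city)
            for (emp, dept) in works
            for (dept2, city) in located
            if dept == dept2}


derived = works_in_city(works_in, located_in)
print(sorted(derived))
# [('Blake', 'London'), ('Jones', 'Paris'), ('Smith', 'London')]
```

None of the derived facts is stored anywhere; each is guaranteed to be true provided the stored facts are, which is what logically correct results means here.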
MQ: I would argue that an ERM is a model of data - If we take a RDB as the ultimate target implementation of data, and an ERM (or extended) can correctly design all the artifacts that are implemented, this means it is modelling the data. Granted, an ERM does not explicitly model some of the non-structural aspects of the original Codd definition.
If this is not strange, then I don't know what is: MQ concedes that E/RM does not satisfy the definition of a data model, yet insists it is a data model anyway! We "relational purists" call that inconsistency (very dangerous in a field founded on logic).
MQ: Strictly speaking, Codd defined a Database Model - not a Data Model. He explicitly made this clarification and went on to state that he expected growth and change as our knowledge increased, even bringing out a v2 and then going on to further thinking.
Actually, he defined both a general data model concept and a specific data model for database management, the RM. No true scientist would assume that a theory will never be extended or refined, or even made obsolete by another. But arm-waving and vague terminology are evidence of neither. Can MQ specify, precisely please, the specific "developments" that are valid extensions to, or substitutions for, RM?

Come to think of it, what exactly is the theoretical foundation of E/RM?
MQ: ERM is a data model – So says Date, Chen, etc. So says the majority of current industry experts. Refer to Date 6th edition p347. With very strong references to Codd (who he worked with), Date elegantly explains the differences between RM and ERM – but clearly believes both are data models (even allowing for the charitable comment). There are also several other methods of modelling data – ERM is more a mechanism to represent the data.
First, Date does not say exactly that; I suggest rereading him. Second, as far as I know, Chen does not say it either. Third, I have already mentioned two other data models that model data, not reality.
MQ: ERM isomorphic to RM – looking to more modern implementations of Chen's work, as used in data modelling, applications like Erwin implement significant portions of the RM – certainly...
If it is isomorphic, then why do we need two data models that do "the same thing"? And how can they be isomorphic, if one has more elements than the other?
MQ: [I] Don’t understand ERM?- Please feel free to look me up on Linked In. I am actually the original author of the most widely sold data warehouse data model in the world – I just don’t flaunt it as I don’t feel any need to do so. My legacy to-date is more than 25 major database systems still in production world-wide and maintained with low change-rates to the data structure. Either I know what I am doing or IT is much simpler than I think.
I never cease to be amazed at how often prolific output or popularity is confused with valid knowledge. Consider the logic of it (logic again!): if I am, indeed, correct about the poverty of foundation knowledge in the database field, what do the amount of output and its popularity signify? Familiarity with sources per se does not guarantee comprehension or appreciation thereof. I leave it to the reader to read the whole exchange and judge whether MQ demonstrates either with respect to Codd and Date.
MQ: For interest, no Relational Database strictly enforces the Relational Model – for example, duplicate rows can still be recorded especially when Index failures occur. These potential errors can be exacerbated in the modelling activity if profiling, knowledge, and principles are not understood. But I would argue that the same mistakes can be made in RM by practitioners who do not understand the value of Normalising etc.
I will insist on being pedantic and point out that no database can enforce RM; only DBMSs can, if they are intentionally designed as such, of c

See also What is a Data Model and which “Data Model” do you prefer. Sorry, but I won't bother with the rest.
