Saturday, April 20, 2019

Understanding Data Modeling Part 2: "E/RM" and "RDM"




In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the industry's modeling mess that Friesendal has tried to map. We now proceed to apply it to the various industry "data models" considered by Friesendal, and his understanding thereof. In this part, we apply this knowledge to the first two industry "data models" considered by Friesendal -- the E/RM and RDM.


"Entity-Relationship Model"


“One of the first formal attempts at a framework for Data Modeling was the Entity-Relationship data model paradigm proposed [in 1976] by Peter Chen. Notice that in the original Chen-style, the attributes are somewhat independent and the relationships between entities are named and carry cardinalities ("how many" participants in each end of the relationship) ... Attributes are related to their "owner" entity" in what other people called "functional dependencies".”

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE 

Up to 2018, DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. In 2018 that website was discontinued. This content is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help to upkeep it by purchasing publications, or donations. Thank you.

LATEST UPDATES 

04/20/19: Added POSTS page with links to all site posts, to be updated monthly.
03/29/19: Updated the LINKS page – added:
02/29/19: Revised ThePostWest blog and Twitter page (see below).

LATEST PUBLICATIONS (order PAPERS and BOOKS)


USING THIS SITE 

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled. As other older posts are rewritten, they will also be relabeled. Use Blogger search for the rest. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics may work.

SOCIAL MEDIA 

I deleted my Facebook account. You can follow me:
  • @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture posts, and my exchanges on LinkedIn.
  • @The PostWest blog: Evidence for Antisemitism/AntiZionism – the only universally acceptable hatred – as the (traditional) response to the existential crisis of decadence and decline of Western (including the US)
  • @ThePostWest Twitter page where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict. 
------------------------------------------------------------------------------------------------------------------Points arising:
  • At the time when the E/RM was introduced, the three levels of representation had not yet emerged, and Codd had not yet published his meta-model outline (structure, integrity, manipulation). When both are considered, the E/RM is essentially a meta-model of conceptual models with some relational terms (relations, attributes) mixed in[1,2];
  • While we cannot criticize Chen ex-post-facto for conceptual-logical confusion/conflation (CLC)[3], we can, should, and do criticize it in current industry practice[4,5]: 
- at the conceptual level entities have properties, represented at the logical level by tuples and attributes, respectively;
- functional dependencies (FD) are not "relationships of properties to their owner entities" (i.e., conceptual), but logical (see below).

    Note: For the relationships between properties and entities see[6].


    “After Entity-Relationship modeling things took other directions leading the generic ER-diagrams in many styles. Here is (a part of) the Microsoft Northwind example database in the SQL Server diagramming tool. Note that the attributes have now been folded into the entities, which came to be called tables (in the SQL sense). With the advent of SQL, the named relationships were not named anymore. Since foreign key relationships are constraints, and constraints may have names in most SQL implementations, it really is strange why this happened. From a business semantics perspective, this is a very sad loss of information.”

    Points arising:
    • Per Part 1, diagrams are symbolizations of models (i.e., notation), distinct from the models and the modeling paradigm underlying them (i.e., the data model in the Codd sense). Changes in "styles of generic E-R diagrams" may or may not be due to changes in paradigm, and cannot be assumed to be modeling developments;
    • Attributes are currently understood to be logical representations of conceptual properties;
    • Entities did not "come to be named tables" -- depending on its design, a SQL table represents at the logical level one or more groups of entities, not an entity. Entities are represented by SQL rows;
    • The inter-group relationships due to M1:M2:M3:...:Mn relationships among their entity members are represented in SQL by named association tables (if they are not, they should be). The inter-group relationships due to 1:1 and M:1 member relationships are represented by named foreign key (FK) constraints. We explained how this difference happened for historical reasons[7], and proposed a RDM revision[8] whereby all inter-group relationships are represented by named association relations.


    The "Relational Data Model"


    “Before SQL there was the relational model, which was published by Dr. Ted Codd [in 1970] ... Looking back, there are some key issues, which were of high importance for the paradigm shift:
    • The relations (tables) are the key components (although the name of model suggests that relationships should be the core).
    • The tables should be "normalized" and intra-table relationships should be defined using "primary" and "foreign keys".”

    Points arising:
    • Database relations are not tables and SQL tables are not relations -- they can be visualized as such on some physical medium -- these are industry misconceptions[9]; similarly, 'relational' does not come from relationship (see note), but from mathematical relations, of which database relations are adaptations;
    • The understanding in the industry has been that database relations are by definition in 1NF, but for practical reasons should be in 3NF (see below). McGoveran has concluded, however, (based on his work formalizing, reinterpreting, and extending the RDM[8], that semantic consistency and DBMS-guaranteed logical validity[10] require 5NF[11] (i.e., relations are in 5NF by definition);
    • A referential (i.e. FK) constraint enforces an inter-relation relationship[12]; a PK constraint[13] enforces an intra-relation relationship (uniqueness) among all its tuples.

    Note: Like every set, a relation is a logical relationship (every tuple belongs to a class having the same relationship among its attributes). The relationship underlying the misconception, however, is that "among "tables" that is currently represented by a referential constraint.


    “If you ask me, the major contribution of "The Relational Model" is the focus on functional dependencies, which is at the very heart of Data Modeling. Because that is where the structure is residing. Functional dependency is originally a mathematical term, which indicates that something (a concept) is completely derived (by way of a function) from something that controls it (and possibly many other concepts). For example, a Status attribute is describing something very specific, of which it is functionally dependent: Is it describing the status of the Supplier or the City? What if it is depending functionally on CITY? We cannot see this since we just have the colocation as indication of dependencies.”
    “Third normal form is considered the desired state to be in. Loosely stated it simply means that all attributes are functionally dependent on the primary key. Getting the keys and functional dependencies in order is the target of "normalization".”

    Points arising:
    • The RDM is not "focused" on FDs; it has multiple major contributions, most of which are unknown, or unappreciated in the industry[8] (e.g., [10] are but two of them);
    • In fact, FDs were not originally part of the RDM per se, but emerged in database design, and are rather limited (not every logical relationship can be decomposed unambiguously into FDs, or without introducing "artificial" entity types" at the conceptual level). It's McGoveran's recent conclusion that relations are by definition in 5NF (which is not yet widely accepted) that raises their relational profile (see below);
    • A FD constraint on a relation represents a relationship among all entity members of the group represented by the relation (which is a third order property (3OP) of the group[14]). Constraints are the integrity, not structure component of the RDM[15], and are not visible in tabular displays of relations, but are discernible only in spelled out logical models (database schemas), or in conceptual models as business rules (BR), which they represent formally[16] (in a relation, FD constraints are implicitly enforced by PK constraints -- why?);
    • 5NF is the formal way of saying that a relation represents a group of entities of a single type. If and only if that is the case, the only dependencies that hold in it are FDs of non-key attributes on the PK (i.e., for every PK value there is exactly one value of every non-key attribute), and, thus, the values of those attributes determine the corresponding PK value. Contrary to conventional wisdom, 3NF does not guarantee this (why?).
    • Proper conceptual modeling and database design[17] produces relations (5NF). Explicit normalization to 1NF and full normalization to 5NF repair poor designs. The objective of the former is to eliminate non-simple domains[18], and of the latter to eliminate any dependencies that are not FDs of non-key attributes on the PK.

    (Continued in Part 3)




    Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


    References

    [1] Pascal, F., Data Model - The RDM Is, the E/RM Isn't.

    [2] Pascal, F., E/RM Models Reality, RDM Models Data.

    [3] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion.

    [4] Pascal, F., Conceptual Modeling Is Not Data Modeling.

    [5] Pascal, F., Understanding Conceptual vs. Data Modeling Parts 1-4.

    [6] Pascal, F., Property-Entity Modeling.

    [7] Pascal, F., Fourth Order Properties Parts 1,2.

    [8] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK (draft chapters), forthcoming.

    [9] Pascal, F., It’s Not Tables, It's the Relationships.

    [10] Pascal, F., Logical Validity and Semantic Correctness.

    [11] Pascal, F., Understanding Relations, Parts 1-3.

    [12] Pascal, F., The Key to Relational Keys - A New Understanding.

    [13] Pascal, F., Understanding Foreign Keys Parts 1,2.

    [14] Pascal, F., Relationships and the RDM Parts 1-3.

    [15] Pascal, F., Structure, Integrity, Manipulation: How to Compare Data Models.

    [16] Pascal, F., What Meaning Means: Business Rules, Predicates, Integrity Constraints, and Database Consistency.

    [17] Pascal, F., How to Think (and Not to Think) During Database Design.

    [18] Pascal, F., The Costly Illusion: Normalization, Integrity and Performance.

    [19] Pascal, F., First Normal Form in Theory and Practice Parts 1-3.



    No comments:

    Post a Comment

    View My Stats