Sunday, April 28, 2019

Understanding Data Modeling Part 3: OO/UML, and "Graph Data Models"




In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the industry's "data modeling" mess that Friesendal has tried to catalog. In Part 2 we applied this knowledge to the first two modeling approaches considered by Friesendal, the E/RM and RDM. We apply it here to the other two, OO/UML and "GDM".


Object Orientation and Unified Modeling Language


“A "counter revolution" against the relational movement was attempted in the 90’s. Graphical user interfaces came to dominate and they required advanced programming environments. Functionality like inheritance, sub-typing and instantiation helped programmers combat the complexities of highly interactive user dialogs. The corresponding Data Modeling tool is the Unified Modeling Language ...”



Points arising:
  • From a database perspective, the user interface, graphical or not, is an application, not a DBMS function, and, thus, not the purview of a data model. To confer the multiple relational advantages upon data management, relational data sublanguages are intentionally limited to first order predicate logic (FOPL) and, thus, are not computationally complete. Such completeness is provided by the application development (e.g., programming) languages that host data sublanguages: these are not so limited and, being based on higher logic, are computationally complete languages (CCL)[1];
  • A type theory (inheritance, sub-typing, etc.) is an integral part of a data model, without which it can lay no claim to a formal concept of domain[2]. Codd assumed a rather intuitive and extremely limited type theory that causes all manner of difficulty (one of the reasons SQL does not have real domain support). This is one of the RDM extensions that McGoveran is working on[3,4];
  • A formal data model compliant with Codd's outline[5] is used to formalize conceptual models as logical models for database representation[6] (i.e., a logical model is an application of a data model). A diagramming tool that symbolizes such models supports (i.e., is able to express the constructs of) the data model employed to create them, but is not the data model;
  • UML supports object orientation (OO), which is a programming (and, thus, application development) paradigm, not a data paradigm. To the extent that it incorporates data management constructs, there is affinity to the graph approach and physical contamination[7] (see next). Moreover, industry references to an "object data model" notwithstanding, no Codd-compliant formal ODM has been specified.
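
The first point above, on the division of labor between a data sublanguage and its host language, can be illustrated with a minimal sketch. It is not taken from the references: the reports_to relation and both function names are hypothetical, and a Python set comprehension is only a loose analog of a first order query. Finding direct reports is a restriction plus projection, while finding all reports at any depth is a transitive closure, which first order logic cannot express and which therefore falls to the iteration of the host language.

```python
# Hypothetical relation of (employee, manager) pairs; all names are illustrative only.
reports_to = {("ann", "bob"), ("bob", "cy"), ("cy", "dee")}


def direct_reports(relation, manager):
    """Restriction plus projection: expressible as a single first order query."""
    return {emp for (emp, mgr) in relation if mgr == manager}


def all_reports(relation, manager):
    """Transitive closure: relies on iteration supplied by the host language."""
    found = set()
    frontier = {manager}
    while frontier:
        next_frontier = set()
        for m in frontier:
            next_frontier |= direct_reports(relation, m)
        # Keep only newly discovered employees so the loop terminates.
        frontier = next_frontier - found
        found |= frontier
    return found


print(direct_reports(reports_to, "dee"))
print(all_reports(reports_to, "dee"))
```

The declarative part stays set-at-a-time; only the looping lives in the host language, which is the separation the point describes.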

Note: There are more general criticisms of UML[8].


Graph "Data Models"


“Yet another counter revolution happened at about the same time: Graphs emerged as data models in the late 1990’s. For many years, formal graphs have been part of mathematics and they are a well-researched area. In the Data Modeling community, however, graphs emerged remarkably late. The graph types of particular interest to data modelers are the directed graphs. This means that the relationships (edges) between nodes are directed. And that is a natural fit with the nature of semantics and, consequently, with the nature of data models (which represent business semantics).”

Points arising:

  • Models (plural) indicates the common model confusion[9]: a "graph" is not a graph data model (GDM) (in the Codd sense), but a logical model (i.e., an application thereof);
  • Graph technology has not "emerged in the late 1990’s": it is being revived by an industry lacking foundation knowledge and familiarity with history. A previous generation of GDBMSs -- hierarchic and network (CODASYL) -- actually preceded the RDM and SQL, which emerged in the 70s. They were not based on a formal GDM adapted from mathematical graph theory (as the RDM was from simple set theory (SST) and FOPL), but were abstracted ad-hoc from the industry practice at the time. Improvements notwithstanding, the current generation of GDBMSs still lacks a formally specified GDM[10];
  • Were a GDM to be specified, it would satisfy the needs of what Codd referred to as "network applications", which are concerned with relationships among individual entities. The RDM was introduced explicitly for the majority of applications, the non-network ones concerned with relationships among groups of entities (some of which are due to the relationships among their individual members[11]), to spare them the complexity and rigidity of the graph approach, which does not add functionality[12]. In other words, there are two distinct semantics, data models, and "natural fits". This is poorly understood in the industry, with "GDM" -- absent a formal specification -- promoted as "progress over the RDM", and GDBMSs as superior to "RDBMSs" (meaning SQL DBMSs), with disregard for the distinct semantics, separation of concerns, soundness, and functional completeness[13].

“One of the great misconceptions of the Data Modeling community has been that relational databases are about relationships. They are not. In fact, the relational model focused on "relations" ("relvars") in the sense of having related attributes existing together in the table representing a tangible or abstract entity (type). So the "relations in the relational model are the functional dependencies between attributes and keys (also attributes). Relationships in the sense that there is a relationship between Customer and Order are “constraints”. Some constraints in SQL are the “foreign keys,” which is as high as you get as an attribute, after being a primary key, of course.”
“Caveat: the world is full of relationships, and they express vivid dynamics. This is the space that the graph data models explore. If you ask me, structure (relationships) is of higher importance than contents (the list of properties). Visuals are a great help and visualizing structure is the same as saying “draw a graph of it.””

Points arising:

  • Relational databases are about relationships: they represent conceptual relationships of several types[11], and constrained database relations are logical relationships (see Part 1). The dual misconception is that the term 'relational' comes from "relationships among tables": it actually comes from mathematical relations (of which database relations are adaptations), and relations are not tables[14];
  • While Date advocates explicit support of relation variables (relvars) in relational data sublanguages, Codd seems to have intentionally avoided them[3], and we do not subscribe to them for cause[15,16];
  • If and only if, per our 5NF assertion, a relation represents a single group of entities of the same type, the FDs of non-key attributes on the PK are the only dependencies that hold in the relation;
  • Orders and Customers are entities of distinct types; the M:1 relationships between such entities have traditionally been represented by referential constraints (we recommend an association relation instead). A referential constraint is distinct from a FD constraint, which represents an intra-group relationship among all its members[11] (see above);
  • FKs are attributes, not constraints; for each FK there is a referential constraint on the referenced and the referencing relations[17] (we have no idea what "as high as you get as an attribute" means);
  • "Full of relationships" is so much handwaving -- we have specified the several types of relationships supported by the RDM[11];
  • Constraints are part of the integrity, not the structure component of the data model. Attributes (representing entity properties) are part of structure (so, what does "structure (relationships) is of higher importance than contents (the list of properties)" mean?);
  • To reiterate, a tool visualizing a logical model must support a data model, but is neither the model being visualized, nor the data model employed to create it.
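
The distinctions drawn in the points above (a relation is not a table, a FK is an attribute, a referential constraint differs from a FD constraint) can be sketched as follows. This is a minimal illustration under stated assumptions, not a formalization from the references: the customers and orders relations, their attribute layouts, and both function names are hypothetical. Each relation is modeled as a Python set of tuples, so it has no row order and no duplicates.

```python
# Hypothetical relations modeled as sets of tuples (unordered, no duplicates).
# Attribute layouts are assumptions made for illustration only.
customers = {(1, "Acme"), (2, "Bolt")}             # (customer_id, name)
orders = {(10, 1, 250), (11, 1, 90), (12, 2, 40)}  # (order_id, customer_id, amount)


def fd_holds(relation, determinant, dependent):
    """True if equal determinant values always come with equal dependent values."""
    seen = {}
    for row in relation:
        key = tuple(row[i] for i in determinant)
        value = tuple(row[i] for i in dependent)
        # setdefault records the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


def referential_constraint_holds(referencing, fk_positions, referenced, key_positions):
    """True if every FK value in the referencing relation matches a key value in the referenced one."""
    keys = {tuple(row[i] for i in key_positions) for row in referenced}
    return all(tuple(row[i] for i in fk_positions) in keys for row in referencing)


# FD of the non-key attributes on the key, within the single relation orders.
print(fd_holds(orders, [0], [1, 2]))
# The FK is the attribute at position 1 of orders; the constraint spans both relations.
print(referential_constraint_holds(orders, [1], customers, [0]))
```

Note that fd_holds inspects one relation only, whereas referential_constraint_holds takes two: the FK itself is just the attribute at position 1, and the constraint is the condition relating it to the referenced key.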

(Continued in Part 4)







References

[1]  Pascal, F., Natural, Programming, and Data Language.

[2]  Pascal, F., Understanding Domains and Attributes.

[3]  McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLKS (draft chapters), forthcoming.

[4]  Pascal, F., Interpreting Codd: LOGIC FOR SERIOUS DATABASE FOLKS.

[5]  Codd, E. F., Data Models in Database Management, Proceedings of the 1980 workshop on Data abstraction, databases and conceptual modeling archive, 112-114, 1980.

[6]  Pascal, F., Conceptual Modeling for Database Design: Formalizing the Informal.

[7]  Pascal, F., Object Orientation, Logic and Database Management.

[8]  Meyer, B., UML: The Positive Spin.

[9]  Pascal, F., Data Model: Neither Conceptual, Nor Logical, Nor Physical Model.

[10]  Pascal, F., Graph Databases They Who Forget the Past...

[11]  Pascal, F., Relationships and the RDM Parts 1-3.

[12]  Codd, E. F., Normalized data base structure: A Brief Tutorial, SIGFIDET '71 Proceedings of the 1971 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control, 1-17, 1971.

[13]  Meyer, B., Soundness and completeness with precision.

[14]  Pascal, F., Understanding Relations Parts 1-3.

[15]  Pascal, F., C. J. Date and D. McGoveran On View Updating.

[16]  Pascal, F., Class, Type, Set, Relvar, and Relation.

[17]  Pascal, F., Foreign Keys Parts 1,2.
