Saturday, May 4, 2019

Understanding Data Modeling Part 4: Fact Modeling




In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the "data models" mess in the industry that Friesendal has tried to catalog. In Part 2 we applied this knowledge to the first two industry "data models" considered by Friesendal, the E/RM and the RDM. In Part 3,  we applied it to OO/UML and (a yet formally undefined) "GDM". Here we apply it to fact modeling (FM).

Fact Modeling


“... another school of modelers working with "fact modeling". Their approach is not new. It goes back to the 70's, where Eckhard Falckenberg and Sjir Nijssen started working on the approach (in parallel). Fact Modeling was known for many years as Object-Role-Modeling (ORM), and it was supported by the popular Visio diagramming tool at the time that Microsoft bought the company behind Visio. I like Nijssens name “Binary Relationship Modeling” a lot and it has been in the back of my head since the early 80's. Fact Modeling is definitely at the right level (concepts and their relationships), but it also contains all of the logic details required for formal, precise specifications. The visual syntax goes back to: Nijssen, G.M. and T.A. Halpin, Conceptual Schema and Relational Database Design — A fact oriented approach, Prentice Hall 1989.”

Sunday, April 28, 2019

Understanding Data Modeling Part 3: OO/UML, and "Graph Data Models"




In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the industry's "data modeling" mess that Friesendal has tried to catalog. In Part 2 we applied this knowledge to the first two modeling approaches considered by Friesendal, the E/RM and RDM. We apply it here to other two, OO/UML and "GDM".


Object Orientation and Unified Modeling Language


“A "counter revolution" against the relational movement was attempted in the 90’s. Graphical user interfaces came to dominate and they required advanced programming environments. Functionality like inheritance, sub-typing and instantiation helped programmers combat the complexities of highly interactive user dialogs. The corresponding Data Modeling tool is the Unified Modeling Language ...”

Saturday, April 20, 2019

Understanding Data Modeling Part 2: "E/RM" and "RDM"




In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the industry's modeling mess that Friesendal has tried to map. We now proceed to apply it to the various industry "data models" considered by Friesendal, and his understanding thereof. In this part, we apply this knowledge to the first two industry "data models" considered by Friesendal -- the E/RM and RDM.


"Entity-Relationship Model"


“One of the first formal attempts at a framework for Data Modeling was the Entity-Relationship data model paradigm proposed [in 1976] by Peter Chen. Notice that in the original Chen-style, the attributes are somewhat independent and the relationships between entities are named and carry cardinalities ("how many" participants in each end of the relationship) ... Attributes are related to their "owner" entity" in what other people called "functional dependencies".”

Sunday, April 14, 2019

Understanding Data Modeling Part 1: Models, Models Everywhere, Nor Any Time to Think




“... I needed to know what the constituent parts of data models really are. Across the board, all platforms, all models etc. Is there anything similar to atoms and the (chemical) bonds that enables the formation of molecules? My concerns were twofold ... I wanted a simple, DIY-style, metadata repository for storing 3-level data models -- what would the meta model of such a thing look like? -- [where] atomicity is of essence ... I took a tour (again) in the Data Modeling zone, trying to deconstruct the absolutely essential metadata, which data modelers cannot do without.”
--Thomas Friesendal, The Atoms and Molecules of Data Models, Dataversity.com

All data models? 3-level data models? Platforms? Hhhmmmm!

Wednesday, March 27, 2019

Graph Databases: They Who Forget the Past...




Out of the plethora of misconceptions common in the industry[1], quite a few are squeezed into this paragraph:
“The relational databases that emerged in the ’80s are efficient at storing and analyzing tabular data but their underlying data model makes it difficult to connect data scattered across multiple tables. The graph databases we’ve seen emerge in the recent years are designed for this purpose. Their data model is particularly well-suited to store and to organize data where connections are as important as individual data points. Connections are stored and indexed as first-class citizens, making it an interesting model for investigations in which you need to connect the dots. In this post, we review three common fraud schemes and see how a graph approach can help investigators defeat them.
--AnalyticBridge.DataScienceCentral.com

Relational databases did not emerge in the 80s (SQL DBMSs did);
  • There is no "tabular data" (the relational data structure is the relation, which can be visualized as a table on a physical medium[2], and SQL tables are not relations);
  • Analysis is not a DBMS, but an application function (while database queries, as deductions, are an important aspect of analysis, and computational functions can be added to the data sublanguage (as in SQL), the primary function of a DBMS is data management)[3];
  • A data model has nothing to do with storage (storage and access methods are part of physical implementation, which determines efficiency/performance[4]).

Here, however, we will focus on the current revival (rather than emergence) of graph DBMSs claimed superior -- without any evidence or qualifications -- to SQL DBMSs (not relational, which do not exist) that purportedly "make it difficult to connect data scattered across multiple tables". This is a typical example of how lack of foundation knowledge and of familiarity with the history of the field inhibit understanding and progress[5].


Saturday, March 9, 2019

Fourth Order Properties Part 2: Association Relations in Database Design - An Example




Part 1 outlined fundamentals of fourth order properties (4OP) of a multigroup arising from relationships among its group members due to 1:1, M:1, M1:M2, and, generally, M1:M2:M3:...:Mn relationships among the groups' entity members. Fundamentals are commonly missing from database practice, as reflected in the exchange:

“We have Building, Room, and Bed entities. Logically, if this is in the scope of some hypothetical hotel, then each one of those entities is dependent on their parent to exist ... you cannot have a bed without a room. Also, that room wouldn't exist without its parent, Building. So, why have I rarely seen this identifying relationship introduced? When I was learning databases, everything was apparently "non-identifying". When is this type of relationship necessary, if at all? I see the issue arises when that BED can exist without a BUILDING. If you were to INSERT into the BED table, you are constraint [sic] to provide a building_id, as the building_id is part of that BED's primary key. Couldn't you avoid an identifying relationship by giving each table its own surrogate primary key? Is this the correct representation  of an identifying relationship? I could avoid that by just giving each table its own ID. At the end of the day, this is about IDENTIFYING relationships, not their existence, which is how I've been logically determining if something is an "identifying relationship" If that were the case, then any 1:N relationship could be "identifying" but that's not how you define identifying or non-identifying.”
“Interesting -- I’d never heard this term before. I’ve heard it referred to as a cached ID though, as that 2nd ID isn’t required, but may be beneficial for performance purposes. For this example with 3 levels it’s not a huge joint statement, but for some systems with 12 tables the joins get unpleasant. I’ve never started a system with this additional id, but I have added one later on once the need was there and the profiling led to this being the best solution for our specific situation. Usually though, just creating a view that does the joins for me has been easier. I’ll be curious what has led others to use this approach.”
It's not really introduced because it's way more towards academic than functional.”
--Reddit.com
Knowledge of the fundamentals would have obviated the question, the ad-hoc terminology, and the answers. Note in particular how -- notwithstanding the conceptual and logical nature of the question -- the first answer typically delves directly into implementation[1]. The second answer does not merit attention, except as indicator of the sad state of the industry.

Given a conceptual model, we shall now compare the database design proposed in the question with (1) conventional industry practice that includes "embedded foreign keys", and (2) assuming a true RDBMS, the unified representation using association relations of not just M1:M2, or, generally, M1:M2:M3:...:Mn 4OP relationships, but also the 1:1 and M:1 special cases[2].


Saturday, March 2, 2019

Fourth Order Properties Part 1: Association Relations vs. Foreign Keys




 “We have Building, Room, and Bed entities. Logically, if this is in the scope of some hypothetical hotel, then each one of those entities is dependent on their parent to exist ... you cannot have a bed without a room. Also, that room wouldn't exist without its parent, Building. So, why have I rarely seen this identifying relationship introduced? When I was learning databases, everything was apparently "non-identifying". When is this type of relationship necessary, if at all? I see the issue arises when that BED can exist without a BUILDING. If you were to INSERT into the BED table, you are constraint [sic] to provide a building_id, as the building_id is part of that BED's primary key. Couldn't you avoid an identifying relationship by giving each table its own surrogate primary key? Is this the correct representation  of an identifying relationship? I could avoid that by just giving each table its own ID. At the end of the day, this is about IDENTIFYING relationships, not their existence, which is how I've been logically determining if something is an "identifying relationship" If that were the case, then any 1:N relationship could be "identifying" but that's not how you define identifying or non-identifying.”

“Interesting -- I’d never heard this term before. I’ve hears it referred to as a cached ID though, as that 2nd ID isn’t required, but may be beneficial for performance purposes. For this example with 3 levels it’s not a huge joint statement, but for some systems with 12 tables the joins get unpleasant. I’ve never started a system with this additional id, but I have added one later on once the need was there and the profiling led to this being the best solution for our specific situation. Usually though, just creating a view that does the joins for me has been easier. I’ll be curious what has led others to use this approach.”

“It's not really introduced because it's way more towards academic than functional.”
--Reddit.com

Such questions, and ad-hoc terms like "identifying relationships"[1] come up because practice is driven by intuition and experience (if any), without the benefit of foundation knowledge[2]. Whether practitioners know/like it or not, a database is a formal computable representation of an informal conceptual model[3] and, therefore, data modeling (i.e., logical database design)[4] is impossible without (1) a well-defined and complete conceptual model and (2) a formal data model with which to formalize it as a logical model[5]and the two should not be confused[6]. Otherwise all bets are off.

Here's how foundation knowledge should have informed modeling and design.

View My Stats