DATABASE DEBUNKINGS: Pred

Showing posts with label Pred. Show all posts

Sunday, February 15, 2026

FACTS, ENTITIES, AND BUSINESS RULES

“... In ORM there is no concept of an entity record (tuple), although relational tables can be automatically generated from an ORM model (furthermore, guaranteed to be fully normalized).” --Online comment

“Object Role Modeling (ORM) is a ...a fact-oriented modeling approach for specifying, transforming, and querying information at a conceptual level. Unlike [other modeling approaches] ... fact-oriented modeling is attribute-free, treating all elementary facts as relationships ... In practice, ORM data models often capture more business rules, and are easier to validate and evolve than data models in other approaches. --ORM.net

DATA MUDDLING

Follow @DBDebunk Follow @ThePostWest

Chris Date once published an article at the old DBDebunk titled “Models, Models, Everywhere, Nor Any Time to Think”. If you want to get a hold of what he meant then, you oughta do a search on the title now and see what you get.

The continuous proliferation of models is an indication and measure of the disregard, if not outright hostility of the industry to sound theoretical foundations. It keeps reminding me of a decades-old piece I posted in response to David Hay's critique of Ron Ross's then proposal of a “fact model” (yet another one) as an alternative to data model. It is more relevant than ever, which is why I decided to bring it up to date. The problem is so entrenched and widespread, that even those who try to address it fail to realize that they are victims of it too.

Hay correctly observed:

“In our industry, there is a strong desire to put names on things. This is natural enough, given the amount of information that we have to classify and deal with in our work. To give something a name is to gain control over it, and this is not necessarily a bad thing. The problem is when the name takes the place of true understanding of the thing named. Discourse tends to be the bantering of names, without true understanding of the concepts involved.”

In this industry, many of the names are just re-labeling, whether it fits or not. Here are a couple of exquisite examples of both cases:

“I was amused to read in [Ralph Kimball's] article that my own suppliers and parts database design was "a perfect, beautiful star schema!" When I first learned the term "star schema", my reaction was that a properly designed star schema would be nothing neither more, nor less than a properly designed schema per se (in other words, one that did obey those scientific principles of relational design that do exist). So to see RK say that my schema was in fact a star schema reminded me (I’m afraid) of Peter Chen’s original E/R paper, in which—among other things—he reinvented the concept of domains, but called them value sets, and then went on to analyze the relational model in terms of his own ideas and said “Look, domains are just value sets!” --C. J. Date

Note: Kimball's "star schema" is, of course, not a relational schema, but quite an attempt to avoid it, due to failure to distinguish application views of the database from the database schema.

WHAT MEANING MEANS: BUSINESS RULES, PREDICATES, CONSTRAINTS, AND SEMANTIC CONSISTENCY

Follow @DBDebunk Follow @ThePostWest

“If we step back and look at what RDBMS is, we’ll no doubt be able to conclude that, as its name suggests (i.e., Relational Database Management System), it is a system that specializes in managing the data in a relational fashion. Nothing more. Folks, it’s important to keep in mind that it manages the data, not the MEANING of the data! And if you really need a parallel, RDBMS is much more akin to a word processor than to an operating system. A word processor (such as the much maligned MS Word, or a much nicer WordPress, for example) specializes in managing words. It does not specialize in managing the meaning of the words ... So who is then responsible for managing the meaning of the words? It’s the author, who else? Why should we tolerate RDBMS opinions on our data? We’re the masters, RDBMS is the servant, it should shut up and serve. End of discussion.” --Alex Bunardzik, Should Database Manage The Meaning?

METALOGICAL PROPERTIES Part 2: Assertion Predicate

Follow @DBDebunk Follow @ThePostWest

In Part 1 we introduced in the conceptual model (CM) the metalogical designation property. It represents—in the absence of known shared defining properties of an entity type, the designation by a group's definer that an entity identifier (aka assigned name) or property value is a member of the group. Such a group is not a group of entities, but a group of name and property values. In the logical model (LM), it is formalized as a designation predicate (DP) and defines a domain.

In Part 2, we introduce the metalogical assertion property. It represents the assertion by an authorized database user that a specific entity, represented by a tuple, either does or does not correspond to an actual entity in the real world.

METALOGICAL PROPERTIES PART 1: Designation Property

Follow @DBDebunk Follow @ThePostWest

with David McGovern

One purpose of our contributions here is to suggest a vocabulary that avoids confusion not just within the formal logical level, but also between conceptual and logical terminologies, which is widespread in the industry and is exacerbated by limitations of natural language (NL). We use the following terminology in our approach to conceptual modeling:

Objects are:

- Primitive (basic entities);

- Compound:

- groups of related entities;

- multigroups (groups of related groups);

Properties are:

- Individual (of basic entities);

- Collective:

- Of groups: relationships among entities within a group;

- Of multigroups: relationships among groups within a multigroup.

Note: It is a McGoveran insight that relationships between objects at a lower aggregate level are properties of the object at the higher aggregate level which the former comprise (LOGIC FOR SERIOUS DATABASE FOLK, forthcoming; see draft chapters) http://www.alternativetech.com/ATpubs_dir.html For classification of properties as first, second, third and fourth order (1OP, 2OP, 3OP and 4OP) see RELATIONSHIPS AND THE RDM Parts 1-3. https://www.dbdebunk.com/2023/03/relationships-and-rdm-v2-part-1.html All such properties can be expressed logically in a FOPL-based relational data sublanguage as constraints, which is beyond the scope of this discussion.

ENTITIES, PROPERTIES AND CODD'S SLEIGHT OF HAND

Follow @DBDebunk Follow @ThePostWest

Note: This a revision of an earlier post

RDM is an application to database management of mathematical relation theory (MRT) consistent with simple set theory (SST) expressible in first order predicate logic (SST/FOPL) that is used to formalize symbolically conceptual models of reality as logical models for database representation.

In RDM a domain can "appear" in multiple relations: the domain represents an abstract property, attributes defined on it represent that property in contexts of specific entity groups that relations represent. For example, attribute SALARY in relation EMPLOYEES represents the property represented by domain MONEY in the context of entities of type Employee and attribute BUDGET in relation DEPARTMENTS represents it in the context of entities of type Department.

PREDICATE LOGIC, SEMANTICS AND RDM (sms)

Follow @DBDebunk Follow @ThePostWest

Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“As I have said many times, if the original relational model had been based on predicate logic and also the semantics and rules of definitions we'd all be better off now. It wasn't. Full stop.”
--Ronald Ross, LinkedIn.com

Assessing such arguments normally requires clarification of what exactly is meant by "the relational model". Ross does refer specifically to the "original" -- which we take to mean that introduced by Codd in 1969-70 -- but given the massive misuse and abuse in the industry, perceptions of it may well be corrupted (Nobody Understands the Relational Model Semantics, Relational Closure and Database Correctness). Moreover, there are many predicate logic (PL) systems and many ways of categorizing them (1st vs n-th order being only one way) -- we assume Ross means RDM is based on none.

DATABASE RELATIONS, TABLES AND SEMANTIC CONSISTENCY

Follow @DBDebunk Follow @ThePostWest

by David McGoveran with Fabian Pascal

Note: In "Setting Matters Straight" posts I debunk online Q&As that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“In a RDBMS, a table is columned rows, as in you treat individual rows as an actual entity while the columns are its attributes. In an excel tab, you can create a column, but it doesn't have to have all the same data types in that column, nor does one row have to represent one entity. It's more free form ... All in all, RDB is relational because it's column based rows and constrained to that format, while non relational can have free form like an excel. When you have rows that are uniform (constrained to what the column should be), you create entities as tables, and link them through columns to keep track of the relationships.”
--Quora.com

I posted this on LinkedIn as one of my "To Laugh or Cry?" items which, unlike "Setting Matters Right" posts, are beyond debunking. But the exchange that followed made me realize that there is, nevertheless, pedagogical value to it: it expresses something important, but poorly due to author's lack of foundation knowledge.

Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 2

Follow @DBDebunk Follow @ThePostWest

with David McGoveran

(Title inspired by Richard Feynman)

In Part 1 we explained that all database relations are, mathematically, relations, but not all relations are database relations, which are in both 1NF and 5NF and we agreed with a statement in a LinkedIn discussion ending as follows: "Update anomalies are not as big of a problem as an algebra where relations aren't closed under join". Unfortunately, update anomalies, closure, and how relational operators were defined are all interrelated and represent an even "bigger problem". Update anomalies are not "bugs", let alone irrelevant, but actually a reflection of that much bigger problem.

In this second part we delve into that problem.

Understanding Relational Constraints

Follow @DBDebunk Follow @ThePostWest

“The data in a relational database is stored in form of a table. A table makes the data look organized. Yet in some cases we might face issues while working with the data like repetition. We might want enforce rules on the data to avoid such technical problems. Theses rules are called constraints. A constraint can be defined as a rule that has to enforced on the data to avoid faults. There are three kinds of constraints: entity, referential and semantic constraints. Listed below are the differences between these three constraints:
1. Entity constraints -- primary key, foreign key, unique, NULL -- are posed within a table and used to enforce uniqueness and to define no value [respectively].
2. Referential constraints -- foreign key -- are enforced with more than one table for referring other tables for analysis of the data.
3. Semantic constraints -- datatypes -- are enforced in a table on the values of a specific attribute and help the data segregate according to its type. Example: name varchar2(30).”
--GeeksforGeeks.com

Before we tackle the main subject, let's get some misconceptions out of the way. As we have explained so many times:

Data is not "stored in a form of a table" -- it can be stored in any number of physical formats, at the discretion of DBMS designers and DBAs. Physical independence is a core advantage of the RDM.
A table does not "make the data look organized". Data is by definition organized -- be it relationally or not -- otherwise it would be random noise not data. A database relation can be visualized as a R-table, but tables do not play any role in RDM.
While some "repetition" (i.e., redundancy) is prevented by constraints (e.g., uniqueness), others are avoided by database design (e.g., 5NF DB relations).

And now to constraints.

TYFK: Facts, Properties, Relationships, Domains, Relations, Tuples

Follow @DBDebunk Follow @ThePostWest

Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

A statement from a 1986 book that "Data are facts represented by values -- numbers, character strings, or symbols -- which carry meaning in a certain context" triggered the following response on Linkedin:

“...In contrast, Date and Darwen (2000) say:
Domains are the things that we can talk about.
Relations are the truths we utter about those things.
Thus, the declarative sentence "Fred is in the kitchen." is a fact that links the domains Person[s] and Place[s] with the predicate "is in". The complete relation might be made up of three facts:
Fred is in the kitchen.
Mary is in the garden.
Arthur is in the garden.
This seems to be more precise than the 1986 statement.”

To which the book author responded:

“...back then we did not have the refinement, clarity, nor precision from people like Sjir Nijssen and Terry Halpin regarding facts, or elementary fact sentences, which today you and I know are the bedrock of data modeling. Facts are expressed in sentences (with domains and predicates).”

Unfortunately none of this is sufficiently clear and precise to prevent confusion and it inhibits understanding of the RDM.

RE-WRITE

Follow @DBDebunk Follow @ThePostWest

See: https://www.dbdebunk.com/2023/08/entities-properties-and-codds-sleight.html

Friday, November 27, 2020

OBG: Missing Data -- "Horizontal Decomposition" Part 2

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability of a sound foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may break long pieces into multiple posts, revise, and/or add comments and references.

In Part 1 we re-published a reader's response to "horizontal decomposition" -- Hugh Darwen's How to Handle Missing Information without Using NULLs -- in comparison to our The Final NULL in the Coffin: A Relational Solution to Missing Data). Here's Hugh's response.

TYFK: Misconceptions About the Relational Model

Follow @DBDebunk Follow @ThePostWest

“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:

Data is stored in tables called relations.

Relations can be normalized, [in which case] values saved are atomic values.

Each row in a relation contains a unique value.

Each column in a relation contains values from a same domain.”

Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

TYFK: What Is A Database Relationship?

Follow @DBDebunk Follow @ThePostWest

Note: This is a re-write of an earlier post. About TYFK posts (Test Your Foundation Knowledge) see the post insert below.

“Here two or more table[s] are related with each other. This is Database relationship. Database relationship is used a lot ... [in] relational database management systems ... shortly called RDBMS. Here is Join_data [sic] table and Interview_data table. For creating a relational database management system both of the table[s] must have a common field. Here Employee_ID is a common field ... Database relationship types: One-To-One relation, One-To-many relation, Many-to-many relation. Minimum one common field is essential in all the tables. The data type of common field and field size will be same in all the tables.”

First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

Fourth Order Properties Part 1: Association Relations vs. Foreign Keys

Follow @DBDebunk Follow @ThePostWest

“We have Building, Room, and Bed entities. Logically, if this is in the scope of some hypothetical hotel, then each one of those entities is dependent on their parent to exist ... you cannot have a bed without a room. Also, that room wouldn't exist without its parent, Building. So, why have I rarely seen this identifying relationship introduced? When I was learning databases, everything was apparently "non-identifying". When is this type of relationship necessary, if at all? I see the issue arises when that BED can exist without a BUILDING. If you were to INSERT into the BED table, you are constraint [sic] to provide a building_id, as the building_id is part of that BED's primary key. Couldn't you avoid an identifying relationship by giving each table its own surrogate primary key? Is this the correct representation of an identifying relationship? I could avoid that by just giving each table its own ID. At the end of the day, this is about IDENTIFYING relationships, not their existence, which is how I've been logically determining if something is an "identifying relationship" If that were the case, then any 1:N relationship could be "identifying" but that's not how you define identifying or non-identifying.”

“Interesting -- I’d never heard this term before. I’ve hears it referred to as a cached ID though, as that 2nd ID isn’t required, but may be beneficial for performance purposes. For this example with 3 levels it’s not a huge joint statement, but for some systems with 12 tables the joins get unpleasant. I’ve never started a system with this additional id, but I have added one later on once the need was there and the profiling led to this being the best solution for our specific situation. Usually though, just creating a view that does the joins for me has been easier. I’ll be curious what has led others to use this approach.”

“It's not really introduced because it's way more towards academic than functional.”

--Reddit.com

Such questions, and ad-hoc terms like "identifying relationships"[1] come up because practice is driven by intuition and experience (if any), without the benefit of foundation knowledge[2]. Whether practitioners know/like it or not, a database is a formal computable representation of an informal conceptual model[3] and, therefore, data modeling (i.e., logical database design)[4] is impossible without (1) a well-defined and complete conceptual model and (2) a formal data model with which to formalize it as a logical model[5]and the two should not be confused[6]. Otherwise all bets are off.

Here's how foundation knowledge should have informed modeling and design.

Understanding Conceptual vs. Data Modeling Part 4: Properties-object Modeling

Follow @DBDebunk Follow @ThePostWest

Revised 6/26/19.

In Part 1 and Part 2 we explained that when the RDM (1969-70) and the E/RM (1976) were introduced, there was no distinction between a conceptual and a logical level -- the conceptual-logical-physical distinction of levels of representation emerged in mid 80s. Only in 1980 did Codd specify three components of a formal data model -- structure, integrity, manipulation. While the RDM satisfies the specification, the E/RM does not: it is a conceptual modeling approach, weaknesses of which have been elaborated elsewhere[1]. In Part 3 we presented a common example of conceptual-logical conflation (CLC), and corresponding confusion of types of model (conceptual, logical, physical, and data).

As promised, here we outline a new conceptual modeling approach derived by David McGoveran from his work formalizing Codd's RDM. It makes an ontological commitment different from that by conventional modeling, which requires revision and extension of the RDM -- an objective of David's effort.

Understanding Conceptual vs. Data Modeling Part 3: Don't Conflate Reality and Data

Follow @DBDebunk Follow @ThePostWest

In Part 1 and Part 2 we explained that between 1975-81, when the E/RM and RDM were introduced, there was no distinction between an informal conceptual and a formal logical level. In 1980, however, Codd defined a formal data model and in the later 80s the conceptual-logical-physical levels of representation emerged. If applied to the two models:

Only the RDM satisfies the definition;
The E/RM can be used at the conceptual level to model reality, the latter can be used to model data at the logical level (i.e., formalize conceptual models as logical models for database representation).

Current practitioners, however, continue to confuse levels of representation and confuse/conflate types of model. So much so, that in my presentations I used to draw an imaginary line dividing the room into two sections, and move to the right section to discuss one level/model, and to the left section to discuss another.

Consider the question "does data modeling slow down an application development process?". I will set aside the notion of "speeding up" application development by skipping altogether "data modeling" (whichever way it is meant), and focus on the response.

Understanding Conceptual vs.Data Modeling Part 2: E/RM Models Reality, RDM Models Data

Follow @DBDebunk Follow @ThePostWest

Re-write 10/17/18
Revised 11/1/18

In Part 1 we explained that when the RDM and the E/RM were introduced, the distinct conceptual-logical-physical levels of representation had not yet emerged, and a data model had not yet been formally defined. But in 1980 Codd defined a formal data model as a combination of (1) data structures, (2) integrity constraints, and (3) operators on the structures[1], and later on the three-fold trinity of levels came into being. Given a conceptual level distinct from the logical, do the RDM and the E/RM satisfy the definition -- are they data models in today's terms?

Recall from Part 1 that the RDM has all three components and is defined in purely logical terms, so it is a data model. But the E/RM definition intermingles conceptual and logical terminology, and therefore is not consistent with two distinct levels. Moreover, as a data model E/RM is incomplete:

“The E/RM is not a data model as formally defined by Codd: no explicit structural component except sets classified in various ways, no explicit manipulative component except implied set operations, and very limited integrity (keys).”

--David McGoveran

Contrary to claims, Date does not exactly say that the E/RM is a data model:

“[It] is not even clear that the E/R "model" is truly a data model at all, at least in the sense in which we have been using that term in this book so far (i.e., as a formal system involving structural, integrity, and manipulative aspects). Certainly the term "E/R modeling" is usually taken to mean the process of deciding the structure (only) of the database, although [it does deal with] certain integrity aspects also, mostly having to do with keys ... However, a charitable reading of [Chen's original E/RM paper] would suggest that the E/R model is indeed a data model, but one that is essentially just a thin layer on top of the relational model (it is certainly not a candidate for replacing the relational model, as some have suggested).”[2]

Note that even if, charitably, the E/RM is considered a data model, it is not up to the RDM.

RE-WRITE

Follow @DBDebunk Follow @ThePostWest

See: https://www.dbdebunk.com/2018/09/designation-property-and-assertion.html

POSTS

Sunday, February 15, 2026

Monday, February 2, 2026

Sunday, January 25, 2026

Monday, February 5, 2024

Tuesday, January 9, 2024

with David McGovern

Thursday, August 17, 2023

Monday, June 19, 2023

Thursday, August 4, 2022

by David McGoveran with Fabian Pascal

Thursday, November 11, 2021

Saturday, September 4, 2021

Thursday, August 5, 2021

Thursday, June 10, 2021

Friday, November 27, 2020

Sunday, June 28, 2020

Sunday, May 10, 2020

Saturday, March 2, 2019

Saturday, November 3, 2018

Sunday, October 28, 2018

Saturday, September 29, 2018

Tuesday, September 11, 2018