Sunday, April 30, 2023

RELATIONSHIPS AND THE RDM V2 Part 3: SEMANTIC CONSTRAINTS



Note: This is a multipart re-write of a previous series that, when completed, is intended to replace it.

In Part 1 we documented the differences between mathematical and database relations (see table in Part 1). We attributed the fallacy that the RDM can express only one type of relationship -- between relations using FKs -- to the industry being unaware of the adaptation of math relations for database management. We intimated that some of the additional features of database relations express relationships other than between relations.

In Part 2 we identified the intra-group c-relationships (and the corresponding within-relation l-relationships) in our approach to conceptual modeling:

  • Properties-entities relationships

- general dependencies

  • Properties Relationships
  • Entities Relationships

- entity uniqueness
- functional dependencies (FD)
- entity supertype-subtypes relationships

and used a simple conceptual model (CM) of six entity groups to illustrate them:

Customers (cID, cname, FICO, discount)
Products (pID, pname, price)
Salesmen (sID, sname, sales, salary, commission)
Orders (oID, pID, cID, sID, date, amount)
Order Items (oID, iID, pID, quantity)

Database design is the use of a data model (DM) (here, RDM) to formalize conceptual models (CM) -- including c-relationships -- as logical models (LM) for database representation, so it must be able to convert the business rules (BR) that express those relationships in specialized natural language at the conceptual level to formal constraints in a FOPL-based data sublanguage at the logical level.

Our intention is to demonstrate that the RDM can express all c-these relationships, but we face a difficulty.

Sunday, April 23, 2023

THE DENORMALIZATION ILLUSION (t&n)



Note: "Then & Now" (t&n) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the disregarded evolution/progress of RDM, I am re-visiting my old debunkings, bringing them up to the current state of knowledge. This will enable you to judge how well arguments have held up and realize the increasing gap between industry stagnation --  and scientific progress.

DENORMALIZATION, PERFORMANCE & INTEGRITY

(Email exchange with reader originally published August 2002)

Then ...

“I'd like to comment on your other recent articles: on denormalization. Of course you prove that denormalization does not improve performance, because you pay for it by maintaining integrity. But, when people say that de-normalization improves performance, they usually mean just on one side. For example, I can merge DEPT and EMP tables into a third table DE and achieve a better query performance by replacing a join by a simple select from the new table. If this is the most frequent and most important operation in my application (vs. updates, inserts, deletes), then my overall performance will be improved (and that's what usually happens in DW). But if the opposite is true, then performance will suffer. I didn't see these considerations in your articles ...

Many people, yes, but not nobody. I always considered the cost of denormalization. I know many people in this field that do the same; however, I do agree with you that many people, especially those "younger" ones learning from more "modern" books on database design, especially those in the OO field, are not aware, and what's worse, don't even want to be aware.

That's exactly how I always thought and when I had discussions with people, that's what I always said to them (not that it made a big difference in their thinking). However, when I read your articles on this topic, I had another thought. As you always say (and again, I fully agree with you on this), we must always separate logical and physical. I always considered denormalization as one of the things done at the physical level. So, denormalization shouldn't even be your concern, because it has nothing to do with the relational model. The rule I always follow is that whatever I do at the physical level, it should not destroy my logical model, which must stay normalized. If I denormalize to achieve some performance gains for a selected set of functions, then I do pay for it by writing additional logic to preserve the integrity and by creating views that represent the entities on my logical model, which I had to "destroy". So as long as I separate these two levels, I don't think I'm in any conflict with the relational model. Of course if DBMS gave me more options in physical design while protecting the integrity of my logical model, I wouldn't have to do this myself.

Theoretically, I think the way you do, and that's why I enjoy reading your columns. But I also have to deliver practical results to my users. Unfortunately, I can't go to my users and tell them that their response time is slow because of Oracle's technology. And I don't believe screaming at Oracle will do me any good either (and yes I know what you will say to this). So until that mysterious technology you mentioned many times is implemented, I have to do what I can.”

Sunday, April 16, 2023

RELATIONSHIPS & THE RDM V2 PART 2: INTRA-GROUP RELATIONSHIPS



 

Note: This is a multi-part complete re-write of a previous series which, when completed, is intended to replace it.

In Part 1 we attributed the fallacy that the RDM can express only one type of relationship -- between relations, using FKs -- to practitioners being unaware of the adaptation of math relations to database management and missing the additional features of database relations. We documented the differences in features between math and database relations (see the table in Part 1) and intimated that some of the additional features express relationships other than those between relations using FKs (which we leave out in this discussion).

In this Part 2 we identify the c-relationships and use a simple conceptual model (CM) of six entity groups:

Customers (cID, cname, FICO, discount)
Products (pID, pname, price)
Salesmen (sID, sname, sales, salary, commission)
Orders (oID, pID, cID, sID, date, amount)
Order Items (oID, iID, pID, quantity)

to illustrate them (to recall, we prepend 'relationship' with c- and l- when we use the term at the conceptual and logical levels, respectively).

Saturday, April 8, 2023

MISSING DATA: RDM VS SQL -- A REAL WORLD COMPARISON (t&n)



Note: "Then & Now" (t&n) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the disregarded evolution/progress of RDM, I am re-visiting my old debunkings, bringing them up to the current state of knowledge. This will enable you to judge how well arguments have held up and realize the increasing gap between industry stagnation -- and scientific progress.

(This a revised version of an earlier post with clarity improvements).

Q: “What would you suggest for a datetime field where the value is not known and should therefore be not-applicable?”
A: ”NULL sounds good to me.”
While searching through records I came across an old consulting project involving the migration of a neo-natal research database from Focus -- an old hierarchic DBMS -- used to record extensive details about hundreds of monthly births at a university hospital for more than 20 years. The person who had designed the Focus database was the only one who knew and understood its complexity sufficiently to maintain it. Each time a researcher needed some subset of data to analyze, he would extract it and serve it upon request. Aside from the inefficiency of the process, the person was retiring at a time when hierarchic DBMSs reached the end of their usefulness, Focus experts were already few and expensive and "relational" (read: SQL) was the dominant fad. 

View My Stats