Saturday, November 30, 2019

TYFK: 5NF, Association Relations and Join





Assume a conceptual model of a multigroup consisting of two related entity groups, Customers and Orders, where a customer can issue multiple orders. The conventional logical database design is:
CUSTOMERS
===============================================
| CID | NAME     | AGE | ADDRESS   | SALARY   |
-=====-----------------------------------------
|   1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|   2 | Khilan   |  25 | Delhi     |  1500.00 |
|   3 | Kaushik  |  23 | Kota      |  2000.00 |
|   4 | Chaitali |  25 | Mumbai    |  6500.00 |
|   5 | Hardik   |  27 | Bhopal    |  8500.00 |
|   6 | Komal    |  22 | MP        |  4500.00 |
|   7 | Muffy    |  24 | Indore    | 10000.00 |
-----------------------------------------------

ORDERS
===================================
| OID | DATE       | CID | AMOUNT |
-=====-----------------------------
| 102 | 2009-10-08 |   3 |   3000 |
| 100 | 2009-10-08 |   3 |   1500 |
| 101 | 2009-11-20 |   2 |   1560 |
| 103 | 2008-05-20 |   4 |   2060 |
-----------------------------------
where ORDERS.CID is an "embedded" foreign key (FK) referencing the primary key (PK) CUSTOMERS.CID.

Consider the query "For all orders, find the CID, name, OID, amount, and date" that applies a join of the two relations on CID. In SQL:

SELECT c.cid,c.name,o.oid,o.amount,o.date
FROM customers c
INNER JOIN orders o
ON c.cid = o.cid;
with the result displayed by the table:
====================================================
| C.CID | C.NAME   | O.OID | O.AMOUNT | O.DATE     |
-=======------------=======-------------------------
|     2 | Khilan   |   101 |     1560 | 2009-11-20 |
|     3 | Kaushik  |   102 |     3000 | 2009-10-08 |
|     3 | Kaushik  |   100 |     1500 | 2009-10-08 |
|     4 | Chaitali |   103 |     2060 | 2008-05-20 |
----------------------------------------------------
Note: A table is just a tabular display of a relation and the two should not be confused[1,2]. Bear in mind that SQL tables are not relations.

It may surprise you to know that both the design and the result are problematic from a relational standpoint.

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE 

Up to 2018, DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. In 2018 that website was discontinued. The content of this site is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. Thank you. 

NEW

  • 10/26/19: The POSTS page now links to all 2012-2018 posts (to be updated annually at year-end). Except for 2017, the (italicized) links are to abstracts of my columns @All Analytics site, which was discontinued (see below).
  • 10/26/19: Updated and cleaned up the WRITINGS page.
  • 08/09/19: Following my series of posts on data sublanguage (Parts 1-4), I have revised for consistency the corresponding section of paper #2 in the Understanding the Real RDM series, Logical Access, Data Sublanguage, Kinds of Relations, and Database Redundancy and Consistency, which is available for ordering from the PAPERS page.

LATEST PUBLICATIONS (order PAPERS and BOOKS) 

USING THIS SITE 

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search. 
  • Following the discontinuation of AllAnalytics site, the links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

SOCIAL MEDIA 

I deleted my Facebook account. You can follow me:

  • @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture posts, and my exchanges on LinkedIn.
  • @The PostWest blog: Evidence for Antisemitism/AntiZionism – the only universally acceptable hatred – as the (traditional) response to the existential crisis of decadence and decline of Western (including the US)
  • @ThePostWest Twitter page where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

------------------------------------------------------------------------------------------------------------------

Association Relations vs. "Embedded" FKs


In conventional database design many-to-one (M:1) relationships among groups such as the one between ORDERS and CUSTOMERS are represented by "embedded" FKs, while many-to-many (M:N) relationships are represented by association relations. But M:1 is a special case of M:N where N=1 and, as we explained elsewhere, this difference in representation is arbitrary -- an accident of history that lacks formal justification; there are actually advantages to representing M:1 relationships by association relations too[3] -- here's an instance where it could be helpful.


In which case the base design should be:


CUSTOMERS
===============================================
| CID | NAME     | AGE | ADDRESS   | SALARY   |
-=====----------------------------------------- 
|   1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|   2 | Khilan   |  25 | Delhi     |  1500.00 |
|   3 | Kaushik  |  23 | Kota      |  2000.00 |
|   4 | Chaitali |  25 | Mumbai    |  6500.00 |
|   5 | Hardik   |  27 | Bhopal    |  8500.00 |
|   6 | Komal    |  22 | MP        |  4500.00 |
|   7 | Muffy    |  24 | Indore    | 10000.00 |
-----------------------------------------------
ORDERS
=============================
| OID | DATE       | AMOUNT |
-=====-----------------------
| 102 | 2009-10-08 |   3000 |
| 100 | 2009-10-08 |   1500 |
| 101 | 2009-11-20 |   1560 |
| 103 | 2008-05-20 |   2060 |
-----------------------------
CUSTOMER_ORDERS
=============
| OID | CID |
-===========-
| 101 |   2 |
| 100 |   3 |
| 102 |   3 |
| 103 |   4 |
-------------

Fifth Normal Form, Join, and Relational Closure


According to the conventional understanding of the RDM, a relation is in at least first normal form (1NF)[4]: it has only attributes drawn from simple domains
[5] (i.e., no "nested relations"). This is the formal way of saying that a relation represents at the logical level a group at the conceptual level that has only individual entities -- no groups thereof -- as members. 1NF is required for system-guaranteed logical validity (i.e., query results follow provably from the database (the facts represented by result relation tuples are logical implications of the facts represented by the tuples of the base relations in the database); but as we also explained, semantic consistency of the result with the conceptual model requires that relations are by definition in both 1NF and fifth normal form (5NF)[6]. Formally, the only dependencies that hold in a 5NF relation are functional dependencies (FD) of non-key attributes on the PK (there is 1:1 relationship between each non-key value and the corresponding PK value). This is the formal way of saying that a relation represents facts about a group of entities of a single type (no "bundling")[7].

Note: According to an yet unproven McGovern conjecture, the three design principles required by the theoretical foundation of the RDM (SST/FOPL) jointly imply the Principle of Full Normalization (POFN)[8]. Otherwise put, designs adhering the principles produce fully normalized databases -- explicit normalization and further normalization are required only to repair poor designs[7,9].

This means that to ensure relational closure under the POFN mandate users must design 5NF base relations, and the relational algebra (RA) operations, join included, must derive 5NF relations.

In our example, the CUSTOMERS and ORDERS base relations are in 5NF. But what about result of the join? It bundles facts about entities of two types, non-key attributes are FD on components of the composite FK and, thus, not in 5NF and not a relation.

Note that this is not a SQL flaw: the join operation as currently defined does not preserve relational closure under the POFN mandate! A critical implication is that updating join views might be anomalous even when the base relations from which they are derived are in 5NF and, thus, present no anomaly risk of their own, defeating logical independence[10,11].


Test Your Foundation Knowledge


For a comprehensive explanation of (and a proposed solution for) this problem -- and others -- you will have to wait for David McGoveran's book[12].

But in the meantime, given (1) the revised design and (2) the POFN mandate, what do you think should the relational result of the above query be? (for a hint check out[13]).


References

[1] Pascal, F., What Relations Really Are and Why They Are Important.


[2] Pascal, F.,  Understanding Relations series.

[3] Pascal, F., Fourth Order Properties series.

[4] Pascal, F., First Normal Form in Theory and Practice series.

[5] Pascal, F., Simple Domains and Value Atomicity.

[6] Pascal, F., Relational Database Design, Logical Validity, and Semantic Consistency.

[7] Pascal, F., Normalization and Further Normalization.

[8] Pascal, F., Database Design: What It Is and Isn't.

[9] Pascal, F., The Costly Illusion: Normalization, Integrity, and Performance.


[10] Pascal, F., On View Updating (C. J. Date and D. McGoveran).

[11] McGoveran, D., Can All Relations Be Updated?


[12] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK (draft chapters), forthcoming.

[13] Pascal, F., The Last NULL In the Coffin: A Relational Solution to Missing Data.




No comments:

Post a Comment

View My Stats