Thursday, August 20, 2020

TYFK: Relations, Tables, Domains and Normalization

Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:

  • Data is stored in tables called relations.
  • Relations can be normalized. In normalized relations, values saved are atomic values.
  • Each row in a relation contains a unique value.
  • Each column in a relation contains values from a same domain.”

DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

-12/24/20: Added 2021 to the
POSTS page

-12/26/20: Added “Mathematics, machine learning and Wittgenstein to LINKS page

- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- The PostWest blog for monthly samples of global Antisemitism – the only universally acceptable hatred left – as the (traditional) response to the existential crisis of decadence and decline of Western  civilization (including the US).
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.



No data model compliant with Codd's definition thereof -- a theory-based combination of structure/integrity and manipulation -- other than the RDM has been formalized (those who pretend otherwise, specifiy -- precisely, please! -- the components of any such model and the theory on which it is based).

It is the dual theoretical grounding of the RDM -- simple set theory (SST) expressible in first order predicate logic (FOPL) -- that makes it "scientific", namely it enables a relational database system -- a RDBMS and properly designed database -- to ensure correctness, as well as a host of other advantages.

Most practitioners believe SQL DBMSs are relational, but they are not; neither are databases designed as required by the RDM. While SQL was intended to be a relational data sublanguage, it violates the RDM, it is not just a data sublanguage, and is a poorly designed language. 

While a database relation can be displayed as a R-table on some physical medium -- tuples as rows, attributes as columns -- it is not a table: the row-column arrangement on the medium plays no part in the RDM.

The RDM is mute on how relation data is represented in storage -- any physical structure is permitted, provided it is not exposed to applications and users -- which is referred to as physical independence (PI).

Note: A SQL table, while not a relation, is a logical structure, the data of which may be stored (base) or not (view). A direct image representation as an indexed sequential file is  common, but the RDM neither mandates, nor prohibits it, or any other implementation. SQL DBMSs, however, usually restrict to it.

“There are many physical structures that could be used to represent a logical relation in storage (e.g., a linked list with multiple orderings, or an indexed file storing the join of the relations, all maintained transparently by the DBMS), but products do not offer such choices -- only one is usually possible (essentially one or two flat files, possibly ordered by some key in SQL DBMSs). Because of this 1:1 mapping of a logical model to one specific physical representation, practitioners end up modifying the former and manifest a change in the latter, creating the (false) impression that denormalization improves performance.” --David McGoveran
Industry misconceptions nowithstanding, a database relation is by definition in both first normal form (1NF) and fifth normal form (5NF), otherwise it is not a relation and all bets are off.
  • 1NF means attributes draw their values from simple domains, the values of which are treated as atomic ("non-decomposable") by the data sublanguage (i.e., even if they have internal structure/components, they are not accessible to the language).
  • 5NF means the only attribute dependencies that hold in a relation are functional dependencies of the non-key attributes on the primary key (PK) (informally, the relation represents (facts about) a group of entities of a single type).
5NF implies 1NF. 

A relation attribute
represents either a property that describes entities (in which case it is non-key), or a name that identifies (i.e., references) them (in which case it is a PK).

The Debunking

  • The popular model is the "SQL model", confused with RDM, which it is not (and, strictly speaking, is not even a proper data model by Codd's definition); no other Codd-compliant data models have been formalized).
  • The RDM is based on SST expressible in FOPL; its structure/integrity is the database semantically constrained relation, which is displayable as -- but is not -- a R-table; tables play no part in RDM.
- tuples display as rows and each has a unique PK value;
- attributes display as columns -- each draws its values from a simple domain.
  • Data in relations are not stored in tables, which are a logical -- not physical storage -- structure; a direct image representation as a sequential file is the storage commonly implemented by SQL DBMSs.
  • By definition relations are not only normalized (in 1NF) -- attributes are defined on simple domains with non-decomposable values -- but also fully normalized (in 5NF).

No comments:

Post a Comment

View My Stats