Friday, January 21, 2022

READ MY LIPS: IF THERE'S NULLs, IT'S NOT RELATIONAL



“Let's say I want to store a list of movies that are stored on iTunes. For simplicity, we'll just store a few fields so that the film Avatar has these values:
ID: 354112018
Name: Avatar
Year: 2009
Synopsis: "From Academy Award®-winning director James Cameron comes Avatar, the story..."
However, sometimes the Synopsis is missing...and sometimes the Year is missing. Without giving it a second thought, I would probably create one table to store those four fields, something like this:
ID (INT)
Name (VARCHAR)
Year (INT NULL)
Synopsis (VARCHAR NULL)
Is there any advantage in 'further normalizing' the database so that, for example, I don't store any null values, such as:
Title
 TitleID
 Name

TitleSynopsis
 TitleID
 Synopsis

TitleYear
 TitleID
 Year
To me it seems like doing this would potentially create hundreds of extra tables (on a large database) and make inserts a nightmare -- I suppose a View could be created to flatten out the results so it's queryable, but even though I feel like it would require so much overhead. So is there any reason in the above case to normalize to remove nulls, or in general, what would be the case to do so, if there ever is one?”  --StackOverflow.com

Fallacies

That we see this in 2022 is testament to abysmal ignorance of fundamentals in the industry. Let's enumerate the fallacies:

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

LATEST POSTS

01/16 OBG: No Understanding without Foundation Knowledge Part 3 -- Debunking an Online Exchange 2

01/08 OBG: No Understanding Without Foundation Knowledge Part 2 -- Debunking an Online Exchange 1

01/01 Schema and Performance: Never the Twain Shall Meet

02/08 Aded the 2021 to POSTS page.

12/17 OBG: No Understanding Without Foundation Knowledge Part 1: Reviewing a Book Review

LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)
- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

------------------------------------------------------------------------------------------------------------------

  • SQL tables are not relations, NULLs being one very reason.
  • Missing data (and, therefore, NULLs) have nothing to do with storage.
  • "NULL values" is a contradiction in terms.
  • Avoidance of NULLs has nothing to do with further normalization.
  • Proliferation of relations is not a problem for the correct relational solution to missing data (without NULLs) and those relations are not those referred to.

How about the replies to the the above question?

“Splitting a single table into multilple tables like this is often called "vertical partitioning" so if you search on that you might find some use cases for and against. You would need to outer join to all of these tables which yields a NULL anyway. Personally this looks like it's overcomplicating things. If you had zillions of fields and lots were NULL, you could use various "sparse" implementations in various databases.”

  • Vertical partitioning (via relational projection) of relations should not be confused with correct relational design that is NULLless.
  • Outer join is not a relational algebra operation -- the correct relational design does not  necessitate it.
  • By implementing NULL and generating it on their own, all SQL DBMSs are not truly relational and do not support the relational solution to missing data.

“Dozens of published academic information modeling & DB design textbooks are online free in pdf & html. Read many. The most cogent information modelling method is Halpin's ORM2 (Object-Role Modelling), child of Nijssen's NIAM "fact based" method. ORM2's main book IM & relational DBs (2 editions) have chapters mapping it to other main methods (ER & pseudo- pre- ER. Presentations of the relational model per se are generally poor. Read Darwen's free intro to relational DB theory. stackoverflow.”

  • An overwhelming majority of the published material is wrong.
  • As a conceptual modeling approach, NIAM/ORM2 does not address missing data, which is a logical database design aspect.
  • Ditto for E/RM, which is problematic for other reasons.
  • Date's & Darwen's interpretations of the RDM differs from ours (McGoveran's) and does not support the latter, nor are we aware that they offer an alternative. 

“Whether to normalize your data, or use null, can be considered separate questions, which do not affect each other. Reasons for normalization can be to avoid redundancy (duplicate data), ensure only related information is in a particular table, simplify queries or avoid data modification issues. Whether to store NULL values or not, is more a matter of whether it makes sense for a value to be NULL in your data, rather than being a default value. For example, if it would be more correct to specify that data does not exist, rather that provide some default value.”

  • Designs with NULLs are not relationally correct andnormalization is not the correction.
  • Normalization (to 1NF) does not avoid redundancy and is relationally necessary but insufficient.
  • It never "makes sense for a value to be NULL" and default values are not a solution to missing data.

Debunking - References

David McGoveran and myself have written extensively about the critical problems with any multi-valued logic  (MVL) approaches, including NULL -- SQL's messy implementation of 3VL and the relationally correct solution to missing data without NULLs and I will not repeat it here. Instead, we refer the reader who does not but desired to understand the above to the pertinent writings. It's fair to say that you will not find this information anywhere else.

Note: Good intentions notwithstanding, Codd's 4VL was a blunder and might have served as inspiration for NULL, but the two should not be confused.

The Final NULL in the Coffin: A Relational Solution to Missing Data  (paper description)

Missing Data -- Many-valued Logics and NULL series 

NULL Value is a Contradiction in Terms 

 Missing Data -- Horizontal Decomposition (4 part series): my critique of Date's and Darwen's proposed solution to missing data without NULLs (that we believe they dropped)

THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS - A DESK REFERENCE FOR THE THINKING DATA PROFESSIONAL AND USER: my last book has a 'Missing data (NULL, many-valued logics)' section (book's description and excerpts from preface and David McGoveran's foreword

Nothing from Nothing (4 part series): McGoveran's  formal logic exposition of the problematics of multi-valued logics.

Up to a Point, Lord Copper: McGoveran's (with C. J. Date and H. Darwen) debunking of Tom Johnston's series of articles in support of the use of MVL and NULLs in SQL databases.

 

 

 

 

No comments:

Post a Comment

View My Stats