DATABASE DEBUNKINGS

Saturday, August 20, 2022

DATABASE RELATIONS, DATABASE DESIGN & CORRECTNESS (sms)

Note: In "Setting Matters Straight" posts I debunk online Q&As that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“...The relational model organizes data through relations (aka tables). You then normalize it in one of six forms. By normalizing data you:
- Reduce redundancy
- Ensure consistency
- Optimize for atomic inserts, updates and deletes
The biggest drawback ... are keys that let you join different tables across multiple systems.”

--LinkedIn.com

THE VOCIFEROUS IGNORANCE HALL OF SHAME (t&n))

Follow @DBDebunk Follow @ThePostWest

(First published in 2006)

Note: "Then & Now" (T&N) is a new version of the old "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

(Nothing to add on re-publication.)

There is massive ignorance in data management, but the vociferous kind (VI) is the purview of a special breed that is characterized by one or more of the following:

1. Lack of knowledge and understanding of, and appreciation for data fundamentals in general and relational concepts, principles and methods in particular.
2. Unwillingness to let it stand in the way of pronouncing extensively on the subject.
3. Inability and/or unwillingness to respond to evidence of ignorance and/or to reason.
4. Lack of interest — often admitted — in truth and correctness.
5. Focus on self promotion and appeasement of the industry by riding fads, or telling (uninformed) audiences what they want to hear.

The combination of 1 and 2 characterizes the Unskilled and Unaware of It, 2 is the vociferous part. Frankfurt defined 4 as bullshit. 5 is just self-aggrandizing.

DATABASE RELATIONS, TABLES AND SEMANTIC CONSISTENCY

Follow @DBDebunk Follow @ThePostWest

by David McGoveran with Fabian Pascal

Note: In "Setting Matters Straight" posts I debunk online Q&As that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“In a RDBMS, a table is columned rows, as in you treat individual rows as an actual entity while the columns are its attributes. In an excel tab, you can create a column, but it doesn't have to have all the same data types in that column, nor does one row have to represent one entity. It's more free form ... All in all, RDB is relational because it's column based rows and constrained to that format, while non relational can have free form like an excel. When you have rows that are uniform (constrained to what the column should be), you create entities as tables, and link them through columns to keep track of the relationships.”
--Quora.com

I posted this on LinkedIn as one of my "To Laugh or Cry?" items which, unlike "Setting Matters Right" posts, are beyond debunking. But the exchange that followed made me realize that there is, nevertheless, pedagogical value to it: it expresses something important, but poorly due to author's lack of foundation knowledge.

MISSING DATA AND MULTI-RELATION QUERY RESULTS (t&n)

Follow @DBDebunk Follow @ThePostWest

Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

On NULLs and Multi-Table Relvars

(first published 04/05/02)

"I had a question about the missing-values suggestion in PRACTICAL ISSUES IN DATABASE MANAGEMENT, page 234. You write:

"Table operations would have to be modified to yield results with as many tables as there are types of propositions with only known values."

How would this be represented in a language like Tutorial D, where relvars are required to be strongly typed? One possible idea is to make use of type inheritance. Suppose I had a domain of tuple values {x,a,b,c} (all integers, say) where x is not allowed to be missing but a, b, and c are allowed to be missing. Suppose we extended the domains of a, b, and c with an "imaginary" special value that we will never represent, which I will show for diagram purposes only as '?'. Then the domain can be split into parts:

XABC {x,a,b,c} possrep: {X: int, A: int, B: int, C: int}
XAB {x,a,b,'?'} possrep: {X: int, A: int, B: int}
XAC {x,a,'?',c} possrep: {X: int, A: int, C: int}
XBC {x,'?',b,c} possrep: {X: int, B: int, C: int}
XA {x,a,'?','?'} possrep: {X: int, A: int}
XB {x,'?',b,'?'} possrep: {X: int, B: int}
XC {x,'?','?',c} possrep: {X: int, C: int}
X {x,’?','?','?'} possrep: {X: int}

Using Mr. Date's specialization by constraint idea, we can inherit all the subtuple types from the main tuple type. Updates could make a tuple change type. A relation of relations of XABC type could be used to return results of a query. Each relation within the relation would contain one subtype.

However, the exponential explosion of possible subtypes would be very difficult to handle, practically speaking. As you admit in your book, a real DBMS might have to handle thousands of small subtables. This cannot be passed off as an "implementation detail" since table operations "yield results" at the user presentation level. No matter how efficient the underlying system might be, this seems unacceptable. Perhaps we have to fall back on default values after all."

RELATIONS, DATABASE RELATIONS AND TABLES (sms)

Follow @DBDebunk Follow @ThePostWest

Q: “What is a relation in database?”

A: “Relational databases were so named in 1970 by computer scientist E. F. Codd because the tables are themselves relations, which is a mathematical term. What makes a relation (aka a table) a relation? Basically:
A relation has a heading, which names a finite set of columns.
Columns are defined by their name and their type.
A relation has a finite set of tuples (aka rows), and every tuple has the same set of columns (i.e. same name and type) as those named in the heading.
Being finite sets, both the set of columns in the heading and the set of tuples in the relation have no duplicates and no inherent order.
See? There’s nothing about relationships between tables in the definition of a relation. You could have a relational database that contains just one relation. If there’s any relationship described in a relation, it’s actually the relationship between the columns within a relation. That is, the value "Pittsburgh" goes with the value "Steelers" on a given row because the relation is defined as "pro football teams by city" and therefore there’s a linkage between some values in the set of football teams and the set of city names.” --Quora.com

REPEATING GROUPS AND 1NF (t&n)

Follow @DBDebunk Follow @ThePostWest

09/19/23: For the latest on this subject see: FIRST NORMAL FORM - A DEFINITIVE GUIDE

“A commonly used example of a table that is not in 2-NF is one with repeated attributes (i.e. child1, child2, child3). However, after examining the definition of 2NF in your book PRACTICAL ISSUES IN DATABASE MANAGEMENT, it seems to me that tables such as these do in fact satisfy 2NF. Am I missing something?” --Reader

ORDER & RELATIONAL DATABASES (sms)

Follow @DBDebunk Follow @ThePostWest

Note: In "Setting Matters Straight" I post on LinkedIn online Q&As that involve fundamentals under the header "What's Right and Wrong with this Database Picture" and then debunk them here. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “I'm not sure what this means: "The order of the rows and columns is immaterial to the DBMS?" -- could anyone explain?”

A: “It means two things:
The engine is under no obligation to insert new rows immediately following the previously inserted row(s)... During processing of selects, the optimizer is free to use any index it finds efficient to use or none at all... For this reason, if the order of returned data is important to your processing, then you must include an ORDER BY clause.”

Q: “How do you reorder fields in the database?”

A: “Depends on how you define "reorder". What view of your data are you trying to set the order. Are you in Table Design view? ... Are you looking at form? The answer is different depending on what you are referring to.”
--Quora.com

NO RDBMS WITHOUT RELATIONAL DOMAINS (obg)

Follow @DBDebunk Follow @ThePostWest

The following is an email exchange with a reader and DBMS designer.

ON DATA TYPES AND WHAT A DBMS IS

(originally published in 2001)

Reader:
"I would like to hear your (or Date's) opinion on The Suneido Database … it seems to me self-contradictory. They aren't typed ... so how can they define operators, or even the idea of domains. They also say they include administrative commands, which as far as I understand isn't allowed in the THIRD MANIFESTO. While they do not claim to be an implementation of the Manifesto, their claims that their database language was created by CJ Date do not sound appropriate."

"They don't know what [domains (distinct from programming data types)] are and what their function in the RDM is. That's common for all DBMS vendors, the claims of which should be always taken with more than a grain of salt."

RELATION PROLIFERATION (sms)

Follow @DBDebunk Follow @ThePostWest

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “How do I avoid too many relations in databases?”

A: “You don’t. Every relation is there to store meaningful data, hopefully you do not define database relations for data that are not to be stored in your database.”

A: “By following proper design principles. Normalization, standard data patterns, and progressing from logical to physical always. Never denormalize (or avoid normalizing in the first place) because performance never trumps accuracy. It really doesn't matter how fast you get the wrong answer.”
--Quora.com

RELATIONAL DATABASES & SET THEORY (sms)

Follow @DBDebunk Follow @ThePostWest

Q: “To what extent is relational database theory related to set theory?”

A: “Relational database theory is indeed closely derived from set theory. Many operations in relational data are directly related to common operations one does with sets. In fact, SQL has keywords for them that should sound familiar to someone who has just taken a class in Discrete Mathematics:
UNION
INTERSECT
DIFFERENCE (called MINUS in Oracle)
Even the structure of a table is set-oriented. A table is a set of rows, and a row is a set of columns, and those columns must match the set of columns defined in the table's header.”

--Quora.com

QUOTA QUERIES (sms)

Follow @DBDebunk Follow @ThePostWest

Note: "Setting Matters Straight" (SMS) is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs).

Q: “How do you return the most recent record in SQL?”

A: “There are many ways of doing it. I would suggest (first thing came to my mind):

Select Top 1
from YourTable
order by TablePrimaryKey Desc;”

A: “If you mean "the last inserted record which has no datetime stamp field" ... you have a few options.
If you cannot use date/time -- your next best bet would be an auto-increment/sequence field, which assigns increasing numbers to each inserted record.
If that’s not available, you would have to rely on business logic e.g. order # or some such.

Some vendors, like Oracle, provide ROWID pseudocolumn for each record which might help in some quick’n’dirty cases -- it is not guaranteed to be sequential but could be (e.g., when table has had no DELETE operations).” --Quora.com

If you don't know, I set matters straight @dbdebunk.com.

KEYS & INDEXES (sms)

Follow @DBDebunk Follow @ThePostWest

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals to encourage readers to test their foundation knowledge, which they can then compare with our debunking here, where we confirm what is correct and correct what is fallacious (with clarifications, wherever necessary). For in-depth treatment check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs).

Q: “What is the difference between a primary key, a unique key, and an index in databases?”

A: “Unique key is a field (or fields) with a set of unique values; the uniqueness is usually enforced with UNIQUE constraint. There might be one or more per table. Every PRIMARY key is always a unique key; there should be only one per table. It uniquely identifies record, and is used to enforce integrity - entity integrity, and, in tandem with FOREIGN key, referential integrity. Index is a data structure to facilitate records search. It might be created on PRIMARY key (best practice), unique key or any other field or combination thereof in the table. The limit on how many indices a table might have is defined in RDBMS implementation. An index might - or might not - speed up some queries.”

A: “The primary key is inherently indexed and unique and is the cross reference to related tables. Often the best primary key is an auto number integer as any value entered by humans is subject to error or delay that can be challenging to manage in the user interface ... whereas an auto number is assigned immediately upfront and eliminates any possible record conflict in tables during multi user entries. A unique key is somewhat of an informal definition. My view is that it is a definition of a field that is not being used as the primary key, but is unique unlike i.e. Last Name -- for instance a social security number field. So it is not the primary key as it is not the field/value being used to cross reference to related tables but it is unique in the table.”

A: “A primary key is a unique, non null value which can identify every tuple (row in the table) uniquely. A unique key/column/constraint ensures that no two rows contain the same value (almost the same as primary key). Unless specified explicitly for the column configuration, a NULL is a valid value for column with unique constraint. A index can be thought of as the appendix at the end of the book. The information is sorted in specific order so that look up is easy and it points to the location that is being searched for.”

--Quora.com

ENTITIES & RECORDS (sms)

Follow @DBDebunk Follow @ThePostWest

Q: “What is the relationship between an entity and a record?”

A: “In the context of a database design, an ‘entity’ is a type or category of persons, places, things or events. It’s a collectivisation of the nouns in a system about which you wish to keep data. For example, Employee might be the name of an entity in your system. A ‘record’ is a collection of data about a specific entity, a particular person or place, an identifiable thing, or a single event. For example, Name: ‘Dave Voorhis’, StartYear: 2019, Salary: £1,398,293 might be a record of one Employee entity in your system.”

A: “Database, file, and recordset are basically the same thing. They are collections of information or data. Each database or file or recordset typically has some sort of common purpose or definition. Like a database (relational, hierarchical, etc.) of data of a business process. A File is again a collection of data such as all transactions to be posted. A recordset is also basically a file.

Entity and table are basically the same thing. While you have the grouping of all the data, and entity (logical view) and a table (physical view) are the same. As Dave said, it is a logical grouping of a specific piece of data.

File, recordset, record, row or line are basically the same. A .csv file is a grouping of records. A file is a grouping of records. A row is an individual grouping of data from a relational database.

The last is element or attribute or field. This is the individual piece of data like Transaction_Amount or First Name.”
--Quora.com

A simple and the answer oversimplifies. But things seem simple only in the absence of foundation knowledge. Practitioners use different terms for the same thing, or the same word for different things, but that must be corrected, not accepted or validated.

RELATIONSHIPS: UNIQUENESS & ATTRIBUTE CONSTRAINTS (tyfk)

Follow @DBDebunk Follow @ThePostWest

Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“A unique constraint is a type of column restriction within a table, which dictates that all values in that column must be unique [and] allows null values ... a null is the complete absence of a value (not a zero or space). Thus, it is not possible to say that the value in that null field is not unique, as nothing is stored in that field.”
--Techopedia

This is one of my recent "What's Wrong with this database picture" posts on LinkedIn.

Misconceptions

In the RDM a uniqueness constraint:

Should not be viewed solely as a "column restriction within a table'.
Does NOT allow SQL "NULLs" (not "NULL values"), which have nothing to do with storage.

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 6: DEBUNKING AN ONLINE EXCHANGE 5 (obg)

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability offered by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06) DBDebunk.com, so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references, which I enclose in square brackets).

A 2001 review of my third book triggered an exchange on SlashDot. This six-part series comprises my debunking at the time of both the review and the exchange in the chronological (slightly out of the) order of the original publication.

Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com

Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com

Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com

Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com

Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com

Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

CLARIFICATIONS ON A DISCUSSION OF MY BOOK PART 2

(originally posted 2/21/01)

In Part 1 debunked a review of my book @Slashdot.Org. In parts 2-5 I tackled the discussion generated there by the review. In this last part I focus on the discussion of data hierarchies covered in chapter 7 of my book [the in-vogue re-emergent graph fad].

“Chapter 7 discusses data hierarchies and trees. In a nutshell: there are no trees in SQL. The author is distressed by this. Given that a foreign key is basically a pointer, you can store trees in databases, it might not be pretty and there may not be easy way to read them and it might not be a good thing to do - but if you feel the need then get right in there. Of course I could be totally wrong about this.”

Confusing keys with pointers is one of the major errors many practitioners make ]. One intentional core advantage of the RDM is precisely that it prohibits pointers -- both physical and, as in object-orientation, logical. Exposing pointers to users has caused many unnecessary problems and complications, but offered no benefit (Don't Mix Pointers and Relations and Don't Mix Pointers and Relations - Please! in Date's RELATIONAL DATABASE WRITINGS 1994-1997). There is an easy way to demonstrate that relational keys are not, like object IDs (OID), pointers, but values: they represent uniquely identifying names/attributes of rel world entities. Pointers are system-generated internals and have no real world counterpart. The desirability of a data model that produces logical models that are faithful representations of the real world, without adding artifacts of their own. Indeed, as Date points out in Why The Object Model' is Not a Data Model in his above-mentioned book, the fact that "in the object world all the references to objects are by means of their corresponding OIDs explains why -- as is well known -- OO systems typically provide (a) two different equality comparison operators, equal OID vs. equal value and (b) two different assignment operators, assign OID vs. assign value. Note the added complication -- what is the benefit?

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 5: DEBUNKING AN ONLINE EXCHANGE 4 (obg)

Follow @DBDebunk Follow @ThePostWest

Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com

Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com

Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com

Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com

Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com

Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

Slashing a Slashdot Exchange - Part 1

(first published @DBAzine.com in 2001)

I was recently contacted by a reporter for an interview. When I expressed my disappointment with the trade media’s tendency to regurgitate vendor marketing claims instead of assessing them, he admitted "that is what happens about 98 percent of the time", but added "There are some outlets with a good piece from time to time that deal with serious architecture issues", mentioning SlashDot as one of them.

There is, of course, a Catch 22 here: to judge the seriousness of such outlets, foundation and substantive knowledge is necessary in the first place. And, alas, reporters possess even less of it than vendors and users (see, for example, The Ignorance Mechanism, On Trade Media’s "Balance"),
without which sources may appear serious even when they are nothing of the sort. As luck would have it, I ran into a good opportunity to prove this point for SlashDot. It so happened that shortly after my exchange with the journalist, Database Debunkings experienced a sudden ten-fold increase in traffic. Now, [given that my target audience is thinking practitioners,] were my material to suddenly become "hot", I would worry as to where I did go wrong. But the odds for that are rather slim and, fortunately, there was no need for concern: an email from a reader informed me that "there recently was an article posted to SlashDot.org which refers to Dbdebunk.com and Mr. Pascal/Date" and "There [were] some 443 comments to that posting." Such volume is practically always indicative of heat (hot air, to be more precise), rather than light. Ah, well, I thought, yet another source of weekly quotes (as if one was needed).

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 4: DEBUNKING AN ONLINE EXCHANGE 3 (obg)

Follow @DBDebunk Follow @ThePostWest

Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com

Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com

Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com

Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com

Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com

Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

“I did see your plea for help with funding Chris Date. Frankly, I think his approach is "dated", from what I could understand from talking to him at VLDB’99 in Edinburgh. We now live in a world of Agents, Semantic Web and XML. That is our main research focus here. Thus we would not be interested.”
--Sr. faculty, Academic Institution

“But within the context of the University of Washington, it would not be my classes where it would be appropriate to present that type of information [on fundamentals]. My classes are graduate level, highly technical and I don’t allow PowerPoint slides or any non-technical content.”
--Oracle practitioner, graduate teaching

“Recently, James H. Billington, the current Librarian of Congress, remarked that instead of a knowledge-based democracy, we may end up with an information-inundated democracy. I share his concern, so allow me to end with this simple wish. May, in spite of all distractions generated by technology, all of you succeed in turning information into knowledge, knowledge into understanding, and understanding into wisdom.”
--Edsger Dijkstra, Convocation Speech

NOBODY UNDERSTANDS WHAT A DATA MODEL IS (tyfk)

Follow @DBDebunk Follow @ThePostWest

“A data model is a collection of concepts ... used to describe the structure of a database...data types, relationships and constraints...is basically a conceptualization between attributes and entities ...
The building blocks in the data model are as follows:
Entity − An entity represents a particular type of object in the real world.
Entity set − Sets of entities of the same type which share the same properties are called entity Sets.
Attribute − An attribute is a characteristic of an entity.
Constraints − A constraint is a restriction placed on the data. It is helpful to ensure data integrity.
Relationship − A relationship describes an association among entities.
--TutorialsPoint.com

Fallacies, Misconceptions and Confusion

A data model:

- does not describe (just) the structure of a database.
- is not "a conceptualization between attributes and entities" (whatever that means).

Entities, entity sets and relationships are not building blocks of a data model.

READ MY LIPS: IF THERE'S NULLs, IT'S NOT RELATIONAL

Follow @DBDebunk Follow @ThePostWest

“Let's say I want to store a list of movies that are stored on iTunes. For simplicity, we'll just store a few fields so that the film Avatar has these values:
ID: 354112018
Name: Avatar
Year: 2009
Synopsis: "From Academy Award®-winning director James Cameron comes Avatar, the story..."
However, sometimes the Synopsis is missing...and sometimes the Year is missing. Without giving it a second thought, I would probably create one table to store those four fields, something like this:
ID (INT)
Name (VARCHAR)
Year (INT NULL)
Synopsis (VARCHAR NULL)
Is there any advantage in 'further normalizing' the database so that, for example, I don't store any null values, such as:
Title
TitleID
Name

TitleSynopsis
TitleID
Synopsis

TitleYear
TitleID
Year
To me it seems like doing this would potentially create hundreds of extra tables (on a large database) and make inserts a nightmare -- I suppose a View could be created to flatten out the results so it's queryable, but even though I feel like it would require so much overhead. So is there any reason in the above case to normalize to remove nulls, or in general, what would be the case to do so, if there ever is one?” --StackOverflow.com

Fallacies

That we see this in 2022 is testament to abysmal ignorance of fundamentals in the industry. Let's enumerate the fallacies:

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 3: DEBUNKING AN ONLINE EXCHANGE 2 (obg)

Follow @DBDebunk Follow @ThePostWest

Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com

Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com

Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com

Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com

Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com

Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

Slashing a SlashDot Exchange Part 3

(first published in 2001 @DBazine.com)

The following comments being debunked are by the W3C XML Query Working Group's Activity Lead and by an academic. [The exchange took place when XML DBMS was one of the hottest fads as late as 2013. Consider them in this context: where are XML DBMSs today?]

“The article seems to say ‘I don’t like SQL and I don’t like XML and I think XML Query is about merging them although I don’t understand it very well, so the people working on XML Query must be stupid, and in any case it’s easier to attack people than understand a specification.’ Perhaps that’s unfair, but it’s clear to me that the writer is a little fuzzy on the design goals of XML and also on the focus of SQL development over the past 10 or 15 years. In both cases the story is about interoperability.”

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 2: DEBUNKING AN ONLINE EXCHANGE 1 (obg)

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the soundness and stability conferred by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06) DBDebunk.com, so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. In re-publishing I may revise, break into or merge parts and/or add comments and/or references that I enclose in square brackets).

Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com

Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com

Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com

Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com

Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com

Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

SCHEMA & PERFORMANCE: NEVER THE TWINE SHALL MEET

Follow @DBDebunk Follow @ThePostWest

One of the core objectives of this site (and my work) has been to demonstrate that there will not be progress in data management as long as the industry and trade media require and promote exclusively (mainly tool) experience in the absence of foundation knowledge. I have published and analyzed ample evidence that relational language and terminology are used without grasping what it actually means -- a good way to gauge lack of foundation knowledge.

Recently I posted a four part series titled "Nobody Understands the Relational Model" showing that even a practitioner steeped in the RDM does not really understand it. Consider now a practitioner's mistake at the beginning of career -- "a bad database schema and what it did to system performance" -- which, he claims, belatedly taught him a lesson. Hhhhmmm, did it, really?

Happy Holidays and New Year to all my readers!

Follow @DBDebunk Follow @ThePostWest

Friday, December 17, 2021

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 2: DEBUNKING A BOOK REVIEW (obg)

Follow @DBDebunk Follow @ThePostWest

Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com

Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com

Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com

Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com

Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com

Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

Clarifications of a Review of My Book Part 1

(originally posted 1/14/2001)

“Many of us ... do not think that harmony is the great goal, or unity or peacefulness, [and] actually quite like hard questions for their own sake, and enjoy ... the life of the mind. To the question of how to live, the answer is "by disagreement.” --Christopher Hitchens

Let me say, first and foremost, that as the subtitle of the book -- A REFERENCE FOR THE THINKING PRACTITIONER -- indicates, it is targeted at the minority of practitioners who think clearly, independently and critically. It should not be a surprise, then, that those not belonging to that (alas, very small) target audience don't see its practical value. As I said so many times, if my work gained mass appeal, I would wonder what I was doing wrong. This is the sad reality, whether we like it or not. In fact, to be consistent I will go one step further: I don't assume that positive reviews are any better than negative ones -- they are frequently grounded in as faulty reasoning and/or ignorance as the critiques.

Let me also make clear that I do not place all of the blame on the individual database practitioners or users. Rather, problems are rooted in a systemic, much more profound societal and business culture that fails to instill and encourage foundation knowledge and independent, critical thinking, which not only does not reward, but actually punishes such. This is true to a degree in all societies, of course, but in the US the problem is much more acute (there can hardly be a better demonstration of the horrendous implications of this than how the election was covered, perceived and accepted by most of the press and public) [I wrote this prior to the last two elections -- I leave it to the reader to judge the steepness of the subsequent regress.]

NOBODY UNDERSTANDS THE RELATIONAL MODEL: SEMANTICS, CLOSURE & CORRECTNESS PART 4

Follow @DBDebunk Follow @ThePostWest

with David McGoveran

(Title inspired by Richard Feynman)

In Parts 1 and Part 2 we provided some clarifications following a discussion on LinkedIn about our contention that, conventional wisdom notwithstanding, database relations -- distinct from mathematical relations -- are by definition not just in 1NF, but also in 5NF, as a consequence of which the RA, as currently defined for 1NF closure, produces what the industry calls "update anomalies" and, thus, is not a proper algebra. In Part 3 we used that information to debunk some leftover misunderstandings in the discussion.

We conclude in Part 4 with comments on a private exchange that followed the public one on LinkedIn regarding the difference between the McGoveran (DMG) and Date and Darwen's (TTM)
interpretations of the RDM, which can be summarized as follows:

HOW NOT TO EXPLAIN THE RELATIONAL MODEL (tyfk)

Follow @DBDebunk Follow @ThePostWest

“The key idea is "Parent-Child" relationship. Entities ~ Relations ~ Tables (tilde stands for "more or less like"). Concept of a Table resonates with most of the people just as everybody intuitively grasps a concept of "rows and columns” but might struggle with "tuples and attributes". Explain relations and relationships, 1:1, 1:N, N:N etc. Explain rationale for this way of collecting and storing data, touch upon data normalization, and tell a few anecdotes about cost of storage back in 1970 and Y2K problem it have caused; add that we have inadvertently created Y10K problem while fixing it (not exactly true but not wrong either). Show an ERD diagram, trace the relationships, introduce SQL, maybe run a few simple SELECT queries to help your listeners visualize it, including equijoin and ORDER BY. Save other JOIN types, data types and other, more advanced topics, and for the next encounter.”
--Quora.com

An excellent example that validates my claim of lack of foundation knowledge in the industry: most "explainers" of RDM have acquired relational jargon, but do not know or understand it at all.

Nobody Understands the Relational Model: Semantics, Closure and Database Correctness Part 3

Follow @DBDebunk Follow @ThePostWest

(Title inspired by Richard Feynman)

In Parts 1 and 2 we provided some clarifications following a discussion on LinkedIn about our contention that, conventional wisdom notwithstanding, database relations -- distinct from mathematical relations -- are by definition not just in 1NF, but also in 5NF, as a consequence of which the relational algebra (RA), as currently defined for 1NF closure, produces update anomalies and, thus, is not a proper algebra. In this third part we will use that information to debunk some leftover misunderstandings in the discussion.

THE FATE OF FADS: XML DBMS (obg)

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.

Remember XML DBMS? At one point it was the fad of the day, similar to today's NoSQL or the old new "knowledge graph" -- "the future that you ignored at the peril of being left behind". As I predicted, it went the way of all fads (ODBMS, Associative DBMS, you name them) together with their "data models" that were nothing of the sort. My prediction was grounded in the same sound foundations I rely on today -- unlike the industry we are progressing it -- that fads lack and which were and still are dismissed, evidence be damned.

Here's a typical example (comments on republication in square brackets).

Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 2

Follow @DBDebunk Follow @ThePostWest

with David McGoveran

(Title inspired by Richard Feynman)

In Part 1 we explained that all database relations are, mathematically, relations, but not all relations are database relations, which are in both 1NF and 5NF and we agreed with a statement in a LinkedIn discussion ending as follows: "Update anomalies are not as big of a problem as an algebra where relations aren't closed under join". Unfortunately, update anomalies, closure, and how relational operators were defined are all interrelated and represent an even "bigger problem". Update anomalies are not "bugs", let alone irrelevant, but actually a reflection of that much bigger problem.

In this second part we delve into that problem.

OBG: Database Consistency and Physical Truth

Follow @DBDebunk Follow @ThePostWest

“I'm presently reading your book PRACTICAL ISSUES IN DATABASE MANAGEMENT and there are a couple of points that I find a little confusing. I'll start first by saying that I have no formal database oriented education, and I'm attempting to familiarize myself with some of the underlying theories and practices, so that I can further my personal education and career prospects (but aren't we all!). My questions may sound a little bit ignorant, but that would be because I am! (Please note ignorant, not stupid!) I'll quote you directly from the book for this (possibly I'm taking you out of context or missing something important)

Chapter 3, A Matter of Identity: Keys, pg. 75: "Databases represent assertions of fact - propositions - about entities of interest in the real world. The representation must be correct - only true propositions (facts) must be represented."

Now, correct me if I'm wrong with a basic assumption here, but isn't a database simply a model of a "real world" data collection? I would've thought that the intention of a database would be to model real life effectively (and accurately) enough to provide useful data for interpretation. Now obviously this is not an easy process with complex data types, but would it even be possible to have a 100% true proposition with only atomic data types? (i.e. can a simplified model contain only facts?) In my understanding of modeling, any model that fits real life closely enough to be a good statistical representation is a usable model. e.g. Newton's Laws are accurate enough when applied on a local scale, but we need to use Einstein's model of space-time across larger scales. Wouldn't recording only "facts" (which I would presume you mean to be statements that are provable in the objective sense i.e. no interpretation, only investigation or calculation) possibly eliminate the utility of some aspects of the database? Or do we account for the interpretative aspect in the metadata or in some other way?

Essentially, I can see what you're saying, but not necessarily how you've reached the conclusion. Admittedly in an ideal world we should be able to record only facts in a database, but this is not an ideal world. As an example, in surveys we see such questions as "Are you happy with this product?" followed by a rating system of 1-5, or 'completely unhappy to completely happy'. This is an artificial enforcement of a quantitative measure on a qualitative property. How do we account for the fact that this is interpreted data and not calculated or measured?

My questions may have little relevance to database theory in general, but the concept fascinates me!”

Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 1

Follow @DBDebunk Follow @ThePostWest

with David McGoveran

(Title inspired by Richard Feynman)

“As currently defined, relational algebra produces anomalies when applied to non-5NF relations. Since an algebra cannot have anomalies, they should have raised a red flag that RA was not defined quite right, especially defining "relation" as a 1NF table and claiming algebraic closure because 1NF was preserved. Being restricted to tabular representation as the "language" for relationships is like being restricted to arithmetic when doing higher mathematics like differential calculus -- you need more expressive power, not less! Defining RA operations in terms of table manipulations aided initial learning and implementations by making data management look simple and VISUAL. Unfortunately, it was never grasped how much was missing, let alone how much more "intelligent" the RA and the RDBMS needed to be made to fix the problems. And I can see that those oversights were, in part, probably due to having to spend so much time correcting the ignorance in the industry."
--David McGoveran

I recently posted the following Fundamental Truth of the Week on LinkedIn, together with links to more detailed discussions of 1NF and 5NF (see References):

“According to conventional understanding of the RDM (such as it is) [and I don't mean SQL], a relation is in at least first normal form (1NF) -- it has only attributes drawn from simple domains (i.e., no "nested relations") -- the formal way of saying that a relation represents at the logical level an entity group from the conceptual level that has only individual entities -- no groups thereof -- as members. 1NF is required for decidability of the data sublanguage.

However, correctness, namely (1) system-guaranteed logical validity (i.e., query results follow provably from the database) and (2) by-design semantic consistency (of query results with the conceptual model) requires that relations are in both 1NF and fifth normal form (5NF). Formally, the only dependencies that hold in a 5NF relation are functional dependencies of non-key attributes on the PK -- for each PK value there is exactly one value of every corresponding non-key attribute value. This is the formal way of saying that a relation represents facts about a group of entities of a single type.

Therefore we now contend that database relations are BY DEFINITION in both 1NF and 5NF, otherwise all bets are off.”

It triggered a discussion that raised some fundamental issues for which an online exchange is too limiting. This post offers further clarifications, including comments by David McGoveran, on whose interpretation of the RDM (LOGIC FOR SERIOUS DATABASE FOLKS, forthcoming) I rely on. The portions of my interlocutor in the discussion are in quotes.

Relational and Referential Integrity

Follow @DBDebunk Follow @ThePostWest

“Relational Data Integrity is like every other integrity constraint that checks that the relationships created between data using foreign keys has a consistency. This can be done by using ON UPDATE, ON DELETE constraints on the table.”
--Quora.com

I recently quoted this as one of my To Laugh or Cry? items on LinkedIn, which initiated an exchange triggered by the following question:

“You have a better definition? What is it?”

In the exchange the asker's interpretation seemed to be "referential constraints are constraints like any other constraints, so there is no problem". It is hard to recognize misconceptions without proper understanding of the RDM. We ignore that the above is not really a definition and focus on debunking.

Decades ago I wrote an article in DATABASE PROGRAMMING AND DESIGN carrying the double-meaning title Integrity Is Not Only Referential, in which I debunked Borland's claim that its Paradox file manager supported referential integrity (at the time no PC product did). As one component of the RDM, database integrity is, of course, a DBMS function, but Pradox relegated it to applications. Then, as now, one of the most common and entrenched misconceptions was that relational comes from "relationships between tables" and so relational integrity amounts to referential integrity (RI). RI is, of course, but one of several components that comprise relational integrity -- it is necessary, but insufficient. While practitioners are familiar with referential and PK constraints, if asked what other constraints comprise relational integrity very few know. Having enumerated them recently on LinkedIn, I asked this very question:

“... what other RELATIONAL constraints ARE there and what is their purpose? I recently posted a weekly truth and other items here that answer it.”

which went unanswered.

Data integrity is one of the three components of the RDM, together with data structure and manipulation. It consists of several categories of constraints which I detailed more than once, most recently in Understanding Relational Constraints, to which I referred the asker (can you give an example for each category?) Defining relational integrity means specifying all the constraint categories required by the RDM.

Consider now the above paragraph: it purports to define relational integrity, but it specifies functionality of referential integrity -- implying the old misconception I wrote about decades ago. The asker did not seem to comprehend the distinction:

“I can't see a problem here. Isn't it simply as follows? ... A *referential integrity constraint* ensures consistency between attributes of different entities - e.g. between primary and foreign keys of related entities (aka relational integrity). Isn't that what the definition says?"

Yes, it is the definition of referential integrity, but not of relational integrity -- there is more to the latter than the former. No matter in how many ways I tried to explain this, I was unable to convey it, because it's practically impossible in the absence of sufficient knowledge and understanding of the RDM.

Sunday, September 19, 2021

TYFK: Calculated Attributes -- Redundancy, Full Normalization and Relational Theory

Follow @DBDebunk Follow @ThePostWest

“If you have shopping cart, you probably have some field "TOTAL" somewhere that stores the final amount due for the customer. It so happens that such a thing violates relational theory...”

“Having a "TOTAL" field in your "order" table *might* violate relational theory, but if you make it so that only a trigger can update it based on what's in your "order_item" table, then I think it's fine. You still get data integrity and that is what matters.”

“I still fail to see what you mean by the "calculated TOTALS field" (attribute, really) violates the Relational Model.”

“The result of having the field ... is what is called a DELETE ANOMALY.”

“Most denormalizing means adding columns to tables that provide values you would otherwise have to calculate as needed.”

“There are four practical problems with a fully normalized database, three of which I have listed before. I will list them all here for completeness:
* No calculated values. Calculated values are a fact of life for all applications, but a normalized database lacks them. The burden of providing calculated values must be taken up by somebody somehow. Denormalization is one approach to this, though there are others.”
--Database Programmer blog

“...I'm now working with IT to normalize part of the database to remove calculated fields...:
`lineitems`.`extended total` = `lineitems`.`units` * `biditems`.`price`.
`jobs`.`jobvalue` = the sum of related `lineitems`.`extended total` records
`orders`.`ordervalue` = the sum of related `jobs`.`jobvalue` records.”
--mySQL.com

Do calculated attributes (not fields!) violate relational theory and must be "normalized" out of them? Determining that requires foundation knowledge that is scarce in the industry, which has a poor and outdated understanding of the RDM.

OBG: Data Warehouses Are Non-Relational Application Views

Follow @DBDebunk Follow @ThePostWest

ON DATA WAREHOUSES

(originally posted 11/10/2001)

“It is dawning on me that data warehouse techniques (as advocated by Ralph Kimball) are a response to perceived SQL DBMS performance weaknesses and nothing more. Dimensional modeling is a physical implementation design to deal with what may already be out of date performance tests to back up said design. I agree that where data warehouse logical design deviates from the relational data model there will be trouble down the road. I can attest to the high cost involved in maintaining a data warehouse. I am now questioning three purported benefits of using the current popular data warehouse design techniques.

Physical DBMS designs like the star schema produce faster more predictable query results than a normalized one (I realize normalization is a strictly logical concept but it does appear to have its direct physical mappings in current DBMS systems). Well, I am about to find out. We will be performing some benchmarks. What really are the query times performed on our OLTP system vs. a Star schema for a few relevant reports.

Data warehouses offload query processing from the OLTP system. This is true, but may not be necessary. One needs to thoroughly analyze the traffic on the OLTP system to see if offloading is necessary. We are looking into simply replicating the database (or part of it) for reporting purposes. Replication is far simpler to maintain than a data warehouse.

Data warehouse designs are simpler to understand than a relational one, thereby query construction is easier. I think this is more due to the fact that designers have had a bad habit of throwing up on a wall a full ER chart of their systems. Saying in effect Look how great I am, no one will ever understand this! Creating ER diagrams of subsystems to describe important characteristics of a database can also be simple to understand.

Item #3 brings up a question. What do you think of OLAP tools such a Microsoft s Analysis Services? We have discovered you can use the tool without redesigning your database. We have seen some pretty fast query times if we take the option of allowing the tool to store data extracted from the production database into its own proprietary format. The opposition to its adoption here is the learning curve to master the MDX query language."

Fabian Pascal: The fact is that so-called "dimensional modeling" is logical, not physical modeling. Kimball does not present his "techniques" as just performance-maximization and, what is more, they won’t necessarily yield better performance. He also seems to believe, erroneously, that star-schemas are fully normalized designs. The real solution to performance problems is true RDBMSs with better implementations, not ad-hoc logical designs. Like so many, Kimball simply does not understand relational technology and confuses levels of representation.

I consider warehouses a regression to the good old days of application-biased files -- which we discarded for application-neutral and DBMSs, because they did not prove cost-effective. They are application-specific views of databases that are not derived via relational algebra from relational databases, but are rather non-relational SQL tables. The industry has a long and profitable history of recycling relabeled old failures, witness XML and graph throwbacks to hierarchic databases.

Aside from being necessary for soundness, fully normalized relational databases are the easiest to understand if (1) one thoroughly knows and understands the segment of reality to be represented in the database -- the business, that is -- and (2) data fundamentals. Arbitrary designs such as star-schemas are cope-outs, due to poor knowledge and understanding of the latter.

I am not familiar with the product, but I am generally wary of what vendors do.

Note on re-publication: See also Data Warehouses and the Logical-Physical Confusion.

Saturday, September 4, 2021

Understanding Relational Constraints

Follow @DBDebunk Follow @ThePostWest

“The data in a relational database is stored in form of a table. A table makes the data look organized. Yet in some cases we might face issues while working with the data like repetition. We might want enforce rules on the data to avoid such technical problems. Theses rules are called constraints. A constraint can be defined as a rule that has to enforced on the data to avoid faults. There are three kinds of constraints: entity, referential and semantic constraints. Listed below are the differences between these three constraints:
1. Entity constraints -- primary key, foreign key, unique, NULL -- are posed within a table and used to enforce uniqueness and to define no value [respectively].
2. Referential constraints -- foreign key -- are enforced with more than one table for referring other tables for analysis of the data.
3. Semantic constraints -- datatypes -- are enforced in a table on the values of a specific attribute and help the data segregate according to its type. Example: name varchar2(30).”
--GeeksforGeeks.com

Before we tackle the main subject, let's get some misconceptions out of the way. As we have explained so many times:

Data is not "stored in a form of a table" -- it can be stored in any number of physical formats, at the discretion of DBMS designers and DBAs. Physical independence is a core advantage of the RDM.
A table does not "make the data look organized". Data is by definition organized -- be it relationally or not -- otherwise it would be random noise not data. A database relation can be visualized as a R-table, but tables do not play any role in RDM.
While some "repetition" (i.e., redundancy) is prevented by constraints (e.g., uniqueness), others are avoided by database design (e.g., 5NF DB relations).

And now to constraints.

TYFK: Normalized, Fully Normalized, Non-Normalized, Denormalized -- Clearing the Mess

Follow @DBDebunk Follow @ThePostWest

“A non-normalized database is a disorganized one, where nobody has bothered to work out where the facts should be stored. It is like a stack of paper files that has been tossed down the stairs. We are not interested in non-normalized databases.

A normalized database has been organized so that each fact is stored in exactly one place (2nf and greater) and no more than one fact is stored in each place (1nf). In a normalized database there is a place for everything and everything is in its place.

A denormalized database is a normalized database that has had redundancies deliberately re-introduced for some practical gain. Most denormalizing means adding columns to tables that provide values you would otherwise have to calculate as needed. Values are copied from table to table, calculations are made within a row, and totals, averages and other aggregrations are made between child and parent tables.”
--database-programmer.blogspot.com

OBG: The Myth of Market-Based Education

Follow @DBDebunk Follow @ThePostWest

(Originally posted on 09/08/2001, slightly revised)

Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.

“In a world torn by every kind of fundamentalism -- religious, ethnic, nationalist and tribal -- we must grant first place to economic fundamentalism, with its religious conviction that the market, left to its own devices, is capable of resolving all our problems. This faith has its own ayatollahs. Its church is neo-liberalism; its creed is profit; its prayers are for monopolies.”
--Carlos Fuentes

"We as humans have an instinct for creativity and a moral instinct. A good educational system ought to nurture and encourage these aspects of human life and allow them to flourish. But of course that has problems. For one thing, it means that you will encourage challenge of authority and domination. It will encourage questioning of powerful institutions. So the way schools actually function, by and large, there's a very strong tendency that works its way out in the long run and on average, for the schools to have a kind of filtering effect. They filter out independence of thought, creativity, imagination, and in their place foster obedience and subordination."
--Noam Chomsky

"The educated person is not the person who can answer the questions, but the person who can question the answers"
--T. Schick Jr.

TYFK: Facts, Properties, Relationships, Domains, Relations, Tuples

Follow @DBDebunk Follow @ThePostWest

A statement from a 1986 book that "Data are facts represented by values -- numbers, character strings, or symbols -- which carry meaning in a certain context" triggered the following response on Linkedin:

“...In contrast, Date and Darwen (2000) say:
Domains are the things that we can talk about.
Relations are the truths we utter about those things.
Thus, the declarative sentence "Fred is in the kitchen." is a fact that links the domains Person[s] and Place[s] with the predicate "is in". The complete relation might be made up of three facts:
Fred is in the kitchen.
Mary is in the garden.
Arthur is in the garden.
This seems to be more precise than the 1986 statement.”

To which the book author responded:

“...back then we did not have the refinement, clarity, nor precision from people like Sjir Nijssen and Terry Halpin regarding facts, or elementary fact sentences, which today you and I know are the bedrock of data modeling. Facts are expressed in sentences (with domains and predicates).”

Unfortunately none of this is sufficiently clear and precise to prevent confusion and it inhibits understanding of the RDM.

Documents and Databases

Follow @DBDebunk Follow @ThePostWest

'These new data technologies were developed because there are new usage scenarios for data — which do not fit into the relational model.'
--Reddit.com

Don't let the NoSQL label fool you. It's the relational model (RDM), not SQL, that its proponents are really dismissing. The main argument, as advanced in a recent LinkedIn exchange, is that lots of information "cannot be represented in rows and columns". IOW, the RDM is not general enough -- there are certain types of information that it is not suited for. Ignoring the tabular nonsense, the response from David McGoveran, is important enough to restate here.

“Information consists of facts (i.e., propositions asserted to be true) about objects, properties, and relationships among objects and properties. We have shown that a database relation -- which a R-table visualizes -- is constrained to represent a set of facts about (properties of) a group of entities with within-group relationships among properties and entities and cross-group relationships. Yet we are told that document information "do not fit" in a relational structure. They are referred to as "unstructured" (which, if they were, they would contain random noise, not information).

But documents don't lack structure. Rather, they are multistructured: have complex multi-level/type structures -- lots of content, metadata, interpretations, and internal relationships (formatting, semantic, structural or syntactic, and so on). At one level of analysis, they are just documents that have subject matter or content involving objects, properties and relationships. At another they might relate to that of other data (e.g., other documents). How we represent knowledge and in how much detail is determined by which of the structures we choose to represent and that always partially determines the class of queries we can express. This is precisely what Codd understood and tried to address via the RDM.
--David McGoveran

And there's the rub: which type of data (facts) at which document level is of interest? Take this post. There are facts about it (e.g., author, title, date and so on). There are facts in it (its content). Either can can be readily represented relationally, for example:

POSTS (AUTHOR,TITLE,DATE,CONTENT)

where CONTENT is a column defined on a text, PDF, or HTML domain with built-in operators applicable to values of either of those types (e.g., a substring operator for text). Facts at other levels (e.g., grammatical, or semantic) could be of interest and would require multi-table representation. One must choose the type/level of information of interest to represent relationally in a database. We can choose to not do the analysis and modeling of the content of documents, but that does not mean that they are unstructurable as facts. More often than not data professionals don’t know what type of facts are to be represented, or are unfamiliar with data modeling and relational fundamentals. Product advocates avoid to say that without investing time and effort in analysis and modeling one cannot ask the same questions of and produce results equivalent to those from relational databases (i.e., make precise inferences from data that are guaranteed to be correct -- logically valid and semantically consistent). In fact, the use of such products trades upfront structuring effort for subsequent prohibitive manipulation effort.

As David points out, "complaints about RDM are not about knowledge representation, but knowledge discovery -- the problem, for example, that Google Search, analytics and data integration face and attempt to solve. It's an expensive, imprecise, and difficult problem", but it is distinct from what database management does and the two should not be confused.

Saturday, July 10, 2021

Relational Misconceptions Part 2: RDM is Applied Theory

Follow @DBDebunk Follow @ThePostWest

In Part 1 we showed (yet again) how even those with their heart in the right (relational) place can't help being affected by the common and entrenched industry misconceptions, in this case about relationships, relations and tables. More often than not authors exhibit the very misconceptions they try to debunk.

We left the author distinguishing sets (with unordered, unique elements) from tables (lists of ordered, possibly duplicate rows).

OBG: Experimental Science and Database Design

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability conferred by a sound theoretical foundation relative to the industry's fad-driven, ad-hoc "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments of then hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.

The following is an email exchange from 2001 that I recommend reading jointly with my Data Meaning and Mining post (itself a revision of an article originally published at the old "All Analytics" website). I have slightly touched my replies for pedagogical purposes and clarity. You can substitute any data structure for XML hierarchy.

RE-WRITE

Follow @DBDebunk Follow @ThePostWest

See: https://www.dbdebunk.com/2023/08/entities-properties-and-codds-sleight.html

POSTS

Saturday, August 20, 2022

Sunday, August 14, 2022

Thursday, August 4, 2022

by David McGoveran with Fabian Pascal

Wednesday, July 13, 2022

On NULLs and Multi-Table Relvars

Sunday, July 3, 2022

Sunday, June 26, 2022

Saturday, June 11, 2022

Saturday, May 21, 2022

ON DATA TYPES AND WHAT A DBMS IS

Monday, May 2, 2022

Monday, April 25, 2022

Sunday, April 10, 2022

Friday, March 25, 2022

Friday, March 18, 2022

Sunday, March 6, 2022

Misconceptions

Saturday, February 19, 2022

CLARIFICATIONS ON A DISCUSSION OF MY BOOK PART 2

Sunday, February 13, 2022

Slashing a Slashdot Exchange - Part 1

Friday, February 4, 2022

Sunday, January 30, 2022

Fallacies, Misconceptions and Confusion

Friday, January 21, 2022

Fallacies

Sunday, January 16, 2022

Slashing a SlashDot Exchange Part 3

Saturday, January 8, 2022

Saturday, January 1, 2022

Friday, December 24, 2021

Friday, December 17, 2021

Clarifications of a Review of My Book Part 1

Saturday, December 11, 2021

Sunday, December 5, 2021

Thursday, November 25, 2021

Friday, November 19, 2021

Thursday, November 11, 2021

Friday, November 5, 2021

Wednesday, October 27, 2021

Saturday, October 9, 2021

Sunday, September 19, 2021

Saturday, September 11, 2021

ON DATA WAREHOUSES

Saturday, September 4, 2021

Tuesday, August 31, 2021

Friday, August 13, 2021

Thursday, August 5, 2021

Thursday, July 22, 2021

Saturday, July 10, 2021

Thursday, July 1, 2021

Thursday, June 10, 2021