Saturday, May 21, 2022

OBG: Domains and RDBMS

Note: To demonstrate the correctness and stability conferred by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06). Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).

The following is an email exchange with a reader and DBMS designer.

ON DATA TYPES AND WHAT A DBMS IS

(originally published in 2001)

Reader:
"I would like to hear your (or Date's) opinion on The Suneido Database … it seems to me self-contradictory. They aren't typed ... so how can they define operators, or even the idea of domains. They also say they include administrative commands, which as far as I understand isn't allowed in the THIRD MANIFESTO. While they do not claim to be an implementation of the Manifesto, their claims that their database language was created by CJ Date do not sound appropriate."

"They don't know what [domains (distinct from programming data types)] are or what their function in the RDM is. That is common to all DBMS vendors, whose claims should always be taken with more than a grain of salt."
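To make the distinction concrete, here is a minimal Python sketch, assuming a domain is modeled as a user-defined type that fixes both its legal values and the operators defined on them. The classes `Weight` and `Quantity` are hypothetical, and Python is used purely for illustration, not as a claim about how any DBMS implements domains. Both domains share a numeric machine representation, yet comparing a weight with a quantity is rejected, whereas with raw numbers `5.0 < 5` would silently be evaluated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weight:
    """Hypothetical domain: weights in grams, represented as a float."""
    grams: float

    def __post_init__(self):
        # Domain constraint: only non-negative weights are legal values.
        if self.grams < 0:
            raise ValueError("a weight cannot be negative")

    def __lt__(self, other):
        # Ordering is defined only between values of the same domain.
        if not isinstance(other, Weight):
            return NotImplemented
        return self.grams < other.grams

    def __add__(self, other):
        # Adding two weights yields a weight; nothing else may be added.
        if not isinstance(other, Weight):
            return NotImplemented
        return Weight(self.grams + other.grams)


@dataclass(frozen=True)
class Quantity:
    """Hypothetical domain: item counts, represented as an int."""
    count: int

    def __post_init__(self):
        if self.count < 0:
            raise ValueError("a quantity cannot be negative")


total = Weight(10.0) + Weight(5.0)   # Weight(grams=15.0)
try:
    Weight(5.0) < Quantity(5)        # no '<' exists across the two domains
except TypeError as exc:
    print("rejected:", exc)
```

The dataclass-generated equality also never treats a `Weight` and a `Quantity` as equal, even when the underlying numbers coincide.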

Monday, May 2, 2022

SMS: "Relation Proliferation"?

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “How do I avoid too many relations in databases?”

A: “You don’t. Every relation is there to store meaningful data, hopefully you do not define database relations for data that are not to be stored in your database.”

A: “By following proper design principles. Normalization, standard data patterns, and progressing from logical to physical always. Never denormalize (or avoid normalizing in the first place) because performance never trumps accuracy. It really doesn't matter how fast you get the wrong answer.”
--Quora.com

Monday, April 25, 2022

SMS: Relational Database and Set Theory

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “To what extent is relational database theory related to set theory?”

A: “Relational database theory is indeed closely derived from set theory. Many operations in relational data are directly related to common operations one does with sets. In fact, SQL has keywords for them that should sound familiar to someone who has just taken a class in Discrete Mathematics:
  • UNION
  • INTERSECT
  • DIFFERENCE (called MINUS in Oracle)
Even the structure of a table is set-oriented. A table is a set of rows, and a row is a set of columns, and those columns must match the set of columns defined in the table's header.”

--Quora.com
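For comparison with the answer above, here is a minimal Python sketch that models a relation as a heading plus a set of tuples, with each tuple a set of attribute-value pairs (a `frozenset`), which is closer to the RDM than "a row is a set of columns". The class and data are hypothetical. Union, intersection and difference are defined only for operands with the same heading; standard SQL spells the difference operator `EXCEPT` (Oracle: `MINUS`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    heading: frozenset   # attribute names
    body: frozenset      # tuples, each a frozenset of (attribute, value) pairs

    @staticmethod
    def of(heading, *rows):
        """Build a relation from dicts; attribute order is irrelevant."""
        h = frozenset(heading)
        body = frozenset(frozenset(row.items()) for row in rows)
        if any(frozenset(name for name, _ in t) != h for t in body):
            raise ValueError("tuple does not match the heading")
        return Relation(h, body)

    def _same_heading(self, other):
        if self.heading != other.heading:
            raise ValueError("operands must have the same heading")

    def union(self, other):
        self._same_heading(other)
        return Relation(self.heading, self.body | other.body)

    def intersect(self, other):
        self._same_heading(other)
        return Relation(self.heading, self.body & other.body)

    def minus(self, other):
        """Relational difference (SQL EXCEPT, Oracle MINUS)."""
        self._same_heading(other)
        return Relation(self.heading, self.body - other.body)


a = Relation.of({"name", "city"},
                {"name": "Smith", "city": "London"},
                {"name": "Jones", "city": "Paris"})
b = Relation.of({"name", "city"},
                {"city": "Paris", "name": "Jones"},   # same tuple, other order
                {"name": "Blake", "city": "Paris"})
print(len(a.union(b).body), len(a.intersect(b).body), len(a.minus(b).body))  # 3 1 1
```

Because the body is a set, the Jones tuple appears once in the union, and listing its attributes in a different order does not create a different tuple.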

Sunday, April 10, 2022

SMS: Quota Queries

Note: "Setting Matters Straight" (SMS) is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). 

Q: “How do you return the most recent record in SQL?”

A: “There are many ways of doing it. I would suggest (first thing came to my mind):
Select Top 1
from YourTable
order by TablePrimaryKey Desc;”
A: “If you mean "the last inserted record which has no datetime stamp field" ... you have a few options.
  • If you cannot use date/time -- your next best bet would be an auto-increment/sequence field, which assigns increasing numbers to each inserted record.
  • If that’s not available, you would have to rely on business logic e.g. order # or some such.
Some vendors, like Oracle, provide ROWID pseudocolumn for each record which might help in some quick’n’dirty cases -- it is not guaranteed to be sequential but could be (e.g., when table has had no DELETE operations).”
--Quora.com
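A relational alternative to relying on physical order, `TOP`, `ROWID` or sequence numbers is to record the time of interest as an ordinary attribute and ask for the tuples whose value is the maximum. Below is a minimal sketch using Python's standard `sqlite3` module; the `orders` table, its columns and the data are hypothetical. Unlike `TOP 1 ... ORDER BY`, the `MAX` subquery returns every tuple tied for most recent rather than an arbitrary one.

```python
import sqlite3


def most_recent(conn):
    """Return all orders placed at the latest timestamp (ties included)."""
    return conn.execute(
        """
        SELECT order_no, placed_at
        FROM orders
        WHERE placed_at = (SELECT MAX(placed_at) FROM orders)
        ORDER BY order_no
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_no INTEGER NOT NULL PRIMARY KEY, placed_at TEXT NOT NULL)"
)
# ISO-8601 timestamps stored as text compare chronologically.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (101, "2022-04-01T09:00:00"),
        (102, "2022-04-02T10:30:00"),
        (103, "2022-04-02T10:30:00"),  # tied with 102
        (100, "2022-03-31T08:00:00"),  # inserted last, but not the most recent
    ],
)
print(most_recent(conn))  # [(102, '2022-04-02T10:30:00'), (103, '2022-04-02T10:30:00')]
```

Order 100 was inserted last yet is the oldest, which is why insertion sequence is no substitute for an attribute that records the time of interest.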

If you don't know, I set matters straight @dbdebunk.com.

Friday, March 25, 2022

SMS: Keys and Indexes

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals to encourage readers to test their foundation knowledge, which they can then compare with our debunking here, where we confirm what is correct and correct what is fallacious (with clarifications, wherever necessary). For in-depth treatment check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs).


Q: “What is the difference between a primary key, a unique key, and an index in databases?”

A: “Unique key is a field (or fields) with a set of unique values; the uniqueness is usually enforced with UNIQUE constraint. There might be one or more per table. Every PRIMARY key is always a unique key; there should be only one per table. It uniquely identifies record, and is used to enforce integrity - entity integrity, and, in tandem with FOREIGN key, referential integrity. Index is a data structure to facilitate records search. It might be created on PRIMARY key (best practice), unique key or any other field or combination thereof in the table. The limit on how many indices a table might have is defined in RDBMS implementation. An index might - or might not - speed up some queries.”

A: “The primary key is inherently indexed and unique and is the cross reference to related tables. Often the best primary key is an auto number integer as any value entered by humans is subject to error or delay that can be challenging to manage in the user interface ... whereas an auto number is assigned immediately upfront and eliminates any possible record conflict in tables during multi user entries. A unique key is somewhat of an informal definition. My view is that it is a definition of a field that is not being used as the primary key, but is unique unlike i.e. Last Name -- for instance a social security number field. So it is not the primary key as it is not the field/value being used to cross reference to related tables but it is unique in the table.”

A: “A primary key is a unique, non null value which can identify every tuple (row in the table) uniquely. A unique key/column/constraint ensures that no two rows contain the same value (almost the same as primary key). Unless specified explicitly for the column configuration, a NULL is a valid value for column with unique constraint. A index can be thought of as the appendix at the end of the book. The information is sorted in specific order so that look up is easy and it points to the location that is being searched for.”
--Quora.com
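To separate the logical from the physical, here is a minimal sketch using Python's standard `sqlite3` module (table, index and data are hypothetical). The key is an integrity constraint that rejects a duplicate; the index is an access structure that can be created or dropped without changing any query result. SQLite, like most SQL products, happens to enforce a key by building an index automatically, which is one source of the confusion in the answers above; SQLite also permits NULL in a non-integer PRIMARY KEY unless NOT NULL is declared, hence the explicit NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_no TEXT NOT NULL PRIMARY KEY,"  # NOT NULL: SQLite would otherwise admit NULL
    " name TEXT NOT NULL,"
    " dept TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("E1", "Smith", "Sales"), ("E2", "Jones", "Sales"), ("E3", "Blake", "R&D")],
)

QUERY = "SELECT emp_no FROM employee WHERE dept = 'Sales' ORDER BY emp_no"

# Logical: the key constraint rejects a second tuple with the same emp_no.
try:
    conn.execute("INSERT INTO employee VALUES ('E1', 'Adams', 'R&D')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Physical: an index on dept is only an access path; results do not change.
before = conn.execute(QUERY).fetchall()
conn.execute("CREATE INDEX employee_dept_ix ON employee (dept)")
with_index = conn.execute(QUERY).fetchall()
conn.execute("DROP INDEX employee_dept_ix")
after = conn.execute(QUERY).fetchall()

print(duplicate_rejected, before == with_index == after)  # True True
```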

Friday, March 18, 2022

SMS: Entities and Records

 

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals to encourage readers to test their foundation knowledge, which they can then compare with our debunking here, where we confirm what is correct and correct what is fallacious (with clarifications, wherever necessary). For in-depth treatment check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs).

Q: “What is the relationship between an entity and a record?”

A: “In the context of a database design, an ‘entity’ is a type or category of persons, places, things or events. It’s a collectivisation of the nouns in a system about which you wish to keep data. For example, Employee might be the name of an entity in your system. A ‘record’ is a collection of data about a specific entity, a particular person or place, an identifiable thing, or a single event. For example, Name: ‘Dave Voorhis’, StartYear: 2019, Salary: £1,398,293 might be a record of one Employee entity in your system.”

A: “Database, file, and recordset are basically the same thing. They are collections of information or data. Each database or file or recordset typically has some sort of common purpose or definition. Like a database (relational, hierarchical, etc.) of data of a business process. A File is again a collection of data such as all transactions to be posted. A recordset is also basically a file.

Entity and table are basically the same thing. While you have the grouping of all the data, and entity (logical view) and a table (physical view) are the same. As Dave said, it is a logical grouping of a specific piece of data.

File, recordset, record, row or line are basically the same. A .csv file is a grouping of records. A file is a grouping of records. A row is an individual grouping of data from a relational database.

The last is element or attribute or field. This is the individual piece of data like Transaction_Amount or First Name.”
--Quora.com

A simple question, and the answer oversimplifies. But things seem simple only in the absence of foundation knowledge. Practitioners use different terms for the same thing, or the same word for different things, but that must be corrected, not accepted or validated.

Sunday, March 6, 2022

TYFK: Relationships -- Uniqueness and Attribute Constraints

Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

 

“A unique constraint is a type of column restriction within a table, which dictates that all values in that column must be unique [and] allows null values ... a null is the complete absence of a value (not a zero or space). Thus, it is not possible to say that the value in that null field is not unique, as nothing is stored in that field.”
--Techopedia
This is one of my recent "What's Wrong with this database picture" posts on LinkedIn.

Misconceptions

In the RDM a uniqueness constraint:

  • Should not be viewed solely as a "column restriction within a table".
  • Does NOT allow SQL "NULLs" (not "NULL values"), which have nothing to do with storage.
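The quoted definition describes SQL behavior, not a relational uniqueness constraint. A minimal sketch using Python's standard `sqlite3` module (tables, columns and data are hypothetical) shows it: under a SQL `UNIQUE` constraint SQLite accepts any number of rows with NULL in a constrained column, because SQL does not treat NULLs as equal to each other; only with `NOT NULL` columns does the constraint actually enforce uniqueness. The constraint here also spans two columns, so it is not merely a "column restriction". Other SQL products differ in detail (SQL Server, for example, admits a single NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQL-style UNIQUE over two columns that admit NULL.
conn.execute("CREATE TABLE badge_loose (dept TEXT, badge TEXT, UNIQUE (dept, badge))")
conn.executemany(
    "INSERT INTO badge_loose VALUES (?, ?)",
    [("Sales", None), ("Sales", None), ("Sales", None)],  # all accepted
)
loose_count = conn.execute("SELECT COUNT(*) FROM badge_loose").fetchone()[0]

# Uniqueness over NOT NULL columns: every duplicate is rejected.
conn.execute(
    "CREATE TABLE badge_strict ("
    " dept TEXT NOT NULL, badge TEXT NOT NULL, UNIQUE (dept, badge))"
)
conn.execute("INSERT INTO badge_strict VALUES ('Sales', 'B7')")
try:
    conn.execute("INSERT INTO badge_strict VALUES ('Sales', 'B7')")
    strict_rejected = False
except sqlite3.IntegrityError:
    strict_rejected = True

print(loose_count, strict_rejected)  # 3 True
```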

Saturday, February 19, 2022

OBG: No Understanding without Foundation Knowledge Part 6: Debunking an Online Exchange 5

Note: To demonstrate the correctness and stability offered by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06) DBDebunk.com, so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references, which I enclose in square brackets.

A 2001 review of my third book triggered an exchange on SlashDot. This six-part series comprises my debunking, at the time, of both the review and the exchange, presented in the chronological order of original publication (with slight deviations).
Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com
Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com
Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com
Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com
Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com
Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

CLARIFICATIONS ON A DISCUSSION OF MY BOOK PART 2

(originally posted 2/21/01)

In Part 1 I debunked a review of my book @Slashdot.Org. In parts 2-5 I tackled the discussion generated there by the review. In this last part I focus on the discussion of data hierarchies covered in chapter 7 of my book [cf. the currently in-vogue, re-emergent graph fad].

“Chapter 7 discusses data hierarchies and trees. In a nutshell: there are no trees in SQL. The author is distressed by this. Given that a foreign key is basically a pointer, you can store trees in databases, it might not be pretty and there may not be easy way to read them and it might not be a good thing to do - but if you feel the need then get right in there. Of course I could be totally wrong about this.”
Confusing keys with pointers is one of the major errors many practitioners make. One intentional core advantage of the RDM is precisely that it prohibits pointers -- both physical and, as in object-orientation, logical. Exposing pointers to users has caused many unnecessary problems and complications, but offered no benefit (see Don't Mix Pointers and Relations and Don't Mix Pointers and Relations - Please! in Date's RELATIONAL DATABASE WRITINGS 1994-1997).

There is an easy way to demonstrate that relational keys are not pointers, like object IDs (OIDs), but values: they represent uniquely identifying names/attributes of real world entities. Pointers are system-generated internals and have no real world counterpart. A data model should produce logical models that are faithful representations of the real world, without adding artifacts of its own. Indeed, as Date points out in Why 'The Object Model' is Not a Data Model in his above-mentioned book, the fact that "in the object world all the references to objects are by means of their corresponding OIDs explains why -- as is well known -- OO systems typically provide (a) two different equality comparison operators, equal OID vs. equal value and (b) two different assignment operators, assign OID vs. assign value." Note the added complication -- what is the benefit?
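Python is an object language that exposes exactly the two pairs of operators Date describes, which makes the added complication easy to see. In this minimal illustration (the `Employee` class is hypothetical), `is` compares object identities, the rough analogue of OIDs, while `==` compares values; likewise, plain assignment copies a reference while `copy.copy` copies the value. A relational key, by contrast, is an ordinary attribute value, and only value comparison applies.

```python
import copy
from dataclasses import dataclass


@dataclass
class Employee:
    emp_no: str
    name: str


a = Employee("E1", "Smith")
b = Employee("E1", "Smith")

# Two equality operators: equal value vs. equal identity ("equal OID").
same_value = a == b          # True: all attribute values match
same_identity = a is b       # False: two distinct objects

# Two assignment operators: assign reference vs. assign value.
ref = a                      # ref and a now denote the same object
val = copy.copy(a)           # val is a new object holding the same value
a.name = "Jones"             # update through one name...
print(ref.name, val.name)    # ...ref sees it, val does not: Jones Smith
```

The programmer must decide, at every comparison and assignment, which of the two meanings is intended; with values alone the question never arises.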

Sunday, February 13, 2022

OBG: No Understanding without Foundation Knowledge Part 5: Debunking an Online Exchange 4

Note: To demonstrate the correctness and stability conferred by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06). Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).

A 2001 review of my third book triggered an exchange on SlashDot. This six-part series comprises my debunking, at the time, of both the review and the exchange, presented in the chronological order of original publication (with slight deviations).
Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com
Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com
Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com
Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com
Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com
Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

Slashing a Slashdot Exchange - Part 1

(first published @DBAzine.com in 2001)

I was recently contacted by a reporter for an interview. When I expressed my disappointment with the trade media’s tendency to regurgitate vendor marketing claims instead of assessing them, he admitted "that is what happens about 98 percent of the time", but added "There are some outlets with a good piece from time to time that deal with serious architecture issues", mentioning SlashDot as one of them.

There is, of course, a Catch-22 here: to judge the seriousness of such outlets, foundation and substantive knowledge are necessary in the first place. And, alas, reporters possess even less of it than vendors and users (see, for example, The Ignorance Mechanism and On Trade Media’s "Balance"); without it, sources may appear serious even when they are nothing of the sort. As luck would have it, I ran into a good opportunity to prove this point for SlashDot. Shortly after my exchange with the journalist, Database Debunkings experienced a sudden ten-fold increase in traffic. Now, [given that my target audience is thinking practitioners,] were my material to suddenly become "hot", I would worry about where I had gone wrong. But the odds of that are rather slim and, fortunately, there was no need for concern: an email from a reader informed me that "there recently was an article posted to SlashDot.org which refers to Dbdebunk.com and Mr. Pascal/Date" and "There [were] some 443 comments to that posting." Such volume is practically always indicative of heat (hot air, to be more precise) rather than light. Ah, well, I thought, yet another source of weekly quotes (as if one were needed).

Friday, February 4, 2022

OBG: No Understanding without Foundation Knowledge Part 4: Debunking an Online Exchange 3

Note: To demonstrate the correctness and stability conferred by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06). Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).

A 2001 review of my third book triggered an exchange on SlashDot. This six-part series comprises my debunking, at the time, of both the review and the exchange, presented in the chronological order of original publication (with slight deviations).
Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com
Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com
Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com
Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com
Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com
Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

“I did see your plea for help with funding Chris Date. Frankly, I think his approach is "dated", from what I could understand from talking to him at VLDB’99 in Edinburgh. We now live in a world of Agents, Semantic Web and XML. That is our main research focus here. Thus we would not be interested.”
--Sr. faculty, Academic Institution
“But within the context of the University of Washington, it would not be my classes where it would be appropriate to present that type of information [on fundamentals]. My classes are graduate level, highly technical and I don’t allow PowerPoint slides or any non-technical content.”
--Oracle practitioner, graduate teaching
“Recently, James H. Billington, the current Librarian of Congress, remarked that instead of a knowledge-based democracy, we may end up with an information-inundated democracy. I share his concern, so allow me to end with this simple wish. May, in spite of all distractions generated by technology, all of you succeed in turning information into knowledge, knowledge into understanding, and understanding into wisdom.”
--Edsger Dijkstra, Convocation Speech

Sunday, January 30, 2022

TYFK: Nobody Understands What a Data Model Is

 “A data model is a collection of concepts ... used to describe the structure of a database...data types, relationships and constraints...is basically a conceptualization between attributes and entities ...
The building blocks in the data model are as follows:
  • Entity − An entity represents a particular type of object in the real world.
  • Entity set − Sets of entities of the same type which share the same properties are called entity Sets.
  • Attribute − An attribute is a characteristic of an entity.
  • Constraints − A constraint is a restriction placed on the data. It is helpful to ensure data integrity.
  • Relationship − A relationship describes an association among entities.”
--TutorialsPoint.com

Fallacies, Misconceptions and Confusion

  • A data model:

- does not describe (just) the structure of a database.
- is not "a conceptualization between attributes and entities" (whatever that means).

  • Entities, entity sets and relationships are not building blocks of a data model.
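For contrast with the quoted definition: Codd characterized a data model as a combination of three components, namely structure, integrity and manipulation. Here is a minimal, deliberately incomplete Python sketch of the relational case (all names hypothetical) showing that structure alone is not a data model. Entities, entity sets and relationships appear nowhere in it; they belong to the conceptual model of the real world, not to the data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Structure: a heading (set of attribute names) plus a body (set of tuples)."""
    heading: frozenset
    body: frozenset

    # Manipulation: each operator takes a relation and yields a relation.
    def restrict(self, predicate):
        """Keep the tuples for which predicate(tuple-as-dict) is true."""
        return Relation(self.heading,
                        frozenset(t for t in self.body if predicate(dict(t))))

    def project(self, *names):
        """Keep only the named attributes; duplicate tuples collapse."""
        keep = frozenset(names)
        if not keep <= self.heading:
            raise ValueError("unknown attribute in projection")
        return Relation(keep, frozenset(
            frozenset((n, v) for n, v in t if n in keep) for t in self.body))


def make_relation(heading, rows, constraints=()):
    """Integrity: build a relation only if every declared constraint holds."""
    h = frozenset(heading)
    body = frozenset(frozenset(row.items()) for row in rows)
    if any(frozenset(n for n, _ in t) != h for t in body):
        raise ValueError("tuple does not match the heading")
    for check in constraints:
        if not check(body):
            raise ValueError("integrity constraint violated")
    return Relation(h, body)


def key(*names):
    """A uniqueness constraint: no two tuples agree on all of `names`."""
    def check(body):
        values = [tuple(dict(t)[n] for n in names) for t in body]
        return len(values) == len(set(values))
    return check


emp = make_relation({"emp_no", "dept"},
                    [{"emp_no": "E1", "dept": "Sales"},
                     {"emp_no": "E2", "dept": "R&D"}],
                    constraints=[key("emp_no")])
sales = emp.restrict(lambda t: t["dept"] == "Sales").project("emp_no")
print(sorted(dict(t)["emp_no"] for t in sales.body))  # ['E1']
```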

Friday, January 21, 2022

Read My Lips: If There's NULLs, It's Not Relational

“Let's say I want to store a list of movies that are stored on iTunes. For simplicity, we'll just store a few fields so that the film Avatar has these values:
ID: 354112018
Name: Avatar
Year: 2009
Synopsis: "From Academy Award®-winning director James Cameron comes Avatar, the story..."
However, sometimes the Synopsis is missing...and sometimes the Year is missing. Without giving it a second thought, I would probably create one table to store those four fields, something like this:
ID (INT)
Name (VARCHAR)
Year (INT NULL)
Synopsis (VARCHAR NULL)
Is there any advantage in 'further normalizing' the database so that, for example, I don't store any null values, such as:
Title
 TitleID
 Name

TitleSynopsis
 TitleID
 Synopsis

TitleYear
 TitleID
 Year
To me it seems like doing this would potentially create hundreds of extra tables (on a large database) and make inserts a nightmare -- I suppose a View could be created to flatten out the results so it's queryable, but even though I feel like it would require so much overhead. So is there any reason in the above case to normalize to remove nulls, or in general, what would be the case to do so, if there ever is one?”  --StackOverflow.com
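To gauge the poster's worry about inserts and queries, here is a minimal sketch using Python's standard `sqlite3` module with the decomposed design from the question (table and column names taken from it; the second title is hypothetical). A title whose year or synopsis is unknown simply has no row in `TitleYear` or `TitleSynopsis`, so no column admits NULL, and both "titles with a known year" and "titles with no recorded year" are expressible without NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(
    """
    CREATE TABLE Title (TitleID INTEGER NOT NULL PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE TitleYear (
        TitleID INTEGER NOT NULL PRIMARY KEY REFERENCES Title (TitleID),
        Year INTEGER NOT NULL);
    CREATE TABLE TitleSynopsis (
        TitleID INTEGER NOT NULL PRIMARY KEY REFERENCES Title (TitleID),
        Synopsis TEXT NOT NULL);
    """
)

# Avatar: every fact is known, so each table gets one row.
conn.execute("INSERT INTO Title VALUES (354112018, 'Avatar')")
conn.execute("INSERT INTO TitleYear VALUES (354112018, 2009)")
conn.execute(
    "INSERT INTO TitleSynopsis VALUES (354112018, ?)",
    ("From Academy Award®-winning director James Cameron comes Avatar, the story...",),
)

# Hypothetical title with unknown year and synopsis: a single insert, no NULLs.
conn.execute("INSERT INTO Title VALUES (1, 'Hypothetical Documentary')")

known_year = conn.execute(
    "SELECT t.Name, y.Year FROM Title t JOIN TitleYear y ON y.TitleID = t.TitleID"
).fetchall()
no_year = conn.execute(
    """SELECT t.Name FROM Title t
       WHERE NOT EXISTS (SELECT 1 FROM TitleYear y WHERE y.TitleID = t.TitleID)"""
).fetchall()
print(known_year, no_year)  # [('Avatar', 2009)] [('Hypothetical Documentary',)]
```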

Fallacies

That we see this in 2022 is a testament to the abysmal ignorance of fundamentals in the industry. Let's enumerate the fallacies:

Sunday, January 16, 2022

OBG: No Understanding without Foundation Knowledge Part 3 -- Debunking an Online Exchange 2

Note: To demonstrate the correctness and stability offered by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06) DBDebunk.com, so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references, which I enclose in square brackets.

A 2001 review of my third book triggered an exchange on SlashDot. This six-part series comprises my debunking, at the time, of both the review and the exchange, presented in the chronological order of original publication (with slight deviations).
Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com
Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com
Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com
Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com
Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com
Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

Slashing a SlashDot Exchange Part 3

(first published in 2001 @DBAzine.com)

The following comments being debunked are by the W3C XML Query Working Group's Activity Lead and by an academic. [The exchange took place when XML DBMSs were one of the hottest fads, and they remained so as late as 2013. Consider the comments in this context: where are XML DBMSs today?]

“The article seems to say ‘I don’t like SQL and I don’t like XML and I think XML Query is about merging them although I don’t understand it very well, so the people working on XML Query must be stupid, and in any case it’s easier to attack people than understand a specification.’ Perhaps that’s unfair, but it’s clear to me that the writer is a little fuzzy on the design goals of XML and also on the focus of SQL development over the past 10 or 15 years. In both cases the story is about interoperability.”

Saturday, January 8, 2022

OBG: No Understanding Without Foundation Knowledge Part 2 -- Debunking an Online Exchange 1

Note: To demonstrate the correctness and stability conferred by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06) DBDebunk.com, so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. In re-publishing I may revise, break into or merge parts, and/or add comments and/or references, which I enclose in square brackets.

A 2001 review of my third book triggered an exchange on SlashDot. This six-part series comprises my debunking, at the time, of both the review and the exchange, presented in the chronological order of original publication (with slight deviations).
Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com
Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com
Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com
Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com
Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com
Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

Saturday, January 1, 2022

Schema and Performance: Never the Twain Shall Meet

One of the core objectives of this site (and my work) has been to demonstrate that there will be no progress in data management as long as the industry and trade media require and promote only experience (mainly with tools), in the absence of foundation knowledge. I have published and analyzed ample evidence that relational language and terminology are used without a grasp of what they actually mean -- a good way to gauge lack of foundation knowledge.

Recently I posted a four-part series titled "Nobody Understands the Relational Model" showing that even a practitioner steeped in the RDM does not really understand it. Consider now a practitioner's mistake at the beginning of his career -- "a bad database schema and what it did to system performance" -- which, he claims, belatedly taught him a lesson. Hhhhmmm, did it, really?
