Saturday, May 21, 2022

OBG: Domains and RDBMS

Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).

The following is an email exchange with a reader and DBMS designer.

ON DATA TYPES AND WHAT A DBMS IS

(originally published in 2001)

Reader:
"I would like to hear your (or Date's) opinion on The Suneido Database … it seems to me self-contradictory. They aren't typed ... so how can they define operators, or even the idea of domains. They also say they include administrative commands, which as far as I understand isn't allowed in the THIRD MANIFESTO. While they do not claim to be an implementation of the Manifesto, their claims that their database language was created by CJ Date do not sound appropriate."

 "They don't know what [domains (distinct from programming data types)] are and what their function in the RDM is. That's common for all DBMS vendors, the claims of which should be always taken with more than a grain of salt."

Monday, May 2, 2022

SMS: "Relation Proliferation"?

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “How do I avoid too many relations in databases?”

A: “You don’t. Every relation is there to store meaningful data, hopefully you do not define database relations for data that are not to be stored in your database.”

A: “By following proper design principles. Normalization, standard data patterns, and progressing from logical to physical always. Never denormalize (or avoid normalizing in the first place) because performance never trumps accuracy. It really doesn't matter how fast you get the wrong answer.”
--Quora.com

Monday, April 25, 2022

SMS: Relational Database and Set Theory

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “To what extent is relational database theory related to set theory?”

A: “Relational database theory is indeed closely derived from set theory. Many operations in relational data are directly related to common operations one does with sets. In fact, SQL has keywords for them that should sound familiar to someone who has just taken a class in Discrete Mathematics:
  • UNION
  • INTERSECT
  • DIFFERENCE (called MINUS in Oracle)
Even the structure of a table is set-oriented. A table is a set of rows, and a row is a set of columns, and those columns must match the set of columns defined in the table's header.”

--Quora.com

Sunday, April 10, 2022

SMS: Quota Queries

Note: "Setting Matters Straight" (SMS) is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). 

Q: “How do you return the most recent record in SQL?”

A: “There are many ways of doing it. I would suggest (first thing came to my mind):
Select Top 1
from YourTable
order by TablePrimaryKey Desc;”
A: “If you mean "the last inserted record which has no datetime stamp field" ... you have a few options.
  • If you cannot use date/time -- your next best bet would be an auto-increment/sequence field, which assigns increasing numbers to each inserted record.
  • If that’s not available, you would have to rely on business logic e.g. order # or some such.
Some vendors, like Oracle, provide ROWID pseudocolumn for each record which might help in some quick’n’dirty cases -- it is not guaranteed to be sequential but could be (e.g., when table has had no DELETE operations).”    --Quora.com

If you don't know, I set matters straight @dbdebunk.com.

Friday, March 25, 2022

SMS: Keys and Indexes

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals to encourage readers to test their foundation knowledge, which they can then compare with our debunking here, where we confirm what is correct and correct what is fallacious (with clarifications, wherever necessary). For in-depth treatment check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs).


Q: “What is the difference between a primary key, a unique key, and an index in databases?”

A: “Unique key is a field (or fields) with a set of unique values; the uniqueness is usually enforced with UNIQUE constraint. There might be one or more per table. Every PRIMARY key is always a unique key; there should be only one per table. It uniquely identifies record, and is used to enforce integrity - entity integrity, and, in tandem with FOREIGN key, referential integrity. Index is a data structure to facilitate records search. It might be created on PRIMARY key (best practice), unique key or any other field or combination thereof in the table. The limit on how many indices a table might have is defined in RDBMS implementation. An index might - or might not - speed up some queries.”

A: “The primary key is inherently indexed and unique and is the cross reference to related tables. Often the best primary key is an auto number integer as any value entered by humans is subject to error or delay that can be challenging to manage in the user interface ... whereas an auto number is assigned immediately upfront and eliminates any possible record conflict in tables during multi user entries. A unique key is somewhat of an informal definition. My view is that it is a definition of a field that is not being used as the primary key, but is unique unlike i.e. Last Name -- for instance a social security number field. So it is not the primary key as it is not the field/value being used to cross reference to related tables but it is unique in the table.”

A: “A primary key is a unique, non null value which can identify every tuple (row in the table) uniquely. A unique key/column/constraint ensures that no two rows contain the same value (almost the same as primary key). Unless specified explicitly for the column configuration, a NULL is a valid value for column with unique constraint. A index can be thought of as the appendix at the end of the book. The information is sorted in specific order so that look up is easy and it points to the location that is being searched for.”
--Quora.com

Friday, March 18, 2022

SMS: Entities and Records

 

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals to encourage readers to test their foundation knowledge, which they can then compare with our debunking here, where we confirm what is correct and correct what is fallacious (with clarifications, wherever necessary). For in-depth treatment check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs).

Q: “What is the relationship between an entity and a record?”

A: “In the context of a database design, an ‘entity’ is a type or category of persons, places, things or events. It’s a collectivisation of the nouns in a system about which you wish to keep data. For example, Employee might be the name of an entity in your system. A ‘record’ is a collection of data about a specific entity, a particular person or place, an identifiable thing, or a single event. For example, Name: ‘Dave Voorhis’, StartYear: 2019, Salary: £1,398,293 might be a record of one Employee entity in your system.”

A: “Database, file, and recordset are basically the same thing. They are collections of information or data. Each database or file or recordset typically has some sort of common purpose or definition. Like a database (relational, hierarchical, etc.) of data of a business process. A File is again a collection of data such as all transactions to be posted. A recordset is also basically a file.

Entity and table are basically the same thing. While you have the grouping of all the data, and entity (logical view) and a table (physical view) are the same. As Dave said, it is a logical grouping of a specific piece of data.

File, recordset, record, row or line are basically the same. A .csv file is a grouping of records. A file is a grouping of records. A row is an individual grouping of data from a relational database.

The last is element or attribute or field. This is the individual piece of data like Transaction_Amount or First Name.”
--Quora.com
A simple and the answer oversimplifies. But things seem simple only in the absence of foundation knowledge. Practitioners use different terms for the same thing, or the same word for different things, but that must be corrected, not accepted or validated.

Sunday, March 6, 2022

TYFK: Relationships -- Uniqueness and Attribute Constraints

Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

 

“A unique constraint is a type of column restriction within a table, which dictates that all values in that column must be unique [and] allows null values ... a null is the complete absence of a value (not a zero or space). Thus, it is not possible to say that the value in that null field is not unique, as nothing is stored in that field.”
--Techopedia
This is one of my recent "What's Wrong with this database picture" posts on LinkedIn.

Misconceptions

In the RDM a uniqueness constraint:

  • Should not be viewed solely as a "column restriction within a table'.
  • Does NOT allow SQL "NULLs" (not "NULL values"), which have nothing to do with storage.

View My Stats