Monday, March 23, 2020

TYFK: How (Not) to Compare NoSQL Systems and RDBMSs?

Note: About TYFK posts (Test Your Foundation Knowledge) see the post insert below.
“But if you still want to compare NOSQL databases with RDBMS, they primarily vary in
1. "normalization" where RDBMS contains normalized (upto certain degree) data and NOSQL based database contains non-normalized data;
2. RDBMS based databases are (I MUST say, generally and it isn't a criteria) fully ACID compliant while NOSQL databases are partially ACID compliant.
3. RDBMS are much slower and difficult to scale while NOSQL databases are much faster and easily scalable.
4. RDBMS normalization was very useful 50 years ago when cost of disk and memory was high, and computation power was limited. With the revolution in computing power, cheapest disk and memory availability has made RDBMS normalization a matter of joke - many people do not really understand why they need to normalize data in today's time.”
First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

Note: In what follows RDBMS refers to a truly relational DBMS (of which currently aren't any), not to be confused with a SQL DBMS.

Thursday, March 12, 2020

Muddling Modeling Part 2: An Example

In an old article I used a Hay-Ross exchange to illustrate how disregard for fundamentals and the associated name proliferation -- which underlies the industry's fad-to-fad tradition -- cause confusion that inhibits understanding of conceptual modeling for database design. A recent LinkedIn exchange -- hardly unique -- showed the article to be as relevant today as it was two decades ago, prompting me to bring it up to date.

In Part 1 we reiterated pertinent fundamentals. Here is the re-written article
-- try to apply the fundamentals from Part 1 before you proceed with our debunking.

Sunday, March 1, 2020

Muddling Modeling Part 1: Fundamentals

“Data modelling, star schema, snow flakes, data vault. Implementing virtual data warehouses (many stage to modify relationships). Normalisation (using a lot of surrogate keys) all for the sake of business reporting analytics. Reason a SQL DBMS approach columns rows is mandatory.”

This recent "comment" reminded me of a decades-old article I published in response to a critique by David Hay of the "fact model" then newly proposed by Ron Ross as an "alternative to the data model". In a Letter to the Editor, Hay correctly observed:
“In our industry, there is a strong desire to put names on things. This is natural enough, given the amount of information that we have to classify and deal with in our work. To give something a name is to gain control over it, and this is not necessarily a bad thing. The problem is when the name takes the place of true understanding of the thing named. Discourse tends to be the bantering of names, without true understanding of the concepts involved.”
of which the above comment is an exquisite example.

Friday, February 14, 2020

TFYK: What Is a Relational Schema?

Note: About TYFK posts (Test Your Foundation Knowledge) see the post insert below.
“A relational database stores information in a structured format called a schema. This schema is defined according the rules of database normalization. These rules are meant to ensure the integrity of the data. The schema for a database is broken up into the objects such as tables and constraints. Tables hold your data and are broken down into rows. each row represents a single entity such as a person and has columns which define the attributes of the entity such as age. Constraints define limitations around the data. For example a check constraint might limit the range of valid dates in a datetime column. From there queries can be run to extract data from the database. These queries will often join multiple tables to pull data from them.”
First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

Wednesday, February 5, 2020

TYFK: What is a database relationship?

 Note: About TYFK posts (Test Your Foundation Knowledge) see the post insert below.

“Here two or more table are related with each other. This is Database relationship. Database relationship is used a lot. Also known as relational database management system. It is shortly called RDBMS. Here is Join_data table and Interview_data table. For creating a relational database management system both of the table must have a common field. Here Employee_ID is a common field.
Database relationship types:
01. One-To-One relation
02. One-To-many relation
03. Many-to-many relation.
01. Minimum one common field is essential in all the tables. The data type of common field and field size will be same in all the tables.
02. There must be a primary key field in any of the table from all the tables.”

First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

Thursday, January 30, 2020

TYFK: What Is a Relational Database?

“RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access. RDBMS store the data into collection of tables, which might be related by common fields (database table columns). RDBMS also provide relational operators to manipulate the data stored into the database tables. An important feature of RDBMS is that a single database can be spread across several tables. This differs from flat-file databases, in which each database is self-contained in a single table. The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:
  • Data is stored in tables called relations.
  • Relations can be normalized.
  • In normalized relations, values saved are atomic values.
  • Each row in a relation contains a unique value.
  • Each column in a relation contains values from a same domain.”

The question got 18 answers online, but none came even close to being correct. This is the only one that merits debunking -- the rest will be posted on LinkedIn as "To laugh or cry?".

Note: While the question is about database, due to routine interchangeable use of database and DBMS, we suspect the intention was DBMS. Our debunking applies to database, and our correct answer makes the proper distinction.

First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our posts, BOOKS, PAPERS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.


Up to 2018, DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. In 2018 that website was discontinued. The content of this site is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. Thank you.

The industry is chockful of misconceptions due to lack of foundation knowledge. Corrections them are dismissed as "theory that is not practical", misinterpreted as "ad-hominem attacks", or ignored altogether, regardless of the amount and quality of reasoning and supporting evidence. Most practitioners -- be it user or vendor personnel -- cannot discern fallacies and do not realize the practical implications thereof and, thus, cannot associate problems with their real causes., hence the industry's "cookbook approach" and succession of fads.
What about you? Are you just a practitioner, or a thinking professional? 
TYFK (Test Your Foundation Knowledge) posts will each present and debunk a pronouncement containing one or more misconceptions. First try to detect them, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

  • 01/14/20 Updated the LINKS page
  • 01/04/20 Updated the POSTS page with the 2020 posts
  • 12/08/19 Added two educational references on set theory to the LINKS page.


  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search. 
  • Following the discontinuation of AllAnalytics site, the links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.


I deleted my Facebook account. You can follow me:
  • @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
  • @The PostWest blog: Evidence for Antisemitism/AntiZionism – the only universally acceptable hatred – as the (traditional) response to the existential crisis of decadence and decline of Western (including the US)
  • @ThePostWest Twitter page where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.


The Misconceptions

  • While SQL was initially intended as a relational data sublanguage, it is riddled with violations of the RDM, it is not just a data sublanguage, and is a poorly designed language. SQL DBMSs are not, therefore, RDBMSs.
  • True relational databases consist of relations -- a special kind of mathematical set adapted for and applied to database management; database relations can be visualized on physical media as R-tables (tuples and attributes as rows and columns) -- not to be confused with SQL tables -- that play no part in the RDM. Neither RDBMSs, nor SQL DBMSs store data as tables.
  • Database relations are logical model objects, "flat files" are physical implementation objects. Using them interchangeably confuses levels of representation referred to as logical-physical confusion (LPC). The RDM supports physical independence (PI): a database relation can be implemented as one or more multiple file(s), multiple relations can be implemented as one file.
  • Relational databases represent several types of real world relationships as constraints on single and multiple relations. The conventional wisdom to date was that one type -- M:1 relationships among entity groups -- are represented by referential constraints (not "common columns"), while M:N relationships are represented by association relations and referential constraints on both referencing and referenced relations. According to a new understanding of the RDM, all such relationships -- both M:1 and M:N, 1:1 being just a special case thereof) -- are represented by association relations and corresponding referential constraints.
  • The RDM is an adaptation of simple set theory (SST) expressible in first order predicate logic (FOPL) applied to database management, and it is this formal theoretical foundation that confers scientific status. Among multiple practical advantages is system-guaranteed logical validity and by-design semantic consistency.
  • The data model concept is poorly understood in the industry. The RDM is the only formally well defined and complete data model, but it is routinely confused with conceptual, logical, and physical models, all of which are referred to as "data models". When practitioners refer to RDM, they mean SQL, which has no formally well-defined, complete, or relational data model.
  • As already mentioned, the RDM does not "define a table as a n-ary relation". According to the new understanding of the RDM database relations are both normalized (in 1NF) and fully normalized (in 5NF) by definition, otherwise they are not relations. For all the relational advantages to materialize database design must adhere to three formal design principles that jointly imply full normalization. Explicit normalization and full normalization are necessary only to repair poor designs due to failure to adhere to these principles.
  • 1NF means that relations have attributes defined on simple domains, the values of which are treated by a relational data sublanguage as atomic. Each attribute is a simple function (1:1 mapping) of the domain.
  • Each tuple of a relation is a set of values -- each drawn from a domain -- at least one of which is a PK value that uniquely identifies the tuple within the relation.

The Correct Answer

A relational database is one designed to adhere to the three core design principles required by the RDM [1] that jointly imply full normalization [2]. 

For relational advantages -- foremost among them correctness (system-guaranteed logical validity and by-design semantic consistency) [3] to materialize, relational databases must be managed by a RDBMS that implements correctly and fully the RDM introduced by Codd as re-interpreted and formalized by McGoveran [4,5]. 

A relational system is comprised of a RDBMS and the relational databases it manages.

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


[1] Pascal, F., Database Design: What It Is and Isn't

[2] Pascal, F., Normalization and Further Normalization, Parts 1-2

[3] Pascal, F., Logical Validity and Semantic Correctness

[4] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK (draft chapters), forthcoming

[5] Pascal, F., Re-interpreting Codd

Friday, January 24, 2020

Naming Relations: Singular or Plural?

Revised 1/24/20.
“There is a lot of confusion when it comes to designing tables in SQL Server around whether to pluralize names or not. How do you choose whether to pluralize or not? If we want to store a list of people and their details do we use "Person", "Persons", "People" or "Peoples"? Some people will use "People" and some will use "Person", other persons or people would go for "Peoples" or "Persons". The defined standard is to go for non-plural because in a table we are storing a set of an entity and we name the table as the entity so if we want to store one or more people in a single entity or table, we store it or them in the “Person” table. If we stick to this then it makes other situations simpler and stops us having to think about how to pluralize a word, I have for example seen hierarchy pluralized as "hierarcys" [sic].

If we look at Relational Model of Data Large Shared Data Banks by none other than "E. F. Codd" who basically invented the relational database, the examples he gives are singular (supplier and component). If we then look at the ISO standard for naming things (11179-5: Naming and identification principles), this also says that singular names should be used "Nouns are used in singular form only".

For new projects or where you can easily change the name of entities then I would say you must use singular names, for older projects you’ll need to be a bit more pragmatic!”


Ignoring, for the purposes of this discussion, that a SQL table is not a relation[1], and we don't "store a list of an entity set" in it[2], naming relations involves two choices: (1) the name per se (person, people?), and (2) singular or plural (people or peoples? person or persons)? The former is determined at the conceptual level by the enterprise's business terminology[3]. While the RDM is mute on the latter, nevertheless foundation knowledge (here, what relations represent) is, as always, relevant.

View My Stats