Sunday, June 28, 2020

TYFK: Misconceptions About the Relational Model


“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:
  • Data is stored in tables called relations.
  • Relations can be normalized, [in which case] values saved are atomic values.
  • Each row in a relation contains a unique value.
  • Each column in a relation contains values from a same domain.”

Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

Friday, June 12, 2020

Semantics and the Relational Model

“The RDM is semantically weak ... struggles with consistent granularity and has limitations at the property level... it has no concept of data flow ... it is an incomplete theory. Great for its time but needs something better now ... it uses ill defined and linguistically suspect labels ... it has no rules for semantic accuracy ... this just makes the RDM 1% of the truth ... the RDM should have solved this all by now ... but it has clearly not. You fail to see the reality of the failure of RDM in the real world ... this is your choice. I understand why you cling to it ... it is a most excellent theory that I respect greatly ... [but o]pen minds make progress...” 
Thus in a LinkedIn exchange. Criticism of the RDM almost always reflects poor foundation knowledge and lack of familiarity with the history of the field and, as we shall see, this one is no different. It is often triggered by what I call the "fad-to-fad cookbook approach", one of the latest fads being the industry's revelational "discovery" of semantics.

Thursday, May 28, 2020

No Such Thing As "Current Relational Data Models"

“... the concept of a state group is indeed a missing modeling concept in relational/current data models...”

Thus in a LinkedIn exchange. I don't know what a "state group" is, but I have spent almost six decades debunking misuses of "data model" in general and abuses of the RDM in particular, and I smell them from miles away. While the time when lack of foundation knowledge shocked me is long gone, practitioners' total unawareness of and indifference to it, and their poor reasoning in a field founded on logic, never cease to amaze me.

What exactly are "relational/current data models"?

Sunday, May 10, 2020

TYFK: What Is A Database Relationship?

Note: This is a re-write of an earlier post. About TYFK posts (Test Your Foundation Knowledge) see the post insert below.

“Here two or more table[s] are related with each other. This is Database relationship. Database relationship is used a lot ... [in] relational database management systems ... shortly called RDBMS. Here is Join_data [sic] table and Interview_data table. For creating a relational database management system both of the table[s] must have a common field. Here Employee_ID is a common field ... Database relationship types: One-To-One relation, One-To-many relation, Many-to-many relation. Minimum one common field is essential in all the tables. The data type of common field and field size will be same in all the tables.”
First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.


Monday, April 27, 2020

TYFK: "Multi-model DBMSs" is an Empty Set


Note: About TYFK posts (Test Your Foundation Knowledge) see the post insert below.
“Traditional databases ... don't have a multi-model capability. Point is that richer data models are underused, relational data models are overused, and graph data models have so many advantages that shouldn't be ignored. Relational models, on the other hand, have wildly complex structures often with hundreds to thousands of tables. Each table then contains tens to hundreds of columns, arbitrarily constructed in each and every relational system. And just in case the situation wasn't complex enough, many of those columns are exist exclusively to manage uniqueness and provide connections to other tables. This Structure-FIrst approach produced the cascade of complexity from which we have struggled to recover ever since.”


Monday, March 23, 2020

TYFK: How (Not) to Compare NoSQL Systems and RDBMSs?


Note: About TYFK posts (Test Your Foundation Knowledge) see the post insert below.
“But if you still want to compare NOSQL databases with RDBMS, they primarily vary in
1. "normalization" where RDBMS contains normalized (upto certain degree) data and NOSQL based database contains non-normalized data;
2. RDBMS based databases are (I MUST say, generally and it isn't a criteria) fully ACID compliant while NOSQL databases are partially ACID compliant.
3. RDBMS are much slower and difficult to scale while NOSQL databases are much faster and easily scalable.
4. RDBMS normalization was very useful 50 years ago when cost of disk and memory was high, and computation power was limited. With the revolution in computing power, cheapest disk and memory availability has made RDBMS normalization a matter of joke - many people do not really understand why they need to normalize data in today's time.”

Note: In what follows RDBMS refers to a truly relational DBMS (of which there currently aren't any), not to be confused with a SQL DBMS.

Thursday, March 12, 2020

Muddling Modeling Part 2: An Example


In an old article I used a Hay-Ross exchange to illustrate how disregard for fundamentals and the associated name proliferation -- which underlies the industry's fad-to-fad tradition -- cause confusion that inhibits understanding of conceptual modeling for database design. A recent LinkedIn exchange -- hardly unique -- showed the article to be as relevant today as it was two decades ago, prompting me to bring it up to date.

In Part 1 we reiterated pertinent fundamentals. Here is the re-written article -- try to apply the fundamentals from Part 1 before you proceed with our debunking.

Sunday, March 1, 2020

Muddling Modeling Part 1: Fundamentals


“Data modelling, star schema, snow flakes, data vault. Implementing virtual data warehouses (many stage to modify relationships). Normalisation (using a lot of surrogate keys) all for the sake of business reporting analytics. Reason a SQL DBMS approach columns rows is mandatory.”
--LinkedIn

This recent "comment" reminded me of a decades-old article I published in response to a critique by David Hay of the "fact model" then newly proposed by Ron Ross as an "alternative to the data model". In a Letter to the Editor, Hay correctly observed:
“In our industry, there is a strong desire to put names on things. This is natural enough, given the amount of information that we have to classify and deal with in our work. To give something a name is to gain control over it, and this is not necessarily a bad thing. The problem is when the name takes the place of true understanding of the thing named. Discourse tends to be the bantering of names, without true understanding of the concepts involved.”
of which the above comment is an exquisite example.

Friday, February 14, 2020

TYFK: What Is a Relational Schema?

Note: About TYFK posts (Test Your Foundation Knowledge) see the post insert below.
“A relational database stores information in a structured format called a schema. This schema is defined according the rules of database normalization. These rules are meant to ensure the integrity of the data. The schema for a database is broken up into the objects such as tables and constraints. Tables hold your data and are broken down into rows. each row represents a single entity such as a person and has columns which define the attributes of the entity such as age. Constraints define limitations around the data. For example a check constraint might limit the range of valid dates in a datetime column. From there queries can be run to extract data from the database. These queries will often join multiple tables to pull data from them.”

Thursday, January 30, 2020

TYFK: What Is a Relational Database?

“RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access. RDBMS store the data into collection of tables, which might be related by common fields (database table columns). RDBMS also provide relational operators to manipulate the data stored into the database tables. An important feature of RDBMS is that a single database can be spread across several tables. This differs from flat-file databases, in which each database is self-contained in a single table. The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:
  • Data is stored in tables called relations.
  • Relations can be normalized.
  • In normalized relations, values saved are atomic values.
  • Each row in a relation contains a unique value.
  • Each column in a relation contains values from a same domain.”

The question got 18 answers online, but none came even close to being correct. This is the only one that merits debunking -- the rest will be posted on LinkedIn as "To laugh or cry?".


Note: While the question is about databases, "database" and "DBMS" are routinely used interchangeably, so we suspect that DBMS was intended. Our debunking applies to databases, and our correct answer makes the proper distinction.

First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our posts, BOOKS, PAPERS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

Friday, January 24, 2020

Naming Relations: Singular or Plural?

Revised 1/24/20.
“There is a lot of confusion when it comes to designing tables in SQL Server around whether to pluralize names or not. How do you choose whether to pluralize or not? If we want to store a list of people and their details do we use "Person", "Persons", "People" or "Peoples"? Some people will use "People" and some will use "Person", other persons or people would go for "Peoples" or "Persons". The defined standard is to go for non-plural because in a table we are storing a set of an entity and we name the table as the entity so if we want to store one or more people in a single entity or table, we store it or them in the “Person” table. If we stick to this then it makes other situations simpler and stops us having to think about how to pluralize a word, I have for example seen hierarchy pluralized as "hierarcys" [sic].

If we look at Relational Model of Data Large Shared Data Banks by none other than "E. F. Codd" who basically invented the relational database, the examples he gives are singular (supplier and component). If we then look at the ISO standard for naming things (11179-5: Naming and identification principles), this also says that singular names should be used "Nouns are used in singular form only".

For new projects or where you can easily change the name of entities then I would say you must use singular names, for older projects you’ll need to be a bit more pragmatic!”

--The.AgileSQL.Club

Ignoring, for the purposes of this discussion, that a SQL table is not a relation[1], and we don't "store a list of an entity set" in it[2], naming relations involves two choices: (1) the name per se (person or people?), and (2) singular or plural (person or persons? people or peoples?). The former is determined at the conceptual level by the enterprise's business terminology[3]. While the RDM is mute on the latter, foundation knowledge (here, what relations represent) is, as always, relevant.
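
To make the first caveat concrete, here is a minimal sketch of one commonly cited difference between a SQL table and a relation: a keyless SQL table accepts duplicate rows, while a relation, being a set of tuples, cannot contain them. It is not meant to reproduce the argument in [1]. The person table, its columns, the sample rows, and both helper functions are hypothetical, chosen only for illustration; SQLite stands in for a SQL DBMS.

```python
import sqlite3

# Hypothetical sample rows (illustrative only); note the repeated row.
rows = [("Ann", 34), ("Bob", 51), ("Ann", 34)]


def sql_table_row_count(rows):
    """Load rows into a keyless SQL table and return how many it holds."""
    conn = sqlite3.connect(":memory:")
    try:
        # No key is declared, so the SQL DBMS accepts duplicate rows.
        conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
        conn.executemany("INSERT INTO person VALUES (?, ?)", rows)
        return conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    finally:
        conn.close()


def relation_body(rows):
    """Model a relation body as a set of tuples, so duplicates cannot occur."""
    return frozenset(rows)


table_count = sql_table_row_count(rows)
relation = relation_body(rows)

print(table_count)    # 3: the SQL table keeps both copies of ("Ann", 34)
print(len(relation))  # 2: the set holds each tuple once
```

The set is only a rough stand-in for a relation (it ignores attribute names and domains), but it is enough to show why the two should not be equated.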

Wednesday, January 22, 2020

TYFK: “Why is a relational database so powerful?”


Note: About TYFK posts (Test Your Foundation Knowledge) see the post insert below.
“...the theoretical awesomeness of relational algebra is kinda hard to intuitively relate back to your payroll or audit-log tables - the real power is the computed join ... it lets you dynamically fetch sets of data in the exact format that you need ... with any group of tables in the dataset. Unlike other data models, where the things you can fetch are typically fixed when you define your elements, and where relationships between data - if any - are statically defined in advance ... joins let you specify the relationships between objects (rows and tables in SQL-based relational databases) ... create queries and run them on your data without needing to write a lot of extra code beyond the SQL itself. This Ad Hoc Query capability ends up being hugely valuable when doing "secondary" business tasks in a big data world such as doing reporting and analytics, and is often hugely difficult to do in non-relational environments without a lot of extra code and often a specialized reporting schema.”

“Relational theory as applied to databases is that all data is connected to each other, keyed to each piece ... And with a SQL query [you can] create anything that can exist, as output.”

“...main so called power of RDBMs lies within ACID compliance. A transaction in a RDB is Atomic, Consistent, isolated, durable ... makes a database useful, unique, or I suppose powerful ... let's say you update or insert some new record, right in the middle power goes out, due to ACID compliance your transaction will not go through ... either the operation will complete or fail, nothing in between. [And RDBMSs are] tried, tested and true for almost 50 years.”

First try to detect the misconceptions, then check against our debunking (some of the above is correct per se but not directly relevant to relational power, some is only partially correct, and some is wrong; can you discern which is which?). If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

Wednesday, January 15, 2020

TYFK: "What is better than relational databases?"

Note: About TYFK posts (Test Your Foundation Knowledge) see the post insert below.
“Relational databases like MySQL, PostgreSQL and SQLite3 represent and store data in tables and rows. They're based on a branch of algebraic set theory known as relational algebra. Meanwhile, non-relational databases like MongoDB represent data in collections of JSON documents. The Mongo import utility can import JSON, CSV and TSV file formats. Mongo query targets of data are technically represented as BSON (binary JASON).

“Relational databases use Structured Querying Language (SQL), making them a good choice for applications that involve the management of several transactions. The structure of a relational database allows you to link information from different tables through the use of foreign keys (or indexes), which are used to uniquely identify any atomic piece of data within that table. Other tables may refer to that foreign key, so as to create a link between their data pieces and the piece pointed to by the foreign key. This comes in handy for applications that are heavy into data analysis.

“If you want your application to handle a lot of complicated querying, database transactions and routine analysis of data, you’ll probably want to stick with a relational database. And if your application is going to focus on doing many database transactions, it’s important that those transactions are processed reliably. This is where ACID (the set of properties that guarantee database transactions are processed reliably) really matters, and where referential integrity comes into play.”

Friday, January 3, 2020

Science, "Data Science", and Database Science


“The foundation of modern database technology is without question the relational model; it is the foundation that makes the field a science.”
--C. J. Date, AN INTRODUCTION TO DATABASE SYSTEMS
“Over the past decades mainstream economics in universities has become increasingly mathematical, focusing on complex statistical analyses and modeling to the detriment of the observation of reality.”
--J. Luyendijk, Don’t let the Nobel Prize fool you, economics is not a science

Science is the formulation and validation of theories about the real world in the context of discovery (CoD) and context of validation (CoV), respectively. There is "hard" science -- theories about the physical world (physics, chemistry, biology) -- and "soft" science -- theories about human behavior (political science, economics, psychology). All science uses data, initially only in the CoV, but increasingly also in the CoD -- computerized discovery of patterns as potential hypotheses (i.e., "data mining").
