Thursday, January 30, 2020

TYFK: What Is a Relational Database?

“RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access. RDBMS store the data into collection of tables, which might be related by common fields (database table columns). RDBMS also provide relational operators to manipulate the data stored into the database tables. An important feature of RDBMS is that a single database can be spread across several tables. This differs from flat-file databases, in which each database is self-contained in a single table. The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:
  • Data is stored in tables called relations.
  • Relations can be normalized.
  • In normalized relations, values saved are atomic values.
  • Each row in a relation contains a unique value.
  • Each column in a relation contains values from a same domain.”

The question got 18 answers online, but none came even close to being correct. This is the only one that merits debunking -- the rest will be posted on LinkedIn as "To laugh or cry?".


Note: While the question is about database, due to routine interchangeable use of database and DBMS, we suspect the intention was DBMS. Our debunking applies to database, and our correct answer makes the proper distinction.

First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our posts, BOOKS, PAPERS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

------------------------------------------------------------------------------------------------------------------
SUPPORT THIS SITE

Up to 2018, DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. In 2018 that website was discontinued. The content of this site is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. Thank you.

DATA FUNDAMENTALS 
 
The industry is chockful of misconceptions due to lack of foundation knowledge. Corrections them are dismissed as "theory that is not practical", misinterpreted as "ad-hominem attacks", or ignored altogether, regardless of the amount and quality of reasoning and supporting evidence. Most practitioners -- be it user or vendor personnel -- cannot discern fallacies and do not realize the practical implications thereof and, thus, cannot associate problems with their real causes., hence the industry's "cookbook approach" and succession of fads.
What about you? Are you just a practitioner, or a thinking professional? 
TYFK (Test Your Foundation Knowledge) posts will each present and debunk a pronouncement containing one or more misconceptions. First try to detect them, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

NEW
  • 01/14/20 Updated the LINKS page
  • 01/04/20 Updated the POSTS page with the 2020 posts
  • 12/08/19 Added two educational references on set theory to the LINKS page.

LATEST PUBLICATIONS (order PAPERS and BOOKS)

USING THIS SITE
  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search. 
  • Following the discontinuation of AllAnalytics site, the links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

SOCIAL MEDIA  

I deleted my Facebook account. You can follow me:
  • @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
  • @The PostWest blog: Evidence for Antisemitism/AntiZionism – the only universally acceptable hatred – as the (traditional) response to the existential crisis of decadence and decline of Western (including the US)
  • @ThePostWest Twitter page where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

------------------------------------------------------------------------------------------------------------------

The Misconceptions


  • While SQL was initially intended as a relational data sublanguage, it is riddled with violations of the RDM, it is not just a data sublanguage, and is a poorly designed language. SQL DBMSs are not, therefore, RDBMSs.
  • True relational databases consist of relations -- a special kind of mathematical set adapted for and applied to database management; database relations can be visualized on physical media as R-tables (tuples and attributes as rows and columns) -- not to be confused with SQL tables -- that play no part in the RDM. Neither RDBMSs, nor SQL DBMSs store data as tables.
  • Database relations are logical model objects, "flat files" are physical implementation objects. Using them interchangeably confuses levels of representation referred to as logical-physical confusion (LPC). The RDM supports physical independence (PI): a database relation can be implemented as one or more multiple file(s), multiple relations can be implemented as one file.
  • Relational databases represent several types of real world relationships as constraints on single and multiple relations. The conventional wisdom to date was that one type -- M:1 relationships among entity groups -- are represented by referential constraints (not "common columns"), while M:N relationships are represented by association relations and referential constraints on both referencing and referenced relations. According to a new understanding of the RDM, all such relationships -- both M:1 and M:N, 1:1 being just a special case thereof) -- are represented by association relations and corresponding referential constraints.
  • The RDM is an adaptation of simple set theory (SST) expressible in first order predicate logic (FOPL) applied to database management, and it is this formal theoretical foundation that confers scientific status. Among multiple practical advantages is system-guaranteed logical validity and by-design semantic consistency.
  • The data model concept is poorly understood in the industry. The RDM is the only formally well defined and complete data model, but it is routinely confused with conceptual, logical, and physical models, all of which are referred to as "data models". When practitioners refer to RDM, they mean SQL, which has no formally well-defined, complete, or relational data model.
  • As already mentioned, the RDM does not "define a table as a n-ary relation". According to the new understanding of the RDM database relations are both normalized (in 1NF) and fully normalized (in 5NF) by definition, otherwise they are not relations. For all the relational advantages to materialize database design must adhere to three formal design principles that jointly imply full normalization. Explicit normalization and full normalization are necessary only to repair poor designs due to failure to adhere to these principles.
  • 1NF means that relations have attributes defined on simple domains, the values of which are treated by a relational data sublanguage as atomic. Each attribute is a simple function (1:1 mapping) of the domain.
  • Each tuple of a relation is a set of values -- each drawn from a domain -- at least one of which is a PK value that uniquely identifies the tuple within the relation.


The Correct Answer


A relational database is one designed to adhere to the three core design principles required by the RDM [1] that jointly imply full normalization [2]. 


For relational advantages -- foremost among them correctness (system-guaranteed logical validity and by-design semantic consistency) [3] to materialize, relational databases must be managed by a RDBMS that implements correctly and fully the RDM introduced by Codd as re-interpreted and formalized by McGoveran [4,5]. 

A relational system is comprised of a RDBMS and the relational databases it manages.



Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.
 

References

[1] Pascal, F., Database Design: What It Is and Isn't

[2] Pascal, F., Normalization and Further Normalization, Parts 1-2

[3] Pascal, F., Logical Validity and Semantic Correctness

[4] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK (draft chapters), forthcoming

[5] Pascal, F., Re-interpreting Codd



Friday, January 24, 2020

Naming Relations: Singular or Plural?

Revised 1/24/20.
“There is a lot of confusion when it comes to designing tables in SQL Server around whether to pluralize names or not. How do you choose whether to pluralize or not? If we want to store a list of people and their details do we use "Person", "Persons", "People" or "Peoples"? Some people will use "People" and some will use "Person", other persons or people would go for "Peoples" or "Persons". The defined standard is to go for non-plural because in a table we are storing a set of an entity and we name the table as the entity so if we want to store one or more people in a single entity or table, we store it or them in the “Person” table. If we stick to this then it makes other situations simpler and stops us having to think about how to pluralize a word, I have for example seen hierarchy pluralized as "hierarcys" [sic].

If we look at Relational Model of Data Large Shared Data Banks by none other than "E. F. Codd" who basically invented the relational database, the examples he gives are singular (supplier and component). If we then look at the ISO standard for naming things (11179-5: Naming and identification principles), this also says that singular names should be used "Nouns are used in singular form only".

For new projects or where you can easily change the name of entities then I would say you must use singular names, for older projects you’ll need to be a bit more pragmatic!”

--The.AgileSQL.Club

Ignoring, for the purposes of this discussion, that a SQL table is not a relation[1], and we don't "store a list of an entity set" in it[2], naming relations involves two choices: (1) the name per se (person, people?), and (2) singular or plural (people or peoples? person or persons)? The former is determined at the conceptual level by the enterprise's business terminology[3]. While the RDM is mute on the latter, nevertheless foundation knowledge (here, what relations represent) is, as always, relevant.

Wednesday, January 22, 2020

TYFK: “Why is a relational database so powerful?”


 Note: About TYFK posts (Test Your Foundation Knowledge) see the post insert below.
“...the theoretical awesomeness of relational algebra is kinda hard to intuitively relate back to your payroll or audit-log tables - the real power is the computed join ... it lets you dynamically fetch sets of data in the exact format that you need ... with any group of tables in the dataset. Unlike other data models, where the things you can fetch are typically fixed when you define your elements, and where relationships between data - if any - are statically defined in advance ... joins let you specify the relationships between objects (rows and tables in SQL-based relational databases) ... create queries and run them on your data without needing to write a lot of extra code beyond the SQL itself. This Ad Hoc Query capability ends up being hugely valuable when doing "secondary" business tasks in a big data world such as doing reporting and analytics, and is often hugely difficult to do in non-relational environments without a lot of extra code and often a specialized reporting schema.”

“Relational theory as applied to databases is that all data is connected to each other, keyed to each piece ... And with a SQL query [you can] create anything that can exist, as output.”

“...main so called power of RDBMs lies within ACID compliance. A transaction in a RDB is Atomic, Consistent, isolated, durable ... makes a database useful, unique, or I suppose powerful ... let's say you update or insert some new record, right in the middle power goes out, due to ACID compliance your transaction will not go through ... either the operation will complete or fail, nothing in between. [And RDBMSs are] tried, tested and true for almost 50 years.”

First try to detect the misconceptions then check against our debunking (some of the above is correct per se but not directly relevant to relational power, some only partially correct, and some is wrong. Can you discern which is which?) If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

Wednesday, January 15, 2020

TYFK: "What is better than relational databases?"

Note: About TYFK posts (Test Your Foundation Knowledge) see the post insert below.
“Relational databases like MySQL, PostgreSQL and SQLite3 represent and store data in tables and rows. They're based on a branch of algebraic set theory known as relational algebra. Meanwhile, non-relational databases like MongoDB represent data in collections of JSON documents. The Mongo import utility can import JSON, CSV and TSV file formats. Mongo query targets of data are technically represented as BSON (binary JASON).

“Relational databases use Structured Querying Language (SQL), making them a good choice for applications that involve the management of several transactions. The structure of a relational database allows you to link information from different tables through the use of foreign keys (or indexes), which are used to uniquely identify any atomic piece of data within that table. Other tables may refer to that foreign key, so as to create a link between their data pieces and the piece pointed to by the foreign key. This comes in handy for applications that are heavy into data analysis.

“If you want your application to handle a lot of complicated querying, database transactions and routine analysis of data, you’ll probably want to stick with a relational database. And if your application is going to focus on doing many database transactions, it’s important that those transactions are processed reliably. This is where ACID (the set of properties that guarantee database transactions are processed reliably) really matters, and where referential integrity comes into play.”
First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

Friday, January 3, 2020

Science, "Data Science", and Database Science


“The foundation of modern database technology is without question the relational model; it is the foundation that makes the field a science.”
--C. J. Date, AN INTRODUCTION TO DATABASE SYSTEMS
“Over the past decades mainstream economics in universities has become increasingly mathematical, focusing on complex statistical analyses and modeling to the detriment of the observation of reality.”
-- J. Luyendijk, Don’t let the Nobel Prize fool you. Economics is not a science

Science is the formulation and validation of theories about the real world in the context of discovery (CoD) and context of validation (CoV), respectively. There is "hard" science -- theories about the physical world (physics, chemistry, biology) -- and "soft" science -- theories about human behavior (political, economics, psychology). All science uses data, initially only in the CoV, but increasingly also in the CoD -- computerized discovery of patterns as potential hypotheses (i.e., "data mining"). 

View My Stats