Sunday, April 11, 2021

(TYFK) Relations, Tables, and Semantic Consistency

Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).


“A relation, or table, in a relational database ... must have a set of columns or attributes, and it must have a set of rows to contain the data. A tuple (or row) can be a duplicate. In practice, a database might actually contain duplicate rows, but there should be practices in place to avoid this, such as the use of unique primary keys (next up). Given that a tuple cannot be a duplicate, it follows that a relation must contain at least one attribute (or column) that identifies each tuple (or row) uniquely. This is usually the primary key. This primary key cannot be duplicated.”

 

Misconceptions

 
  • A relation is not a table and, thus, has neither columns, nor rows (certainly not fields);
  • "Duplicate tuples" is a contradiction in terms -- a table with duplicate rows does not visualize a relation (i.e., is not a R-table) -- and a database with duplicated data is not relational;
  • Unlike a mathematical relation, there is no such thing as a database relation without a PK, which would be semantically inconsistent (i.e. it would not be a faithful representation of group of entities, which are distinguishable in the real world);
 

Fundamentals

 

  • A database relation:
- is a relationship among domains (sets of values) -- a subset of their cross-product -- a set of tuples (sets of values drawn from the domains) or, in other words, a set of sets from sets;
- has attributes, which are representations (1:1 mappings) of the domains in the relation;
- is semantically constrained to be consistent with (i.e., represent faithfully) the entity group in the conceptual model represented by the database), including by a PK constraint:
. domains represent properties;
. relations represent entity groups;
. attributes represent entity properties;
. tuples represent (facts about -- properties of) entities;
. some constraints represent relationships among properties, entities and groups (some properties are relationships);
  • Duplication would violate the RDM and mean semantic inconsistency with (inaccurate representation of) the group.
  • A R-table visualizes a relation on some physical medium -- it plays no part in the relational model.


Further Reading


What Relations Really Are and Why They Are Important

Understanding Relations series

What Is a Relational Database




Thursday, March 25, 2021

(OBG) Don't Confuse Levels of Representation Part 1

Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.

This is an email exchange with readers in response to my article Normalization and Performance: Never the Twain Shall Meet.

Friday, March 19, 2021

Data Sublanguages vs. Programming Languages

Revised 3/20/21

I recently came across a review of Edsger Dijkstra's work, where the following comment on a book he co-authored (referred to as D&S) raised my debunking antennae:

“... in general computer people seem to have a penchant for whipping up homebrew logics ... D&S is not the only example ... See E.F. Codd’s Relational Calculus, an obvious mess.”
--Maarten van Emden, A Bridge too Far: E.W. Dijkstra and Logic 

Having recently argued that "Codd was wrong" and "You're teaching [his] gospel" Betray Lack of Foundation Knowledge, my suspicion should hardly surprise. Besides, criticism of Dijkstra is a very tall order in itself, particularly in the context of disputes among logicians). As a reader asked, "What’s so obviously messy in Codd’s Relational Calculus?". Answer:

Friday, March 12, 2021

(TYFK) Tables Can Visualize -- But ARE NOT -- Relations!

Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).


“A relation, or table, in a relational database ... must have a set of columns or attributes, and it must have a set of rows to contain the data. A tuple (or row) can be a duplicate. In practice, a database might actually contain duplicate rows, but there should be practices in place to avoid this, such as the use of unique primary keys (next up). Given that a tuple cannot be a duplicate, it follows that a relation must contain at least one attribute (or column) that identifies each tuple (or row) uniquely. This is usually the primary key. This primary key cannot be duplicated.”

Misconceptions


  • A relation is not a table and, thus, has neither fields, nor columns (which are not attributes) and rows;
  • "Duplicate tuples" is a contradiction in terms -- a table with duplicate rows does not visualize a relation (i.e., is not a R-table) -- and a database with duplicated data is not relational;
  • Without PKs a relation is not semantically consistent (i.e. it does not faithfully represent group of entities, which are distinguishable in the real world);


Fundamentals


  • A database relation

- is a relationship among domains (sets of values) -- a subset of their cross-product -- a set of tuples (sets of values drawn from the domains) or, in other words, a set of sets from sets;
- has attributes, which are representations (1:1 mappings) of the domains in the relation;
- is semantically constrained to be consistent with (i.e., represent faithfully) the entity group in the conceptual model it represents in the database.

  • A R-table visualizes a relation on some physical medium -- it plays no part in the relational model.
  • Absence of PKs is semantic inconsistency with (inaccurate representation of) reality and a violation of the RDM.



Further Reading

What Relations Really Are and Why They Are Important

Understanding Relations series

What Is a Relational Database

Duplicates: Stating the Same Fact More Than Once Does Not Make it Truer, Only Redundant





Wednesday, February 24, 2021

(OBG) Third Order Properties and Multi-Tuple Constraints: An Example

Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).


As part of the new understanding of the RDM we posted articles -- one last week -- about the types of properties and relationships at the conceptual level that are enforced via semantic constraints at the logical database level. One category of relationships exist among all members of an entity group, which are collective third order properties (3OP) of the group, enforced via multi-tuple constraints. There are at least two kinds of 3OP relationships: entity uniqueness, enforced via PK constraints and aggregate restriction, enforced via aggregation constraints. Practitioners are familiar with -- even if they do not necessarily have a full understanding of -- the former, but not so much with the latter. It so happens that they were the subject of an exchange between a reader of the old dbdebunk and C.J. Date. It is worth re-visiting as an example and, with the benefit of hindsight, to add some comments on re-publication.

Friday, February 19, 2021

(TYFK) Semantics, Relations and the Missed Link: Constraints

Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to our debunking thereof. If there isn't a match, you can review references -- which reflect the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS and LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“[As a] set [a database relation] is a collection of similar or related things.”

--ArtfulSoftware.com


Can you tell what's wrong with this statement (hint: one word is wrong)? If not, it is because it is impossible without the old industry interpretation of the RDM.

Friday, February 12, 2021

(TYFK) What Is a Relational Database and Why Is It Important?

Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:
  • Data is stored in tables called relations.
  • Relations can be normalized.
  • In normalized relations, values saved are atomic values.
  • Each row in a relation contains a unique value.
  • Each column in a relation contains values from a same [sic] domain.”
--What is a relational database and why is it important, Quora.com
View My Stats