Thursday, August 4, 2022

DATABASE RELATIONS, TABLES AND SEMANTIC CONSISTENCY



by David McGoveran with Fabian Pascal

 

Note: In "Setting Matters Straight" posts I debunk online Q&As that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“In a RDBMS, a table is columned rows, as in you treat individual rows as an actual entity while the columns are its attributes. In an excel tab, you can create a column, but it doesn't have to have all the same data types in that column, nor does one row have to represent one entity. It's more free form ... All in all, RDB is relational because it's column based rows and constrained to that format, while non relational can have free form like an excel. When you have rows that are uniform (constrained to what the column should be), you create entities as tables, and link them through columns to keep track of the relationships.”
--Quora.com
I posted this on LinkedIn as one of my "To Laugh or Cry?" items which, unlike "Setting Matters Right" posts, are beyond debunking. But the exchange that followed made me realize that there is, nevertheless, pedagogical value to it: it expresses something important, but poorly due to author's lack of foundation knowledge.

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

LATEST POSTS

07/12 MISSING DATA AND MULTI-RELATION QUERY RESULTS (T&N)

07/03 Relations, Database Relations and Tables (SMS)

06/26 Repeating Groups and 1NF (T&N)

LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)
- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my AllAnalytics columns no longer work. I re-published only the 2017 columns @dbdebunk, and within them links to sources external to AllAnalytics may or may not work.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
------------------------------------------------------------------------------------------------------------------

Fundamentals

"The fact that relations can be perceived as tables ... breeds the false assumption that the freedom of actions permitted [with] tables ... must also be permitted when manipulating relations."
--E.F. Codd (1990)
A set has:

  • An intension: The definition of the type of objects set members are -- the description of the common properties/relationships that all members share.
  • An extension: The object members that satisfy the definition (i.e., share those properties/relationships).

Applying the intension to the universe of objects selects out the extension.

You can specify a set extensionally by exhibiting its members, or intensionally by stating the properties and relationships they share (including properties stated negatively as properties they may not have) that qualify them as members.

In RDM (correctly understood) a database relation is a set of tuples that represent a group of entities of a single type (specified as precisely and completely as possible). Group membership is defined by the properties and relationships shared by entities that qualify them as members. As such, a database relation:

  • Is in 5NF by definition: the only dependencies that hold in it are functional dependencies (FD) of non-key attributes on the PK;
  • Its intension is the relation predicate (RP), expressed as constraints in a FOPL based data sublanguage:

- domain constraints represent properties;`
- attribute constraints represent first order properties (1OP) (i.e., individual) of members of specific groups;
- tuple constraints represent relationships among 1OPs of each group member that are 2nd order  properties (2OPs) of a group's members;
- multi-tuple constraints represent relationships among a group's members that are 3rd order collective properties (3OP) of the group;
- assertion predicate (AP): assertion by an authorized user that a tuple does represent an entity in the real world.

  • Its extension consists of the tuples that, out of the universe thereof, satisfy the RP (i.e., the constraints) and therefore qualify for inclusion in the relation.

The extension is data, the intension is its meaning (semantics), hence semantic constraints: they restrict the data to be consistent with its intended meaning (i.e., the relation to faithfully represent the group).

Note:

  • Non-5NF relations that represent entities of multiple types would require more constraints, rendering RP and RA expressions way more complex.
  • An attribute is a representation of a domain in a relation. For example, the domain $AMOUNT (representing a property) can appear in relations EMPLOYEES and DEPARTMENTS as attributes SALARY and BUDGET, that represent properties of employee and department entities, respectively.
  • There are relationships among groups -- which form a multigroup -- that are collective fourth order properties (4OP) of the multigroup and, thus, multi-relation predicates (MRP) and constraints that, jointly with the RPs comprise the database predicate (DBP). For the purposes of this discussion we focus on individual relations.

While tables made relational databases seem simple (and so "visual"!), aiding initial learning and implementations, a table is an incomplete expression of a database relation that is semantically constrained to represent a group of entities: the body of a table represents the extension of a relation and data alone does not convey meaning (semantics). Otherwise put, a table by itself is an incomplete specification of a relation. For example, does the table:

 

represent a database relation (that is in 5NF)? If so, what does it mean (what does it represent)?
Is the projection {COURT,MEMBER,START} a relation and what does it mean?

Formally, what matters in RDM is the symbolic (FOPL) expression of predicates -- the RPs and MRPs comprising the DBP -- expressed in a FOPL based data sublanguage as constraints and declared explicitly to the DBMS. Only they (or some formal equivalent):

  • Are capable of fully defining relations and their database context -- data and meaning -- ; and,
  • For correctness, must first be manipulated algebraically and only then enforced (by the DBMS) on  relations' extensions (their tuples).

Without predicates no database can be created correctly, or shown to be correct. And unless RA operations preserve all their semantics (including dependencies, but without introducing spurious ones), semantic consistency across operations cannot be guaranteed -- by definition.

If 5NF is to be meaningful, all relationships (including non-functional dependencies) must be expressed intensionally such that, from an implementation perspective, semantic constraint enforcement is integrated with RA operations by the DBMS when applied to relations.

Notwithstanding his warning, Codd nevertheless defined RA operations as sheer manipulation of tables (extensions), rather than as algebraic operations on RPs (intensions), which has been adopted by Date and has been the industry (albeit imperfect, to put it mildly) practice to date. This is the source of "update anomalies", a mislabeling of what is essentially information (semantic) loss that demands (otherwise unnecessary) further normalization.

This is not to say that the RDM as introduced by Codd (and even SQL) was not a huge improvement over pre-relational approaches. It is, nevertheless, the reason that the first author has spent so much (difficult to find) time over the past three decades developing a semantic relational algebra (SRA) with semantic closure. That formal work began with the introduction of RPs and MRPs comprising a DBP, uniform algorithms for relation (including view) updating and new database design principles.

Setting Matters Straight

Going back to the online comment, aside from confusion of levels of representation, it exhibits  the common perception of relations as tables. An extension of a relation (data) can be presented on a physical medium in a variety of ways, one being as (the body of) a table. Codd had actually started with the mathematically well-defined two-dimensional array as his slightly "informal" description of relations (more familiar to programmers than mathematical relations) and only later switched to the less well-defined "tables", but much more familiar to the non-mathematical. The spreadsheet is an equally acceptable presentation. 

What the author wants to express that is the crux of the matter but lacks the necessary foundation knowledge and terminology, is the spreadsheet's lack of the intension (RPs) and the RA of the RDM. While both a table and a spreadsheet display an extension as rows and columns, the extension is not a complete expression of a database relation. This is what the second author means by "the row-column arrangement per se plays a very limited role in the RDM".

Note: Emulating a RDBMS with a spreadsheet language would be akin to (and of the same difficulty) as using a non-object oriented language to implement an object oriented programming methodology.

Further Reading

David McGoveran
LOGIC FOR DATABASE FOLKS (draft chapters)

Fabian Pascal
Reinterpreting Codd
Understanding Relations: Tables, So What?
Relationships and Tables
What Relations Really Are and Why They Are Important

 

 

 

 

No comments:

Post a Comment

View My Stats