Friday, February 19, 2021

TYFK: Semantics, Relations and the Missed Link - Constraints



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to our debunking thereof. If there isn't a match, you can review references -- which reflect the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS and LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“[As a] set [a database relation] is a collection of similar or related things.”

--ArtfulSoftware.com


Can you tell what's wrong with this statement (hint: one word is wrong)? If not, it is because detecting the error is impossible under the old industry interpretation of the RDM.


Fundamentals

Like its mathematical counterpart, a database relation is a relationship among domains -- a subset of their Cartesian (i.e., cross) product. But unlike abstract mathematical relations, database relations have an intended interpretation -- they mean something: they jointly represent a conceptual model of reality:

  • Domains represent properties;
  • Individually, each relation represents (facts about) a group of entities of a single type, which

- share direct 1st order properties (1OP);
- share indirect 2nd order properties (2OP) that are relationships among 1OPs;
- have relationships among all group members that are collective 3rd order properties (3OP) of their group;

  • Collectively, they represent a multigroup -- a collection of entity groups that have relationships that are collective 4th order properties (4OP) of the multigroup.
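The mathematical half of this definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the domains `EMP_NO` and `DEPT`, their values, and the helper `is_relation_over` are hypothetical and do not come from the RDM literature. The sketch shows set membership only; the intended interpretation is what the tuples are taken to assert, which no subset test can capture.

```python
from itertools import product

# Hypothetical domains (names and values are assumptions for illustration).
EMP_NO = {1, 2, 3}
DEPT = {"sales", "ops"}

# The Cartesian product of the domains: every possible (emp_no, dept) pair.
cartesian = set(product(EMP_NO, DEPT))

# A relation is a subset of that product: only the pairs asserted as facts.
employee = {(1, "sales"), (2, "ops"), (3, "sales")}


def is_relation_over(tuples, *domains):
    """Return True if every tuple is drawn from the product of the domains."""
    return set(tuples) <= set(product(*domains))


print(is_relation_over(employee, EMP_NO, DEPT))
```

Of the six possible pairs, the relation holds only the three that are asserted to be true of the group of employees.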

Note that this conceptualization is distinct from traditional entity-relationship modeling (E/RM): it includes not just primitive objects (entities), but also compound objects (groups as aggregates of entities and the multigroup as an aggregate of groups). The relationships:

  • Among 1OPs are 2OPs of entities;
  • Among entities within a group are 3OPs of the group;
  • Among groups are 4OPs of the multigroup.

In other words, relationships at a lower level of aggregation are properties at the next higher level.

One of the many flawed criticisms of the RDM is that it is "semantically poor". It is true that when Codd introduced the RDM the three levels of representation had not yet emerged, and he focused on the logical level. But even in its initial form the RDM has an integrity component, and while practitioners are aware of it, few realize that its function is semantic -- to constrain relations and the database for consistency with the intended conceptual model. Hence semantic constraints, of which there are several types:

  • Domain constraints ensure consistency with properties;
  • Relation constraints:

- attribute constraints ensure consistency with 1OPs (properties in the context of specific groups);
- tuple constraints ensure consistency with 2OPs;
- multi-tuple constraints ensure consistency with 3OPs;

  • Database (multi-relation) constraints ensure consistency with 4OPs.

We've written extensively about them (see references). For the purposes of this discussion, relation constraints are the pertinent ones.
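As a rough illustration of how the constraint types line up with the property orders, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `employee` and `department` schemas, the salary range, the bonus rule, and the function names are hypothetical, and a relational DBMS would declare such rules rather than code them procedurally.

```python
# Hypothetical schema (all names and rules are assumptions for illustration):
#   employee:   emp_no, dept_no, salary, bonus
#   department: dept_no, budget


def in_money_domain(value):
    # Domain constraint: a property (money) independent of any relation.
    return isinstance(value, int) and value >= 0


def attribute_ok(t):
    # Attribute constraint: the property as a 1OP of this specific group,
    # here a narrower salary range that applies to employees only.
    return (in_money_domain(t["salary"]) and in_money_domain(t["bonus"])
            and 1_000 <= t["salary"] <= 20_000)


def tuple_ok(t):
    # Tuple constraint: a 2OP, a relationship among 1OPs of a single entity.
    return t["bonus"] <= t["salary"]


def multi_tuple_ok(employees):
    # Multi-tuple constraint: a 3OP of the group, here uniqueness of emp_no.
    emp_nos = [t["emp_no"] for t in employees]
    return len(emp_nos) == len(set(emp_nos))


def database_ok(employees, departments):
    # Database constraint: a 4OP of the multigroup, here a referential rule.
    dept_nos = {d["dept_no"] for d in departments}
    return all(t["dept_no"] in dept_nos for t in employees)


def consistent(employees, departments):
    # The database is consistent only if constraints at every level hold.
    return (all(attribute_ok(t) and tuple_ok(t) for t in employees)
            and multi_tuple_ok(employees)
            and database_ok(employees, departments))
```

Each function needs a wider scope than the one before it: a single value, a single tuple, all tuples of one relation, and finally several relations together.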

Note: Simplicity was a core objective of the RDM. Codd did subsequently propose semantic enhancements, but they unavoidably increased complexity.

Conclusion

At the logical level a database relation is a set of tuples, but "similar or related things" is conceptual, not formal logical language. It refers to what tuples represent: entity members of a group, which

  • Are similar -- share the same 1OPs (enforced via attribute constraints) and 2OPs (enforced via tuple constraints); and,
  • Have relationships with all group members (enforced via multi-tuple constraints).

There are at least two types of multi-tuple constraints, PK and aggregate constraints, which ensure consistency with relationships among all group members, i.e., with collective group 3OPs.
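The difference between the two multi-tuple constraint types can be sketched in Python. The sample data, the payroll cap, and the helper names `pk_holds` and `aggregate_holds` are hypothetical, chosen only to show that neither rule can be checked by inspecting one tuple in isolation.

```python
def pk_holds(tuples, key):
    """PK constraint: no two tuples share the same key value(s)."""
    keys = [tuple(t[a] for a in key) for t in tuples]
    return len(keys) == len(set(keys))


def aggregate_holds(tuples, attr, cap):
    """Aggregate constraint: the sum of attr over all tuples stays within cap."""
    return sum(t[attr] for t in tuples) <= cap


# Hypothetical group of employees; the values are assumptions.
employees = [
    {"emp_no": 1, "salary": 4_000},
    {"emp_no": 2, "salary": 5_000},
]

print(pk_holds(employees, ["emp_no"]))
print(aggregate_holds(employees, "salary", 10_000))
```

Adding a third employee with a salary of 2,000 would leave every individual tuple valid yet violate the assumed payroll cap, which is why such a rule is a property of the group rather than of any one member.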

In other words, the wrong word in the statement is 'or' -- it should be 'and'. The correct statement should thus be: 

“As a set a database relation represents (facts about) a collection of similar AND related entities.”

No doubt this will prompt criticisms of "nitpicking" and "theoretical purism". But without this interpretation there will never be sufficient understanding of the RDM to take full advantage of it in both implementation and use.

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


References

McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK (draft chapters), forthcoming.

Codd, E.F., Extending the Database Relational Model to Capture More Meaning

Merrett, T.H., Extending the Relational Algebra to Capture Less Meaning

Relationships and the RDM series

Semantics and the Relational Model

Data and Meaning series

What Meaning Means: Business Rules, Predicates, Integrity Constraints and Database Consistency

Relationships, Rules, Relations and Constraints