Monday, June 19, 2023

PREDICATE LOGIC, SEMANTICS AND RDM (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

 

“As I have said many times, if the original relational model had been based on predicate logic and also the semantics and rules of definitions we'd all be better off now. It wasn't. Full stop.”
--Ronald Ross, LinkedIn.com
Assessing such arguments normally requires clarification of what exactly is meant by "the relational model". Ross does refer specifically to the "original" -- which we take to mean that introduced by Codd in 1969-70 -- but given the massive misuse and abuse in the industry, perceptions of it may well be corrupted (Nobody Understands the Relational Model Semantics, Relational Closure and Database Correctness).  Moreover, there are many predicate logic (PL) systems and many ways of categorizing them (1st vs n-th order being only one way) -- we assume Ross means RDM is based on none.

We suspect that Codd, a mathematician @IBM, discerned a useful correspondence of tuples in the mathematical relation theory (MRT) -- a subset of simple set theory (SST) -- to the physical records of data systems: it could raise the level of abstraction of data representation and manipulation.  This explains the primary focus on the logical level and physical independence (PI). In 1969-70 there was no explicit conceptual perspective, which emerged only later in the mid-70s with Chen's entity-relationship model (E/RM).  Codd presented his semantic extension to RDM, RM/T, in 1979.

The core connection of RDM to PL is from Alonzo Church's AN INTRODUCTION TO MATHEMATICAL LOGIC, which Codd cited in his initial two papers. In Derivability, Redundancy And Consistency Of Relations Stored In Large Data Banks (1969) he permitted relation-valued domains/attributes (RVD/RVA), which require the power of second order logic (SOL):

“The adoption of a relational view of data ... permits the development of a universal retrieval sublanguage based on the second-order predicate calculus. The second-order predicate calculus (rather than first-order) is needed because the domains on which relations are defined may themselves have relations as elements (see section 1). Such a language would provide a yardstick of linguistic power for all other proposed retrieval languages, and would itself be a strong candidate for embedding (with appropriate syntactic modification) in a variety of host languages (programming, command or problem oriented).”

But realizing that SOL would expose the model to undecidability and relative complexity, in A Relational Model of Data for Large Shared Data Banks (1970) he prohibited them, trading some of the power of SOL for the decidability and simplicity of first order predicate logic (FOPL):

“The adoption of a relational model of data ... permits the development of a universal data sublanguage based on an applied predicate calculus. A first-order predicate calculus suffices if the collection of relations is in normal form. Such a language would provide a yardstick of linguistic power for all other proposed data languages, and would itself be a strong candidate for embedding (with appropriate syntactic modification) in a variety of host languages (programming, command-, or problem-oriented).”
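
To make the difference concrete, here is a minimal Python sketch, loosely modeled on the employee/job history illustration in Codd's 1970 paper. The attribute names and values are hypothetical, and sets of Python tuples only approximate relations. It shows a relation with a relation-valued domain and its replacement by two relations over simple domains:

```python
# Hypothetical data. With a relation-valued domain, the third component of
# each employee tuple is itself a relation (a set of (date, title) tuples).
employee = {
    (1, "Jones", frozenset({("1968-01", "clerk"), ("1969-06", "analyst")})),
    (2, "Blake", frozenset({("1967-03", "engineer")})),
}

# Eliminating the relation-valued domain: every component is now a simple
# value, and the employee identifier is propagated into the job history.
employee_n = {(emp_id, name) for (emp_id, name, _) in employee}
jobhistory_n = {
    (emp_id, date, title)
    for (emp_id, _, history) in employee
    for (date, title) in history
}
```

In the first form, statements about job histories are statements about relations that are themselves values, which is what calls for second-order quantification. In the second form, all quantification ranges over simple values.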

Note: The 1970 normal form (TNF) was distinct from the later first normal form (1NF) known today; the best way to describe the difference between them is that TNF was to the 1969 join what 5NF is to today's join (5NF, Association Relations and Join). This is why we contend that database relations are by definition and design in 5NF (Normalized, Fully Normalized, Non-Normalized, Denormalized -- Clearing the Mess).

Understanding exactly how RDM is based on first order predicate logic (FOPL) requires deep knowledge of PLs and lots of details beyond the scope of this post, which is intended to be accessible to practitioners. For our purposes here, concretization of RDM takes the form of FOPL-based relationally complete data sublanguages, to be hosted by computationally complete application-development languages. Notwithstanding various issues identified by McGoveran in his re-interpretation of Codd's work that we have documented, everything in the original RDM was expressible in FOPL.

Note: Among the issues:

· Glossing over details of FOPL assumptions about attributes that don't quite fit the normal data notion/use of tuple representation by practitioners (Entities, Properties and Codd's Sleight of Hand);

· Simplified and severely limited relational algebra operators, with the consequence being "normalization";

· Crippled set operators (union compatibility) and an evolving definition of join;

· An unspecified kind of 3VL/4VL incompatible with the axioms and rules of inference of 2VL FOPL (Last NULL in the Coffin: A Relational Solution to Missing Data), and so on.

Like mathematical relations, a database relation is a set and thus, has an intension -- the criterion for membership of tuples in the relation -- and an extension, the set of tuples that satisfy the criterion. Unlike mathematical relations, which are abstract (represent nothing), database relations have real world interpretations -- business rules (BR) expressed in specialized (Ross calls it "structured") natural language (NL) that define the aspects of reality -- groups of entities with properties and relationships -- that relations are designed to represent. The intension is a formalization in FOPL -- a relation predicate (RP) -- for database representation and manipulation, what we know today as semantic constraints.
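
The intension/extension distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Employee` attributes and the business rule encoded in `relation_predicate` are hypothetical, and a Python function merely stands in for a FOPL predicate.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    emp_id: int
    salary: int
    dept: str


def relation_predicate(t: Employee) -> bool:
    """Hypothetical intension: the criterion for membership in the relation."""
    return t.emp_id > 0 and 0 < t.salary <= 200_000 and t.dept in {"SALES", "R&D"}


candidates = [
    Employee(1, 50_000, "SALES"),
    Employee(2, 250_000, "R&D"),  # violates the (assumed) salary rule
    Employee(3, 80_000, "R&D"),
]

# Extension: the set of tuples that satisfy the intension.
extension = {t for t in candidates if relation_predicate(t)}
```

The point of the sketch is that the extension alone does not tell you the rule: many different predicates are satisfied by the same two tuples, which is why the attribute names of a table cannot stand in for the RP.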

Against Codd's strong objections (he proposed his own language, Alpha, in A data base sublanguage founded on the relational calculus), IBM developed SEQUEL as a prototype for its System R research project, which was rushed to market as SQL by Oracle, and the rest is history. SEQUEL/SQL authors had little understanding of RDM (Why Are There No Relational DBMSs; If You Liked SQL, You'll Love XQUERY) and SQL is so full of RDM violations that it is far from being relational. I recall describing SQL as "not structured, not just for queries and a poorly designed (not really a) language" (Language Redundancy and DBMS Performance: A SQL Story).

These and other, more subtle violations of RDM (SQL Sins) defeat FOPL expressibility. Over the years SQL has been increasingly overloaded by vendors with more relational violations and with application development features.

Unfortunately, IBM's market power at the time imposed SQL as the industry standard, defeating all alternative, often superior, predicate based languages and, paradoxically, rendering it synonymous with the RDM. So tightly identified has been and is SQL with the RDM, that when practitioners refer to RDM, they think of SQL without even realizing it. Does Ross?

Note: FOPL based languages can express the relational calculus (are declarative) or the relational algebra (imperative), which is a matter of degree. Ingres QUEL & Query by Forms (and IBM's own Query by Example) were more declarative than SQL, which is more imperative.
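
As a rough illustration of the two styles, here is a Python sketch. The relation `EMP` and the helpers `restrict` and `project` are hypothetical, and sets of tuples only approximate relations. The same request is written once calculus-style, as a condition on the result, and once algebra-style, as a composition of operators:

```python
# Hypothetical relation: (emp_id, dept) tuples.
EMP = {(1, "SALES"), (2, "R&D"), (3, "R&D")}

# Calculus style: state the condition that result tuples must satisfy.
calculus_result = {(emp_id,) for (emp_id, dept) in EMP if dept == "R&D"}


# Algebra style: apply operators step by step to produce the result.
def restrict(rel, condition):
    """Keep only the tuples for which the condition holds."""
    return {t for t in rel if condition(t)}


def project(rel, positions):
    """Keep only the listed attribute positions; duplicates collapse in a set."""
    return {tuple(t[i] for i in positions) for t in rel}


algebra_result = project(restrict(EMP, lambda t: t[1] == "R&D"), [0])
```

Both expressions yield the same relation here; the difference lies in whether the user states what is wanted or prescribes a sequence of operations.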

SQL uses tables instead of relations and very few realize the implications. While a tabular visualization of relations is useful in communicating RDM to an industry lacking foundation knowledge and hostile to theoretical foundations, aside from reinforcing logical-physical confusion (LPC), tables have been and are equated with relations.

The original RDM did include two categories of constraints -- first PK uniqueness, then referential (Codd called them "integrity rules"), but absent a conceptual perspective, they were intended to preserve the mathematical properties of database relations and accommodate conversion of hierarchic to relational databases, rather than as semantic adaptations. The initial versions of SQL, however, did not include even those (adding them later caused complications). The tabular perception of relations induced by SQL's institutionalization as the "lingua franca of relational databases" (the ANSI committee explicitly refrained from relational terminology) distracted further from the RPs and pushed towards "normalization" (extensions). This was internalized by the Date & Darwen interpretation of the original RDM dominant in the industry, whereby a relation consists of a body and header. While the body represents the extension, the header -- a set of attribute names -- is not a substitute for the intension. Being invisible in tables, RPs sort of "got lost" (Intension, Extension and R-tables).

Given the FOPL expressibility of the original RDM (see FOL MODELING OF INTEGRITY CONSTRAINTS (DEPENDENCIES)), there was nothing, SQL notwithstanding, to prevent FOPL-based relational data sublanguages that express the RPs, comprising:

  • Domain constraints;
  • Relation constraints:
      - tuple constraints;
      - multituple constraints;
  • Database (multirelation) constraints.
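
The constraint categories listed above can be sketched as follows. This is a minimal Python illustration, not a data sublanguage: the `Emp` and `Dept` relations and every rule below are hypothetical, and the assignment of key uniqueness to the multituple category and of referential integrity to the database category is our reading for the purposes of the example.

```python
from typing import NamedTuple


class Emp(NamedTuple):
    emp_id: int
    salary: int
    bonus: int
    dept_id: int


class Dept(NamedTuple):
    dept_id: int
    name: str


def domain_ok(t: Emp) -> bool:
    # Domain constraints: each value must belong to its (assumed) domain.
    return t.emp_id > 0 and t.salary > 0 and t.bonus >= 0


def tuple_ok(t: Emp) -> bool:
    # Tuple constraint: relates values within a single tuple.
    return t.bonus <= t.salary


def multituple_ok(emps: set) -> bool:
    # Multituple constraint: no two tuples share an emp_id (key uniqueness).
    ids = [t.emp_id for t in emps]
    return len(ids) == len(set(ids))


def database_ok(emps: set, depts: set) -> bool:
    # Database (multirelation) constraint: every employee's department exists.
    dept_ids = {d.dept_id for d in depts}
    return all(t.dept_id in dept_ids for t in emps)


def consistent(emps: set, depts: set) -> bool:
    # Conjunction of all the constraints above.
    return (
        all(domain_ok(t) and tuple_ok(t) for t in emps)
        and multituple_ok(emps)
        and database_ok(emps, depts)
    )


depts = {Dept(10, "SALES"), Dept(20, "R&D")}
good = {Emp(1, 50_000, 5_000, 10), Emp(2, 80_000, 0, 20)}
bad_key = good | {Emp(1, 60_000, 0, 20)}   # duplicate emp_id
bad_ref = good | {Emp(3, 40_000, 0, 99)}   # no such department
```

In a true relational data sublanguage these would be declared to the DBMS, which would enforce their conjunction on every update, rather than coded procedurally in applications as here.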

In my first paper (CONCEPTUAL MODELING FOR RELATIONAL DATABASE DESIGN, cca 2000+) you'll find an example of a conceptual model consisting of BRs expressed in specialized NL, the formalization of which as predicates is expressed in an imaginary FOPL-based language, because there was no language other than SQL or those influenced by SQL (indeed, Ingres had to implement a SQL interface to QUEL to survive and even that was not enough).

Were it not for the industry's disregard for theory (I am unaware of any commitment by Ross, a BR specialist, to PL over the years), a true relational data sublanguage expressing the RPs might have been produced. Would we have been better off, as Ross believes, had Codd not been distracted by SQL and tables, and subjected to implementation pressures by IBM? In an educated, knowledgeable industry appreciative of sound theoretical foundations, of course. But in a fad-driven industry that operates like the fashion industry, I very much doubt it. The same factors responsible for SQL would have worked against it too.

As we explained elsewhere (Semantics and the Relational Model), even the modest semantic extension in Codd's RM/T adds considerable complexity (in fact, there is one theoretically grounded proposal for even less semantics: Extending the relational algebra to capture less meaning). If the industry failed to take advantage of the simpler original RDM, what are the chances that a more complex RDM would have fared any better? How many practitioners even minimally proficient in FOPL are there even now?

David McGoveran has dedicated years to re-interpreting Codd's work and to semantically extending and formalizing an RDM based on a modified FOPL (LOGIC FOR SERIOUS DATABASE FOLK, draft chapters), but just like the original RDM, alternative relational languages and RM/T, his ideas presented over time were ignored, the industry reverting to its usual ad-hoc, pre-relational and non-relational technologies.

Claiming that the RDM is not based on PL -- specifically, FOPL -- is not only false, but also tantamount to blaming it for industry's failure to implement it.

 

 

 

 
