Saturday, October 29, 2022

NEW "DATA MODELS" 3 (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to date with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry's stagnation, if not outright regress.

This is a re-published series of several DBDebunk 2001 exchanges on Simon Williams' so-called "Associative Model of Data" (AMD), academic claims of its superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

Part 1 was the email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Williams' claims that started the series. Part 2 was my response to a reader's email questioning our dismissal of Williams' claims. Part 3 is my email exchange with Williams: he provided the "definition" of a data model on which I had conditioned any discussion with him, and I proved my point by debunking it.


Then: ON WHAT IS A DATA MODEL -- REPLY TO SIMON WILLIAMS

(originally published April 2002)

“In an effort to get beyond the depth or otherwise of my ignorance, can I perhaps entice you into debate on the primary issue that the associative model was conceived to address, which I'll state here as succinctly as I can:

A relational database uses a separate, uniquely shaped relation to store data about each different type of thing in its problem domain. Thus, each new relational database application requires a new set of programs to be written from scratch, by programmers with knowledge of the database's schema. During this process, the schema becomes hard-coded into the programs, so that each subsequent change to the schema requires consequent changes to the programs. With the increasing complexity of modern database applications, the programming resource required is imposing a level of cost on application developers that is becoming unsustainable.

By contrast, an associative database uses a single, generic structure to store data and metadata about all types of real-world thing in its problem domain. Thus, it is possible, using existing programming languages, to write programs pitched at a higher level of abstraction, that do not have knowledge of individual schemas hard-coded into them, and may thus be used without modification against all possible associative schemas. Hence the amount of programming resource required to develop and maintain database applications is significantly reduced.”
Before you make any such arguments you have got to understand what a data model is and what the relational data model is. Unfortunately, you don't. You are, therefore, making silly arguments, whether you realize it or not. The burden is on you to demonstrate that you understand database fundamentals before you make public arguments, and you cannot ask for a response until such time as you do. Otherwise we'd be drawn into constantly arguing with ignorance and generating publicity for you, which is what you desire.

You told Lee earlier you know what a data model is. I challenged you to prove that in an article on DATABASE DEBUNKINGS. Well, then: Do it. Then we can talk.
“I wrote the attached in some haste. As the saying goes, if I'd had more time I'd have written less.”
Exactly right. The definition of a data model takes one line and four bullets. Your definition is too long to be correct. And it is clear from your arguments in general that you don't have a definition.

Let me put it as succinctly as I can: the relational model is the application of predicate logic and set theory to database management. A better data model means a replacement of predicate logic and set theory [with something that does what RDM does, but more or better -- that is the implication of the claim of a superior data model] and, with all due respect, I doubt you could have done that, to put it politely.

Editor Comments: My reply to Williams was written before I reviewed his so-called definition (included below). Having now reviewed it, I can confirm I was right to doubt his claims. It is not a definition of a data model:
  • It includes unnecessary historical details;
  • Much of the text is lifted from other sources;
  • Only the last part contains anything that can be considered a definition;
  • Be that as it may, even that part lacks one critical ingredient -- the theoretical foundation.

A succinct, precise definition of a data model is:
A general theory of information representation and retrieval that defines structure/integrity and manipulation.

It is straightforward to specify these precisely for the RDM:

  • Theory: simple set theory expressible in first order predicate logic (SST/FOPL);
  • Structure: [Semantically constrained 5NF relations];
  • Integrity: domain, attribute, relation and database constraints;
  • Manipulation: Relational algebra [properly revised].
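As a loose illustration only (neither Codd's formalism nor any DBMS's API), the components listed above can be sketched in Python: a relation as a set of tuples over named attributes (structure), domain and key constraints checked on construction (integrity), and a few relational algebra operators that take relations and return relations (manipulation). All names here (`Relation`, `relation`, `restrict`, `project`, `join`, and the employees/departments data) are hypothetical, and the sketch ignores the 5NF and semantic-constraint requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Toy relation: a heading (attribute -> domain) plus a set of tuples."""
    heading: tuple   # ((attribute_name, python_type), ...) -- hypothetical encoding
    key: tuple       # attribute names forming one key
    body: frozenset  # each tuple is a frozenset of (attribute, value) pairs

    def __post_init__(self):
        names = {name for name, _ in self.heading}
        domains = dict(self.heading)
        seen_keys = set()
        for t in self.body:
            row = dict(t)
            # Structure: every tuple has exactly the heading's attributes.
            if set(row) != names:
                raise ValueError(f"tuple does not match heading: {row}")
            # Integrity (domain constraint): each value belongs to its domain.
            for name, value in row.items():
                if not isinstance(value, domains[name]):
                    raise ValueError(f"{name}={value!r} violates its domain")
            # Integrity (key constraint): key values are unique.
            k = tuple(row[a] for a in self.key)
            if k in seen_keys:
                raise ValueError(f"duplicate key {k}")
            seen_keys.add(k)


def relation(heading, key, rows):
    """Build a Relation from a dict heading and a list of dict rows."""
    return Relation(tuple(heading.items()), tuple(key),
                    frozenset(frozenset(r.items()) for r in rows))


def restrict(r, predicate):
    """Restriction: keep the tuples for which the predicate holds."""
    return Relation(r.heading, r.key,
                    frozenset(t for t in r.body if predicate(dict(t))))


def project(r, attrs):
    """Projection: keep only attrs; duplicates vanish because the body is a set."""
    heading = tuple((n, d) for n, d in r.heading if n in attrs)
    body = frozenset(frozenset((n, v) for n, v in t if n in attrs) for t in r.body)
    # Lacking better information, the whole projected heading serves as the key.
    return Relation(heading, tuple(n for n, _ in heading), body)


def join(r, s):
    """Natural join on the attributes the two headings share."""
    common = {n for n, _ in r.heading} & {n for n, _ in s.heading}
    heading = r.heading + tuple(h for h in s.heading if h[0] not in common)
    body = frozenset(
        frozenset({**dict(t), **dict(u)}.items())
        for t in r.body for u in s.body
        if all(dict(t)[a] == dict(u)[a] for a in common)
    )
    return Relation(heading, tuple(n for n, _ in heading), body)


employees = relation(
    {"emp": str, "dept": str}, ["emp"],
    [{"emp": "Ann", "dept": "D1"}, {"emp": "Bob", "dept": "D1"},
     {"emp": "Cy", "dept": "D2"}],
)
departments = relation(
    {"dept": str, "city": str}, ["dept"],
    [{"dept": "D1", "city": "Oslo"}, {"dept": "D2", "city": "Rome"}],
)

# Operators compose because every result is itself a relation.
oslo_staff = project(
    restrict(join(employees, departments), lambda row: row["city"] == "Oslo"),
    {"emp"},
)
staff_names = sorted(dict(t)["emp"] for t in oslo_staff.body)  # ['Ann', 'Bob']
```

The point of the sketch is only that the three components are separable and precisely statable; a real implementation would derive keys for results and enforce multi-tuple and database constraints as well.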

Does Williams understand the "definition" that he cites? If so, and if Williams' "associative model of data" is a data model, then:

  • On what theory is it based?
  • What are -- precisely, please! -- the structural, integrity and manipulation equivalents for his "model" and how are they different from, let alone better than, the relational ones?

And if, as I suspect, he can't specify this, Chris Date refers him to Hugh Darwen's article "What a Database Really Is: Predicates and Propositions" in his and Hugh's RELATIONAL DATABASE WRITINGS 1994-1997.

========================================
Data Models
by Simon Williams

The general need for data models is asserted by Codd in the opening sentence of the paper that is credited with defining the relational model: "Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation)." Codd credits his colleague C T Davies of IBM Poughkeepsie with convincing him of the need for data independence.

This theme is reflected in work from the early 1970s by the CODASYL Data Base Task Group (DBTG) and the IBM user groups Guide and Share. In 1975 the ANSI-SPARC committee proposed a more formal three-level meta-model, which consisted of an external level comprising the users' individual views of the database, a conceptual level comprising an abstract view of the whole database, and an internal level, comprising the physical representation of data in the database. The conceptual level corresponds to the DBTG concept of the "schema", and the views in the external level corresponds to the DBTG concept of "subschemas". Codd returns to the issue more formally in his 1979 paper "Extending the Database Relational Model to Capture More Meaning", in which he states that relational model consists of:

  • A collection of time-varying tabular relations;
  • The insert-update-delete rules (i.e. the entity integrity rule and the referential integrity rule); and
  • The relational algebra.

He also observes that various semantic decomposition concepts are closely associated with (i.e. are almost part of) the relational model. Codd goes on to describe an extended version of the relational model called RM/T in an attempt to capture more of the meaning of the data by embracing concepts dealing with molecular semantics such as aggregation and generalisation. Codd later refines RM/T into RM/V2, which comprises 333 features. In Chapter 8 of "Relational Database: Selected Writings", Date presents some of Codd's work in a more accessible way, and goes on to deal with how the relational model should be interpreted. Referencing Codd, he asserts that a data model consists of a collection of data object types, a collection of general integrity rules and a collection of operators. Date goes on to clarify Codd's non-exhaustive list of six uses for a data model, and articulates his own interpretation principle, namely that a data model "must have a commonly accepted (and useful) interpretation; that is, its objects, integrity rules and operators must have some generally accepted correspondence to phenomena in the real world."

Date presents arguments to support his claims, stressing that data models are formal systems, whilst the real world is an informal system, and thus a data model must use formal behaviour to mimic informal aspects of the real world. The rest of Date's arguments focus on explaining how the relational model should be interpreted to conform to his interpretation principle.

It is interesting to compare and contrast the approaches that Codd and Date have taken to the relational model and, by inference, to data models and their scope more generally. Codd, the mathematician, has grown increasingly rigorous, fine-grained and proscriptive, lamenting that the commercial database world has failed to follow him. Date, the communicator, has sought through his Relational Database Writings series to interpret, amplify and render accessible most of the more arcane aspects of relational database theory.

Date expresses the belief that "Everyone professionally involved in database management should be thoroughly conversant not only with the relational model per se, but also with its interpretation." Today, many of the groups that Date defines are neither. Such a lack of knowledge of the conceptual underpinnings of their profession would be unthinkable (and potentially disastrous) for, say, a civil engineer or an architect.

The purpose of a data model is to provide an abstract view of data and schema in a database, so that its users don't need to know how the data is physically stored and retrieved. This objective, called implementation independence, is desirable in order to ensure that the programs that maintain the data and the queries that users ask of the data do not need to change as the hardware and software platforms on which the database runs evolve, or as the database is re-implemented on other platforms. To this end, a data model needs to include, as a minimum:

A. A set of abstractions that database designers may use to represent types of things in the real world and their properties, together with the integrity rules that govern how instances of those abstractions may interrelate, and the types of operations that may be performed on such instances.

It is desirable and appropriate that a data model also includes at least one proposal for:

B. A set of physical data storage constructs that may be used to implement a database management system based on the data model, together with appropriate mappings between the physical constructs and the abstractions.

A proposal may qualify as a data model based solely on A (the relational model did) but today its credentials are strengthened if it can demonstrate at least one way in which it may be implemented. It is also useful if a data model addresses:

C. How it proposes to fulfill the many practical demands that the modern data processing environment imposes, such as how data may be secured, transmitted and distributed.

========================================

Now

We don't need the entire history of the database management field copied from sources to define the concept of a data model.

A database relation represents something more specific than "types of things and their properties in the real world": a multigroup -- a collection of related groups of entities, where each group is defined by the properties and intra-group relationships its entities share, and the collection by the inter-group relationships. The abstractions are structure/integrity (for RDM: relation; domain, attribute, tuple, multi-tuple and multi-relation constraints) and manipulation (relational algebra). The representation is formal and grounded in a general theory of data representation and retrieval.

A data model is defined in the context of database management and is consequently, by definition, implementable (preferably in a variety of physical ways, for physical independence). But "physical data constructs" are part of the implementation, not of the data model.
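To make the distinction concrete, here is a minimal Python sketch, under stated assumptions, of one logical operation running unchanged over two different physical layouts. `RowStore`, `ColumnStore` and `logical_restrict` are hypothetical names invented for this illustration; they do not describe any product or anything in Williams' text.

```python
class RowStore:
    """Hypothetical physical layout: one record per row."""

    def __init__(self, rows):
        self._rows = [dict(r) for r in rows]

    def tuples(self):
        # Expose the logical content: a set of tuples, nothing about layout.
        return {frozenset(r.items()) for r in self._rows}


class ColumnStore:
    """Hypothetical physical layout: one array per attribute."""

    def __init__(self, rows):
        rows = list(rows)
        names = list(rows[0].keys()) if rows else []
        self._cols = {n: [r[n] for r in rows] for n in names}

    def tuples(self):
        names = list(self._cols)
        return {frozenset(zip(names, values))
                for values in zip(*self._cols.values())}


def logical_restrict(store, predicate):
    """Logical operation: sees only tuples, never the storage layout."""
    return {t for t in store.tuples() if predicate(dict(t))}


rows = [{"part": "P1", "qty": 5}, {"part": "P2", "qty": 12}]


def is_large(row):
    return row["qty"] > 10


# The same logical question gets the same answer from either layout.
same_answer = (logical_restrict(RowStore(rows), is_large)
               == logical_restrict(ColumnStore(rows), is_large))
```

Swapping one store for the other changes nothing at the logical level, which is why the storage constructs belong to the implementation rather than to the model.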

You are welcome to seek out his AMD book as an exercise and confirm that AMD is not a data model that satisfies the definition, let alone one grounded in a formal theory or better than RDM.
