ON WHAT IS A DATA MODEL: REPLY TO SIMON WILLIAMS
with Fabian Pascal

 

 

 





From: Simon Williams 
To: Editor

 

In an effort to get beyond the depth or otherwise of my ignorance, can I perhaps entice you into debate on the primary issue that the associative model was conceived to address, which I'll state here as succinctly as I can:

 

A relational database uses a separate, uniquely shaped relation to store data about each different type of thing in its problem domain. Thus, each new relational database application requires a new set of programs to be written from scratch, by programmers with knowledge of the database's schema. During this process, the schema becomes hard-coded into the programs, so that each subsequent change to the schema requires consequent changes to the programs. With the increasing complexity of modern database applications, the programming resource required is imposing a level of cost on application developers that is becoming unsustainable.

 

By contrast, an associative database uses a single, generic structure to store data and metadata about all types of real-world thing in its problem domain. Thus, it is possible, using existing programming languages, to write programs pitched at a higher level of abstraction, that do not have knowledge of individual schemas hard-coded into them, and may thus be used without modification against all possible associative schemas. Hence the amount of programming resource required to develop and maintain database applications is significantly reduced.


 

To: Simon Williams
From: Fabian Pascal

 

Before you make any such arguments you have got to understand what a data model is and what the relational data model is. Unfortunately, you don't. You are, therefore, making silly arguments, whether you realize it or not.

 

The burden is on you to demonstrate that you understand database fundamentals before you make public arguments, and you cannot ask for a response until you do. Otherwise we would be drawn into constantly arguing with ignorance and generating publicity for you, which is what you desire.

 

You told Lee earlier you know what a data model is. I challenged you to prove that in an article on DATABASE DEBUNKINGS. Well, then: Do it. Then we can talk.
 
 

From: Simon Williams 

I wrote the attached in some haste. As the saying goes, if I'd had more time I'd have written less.
 

 

From: Fabian Pascal

 

Exactly right. The definition of a data model takes one line and four bullets. Your definition is too long to be correct. And it is clear from your arguments in general that you don't have the correct definition.

 

Let me put it as succinctly as I can: the relational model is the application of predicate logic and set theory to database management. A replacement of the model means a replacement of predicate logic and set theory. That is what is implied when you say you've got a better "model" and, with all due respect, I don't think you have come up with that.

 

 

Editor Comments: My last reply to Williams was written before I had a chance to review his so-called definition (included below). Having now reviewed it, I can confirm that it is too long. It includes unnecessary historical details. Only the last part contains what can be deemed a definition, and my guess is that much of the text is lifted from other sources. Be that as it may, even that part lacks one critical ingredient: the theoretical foundation.

 

A succinct, precise definition of a data model is:

 

"A general theory of data which defines structural (organization), integrity and manipulation features."

 

It is straightforward to specify these precisely for the relational data model:

 

Theory: predicate logic and set mathematics

 

·   Structure: R-tables (precise definition!)

·   Integrity: domain, column, table and database integrity

·   Manipulation: R-operations (restrict, project, join, etc.)
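
To make the three components concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a definition of the relational model and not how any DBMS implements it. The names Relation, relation, restrict, project, join and key_holds, and the supplier/shipment data, are hypothetical and chosen only for this example. Structure is shown as a heading plus a set of rows, integrity as a heading check and a simple key check, and manipulation as three operations that each take relations and return a relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Illustrative stand-in for an R-table: a heading and a set of rows."""
    heading: frozenset  # attribute names
    body: frozenset     # rows; each row is a frozenset of (attribute, value) pairs


def relation(heading, rows):
    """Build a Relation, rejecting rows that do not match the heading."""
    heading = frozenset(heading)
    body = set()
    for row in rows:
        if frozenset(row) != heading:
            raise ValueError(f"row {row!r} does not match heading")
        body.add(frozenset(row.items()))  # a set, so duplicate rows collapse
    return Relation(heading, frozenset(body))


def restrict(r, predicate):
    """Keep only the rows for which the predicate is true."""
    return Relation(r.heading, frozenset(t for t in r.body if predicate(dict(t))))


def project(r, attrs):
    """Keep only the named attributes; duplicates vanish because the body is a set."""
    attrs = frozenset(attrs)
    if not attrs <= r.heading:
        raise ValueError("projection attributes must come from the heading")
    rows = (frozenset((a, v) for a, v in t if a in attrs) for t in r.body)
    return Relation(attrs, frozenset(rows))


def join(r, s):
    """Natural join: combine rows that agree on all common attributes."""
    common = r.heading & s.heading
    body = set()
    for t in r.body:
        td = dict(t)
        for u in s.body:
            ud = dict(u)
            if all(td[a] == ud[a] for a in common):
                body.add(frozenset({**td, **ud}.items()))
    return Relation(r.heading | s.heading, frozenset(body))


def key_holds(r, key):
    """A simple integrity check: no two rows share the same key values."""
    return len(project(r, key).body) == len(r.body)


# Hypothetical sample data, for illustration only.
suppliers = relation({"sno", "city"}, [
    {"sno": "S1", "city": "London"},
    {"sno": "S2", "city": "Paris"},
    {"sno": "S3", "city": "London"},
])
shipments = relation({"sno", "pno"}, [
    {"sno": "S1", "pno": "P1"},
    {"sno": "S2", "pno": "P1"},
    {"sno": "S1", "pno": "P2"},
])

london = restrict(suppliers, lambda t: t["city"] == "London")
cities = project(suppliers, {"city"})
london_shipments = join(london, shipments)
```

Every operation returns another Relation, so results can be fed straight into further operations. That closure property is what lets restrict, project and join compose into arbitrary queries in this sketch.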

 

The question is: does Williams understand the "definition" that he cites? If so, and if Williams' "associative model of data" is a data model, then: On what theory is it based?

 

What are (precisely, please!) the structural, integrity and manipulation equivalents for his "model", and how are they different from, let alone better than, the relational ones?

 

And if, as I suspect, he can't specify this, Chris Date refers him to Hugh Darwen's article "What a Database Really Is: Predicates and Propositions" in his and Hugh's RELATIONAL DATABASE WRITINGS 1994-1997.

 

Data Models

by Simon Williams

 

The general need for data models is asserted by Codd in the opening sentence of the paper that is credited with defining the relational model: "Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation)." Codd credits his colleague C T Davies of IBM Poughkeepsie with convincing him of the need for data independence.

 

This theme is reflected in work from the early 1970s by the CODASYL Data Base Task Group (DBTG) and the IBM user groups Guide and Share. In 1975 the ANSI-SPARC committee proposed a more formal three-level meta-model, which consisted of an external level comprising the users' individual views of the database, a conceptual level comprising an abstract view of the whole database, and an internal level comprising the physical representation of data in the database. The conceptual level corresponds to the DBTG concept of the "schema", and the views in the external level correspond to the DBTG concept of "subschemas". Codd returns to the issue more formally in his 1979 paper "Extending the Database Relational Model to Capture More Meaning", in which he states that the relational model consists of:

 

·   A collection of time-varying tabular relations;

·   The insert-update-delete rules (i.e. the entity integrity rule and the referential integrity rule); and

·   The relational algebra.
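
As an illustration only, and not taken from Codd's paper, the two integrity rules named above can be sketched in Python. The sketch assumes tables are plain lists of dictionaries and that None stands in for a null; the function names and the department/employee data are hypothetical. Partially null composite foreign keys are simply skipped here, which is a simplification.

```python
def entity_integrity_holds(rows, key):
    """Entity integrity: no component of the primary key may be null."""
    return all(row.get(attr) is not None for row in rows for attr in key)


def referential_integrity_holds(referencing, foreign_key, referenced, primary_key):
    """Referential integrity: every non-null foreign key value must match
    a primary key value in the referenced table."""
    targets = {tuple(row[attr] for attr in primary_key) for row in referenced}
    for row in referencing:
        value = tuple(row.get(attr) for attr in foreign_key)
        if any(v is None for v in value):
            continue  # simplification: a null foreign key is not checked
        if value not in targets:
            return False
    return True


# Hypothetical sample data, for illustration only.
departments = [{"dept": "D1"}, {"dept": "D2"}]
employees = [
    {"emp": "E1", "dept": "D1"},
    {"emp": "E2", "dept": None},  # no department assigned
]

employees_ok = (
    entity_integrity_holds(employees, ["emp"])
    and referential_integrity_holds(employees, ["dept"], departments, ["dept"])
)

# Adding a row that points at a non-existent department breaks the second rule.
dangling = employees + [{"emp": "E3", "dept": "D9"}]
dangling_ok = referential_integrity_holds(dangling, ["dept"], departments, ["dept"])
```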

 

He also observes that various semantic decomposition concepts are closely associated with (i.e. are almost part of) the relational model. Codd goes on to describe an extended version of the relational model called RM/T in an attempt to capture more of the meaning of the data by embracing concepts dealing with molecular semantics such as aggregation and generalisation. Codd later refines RM/T into RM/V2, which comprises 333 features. In Chapter 8 of "Relational Database: Selected Writings", Date presents some of Codd's work in a more accessible way, and goes on to deal with how the relational model should be interpreted. Referencing Codd, he asserts that a data model consists of a collection of data object types, a collection of general integrity rules and a collection of operators. Date goes on to clarify Codd's non-exhaustive list of six uses for a data model, and articulates his own interpretation principle, namely that a data model "must have a commonly accepted (and useful) interpretation; that is, its objects, integrity rules and operators must have some generally accepted correspondence to phenomena in the real world."

 

Date presents arguments to support his claims, stressing that data models are formal systems, whilst the real world is an informal system, and thus a data model must use formal behaviour to mimic informal aspects of the real world. The rest of Date's arguments focus on explaining how the relational model should be interpreted to conform to his interpretation principle. 

 

It is interesting to compare and contrast the approaches that Codd and Date have taken to the relational model and, by inference, to data models and their scope more generally. Codd, the mathematician, has grown increasingly rigorous, fine-grained and prescriptive, lamenting that the commercial database world has failed to follow him. Date, the communicator, has sought through his Relational Database Writings series to interpret, amplify and render accessible most of the more arcane aspects of relational database theory.

 

Date expresses the belief that "Everyone professionally involved in database management should be thoroughly conversant not only with the relational model per se, but also with its interpretation." Today, many of the groups that Date defines are conversant with neither. Such a lack of knowledge of the conceptual underpinnings of their profession would be unthinkable (and potentially disastrous) for, say, a civil engineer or an architect.

 

The purpose of a data model is to provide an abstract view of data and schema in a database, so that its users don't need to know how the data is physically stored and retrieved. This objective, called implementation independence, is desirable in order to ensure that the programs that maintain the data and the queries that users ask of the data do not need to change as the hardware and software platforms on which the database runs evolve, or as the database is re-implemented on other platforms. To this end, a data model needs to include, as a minimum: 

 

A. A set of abstractions that database designers may use to represent types of things in the real world and their properties, together with the integrity rules that govern how instances of those abstractions may interrelate, and the types of operations that may be performed on such instances.

 

It is desirable and appropriate that a data model also includes at least one proposal for: 

 

B. A set of physical data storage constructs that may be used to implement a database management system based on the data model, together with appropriate mappings between the physical constructs and the abstractions.

 

A proposal may qualify as a data model based solely on A (the relational model did) but today its credentials are strengthened if it can demonstrate at least one way in which it may be implemented. It is also useful if a data model addresses:

C. How it proposes to fulfill the many practical demands that the modern data processing environment imposes, such as how data may be secured, transmitted and distributed.

 

 

Posted 04/26/02

 

 

 
