From: Simon Williams
To: Editor
In an effort to get beyond the depth or otherwise of my
ignorance, can I perhaps entice you into debate on the primary issue that the
associative model was conceived to address, which I'll state here as succinctly
as I can:
A relational database uses a separate, uniquely shaped
relation to store data about each different type of thing in its problem
domain. Thus, each new relational database application requires a new set of
programs to be written from scratch, by programmers with knowledge of the
database's schema. During this process, the schema becomes hard-coded into the
programs, so that each subsequent change to the schema requires consequent changes
to the programs. With the increasing complexity of modern database
applications, the programming resource required is imposing a level of cost on
application developers that is becoming unsustainable.
By contrast, an associative database uses a single, generic
structure to store data and metadata about all types of real-world thing in its
problem domain. Thus, it is possible, using existing programming languages, to
write programs that are pitched at a higher level of abstraction, do not have
knowledge of individual schemas hard-coded into them, and may thus be used
without modification against all possible associative schemas. Hence the amount
of programming resource required to develop and maintain database applications
is significantly reduced.
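The following minimal Python sketch makes the contrast in the preceding two paragraphs concrete. It assumes, purely for illustration, that the "single, generic structure" is a collection of source/verb/target links. The `Link` class, the `describe` and `describe_customer` functions and the sample data are hypothetical and are not taken from any associative DBMS. The sketch illustrates the shape of the claim only.

```python
from dataclasses import dataclass

# Relational-style: each type of thing gets its own, uniquely shaped relation,
# and a program that reads it has that shape (the schema) hard-coded into it.
customers = [{"id": 1, "name": "Ann", "city": "London"}]
orders = [{"id": 10, "customer_id": 1, "total": 250}]  # a differently shaped relation


def describe_customer(row: dict) -> str:
    # Knows the column names of `customers`; must change if that schema changes.
    return f"Customer {row['name']} lives in {row['city']}"


# Associative-style (hypothetical): every fact, data or metadata, is a link
# with the same three-part shape, so one generic structure holds everything.
@dataclass(frozen=True)
class Link:
    source: str
    verb: str
    target: str


store = [
    Link("Customer", "is a", "entity type"),  # metadata
    Link("Ann", "is a", "Customer"),          # data
    Link("Ann", "lives in", "London"),
    Link("Order 10", "is a", "Order"),
    Link("Order 10", "placed by", "Ann"),
]


def describe(item: str, links: list) -> list:
    # Schema-agnostic: no entity-type or verb names are hard-coded here.
    return [f"{link.source} {link.verb} {link.target}"
            for link in links if link.source == item]


print(describe_customer(customers[0]))  # Customer Ann lives in London
print(describe("Ann", store))           # ['Ann is a Customer', 'Ann lives in London']
```

The design point being illustrated: `describe_customer` depends on the shape of the `customers` relation, whereas `describe` depends only on the fixed shape of `Link`.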
To: Simon Williams
From: Fabian Pascal
Before you make any such arguments, you have got to understand what
a data model is and what the relational data model is. Unfortunately, you
don't. You are, therefore, making silly arguments, whether you realize it or
not.
The burden is on you to demonstrate you understand database
fundamentals first, before you make public arguments, and you cannot ask for a
response until such time as you do. Or we'd be swallowed into constantly
arguing with ignorance and generating publicity for you, which is what you
desire.
You told Lee earlier you know what a data model is. I
challenged you to prove that in an article on DATABASE DEBUNKINGS. Well, then:
Do it. Then we can talk.
From: Simon Williams
I wrote the attached in some haste. As the saying goes, if
I'd had more time I'd have written less.
From: Fabian Pascal
Exactly right. The definition of a data model takes one line
and four bullets. Your definition is too long to be correct. And it is clear
from your arguments in general that you don't have the correct definition.
Let me put it as succinctly as I can: the relational model is
the application of predicate logic and set theory to database
management. A replacement of the model means a replacement of predicate logic
and set theory. That is what is implied when you say you've got a better
"model" and, with all due respect, I don't think you have come up
with that.
Editor Comments: My
last reply to Williams was written before I had a chance to review his
so-called definition (included below). Now that I have reviewed it, I was right
that it is too long. It includes unnecessary historical details. Only the last
part contains what can be deemed a definition, and my guess is that much of the
text is lifted from other sources. Be that as it may, even that lacks one
critical ingredient: the theoretical foundation.
A succinct, precise definition of a data model is:
"A general theory of data which defines structural
(organization), integrity and manipulation features."
It is straightforward to specify these precisely for
the relational data model:
· Theory: predicate logic and set mathematics
· Structure: R-tables (precise definition!)
· Integrity: domain, column, table and database integrity
· Manipulation: R-operations (restrict, project, join, etc.)
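The R-operations named in the last bullet can be sketched in a few lines of Python, assuming a toy representation in which a relation is a set of tuples and each tuple is a set of attribute/value pairs. The function names `rel`, `restrict`, `project` and `join` are hypothetical, and the sketch ignores relation headings and domains, so it illustrates closure (operations on relations yielding relations) rather than implementing the relational model.

```python
# Hypothetical toy model: a relation is a frozenset of tuples, and each tuple
# is a frozenset of (attribute, value) pairs, so there is no row or column
# order and no duplicate tuples. Headings and domains are not modelled.


def rel(*rows):
    """Build a relation from dicts (values must be hashable)."""
    return frozenset(frozenset(row.items()) for row in rows)


def restrict(r, predicate):
    """Keep only the tuples for which predicate(tuple-as-dict) is true."""
    return frozenset(t for t in r if predicate(dict(t)))


def project(r, attrs):
    """Keep only the named attributes; duplicate tuples collapse automatically."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)


def join(r, s):
    """Natural join: merge tuples that agree on every common attribute."""
    result = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                result.add(frozenset({**dt, **du}.items()))
    return frozenset(result)


EMP = rel({"emp": "E1", "dept": "D1"},
          {"emp": "E2", "dept": "D2"},
          {"emp": "E3", "dept": "D1"})
DEPT = rel({"dept": "D1", "dname": "Sales"},
           {"dept": "D2", "dname": "Research"})

# Each operation returns a relation, so operations can be nested freely.
sales_staff = project(restrict(join(EMP, DEPT), lambda t: t["dname"] == "Sales"),
                      {"emp"})
print(sorted(dict(t)["emp"] for t in sales_staff))  # ['E1', 'E3']
```

Representing tuples as sets of pairs rather than Python tuples is a deliberate choice here: it makes attribute order irrelevant and lets set semantics eliminate duplicates in `project`.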
The question is: does Williams understand the
"definition" that he cites? If so, and if Williams' "associative
model of data" is a data model, then: On what theory
is it based?
What, precisely, are the structural,
integrity and manipulation equivalents for his "model", and how are
they different from, let alone better than, the relational ones?
And if, as I suspect, he can't specify this, Chris Date
refers him to Hugh Darwen's article "What a Database Really Is: Predicates
and Propositions" in his and Hugh's RELATIONAL
DATABASE WRITINGS 1994-1997.
Data Models
by Simon Williams
The general need for data models is asserted by Codd in the
opening sentence of the paper that is credited with defining the relational
model: "Future users of large data banks must be protected from having to
know how the data is organized in the machine (the internal
representation)." Codd credits his colleague C T Davies of IBM
Poughkeepsie with convincing him of the need for data independence.
This theme is reflected in work from the early 1970s by the
CODASYL Data Base Task Group (DBTG) and the IBM user groups Guide and Share. In
1975 the ANSI-SPARC committee proposed a more formal three-level meta-model,
which consisted of an external level comprising the users' individual views of
the database, a conceptual level comprising an abstract view of the whole
database, and an internal level comprising the physical representation of data
in the database. The conceptual level corresponds to the DBTG concept of the
"schema", and the views in the external level correspond to the DBTG
concept of "subschemas". Codd returns to the issue more formally in
his 1979 paper "Extending the Database Relational Model to Capture More
Meaning", in which he states that the relational model consists of:
· A collection of time-varying tabular relations;
· The insert-update-delete rules (i.e. the entity integrity rule and the referential integrity rule); and
· The relational algebra.
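Codd's insert-update-delete rules can be illustrated with a minimal sketch that checks an insert against both rules before accepting it. It assumes relations held as Python lists of dicts, uses `None` to stand for a missing value (the mark that entity integrity forbids in a primary key), and covers insert only; all names and data are hypothetical.

```python
# Hypothetical sketch: a relation is a list of dicts, None stands in for a
# missing value, and a time-varying relation is updated by replacing its value.


def entity_integrity_ok(rows, key):
    """Entity integrity: no component of the primary key may be missing."""
    return all(row.get(k) is not None for row in rows for k in key)


def referential_integrity_ok(child, fk, parent, pk):
    """Referential integrity: each non-missing foreign-key value must
    match some primary-key value in the referenced relation."""
    parent_keys = {row[pk] for row in parent}
    return all(row.get(fk) is None or row[fk] in parent_keys for row in child)


dept_rows = [{"dept": "D1"}, {"dept": "D2"}]
emp_rows = [{"emp": "E1", "dept": "D1"}]


def emp_rules_ok(rows):
    """Both rules, applied to a candidate value of the employee relation."""
    return (entity_integrity_ok(rows, ["emp"])
            and referential_integrity_ok(rows, "dept", dept_rows, "dept"))


def insert(rows, new_row, rules_ok):
    """Return the new value of the relation, or reject the insert."""
    candidate = rows + [new_row]
    if not rules_ok(candidate):
        raise ValueError(f"insert rejected: {new_row}")
    return candidate


emp_rows = insert(emp_rows, {"emp": "E2", "dept": "D2"}, emp_rules_ok)  # accepted
for bad in ({"emp": None, "dept": "D1"},    # violates entity integrity
            {"emp": "E3", "dept": "D9"}):   # violates referential integrity
    try:
        insert(emp_rows, bad, emp_rules_ok)
    except ValueError as err:
        print(err)
```

Checking the candidate value as a whole, rather than the new row in isolation, keeps the sketch close to the idea that the rules constrain every value a relation may take over time.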
He also observes that various semantic decomposition concepts
are closely associated with (i.e. are almost part of) the relational model.
Codd goes on to describe an extended version of the relational model called RM/T
in an attempt to capture more of the meaning of the data by embracing concepts
dealing with molecular semantics such as aggregation and generalisation. Codd
later refines RM/T into RM/V2, which comprises 333 features. In Chapter 8 of
"Relational Database: Selected Writings", Date presents some of
Codd's work in a more accessible way, and goes on to deal with how the
relational model should be interpreted. Referencing Codd, he asserts that a
data model consists of a collection of data object types, a collection of
general integrity rules and a collection of operators. Date goes on to clarify
Codd's non-exhaustive list of six uses for a data model, and articulates his
own interpretation principle, namely that a data model "must have a
commonly accepted (and useful) interpretation; that is, its objects, integrity
rules and operators must have some generally accepted correspondence to
phenomena in the real world."
Date presents arguments to support his claims, stressing that
data models are formal systems, whilst the real world is an informal system,
and thus a data model must use formal behaviour to mimic informal aspects of
the real world. The rest of Date's arguments focus on explaining how the
relational model should be interpreted to conform to his interpretation
principle.
It is interesting to compare and contrast the approaches that
Codd and Date have taken to the relational model and, by inference, to data
models and their scope more generally. Codd, the mathematician, has grown
increasingly rigorous, fine-grained and prescriptive, lamenting that the
commercial database world has failed to follow him. Date, the communicator, has
sought through his Relational Database Writings series to interpret, amplify
and render accessible most of the more arcane aspects of relational database
theory.
Date expresses the belief that "Everyone professionally
involved in database management should be thoroughly conversant not only with
the relational model per se, but also with its interpretation." Today,
many of those in the groups that Date identifies are conversant with neither. Such a lack of knowledge of
the conceptual underpinnings of their profession would be unthinkable (and
potentially disastrous) for, say, a civil engineer or an architect.
The purpose of a data model is to provide an abstract view of
data and schema in a database, so that its users don't need to know how the
data is physically stored and retrieved. This objective, called implementation
independence, is desirable in order to ensure that the programs that maintain the
data and the queries that users ask of the data do not need to change as the
hardware and software platforms on which the database runs evolve, or as the
database is re-implemented on other platforms. To this end, a data model needs
to include, as a minimum:
A. A set of abstractions that database designers may use to
represent types of things in the real world and their properties, together with
the integrity rules that govern how instances of those abstractions may
interrelate, and the types of operations that may be performed on such
instances.
It is desirable and appropriate that a data model also includes
at least one proposal for:
B. A set of physical data storage constructs that may be used to
implement a database management system based on the data model, together with
appropriate mappings between the physical constructs and the abstractions.
A proposal may qualify as a data model based solely on A (the
relational model did), but today its credentials are strengthened if it can
demonstrate at least one way in which it may be implemented. It is also useful
if a data model addresses:
C. How it proposes to fulfill the many
practical demands that the modern data processing environment imposes, such as
how data may be secured, transmitted and distributed.
Posted 04/26/02