Saturday, July 28, 2012

What Is a Data Model?

PV: I've read a blog post about what a data model really is, as the term is used in "relational data model" (RM). It made the following points:

1. The implementation of a data model is a programming language.

2. The RM is not necessary. It is not necessary for developing software solutions, maintaining large shared databases, or any other purpose in the world of software development. Any software solutions that can be developed while employing the RM could be written without it, using other data models.

The first conclusion came from the analysis of Chris Date's definition of a data model:
A data model is an abstract, self-contained, logical definition of the objects, operators, and so forth, that together constitute the abstract machine with which users interact. The objects allow us to model the structure of data. The operators allow us to model its behavior. --C. J. Date, AN INTRODUCTION TO DATABASE SYSTEMS, Addison Wesley, 8th ed., 2003, pp. 15-16
It concluded from this that the implementation of a data model is a programming language, whether a general purpose programming language or not.

I'm not sure if data languages (e.g., SQL and Tutorial D) qualify as programming languages. But maybe in a broader sense we can say that data languages are also programming languages, since we use them to "program" (i.e., declare and manipulate) our data. So if the relational data model had been implemented correctly, the industry would have produced better data languages (i.e., the D languages). Am I right?

I don't know how #2 derives from #1, and the consequences of not using RM were not stated.

(UPDATED 6/17/12)

1. SQL is a data sublanguage, to be embedded in a programming language. As far as I know--and anybody should correct me if I am wrong--D is intended to be computationally complete.

A data model is not a programming language, but it must be concretized in some way, either as a data sublanguage or within a full programming language. It is possible to have multiple languages (syntax) concretizing the semantics of the relational model, e.g., QUEL, Dataphor and D.

(There is a different sense in which a data model can be compared to a programming language: A data model is to logical models what a programming language is to programs, but that is just a metaphor.)

Note: Incidentally, Codd preferred a data sublanguage for two reasons, although he vehemently opposed SQL. First, he wanted to avoid the complications of dealing with the programming language standard committees; and second, he wanted it to be based on First Order Logic, which is "declarative"/"non-procedural", while programming languages are not. He designed his own language, Alpha, which was relationally complete.

2. When he came up with the concept, Codd realized that there cannot be data management without some structure, integrity and manipulation, which is what a data model is. Prior to that, database products relied implicitly on the hierarchic and network data models, which lacked theoretical formulations. Subsequent efforts were made to abstract the models from practice, but they were not productive due to complexity and the difficulties of postfitting theory to practice.
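
To make the three components concrete, here is a minimal sketch in Python. It is only a toy illustration, not an implementation of the relational model: the names Relation, restrict, project and satisfies_key, and the employees data, are all hypothetical, and important parts of the model (domains, the full set of operators, constraints in general) are left out. Structure is the relation itself, manipulation is a pair of operators that return new relations, and integrity is a key constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Structure: a heading (attribute names) plus a set of rows."""
    heading: tuple    # attribute names, e.g. ("emp_no", "name", "dept")
    body: frozenset   # each row is a tuple of values aligned with heading

    def restrict(self, predicate):
        """Manipulation: keep the rows for which predicate(row as dict) holds."""
        kept = frozenset(
            r for r in self.body if predicate(dict(zip(self.heading, r)))
        )
        return Relation(self.heading, kept)

    def project(self, *attrs):
        """Manipulation: keep only the named attributes; duplicates vanish
        because the body is a set."""
        idx = [self.heading.index(a) for a in attrs]
        rows = frozenset(tuple(r[i] for i in idx) for r in self.body)
        return Relation(tuple(attrs), rows)


def satisfies_key(rel, key_attrs):
    """Integrity: no two rows may agree on all of the key attributes."""
    idx = [rel.heading.index(a) for a in key_attrs]
    keys = [tuple(r[i] for i in idx) for r in rel.body]
    return len(keys) == len(set(keys))


# Hypothetical sample data, for illustration only.
employees = Relation(
    heading=("emp_no", "name", "dept"),
    body=frozenset({(1, "Ann", "Sales"), (2, "Bob", "Sales"), (3, "Cy", "IT")}),
)

# Operators take relations and return relations, so they compose.
sales_names = employees.restrict(lambda row: row["dept"] == "Sales").project("name")
```

In this sketch emp_no passes the key check and dept does not, and every operator yields another relation that can be fed to the next one. Remove any one of the three parts and what remains is no longer enough to manage data.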

You put your finger exactly on the critical issue when you referred to the consequences of using data models other than relational. Here is an outline of how to assess those consequences, from the forthcoming new version of my paper Business Modeling for Database Design:

Any data management technology claimed to be an improvement over the relational model
  •  Must be based on a data model that:
       •  Has a formal foundation as sound as
            •  Predicate logic
            •  Set theory
       •  Has a real world interpretation
       •  Is as complete
            •  Structure
            •  Integrity
            •  Manipulation
       •  Is more general and/or simpler
Anything less would be trading down.

(Originally posted on 3/17/06)
