ABSTRACT
In the first paragraph of his first-ever published exposition
of the relational idea, 36 years ago, Codd made clear three critical advantages
of a relational model of data:
· A scientific, and therefore sound, formal basis (logic and mathematics) for database management
· Physical data independence
· System-guaranteed integrity of data and query results.
He was also explicit about his intent to address deficiencies
of the hierarchical and network (graph) approaches underlying commercial
products at the time, which lacked those beneficial properties.
Yet from 1969 to date the industry has botched the
concretization of Codd’s ideas by implementing SQL-based products that bear
limited resemblance to them and violate relational principles left and right.
Moreover, instead of correcting their mistakes, vendors—including IBM (where
the relational model was invented), and Oracle (the first implementer of a SQL
DBMS)—are currently regressing to the same costly and unproductive technology
made obsolete by Codd’s innovation more than thirty years ago.
This is mostly due to the utter failure by industry and users
to educate themselves on, understand, and appreciate the practical value of his
contribution, and the huge cost of ignoring it. Indeed, young generations of
practitioners are not formally introduced to the model, and are instilled with
the notion that SQL products are relational. Driven by products rather than
principles, academia fails to provide the necessary knowledge.
It is therefore imperative—and proper for this series,
intended to make data fundamentals accessible to practitioners—to revisit
Codd’s original work, reassert those aspects that have been ignored, recall
those that were missed, clarify those that are opaque, correct
misinterpretations as well as original mistakes, and settle some current
disagreements over what the relational model really is.
This paper covers Codd’s seminal first two papers: Derivability,
Redundancy and Consistency of Relations Stored in Large Data Banks (1969) and
A Relational Model of Data for Large Shared Data Banks (1970). The latter
is an important public revision of the former, internal paper; it contains
changes and introduces new material.
· INTRODUCTION
· RELATIONS ON DOMAINS
· RELATION REPRESENTATION
· TIME-VARYING RELATIONS VS. RELVARS
· RELATION INTERPRETATION
· DATA SUBLANGUAGE
· ATOMICITY, NESTED RELATIONS, AND NORMALIZATION
· FOREIGN KEYS AND (FIRST) NORMAL FORM
· OPERATIONS ON RELATIONS
· KINDS OF RELATIONS
· DERIVABILITY, REDUNDANCY, CONSISTENCY
· DEBUNKING MISCONCEPTIONS
· CONCLUSION
· REFERENCES
· ADD-ON: DAVID MCGOVERAN ON THE 1969 RELATIONAL OPERATIONS