TRULY RELATIONAL - WHAT IT REALLY MEANS
Fabian Pascal Paper #1 v.3 (January 2011)

ABSTRACT

In the first paragraph of his first-ever published exposition of the relational idea, 42 years ago, Codd made clear three critical advantages of a relational model of data:

· A sound theoretical basis—logic and mathematics—for database management

· Physical data independence

· DBMS-guaranteed integrity of data and query results

He also made explicit his intent to address deficiencies of the graph model underlying the hierarchic and network commercial products at the time, which lacked those beneficial properties.

Yet from 1969 to date the industry has botched the concretization of Codd’s ideas by implementing SQL-based products that bear only limited resemblance to them and violate relational principles left and right. Moreover, instead of correcting their mistakes, vendors—including IBM (where the relational model was invented) and Oracle (the first implementer of a SQL DBMS)—have been regressing to the same costly and unproductive technologies (objects, XML) made obsolete by Codd’s innovation more than thirty years ago.

This is due primarily to the utter failure of vendors, experts, users and the trade press to educate themselves on, understand, and appreciate the practical value of Codd’s contribution and the huge cost of ignoring it. Younger generations of practitioners are not even introduced to the relational model and are taught either that SQL products are relational or that relational technology is obsolete. Driven by industry rather than principles, academia has renounced its true function of education and is serving as a product trainer for vendors.

It is therefore imperative—and proper for this series, intended to make data fundamentals accessible to practitioners—to revisit Codd’s original work, reassert those aspects that have been ignored, recall those that were missed, clarify those that are opaque, correct misinterpretations as well as original mistakes and settle current disagreements and confusion over what the relational model really is.

This paper covers Codd’s seminal first two papers, “Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks” (1969) and “A Relational Model of Data for Large Shared Data Banks” (1970), the latter being an important public revision of the former (an internal IBM document) that contained changes and introduced new material.

· Introduction

· Relations on Domains

· Relation Representation

· Time-varying Relations

· Relation Interpretation

· Data Sublanguage

· Atomicity, Nested Relations and Normalization

· Foreign Keys and (First) Normal Form

· Operations on Relations

· Kinds of Relations

· Derivability, Redundancy, Consistency

· Debunking Misconceptions

· Conclusion

· REFERENCES

· Add-on: David McGoveran on the 1969 Relational Operations

