Sunday, May 29, 2016

Multidimensional Relations, Flat Tables and Logical-Physical Confusion

Many--and I would guess, most--data professionals are clueless about the Relational Data Model (RDM) and its practical value for database practice. They confuse RDBMS with SQL DBMS and use relational jargon, but don't understand what it really means and don't care whether they do or not. As I announced, every other week (with occasional exceptions), I will debunk common and entrenched myths and misconceptions exposed by the "What's wrong with this picture?" of the preceding week.

From last week:

"Data is stored in two-dimensional tables consisting of columns (fields) and rows (records). Multi-dimensional data is represented by a system of relationships among two-dimensional tables. This usually leads to data storage becoming redundant, and also difficult to maintain on account of addition and deletion anomalies. This is only the case if we do not normalize the data. Keys are fields or combinations of fields used to identify records." --Nigel Peck, Denormalization Summary
"Relations are multidimensional. They are not flat. They are not two dimensional. Don't let the term table mislead you." I read the above statement on the back cover of CJ Date book on relational theory "Database in depth". Can anyone help how to visualize this multidimensional nature of relations?

Relations, Not Tables

First, relational databases do not consist of tables, but of relations, whose components are tuples and attributes, not rows and columns: a relation is a set of unordered tuples, each consisting of a set of attribute values drawn from domains. For every specific database, the database designer assigns a real world interpretation (meaning) to every relation as a set of facts about specific property-sharing entities. The tuples represent the facts and the attributes the properties in the database

Note: A relation designed to represent facts about a single class of entities is in fifth normal form (5NF).

Attributes = Dimensions

Second, a relation is a multidimensional object--each of its attributes representing an entity property is a dimension. Note that the multidimensionality is due to multiple attributes within relations, not to  relationships between relations (e.g., referential).

Relations = Relationships Defined by Relationships

Third, a database relation is a relationship among domains--a subset of their Cartesian product and is defined by mandatory and optional relationships between its attributes and among its tuples. For example, tuple uniqueness is a disjunctive relationship among all tuples of a relation.

R-Tables: Pictures of Relations Are Not Relations

Fourth, the R-table is the preferred visualization (picture) of a relation on some physical medium (e.g., paper, or screen): columns picture attributes and rows picture tuples (it is just one possible visualization, albeit the simplest and most familiar). But the picture of a thing should not be confused with the thing itself: it is the medium that is two-dimensional, not the relation which the R-table pictures. Would you consider people or objects flat when they are pictured on print photos or on screen?

The physical arrangement of values in rows and columns within a R-table is not meaningful--the logical relationships between the attributes and among the tuples of the underlying relation are. Those relationships are not visible in the R-tables; they are defined by the database designer as business rules, formally represented in the database by integrity constraints on the (relations pictured by the) R-tables (e.g., an entity integrity rule defines the disjunctive relationship among the tuples and is represented by a uniqueness (key) constraint on the relation).

Note: A R-table obeys a special discipline:

  • Unique, unordered rows;
  • Uniquely named, unordered columns;
  • No missing values.
Tables that are not R-tables do not picture relations.

Data, Not Relations Are Stored

Fifth, database relations are not physically stored, their data are. How they are stored is at the discretion of the DBMS designer and the database administrator (DBA). While possible, a direct image implementation of relations i.e., one physical indexed sequential file per relation and one physical record per tuple, the RDM does not require it. This gives vendors complete freedom to implement any and all storage and access methods deemed efficient; and the DBA freedom to deploy any of the available ones and change them at will without impairing queries or applications--that's what physical data independence (PDI) means.

Note: SQL databases had initially exclusive direct image implementations. While that is no longer the case, it induced logical-physical confusion (LPC) of relations with tables and tables with files, which remains entrenched. Rows and columns are shorthand for logical tuples and attributes, not equivalent to physical records and fields.

These misconceptions prevent data practitioners from grasping the practical advantages of the RDM and from taking full advantage of even the ones offered by SQL DBMS's. What is more, by confusing SQL deficiencies with relational weaknesses they ensure that true RDBMS's will not materialize.

No comments:

Post a Comment