Sunday, June 24, 2018

Understanding Relations Part 1: Tables? So What?




Note: This is a re-write of two older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation of Codd's real RDM, including his own refinements, corrections, and extensions[1]

“Put simply, a "relation" is a table, the heading being the definition of the structure and the rows being the data.”
“In simple English: relation is data in tabular format with fixed number of columns and data type of each column. This can be a table, a view, a result of a subquery or a function etc.”
“Practically, a "Relation" in relational model can be considered as a "Table" in actual RDBMS products(Oracle, SQL Server, MySQL, etc), and "Tuples" in a relation can also be considered as "Rows" or "Records" in a table.”
“In common usage, however, when someone refers to a "relation" in a database course, they are referring to a tabular set of data either permanently stored in the database (a table) or derived from tables according to a mathematical description (a view or a query result).”
“In SQL RDBMSes (such as MS SQL Server and Oracle] tables are permently stored relations, where the column names defined in the data dictionary form the "heading" and the rows are the "tuples" of the relation. Then from a table, a query can return a different relation.”
“Data is stored in two-dimensional tables consisting of columns (fields) and rows (records). Multi-dimensional data is represented by a system of relationships among two-dimensional tables.”
“I read [that] "Relations are multidimensional. They are not flat. They are not two dimensional. Don't let the term table mislead you." on the back cover of CJ Date's DATABASE IN DEPTH. Can anyone help how to visualize this multidimensional nature of relations?”
Because SQL DBMSs have been sold as relational databases (which they are not), and in SQL the data structure is the table, in the absence of foundation knowledge[2] most practitioners think that relational databases consist of tables, but do not ask themselves why and how is that significant for database practice. The subtitle of this post is a question I used to ask in presentations years ago that always got silence. I see no evidence of improvement -- in fact, it's gotten worse. To emulate Feynman, "Nobody understands the RDM".

That such a simple and commonly understood structure can visualize relations is an advantage of the RDM, but a table is not a relation and, SQL notwithstanding, confusing the two reflects a lack of understanding of the RDM, misses its significance for database practice, and prevents taking full advantage of its benefits.

Note: The table is the preferred way to picture relations, there are others (e.g., array).

First, the fundamentals.


------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE

I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so let's take advantage of his generosity. Purchasing my papers and books will also help. Thank you. 

NEW PUBLICATIONS


NEW: SOCIAL MEDIA 

I deleted my Facebook account. You can follow me on Twitter:


The DBDebunk page will contain links to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture, which I am bringing back. I created a #RelModel hashtag for this purpose.

The PostWest pages will contain links to evidence for, and my take on Dystopian Western Decadence, The Only Acceptable Racism Left, and The Weaponized Myth of a "Palestinian Nation" for which the anti-semitic world fell lock stock and barrel. I will create a #PostWest hash tag for this purpose. 

HOUSEKEEPING

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 
------------------------------------------------------------------------------------------------------------------ 

Sets and Mathematical Relations


In simple set theory (SST), a relation is a special kind of set: a subset of a specific kind of set of sets. Namely, it is a set of unordered n-valued tuples that is a subset of the Cartesian (or cross) product of n domains. A domain is a set of values, and a tuple is a set of ordered values, each drawn from a domain. Sets can be manipulated mathematically by set operations to yield sets, just like numbers can be to yield numbers (formally, sets are closed to set operations like numbers are closed to arithmetic)[1].

Mathematical domains and relations are abstractions devoid of real world meaning -- they are sets of arbitrary values. Databases represent information (facts) about segments of reality of interest to their users, and Codd realized that relations could be adapted to represent the real world (i.e., carry meaning) without loss of mathematical properties, which would be advantageous for database management by providing such benefits as system-guaranteed logical validity and semantic correctness of query results (i.e., soundness)[3], physical (PI) and logical independence (LI)[4], system-optimized performance, and more.

The RDM is, essentially, an adaptation and application of the mathematical theory of relations expressible in first order predicate logic (FOPL) to database management and, as such, it is not pure, but applied theory.


Database Relations


Unlike their mathematical counterparts, database relations (and domains) are not sets of arbitrary values, but are constrained to values consistent with the conceptual models of reality that the databases represent, which consist of:

  • Objects of multiple types -- entities, entity groups, and multi-groups;
  • Business rules (BR) that specify their defining properties, namely: 
- Individual properties-in-context of entities members of a group, including any relationships among them;
- Collective group properties arising from relationships among:
> members of a group;
> members of a group and those of subgroups, if any;
> members of different groups[5].
A database relation represents (facts about) entity members of a group. Using a FOPL based language, informal BRs can be expressed formally as predicates, which a RDBMS can enforce as constraints on and among relations, to ensure database consistency with the model (i.e., faithful representation of the objects and properties).

Several types of BRs formalize as constraints in the RDM:

  • Property rules --> Domain constraints;
  • Property-in-context rules --> Attribute constraints;
  • Entity rules --> Tuple constraints;
  • Multi-entity rules --> Multi-tuple constraints;
  • Multi-group rules --> Multi-relation (database) constraints[6].
The constraints ensure that database relations (1) represent the model accurately and (2) preserve their mathematical properties. It follows that if and only if both the database and the DBMS are relational, namely:
  • Relations are properly designed and constrained; and,
  • The DBMS is a true and full RDBMS
full advantage can be taken of relational benefits.

Note: To preserve mathematical properties of database relations (1) database design must ensure unordered tuples, u
niquely named and unordered attributes and (2) constraints must enforce primary keys (PK) and no missing values.

Mathematical relations have unique, unordered tuples, unnamed ordered attributes, and no missing values by mathematical definition. Database relations have tuple uniqueness enforced by primary key constraints[7], and unordered tuples, uniquely named, unordered attributes, and no missing values by design.

Contrary to conventional wisdom:

  • A relation is not just in first normal form (1NF)[8], but also in fifth normal form (5NF), otherwise it is not a relation[9]. This means, respectively, that it is defined only on simple domains (i.e., with atomic values)[10], and represents an entity group with members of a single type[4].
  • There are three principles of relational database design. According to an as yet unproven McGoveran conjecture, they subsume the Principle of Full Normalization (POFN) (i.e., adherence to them ensures POFN, but not vice-versa)[11].
Note very carefully that relations are purely logical objects (i.e., they are not stored). 

Relational Tables


Tables are visualizations (pictures) of relations on some physical media -- paper, screen, or storage -- and the picture of a thing is not the same as the thing itself. Consider the CUSTOMERS table:

==============================================================

 CID   FNAME       LNAME      FICO      BALANCE    STATE  TAX   
=====---------------------------------------------------------
   1   Maria       Anders       4        439.00      CA    Y     
   3   Antonio     Moreno       5        227.60      DC    N     
   4   Thomas      Hardy        6        517.00      FL    N       
   5   Christina   Berglund     5        540.00      ME    Y    
   8   Martín      Sommer       3       1010.30      MD    Y       
   9   Laurence    Lebihan      2        188.00      NV    Y
  10   Elizabeth   Lincoln      1       1200.42      NY    N
===============================================================


The physical arrangement of rows (picturing tuples) and columns (picturing attributes) on paper/screen does not represent anything in the conceptual model, and is not an element of a purely logical relation. The only meaningful elements are tuples (representing entities), attributes (representing properties), and constraints (representing relationships among them), all of which are integral logical components of a relation definition that are known to a RDBMS. But constraints are not table components. In other words, a table includes superfluous, and excludes meaningful information and, thus, lacks the required mathematical properties, and is not a relation[1].

We refer to a table, the body of which -- tables don't have headers -- satisfies the constraints on the relation it pictures, as a relational table (R-table), which should not be confused with the relation either. As I often remind practitioners, it is difficult, if not impossible, to determine by visual inspection whether a table is a R-table, even if the constraints are known, let alone if they are not[12].


Flat Tables and Multi-dimensional Relations


A database relation has a real world interpretation (meaning) as representing a set of facts about the members of an entity group. Tuples represent (facts about), attributes properties-in-context of the entities[4]. Since entities of a type have multiple properties, the entity type is multi-dimensional
(i.e., attributes = dimensions), and, thus, so is the relation that represents it .

A table is flat because it is a picture of a multi-dimensional relation
a two-dimensional physical medium of (one reason, as we have seen, it is not a relation). It is the medium that is flat, not the underlying relation. Do you deem photographed or painted people or object flat?

“...The fact that relations can be perceived as tables ... breeds the false assumption that the freedom of actions permitted [with] tables .....must also be permitted when manipulating relations...” --E. F. Codd, 1990

(Continued in Part 2)


References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2]
Pascal, F., THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS.

[3] Pascal, F., Object Orientation, Relational Database Design, Logical Validity and Semantic Correctness.

[4] Pascal, F., Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation.


[5] Pascal, F., Conceptual Modeling for Database Design: A New Perspective, forthcoming. 

[6] Pascal, F., Logical Database Design: Formalizing the Informal, forthcoming.

[7] Pascal, F., The Key to Relational Keys: A New Understanding.

[8] Pascal, F., First Normal Form (1NF) in Theory and Practice, Part 1,2,3.

[9] Pascal, F., What Relations Really Are and Why They Are Important.

[10] Pascal, F., Simple Domains and Value Atomicity.

[11] Pascal, F., Database Design: What It Is and Isn't.

[12] Pascal, F., Don't Design Databases Without Foundation Knowledge and Conceptual Models.



Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.




No comments:

Post a Comment