Sunday, November 26, 2017

What Relations Really Are and Why They Are Important



Note: Some of the References have been re-written to bring them into line with the McGoveran formalization and interpretation [1] of Codd's real RDM -- re-reading is recommended.

Here's what's wrong with the picture of two weeks ago, namely:

"In SQL RDBMSes (such as MS SQL Server and Oracle] tables are permently stored relations, where the column names defined in the data dictionary form the "heading" and the rows are the "tuples" of the relation."

"A relation can be represented by a table in database. A relation in the context of modeling a problem will include the fields and possibly the identification of fields which have relationships with other relations..."

"Put simply, a "relation" is a table, the heading being the definition of the structure and the rows being the data."

"In simple English: relation is data in tabular format with fixed number of columns and data type of each column. This can be a table, a view, a result of a subquery or a function etc."

"A relation is a table, which is a set of data. A table is the result of a query."

--What is a relation in database terminology?, StackOverflow.com

--------------------------------------------------------------------------------
I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so please take advantage of his generosity. Thanks.
--------------------------------------------------------------------------------- 

Even after my more than four decades of debunking misconceptions about data and relational fundamentals, the ignorance in the industry still astounds. In fact, it has been getting worse, not better. Note that (1) practitioners don't even understand what a definition is and (2) it is clear that neither of the "definers" have been ever taught or learned the RDM -- they all "ingested by osmosis" whatever misconceptions floated around while practicing.

Database Adapted Sets


Database relations are the exclusive relational data structure -- mathematical relations adapted for and applied to database management [2].

A database relation is

  1. A set of unordered n-valued tuples, each of which is
  2. A set of unique, unordered, non-missing values of
  3. Unordered, uniquely named attributes that are
  4. Defined on simple domains [3] (i.e., it is in First Normal Form (1NF)) and
  5. The only dependencies that hold in it are functional dependencies of the non-key attributes on the PK (is in Fifth Normal Form).
Every relation is properly constrained for consistency with the business rules specifying the individual and collective properties required for membership in the real world object group it represents in the database.

Real World Meaning, Logical Validity and Semantic Correctness


What is most important to understand is that the database adaptations enable  assignment of real world meaning to abstract mathematical relations -- tuples represent facts about the real world, attributes object properties and semantic constraints business rules -- while preserving set-mathematical properties -- relations can be manipulated mathematically as sets by the relational algebra (RA) with relations as results (relational closure).

Mathematics guarantees logical validity of query and update results and database design adherence to 

  • The Principle of Orthogonal Design (POOD);
  • The principle of Expressive Completeness (POEC);
  • The Principle of Representational Parsimony (PORP);
that jointly imply the Principle of Full Normalization (POFN) (i.e., 5NF relations) avoids anomalous results and guarantees semantic correctness.

In 5NF By Definition


Initially there was a single normal form, tied to Codd’s 1969-70 first definition of the join operation. The re-definition of join in 1970 as the one we use today necessitated multiple normal forms (1NF-5NF) and “further” normalization, with 5NF considered “full” normalization.

It is currently accepted in the industry that relations are in at least 1NF, but that is insufficient. An algebra does not have anomalies (e.g., as in view updates [4]) -- their occurrence is an indication that the definition of either the operators, or the operands of the algebra is flawed. The 1969-70 single normal form was to the first join what 5NF is to the current join. Given the current join, 5NF relations avoid anomalies and are necessary semantic correctness [5]. Had Codd defined relations as being in 5NF and join as it is now from the start, 1NF-5NF would have not arose.


Common Misconceptions


None of the answers reflects knowledge and understanding of this.

  • "SQL RDBMSs" is a contradiction in terms -- they are not true RDBMSs due to multiple violations of the RDM, among them,SQL tables can contain duplicate rows, attributes ordered or with duplicate names and NULLs, in which case they are not relations;
  • A relation can be visualized as a R-table -- not just any table -- on some physical medium; in the database there are no tables, only relations; fields and rows are visualizations of attributes and tuples, not relation elements [6];
  • Relations do not have "headings" -- which are part of the tabular visualization of relations (meta-data) --  and so are columns and rows;
  • The headings are not "the definition" of a relation -- the business rules are the informal meaning and the relation predicate (RP)  (i.e. the conjunction of the semantic formal constraints corresponding to the business rules that specify the required properties of the represented object group) constitute the formal semantics [7].
  • A relational database represents relationships other than just between relations:
- among attributes (i.e., among represented object properties in the real world);
- among all tuples (i.e., among all members of the represented object group) [8,9];

  • Attributes are defined on simple domains, which are data types with user constrained value ranges that represent real world properties;
  • There are base and derived (query results, views, or snapshots) relations (no functions), which SQL tables are not (sub-queries are a SQL-specific kind of query expression);


Note: Data of SQL base tables are stored, causing confusion with Codd's base relations, which are the base of a database from which all meaningful relations are derived and their data may or may not be stored (base tuples represent axioms, derived tuples the theorems -- inferences from the axioms.
 

References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F., The Interpretation and Representation of Database Relations.

[3] Pascal, F., Simple Domains and Value Atomicity.

[4] McGoveran, D., On View Updating.

[5] Pascal, F., Object Orientation, Relational Database Design, Logical Validity and Semantic Correctness.

[6] Don't Confuse Tables with Relations!

[7] Pascal, F., Relation Predicates and Identical Relations.


[8] Pascal, F., It’s Not Tables, It's the Relationships.


[9] Pascal, F, Relations & Relationships Part I and Part II.



No comments:

Post a Comment

View My Stats