Friday, June 29, 2018

Understanding Relations Part 2: Beware the Misconceptions




Note: This is a re-write of two older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation of Codd's real RDM, including his own refinements, corrections, and extensions[1].

(Continued from Part 1)



I started Part 1 with comments exposing confusion (among the many misconceptions rampant in the industry[2]) of database relations with tables (reinforced by SQL), and of levels of representation. Debunking them requires foundation knowledge lacking in the industry, so I provided some: I explained the difference between mathematical relations, database relations, and relational tables. We are now in a position to debunk the comments.


------------------------------------------------------------------------------------------------------------------
SUPPORT THIS SITE 

I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so let's take advantage of his generosity. Purchasing my papers and books will also help. Thank you. 

NEW PUBLICATIONS 

NEW: The Key to Relational Keys: A New Perspective

NEW: SOCIAL MEDIA 

I deleted my Facebook account. You can follow me on Twitter:
The DBDebunk page will contain links to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture, which I am bringing back. I created a #RelModel hashtag for this purpose.

The PostWest pages will contain links to evidence for, and my take on Dystopian Western Decadence, The Only Acceptable Racism Left, and The Weaponized Myth of a "Palestinian Nation" for which the anti-semitic world fell lock stock and barrel. I will create a #PostWest hash tag for this purpose.

HOUSEKEEPING
  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 
------------------------------------------------------------------------------------------------------------------

“Put simply, a "relation" is a table, the heading being the definition of the structure and the rows being the data.”
“In simple English: relation is data in tabular format with fixed number of columns and data type of each column. This can be a table, a view, a result of a subquery or a function etc.”
“Practically, a "Relation" in relational model can be considered as a "Table" in actual RDBMS products(Oracle, SQL Server, MySQL, etc), and "Tuples" in a relation can also be considered as "Rows" or "Records" in a table.”

"Simple English" has nothing to do with it -- relation is a precise formal concept, and, per Einstein, "Everything should be as simple as possible, but not simpler".
A database relation -- base, view, or query result -- is not a table, it can only be visualized as a R-table: it does not have columns and heading (it has attributes visualizable as columns, the names of which are not the "definition of structure", but part of its interpretation (i.e., meaning))[3,4].

Tables are useful for picturing relations, but do not have the set-mathematical properties of relations, and do not confer the advantages of the RDM. SQL tables are not visualizations of, but substitutes for, relations that are not guaranteed to be fully constrained for accurate representation of reality without loss of mathematical properties. This is one reason SQL DBMSs are not RDBMSs[5]).

“In common usage, however, when someone refers to a "relation" ... they are referring to a tabular set of data either permanently stored in the database (a table) or derived from tables according to a mathematical description (a view or a query result).”
“In SQL RDBMSes (such as MS SQL Server and Oracle] tables are permanently stored relations, where the column names defined in the data dictionary form the "heading" and the rows are the "tuples" of the relation. Then from a table, a query can return a different relation.”
There is no such thing as a "tabular set of data". There are database relations that are sets visualizable as R-tables[5]. The data of a base or snapshot relation are stored, but not necessarily, and usually not as tables. Views are virtual relations derived mathematically from base relations (and other views) by the set operations of the relational algebra (RA). All query results derived by RA operations are by definition relations. Both views and query results are derived relations, ultimately from base relations[6].

As we have seen, SQL base tables are stored, but for a variety of reasons are not relations (there are no stored relations) and rows are not tuples. Because of this, and because SQL operations may not be RA operations, the results are tables rather than relations, and logical validity and semantic correctness are not guaranteed[7].

“Data is stored in two-dimensional tables consisting of columns (fields) and rows (records). Multi-dimensional data is represented by a system of relationships among two-dimensional tables.”
“I read [that] "Relations are multidimensional. They are not flat. They are not two dimensional. Don't let the term table mislead you." on the back cover of CJ Date's DATABASE IN DEPTH. Can anyone help how to visualize this multidimensional nature of relations?”
A relation is a multi-dimensional object: its number of attributes (i.e., its degree) are the dimensions -- n-ary relations are n-dimensional. It's the medium on which tables picture relations that's two-dimensional, not the  relations.

These misconceptions prevent practitioners from grasping the RDM and its practical significance and, therefore, from taking full advantage of even the limited relational fidelity of SQL DBMSs. They confuse true RDBMSs with SQL DBMSs, mistake the latter's deficiencies for RDM weaknesses, ensuring that the former have not materialized, and probably will not.



Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F., THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS.

[3] Pascal, F., The Interpretation  and Representation of Database Relations.

[4] Pascal, F., What Meaning Means: Business Rules, Predicates, Integrity Constraints, and Database Consistency.

[5] Pascal, F., NoSQL and SQL: A Plague on Both Their Houses.

[6] Pascal, F., Structure, Integrity, Manipulation: How to Compare Data Models.

[7] Pascal, F., Object Orientation, Relational Database Design, Logical Validity and Semantic Correctness.



No comments:

Post a Comment