Sunday, July 15, 2018

Understanding Relations Part 3: Debunking Conventional Wisdom




(See Part 1 and Part 2)

“A common term used in database design is a "relational database" -- but a database relation is not the same thing and does not imply, as its name suggests, a relationship between tables. Rather, a database relation simply refers to an individual table in a relational database. In a relational database, the table is a relation because it stores the relation between data in its column-row format. The columns are the table's attributes, while the rows represent the data records. A single row is known as a tuple to database designers.”
“A relation, or table, in a relational database has certain properties.”

“First off, its name must be unique in the database, i.e. a database cannot contain multiple tables of the same name.”

“Next ... as with the table names, no attributes can have the same name.”

“Next, no tuple (or row) can be a duplicate. In practice, a database might actually contain duplicate rows, but there should be practices in place to avoid this, such as the use of unique primary keys (next up). Given that a tuple cannot be a duplicate, it follows that a relation must contain at least one attribute (or column) that identifies each tuple (or row) uniquely. This is usually the primary key. This primary key cannot be duplicated. This means that no tuple can have the same unique, primary key. The key cannot have a NULL value, which simply means that the value must be known.”

“Further, each cell, or field, must contain a single value. For example, you cannot enter something like "Tom Smith" and expect the database to understand that you have a first and last name; rather, the database will understand that the value of that cell is exactly what has been entered.”

“Finally, all attributes—or columns—must be of the same domain, meaning that they must have the same data type. You cannot mix a string and a number in a single cell.”

“All these properties, or constraints, serve to ensure data integrity, important to maintain the accuracy of data.”
  --Definition of Database Relation

It is easy to discern when explanations of relational features are not grounded in the formal foundations of the RDM[1], but in industry practices. Here are some further clarifications and corrections.
 


------------------------------------------------------------------------------------------------------------------
SUPPORT THIS SITE 

I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so let's take advantage of his generosity. Purchasing my papers and books will also help. Thank you. 

NEW PUBLICATIONS 

NEW: The Key to Relational Keys: A New Perspective

NEW: SOCIAL MEDIA 

I deleted my Facebook account. You can follow me on Twitter:

The DBDebunk page will contain links to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture, which I am bringing back. I created a #RelModel hashtag for this purpose.

The PostWest pages will contain links to evidence for, and my take on Dystopian Western Decadence, The Only Acceptable Racism Left, and The Weaponized Myth of a "Palestinian Nation" for which the anti-semitic world fell lock stock and barrel. I will create a #PostWest hash tag for this purpose.

HOUSEKEEPING

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 
------------------------------------------------------------------------------------------------------------------

Relations, Tables, and Meaning


A relational database has a formal definition[1], and "how it is commonly used in database design" is only relevant for pointing out the discrepancies and the implications thereof. It does not consist of tables, but of mathematical relations adapted from set theory to database management. I have sufficiently explained why relations should not be confused with tables[2,3], so I will not repeat it here. A relation:

  • Does not just "simply refer to an individual table", but rather can be visualized as a R-table on some physical medium: the relation's attributes are visualized as columns, and the tuples as rows;
  • Is purely logical -- its data can be physically stored in any number of ways, not necessarily (and usually not) "in a column-row format" (i.e., as a table).
Conventional wisdom reinforces the confusion by assigning "body and heading" -- table elements -- to relations. When I pointed this out on my Twitter page, a reader (who is a mathematician and teacher) replied:
“True, relations are just the data, not the metadata ... Still, I think it's nitpicking ... By the way, I'm not fond of the "heading" concept. A heading is just a predicate, one of infinitely many that holds for a specific relation and, to some extent, ties the relation in with some meaning in the domain it's used to represent ... the fact that the meaning of the tuple elements in a relation are not viewed as part of the relation itself, I find to be a less important distinction (although mathematically obvious). After all, it's necessary to make any sense of the relation.”
The important point, of course, is that a table does not have the formal properties of a relation. As Codd wrote,
“The fact that relations can be perceived as tables ... breeds the false assumption that the freedom of actions permitted [with] tables ... must also be permitted when manipulating relations...”[4]
Equating relations with tables inhibits understanding of those properties and their advantages by practitioners, most of whom lack a mathematical and logic background. Moroever, it reinforces the logical-physical confusion (LPC) common in the industry[5].

The relation-table confusion also induces/reinforces a misconception about the interpretation of a relation (i.e., the meaning assigned to it by the database designer). The meaning is the collection of informal business rules (BR) specifying in natural language the defining properties of the entity group that the relation represents formally in the database. For database representation, the BRs are formalized as first order predicates enforceable by the DBMS as constraints on the relation, to make it consistent with the BRs. The conjunction of these constraining predicates constitutes the relation predicate (RP), the formal version of the interpretation that users understand semantically, and the DBMS can enforce algorithmically[6].

One of the BRs, the properties-entity rule (PER), associates individual defining properties with the entity members of the group, represented by the relation attributes in the database. The heading of the R-table that visualizes the relation contains the names of the columns corresponding visually to those attributes. But
attribute names -- the heading -- is not a predicate. At best, it can be said to represent some predicate that, as my mathematician reader suggests, is consistent with “infinitely many" PERs. But even leaving the ambiguity aside, it represents one conjunct predicate of those comprising the RP -- the formal version of the meaning.

Consider the following R-table: given the heading, what is the meaning of the relation it visualizes?

-----------------------------------------
 COURT  MEMBER  START  END     RATE
=========================================
 hard   yes     09:30  10:30   SAVER
 hard   yes     11:00  12:00   SAVER
 hard   no      14:00  15:30   STANDARD
 grass  no      10:00  11:30   PREMIUM-B
 grass  no      11:30  13:30   PREMIUM-B
 grass  yes     15:00  16:30   PREMIUM-A
-----------------------------------------
It is precisely because the meaning -- informal BRs and formal RP -- is not visible in the tabular display of relations that the focus on tables is obscuring and misleading (e.g., the frequent question "Is this table in X-normal form?" without the benefit the meaning of the underlying relation[7] -- replying to which is a fool's errand).

Relation vs. Multi-relation Properties


Why relations are uniquely named is obvious -- they represent distinct entity groups. What isn't, however, but should be, is that this is not a property of individual relations, but a multi-relation property arising from a relationship among all of them. There are, in fact, three multi-relation properties that underlie three database design principles mandated by the RDM that are completely absent from conventional wisdom, and unknown and not adhered to in the industry[8].



Unique Attribute Names


The industry is also unaware that this is one of the adaptations of mathematical relation theory to database practice: the attributes of mathematical relations are referenced by their order, which would  complicate matters considerably for both database users and the DBMS. Unique attribute names is a way to avoid this.



Primary Keys


Primary keys (PK) are another database adaptation of mathematical relation theory. As an abstract subset of a Cartesian product of n domains, a mathematical relation has tuples that are unique combinations of all their n values. Database tuples, on the other hand, represent facts about real world entities, which are distinguishable in the real world by a collection of one or more (not all!) properties, to which a unique name is usually assigned as an identifying shorthand. A PK is an attribute that represents that name in the database
[9], on which the DBMS enforces a uniqueness constraint, guaranteeing database consistency with (i.e., accurate representation of) reality. This prevents duplicates (as Codd used to say, "Stating a fact more than once does not make it truer, only redundant.")

SQL tables may contain duplicate rows because PKs are not mandatory, one reason for which they are not relations.

Note: C. J. Date argues that PKs have no theoretical foundation, but we contend that the RDM imposes a formal PK mandate -- PK selection is pragmatic[10].


Missing Data


NULL is SQL's way to represent missing values (another reason SQL tables are not relations) -- it is a mark for an absence of a value, not a value[11]. As I explained so many times, database relations do not have missing values, another reason SQL tables with NULLs are not relations (for a relational solution to missing data see [12]).



Simple Domains and Atomic Values


"Each cell, or field, must contain a single value" is the conventional definition of value atomicity. But relations have neither cells (tables do), nor fields, only values. A more accurate formulation is that attributes are defined on simple domains without meaningful components, the values of which are treated as atomic by the data language[13] -- another way of saying that a relation is in first normal form (1NF)[14]. 


Either differentiating first name and last name is not meaningful to users and applications, in which case a NAME attribute is defined on a simple domain, and the data language need not -- indeed, must not -- reference its components, effectively treating NAME values as single, atomic values; or two attributes should be defined on a simple domain.

Note: "All attributes -- or columns -- must be of the same domain, meaning that they must have the same data type" is a poor expression of "for every attribute drawn from a domain, its corresponding values in all tuple are of that domain's type" (note the distinction between type and domain, which is a range-constrained type[15]).

Constraints
ensure that relations represent accurately the the conceptual model of reality that they represent, thereby providing semantics to the database
(for which reason they are referred to as semantic constraints), while preserving their mathematical properties. Not all are supported by SQL DBMSs.

Note: Database consistency is one component of data integrity[16]. Not all relation properties are enforced via constraints. Some (e.g., no inapplicable values, 5NF) are enforced by database design, others (e.g., value atomicity) by the data language. Distinct from conventional wisdom, we contend that relations are not only in 1NF, but also in 5NF by definition, otherwise they are not relations[3].



References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F., Understanding Relations, Parts 1,2.

[3] Pascal, F., What Relations Really Are and Why They Are Important.

[4] Codd, E. F., THE RELATIONAL MODEL FOR DATABASE MANAGEMENT, VERSION 2 (Addison-Wesley, 1990).

[5] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion.

[6]
Pascal, F., What Meaning Means: Business Rules, Predicates, Integrity Constraints and Database Consistency.

[7] Pascal, F., Don't Design Databases Without Foundation Knowledge and Conceptual Models.

[8]
Pascal, F., Database Design: What It Is and Isn't.
 

[9] Pascal, F., A New Understanding of Keys, Parts 1,2. 

[10] Pascal, F., The Key to Relational Keys: A New Understanding
 

[11] Pascal, F., "NULL Value" is a Contradiction in Terms.

[12] Pascal, F., The Last NULL In the Coffin: A Relational Solution to Missing Data.

[13] Pascal, F., Simple Domains and Value Atomicity.

[14] Pascal, F., First Normal Form in Theory and Practice Parts 1,2.

[15] Pascal, F., Class, Type, Relation and Domain in Database Management.



[20] Pascal, F., Logical Symmetric Access, Data Sublanguage, Kinds of Relations, Redundancy and Consistency.


Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.

No comments:

Post a Comment

View My Stats