Sunday, July 16, 2017

Relations and Relationships Part II

Note: This is a 6/21/17 rewrite of a 4/21/13 post, to bring it in line with McGoveran's interpretation [1] of the true RDM envisioned by Codd. It is the second part of a debunking of a LinkedIn thread (the first part appeared two weeks ago).


Here's what's wrong with last week's picture, namely:
"A conceptual model has no rigorous definition? It is like a sketch of a picture yet to be completed? Or like an outline to a paper to be written or fleshed out? And once the model is rigorously defined, the ad hoc, informal model must be precisely consistent with the underlying model in all its semantics. Are you suggesting that a conceptual model is a precursor to a defined logical (relational) model? Then after the relational model is defined, the conceptual model needs to be a consistent abstraction of the formal logical model. 

What other type(s) of relationships can be explicitly and formally defined in a relational data model? Of course there are many other relationships which can be inferred, such as between an attribute and an entity identifier. Please give me a precise reference to where Codd spoke of relationships [differently than i]n his 1985 piece published in ComputerWorld, [where] he said that the only way to represent a relationship (between entity tables or relations) was through explicitly stored values (i.e., attributes, foreign keys).

What do you mean "Attributes are subsets of domains"? An attribute only exists in the context of a relationship. Something (a domain) is a descriptor of (i.e., is related to) something else (another domain).

What is an "R-table"? What do you mean by a "PICTURE [of a relation]"? There are things and there are views or manifestations/presentations of things. There is the model, and there are various presentations of that model. Is that what you are getting at?"--Gordon Everest, LinkedIn.com

"Do you mean that...relations are defined over types (also known as domains); a type is basically a conceptual pool of values from which actual attributes in actual relations take their actual values. (taken from the SQL AND RELATIONAL THEORY [2009] by Chris Date). I am also not sure about "pointers". Can I define a domain of pointers? There might be an interesting relation over such domain.In addition, what will happen if I define a relation over a set of types, each of which is (another) relation? Lets say that a relation is either defined over types (domains), or defined over a "heading" (or a "definition") of other relations ... and I also try to eliminate identifiers completely". --AT, LinkedIn.com


Nothing But Relationships


A conceptual model consists of business rules expressed in natural language in real world informal terms -- objects, properties, object groups. While it must be developed as systematically, completely and consistently as possible, it is informal and driven by pragmatic perceptions of reality -- it is not rigorous in the formal sense. That is why its database representation requires formalization as a logical model, to which it is, indeed, a precursor. So it's the other way around: the logical model must be a consistent abstraction of the conceptual model, which is the interpretation -- meaning -- of the logical model. 

The misconception that the RDM represents relationships of only one type -- between relations (referential rules) -- most likely originates with the E/R conceptual modeling approach. It assumes an "absolute" distinction between entities (objects) and relationships. The distinction, however, is in the "eye of the modeler": objects, properties and object groups are all, in fact, relationships labeled differently as a matter of subjective, pragmatic convenience.  
  • (Facts about) objects are (1) relationships among property values, represented by tuples;
  • Object groups are relationships (2) among properties and (3) among objects within the groups, represented by tuple and multi-tuple constraints, respectively;
  • A group of object groups is a set of (4) relationships between members of distinct groups, represented in the database by database (multi-relation) constraints.
All the relationships expressed as business rules comprising a conceptual model are formalized in a relationally complete FOPL-based data sub-language as constraints, enforceable by an RDBMS for consistency with the rules. That neither SQL nor any other current data language can express all of them -- and that the DBMSs based on those languages cannot enforce them -- is their deficiency, not an RDM weakness.
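To make the difference in constraint scope concrete, here is a minimal sketch in Python, which merely stands in for a real FOPL-based data sub-language. The EMPLOYEES relation, its attribute names and both business rules are hypothetical, invented for illustration only.

```python
from collections import namedtuple

# Hypothetical relation heading; the attribute names are assumptions.
Employee = namedtuple("Employee", ["emp_id", "salary", "bonus"])

# A relation instance: a set of tuples (no duplicates, no order).
employees = {
    Employee(1, 5000, 500),
    Employee(2, 4000, 0),
    Employee(3, 6000, 600),
}

def tuple_constraint(t):
    """Relationship among the property values of ONE object:
    (assumed rule) a bonus may not exceed the salary."""
    return 0 <= t.bonus <= t.salary

def multi_tuple_constraint(relation):
    """Relationship among ALL the objects in the group:
    emp_id values are unique (PK uniqueness)."""
    ids = [t.emp_id for t in relation]
    return len(ids) == len(set(ids))

def relation_is_consistent(relation):
    """The instance is consistent only if every tuple satisfies the tuple
    constraint and the relation as a whole satisfies the multi-tuple one."""
    return (all(tuple_constraint(t) for t in relation)
            and multi_tuple_constraint(relation))
```

Note that the uniqueness check cannot be evaluated by inspecting any single tuple: it is a property of the group as a whole, which is why it calls for a multi-tuple constraint.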


Referential constraints are just one type of database constraint representing cross-group relationships in relational databases. That neither SQL nor any of the current non-relational data languages goes beyond PK uniqueness and referential constraints is their own deficiency, not an RDM weakness.

Note: As I already explained, (2), (3) and (4) relationships give rise to second, third and fourth order properties[2].

The above interpretation of Codd is an example of logical-physical confusion (LPC). He "decreed", in fact, the exact opposite: the Information Principle -- Rule 1 of his twelve rules -- mandates that all information in a relational database -- including relationships between relations -- be represented not physically, by pointer paths between records, but logically, by values. But it's not the FK that represents the cross-group relationship: the FK is an attribute representing an object property in context. It's the referential constraint, which constrains the FK values in the referencing relation to match the PK values in the referenced relation, that represents the relationship.
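A minimal sketch of the point, again with Python standing in for a data sub-language: the two relations below, their attributes and their values are hypothetical. There are no pointers anywhere, only values and a constraint over them.

```python
# Hypothetical relations as sets of tuples; all names are assumptions.
# departments: (dept_id, name), where dept_id is the PK.
departments = {(10, "Sales"), (20, "Research")}
# employees: (emp_id, dept_id), where dept_id is the FK.
employees = {(1, 10), (2, 20), (3, 10)}

def referential_constraint(referencing, referenced):
    """Database (multi-relation) constraint: every FK value in the
    referencing relation matches some PK value in the referenced one."""
    pk_values = {dept_id for (dept_id, _name) in referenced}
    return all(dept_id in pk_values for (_emp_id, dept_id) in referencing)
```

The FK value 10 in an employee tuple is just a property value; the cross-group relationship lives in `referential_constraint`, which spans both relations.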


Attributes Are Constrained Domains


A mathematical relation is an abstraction devoid of any real world meaning -- it is fixed in time and can have arbitrary values. A database relation is adapted to represent an object group in the real world. Its data are constrained by the real world context, one component of which is time. Database relations are time-varying (i.e., they represent object groups at various points in time). We distinguish between a relation type and the time-specific relation instances of that type. Domains represent all the possible (valid) values consistent with the properties they represent; attributes represent domains further constrained by time and other contextual factors.
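The distinction can be sketched as follows. The age domain, its bounds, the employee age rule and both instances are hypothetical, chosen only to illustrate an attribute as a constrained domain and one relation type with several time-specific instances.

```python
# Hypothetical domain: all values valid for the property "age in years".
AGE_DOMAIN = range(0, 130)  # assumed bounds, for illustration only

def in_domain(value):
    """True if the value is valid for the property at all."""
    return value in AGE_DOMAIN

def employee_age_attribute(value):
    """The attribute is the domain further constrained by context:
    (assumed rule) employees are between 18 and 70."""
    return in_domain(value) and 18 <= value <= 70

# One relation type, two time-specific instances: sets of (emp_id, age).
instance_2016 = {(1, 34), (2, 51)}
instance_2017 = {(1, 35), (2, 52), (3, 28)}

def satisfies_type(instance):
    """Every instance, whatever its time, must satisfy the same constraints."""
    return all(employee_age_attribute(age) for (_emp_id, age) in instance)
```

The value 5 belongs to the domain but not to the attribute: it is a valid age, just not a valid employee age in this assumed context.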


Relations and R-tables


A relation is a set of relationships among attributes and among tuples -- all the relationships that a relational database represents must be satisfied at all times. It can be visualized on some physical medium -- screen, or paper -- as an R-table: tuples display as rows, attributes as columns. But the arrangement of rows and columns on the medium (e.g., their order) is insignificant (i.e., not part of the relation specification). An R-table obeys the mathematical set discipline of the relation it displays: unique, unordered rows without missing values and uniquely named unordered columns.
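A rough sketch of the relation/R-table distinction, assuming a hypothetical two-attribute relation: the relation is modeled as a set of tuples, each tuple as a set of (attribute, value) pairs, and the R-table is only one of many possible renderings of it.

```python
def make_relation(rows):
    """Model a relation as a frozenset of tuples, each tuple a frozenset of
    (attribute, value) pairs, so neither row nor column order is recorded."""
    return frozenset(frozenset(row.items()) for row in rows)

# Hypothetical data; attribute names and values are assumptions.
r1 = make_relation([
    {"emp_id": 1, "name": "Ann"},
    {"emp_id": 2, "name": "Bob"},
])
# The same facts, with rows and columns listed in a different order.
r2 = make_relation([
    {"name": "Bob", "emp_id": 2},
    {"name": "Ann", "emp_id": 1},
])

def render_r_table(relation, columns):
    """Display a relation as an R-table. The column order passed in and the
    row order produced by sorting are presentation choices only."""
    lines = [" | ".join(columns)]
    for tup in sorted(relation, key=lambda t: [str(dict(t)[c]) for c in columns]):
        values = dict(tup)
        lines.append(" | ".join(str(values[c]) for c in columns))
    return "\n".join(lines)
```

Here `r1 == r2` holds even though the rows and columns were written down in different orders, and a duplicate row passed to `make_relation` simply collapses into one tuple.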


Domains Vs. Data Types


Date and Darwen equate domains with programming data types, but we follow Codd in considering them distinct[3]. Domains (1) represent real world properties, (2) are types constrained by users to be consistent with those properties, and (3) are database objects under DBMS control, while data types need not satisfy (1) or (2) and are application objects under programmer control.

Domains can be relation-valued (RVD), but are not relations. As we have repeatedly explained, non-simple domains, including RVDs, require data sub-languages based on logic of higher order than first order predicate logic (FOPL), which would lose the relational advantages: declarativity, decidability and physical independence (PI). One of the defining properties of the set of members of an object group as a whole -- distinguishability -- arises from a relationship among all the members. It is represented in the database by a PK uniqueness constraint, and the PK represents the object identifier that AT "intends to eliminate". So, on the one hand, the RDM is erroneously criticized for inability to express relationships, while on the other hand the very means of representing relationships that are not only expressible, but actually mandatory, is "eliminated". That's lack of foundation knowledge for you.




References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F., What Meaning Means: Business Rules, Predicates, Constraints, Integrity Constraints and Database Consistency.

[3] Codd, E. F., THE RELATIONAL MODEL FOR DATABASE MANAGEMENT: VERSION 2 (Addison Wesley, 2000).






