Sunday, April 21, 2013

Un-muddling Modeling, Part II: More Misconceptions

This is a 6/21/17 rewrite of a 4/21/13 post, to bring it in line with McGoveran's interpretation of the true RDM envisioned by Codd[1]. It is the second part of a debunking of a LinkedIn thread (the first part of which was debunked two weeks ago).

Here's what's wrong with last week's picture, namely:

"A conceptual model has no rigorous definition? It is like a sketch of a picture yet to be completed? Or like an outline to a paper to be written or fleshed out? And once the model is rigorously defined, the ad hoc, informal model must be precisely consistent with the underlying model in all its semantics. Are you suggesting that a conceptual model is a precursor to a defined logical (relational) model? Then after the relational model is defined, the conceptual model needs to be a consistent abstraction of the formal logical model. 

What other type(s) of relationships can be explicitly and formally defined in a relational data model? Of course there are many other relationships which can be inferred, such as between an attribute and an entity identifier. Please give me a precise reference to where Codd spoke of relationships [differently than i]n his 1985 piece published in ComputerWorld, [where] he said that the only way to represent a relationship (between entity tables or relations) was through explicitly stored values (i.e., attributes, foreign keys).

What do you mean "Attributes are subsets of domains"? An attribute only exists in the context of a relationship. Something (a domain) is a descriptor of (i.e., is related to) something else (another domain).

What is an "R-table"? What do you mean by a "PICTURE [of a relation]"? There are things and there are views or manifestations/presentations of things. There is the model, and there are various presentations of that model. Is that what you are getting at? --Gordon Everest,

Do you mean that "...relations are defined over types (also known as domains); a type is basically a conceptual pool of values from which actual attributes in actual relations take their actual values." (taken from the "SQL and Relational Theory" [2009] by Chris Date). I am also not sure about "pointers". Can I define a domain of "pointers"? There might be an interesting relation over such domain.In addition, what will happen if I define a relation over a set of types, each of which is (another) relation? Lets say that a relation is either defined over types (domains), or defined over a "heading" (or a "definition") of other relations ... and I also try to eliminate identifiers completely". --AT,

The RDM: Nothing But Relationships

A conceptual model consists of business rules expressed in natural language in real world informal terms--object groups, properties, objects. While it must be developed as systematically, completely and consistently as possible, it is informal and driven by pragmatic perceptions of reality--it is not rigorous in the sense that it has no theoretical basis. That is why its database representation requires formalization to a logical model, to which it is, indeed, a precursor. So it's the other way around: the logical model must be a consistent abstraction of the conceptual model, which is the interpretation--meaning--of the logical model.

A formal data model--e.g., the RDM--is the mechanism for formalizing informal conceptual models as logical models for database representation. The real world relationships captured in a conceptual model can be expressed relationally in the corresponding logical model:

  • A relation is a set of database relationships (1) among attributes and (2) among tuples, which represent relationships in the real world among properties and among objects within an object group;
  • A database is a set of relationships (3) among relations, which represent relationships among object groups in the real world;
The misconception that the RDM represents only type (3) relationships probably originates with the E/R conceptual modeling approach. It makes an "absolute" distinction between objects types (entities) and relationships, but object types are relationships and relationships can be objects. All the relationships in a conceptual model captured by business rules are expressible in a relationally complete FOPL-based data language as integrity constraints enforcible by a RDBMS for consistency with the rules. That neither SQL, nor any other current data languages permit expression/enforcement of all the constraints corresponding to all relationships is their own deficiency, not a RDM weakness. 

This interpretation of Codd is an example of logical-physical confusion (LPC). He said, in fact, the exact opposite: his Information Principle--rule 0--mandates that all information in a relational database--including relationships between relations--be represented not physically, by pointer paths between records, but logically, by information content i.e., values. The relationships are not represented by foreign keys (attributes), but by referential constraints that each constrains the values of the referencing FK to match the values of the primary key of the referenced relation.  

Attributes as Constrained Domains

A mathematical relation is a pure abstraction devoid of any real world meaning--it is fixed in time and any values are possible. A database relation is an adaptation of a mathematical relation to represent an object group in the real world, so its data must be constrained to be consistent with the real world context it represents, one component of which is time. Database relations are time-varying, i.e., each represents facts about a set of objects at some point in time--we distinguish between relation type and time-specific relation values of that type. Domains represent all the possible (valid) values consistent with those of the properties they represent, attributes represent constrained domains--by time and, possibly, other context factors. This is independent of the relationships that domains and attributes participate in. 

Relations and R-tables

A relation is a set of logical relationships among attributes and among tuples. It can be visualized on some physical medium--screen, or paper--as a table: tuples display as rows, attributes as columns. But the arrangement of rows and columns on the medium (e.g., their order) is insignificant--i.e., not part of the relation specification. The table is special in that it obeys the mathematical set discipline of the relation it pictures: unique, unordered rows without missing values, uniquely named unordered columns.

Domains vs. Data types

Date and Darwen equate domains with programming data types, but we follow Codd in considering them distinct[2]. Domains (1) represent real world properties (2) are user-constrained types to be consistent with the properties and (3) database objects under DBMS control, while data types do not have to and are application objects under programmer control.

Domains can be relation-valued (RVD), but they are not relations. As we have repeatedly explained, non-simple domains, including RVDs, require data languages based on higher logic than first order predicate logic (FOPL), which would defeat core advantages of the RDM: declarativity, decidability and physical independence (PI).
One of the collective defining property of members of an object group--distinguishability--arises from a relationship among all the members. It is represented in the database by a uniqueness constraint on the primary key, which represents the object identifier, which AT "intends to eliminate". So, on the one hand the RDM is, erroneously, criticized for inability to express relationships, while on the other hand the means by which relationships that are expressible, but actually mandatory, are "eliminated". That's lack of foundation knowledge for you.


[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.


Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

No comments:

Post a Comment