Sunday, April 21, 2013

Un-muddling Modeling, Part II

This is the second part of my response to comments in the online exchange Data (and other) models and aesthetics.
GE:So, from your perspective, a conceptual model has no rigorous definition? It is like a sketch of a picture yet to be completed? Or like an outline to a paper to be written or fleshed out? And once the model is rigorously defined, the ad hoc, informal model must be precisely consistent with the underlying model in all its semantics. Are you suggesting that a conceptual model is a precursor to a defined logical (relational) model? Then after the relational model is defined, the conceptual model needs to be a consistent abstraction of the formal logical model.

Informal does not mean sketchy, or 'not rigorous'. In Part I I stressed how important the rigorous use of conceptual language is, precisely because of its informal nature.

An R-table is a formal structure in the sense that its mathematical manipulation as a set and the results do not depend on its interpretation--the semantics, or meaning captured by the conceptual model--only on the structure itself, its form (hence, formal). A logical model is the formalization of a conceptual model for the purpose of database representation. The conceptual model is what the logical model and query results mean to users--the interpretation--which a DBMS cannot understand.
GE: What other type(s) of relationships can be explicitly and formally defined in a relational data model? Of course there are many other relationships which can be inferred, such as between an attribute and an entity identifier.
The relationships expressible in databases designed in compliance with the relational data model (a more accurate formulation) include those between:
  • rows e.g. uniqueness, a disjunctive relationship;
  • columns within an R-table e.g. functional dependencies of non-key columns on the key;
  • columns across R-tables e.g. referential constraints (note that FK's and PK's are defined over the same domain--they have the same meaning at the conceptual level--which is why Codd called them the "glue that holds the database together".
GE: What is an "R-table"? What do you mean by a "PICTURE"? There are things and there are views or manifestations/presentations of things. There is the model, and there are various presentations of that model. Is that what you are getting at?
A relation is a mathematical abstraction. A database table is not an abstraction, but a material presentation--a picture--of it. If and only if the table obeys a specific discipline, it is a faithful representation of a relation:
  • unique, unordered rows
  • uniquely named, unordered,  single-valued  columns
  • no missing values
and can, therefore, be manipulated mathematically as a set. Tables that violate this discipline break the relational algebra.
GE: What do you mean "Attributes are subsets of domains"? An attribute only exists in the context of a relationship. Something (a domain) is a descriptor of (i.e., is related to) something else (another domain).
"Attributes are subsets of domains" is an excellent example of how easy it is to mix levels of representation. Attributes are subsets of properties at the conceptual level; columns are subsets of domains at the logical level. That simply means that a property is a named pool of all possible valid values and an attribute is a subset of actual (used) values drawn from the pool at a given time. Domains and columns are the logical counterparts.
GE: Please give me a precise reference to where Codd spoke of relationships [differently than i]n his 1985 piece published in ComputerWorld, [where] he said that the only way to represent a relationship (between entity tables or relations) was through explicitly stored values (i.e., attributes, foreign keys).
The reference is in my paper, Truly Relational: What It Really Means. Mathematical relations are on ordered domains. Codd recognized that it is not practical for users to rely on column ordering, so he initially referred to relations without such ordering as relationships. But later he reverted back to relations.

R-tables whose rows represent entities or, formally, true propositions about entities, is more accurate than "entity tables".
GE: I understand his definition of a relation: A subset of the Cartesian product of a set of (finite) domains. Since that is still a set, there are no duplicates, so each member of the set must be uniquely identifiable, hence the concept of an identifier. 
Mathematically, a subset of a Cartesian product of domains consists of tuples whose combination of all of their values is guaranteed to be unique by definition. For reasons explained in my paper, that is neither acceptable, nor practical for databases representing facts about the real world, where entities have identifiers, hence another adaptation of the theory--keys--to represent them.
AT: Do you mean that "… relations are defined over types (also known as domains); a type is basically a conceptual pool of values from which actual attributes in actual relations take their actual values." (taken from the "SQL and Relational Theory" [2009] by Chris Date).
I am also not sure about "pointers"… Can I define a domain of "pointers"? There might be an interesting relation over such domain...

In addition, what will happen if I define a relation over a set of types, each of which is (another) relation? Lets say that a relation is either defined over types (domains), or defined over a "heading" (or a "definition") of other relations… - that probably very close to my original post above (and I also try to eliminate identifiers completely…)
Date and Darwin view domains as identical to programming types and use the latter term. Codd used domain to distinguish the concept from type. I use Codd's term. My domains have types, which are probably akin to D&D's possible representation (see Domain vs. Type and Class).

The relational model accepts domains of values of any type of arbitrary complexity, as long as they have well-defined operators applicable to those values. This is not a relational, but database design issue. There is nothing to prohibit pointer-valued and R-table-valued domains, but I might question the wisdom of such designs (for a more in-depth discussion see Chapter 1 in PRACTICAL ISSUES IN DATABASE MANAGEMENT, available via the Home page).

Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

No comments:

Post a Comment