Sunday, July 10, 2016

Levels of Representation: Relationships, Rules, Relations and Constraints




What's Wrong with Last Week's Picture (Question about relational model )

There are relationships at both the conceptual and logical representation levels. Confusing them is bad conceptual modeling and database design.

Relationships, Rules and Relations

AT: "In my personal understanding, a relation is defined as a set of tuples. Then ... "in the relational model every relation represents a relationship". And then a quote from Chen: "each tuple of entities ... is a relationship". If I use the first and the second statements - I can say that a relationship is a set of tuples. The third statement says that a relationship is a tuple. So far, is a relationship a set of an element of a set? (Or may be a set of sets?)".
A relation is a set of tuples that is a subset of the Cartesian product of the domains, i.e., it is a relationship of domains at the logical level (Chen may have been misquoted: a tuple is a set of attribute values (i.e., a relationship thereof, that represents a fact about a single entity).

A relation of a specific database is a formal database representation at the logical level of a class of facts about property-sharing entities at the conceptual level. To be members in a class facts must satisfy a set of membership criteria specified by the conceptual modeler informally in natural language as a set of single-class business rules:

  • Property rules;
  • Multi-property rules;
  • Entity rules;
  • Multi-entity rules.
To be members of the class of all classes to be recorded in the database, facts must also satisfy multi-class rules.

The rules specify relationships at the conceptual level:
  • Between entity properties;
  • Among entities.
For example, the entity distinguishability rule
Employees are uniquely identified by employee number.
is an entity rule that specifies a disjunctive relationship (mutual exclusivity) among all entities in a class of employees. The referential rule
Employees are assigned to existing departments.
is a multi-relation rule that specifies a M:1 relationship between entities of the employee and department classes.

Rules, Relations and Constraints

A DBMS can only manipulate abstract symbols mathematically. This is why both the classes of facts and the rules that specify the class membership criteria at the conceptual level must be formalized for representation and enforcement at the logical database level and expressed in a data language native to the particular DBMS employed. Relations represents sets of facts and as sets can be manipulated mathematically to answer queries i.e., derive new sets of facts that are logical implications of the sets of facts recorded in database relations. In effect, a RDBMS is a logic inference engine.

Integrity constraints are formal database representations at the logical level of the business rules that specify the relationships between properties and among entities at the conceptual level. Single relation constraints represent singleclass rules

  • Domain and attribute constraints;
  • Multi-attribute constraints;
  • Tuple constraints;
  • Multi-tuple constraints.
Multi-relation constraints correspond to multi-class rules.

For example, the entity distinguishability rule is represented by and enforced in the database by a uniqueness (or key) constraint, the referential rule by a referential (or foreign key) constraint.

In other words, at the logical level a database relation is a relationship of domains and tuple membership in it is defined by relationships between attributes and among tuples  that represent relationships between properties and among tuples at the conceptual level.

GE: "I argue that there is essentially no difference between relationships between entity (type tables) and between an entity and its attributes. They both represent relationships between two populations of things. Something is an attribute by virtue of there being a relationship. If relationships are represented by foreign keys and the entity tables must be in 1NF, as in the relational model, then all relationships must be at most Many-to-One (a very unnecessary limitation when modeling some user domain)."
Note the conceptual-logical conflation--entities in classes have properties (conceptual), tuples in relations have attributes (logical). I must rephrase the comment so that it means something (compare it to the original):
FP rephrase: "I argue that there is essentially no difference between relationships between entities of distinct classes and between properties of the same class. They both represent relationships. A property can represent a relationship between entities of distinct classes. If such relationships are represented by foreign keys and the relations representing the classes must be in 1NF, then relational databases can represent only M:1 relationships, a very unnecessary limitation when modeling some reality of interest."
Points arising:
  • Saying that relationships between entities and between properties are both relationships obscures the important difference between them--they are expressed by different rules and represented by different constraints--and may cause confusion that is at the root of much poor modeling and design;
  • A M:1 relationship between classes at the conceptual level is not represented by the referencing attribute (FK) at the logical level, but by the referential constraint on both relations that corresponds to the referential rule specifying the relationship.
  • I have already explained in a previous post that a M:N relationship between entities of distinct classes at the conceptual level is modeled as a class of associative entities and two M:1 relationships between its entities and those of the classes it associates, represented by a relation and two referential constraints at the logical level. This is not a limitation, but an advantage of the RDM: structural--and, therefore, manipulative and integrity parsimony (see below).
Note: GE's "relationships between an entity and its attributes" is a misexpression that I corrected in the rephrase but, interestingly, there is a special type of multi-property relationship between an entity identifier and its descriptive properties--expressible as an entity-property rule (EPR)--represented in the database by a special multi-attribute constraint--tuple-attribute constraint (TAC)--that the RDM as currently understood, does not incorporate (watch for McGoveran's forthcoming book).

E/RM Relationships and Referential Constraints

TF: "The entity-relationship model is essentially a directed graph model, where relationships are prominent residents. Not so in the relational model (despite the name), where relationships (between relations, mind you) are not visible and in the SQL implementations is reduced to constraints. Relationships are about structure, which is as important as meaning (the semantics of the terms used in the universe being modeled)."
E/R modeling produces conceptual models that are data model agnostic: they can be formalized for logical database representation either using a direct graph--hierarchic or network--data model (HDM/NDM), or using the RDM (both the HDM and NDM were effectively discarded decades ago because they proved prohibitively complex and inflexible).

The R in the E/RM refers only to the M:1 relationships at the conceptual level between distinct classes of entities--not to all the relationships specified by business rules--that, if the RDM is used for formalization, are represented by referential constraints at the logical level.

Note: Direct graph data models have two structural constructs to represent entities and relationships--nodes and edges, respectively, which is one source of complexity. The RDM has only one--the relation--which encapsulates both--a parsimony advantage for integrity and manipulation (think Occam's razor).

M:1 relationships are part of the real world structure modeled at the conceptual level captured by the business rules, which specify the meaning--semantics--assigned by the database designer to the data recorded in the database. The relation is the abstract structure in which the data are recorded at the logical level, to which the business rules assign meaning. In other words, the abstract logical structure (relations) carries the meaning specified by the business rules at the conceptual level and represented by constraints at the logical level. By enforcing the constraints, a RDBMS ensures that the database is consistent with the meaning assigned to it by the business rules. The mathematical set manipulation ensures that query results are logically correct.

Constraints are not visible in R-tables used to visualize relations, which is one reason not to confuse relations with tables. This is particularly important in SQL, as SQL DBMS's are not true RDBMS's and SQL tables are not necessarily R-tables i.e., they don't visualize relations.


Note: The name relation comes from relationship at the logical level, not of the E/RM kind; the relational in RDM comes from relations, not relationships.




No comments:

Post a Comment

View My Stats