Note: This two-part post replaces several previous posts which now redirect here. The example is not meant to be realistic, but to convey specific points in an accessible way.
"The principle of orthogonal design (abbreviated POOD) ... is the second of the two principles of database design, which seek to prevent databases from being too complicated or redundant, the first principle being the principle of full normalization (POFN). Simply put, it says that no two relations in a relational database should be defined in such a way that they can represent the same facts. As with database normalization, POOD serves to eliminate uncontrolled storage redundancy and expressive ambiguity, especially useful for applying updates to virtual relations (views). Although simple in concept, POOD is frequently misunderstood ... is a restatement of the requirement that a database is a minimum cover set of the relational algebra. The relational algebra allows data duplication in the relations that are the elements of the algebra. One of the efficiency requirements of a database is that there be no data duplication. This requirement is met by the minimum cover set of the relational algebra." --Wikipedia.orgWell, not quite.
Base and Derived Relations
Every database relation has an associated formal relation predicate (RP), a conjunction of constraints expressed in a DBMS-specific data language. They formally represent in the database informal business rules comprising a conceptual model, which denote the meaning of the relation and its tuples assigned by the database designer. Expressed in a specific data sublanguage, constraints are integrity constraints.
Relational databases consist of a set of base and derived relations, the latter--query results, views and snapshots--derived from the former by applying relational algebra operations. Because SQL stored tables are called base tables, they induce the mistaken notion that base means stored, but while their data can and often will be stored (not necessarily as tables), base relations does not mean stored relations! Rather, they represent the base facts--axioms--about the real world objects of interest (i.e., serve as the base of the database). Derived relations represent facts that are logical implications of the axioms--theorems--and are also about the objects represented by base relations--no new objects are added in deriving relations from the base relations.
Relations are derived from base relations either directly, or via other derived relations and inherit the integrity constraints on the relations from which they were derived, supplemented by any constraints added by the relational operations applied to produce them (e.g., restrictions, projections, joins and so on). Thus, at every relation derivation a true RDBMS--which SQL DBMSs are not--records the derived RP (constraints) in the database catalog--another way of saying that it supports constraint inheritance--and thus always knows how the derived constraints on each each derived relation relate back to the constraints of the base relations.
Some derived relations (e.g., views, snapshots) are given names, which are just a shorthand for their RPs (constraints). Others (e.g., query results) don't have names. Note very carefully that in a truly relational system application developers and end-users should not need to know if they interact with base and when with derived relations. In fact, in a truly relational environment the database should be accessible to applications only via views, to take advantage of logical independence--insulation from logical reorganizations of the database.
Three Formal Design Principles
Formal systems adhering to three principles have certain desirable properties. Relational databases are formal logic systems and, therefore, will acquire those properties if their design adheres to these principles  which, applied to databases, are:
Note: There is a McGoveran conjecture that the Principle of Full Normalization (POFN) implies the three principles, but not vice-versa.
- Principle of Orthogonal Design (POOD): Base relations are independent;
- Principle of Representational Minimality (PORM): There are no superfluous base relations;
- Principle of Expressive Completeness (POEC): All meaningful relations are derivable from the base relations.
A base relation is independent if it is not derivable from other base relations. Here's Codd in the first ever paper (1969) that launched the RDM: "A relation R is derivable from a set S of relations if there exists a sequence of [relational operations] which yields R from members of S." In other words, a base relation is independent if the facts it represents are not derivable from those represented by the other base relations (identical relations are a trivial case of non-independence).
While POOD (and the POFN) reduce logical redundancy, their principal objective is
- Logical correctness: if the base facts are true, the derived facts are also true;
- Semantic correctness: There are no update anomalies--i.e., updates of derived relations have no undesirable side-effects on base relations.
Entity Supertypes and Subtypes
Consider a (very) simple conceptual model consisting of the following business rules and assume for now that no data are missing.
- All employees have an id, name and hire date;
- Some earn a salary, some make a commission (salary and commission are mutually exclusive).
Many practitioners would design the following SQL table:
EMPLOYEES (EMP#,ENAME,HIREDATE,SALARY,COMMISSION)that would contain NULLs that stand for "inapplicable". The NULLs are an artifact of poor design--the inaccurate representation of the conceptual model. The model specifies that some employees do not have salary or commission, yet the design assigns them to all tuples, forcing NULLs. Whether they stand for missing or inapplicable, NULLs violate the core Information Principle (IP) of the relational data model (RDM), which mandates that all information in a relational database be represented explicitly and in exactly one way: as values of relation attributes defined on domains. NULL is not a value, but a marker for the absence of a value that cannot be legitimately treated as a value--that would be a logical type error--so the table is not a R-table--it does not picture a relation. Most data professionals are oblivious to the problems caused by NULLs (see  and bibliography in ), because they focus on representation (structure) and ignore the implications for integrity enforcement and manipulation.
In the past I advocated the following NULL-avoiding three base relations design:
EMPS (EMP#,ENAME,HIREDATE)but what entities do the latter two relations represent? The application developer, let alone the end user, should be presented with a logical design that reflects entities with which they are familiar.
When entities have some properties in common, but some also have properties not shared, not all entities are of the same type. Entities with unique properties in addition to common ones are a subtype of those with only common entities, the supertype i.e., there is an entity supertype-subtype (ESS) relationship.
Salaried and commissioned employees are subtypes of the employee supertype. According to the model, employees can be either salaried, or commissioned, but not both. So one possible logical design is one base relation for each type:
Note: Once base relations are determined, for reasons of physical efficiency (e.g., anticipation of lots of queries about all employees) and barring other application requirements, there is nothing to prevent physical storage of the data in an indexed sequantial file with the following records:
(EMP#,ENAME,HIREDATE,SALARY,COMM)Relations do not occur, nor need they at the physical level, so this is a perfectly acceptable physical design. EMPS, SAL_EMPS and COMM_EMPS can be derived as projection views at the logical level. Default values outside the otherwise permissible value ranges for the attributes can be used instead of NULLs for inapplicable SALARY and COMMISSION values. Default values are not just NULLs by another name: they occur in the physical, not logical design. They extend the domain and, except by caveat, are not legitimate values for computational purposes (again, logical inference and relational operators occur only in the logical model). So there is never a concern to the user--they are handled by a true RDBMS transparently, provided users understand the domains.
Generally, ESS relationships yield designs with base relations (that may be stored) and views derived from them. For this reason, it is assumed that applications will be built on top of views that represent particular entity supertypes or subtypes.
Note that this design violates the POOD: EMPS data is derivable logically as a union view from SAL_EMPS and COMM_EMPS.
Stay tuned for Part II for the consequences.
 McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLKS, forthcoming (chapter drafts).
 McGoveran, D., Nothing from Nothing, Parts 1-4.
 Pascal, F., The Final NULL in the Coffin: A Relational Solution to Missing Data.
Do you like this post? Please link back to this article by copying one of the codes below.URL: HTML link code: BB (forum) link code: