Monday, September 19, 2016

The Principle of Orthogonal Database Design Part I


Note: This post replaces several previous posts which now redirect here. The example is not meant to be realistic, but to convey specific points in an accessible way.

"The principle of orthogonal design (abbreviated POOD) ... is the second of the two principles of database design, which seek to prevent databases from being too complicated or redundant, the first principle being the principle of full normalization (POFN). Simply put, it says that no two relations in a relational database should be defined in such a way that they can represent the same facts. As with database normalization, POOD serves to eliminate uncontrolled storage redundancy and expressive ambiguity, especially useful for applying updates to virtual relations (e.g., view (database)). Although simple in concept, POOD is frequently misunderstood ... is a restatement of the requirement that a database is a minimum cover set of the relational algebra. The relational algebra allows data duplication in the relations that are the elements of the algebra. One of the efficiency requirements of a database is that there be no data duplication. This requirement is met by the minimum cover set of the relational algebra." --Wikipedia.org

Monday, September 12, 2016

This Week

1. Quote of the Week
"Data sense-making does not benefit from the relational data model. Dr. Codd’s rules for relational modeling were designed to improve efficiencies in data processing and storage, not to make data more intelligible. In fact, structuring data relationally makes the work of data sensemaking more difficult, which is why Dr. Kimball created dimensional data modeling and also why an entire industry of middleware products emerged to hide the complexities of relational models." --Stephen Few, PerceptualEdge.com

2. To Laugh or Cry?

Saturday, September 3, 2016

Database Design: Relation Predicates and "Identical Relations"

I have rewritten an earlier post that you should also read:
Database Design: Property-Sharing Classes and Relations
Here's what's wrong with the last wrong picture I posted, namely:

Q: "Can you have 2 tables
  • VIEWS
  • DOWNLOADS
with identical structure in a good DB schema (item_id, user_id, time). Some of the records will be identical but their meaning will be different depending on which table they are in. The "views" table is updated any time a user views an item for the first time. The "downloads" table is updated any time a user downloads an item for the first time. Both of the tables can exist without the other."
A1:"I don't think that there is a problem, per se. From a E/R modeling point of view I don't see a problem with that, as long as they represent two semantically different entities."
A2:"Are you saying that both tables have an 'item_id' Primary Key? In this case, the fields have the same name, but do not have the same meaning. One is a 'view_id', and the other one is a 'download_id'. You should rename your fields consequently to avoid this kind of misunderstanding."
A3: "Chris Date and Dave McGoveran formalised the Principle of Orthogonal Design. Roughly speaking it means that in database design you should avoid the possibility of allowing the same tuple in two different relvars. The aim being to avoid certain types of redundancy and ambiguity that could result."
A4: "When designing a DB there are lots of different parameters, and some (e.g.: performance) may take precedence. Case in point: even if the structures (and I suppose indexing) are identical, maybe "views" has more records and will be accessed more often. This alone could be a good reason not to burden it with records from the downloads."
--Can you have 2 identical tables in a good schema?, StackOverflow.com

Friday, August 26, 2016

Data, Information, Knowledge Discovery, and Knowledge Representation

 My August post @All Analytics

As any true scientist should know, science has a context of discovery and a context of validation (I wonder how many of those who call themselves “data scientists” know the concepts). In the former, experiments are usually done to discover relationships -- a theory (to be validated in the latter, in which further implications thereof can also be derived). So there is no theory/conceptual model/knowledge yet. Data or information can be represented by any database (e.g. a SQL database can represent variables and/or runs as columns and/or rows in tables), docubase and even file -- none is superior or inferior for analytical purposes.

Read it all.

Please comment there, not here.

Monday, August 22, 2016

This Week

1. What's wrong with this picture?
"Can you have 2 tables
- VIEWS
- DOWNLOADS
with identical structure in a good DB schema (item_id, user_id, time). Some of the records will be identical but their meaning will be different depending on which table they are in. The "views" table is updated any time a user views an item for the first time. The "downloads" table is updated any time a user downloads an item for the first time. Both of the tables can exist without the other.

"I don't think that there is a problem, per se from a E/R modelling point of view, as long as they represent two semantically different entities."
"Are you saying that both tables have an 'item_id' Primary Key? In this case, the fields have the same name, but do not have the same meaning. One is a 'view_id', and the other one is a 'download_id'. You should rename your fields consequently to avoid this kind of misunderstanding."
 "Chris Date and Dave McGoveran formalised the Principle of Orthogonal Design. Roughly speaking it means that in database design you should avoid the possibility of allowing the same tuple in two different relvars. The aim being to avoid certain types of redundancy and ambiguity that could result."

"When designing a DB there are lots of different parameters, and some (e.g.: performance) may take precedence. Case in point: even if the structures (and I suppose indexing) are identical, maybe "views" has more records and will be accessed more often. This alone could be a good reason not to burden it with records from the downloads."
--Can you have 2 tables with identical structure in a good DB schema?, StackOverflow.com

Monday, August 15, 2016

Understanding Keys: Entity-Defining Properties and Entity Names

Here's what's wrong with last week's picture, namely:
"My understanding has always been that a primary key should be immutable, and my searching since reading this answer has only provided answers which reflect the same as a best practice. Under what circumstances would a primary key value need to be altered after the record is created?"

"When a primary key is chosen that is not immutable?"
--Why should a primary key change, StackExchange.com
This is, a more detailed treatment of keys undertaken in:
[1] On Kinds of Keys: Natural, Primary and Surrogate Are Sufficient
[2] Surrogate Key Illusions
[3] Duplicates: Stating the Same Fact More Than Once Doesn't Make It Truer, Only Redundant
which I have just revised. It is the result of several discussions I had with David McGoveran as part of my familiarization with his formal interpretation of Codd's relational data model (RDM), which he'll introduce in his forthcoming book LOGIC FOR SERIOUS DATABASE FOLKS. My own forthcoming book--THE DBDEBUNK GUIDE TO FUNDAMENTAL DATA MANAGEMENT MISCONCEPTIONS--incorporates some of his thoughts in an informal form accessible to the thinking data professional. As you shall see, it diverges in some important ways from the current understanding of the RDM (well, whatever little understanding there is).

Monday, August 8, 2016

This Week

1. What's wrong with this picture?
Q: "My understanding has always been that a primary key should be immutable, and my searching since reading this answer has only provided answers which reflect the same as a best practice. Under what circumstances would a primary key value need to be altered after the record is created?"

A: "When a primary key is chosen that is not immutable?" --StackExchange.com

2. Quote of the Week

"We start with a value. This may be a number, a date, or a chunk of text. Domain refers to the meaning for a value or a set of possible values. When we have a value with a consistently and widely used set of units, value domains, or applications, we call this value a data element. A data element may be a ticket number, a temperature reading, a hair color. (Some modeling approaches omit the notion of data elements or domains.)
In all the techniques and stages of data modeling, the concepts of entity and attribute are universal. An entity is the “thing” that must be manipulated as a data object. Entity represents an indivisible concept that consists of data elements. Each data element in the entity is called an attribute. Conversely, we could say that we build an entity by attributing data elements to it. You create an entity as a whole, and you delete it as a whole." --What a Concept! Is Logical Data Modeling Obsolete? -- LinkedIn.com
Note: In the preface to my PRACTICAL DATABASE FOUNDATIONS series of papers I deplored an author's (considered an expert) reliance on an industry ANSI committee as a starting point in an explanation of conceptual, logical and physical data fundamentals--there is practically 0 chance of soundness and 100% chance of confusion. So when this article started with "In the common usage established by ANSI in 1975, data modeling goes from abstract to concrete in three steps" I knew I would not read very far. Sure enough, I stopped almost immediately, after the above quote. If you don't understand why, I recommend you check out my papers.