Sunday, February 7, 2016

Noticed this week

1. Quote of the Week
NULL values can be very useful, especially on indexes, as an indication of "index is not set" or "no index here", or "default inherited index applies".

I use Null values extensively (in huge database systems) with not only no problems whatsoever, but measurably significant advantages. People who try to tell you that "Nulls are the work of the devil", or "the sky will fall down if you allow nulls", or some such unsubstantiated childish delusion are exclusively ignorant of the correct ways to handle them. (Or too lazy/ineducable to learn their correct implementation and/or benefits.)

Fact, just plain indisputable fact.
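
For readers who want to see what is at stake in the quote, here is a minimal sketch of how NULL behaves in SQL comparisons and aggregates, using Python's built-in sqlite3 module. The table t and its values are hypothetical and not taken from the quote, and other DBMSs may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration only.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)")
conn.executemany("INSERT INTO t (id, x) VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, None)])

# NULL = NULL evaluates to NULL (unknown), which Python receives as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# COUNT(*) counts rows; COUNT(x) skips rows where x is NULL.
count_all, count_x = conn.execute("SELECT COUNT(*), COUNT(x) FROM t").fetchone()

# The row whose x is NULL satisfies neither x = 1 nor x <> 1.
eq_ids = [r[0] for r in conn.execute("SELECT id FROM t WHERE x = 1 ORDER BY id")]
ne_ids = [r[0] for r in conn.execute("SELECT id FROM t WHERE x <> 1 ORDER BY id")]

print(null_equals_null, count_all, count_x, eq_ids, ne_ids)
```

The row with id 3 appears in neither filtered result, so the two WHERE clauses together do not cover the table. That is the kind of behavior anyone relying on NULLs has to account for.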

Wednesday, January 27, 2016

It's not tables, it's the relationships

January post @ my All Analytics blog.

It’s Not Tables, It's the Relationships
"Logical refers to the relationships among the components of the relation, not to any arrangement of the components of a relation. Any presentation that preserves those relationships and adds no extra ones is acceptable. An R-table is one possible such presentation. The problem is that people fixate on this one presentation, identifying it with relation. They then go even further and force the physical implementation of a relation to be table-like."
-- David McGoveran
Read it all.
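
To illustrate the point in the quote, here is a minimal Python sketch (not McGoveran's formalization) that models a relation as a set of tuples, each tuple being a set of attribute-value pairs. The attribute names emp and dept and the helper relation() are hypothetical. The sketch shows that row order, column order and repeated rows belong to a presentation, not to the relation itself.

```python
def relation(rows):
    """Build a relation: a set of tuples, each a set of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)

# Presentation 1: rows and attributes listed in one order.
as_listed = relation([
    {"emp": "E1", "dept": "D1"},
    {"emp": "E2", "dept": "D2"},
])

# Presentation 2: both rows and attributes reordered.
reordered = relation([
    {"dept": "D2", "emp": "E2"},
    {"dept": "D1", "emp": "E1"},
])

# Presentation 3: a row written twice adds no information.
with_duplicate = relation([
    {"emp": "E1", "dept": "D1"},
    {"emp": "E1", "dept": "D1"},
    {"emp": "E2", "dept": "D2"},
])

# All three presentations denote the same relation.
same = as_listed == reordered == with_duplicate
print(same)
```

A tabular display has to pick some top-to-bottom and left-to-right order, but nothing in the relation depends on that choice.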

Sunday, January 24, 2016

Noticed this week

1. Quote of the Week
I don't know that I would say that the RM is dead per se, or even holding us back. But I have to ask... why the relational model? ...A Hierarchical Model would work. In fact if we look at these NoSQL databases ... Hierarchical Models work better than relational models. The point is that some of the factors which cause one to think in terms of relationships have changed. Disk is cheap. There are definitely problems where having a strongly typed and structured model make sense. Then there are problems where not having a rigid model make sense. Structure on read seems to make sense these days. PS... Relational models don't scale. Especially on MPP.

Sunday, January 17, 2016

"Tableitis", CLCitis, LPCitis, "Essential" Modeling and the Relational Model

A serious problem in the database field is not just that many data professionals do not know and understand the RDM, but also that they believe they do and criticize it. It is relatively easy to detect such critics. In "Recognizing and Treating Tableitis" Gordon Everest tries to be humorous, but it ends up more sad than funny.

Sunday, January 10, 2016

Noticed This Week

1. Quote of the Week
Table (n.) – a collection of information (data?) describing a population of entities which possess some common characteristics, called attributes. Tables are the building block of relational databases.  Tables must generally be “normalized,” at least to 1NF.  That may be an appropriate way to think of databases when implemented in a modern day DBMS.  However, it is not the way the world thinks logically. People have no problem with commonly occurring phenomena such as:
* A multi-valued attribute, e.g., an Employee possesses multiple Skills.
* Many-to-many (M:N) relationships, e.g., as between Employees and Projects
* A relationship with attributes

--Gordon Everest, Recognizing and Treating Tableitis
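
As an illustration that is not part of the quoted article, the three phenomena listed above can each be represented with ordinary tables in 1NF. This is a minimal sketch using Python's built-in sqlite3 module; all table names, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project  (proj_id TEXT PRIMARY KEY, title TEXT NOT NULL);

-- Multi-valued attribute: one row per (employee, skill) pair.
CREATE TABLE employee_skill (
    emp_id TEXT NOT NULL REFERENCES employee(emp_id),
    skill  TEXT NOT NULL,
    PRIMARY KEY (emp_id, skill)
);

-- M:N relationship that carries its own attribute (hours).
CREATE TABLE assignment (
    emp_id  TEXT NOT NULL REFERENCES employee(emp_id),
    proj_id TEXT NOT NULL REFERENCES project(proj_id),
    hours   INTEGER NOT NULL,
    PRIMARY KEY (emp_id, proj_id)
);

INSERT INTO employee VALUES ('E1', 'Ann'), ('E2', 'Bob');
INSERT INTO project VALUES ('P1', 'Billing'), ('P2', 'Reporting');
INSERT INTO employee_skill VALUES ('E1', 'SQL'), ('E1', 'Python'), ('E2', 'SQL');
INSERT INTO assignment VALUES ('E1', 'P1', 10), ('E1', 'P2', 5), ('E2', 'P1', 8);
""")

# All skills of one employee.
skills_e1 = [r[0] for r in conn.execute(
    "SELECT skill FROM employee_skill WHERE emp_id = 'E1' ORDER BY skill")]

# Everyone assigned to project P1, with the relationship's own attribute.
p1_staff = conn.execute(
    "SELECT e.name, a.hours FROM assignment a "
    "JOIN employee e ON e.emp_id = a.emp_id "
    "WHERE a.proj_id = 'P1' ORDER BY e.name").fetchall()

print(skills_e1, p1_staff)
```

Each table holds exactly one value per column per row, yet an employee can have several skills and several projects, and the hours belong to the employee-project pairing instead of to either side.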

Monday, January 4, 2016

The Real Data Science Series: 1NF In Theory & Practice

My January presentation for the San Francisco Microsoft Data Platform User Group:

Wednesday, January 13, 2016 6:30 PM
Microsoft Reactor, 680 Folsom Street, San Francisco, CA

In the early 1970s E. F. Codd provided a very precise, formal definition of a table in its normal form. Any table not normalized was in violation of the RDM and not considered an R-table. But you are unlikely to have encountered that definition. Instead you have probably heard about "repeating groups", "simple domains" and "atomic values", none of which are formal relational concepts. C. J. Date provided a 1NF definition different from Codd's. And you probably think that the same design principle underlies all normal forms, but 1NF is somewhat distinct.

This presentation introduces order and makes sense of all this, including the practical implications for SQL database practice. It is the first in THE REAL DATA SCIENCE series (that includes papers and seminars) expounding the Codd-McGoveran relational model, distinct from Date-Darwen's.
You will learn:
• Normalization vs. further normalization
• Repeating Groups
• Simple domains and atomic values
• SQL and 1NF

Sunday, January 3, 2016

NoSQL and SQL: A Plague on Both Their Houses

"Oracle Defends Relational DBs Against NoSQL Competitors" prompted "Does Oracle Really Understand NoSQL?", which was shared on LinkedIn and triggered a LinkedIn exchange in which I participated. The following comment is an adequate summary of the second article:
JN: It's an unfortunate bit of propaganda. Some truth mixed in with distractions and irrelevant comparison. I've met the DataStax team. They're smart people with a solid understanding of their space. I'm disappointed to see them mix good and bad information into something that looks like objective truth.
and the first is not much better. The interested reader can visit all three links. What I want to do here is amplify some of my LinkedIn comments and add a few more.