Sunday, June 26, 2016

On Kinds of Keys: Natural, Primary and Surrogate (REWRITTEN); and the NoSQL Illusion

Request to my readers: My current posts and writings, including my forthcoming book and paper revisions, are in large part dedicated to making accessible to practitioners David McGoveran's formal interpretation of Codd's RDM, with refinements and extensions he believes Codd would have ended up with had he lived long enough, including some corrections (David will expose the formalisms in his own book). In some important aspects there is divergence from the current understanding of the RDM--well, whatever little understanding there is--which should be of interest to all thinking practitioners, particularly those entering the field whose minds have not yet been corrupted.

I ask, therefore, my readers--particularly the regular ones--please disseminate the information by linking to this site's posts on as many of the web and social media sites for whose audience you deem the material important. It is not available anywhere else. Thanks.



Here is what is wrong with last week's picture painted by Martijn Evers in Kinds of Keys: On the Nature of Key Classifications.
"Many data and information modelers talk about all kinds of keys (or identifiers. I'll forego the distinction for now). I hear them they talk about primary keys, alternate keys, surrogate keys, technical keys, functional keys, intelligent keys, business keys (for a Data Vault), human keys, natural keys, artificial keys, composite keys, warehouse keys or Dimensional Keys (or Data Warehousing) and whatnot. Then a debate rises on the use (and misuse) of all these keys ... The foremost question we should actually ask ourselves: can we formally disambiguate kinds of keys (at all)?"
I, of course, sympathize, as I've deplored the misuse and abuse of terminology in the industry for longer than I care to remember. This is not unique to keys, it's a general absence of foundation knowledge.

The answer is yes, we can disambiguate--the key to keys is understanding what
in the real world they represent and what database function they fulfill. Things are much simpler than the mindless industry jargon.

Sunday, June 19, 2016

This Week

1. What's wrong with this picture?

This week's picture is the one of the state of knowledge about keys that Martijn Evers painted in Kinds of Keys: On the Nature of Key Classifications, that I had already commented on. As a result of discussions I've been having with David McGoveran in the context of our forthcoming books (his LOGIC FOR SERIOUS DATABASE FOLKS, my DBDEBUNK GUIDE TO FUNDAMENTAL DATA MANAGEMENT MISCONCEPTIONS), I've decided to rewrite my
comments On Kinds of Keys on the subject. I refer the reader to Martijn's article for a refresher--my rewrite will be posted next week.

2. Quote of the Week

There are no rules of normalization for non-relational databases. Effectively, you start out by denormalizing everything. Which means you're designing the data organization to serve specific queries. So follow the same principle in NoSQL databases as you would for denormalizing a relational database: design your queries first, then the structure of the database is derived from the queries. --Bill Karwin, What is a good way to design a NoSQL database

Sunday, June 12, 2016

Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation

From last week:

What's wrong with this picture? (Kinds of Data Models, LinkedIn.com)

David Hay: "Part of the ... confusion as to what exactly was meant by “data modeling”--conceptual, logical or physical--is that most data modeling activities seem to focus on achieving good relational database designs ... my approach is the portrayal of the underlying structure of an enterprise’s data--without regard for any technology that might be used to manage it ... a “conceptual data model” ... that represents the business."

Nothing raises uncertainty whether to laugh or cry better than attempts to dispel confusion which suffer from the very confusion they purport to dispel.

Sunday, June 5, 2016

This Week

1. What's wrong with this picture?
David Hay: Part of the ... confusion as to what exactly was meant by “data modeling”--conceptual, logical or physical--is that most data modeling activities seem to focus on achieving good relational database designs ... my approach is the portrayal of the underlying structure of an enterprise’s data--without regard for any technology that might be used to manage it ... a “conceptual data model” ... that represents the business.

Nigel Higgs: ... many folks do not get the difference between the Barker entity relationship style of modeling and the relational style of modeling ... [because] the modeling conventions are very similar and the former [is always] a precursor to RDBMS design.

Clifford Heath: Any terminology for models must project three aspects of intention: (a) audience, (b) level of detail and (c) purpose. These three variables are sufficient to discriminate all the main kinds of models in use. The traditional terms of "conceptual/logical/physical" are manifestly inadequate.

Remy Fannader: Models are meant to describe sets of instances (objects or behaviors).
--Kinds of Data Models, LinkedIn.com

2. Quote of the Week

The first consideration that needs to be made when selecting a database is the characteristics of the data you are looking to leverage. If the data has a simple tabular structure, like an accounting spreadsheet, then the relational model could be adequate. Data such as geo-spatial, engineering parts, or molecular modeling, on the other hand, tends to be very complex. It may have multiple levels of nesting and the complete data model can be complicated. Such data has, in the past, been modeled into relational tables, but has not fit into that two-dimensional row-column structure naturally. --Jnan Dash, RDBMS vs. NoSQL: How do you pick?

Sunday, May 29, 2016

Multidimensional Relations, Flat Tables and Logical-Physical Confusion

Many--and I would guess, most--data professionals are clueless about the relational data model (RDM) and its practical value for database practice. They confuse RDBMS with SQL DBMS and use relational jargon, but don't understand what it really means and don't care whether they do or not. As I announced, every other week (with occasional exceptions),I will debunk common and entrenched myths and misconceptions exposed by the "What's wrong with this picture?" of the preceding week.

From last week:

"Data is stored in two-dimensional tables consisting of columns (fields) and rows (records). Multi-dimensional data is represented by a system of relationships among two-dimensional tables. This usually leads to data storage becoming redundant, and also difficult to maintain on account of addition and deletion anomalies. This is only the case if we do not normalize the data. Keys are fields or combinations of fields used to identify records." --Nigel Peck, Denormalization Summary
"Relations are multidimensional. They are not flat. They are not two dimensional. Don't let the term table mislead you." I read the above statement on the back cover of CJ Date book on relational theory "Database in depth". Can anyone help how to visualize this multidimensional nature of relations? --LinkedIn.com

Wednesday, May 25, 2016

Why Data Scientists Must Understand Normalization

My May post @All Analytics:

We are constantly told how data scientists must be “jacks of many skills”, but one of the most important is rarely included in the list.

Very few databases are properly designed. Many SQL databases are denormalized inadvertently, or intentionally (and erroneously) "for performance". They require special constraints to control data redundancy and prevent inconsistencies, which are practically never enforced. Analysts cannot, therefore, take database consistency for granted. Furthermore, to issue sensible queries and ensure correct results and interpretation thereof, it’s not enough for analysts to know the types of fact represented in the database, but also whether and how the database designer has chosen to bundle -- nest or merge -- those facts and how to disentangle them for analysis.

Read it all (and comment there, not here, please). 



Sunday, May 22, 2016

This Week (& a Change in Format)

I am changing the format of the posts on DBDebunk.

"This Week" posts every other week will continue to be followed by posts on data and relational fundamentals, but with a twist: each of the latter posts will explain the preceding week's "What's wrong with this picture?"Here's the first post in the new format

1. What's wrong with this picture?

"Data is stored in two-dimensional tables consisting of columns (fields) and rows (records). Multi-dimensional data is represented by a system of relationships among two-dimensional tables. This usually leads to data storage becoming redundant, and also difficult to maintain on account of addition and deletion anomalies. This is only the case if we do not normalize the data. Keys are fields or combinations of fields used to identify records." --Nigel Peck, Denormalization Summary