Sunday, May 29, 2016

Multidimensional Relations, Flat Tables and Logical-Physical Confusion

Many--and I would guess, most--data professionals are clueless about the Relational Data Model (RDM) and its practical value for database practice. They confuse RDBMS with SQL DBMS and use relational jargon, but don't understand what it really means and don't care whether they do or not. As I announced, every other week (with occasional exceptions), I will debunk common and entrenched myths and misconceptions exposed by the "What's wrong with this picture?" of the preceding week.

From last week:

"Data is stored in two-dimensional tables consisting of columns (fields) and rows (records). Multi-dimensional data is represented by a system of relationships among two-dimensional tables. This usually leads to data storage becoming redundant, and also difficult to maintain on account of addition and deletion anomalies. This is only the case if we do not normalize the data. Keys are fields or combinations of fields used to identify records." --Nigel Peck, Denormalization Summary
"Relations are multidimensional. They are not flat. They are not two dimensional. Don't let the term table mislead you." I read the above statement on the back cover of CJ Date book on relational theory "Database in depth". Can anyone help how to visualize this multidimensional nature of relations? --LinkedIn.com

Wednesday, May 25, 2016

Why Data Scientists Must Understand Normalization

My May post @All Analytics:

We are constantly told how data scientists must be “jacks of many skills”, but one of the most important is rarely included in the list.

Very few databases are properly designed. Many SQL databases are denormalized inadvertently, or intentionally (and erroneously) "for performance". They require special constraints to control data redundancy and prevent inconsistencies, which are practically never enforced. Analysts cannot, therefore, take database consistency for granted. Furthermore, to issue sensible queries and ensure correct results and interpretation thereof, it’s not enough for analysts to know the types of fact represented in the database, but also whether and how the database designer has chosen to bundle -- nest or merge -- those facts and how to disentangle them for analysis.

Read it all 

Please comment there, not here. 



Sunday, May 22, 2016

This Week (& a Change in Format)

I am changing the format of the posts on DBDebunk.

"This Week" posts every other week will continue to be followed by posts on data and relational fundamentals, but with a twist: each of the latter posts will explain the preceding week's "What's wrong with this picture?"Here's the first post in the new format

1. What's wrong with this picture?

"Data is stored in two-dimensional tables consisting of columns (fields) and rows (records). Multi-dimensional data is represented by a system of relationships among two-dimensional tables. This usually leads to data storage becoming redundant, and also difficult to maintain on account of addition and deletion anomalies. This is only the case if we do not normalize the data. Keys are fields or combinations of fields used to identify records." --Nigel Peck, Denormalization Summary

Sunday, May 15, 2016

Weak Entities, Referential Constraints and the Conceptual-Logical Conflation (REWRITE)

I rewrote the earlier post Weak Entities, Referential Constraints and Normalization.

Data model agnosticism has nothing to do with the “emergence/popularity of non-relational technologies” (NoSQL, NewSQL did not exist when Chen introduced the E/RM in the 70's), it is  is inherent in conceptual modeling--which is what E/RM is used for--by definition. Conceptual models are expressed in real world terms--facts about classes of property-sharing entities. To be represented in a database, a conceptual model must be formalized as a logical model expressed in database terms. That's when a data model comes into play, to provide abstract data structure, integrity and manipulation (relation, relational constraints and relational algebra in the relational case). The conceptual and  logical were always separate, but as Everest comments show, levels of representation are constantly confused. 

Read it all




Monday, May 2, 2016

This Week

1. What's wrong with this picture?
The query framework, as it exists in modern day MongoDB, supports the following functionality:
  • Filtering. The rough equivalent of the WHERE clause in SQL.
  • Paging. The rough equivalent of LIMIT and OFFSET in SQL.
  • Sorting. The rough equivalent of ORDER BY in SQL.
--John De Goes, MongoDB: The Frankenstein Monster of NoSQL Databases, LinkedIn.com