Monday, June 30, 2014

Big Data, Normalization & Analytics: Meaning & Constraints

My June post @All Analytics

Combining data extracts from databases for analytical purposes without knowing what the source database tables mean -- what exactly in the real world they represent -- can produce wrong results.

Read it all. (And please comment there, not here)


Sunday, June 29, 2014

Denormalization: Database Bias, Integrity Tradeoff and Complexity

The common and entrenched misconception about normalization was recently visible yet again in a LinkedIn exchange.
R: Unless the need is for ACID compliant transactions, denormalization is generally not considered logically, physically or whatever-ally-–so essentially a thoroughly normalized mode is relevant for a write-infrequently consumption of data and data integrity can be guaranteed by design.

Sunday, June 22, 2014

Weekly Update

I have corrected a mistake in For Codd's Sake: a mathematical relation is not the Cartesian product (CP) of the domains over which it is defined. Two readers correctly pointed out what I actually wrote myself in my business modeling paper:
Mathematically, a relation on domains—which are sets of values of a type—is a subset of the Cartesian product of the domains.
Note that the whole CP is also a subset, so it is also a relation, which happens to have useful applicability to business modeling and database design. In the database context, it can be pictured as the pool of all possible rows--past, present and future--for a R-tablevar defined by the domains' types. A database R-table is the set of actual rows at any point in time that is consistent with the set of all integrity constraints to which the R-table is subject (see Business Modeling for Database Design).

1. Quote of the Week
NoSQL usant correct m'y indeed totof n'y most of the dev ans devops who clearly thing nosql Means they will ne a le to do whatever they wants ans still have answers to their twisted query in a correct time. Those people see nosql as the mean to get ris of DBAs. And il not kiddin since it's happening right now un many companies i know of.

2. To Laugh or Cry?
Architecting IMS for Big Data - a symbiotic relationship.

3. Online 

4. Interesting Elsewhere
IEEE Computer Issue on CAP Theorem
H/t Erwin Smout.

5. And now for something completely different

The PostWest.

Sunday, June 15, 2014

Conceptual Muddling and Database Kludges

The Problem with "Conditional" Unique Constraints... raised on LinkedIn and the response to it merit the attention of practitioners.
EP: I'm seeing more implementations where developers/database design professionals are implementing the following type of conditional unique constraints, typically related to the use of 'soft deletes'): 
Uniqueness is defined for {COL_A,COL_B} iff COL_ACTIVE_FLAG='YES'. Any row with a COL_ACTIVE_FLAG = 'NO' is excluded from the unique requirement. Note that I do not mean that COL_ACTIVE_FLAG is part of the key; instead it is being used to conditionally enforce the key.
Most SQL DB implementations I know of do not allow this type of constraint to be enforced declaratively. Instead it relies on tricks within the index specification for enforcement.

This conditional application of unique constraints troubles me. The prevalent use of a surrogate primary key avoids duplicates in the table as a whole. But this approach seems to declare a business rule that can be turned off and on based on the value of a non-key column. However, I cannot definitively find a specific rule/guideline within relational theory that it violates. Any thoughts on that matter? I know there are design alternatives. I'm looking at a way of critiquing (or ultimately accepting) this type of approach for a theoretical standpoint.
(The fact that there is no true always-on business key other than the surrogate key IS an issue, but the fact that technically the surrogate PK prevents duplicates is almost always presented as a counter argument).

It definitely feels there is more than one kind of business entity here - but they do share the same attributes. I also see a similar design pattern when OO classes are mapped to a table during implementation (when the implementation approach is to combine classes into a single table and an attempt is made to enforce two types of 'uniqueness').

Sunday, June 8, 2014

Weekly Update

1. Quote of the Week
Logical design is where the Architect defines entities (which will become tables in a database), attributes (which will become columns in a database), etc. This is typically the level that SMEs are most comfortable. I think that a Logical design may deal with data types and keys, but it does not cater to any specific platform or engine.
Physical design is where the Architect translates the logical design into tables, columns, datatype specifics like INT versus NUMERIC, indexes, partitions, etc. This is where "the rubber meets the road" and the logical design gets mapped into a form that can exist and be tested on a database server.
While I'm sure that someone will object to this link on religious grounds, the discussion  does a pretty good job of making the distinctions that concern me.
2. To Laugh or Cry?
MyBatis Schema Migration System
H/t Ben Samuel, who adds:
"From the department of "we haven't really thought this feature through" comes this gem, one of several schema migration systems that allow "reverse migrations" or "downward migrations". Whereas a forward migration creates tables, columns, etc., a reverse migration drops them. The video proudly shows them "reverse migrating" their database until all tables are dropped. Another vendor patiently explains why they don't offer this feature."

3. Online Debunkings

4. Interesting Elsewhere
The Death Of Expertise

5. And now for something completely different

The "productive" business and tech elite:
God's given gift to humanity and pillars of society.

Sunday, June 1, 2014

Big Data, Normalization & Analytics

May Post @All Analytics

What you need to know for the purposes of this discussion is that tables that bundle multiple entity classes have certain drawbacks. Normalization is a design repair procedure that unbundles the classes -- the columns representing attributes pertaining to each class -- each into its own table. This is possible if and only if there is no data lost or made spurious in the process -- that is, when a bundling of table A is mathematically equivalent to the joins of its unbundled projection tables B and C.

Read it all. (And please comment there, not here)