Monday, June 30, 2014

Big Data, Normalization & Analytics: Meaning & Constraints

My June post @All Analytics

Combining data extracts from databases for analytical purposes without knowing what the source database tables mean -- what exactly in the real world they represent -- can produce wrong results.

Read it all. (And please comment there, not here)


Sunday, June 29, 2014

Denormalization: Database Bias, Integrity Tradeoff and Complexity

The common and entrenched misconception about normalization was recently visible yet again in a LinkedIn exchange.
R: Unless the need is for ACID compliant transactions, denormalization is generally not considered logically, physically or whatever-ally-–so essentially a thoroughly normalized mode is relevant for a write-infrequently consumption of data and data integrity can be guaranteed by design.

Sunday, June 22, 2014

Weekly Update

I have corrected a mistake in For Codd's Sake: a mathematical relation is not the Cartesian product (CP) of the domains over which it is defined. Two readers correctly pointed out what I actually wrote myself in my business modeling paper:
Mathematically, a relation on domains—which are sets of values of a type—is a subset of the Cartesian product of the domains.
Note that the whole CP is also a subset, so it is also a relation, which happens to have useful applicability to business modeling and database design. In the database context, it can be pictured as the pool of all possible rows--past, present and future--for a R-tablevar defined by the domains' types. A database R-table is the set of actual rows at any point in time that is consistent with the set of all integrity constraints to which the R-table is subject (see Business Modeling for Database Design).

1. Quote of the Week
NoSQL usant correct m'y indeed totof n'y most of the dev ans devops who clearly thing nosql Means they will ne a le to do whatever they wants ans still have answers to their twisted query in a correct time. Those people see nosql as the mean to get ris of DBAs. And il not kiddin since it's happening right now un many companies i know of.

2. To Laugh or Cry?
Architecting IMS for Big Data - a symbiotic relationship.

3. Online 

4. Interesting Elsewhere
IEEE Computer Issue on CAP Theorem
H/t Erwin Smout.

5. And now for something completely different

The PostWest.

Sunday, June 8, 2014

Weekly Update

1. Quote of the Week
Logical design is where the Architect defines entities (which will become tables in a database), attributes (which will become columns in a database), etc. This is typically the level that SMEs are most comfortable. I think that a Logical design may deal with data types and keys, but it does not cater to any specific platform or engine.
Physical design is where the Architect translates the logical design into tables, columns, datatype specifics like INT versus NUMERIC, indexes, partitions, etc. This is where "the rubber meets the road" and the logical design gets mapped into a form that can exist and be tested on a database server.
While I'm sure that someone will object to this link on religious grounds, the discussion  does a pretty good job of making the distinctions that concern me.
2. To Laugh or Cry?
MyBatis Schema Migration System
H/t Ben Samuel, who adds:
"From the department of "we haven't really thought this feature through" comes this gem, one of several schema migration systems that allow "reverse migrations" or "downward migrations". Whereas a forward migration creates tables, columns, etc., a reverse migration drops them. The video proudly shows them "reverse migrating" their database until all tables are dropped. Another vendor patiently explains why they don't offer this feature."

3. Online Debunkings

4. Interesting Elsewhere
The Death Of Expertise

5. And now for something completely different

The "productive" business and tech elite:
God's given gift to humanity and pillars of society.

Sunday, June 1, 2014

Big Data, Normalization & Analytics

May Post @All Analytics

What you need to know for the purposes of this discussion is that tables that bundle multiple entity classes have certain drawbacks. Normalization is a design repair procedure that unbundles the classes -- the columns representing attributes pertaining to each class -- each into its own table. This is possible if and only if there is no data lost or made spurious in the process -- that is, when a bundling of table A is mathematically equivalent to the joins of its unbundled projection tables B and C.

Read it all. (And please comment there, not here)


View My Stats