DATABASE DEBUNKINGS

Sunday, May 15, 2016

Weak Entities, Referential Constraints and the Conceptual-Logical Conflation

Follow @DBDebunk Follow @ThePostWest

Note: This is a rewrite of a 09/07/14 post.

A LinkedIn exchange initiated by the question What is a weak entity? diversified into various aspects of data modeling and database design, some of which was contaminated by the conceptual-logical conflation.

"Peter Chen (E/R Modeling, 1976) used the term "weak" entity to describe one which could not meaningfully have an independent existence. An example might be an Order which requires a customer (and a Product or set of products). A weak entity need not be a composite, as in your OrderItem example. The central issue here is dependency of one entity [of one] type on another [of another type]. Furthermore, there could be more that one such dependency. In your example, the OrderItem would be dependent on BOTH Order and Item.

It is important to note that this dependency will NOT be enforced by a simple RDBMS unless you also define referential integrity on both parts of the composite key, notwithstanding that some RDBMS's (e.g., SQL Server) couple the definition of the relationship with a referential integrity constraint. In other words, you cannot have a defined relationship without enforcing referential integrity.

This is not always desirable. Consider the task of bulk loading some records which have a relationship with some other entity type(s). The only practical way to do this is to first turn off referential integrity enforcement which means deleting the definition of the relationship entirely. After completing the loading of data, if you reinstate the defined relationship, will the system automatically ensure that referential integrity is not violated? Good question." --Gordon Everest

Relationships in the E/RM are at the conceptual level. Dependencies of entities of one class/type on those of another are one kind of relationships defined as business rules in a conceptual model. Referential constraints are their formal representation at the logical level of a relational database. So it's not that "some" SQL DBMS's (true RDBMS's do not exist) "couple the definition of the relationship with a referential integrity constraint", the constraints are precisely how a RDBMS (what exactly is a "simple RDBMS"?) enforces business rules: a defined relationship is enforced as a referential constraint in a relational database. How does Everest propose to enforce business rules without constraints?

There are no circumstances in which FK constraints "are not desirable" and deactivating them is not the the solution to bulk loads, staging tables are. For example:

"In Oracle you can instruct the DBMS to log violating rows into an 'exceptions' table while enabling a constraint (be it, a primary key, unique key, foreign key, or plain check constraint). This fits the bill better, particularly if it's possible to INSERT...SELECT them after corrections." --Toon Koppelaars

"SQL Server will check on re-enabling the constraint that it is not violated by the new state of the database. If it is, it only reports that an error occurred. It has a real time violation catalog to easily map a constraint name to a specific violation and transaction id. In other words, doing bulk batch constraint validation is part of the product." --William Sisson and Racim Boudjakdji

Much of the criticism of the RDM is rooted in ignorance of data and relational fundamentals and the history of database management.

"How to implement“existence dependence” ... between two entities ...is a design decision and is dependent upon the architecture of the implementation technology. [Chen's E/R models] ... were completely technology agnostic and had no “relational” flavor at all. Why? With the rapid emergence and popularity of the NoSQL, NewSQL, Graph Databases, etc. no longer can relational implementation and FK be assumed and we again need to separate the “conceptual data model” from the “logical data model”. Whether this is evolution of de-evolution is a whole other subject (for those of us that remember WHY relational technology emerged in the 1980s--and as a warning to those who would forget their history--IDMS, IMS, etc.)"

The relational model has been so dominant for so long, we have forgotten our roots and the reason that there were originally three layers of data model, not just two. The rise of the post-relational technologies has again required their creation and reminds us of their utility. --David Tryon

Data model agnosticism has nothing to do with the “emergence/popularity of non-relational technologies” (NoSQL, NewSQL did not exist when Chen introduced the E/RM), it is is inherent in conceptual modeling--which is what E/RM is used for--by definition. Conceptual models are expressed in real world terms--facts about classes of property-sharing entities. To be represented in a database, a conceptual model must be formalized as a logical model expressed in database terms. That's when a data model comes into play, to provide abstract data structure, integrity and manipulation (relation, constraints and relational algebra in the relational case). The conceptual and logical were always separate, but as Everest comments show, levels of representation are constantly confused.

Incidentally

A logical model is implemented in specific hardware and DBMS software, a conceptual model is formalized as a logical model by means of a data model (e.g., the RDM). Using 'implementation' for both induces and reinforces the confusion.
Emergence, or popularity are not grounds for technology adoption: a superior generality-to-simplicity ratio (G2SR), soundness and flexibility are.

If only there were relational dominance. Relational technology was treated by the industry as just any other "hot" fad and was never understood and implemented. Instead we got SQL DBMS's, whose even poor relational fidelity proved superior to the hierarchic and network products that they replaced, but did not confer all the advantages of true RDBMS's.

Do away with the RDM and constraints and you're back to relying on application developers for database functions, particularly integrity enforcement and optimization, without guaranteed logical and semantic correctness--clearly de-evolution.

Monday, May 2, 2016

This Week

Follow @DBDebunk Follow @ThePostWest

1. What's wrong with this picture?

The query framework, as it exists in modern day MongoDB, supports the following functionality:

Filtering. The rough equivalent of the WHERE clause in SQL.

Paging. The rough equivalent of LIMIT and OFFSET in SQL.

Sorting. The rough equivalent of ORDER BY in SQL.

--John De Goes, MongoDB: The Frankenstein Monster of NoSQL Databases, LinkedIn.com

Re-write

Follow @DBDebunk Follow @ThePostWest

My April column @All Analytics.

A R-table with attributes defined only on simple domains takes a less convoluted form -- a normal form -- devoid of nesting. If R-tables are in the preferred normal form i.e., components meaningful to applications (here, employee attributes) are simple domains in their own right and a true RDBMS enforces value atomicity -- first order logic is sufficient. This imposes some limitations on the expressive power of data languages, but they are declarative and PDI and simplicity are preserved. A true RDBMS enforces atomicity via a data language that does not allow applications to access attribute components not explicitly defined on their own domains.

Read it all. (Please comment there, not here)

FYI: I have revised all three parts of the series on 1NF -- mainly refinements and clarifications.

Tuesday, April 19, 2016

First Normal Form in Theory and Practice Part 3

Follow @DBDebunk Follow @ThePostWest

09/19/23: For the latest on this subject see: FIRST NORMAL FORM - A DEFINITIVE GUIDE

Note: This is a 11/23/17 revision of Part 3 of a three-part series that replaced all of my previous posts on the subject (pages of which redirect here), in order to further tighten integration with the formalization and interpretation [1] of McGoveran's formalization and interpretation [1] of Codd's true RDM.

(Continued from Part 2)

"Is this table in 1NF?" is a common question in database practice. On the other hand, "What problems are solved by splitting street addresses into individual columns?", or What's the best way to store an array in a relational database does not seem to evoke associations with 1NF. This reveals poor foundation knowledge.

Part 1 introduced the poor understanding of 1NF and Part 2 provided a correct definition and explanation. Part 3 explains how 1NF can be enforced by the data sublanguage, which SQL does not.

This Week

Follow @DBDebunk Follow @ThePostWest

1. What's wrong with this picture?

NoSQL database management systems give us the opportunity to store our data according to more than one data storage model, but our entity-relationship data modeling notations are stuck in SQL land. Is there any need to model schema-less databases, and is it even possible? --Theodore Hills, The Hybrid Data Model, Dataversity.net

First Normal Form in Theory and Practice Part 2

Follow @DBDebunk Follow @ThePostWest

09/19/23: For the latest on this subject see: FIRST NORMAL FORM - A DEFINITIVE GUIDE

Note: This is a 11/23/17 revision of Part 2 of a three-part series that replaced all of my previous posts on the subject (pages of which redirect here), in order to further tighten integration with the McGoveran formalization and interpretation [1] of Codd's true RDM.
(Cont'd from Part 1)

Part 1 raised the issue of poor understanding of Codd's concept of a simple domain with atomic values underlying 1NF. In Part 2 I clarify Codd's definition of 1NF and its correct interpretation.

This Week

Follow @DBDebunk Follow @ThePostWest

1. What's wrong with this picture?

Not Worth Repeating: Duplicates

Follow @DBDebunk Follow @ThePostWest

My March post @All Analytics.

Frequent hits @dbdebunk.com are driven by the question “Are keys mandatory?” Puzzlingly, many data professionals do not seem to understand why duplicates should be prohibited. This should worry analysts. But “Stating the same fact more than once, does not make it truer, only redundant,” as E. F. Codd used to say. The absence of an identifier means that individual entities are not meaningful, so this representation contradicts the real world. Contradictions produce problems. First, a DBMS is incapable of “visually” discerning a data entry duplication error from "valid" duplicates, which means high risk of inconsistent databases and wrong counts and other query results.

Read it all. (Please comment there, not here)

Sunday, March 20, 2016

This Week

Follow @DBDebunk Follow @ThePostWest

1. QUOTE OF THE WEEK

Q: I am trying to remove duplicate rows from a database containing 4446 tables and when finished, merge data from that database to an existing database with the same structure. Is there an easy way to remove duplicate rows from all those 4446 tables?

A1: Please tell us "what is a 4446 table?" And, please give examples of duplicate records.

A2: Recreate your database with unique primary keys. 2) Copy records from original database to recreated database, one table at a time. The duplicate records will drop on the floor. 3) Fix the thousands of bugs in your application code that created the duplicate rows in the first place.

A3: I'm creating temporary tables + I add unique index to that table and insert data into and then I rename it to the original one.

--stackoverflow.com

The Fourth V -- Veracity

Follow @DBDebunk Follow @ThePostWest

My February post @All Analytics.

"A fact is represented in the database not because it is categorically true, or categorically false, but only because somebody has asserted it, or has removed that assertion. We rely on people (and only on people!) for whatever a database system treats as true or false. When we authorize someone (or some application) to update the database, we accept their veracity, judgment, personal integrity, intelligence, comprehension, etc. insofar as the database is to have any utility. Nothing more can (or should) be said about truth and falsity." -- David McGoveran

Read it all. (Please comment there, not here)

Sunday, March 6, 2016

This Week

Follow @DBDebunk Follow @ThePostWest

1. Quote of the Week

[With] a declarative language and you have no real control over the execution plan. Heck, Oracle 12c can use multiple different execution plans for the same query depending on bind parameters, statistics not matching reality, and the phase of the moon.

No declarative language will ever be a first-class programming citizen in my eyes. Too much magic. Too non-linear. Too hard to debug for those reasons. They are great for when you have to do really simple things but once you step off the well worn path, you are in the thicket.
--reddit.com

First Normal Form in Theory and Practice Part 1

Follow @DBDebunk Follow @ThePostWest

09/19/23: For the latest on this subject see: FIRST NORMAL FORM - A DEFINITIVE GUIDE

Note: This is a 11/23/17 revision of Part 1 of a three-part series that replaced all of my previous posts on the subject (pages of which redirect here), in order to further tighten integration with the McGoveran formalization and interpretation [1] of Codd's true RDM.

On the one hand:

"... there is no generally accepted definition of 1NF ... the word that you see most often is 'atomic'. It is common to say that a relation is in 1NF if all its attributes [sic] are atomic ... Does 1NF equate to “atomic attribute [values]”? ... what [do] people have in mind [when they claim] atomicity? ... the [meaning] behind definitions is that you should rarely need to extract information from a value of an attribute ... But that explains why one cannot decide, depending on theory only, whether a relation is in 1NF ... it is a habitual use of data that makes attributes atomic, not theory. No wonder, there is so much mess in theory about what 1NF should be."

--What is the actual definition of First Normal Form, Vertabelo.com

On another, according to a DBDebunk reader:

"Codd in 1969/70 (and RM V/2 20 years later) gave a precise, theory-based definition of "atomic" aka "simple" aka "non-decomposable" (later aka non-"compound" aka non-"structured"): not relation-valued. And he gave a precise definition of "normalized" (1NF): relations free of relation-valued-domains (RVD)!"

All sorts of other definitions proliferate, for example:

"First normal form enforces these criteria:

- Eliminate repeating groups in individual tables.
- Create a separate table for each set of related data.
-Identify each set of related data with a primary key."
--First normal form, Wikipedia

Note: A relational databases consists of relations that can be visualized as R-tables. Normal forms are a property of relations, not R-tables -- a "R-table in 1NF" is shorthand for consistency with the underlying relation.

This Week

Follow @DBDebunk Follow @ThePostWest

1. Quote of the Week

Karen Lopez: To bring others up to speed, Fabian is using an academic taxonomy for data modeling terms. It's valid. It's popular in research. It's not used in any major data modeling tool, nor in many practitioner resources. I have been using the industry vernaculars in my posts and here. Part of the debate that Fabian is having is because he does not tolerate the industry terms, so he chooses to attack others who use the mainstream terms. He believes they are "wrong" instead of "alternative". So that's part of the pain we have in debating his positions. I'm bilingual, but I choose to use just one set in my writings.

Fabian Pascal: Karen, bollocks. See, I can use non-academic terms too.

Healthcare, Data Fundamentals and the PASS Summit (UPDATED)

Follow @DBDebunk Follow @ThePostWest

When, years ago in an online exchange, I argued that working with SQL DBMS's without knowledge and understanding of data and relational fundamentals is a costly proposition, an Oracle practitioner replied that "they train doctors on how to use medical devices, not teach them the theories behind them". I asked him what do doctors learn in their six years of medical school, but got no reply.

I have documented and debunked for decades the substitution of tool training for education and the ensuing "cookbook approach" to database practice it produces. While I have become more jaded, it is still difficult to run so frequently across something like

This Week

Follow @DBDebunk Follow @ThePostWest

1. Quote of the Week

NULL values can be very useful, especially on indexes, as an indication of "index is not set" or "no index here", or "default inherited index applies".

I use Null values extensively (in huge database systems) with not only no problems whatsoever, but measurably signifcant advantages. People who try to tell you that "Nulls are the work of the devil", or :the sky will fall down if you allow nulls", or some such unsubstantiated childish delusion are exclusively ignorant of the correct ways to handle them. (Or too lazy/ineducable to learn their correct implementation and/or benefits.)

Fact, just plain indisputable fact. --bytes.com

It’s Not Tables, It's the Relationships

Follow @DBDebunk Follow @ThePostWest

My January post @All Analytics.

"Logical refers to the relationships among the components of the relation, not to any arrangement of the components of a relation. Any presentation that preserves those relationships and adds no extra ones is acceptable. An R-table is one possible such presentation. The problem is that people fixate on this one presentation, identifying it with relation. They then go even further and force the physical implementation of a relation to be table-like." -- David McGoveran

Read it all. (Please comment there, not here)

Sunday, January 24, 2016

This Week

Follow @DBDebunk Follow @ThePostWest

1. Quote of the Week

I don't know that I would say that the RM is dead per se, or even holding us back. But I have to ask... why the relational model? ...A Hierarchical Model would work. In fact if we look at these NoSQL databases ... Hierarchical Models work better than relational models. The point is that some of the factors which cause one to think in terms of relationships have changed. Disk is cheap. There are definitely problems where having a strongly typed and structured model make sense. Then there are problems where not having a rigid model make sense. Structure on read seems to make sense these days. PS... Relational models don't scale. Especially on MPP.

--LinkedIn.com

Re-Write

Follow @DBDebunk Follow @ThePostWest

See:

SEMANTICS, DATABASE RELATIONS, AND TABLES

Sunday, January 10, 2016

This Week

Follow @DBDebunk Follow @ThePostWest

1. Quote of the Week

Table (n.) – a collection of information (data?) describing a population of entities which possess some common characteristics, called attributes. Tables are the building block of relational databases. Tables must generally be “normalized,” at least to 1NF. That may be an appropriate way to think of databases when implemented in a modern day DBMS. However, it is not the way the world thinks logically. People have no problem with commonly occurring phenomena such as:
* A multi-valued attribute, e.g., an Employee possesses multiple Skills.
* Many-to-many (M:N) relationships, e.g., as between Employees and Projects
* A relationship with attributes
--Gordon Everest, Recognizing and Treating Tableitis

The Real Data Science Series: 1NF In Theory & Practice

Follow @DBDebunk Follow @ThePostWest

My January presentation for the San Francisco Microsoft Data Platform User Group:

The Real Data Science Series: 1NF In Theory & Practice

Wednesday, January 13, 2016 6:30 PM

Microsoft Reactor, 680 Folsom Street , San Francisco, CA (map)

Join & RSVP

Tell a friend

In the early 70's E. F. Codd provided a very precise, formal definition of a table in its normal form. Any table not normalized was in violation of the RDM and not considered a R-table. But you are unlikely to have encountered that definition. Instead you probably heard about "repeating groups", "simple domains" and "atomic values", neither of which are formal relational concepts. C. J. Date provided a 1NF definition different than Codd's. And you probably think that the same design principle underlies all normal forms, but 1NF is somewhat distinct.

This presentation introduces order and makes sense of all this, including the practical implications for SQL database practice. It is first in THE REAL DATA SCIENCE series (that includes papers and seminars) expounding the Codd-McGoveran relational model, distinct from Date-Darwen's.You will learn:

• Normalization vs. further normalization

• Repeating Groups

• Simple domains and atomic values

• SQL and 1NF

Sunday, January 3, 2016

NoSQL and SQL: A Plague on Both Their Houses

Follow @DBDebunk Follow @ThePostWest

Oracle Defends Relational DBs Against NoSQL Competitors prompted Does Oracle Really Understand NoSQL?, which was shared on LinkedIn and triggered a LinkedIn exchange in which I participated. The following comment is an adequate summary of the second article:

JN: It's an unfortunate bit of propaganda. Some truth mixed in with distractions and irrelevant comparison. I've met the DataStax team. They're smart people with a solid understanding of their space. I'm disappointed to see them mix good and bad information into something that looks like objective truth.

and the first is not much better. The interested reader can visit all three links. What I want to do here is amplify on some of my LinkedIn comments and add some.

Re-interpreting Codd: LOGIC FOR SERIOUS DATABASE FOLKS

Follow @DBDebunk Follow @ThePostWest

David McGoveran has been working on a book on logic for database professionals. He will post articles online for review that will become chapters in the book. He has just posted first 4 along with a revised "Series Introduction" (link below).

I thought it's worth sharing a short note he sent me on what prompted the book and what some of its objectives are.

Documents and Databases

Follow @DBDebunk Follow @ThePostWest

My December post @All Analytics.

Don't let the label fool you. It's the relational data model (RDM), not SQL that NoSQL proponents really are rejecting. The main argument, advanced in a recent LinkedIn exchange, is that lots of information "cannot be represented in rows and columns". The implication is that the RDM is not general enough -- there are certain types of information that it cannot represent. The response from my colleague in RDM David McGoveran, is important enough to restate here.

Read it all. (Comment there, not here, please. Thanks.)

Happy new year!

Sunday, December 20, 2015

Weekly Update

Follow @DBDebunk Follow @ThePostWest

1. Quote of the Week

According to Wikipedia, Amazon's Redshift is a modified version of Postgres.
Maybe its speed redshifted data integrity into a bloody mess.

It has no primary keys, foreign keys or unique constraints. It just has optimizer hints in the DDL that *maybe* the data behaves that way. If they want to put hints in the DDL, OK, but don't call those hint PRIMARY KEY, FOREIGN KEY or UNIQUE.
--Unsupported PostgreSQL Features - Amazon Redshift

I don't think any version of Postgres lets you say Create Table Foo (Bar Int PRIMARY KEY) and then let you do

INSERT INTO Foo (BAR) Values (42)
INSERT INTO Foo (BAR) Values (42)
INSERT INTO Foo (BAR) Values (42)
That might be fine for a one time only static data warehouse, but an ongoing data update system is going to break the integrity rules, it is just a matter of time. That would make for some surprises when someone decides to migrate their data and lots of application code from a relational DBMS to Redshift.
--Jeff Winchell

Moving in Circles: RDBMS-SQL Conflation & Logical-Physical Confusion

Follow @DBDebunk Follow @ThePostWest

In my last post I demonstrated how disregard for the scientific foundation and history of a field, here, database management, leads to Moving in Circles. The piece I debunked was by CTO of VoltDB, one in the "VVV" series of products by Michael Stonebraker (MS). I've recently come across The Traditional RDBMS Wisdom is All Wrong, a presentation by the man himself, that reinforces my point.

Weekly Update

Follow @DBDebunk Follow @ThePostWest

To all my Jewish readers:

I was making minor revisions to Surrogate Key Illusions when I ran across this: How To Find Duplicate Addresses For A PhysicianRowID.

1. Quote of the Week

The entire Information Technology industry is still stuck in 1971 with Dr. Codd's 3rd Normal form ... when 3 more exist and now a 7th or (N) Normal Form ... Dr. Codd, created, devised, extrapolated 6 forms of Data NORMALIZATION, and after 45 years (1971), every single Database system or information management system to date has not exceeded the 3rd normal Form... except AtomicDB, (N) Normal Form improving on Dr. Codd's work by 4 levels(dimensions)at the very least ... There was and is a method to his theory, that with each level of Normalization brings a geometric increase in "Efficiency" and scalability, however, no one has even attempted the restrictions of the 4th normal form (no duplicates) let alone the 5th or 6th, and now with Dr. Everett’s (N)th Normal form, we can do anything the human mind can devise on a computer and in nearly real time. --Jean Michel LeTennier, CTO, Atomic Database Corp.

Re-write

Follow @DBDebunk Follow @ThePostWest

See:

Science, "Data Science", and Database Science

Sunday, November 22, 2015

Weekly Update

Follow @DBDebunk Follow @ThePostWest

Housekeeping: Added If a table with a SK has a NK does it violate 3NF? to LINKS page.

1. Quote of the Week

One other intriguing benefit of NoSQL that I started to unwitting benefit from recently is the ability to push data scheme concerns entirely to the application layer. In this scenario, the applications use a NoSQL database predominantly as a storage service, lightly structured by a few indexed key fields. The object structured data document within the payload becomes transparent to the database. The applications then assume the role of enforcing and understanding the data scheme.

This approach allows the application architect to encode the data structures and meaning directly in the code that creates and consumes the data. So data structure changes required for functional updates can be implemented, tested, and deployed in the application code base with no updates to the database layer at all. (Of course, a conversion of existing NoSQL data documents may be required in situations.)

In this NoSQL approach, the removed translation of object data scheme to a relational structure and, and then back to an object structure again is a very welcome relief as well. --use-the-index-luke.com

Moving in Circles: SQL for NoSQL

Follow @DBDebunk Follow @ThePostWest

We've been there, done that.

In Coming Full Circle: Why SQL now powers the NoSQL Craze Ryan Betts, CTO at VoltDB, argues that NoSQL products should adopt SQL for queries. I don't know about you, but to me it looks like a contradiction. Let me make it clear that my intention here is neither to defend SQL, nor to criticize it--I sure have done enough of that during the years--but rather strictly debunk the notion that its use with NoSQL systems is a good idea.

Weekly Update

Follow @DBDebunk Follow @ThePostWest

Housekeeping: Added

to LINKS page.

RAQUEL

to SOFTWARE section on Home page.

1. Quote of the Week

After attending NoSQL conference I am really hoping that companies think through this 'big data' implementation! No one there was interested in Data model ... and said so ... forget the data model ... even 'standards' were looked at 'its 'too early' for this new technology ... and no one could tell us anything about 'meta-data'...!!!!! --LinkedIn.com

Re-write

Follow @DBDebunk Follow @ThePostWest

See series starting at:

The Principle of Orthogonal Database Design Part I

Sunday, October 25, 2015

Weekly Update

Follow @DBDebunk Follow @ThePostWest

1. Quote of the Week

The ER notation consists a set of constructs, such as, a rectangle to represent an entity types, an ellipse to represent an attribute a diamond to represent a relationship type, and so on. The RDS is a set of linear relation schemas. A relation schema has a name and is a sequence of attributes of text separated by comas and arranged horizontal. I have also developed an ER-to relational transformation algorithm to transform an ER schema to its corresponding RDS. I wish to implement this project as a CASE tool. --LinkedIn.com

The problem is not the student, but the quality of education he gets at his university.

Mathematics & Meaning

Follow @DBDebunk Follow @ThePostWest

My October column @AllAnalytics.

We have seen that the usefulness of the relational data model (RDM) is in its dual theoretical foundation -- first order predicate logic (FOPL) and set theory -- the mathematics guarantees provable logical correctness of query results regardless what meaning is assigned to the database R-tables. But a logically correct result is not necessarily a meaningful result. As a reader commented "If a is Salary and b is Start date, the [Cartesian] product a*b still doesn't have a sensible meaning." It is critical for the analyst to understand the distinction.

Read it all. (Please comment there, not here)

Sunday, October 18, 2015

Re-write

Follow @DBDebunk Follow @ThePostWest

See series starting at

The First Normal Form in Theory and Practice, Part I

Sunday, October 11, 2015

Weekly Update (UPDATED)

Follow @DBDebunk Follow @ThePostWest

Housekeeping: Added following to LINKS page:

1. Quote of the Week

"The real definition of Big Data?? Simple: Whatever does not fit in Excel!"

Me: What is--precisely, pls!--the threshold from small to big data? And how do the structural, manipulative and integrity aspects change over the threshold?

HM: Big data := the smallest set of data for which the sensitivity of your transfer function is minimal, and such that the cardinality of this set is too large to implement said transfer function on a single physical machine.Transfer function := though not strictly a function, as in the Lambda calculus, it is more of an operator which ingests data of any size and produces a monetizeable product or service. Sensitivity := the degree to which a perturbation on the input into a transfer function affects the result produced by said function.

Me: Ugh! I'm sure that's exactly what all the overnight data scientists, including those who invented Big Data, had in mind.--LinkedIn.com

Database Education: Oughts and OughtNot's

Follow @DBDebunk Follow @ThePostWest

From an online exchange in response to A Tiny Intro to Database Systems:

C: As a non-CS grad coming fresh to databases, I found both the entity-relationship, and the object-oriented models confusing. Then I read Date [1] and Codd's [2] books and papers on the relational model, the one from the 1970s that is basically set and type theory applied to data, and found that to be a lot clearer and a more powerful abstraction to deal with your data model. As a non-"full time developer" it amazes me the number of "experienced" developers who are not aware of the relational model and who do not know what a foreign key is, or why referential integrity might be important.

For example, your Relational Model introduction has a discussion of various data types. But arguably, whether your integer is implemented as BIGINT or TINYINT is an implementation decision which should be separate from the model discussion (dixit Date). In other words, that attribute has a type of integer and how that integer is stored is a separate issue, and your RDBMS ought to abstract it away (as, I think, Postgres is pretty good with, and MySQL quite annoying). The beauty of the latest RDBMS developments, particularly in PostgreSQL world, is that the implementation has gotten so good that you don't need to really worry about it like you used to just a decade ago, at least in 95% of use cases.

I think one can teach SQL (and the relational model) to a non-developer in about 2 hours, because it is so declarative and intuitive. One day I'll go write that tutorial, as many clients need it sorely.

Weekly Update

Follow @DBDebunk Follow @ThePostWest

UPDATE: I have posted, via David McGoveran, an update to last week's post on Codd's 12 rules.

Reactions to my presentation "The Real Science: Tables- So What?" to the Silicon Valley SQL Server User Group.

With regards to Language Redundancy and DBMS Performance: A SQL Story:

1. Quote of the Week

... the challenges inherent in the SQL RDBMS [sic] approach ... the constrained schema (or schema-first) approach of SQL RDBMS engines imposes semantic infidelity rather than fidelity on all applications and services that depend on this RDBMS type, solely ... SQL RDBMS engines (as per what I've outlined above) do impose a "one size fits all" constraint on DBMS driven apps and services that manifests as the "data variety issue" outlined by the "Big Data" meme.
--LinkedIn.com

The Real Data Science: Tables -- So What?

Follow @DBDebunk Follow @ThePostWest

My September post @AllAnalytics.

We have seen that if database tables are designed to represent a set of (facts about) a single class of attribute-sharing entities each and to preserve the mathematical properties of relations, databases are easier to understand, and query results are guaranteed to be provably correct and easier to interpret. Let's see how and why with the help of an example.

Read it all. (Please comment there, not here)

Sunday, September 20, 2015

Re-Write

Follow @DBDebunk Follow @ThePostWest

Understanding Codd's 12 Rules for RDBMS

Sunday, September 13, 2015

Weekly Update

Follow @DBDebunk Follow @ThePostWest

The Real Data Science: Tables--So What?

My Presentation to Silicon Valley SQL Server User Group

6:30 PM, Tuesday, September 15, 2015

Microsoft
1065 La Avenida, Building 1
Mountain View, CA

Free and open to the public (+ pizza)
For details and RSVP see Meetup.

1. Quote of the Week

You see, in Cassandra 1.x, the data model is centered around what Cassandra calls “column families”. A column family contains rows, which are identified by a row key. The row key is what you need to fetch the data from the row. The row can then have one or more columns, each of which has a name, value, and timestamp. (A value is also called a “cell”). Cassandra’s data model flexibility comes from the following facts:
* column names are defined per-row
* rows can be “wide” — that is, have hundreds, thousands, or even millions of columns
* columns can be sorted, and ranges of ordered columns can be selected efficiently using “slices”.
--http://blog.parsely.com/post/1928/cass/

Compare this to the RDM.

2. To Laugh or Cry?

3. Online Debunkings

4. Elsewhere

5. And now for something completely different

Re-write

Follow @DBDebunk Follow @ThePostWest

See series starting at

The First Normal Form in Theory and Practice Part I

Sunday, August 30, 2015

Weekly Update

Follow @DBDebunk Follow @ThePostWest

The Real Data Science: Tables--So What?

My Presentation to Silicon Valley SQL Server User Group

6:30 PM, Tuesday, September 15, 2015

Microsoft
1065 La Avenida Building 1
Mountain View, CA

Free and open to the public (+ pizza)
For details and RSVP see Meetup

1. Quote of the Week

[Do] formalized languages need the definition of data types? Up to now I have not read strong arguments against my statement that for interpretation and operation on data the use of character strings is sufficient when

All data are expressed as character strings that are explicitly based in language communities, whereas the character strings denote concepts that are represented by UID's;

The denoted concepts are defined by their supertype concepts (among others);

Collections of allowed qualitative concepts (that are denoted by string values or value ranges) are defined to enable the specification of constraints; --LinkedIn.com

2. To Laugh or Cry?

3. Online Debunkings

4. Elsewhere
Which technologies emerge from the abyss
Why Big Data gets it Wrong

5. Housekeeping Added to LINKS page:

Query Optimization
Relational Algebra
LEAP RDBMS
Relational
Relational Algebra Translator
System for Translating Relational Algebra Scripts into Microsoft SQL Server SQL Scripts
System for Translating Relational Algebra Scripts into Oracle SQL Scripts

And now for something completely different

Saturday, August 22, 2015

Silicon Valley SQL Server User Group Presentation

Follow @DBDebunk Follow @ThePostWest

The Real Data Science: Tables -- So What?

During hyping of fads such as "Data Science", all you hear is the "huge opportunities for enterprises to gain hitherto unimagined insights" and very little about the potential to tell enterprises really big lies, which can rise from 100% correct data in poorly designed databases. That's because what passes for "Data Science" is not science, let alone science of data.

Most data professionals know that relational databases consist of tables, but so what? Provably correct query results are guaranteed by the real data science--the RDM--if and only if tables are well-designed and properly constrained R-tables and the DBMS truly and fully supports it. Unfortunately, more often than not tables are neither, and SQL DBMS's don't, which makes databases harder to understand, queries don't always make sense and results are hard to interpret, or outright wrong.

You will learn:

The Real Data Science
Relations and databases
5NF R-tables
"Table arithmetic"
RDM and SQL

6:30 PM, Tuesday, September 15, 2015

Microsoft
1065 La Avenida Building 1
Mountain View, CA (map)

For details see Meetup.

POSTS

Sunday, May 15, 2016

Monday, May 2, 2016

Tuesday, April 26, 2016

Tuesday, April 19, 2016

Sunday, April 17, 2016

1. What's wrong with this picture?

Sunday, April 10, 2016

Sunday, April 3, 2016

Friday, March 25, 2016

Sunday, March 20, 2016

Wednesday, March 9, 2016

Sunday, March 6, 2016

Tuesday, March 1, 2016

Sunday, February 21, 2016

Sunday, February 14, 2016

Sunday, February 7, 2016

Wednesday, January 27, 2016

Sunday, January 24, 2016

Sunday, January 17, 2016

Sunday, January 10, 2016

Monday, January 4, 2016

Sunday, January 3, 2016

Wednesday, December 30, 2015

Wednesday, December 23, 2015

Sunday, December 20, 2015

Sunday, December 13, 2015

Sunday, December 6, 2015

1. Quote of the Week

Friday, November 27, 2015

Sunday, November 22, 2015

Sunday, November 15, 2015

Sunday, November 8, 2015

Sunday, November 1, 2015

Sunday, October 25, 2015

Wednesday, October 21, 2015

Sunday, October 18, 2015

Sunday, October 11, 2015

Sunday, October 4, 2015

Sunday, September 27, 2015

Tuesday, September 22, 2015

Sunday, September 20, 2015

Sunday, September 13, 2015

Sunday, September 6, 2015

Sunday, August 30, 2015

Saturday, August 22, 2015

The Real Data Science: Tables -- So What?