Saturday, October 19, 2019

Brother, Spare Me the "Paradigms"

Note: This is a revised version of an old column @All Analytics in response to a recent LinkedIn exchange (check out my comments in the exchange).
“Consider dimensional design and Big Data as two additional paradigms ... Big Data paradigms like Hadoop and NoSQL will alleviate the temptation people have to try to use the relational database in unnatural ways.”
Every few years (and the intervals are getting shorter) a "fundamentally different" new way of doing data management -- a "paradigm shift" -- is promoted, with the warning that if you don't adopt it you’ll be "left behind". In the above-mentioned online exchange it is argued that data management is undergoing a paradigm shift from application-centric to data-centric data management. For the very few who (1) understand what a paradigm is and (2) are familiar with data fundamentals and the history of the field, the irony could not be richer.

Up to 2018, DBDebunk was maintained and kept free with the proceeds from my @AllAnalytics column. In 2018 that website was discontinued. The content of this site is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help maintain it by purchasing publications or donating. Thank you.

  • 08/09/19: Following my series of posts on data sublanguage (Parts 1-4), I have revised for consistency the corresponding section of paper #2 in the Understanding the Real RDM series, Logical Access, Data Sublanguage, Kinds of Relations, and Database Redundancy and Consistency, which is available for ordering from the PAPERS page.


  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017, were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics may work.


I deleted my Facebook account. You can follow me:

  • @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture posts, and my exchanges on LinkedIn.
  • @The PostWest blog: Evidence for Antisemitism/AntiZionism – the only universally acceptable hatred – as the (traditional) response to the existential crisis of decadence and decline of Western (including the US)
  • @ThePostWest Twitter page where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

Paradigm shift is Thomas Kuhn’s account of -- note very carefully -- scientific progress. A paradigm is an exemplar of a broad-scope theory (let's call it "super-theory") that scientists in a particular scientific field admire and emulate. A field in a pre-paradigmatic state is characterized by disunity of purpose and method. A paradigm characterizes what Kuhn calls "normal science" -- agreement on what research should be done and how. When anomalies -- problems that the research driven by the super-theory cannot account for -- accumulate, confidence in the super-theory erodes and the field undergoes a crisis. A new super-theory emerges that accounts for what the old one did as well as for the anomalies, and normal science returns until the next crisis[1]. Darwin’s theory of evolution is a well-known paradigm; a paradigm shift occurred from Newtonian to Einsteinian physics.

Before the 1970s, what we today consider data management was in what can be viewed as a pre-paradigmatic state: no theoretical framework, only applications and application-specific data files (the infamous "islands of information"), and practically nonexistent data integrity, security, and concurrency control. Database management emerged as a response to these problems:

  • Application-specific data files were replaced by "neutral" databases shared by multiple concurrent applications;
  • Data management functions were centralized in a DBMS, leaving applications responsible for communication with users and presentation of results.
These changes eliminated redundancy and reduced development and maintenance burdens.

Note that the shift from an application-centric to a data-centric approach was initiated in the 1970s. But it cannot per se be considered paradigmatic in the Kuhn sense, because it lacked the primary ingredient -- a theoretical foundation: the first generation of DBMSs and databases -- hierarchic and network (CODASYL) -- was abstracted in an ad hoc manner from existing practice (and subsequent attempts to postfit directed graph theory proved too complex and inflexible in practice).

It's the formal theoretical grounding of the relational data model (RDM) in simple set theory (SST) expressible in first order predicate logic (FOPL) that put database management on a scientific basis. Otherwise put, while the shift to database management was data-centric, the paradigm shift was to relational database management! It's the RDM that ensures, among multiple practical advantages, the critical one of correctness of data and conclusions derived from them -- system-guaranteed logical validity and by-design semantic consistency[3].
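To make the set-theoretic grounding a little more concrete, here is a minimal Python sketch. It is an illustration only, not an implementation of the RDM and not any DBMS's API. It assumes a relation can be modeled as a set of tuples, a constraint as a predicate that every tuple must satisfy, and restriction and projection as ordinary set operations. All names (`Relation`, `row`, `restrict`, `project`, the employee data) are hypothetical.

```python
from typing import Any, Callable, Dict, FrozenSet, Iterable, Tuple

# Hypothetical modeling choice: a tuple is a frozenset of (attribute, value)
# pairs, so a relation can be a genuine set (no duplicates, no ordering).
Row = FrozenSet[Tuple[str, Any]]
Predicate = Callable[[Dict[str, Any]], bool]


def row(**attrs: Any) -> Row:
    """Build one tuple from attribute=value pairs (values must be hashable)."""
    return frozenset(attrs.items())


class Relation:
    """A set of tuples, all of which must satisfy a declared constraint."""

    def __init__(self, rows: Iterable[Row], constraint: Predicate = lambda r: True):
        self.constraint = constraint
        self.rows = frozenset(rows)  # set semantics: duplicates collapse
        for r in self.rows:
            if not constraint(dict(r)):
                raise ValueError(f"constraint violated by {dict(r)}")

    def restrict(self, predicate: Predicate) -> "Relation":
        """Keep the tuples for which the predicate is true."""
        kept = (r for r in self.rows if predicate(dict(r)))
        return Relation(kept, self.constraint)

    def project(self, *names: str) -> "Relation":
        """Keep only the named attributes; the result is again a set."""
        return Relation(
            frozenset((a, v) for a, v in r if a in names) for r in self.rows
        )


# Illustrative data, invented for this sketch.
employees = Relation(
    [
        row(name="Ann", dept="R&D", salary=100),
        row(name="Bob", dept="R&D", salary=90),
        row(name="Cy", dept="Ops", salary=80),
    ],
    constraint=lambda r: r["salary"] > 0,
)

departments = employees.project("dept")
rnd_staff = employees.restrict(lambda r: r["dept"] == "R&D")
```

Because results are sets, projecting the three employees on `dept` yields two tuples rather than three, and an attempt to construct a relation containing a tuple that violates the constraint is rejected up front. The sketch leaves out domains, keys, joins, and everything else a real data sublanguage would need.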

Yet this is the very aspect that has been almost entirely missed and is mostly dismissed by the industry. Failure to appreciate its importance is responsible for the flaws of SQL, the only attempt to implement the RDM[4,5,6,7]. It is a testament to the RDM's theoretical foundation that even SQL's poor relational fidelity rendered it superior not only to all that preceded it, but also to the fad-driven[8] "cookbook"[9] practice of the pre-paradigmatic state and the application-specific databases and DBMSs[10,11] to which, sadly, the field has regressed. Documents and NoSQL are not paradigms; the move to the cloud is not a paradigm shift; SQL is not a programming paradigm[12]. Besides, multiple "simultaneous paradigms" are a contradiction in terms, certainly not even science, let alone normal science.
Those who forget the past...

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


[1] Kuhn, T., The Structure of Scientific Revolutions

[2] Pascal, F., Graph Databases: They Who Forget the Past...

[3] Pascal, F., Logical Validity and Semantic Correctness

[4] Pascal, F., SQL Sins

[5] Pascal, F., Language Redundancy and DBMS Performance: A SQL Story

[6] Pascal, F., To Really Understand Integrity, Don't Start with SQL

[7] Pascal, F., DISTINCT and ORDER BY Are Not Relational

[8] Pascal, F., Data Fundamentals, Fads and Big Data

[9] Pascal, F., The Cookbook Approach to Data Management

[10] Pascal, F., The Trouble with Data Warehouse Analytics

[11] Pascal, F., Denormalization for Performance Don't Blame the Relational Model

[12] Pascal, F., Data Sublanguage Parts 1-4
