Saturday, January 8, 2022

OBG: No Understanding Without Foundation Knowledge Part 2 -- Debunking an Online Exchange 1



Note: To demonstrate the soundness and stability conferred by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06) DBDebunk.com, so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. In re-publishing I may revise, split or merge parts, and add comments and/or references, which I enclose in square brackets.

In Part 1 I debunked a review of my third book, which had triggered an exchange @SlashDot.org in reaction to an article of mine @DBAzine.com. This and forthcoming Parts 3 and 4 were my debunkings of that exchange, in which a W3C XML committee member and an academic -- who ought to have known better -- participated. 

------------------------------------------------------------------------------------------------------------------


Slashing a Slashdot Exchange Part 1

(first published @DBAzine.com in 2001)

Unfortunately, the SlashDot.org exchange debunked here is representative of the level of discourse in the industry.

“Fabian Pascal is smart and well-informed, but a zealot. Like all zealots he is willing to sacrifice anything and everything for his vision of technical purity.”
A "smart and well-informed zealot" is almost a contradiction in terms. But labeling someone a "zealot" is a common way to marginalize the proponent of arguments that cannot be countered (see "Lenin, Trotsky, and Freedom from the Tyranny of Knowledge and Reason"). More importantly, what does it say about an industry (and, worse, society) that considers insistence on science zealotry? What exactly am I "sacrificing"? Does "technical purity" mean we should compromise on logic and mathematics in database management? Isn’t that exactly what we should not sacrifice?
“This quote needs a position in the library of intellectual arrogance as well:
‘Indeed, data/information management requires some organizing principle, that is, structure; anything "unstructured" -- and many in the industry promote XML for that purpose -- is not data, but meaningless random noise that carries no information.’
A snit crassly dismisses several millennia of literature because it is unstructured. Quite frankly, meaning and structure are independent of each other. It is possible to find meaning in things with radically different structures. It is true that there is a correlation between structure and the ability to communicate meaning, but a healthy mind can find meanings in things that have not been normalized. Likewise, you can have meaningless garbage in relational databases. A case in point is the large number of fake web sites that do things like join the FIPS database to product names so that they can have millions of pages that show up in search engines. Likewise, we see academician filling volume after volume of publications with meaningless tripe.”
Insults are another common way of distracting from issues that one lacks the knowledge and reasoning to address. [In full display here are lack of foundation knowledge, failure to understand arguments and poor reasoning.]

I do not "dismiss literature" (i.e., text); quite the opposite. I never claimed that meaning depends on a specific structure, but I do insist that in the absence of any structure there is only meaningless random noise, not data. It is the industry, not I, that erroneously refers to text as "unstructured data", which, strictly speaking, is a contradiction in terms. What is usually meant is "not structured in a specific way" (relationally, hierarchically), but even that does not mean text cannot be structured that way. Text has a complex multistructure that can be analyzed, and any one of its structures can be selected for relational or hierarchical representation if the effort/cost is justified! The relational structure is the database relation, which is both normalized (in 1NF) and fully normalized (in 5NF). It offers multiple critical advantages for manipulation and integrity due to its grounding in simple set theory and first order predicate logic (SST/FOPL), foremost of which is soundness -- system-guaranteed logical validity and by-design semantic consistency. [The fake web sites are excellent examples of misuse of relational terminology.]

“I read this and pretty much gave up getting anything of value out of this article -- I hadn’t understood much that went before it, though my distrust of all things XML had led me to believe this guy might know what he's talking about. If you removed NULLs from relational database design, people would reinvent them (poorly) -- probably by using IDs of -1 or 0, or IDs to a special magic ‘null’ row, which I suspect is what he's talking about by ‘it can be handled relationally.’ To suggest that missing or inapplicable values are not part of ‘the real world’ is so wrong it’s... well... wrong. Anyone who’s actually done database work (or programming work, for that matter) knows this.”
If he does not understand, how does he know I am wrong?

The current generation of data practitioners is unfamiliar with the history of the database field, so reinventions of (square) wheels would hardly be a surprise -- those who do not know the past are condemned to repeat it. The author seems unaware that default (i.e., exception) values have been in use for decades and that SQL NULL was an attempt to get rid of them because they are problematic. Unfortunately, NULL is equally problematic, but understanding that requires foundation knowledge, and the use of the term "NULL value" betrays the absence of such. [As I explain in The Last NULL in the Coffin: A Relational Solution to Missing Data, problems with NULL stem precisely from the fact that it is not a value, but a marker for the absence of a value. When we say "there are no NULLs in the real world" we mean that missing data -- which exists in the real world -- should not be confused with SQL's flawed way of representing it in the database, which does not. The paper describes the correct relational solution without NULL or any "special magic ‘null’ row", whatever that means.]
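To make the point concrete, here is a minimal sketch of how both a sentinel "default value" and an SQL NULL distort ordinary queries. It uses Python's built-in sqlite3 module as a stand-in for an SQL DBMS; the tables emp_null and emp_sentinel, the bonus column and the -1 sentinel are hypothetical, chosen only for illustration. It shows the two problems, not the relational solution described in the paper.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient SQL DBMS stand-in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: the same facts, with cy's bonus missing.
    CREATE TABLE emp_null     (name TEXT PRIMARY KEY, bonus INTEGER);
    CREATE TABLE emp_sentinel (name TEXT PRIMARY KEY, bonus INTEGER NOT NULL);
    INSERT INTO emp_null     VALUES ('ann', 100), ('bob', 300), ('cy', NULL);
    INSERT INTO emp_sentinel VALUES ('ann', 100), ('bob', 300), ('cy', -1);
""")

def scalar(sql):
    """Run a query that returns a single row with a single column."""
    return conn.execute(sql).fetchone()[0]

total = scalar("SELECT COUNT(*) FROM emp_null")

# Under SQL's three-valued logic a comparison with NULL is neither true
# nor false, so cy's row satisfies neither predicate below.
eq = scalar("SELECT COUNT(*) FROM emp_null WHERE bonus = 100")
ne = scalar("SELECT COUNT(*) FROM emp_null WHERE bonus <> 100")

# Even "bonus = bonus" does not hold for the row marked with NULL.
self_eq = scalar("SELECT COUNT(*) FROM emp_null WHERE bonus = bonus")

# The sentinel is treated as a real value and silently skews the average;
# with NULL the row is silently dropped from the aggregate instead.
avg_sentinel = scalar("SELECT AVG(bonus) FROM emp_sentinel")
avg_null = scalar("SELECT AVG(bonus) FROM emp_null")

print(total, eq, ne, self_eq, avg_sentinel, avg_null)
```

In this sketch the rows with bonus = 100 and bonus <> 100 together do not add up to the number of rows in the table, and the two encodings of the same missing fact yield different averages. Neither result is flagged by the system; the user has to know about the marker or the sentinel to interpret the answers correctly.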
“In other words, Dr. Codd was a brilliant theoretician, but a lousy marketer and packager. We just have to agree on and/or find relational operators and syntax that we find more intuitive than those in the original papers. Sometimes I feel that "look-up" would be more intuitive than "Join", for example. Relational as a practice is still young.”
Let me get this straight: the person who, for the first time, put database management on a scientific basis, is criticized for not being a better marketeer? The marketeers have given us multivalued, object, SQL and XML DBMSs rather than true RDBMSs -- that's why we get a new "paradigm"... er, fad every other year. [Codd denounced SQL forcefully and proposed his own data sublanguage, but in the absence of foundation knowledge nobody paid attention -- to this day the industry believes SQL DBMSs are relational and blames the RDM for what are SQL's violations thereof.]
“Basically what I take from this is that the table (e.g. SELECT * FROM foo) is simply a convenient logical representation of a stored relation. That is to say, foo can be implemented by the DBMS as a linked list, a tree, any data structure. True. However, this encoding is usually very inconvenient (consider representing an HTML document or a structured piece of literature in this manner). Besides this, nested structures are at least as logical as flat structures (I continue to call them flat because they *are*). Relational database logic is merely a fragment of first order predicate logic, one that is restricted to - guess what - flat relations, whereas first order predicate logic usually works with *nested* structures (called terms) and relations. XML and other nested data structures fit very well into logic, and in fact we (a research group in Munich and some other places) are working on a logic-based query language that exploits this similarity. I agree with many of the statements that the author of the article makes, in particular regarding XQuery. However, some are so arrogant and *unproven* that it leaves the article in a bad light. Also, while he claims to have a good insight into database theory, I don't think he really has. SQLs big advantages are (1) it is easy to use and (2) it has a very limited expressive power which makes it easy to implement and efficient to evaluate. Other approaches have been considered, e.g. in deductive databases or knowledge base systems. However, those needed languages that were basically Turing complete or at least supported basic recursion (to implement transitive closure) and thus could lead to very inefficient queries.”
I am accused of zealotry, arrogance, unproven statements and ignorance, when the amount of drivel packed into this paragraph is astounding -- some patently false, inconsistent, or meaningless. How does one respond to such drivel?

[Worse than arrogance is arrogant ignorance (see "Unskilled and Unaware of It").]

[Note on re-publication: Speaking of ignorance:

“the table (e.g. SELECT * FROM foo) is simply a convenient logical representation of a stored relation.”
“[table] can be implemented by the DBMS as a linked list, a tree, any data structure ... very inconvenient (consider representing an HTML document or a structured piece of literature in this manner).”
“Relational database logic is merely a fragment of first order predicate logic, one that is restricted to - guess what - flat relations whereas first order predicate logic usually works with *nested* structures (called terms) and relations.”
“XML and other nested data structures fit very well into logic, and in fact we ... are working on a logic-based query language that exploits this similarity.”
“those needed languages that were basically Turing complete or at least supported basic recursion (to implement transitive closure) ... could lead to very inefficient queries.”
This would take several pages to debunk -- time permitting I will do so in a separate post, but can you tell what's wrong with them? (I am going to disregard the claims about SQL, but I do ask where exactly are those "deductive databases or knowledge base systems"?)]

(Continued in Part 3)



