Friday, November 19, 2021

THE FATE OF FADS: XML DBMS (obg)



Note: To demonstrate the correctness and stability that a sound theoretical foundation provides, relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.

Remember XML DBMS? At one point it was the fad of the day, similar to today's NoSQL or the old-new "knowledge graph" -- "the future that you ignored at the peril of being left behind". As I predicted, it went the way of all fads (ODBMS, Associative DBMS, you name them), together with their "data models" that were nothing of the sort. My prediction was grounded in the same sound foundations I rely on today -- which, unlike the industry, we are advancing -- foundations that fads lack and that were, and still are, dismissed, evidence be damned.

Here's a typical example (comments on republication in square brackets).


Comments on an Exchange at XML-DEV.COM

(originally published on 11/03/2001)

In summarizing "some of the comments made by XML-DEV members in response to recent critical article[s] on the relationship between XML and databases" in my Against the Grain series, Leigh Dodds validates my reply to the series editor, who asked me to respond to yet another such reaction to my articles on XML: "I read the reaction and, as I guessed, it is very weak. The fact that it’s not flaming and personal does not mean that it’s based on sufficient knowledge."

Few practitioners have formal education [and foundation knowledge]. They rely on "common sense" and experience, and that is simply not enough. That is precisely why the state of data management is so horrendous. Responding to each and every rebuttal of this nature is a never-ending and mostly useless task, because they never learn and misconceptions persist. Bear in mind that my three-part article was itself a response to another reaction to a yet earlier article, which makes it obvious that that article has not helped any -- the new reaction exhibits some of the fallacies I had already debunked. And they keep coming, regardless of my response. [To this day, of course.]

Champion:

“My biggest question after reading his stuff is ‘If the pure relational model is so powerful, why have the RDBMS vendors, presumably driven by customer demand, supported ‘post-relational’ Object-Relational and XML features in their recent releases?’ I personally doubt if ‘ignorance’ is the answer. I keep hoping that there is some middle ground where the rigorous mathematics of the relational model and the pragmatic usability of XML can meet and inform one another. In private correspondence, Mr. Pascal assured me that a truly mathematical model of XML is impossible, but I’m keeping an open mind.”
Aside from simple set theory (SST), the second theoretical foundation of the RDM is first order predicate logic (FOPL). As explained in my books, articles, posts and teaching:

  • A database is a set of axioms; the response to a query is a theorem; the process of deriving the theorem from the axioms is a proof; and a proof is made by manipulating symbols according to agreed mathematical rules. The proof, of course, is only as sound and consistent as the rules are.
  • A DBMS is a deductive logic system: it derives new facts (query results) from a set of user-asserted facts (the database); the derived facts are true (query results are correct -- [logically valid and semantically consistent]) if and only if the initial assertions are true and the derivation rules are (logically) sound.
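
To make the axioms-and-theorems reading concrete, here is a minimal Python sketch. The relation names, attributes and data are hypothetical, invented for illustration only. Plain Python sets stand in for relations; this is not how any DBMS is implemented.

```python
# Hypothetical asserted facts (axioms). Each tuple asserts that a proposition is true:
# EMPLOYEE(name, dept): "employee <name> is assigned to department <dept>"
EMPLOYEE = {("Ann", "Sales"), ("Bob", "R&D")}
# DEPARTMENT(dept, city): "department <dept> is located in <city>"
DEPARTMENT = {("Sales", "NYC"), ("R&D", "SF")}


def works_in_city(employee, department):
    """Derivation rule: join on dept, then project on (name, city).

    Each derived tuple is a theorem: "employee <name> works in <city>".
    It is true only if the facts it was derived from are true.
    """
    return {
        (name, city)
        for (name, emp_dept) in employee
        for (dept, city) in department
        if emp_dept == dept
    }


derived = works_in_city(EMPLOYEE, DEPARTMENT)

# A false assertion is propagated just as faithfully as a true one:
# the rule is sound, but the "theorem" derived from a false axiom is false.
bad_axioms = EMPLOYEE | {("Cid", "Sales")}  # suppose Cid is not really in Sales
derived_from_bad = works_in_city(bad_axioms, DEPARTMENT)
```

The second call illustrates the "if and only if" in the bullet above: the rule itself did not change, yet the result now contains a fact about Cid that is only as trustworthy as the assertion it came from.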

Whether he realizes it or not, Champion is looking for a way to compromise logic for practical purposes. In fact, he provides an excellent example of how practical it is to flout logic: the notion that if vendors don’t do something, it can only mean it’s either not worth doing, or it cannot be done, is faulty logic. [There is no demand to drive vendors if, like Champion, users and vendors are uneducated on RDM, lack foundation knowledge, and are constantly exposed to misconceptions in the industry, media and online. As to mathematics, see the comments on graph theory below; an open mind is fine, but not so open that your brain falls out.]

Dodds:

“Presumably these features are being added because customers are keen to use their data in different ways; for example, in closer conjunction with business objects or to store different kinds of data that don’t fit cleanly into a relational system. Documents are an obvious example, and the Web is a gold mine of semi-structured data just begging to be usefully manipulated. Much of the XML database and query work is geared toward exploiting this information. And as Joshua Allen observed, while relational databases have been steadily optimized for many years, research on semi-structured data is only now becoming mainstream.”
An excellent example of the same fallacy that I demonstrated with respect to ODBMS (OO for Application Development, Not Database Management and OODBM vs. RDBMS: To a Hammer Everything Looks Like Nails). [Docubases do not serve the same purpose as databases.] That aside, can Dodds provide a formal, precise definition of "semi-structured data"? Can he explain why documents "do not fit cleanly into a relational system"? And when he fails to do so, I suggest he read Chapter 1, Careful What You Wish For: Data Types and Complexity, in PRACTICAL ISSUES IN DATABASE MANAGEMENT. [You may recognize the same old "don't fit in RDM" in the arguments advanced today for NoSQL DBMS. Never mind that data (including documents) is by definition structured. Documents are not structured relationally, but that does not mean they cannot be so structured, if the purpose is to make inferences of the kind RDM enables. What should be avoided is the quite common illusion that that purpose can be achieved without such structuring.]

Allen:
“The only reason that RDBMS software dominates the market right now is because we are good at solving these problems, and RDBMS design has evolved to disallow users from asking questions that the database [sic] isn’t good at answering. The fact that we ship databases [sic] that only permit things that we know how to answer efficiently does NOT imply that we will never be able to answer other questions more efficiently (in fact, RDBMS systems have evolved and gobbled up much of the research on data warehousing to include those techniques into the engines -- witness materialized views and bit-mapped indexes). It is quite easy to see a trend in the industry that shows consistent continual progress at solving hard query problems. Of course some problems will always be hard (distributed cost-based query optimization is one), but I would point out that research on RDBMS optimizations has tapered off quite a bit and we have seen major increases in research geared toward semi-structured data in the past decade. So we are simply easing off on some of the traditional RDBMS constraints and beginning to allow things like recursive self-joins, ragged hierarchies [?], etc. and we are optimizing these things.”
First, [as long as RDM has not been truly implemented] (SQL is not it), we won't get close to being "good at solving problems". Second, the major reason practitioners think relational databases cannot answer "certain questions" [they never specify which] is failure to understand the RDM. Third, this is the common confusion of logical and physical levels: RDM is logical and has nothing to do with efficiency, which is determined by physical implementation (e.g., indexes). Fourth, the same kind of "just in its infancy" argument has also been advanced for other database technologies to no avail (e.g., see OO for Application Development, Not Database Management). And fifth, the database field is actually regressing, not progressing, XML being a throwback to the good old days of hierarchic DBMSs, which we discarded because they were not cost-effective (see Skyscrapers with Shack Foundations and Those Who Don’t Know Remember the Past, Are Condemned to Repeat It). [The hierarchic regression is now full throttle beyond XML.]

Allen:
“... I think that areas of discrete mathematics that deal with graphs are currently the most vibrant area of research in the industry. The web itself is one huge graph structure, and research on ways to index the web, optimize routing, etc. all feed directly into techniques for optimizing XML processing ... The web is a graph. XML is the web made just a bit less sloppy, but we still have key/keyref and XLink, XPointer, RDF--all that stuff John mentions. Take the graph that is the web and make it more machine-readable. Take all of the services and data in silos at the edges of the web and expose it as XML documents (as appropriate of course). Now you have one big huge honkin’ graph. What is more fun that that?”
More logical/physical confusion. Moreover, it is quite instructive that while hierarchic database management has a foundation in graph theory, hierarchic DBMS products do not adhere to it. IBM’s IMS is not an implementation of the theory and, by W3C's own admission, neither is XML, due in large part to complexity (see Chapter 7, Climbing Trees in SQL, in PRACTICAL ISSUES IN DATABASE MANAGEMENT). Indeed, relational technology was invented for simplification and flexibility. [Codd reserved RDM for non-networked applications, recognizing that graph DBMSs would be beneficial to networked applications, albeit at the cost of inherent complexity and rigidity; in other words, using GDBMSs where RDBMSs will do is counterproductive.]

Dodds:
“Indeed, one may find it hard to criticize the current XML Query efforts, which are defining the algebraic underpinning for querying XML data sources. If this formal work were not being carried out, Mr. Pascal’s claims might make more sense. How else will advances happen if the basic research is not carried out?”
"Algebraic underpinnings", mathematics my foot. XML authors insist that it is for data exchange and those involved in specifying XML standards do not have a data management background and do not even realize that they are extending XML to it. They use terms such as "query algebra" (does data exchange require a query language?) to create a scientific impression, but by renouncing graph theory they discard science. To quote Hugh Darwen:
“Now, my eyes light up at the word "algebra" ... Originally, I understood it to mean a set of operations that are closed over some type. That is, every operation in X Algebra operates on zero or more values of type X and returns a value of type X. Hence, set algebra, Boolean algebra, relational algebra and the algebra of numbers that gives us arithmetic. Over what is the XML Query Algebra closed? Nobody has ever given me an answer that makes sense (apart from the occasional, honest "I don’t know").”
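
Closure in Darwen's sense can be illustrated with a minimal Python sketch of three relational operations. The function names, the representation (a relation as a frozenset of tuples, each tuple a frozenset of attribute-value pairs) and the sample data are assumptions made for illustration only, not any product's API. The point is simply that every operation takes relations and returns a relation, so the operations nest freely.

```python
def relation(*rows):
    """Build a relation from dicts: a set of tuples, each a set of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)


def restrict(rel, predicate):
    """Relation -> relation: keep the tuples that satisfy the predicate."""
    return frozenset(t for t in rel if predicate(dict(t)))


def project(rel, attrs):
    """Relation -> relation: keep only the named attributes (duplicates vanish, as in any set)."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in rel)


def join(r, s):
    """Relation x relation -> relation: natural join on the common attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)


# Hypothetical sample data.
emp = relation({"name": "Ann", "dept": "Sales"}, {"name": "Bob", "dept": "R&D"})
dept = relation({"dept": "Sales", "city": "NYC"}, {"dept": "R&D", "city": "SF"})

# Because each result is itself a relation, operations compose without special cases.
nyc_names = project(restrict(join(emp, dept), lambda t: t["city"] == "NYC"), {"name"})
```

Here `nyc_names` is again a relation (one tuple, one attribute) and could be fed to any of the three functions. Darwen's question is what the corresponding single type is for the XML Query Algebra.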

Dodds:
“It’s hard to reconcile this image with Fabian Pascal’s vision of a centralized DBMS.”
One of the many objectives of RDM was to enable not only database, but also DBMS distribution, which is practically impossible with physically exposed graph DBMSs. Given the low level of foundation knowledge in the database field, it is clear that Dodds and Allen are unaware of why truly distributed DBMSs (DDBMS) must be relational. I urge them -- and everybody else -- to educate themselves before they take public positions. [C. J. Date has published 12 Rules for distributed DBMS which, properly understood, make it clear why the RDM is the best, if not the only, hope for DDBMS.]



