Tuesday, October 16, 2012

What Is a Relational DBMS?

In Timewarp: What Is a Relational Database? Kevin Kline writes:
Maybe you're thinking that relational databases management systems (RDBMSs), like Microsoft SQL Server and Oracle, are going the way of punched cards and rotary phones.  After all, there's been a lot of hype these days in the IT media about the rise of so-called NoSQL (Not Only SQL) databases.  Many new and upcoming CS and MIS graduates who like working with data might think that relational databases are, at best, soon-to-be legacy systems and, at worst, are a career dead-end. Wrong!!!
As I commented on the post, his heart is in the right place, but several clarifications are in order.

A quick-and-dirty definition for a relational database might be: a system whose users view data as a collection of tables related to each other through common data values.
This is indeed quick-and-dirty. First, a DBMS is not a database. Second, both the DBMS and databases must be relational for the practical benefits to materialize. Third, any system can present users with a tabular view of data, but that does not necessarily make it relational. The tables have special DBMS-enforced properties that make them manipulable mathematically as sets and it is set-processing by the DBMS that confers the prefix R on it. Fourth, I would have liked Kline to mention that the name relational does not come from "relating tables", but rather from mathematical relations, which are sets (as one commenter correctly pointed out).

Kline does provide a more elaborate definition of a RDBMS:
The whole basis for the relational model follows this train of thought: data is stored in tables, which are composed of rows and columns. Tables of independent data can be linked, or related, to one another if they each have columns of data that represent the same data value, called keys.  This concept is so common as to seem trivial; however, it was not so long ago that achieving and programming a system capable of sustaining the relational model was considered a longshot with limited usefulness.
Well, no, data is represented as R-tables, not stored as such. SQL implementations--which are not exactly relational--do tend to have mirror image storage for their base tables. However, an RDBMS should keep the logical representation completely separate and independent of physical storage--files, indexes, hashes and so on--physical data independence being one of the most important beneficial features of relational systems. Some commercial SQL products offer some of that, but not enough; others store column rather than row data. What's important is that the relational model does not impose any constraints on physical implementation beyond the prohibition of exposing its details to users in applications.

Candidate keys--one of which can be selected, for pragmatic reasons, to be a primary key--guarantee uniqueness of rows for two reasons: to accurately represent entities which are distinct in the real world; and as part of the discipline that tables must obey to be R-tables (mathematical relations don't have duplicate tuples). A foreign key is a column in one R-table whose values reference a primary key in another R-table, but the relationship is based on the referencing and referenced columns being defined over the same domain i.e. they "mean the same thing", which is what Kline means by "representing the same data value" (a not very accurate expression). Indeed, Codd called columns defined over the same domain the "glue of databases".
If a vendor’s database product didn’t meet Codd’s 12 item litmus tests, then it was not a member of the club. Note that the rules do not apply to applications development. Instead, these rules determine whether the database engine itself can be considered truly “relational”. These rules were constructed to support a data model that would ensure the ACID properties of transactions and also eliminate a variety of data manipulation anomalies that frequently occurred on non-relation [sic] database platforms (and **still do** occur on non-relational database platforms). (As an aside, the transactional paradigm was conceived by my hero, Gray, Jim in 1981 while at Tandem Computer and presented in the paper "The Transaction Concept: Virtues and Limitations").
Because at the time vendors of various existing products were pretending relational fidelity by prefixing or suffixing their name with an R, Codd whipped up a few quick and dirty "rules" that could be used to expose false claims (they were "constructed to ensure that products supported a data model") and had little to do with transactions and ACID properties. Today the rules are deprecated because they are neither systematic, nor complete, nor precise.

Because Kline uses the term 'anomalies' in the context of both the relational model and transactions, there is an opportunity for confusion here. In a relational context, the term is used to describe the negative consequences of updating databases that are not fully normalized. In the context of transactions, the term could be used to describe the corruption potential of poor transaction support for updates, although the term is not commonly used that way.

I agree with Kline's description of the aspects of SQL databases which were superior to the hierarchic and network systems that were common at the time. But:
  • Given the nature of those systems, it wasn't very hard to be better
  • SQL is far from being truly and fully relational (some of the commenters provide good explanations why)
This contributes to those old products being reinvented in the object/XML/NoSQL fads and are bringing back the very same problems and complexities that Codd solved 50+ years ago.

Klein claims that
where NoSQL databases excel at storing data that is moderately important and requires eventual consistency, SQL database excel at storing data that is of paramount importance and requires immediate consistency.
I am not convinced that this is the real attraction of NoSQL, because as soon as you figure out what exactly you want to do with the data in a NoSQL "database", that NoSQL DBMS won't let you do it, at least not cost-effectively. No, my guess is that while the relational approach demands that you think upfront hard about your information needs and model the business and design the database to that end, NoSQL products let you get away without much upfront thinking and design time and effort. Short-term expediency is too tempting.

To Adam Machanic's question whether SQL products are relational, Kline replies:
But it's a great question.  Certainly, all of the major RDBMSes support the 12 Principles by degrees. Access, for example, probably does an even poorer job than SQL Server does.  Otoh, I believe that SQL Server passes muster reasonably well, say an A-, if not a perfect score of 100%. And, ironically, most all of the core database engine enhancements that have come out in the past few releases are decidedly NOT relational.  I'd point to SSAS, SSIS, ColumnStore, etc as features that go way outside of the domain of relational databases.
First, as I already mentioned, the 12 rules are no longer a sound criterion for assessing relational fidelity. Second, I am not sure exactly how Kline determined the "degree" to which the products comply with the rules (I suspect that if I applied them, the grades would be lower); and, anyway, in one of my books I demonstrated that the rules are not independent, so that violating one has repercussions for the others. Third, engine enhancements that are non-relational can violate the rules. And fourth, I do not know if the products he lists are relational or not (I suspect the latter), but columnar storage does not, per se, violate relational principles for reasons that I explained above.

Read the rest of the comments to Kline's post, they are informative.

Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

No comments:

Post a Comment