The ideas for relational databases were worked out in the 1970’s and the first commercial implementations appeared around 1980. By the 1990’s relational databases were the dominant way to store data. There were some non-relational databases in use, but these were not popular. Hierarchical databases seemed quaint, clinging to pre-relational approaches that had been deemed inferior by the march of progress. Object databases just seemed weird.
How many times do I need to reiterate that industry's persistent confusion of SQL with the relational model does not make SQL DBMSs relational?
The claim that SQL DBMSs became dominant because non-relational DBMSs were just "quaint" or object DBMSs "weird" is a rather cavalier characterization of their flaws. The reality is that hierarchic and CODASYL DBMSs were so prohibitively complex and rigid, that IT organizations outright could not cope with them anymore. That is why even SQL, despite its poor relational fidelity and language design, readily proved superior.
Now the tables have turned. Relational databases are still dominant, but all the cool kids are using NoSQL databases. A few years ago the implicit assumption was that nearly all data lived in a relational database, now you hear statements such as "Relational databases are still the best approach for some kinds of projects", implying that such projects are a small minority. Some of the praise for NoSQL databases is hype, but some is deserved: there are numerous successful applications using NoSQL databases because they need to, not because they want to be cool.Points arising:
1. The "cool kids" were not around when the IT industry struggled with the horrendous problems of non-relational DBMSs and they are reinventing, albeit with relabeling, all of them. Worse, even SQL implementations increasingly bring back old flaws. Here's what a friend developer emailed me about MySQL not recovering unused space:
The official solution is dump the tables, drop them and recreate them. This is even more fun when you find out how slow standard restores can be, or how slow data copying is. And people pay good money for support contracts based on this.IMS, anybody? You bet: Switching from relational to the graph model.
2. As I have demonstrated over and over again in my writings and teaching, trite, meaningless statements such as "good for some projects, but not for others", or "use the proper tool for the job" are indications of poor foundation knowledge (what a DBMS and databases are, what their functions are relative to applications, and the significance and importance of and requirements from, a data model--by which I don't mean a business model or a logical model--are. What do you exactly learn from the following?
I’m not an advocate of SQL or NoSQL databases. Each has its place. A few years ago developers assumed that nearly all data was best stored in a relational database. That was a mistake, just as it would be a mistake now to assume that all data should now move to a non-relational database.3. The notion that SQL DBMSs are now in a minority is bunk. It takes decades for a technology to overcome resistance, be widely adopted, used and mastered. It took roughly a decade for SQL to be implemented and four decades to dominate, despite the horrible problems of preceding technologies. And SQL implementations are far from conferring all the practical advantages of relational technology. Marketing fads notwithstanding, can you name a current technology that comes even close?
So why the shift to NoSQL, and why now? I’m not an expert in this area, but I’ll repeat some of the explanations that I’ve heard that sound plausible.
Relational databases were designed in an era of small, expensive storage media. They were designed to conserve a resource that is now abundant. Non-relational databases may be less conservative with storage.Aside from the constant logical-physical confusion, this is sort of backwards. In fact, one of the main initial objections to SQL DBMSs, that slowed their adoption, was that they were resource hogs. It is true that in time hardware became cheaper and SQL DBMSs became more efficient, but efficiency was also considerably predicated on whatever little relational fidelity they had: their performance was optimizable by the vendors at the database level, relieving application developers from a huge burden, including dependence on physical storage and access methods. The human productivity and improvements that SQL DBMSs brought were so significant relative to what preceded them, that even their initial higher resource consumption could not prevent their adoption over time.
Relational databases were designed for usage scenarios that not all applications follow. In particular, they were designed to make writing easier than reading. But its not uncommon for a web application to do 10,000 reads for every write.This is backwards too. In fact, SQL's name in itself is due to the fact that it was initially a query-only language, without any integrity features important for updates. The foremost objective was to make databases more accessible to end-users.
It may well be that much of Web data is not important enough to require consistency. As one of my readers commented:
I think NoSQL has a place in the web because most of the information "web developers" want to store is useless and forgettable crap anyway. (I recall a .com I consulted at once who tried to sell my consulting firm its "user database" - I was able to point out to the owner that I could easily generate a mountainous list of duplicates of Mickey Mouse and Adolf Hitler myself without needing their "data.") When the people that employ these "developers" later find out that the "data" they have paid for is less substantial than wet toilet-tissue they then have an opportunity to learn the only way most humans seem capable of learning - by an abrasive experience. And the data itself can be (and probably would be even if it was queryable) safely flushed into the bitbucket. (I won't cry for the missed chances of marketers to sell more dirty ice to brain-damaged Eskimos.)But not all Web data is unimportant--not everybody is Facebook, Google, or Twitter. And as all the hoopla over analytics and BI attests, there is now much focus on extracting useful information from that data. A variety of ad-hoc, proprietary NoSQL niche products proliferate, promising to be magic wands that will enable information extraction from data that was never structured with consideration of this possible need. There is no better evidence of the delusional aspects of the IT industry than (1) XML and the notion that by turning HTML tags into "semantic" tags, the Web will become usefully structured (2) BigData and the notion that "unstructured" data can be readily and usefully mined. What good is mining inconsistent data?
Unfortunately, the industry persists the pattern of extending technologies invented for different purposes e.g. XML (data transfer) and object-orientation (programming) to data management.
The scalability of relational database transactions is fundamentally limited by Brewer’s CAP theorem. It says that you can’t have consistency, availability and partition tolerance all in one distributed system. You have to pick two out of three.A data model is purely logical and has nothing to do with scalability, which is part of te physical implementation. If SQL products do not scale, that is not because they are relational (which they really are not). If the promotion is accurate--it never is--then maybe there is finally some evidence in support of this argument.
More importantly, if consistency is not important, there is nothing even in SQL products that prevent its relaxation. What TRDBMSs (and I don't mean SQL) can do and NoSQL ones cannot is to guarantee consistency when necessary and relax it when it's not. Why trade down instead up?
The object oriented worldview of most application developers maps more easily to NoSQL databases than to relational databases. But in the past, this objection was brushed off. A manager might say "I don’t care if it takes a lot of work for you to map your objects to tables. Data must be stored in tables."If application developers have an OO worldview, that's because OO is an application programming paradigm. But that is exactly its problem, the view is application-specific, not of a shared data resource. The paradigm for the latter is a data model, which OO lacks. This is why extending it to databases is akin to turning DBMSs into applications, defeating the whole purpose: a DBMS is not an application, it was invented precisely to relieve applications from functions that need to be centralized and serve multiple applications.
But here's the $64,000 question: What exactly does all this have to do with 3NF???
Do you like this post? Please link back to this article by copying one of the codes below.
URL: HTML link code: BB (forum) link code:
People frequently leave college with the impression that "if it is in 3NF then it is relational"
ReplyDeleteYiorgos,
DeleteYou don't really think that the debunker-in-chief does not know that, do you?? :)
My question at the end was sort of rhetorical and to invite precisely the kind of comment you made and possibly a discussion.
I think the first question I would ask would be does the author of "The Rise and Fall of the Third Normal Form" understand what normalisation is and what the goals of normalisation are?
ReplyDeleteYiorgos has answered that for you: Many professionals conflate relational with normalization. So this is yet another attack on the relational model without a grasp of it.
DeleteHas a lot to do with the anti-intellectual nature of business and IT. Many IT professionals are hostile to theory because it restricts their freedom to do whatever they damn please. Who the hell is Codd to tell me what to do?
And that is, of course, a consequence of educational failure.
I think there is a connection between an enthusiasm for distributed processing and message passing approaches like object orientation.
ReplyDeleteThere is a long standing idea that large pieces of software can be put together by bolting together small independently written pieces that communicate by sending messages to each other. In other words integration takes place as an afterthought. This idea crops up again and again in CORBA, the actor model, Service Oriented Architecture, Enterprise Service Buses and any other number of trends. This approach is, of course, antithetical to having a central coherent data structure.
I suspect the message passing approach is fundamentally flawed and leads inevitably to inconsistency. Hence the tendency for its adherents to be become apologists for inconsistency. I wonder if it would be possible to prove the fundamental flaws in message passing methods?
Well, communication is one of the most difficult aspects even for people, so there is no reason to suspect that it's gonna be any easier within software and hardware.
DeleteThere is a valid argument to be made for system modularity, but that requires a clear and good grasp of the integrativity of the modules. The problem in what you describe is that the modularity is a way to avoid thinking about the integration because it is too hard, so components are sot of patched together.
"I suspect the message passing approach is fundamentally flawed and leads inevitably to inconsistency."
DeleteIn general there is no way to know that a message will be construed as it was intended. Provided it is in a format that is not discernably incorrect it will be interpretted as the receiver sees fit. Who knows if the sender and the receiver share one understanding?
If the exchange of messages were preceded by verification of shared understanding (e.g. constraints) all would be well. I can't recall ever hearing of a system that attempted that though.
The problem is hard enough when one is dealing with propositions *about* the world, as in the relational model. When one is dealing with attempted *simulations of* the world, as in the object "model" it is pretty much impossible. Even OO programmers balk at serializing and deserializing instances between systems even though that is exactly the purpose for which serialization was conceived.
A detailed description of what I meant by "communication is one of the hardest" even between people, let alone systems. There are other aspects that make it unreliable.
Delete"And that is, of course, a consequence of educational failure."
ReplyDeleteI won't leap to defend my instructors, who were entirely disdainful of database management (in the late 1970s), possibly as being akin to book-keeping. But I rarely encounter people with even failed relevant education. Most have no relevant education at all; the vast majority of my colleagues over the years have come fields such as marine biology, botany, particle physics, economics, and electrical engineering, to name just a few.
Most were bright enough that they could probably read (say) Date's Database Systems and really get a lot out of it. I can't say why they don't read it. Maybe they are paid so well they think they must be doing something right.
By education I mean intellectual development: acquisition of foundation knowledge (not just experience) and ability to think critically, independently, and abstractly. People are not born with this, they must be developed. That's why I claim that the problem is systemic and cultural, not a sheer database or IT problem.
DeleteThe Western society has gradually replaced education with job training. Academia is pressured from all sides (vendors, students, employers) to teach tools and to follow the buzzword rather than lead with scientific research. I can tell you from personal experience if they don't succumb to this pressure, they certainly won't be able to survive, let alone justify the obscene prices they charge. What is more, they are increasingly hire It professionals to teach who cannot do anything else but train.
They don't read because their education has failed them and the system cares only about experience with tools. People will do the minimum that is required of them to succeed.