Sunday, February 17, 2013

Forward to the Past: Application-Managed Data Not a Distributed DBMS Make

Sima Ilic: This may be a somewhat unusual request, but I'd be interested to hear your opinion on the evolution of Google's use and development of distributed databases: from Bigtable to Megastore to Spanner.

I know that there may be only a handful of companies that need such things (or have the resources to use or develop them): Google, Amazon, Facebook? Unfortunately, people talk about it as if it were the end of the relational DBMS (which is plain nonsense) or the next thing that everybody should be looking at or using (the only word that comes to mind is false, but it's not strong enough for marketing/sales people).

Let me tell you what prompted the question.

I came across the Spanner article (referenced below). It describes what Spanner is and even explains the motivation for developing it. There are two aspects that I believe are worth commenting on. One is the set of features required of a DBMS; the other is that response time is essential in a distributed DBMS.

DBMS Features: I believe that just a few quotes from Google articles illustrate the points I wanted to make. Quote:
Spanner is a scalable, globally-distributed database ... (DBMS, not database) [it] supports general-purpose transactions, and provides a SQL-based query language. Even though many projects happily use Bigtable, we have also consistently received complaints from users that Bigtable can be difficult to use for some kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication. Many applications at Google have chosen to use Megastore because of its semi-relational data model and support for synchronous replication, despite its relatively poor write throughput.
Spanner started being experimentally evaluated under production workloads in early 2011, as part of a rewrite of Google’s advertising back-end called F1. This back-end was originally based on a MySQL database that was manually sharded many ways.
A search for Bigtable using (as may be expected) Google yielded, quote:
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.
A search for Megastore found, quote:
Megastore is a storage system developed to meet the requirements of today's interactive online services. Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way and provides both strong consistency guarantees and high availability. We provide fully serializable ACID semantics within fine-grained partitions of data.
Time in Distributed DBMS: It seems to me that one of the requirements for Spanner was that it must be possible to reconstruct the chronology of transactions. The time they are looking at is the commit time. Here are the quotes that justify the undertaking and explain what has been done.
Spanner has two features that are difficult to implement in a distributed database: it provides externally consistent reads and writes and globally-consistent reads across the database at a timestamp. These features are enabled by the fact that Spanner assigns globally-meaningful commit timestamps to transactions, even though transactions may be distributed. The timestamps reflect serialization order. The key enabler of these properties is a new TrueTime API and its implementation. The API directly exposes clock uncertainty and the guarantees on Spanner’s timestamps depend on the bounds that the implementation provides. If the uncertainty is large, Spanner slows down to wait out that uncertainty. Google’s cluster-management software provides an implementation of the TrueTime API. This implementation keeps uncertainty small (generally less than 10ms) by using multiple modern clock references (GPS and atomic clocks).
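The "wait out that uncertainty" step in the quote can be sketched roughly as follows. This is a simplified illustration, not Google's implementation: TrueTimeSketch, TTInterval, commit, and the 5 ms default for epsilon are hypothetical names and values chosen for the example, and the local system clock stands in for the GPS and atomic clock references. The sketch takes the latest possible current time as the commit timestamp and makes the write visible only once that timestamp is certainly in the past, so the scheme is only as good as the assumption that epsilon really bounds the clock error.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """An interval assumed to contain the true current time."""
    earliest: float
    latest: float


class TrueTimeSketch:
    """Hypothetical stand-in for a TrueTime-like API.

    ASSUMPTION: the local clock is never off by more than epsilon_s seconds.
    """

    def __init__(self, epsilon_s=0.005, clock=time.time):
        self.epsilon_s = epsilon_s
        self._clock = clock

    def now(self):
        t = self._clock()
        return TTInterval(t - self.epsilon_s, t + self.epsilon_s)

    def after(self, ts):
        """True only if ts has definitely passed, given the error bound."""
        return self.now().earliest > ts


def commit(tt, apply_write, sleep=time.sleep):
    """Pick a commit timestamp, wait out the uncertainty, then expose the write."""
    ts = tt.now().latest                 # no earlier than the true current time
    while not tt.after(ts):              # "commit wait": roughly 2 * epsilon
        sleep(max(tt.epsilon_s / 4, 1e-4))
    apply_write(ts)                      # visible only after ts is in the past
    return ts


if __name__ == "__main__":
    log = []
    tt = TrueTimeSketch()
    first = commit(tt, log.append)
    second = commit(tt, log.append)
    print(first < second, log)
```

The wait is what makes timestamp order agree with real-time commit order without a central timestamp service: a transaction that starts after an earlier one has become visible necessarily draws a larger timestamp. The price is added latency proportional to the uncertainty.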
I'm no expert in distributed DBMS or distributed computing, but it seems to me that the assignment of timestamps to transactions must be serialized to achieve what they set out to achieve. Otherwise the uncertainty of less than 10ms will come back and bite them sooner or later. To quote one quite smart guy (read: genius) named Dijkstra, who said back in 1965:
I warn the reader that my consistent refusal to make any assumptions about the speed ratios will at first sight appear as a mean trick to make things more difficult than they already are. I feel, however, fully justified in my refusal. For instance, part of the system may be a manually operated input station, another part of the system might be such, that it can be stopped externally for any period of time, thus reducing its speed temporarily to zero. Secondly, and this is much more important, when we think that we can rely upon certain speed ratios, we shall discover that we have been "pound foolish and penny wise".
By the way, this article fixed the development of operating systems for good.

Spanner: Google's Globally-Distributed Database
Bigtable: A Distributed Storage System for Structured Data
Megastore: Providing Scalable, Highly Available Storage for Interactive Services
Dijkstra Lectures EWD123

David McGoveran: My comments are based solely on a quick read of the three referenced articles, without any further research.

1. The authors assert that Spanner is "semi-relational" (a bizarre term in my opinion). This means that it provides a layer giving a key-value (a.k.a. associative) physical data model the appearance of tables.

2. These tables are not relational in any sense: rows are ordered and duplicates are allowed (as are NULLs).

3. The concurrency control model is based on a variant of global timestamping. Without going into the details, there is nothing wrong with this model per se.

4. At best, I would consider Spanner a DBMS only in the loosest sense. It comprises a set of application-callable and background services. For example:
   (a) Two-phase commit for distributed transactions is entirely application controlled, an approach that the SQL DBMS community left behind in the early 1990s.
   (b) Data partitioning depends on careful choice of key naming by the application developer. It is not specified at the schema level and so is not easily changed or optimized.
   (c) Joins and other operations must be hand-coded by the application programmer. As is well known, the choice of the best join algorithm depends dynamically on data value distributions, which are not available to application programmers, nor would many programmers know what to do with that information were it available.

5. Spanner is application-centric. By this I mean that data is presumed to be owned by an application and operations on it are controlled almost entirely via application code (even though a library of services is supplied). It also means that Spanner imposes certain characteristics on the types of applications that will work well with it: for example, reads dominate writes, applications do not share data or have conflicting requirements, and application-specific data can be modeled as a collection of hierarchies.

6. The Spanner conception of consistency does not seem to match the SQL DBMS (let alone relational) conception. In particular, there appears to be no notion of integrity constraints defined at the DBMS or data model level. As a result, although it may be true that transactions have the ACID properties, this really means only that what the application writes within a transaction is preserved (subject to the timestamp protocol) if committed. In other words, all integrity checking is the province of the application programmer, so the entire team had better get it right and be mutually consistent. This is a tall order, and so prone to errors in all but the simplest of transactional applications, that DBMS researchers long ago moved to a declarative, centralized, DBMS-enforced approach. When integrity checks are procedural, there can be side effects and the outcome depends on the order of execution. If two programmers code the check differently, the integrity of shared data can easily be compromised. Worse, I've seen numerous examples over the past twenty years where the harmful effects were not detected until it was too late.

7. I can't consider Spanner a DBMS. Call it a toolkit for developing a dedicated DBMS.
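Point 6 can be made concrete with a minimal sketch. It is an illustration only and nothing in it is taken from Spanner: Python's built-in sqlite3 module serves as a stand-in SQL DBMS, and the account table, the rule that a balance may not go negative, and the two withdraw functions are hypothetical, invented for the example. Two programmers code the same withdrawal; one remembers the check, the other does not. Without a declared constraint the omission silently corrupts shared data; with a declared CHECK constraint the DBMS rejects the write no matter which code path issued it.

```python
import sqlite3


def make_db(declarative):
    """Create an in-memory database; optionally declare the rule in the schema."""
    check = "CHECK (balance >= 0)" if declarative else ""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE account (id INTEGER PRIMARY KEY, "
        f"balance INTEGER NOT NULL {check})"
    )
    conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
    conn.commit()
    return conn


def balance(conn):
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]


def withdraw_app_a(conn, amount):
    """Programmer A enforces the rule procedurally before writing."""
    if balance(conn) - amount < 0:
        raise ValueError("insufficient funds")
    conn.execute("UPDATE account SET balance = balance - ? WHERE id = 1", (amount,))


def withdraw_app_b(conn, amount):
    """Programmer B codes the same operation but omits the check."""
    conn.execute("UPDATE account SET balance = balance - ? WHERE id = 1", (amount,))


def try_withdraw(conn, withdraw, amount):
    """Run one withdrawal in a transaction; report 'ok' or the error type."""
    try:
        with conn:  # commits on success, rolls back on exception
            withdraw(conn, amount)
        return "ok"
    except (ValueError, sqlite3.IntegrityError) as exc:
        return type(exc).__name__


if __name__ == "__main__":
    for declarative in (False, True):
        conn = make_db(declarative)
        outcome = try_withdraw(conn, withdraw_app_b, 150)
        print(f"declarative={declarative}: {outcome}, balance={balance(conn)}")
```

In both runs the transaction itself behaves atomically; what differs is whether anything other than the individual programmer stands between a faulty code path and the shared data.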

I could go on had I time. I haven't gone into the positive features (like automatic partition balancing, replication and high scalability for the intended set of applications) of Spanner for fear my comments would be misunderstood.

The foregoing having been said, I don't criticize Spanner for its intended purpose. My concern is that its strengths are overstated and that subtleties like the limited enforcement of consistency (i.e., its restricted meaning) will mislead. Whether they understand it or not, the designers simply have not addressed the problem that SQL DBMSs (let alone TRDBMS) solve. Instead, they have designed a useful set of data management services. Put another way, Spanner provides programmers with tools for custom data management solutions relevant for a narrow set of applications. It is not a general solution to distributed transaction management, replication, consistency, and the like. It makes no sense to compare Spanner to enterprise-class DBMS products of any type, either positively or negatively.

David McGoveran is a consultant, researcher, lecturer, and industry analyst to the software industry; author of numerous books and articles in the fields of relational databases, transaction processing,

Ed. Note: This is more confirmation of my observation that the industry is regressing to application-specific data management. Because data fundamentals, including the history of the field, are not part of IT education, the raison d'etre of the DBMS is lost on the young generation of developers and software engineers. The DBMS is perceived as just another application, which returns the data management burden to the application developer.

This is not going to end any differently or better than it did in the pre-DBMS days and somebody will wake up one morning and have a grand idea: how about centralizing data management functions? He will probably call it something different than DBMS and deem it something new and everybody will jump on that bandwagon until a new fad comes along to solve the problems of the last one.

If you ever wondered why most time and effort is spent on migration, integration, and laboring to make a plethora of software products talk to one another, rather than on productive work, the reason is the constant reinvention of ad-hoc, proprietary, narrowly scoped square wheels. This is regress, not progress.
