NoSQL Zone is brought to you in partnership with:

Developer with experience in a variety of different systems and technologies, with a customer focus and balance with business goals. Particularly interested in backend and large scale systems, and also interested in high level architecture, and API design. Always open to feedback in order to keep learning and improving as a professional. Rodrigo is a DZone MVB and is not an employee of DZone and has posted 39 posts at DZone. You can read more from them at their website. View Full User Profile

Spanner: Google's Globally-Distributed Database

03.04.2013
| 2037 views |
  • submit to reddit

Curator's Note: The post below was originally published in September, 2012.

Today I read this recent paper by Google:

Spanner: Google's Globally-Distributed Database

As a globally-distributed database, Spanner provides several interesting features.

  • First, the replication configurations for data can be dynamically controlled at a fine grain by applications. Applications can specify constraints to control which datacenters contain which data, how far data is from its users (to control read latency), how far replicas are from each other (to control write latency), and how many replicas are maintained (to control durability, availability, and read performance).
  • Second, Spanner has two features that are difficult to implement in a distributed database: it provides externally consistent [16] reads and writes, and globally-consistent reads across the database at a time-stamp. These features enable Spanner to support consistent backups, consistent MapReduce executions [12], and atomic schema updates, all at global scale, and even in the presence of ongoing transactions.

The very interesting part is that it provides semi-relational tables (which some other NoSQL systems also do), SQL-like query language, and synchronous replications – you can see that many applications within Google use non-optimal data stores just to have the synchronous replication.

Spanner exposes the following set of data features to applications: a data model based on schematized semi-relational tables, a query language, and general- purpose transactions. The move towards supporting these features was driven by many factors.
  • The need to support schematized semi-relational tables and synchronous replication is supported by the popularity of Megastore [5]. At least 300 applications within Google use Megastore (despite its relatively low per- formance) because its data model is simpler to manage than Bigtable’s, and because of its support for synchronous replication across datacenters. (Bigtable only supports eventually-consistent replication across data- centers.)
  • The need to support a SQL-like query language in Spanner was also clear, given the popularity of Dremel [28] as an interactive data- analysis tool.
  • Finally, the lack of cross-row transactions in Bigtable led to frequent complaints; Percolator [32] was in part built to address this failing. Some authors have claimed that general two-phase commit is too expensive to support, because of the performance or availability problems that it brings [9, 10, 19]. We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. Running two-phase commit over Paxos mitigates the availability problems.

Essentially, they accomplish by still having several versions of each record (like BigTable, but with timestamps assigned by the server) and guaranteeing timestamps to be within an error margin by using multiple time sources like GPS and atomic clocks. Then there are Paxos groups, of course, and two-phase commits for the transactions.

I found this paper very interesting as I've been thinking of some challenges of implementing these features (like across DC replication and backup) on NoSQL data stores. Quite an interesting paper and accomplishment by this Google team.

Published at DZone with permission of Rodrigo De Castro, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)