
Paul is a principal consultant at ThoughtWorks. He is enthusiastic about open source in particular. He is known for Dependency Injection (one of its pioneers with PicoContainer), Selenium browser automation (co-founder), Branch by Abstraction and most recently Client-Side MVC frameworks. Paul is a DZone MVB and is not an employee of DZone and has posted 69 posts at DZone.

SCM and Key-Value Store Convergence

01.02.2013

In this blog entry, I'm going to dwell on some important differences between SCM tools and Key-value stores that might narrow as work happens on these technologies in the next few years.

First, a reminder

Key-Value stores differ from Document Stores. The latter allow indexing by elements within the document; the former do not. Document stores differ from key-value stores in many more ways than understanding the nature of the payload, but that's what I want to dwell on for this blog entry (consider Document Stores grouped with Key-Value stores for the rest of this page).

SCM tools, you could say, are like Key-Value stores in that there's a key (the path to the resource) and the value (ordinarily the source file or resource). In many other ways there are differences:

SCM tools want to map the payload to a file in a file system, and deal with checkouts and changes to commit as sets. They'll associate a reason for the commit if the end user types a message at the pertinent moment. They also have some rules around the nature of the key, in that it must conform to something meaningful in a directory-delimited file system.

Key-Value stores want to supply fetches to a running program (say 'in memory'), and don't require a change-message to come back with a commit. They are much more open about the nature of the key.
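
To make the contrast concrete, here is a minimal Python sketch of the two styles of write. The class names, the `commit` signature and the path rule (relative, no `..` segments) are assumptions made up for illustration; they are not taken from any real SCM tool or store.

```python
from pathlib import PurePosixPath


class KeyValueStore:
    """Plain key-value store: any hashable key, no change message."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class ScmLikeStore:
    """SCM-flavoured store: path-shaped keys, changes committed as a set."""

    def __init__(self):
        self._data = {}
        self.log = []  # one (message, sorted paths) entry per commit

    @staticmethod
    def _check_path(key):
        path = PurePosixPath(key)
        # Assumed rule: relative path, non-empty name, no '..' segments.
        if path.is_absolute() or not path.name or ".." in path.parts:
            raise ValueError(f"not a usable repository path: {key!r}")

    def commit(self, changes, message=""):
        # Validate every path first so a bad key rejects the whole set.
        for key in changes:
            self._check_path(key)
        self._data.update(changes)
        self.log.append((message, sorted(changes)))

    def get(self, key):
        return self._data[key]
```

The key-value store happily accepts a tuple such as `("user", 42)` as a key, while the SCM-like store only accepts something that could live in a directory tree, and records which paths changed together and why.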

History

History is one big difference between Key-Value stores and SCM tools. SCM tools keep history without having to encode a version/revision number/hash in the key. History is available orthogonally for an item, with the default being 'HEAD' (latest).
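
As a rough illustration of history being orthogonal to the key, the sketch below keeps every revision of a value and returns the latest one unless asked otherwise. `VersionedStore` and its 1-based revision numbers are hypothetical; real SCM tools identify revisions in their own ways (numbers, hashes, tags).

```python
class VersionedStore:
    """Key-value store that keeps history without encoding it in the key."""

    def __init__(self):
        self._history = {}  # key -> list of values, oldest first

    def put(self, key, value):
        revisions = self._history.setdefault(key, [])
        revisions.append(value)
        return len(revisions)  # 1-based revision number just written

    def get(self, key, revision=None):
        """Return the value at `revision`; None means 'HEAD' (latest)."""
        revisions = self._history[key]
        if revision is None:
            return revisions[-1]
        if not 1 <= revision <= len(revisions):
            raise KeyError(f"{key!r} has no revision {revision}")
        return revisions[revision - 1]


store = VersionedStore()
store.put("docs/intro.txt", "first draft")
store.put("docs/intro.txt", "second draft")
latest = store.get("docs/intro.txt")               # HEAD by default
original = store.get("docs/intro.txt", revision=1)
```

The caller uses the same key for every read; the revision is a separate, optional argument rather than something baked into the key.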

The NoSQL page on Wikipedia says nothing about History/Revisions/Versions yet.

Speed

Key-Value stores are built for speed of access. If replicas or distributed deployment happens as part of an application-stack build-out, then the Key-Value store is going to push changes around quite quickly, and most likely ahead of need. SCM tools, by contrast, are most likely to only do that if the 'fetch' cycle contains an implicit or explicit 'refresh' operation. They're also likely to bring down more changes than just the resource being sought.

SCM tools can be made faster with a cache for the get/fetch cycle. You could add one if you understand the protocol, or are layering on top of the native protocol. You could say that a checkout to a working copy, plus a strategy to constantly refresh it, could be fast, but having all items of a checkout local to the app that needs it could be inconvenient. For example, there are four million English Wikipedia articles, and you would not want them all on an iPhone, if that were the client, let alone their minute-by-minute changes.
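
One way to picture a cache layered over the get/fetch cycle is a small read-through cache with a time-to-live, sketched below. The `fetch` callable stands in for whatever slow call reaches the SCM server; it, the class name and the TTL policy are all assumptions for the example, not a real client API.

```python
import time


class CachingFetcher:
    """Read-through cache in front of a slow per-resource fetch."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch        # hypothetical slow call to the SCM server
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._cache = {}           # path -> (expires_at, content)

    def get(self, path):
        now = self._clock()
        hit = self._cache.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]          # still fresh: no round trip
        content = self._fetch(path)
        self._cache[path] = (now + self._ttl, content)
        return content
```

Only the resources actually requested end up held locally, which avoids pulling a whole checkout onto a small client. The price is that a read can be up to `ttl_seconds` stale.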

Branching

I've labored this point in recent blog entries. SCM tools are typically good at having multiple branches of data that might have been identical at one point, and could be again if a merge happens. Typically that means text-based forms of data that stand a chance of being mergeable (JSON is better than XML, for example). Maintained Divergence is also an SCM feature that allows two or more branches to keep their distance.
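
To show why structured text such as JSON stands a chance of being mergeable, here is a minimal sketch of a key-level three-way merge over flat dictionaries with string keys. It is a simplification for illustration: `three_way_merge` is a hypothetical helper, and real SCM tools merge at the line or hunk level rather than by key.

```python
_MISSING = object()  # sentinel meaning "key absent on this side"


def three_way_merge(base, ours, theirs):
    """Merge two branches that diverged from `base`.

    Returns (merged, conflicts), where conflicts lists the keys that
    both branches changed in different ways.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            winner = o            # both sides agree (or both deleted)
        elif o == b:
            winner = t            # only their branch changed it
        elif t == b:
            winner = o            # only our branch changed it
        else:
            conflicts.append(key)  # changed differently on both sides
            continue
        if winner is not _MISSING:
            merged[key] = winner
    return merged, conflicts


base = {"title": "Intro", "body": "old text", "tags": "draft"}
ours = {"title": "Introduction", "body": "old text", "tags": "draft"}
theirs = {"title": "Intro", "body": "new text"}  # this branch removed 'tags'
merged, conflicts = three_way_merge(base, ours, theirs)
```

Here each branch touched different keys, so the merge succeeds without conflicts. Maintained divergence is simply the case where nobody calls the merge and both branches carry on separately.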

What next?

I see some of the NoSQL variants expanding their tooling around the history of documents. With that, some brave soul will no doubt try to make wrappers so that one can act as a formal SCM.

I'm not sure the enterprise SCM tool-makers will, but they could expand into the type of availability/scaling/consistency that a number of NoSQL vendors are delivering now. The FOSS vendors could move there more quickly. Deciding factors for various vendors: local-history (or not), read-only flags vs optimistic-locking, read vs write speed, client/server vs distributed. What's nice about the SCM vendors getting involved is that the science is decades old, and the implementations are industrially hardened in some cases.

Feel free to ignore me though :)

Footnote

Matthew Aslett of The 451 Group published a tube-map style view of the range of choices for database/store things yesterday. It looks a little like the London Underground (AKA 'tube') map.

I'm not sure that two dimensions are enough to group implementations, yet also separate them. He is also not sure that this version is more appropriate than his previous version six weeks ago or the one back in April of 2011. For example, there was a single rubber band, "as a service", in the first version; that became three rubber bands in the version from last month, and is a green 'line' in the current version (again following the London Underground lines concept).

Published at DZone with permission of Paul Hammant, author and DZone MVB. (source)
