NoSQL Zone is brought to you in partnership with:

Eric is the Editorial Manager at DZone, Inc. Feel free to contact him at egenesky@dzone.com Eric has posted 804 posts at DZone. You can read more from them at their website. View Full User Profile

Why is MongoDB Wildly Popular? It’s a Data Structure Thing.

01.16.2013
| 3209 views |
  • submit to reddit
Curator's Note: The content of this article was originally published over at the MongoLab blog .

“Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t usually need your code; it’ll be obvious.” - Eric Raymond, in The Cathedral and the Bazaar, 1997

Linguistic innovation

The fundamental task of programming is telling a computer how to do something.  Because of this, much of the innovation in the field of software development has been linguistic innovation; that is, innovation in the ease and effectiveness with which a programmer is able to instruct a computer system.

While machines operate in binary, we don’t talk to them that way. Every decade has introduced higher-level programming languages, and with each, an advancement in the ability of programmers to express themselves. These advancements include improvements in how we express data structures as well as how we express algorithms.

The Object-Relational impedance mismatch

Almost all modern programming languages support OO, and when we model entities in our code, we usually model them using a composition of primitive types (ints, strings, etc…), arrays, and objects.

While each language might handle the details differently, the idea of nested object structures has become our universal language for describing ‘things’.

The data structures we use to persist data have not evolved at the same rate. For the past 30 years the primary data structure for persistent data has been the Table – a set of Rows comprised of Columns containing scalar values (ints, strings, etc…). This is the world of the relational database, popularized in the 1980′s by its transactionality, speedy queries, space efficiency over other contemporary database systems, and a meat-eating ORCL salesforce.

The difference between the way we model things in code, via objects, and the way they are represented in persistent storage, via tables, has been the source of much difficulty for programmers. Millennia of man-effort have been put against solving the problem of changing the shape of data from the object form to the relational form and back.

Tools called Object-Relational Mapping systems (ORMs) exist for every object-oriented language in existence, and even with these tools, almost any programmer will complain that doing O/R mapping in any meaningful way is a time-consuming chore.

Ted Neward hit it spot on when he said:

“Object-Relational mapping is the Vietnam of our industry”

There were attempts made at object databases in the 90s, but there was no technology that ever became a real alternative to the relational database. The document database, and in particular MongoDB, is the first successful Web-era object store, and because of that, represents the first big linguistic innovation in persistent data structures in a very long time. Instead of flat, two-dimensional tables of records, we have collections of rich, recursive, N-dimensional objects (a.k.a. documents) for records.

An Example: the Blog Post

Consider the blog post. Most likely you would have a class / object structure for modeling blog posts in your code, but if you are using a relational database to store your blog data, each entry would be spread across a handful of tables.

As a developer you, need to get know how to convert the each ‘BlogPost’ object to and from the set of tables that house them in the relational model.

A different approach

Using MongoDB, your blog posts can be stored in a single collection, with each entry looking like this:

{
    _id: 1234,
    author: { name: "Bob Davis", email : "bob@bob.com" },
    post: "In these troubled times I like to …",
    date: { $date: "2010-07-12 13:23UTC" },
    location: [ -121.2322, 42.1223222 ],
    rating: 2.2,
    comments: [
       { user: "jgs32@hotmail.com",
         upVotes: 22,
         downVotes: 14,
         text: "Great point! I agree" },
       { user: "holly.davidson@gmail.com",
         upVotes: 421,
         downVotes: 22,
         text: "You are a moron" }
    ],
    tags: [ "Politics", "Virginia" ]
 }

With a document database your data is stored almost exactly as it is represented in your program. There is no complex mapping exercise (although one often chooses to bind objects to instances of particular classes in code).

What’s MongoDB good for?

MongoDB is great for modeling many of the entities that back most modern web-apps, either consumer or enterprise:

  • Account and user profiles: can store arrays of addresses with ease
  • CMS: the flexible schema of MongoDB is great for heterogeneous collections of content types
  • Form data: MongoDB makes it easy to evolve structure of form data over time
  • Blogs / user-generated content: can keep data with complex relationships together in one object
  • Messaging: vary message meta-data easily per message or message type without needing to maintain separate collections or schemas
  • System configuration: just a nice object graph of configuration values, which is very natural in MongoDB
  • Log data of any kind: structured log data is the future
  • Graphs: just objects and pointers – a perfect fit
  • Location based data: MongoDB understands geo-spatial coordinates and natively supports geo-spatial indexing

Looking forward: the data is the interface

There is a famous quote by Eric Raymond, in The Cathedral and the Bazaar (rephrasing an earlier quote by Fred Brooks from the famous The Mythical Man-Month):

“Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t  usually need your code; it’ll be obvious.”

Data structures embody the essence of our programs and our ideas. Therefore, as programmers, we are constantly inviting innovation in the ease with which we can define expressive data structures to model our application domain.

People often ask me why MongoDB is so wildly popular. I tell them it’s a data structure thing.

While MongoDB may have ridden onto the scene under the banner of scalability with the rest of the NoSQL database technologies,  the disproportionate success of MongoDB is largely based on its innovation as a data structure store that lets us more easily and expressively model the ‘things’ at the heart of our applications. For this reason MongoDB, or something very like it, will become the dominant database paradigm for operational data storage, with relational databases filling the role of a specialized tool.

Having the same basic data model in our code and in the database is the superior method for most use-cases, as it dramatically simplifies the task of application development, and eliminates the layers of complex mapping code that are otherwise required. While a JSON-based document database may in retrospect seem obvious (if it doesn’t yet, it will), doing it right, as the folks at 10gen have, represents a major innovation.


Published at DZone with permission of its author, Eric Genesky. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)