Big Data/Analytics Zone is brought to you in partnership with:

John Cook is an applied mathematician working in Houston, Texas. His career has been a blend of research, software development, consulting, and management. John is a DZone MVB and is not an employee of DZone and has posted 175 posts at DZone. You can read more from them at their website. View Full User Profile

The Rise and Fall of the Third Normal Form

11.27.2012
| 4605 views |
  • submit to reddit

The ideas for relational databases were worked out in the 1970s and the first commercial implementations appeared around 1980. By the 1990’s relational databases were the dominant way to store data. There were some non-relational databases in use, but these were not popular. Hierarchical databases seemed quaint, clinging to pre-relational approaches that had been deemed inferior by the march of progress. Object databases just seemed weird.

Now the tables have turned. Relational databases are still dominant, but all the cool kids are using NoSQL databases. A few years ago the implicit assumption was that nearly all data lived in a relational database, now you hear statements such as “Relational databases are still the best approach for some kinds of projects,” implying that such projects are a small minority. Some of the praise for NoSQL databases is hype, but some is deserved: there are numerous successful applications using NoSQL databases because they need to, not because they want to be cool.

So why the shift to NoSQL, and why now? I’m not an expert in this area, but I’ll repeat some of the explanations that I’ve heard that sound plausible.

  1. Relational databases were designed in an era of small, expensive storage media. They were designed to conserve a resource that is now abundant. Non-relational databases may be less conservative with storage.
  2. Relational databases were designed for usage scenarios that not all applications follow. In particular, they were designed to make writing easier than reading. But its not uncommon for a web application to do 10,000 reads for every write.
  3. The scalability of relational database transactions is fundamentally limited by Brewer’s CAP theorem. It says that you can’t have consistency, availability and partition tolerance all in one distributed system. You have to pick two out of three.
  4. Part of the justification for relational databases was that multiple applications can share the same data by going directly to the same tables. Now applications share data through APIs rather through tables. With n-tier architecture, applications don’t access their own data directly through tables, much less another application’s data.

The object oriented worldview of most application developers maps more easily to NoSQL databases than to relational databases. But in the past, this objection was brushed off.  A manager might say “I don’t care if it takes a lot of work for you to map your objects to tables. Data must be stored in tables.”

And why must data be stored in tables? One reason would be consistency, but #3 above says you’ll have to relax your ideas of consistency if you want availability and partition tolerance. Another reason would be in order to share data with other applications, but #4 explains why that isn’t necessary. Still another reason would be that the relational model has a theoretical basis. But so do NoSQL databases, or as Erik Meijer calls them, CoSQL databases.

I’m not an advocate of SQL or NoSQL databases. Each has its place. A few years ago developers assumed that nearly all data was best stored in a relational database. That was a mistake, just as it would be a mistake now to assume that all data should now move to a non-relational database.

Published at DZone with permission of John Cook, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)