NoSQL Zone is brought to you in partnership with:

Enterprise Architect in HCL Technologies a $7Billion IT services organization. My role is to work as a Technology Partner for large enterprise customers providing them low cost opensource solutions around Java, Spring and vFabric stack. I am also working on various projects involving, Cloud base solution, Mobile application and Business Analytics around Spring and vFabric space. Over 23 yrs, I have build repository of technologies and tools I liked and used extensively in my day to day work. In this blog, I am putting all these best practices and tools so that it will help the people who visit my website. Krishna is a DZone MVB and is not an employee of DZone and has posted 64 posts at DZone. You can read more from them at their website. View Full User Profile

Understanding Document-Based MongoDB NoSQL

02.14.2013
| 5717 views |
  • submit to reddit

Introduction: MongoDb NoSQL

This article touches upon various aspects of RDBMS and where RDBMS hits the roadblock in an enterprise world. This article also talks about how we can overcome this road blocks by relaxing few key aspects of RDBMS. It also talks about how a NoSQL based solution fits into this and as a popular MongoDb NoSQL solution.

RDBMS related to enterprise scalabilities, performance and flexibilities

RDBMS evolved out of strong roots in math like Relational and Set theories. Some of the aspects are Schema validation, normalized data to avoid duplication, atomicity, locking, concurrency, high availability and one version of the truth.

While these aspects have lot of benefits in terms of data storage and retrieval, they can impact enterprise scalabilities, performance and flexibilities. Let us consider a typical purchase order example. In RDBMS world we will have 2 tables with one-to-many relationship as below,

Consider that we need to store huge amount of Purchase orders and we started partitioning, one of the ways to partition is to have OrderHeader table in one Db instance and LineItem information in another. And if you want to insert or Update an Order information, you need to update both the tables atomically and you need to have a transaction manager to ensure atomicity. If you want to scale this further in terms of processing and data storage, you can only increase hard disk space and RAM.

The easy way to achieve Scaling in RDBMS is Vertical Scaling

MongoDb NoSQL

Let us consider another situation, because of the change in our business we added a new column to the LineItem table called LineDesc. And imagine that this application was running in production. Once we deploy this change, we need to bring down the server and for some time to take effect this change.

Achieving enterprise scalabilities, performance and flexibility

Fundamental requirements of modern enterprise systems are,

  1. Flexibilities in terms of scaling database so that multiple instance of the database can process the information parallel
  2. Flexibilities in terms of changes to the database can be absorbed without long server downtimes
  3. Application /middle tier does not handle Object-relational impedance mismatch – Can we get away with it using techniques like JSON

Let us go back to our PurchaseOrder example and relax some of the aspects of RDBMS like normalization (avoid joins of lot of rows), atomicity and see if we can achieve some of the above objectives.

Below is an example of how we can store the PurchaseOrder (there are other better way of storing the information).

orderheader:{
orderdescription: “Krishna’s Orders”
date:"Sat Jul 24 2010 19:47:11 GMT-0700(PDT)",
lineitems:[
{linename:"pendrive", quantity:"5"}, {linename:"harddisk", quantity:"10"}
]
}

If you notice carefully, the purchase order is stored in a JSON document like structure. You also notice, we don’t need multiple tables, relationship and normalization and hence there is no need to join. And since the schema qualifiers are within the document, there is no table definition.

You can store them as collection of objects/documents. Hypothetically if we need to store several millions of PurchaseOrders, we can chunk them in groups and store them in several instances.

If you want to retrieve PurchaseOrders based on specific criteria, for example all the purchase orders in which one of the line item is a “pendrive”, we can ask all the individual instances to retrieve in “parallel” based on the same criteria and one of them can consolidate the list and return the information to the client. This is the concept of Horizontal Scaling

MongoDb NoSQL

Because the there is no separate Table schema and and the schema definition is included in the JSON object, we can change document structure and store and retrieve with just change in application layer. This does not need database restart.

Finally the object structure is JSON, we can directly present it to the web tier or mobile device and they will render it.

NoSQL is a classification of Database which is designed to keep the above aspects in mind.

MongoDb: Document based NoSQL

MongoDb NoSQL database is document based which is some of the above techniques to store and retrieve the data. There are few NoSQL databases that are Ordered Key Value based like Redis, Cassandra whichalso take these approaches but are much simpler.

If you have to give RDBMS analogy, Collection in MongoDb are similar to Tables, Document are similar to Rows. Internally MongoDb stores the information as Binary Serializable JSON objects called BSON. MongoDb support JavaScript style query syntax to retrieve BSON objects.

MongoDb NoSQL

Typical documents looks as below,

post={
author:“Hergé”,
date:new Date(),
text:“Destination Moon”,
tags:[“comic”,“adventure”] }

> db.post.save(post)

------------

>db.posts.find()  {

_id:ObjectId(" 4c4ba5c0672c685e5e8aabf3"),
author:"Hergé",
date:"Sat Jul 24 2010 19:47:11 GMT-0700(PDT)",
text:"Destination Moon",
tags:["comic","adventure"]
}

In MongoDb, atomicity is guaranteed within a document. If you have to achieve atomicity outside of the document, it has to be managed at the application level. Below is an example,

Many to many:

products:{
_id:ObjectId("10"),
name:"DestinationMoon",
category_ids:[ObjectId("20"),ObjectId("30”)]}
categories:{
_id:ObjectId("20"),
name:"adventure"}

//All products for a given category

>db.products.find({category_ids:ObjectId("20")})

//All categories for a given product
product=db.products.find(_id:some_id)

>db.categories.find({

_id:{$in:product.category_ids}
})

[feedly mini] 

In a typical stack that uses MongoDb, it makes lot of sense to use a JavaScript based framework. A good web framework, we use Express/Node.js/MongoDb stack. A good example of how to use these stack is here.

MongoDb NoSQL also supports sharding which supports parallel processing/horizontal scaling. For more details on how a typical BigData handles parallel processing/horizontal scaling, refer Rickly Ho’s link

A typical use cases for MongoDb include, Event logging, Realtime Analytics, Content Management, Ecommerce. Use cases where it is not a good fit are Transaction base Banking system, Non Realtime Data warehousing

References:








 

Published at DZone with permission of Krishna Prasad, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Ferdinando Colu... replied on Wed, 2013/02/20 - 7:10am

I think that in the enterprise world, the transaction atomicity is a must ... almost for the core part of the enterprise application.  The simpler examples are, of course, the inventory and the accounting subsystems, but any real time application handling more than one type of entity(document) in one transaction does require ACID properties. Perhaps you know a way to guarantee data integrity without that ?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.