As a passionate software developer, motivated by learning and applying innovative and interesting software development tools, techniques and methodologies, my professional objectives are the following: to be in a technology-oriented enterprise where the technical staff is the soul of the company; to be in an important IT team; to be able to design and develop state-of-the-art software; to apply new knowledge every day, in innovative ways and with a great degree of freedom; to architect, design and develop software that uses the best practices of the field; and to play with the latest technologies, learn every day and participate in the research and innovation of the software products. Carlo is a DZone MVB and is not an employee of DZone and has posted 15 posts at DZone.

Essential Reference for MongoDB

03.05.2012

The following is a quick tutorial / reference to help you start using the MongoDB database.

We’ll start with a quick definition, and then go into the following topics, using a simple example: installing, inserting, querying, updating, deleting, indexes and explain, MapReducing, drivers, and distributing.


MongoDB is a document-oriented database built to handle very large amounts of data with good performance, and to keep up with the growing volumes of data that modern applications have to deal with, particularly when “in the cloud”. MongoDB is also intended to keep some of the great characteristics of relational databases, such as the ability to execute dynamic queries, so that you don't lose this flexibility.

 

Installing:

Installing MongoDB for our tests is really easy: just go to http://www.mongodb.org/downloads and download the version for your platform (this reference uses Linux and Mac OS X).

After downloading, gunzip and untar the file and that’s it. It is installed.

To start the mongo server, go to the untarred directory, then to its bin directory, and execute ./mongod. (This assumes you have a /data/db directory on your system. If you don’t, create one.)

 

Inserting:

MongoDB works with Documents. In Mongo, a Document is simply a JSON object stored in a binary representation called BSON. You can simply think of a Document as a JSON object; the 'binary' part is just the way Mongo represents this object internally.

In Mongo, every document must belong to a Collection. (A Collection can be thought of as a table in an RDBMS, but only as a temporary aid to understanding. MongoDB Collections and RDBMS tables are very different things.) Every Collection belongs to a Database.

So let’s say we want to insert a Car Document into a Cars Collection that belongs to the Concesionary Database. We would do the following:

  • From the bin directory, and with the server started, we execute ./mongo to open the interactive Shell. The interactive shell of Mongo allows us to interact with the database server using JavaScript.
  • Next, we switch to our concesionary Database (even though the database doesn’t exist yet, this command will work):
use concesionary
  • Next, we insert our new Car in the Cars Collection (again, the Collection doesn’t exist yet, but it (and the concesionary Database) will get created when you insert the first element):

 

db.cars.insert({maker:'ferrari',model:'f50',acceleration:{speed100:3,speed200:9},colors:['white','black']});

As you can see, we are inserting a new Car, which is basically a JSON object (including simple types, subdocument types, and arrays).
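To make the "created on first insert" behaviour a bit more concrete, here is a small plain-JavaScript sketch. It is only a toy in-memory model, not how MongoDB is implemented: ToyServer, use and insert are hypothetical names invented for this illustration.

```javascript
// Toy in-memory model, NOT MongoDB's implementation: it only mirrors the
// "created on first insert" behaviour described above.
class ToyServer {
  constructor() {
    // database name -> Map(collection name -> array of documents)
    this.databases = new Map();
  }

  // Like `use <name>`: selecting a database does not create it.
  use(dbName) {
    const server = this;
    return {
      insert(collectionName, doc) {
        // The database and the collection are created lazily, on first insert.
        if (!server.databases.has(dbName)) {
          server.databases.set(dbName, new Map());
        }
        const db = server.databases.get(dbName);
        if (!db.has(collectionName)) {
          db.set(collectionName, []);
        }
        db.get(collectionName).push(doc);
      },
    };
  }
}

const server = new ToyServer();
const db = server.use('concesionary');
const existsBeforeInsert = server.databases.has('concesionary');

db.insert('cars', {
  maker: 'ferrari',                           // simple type
  model: 'f50',
  acceleration: { speed100: 3, speed200: 9 }, // subdocument
  colors: ['white', 'black'],                 // array
});
const existsAfterInsert = server.databases.has('concesionary');

console.log(existsBeforeInsert, existsAfterInsert);
```

Selecting the database only hands back a handle; nothing is stored until the first document arrives.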

Let’s insert another Car to use in the next section (on querying):

db.cars.insert({maker:'fiat',model:'500',acceleration:{speed100:10,speed200:'NEVER'},colors:['blue','red']});

 

Querying:

MongoDB allows you a lot of flexibility in querying, very close to what you can do with SQL. You can use lots of filters, comparisons, etc. We'll just run a few basic queries here, to get your feet wet.

In general you query MongoDB by calling the find method on the collection, and passing a JSON document with the selections you want to query on:

db.cars.find()

{ "_id" : ObjectId("4dde4d1c6eb878af72075592"), "maker" : "ferrari", "model" : "f50", "acceleration" : { "speed100" : 3, "speed200" : 9 }, "colors" : [ "white", "black" ] }

{ "_id" : ObjectId("4dde4f3d6eb878af72075593"), "maker" : "fiat", "model" : "500", "acceleration" : { "speed100" : 10, "speed200" : "NEVER" }, "colors" : [ "blue", "red" ] }


db.cars.find({maker:'ferrari'});

{ "_id" : ObjectId("4dde4d1c6eb878af72075592"), "maker" : "ferrari", "model" : "f50", "acceleration" : { "speed100" : 3, "speed200" : 9 }, "colors" : [ "white", "black" ] }

db.cars.find({'acceleration.speed200':'NEVER'});

{ "_id" : ObjectId("4dde4f3d6eb878af72075593"), "maker" : "fiat", "model" : "500", "acceleration" : { "speed100" : 10, "speed200" : "NEVER" }, "colors" : [ "blue", "red" ] }

db.cars.find({'colors':'white'});

{ "_id" : ObjectId("4dde4d1c6eb878af72075592"), "maker" : "ferrari", "model" : "f50", "acceleration" : { "speed100" : 3, "speed200" : 9 }, "colors" : [ "white", "black" ] }
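The three kinds of filter used above (plain equality, dot notation into a subdocument, and matching a value inside an array) can be imitated in a few lines of plain JavaScript. This is only a rough sketch of the matching rules for these simple cases; resolvePath, matches and find are hypothetical helper names, and MongoDB's real query engine supports far more.

```javascript
// Illustrative matcher only; it covers just the simple filters shown above.
const cars = [
  { maker: 'ferrari', model: 'f50', acceleration: { speed100: 3, speed200: 9 }, colors: ['white', 'black'] },
  { maker: 'fiat', model: '500', acceleration: { speed100: 10, speed200: 'NEVER' }, colors: ['blue', 'red'] },
];

// Resolve a dotted path such as 'acceleration.speed200' inside a document.
function resolvePath(doc, path) {
  return path.split('.').reduce(
    (value, key) => (value === undefined || value === null ? undefined : value[key]),
    doc
  );
}

// A document matches when every field in the filter equals the stored value,
// or, for array fields, when the array contains the requested value.
function matches(doc, filter) {
  return Object.entries(filter).every(([path, expected]) => {
    const actual = resolvePath(doc, path);
    return Array.isArray(actual) ? actual.includes(expected) : actual === expected;
  });
}

// An empty filter matches every document, like db.cars.find().
function find(collection, filter = {}) {
  return collection.filter((doc) => matches(doc, filter));
}

console.log(find(cars, { maker: 'ferrari' }).map((car) => car.model));
console.log(find(cars, { 'acceleration.speed200': 'NEVER' }).map((car) => car.model));
console.log(find(cars, { colors: 'white' }).map((car) => car.model));
```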

 

Updating:

Basic updating is pretty straightforward. It needs a filter document, like find, and a parameter indicating how to modify the document:

db.cars.update({model:'f50'},{$set:{model:'f40'}});

db.cars.find({maker:'ferrari'});

{ "_id" : ObjectId("4dde54b56eb878af72075594"), "maker" : "ferrari", "model" : "f40", "acceleration" : { "speed100" : 3, "speed200" : 9 }, "colors" : [ "white", "black" ] }
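As a rough mental model, $set overwrites only the fields it names and leaves the rest of the document alone, and a plain update call like the one above modifies a single matching document. The plain-JavaScript sketch below imitates that for top-level fields only; updateFirst is a hypothetical helper written for this illustration, not part of MongoDB.

```javascript
// Sketch of $set semantics for top-level fields only (an assumption made to
// keep the example short; MongoDB also accepts dotted paths and more operators).
function updateFirst(collection, filter, modifier) {
  // Find the first document whose fields equal the filter values.
  const target = collection.find((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value)
  );
  if (target === undefined) return 0; // nothing matched
  Object.assign(target, modifier.$set); // overwrite only the listed fields
  return 1; // number of modified documents
}

const cars = [
  { maker: 'ferrari', model: 'f50', colors: ['white', 'black'] },
  { maker: 'fiat', model: '500', colors: ['blue', 'red'] },
];

const modified = updateFirst(cars, { model: 'f50' }, { $set: { model: 'f40' } });
console.log(modified, cars[0]);
```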

 

Deleting:

Deleting is even more straightforward than updating. It just requires the document filter (or nothing, if you want to delete all the documents in the collection):

db.cars.remove({maker:'ferrari'});

db.cars.find()
{ "_id" : ObjectId("4dde4f3d6eb878af72075593"), "maker" : "fiat", "model" : "500", "acceleration" : { "speed100" : 10, "speed200" : "NEVER" }, "colors" : [ "blue", "red" ] }
db.cars.remove()
db.cars.find()
>

 

Creating indexes, and query explain:

Indexes, as in any other database, are extremely important in MongoDB, and extremely important to get right. They work as you might expect and, if applied correctly, can dramatically improve the performance of your queries. You can create compound indexes as well. Here we will once again only touch on the basics.

Let’s insert our two cars again:

db.cars.insert({maker:'fiat',model:'500',acceleration:{speed100:10,speed200:'NEVER'},colors:['blue','red']});

db.cars.insert({maker:'ferrari',model:'f50',acceleration:{speed100:3,speed200:9},colors:['white','black']});

MongoDB automatically creates an index for the _id property of its documents. We can query the existing indexes like this:

db.system.indexes.find()

{ "name" : "_id_", "ns" : "concesionary.cars", "key" : { "_id" : 1 }, "v" : 0 }

Our application will probably run a lot of queries by car maker, so we will add an index on the maker property like this:

db.cars.ensureIndex({maker: 1})

Now when we query for the existing indexes we get our new index:

{ "name" : "_id_", "ns" : "concesionary.cars", "key" : { "_id" : 1 }, "v" : 0 }
{ "_id" : ObjectId("4dde57fe6eb878af72075597"), "ns" : "concesionary.cars", "key" : { "maker" : 1 }, "name" : "maker_1", "v" : 0 }

So how can we see if some query is using our index? Simple enough: we use the explain method to do so. But before doing that, let’s remove the index and run explain without it.

db.runCommand({deleteIndexes: "cars", index: "maker_1"})

db.system.indexes.find()
{ "name" : "_id_", "ns" : "concesionary.cars", "key" : { "_id" : 1 }, "v" : 0 }

We removed the index; now let’s see explain in action:

db.cars.find({maker:'ferrari'}).explain()

{
"cursor" : "BasicCursor",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}

The main things to take a look at when running explain (for the purposes of our discussion) are the cursor type and the nscanned and n attributes.

The cursor “BasicCursor” is simply a cursor that scans through the whole collection to get the query results. nscanned is the total number of documents scanned, and n is the total number of documents returned. In an ideal world, n and nscanned should be the same.

Now let’s create the index again and rerun the explain for the query:

db.cars.ensureIndex({maker: 1})
db.cars.find({maker:'ferrari'}).explain()
{
"cursor" : "BtreeCursor maker_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"maker" : [
[
"ferrari",
"ferrari"
]
]
}
}


We can see the different results. We are using the index (indicated by the cursor property) and the nscanned and n properties have the same value. We are just scanning the elements we are returning.
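The difference between the two explain outputs can be imitated with a tiny plain-JavaScript sketch. It is a simplification under stated assumptions: findByScan, buildMakerIndex and findByIndex are hypothetical helpers, and the toy index is a Map, which only handles equality lookups, whereas a real MongoDB index is a B-tree.

```javascript
const cars = [
  { maker: 'fiat', model: '500' },
  { maker: 'ferrari', model: 'f50' },
];

// Full scan: every document is examined, like the BasicCursor case.
function findByScan(collection, maker) {
  let nscanned = 0;
  const results = collection.filter((doc) => {
    nscanned += 1;
    return doc.maker === maker;
  });
  return { nscanned, n: results.length };
}

// Toy index: maker -> documents. A real MongoDB index is a B-tree, which also
// supports range queries and sorting; a Map only covers equality lookups.
function buildMakerIndex(collection) {
  const index = new Map();
  for (const doc of collection) {
    if (!index.has(doc.maker)) index.set(doc.maker, []);
    index.get(doc.maker).push(doc);
  }
  return index;
}

// Index lookup: only the documents stored under the key are touched.
function findByIndex(index, maker) {
  const results = index.get(maker) || [];
  return { nscanned: results.length, n: results.length };
}

const scanStats = findByScan(cars, 'ferrari');
const indexStats = findByIndex(buildMakerIndex(cars), 'ferrari');
console.log(scanStats, indexStats);
```

With only two documents the saving is a single document, but the scan cost grows with the size of the collection while the indexed lookup only touches the matches.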


Map Reducing:

Apart from the common grouping operations allowed by MongoDB (like sum, max, etc.), we can use MapReduce for more fine-grained and customized grouping requirements, and it is built into MongoDB. (For an explanation of MapReduce see: http://cscarioni.blogspot.com/2010/11/hadoop-basics.html.) Here we show an extremely simple MapReduce:

Let's say we want to simply count all the cars per maker.

First, we insert a new Fiat into the collection (do that yourself).

Then we define our MapReduce in a single call, like this:

db.cars.mapReduce(
  function(){
    emit(this.maker,{total:1});
  },
  function(key,values){
    var total = 0;
    values.forEach(function(value){
      total += value.total;
    });
    return {total:total};
  },
  "result"
)

Note that map emits values with the same shape that reduce returns ({total: ...}). MongoDB does not call reduce for a key that has only one emitted value, and it may feed the output of reduce back into reduce, so the two shapes must match.

When we run it, we get this:

{
"result" : "result",
"timeMillis" : 3,
"counts" : {
"input" : 3,
"emit" : 3,
"output" : 2
},
"ok" : 1
}

and the result of the counting is in the new result collection:

db.result.find()
{ "_id" : "ferrari", "value" : { "total" : 1 } }
{ "_id" : "fiat", "value" : { "total" : 2 } }

As we can see, the MapReduce method receives a map function, a reduce function, and normally the name of the collection to store the results.
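If you want to see the map, group and reduce steps without a server, the flow can be imitated in plain JavaScript. This is a sketch under assumptions, not MongoDB's implementation: the mapReduce helper below is hypothetical, emit is passed to map as an argument instead of being a global, and the second Fiat's model ('punto') is made up for the example. The sketch emits values with the same shape that reduce returns, because reduce is skipped for keys that have a single value.

```javascript
// Plain-JavaScript imitation of the map -> group -> reduce flow.
function mapReduce(collection, map, reduce) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };

  // MongoDB binds each document to `this` inside map; passing emit as an
  // argument is an assumption of this sketch.
  for (const doc of collection) map.call(doc, emit);

  const result = {};
  for (const [key, values] of emitted) {
    // As in MongoDB, reduce is skipped when a key has a single value.
    result[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return result;
}

const cars = [
  { maker: 'ferrari', model: 'f50' },
  { maker: 'fiat', model: '500' },
  { maker: 'fiat', model: 'punto' }, // hypothetical third car
];

const countsPerMaker = mapReduce(
  cars,
  function (emit) {
    emit(this.maker, { total: 1 });
  },
  function (key, values) {
    let total = 0;
    values.forEach((value) => {
      total += value.total;
    });
    return { total };
  }
);

console.log(countsPerMaker);
```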

 

Drivers:

In this section we aren’t going to say a lot. Simply put: MongoDB drivers already exist for the most common programming languages out there. They all work in much the same way (taking into account the advantages and limitations of each programming language) and they are pretty easy to start experimenting with.

 

Distributing:

One of the most important characteristics of MongoDB is its support for distribution, from creating Replica Sets to Sharding. I’ll cover that soon, in another article. For now, let's just say that the sharding model is really powerful and allows for transparent failover, and transparent sharding and distribution of data chunks across the sharded cluster.

 


Published at DZone with permission of Carlo Scarioni, author and DZone MVB. (source)
