Thursday, May 26, 2011

MongoDB Tutorial/Reference Essentials

The following is a quick tutorial / reference to get started with the MongoDB database.

We’ll start with a brief definition, and then go through the following topics with quick examples:

Installing
Inserting
Querying
Updating
Deleting
Indexes and Explain
Map Reducing
Drivers
Distributing


MongoDB is a document-oriented database designed to handle very large amounts of data with good performance, and to keep up with the growing data volumes that modern applications have to deal with, particularly when “in the cloud”. MongoDB also aims to keep some of the great characteristics of relational databases, such as the ability to execute dynamic queries, so that you don't lose that flexibility.

Installing:


Installing MongoDB for testing is really easy: just go to http://www.mongodb.org/downloads and download the appropriate version (this reference uses Linux and Mac OS X).

After downloading, gunzip and untar the file, and that’s it: MongoDB is installed.
To start the mongo server, go to the extracted directory, change into its bin directory and execute ./mongod (this assumes you have a /data/db directory on your system; if you don’t, create one first).


Inserting:


MongoDB works with Documents. In Mongo, a Document is stored as BSON, a binary representation of a JSON object. You can simply think of a Document as a JSON object; the binary format is just the way Mongo represents the object internally.

In Mongo, every Document must belong to a Collection (a Collection can be thought of as a table in an RDBMS, but only as an aid to understanding, because they are different things), and every Collection must belong to a Database.

So let’s say we want to insert a car Document into a cars Collection that belongs to the concesionary Database. We would do the following:

- From the bin directory, and with the server started, we execute ./mongo to open the interactive shell. The mongo interactive shell allows us to interact with the database server using JavaScript.

- Next, we switch to our concesionary database (even though the database doesn’t exist yet, this command will work):

use concesionary

- Next, we insert our new car into the cars collection (again, the collection doesn’t exist yet, but it, along with the concesionary database, will be created when the first element is inserted):

db.cars.insert({maker:'ferrari',model:'f50',acceleration:{speed100:3,speed200:9},colors:['white','black']});

As you can see, we are inserting a new car, which is basically a JSON object (including simple types, subdocuments and arrays).

Let’s insert another car to use in the next section on querying:

db.cars.insert({maker:'fiat',model:'500',acceleration:{speed100:10,speed200:'NEVER'},colors:['blue','red']});
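
To make the "created on first insert" behaviour more concrete, here is a minimal in-memory sketch in plain JavaScript. It is only an illustration, not how MongoDB is implemented: the stores map, the insert helper and the counter-based _id are hypothetical names invented for this example, and the real server generates ObjectId values rather than a counter.

```javascript
// Toy model: database name -> collection name -> array of documents.
// Every name here is an illustrative assumption, not a MongoDB API.
const stores = new Map();
let nextId = 1; // stand-in for MongoDB's generated ObjectId values

function insert(dbName, collName, doc) {
  // Create the database and the collection the first time they are used.
  if (!stores.has(dbName)) stores.set(dbName, new Map());
  const db = stores.get(dbName);
  if (!db.has(collName)) db.set(collName, []);
  // Store a shallow copy; the generated _id is kept unless doc supplies one.
  const stored = Object.assign({ _id: nextId++ }, doc);
  db.get(collName).push(stored);
  return stored;
}

const ferrari = insert('concesionary', 'cars', {
  maker: 'ferrari', model: 'f50',
  acceleration: { speed100: 3, speed200: 9 },
  colors: ['white', 'black']
});
console.log(stores.get('concesionary').get('cars').length); // 1
console.log(ferrari._id); // 1
```

Nothing had to be declared up front: the database and the collection appear as a side effect of the first insert, which is the behaviour the shell commands above rely on.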

Querying:


MongoDB gives you a lot of flexibility in querying, very close to what you can do with SQL. You can use lots of filters, comparisons, etc. We'll just run a couple of basic queries here, to get your feet wet.

In general, you query MongoDB by calling the find method on the collection and passing a JSON document with the criteria you want to match (or nothing to return every document):

db.cars.find()

{ "_id" : ObjectId("4dde4d1c6eb878af72075592"), "maker" : "ferrari", "model" : "f50", "acceleration" : { "speed100" : 3, "speed200" : 9 }, "colors" : [ "white", "black" ] }

{ "_id" : ObjectId("4dde4f3d6eb878af72075593"), "maker" : "fiat", "model" : "500", "acceleration" : { "speed100" : 10, "speed200" : "NEVER" }, "colors" : [ "blue", "red" ] }



db.cars.find({maker:'ferrari'});


{ "_id" : ObjectId("4dde4d1c6eb878af72075592"), "maker" : "ferrari", "model" : "f50", "acceleration" : { "speed100" : 3, "speed200" : 9 }, "colors" : [ "white", "black" ] }

db.cars.find({'acceleration.speed200':'NEVER'});

{ "_id" : ObjectId("4dde4f3d6eb878af72075593"), "maker" : "fiat", "model" : "500", "acceleration" : { "speed100" : 10, "speed200" : "NEVER" }, "colors" : [ "blue", "red" ] }

db.cars.find({'colors':'white'});

{ "_id" : ObjectId("4dde4d1c6eb878af72075592"), "maker" : "ferrari", "model" : "f50", "acceleration" : { "speed100" : 3, "speed200" : 9 }, "colors" : [ "white", "black" ] }
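
The three filters above follow a few simple rules: plain fields are compared for equality, a dotted key reaches into a subdocument, and an array field matches when any of its elements equals the value. The following plain JavaScript sketch mimics just those rules. The getPath, matches and find helpers are hypothetical names for this example, and the real query engine supports far more than this.

```javascript
// Simplified equality matcher: a sketch of the matching rules used in the
// queries above, not MongoDB's actual query engine.
function getPath(doc, path) {
  // Walk a dotted path such as 'acceleration.speed200'.
  return path.split('.').reduce(
    (value, key) => (value == null ? undefined : value[key]), doc);
}

function matches(doc, filter) {
  return Object.keys(filter).every((path) => {
    const actual = getPath(doc, path);
    const wanted = filter[path];
    // An array field matches when any of its elements equals the value.
    return Array.isArray(actual) ? actual.includes(wanted) : actual === wanted;
  });
}

const carDocs = [
  { maker: 'ferrari', model: 'f50', acceleration: { speed100: 3, speed200: 9 }, colors: ['white', 'black'] },
  { maker: 'fiat', model: '500', acceleration: { speed100: 10, speed200: 'NEVER' }, colors: ['blue', 'red'] }
];

const find = (filter) => carDocs.filter((car) => matches(car, filter));

console.log(find({}).length); // 2
console.log(find({ 'acceleration.speed200': 'NEVER' })[0].maker); // fiat
console.log(find({ colors: 'white' })[0].maker); // ferrari
```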

Updating:


Basic updating is pretty straightforward. It needs a filter document, like find, and a second parameter indicating how to modify the document:

db.cars.update({model:'f50'},{$set:{model:'f40'}});

db.cars.find({maker:'ferrari'});

{ "_id" : ObjectId("4dde54b56eb878af72075594"), "maker" : "ferrari", "model" : "f40", "acceleration" : { "speed100" : 3, "speed200" : 9 }, "colors" : [ "white", "black" ] }
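
The point of $set is that it changes only the fields you name and leaves the rest of the document alone. Here is a rough sketch of that behaviour on plain JavaScript objects. The applySet and updateFirst helpers are hypothetical names for this example, the new speed100 value is arbitrary, and updating only the first match mirrors the shell's default behaviour for update when no extra options are passed.

```javascript
// Sketch of $set semantics on plain objects (an illustration, not MongoDB code).
function applySet(doc, fields) {
  for (const [path, value] of Object.entries(fields)) {
    const keys = path.split('.');
    const last = keys.pop();
    // Walk (and create if needed) the nested objects named by a dotted path.
    let target = doc;
    for (const key of keys) {
      if (typeof target[key] !== 'object' || target[key] === null) target[key] = {};
      target = target[key];
    }
    target[last] = value;
  }
  return doc;
}

// Hypothetical helper: update only the first document whose fields equal the filter.
function updateFirst(docs, filter, fields) {
  const doc = docs.find((d) => Object.keys(filter).every((k) => d[k] === filter[k]));
  if (doc) applySet(doc, fields);
  return doc;
}

const garage = [
  { maker: 'ferrari', model: 'f50', acceleration: { speed100: 3, speed200: 9 } }
];
updateFirst(garage, { model: 'f50' }, { model: 'f40', 'acceleration.speed100': 4 });
console.log(JSON.stringify(garage[0]));
// {"maker":"ferrari","model":"f40","acceleration":{"speed100":4,"speed200":9}}
```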

Deleting:


Even more straightforward than updating, deleting just requires the filter document (or nothing if you want to delete all the documents in the collection):

db.cars.remove({maker:'ferrari'});

db.cars.find()
{ "_id" : ObjectId("4dde4f3d6eb878af72075593"), "maker" : "fiat", "model" : "500", "acceleration" : { "speed100" : 10, "speed200" : "NEVER" }, "colors" : [ "blue", "red" ] }
db.cars.remove()
db.cars.find()
>

Creating indexes, and query explain:


Indexes, as in any other database, are extremely important in MongoDB, and it is extremely important to get them right. They work as you would expect and, applied correctly, can dramatically speed up your queries. You can create compound indexes as well. Here we will cover just the basics once again.

Let’s insert our two cars again:

db.cars.insert({maker:'fiat',model:'500',acceleration:{speed100:10,speed200:'NEVER'},colors:['blue','red']});

db.cars.insert({maker:'ferrari',model:'f50',acceleration:{speed100:3,speed200:9},colors:['white','black']});

MongoDB automatically creates an index on the _id property of its documents. We can list the existing indexes like this:

db.system.indexes.find()

{ "name" : "_id_", "ns" : "concesionary.cars", "key" : { "_id" : 1 }, "v" : 0 }

Our application will probably run a lot of queries by car maker, so we will add an index on the maker property like this:

db.cars.ensureIndex({maker: 1})

Now when we list the existing indexes again, we see our new index:

{ "name" : "_id_", "ns" : "concesionary.cars", "key" : { "_id" : 1 }, "v" : 0 }
{ "_id" : ObjectId("4dde57fe6eb878af72075597"), "ns" : "concesionary.cars", "key" : { "maker" : 1 }, "name" : "maker_1", "v" : 0 }


So how can we see whether a query is using our index? Simple enough: we use the explain method. Before doing that, though, let’s remove the index and run explain without it.

db.runCommand({deleteIndexes: "cars", index: "maker_1"})

db.system.indexes.find()
{ "name" : "_id_", "ns" : "concesionary.cars", "key" : { "_id" : 1 }, "v" : 0 }


We removed the index, now let’s see explain in action:

db.cars.find({maker:'ferrari'}).explain()

{
    "cursor" : "BasicCursor",
    "nscanned" : 2,
    "nscannedObjects" : 2,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {

    }
}


The main things to look at when running explain (for the purposes of our discussion) are the type of cursor, and the nscanned and n attributes.

The “BasicCursor” cursor simply scans through the whole collection to get the query results, nscanned is the total number of documents scanned, and n is the total number of documents returned. In an ideal world, n and nscanned should be the same.

Now let’s create the index again and rerun the explain for the query:

db.cars.ensureIndex({maker: 1})
db.cars.find({maker:'ferrari'}).explain()

{
    "cursor" : "BtreeCursor maker_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "maker" : [
            [
                "ferrari",
                "ferrari"
            ]
        ]
    }
}



We can see the difference in the results. We are now using the index (indicated by the cursor property), and the nscanned and n properties have the same value: we are scanning only the elements we return.
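
To see why the two explain outputs differ, here is a small plain JavaScript sketch that counts how many documents each strategy examines. It is only an illustration under a loud assumption: a Map stands in for MongoDB's B-tree index, and fullScan, buildIndex and indexedLookup are hypothetical helper names for this example.

```javascript
// Sketch: counting examined documents with and without an index.
// A Map stands in for MongoDB's B-tree index; this is an assumption made
// for brevity, not how the server stores indexes.
const fleet = [
  { maker: 'fiat', model: '500' },
  { maker: 'ferrari', model: 'f50' }
];

function fullScan(docs, maker) {
  let nscanned = 0;
  const results = docs.filter((doc) => {
    nscanned += 1; // every document is examined
    return doc.maker === maker;
  });
  return { n: results.length, nscanned };
}

function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

function indexedLookup(index, maker) {
  const results = index.get(maker) || [];
  // Only the entries stored under the requested key are touched.
  return { n: results.length, nscanned: results.length };
}

const makerIndex = buildIndex(fleet, 'maker');
console.log(fullScan(fleet, 'ferrari'));           // { n: 1, nscanned: 2 }
console.log(indexedLookup(makerIndex, 'ferrari')); // { n: 1, nscanned: 1 }
```

With two documents the saving is trivial, but the full scan grows with the size of the collection while the indexed lookup only grows with the number of matches.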


Map Reducing:


Apart from the common grouping operations allowed by MongoDB (like sum, max, etc.), we can use map reduce for finer-grained and more customized grouping requirements, and it is built into MongoDB. (For an explanation of map reduce see: http://cscarioni.blogspot.com/2010/11/hadoop-basics.html). Here we show an extremely simple map reduce:

We simply want to count all the cars per maker.
First we insert a new fiat into the collection (do that yourself).
Then we define our map reduce in a single call like this:

db.cars.mapReduce(
    function(){
        emit(this.maker,{total:1});
    },
    function(key,values){
        var total = 0;
        values.forEach(function(value){
            total += value.total;
        });
        return {total:total};
    },
    "result"
)


When run, we get this:

{
"result" : "result",
"timeMillis" : 3,
"counts" : {
"input" : 3,
"emit" : 3,
"output" : 2
},
"ok" : 1,
}



and the result of the count is in the new result collection:

db.result.find()
{ "_id" : "ferrari", "value" : { "total" : 1 } }
{ "_id" : "fiat", "value" : { "total" : 2 } }



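As we can see, the mapReduce method receives a map function, a reduce function, and normally the name of the collection in which to store the results. The reduce function should return a value with the same shape as the values emitted by map, because reduce is not called for keys with a single emitted value and may be called again on its own output.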
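
If you want to experiment with the flow without a running server, the following plain JavaScript sketch simulates it. It is a simplification and not the server's implementation: simulateMapReduce, mapFn and reduceFn are hypothetical names, and emit is passed to the map function as an argument here, whereas in the mongo shell it is available globally. The sketch uses the same shape for the emitted values and the reduce result.

```javascript
// Plain JavaScript simulation of the map/reduce flow (a sketch only).
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map();
  // emit(key, value) collects the values per key.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // As in MongoDB, `this` inside map is the current document.
  for (const doc of docs) map.call(doc, emit);

  const output = [];
  for (const [key, values] of groups) {
    // Keys with a single emitted value skip reduce entirely.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    output.push({ _id: key, value });
  }
  return output;
}

const mapFn = function (emit) { emit(this.maker, { total: 1 }); };
const reduceFn = function (key, values) {
  let total = 0;
  values.forEach((value) => { total += value.total; });
  return { total }; // same shape as the emitted values
};

const counted = simulateMapReduce(
  [{ maker: 'ferrari' }, { maker: 'fiat' }, { maker: 'fiat' }], mapFn, reduceFn);
console.log(JSON.stringify(counted));
// [{"_id":"ferrari","value":{"total":1}},{"_id":"fiat","value":{"total":2}}]

// Feeding a reduce result back into reduce still works because the shapes agree.
console.log(reduceFn('fiat', [reduceFn('fiat', [{ total: 1 }, { total: 1 }]), { total: 1 }]));
// { total: 3 }
```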


Drivers:


In this section we aren’t going to say a lot: simply that MongoDB drivers already exist for the most common programming languages out there, that they all work in much the same way (taking into account the advantages and limitations of each language), and that they are pretty easy to start experimenting with.



Distributing:


One of the most important characteristics of MongoDB is its support for distribution, from Replica Sets to Sharding. I’ll cover that in an upcoming article. For now, suffice it to say that the model is really powerful: replica sets provide automatic failover, and sharding transparently distributes data chunks across the sharded cluster.