
Mike is currently working as a senior LAMP architect for a San Francisco start-up. With over 30 years of experience in the programming field, his only comment is "I should know better by now." Micheal is a DZone MVB, is not an employee of DZone, and has posted 12 posts at DZone. You can read more at his website.

An Inquiry into a Slow MongoDB Query, and How the User Sped Things Up

02.19.2012
Why’s my mongodb query so slow?

I've got my geospatial collection set up. I'm running some really great queries, making sure the locations I'm pulling aren't in any sort of cache, and I'm just blown away by how fast the data is being returned.

The problem is: when I query the collection to pull up the requisite lon/lat data by name (city & state, or city & country), the query takes seconds to complete!

I set up the collection correctly... I indexed the crap out of all my columns... A week or two ago I was at MongoSV 2011 in Santa Clara and learned some really cool stuff about queries, indexing, and performance management, so let's dig out the notes and see where I went wrong. Because I strongly doubt the problem is in Mongo; rather, as we used to say in technical support, this is a PEBKAC issue...

The first thing I want to do is run an explain against my query so I can see Mongo's query plan. This should give me a starting point for figuring out what went wrong.

> db.geodata_geo.find({ cityName : "Anniston", stateName : "Alabama" }).explain();

By adding the trailing function, .explain(), I'm asking MongoDB to return the query plan instead of the normal result set. I hit enter to launch the explain() and get back the following output:

    > db.geodata_geo.find({ cityName : "Anniston", stateName : "Alabama" }).explain();
    {
        "cursor" : "BasicCursor",
        "nscanned" : 3691723,
        "nscannedObjects" : 3691723,
        "n" : 1,
        "millis" : 2269,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {

        }
    }

The important fields in the output above are cursor, nscanned, nscannedObjects, millis, and indexOnly. What this output is telling me is that I'm using a "BasicCursor" for my search cursor, which indicates that, yes, I am doing a table scan on the collection. So already I know my query is not optimal. But wait! More good news...

The value for nscanned and nscannedObjects is the same: 3,691,723, which is also the cardinality of the collection. This number is the number of documents scanned to satisfy the query and, given its value, confirms that I am doing a full table scan.

millis tells me how many milliseconds the query took: 2.269 seconds, which is way too slow for the back-end methods serving my REST API. Unacceptable.

And then we get to the tell: indexOnly tells me whether the query could have been resolved entirely from an (existing) covering index. Seeing the value false here, together with the BasicCursor, tells me that no index on the columns I am scanning against was used.

What?!?  I know I indexed this collection…

So, I run db.geodata_geo.getIndexes() to dump my indexes and ... I ... don't see my name columns indexed. Oh, I remembered to index the ID and Code columns... but somehow, indexing the Name columns completely slipped past my lower brain-pan.

I add these indexes to my collection:

> db.geodata_geo.ensureIndex({ cityName : 1 });
> db.geodata_geo.ensureIndex({ stateName : 1 });

 And then I rerun the query plan and see the following output:

    > db.geodata_geo.find({ cityName : "Anniston", stateName : "Alabama" }).explain();
    {
        "cursor" : "BtreeCursor cityName_1",
        "nscanned" : 2,
        "nscannedObjects" : 2,
        "n" : 1,
        "millis" : 101,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {
            "cityName" : [
                [
                    "Anniston",
                    "Anniston"
                ]
            ]
        }
    }

 Instead of BasicCursor, I see BtreeCursor which gives me a happy.  I also see that the nscanned and nscannedObjects values are now more realistic…seriously:  2 is a LOT better than 3.6 million something, right?  Another happy for me!

I score the third happy when I see that the millis has dropped down to 101:  0.101 seconds to execute this search/query!  Not jaw-dropping, I agree — but acceptable considering that everything is running off my laptop…I know production times will be much, much lower.
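
There's one more squeeze available, if it's wanted: indexOnly is still false in the new plan, because Mongo is fetching the full documents after the index lookup. If a caller only needs the name fields, projecting just the indexed fields (and excluding _id) can make the query "covered," meaning it's answered from the index alone. A sketch, assuming a compound index on both name fields exists:

    // With an index on { cityName : 1, stateName : 1 }, returning only the
    // indexed fields lets the query be satisfied from the index itself;
    // explain() should then report "indexOnly" : true
    > db.geodata_geo.find(
          { cityName : "Anniston", stateName : "Alabama" },
          { cityName : 1, stateName : 1, _id : 0 }
      ).explain();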

In the end, I learned that a simple tool like .explain() can tell me where my attention is needed when it comes to optimizing and fixing even simple, seemingly innocent queries. Knowing what you're looking at, and what you're looking for, is pretty much the thick end of the baseball bat when it comes to crushing one out of the park.

I hope this helps!

Source: http://shallop.com/2011/12/why-is-my-mongo-query-so-slow/

Published at DZone with permission of Micheal Shallop, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)