
Eric is an artist and storyteller who boasts his first computing experience juggling drivers into memory to run as many DOS games as possible without rebooting. Since then he has become a creative, patient reverse engineer who operates on the basic premise that something never has to be the way that it is. Eric has a B.S. in Computer Science from the Georgia Institute of Technology and his experience in the ETL layer across a broad range of industries has instilled in him an appreciation for semantics and the belief that all of mankind's problems are related to our adherence to inaccurate representations of reality. Eric is a DZone MVB and is not an employee of DZone and has posted 8 posts at DZone.

MongoLab Presents Dex, the Index Bot

07.30.2012
Greetings adventurers! MongoLab is pleased to introduce Dex! We hope he assists you on your quests.


Dex is a MongoDB performance tuning tool, open-sourced under the MIT license, that compares logged queries to the available indexes in the queried collection(s) and generates index suggestions based on a simple rule of thumb. To use Dex, you provide a path to your MongoDB log file and a connection URI for your database. Dex is also registered with PyPI, so you can install it easily with pip.

Quick Start

pip install dex

Then run:

dex -f mongodb.log mongodb://localhost

Dex provides runtime output to STDERR as it finds recommendations:

{
  "index": "{'simpleIndexedField': 1, 'simpleUnindexedFieldThree': 1}",
  "namespace": "dex_test.test_collection",
  "shellCommand": "db.test_collection.ensureIndex({'simpleIndexedField': 1, 'simpleUnindexedFieldThree': 1}, {'background': true})"
}

Dex also reports summary statistics:

Total lines read: 7
Understood query lines: 7
Unique recommendations: 5
Lines impacted by recommendations: 5

Just copy and paste each shellCommand value into your MongoDB shell to create the suggested indexes.
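If you would rather apply a suggestion from Python than paste it into the shell, the "index" value can be turned into an ordered key list. This is a minimal sketch, assuming the "index" string is a Python-style dict literal like the one in the output above and that you use pymongo; `index_keys` and `create_recommended_index` are hypothetical helpers, not part of Dex.

```python
import ast
from typing import Any, List, Tuple


def index_keys(index_value: str) -> List[Tuple[str, int]]:
    """Turn Dex's "index" string into ordered (field, direction) pairs."""
    # The string is a Python-style dict literal; literal_eval parses it safely
    # and, on Python 3.7+, the resulting dict keeps the original key order.
    return list(ast.literal_eval(index_value).items())


def create_recommended_index(collection: Any, index_value: str) -> str:
    """Build the suggested index in the background and return its name.

    `collection` is assumed to be a pymongo Collection object.
    """
    return collection.create_index(index_keys(index_value), background=True)
```

Key order matters for compound indexes, which is why the helper returns a list of pairs rather than a plain dict. With pymongo, a call might look like `create_recommended_index(MongoClient()["dex_test"]["test_collection"], "{'simpleIndexedField': 1, 'simpleUnindexedFieldThree': 1}")`.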

Dex also writes the complete analysis to STDOUT when it’s done, so you will see this information repeated before Dex exits. The STDOUT output is entirely JSON, a machine-readable version of the above, so Dex can be part of an automated toolchain.
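Because the STDOUT report is JSON, a few lines of Python are enough to pull the suggested commands out of it. The sketch below relies only on the "shellCommand" key shown in the runtime output above; the overall layout of the STDOUT document is not assumed, so the hypothetical helper `collect_shell_commands` walks the whole parsed structure.

```python
from typing import Any, List


def collect_shell_commands(node: Any) -> List[str]:
    """Recursively gather every "shellCommand" string from parsed Dex JSON."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "shellCommand" and isinstance(value, str):
                found.append(value)
            else:
                # The layout around each recommendation is not assumed,
                # so nested dicts and lists are searched as well.
                found.extend(collect_shell_commands(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_shell_commands(item))
    return found
```

For example, you might run `dex -f mongodb.log mongodb://localhost 2>/dev/null` and pipe it into a script (hypothetical) that calls `collect_shell_commands(json.load(sys.stdin))` and prints each command.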

For more information, check out the README.md and tour the source code at https://github.com/mongolab/dex. Or, if you’re feeling extra adventurous, fiddle with the source yourself!

git clone https://github.com/mongolab/dex.git

The motivation behind Dex

MongoLab manages tens of thousands of MongoDB databases, which makes us especially sensitive to slow queries and their impact on CPU. What started as a set of operational heuristics has now been cast as an automated tool. One example heuristic: “Create indexes with this order: Exact values first. Sorted fields next. Ranged fields after that.” It’s worked very well for us, and we hope it will work as well for your own MongoDB databases, even if you never host one with us. We’ll continue to improve Dex (see Future Directions below) and would love your feedback, suggestions, and requests.

How it works

At a high level, Dex reads the MongoDB log and performs three steps:

  1. Parse the query
  2. Evaluate existing indexes against query
  3. Recommend an index (if necessary)

Step 1: Parse the query

Each query is parsed into an internal representation that classifies each query term into one of:

  • EQUIV – a standard equivalence check; e.g. {a: 1}
  • SORT – a sort/orderby clause; e.g. .sort({a: 1})
  • RANGE – a range or set check:
    Specifically: ‘$ne’, ‘$gt’, ‘$lt’, ‘$gte’, ‘$lte’, ‘$in’, ‘$nin’, ‘$all’, ‘$not’
  • UNSUPPORTED
    • Composite ($and, $or, $nor)
    • Nested “operators” not included in RANGE above.
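To make the classification concrete, here is a minimal Python sketch of these rules. It is not Dex's parser: it assumes the query filter has already been decoded into a dict and the sort clause into a list of (field, direction) pairs, and the name `classify_query` is hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Operators that the rules above classify as RANGE.
RANGE_OPS = {"$ne", "$gt", "$lt", "$gte", "$lte", "$in", "$nin", "$all", "$not"}


def classify_query(
    query: Dict[str, Any], sort: Sequence[Tuple[str, int]] = ()
) -> List[Tuple[str, str]]:
    """Return (category, field) pairs for each term of a decoded query."""
    terms: List[Tuple[str, str]] = []
    for field, value in query.items():
        if field.startswith("$"):
            # Composite operators ($and, $or, $nor); this sketch also treats
            # any other top-level operator as unsupported.
            terms.append(("UNSUPPORTED", field))
        elif isinstance(value, dict) and any(k.startswith("$") for k in value):
            # An operator expression is RANGE only if every operator is a range operator.
            category = "RANGE" if set(value) <= RANGE_OPS else "UNSUPPORTED"
            terms.append((category, field))
        else:
            # Plain values (and literal subdocuments) are equivalence checks.
            terms.append(("EQUIV", field))
    for field, _direction in sort:
        terms.append(("SORT", field))
    return terms
```

For example, `classify_query({"a": 1, "b": {"$gt": 5}}, [("c", 1)])` yields `[("EQUIV", "a"), ("RANGE", "b"), ("SORT", "c")]`, while a query using `$exists` on a field marks that field UNSUPPORTED.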

Step 2: Evaluate existing indexes against query

The query is evaluated against each index according to two criteria:
  • Coverage (none, partial, full) - a coarse summary of how many of the query's fields the index covers. None means zero fields are covered and indicates the index is not used by the query. Full means the number of fields covered equals the number of fields in the query. Partial is anything in between.
  • Order (ideal or not) - describes whether the index is partially-ordered according to Dex’s notion of ideal index order. This notion of order is:
    Equivalence ○ Sort ○ Range
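    This ordering (equivalence fields first, then sort fields, then range fields) is a synthesis of conventional indexing wisdom and a rule of thumb developed to avoid expensive MongoDB scanAndOrder operations when performing sorted range queries. If a range field precedes the sort field in an index, the matching entries are not returned in sort order and MongoDB must sort them in memory; placing the sort field before the range field lets the index deliver results already sorted.

Note: Geospatial queries and indexes are unsupported. Dex still evaluates indexes against geospatial queries, but it will not make recommendations for them; analysis continues, but a geospatial index is not considered for recommendation purposes.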
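Both criteria can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Dex's code: terms are (category, field) pairs like those from Step 1, an index is an ordered list of (field, direction) pairs, and "fields covered" is assumed to mean the leading index fields the query uses, stopping at the first index field the query does not mention (in the spirit of MongoDB's compound-index prefixes). The name `evaluate_index` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Position of each category in the ideal order: Equivalence, Sort, Range.
RANK = {"EQUIV": 0, "SORT": 1, "RANGE": 2}


def evaluate_index(
    terms: Sequence[Tuple[str, str]], index: Sequence[Tuple[str, int]]
) -> Tuple[str, bool]:
    """Return (coverage, ideal_order) for one existing index against one query."""
    rank: Dict[str, int] = {}
    for category, field in terms:
        if category not in RANK:
            continue  # UNSUPPORTED terms are handled in Step 3.
        # A field that is both sorted and ranged keeps its earlier position.
        rank[field] = min(rank.get(field, RANK[category]), RANK[category])

    # Assumption: count leading index fields the query uses,
    # stopping at the first index field the query does not mention.
    covered: List[str] = []
    for field, _direction in index:
        if field not in rank:
            break
        covered.append(field)

    if not covered:
        coverage = "none"
    elif len(covered) == len(rank):
        coverage = "full"
    else:
        coverage = "partial"

    # Ideal order: categories never move backwards along the covered prefix.
    ranks = [rank[field] for field in covered]
    ideal_order = all(a <= b for a, b in zip(ranks, ranks[1:]))
    return coverage, ideal_order
```

With a query `{a: 1, b: {$gt: 5}}` sorted on `c`, the index `[a, c, b]` evaluates to `("full", True)`, while `[a, b, c]` gives `("full", False)` because the range field `b` precedes the sort field `c`.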

Step 3: Recommend an index (if necessary)

Once evaluation is complete, Dex considers an index ideal if it has Coverage ‘full’ and Ideal Order true. If no existing index meets these conditions, and the query itself contains no UNSUPPORTED terms, Dex recommends an ideal index (with an index order of 1, i.e. ascending, for all fields).
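Here is Step 3 as a hedged Python sketch rather than Dex's implementation. The hypothetical `recommend` function takes the query's namespace, its classified terms, and the (coverage, ideal_order) result for each existing index, and returns a suggestion whose keys mirror the runtime output shown earlier; the exact string formatting is an approximation of that output.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Ideal index order: equivalence fields, then sort fields, then range fields.
ORDER = ("EQUIV", "SORT", "RANGE")


def recommend(
    namespace: str,
    terms: Sequence[Tuple[str, str]],
    evaluations: Sequence[Tuple[str, bool]],
) -> Optional[Dict[str, str]]:
    """Return a Dex-style suggestion for one query, or None if none is needed."""
    # Queries with UNSUPPORTED terms get no recommendation.
    if any(category == "UNSUPPORTED" for category, _field in terms):
        return None
    # An existing index with full coverage and ideal order already does the job.
    if any(coverage == "full" and ideal for coverage, ideal in evaluations):
        return None

    fields: List[str] = []
    for wanted in ORDER:
        for category, field in terms:
            if category == wanted and field not in fields:
                fields.append(field)
    if not fields:
        return None

    # Every field gets index order 1 (ascending), as described above.
    index = "{" + ", ".join(f"'{field}': 1" for field in fields) + "}"
    collection = namespace.split(".", 1)[1]
    return {
        "index": index,
        "namespace": namespace,
        "shellCommand": f"db.{collection}.ensureIndex({index}, {{'background': true}})",
    }
```

For terms `[("RANGE", "b"), ("SORT", "c"), ("EQUIV", "a")]` with no useful existing index, the suggested index is `{'a': 1, 'c': 1, 'b': 1}`: the same exact-values, sorted, ranged ordering as the heuristic quoted earlier.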

Note: Dex does not actually need to look at existing indexes in order to suggest the ideal index for a given query. But Dex does examine existing indexes as a courtesy to users who already have them in place: both to provide analysis of indexes that only partially cover a query (in verbose mode), and to avoid suggesting indexes that already exist.

Future Directions

  • Line Parsers: Better coverage of log lines, with a goal of complete coverage of all indexable queries
  • Analyze the system.profile collection with -p option
  • Constrain analysis by a time range with -t option
  • Add Dex’s own “SLOW_MS” to narrow results if desired
  • Support geospatial queries
  • Improved recommendation caching, storing queries by mask and summing:
    • Number of like queries
    • Time consumed
    • Min/max time range
    • Min/max nscanned/nreturned
  • Improved recommendations:
    • Combine like recommendations (or generate recommendations from multiple like queries)
    • Measure cardinality (yay aggregation framework) in the collection to inform recommended index key order.
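The recommendation-caching idea above can be illustrated with a small sketch. Everything here is hypothetical and not part of Dex: `query_mask` replaces literal values with a placeholder so that like queries share a key, and `MaskStats` sums the statistics listed. "Min/max time range" is read here as the minimum and maximum execution time in milliseconds, which is only one interpretation.

```python
import json
import math
from dataclasses import dataclass
from typing import Any, Dict


def query_mask(value: Any) -> Any:
    """Replace literal values with a placeholder, keeping field names and operators."""
    if isinstance(value, dict):
        return {key: query_mask(val) for key, val in value.items()}
    if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
        # Lists of sub-queries (e.g. inside $or) keep their shape.
        return [query_mask(v) for v in value]
    # Scalars and value lists (e.g. the argument of $in) collapse to one placeholder.
    return "<val>"


@dataclass
class MaskStats:
    """Running totals for all queries that share one mask."""
    count: int = 0
    total_millis: float = 0.0
    min_millis: float = math.inf
    max_millis: float = -math.inf
    min_nscanned: float = math.inf
    max_nscanned: float = -math.inf
    min_nreturned: float = math.inf
    max_nreturned: float = -math.inf

    def add(self, millis: float, nscanned: int, nreturned: int) -> None:
        self.count += 1
        self.total_millis += millis
        self.min_millis = min(self.min_millis, millis)
        self.max_millis = max(self.max_millis, millis)
        self.min_nscanned = min(self.min_nscanned, nscanned)
        self.max_nscanned = max(self.max_nscanned, nscanned)
        self.min_nreturned = min(self.min_nreturned, nreturned)
        self.max_nreturned = max(self.max_nreturned, nreturned)


def record(cache: Dict[str, MaskStats], query: Dict[str, Any],
           millis: float, nscanned: int, nreturned: int) -> str:
    """Add one logged query to the cache and return its mask key."""
    # sort_keys lets {a, b} and {b, a} share a key, since filter field order
    # does not change which index is ideal.
    key = json.dumps(query_mask(query), sort_keys=True)
    cache.setdefault(key, MaskStats()).add(millis, nscanned, nreturned)
    return key
```

Under this scheme, `{a: 1, b: {$gt: 5}}` and `{b: {$gt: 9}, a: 7}` land in the same bucket, so one recommendation could be reported together with the total time its like queries consumed.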

Conclusion

We’re really excited about Dex at MongoLab and look forward to the many improvements that are possible in the MongoDB automation space. I’m presenting Dex at the June MongoDB San Francisco User’s Group today, June 19, 2012. If you can make it on such short notice, check out the details here. We will follow up with a link to presentation slides a short time later.

Finally, for those interested in the indexing knowledge we’ve accumulated as Dex was built, check out my Cardinal $ins blog here.

As always, good luck out there!

Published at DZone with permission of Eric Sedor, author and DZone MVB.
