NoSQL Zone is brought to you in partnership with:

Kristina Chodorow is a core contributor to MongoDB. She has written several O'Reilly books (MongoDB: The Definitive Guide, Scaling MongoDB, and 50 Tips and Tricks for MongoDB Developers) and has given talks at conferences around the world, including OSCON, FOSDEM, Latinoware, TEK·X, and YAPC. Her Twitter handle is @kchodorow. Kristina is a DZone MVB and is not an employee of DZone and has posted 50 posts at DZone. You can read more from them at their website. View Full User Profile

An Introduction to Replication in MongoDB: Part 3

  • submit to reddit

Part 3 of the replication internals series: three handy tricks.

This is the third post in a three-part series on replication. See also parts 1 (replication internals) and 2 (getting to know your oplog).

DIY triggers

MongoDB has a type of query that behaves like the tail -f command: it shows you new data as it’s written to a collection. This is great for the oplog, where you want to see new records as they pop up and don’t want to query over and over.

If you want this type of ongoing query, MongoDB returns a tailable cursor. When this cursor gets to the end of the result set it will hang around and wait for more elements to be added to the collection. As they’re added, the cursor will return them. If no elements are added for a while, the cursor will time out and the client has to requery if they want more results.

Using your knowledge of the oplog’s format, you can use a tailable cursor to do a long pull for activities in a certain collection, of a certain type, at a certain time… almost any criteria you can imagine.

Using the oplog for crash recovery

Suppose your database goes down, but you have a fairly recent backup. You could put a backup into production, but it’ll be a bit behind. You can bring it up-to-date using your oplog entries.

If you use the trigger mechanism (described above) to capture the entire oplog and send it to a non-capped collection on another server, you can then use an oplog replayer to play the oplog over your dump, bringing it as up-to-date as possible.

Pick a time pre-dump and start replaying the oplog from there. It’s okay if you’re not sure exactly when the dump was taken because the oplog is idempotent: you can apply it to your data as many times as you want and your data will end up the same.

Also, warning: I haven’t tried out the oplog replayer I linked to, it’s just the first one I found. There are a few different ones out there and they’re pretty easy to write.

Creating non-replicated collections

The local database contains data that is local to a given server: it won’t be replicated anywhere. This is one reason why it holds all of the replication info.

local isn’t reserved for replication stuff: you can put your own data there, too. If you do a write in the local database and then check the oplog, you’ll notice that there’s no record of the write. The oplog doesn’t track changes to the local database, since they won’t be replicated.

And now I’m all replicated out. If you’re interested in learning more about replication, check out the core documentation on it. There’s also core documentation on tailable cursors and language-specific instructions in the driver documentation.

Published at DZone with permission of Kristina Chodorow, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)