NoSQL Zone is brought to you in partnership with:

Kristina Chodorow is a core contributor to MongoDB. She has written several O'Reilly books (MongoDB: The Definitive Guide, Scaling MongoDB, and 50 Tips and Tricks for MongoDB Developers) and has given talks at conferences around the world, including OSCON, FOSDEM, Latinoware, TEK·X, and YAPC. Her Twitter handle is @kchodorow. Kristina is a DZone MVB and is not an employee of DZone and has posted 52 posts at DZone. You can read more from them at their website. View Full User Profile

How MongoDB's Journaling Works

10.07.2012
| 3624 views |
  • submit to reddit

I was working on a section on the gooey innards of journaling for The Definitive Guide, but then I realized it’s an implementation detail that most people won’t care about. However, I had all of these nice diagrams just laying around.

Good idea, Patrick!

So, how does journaling work? Your disk has your data files and your journal files, which we’ll represent like this:

When you start up mongod, it maps your data files to a shared view. Basically, the operating system says: “Okay, your data file is 2,000 bytes on disk. I’ll map that to memory address 1,000,000-1,002,000. So, if you read the memory at memory address 1,000,042, you’ll be getting the 42nd byte of the file.” (Also, the data won’t necessary be loaded until you actually access that memory.)

This memory is still backed by the file: if you make changes in memory, the operating system will flush these changes to the underlying file. This is basically how mongod works without journaling: it asks the operating system to flush in-memory changes every 60 seconds.

However, with journaling, mongod makes a second mapping, this one to a private view. Incidentally, this is why enabling journalling doubles the amount of virtual memory mongod uses.

Note that the private view is not connected to the data file, so the operating system cannot flush any changes from the private view to disk.

Now, when you do a write, mongod writes this to the private view.

mongod will then write this change to the journal file, creating a little description of which bytes in which file changed.

The journal appends each change description it gets.

At this point, the write is safe. If mongod crashes, the journal can replay the change, even though it hasn’t made it to the data file yet.

The journal will then replay this change on the shared view.

Finally, at a glacial speed compared to everything else, the shared view will be flushed to disk. By default, mongod requests that the OS do this every 60 seconds.

The last step is that mongod remaps the shared view to the private view. This prevents the private view from getting too “dirty” (having too many changes from the shared view it was mapped from).

And that’s how journaling works. Thanks to Richard, who gave the best explanation of this I’ve heard (Richard is going to be teaching an online course on MongoDB this fall, if you’re interested in more wisdom from the source).

 

 

 

Published at DZone with permission of Kristina Chodorow, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Stefan Krinkel replied on Tue, 2013/03/05 - 8:57am

"I’ll map that to memory address 1,000,000-1,002,000. So, if you read the memory at memory address 1,000,042, you’ll be getting the 42nd byte of the file.” Technically, you'd get the 43rd byte of the file at that address...

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.