Daniel Doubrovkine (aka dB.) is one of the tallest engineers at Art.sy. He founded and exited a successful Swiss start-up in the 90s, worked for Microsoft Corp. in Redmond, specializing in security and authentication, dabbled in large scale social networking and ran a big team that developed an expensive Enterprise product in NYC. After turning open-source cheerleader a few years ago in the worlds of C++, Java and .NET, he converted himself to Ruby and has been slowly unlearning everything he learned in the last 15 years of software practice. Daniel has posted 46 posts at DZone.

A Tutorial on Paging and Iterating Over Large MongoDB Collections

03.29.2012

Sometimes you need to iterate over a large MongoDB collection. The biggest issue is that, by default, cursors time out after 10 minutes of inactivity. For very large collections, it’s not uncommon for processing to take longer than that, and you get an exception halfway through the iteration. A cursor is a server-side construct, so how about a client-side cursor?

Here’s a Mongo Ruby iterator that will call Mongo::Collection.find in increments.

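    module Mongo
      class Collection
        # Iterates over every document matching +query+, fetching +by+
        # documents per query so that no single cursor stays open for long.
        def find_all(query = {}, by = 1000, &block)
          idx = 0 # number of documents yielded so far, used as the skip offset
          # count(true) takes skip and limit into account, so it is 0 past the end.
          while ((results = find(query, { :limit => by, :skip => idx })) && results.count(true) > 0)
            results.each do |result|
              yield result
              idx += 1
            end
          end
          self
        end
      end
    end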
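
To see the paging pattern in isolation, here is a minimal sketch that runs without a MongoDB server. `FakeCollection` is a hypothetical stand-in and not part of the Mongo driver: its `find` ignores the query, only mimics the `:limit` and `:skip` options, and counts how many queries are issued.

```ruby
# Hypothetical in-memory stand-in for Mongo::Collection (not a driver class).
# Its find ignores the query and only honors :limit and :skip.
class FakeCollection
  attr_reader :queries

  def initialize(docs)
    @docs = docs
    @queries = 0
  end

  def find(query = {}, opts = {})
    @queries += 1
    skip = opts[:skip] || 0
    limit = opts[:limit] || @docs.size
    @docs[skip, limit] || []
  end

  # Paging loop modeled on find_all: fetch a page, yield each document,
  # advance the offset, and stop at the first empty page.
  def find_all(query = {}, by = 1000)
    idx = 0
    while (results = find(query, { :limit => by, :skip => idx })) && !results.empty?
      results.each do |result|
        yield result
        idx += 1
      end
    end
    self
  end
end

docs = (1..25).map { |i| { "_id" => i } }
collection = FakeCollection.new(docs)

seen = []
collection.find_all({}, 10) { |doc| seen << doc["_id"] }

puts seen.size          # 25 documents visited
puts collection.queries # 4 queries: three non-empty pages plus one empty page
```

Note the extra query at the end: the loop only learns it is done when a page comes back empty.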

And here’s a Mongoid equivalent, built into Mongoid::Criteria.

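    module Mongoid
      class Criteria
        # Iterates over the criteria in pages of +by+ documents, honoring any
        # limit already set on the criteria.
        def each_by(by = 1000, &block)
          idx = 0 # number of documents yielded so far, used as the skip offset
          set_limit = options[:limit]
          while ((results = clone.limit(by).skip(idx)) && results.any?)
            results.each do |result|
              # Stop once the criteria's own limit has been reached.
              return self if set_limit && idx >= set_limit
              yield result
              idx += 1
            end
          end
          self
        end
      end
    end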
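
The interaction between the page size and a limit already set on the criteria is easy to get wrong, so here is a small sketch of it that runs without Mongoid. `FakeCriteria` is a hypothetical stand-in, not a Mongoid class; it only mimics `clone`, `limit`, `skip`, `options` and `each` over an in-memory array.

```ruby
# Hypothetical in-memory stand-in for Mongoid::Criteria (not a Mongoid class).
class FakeCriteria
  include Enumerable
  attr_reader :options

  def initialize(docs, options = {})
    @docs = docs
    @options = options
  end

  # Copy the options so chained calls do not modify the original criteria.
  def clone
    FakeCriteria.new(@docs, @options.dup)
  end

  def limit(n)
    @options[:limit] = n
    self
  end

  def skip(n)
    @options[:skip] = n
    self
  end

  def each(&block)
    start = @options[:skip] || 0
    count = @options[:limit] || @docs.size
    (@docs[start, count] || []).each(&block)
  end

  # Paging loop modeled on each_by; it stops once the number of yielded
  # documents reaches the limit that was set on the criteria itself.
  def each_by(by = 1000)
    idx = 0
    set_limit = options[:limit]
    while (results = clone.limit(by).skip(idx)) && results.any?
      results.each do |result|
        return self if set_limit && idx >= set_limit
        yield result
        idx += 1
      end
    end
    self
  end
end

limited = []
FakeCriteria.new((1..20).to_a).limit(7).each_by(3) { |doc| limited << doc }
puts limited.inspect # [1, 2, 3, 4, 5, 6, 7]
```

The third page holds three documents, but only the first of them is yielded before the overall limit of 7 ends the iteration.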

Of course, you must be careful that the collection doesn’t change during the iteration. If an item is added or removed before your current position, you will skip elements or process some elements twice.
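
The hazard is easy to reproduce without a database. The following sketch pages through a plain Ruby array with the same skip and limit arithmetic and deletes an already-visited element after the first page. The array and the timing of the delete are illustrative assumptions standing in for a concurrent write to the collection.

```ruby
# Pages through a plain Array with skip/limit semantics, removing one
# already-processed element partway through to mimic a concurrent delete.
docs = (1..10).to_a
by = 3
idx = 0
seen = []

loop do
  page = docs[idx, by] || []
  break if page.empty?
  page.each do |doc|
    seen << doc
    idx += 1
  end
  # Simulate another client deleting a document that was already visited.
  docs.delete(1) if idx == by
end

missed = (1..10).to_a - seen
puts "seen:   #{seen.inspect}"   # seen:   [1, 2, 3, 5, 6, 7, 8, 9, 10]
puts "missed: #{missed.inspect}" # missed: [4]
```

Deleting document 1 shifts every later document one position to the left, so the next page starts one element too late and document 4 is never processed. An insert before the current position has the opposite effect and causes a document to be processed twice.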

 

 

Published at DZone with permission of its author, Daniel Doubrovkine. (source)
