Software developer specializing in MongoDB, Python, Tornado, and JavaScript, with particular interests in real-time web and new tools that get the job done with grace and alacrity. A. Jesse Jiryu Davis is a DZone MVB, is not an employee of DZone, and has posted 67 posts at DZone.

Four Strategies for Maintainability with PyMongo

07.20.2012

When I started writing Motor, my async driver for Tornado and MongoDB, my main concern was maintainability. I want 100% feature-parity with the official driver, PyMongo. And I don't just want it now: I want to easily maintain that completeness in the future, forever.

Maintainability is a struggle for the Tornado version of any Python library. There's always the gold-standard implementation of some library written in the standard blocking fashion, and then there's a midget cousin written for Tornado, which starts small and never seems to grow up. For example, Python ships with a SimpleXMLRPCServer which is fairly complete. If you're using Tornado, however, you have to use Tornado-RPC. It hasn't been touched in two years, and it has severe deficiencies, e.g. it doesn't work with tornado.gen.

Gevent solves the maintainability problem by monkey-patching existing libraries to make them async. When the library code changes, the monkey-patching still works with the new version. Node.js, on the other hand, is a space where no synchronous libraries exist. The best implementation of any library for Node is already the async version.

But Tornado libraries are always playing catch-up with a more complete synchronous library, and usually not playing it very well.

With Motor, I've done the best job I can think of to get caught up with PyMongo and stay caught up. I have 4 strategies:

1. Reuse PyMongo. I use a cute technique with greenlets to reuse most of PyMongo's code and make it async. I've written up this method previously.

2. Directly test Motor. As with any library, thorough tests catch regressions, and it's particularly important with Motor because it could break when PyMongo changes. Testing async code is a bit painful; I've written both callback-style tests using my assertEventuallyEqual method, and generator-style tests using my async_test_engine decorator. If the underlying PyMongo code changes and breaks Motor, I'll know immediately.

3. Reuse PyMongo's tests. Just as Motor wraps PyMongo and makes it async, I've written another wrapper that makes Motor synchronous again, so Motor looks just like PyMongo. This wrapper is called Synchro. For each async Motor method, Synchro wraps it like:

import tornado.ioloop

class Collection(object):
    """Synchro's fake Collection, which wraps MotorCollection, which
       wraps the real PyMongo Collection.
    """
    def find_one(self, *args, **kwargs):
        loop = tornado.ioloop.IOLoop.instance()
        outcome = {}

        def callback(result, error):
            # Runs inside the loop once Motor has a result or an error
            loop.stop()
            outcome['result'] = result
            outcome['error'] = error

        kwargs['callback'] = callback
        self.motor_collection.find_one(*args, **kwargs)
        # Block here until the callback stops the loop
        loop.start()

        # Now the callback has been run and has stopped the loop
        if outcome['error']:
            raise outcome['error']
        else:
            return outcome['result']

(In the actual code I also add a timeout to the loop so an error doesn't risk hanging my tests.)
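To make the wrap-and-timeout idea concrete, here is a minimal sketch of a generic wrapper that turns any callback-style method into a blocking one and gives up after a deadline. It is not Synchro's actual code. So that it runs without Tornado, it uses a hypothetical `ToyLoop` class in place of the real IOLoop, and `synchronize`, `FakeAsyncCollection`, and the `timeout` parameter are names invented for illustration. With Tornado itself, the analogous scheduling calls would presumably be `IOLoop.add_timeout` and `IOLoop.remove_timeout`.

```python
import heapq
import itertools
import time


class ToyLoop:
    """Hypothetical, minimal stand-in for an event loop (not Tornado's IOLoop)."""

    def __init__(self):
        self._timers = []              # heap of (deadline, timer_id, function)
        self._ids = itertools.count()  # unique ids keep heap tuples comparable
        self._cancelled = set()
        self._running = False

    def call_later(self, delay, fn):
        timer_id = next(self._ids)
        heapq.heappush(self._timers, (time.monotonic() + delay, timer_id, fn))
        return timer_id

    def cancel(self, timer_id):
        self._cancelled.add(timer_id)

    def stop(self):
        self._running = False

    def start(self):
        # Run scheduled functions in deadline order until someone calls stop()
        self._running = True
        while self._running and self._timers:
            deadline, timer_id, fn = heapq.heappop(self._timers)
            if timer_id in self._cancelled:
                self._cancelled.discard(timer_id)
                continue
            time.sleep(max(0.0, deadline - time.monotonic()))
            fn()


def synchronize(loop, async_method, timeout=5.0):
    """Return a blocking version of a method that takes callback(result, error)."""
    def blocking(*args, **kwargs):
        outcome = {}

        def callback(result, error):
            outcome['result'] = result
            outcome['error'] = error
            loop.stop()

        def on_timeout():
            # The callback never arrived: report an error instead of hanging
            outcome['error'] = TimeoutError(
                'no callback within %s seconds' % timeout)
            loop.stop()

        kwargs['callback'] = callback
        timer_id = loop.call_later(timeout, on_timeout)
        async_method(*args, **kwargs)
        loop.start()
        loop.cancel(timer_id)  # harmless if the timeout already fired

        if outcome.get('error'):
            raise outcome['error']
        return outcome.get('result')
    return blocking


class FakeAsyncCollection:
    """Hypothetical async collection, used only to exercise the wrapper."""

    def __init__(self, loop):
        self.loop = loop

    def find_one(self, spec, callback):
        # Deliver the result from inside the loop, as an async driver would
        self.loop.call_later(0, lambda: callback({'_id': 1, 'spec': spec}, None))

    def never_calls_back(self, callback):
        pass  # simulates a bug that would otherwise hang the test run
```

Calling `synchronize(loop, coll.find_one)({'x': 1})` returns the document as if the method were blocking, while `synchronize(loop, coll.never_calls_back, timeout=0.05)()` raises `TimeoutError` rather than hanging.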

What does this craziness buy me? I can run most of PyMongo's tests, about 350 of them, against Synchro. Since Synchro passes these tests, I'm confident Motor isn't missing any features without my knowledge. So, for example, we're adding an aggregate method to PyMongo in its next release, and we'll add a test to PyMongo's suite that exercises aggregate. That test will fail against Synchro, since Synchro uses Motor and Motor doesn't have aggregate yet. The Synchro tests fail promptly, and I can simply add a line to Motor saying, "asynchronize aggregate, too."
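The payoff of a shared test suite can be shown with a toy example. This is only a sketch: the two collection classes and the check functions are hypothetical stand-ins, not PyMongo's tests or Synchro's code. The same checks run against a reference implementation and against a wrapper that has not caught up yet, and the report points straight at the missing method.

```python
class ReferenceCollection:
    """Hypothetical stand-in for the blocking driver's collection."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        self._docs.append(doc)

    def find_one(self):
        return self._docs[0] if self._docs else None

    def aggregate(self, pipeline):
        # Toy behavior: ignore the pipeline and return every document
        return list(self._docs)


class WrappedCollection:
    """Hypothetical stand-in for the wrapper; aggregate is not wired up yet."""

    def __init__(self):
        self._inner = ReferenceCollection()

    def insert(self, doc):
        return self._inner.insert(doc)

    def find_one(self):
        return self._inner.find_one()


def check_find_one(make_collection):
    coll = make_collection()
    coll.insert({'_id': 1})
    assert coll.find_one() == {'_id': 1}


def check_aggregate(make_collection):
    coll = make_collection()
    coll.insert({'_id': 1})
    assert coll.aggregate([]) == [{'_id': 1}]


SHARED_CHECKS = [check_find_one, check_aggregate]


def run_shared_checks(make_collection):
    """Run every shared check; map each check name to None or its exception."""
    report = {}
    for check in SHARED_CHECKS:
        try:
            check(make_collection)
            report[check.__name__] = None
        except Exception as exc:
            report[check.__name__] = exc
    return report
```

Here `run_shared_checks(ReferenceCollection)` reports no problems, while `run_shared_checks(WrappedCollection)` records an `AttributeError` under `check_aggregate`, which is the signal to go wrap the new method.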

4. Reuse PyMongo's documentation. Every Motor method takes the same parameters and has the same behavior as the PyMongo method it wraps, except it's async and takes a callback. I could just copy and paste PyMongo's docs and add the callback parameter to each method, but then when PyMongo's docs change Motor will fall behind. Instead, I wrote a Sphinx extension. For each method in Motor, the extension finds the analogous PyMongo documentation and adds the callback parameter. For example, the MotorCollection API docs are largely generated from PyMongo's Collection docs.

Published at DZone with permission of A. Jesse Jiryu Davis, author and DZone MVB. (source)
