
Eric is the Editorial Manager at DZone, Inc. Feel free to contact him at egenesky@dzone.com.

How Tumblr Evolved Towards JVM-Centric Development

02.15.2012
According to a recent interview, Tumblr's transformation from small startup to blogging beast has led it to adopt a JVM-centric approach to development. Todd Hoff interviewed Blake Matheny, a Distributed Systems Engineer at Tumblr, who provided some useful insight into how the company has coped with its wild success. Whether you like it or not, Tumblr is the second most popular social networking site in terms of time spent by users and is currently growing by 30% per month. At approximately 500 million page views per day, Tumblr has had to make some major adjustments, including maintaining a team of almost 20 engineers to deal with the hurdles of massive scaling.

Tumblr started on Rackspace in 2007, but quickly outgrew the space available through the IT hosting company.  They began with an open-source solution stack, and primarily developed with PHP - for a while, nearly every engineer at Tumblr programmed in PHP.  In the past, Tumblr's status as a startup kept them tied to a "squeeze everything out of a single server" approach, according to Matheny, but they have since moved on to bigger and better things.  

Perhaps the most surprising change at the development level has been a conversion to a JVM-centric approach in order to increase efficiency of hiring and development.  One aspect of this new JVM-centric approach has been the adoption of the Twitter library Finagle, a network stack that allows for the creation of asynchronous RPC clients and servers in any JVM-hosted language.  According to Hoff, this choice was made over node.js because the Tumblr team believed node.js wasn't "developed enough to have standards and best practices."  
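To make the idea of an asynchronous RPC call concrete, here is a minimal sketch in plain Java. It is not Finagle's API and not Tumblr's code: it uses only the standard library's `CompletableFuture`, and the class name `AsyncCallSketch`, the `ECHO_SERVICE` function, and the `responseLength` helper are hypothetical names chosen for illustration. The assumption is simply that a service can be modeled as a function from a request to a future response, which the caller transforms without blocking.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical illustration of a non-blocking service call; this is not Finagle's API. */
final class AsyncCallSketch {

    /** A "service" modeled as a function from a request to a future response. */
    static final Function<String, CompletableFuture<String>> ECHO_SERVICE =
        request -> CompletableFuture.supplyAsync(() -> "echo:" + request);

    /** Calls the service and transforms the response without blocking the caller. */
    static CompletableFuture<Integer> responseLength(String request) {
        return ECHO_SERVICE.apply(request).thenApply(String::length);
    }
}
```

A caller would chain further steps onto the returned future, or call `join()` only at the edge of the program where a blocking wait is acceptable.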

At the same time, there has been a shift to non-relational data stores like HBase and Redis, although HBase has been used for "smaller less critical path projects" because the team claimed that it could not bank on HBase over MySQL. It sounds like Tumblr is adamant about the effectiveness of MySQL sharding, as they have not adopted MongoDB despite its popularity in New York (their location). Instead, Tumblr maintains that MySQL can "scale just fine." Regarding Redis, the team currently has 22 Redis servers, with hundreds of Redis instances being used in production.
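The interview summary does not say how Tumblr assigns data to shards, so the following is only a sketch of the general idea behind sharding: each record is routed to one of several independent MySQL servers by a deterministic function of its key. The `ShardRouter` class, the modulo placement rule, the use of a blog ID as the shard key, and the connection strings are all assumptions made for illustration.

```java
import java.util.List;

/** Hypothetical router that maps a blog ID to one of several MySQL shards. */
final class ShardRouter {
    private final List<String> shardUrls;

    ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardUrls = List.copyOf(shardUrls);
    }

    /** Index of the shard that owns this blog ID (simple modulo placement, an assumption). */
    int shardIndexFor(long blogId) {
        // floorMod keeps the result non-negative even for negative IDs.
        return (int) Math.floorMod(blogId, (long) shardUrls.size());
    }

    /** Connection string of the shard that owns this blog ID. */
    String shardUrlFor(long blogId) {
        return shardUrls.get(shardIndexFor(blogId));
    }
}
```

With three shards, for example, `new ShardRouter(List.of("jdbc:mysql://shard0/blogs", "jdbc:mysql://shard1/blogs", "jdbc:mysql://shard2/blogs")).shardUrlFor(7L)` selects the second entry, because 7 mod 3 is 1. Plain modulo placement makes adding shards expensive, since most keys move; real deployments often use a lookup table or range mapping instead.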

For a startup that began just five years ago, Tumblr has had to deal with some big changes in its development philosophy. At the outset, says Matheny, developers were encouraged to "use any tool that they wanted," but over time and with growth they realized that this just wouldn't work. Tumblr has since standardized on a stack in order to address production issues, and implemented a lightweight, Scrum-like process. The long road to change at Tumblr has left Matheny with some lessons learned that may be applicable to other companies meeting similar challenges. Here are some of those lessons, as recorded by Hoff:

-  Automation everywhere.
-  MySQL (plus sharding) scales, apps don't.
-  Redis is amazing.
-  Scala apps perform fantastically.
-  Scrap projects when you aren’t sure if they will work.
-  Build around the skills of your team.
-  Read papers and blog posts. Key design ideas like the cell architecture and selective materialization were taken from elsewhere.
-  Wade, don’t jump into technologies. They took pains to learn HBase and Redis before putting them into production by using them in pilot projects or in roles where the damage would be limited.

You can read more details of Tumblr's evolution at the High Scalability blog.
Published at DZone with permission of its author, Eric Genesky.
