
Eric is a former teacher and future time traveler who has returned to the present to work at DZone - the coolest site on earth. He likes reading and curating about NoSQL and Cloud development, and is always excited to see something new, shiny, and distracting. In his free time, Eric likes knowing more about movies than you do, and looking forward to when the unified Chinese-American Corpocracy will develop the technology needed for transport to the year 2056. Eric is a DZone Zone Leader and has posted 149 posts at DZone.

How Tumblr Evolved Towards JVM-Centric Development

02.15.2012
This article is part of the DZone NoSQL Resource Portal, which is brought to you in collaboration with Neo Technology and DataStax. Visit the NoSQL Resource Portal for additional tutorials, videos, opinions, and other resources on this topic.
According to a recent interview, Tumblr's transformation from small startup to blogging beast has led it to adopt a JVM-centric approach to development. Todd Hoff interviewed Blake Matheny, a Distributed Systems Engineer at Tumblr, who provided some useful insight into how the company has coped with its wild success. Whether you like it or not, Tumblr is the second most popular social networking site in terms of time spent by users, and it is currently growing by 30% per month. At approximately 500 million page views per day, Tumblr has had to make some major adjustments, including maintaining a team of almost 20 engineers to deal with the hurdles of massive scaling.

Tumblr started on Rackspace in 2007, but quickly outgrew what that hosting company could provide. It began with an open-source solution stack and developed primarily in PHP; for a while, nearly every engineer at Tumblr programmed in PHP. As a startup, Tumblr was tied to a "squeeze everything out of a single server" approach, according to Matheny, but it has since moved on to bigger and better things.

Perhaps the most surprising change at the development level has been the move to a JVM-centric approach, made to improve the efficiency of both hiring and development. One part of this approach has been the adoption of Twitter's Finagle library, a network stack for building asynchronous RPC clients and servers in any JVM-hosted language. According to Hoff, Finagle was chosen over Node.js because the Tumblr team believed Node.js wasn't "developed enough to have standards and best practices."
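Finagle's documentation describes its core idea as treating a service as an asynchronous function from request to response, with cross-cutting behavior such as timeouts and retries layered on as composable filters. The minimal sketch below illustrates that idea in Python with asyncio so that it stays self-contained; it is not Finagle's API, and `with_timeout`, `with_retries`, and `lookup_post` are hypothetical names chosen for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# A "service" is simply an async function from a request to a response.
# (Finagle models this as Service[Req, Rep]; this alias is our own stand-in.)
Service = Callable[[str], Awaitable[str]]


def with_timeout(seconds: float, service: Service) -> Service:
    """Filter: fail a call that takes longer than `seconds` instead of hanging."""
    async def wrapped(req: str) -> str:
        return await asyncio.wait_for(service(req), timeout=seconds)
    return wrapped


def with_retries(attempts: int, service: Service) -> Service:
    """Filter: try a call up to `attempts` times before giving up."""
    async def wrapped(req: str) -> str:
        last_error: Optional[Exception] = None
        for _ in range(attempts):
            try:
                return await service(req)
            except (ConnectionError, asyncio.TimeoutError) as err:
                last_error = err
        assert last_error is not None
        raise last_error
    return wrapped


call_count = 0


async def lookup_post(post_id: str) -> str:
    """Hypothetical backend: the very first call fails, later calls succeed."""
    global call_count
    call_count += 1
    attempt = call_count          # capture before yielding to other tasks
    await asyncio.sleep(0.01)     # simulated network latency
    if attempt == 1:
        raise ConnectionError("transient failure")
    return f"post:{post_id}"


async def main() -> list:
    # Filters wrap the service; the client still looks like a plain async function.
    client = with_retries(3, with_timeout(0.5, lookup_post))
    # Fire three requests concurrently; none of them blocks the others.
    return await asyncio.gather(*(client(str(i)) for i in range(3)))


results = asyncio.run(main())
print(results)  # ['post:0', 'post:1', 'post:2']
```

Because each filter takes a service and returns a service, policies can be stacked or reordered without touching the backend code, which is the property that makes this style attractive for RPC-heavy systems.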

At the same time, there has been a shift toward non-relational data stores like HBase and Redis, although HBase has been limited to "smaller less critical path projects" because the team said it could not bank on HBase in place of MySQL. It sounds like Tumblr is adamant about the effectiveness of MySQL sharding: it has not adopted MongoDB despite its popularity in New York, where Tumblr is based, and instead maintains that MySQL can "scale just fine." As for Redis, the team currently runs 22 Redis servers, with hundreds of Redis instances in production.
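The article does not say how Tumblr maps data onto its MySQL shards. As a generic illustration of what sharding means, the sketch below routes each user to one of a fixed set of database shards with a stable hash, so all of one user's rows live on the same server. The shard names, `shard_for`, `group_by_shard`, and the hash-modulo scheme are all assumptions for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical shard hosts; the article does not describe Tumblr's real layout.
SHARDS: List[str] = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(user_id: int, shards: List[str] = SHARDS) -> str:
    """Pick a shard for a user via a stable hash, so a user's rows always land together."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def group_by_shard(user_ids: Iterable[int]) -> Dict[str, List[int]]:
    """Batch a multi-user lookup into one query per shard instead of one per user."""
    batches: Dict[str, List[int]] = defaultdict(list)
    for uid in user_ids:
        batches[shard_for(uid)].append(uid)
    return dict(batches)


if __name__ == "__main__":
    print(shard_for(42))              # the same shard on every run
    print(group_by_shard(range(10)))  # user ids bucketed by shard
```

Hash-modulo routing is the simplest scheme, but adding a shard remaps most users; production systems often use a directory (lookup) table or consistent hashing so that shards can be added with less data movement.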

For a startup that began just five years ago, Tumblr has had to deal with some big changes in its development philosophy. At the outset, says Matheny, developers were encouraged to "use any tool that they wanted," but with growth the team realized that this just wouldn't work. Tumblr has since standardized on a stack to address production issues and implemented a lightweight, Scrum-like process. The long road to change at Tumblr has left Matheny with some lessons that may apply to other companies facing similar challenges. Here are some of them, as recorded by Hoff:

-  Automation everywhere.
-  MySQL (plus sharding) scales, apps don't.
-  Redis is amazing.
-  Scala apps perform fantastically.
-  Scrap projects when you aren’t sure if they will work.
-  Build around the skills of your team.
-  Read papers and blog posts. Key design ideas like the cell architecture and selective materialization were taken from elsewhere.
-  Wade, don’t jump into technologies. They took pains to learn HBase and Redis before putting them into production by using them in pilot projects or in roles where the damage would be limited.
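Selective materialization is only named in the list above. In general terms, it means deciding, for each follower/followed pair, whether to push (materialize) a new post into the follower's precomputed feed at write time or to pull it when the follower reads, picking whichever is cheaper given how often one side posts and the other reads. The sketch below shows that cost comparison under assumed unit costs; `FollowEdge`, `should_materialize`, `plan`, and the rates are hypothetical, and this is not Tumblr's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FollowEdge:
    """One follower/followed pair; rates are hypothetical events per hour."""
    follower: str
    blog: str
    blog_post_rate: float        # how often the followed blog posts
    follower_read_rate: float    # how often the follower loads their dashboard


def should_materialize(edge: FollowEdge, push_cost: float = 1.0, pull_cost: float = 1.0) -> bool:
    """Push this edge if writing on every post costs less than reading on every load."""
    return edge.blog_post_rate * push_cost < edge.follower_read_rate * pull_cost


def plan(edges: List[FollowEdge]) -> Tuple[List[FollowEdge], List[FollowEdge]]:
    """Split edges into ones to push at write time and ones to pull at read time."""
    push = [e for e in edges if should_materialize(e)]
    pull = [e for e in edges if not should_materialize(e)]
    return push, pull


edges = [
    FollowEdge("alice", "quiet-blog", blog_post_rate=0.1, follower_read_rate=5.0),
    FollowEdge("bob", "firehose-blog", blog_post_rate=30.0, follower_read_rate=0.5),
]
push, pull = plan(edges)
print([e.blog for e in push], [e.blog for e in pull])  # ['quiet-blog'] ['firehose-blog']
```

A rarely posting blog read by an eager follower is cheap to push, while a prolific blog followed by someone who seldom checks in is cheaper to pull on demand; making the choice per pair, rather than globally, is what makes the materialization "selective."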

You can read more details of Tumblr's evolution on the High Scalability blog.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)
