Mitch Pronschinske is a Senior Content Analyst at DZone. That means he writes and searches for the finest developer content in the land so that you don't have to. He often eats peanut butter and bananas, likes to make his own ringtones, enjoys card and board games, and is married to an underwear model. Mitch is a DZone Zone Leader and has posted 2574 posts at DZone.

Launching Memcached into the Cloud

12.15.2009
The ubiquitous Memcached memory caching system recently took another big step into the world of cloud computing. Gear6, a clustering and caching startup that has been in business for four years, recently unveiled a web caching server that runs Memcached on Amazon EC2. Joaquin Ruiz, the EVP of Products at Gear6, and Mark Atwood, the Director of Community Development, were interviewed by DZone about this new release and the significance of Memcached in the cloud. Ruiz says the web 2.0 world wouldn't be where it is right now without a Memcached type of service. With the growth of dynamic web content, he says, most companies have to contend with the Memcached protocol, either by competing with it or by adopting it. Gear6 has adopted Memcached and built a Web Caching Server that gives Memcached a major boost.

Gear6 and Memcached
Gear6 started out with products for scaling back-end storage through NFS and distributed caching.  Several of Gear6's early customers ended up being massive web properties.  There were interesting commercial opportunities for Gear6 in scaling applications through distributed caching for storage.  Ruiz said there was even greater opportunity for speeding up web applications with distributed caching.  Gear6 found that the Memcached system offered exactly what customers wanted.  "There's a huge demand for mission critical features within Memcached," said Ruiz.  "The LAMP, Ruby, and Java worlds are all on Memcached."

Distributed caching and hashing algorithms have been around for a long time. Many solutions have taken a proprietary or one-off approach that requires the application to be largely rewritten. Ruiz says Memcached emerged six years ago when LiveJournal.com began offloading its large MySQL farm using a distributed, language-neutral caching scheme. Their solution was called Memcached, and it eventually became open source under the BSD license. Because Memcached is agnostic to the objects that go in it, all the web companies needed was the right client interface. With the advent of web 2.0 and community content-driven sites, Memcached started popping up everywhere. All of the large properties on the web are using it now except for Microsoft.
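
To illustrate why the client interface is the piece that matters: Memcached servers do not coordinate with one another, so the client library decides which server holds each key. The following is a minimal Python sketch of that idea using a simple hash-modulo scheme. The server names and the `pick_server` helper are hypothetical, and production clients often use consistent hashing instead so that adding a server remaps fewer keys.

```python
import hashlib


def pick_server(key, servers):
    """Map a key to one server by hashing the key (simple modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first four bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


# Hypothetical server pool; any client using the same list and the same
# hash function will route a given key to the same server.
servers = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]

for key in ("user:42:profile", "session:abc123"):
    print(key, "->", pick_server(key, servers))
```

Because the mapping depends only on the key and the server list, clients written in different languages can share one cache pool as long as they agree on the hashing scheme.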

Web Cache Server
When Ruiz came to Gear6, the company was beginning to shift its focus to the web and Memcached. From the beta launch of its Web Cache product line all the way through GA, Gear6 built out mission critical features for Memcached. At a high level, Ruiz said, those features had to do with how Memcached uses resources, so that one doesn't have to dedicate a quarter or a third of their server infrastructure to Memcached in order to get the appropriate boosts in performance. Other major features had to do with high availability and management. Ruiz says that if you're using Memcached to store session or profile data, for example, users coming into the site are able to immediately get content that is interesting because it's all in memory. "If that were to go down," he says, "the data is recoverable, but it may take 4 hours to 3 days to re-warm your cache if you're a moderate to large site, and that's not acceptable."

Memcached in the Cloud
Gear6 launched a first GA incarnation of Memcached in April, and because of customer demand, they kept expanding the packaging for their distribution. Recently, Gear6 announced a universal Memcached distribution that works with many of the major hardware platforms. Ruiz says their customers started needing Memcached not just in their data centers, but also in the cloud to handle peak traffic so they don't have to design for peak within their data centers. Gear6 announced this month that it could fulfill that need by spawning EC2 cloud instances with its Web Cache server Memcached product. Ruiz says that the new offering is a lot more efficient with RAM. "Typically Memcached is very fast but its live allocation routine uses the concept of buckets and bucket sizing, and so it's not terribly efficient," he says. "It might take a 1.1k object and store it in a 2k bucket and you get a lot of wasted DRAM. The bottom line is that we save a lot of DRAM which means we can get more per gigabyte than just standard Memcached." This means with Gear6 you can deploy less DRAM to get the same effect, which is great for consolidation.
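
Ruiz's bucket example can be worked through with a little arithmetic. The Python sketch below assumes hypothetical bucket sizes that double at each step (1k, 2k, 4k, 8k) purely to match the numbers in the quote. Stock Memcached sizes its buckets with a configurable growth factor, so real sizes will differ, and nothing here reflects how Gear6's allocator works.

```python
import bisect

# Hypothetical bucket sizes in bytes, sorted ascending (assumption for
# illustration only; not Memcached's actual size classes).
BUCKET_SIZES = [1024, 2048, 4096, 8192]


def bucket_for(size):
    """Return the smallest bucket that can hold an object of `size` bytes."""
    i = bisect.bisect_left(BUCKET_SIZES, size)
    if i == len(BUCKET_SIZES):
        raise ValueError("object is larger than the largest bucket")
    return BUCKET_SIZES[i]


def wasted_fraction(size):
    """Fraction of the chosen bucket left unused by the object."""
    bucket = bucket_for(size)
    return (bucket - size) / bucket


object_size = 1126  # roughly 1.1k
print(bucket_for(object_size))                 # prints 2048
print(f"{wasted_fraction(object_size):.0%}")   # prints 45%
```

Under these assumed sizes, a 1.1k object leaves about 45% of its 2k bucket unused, which is the kind of DRAM waste the quote describes.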



"From a technical perspective," Ruiz says, "we've included our block based interface in the Memcached server."  This allows customers to take advantage of the block based storage that's likely to be (in Amazon's case, it is) in the image.  Ruiz as that instead of just getting all the RAM, customers are able to use some of the block-based storage to expand the cache depth.

With its new Web Caching capabilities, Gear6 is also the first to put Memcached on Amazon's High-Memory Instances.  Ruiz says these instances are built for applications like Memcached.  It is a perfect fit, he says, because Memcached is memory intensive.  High-Memory Instances allow you to use a lot of memory without soaking up a lot of CPU resources.  The smallest of Amazon's High-Memory Instances has 34GB of DRAM and the largest has 68GB.  With so much RAM at one's disposal, putting Memcached in those instances yields a great deal of capacity and throughput.

The Future of Memcached in the Cloud
Mark Atwood said, "One of the complaints when people are moving to the cloud environment is that they don't have Memcached servers.  People have had to do awkward 'roll-their-owns' or do without entirely.  If anything, Memcached is even more important when you're in a cloud environment because the machines there are frankly not as fast as when you buy your own hardware, and they're much more inconsistent."  Ruiz adds that in the cloud, one needs distributed caching even more to accelerate applications.

"The reason Memcached is so important is that it's meant for dynamic web content that's generally very 'long tail'," said Ruiz.  Long tail is the opposite of frequently accessed static content, he said.  "If it's personalized content, none of this stuff can be cached on CDNs, said Ruiz.  "It doesn't make any sense to cache your status on a particular social network in a remote CDN halfway across the planet."  He says this information needs to be stored near the application or database.  "In order to make up for the lag to get the dynamic content that you need, it has to be in memory to begin with," said Ruiz  "For every tenth of a second increase in wait time on the internet, you see at least a 2% drop in traffic."  For performance's sake, he says, things like dynamic content has to be in memory. 

"I think in the evolution of the cloud you will begin to see memory-based services available," said Ruiz.  "My contention is that those interfaces will speak Memcached,  because everything else does.  Memcached is the most used way of getting stuff out of one layer and into memory and it's agnostic as to whether it is receiving database data, storage, or application content.  It's also language neutral."  Ruiz says there are already databases that already speak Memcached natively.  Drizzle is one example, and Mark Atwood is a contributor to that project.  Ruiz says Gear6 plans to add integration points between Memcached and non-relational databases, which are experiencing growth in the cloud.