
Mitch Pronschinske is the Senior Content Curator (aka. "Lord of the Zones") at DZone. That means he writes and searches for the finest developer content in the land so that you don't have to. He often has hotdogs for lunch, likes to make his own ringtones, enjoys card and board games, and is married to an underwear model. Mitch is a DZone employee and has posted 1703 posts at DZone. You can read more from them at their website.

Cassandra Adds Hadoop MapReduce

04.13.2010
Today the Cassandra project announced its first release since becoming a Top-Level Project at Apache.  Don't let the low version number fool you: Cassandra 0.6 is one of the most mature distributed NoSQL data stores in the open source market.  It was heavily developed by Facebook before being open sourced in August 2008.  Cassandra is currently used by four of the largest social media sites in the world: Facebook, Digg, Reddit, and Twitter.

One of the primary new features in Cassandra 0.6 is support for Apache Hadoop.  This is a major upgrade for Cassandra, giving it even more "big data" capabilities.  The new feature lets Hadoop's MapReduce framework read the data stored in Cassandra as job input, so analytics can be run directly against Cassandra's own data.
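To make the MapReduce model concrete, here is a minimal sketch in plain Python. It does not use Cassandra's or Hadoop's actual APIs; it assumes rows shaped roughly like a column family (a row key mapped to a dictionary of column names and values) and runs the three MapReduce stages (map, shuffle, reduce) over them to count words. The row keys and column names are hypothetical. In a real deployment Hadoop would distribute the map and reduce steps across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape for illustration: (row key, {column name: column value}).
Row = Tuple[str, Dict[str, str]]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word found in every column value."""
    for _row_key, columns in rows:
        for value in columns.values():
            for word in value.lower().split():
                yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as Hadoop does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(rows: Iterable[Row]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    rows = [
        ("tweet:1", {"body": "Cassandra adds Hadoop"}),
        ("tweet:2", {"body": "Hadoop MapReduce on Cassandra"}),
    ]
    print(word_count(rows))
```

Because each map call only looks at one row, the map work can be split across nodes holding different rows; only the grouped intermediate results have to meet at the reducers.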


Cassandra 0.6 simplifies its architecture with a new integrated row cache.  With this feature, a Cassandra deployment no longer needs a separate caching layer.  Along with the simplified architecture, Cassandra 0.6 also brings a performance boost.  The distributed data store could already process thousands of writes per second, and this version's enhancements build on that number.
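Conceptually, a row cache keeps recently read rows in memory so that repeated reads skip the slower storage path. The sketch below is a generic least-recently-used (LRU) read-through cache in Python, not Cassandra's implementation; the `RowCache` class, its `capacity` parameter, and the dictionary standing in for on-disk data are all hypothetical. Writes go through to the backing store and refresh any cached copy, so a read never returns a stale row.

```python
from collections import OrderedDict
from typing import Dict, Optional

Row = Dict[str, str]  # hypothetical row: column name -> column value


class RowCache:
    """Hypothetical LRU read-through cache holding whole rows in memory."""

    def __init__(self, store: Dict[str, Row], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._store = store          # stands in for the on-disk data
        self._capacity = capacity    # maximum number of rows kept in memory
        self._cache: "OrderedDict[str, Row]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[Row]:
        if key in self._cache:
            self._cache.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return dict(self._cache[key])
        self.misses += 1
        row = self._store.get(key)            # the "slow" path
        if row is None:
            return None
        self._cache[key] = dict(row)
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)   # evict the least recently used row
        return dict(row)

    def insert(self, key: str, columns: Row) -> None:
        row = self._store.setdefault(key, {})
        row.update(columns)                   # write through to the backing store
        if key in self._cache:                # refresh the cached copy so it never goes stale
            self._cache[key] = dict(row)
            self._cache.move_to_end(key)


if __name__ == "__main__":
    disk = {"user:1": {"name": "ada"}, "user:2": {"name": "bob"}}
    cache = RowCache(disk, capacity=1)
    cache.get("user:1")  # miss: loaded from the store
    cache.get("user:1")  # hit: served from memory
    cache.get("user:2")  # miss: evicts user:1
    print(cache.hits, cache.misses)
```

Keeping the cache inside the data store, rather than in a separate tier, is what lets writes update the cached row directly instead of relying on the application to invalidate it.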

"Apache Cassandra 0.6 is 30% faster across the board, building on our already-impressive speed," said Jonathan Ellis, chair of the Apache Cassandra Project Management Committee, in the press release.  "It achieves scale-out without making the kind of design compromises that result in operations teams getting paged at 2 AM."

Ryan King, Storage Team Technical Lead at Twitter, explained why Twitter is using Cassandra: "At Twitter, we're deploying Cassandra to tackle scalability, flexibility and operability issues in a way that's more highly available and cost effective than our current systems."

One of Cassandra's best known features is that it has no single point of failure.  Because data can be replicated across several nodes, the cluster keeps serving requests when a node goes down, and the failed node can be replaced with a new one.  The system can also be tuned, per request, to favor either stronger consistency or higher availability.
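The consistency/availability trade-off comes down to simple arithmetic. Cassandra clients choose a consistency level such as ONE, QUORUM, or ALL for each read and write. With a replication factor N, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The helper functions below are a hypothetical Python illustration of that rule, not part of Cassandra's API, and they model only those three levels.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    """Three of the consistency level names a Cassandra client can request."""
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def replicas_required(level: ConsistencyLevel, replication_factor: int) -> int:
    """Number of replicas that must respond for a request at this level."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if level is ConsistencyLevel.ONE:
        return 1
    if level is ConsistencyLevel.QUORUM:
        return replication_factor // 2 + 1  # a strict majority
    return replication_factor               # ALL


def reads_see_latest_write(read: ConsistencyLevel, write: ConsistencyLevel,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > N)."""
    r = replicas_required(read, replication_factor)
    w = replicas_required(write, replication_factor)
    return r + w > replication_factor


def replicas_that_may_be_down(level: ConsistencyLevel, replication_factor: int) -> int:
    """How many replicas can be unavailable while requests at this level still succeed."""
    return replication_factor - replicas_required(level, replication_factor)


if __name__ == "__main__":
    n = 3
    for level in ConsistencyLevel:
        print(level.name, replicas_required(level, n), replicas_that_may_be_down(level, n))
    print(reads_see_latest_write(ConsistencyLevel.QUORUM, ConsistencyLevel.QUORUM, n))
    print(reads_see_latest_write(ConsistencyLevel.ONE, ConsistencyLevel.ONE, n))
```

With a replication factor of 3, QUORUM reads and writes each need 2 replicas (2 + 2 > 3), so reads see the latest acknowledged write while one replica is down. Using ONE for both is faster and tolerates two replicas being down, but gives no such guarantee.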

The previous version of Cassandra (0.5) added load balancing and significantly improved bootstrap and concurrency.  New tools were also added, including JSON-based data import and export, new JMX metrics, and an improved command line interface.  "It's fantastic seeing the Project's community at the ASF grow to match the promise of the technology," said Ellis.

You can download Cassandra 0.6 now on the project's website.  For more info on Cassandra, check out "4 Months with Cassandra, a love story."



Comments

Endre Varga replied on Tue, 2010/04/13 - 9:44am

Wow, this is cool! Lack of MapReduce was the only missing feature for me.

Kate Lewis replied on Thu, 2010/04/22 - 1:22am

Very informative post indeed! I am interested in knowing more on Mapreduce and Large Data analytics.. This one resource looks great... High Performance Analytics with Hadoop http://www.impetus.com/featured_webinar?eventid=16
