
Machine learner and data scientist, Ph.D. from the University of Bonn in 2005, now working as a PostDoc at TU Berlin and as chief data scientist and co-founder at TWIMPACT, a startup focussing on real-time social media analysis. Mikio is a DZone MVB (not a DZone employee) and has posted 36 posts at DZone.

Presentation: Scalability Challenges in Big Data Science


Yesterday I gave a talk on scalability and machine learning at the Berlin Buzzwords conference. In it, I give an overview of different ways to scale data analysis and machine learning methods. I cover MapReduce (of course) and large-scale training of SVMs via stochastic gradient descent, but also stream mining and real-time analysis (as you know, “you don’t just scale into real-time”).
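To make “training SVMs via stochastic gradient descent” a bit more concrete, here is a minimal sketch of a Pegasos-style update for a linear SVM, which is one well-known SGD formulation and not necessarily the exact variant shown in the talk. The point relevant to scalability is that each step looks at a single example, so the cost per update does not depend on the dataset size and the data can be streamed from disk. The function names and the defaults for `lam` and `epochs` are hypothetical choices for illustration, and the bias handling (a constant feature that is regularized along with the weights) is a simplification.

```python
import random


def dot(w, x):
    # Inner product of two equal-length sequences of floats.
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm_sgd(data, lam=0.01, epochs=20, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss (illustrative sketch).

    data: list of (features, label) pairs with label in {-1, +1}.
    A constant 1.0 feature is appended so the model can learn a bias;
    note that this bias is regularized too (a simplifying assumption).
    """
    rng = random.Random(seed)
    samples = [(list(x) + [1.0], y) for x, y in data]
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            t += 1
            eta = 1.0 / (lam * t)      # decreasing step size 1 / (lambda * t)
            margin = y * dot(w, x)     # margin under the current weights
            # Gradient step for the regularizer: shrink w on every update.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                # Hinge loss is active for this example: move w toward y * x.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    # Append the same constant bias feature used during training.
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1
```

For example, calling `train_linear_svm_sgd` on a handful of linearly separable 2D points labeled +1 and -1 returns a three-element weight vector (two feature weights plus the bias), and `predict` then gives the sign of the decision function for a new point.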

The conference continues today; you can follow it on Twitter via the #bbuzz hashtag.

Update: The hyperlinks got lost in the version of the slides on Scribd, so here is the list:

Scalable Databases

Multithreading and Messaging Frameworks


Large-Scale Classifier Training

Other Frameworks

Stream Processing



Published at DZone with permission of Mikio Braun, author and DZone MVB.
