Big Data

R Quick Tip: Shutdown Windows after Script Has Finished

Quite often I have long procedures that I want to run overnight. However, my computer would still be running all night after the script has finished....

0 replies - 999 views - 05/24/13 by Kay Cichini in Articles

Building Human Fault-Tolerant Systems

In this really excellent talk from Strata 2013, Twitter's Nathan Marz walks through the challenges and serious rewards of building systems that are resilient...

0 replies - 1296 views - 05/23/13 by Eric Gregory in Articles

Big Data Overview and Cassandra Plunge

Thanks, everyone, for coming out last night. We plowed through a lot of material. I posted the slides here: Big data philly_jug from Brian...

0 replies - 1224 views - 05/23/13 by Brian O'Neill in Articles

Solr 4.2: Index Structure Reading API

With the release of Solr 4.2 we’ve got the possibility to use the HTTP protocol to get information about Solr index structure. Of course, if one wanted to do...

0 replies - 2355 views - 05/22/13 by Rafał Kuć in Articles

Data Science at LinkedIn

Data scientist Monica Rogati discusses data scaling at LinkedIn and reflects on the evolving role of the data scientist:

0 replies - 2030 views - 05/21/13 by Eric Gregory in Articles

How Does a Search Engine Work? An Educational Trek Through A Lucene Postings Format

A new feature of Lucene 4 – pluggable codecs – allows for the modification of Lucene’s underlying storage engine. Working with codecs and examining their...

0 replies - 5355 views - 05/21/13 by Doug Turnbull in Articles

"Say Goodbye to Anonymity"

In case you missed 60 Minutes on CBS the other night, there’s a new challenge to privacy that is coming faster than people realize and...

0 replies - 1866 views - 05/21/13 by Christopher Taylor in Articles

Quantifying Scientific Consensus, Zombies in R, and More Data Links

Several posts and articles this week, starting with this nice “Quantifying the consensus on anthropogenic global warming in the scientific...

0 replies - 1156 views - 05/21/13 by Arthur Charpentier in Articles

Bucketing, Multiplexing and Combining in Hadoop - Part 1

This is the first blog post in a series which looks at some data organization patterns in MapReduce. We’ll look at how to bucket output across multiple files...

0 replies - 1366 views - 05/20/13 by Alex Holmes in Articles

Pi as Non-Optimal Random Number Generator and More Data Links

Several interesting posts, here and there, this week. From April 26th until May 7th, 1986, in Europe, via http://ucsusa.org/news/… see “Why have so...

0 replies - 925 views - 05/17/13 by Arthur Charpentier in Articles

Big Data Ends the Era of Hunches

I made great money in college recruiting people for focus groups. Those were tough days for students in a state (New York) where the minimum wage was only...

0 replies - 2631 views - 05/17/13 by Christopher Taylor in Articles

Data News: The Wait for Natural Language Processing and More

The most interesting post I read these past few days is, without any doubt, “Liberal Wonk Blogging Could Be Your...

0 replies - 865 views - 05/16/13 by Arthur Charpentier in Articles

Educating the Planet and Graph Databases

Pearson is striving to accomplish the ambitious goal of providing an education to anyone, anywhere on the planet. New data processing technologies and...

0 replies - 1085 views - 05/15/13 by Marko Rodriguez in Articles

Hive with HBase Quickstart

Though there is some decent documentation, I found setting up Hive with an HBase back-end to be somewhat fiddly. Hopefully this guide will help you...

0 replies - 2430 views - 05/14/13 by Chase Seibert in Articles

Eating Dogfood with Lucene

Eating your own dog food is important in all walks of life: if you are a chef you should taste your own food; if you are a doctor you should treat yourself...

1 reply - 1884 views - 05/14/13 by Michael McCandless in Articles