Big Data? How About SMALL Data
Big data is getting a lot of buzz lately, but small data is interesting too. In some ways it’s more interesting. Because of limit theorems, a lot of things become dull in the large that are more interesting in the small.
When working with small data sets you have to accept that you will very often draw the wrong conclusion. You just can’t have high confidence in inference drawn from a small amount of data, unless you can do magic. But you do the best you can with what you have. You have to be content with the accuracy of your method relative to the amount of data available.
For example, a clinical trial may try to find the optimal dose of some new drug by giving the drug to only 30 patients. When you have five doses to test and only 30 patients, you’re just not going to find the right dose very often. You might want to assign 6 patients to each dose, but you can’t count on that. For safety reasons, you have to start at the lowest dose and work your way up cautiously, and that usually results in uneven allocation to doses, and thus less statistical power. And you might not treat all 30 patients. You might decide — possibly incorrectly — to stop the trial early because it appears that all doses are too toxic or ineffective. (This gives a glimpse of why testing drugs on people is a harder statistical problem than testing fertilizers on crops.)
Maybe your method finds the right answer 60% of the time, hardly a satisfying performance. But if alternative methods find the right answer 50% of the time under the same circumstances, your 60% looks great by comparison.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)