Michael loves building software; he's been building search engines for more than a decade, and has been working on Lucene as a committer, PMC member and Apache member, for the past few years. He's co-author of the recently published Lucene in Action, 2nd edition. In his spare time Michael enjoys building his own computers, writing software to control his house (mostly in Python), encoding videos and tinkering with all sorts of other things. Michael is a DZone MVB and is not an employee of DZone and has posted 49 posts at DZone. You can read more from them at their website. View Full User Profile

Crawling Crowd-Data Spots Side Effects Faster

03.13.2012
| 4058 views |
  • submit to reddit
The social crowd has proven to be powerful, if you can find some way to harness it: crowd-sourcing can perform tasks and solve collaborative problems, crowd-funding can raise substantial financing.

I suspect crowd-data will similarly become an effective way to create large, realistic databases.

A great application of this is the medical world, where many people post to health forums raising medical problems, possible side effects from drugs and vaccines, etc. Why not collect all such posts to find previously undiscovered problems? In fact, this paper describes just that: the authors extracted the nasty side effects of statin drugs based on posts to online health forums. Similarly, this abstract describes a system that used crowd-data to spot nasty side effects from Singulair, years before the FDA issued a warning. The VAERS database, which gathers parent-reported problems after children receive vaccines, is another example.

Unfortunately the drug safety trials that take place before a drug can be released are not especially trustworthy. Here's a scary quote from that interview:

    When you look at the highest quality medical studies, the odds that a study will favor the use of a new drug are 5.3 times higher for commercially funded studies than for noncommercially funded studies.

And that was 7 years ago! I imagine the situation has only gotten worse.

When a new drug is released, the true, unbiased drug trial begins when millions of guinea-pigs start taking taking it. Crowd-data makes it possible to draw conclusions from that that post-market drug trial.

Of course there are challenging tradeoffs: crowd-data, being derived from "ordinary people" without any rigorous standard collection process, can be dirty, incomplete and reflect sampling bias (only people experiencing nasty side effects speak up). For these reasons, old-fashioned journals turn their noses up at papers drawing conclusions from crowd-data.

Nevertheless, I believe such limitations are more than offset by the real-time nature and shear scale the millions of people, constantly posting information over time. Inevitably, trustworthy patterns will emerge over the noise. Unlike the synthetic drug trial, this data is as real as you can get: sure, perhaps the drug seemed fine in the carefully controlled pre-market testing, but then out in the real world, unexpected interactions can suddenly emerge. Crowd-data will enable us to find such cases quickly and reliably, as long as we still have enough willing guinea-pigs!

Fast forward a few years and I expect crowd-data will be an excellent means of drawing conclusions, and will prove more reliable than the company-funded pre-market drug trials.
Published at DZone with permission of Michael Mccandless, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Goel Yatendra replied on Thu, 2013/01/17 - 8:14am

Wow, I was searching for Lucene help for this y libraris a bit new for me now for my aging brains, and found this post - a bit unexpected for me. Mike, did you have unpleasant experience with statins (I don't know you age, but if you did i would be another valuable piece of authentic experience). And thank you very much for Lucene guides

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.