
Reimagining the way work is done through big data, analytics, and event processing, Chris is the cofounder of Successful Workplace. He believes there’s no end to what we can change and improve. Chris is a marketing executive and flew for the US Navy before finding a home in technology 17 years ago. An avid outdoorsman, Chris is also passionate about technology and innovation and speaks frequently at industry events about creating great business outcomes. As well as being a contributor for The TIBCO Blog, Chris contributes to the Harvard Business Review, Venture Beat, Forbes, and the PEX Network. Christopher is a DZone MVB, is not an employee of DZone, and has posted 249 posts at DZone.

Curing Cancer with Data Visualization


This post was originally written by Jesse Paquette.

Everyone thought that The Netherlands Cancer Institute’s 12-year-old dataset on breast cancer was old news. That was until Pek Lum, a researcher with Merck & Co, analyzed and visualized the dataset using topological data analysis (TDA) and advanced machine learning technology. Her analysis landed her a featured story in the Wall Street Journal and, more importantly, allowed her to identify a previously unknown subgroup of cancer survivors that exhibited particular traits, a step forward in finding a cure.

Topological Data Analysis


Pek’s experience led her to become the Chief Data Scientist for Ayasdi, the company that created the software that was critical to her discovery. The Ayasdi Platform allowed Pek to leverage TDA to automatically construct a topological network of the cancer dataset. TDA is based on a branch of mathematics known as topology, the study of shape. Being able to use automation to find similarities within extremely complex datasets allows experts to discover new insights from analysis of the shape the data takes.

Let the Data Speak for Itself

Rather than taking the traditional approach of asking a question about one particular part of the data, Pek was able to look at the dataset in its entirety, allowing the data to speak for itself. She uploaded the breast cancer dataset into the Ayasdi Platform, which grouped similar patients into nodes based on genetic makeup. An edge was then drawn between any two nodes that shared at least one patient. The result resembled a complex Venn diagram of relationships within the data. Further analysis of the topological network uncovered the subgroup of survivors and determined what traits made this group unique.
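To make the node-and-edge idea concrete, here is a minimal Python sketch of the linking step only. It is not Ayasdi's implementation: the function name `build_network`, the node labels, and the patient IDs are all hypothetical, and the nodes are assumed to have already been formed by some earlier grouping step.

```python
from itertools import combinations


def build_network(nodes):
    """Connect every pair of nodes whose member sets overlap.

    nodes: dict mapping a node label to a set of patient IDs.
    Returns a list of (label_a, label_b, shared_ids) tuples.
    """
    edges = []
    # Compare each unordered pair of nodes exactly once, in label order.
    for (a, members_a), (b, members_b) in combinations(sorted(nodes.items()), 2):
        shared = members_a & members_b  # patients present in both nodes
        if shared:
            edges.append((a, b, shared))
    return edges


# Hypothetical toy data: overlapping groups of similar patients.
nodes = {
    "n0": {"p1", "p2", "p3"},
    "n1": {"p3", "p4"},
    "n2": {"p4", "p5", "p6"},
    "n3": {"p7", "p8"},
}

edges = build_network(nodes)
for a, b, shared in edges:
    print(a, b, sorted(shared))
```

In this toy input, n0 and n1 are linked through patient p3, n1 and n2 through p4, and n3 shares no patients with any other node, so it stays disconnected. An isolated or loosely attached group like that is the kind of structure an analyst would then inspect more closely.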

Driving Breakthrough Insights

TDA is changing the way domain experts around the world look at their data, and it offers a new framework in which many different algorithms can be used together to automatically examine extremely complex datasets. More importantly, it allows mechanical engineers, financial services analysts, marketers, and oncologists to visualize their data in new ways to drive breakthrough insights.


Jesse Paquette (Lead, Translational Engineer):
Jesse leads the development of new features and pipelines that enhance user experience, particularly in the life science domain. Prior to Ayasdi, Jesse was a computational biologist at UCSF. He maintains an impressive collection of scar tissue, perpetuated by weekly overdoses of pick-up soccer.

Published at DZone with permission of Christopher Taylor, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)