Improving Cancer Treatment Using Machine Learning and Big Data Analysis

Written by: Tapesh Santra
Written on: Tuesday, 09 May, 2017

One in ten women in Ireland will develop breast cancer at some stage of their lives. Several treatment options are available, but they are expensive and have poor response rates. Existing diagnostic tests have limited ability to predict, before treatment starts, whether a breast cancer patient will respond to it. Hospitals across the world waste millions of euros each year by giving the wrong treatment to the wrong breast cancer patient. In a recent project, we focused on lapatinib, a commonly used drug for treating breast cancer. We used big data sets and machine learning techniques to (a) find genetic markers that predict the response of breast cancer patients to lapatinib and (b) find another drug that, when administered in combination with lapatinib, can increase its response rate among breast cancer patients.

We analyzed ~1.056 million noisy, irregularly sampled, variable-length time series containing the temporal expression of ~22,000 genes in four breast cancer cell models treated with placebo and two doses of lapatinib. Two of these cell models respond to lapatinib and the other two do not. The objective was to identify genes that responded to lapatinib in the responsive cells but not in the non-responsive cells. This amounts to performing n-way ANOVA on noisy, irregularly sampled, variable-length time series data. To the best of our knowledge, no existing algorithm can perform this analysis, so we developed our own, called GEAGP. It fits Gaussian Process (GP) models to individual time series, uses the fitted models to compare the time series profiles of each gene across multiple cell lines and treatment conditions, performs multiple-test correction, and uses a Boolean logic model to filter the desired sets of genes. The results of this analysis were then vetted against existing drug response, protein interaction, transcriptional interaction and gene ontology databases, leading to a small set of biomarkers that may be useful in predicting lapatinib response in breast cancer patients.
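To make the idea concrete, here is a minimal sketch of GP-based comparison of irregularly sampled time series followed by a Boolean filter. It is not the GEAGP implementation. It assumes a zero-mean GP with a squared-exponential kernel and fixed hyperparameters, compares a treated profile with its placebo profile by the log Bayes factor between "two separate GPs" and "one shared GP", uses a single dose, and skips multiple-test correction. The gene names, cell-line labels, toy profiles and the cut-off of 3.0 are all hypothetical.

```python
import numpy as np

def rbf(t1, t2, length_scale=2.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of time points."""
    d = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_log_evidence(t, y, noise_var=0.1):
    """Log marginal likelihood of a zero-mean GP with Gaussian noise."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    K = rbf(t, t) + noise_var * np.eye(len(t))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(t) * np.log(2.0 * np.pi))

def log_bayes_factor(t_a, y_a, t_b, y_b):
    """Evidence for two separate GPs minus evidence for one shared GP.

    Positive values favour different profiles; the two series may have
    different lengths and different sampling times.
    """
    separate = gp_log_evidence(t_a, y_a) + gp_log_evidence(t_b, y_b)
    shared = gp_log_evidence(np.concatenate([t_a, t_b]),
                             np.concatenate([y_a, y_b]))
    return separate - shared

def responds(t_ctrl, y_ctrl, t_drug, y_drug, threshold=3.0):
    """Hypothetical decision rule: treated profile differs from placebo."""
    return bool(log_bayes_factor(t_ctrl, y_ctrl, t_drug, y_drug) > threshold)

# Toy, made-up data: irregular sampling times (hours) of unequal length.
t_ctrl = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
t_drug = np.array([0.5, 1.5, 3.0, 6.0])

def flat(t):
    """Expression change that stays near zero (no response)."""
    return 0.05 * np.cos(3.0 * t)

def induced(t):
    """Expression change that rises and saturates (response)."""
    return 3.0 * (1.0 - np.exp(-t))

SENSITIVE = ["sens_1", "sens_2"]   # hypothetical responsive cell models
RESISTANT = ["res_1", "res_2"]     # hypothetical non-responsive cell models

genes = {
    "GENE_A": {"sens_1": induced, "sens_2": induced, "res_1": flat, "res_2": flat},
    "GENE_B": {"sens_1": induced, "sens_2": induced, "res_1": induced, "res_2": induced},
    "GENE_C": {"sens_1": flat, "sens_2": flat, "res_1": flat, "res_2": flat},
}

selected = []
for gene, profiles in genes.items():
    hit = {line: responds(t_ctrl, flat(t_ctrl), t_drug, profile(t_drug))
           for line, profile in profiles.items()}
    # Boolean filter: responds in every sensitive line AND in no resistant line.
    if all(hit[l] for l in SENSITIVE) and not any(hit[l] for l in RESISTANT):
        selected.append(gene)

print(selected)
```

Comparing model evidence rather than pointwise differences is what lets the two series have different lengths and sampling times. A genome-wide analysis would also need fitted hyperparameters and a correction for testing thousands of genes at once.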

Furthermore, we used Cox regression on patient survival data to show that the expression of some of the genes identified in the above study correlates significantly with the survival of different groups of breast cancer patients. The vitamin D receptor (VDR), which is activated by vitamin D intake, is one of these genes. In follow-up biochemical experiments, we found that some of the breast cancer cells that do not respond well to lapatinib may become more sensitive to the drug when it is applied in combination with vitamin D. These early results suggest that taking vitamin D, an inexpensive food supplement that can be bought over the counter, may help breast cancer patients who have innate or acquired resistance to lapatinib.
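For readers unfamiliar with Cox regression, the sketch below fits a single-covariate Cox proportional hazards model by Newton-Raphson on the partial log-likelihood. It is an illustration only, not the analysis from the study. The eight patients, their follow-up times, event flags and the binary "high expression" covariate are made up, and the code assumes there are no tied event times.

```python
import math

def cox_fit(times, events, x, n_iter=50, tol=1e-10):
    """Fit a one-covariate Cox model by Newton-Raphson.

    times  : follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    x      : covariate value per patient
    Returns (beta, standard_error); exp(beta) is the hazard ratio.
    """
    beta, info = 0.0, 0.0
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored patients only contribute via risk sets
            # Risk set: everyone still under observation at time t_i.
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x[j] for w_j, j in zip(w, risk))
            s2 = sum(w_j * x[j] ** 2 for w_j, j in zip(w, risk))
            grad += x[i] - s1 / s0               # score contribution
            info += s2 / s0 - (s1 / s0) ** 2     # observed information
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)

# Made-up toy cohort: months of follow-up, event flag, high-expression flag.
times  = [2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 1, 1, 1, 1, 0, 1, 0]
high   = [1, 1, 0, 1, 0, 1, 0, 0]

beta, se = cox_fit(times, events, high)
hazard_ratio = math.exp(beta)
z = beta / se  # Wald statistic; compare with a standard normal
print(f"beta={beta:.3f}  HR={hazard_ratio:.2f}  z={z:.2f}")
```

A hazard ratio above 1 means the high-expression group has the higher estimated risk in this toy cohort. With so few patients the estimate is very uncertain, and a real analysis would use an established survival analysis package and adjust for clinical covariates.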

About the Author

Dr Tapesh Santra leads the Statistical Inference, Data Integration and Network Science Group at SBI.


Further Reading

Santra T, Roche S, Conlon N, O'Donovan N, Crown J, O'Connor R, et al. (2017)
"Identification of potential new treatment response markers and therapeutic targets using a Gaussian process-based method in lapatinib insensitive breast cancer models." 
PLoS ONE 12(5): e0177058.