Q: What role does data science have in drug development today?
Data is driving change in drug development, a process that is complex, resource-heavy, and very costly. It depends on collaboration across diverse specialisms, with clinicians, chemists, geneticists, and epidemiologists each contributing their expertise. Data scientists play a critical role in this research, using state-of-the-art data science and engineering to help decide where resources should be focused. Machine learning algorithms are particularly valuable for prioritizing drug targets, which makes the development process more efficient.
Despite community-wide efforts, success rates in phase III trials remain relatively low. This underlines the need for advances that streamline drug development, a process that typically spans 10–15 years at an average cost of $1 billion.
Q: How can data science help focus resources on the most promising compounds?
Over the last decade, academic teams and human geneticists have made significant strides in identifying thousands of disease-associated genes. This wealth of genetic information has culminated in the formation of expansive data libraries, such as biobanks. Recognizing the opportunity, drug developers now prioritize candidate compounds that target genes believed to cause disease.
The impact is promising: retrospective data analysis reveals a two- to three-fold increase in successful development rates. This success has prompted a shift in approach, with more computational methods being developed to refine causal gene identification and, ultimately, to improve the efficiency of drug development.
Q: What are the main obstacles?
The challenges are plentiful. Prioritizing causative genes means identifying the strongest candidates, with a focus on genes robustly linked to the many biological traits that shape disease pathology. However, the computational complexity is a massive challenge. Tracing causal associations across a vast biological network is a formidable task: the number of possible combinations to evaluate can exceed the number of particles in the known universe.
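To give a feel for that scale, here is a minimal sketch under a simplifying assumption made purely for illustration, not a description of any specific method: suppose one tried to decide which traits share a causal genetic signal by enumerating every possible way of grouping m traits. The number of such groupings is the Bell number B(m), a standard combinatorial quantity. The function names below are hypothetical, and the figure of 10^80 is only a commonly quoted rough estimate of the number of atoms in the observable universe.

```python
def bell_number(n: int) -> int:
    """Count the ways to partition n labelled items into non-empty groups."""
    row = [1]  # first row of the Bell triangle
    for _ in range(n):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        # Each further entry adds the entry to its left and the one above-left.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


# Assumption: rough, commonly quoted order of magnitude, used only for comparison.
PARTICLES_ESTIMATE = 10**80


def smallest_trait_count_exceeding(limit: int) -> int:
    """Smallest m for which the number of groupings of m traits exceeds limit."""
    m = 1
    while bell_number(m) <= limit:
        m += 1
    return m


if __name__ == "__main__":
    for m in (5, 10, 20, 50, 100):
        print(f"{m:>3} traits -> about {bell_number(m):.3e} possible groupings")
    print("traits needed to exceed the estimate:",
          smallest_trait_count_exceeding(PARTICLES_ESTIMATE))
```

Even in this toy setting, the count passes the particle estimate well before 100 traits, which is why brute-force enumeration is not an option when thousands of traits are involved.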
Addressing this computational challenge is a central focus of our Data Science team as we attempt to enhance the prioritization of genes in the drug development pipeline.
Q: Have you been able to overcome these challenges successfully?
During my time at the University of Cambridge, our team made important progress. We developed a machine learning algorithm that significantly reduces the mathematical and computational burden of the problem, making it possible to link causal genes to thousands of biological traits. This breakthrough is detailed in our recent publication in the academic journal Nature Communications, titled "A fast and efficient colocalization algorithm for identifying shared genetic risk factors across multiple traits".
The journey continues, but the rewards are significant: drugs can be developed more efficiently, and side effects and adverse events can be pre-empted, monitored, and responded to more quickly.