
HHIT Series Episode 4: Biomedicine and digital health, with Paula Petrone

Physicist by training and biomedical data scientist by trade, Petrone says biomedical data science is “a very hot topic” and “an area of huge impact”. While data scientists have formal training in developing predictive models and in machine learning techniques, she says “all scientists are data scientists because we analyse data”. There are many routes into the data science field: physics, engineering, informatics, biology and mathematics.

“What is most important is that people have a passion for healthcare and medicine.”

Before she joined ISGlobal, Petrone spent 15 years working in pharmaceutical companies and in research and development for start-ups. The common denominator has been an interest in patient care and in “thinking about what makes us sick and what keeps us healthy and alive”.

Synthetic Data 

One of Petrone’s research areas is biomedical image analysis, which uses a combination of artificial intelligence (AI) and non-invasive imaging technologies for the early detection of disease. 

But despite all of the “hype” around AI in healthcare, “the implementation of AI tools in the clinic is quite scarce”.

She points to the lack of patient data available to validate AI models for clinical use, and notes that people seem reluctant to adopt digital health technology, citing adherence rates of just 4% for digital health apps such as those that monitor glucose or track migraines.

“Basically I think what we need is more data in order for models to be validated and really applicable.”

One way to potentially bridge that data gap is synthetic data: artificial data generated by a model that has been trained to reproduce the characteristics and structure of the original data.

“Like you can use AI to create a painting that looks like a Picasso or a melody that sounds like Mozart, you could in principle also create patient data that are synthetic or not real. And hopefully we can also use those images to train our models.”

She cautions that this “incipient technology” is “not yet fully validated” and has limitations.

“We hope that synthetic data will fill in the gaps in datasets that are incomplete.”
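As a rough sketch of the idea, and not of the specific generative models ISGlobal uses, the example below fits a simple generative model to an invented table of patient measurements and then samples artificial records from it; the variables and numbers are made up purely for illustration.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Invented example data: 500 "patients" with two measurements (age, fasting glucose).
    rng = np.random.default_rng(0)
    real_data = np.column_stack([
        rng.normal(60, 10, size=500),    # age in years
        rng.normal(5.5, 0.8, size=500),  # fasting glucose, mmol/L
    ])

    # Fit a generative model to the joint distribution of the original data,
    # then sample new, artificial records from it.
    generator = GaussianMixture(n_components=3, random_state=0).fit(real_data)
    synthetic_data, _ = generator.sample(1000)

    # The synthetic records mimic the statistical structure of the originals
    # (means, spread, correlations) without copying any individual patient.
    print(real_data.mean(axis=0))
    print(synthetic_data.mean(axis=0))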

Newborn Solutions

ISGlobal has collaborated with a start-up called Newborn Solutions to develop an affordable device that uses ultrasound technology to scan the brains of newborn babies in areas of need, such as Africa, for the early prediction and treatment of infant meningitis.

“This is a way of assessing the amount of infection in the brains of newborn babies in a way that is non-invasive.”

While developing AI models to count the number of white blood cells in the cerebrospinal fluid of babies' brains - in order to assess the severity of infection - the team made a fascinating observation.

“What makes this project different from other imaging projects is that usually you can see the difference between healthy and sick babies. In this case, a human cannot resolve the difference in the two images but the deep learning algorithm can. It is the first time that I encountered such a project.”
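To make this kind of model concrete, here is a minimal sketch with an invented architecture and random stand-in images, not Newborn Solutions' actual system: a small convolutional network that maps an ultrasound frame to a predicted cell count.

    import torch
    import torch.nn as nn

    # Hypothetical count-from-image model: a small convolutional network that maps
    # a grayscale ultrasound frame to a single non-negative predicted cell count.
    class CellCountRegressor(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
                nn.Linear(64, 1), nn.Softplus(),  # counts cannot be negative
            )

        def forward(self, x):
            return self.head(self.features(x))

    # Random stand-in for a batch of eight 64x64 grayscale ultrasound frames.
    frames = torch.randn(8, 1, 64, 64)
    model = CellCountRegressor()
    predicted_counts = model(frames)  # shape: (8, 1)
    print(predicted_counts.squeeze(1))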

Algorithms for All

One algorithm or model may not fit all situations or populations. The generalisation of detection algorithms is an ongoing challenge. This has been the case with the ISGlobal/Newborn Solutions project. 

Differences in factors such as physiology, demographics and cultural background can lead to variations in data, which can affect the performance of a model. 

“We find that our AI models do not always work for all of the populations. One of the limitations of AI for healthcare is that some models have good performance only for a specific sector of the population which is very well represented in the training data set.”

To ensure a model's effectiveness, it is essential to assess whether the data used to train it is representative of the population it will be applied to. By doing this, researchers and practitioners can develop models that are more robust and reliable across a wide range of applications.
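One simple way to surface this problem, sketched below with entirely made-up data, is to report a model's performance separately for each population subgroup rather than as a single overall score.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Made-up data: two subgroups whose labels depend on different features,
    # mimicking physiological or demographic differences between populations.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))
    group = rng.integers(0, 2, size=1000)   # 0 = well represented, 1 = under-represented
    y = np.where(group == 0, X[:, 0] > 0, X[:, 1] > 0).astype(int)

    # Build a training set that over-represents group 0, as often happens in practice.
    in_train = rng.random(1000) < np.where(group == 0, 0.8, 0.1)
    model = LogisticRegression().fit(X[in_train], y[in_train])

    # Report accuracy per subgroup rather than a single overall number.
    for g in (0, 1):
        mask = ~in_train & (group == g)
        acc = accuracy_score(y[mask], model.predict(X[mask]))
        print(f"group {g}: test accuracy = {acc:.2f}")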

Alzheimer’s Disease

Another project Petrone carried out, as a postdoc at the Barcelona Brain Research Center, focused on a machine learning algorithm with the potential to detect early signs of Alzheimer's disease in patients who are still considered asymptomatic, that is, healthy. By training classification models on patients who have amyloid deposition in the brain and those who do not, the algorithm could identify patterns or signatures indicative of preclinical Alzheimer's disease.

“Using Explainable AI we found that the signature was present in various regions such as the hippocampus and the olfactory areas, the loss of smell being one of the earliest signatures of Alzheimer's disease.”

This shows that AI has the ability to capture subtle signatures of a disease, even before it becomes clinically apparent, which could potentially lead to earlier diagnosis and intervention.
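As an illustration of the general approach, not of the study's actual pipeline or data, the sketch below trains a classifier to separate amyloid-positive from amyloid-negative subjects using invented regional brain measurements, then uses permutation importance, one simple explainability tool, to ask which regions carry the signal.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    # Invented regional measurements for 400 subjects; the region names are
    # placeholders, not the study's actual feature set.
    regions = ["hippocampus", "olfactory_cortex", "precuneus", "frontal_lobe"]
    rng = np.random.default_rng(2)
    n = 400
    amyloid_positive = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, len(regions)))
    X[:, 0] -= 0.6 * amyloid_positive   # hypothetical: smaller hippocampal measure when positive
    X[:, 1] -= 0.4 * amyloid_positive   # hypothetical: smaller olfactory measure when positive

    X_train, X_test, y_train, y_test = train_test_split(X, amyloid_positive, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

    # Permutation importance: shuffle one region's values and see how much test
    # accuracy drops; bigger drops suggest the region carries more of the signal.
    result = permutation_importance(clf, X_test, y_test, n_repeats=20, random_state=0)
    for name, score in sorted(zip(regions, result.importances_mean), key=lambda t: -t[1]):
        print(f"{name}: {score:.3f}")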

Donating Data

Petrone makes the point that we may be too quick to share our personal data with apps and services without fully understanding how it will be used or where it will go - yet we are hesitant when it comes to sharing our data for scientific research. 

“We are sometimes very eager and very generous with our data when we get something in return. But then when it's about research and biology, a lot of people bring up the confidentiality issue.”

She raises the question of whether we should be more open to allowing science to happen by sharing our data, especially if it could lead to important breakthroughs in fields like biology.  But it is important to find a balance between sharing data for research and maintaining confidentiality and privacy.