With MarketsandMarkets forecasting that the health data market will reach $900 million by 2028, it was fantastic to sit down with Vera Mucaj, CSO at Datavant, to discuss how data is changing the healthcare landscape. Her background gives her real insight into both the scientific and technological challenges of using health data and what needs to be done to ensure it has a positive impact on both patients and the wider industry.
The Interview
Guest Profile
Vera describes herself as a scientist turned technologist. A research scientist by training, she holds a BA in Biochemistry from Harvard College and a PhD in Cell and Molecular Biology from the University of Pennsylvania Perelman School of Medicine. Vera is currently Chief Scientific Officer at Datavant, a technology company that focuses on connecting health data to improve patient outcomes.
The Highlights
During our conversation with Vera, we took a deep dive into the topic of health data, covering the issues of data fragmentation and availability, how data can be used to support better research and decision-making in healthcare, and the challenges of semantic interoperability in datasets. Here are some of the highlights in more detail:
- Privacy-Preserving Record Linkage (6:25): Vera starts by digging into the need for privacy-preserving record linkage, also known as tokenization. One of the key areas of focus at Datavant, it allows real-world patient data to be accurately linked across multiple datasets while preserving each patient's privacy and anonymity. It is particularly beneficial for HEOR (Health Economics and Outcomes Research) and enables large-scale, long-term research.
- Improving Longitudinal Studies (17:59): For novel therapies, such as cell and gene therapies, regulators are asking for 10-15 years of follow-up data on trial patients. One challenge is that these trials are often conducted on pediatric patients, and as those patients go on and live their lives, moving around, going to college, and so on, it becomes difficult to ask them to attend a clinic for tests every six months. Tokenization allows for more passive data collection: information being recorded elsewhere can be pulled through for the follow-up studies, relieving the burden on patients and making it easier to gather more information as part of the ongoing research.
- Using AI to Improve Healthcare Data Quality (26:05): AI is increasingly used not only to analyze data but also to improve its quality, both by enriching it with other data and by making additional types of unstructured data usable. Vera cites several examples: doctors' notes, genomic information, and medical imaging are all incredibly informative and valuable but difficult to standardize. AI can interpret this data, de-identify it, and make it exchangeable in a way that provides even greater benefits to medical research.
- Education For Everybody (34:22): For tokenization to reach its full potential as a way to de-identify and exchange data, a huge education effort is needed. This includes educating clinical trial coordinators and centers so they fully understand how the technology preserves patient privacy, and so they can see the potential for a better understanding of trial outcomes when trial data is bolstered by broader real-world data. That, in turn, allows them to educate participants on both the safety of their medical data and the fact that post-trial monitoring can be done passively.
- Semantic Interoperability Between Datasets (39:32): The last major talking point of the podcast is data exchange. Datavant's aim is to make it possible to exchange medical data safely and securely at the touch of a button. The issue is that even if that button existed, what do you do with it next? “How do you standardize it? What is the right common data model to exchange it across different researchers?” This is something Datavant is addressing through expert determination: bringing together statisticians trained in healthcare data to analyze individual datasets and combine datasets in ways that limit re-identification risk.
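To make the tokenization idea discussed above more concrete, here is a minimal sketch of how privacy-preserving record linkage can work in principle. This is not Datavant's actual implementation; the normalization rules, field names, and key handling here are illustrative assumptions. The core idea is that each data holder derives a deterministic token from patient identifiers using a keyed hash, so two datasets can be joined on tokens without either side exchanging raw identifiers.

```python
import hmac
import hashlib

def normalize(first: str, last: str, dob: str) -> str:
    # Normalize identifiers so the same patient yields the same input string
    # regardless of casing or stray whitespace (real systems use richer rules).
    return f"{first.strip().lower()}|{last.strip().lower()}|{dob.strip()}"

def tokenize(record: dict, key: bytes) -> str:
    # Keyed hash (HMAC-SHA256): deterministic for linkage, but not reversible
    # to the underlying identifiers without the key.
    msg = normalize(record["first"], record["last"], record["dob"]).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

# In practice the key would be managed by a trusted party, not hard-coded.
SHARED_KEY = b"demo-key"

# Two hypothetical datasets held by different organizations.
claims = [{"first": "Ana", "last": "Ruiz", "dob": "1990-02-14", "claim": "Rx-123"}]
ehr = [{"first": " ANA", "last": "ruiz", "dob": "1990-02-14", "visit": "2024-01-05"}]

# Each holder strips identifiers and keeps only the token plus payload.
claims_by_token = {tokenize(r, SHARED_KEY): {"claim": r["claim"]} for r in claims}
ehr_by_token = {tokenize(r, SHARED_KEY): {"visit": r["visit"]} for r in ehr}

# The linker joins on tokens alone, never seeing names or dates of birth.
linked = [
    (claims_by_token[t], ehr_by_token[t])
    for t in claims_by_token.keys() & ehr_by_token.keys()
]
print(len(linked))  # 1: the same patient is linked across datasets
```

Because the token is deterministic, records for the same person match across datasets, which is what enables the passive longitudinal follow-up described above: new data recorded elsewhere carries the same token and can be attached to the trial cohort without re-contacting the patient.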
Further Reading: Vera recommends the FDA and EMA guidance documents as a resource on how to use real-world data for regulatory decision-making, and suggests the Datavant blog post “The Fragmentation of Health Data” as a good primer for anyone new to the industry.
Continuing the Conversation
The podcast with Vera really highlighted one of the big problems facing the life sciences industry. The problem is not with data availability; the quantity of data available is more than doubling year on year. Instead, what we have is a data logistics problem.
The root of this issue is the compliance landscape for patient data, where patients are rightfully protected from having their data shared without their consent. But health data touches every element of a patient’s life, across multiple healthcare providers, diagnostic devices, illnesses, therapeutic interventions, insurance claims, and lifestyle choices. And it is not enough to have data from a single snapshot of a given point in time. Multiple readings over time are necessary to understand the impact of interventions and the mitigating factors that might explain variance in impacts across patients. So data for biotech is data that encompasses a person’s entire lived experience. In this environment, anonymized data that is integrated at the patient level is not just useful; it is essential to answering the most important questions biotech organizations ask about how to improve health outcomes for the patient population.
To say that data exists in “silos” is an understatement. The truth is that the data is purposefully partitioned into anonymized individual measurements that cannot be connected along the longitudinal time dimension nor along the horizontal “healthcare touchpoint” dimension for privacy reasons. It is great to see researchers, hospitals, and biotech companies finding ways to build a better ecosystem for data sharing and integration that meets the analytical needs of the biotech community for therapeutic insights while also protecting patient privacy. Unfortunately, many of the patients represented by this data have no idea that their data is being shared and integrated, nor the value for society that this data can potentially drive. An optimal model would bring the patient to the table as well, while integrating a much higher proportion of patient healthcare touchpoints over time. Yet explaining the value proposition to a patient is near impossible, so hiding the ability to share anonymized data in the terms and conditions of each healthcare touchpoint seems the only achievable option right now. In this environment, anonymized and linked data vendors like Datavant are essential to liberate higher quality datasets that can produce the insights biotech organizations need.
In the meantime, it is incumbent on each organization to develop a data acquisition strategy targeted at critical areas of need inside the organization (aligned with the R&D, manufacturing, and commercial sides of the business), and to consider how to help stakeholders across the organization maximize ROI and minimize time to value from new and existing data assets. These are places where CorrDyn can help close the gaps between the organization of today and the one you strive to become.
If you're interested in discovering how your organization can unlock the value of data and maximize its potential, get in touch with CorrDyn for a free SWOT analysis.
Want to listen to the full podcast? Listen here:
January 24, 2024