We were delighted to be joined on the Data in Biotech podcast by Wolfgang Halter from Merck Life Science for a deep dive into Bayesian Optimization for design of experiments (DoE). This approach sets up experimental campaigns so that the information gained from previous experiments is used to design new ones, subject to business and biological constraints, and it has huge potential to significantly increase the efficiency of experimentation in the biotech space. We won’t go into extensive detail about Bayesian statistics or how Bayesian optimization works, but in short, Bayesian approaches to optimization help you make better decisions about how to balance exploration of new regions of the experimental space against exploitation of the information you have already gained about where your highest-value experiments are likely to come from. In our conversation with Wolfgang, we focused on how Bayesian Optimization (BO) is moving the needle to improve outcomes in life sciences research through more intelligent experimentation.
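For readers who want a slightly more concrete picture of that exploration/exploitation balance, here is a minimal, illustrative sketch (not from the podcast): a Gaussian process surrogate summarizes the experiments run so far, and an upper-confidence-bound acquisition scores candidate experiments by rewarding both high predicted response (exploitation) and high uncertainty (exploration). The toy data, the kappa weighting, and the use of scikit-learn are all assumptions made purely for illustration.

```python
# Illustrative explore/exploit scoring with a Gaussian process surrogate
# and an upper-confidence-bound (UCB) acquisition. Toy data only.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Results from experiments already run (condition -> measured response)
X_observed = np.array([[0.1], [0.4], [0.9]])
y_observed = np.array([0.2, 0.8, 0.3])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_observed, y_observed)

# Candidate conditions we could test next
X_candidates = np.linspace(0, 1, 101).reshape(-1, 1)
mean, std = gp.predict(X_candidates, return_std=True)

# UCB: high mean favors known good regions (exploit);
# high std favors poorly understood regions (explore)
kappa = 2.0
ucb = mean + kappa * std
next_experiment = X_candidates[np.argmax(ucb)]
print(f"Suggested next condition: {next_experiment[0]:.2f}")
```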
Wolfgang joined the Merck team in 2019 as a data scientist for the life science business. Over the past five years, he has worked in different positions within the company and now leads the data science and bioinformatics teams, a role he has held since 2021. Before joining Merck, Wolfgang studied Engineering Cybernetics at the University of Stuttgart with a focus on systems biology and then completed a PhD in synthetic biology.
Wolfgang explains that his interest in this area is motivated by an urge for efficiency: he sees so many valuable insights that can be derived from data, and finds satisfaction in uncovering those insights and using them to improve how things work. The full episode can be found here, and here are the highlights:
How BayBE is improving DoE (05:56): Wolfgang introduces BayBE, Merck’s Bayesian optimization toolbox. BayBE is an open-source tool that applies Bayesian optimization to design of experiments, offering an improved approach compared to classical DoE. Classical DoE approaches do achieve the desired outcome, but they don’t account for the knowledge accumulated over iterative rounds of experimentation. BayBE-based DoE uses BO to incorporate prior knowledge and probability into a predictive model that guides scientists toward their end goal, going so far as to recommend the specific experiments a team should run next.
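To show what that recommend-measure-repeat loop looks like in practice, here is a minimal sketch of a BayBE campaign based on the usage pattern in the project’s public documentation. The parameter names, values, and target are invented for illustration, and exact class names and signatures may differ between BayBE versions, so treat this as a sketch rather than a definitive reference.

```python
# Illustrative BayBE campaign: define a search space and a target,
# ask for recommended experiments, feed measurements back, repeat.
from baybe import Campaign
from baybe.objectives import SingleTargetObjective
from baybe.parameters import CategoricalParameter, NumericalDiscreteParameter
from baybe.searchspace import SearchSpace
from baybe.targets import NumericalTarget

# The quantity we want to maximize (name is a placeholder)
objective = SingleTargetObjective(target=NumericalTarget(name="Yield", mode="MAX"))

# A toy search space: two experimental factors with discrete settings
parameters = [
    NumericalDiscreteParameter(name="Temperature", values=[30, 40, 50, 60], tolerance=0.5),
    CategoricalParameter(name="Buffer", values=["A", "B", "C"], encoding="OHE"),
]
campaign = Campaign(SearchSpace.from_product(parameters), objective)

# Ask BayBE which experiments to run next
recommendations = campaign.recommend(batch_size=3)

# ... run the experiments in the lab, then record the measured target ...
recommendations["Yield"] = [0.62, 0.71, 0.55]  # placeholder results
campaign.add_measurements(recommendations)

# The next batch of recommendations now accounts for everything measured so far
next_batch = campaign.recommend(batch_size=3)
print(next_batch)
```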
Expediting drug development (21:00): Wolfgang talks through a specific example of BayBE in the wild: finding the best viscosity-reducing excipients for drug formulation. Merck has developed a tool that lets customers test a small number of excipients with the protein for a particular drug before it suggests the ideal combinations of excipients. By removing the need for drug developers to test every combination, they are able to save significant time and resources and increase their speed to market. In the applications BayBE is supporting, Wolfgang says they see time savings of 30-50%.
Transfer Learning (25:01): To push the time savings from 50% to more like 95%, Wolfgang explains how Merck is looking into transfer learning, where data from past campaigns is used to inform the approach for new experiments. He describes this as ‘creating a warm start’: instead of starting with uniform priors (a mathematical representation of having no prior knowledge about the experimental domain), the model starts from a set of priors that, based on previous research, is likely to be closer to the optimum. This allows the aim of the experiment to be achieved more quickly, with the safety net that even if the transferred priors turn out to be worse than uniform priors, the data will eventually catch up.
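The following is a small, purely conceptual sketch of that warm-start idea, not Merck’s implementation or BayBE’s transfer-learning API: the same surrogate model is fitted once on only the new campaign’s first measurements (a cold start) and once with historical data from a related campaign added in (a warm start), and the suggested next experiment shifts accordingly. All data and parameters are invented for illustration.

```python
# Conceptual warm-start illustration: seeding the surrogate model with
# historical data from a related campaign changes where the optimizer
# looks first. Not Merck's implementation; all values are invented.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

historical_X = np.array([[0.2], [0.5], [0.8]])  # past campaign on a related system
historical_y = np.array([0.30, 0.90, 0.40])
new_X = np.array([[0.45], [0.55]])              # experiments run so far in the new campaign
new_y = np.array([0.80, 0.85])

cold = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(new_X, new_y)
warm = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(
    np.vstack([historical_X, new_X]),
    np.concatenate([historical_y, new_y]),
)

candidates = np.linspace(0, 1, 101).reshape(-1, 1)
for name, model in [("cold start", cold), ("warm start", warm)]:
    mean, std = model.predict(candidates, return_std=True)
    suggestion = candidates[np.argmax(mean + 2.0 * std)]  # UCB acquisition
    print(f"{name}: next suggested condition = {suggestion[0]:.2f}")
```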
Removing barriers to entry (31:16): We discussed with Wolfgang what has stood in the way of BO adoption up until this point. He explains that there are multiple factors, the most significant being the gap between the developers of DoE methodologies and the scientists using them, and the fact that BO is ‘computationally hungry’. Merck is working on ways to make BO more efficient, particularly for large domains with many data points.
Universal Data Model (34:25): When asked what the biggest challenges are in the biotech space, Wolfgang immediately focuses on the need for a universal data model. He describes the current challenges of data collection: too much manual data entry, too many Excel spreadsheets, and a lack of standardized, digitized data. There is a clear need for structure and standardization, but competing standards and differing interfaces are complicating the issue, making it difficult to achieve the connected data layer needed to support better R&D.
Further reading: For those wanting to read more about the topics explored with Wolfgang on the podcast, he suggests starting with the Merck website. There is also a GitHub page for BayBE for anyone wanting to look at the toolkit in more detail.
The obvious direction for this week’s continuing conversation is to explore the topic of the universal data model and data incompatibility that Wolfgang identifies as one of the key challenges for the industry. Yes, every company can wait for the ecosystem to do the integration for them, and there are a variety of third-party tools that promise to integrate your data as long as you make their tool the system of record for your business. However, for companies with legacy systems and companies that need to drive value from their data right now (which pretty much covers all companies), it is critical to take ownership of ensuring that internal data is interoperable.
Particularly in the biotech and pharma spaces, the scope of useful data that has the potential to power significant advancement, efficiency, and discovery is huge. How much of that data speaks the same language is another matter. At CorrDyn, we work with customers to build data pipelines that allow all relevant data to be interpreted in context, but what are the first steps in this process for organizations looking to start their data journeys?
Moving Toward Data Interoperability
The starting point for many organizations is to conduct a data audit: looking at the data they are generating, the different forms it takes, the different locations where it resides, and how it can be integrated. For organizations without a dedicated data team, this can be a daunting process, so working with a data specialist is the best way to get that data in order and ready to use.
The Role of Natural Language Processing in Data Interoperability
One of the most valuable emerging tools on the journey to a universal data model is natural language processing (NLP). While the large language models (LLMs) available now can sometimes produce inaccurate information when prompted to retrieve knowledge, they excel at extracting valuable information from unstructured text. Biotech and healthcare-adjacent companies have innumerable use cases for entity extraction and normalization, where data that previously could not be organized and analyzed at scale can now be integrated into a company’s full-scale interoperable data system. Combined with automated analysis and summarization of large, complex text documents, the ability to drive insight throughout a biotech organization is greater than it has ever been.
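As a simple illustration of what entity extraction from unstructured text can look like, here is a minimal sketch using spaCy’s general-purpose English model. This is an assumption for illustration rather than a recommended stack: a production biotech pipeline would typically use a domain-specific biomedical model or an LLM-based extractor, followed by normalization against a controlled vocabulary, but the overall pattern of turning free text into structured, integrable records is the same.

```python
# Minimal entity-extraction sketch: turn unstructured notes into structured
# records that can be joined with the rest of an organization's data.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")  # general-purpose model, used here for illustration

note = (
    "The patient at the Boston site was switched from Drug A to Drug B "
    "by Dr. Smith on 12 March 2024 after a mild adverse event."
)

doc = nlp(note)
records = [
    {"text": ent.text, "label": ent.label_, "start": ent.start_char, "end": ent.end_char}
    for ent in doc.ents
]
for record in records:
    print(record)
# Next steps in a real pipeline: normalize each entity to a shared identifier
# (e.g. a compound ID or site code) and load it into the interoperable data layer.
```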
Data Strategy in Interoperability
At a fundamental level, interoperability begins with bringing stakeholders with data expertise to the table as the organization develops the systems and processes that will produce the data it needs to analyze in the years to come. In interoperability, an ounce of prevention is worth a pound of cure. By integrating strategic goals and functional data requirements into every technology, system, vendor, and process selection and configuration decision, companies can set themselves up for a future in which they control how they generate value from data, rather than relying on the industry to standardize interoperable data definitions for them.
If you would like to speak to CorrDyn about how to get on top of your data so it can start to support your business goals, get in touch for a free SWOT analysis.
Want to listen to the full podcast? Listen here: