Jesse Johnson on Managing Data as a Biotech Start-up, the Need for Modular Tools, and the Role of Large Language Models

One of the goals of Data in Biotech is to build a community of professionals interested in how data can impact biotech organizations, and to create a space for discussion. Only one month in, we are already seeing that community grow: one of our previous guests, Markus Gershater from Synthace, put us in touch with Jesse Johnson so we could get his take on how biotechs can make the most of their data.

The Interview

Guest Profile

Jesse Johnson began his career as an academic mathematician before moving into software engineering at Google. He then entered the biotech world, working at start-ups such as Cellarity and Dewpoint Therapeutics in roles that combined data science and software engineering. Jesse is now an independent consultant, helping early-stage biotech start-ups manage their data better.

The Highlights

In the podcast, we spoke with Jesse about the challenges faced by biotech startups in managing their data, the potential of automation and machine learning, and the role of software and hardware vendors in the biotech industry. Here are our top highlights:

Managing data as a start-up (6:54): Jesse kicks off by explaining the challenges biotech start-ups face when trying to prioritize and unify data from different experiments and assays. The data is often generated on different equipment, in different formats, and stored across multiple devices, which makes it hard to establish repeatable data processes that produce consistent insights. Designing and organizing data well from the beginning is nonetheless worthwhile: it saves time and resources, and it is particularly valuable ahead of a Series B funding round, where it can help accelerate processes and is often seen by investors as an asset.
The need for modular tools (16:47): Start-ups face a significant challenge in finding the right data tools. The current trend is to build broad platforms with a lot of functionality, yet a start-up typically needs only a fraction of what these platforms offer, and adopting one can create unintended lock-in that makes scaling up harder further down the line. The industry needs a more modular approach that reduces overhead and lets start-ups bring in granular components iteratively, as they need them.
AI/ML Fact or Fiction (26:30): When asked whether it was fact or fiction that AI and automation would automate all R&D processes within 10 years, Jesse began by noting that automation experts predict all lab processes will be automated in the next 5 to 10 years. Once the lab is automated, it becomes much easier to feed relatively clean data into an automated pipeline, which greatly increases the chances of automating R&D as a whole. Asked whether data is more important than machine learning models in biotech, Jesse agreed, concluding, “Models are fun, but if you don't have the data, they don't work.”
Using digital twins in the lab environment (36:40): Labs need to know where everything is at any given time: not just inventory, but also the status of every experiment, from early-stage discussions to those already completed. Jesse discusses why this is operationally essential for keeping everything running smoothly and avoiding bottlenecks. In his view, a digital twin of the lab environment can provide real-time tracking of inventory and experiment flow. Part of this is using a database to integrate data science into the production process of biotech research, so that every step in the wet lab is recorded and understood. Biotech start-ups can then plan and record experiments, making it easier for data scientists to analyze the results and make changes.
The Role of Large Language Models in Biotech (39:30): Jesse gives his take on the potential of large language models like GPT in the biotech industry. He sees them primarily as a learning tool and as a stepping stone to writing specialized content. These models can generate long-form drafts that humans then check and refine, reducing manual effort without compromising the quality of the final output. While these applications are still in their early stages, the models can be trained on best practices, and there is certainly a good use case for this type of technology.
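To make the digital-twin idea from the highlights more concrete, here is a minimal sketch, assuming a single SQLite table in which every experiment carries a status that is updated as it moves through the lab. The table layout, the function names (`create_tracker`, `add_experiment`, `advance`, `status_counts`), and the four status values are all hypothetical illustrations, not a schema Jesse described in the podcast.

```python
import sqlite3

# Hypothetical lifecycle stages; a real lab would define its own.
STATUSES = ("proposed", "scheduled", "running", "completed")


def create_tracker() -> sqlite3.Connection:
    """Create an in-memory database with one row per experiment."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiments ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " status TEXT NOT NULL)"
    )
    return conn


def add_experiment(conn: sqlite3.Connection, name: str, status: str = "proposed") -> int:
    """Register an experiment at a given stage and return its id."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    cur = conn.execute(
        "INSERT INTO experiments (name, status) VALUES (?, ?)", (name, status)
    )
    return cur.lastrowid


def advance(conn: sqlite3.Connection, exp_id: int) -> str:
    """Move an experiment to its next stage; completed experiments stay completed."""
    row = conn.execute(
        "SELECT status FROM experiments WHERE id = ?", (exp_id,)
    ).fetchone()
    if row is None:
        raise KeyError(exp_id)
    idx = STATUSES.index(row[0])
    nxt = STATUSES[min(idx + 1, len(STATUSES) - 1)]
    conn.execute("UPDATE experiments SET status = ? WHERE id = ?", (nxt, exp_id))
    return nxt


def status_counts(conn: sqlite3.Connection) -> dict:
    """Count experiments per stage, including empty stages, to surface bottlenecks."""
    counts = {s: 0 for s in STATUSES}
    for status, n in conn.execute(
        "SELECT status, COUNT(*) FROM experiments GROUP BY status"
    ):
        counts[status] = n
    return counts
```

For example, registering one experiment as "proposed" and another as "running", then advancing each once, leaves `status_counts` reporting one scheduled and one completed experiment. A real lab tracker would also need timestamps, inventory tables, and multi-user access, which this sketch deliberately leaves out.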

Further reading: As is our Data in Biotech tradition, we asked Jesse for his reading recommendations. He suggested Kaleidoscope's blog post on the different phases biotech start-ups go through, and Benn Stancil’s newsletter from a few weeks ago on why big solutions can be hard to adopt and why organizations often need only 10% of what has been built. Jesse also publishes a weekly newsletter, ‘Scaling Biotech’.

Continuing the Conversation

One of the most interesting things that Jesse highlighted in the podcast was the cultural differences between the wet lab and dry lab teams and how that drives their differing perspectives with regard to data capture throughout the experimentation process.

Biologists, by nature, are used to following complex processes through their patterns of interaction and using experimental procedures to discern which mechanisms drive those interactions. They are deep in the details of what is happening to a particular cell in a particular environment, with expectations about what will happen under different conditions. They tend to view each experiment in isolation rather than in relation to experiments conducted by other team members.

Data teams, by nature, are used to integrating information from multiple sources and converting it into a format that can be easily explored, analyzed, aggregated, and presented to different stakeholder groups. They approach problems by looking for the elements that can be represented efficiently across all experiments and all categories of experimental metadata. They typically see each experiment as an instance of a larger experimental data model into which every experiment should fit neatly.
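The data team's view of experiments as instances of a shared model can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `ExperimentRecord` type: every experiment fills in the same common fields, assay-specific details go into a free-form `metadata` dictionary, and a hypothetical helper `mean_readout_by` then aggregates across experiments regardless of which assay produced them. The field names are illustrative, not drawn from the podcast.

```python
from collections import defaultdict
from dataclasses import dataclass, field, fields
from statistics import mean


@dataclass
class ExperimentRecord:
    """Hypothetical shared data model: fields every experiment must provide."""

    experiment_id: str
    assay: str  # e.g. "binding" or "viability"
    operator: str
    readout: float  # one summary measurement per experiment, for simplicity
    metadata: dict = field(default_factory=dict)  # assay-specific details


# Names of the fields shared by every record (metadata is handled separately).
COMMON_FIELDS = {f.name for f in fields(ExperimentRecord)} - {"metadata"}


def mean_readout_by(records: list, key: str) -> dict:
    """Average the readout across experiments grouped by a common field or metadata key.

    Records that lack the requested metadata key are grouped under None.
    """
    groups = defaultdict(list)
    for r in records:
        value = getattr(r, key) if key in COMMON_FIELDS else r.metadata.get(key)
        groups[value].append(r.readout)
    return {k: mean(v) for k, v in groups.items()}
```

The same function groups records by `assay`, by `operator`, or by a metadata key such as a hypothetical `cell_line`. That reuse is the payoff of a shared model; the cost is that an experiment which does not fit the model neatly ends up in an awkward catch-all group, such as `None` here.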

Given these differences in culture, mentality, and analytical approach, it is easy to understand how conflicts emerge between wet lab and dry lab scientists. Yet these teams need each other to succeed. A wet lab scientist can spot an anomaly and immediately generate hypotheses about its origin. A data team member can devise efficient computational approaches to measuring phenomena in a dataset, approaches that would take a wet lab scientist much longer to work out. Having an expert who can navigate these cultural differences, and help each stakeholder group feel that its concerns and needs are being addressed, enables R&D organizations to get out of their own way and collaborate on their common goals.

If you're interested in discovering how your organization can unlock the value of data and maximize its potential, get in touch with CorrDyn for a free SWOT analysis.

Post by Ross Katz
November 30, 2023