Data in Biotech

Synthesizable by Design: Rethinking AI's Role in Small Molecule Drug Discovery

In this episode of Data in Biotech, host Ross Katz sits down with Paul Finn, Chief Scientific Officer at Oxford Drug Design, for a conversation on what it actually takes to find a drug molecule that works not just on paper but also in the lab, in the cell, and, ultimately, in the clinic. Paul brings four decades of experience across what became GSK, Pfizer, and a series of Oxford-area spinouts, and has shepherded a compound all the way to a marketed drug. That perspective gives him a particular kind of skepticism toward AI results that look too good to be true, because he's done the work of checking whether they are. The conversation moves through synthesizability as a first-class constraint, why chemistry has proven so much harder for AI than biology, how 3D molecular representation gets closer to the physics that actually matters, and what rigorous multi-parameter optimization looks like when you're trying to kill cancer cells and drug-resistant bacteria at the same time.

What you'll learn in this episode:
>> Why synthesizability is chronically underestimated, and why changing a single atom in a structure can take a molecule from trivially easy to make to practically impossible
>> How Oxford Drug Design constrains the generative search to reaction schemes and purchasable building blocks, and why that chemical space is still so vast that novelty is not meaningfully sacrificed
>> Why most generative AI models learn from a 2D string representation of a molecule, two steps removed from the 3D physics that governs how a drug actually binds to its target
>> How Bayesian optimization over reagent space, rather than molecular space, allows an active learning loop to focus on the structural patterns associated with activity
>> Why benchmarking complex models against simple ones is the discipline that exposes false correlations, and why Paul and his co-authors were able to recover the Halicin result using methods decades older than deep learning
>> What a pharma company should actually ask an AI drug discovery vendor before buying what they're selling

Meet our guest: Paul Finn is Chief Scientific Officer at Oxford Drug Design, a computational drug discovery company with roots in Oxford's chemistry department. His career spans over 40 years of computational drug discovery, from early structure-activity modeling in the 1980s through to modern generative AI methods, with deep experience at what became GSK and Pfizer before moving into the Oxford spinout ecosystem. At Oxford Drug Design, Paul leads internal programs in oncology and antibacterial resistance, combining novel computational methods with a rigorous, synthesizability-first approach to multi-parameter optimization.

Connect with Paul Finn on LinkedIn: https://uk.linkedin.com/in/paul-finn-2250616

About the host: Ross Katz is Principal and Data Science Lead at CorrDyn. Ross specializes in building intelligent data systems that empower biotech and healthcare organizations to extract insights and drive innovation.

Connect with Ross Katz on LinkedIn: https://www.linkedin.com/in/b-ross-katz/

Connect with us: Follow the podcast for more insightful discussions on the latest in biotech and data science. Subscribe and leave a review if you enjoyed this episode!

Sponsored by CorrDyn: This episode is brought to you by CorrDyn, the leader in data-driven solutions for biotech and healthcare. Discover how CorrDyn is helping organizations turn data into breakthroughs: https://www.linkedin.com/company/corrdyn/


From Tissue to Mechanism to Decision: Building AI for Computational Oncology

Jun 2 · 00:46:54

In this episode of Data in Biotech, host Ross Katz sits down with Arvind Rao, Professor of Computational Medicine and Bioinformatics at the University of Michigan, for a discussion on the gap between what biomedical AI can do and what it can reliably be trusted to do in clinical practice. Arvind's research sits at the intersection of computational oncology and AI governance. His lab works across H&E histopathology, multiplex immunofluorescence, spatial transcriptomics, and single-cell RNA sequencing, not just to build predictive models, but to understand the full lifecycle from data to model to inference, and to ask where that lifecycle can be trusted and where it can't. The conversation moves through two of his recent papers: one on SPIFEE, a graph-based framework that replaces scalar interaction scores in the tumor microenvironment with spatially resolved functional representations, and one on a multimodal framework that traces a path from stained tissue slides to nominated drug targets via morphological pattern discovery and spatial transcriptomic mapping.

What you'll learn in this episode:
>> Why the field's central failure is not algorithmic but translational: the gap between a model that performs well on a benchmark and one that can be consistently trusted in a high-stakes clinical setting
>> How SPIFEE replaces the conventional scalar edge representation of cell-cell interactions in the tumor microenvironment with spatially resolved functional edges
>> How Arvind's multimodal framework moves from H&E pathology slides labeled with clinical outcomes, through morphological pattern discovery via multiple instance learning, to spatial transcriptomic mapping, to the nomination of molecular mechanisms and actionable drug targets
>> Why Goodhart's Law applies directly to foundation model evaluation in biology
>> What the AI literacy gap costs when it goes unaddressed in healthcare and pharma organizations

Meet our guest: Arvind Rao is a Professor of Computational Medicine and Bioinformatics, with a joint appointment in Radiation Oncology, at the University of Michigan. His research focuses on establishing trust in biomedical AI predictions across the full data-to-decision pipeline, integrating H&E histopathology, spatial transcriptomics, multiplex immunofluorescence, and single-cell RNA sequencing to build models that are predictive, interpretable, and biologically credible. Alongside his research, Arvind develops AI literacy programs for healthcare and pharma professionals, helping clinical and procurement teams evaluate and govern AI systems with the rigor those decisions demand.

Connect with Arvind Rao on LinkedIn: https://www.linkedin.com/in/arvind-rao-3301301ba/


Cavities in the Data: Building FDA-Cleared AI for Dental Imaging with Overjet

May 13 · 00:58:42

In this episode of Data in Biotech, host Ross Katz sits down with Sadegh Salehi, Director of Research and Principal Scientist at Overjet, to explore what rigorous model evaluation actually looks like when the stakes are clinical. Overjet builds FDA-cleared vision models that detect and quantify dental disease across billions of X-ray images from thousands of practices - a data problem with a staggering number of dimensions. Thirty-two teeth per adult patient, each with different morphology. Multiple image types capturing different anatomy. Fifteen to twenty sensor manufacturers producing perceptually distinct images, each with different contrast, resolution, and noise characteristics. And disease severity distributions ranging from barely visible early-stage decay to obvious pathology. Sadegh walks through what it takes to evaluate models responsibly across all of those dimensions and discusses why aggregate metrics like F1 score can mask catastrophic failures on specific subgroups, how models find and exploit shortcuts in training data, and why the same flawed sampling that creates gaps in your training set also creates them in your test set. He also traces Overjet's architectural evolution from over twenty narrow task-specific models to a single foundation model they call Unity, explains how treatment plan procedure codes provide a noisy but real production feedback signal, and describes how Overjet became one of the first companies to secure the FDA's Predetermined Change Control Plan, a framework that allows model updates without filing a new clearance each time.

What you'll learn in this episode:
>> Why aggregate evaluation metrics are insufficient for high-stakes medical AI
>> How models exploit shortcuts in training data: if all images from a rare sensor in the training set happen to be healthy, the model doesn't learn to read that sensor; it learns that the sensor means healthy, bypassing the visual task entirely and producing systematic false negatives in production
>> How Overjet evolved from over twenty narrow, sensor-specific and indication-specific models into a single foundation model called Unity, using noisy labels generated by the small models as the training signal for a much larger backbone, then building independent prediction heads for each clinical indication on top of it
>> Why the decision to keep prediction heads architecturally independent from one another was driven as much by FDA regulatory strategy as by modeling considerations
>> How Overjet uses dental treatment plan procedure codes as a production monitoring signal

Meet our guest: Sadegh Salehi is Director of Research and Principal Scientist at Overjet, where he leads the team responsible for building, evaluating, and deploying FDA-cleared vision models for dental disease detection and quantification.

Connect with Sadegh Salehi on LinkedIn: https://www.linkedin.com/in/sadegh-salehi/


Data as a Moat: Why Biotech's Most Valuable Asset is Buried in a Hard Drive

Apr 30 · 00:42:52

In this episode of Data in Biotech, host Ross Katz sits down with Jesse Johnson, founder of Merelogic, a software consulting firm specializing in data infrastructure for biotech organizations. Jesse brings a rare perspective to the conversation: he built data systems at Google, where engineers control the data collection function end to end, before moving into biotech, where the biology does what it wants and bench scientists, not engineers, generate the data. The result is a grounded, pragmatic take on one of the most consequential and underappreciated questions in life sciences right now: as bio foundation models fundamentally change the value equation for experimental data, are biotech labs structured to capture that value? Jesse argues the answer is usually no and that the fix is less technical than most assume. It doesn't require a production-grade data pipeline or a cloud architecture. It requires lightweight, human-readable standard operating procedures, clear expectations between computational and wet lab teams, and a data strategy designed not just for the questions you're asking today, but for the ones you don't yet know you'll need to ask.

What you'll learn in this episode:
>> Why the transition from tech to biotech requires a fundamental reset of assumptions about data infrastructure, and why the biggest difference isn't technical but organizational
>> How bio foundation models have flipped the value equation for experimental data by reducing the cost of organizing it while dramatically increasing the potential return
>> How the strategic value of proprietary data is evolving in the biotech ecosystem, from Tahoe Therapeutics building an acquirable single-cell dataset to Eli Lilly's Lowe lab using data as currency for partnerships
>> Why electronic lab notebooks aren't going anywhere, and how the real question facing biotech software teams isn't whether to use an ELN, but how to balance schema rigidity against the flexibility required for the long tail of one-off exploratory assays that no automation pipeline will ever fully capture

Meet our guest: Jesse Johnson is the founder of Merelogic, a software consulting firm that works with biotech and biopharma organizations on data infrastructure and data operations strategy. Jesse writes regularly about data strategy for biotech on his Substack, covering topics from bio foundation model adoption to the evolving role of electronic lab notebooks in an AI-augmented research environment.

Connect with Jesse Johnson on LinkedIn: https://www.linkedin.com/in/jesse-johnson-biotech/

Follow Merelogic on LinkedIn: https://www.linkedin.com/company/merelogic/


Markus Gershater on Why Experimental Design in Biotech is Broken and How to Fix It

Apr 15 · 00:41:09

This week, we're delighted to be joined by Markus Gershater, Chief Scientific Officer and Co-Founder of Synthace - a digital experiment platform built for high-performance life science R&D teams to help them run more powerful experiments and accelerate scientific progress. Host Ross Katz speaks with Markus about:
>> What's broken about how laboratories across the globe run biological experiments at scale
>> The opportunity that exists for researchers in striving to automate experimental design
>> Why, as an industry, we must move towards implementing infrastructure that enables multifactorial experiments rather than one-factor experiments
>> The role of AI in making sense of complex systems

---
If you're a biotech company struggling to transform your business with data, CorrDyn can help. Whether you need to supplement existing technology teams with specialist expertise or launch a data program that lays the groundwork for future internal hires, you can partner with CorrDyn to unlock the potential of your business data - today. Visit connect.corrdyn.com/biotech to learn more.
---
Data in Biotech is a fortnightly podcast exploring how companies leverage data to drive innovation in life sciences.


Data Science and Diagnostic Models - the What, Why and How with Michelle Wiest

Apr 15 · 00:44:05

This week, we're delighted to be joined by Michelle Wiest, Director of IVD Biostatistics at Freenome - a high-growth biotech company that creates tools to help prevent, detect, and treat disease. Host Ross Katz speaks with Michelle about the use of biostatistics in the field of diagnostics, what biases can corrupt diagnostic tests and how to catch them early, the different types of data sets that are being used to develop diagnostic models, and how to prepare to present data to regulatory bodies such as the FDA.


The Patient is Not a Document: Foundation Models for Biomedical AI with Standard BioModel

Apr 15 · 00:49:54

In this episode of Data in Biotech, host Ross Katz sits down with Kevin Brown, co-founder of Standard BioModel, to explore one of the most ambitious projects in biomedical AI: building a multimodal foundation model that represents the full complexity of a patient across time. Drawing on a career spanning brain-computer interfaces, computer-aided diagnosis at Siemens Healthineers, and oncology data science at Bristol Myers Squibb, Kevin shares the scientific and philosophical journey that led him to a single conviction: a patient is not a document. Rather than reducing a patient to clinical notes, ICD-10 codes, or isolated test results, Standard BioModel's approach maps every available modality - CT imaging, digital pathology, genomics, EKGs, longitudinal EHR data - into a shared latent space, and models how that patient moves through time. The result is a framework designed not just for prediction, but for counterfactual reasoning, clinical trial matching, and personalized intervention, with open-source models already being validated across leading academic medical centers.

What you'll learn in this episode:
>> Why reducing a patient to text - clinical notes, radiology reports, genomic assay summaries - loses information, and how mapping multimodal data into a shared latent embedding space preserves information that never makes it into the written record
>> How Standard BioModel's temporal architecture models patients as trajectories through an abstract embedding space rather than static snapshots, enabling counterfactual reasoning about the likely impact of interventions on a patient's future health trajectory
>> Why no single foundation model can own every clinical vertical, and how building a highly generalizable base model that facilitates downstream fine-tuning is a more defensible and scalable strategy than building narrow, application-specific models
>> How the model handles missing modalities in real-world clinical settings, and why the architecture is designed to function effectively even when not every data type is available for every patient
>> Why Standard BioModel has chosen to open-source its models, and why broad, institution-specific validation across diverse patient populations is not just a scientific priority, but a prerequisite for trustworthy clinical AI

Meet our guest: Kevin Brown is the Founder and CEO of Standard Model Biomedicine, where he builds foundation models for biomedicine. He previously led AI work as Director of Artificial Intelligence at SimBioSys, and held data science and applied ML roles at Bristol Myers Squibb and Siemens Healthineers. With a neuroscience research background from New York University, Kevin's work spans generative AI and machine learning for biomedical and medical imaging applications.

Connect with Kevin Brown on LinkedIn


Physics, Free Energy, & Drug Discovery: Inside Schrödinger's Computational Platform

Apr 1 · 00:57:31

In this episode of Data in Biotech, Ross Katz sits down with Robert Abel, Chief Scientific Officer of the Platform at Schrödinger, to explore how physics-based computational modeling is transforming drug discovery. Robert unpacks why machine learning alone isn't enough to navigate the vast complexity of chemical space - estimated at 10⁶⁰ possible drug-like molecules - and how integrating atomistic simulations with ML creates a more accurate, reliable, and scalable approach to identifying viable drug candidates. From free energy perturbation calculations to generative AI, Robert offers a rare inside look at how Schrödinger's technology platform is accelerating the path from target identification to clinical candidate and where the field is headed next.

What you'll learn in this episode:
>> Why the size of chemical space (~10⁶⁰ molecules) makes purely data-driven ML approaches fundamentally insufficient for drug discovery, and how physics-based sampling solves the training data problem
>> How free energy perturbation (FEP) calculations enable quantitative prediction of protein-ligand binding affinities at near-experimental accuracy (~1.2 kcal/mol RMSE)
>> How Schrödinger's active learning framework combines physics-based simulations and ML to triage billions of candidate molecules before committing to wet lab synthesis
>> Why Schrödinger operates across three business lines (software licensing, collaborative programs, and proprietary drug discovery), and how each strengthens the underlying technology platform
>> Where the next frontiers lie: routine anti-target selectivity profiling, retrosynthetic AI integration, and the expanding role of generative ML in de novo molecular design

Meet our guest: Robert Abel is Chief Scientific Officer, Platform at Schrödinger, where he helps lead the scientific direction behind computational approaches that support modern drug discovery and molecular design. With a PhD in Chemical Physics from Columbia University and a deep background in computational chemistry, he has held multiple senior science leadership roles at Schrödinger, guiding teams that build and scale scientific methods into production-grade platforms used across research and industry.

Connect with Robert Abel on LinkedIn


AI in biotech: separating hype from reality with Ben Locwin

Mar 11 · 00:30:29

In this episode of Data in Biotech, host Ross Katz sits down with Ben Locwin, Vice President at Reliant Life Sciences, to explore the evolving landscape of artificial intelligence in biotechnology. Join us as we discuss why nearly every biotech claims to use AI but few actually do, examine successful applications like AlphaFold, and explore the challenges of implementing AI across drug development, manufacturing, and regulatory processes. Ben shares insights on maintaining healthy skepticism, understanding data provenance, and looking ahead to what this year may bring for AI in life sciences.

What you'll learn in this episode:
>> The AI hype problem in biotech, and why most companies claim to use AI but few actually do
>> AlphaFold as the gold standard, and how DeepMind's protein structure prediction model represents the most successful application of AI in biotech
>> Data quality over algorithmic sophistication: the critical importance of data provenance, examining primary sources, and understanding that data quality matters more than the complexity of the AI model
>> The balance between optimism and evidence-based decision-making, and how to distinguish between sophisticated AI and advanced statistical modeling

Meet our guest: Ben Locwin is a healthcare and life sciences executive and medical scientist known for helping bring pharmaceuticals, vaccines, and medical devices to market faster and with higher quality. A TEDx speaker and seasoned leader, he's worked across major biotech hubs and has deep expertise in global regulatory pathways, having collaborated with the FDA, EMA, MHRA, PMDA, and more.

Connect with Ben Locwin on LinkedIn


3D Printing Therapeutics at Scale with Aprecia Pharmaceuticals

Feb 25 · 00:49:01

In this episode of Data in Biotech, Ross Katz sits down with Kyle Smith and Jacob Mayer from Aprecia Pharmaceuticals to explore how 3D printing is transforming pharmaceutical manufacturing. They dive into the unique binder jetting process, in-cavity printing, and how real-time data and automation are enabling agile, scalable, and precise drug production. Discover how Aprecia's approach is changing the game for clinical trials and personalized medicine.

What you'll learn in this episode:
>> How Aprecia developed the world's first FDA-approved 3D printed drug
>> Why binder jetting stands out among 3D printing methods in pharma
>> How in-cavity 3D printing enables real-time tablet-level data collection
>> The future of closed-loop control and digital twins in drug manufacturing
>> Why 3D printing is key to agile, distributed, and personalized pharma production

Meet our guests: Kyle Smith is President and COO of Aprecia Pharmaceuticals, leading strategic growth and innovation in GMP-regulated pharma manufacturing. With 12+ years at Aprecia, he brings deep expertise in engineering, operations, and technology transfer. Jacob Mayer is Director of Engineering Innovation at Aprecia Pharmaceuticals. With a decade of experience across automation, additive manufacturing, and life sciences, he leads the advancement of 3D printing technologies and integrated pharma systems.

Connect with Jacob Mayer on LinkedIn
Connect with Kyle Smith on LinkedIn

