
Hosted by CorrDyn · EN
Data in Biotech is a fortnightly podcast exploring how companies leverage data to drive innovation in life sciences.
Every two weeks, Ross Katz, Principal and Data Science Lead at CorrDyn, sits down with an expert from the world of biotechnology to understand how they use data science to solve technical challenges, streamline operations, and further innovation in their business.
You can learn more about CorrDyn - an enterprise data specialist that enables excellent companies to make smarter strategic decisions - at www.corrdyn.com

In this episode of Data in Biotech, host Ross Katz sits down with Sadegh Salehi, Director of Research and Principal Scientist at Overjet, to explore what rigorous model evaluation actually looks like when the stakes are clinical. Overjet builds FDA-cleared vision models that detect and quantify dental disease across billions of X-ray images from thousands of practices - a data problem with a staggering number of dimensions. Thirty-two teeth per adult patient, each with different morphology. Multiple image types capturing different anatomy. Fifteen to twenty sensor manufacturers producing perceptually distinct images, each with different contrast, resolution, and noise characteristics. And disease severity distributions ranging from barely visible early-stage decay to obvious pathology. Sadegh walks through what it takes to evaluate models responsibly across all of those dimensions and discusses why aggregate metrics like F1 score can mask catastrophic failures on specific subgroups, how models find and exploit shortcuts in training data, and why the same flawed sampling that creates gaps in your training set also creates them in your test set. He also traces Overjet's architectural evolution from over twenty narrow task-specific models to a single foundation model they call Unity, explains how treatment plan procedure codes provide a noisy but real production feedback signal, and describes how Overjet became one of the first companies to secure the FDA's Predetermined Change Control Plan, a framework that allows model updates without filing a new clearance each time.
What you’ll learn in this episode:
>> Why aggregate evaluation metrics are insufficient for high-stakes medical AI
>> How models exploit shortcuts in training data: if all images from a rare sensor in the training set happen to be healthy, the model doesn't learn to read that sensor, it learns that the sensor means healthy, bypassing the visual task entirely and producing systematic false negatives in production
>> How Overjet evolved from over twenty narrow, sensor-specific and indication-specific models into a single foundation model called Unity, using noisy labels generated by the small models as the training signal for a much larger backbone, then building independent prediction heads for each clinical indication on top of it
>> Why the decision to keep prediction heads architecturally independent from one another was driven as much by FDA regulatory strategy as by modeling considerations
>> How Overjet uses dental treatment plan procedure codes as a production monitoring signal
Meet our guest: Sadegh Salehi is Director of Research and Principal Scientist at Overjet, where he leads the team responsible for building, evaluating, and deploying FDA-cleared vision models for dental disease detection and quantification.
Connect with Sadegh Salehi on LinkedIn: https://www.linkedin.com/in/sadegh-salehi/
About the host: Ross Katz is Principal and Data Science Lead at CorrDyn. Ross specializes in building intelligent data systems that empower biotech and healthcare organizations to extract insights and drive innovation.
Connect with Ross Katz on LinkedIn: https://www.linkedin.com/in/b-ross-katz/
Connect with us: Follow the podcast for more insightful discussions on the latest in biotech and data science. Subscribe and leave a review if you enjoyed this episode!
Sponsored by… This episode is brought to you by CorrDyn, the leader in data-driven solutions for biotech and healthcare. Discover how CorrDyn is helping organizations turn data into breakthroughs at CorrDyn.
https://www.linkedin.com/company/corrdyn/
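The Overjet episode's point that an aggregate metric can hide a failing subgroup can be sketched in a few lines. This is a minimal illustration, not Overjet's evaluation code: the sensor names, the counts, and the helper functions `recall` and `recall_by_group` are all hypothetical, and recall is used in place of F1 only to keep the arithmetic short.

```python
from collections import defaultdict


def recall(pairs):
    """Recall over (true_label, predicted_label) pairs; None if no positives."""
    positives = [(y, p) for y, p in pairs if y == 1]
    if not positives:
        return None
    return sum(p for _, p in positives) / len(positives)


def recall_by_group(records):
    """Recall computed separately for each group in (group, label, prediction) records."""
    groups = defaultdict(list)
    for group, y, p in records:
        groups[group].append((y, p))
    return {g: recall(pairs) for g, pairs in groups.items()}


# Hypothetical data: a common sensor the model reads well, and a rare
# sensor where it misses most disease (the shortcut-learning scenario).
records = (
    [("sensor_A", 1, 1)] * 95    # diseased, detected
    + [("sensor_A", 1, 0)] * 5   # diseased, missed
    + [("sensor_A", 0, 0)] * 100 # healthy, correctly cleared
    + [("sensor_B", 1, 1)] * 1
    + [("sensor_B", 1, 0)] * 4
    + [("sensor_B", 0, 0)] * 20
)

overall = recall([(y, p) for _, y, p in records])
per_sensor = recall_by_group(records)

print(f"overall recall: {overall:.3f}")       # 0.914
for sensor, value in sorted(per_sensor.items()):
    print(f"{sensor} recall: {value:.2f}")    # sensor_A 0.95, sensor_B 0.20
```

The overall figure looks healthy because the common sensor dominates the count, while four out of five diseased images from the rare sensor are missed. Reporting the metric per sensor, per image type, and per severity band is what surfaces that kind of gap.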

In this episode of Data in Biotech, host Ross Katz sits down with Jesse Johnson, founder of Merelogic, a software consulting firm specializing in data infrastructure for biotech organizations. Jesse brings a rare perspective to the conversation: he built data systems at Google, where engineers control the data collection function end to end, before moving into biotech, where the biology does what it wants and bench scientists, not engineers, generate the data. The result is a grounded, pragmatic take on one of the most consequential and underappreciated questions in life sciences right now: as bio foundation models fundamentally change the value equation for experimental data, are biotech labs structured to capture that value? Jesse argues the answer is usually no and that the fix is less technical than most assume. It doesn't require a production-grade data pipeline or a cloud architecture. It requires lightweight, human-readable standard operating procedures, clear expectations between computational and wet lab teams, and a data strategy designed not just for the questions you're asking today, but for the ones you don't yet know you'll need to ask.
What you’ll learn in this episode:
>> Why the transition from tech to biotech requires a fundamental reset of assumptions about data infrastructure and why the biggest difference isn't technical, it's organizational
>> How bio foundation models have flipped the value equation for experimental data by reducing the cost of organizing it while dramatically increasing the potential return
>> How the strategic value of proprietary data is evolving in the biotech ecosystem, from Tahoe Therapeutics building an acquirable single-cell dataset to Eli Lilly's Lowe lab using data as currency for partnerships
>> Why electronic lab notebooks aren't going anywhere and how the real question facing biotech software teams isn't whether to use an ELN, but how to balance schema rigidity against the flexibility required for the long tail of one-off exploratory assays that no automation pipeline will ever fully capture
Meet our guest: Jesse Johnson is the founder of Merelogic, a software consulting firm that works with biotech and biopharma organizations on data infrastructure and data operations strategy. Jesse writes regularly about data strategy for biotech on his Substack, covering topics from bio foundation model adoption to the evolving role of electronic lab notebooks in an AI-augmented research environment.
Connect with Jesse Johnson on LinkedIn: https://www.linkedin.com/in/jesse-johnson-biotech/
Follow Merelogic on LinkedIn: https://www.linkedin.com/company/merelogic/
About the host: Ross Katz is Principal and Data Science Lead at CorrDyn. Ross specializes in building intelligent data systems that empower biotech and healthcare organizations to extract insights and drive innovation.
Connect with Ross Katz on LinkedIn: https://www.linkedin.com/in/b-ross-katz/
Connect with us: Follow the podcast for more insightful discussions on the latest in biotech and data science. Subscribe and leave a review if you enjoyed this episode!
Sponsored by… This episode is brought to you by CorrDyn, the leader in data-driven solutions for biotech and healthcare. Discover how CorrDyn is helping organizations turn data into breakthroughs at CorrDyn.

This week, we're delighted to be joined by Markus Gershater, Chief Scientific Officer and Co-Founder of Synthace - a digital experiment platform built for high-performance life science R&D teams to help them run more powerful experiments and accelerate scientific progress. Host Ross Katz speaks with Markus about what’s broken in how laboratories across the globe run biological experiments at scale, the opportunity that exists for researchers in striving to achieve experimental design automation, why, as an industry, we must move towards implementing infrastructure that enables multifactorial experiments versus one-factor experiments, and the role of AI in making sense of complex systems. --- If you’re a biotech company struggling to transform your business with data, CorrDyn can help. Whether you need to supplement existing technology teams with specialist expertise or launch a data program that lays the groundwork for future internal hires, you can partner with CorrDyn to unlock the potential of your business data - today. Visit connect.corrdyn.com/biotech to learn more. --- Data in Biotech is a fortnightly podcast exploring how companies leverage data to drive innovation in life sciences.

This week, we're delighted to be joined by Michelle Wiest, Director of IVD Biostatistics at Freenome - a high-growth biotech company that creates tools to help prevent, detect, and treat disease. Host Ross Katz speaks with Michelle about the use of biostatistics in the field of diagnostics, what biases can corrupt diagnostic tests and how to catch them early, the different types of data sets that are being used to develop diagnostic models, and how to prepare to present data to regulatory bodies such as the FDA. --- If you’re a biotech company struggling to transform your business with data, CorrDyn can help. Whether you need to supplement existing technology teams with specialist expertise or launch a data program that lays the groundwork for future internal hires, you can partner with CorrDyn to unlock the potential of your business data - today. Visit connect.corrdyn.com/biotech to learn more. --- Data in Biotech is a fortnightly podcast exploring how companies leverage data to drive innovation in life sciences.

In this episode of Data in Biotech, host Ross Katz sits down with Kevin Brown, co-founder of Standard BioModel, to explore one of the most ambitious projects in biomedical AI: building a multimodal foundation model that represents the full complexity of a patient across time. Drawing on a career spanning brain-computer interfaces, computer-aided diagnosis at Siemens Healthineers, and oncology data science at Bristol Myers Squibb, Kevin shares the scientific and philosophical journey that led him to a single conviction: a patient is not a document. Rather than reducing a patient to clinical notes, ICD-10 codes, or isolated test results, Standard BioModel's approach maps every available modality - CT imaging, digital pathology, genomics, EKGs, longitudinal EHR data - into a shared latent space, and models how that patient moves through time. The result is a framework designed not just for prediction, but for counterfactual reasoning, clinical trial matching, and personalized intervention, with open-source models already being validated across leading academic medical centers.
What you’ll learn in this episode:
>> Why reducing a patient to text - clinical notes, radiology reports, genomic assay summaries - loses information, and how mapping multimodal data into a shared latent embedding space preserves information that never makes it into the written record
>> How Standard BioModel's temporal architecture models patients as trajectories through an abstract embedding space rather than static snapshots, enabling counterfactual reasoning about the likely impact of interventions on a patient's future health trajectory
>> Why no single foundation model can own every clinical vertical and how building a highly generalizable base model that facilitates downstream fine-tuning is a more defensible and scalable strategy than building narrow, application-specific models
>> How the model handles missing modalities in real-world clinical settings, and why the architecture is designed to function effectively even when not every data type is available for every patient
>> Why Standard BioModel has chosen to open-source its models and why broad, institution-specific validation across diverse patient populations is not just a scientific priority, but a prerequisite for trustworthy clinical AI
Meet our guest: Kevin Brown is the Founder and CEO of Standard Model Biomedicine, where he builds foundation models for biomedicine. He previously led AI work as Director of Artificial Intelligence at SimBioSys, and held data science and applied ML roles at Bristol Myers Squibb and Siemens Healthineers. With a neuroscience research background from New York University, Kevin’s work spans generative AI and machine learning for biomedical and medical imaging applications.
Connect with Kevin Brown on LinkedIn
About the host: Ross Katz is Principal and Data Science Lead at CorrDyn. Ross specializes in building intelligent data systems that empower biotech and healthcare organizations to extract insights and drive innovation.
Connect with Ross Katz on LinkedIn
Connect with us: Follow the podcast for more insightful discussions on the latest in biotech and data science. Subscribe and leave a review if you enjoyed this episode!
Sponsored by… This episode is brought to you by CorrDyn, the leader in data-driven solutions for biotech and healthcare. Discover how CorrDyn is helping organizations turn data into breakthroughs at CorrDyn.

In this episode of Data in Biotech, Ross Katz sits down with Robert Abel, Chief Scientific Officer of the Platform at Schrödinger, to explore how physics-based computational modeling is transforming drug discovery. Robert unpacks why machine learning alone isn't enough to navigate the vast complexity of chemical space - estimated at 10⁶⁰ possible drug-like molecules - and how integrating atomistic simulations with ML creates a more accurate, reliable, and scalable approach to identifying viable drug candidates. From free energy perturbation calculations to generative AI, Robert offers a rare inside look at how Schrödinger's technology platform is accelerating the path from target identification to clinical candidate and where the field is headed next.
What you’ll learn in this episode:
>> Why chemical space (~10⁶⁰ molecules) makes purely data-driven ML approaches fundamentally insufficient for drug discovery, and how physics-based sampling solves the training data problem
>> How free energy perturbation (FEP) calculations enable quantitative prediction of protein-ligand binding affinities at near-experimental accuracy (~1.2 kcal/mol RMSE)
>> How Schrödinger's active learning framework combines physics-based simulations and ML to triage billions of candidate molecules before committing to wet lab synthesis
>> Why Schrödinger operates across three business lines - software licensing, collaborative programs, and proprietary drug discovery - and how each strengthens the underlying technology platform
>> Where the next frontiers lie: routine anti-target selectivity profiling, retrosynthetic AI integration, and the expanding role of generative ML in de novo molecular design
Meet our guest: Robert Abel is Chief Scientific Officer, Platform at Schrödinger, where he helps lead the scientific direction behind computational approaches that support modern drug discovery and molecular design. With a PhD in Chemical Physics from Columbia University and a deep background in computational chemistry, he has held multiple senior science leadership roles at Schrödinger, guiding teams that build and scale scientific methods into production-grade platforms used across research and industry.
Connect with Robert Abel on LinkedIn
About the host: Ross Katz is Principal and Data Science Lead at CorrDyn. Ross specializes in building intelligent data systems that empower biotech and healthcare organizations to extract insights and drive innovation.
Connect with Ross Katz on LinkedIn
Connect with us: Follow the podcast for more insightful discussions on the latest in biotech and data science. Subscribe and leave a review if you enjoyed this episode!
Sponsored by… This episode is brought to you by CorrDyn, the leader in data-driven solutions for biotech and healthcare. Discover how CorrDyn is helping organizations turn data into breakthroughs at CorrDyn.
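For a sense of scale on the ~1.2 kcal/mol RMSE figure quoted in the Schrödinger episode notes, a free energy error can be converted into a multiplicative error in binding affinity using the standard thermodynamic relation ΔG = -RT ln K. The sketch below is a back-of-the-envelope illustration only: the function name `fold_error` and the default temperature of 298.15 K are assumptions made here, not details from the episode.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 0.0019872


def fold_error(ddg_kcal_per_mol, temperature_k=298.15):
    """Multiplicative error in a binding constant implied by a free energy error.

    Follows from dG = -RT ln K: a shift of ddG in free energy scales K by
    exp(ddG / RT). The default temperature (about 25 C) is an assumption.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


# An error of 1.2 kcal/mol corresponds to a bit under an 8-fold error in affinity.
print(f"{fold_error(1.2):.1f}-fold")
```

At room temperature RT is roughly 0.59 kcal/mol, so each additional 1.4 kcal/mol of error is about another factor of ten in predicted affinity.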

In this episode of Data in Biotech, host Ross Katz sits down with Ben Locwin, Vice President at Reliant Life Sciences, to explore the evolving landscape of artificial intelligence in biotechnology. Join us as we discuss why nearly every biotech claims to use AI but few actually do, examine successful applications like AlphaFold, and explore the challenges of implementing AI across drug development, manufacturing, and regulatory processes. Ben shares insights on maintaining healthy skepticism, understanding data provenance, and looking ahead to what this year may bring for AI in life sciences.
What you’ll learn in this episode:
>> The AI hype problem in biotech and why most companies claim to use AI but few actually do
>> AlphaFold as the gold standard and how DeepMind's protein structure prediction model represents the most successful application of AI in biotech
>> Data quality over algorithmic sophistication and the critical importance of data provenance, examining primary sources, and understanding that data quality matters more than the complexity of the AI model
>> The balance between optimism and evidence-based decision-making, distinguishing between sophisticated AI and advanced statistical modeling
Meet our guest: Ben Locwin is a healthcare and life sciences executive and medical scientist known for helping bring pharmaceuticals, vaccines, and medical devices to market faster and with higher quality. A TEDx speaker and seasoned leader, he’s worked across major biotech hubs and has deep expertise in global regulatory pathways, having collaborated with the FDA, EMA, MHRA, PMDA, and more.
Connect with Ben Locwin on LinkedIn
About the host: Ross Katz is Principal and Data Science Lead at CorrDyn. Ross specializes in building intelligent data systems that empower biotech and healthcare organizations to extract insights and drive innovation.
Connect with Ross Katz on LinkedIn
Connect with us: Follow the podcast for more insightful discussions on the latest in biotech and data science. Subscribe and leave a review if you enjoyed this episode!
Sponsored by… This episode is brought to you by CorrDyn, the leader in data-driven solutions for biotech and healthcare. Discover how CorrDyn is helping organizations turn data into breakthroughs at CorrDyn.

In this episode of Data in Biotech, Ross Katz sits down with Kyle Smith and Jacob Mayer from Aprecia Pharmaceuticals to explore how 3D printing is transforming pharmaceutical manufacturing. They dive into the unique binder jetting process, in-cavity printing, and how real-time data and automation are enabling agile, scalable, and precise drug production. Discover how Aprecia's approach is changing the game for clinical trials and personalized medicine.
What you'll learn in this episode:
>> How Aprecia developed the world’s first FDA-approved 3D printed drug
>> Why binder jetting stands out among 3D printing methods in pharma
>> How in-cavity 3D printing enables real-time tablet-level data collection
>> The future of closed-loop control and digital twins in drug manufacturing
>> Why 3D printing is key to agile, distributed, and personalized pharma production
Meet our guests: Kyle Smith is President and COO of Aprecia Pharmaceuticals, leading strategic growth and innovation in GMP-regulated pharma manufacturing. With 12+ years at Aprecia, he brings deep expertise in engineering, operations, and technology transfer. Jacob Mayer is Director of Engineering Innovation at Aprecia Pharmaceuticals. With a decade of experience across automation, additive manufacturing, and life sciences, he leads the advancement of 3D printing technologies and integrated pharma systems.
About the host: Ross Katz is Principal and Data Science Lead at CorrDyn. Ross specializes in building intelligent data systems that empower biotech and healthcare organizations to extract insights and drive innovation.
Connect with our guests: Connect with Jacob Mayer on LinkedIn. Connect with Kyle Smith on LinkedIn.
Sponsor: CorrDyn, a data consultancy
Connect with us: Follow the podcast for more insightful discussions on the latest in biotech and data science. Subscribe and leave a review if you enjoyed this episode! Connect with Ross Katz on LinkedIn.
Sponsored by… This episode is brought to you by CorrDyn, the leader in data-driven solutions for biotech and healthcare. Discover how CorrDyn is helping organizations turn data into breakthroughs at CorrDyn.

In this episode of Data in Biotech, host Ross Katz sits down with James Yoder, Founder and CEO of OpenBench, to unpack a radical new approach to early-stage drug discovery. James shares how OpenBench's "success-driven" model shifts risk away from biotech partners by only charging for validated hits. They dive deep into computational screening, molecular modeling, and the company's evolving tech stack that's making hit discovery smarter and more accessible. Discover how data, AI, and strategic collaboration are redefining biotech R&D.
What you'll learn in this episode:
>> Why OpenBench moved away from SaaS to a success-based service model
>> How their computational platform predicts binding affinity and screens trillions of compounds
>> The role of data flywheels and ML in improving drug discovery success rates
>> Real-world case studies from biotech collaborations
>> How OpenBench evaluates druggable targets in one week
Meet our guest: James Yoder is the Founder and CEO of OpenBench. With a background in statistics, data science, and applied machine learning, he leads OpenBench's mission to deliver validated drug discovery hits through computational innovation and a success-driven business model.
About the host: Ross Katz is Principal and Data Science Lead at CorrDyn. Ross specializes in building intelligent data systems that empower biotech and healthcare organizations to extract insights and drive innovation.
Connect with our guest: Connect with James Yoder on LinkedIn.
Sponsor: CorrDyn, a data consultancy
Connect with us: Follow the podcast for more insightful discussions on the latest in biotech and data science. Subscribe and leave a review if you enjoyed this episode! Connect with Ross Katz on LinkedIn.
Sponsored by… This episode is brought to you by CorrDyn, the leader in data-driven solutions for biotech and healthcare. Discover how CorrDyn is helping organizations turn data into breakthroughs at CorrDyn.

Brant Peterson, Vice President & Fellow at Valo Health, joins Data in Biotech to explore how his team leverages real-world data, genetic insights, and machine learning to de-risk drug discovery. From building causal DAGs to identifying patient subtypes in neurodegenerative diseases like Parkinson’s, this episode dives deep into a patient-first, data-driven approach to biomedical innovation.
What You'll Learn in This Episode:
>> How Valo Health uses real-world evidence and EHR data to prioritize drug targets earlier in the development pipeline.
>> Why integrating wet lab experiments and causal DAGs accelerates therapeutic validation.
>> The importance of genetic pleiotropy and Mendelian randomization in refining disease hypotheses.
>> How Valo Health identifies high-impact patient subgroups in neurodegenerative diseases like Parkinson’s and Alzheimer’s.
>> Where machine learning models succeed, and where they fall short, in uncovering mechanisms of disease from sparse longitudinal data.
Meet Our Guest: Brant Peterson is Vice President & Fellow in Data Science at Valo Health. He brings deep expertise in genetics, computational biology, and biomedical innovation. Formerly a Distinguished Data Scientist at Valo and Computational Biologist at Novartis, Brant focuses on leveraging patient-centric data to drive causal discovery in drug development.
About The Host: Ross Katz is Principal and Data Science Lead at CorrDyn. Ross specializes in building intelligent data systems that empower biotech and healthcare organizations to extract insights and drive innovation.
Connect with Our Guest: Connect with Brant Peterson on LinkedIn.
Sponsor: CorrDyn, a data consultancy
Connect with Us: Follow the podcast for more insightful discussions on the latest in biotech and data science. Subscribe and leave a review if you enjoyed this episode! Connect with Ross Katz on LinkedIn.
Sponsored by… This episode is brought to you by CorrDyn, the leader in data-driven solutions for biotech and healthcare. Discover how CorrDyn is helping organizations turn data into breakthroughs at CorrDyn.