NVIDIA AI Podcast
Episode: From AlphaFold to MMseqs2-GPU: How AI is Accelerating Protein Science
Date: September 10, 2025
Host: Noah Kravitz
Guests: Chris Dallago (Research Lead, NVIDIA & Visiting Professor, Duke University)
Martin Steinegger (Associate Professor of Biology, Seoul National University; Co-author, AlphaFold)
Brief Overview
This episode dives deep into the convergence of AI, accelerated computing, and protein science, spotlighting recent breakthroughs in protein structure prediction and homology search. Host Noah Kravitz speaks with NVIDIA’s Chris Dallago and Seoul National University’s Martin Steinegger about major advances like AlphaFold and the newly GPU-accelerated MMseqs2, discussing their impact on research, drug discovery, and the scalability of computational biology. The episode also covers the technological, collaborative, and open-source ethos pushing biology into a new era.
Key Discussion Points & Insights
1. Why Protein Structure Matters
- Proteins as the Machinery of Life:
- Proteins are fundamental to cellular processes; their 3D structure determines function.
- Accurate prediction is essential for drug discovery, understanding cellular mechanisms, and bioengineering.
- “So the structure is really fundamental for that.” (Martin, 01:55)
2. AlphaFold’s Revolution in Biology
- Impact Across Academia and Industry:
- AlphaFold is now woven into the workflows of virtually every pharma and biotech company.
- It enables computational prediction of 3D protein structures, expediting both basic research and drug development.
- “I don't think I have seen any pharma company or tech, bio, biotech company at this point that doesn't use AlphaFold in some way in their research.” (Chris, 04:13)
- From Data Scarcity to Data Avalanche:
- The number of available protein structures jumped from ~200,000 (experimentally determined) to hundreds of millions (predicted) thanks to tools like AlphaFold.
- This data explosion necessitates new, high-speed analysis tools.
3. NVIDIA’s Role and Announcements
- Accelerating the Ecosystem:
- NVIDIA is focusing on ecosystem collaboration and hardware acceleration.
- Recent announcements:
- Accelerated structure prediction methods (AlphaFold/OpenFold/ColabFold & newer models).
- NVIDIA NIM (NVIDIA Inference Microservices) for deploying these models at scale.
- MMseqs2-GPU now compatible with Blackwell GPUs.
- MMseqs2-GPU tool accepted for publication in Nature Methods.
- “We’ve done targeted accelerations of these atomic things that researchers can use to build the next generation of models.” (Chris, 05:45)
4. MMseqs2-GPU: Homology Search at Scale
- The Importance of Homology Retrieval:
- Finding evolutionary similarities ("Google for proteins") narrows the folding problem and strengthens structure prediction.
- “When you do an AlphaFold prediction... you need homological information. So... you search through a database of hundreds of millions or billions of protein sequences.” (Martin, 07:55)
- GPU Acceleration as a Game Changer:
- Traditional search is a bottleneck, taking up to 80% of compute time in AlphaFold pipelines.
- With MMseqs2-GPU, this is reduced to 20%, putting the focus back on deep learning steps, which can themselves be further accelerated.
- “With MMseqs2-GPU, we are inverting that relationship... Now it only takes you 20% of the total execution time to do this homology retrieval step.” (Chris, 09:48)
- Integration & Practical Impact:
- MMseqs2-GPU supports downstream tools like Foldseek and is part of a broader ecosystem.
- Multi-GPU systems offer linear (or near-linear) speedups for large-scale analyses.
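The 80% to 20% figure quoted above can be turned into rough speedup numbers with a little arithmetic. The sketch below is only an illustration: it assumes the non-search part of the pipeline takes the same wall-clock time before and after, which the episode does not state, and the function names are hypothetical.

```python
def search_speedup(frac_before: float, frac_after: float) -> float:
    """Speedup of the search step alone.

    Assumes (for illustration only) that the non-search part of the
    pipeline takes the same wall-clock time before and after.
    """
    rest = 1.0 - frac_before  # non-search time, with the old total normalized to 1.0
    # The new search time s satisfies s / (s + rest) = frac_after.
    search_after = frac_after * rest / (1.0 - frac_after)
    return frac_before / search_after


def end_to_end_speedup(frac_before: float, frac_after: float) -> float:
    """Whole-pipeline speedup under the same assumption."""
    rest = 1.0 - frac_before
    total_after = rest / (1.0 - frac_after)  # new total, in units of the old total
    return 1.0 / total_after


if __name__ == "__main__":
    print(f"search step: ~{search_speedup(0.8, 0.2):.0f}x faster")
    print(f"end to end:  ~{end_to_end_speedup(0.8, 0.2):.0f}x faster")
```

Under that assumption, going from 80% to 20% of runtime corresponds to roughly a 16x faster search step and a 4x faster pipeline overall. If the deep learning steps are accelerated as well, the real figures will differ.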
5. Tool-Building Philosophy & Data Trends
- Community-driven Open Science:
- Martin emphasizes building tools that scale with exponential data growth—metagenomics, structure prediction, molecular dynamics, etc.
- “I am somehow looking where data is going as well as what is needed by biologists in the end.” (Martin, 12:39)
- Shaping the Next Generation of Tools:
- Foldseek, Foldseek-Multimer, and Folddisco address new challenges in analyzing structure, function, and interaction networks.
- “All of these tools that existed before, can we just make them ready for this data explosion?” (Martin, 15:16)
6. Limits & Challenges in Computational Biology
- The Analysis Bottleneck:
- Generating data computationally is now faster than analyzing it.
- AI-powered acceleration on both the data generation and analysis fronts is crucial for progress.
- “We are constantly being pulled in two directions. One is we have scale problems to generate and then we have the problem of scale to sift through.” (Chris, 16:40)
7. Academia & Industry Collaboration
- Story Behind the Partnership:
- Longstanding relationship between Chris and Martin; early ideas for GPU-acceleration date back over a decade.
- Collaboration philosophy: open-source, no patents, immediate community access.
- “I can only do that if we open source everything. It should be free and we cannot, there should be no patent involved.” (Martin, 19:53)
- Engineering Innovations:
- Collaboration with Bertil Schmidt’s group enabled custom compiler optimization for maximum speed.
8. Real-World Impact & Community Response
- Industry Feedback:
- Accelerated tools have unblocked startups, enabled funding, and been widely adopted.
- “Some have reached out personally to say this enabled us to get more funding, which obviously to me makes me very happy because it's a great signal.” (Chris, 22:03)
- Academic Feedback:
- GPU acceleration brings major speedups even for users with modest hardware.
- “A user... said like, oh, this GPU implementation is like linear. You made a quadratic problem... linear. And I was like, nah, it's not true... but it looked very linear to that end user because it was extremely accelerated.” (Martin, 22:58)
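Martin's point, that a large constant-factor speedup can feel like a change in complexity without being one, can be illustrated with a toy cost model. The sketch assumes the quadratic cost comes from comparing every sequence against every other, which the episode does not specify. The per-pair cost and the 100x factor are made-up values, not measurements of MMseqs2-GPU.

```python
def all_vs_all_pairs(n: int) -> int:
    """Unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def modeled_runtime(n: int, seconds_per_pair: float, speedup: float = 1.0) -> float:
    """Toy cost model: pair count times per-pair cost, divided by a constant speedup."""
    return all_vs_all_pairs(n) * seconds_per_pair / speedup


if __name__ == "__main__":
    for speedup in (1.0, 100.0):  # 100x is a made-up factor, not a benchmark
        small = modeled_runtime(10_000, 1e-6, speedup)
        large = modeled_runtime(20_000, 1e-6, speedup)
        print(f"speedup={speedup:>5}: {small:.3f}s -> {large:.3f}s (x{large / small:.2f})")
```

In this model, doubling the input roughly quadruples the runtime with or without acceleration. The accelerated runs are simply short enough that the growth is easy to miss at typical input sizes.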
9. Looking to the Future: New Models & Research Directions
- Open-Sourcing Protein Design Models:
- NVIDIA released "La Proteina," an open-source generative model for protein design, trained on AlphaFold data and scaled across GPUs.
- “The idea here is... both sides of the coin, if you want. Really important. We've talked about protein structure prediction and acceleration today. MMseqs2 falls into that... We're also doing work on protein design...” (Chris, 25:26)
- Frontiers of Prediction & Interaction:
- Monomer structure prediction is “under control.”
- Multimer (protein-protein interaction) prediction remains a challenge and is the next big leap.
- Incorporation of cellular context and simulation (MD, lipid layers, dynamics) is on the horizon but computationally daunting.
- “Everything I just said sounds really nice, but technically really, really challenging because everything is like a combinatorial problem.” (Martin, 27:58)
- Expanding Molecular Scope:
- Newer models (e.g., AlphaFold 3, Boltz) allow integrated prediction of proteins, DNA, RNA, and small molecule complexes.
- “Now it's not just the protein and the other protein that interacts with, but it may be those two proteins and some drug that interacts with them.” (Chris, 29:45)
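To give a sense of scale for the "combinatorial problem" Martin mentions, the sketch below counts candidate complexes with Python's standard `math.comb`. The proteome size of 20,000 is a round illustrative figure, and counting unordered sets of distinct proteins is a simplifying assumption: it ignores homo-oligomers, isoforms, and the ligands Chris brings up. The function name is hypothetical.

```python
import math


def candidate_complexes(n_proteins: int, complex_size: int) -> int:
    """Number of unordered sets of `complex_size` distinct proteins."""
    return math.comb(n_proteins, complex_size)


if __name__ == "__main__":
    n = 20_000  # illustrative proteome size, an assumption
    for k in (2, 3, 4):
        print(f"{k}-protein candidates: {candidate_complexes(n, k):.2e}")
```

With these assumptions there are about 2 × 10^8 candidate pairs and about 1.3 × 10^12 candidate triples, so each added interaction partner multiplies the search space by several thousand.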
10. How to Engage & Stay Updated
- NVIDIA Resources:
- Digital Biology product and research announcements (Digital Biology at NVIDIA, BioNeMo, NIM, etc.)
- GitHub: https://github.com/NVIDIA/digital-biology
- Planned releases, guides, and blog posts for easier community application.
- Martin’s Lab:
- Open-source-first code release — often before papers
- GitHub: [Seoul National University organization]
- Updates via Bluesky, LinkedIn, and collaborators’ channels.
- “We put code out before we have even papers out.” (Martin, 32:57)
Memorable Quotes & Notable Moments
- On AlphaFold’s impact:
- “I don't think I have seen any pharma company or tech, bio, biotech company at this point that doesn't use AlphaFold in some way in their research.” (Chris, 04:13)
- On GPU acceleration transforming workflows:
- “With MMseqs2-GPU, we are inverting that relationship... Now it only takes you 20% of the total execution time to do this homology retrieval step.” (Chris, 09:48)
- On the open-source collaborative ethos:
- “I can only do that if we open source everything. It should be free and we cannot, there should be no patent involved.” (Martin, 19:53)
- On the community and democratization:
- “We've done the thing that I think Nvidia is really, really good at, which is lifting everybody up and helping everybody be more successful at what they want to do...” (Chris, 22:03)
- On the computational challenge:
- “Everything I just said sounds really nice, but technically really, really challenging because everything is like a combinatorial problem.” (Martin, 27:58)
Key Timestamps
| Timestamp | Topic / Quote |
|-----------|---------------|
| 01:55 | “So the structure is really fundamental for that.” – Martin explains protein significance |
| 04:13 | “...doesn’t use AlphaFold in some way in their research.” – Chris on AlphaFold impact |
| 05:45 | NVIDIA's acceleration and model inference announcements |
| 07:55 | Martin explains homology retrieval and its necessity for structure prediction |
| 09:48 | “With MMseqs2-GPU, we are inverting that relationship...” – Chris on GPU acceleration |
| 12:39 | Martin on historical progression and tool-building philosophy |
| 15:16 | Post-AlphaFold tools and data era challenges |
| 16:40 | The analysis bottleneck: “pulled in two directions...” – Chris |
| 19:53 | Martin: “I can only do that if we open source everything.” |
| 22:03 | Real-world startup impact: “this enabled us to get more funding...” – Chris |
| 22:58 | User feedback on GPU speed: “this GPU implementation is like linear...” – Martin |
| 25:26 | NVIDIA's open-source generative model "La Proteina" |
| 27:58 | Multimers, combinatorial challenge in prediction – Martin |
| 29:45 | Expanding scope beyond protein-protein to complexes (DNA, RNA, drugs) – Chris |
| 32:57 | Martin: “We put code out before we have even papers out.” |
Closing Notes
This episode highlights the critical, symbiotic progress happening at the intersection of AI, accelerated hardware, and biology—from foundational data infrastructure to user impact and open-source collaboration. The consensus: the protein sciences are in the midst of an AI-fueled revolution, with new discoveries, tools, and challenges coming at exponential scale.
To stay involved:
- For NVIDIA’s latest: digital biology pages, GitHub, enterprise product releases
- For Martin’s tools: GitHub, social media, and active pre-publication code drops
