Latent Space: The AI Engineer Podcast
Episode: Why There Is No "AlphaFold for Materials" — AI for Materials Discovery with Heather Kulik
Date: March 24, 2026
Guest: Prof. Heather Kulik, MIT Chemical Engineering
Episode Overview
This episode explores AI-driven materials discovery, focusing on why materials science has no direct equivalent of AlphaFold. Professor Heather Kulik discusses her research at the intersection of computational chemistry, machine learning, and materials engineering. The conversation covers what AI can and cannot yet do for molecular design and discovery, the role of data sets, experimental and computational bottlenecks, and the structural challenges that set materials apart from proteins. The episode is aimed at AI engineers who want to understand how their skills could help discover new materials.
Key Discussion Points & Insights
1. The Promise and Realities of AI for Materials Discovery
- Accelerating Discovery with AI
  Heather Kulik describes using AI to screen tens of thousands of materials, a search that would take years in the lab, and discovering polymers with unexpected properties, such as a polymer about four times tougher (01:36–03:23).
  "We were able to scale screen with artificial intelligence… [and] uncovered this unexpected chemical phenomenon that led to a emergent property… [making] the polymer about four times tougher."
  —Heather Kulik (02:28)
- Surprising Outcomes
  The most valuable AI-driven discoveries are designs that surprise even expert chemists and are then validated in the lab.
  "When we showed the design that AI had come up with to the experimentalists, they were really surprised. They would have never come on this, on their own."
  —Heather Kulik (02:55)
2. From Traditional Computational Chemistry to Machine Learning
- Evolution of Methods
  Kulik’s journey began with accelerating slow quantum mechanical models and moved to ML approaches as they became practical (05:17–07:06).
  "Somewhere around 2015, 2016, I realized it was a bad idea to call things cheminformatics, and it was a good idea to start calling things machine learning."
  —Heather Kulik (05:42)
- Active Learning in Complex Optimization
  Current projects use active learning to optimize materials such as metal-organic frameworks (MOFs) for multiple conflicting properties, up to seven objectives at once.
  "Usually…even for a not-so-accurate machine learning model, you get at least 100 to 1000 fold speed up for every dimension you’re optimizing over."
  —Heather Kulik (08:12)
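The episode does not describe how the group’s active-learning loop is implemented. As a hedged illustration of the general idea of multi-objective active learning, here is a minimal pure-Python sketch: a cheap k-nearest-neighbor surrogate (an assumption; real workflows typically use models with uncertainty estimates, such as Gaussian processes or neural-network ensembles) predicts every objective, the loop keeps the candidates whose predictions are Pareto-optimal, and it queries the one farthest from already-labeled designs. All function names and the `toy_oracle` stand-in for an expensive quantum-chemistry calculation are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Labeled = Tuple[Vector, Vector]  # (design variables, objective values)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


def knn_predict(x: Vector, labeled: List[Labeled], k: int = 3) -> Vector:
    """Cheap surrogate: average the objectives of the k nearest labeled designs."""
    nearest = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
    n_obj = len(nearest[0][1])
    return tuple(sum(y[m] for _, y in nearest) / len(nearest) for m in range(n_obj))


def active_learning(candidates: List[Vector],
                    oracle: Callable[[Vector], Vector],
                    n_init: int = 5, n_rounds: int = 10,
                    seed: int = 0) -> List[Labeled]:
    """Label a few random designs, then repeatedly query the predicted-Pareto
    candidate that is farthest from anything already labeled."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    labeled = [(x, oracle(x)) for x in pool[:n_init]]
    unlabeled = pool[n_init:]
    for _ in range(n_rounds):
        if not unlabeled:
            break
        preds = [knn_predict(x, labeled) for x in unlabeled]
        front = pareto_front(preds)
        # Exploration tie-break: prefer the least-explored region of design space.
        pick = max(front, key=lambda i: min(math.dist(unlabeled[i], lx) for lx, _ in labeled))
        x = unlabeled.pop(pick)
        labeled.append((x, oracle(x)))  # the "expensive" evaluation
    return labeled


def toy_oracle(x: Vector) -> Vector:
    """Hypothetical stand-in for an expensive calculation: two objectives that
    conflict along x[0] and both improve with x[1]."""
    return (x[0] + 0.5 * x[1], 1.0 - x[0] + 0.5 * x[1])


if __name__ == "__main__":
    grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
    labeled = active_learning(grid, toy_oracle)
    for i in pareto_front([y for _, y in labeled]):
        print(labeled[i])
```

The point of the loop is that the expensive `oracle` is called only `n_init + n_rounds` times rather than once per candidate; the surrogate does the rest of the screening.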
3. Why There’s No “AlphaFold for Materials”
- Structural Complexity & Diversity
  Proteins are built from 20 standard amino acids, and AlphaFold excels at predicting the structures of globular proteins; materials science spans far more diverse chemical bonding and building blocks (24:14–25:27).
  "What AlphaFold has done really well is predict structures of globular proteins… The challenge is…you have a lot more than 20 building blocks when it comes to materials."
  —Heather Kulik (24:28)
- Limitations of Data and Ground Truth
  Materials data often lack robust experimental ground truth and rely heavily on quantum mechanical calculations, which are computationally expensive and can be unreliable for new chemistry (18:11–19:30).
  "All the smartest ML engineers right now are learning on data that is not going to be reflective of experiment…there aren’t big experimental data sets."
  —Heather Kulik (18:28)
  "Electronic structure calculations are expensive…should give you the right answer…But every time someone comes out with a new data set…It looks really good and then you get it into your lab and…starts doing kind of wacky things."
  —Heather Kulik (19:30)
- No Universal Benchmarks
  Protein structure prediction has CASP; materials science lacks a comparably rigorous benchmark that connects computational predictions with experimental validation (17:57–18:56).
4. The Continuing Importance of Chemical & Physical Intuition
- Limits of LLMs and ChatGPT
  LLMs are useful for introductory knowledge but struggle with nuanced or creative design tasks that require deep domain expertise (13:24–16:07).
  "One of my favorite things to actually throw at GPT as an anecdote is…I just ask it, please design me a ligand that has 22 atoms…I can never get an answer that has 22 atoms."
  —Heather Kulik (13:53)
  "You should learn chemistry well enough to know when these models are right or wrong. And if you don’t…it’s hard to know if you’re assessing correctly."
  —Heather Kulik (15:46)
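The 22-atom request is a hard constraint, and checking it takes a few lines of code rather than chemical judgment. The sketch below is a hypothetical helper, not a tool mentioned in the episode, that counts atoms in a flat molecular formula so an LLM’s proposed ligand can be checked mechanically. It assumes simple formulas without parentheses, hydrates, or charges, and it leaves whether hydrogens count toward the target as a parameter because the anecdote does not say. Passing the count check says nothing about whether the molecule is chemically sensible; a cheminformatics toolkit such as RDKit is the usual route for structure-level checks.

```python
import re
from collections import Counter

# Element symbol followed by an optional count, e.g. "C12", "H8", "N".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a flat molecular formula such as 'C12H8N2' into element counts.

    Assumption: no parentheses, hydrates, isotopes, or charges."""
    counts: Counter = Counter()
    pos = 0
    for m in TOKEN.finditer(formula):
        if m.start() != pos:  # characters between tokens that are not a symbol
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        counts[m.group(1)] += int(m.group(2)) if m.group(2) else 1
        pos = m.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse {formula!r} at position {pos}")
    return counts


def atom_count(formula: str, include_hydrogens: bool = True) -> int:
    """Total number of atoms, optionally ignoring hydrogens."""
    counts = parse_formula(formula)
    if not include_hydrogens:
        counts.pop("H", None)
    return sum(counts.values())


def meets_atom_target(formula: str, target: int, include_hydrogens: bool = True) -> bool:
    """Check a proposed design against a hard atom-count constraint."""
    return atom_count(formula, include_hydrogens) == target


if __name__ == "__main__":
    # 1,10-phenanthroline, a common bidentate ligand, has the formula C12H8N2.
    print(atom_count("C12H8N2"), meets_atom_target("C12H8N2", 22))
```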
5. Data Challenges and Opportunities for ML Engineers
- Sparse and Biased Datasets
  Most open data sets cover “boring” chemistry rather than the complex, multi-element systems found in industrial or “exotic” materials (16:23–17:57).
  "We have really good data sets out there for really boring chemistry…lots of challenges out there. The physics is much more complex…[but] there may not be a benchmark or a leaderboard yet for that."
  —Heather Kulik (16:35)
- Literature Extraction & Bias
  Combining text extracted from scientific papers with ML models can introduce new errors or biases, especially because human interpretations of the same experiment can differ (27:01–28:31).
  "You can get the temperature at which a material will break down two ways…from the graph…and from what the authors say…those two things do not line up."
  —Heather Kulik (27:36)
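One possible reason the graph and the text disagree is that “breakdown temperature” has several definitions. The episode does not say how the values were extracted, so the following is only a sketch under stated assumptions: it reads a decomposition temperature off a thermogravimetric (TGA) curve as the point of 5% mass loss (one common convention; an onset found by tangent intersection is another), interpolates linearly between readings, and flags any disagreement with the author-reported value beyond a hypothetical 10 °C tolerance. Function names and all data values are made up for illustration.

```python
from typing import Optional, Sequence


def decomposition_temperature(temps: Sequence[float], mass_pct: Sequence[float],
                              loss_threshold: float = 5.0) -> Optional[float]:
    """Temperature at which the sample first loses `loss_threshold` percentage
    points of its initial mass, linearly interpolated between readings.
    Returns None if the curve never drops that far."""
    if loss_threshold <= 0:
        raise ValueError("loss_threshold must be positive")
    target = mass_pct[0] - loss_threshold
    for i in range(1, len(temps)):
        if mass_pct[i] <= target:
            # mass_pct[i - 1] > target here, so the denominator is never zero.
            t0, t1 = temps[i - 1], temps[i]
            m0, m1 = mass_pct[i - 1], mass_pct[i]
            return t0 + (target - m0) * (t1 - t0) / (m1 - m0)
    return None


def flag_mismatch(curve_value: Optional[float], reported_value: Optional[float],
                  tolerance: float = 10.0) -> bool:
    """True if either value is missing or the two disagree by more than `tolerance`."""
    if curve_value is None or reported_value is None:
        return True
    return abs(curve_value - reported_value) > tolerance


if __name__ == "__main__":
    # Made-up TGA readings: temperature in °C, mass in % of the initial mass.
    temps = [25, 100, 200, 300, 350, 400, 450, 500]
    mass = [100.0, 99.5, 99.0, 98.0, 96.0, 90.0, 70.0, 50.0]
    from_curve = decomposition_temperature(temps, mass)  # about 358.3 °C
    from_text = 400.0  # e.g. the paper says "stable up to 400 °C"
    print(round(from_curve, 1), flag_mismatch(from_curve, from_text))
```

Flagged records are candidates for human review rather than automatic correction, since either source could be the one that is off.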
6. Automation, Human Insight, and The "Bits-to-Atoms" Bottleneck
- The Experiment-Compute Loop
  Real-world testing is a major bottleneck: even the best models cannot remove the need for physical lab experiments, which remain difficult to fully automate (21:30–23:41).
  "There are some types of experiments that…are really hard for autonomous high throughput experimentation, but are really easy for a human and vice versa."
  —Heather Kulik (22:34)
- From Material to Device
  Predicting and optimizing process variables, not just a material’s structure and properties, remains a largely unsolved problem in computational materials science (22:23–23:41).
7. The Role of Academia vs. Industry
- Resource Disparity
  Academic labs are increasingly outmatched by tech companies in compute. Kulik’s strategy is to focus on creativity and on problems not (yet) on industry’s radar (32:45–33:50).
  "…Companies have access to [compute] that academics don't. So I ask myself what can we do that's more creative, that doesn't require just brute force compute?"
  —Heather Kulik (32:56)
8. Practical Resources and Ways to Get Involved
- Open Source Tools
  Kulik’s group makes its molSimplify and MOFSimplify codes openly available for structure generation and ML-based screening of inorganic complexes and MOFs.
  "It includes machine learning predictions, but it also makes novel structures…I'm just really interested to hear…if people are using it."
  —Heather Kulik (34:30)
Notable Quotes with Timestamps
- On the difference from AlphaFold:
  "The bonding is highly variable across all of material space…there’s no real way to know if you’re right or wrong. The experimental data is not there."
  —Heather Kulik (25:27)
- On LLMs and chemistry expertise:
  "ChatGPT is super good at Wikipedia level chemistry knowledge..."
  —Heather Kulik (13:48)
- On the process bottleneck:
  "Most people who actually work on getting materials to the device scale…will tell you that it's not just the material, it's the process… we’re at ground zero."
  —Heather Kulik (23:06)
- On scientific literature and ML:
  "We spend a lot of energy trying to get [data] back out [of papers]. Some of this is a need also for maybe systematization of how results get reported so they can be ML ready from day one."
  —Heather Kulik (31:09)
Timestamps for Important Segments
- 01:36–03:23: Accelerated polymer discovery with AI
- 05:17–07:06: Evolution from traditional computation to machine learning
- 08:48–10:19: Active learning and multidimensional optimization
- 13:24–16:07: Limits of LLMs; human vs. machine chemistry knowledge
- 16:23–17:57: Data challenges and overlooked chemistry problems
- 18:11–19:30: Why there is no CASP/AlphaFold equivalent in materials
- 19:30–21:30: Limitations of ML potentials; transparency and computational cost
- 22:23–23:41: Automation, human insight, bits-to-atoms bottleneck
- 24:14–25:27: Structural and bonding complexity vs. protein folding
- 27:01–28:31: Integrating literature with ML; pitfalls of textual data
- 30:49–32:11: Call for user facilities, data systematization
- 32:45–33:50: Academia vs. industry—creativity versus compute
- 34:05–34:54: How to get involved; open source tools
Getting Involved: Resources & Final Thoughts
- Open Source Tools:
  - molSimplify / MOFSimplify: for transition metal complexes and MOFs, with ML predictions and generative designs.
- Data sets:
  - Curated data sets for metal-organic framework stability, available via Kulik’s group.
Heather’s Call to Action:
“If you do have an interest in transition metal complexes, just try it out…We sort of find out after the fact, so if you’re interested more in this material space, I’m definitely interested and open to feedback.” (34:30)
Summary Prepared For
Listeners and AI engineers who want a comprehensive understanding of the state of the art and the current barriers in AI-driven materials discovery, with actionable resources and key perspectives from a leader in the field.
