Latent Space: The AI Engineer Podcast

Episode: [State of AI Startups] Memory/Learning, RL Envs & DBT-Fivetran — Sarah Catanzaro, Amplify
Date: December 30, 2025
Guest: Sarah Catanzaro (B), Amplify
Host: Latent.Space (A)

Overview

In this episode, Sarah Catanzaro, partner at Amplify, offers her perspective on the evolution of the data and AI startup landscape in 2025. The conversation explores the intersection of the modern data stack with AI workflows, the reality behind high-valuation funding rounds, trends in memory and personalization, and candid takes on "hot" areas such as RL (Reinforcement Learning) environments. Catanzaro shares both insights and provocative viewpoints, making this episode a lively resource for practitioners, founders, and investors following rapid changes in the field.

Key Discussion Points & Insights

1. The Evolution of the Modern Data Stack and its Place in AI

[00:21 – 08:16]

The widely discussed DBT-Fivetran merger is not the "end" of the modern data stack; both companies were thriving but needed higher revenue for IPOs in the current environment.
Quote:
"Many of the big frontier labs are actually using both DBT and Fivetran... training datasets need to be managed." — Sarah Catanzaro [02:17]
Adoption of modern data tools persists, but the democratization of analytics has reduced the need for large analytics teams.
Data workflows for LLM and agent training are becoming more "heterogeneous" and less predictable than classic BI analytics workloads.
Tools like DBT and Fivetran are being repurposed to serve training data preparation as much as analytics.

Data Catalogs and Their Unfulfilled Promise

[05:22 – 08:16]

Catanzaro reflects on her earlier belief that data catalogs would be a crucial standalone category.
Consolidation occurred: data catalog features are now bundled into larger platforms (Snowflake, DBT, etc.). The bundled versions were "good enough" for human users but remain underdeveloped for machine use cases.
Quote:
"I do wonder at times if we built data catalogs for the wrong people and potentially for the wrong use cases." — Sarah Catanzaro [07:59]
Opportunity seen in creating machine-centric metadata services, rather than human-centric catalogs.

2. Data Infrastructure for AI Labs

[08:16 – 10:12]

Large AI labs carefully manage their data stacks, especially with respect to GPU data loading efficiency.
Portfolio company spotlight: Spiral’s Vortex file format for fast GPU data loading.
Much classic data infrastructure scaled surprisingly well for AI, but future transaction volumes could explode as agent-to-agent interactions grow.

3. Wild Funding Climate in 2025

[10:12 – 17:02]

It is common to see $100M+ "seed rounds" with only a broad vision, little near-term roadmap, and a quick (sometimes 7-day) turnaround for decisions.
Quote:
"It definitely makes me anxious… Founders are asking, 'How much should I raise?' and I’m typically saying, like, three [million], like five…" — Sarah Catanzaro [10:57]
Many founders seek the prestige of high valuations rather than focusing on business fundamentals, sometimes even using high valuations as a recruiting tool.
Quote:
"The thing though is that like the valuation is a made up number... until a company exits, it is an entirely made up number." — Sarah Catanzaro [16:04]
Some outlier companies (e.g., building wet labs) genuinely require large raises for infrastructural reasons, but most don't have clear short-term milestones.
The upside in employee equity from inflated valuations is often illusory; the approach is "not the right way to choose a job." [16:31]

4. Top Technical Themes: World Models, Memory, and Personalization

[17:06 – 23:24]

World Models

Ongoing confusion about what constitutes a "world model" and its use cases; promising for video, some robotics, and limited coding applications, but generalization is a challenge.
Quote:
"We have not yet defined what a world model is... A world model for video game generation might not generalize to factory settings or robotics." — Sarah Catanzaro [17:45; 18:47]

Memory & Continual Learning

Memory and personalization are emerging as foundational for application stickiness and continued product differentiation.
High user churn and low retention are rampant—improved memory/learning systems could be an answer.
Quote:
"Personalization is so important... AI application companies are growing quickly, but they suffer from relatively low retention, relatively high churn." — Sarah Catanzaro [19:56]
AI's static nature (unlike dynamic human intelligence) is seen as a major shortcoming.
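The retention and churn terms used in this section can be made concrete with a small calculation. The sketch below assumes one common cohort-based definition (the share of a signup cohort still active in a later period); the function name and user IDs are hypothetical illustrations and do not come from the episode.

```python
def retention_rate(cohort: set[str], active_later: set[str]) -> float:
    """Fraction of a signup cohort that is still active in a later period.

    Assumed definition for illustration: retained users / cohort size.
    Users who are active later but were not in the cohort are ignored.
    """
    if not cohort:
        raise ValueError("cohort must not be empty")
    return len(cohort & active_later) / len(cohort)


# Hypothetical data: four users signed up, two of them are still active later.
signup_cohort = {"u1", "u2", "u3", "u4"}
active_next_month = {"u1", "u3", "u9"}  # "u9" joined later, so it is not counted

retention = retention_rate(signup_cohort, active_next_month)
churn = 1.0 - retention  # churn is the complement of retention for the cohort
```

Under this definition, fast top-line growth can coexist with high churn, because new signups are not part of the earlier cohort being measured.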

The Infrastructure Challenge

Making model inference stateful (to update weights per user) is an unsolved system-level problem that will need to be tackled.
Connecting infrastructure (loading, caching, updating) to personalization is both an opportunity and a challenge.
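To illustrate why per-user state turns serving into a loading, caching, and updating problem, here is a minimal sketch of a least-recently-used cache for per-user weight deltas. Everything in it is an assumption made for illustration: the class name `UserStateCache`, the `load` and `store` callbacks, and the plain list of floats standing in for real adapter weights. It is not a system described in the episode.

```python
from collections import OrderedDict
from typing import Callable, List

Weights = List[float]  # stand-in for a small per-user adapter (hypothetical)


class UserStateCache:
    """LRU cache of per-user weights with write-back on eviction."""

    def __init__(
        self,
        capacity: int,
        load: Callable[[str], Weights],
        store: Callable[[str, Weights], None],
    ) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load = load    # fetches a user's weights from slow storage
        self.store = store  # persists a user's weights to slow storage
        self._hot: "OrderedDict[str, Weights]" = OrderedDict()

    def get(self, user_id: str) -> Weights:
        """Return the user's weights, loading them on a cache miss."""
        if user_id in self._hot:
            self._hot.move_to_end(user_id)  # mark as most recently used
        else:
            self._hot[user_id] = self.load(user_id)
            if len(self._hot) > self.capacity:
                # Evict the least recently used user and persist their state.
                evicted_id, evicted = self._hot.popitem(last=False)
                self.store(evicted_id, evicted)
        return self._hot[user_id]

    def update(self, user_id: str, grad: Weights, lr: float = 0.1) -> None:
        """Apply one in-place gradient step to the user's cached weights."""
        weights = self.get(user_id)
        for i, g in enumerate(grad):
            weights[i] -= lr * g


# Usage with an in-memory dict standing in for slow storage.
disk: dict = {}
cache = UserStateCache(
    capacity=1,
    load=lambda user_id: list(disk.get(user_id, [0.0, 0.0])),
    store=disk.__setitem__,
)
cache.update("alice", [1.0, -1.0], lr=0.5)
cache.get("bob")  # capacity is 1, so alice's updated weights are written back
```

Even this toy version shows the trade-offs the section points at: how many users fit in fast memory, when updates are persisted, and what a cache miss costs at request time.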

5. RL Environments: Fad vs. Value

[23:25 – 25:35]

Candid "hot take": RL environments as currently hyped are a short-term fad; real-world data is a superior RL environment in the long run.
Although labs are paying seven- to eight-figure sums for RL environments, Catanzaro posits that those resources could be better spent directly on real application data.
Quote:
"I think RL Environments is just a fad... The best RL environment is, you know, the real world." — Sarah Catanzaro [23:41; 24:49]
System design and task/rubric definition within the "RL environment" will remain relevant, but building clones of apps for RL will not.

6. Archetype of an Exciting AI Startup

[25:46 – 27:56]

Startups connecting research and application most excite Catanzaro—teams that solve hard technical problems (e.g., retrieval, rule-following, memory) because their app requires it.
Examples:
- Harvey: Advanced RAG (retrieval-augmented generation) as product differentiation.
- Sierra: Success attributed to solving the rule-following research problem for customer support.
- Runway: Unlocked by solving deep product/research integration challenges.

Notable Quotes (with Attribution & Timestamps)

"Many of the big frontier labs are actually using both DBT and Fivetran..."
— Sarah Catanzaro [02:17]
"I do wonder at times like if we built data catalogs for the wrong people and potentially even for the wrong use cases."
— Sarah Catanzaro [07:59]
"It definitely makes me anxious… when founders are asking me, 'How much should I raise?' I'm typically saying, like, three, like five..."
— Sarah Catanzaro [10:57]
"The thing though is that like the valuation is a made up number... until a company exits, it is an entirely made up number."
— Sarah Catanzaro [16:04]
"Personalization is so important... AI application companies are growing quickly, but they suffer from relatively low retention, relatively high churn."
— Sarah Catanzaro [19:56]
"We have not yet defined what a world model is... world model for video game generation might not generalize to... robotics."
— Sarah Catanzaro [17:45; 18:47]
"I think RL Environments is just a fad... The best RL environment is... the real world."
— Sarah Catanzaro [23:41; 24:49]

Important Segments & Timestamps

Modern Data Stack & DBT-Fivetran Merger – [00:56–03:14]
Data Catalogs: Missed Opportunity? – [05:22–08:16]
Rise of High-Dollar, High-Velocity Funding – [10:12–17:02]
World Models & Definition Challenges – [17:45–18:47]
Memory, Continual Learning, Personalization – [18:56–23:18]
RL Environments Hot Take – [23:41–25:35]
Research-to-Application Startups Archetype – [25:46–27:56]

Memorable Moments

Sarah's candid reflection on data catalogs being "the thing I wanted" as a data scientist, but their standalone potential didn’t materialize [06:33].
Lively back-and-forth on the absurdity and risks of chasing billion-dollar valuations in early-stage fundraising [13:09–16:31].
The analogy of LLMs’ early buzz to the "magic" that drew in users—now receding in favor of product fundamentals like retention metrics [21:55].
Her provocative stance that RL environments, as a startup category, may be overhyped.

Conclusion

Sarah Catanzaro brings sharp, experience-driven commentary to the current and future state of AI startup infrastructure, funding trends, and product innovation. She argues for building infrastructure and products that deeply integrate hard research problems, rather than chasing hype cycles or inflated valuations. For founders and engineers, her advice points toward investing in memory/personalization, leveraging the real world as a training environment, and staying focused on applications that genuinely need cutting-edge science to succeed.

For more resources and full show notes, visit: latent.space
