The Growth Podcast

Episode Title: AI PM Crash Course: Prototyping → Observability → Evals + Prompt Engineering vs RAG vs Fine-Tuning
Host: Akash Gupta
Guest: Aman Khan (Director of Product at Arise, Former AIPM at Cruise, Spotify)
Date: June 15, 2025

Overview

In this jam-packed crash course, host Akash Gupta sits down with Aman Khan—one of the industry's most experienced AI Product Managers—to deliver an up-to-date, practical guide to thriving as an AI PM (AIPM) in 2025. Covering core PM skills for building with generative AI, the episode dives into AI prototyping, observability, evals, prompt engineering, RAG, fine-tuning, and best practices for collaborating with engineers. Aman and Akash break down their five-step model for mastering AIPM workflows, using live examples and real code, and offer actionable advice for both aspiring and current PMs navigating the rapidly evolving role of product management in the AI era.

Key Discussion Points & Insights

1. The Evolution of Product Management in the AI Age

[02:35, 05:07]

Aman: The PM role is changing rapidly with the advent of AI. Expectations from stakeholders and customers are growing, requiring PMs to integrate AI into workflows regardless of industry.
Key idea: The term “AI PM” isn’t an exclusive specialization; rather, every PM will become an AI PM (e.g., “Fintech x AIPM”) as AI tools and techniques increasingly become standard.

"I think every PM will become some flavor of AI PM—either using those tools or building around them if you aren't already." — Aman Khan [03:39]
Akash: Even regulated industries and internal tools PMs are incorporating AI (e.g., using LLMs to standardize PRD templates in experimentation systems).

2. Five Essential Skills for AI PMs

[05:52]

The core structure for the episode:

AI Prototyping
Observability
Evals
Prompt Engineering vs. RAG vs. Fine-Tuning
Collaborating with AI Engineers & Researchers

3. AI Prototyping: Hands-On With Modern Tools

[06:31 – 45:51]

Cursor IDE: Aman recommends Cursor (a fork of VS Code tailored for AI prototyping) over alternatives like Bolt, Lovable, Vercel, and Replit, especially for projects where depth, flexibility, and agent-based systems are critical.

"The reason I really like Cursor is just because of the amount of control and flexibility it gives me to iterate on specific components." — Aman Khan [06:59]
Live Demo: Aman walks through bootstrapping an agentic Trip Planner using Cursor and the LangGraph agent framework:
- UI and backend are both automatically generated by agents (using models like Claude 4).
- Debugging is conversational—errors are quickly copy-pasted to the agent, which iterates and resolves them.
- Human intervention is sometimes needed when working with complex dependencies or production code.
Key takeaways:
- Cursor is powerful from "zero to one" prototyping, giving visibility, control, and the ability to iterate deeply versus one-click, UI-focused tools.
- Build comfort with agents generating and editing your codebase; expect and embrace some chaos (e.g., dependency issues, broken builds) as part of the prototyping journey.
- Best practice: Start with small, isolated projects before bringing AI agents into production repositories.

"Don't be scared about things breaking. They're going to break. What matters is how you can work with the agent to fix your problems." — Aman Khan [29:34]

4. AI Observability: Understanding What’s Happening Under the Hood

[46:18 – 54:49]

Tracing and Observability: After prototyping, the next step is adding observability to understand, debug, and improve AI systems.
- Aman demonstrates tracing (e.g., via Arise), visualizing agent workflows, execution paths, and how agents interact for a user task (e.g., breaking a travel planning request into research, budgeting, curating).
- Observability supports A/B testing of models, prompts, and components, surfacing latency, failure points, or optimization opportunities.
"Being able to visually see what are the paths that the agent is taking to accomplish a goal—you can see what happens...it kicks off three different agents in parallel to generate an output." — Aman Khan [52:02]
Implementing Tracing: It’s now straightforward—add a tracing package and a decorator to core functions to capture and display traces in UI dashboards.

5. Prompt Engineering, RAG, and Fine-Tuning: When and How to Use Each

[61:57 – 103:08]

A. Prompt Engineering via Playground Experiments

Iterating on agent prompts changes tone, verbosity, and output quality.
Example: Tweaked the prompt to make the travel planner friendlier, briefer, and to collect emails for product feedback or discounts.

"This is what prompt engineering really is... it gives you, think of it as sculpting a block of clay or stone into getting it into the right shape that you want." — Aman Khan [67:12]

B. RAG (Retrieval Augmented Generation)

RAG supplies the model with fresh, relevant context retrieved at runtime (like giving a doctor access to a medical textbook before answering).
RAG is essential for specialized, context-aware products (e.g., domain-specific knowledge bases).

"RAG is basically getting access to a specific part of the data...that's useful to answer a question on the spot." — Aman Khan [65:06]

C. Fine-Tuning

Specializes the LLM to a domain or style, boosts reliability, and can reduce costs/latency at scale.
High effort—used for deeply customized products or to meet very specific requirements.

D. Comparison Table (Effort/Impact)

Prompt Engineering: Low effort, high impact (great for tone, instruction tweaks).
RAG: Medium-to-high effort (need pipelines for retrieval, more infra), very high impact for supporting broad knowledge or customization.
Fine-Tuning: High effort, context-dependent impact (especially valuable for cost, latency, or strict style/format requirements).
Use a layered approach; start with prompts, add RAG for context, use fine-tuning as a final resort.

"Prompt engineering is huge. RAG is another really high impact way to improve your system... Fine tuning is really helpful for saving cost or reducing latency." — Aman Khan [101:28]

6. Evals (Evaluations): From Vibe Coding to Thrive Coding

[70:10 – 88:19]

Why evals? They quantify improvements, AB test candidate prompts/models, and automate quality control.
- Three types:
  1. Human labels (PM or user rates outputs as ‘good’ or ‘bad’)
  2. Code-based evals (e.g., checking if a competitor is mentioned)
  3. LLM-as-a-judge (LLMs act as graders for output quality, friendliness, etc.)
Best Practice: Human evaluation should “close the loop” for LLM judges—PMs should sample and label outputs, compare alignment scores, and help teams iterate the judge prompts.
- Aman demonstrates how PMs evaluate, tune, and optimize outputs, including running prompt versions on sample datasets, employing LLMs to score outputs (e.g., “friendly” vs “robotic”) and comparing these with human labels.
- Strong advice: Treat evals as living product requirements; make continuous evaluation an explicit part of the PM role.

"Evals are what tells you what's good or bad about your system. What if evals were your requirements?" — Aman Khan [107:00]

7. Deconstructing Agentic Products: A Bolt and Cursor Comparison

[90:04 – 103:08]

Bolt / Lovable: Super-fast, magical AI prototyping tools rely on massive system prompts and templates, built on top of agentic reasoning, chain-of-thought, implicit tool calling, and basic RAG.
- Limitations: Closed boxes; difficult to inject external APIs.
- When to move to Cursor/Code: Need more flexibility, custom APIs, or full-code control.
Key insight: Most "magic" AI agent products are well-engineered prompts, reasoned execution flows, tool calling, and observability.
Strategic Product Takeaway: PMs should deconstruct these products and identify where to layer in prompt engineering, RAG, and evals to drive product reliability, output quality, and differentiation.

8. Collaborating with AI Engineers & Researchers

[103:08 – 108:18]

PMs need to speak the language of engineers—understand code, agent workflows, and be hands-on with data, errors, and evaluation.
Move from traditional PRDs to “evals requirements docs”—shared dashboards and metrics, not docs.
PMs must be able to participate in iteration: labeling data, running and interpreting evals, and being in the code or tracing dashboards.

"The stronger you are at communicating with data...the more impact and influence you're going to have as an AI product manager." — Aman Khan [105:15]

9. Pitfalls and Best Practices for Aspiring AI PMs

[108:33 – 113:55]

Mistakes:
1. Not having side projects: You need hands-on experience.
  
  "A classic example...is Claire Vo. She had chat PRD two years ago out...she's been using this side project to learn about the stack every single week." — Aman Khan [109:30]
2. Waiting until AI models are 'better': Start now to build intuition, even if models aren't perfect.
3. Automating too much: Don't blindly trust LLMs' first suggestions. Use them as “second brains,” but reason, iterate, and explore alternatives.
Success formula (for PMs with limited time):
- Try tools yourself, even for personal use cases.
- Build AI intuition by breaking down existing AI products, reading code, and talking to practitioners.
- Apply that intuition in small, ongoing side projects, iterating with new technologies as they emerge.

10. Industry & Career: Are AIPM Jobs Real?

[117:28 – 120:07]

The AI PM job market is growing. Search for “Product Manager, AI” or “AI Product Manager” returns thousands of LinkedIn roles (at premium compensation).
The field is early but accelerating fast—upskilling now puts you ahead of the “wave.”

"You want to be ahead of the wave so that when the wave comes you're able to ride it." — Aman Khan [118:03]

Notable Quotes & Memorable Moments

"Every PM will become some flavor of AIPM...Think of it as an accelerator on top of the workflows you already have."
— Aman Khan [03:39]
"Don't be scared about things breaking. They're going to break. What matters is how you can work with the agent to fix your problems."
— Aman Khan [29:34]
"Evals are your requirements now. Treat them as living product requirements, not documents."
— Aman Khan [107:00]
"I think the reason Bolt is so magical is it's a really great, big system prompt and code generator...not magic at all."
— Aman Khan [93:37]
"Prompt engineering is huge. A small change can get 10%, 20%, 30% gains in your eval scores."
— Aman Khan [101:28]
On side projects:
"If you don't have side projects, that's a really common mistake...If you wait until the models get better, you'll be left behind."
— Aman Khan [109:51]

Timelines for Key Segments

| Section | Timestamp | |-----------------------------------------------|---------------| | Intro to AI PM role evolution | 00:00 – 06:00 | | The Five Core AIPM Skills | 05:52 – 06:30 | | Deep-dive: AI Prototyping with Cursor | 06:31 – 45:51 | | AI Observability & Tracing | 45:51 – 54:49 | | Prompt Engineering, RAG, Fine-Tuning Overview | 61:57 – 103:08| | Evals in Practice | 70:10 – 88:19 | | Tearing Down Bolt/Lovable, Product Thinking | 90:04 – 103:08| | Collaborating with Engineers | 103:08– 108:18| | Common AIPM Mistakes/Best Practices | 108:33–113:55 | | AI PM Career Opportunities |117:28 – 120:07|

Actionable Advice for AI PMs

Try AI tools firsthand (start with personal projects)
Build AI intuition: Deconstruct existing AI-powered products and understand what’s under the hood
Apply your learnings: Ongoing side projects help you adapt to rapid changes
Be hands-on with evals and observability: Make evals your product requirements, not just afterthoughts
Start layering in RAG and prompt engineering before attempting fine-tuning
Collaborate closely with engineers using shared language, values, and dashboards—not static docs

Where to Find Aman Khan & Further Resources

Website: Amank AI
LinkedIn: Search Aman Khan
Maven Course: “AI Prototyping for PMs” (for hands-on, production-level instruction)

Akash recommends Aman as a trusted voice in AIPM education.

Final Thoughts

This episode provides a modern playbook for mastering AI product management—from prototyping to shipping scalable, observable, and rigorously evaluated AI-powered products. With real code examples, hands-on troubleshooting, and actionable frameworks, Aman Khan and Akash Gupta offer a must-listen guide for PMs aiming to thrive in the next generation of digital product management.

The Growth Podcast

Overview

Key Discussion Points & Insights

1. The Evolution of Product Management in the AI Age

[02:35, 05:07]

Aman: The PM role is changing rapidly with the advent of AI. Expectations from stakeholders and customers are growing, requiring PMs to integrate AI into workflows regardless of industry.
Key idea: The term “AI PM” isn’t an exclusive specialization; rather, every PM will become an AI PM (e.g., “Fintech x AIPM”) as AI tools and techniques increasingly become standard.

"I think every PM will become some flavor of AI PM—either using those tools or building around them if you aren't already." — Aman Khan [03:39]
Akash: Even regulated industries and internal tools PMs are incorporating AI (e.g., using LLMs to standardize PRD templates in experimentation systems).

2. Five Essential Skills for AI PMs

[05:52]

The core structure for the episode:

AI Prototyping
Observability
Evals
Prompt Engineering vs. RAG vs. Fine-Tuning
Collaborating with AI Engineers & Researchers

3. AI Prototyping: Hands-On With Modern Tools

[06:31 – 45:51]

Cursor IDE: Aman recommends Cursor (a fork of VS Code tailored for AI prototyping) over alternatives like Bolt, Lovable, Vercel, and Replit, especially for projects where depth, flexibility, and agent-based systems are critical.

"The reason I really like Cursor is just because of the amount of control and flexibility it gives me to iterate on specific components." — Aman Khan [06:59]
Live Demo: Aman walks through bootstrapping an agentic Trip Planner using Cursor and the LangGraph agent framework:
- UI and backend are both automatically generated by agents (using models like Claude 4).
- Debugging is conversational—errors are quickly copy-pasted to the agent, which iterates and resolves them.
- Human intervention is sometimes needed when working with complex dependencies or production code.
Key takeaways:
- Cursor is powerful from "zero to one" prototyping, giving visibility, control, and the ability to iterate deeply versus one-click, UI-focused tools.
- Build comfort with agents generating and editing your codebase; expect and embrace some chaos (e.g., dependency issues, broken builds) as part of the prototyping journey.
- Best practice: Start with small, isolated projects before bringing AI agents into production repositories.

"Don't be scared about things breaking. They're going to break. What matters is how you can work with the agent to fix your problems." — Aman Khan [29:34]

4. AI Observability: Understanding What’s Happening Under the Hood

[46:18 – 54:49]

Tracing and Observability: After prototyping, the next step is adding observability to understand, debug, and improve AI systems.
- Aman demonstrates tracing (e.g., via Arise), visualizing agent workflows, execution paths, and how agents interact for a user task (e.g., breaking a travel planning request into research, budgeting, curating).
- Observability supports A/B testing of models, prompts, and components, surfacing latency, failure points, or optimization opportunities.
"Being able to visually see what are the paths that the agent is taking to accomplish a goal—you can see what happens...it kicks off three different agents in parallel to generate an output." — Aman Khan [52:02]
Implementing Tracing: It’s now straightforward—add a tracing package and a decorator to core functions to capture and display traces in UI dashboards.

5. Prompt Engineering, RAG, and Fine-Tuning: When and How to Use Each

[61:57 – 103:08]

A. Prompt Engineering via Playground Experiments

Iterating on agent prompts changes tone, verbosity, and output quality.
Example: Tweaked the prompt to make the travel planner friendlier, briefer, and to collect emails for product feedback or discounts.

"This is what prompt engineering really is... it gives you, think of it as sculpting a block of clay or stone into getting it into the right shape that you want." — Aman Khan [67:12]

B. RAG (Retrieval Augmented Generation)

RAG supplies the model with fresh, relevant context retrieved at runtime (like giving a doctor access to a medical textbook before answering).
RAG is essential for specialized, context-aware products (e.g., domain-specific knowledge bases).

"RAG is basically getting access to a specific part of the data...that's useful to answer a question on the spot." — Aman Khan [65:06]

C. Fine-Tuning

Specializes the LLM to a domain or style, boosts reliability, and can reduce costs/latency at scale.
High effort—used for deeply customized products or to meet very specific requirements.

D. Comparison Table (Effort/Impact)

Prompt Engineering: Low effort, high impact (great for tone, instruction tweaks).
RAG: Medium-to-high effort (need pipelines for retrieval, more infra), very high impact for supporting broad knowledge or customization.
Fine-Tuning: High effort, context-dependent impact (especially valuable for cost, latency, or strict style/format requirements).
Use a layered approach; start with prompts, add RAG for context, use fine-tuning as a final resort.

"Prompt engineering is huge. RAG is another really high impact way to improve your system... Fine tuning is really helpful for saving cost or reducing latency." — Aman Khan [101:28]

6. Evals (Evaluations): From Vibe Coding to Thrive Coding

[70:10 – 88:19]

Why evals? They quantify improvements, AB test candidate prompts/models, and automate quality control.
- Three types:
  1. Human labels (PM or user rates outputs as ‘good’ or ‘bad’)
  2. Code-based evals (e.g., checking if a competitor is mentioned)
  3. LLM-as-a-judge (LLMs act as graders for output quality, friendliness, etc.)
Best Practice: Human evaluation should “close the loop” for LLM judges—PMs should sample and label outputs, compare alignment scores, and help teams iterate the judge prompts.
- Aman demonstrates how PMs evaluate, tune, and optimize outputs, including running prompt versions on sample datasets, employing LLMs to score outputs (e.g., “friendly” vs “robotic”) and comparing these with human labels.
- Strong advice: Treat evals as living product requirements; make continuous evaluation an explicit part of the PM role.

"Evals are what tells you what's good or bad about your system. What if evals were your requirements?" — Aman Khan [107:00]

7. Deconstructing Agentic Products: A Bolt and Cursor Comparison

[90:04 – 103:08]

Bolt / Lovable: Super-fast, magical AI prototyping tools rely on massive system prompts and templates, built on top of agentic reasoning, chain-of-thought, implicit tool calling, and basic RAG.
- Limitations: Closed boxes; difficult to inject external APIs.
- When to move to Cursor/Code: Need more flexibility, custom APIs, or full-code control.
Key insight: Most "magic" AI agent products are well-engineered prompts, reasoned execution flows, tool calling, and observability.
Strategic Product Takeaway: PMs should deconstruct these products and identify where to layer in prompt engineering, RAG, and evals to drive product reliability, output quality, and differentiation.

8. Collaborating with AI Engineers & Researchers

[103:08 – 108:18]

PMs need to speak the language of engineers—understand code, agent workflows, and be hands-on with data, errors, and evaluation.
Move from traditional PRDs to “evals requirements docs”—shared dashboards and metrics, not docs.
PMs must be able to participate in iteration: labeling data, running and interpreting evals, and being in the code or tracing dashboards.

"The stronger you are at communicating with data...the more impact and influence you're going to have as an AI product manager." — Aman Khan [105:15]

9. Pitfalls and Best Practices for Aspiring AI PMs

[108:33 – 113:55]

Mistakes:
1. Not having side projects: You need hands-on experience.
  
  "A classic example...is Claire Vo. She had chat PRD two years ago out...she's been using this side project to learn about the stack every single week." — Aman Khan [109:30]
2. Waiting until AI models are 'better': Start now to build intuition, even if models aren't perfect.
3. Automating too much: Don't blindly trust LLMs' first suggestions. Use them as “second brains,” but reason, iterate, and explore alternatives.
Success formula (for PMs with limited time):
- Try tools yourself, even for personal use cases.
- Build AI intuition by breaking down existing AI products, reading code, and talking to practitioners.
- Apply that intuition in small, ongoing side projects, iterating with new technologies as they emerge.

10. Industry & Career: Are AIPM Jobs Real?

[117:28 – 120:07]

The AI PM job market is growing. Search for “Product Manager, AI” or “AI Product Manager” returns thousands of LinkedIn roles (at premium compensation).
The field is early but accelerating fast—upskilling now puts you ahead of the “wave.”

"You want to be ahead of the wave so that when the wave comes you're able to ride it." — Aman Khan [118:03]

Notable Quotes & Memorable Moments

"Every PM will become some flavor of AIPM...Think of it as an accelerator on top of the workflows you already have."
— Aman Khan [03:39]
"Don't be scared about things breaking. They're going to break. What matters is how you can work with the agent to fix your problems."
— Aman Khan [29:34]
"Evals are your requirements now. Treat them as living product requirements, not documents."
— Aman Khan [107:00]
"I think the reason Bolt is so magical is it's a really great, big system prompt and code generator...not magic at all."
— Aman Khan [93:37]
"Prompt engineering is huge. A small change can get 10%, 20%, 30% gains in your eval scores."
— Aman Khan [101:28]
On side projects:
"If you don't have side projects, that's a really common mistake...If you wait until the models get better, you'll be left behind."
— Aman Khan [109:51]

Timelines for Key Segments

Actionable Advice for AI PMs

Try AI tools firsthand (start with personal projects)
Build AI intuition: Deconstruct existing AI-powered products and understand what’s under the hood
Apply your learnings: Ongoing side projects help you adapt to rapid changes
Be hands-on with evals and observability: Make evals your product requirements, not just afterthoughts
Start layering in RAG and prompt engineering before attempting fine-tuning
Collaborate closely with engineers using shared language, values, and dashboards—not static docs

Where to Find Aman Khan & Further Resources

Website: Amank AI
LinkedIn: Search Aman Khan
Maven Course: “AI Prototyping for PMs” (for hands-on, production-level instruction)

Akash recommends Aman as a trusted voice in AIPM education.

AI PM Crash Course: Prototyping → Observability → Evals + Prompt Engineering vs RAG vs Fine-Tuning

Summary

The Growth Podcast

Overview

Key Discussion Points & Insights

1. The Evolution of Product Management in the AI Age

2. Five Essential Skills for AI PMs

3. AI Prototyping: Hands-On With Modern Tools

4. AI Observability: Understanding What’s Happening Under the Hood

5. Prompt Engineering, RAG, and Fine-Tuning: When and How to Use Each

A. Prompt Engineering via Playground Experiments

B. RAG (Retrieval Augmented Generation)

C. Fine-Tuning

D. Comparison Table (Effort/Impact)

6. Evals (Evaluations): From Vibe Coding to Thrive Coding

7. Deconstructing Agentic Products: A Bolt and Cursor Comparison

8. Collaborating with AI Engineers & Researchers

9. Pitfalls and Best Practices for Aspiring AI PMs

10. Industry & Career: Are AIPM Jobs Real?

Notable Quotes & Memorable Moments

Timelines for Key Segments

Actionable Advice for AI PMs

Where to Find Aman Khan & Further Resources

Final Thoughts

Summary

The Growth Podcast

Overview

Key Discussion Points & Insights

1. The Evolution of Product Management in the AI Age

2. Five Essential Skills for AI PMs

3. AI Prototyping: Hands-On With Modern Tools

4. AI Observability: Understanding What’s Happening Under the Hood

5. Prompt Engineering, RAG, and Fine-Tuning: When and How to Use Each

A. Prompt Engineering via Playground Experiments

B. RAG (Retrieval Augmented Generation)

C. Fine-Tuning

D. Comparison Table (Effort/Impact)

6. Evals (Evaluations): From Vibe Coding to Thrive Coding

7. Deconstructing Agentic Products: A Bolt and Cursor Comparison

8. Collaborating with AI Engineers & Researchers

9. Pitfalls and Best Practices for Aspiring AI PMs

10. Industry & Career: Are AIPM Jobs Real?

Notable Quotes & Memorable Moments

Timelines for Key Segments

Actionable Advice for AI PMs

Where to Find Aman Khan & Further Resources

Final Thoughts