Podcast Summary: Lenny's Reads
Episode: How to do AI analysis you can actually trust
Host: Lenny Rachitsky
Guest Author: Caitlin Sullivan (content written by Caitlin, narrated by Lenny)
Date: February 17, 2026
Episode Overview
In this episode, Lenny Rachitsky presents an audio edition of Lenny's Newsletter, featuring a guest post by Caitlin Sullivan, a leader in the use of AI for customer research. The main objective is to address a critical challenge: Why AI-powered analysis of user research data is often unreliable, and how to extract genuinely trustworthy, actionable insights from tools like ChatGPT, Claude, and Gemini. Caitlin shares her most effective techniques and prompting frameworks to avoid misleading AI-generated conclusions—skills honed from advising companies like Canva and YouTube and training hundreds of product professionals.
Key Discussion Points & Insights
The Core Problem of AI in User Research
- Confident but Wrong Outputs: AI models present results with high confidence, even when their analysis contains invented facts, hallucinated quotes, false insights, or biased recommendations.
- "The problem with AI is that the output always looks confident, even when it's full of lies, made up quotes, false insights, and completely wrong conclusions." (00:20, Lenny reading Caitlin)
- Confirmation Bias in AI: AI may cherry-pick certain quotes or feedback, giving an inaccurate representation of the data and leading to poor decisions.
- Example: Two LLMs analyzing the same data can output wildly different narratives, each delivered as if it were unquestionable truth.
Why Is AI Analysis So Difficult?
- Interviews:
- Unstructured and messy; AI models impose tidy summaries too quickly, missing contradictions and nuance.
- Real analysis requires sitting with the “mess,” catching shifts, tangents, and contradictions.
- "LLMs handle this by imposing structure and jumping to conclusions a bit too fast." (07:02)
- Surveys:
- Not as clean as they appear; metadata, internal codes, and sparse responses can confuse AI.
- Without clear instructions, AI may misinterpret internal tags or generic answers, yielding weak signal.
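An illustrative instruction for this (paraphrased, not a verbatim prompt from the episode): "The columns Status and Segment are internal tags, not participant answers; exclude them from theme extraction. Treat one-line responses such as 'It's not for me' as low-signal, and do not build a theme on them alone."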
Four Major Failure Modes in AI Analysis (and Solutions)
1. Invented Evidence
- Types of Hallucination:
- Completely fictionalized quotes
- "Frankenstein" quotes (merged snippets from various speakers)
- Quote Verification is Crucial:
- LLMs generate text based on probability, not retrieval—so “verbatim” quotes may be made up.
- Even participant IDs and timestamps can be fabricated by AI.
- Solution:
- Define explicit quote rules in your prompts (what counts as a valid quote, use of qualifiers, etc.)
- Always verify quotes in the output—manually or by using another LLM check.
“For each quote in the analysis above: Confirm the quote exists verbatim in the source transcript. If the quote is a close paraphrase but not exact, flag it and provide the actual wording. If the quote cannot be located, mark as not found.” (32:12)
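A minimal sketch of the deterministic half of this check, in Python, assuming transcripts are available as plain text. The function names, normalization rules, and similarity threshold are illustrative choices, not from the episode; the sketch mirrors the three outcomes in the verification prompt above (verbatim, close paraphrase, not found).

```python
import difflib
import re

def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting
    differences don't produce false "not found" results."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def check_quote(quote: str, transcript: str, threshold: float = 0.85) -> str:
    """Classify an AI-extracted quote against its source transcript:
    'verbatim', 'close paraphrase', or 'not found'."""
    q, t = _normalize(quote), _normalize(transcript)
    if q in t:
        return "verbatim"
    # Slide a quote-sized window across the transcript, keeping the best
    # similarity ratio; above the threshold we call it a paraphrase.
    window, step = len(q), max(1, len(q) // 4)
    best = 0.0
    for start in range(0, max(1, len(t) - window + 1), step):
        ratio = difflib.SequenceMatcher(None, q, t[start:start + window]).ratio()
        best = max(best, ratio)
    return "close paraphrase" if best >= threshold else "not found"
```

A string check like this only catches surface-level fabrication: a "Frankenstein" quote stitched from real fragments can still score as a close paraphrase even though no participant said it, which is where the second-LLM check mentioned above earns its keep.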
2. False or Generic Insights
- Issue:
- AI outputs are often "true but useless" (e.g., “price matters,” “users want reliability”), lacking depth or specificity.
- "The AI analysis just told me what I already know." (39:41)
- Why It Happens:
- LLMs are pattern-finding machines, biased toward consensus and training set priors.
- Sparse survey responses (e.g., “It’s not for me”) get lumped into overly broad themes.
- Solution:
- Provide thorough and specific context in prompts, covering:
- Project context (the scope—what decision are you making?)
- Business goal (what are you trying to achieve? e.g., attract new users without alienating existing ones)
- Product context (domain knowledge, e.g., a screenless wearable competing with the Apple Watch)
- Participant overview (who is making the statement?)
"Effective context loading has at least four components that shape how AI interprets everything that follows." (52:01)
3. Insights That Don’t Guide Decisions
- Symptoms:
- Output themes are so broad or irrelevant they cannot inform a real business decision.
- Clusters found in survey analyses may aggregate information in ways that don’t correspond to actionable next steps.
- Solution:
- Use context and pointed objectives in your prompt so LLMs are constrained to analyzing information relevant to your current decision.
4. Contradictory Insights Not Surfaced
- Issue:
- LLMs often flatten nuances and contradictions in user responses, losing the valuable tension that informs good product decisions.
- Solution:
- Explicitly instruct the model to highlight contradictions in data, preserve original nuanced language (hedges, qualifiers), and extract both quotes and participant context.
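An illustrative instruction along these lines (paraphrased, not a verbatim prompt from the episode): "Where participants disagree, or a participant contradicts themselves, surface the contradiction explicitly rather than averaging it away. Preserve hedges and qualifiers ('maybe,' 'I guess,' 'only if') in extracted quotes, and cite the participant and surrounding context for each side of the tension."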
Choosing and Using AI Models for Analysis
- Model Differences:
- Claude: Best for in-depth, nuanced analysis; covers more ground, but themes need validation afterwards.
- Gemini / NotebookLM: Strongest at generating highly evidenced themes and analyzing video (unique ability), but less complete—needs multiple prompts.
- ChatGPT: Most creative for formatting and communication, but least reliable for evidence—often combines or summarizes rather than quoting verbatim.
- Model Recommendation:
- For analysis work, Claude is preferred for its thoroughness and coverage.
- ChatGPT is commonly used but is the most prone to the failure modes discussed; the prompting fixes shared here improve results across any LLM.
Memorable Quotes
- “AI finds themes that are too broad and generic to act on, or biased by what you accidentally primed it with.” (41:27)
- “Confirmation bias is not a human-only thing; AI is easily led.” (23:11)
- “LLMs don’t retrieve quotes; they generate text that looks like what a quote should be.” (30:45)
- “This often takes just an extra five minutes... but it catches errors that would otherwise undermine the evidence behind your product decisions.” (37:41)
Actionable Prompting Frameworks
Quote Rules to Add to Your Analysis Prompts
"Start where the thought begins and continue until fully expressed. Include reasoning, not just conclusions. Keep hedges and qualifiers—they signal uncertainty. Include emotional language when present. Cite with participant ID and approximate timestamp. Do not combine statements from different parts of the interview. If a quote would exceed three sentences, break it into separate quotes." (35:10)
Verification Prompt Example
“For each quote in the analysis above: Confirm the quote exists verbatim in the transcript. If the quote is a close paraphrase, flag and provide the actual wording. If the quote cannot be located, mark as not found...” (36:05)
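A sketch of how this verification prompt might be wired into a second-model pass (the "another LLM check" mentioned under failure mode 1). `call_llm` here is a hypothetical stand-in for whatever model client you use, not a real API:

```python
# `call_llm` is a placeholder: any function that takes a prompt string
# and returns the model's reply (OpenAI, Anthropic, or Gemini clients fit).
VERIFY_PROMPT = """For each quote in the analysis below:
1. Confirm the quote exists verbatim in the source transcript.
2. If it is a close paraphrase but not exact, flag it and provide the actual wording.
3. If it cannot be located, mark it as NOT FOUND.

Source transcript:
{transcript}

Analysis to verify:
{analysis}"""

def verify_with_llm(call_llm, transcript: str, analysis: str) -> str:
    """Run the verification pass, ideally with a different model than the
    one that produced the analysis, so it isn't grading its own work."""
    return call_llm(VERIFY_PROMPT.format(transcript=transcript, analysis=analysis))
```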
Context Loading Checklist (for analysis prompts)
- Project context: What is the specific decision or feature being explored?
- Business goal: What are we trying to achieve or decide?
- Product context: What domain or product-specific details are relevant?
- Participant overview: Who are the users/customers generating this feedback?
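A minimal sketch of how the checklist and the quote rules might be assembled into one analysis prompt, assuming plain-text transcripts. The `ResearchContext` fields mirror the four checklist items; all names are illustrative, not from the episode.

```python
from dataclasses import dataclass

# Condensed from the quote rules framework above.
QUOTE_RULES = """Start where the thought begins and continue until fully expressed.
Include reasoning, not just conclusions. Keep hedges and qualifiers.
Include emotional language when present.
Cite with participant ID and approximate timestamp.
Do not combine statements from different parts of the interview."""

@dataclass
class ResearchContext:
    # The four context-loading components from the checklist above.
    project: str        # the specific decision or feature being explored
    business_goal: str  # what the team is trying to achieve or decide
    product: str        # domain- or product-specific details
    participants: str   # who generated this feedback

def build_analysis_prompt(ctx: ResearchContext, transcript: str) -> str:
    """Assemble context, quote rules, and source data into one prompt,
    so the model analyzes against the actual decision at hand."""
    return "\n\n".join([
        f"Project context: {ctx.project}",
        f"Business goal: {ctx.business_goal}",
        f"Product context: {ctx.product}",
        f"Participant overview: {ctx.participants}",
        f"Quote rules:\n{QUOTE_RULES}",
        "Analyze the transcript below. Surface contradictions explicitly "
        "and flag anything that bears directly on the decision above.",
        f"Transcript:\n{transcript}",
    ])
```

Front-loading the decision this way keeps the model's themes tied to the choice you actually need to make, rather than to whatever generic patterns dominate its training priors.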
Timestamps of Important Segments
- 00:00–02:00 — Introduction and episode premise
- 02:00–09:00 — Why AI analysis fails with interviews vs. surveys
- 09:00–12:30 — Illustrative examples of misleading vs. trustworthy AI output
- 14:40–17:00 — Four common failure modes in AI analysis
- 18:00–32:00 — Failure Mode 1: Invented evidence and how to fix it
- 32:10–41:00 — Failure Mode 2: False or generic insights and prompting solutions
- 41:30–52:00 — Prompt structure and context loading for actionable AI insights
- 55:00–End — Closing thoughts (preview cuts off)
Overall Takeaways
- Verification is non-negotiable when using AI for research analysis—never trust AI outputs at face value, especially for quotes or nuanced insights.
- Prompting quality makes or breaks results—clear rules, context, and verification steps significantly increase the trustworthiness and utility of AI-driven analysis.
- Model choice matters, but process matters more—any major LLM benefits from these prompting strategies.
For visuals or detailed walkthroughs, check the written version linked in the show notes.
