The Growth Podcast – Episode Summary
Episode: How to Do AI-Powered Discovery (Step-by-Step with Live Demo)
Host: Aakash Gupta
Guest: Caitlin Sullivan (Expert in AI-driven user research)
Date: February 13, 2026
Episode Overview
This episode is a masterclass with Caitlin Sullivan on conducting rigorous user discovery and customer research with the aid of AI, especially large language models (LLMs) like Claude, Gemini, and ChatGPT. Caitlin dives deep into the practical workflows, showcases live prompts and demos, debunks myths about AI “hallucination,” and explains how step-by-step prompting can replicate (and accelerate) the best practices of human researchers. The episode focuses on both survey and interview analysis—walking listeners through every phase, from preparing context to running audits, both interactively (step-by-step) and in agentic, code-driven workflows.
Key Discussion Points & Insights
1. The Mess and the Promise of AI in User Research
- AI for user research is "messy and unreliable" if handled poorly—but with the right workflow, you can halve your analysis time without hallucinations.
- Caitlin:
“The key is actually replicating the way that you would do things in a rigorous way as a human and just doing it like that with AI.” [00:05]
- Critical insight: Don’t skip straight to synthesis. True rigor comes from mirroring the methodical steps humans take.
2. Selecting Your LLM Workspace
- Caitlin prefers Claude for its nuanced, thorough analysis by default—but regularly cross-tests with Gemini and ChatGPT.
- Quote:
“By default [Claude] does a more thorough, more nuanced analysis than the other two platforms... Gemini seems to be fine-tuned a bit more for accuracy...but you have to push [it] more than Claude to give you the complete picture.” – Caitlin [01:03, 11:47]
- Markdown transcripts are essential for preserving structure and enhancing accuracy in AI analysis.
- Tip:
“Transforming your interview transcripts into markdown files...not just better for file handling, it actually helps these models do a more accurate job.” [12:44]
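The markdown-transcript tip above can be sketched in a few lines. This is a minimal illustration assuming a simple `Speaker: utterance` plain-text input; real transcript exports (Zoom, Dovetail, etc.) would need their own parsing, and the function name is just an example.

```python
# Sketch: convert a plain "Speaker: utterance" transcript into markdown,
# so speakers are structurally marked up before handing the file to an LLM.
# The input format here is an assumption, not a standard.

def transcript_to_markdown(raw: str, title: str) -> str:
    """Turn 'Speaker: text' lines into a markdown doc with bolded speakers."""
    lines = [f"# {title}", ""]
    for line in raw.strip().splitlines():
        if ":" in line:
            speaker, utterance = line.split(":", 1)
            lines.append(f"**{speaker.strip()}:** {utterance.strip()}")
            lines.append("")
    return "\n".join(lines)

raw = """Interviewer: What made you sign up?
Participant: Honestly, the free trial. I wasn't sure it would stick."""

print(transcript_to_markdown(raw, "Churn Interview 01"))
```

The point of the structure (headings, bold speaker labels) is that it gives the model unambiguous turn boundaries, which is what Caitlin credits for the accuracy gain.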
3. Replicating Rigorous Human Analysis (The Core Workflow)
Breakdown of Tasks and Workflow
- The process echoes classic research methodology:
- Analysis (dig through, pick apart data)
- Verification/Stress Testing (ensure insights hold up)
- Synthesis (only at the end)
- Caitlin:
"Most people...jump straight ahead to synthesis. And that’s exactly what we don’t want to do." [02:50]
Phase Details:
I. Load Context:
- Present background/project info alone in a first prompt (do not combine with tasks).
- Sample Instruction:
“Internalize this only, do not run analysis yet.” [16:19]
II. Analysis (Per-Participant/Per-Response Digging):
- Extract “value anchors” and “fragile points” for interviewees.
- Explicitly define quote selection, ratings, and coding rules.
- Quote:
“I spell out what a quote looks like to me.” [17:40]
III. Verification/Audit/Stress Test:
- Instruct the model to find contradictions or inconsistencies in its own findings.
- Quote:
“The truth is they do find mistakes. They find mistakes all the time.” [27:11]
IV. Synthesis (Summarize themes/action items):
- Only after verification.
Notable Segment: “Two/Three/Four Step Prompting”
- Context load → Per-participant analysis → Verification → (Optional) Synthesis. [16:44-34:34, 54:21-55:17]
- Aakash:
“Step by step prompting is how you get the AI to follow your human process. And that’s our North Star here.” [46:04-46:12]
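The four-phase sequence above can be sketched as separate conversational turns. `ask` below is a hypothetical stand-in for whatever chat client or API you use (it is not a real library call); the point it illustrates is the ordering Caitlin insists on: context first, then analysis, then audit, then synthesis, never shoved into one prompt.

```python
# Sketch of step-by-step prompting as four separate turns in one
# conversation. `ask` is a placeholder: a real version would also call the
# model and append its reply to the history.

def ask(history: list[dict], prompt: str) -> list[dict]:
    """Append a user turn to the running conversation history."""
    history.append({"role": "user", "content": prompt})
    return history

history: list[dict] = []

# Phase I: context only -- explicitly forbid analysis.
ask(history, "Here is the project background and research goals. "
             "Internalize this only, do not run analysis yet.")

# Phase II: per-participant analysis with explicit quote/coding rules.
ask(history, "For participant P1, extract value anchors and fragile points. "
             "A quote must be verbatim and carry a timestamp.")

# Phase III: verification -- make the model audit its own output.
ask(history, "Re-read your analysis and flag contradictions, unsupported "
             "claims, or quotes that don't match the transcript.")

# Phase IV: synthesis, only after the audit holds up.
ask(history, "Now summarize the cross-participant themes and action items.")

print(len(history))  # four separate turns, not one mega-prompt
```

Splitting the turns this way mirrors Caitlin's observation that long single prompts cause models to drop instructions.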
4. Survey Analysis in Detail
Dataset Example: Churn survey with open-ended feedback.
Process:
- Coding: Assign mutually exclusive labels to open-text responses (not software ‘coding’ but categorization). Use inductive coding to let themes arise from the data.
- Provide coding rules:
“Each response needs one primary code...mutually exclusive, not overlapping.” [38:49]
- Watch for math errors; instruct the model to use code for calculations.
“If you want to be safe on your math, tell it to code it.” [40:52]
- Quantification: Count the frequency of each code.
- Intensity Ratings: For emotional strength—differentiate between ‘soft exits’ (circumstantial) and ‘angry exits’ (fixable product issues).
- Tip: Give few-shot examples plus rationale:
“Few shot is reasoning. Not just showing the example, but why as well.” [49:21-49:35]
- Audit: Have AI review and correct its own coding and ratings.
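The coding-then-quantification step above reduces to a frequency count once each response carries one mutually exclusive primary code. The sketch below hand-assigns codes to show the data shape (in the actual workflow the LLM assigns them); the counting itself is exactly the arithmetic that "tell it to code it" says to do in code rather than in the model's head. The example codes are illustrative, not from the episode.

```python
# Sketch: quantify coded open-ended churn responses.
# Each response has exactly one primary code (mutually exclusive).
from collections import Counter

coded_responses = [
    {"response": "Too expensive once the trial ended", "code": "pricing"},
    {"response": "Switched jobs, no longer need it",    "code": "circumstantial"},
    {"response": "The export feature kept breaking",    "code": "product_bug"},
    {"response": "Couldn't justify the cost",           "code": "pricing"},
]

# Count how often each code appears -- deterministic arithmetic in code,
# not left to the model.
frequencies = Counter(r["code"] for r in coded_responses)
print(frequencies.most_common())
# [('pricing', 2), ('circumstantial', 1), ('product_bug', 1)]
```

A "circumstantial" code maps to the soft exits mentioned above, while codes like "product_bug" flag the fixable, angry exits worth intensity-rating.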
5. Agentic / Parallelized (“Mega”) AI Workflows
Running Parallel Agents:
- Set up Claude Code (in your terminal) to automate and parallelize both interview and survey analysis via agent markdown files and context documents.
- Summary: Three types of input: context document; agent markdown prompt (e.g., Interview Analyzer Lite); and actual data files. [60:02]
- File structure and explicit context/rules in markdown = higher accuracy and speed.
- Time savings: Analyze interviews + survey in parallel, slashing overall time.
- Quote:
“You just let your agents...do both things at once...cutting out half the time by parallelizing this.” [63:33-64:11]
File Output:
- Produces ready-to-use markdown summaries with executive summaries, per-user breakouts, traceable quotes, and intensity ratings—prime for synthesis or reporting.
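The structural idea behind the parallel-agent workflow can be sketched with a plain thread pool. The two analysis functions below are placeholders (in the episode the work is done by Claude Code agents reading markdown context files); what carries over is that interview and survey analysis are independent jobs, so they run concurrently instead of back to back.

```python
# Sketch: run interview and survey analysis as independent parallel jobs.
# The analyzer functions are stand-ins for agent invocations.
from concurrent.futures import ThreadPoolExecutor

def analyze_interviews(files: list[str]) -> str:
    # Placeholder for an "Interview Analyzer" agent run over transcripts.
    return f"interview summary: {len(files)} transcripts"

def analyze_survey(files: list[str]) -> str:
    # Placeholder for a survey-coding agent run over exports.
    return f"survey summary: {len(files)} file(s)"

with ThreadPoolExecutor(max_workers=2) as pool:
    interviews = pool.submit(analyze_interviews, ["p1.md", "p2.md", "p3.md"])
    survey = pool.submit(analyze_survey, ["churn_survey.csv"])
    results = [interviews.result(), survey.result()]

print(results)
```

With real agents the concurrency lives in the agent runtime rather than a thread pool, but the time saving comes from the same independence between the two jobs.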
6. AI-Moderated Surveys & Limitations
- AI-moderated interviews/surveys are improving but vary in quality. Still, human skill in research is irreplaceable.
- Caitlin:
“Some [AI tools] are asking really, really good questions...some are not quite there yet.” [06:36]
“I also wasn’t particularly impressed [with Anthropic’s results].” [06:50]
- Aakash:
“Even Anthropic’s interviewer couldn’t do a good job...think about this as a skill.” [07:52]
Notable Quotes & Memorable Moments
- Respect the Craft:
"Respect the craft of user research. Respect the people who are working on it in your company." – Aakash [07:52]
- Two-Step Prompting:
“First step context, second step analysis. Most people try to shove it into one prompt.” – Aakash [16:50]
- Why Separate Context Load:
“If you try to paste a two page long prompt in one go...they drop some of the instructions...So I just copy my first prompt, which is only about context.” – Caitlin [09:22]
- ‘CYA’ Audit:
“I’ll call this the CYA way to use AI – cover your ass – have AI check that out.” – Aakash [53:18]
Timestamps for Key Segments
| Topic | Timestamp |
|:-------------------------------------------|:---------------:|
| Initial framing on AI analysis | 00:00 – 02:41 |
| Importance of “not skipping steps” | 02:50 – 03:43 |
| Caitlin on preferred LLMs | 11:47 |
| Markdown transcripts approach | 12:44 – 14:54 |
| Step-by-step prompting demo (Interviews) | 16:44 – 24:22 |
| Tools: Dovetail, Breda, Reveal | 24:22 – 26:14 |
| Multi-step prompting: Audit explained | 27:11 – 31:38 |
| Survey analysis: Coding/Quantification | 35:02 – 44:14 |
| Coding vs. Quantification vs. Verification | 54:21 – 55:17 |
| Sentiment/Intensity Rating | 48:13 – 50:36 |
| Agentic workflow via Claude Code | 58:14 – 67:00 |
| Output formatting/markdown review | 67:13 – 70:39 |
| Host summary wrap-up | 70:50 – 71:44 |
Practical Takeaways & Advanced Tips
- Replicate, don’t shortcut, the proven human analysis workflow with AI: Always split out context, explicit per-response analysis, and verification. Synthesis should be last.
- Be explicit in rules and give examples: Key to avoiding bias and AI misinterpretation.
- Audit/falsify the AI’s own reasoning: Push the model to critique itself, spot contradictions, and recode where necessary.
- Consider survey/interview data as separate parallelizable jobs. Agentic workflows in terminal environments can halve your analysis time for large, multi-modal discoveries.
Closing Wisdom
“The best products are built with the best user understanding. And this is your roadmap.” – Aakash [70:50]
Guest resources:
Caitlin Sullivan’s cohort-based course (more depth, advanced prompt engineering) – see show notes for discount.
Host resources: Bundle of AI product tools for listeners at buildle.akashg.com
For anyone serious about using AI to power up their user discovery, this episode is the definitive playbook.
