
Discover how AI and synthetic users are reshaping UX research in this episode featuring insights from researcher Mario Callegaro.
Nathan Isaacs
Welcome back to Insights Unlocked. In this episode, we sit down with market researcher and AI expert Mario Callegaro to explore how artificial intelligence is reshaping the way we plan, conduct and activate research. From the rise of synthetic users to the new skill of prompt engineering, Mario helps us make sense of what's changing and what still firmly belongs to humans. Enjoy the show.
Podcast Narrator
Welcome to Insights Unlocked, an original podcast from UserTesting where we bring you candid conversations and stories with the thinkers, doers and builders behind some of the most successful digital products and experiences in the world, from concept to execution.
Nathan Isaacs
Welcome to the Insights Unlocked podcast. I'm Nathan Isaacs, Principal Content Marketing Manager at UserTesting, and joining us today as host is Amrit Batu, a Principal Customer Experience Consultant here at UserTesting. Welcome to the show, Amrit.
Amrit Batu
Hi, everyone. Hi Nathan. Thank you for having me on again.
Nathan Isaacs
And our guest today is Mario Callegaro. Mario is the founder of Callegaro Research and a leading voice at the intersection of survey methodology and user experience. With 15 years at Google, many more in academia before that, and a passion for advancing research with AI, Mario brings a unique blend of rigor, curiosity and practical insight into how we understand people and design better experiences. Mario is also a board member of the Quantitative User Experience Association, or Quant UX, whose conference, now in its fourth year, took place virtually in November. We'll provide a link to the recordings in the show notes. Welcome to the show, Mario.
Mario Callegaro
Thank you. Thanks for having me, Nathan and Amrit.
Amrit Batu
Yeah, lovely. Lovely to meet you, Mario. This is our first time speaking with each other. I've done a lot of background reading on you, your agency and a lot of the work you've been doing. You've been working at the crossroads of survey research and UX for a number of years, even before your time at Google, which I'm sure we'll get into. What first sparked your interest in exploring how AI could enhance your work?
Mario Callegaro
Yeah, excellent question. Well, I guess everything started when I was working at Google, following the initial development of Gemini and all the work on AI. As you can imagine, we had a lot of internal discussion. We were also testing the tool early on as dogfooders, as we say at Google. At the time, I happened to be in Google Cloud, and I started working from day zero on something that is now called Gemini Cloud Assist. Gemini Cloud Assist is basically an AI assistant for managing your applications across the Google Cloud lifecycle, from application design and deployment to monitoring, troubleshooting, and performance and cost optimization. That's just the blurb from the official Gemini Cloud Assist description. In practice, what I was doing was research with customers: lots of qualitative interviews asking them, what do you want AI to do for you? Where can we help you manage all your cloud workloads? Meanwhile, there's been this exponential growth of tools, which makes it a bit complicated for researchers to decide which one to use, but we can talk about that later. That's pretty much how I started. Then I got more excited, and it carried into my consulting work, because some clients asked, well, how can we use AI? That forced me to explain it, and that's how you learn: if you need to teach somebody, you need to know it very well. It forced me to read more papers, explain, and of course test different tools, try to break them, mess around with them and see what the output looks like. And then there's the whole discussion about quality, which we'll go through during the podcast.
Amrit Batu
Nice. I've got a question for you here. You were really there at the very start of AI and the work being done on Gemini at Google. As you were doing that research, what were people's initial reactions to how you presented AI and how it was going to impact them? Way back when you first started looking at it, how did they react?
Mario Callegaro
Well, initially it was a B2B context, and I was talking to a lot of engineers. They were very, I would say, strict about the quality of the answers they were expecting. Their mental model was: I have this senior software engineer sitting next to me, and I can ask this person, hey, how do you do this? So they had very high expectations of the tool, and they were very critical of the answers, which sometimes were not great, because we were at the beginning. Think about how quickly the tools get better with every new version. So that was the initial reaction. At the same time, they were also very forgiving. For example, they understood that latency was not going to be as fast as, let's say, Google Search. That expectation quickly changed in people's minds: you cannot expect an LLM answer to pop up as quickly as a search result. So they were at the same time very strict and quality-driven, and also forgiving in terms of latency. Then, when we talked to other customers who were using one of the countless Google Cloud tools for the first time, they said, well, that's actually very useful, because I don't need to read all this documentation. They appreciated a summary of their issue, for example, but also a link to the official documentation to read more, which is how it works now. So it's like having somebody sitting next to you who gives you the TL;DR first, and then says, okay, now you need to know more, so go and read the official documentation, because when you actually write the code you will probably need to know a bit more. So they were also using it as a learning tool.
You know: I need to use this tool I've never used, because I specialize in databases, for example, but now I need to use something for compute. I cannot know everything, so I can get AI to get me started, and then I'm going to read the documentation, because I know it's official, written by Google, and actually correct. And then maybe I can go to a forum and ask other folks. So it was interesting to see how people used it. They also used different tools. They didn't only use Cloud Assist; they'd ask, okay, if I put the same question to another tool, what's the answer? So they were always bringing competitors up in the conversation, which is actually fair.
Amrit Batu
AI is literally everywhere we look now, from the high-end commercial side to how you search. We're at that time of year for Christmas shopping, and a lot of people are using AI to get through their lists. Where do you see it adding value in the research workflow? And how do you see AI tools influencing the role of the researcher as things move forward? So, a couple of questions in there for you.
Mario Callegaro
Yeah, so on the first question, value in the research workflow. There are so many publications out there experimenting with and testing AI systems at each step of the process, so at this point AI can add value at every step. I'm going to use a framework developed by my former colleague at Google DeepMind, Yongwei Yang, which divides the research process into three big stages that we've presented at different conferences: planning, execution and activation. This applies to research in general, so it can be any kind of research: qualitative, quantitative, market research, surveys or UX research. In the planning stage, you generally start by defining your research ideas. Sometimes they're given to you by a client, if you work with clients, or by your team, if you work in a company. At that point you need to generate ideas, refine hypotheses and research questions, and extract and summarize a lot of information. In this initial part you also want to become a bit of a domain expert, and sometimes it's a domain you're not familiar with, so AI can definitely help there. The second part of the planning phase is design and planning, where you design your research process around the research question you just decided on. At that point you often need to create artifacts, and AI can help with briefs, analysis plans, or even images, like mock-ups to show people, which can now be created a bit faster with visual tools. Then, in the execution phase, you need to gather and process the data, depending on your methodology. There's a lot of research being written on drafting questionnaires, if you're in the survey space, or creating interview guides.
And then there's generating synthetic personas and samples, which I guess we'll discuss later. So that's the data-gathering stage. Then what do you do with all this data? That's the analysis part, which we call the analysis copilot: AI can assist with or automate iterative analysis, coding and data analysis, whether qual or quant. There are tools out there that can do a lot here, including helping you guarantee more privacy to your respondents in qualitative research. I want to focus on that, because I think it's a nice example. Let's say you've recorded a participant on video and want to show the recording, but it's confidential. Now there are tools that can replace the person with an avatar and change the voice while keeping the same tone, because you don't want to lose that emotional piece. So you don't need to show the person, which protects their identity, but you also don't dilute the rich qualitative content of their tone of voice. That was impossible to do before. Think about that. It was really impossible, and now there are tools that can do it. Then, in the activation phase, you generate insights, and AI can help write summaries and even generate reports. I'm not talking about quality here, just capabilities. Quality is a different discussion, which we can have later. So that's the insights part. And then you have the storytelling, which we call activation of the insights. In the past it was very time-consuming to create different artifacts for different audiences. If you talk to a stakeholder, let's say a vice president, they might want five slides at most.
But if you talk to the research team, or to your product managers in a company, they want much more detail, so you need to create a different set. And then someone might say, well, how about we write a blog post about this research? That's a completely different style. You need to write all these assets, and if you have the research done, you can feed it to an AI. I did some experiments: I took one of my papers, a very academic paper, uploaded it and said, write a blog post about it. The output was actually pretty good. Obviously you do a bit of editing, but it would have taken me a lot of time to go from a 5,000-word academic paper to a 500-word blog post in more conversational language that still keeps the meaning of the paper. That's very useful, especially if you're not a strong writer, or if the language you're writing in is not your first language. I have lots of colleagues who like to use AI for that, and there's also research on using it to improve the quality of your writing. So those are the six steps, grouped into planning, execution and activation. And there are endless tools out there, which you can sort into two groups in your mind. You have the tools that can do everything: the classic ChatGPT, Gemini, Anthropic's Claude and so on. And then you have the specialized tools, the "there's an AI for that" kind. They do just one thing, and hopefully they do it better than the general tools. Hopefully. But you need to test them, which is tricky, because which one do you use? We have this information overload. Before, we had our tools: you went to a company, you worked with the tools they gave you, some statistical software packages and some other tools for the qualitative work, and that was it.
Now there are so many more you might use, with different quality and different contracts: the free version only goes up to a certain point, and then you need to pay for a subscription. So it gets complicated very quickly, and I think a lot of researchers are a bit overwhelmed by what's on offer out there.
Amrit Batu
Yeah, it's really interesting. The word you kept using as you went through that was "help": AI can help with each of the different phases. That's something I'm sure we'll touch on more. How do you see the tools influencing the role of the researcher? How do you see them changing what the researcher is actually doing?
Mario Callegaro
It's a massive, really gigantic shift, and researchers are using AI tools at every step. Before, you had a set of tools and colleagues helping you do research. Now you also have these other tools, helpers, assistants or copilots that can assist you and save you some time, ideally while doing it properly. But as we said before, which one do you use, and how do you use it? That's the other part of the story we need to talk about: prompting, which is very important to discuss. The way I see it, prompting is learning a new language, because, number one, we are not used to talking that way. We were used to typing a few keywords into a search bar, and that was it: for most queries you would get a pretty good answer. Now the way you prompt makes a massive difference in the quality, the depth and the length of the answers. New technical papers keep coming out of the computer science literature with new prompting strategies that can give you different, higher-quality answers. So we need to learn this new language. There are many trainings out there, and luckily the big companies producing these tools publish their own guidelines, so that's a good starting point if you want to learn about prompting. But the prompt makes so much difference; it's fascinating to see how it changes the answer, and sometimes I think we overlook that. That's why I read research papers a bit differently now. I want to go straight to the prompt: I want to know the tool people used, the prompt they used and, obviously, the dataset. But guess what? Many times the prompt is not in the body of the paper; it's in the appendix.
So I read the appendix first, which you generally don't need to do unless you really need to know the paper well. I get the prompt and analyze it based on what I know about prompting. I'm not saying I'm an expert by any means, but I know something, and I ask: was this a good prompt? Was it too simple? Could it have been better? That's the prompt engineering piece I think researchers need to learn, including how sensitive even the same tool is to small changes. When you push the same prompt to different tools, there's even more variation. Every tool also has its own personality or style in its answers, in language and length: some tools are more verbose, others tell you a bit less. The problem is that you learn a lot, but the tools keep changing, so keeping up is actually a full-time job. That's why I'd say it's very overwhelming for anybody trying to use them: you learn something, and then the same prompt gives you a different answer two days later. We also need to understand the probabilistic nature of the answers, which is not what we got used to with search. There, you type three keywords, and the top 10 results are pretty much the same if you search again within a few days. That's not the case anymore. That's why I say we need to learn a new language, and this new concept that the output is not deterministic, and figure out where we can get value from that.
Amrit Batu
Yeah, that matches what I'm hearing you say, and what I'm hearing in the industry as well. A key benefit of AI in research is that it can speed up processes and make them more efficient. However, there's that age-old saying: rubbish in, rubbish out. If we don't learn this language, if we don't keep up to date with what's going on, if we don't try things in different ways, bias comes through, misinformation comes through, and we can miss the mark, which can actually slow down our process and make it less efficient. So you're saying keep learning, keep learning the language, and keep going with that kind of learning strategy.
Mario Callegaro
Essentially, yes. I also see the tools getting better. After a prompt and an answer, some tools now suggest some kind of follow-up, because I guess the companies understand that prompting is very complicated. So I think we are getting there. This prompt engineering language is kind of odd, and we need to wrap our heads around it. If the tool can already assist with that, hopefully it will be a bit easier for everybody to get what they want. Because at the end of the day, sometimes you don't get what you want just because you don't know the right prompt.
Amrit Batu
Maybe we can write an AI song: "You'll always get what you want." That sounds great.
Mario Callegaro
Exactly.
Amrit Batu
The next question is one I'm really interested in. In your work, have you seen any surprising ways that AI has sparked new types of insights or uncovered things that might otherwise have been missed?
Mario Callegaro
Not really in my own work yet, I would say. On the other hand, I've read papers whose authors use AI, for example, to peer-review the paper before they submit it to a journal. So it's like already having a reviewer before you submit to, I would say, a human reviewer, and that's an example where you can see a different point of view on your research. I'm using papers just as an example of research here. I also talked to a journal editor, I can't remember which journal, who said they want to officially introduce an AI reviewer tool. So when you submit a paper (I don't think they're going to make it mandatory, but that's a different discussion), they'll have a tool that is hopefully optimized for doing just that. That's the difference between a generic tool and a tool that is optimized and trained on a specific kind of dataset for the task. So once you submit, you hopefully submit a higher-quality paper than you would without that review. That's probably one example I can give of sparking new insights. At the same time, and this is something we might discuss later, it's also a question of how much time you want to invest in your research. I just attended an ESOMAR webinar (for our listeners, ESOMAR is the biggest international market research association worldwide) where they were discussing synthetic data and the language being used around it. The question was: how do we define quality? What's the benchmark? It seems like it keeps moving. Is the AI as good as a senior researcher, or a junior researcher? What's the gold standard? That's where I guess we need to rethink everything.
It's also very difficult to run a proper experiment comparing the quality of whatever you do, even in qualitative research. For example, let's say you do 10 interviews and give the same 10 transcripts to 10 different researchers. If they work independently, the reports they write are going to be different. And the same thing happens if you prompt different AIs to write the same report. So which one is the right one? Well, I'm not sure. What's the benchmark here?
Amrit Batu
Yeah, it feels like we're getting close to a situation where we write content supported by AI, and that same content is reviewed with the support of AI. So it's almost all the AI's work. Are we losing the human factor in some of this?
Mario Callegaro
I think there's already academic research showing how the language of papers is changing. Some specific words that were not used much in papers before have seen an increase, because AI has an interesting way of communicating. It uses words that I generally would not use, for example, so that's already changing. Is it good or bad? That's a good question. But yes, you're right. How many steps are we moving away from the humans? In market research, they're now pushing a new term, which is funny: organic. To distinguish it from synthetic data, if you collect data from humans, you call it organic. I don't think it's a bad idea, just to make sure everybody is on the same page. I always love market research because we always come up with new keywords. But besides that: how many steps are we moving from the original human data and human insights, and what are we missing? The nuances, especially in qualitative research. Think about it: let's say you give a transcript to an AI to analyze. You lose all the audio, the emotion. Sometimes you can say the same thing in a different tone and it means the opposite, but the transcript doesn't catch that. So instead of using the transcript, should we feed the video directly to an AI tool?
Is an AI able to pick up this person's facial expressions, the tone? I'm not sure yet. A transcript is already a lower level; you're already missing a lot. But it's easy because it's text, and AI tools do very well with text. We know that, and I guess we all agree on that. The next level is video and audio. How do we deal with that? Are we missing something there? Are we diluting the message or missing insights that might be very important?
Amrit Batu
Yeah, I like how you framed that. What I'm reading into what you've said is this: there's a conversation in the industry right now about user researchers and market researchers worrying about their jobs and about AI taking them over. But from what you're saying, it sounds very much like there's still a human at the end of the process, driving the buy-in and the engagement. So the human involved in the research is still going to be very important to manage that along the way. And you've mentioned it a couple of times as well, which brings us to the next topic. Let's talk a bit about synthetic users, or simulated users. Some researchers are using LLMs and other AI tools to simulate human responses, which comes back to what we've just been talking about. What's your take on that? Is it a useful shortcut, or is it too risky?
Mario Callegaro
I mean, this is a massive topic. We should do a whole podcast just on that, and there's a lot of research coming out. But before I say anything, there's also a lot of vested interest. Many companies are pouring millions of dollars into it, so there's a lot of pressure from investors to show results, some ROI. We need to remember that general context: if you see a piece of research from a company saying that synthetic data is great, my question is, how many times did you try? Is this the 10th try, the one that worked, and you're not telling me about the other nine? Obviously you have pressure to show that the quality is high. And I want to be very clear: ESOMAR, for example, has a list of questions you should ask a company about transparency, but I couldn't find any company actually answering those questions. Maybe I missed something; if so, please put it in the podcast comments. But what did you actually do? Unfortunately, AI is already a black box, and if on top of that black box you add another black box from the company, I really don't know what's happening. I don't know how the data was generated or which tools were used. There's also the big problem of reproducibility, because even if you did exactly the same thing, you could never reproduce it anyway. So that's the general context to remember, this kind of market pressure. And a lot of it is hype. I've been to conferences where synthetic data was being sold as the answer to everything, as in, you don't need to collect data anymore. I don't think we are there yet. But I want to use the ESOMAR framework, which just came out in September, so it's very recent and you can look it up.
It's called "Five Topics and Discussion to Help Buyers of Augmenting Synthetic Data," and it classifies synthetic data into three groups, which I think is very helpful as a mental model. The language keeps changing; even in today's ESOMAR webinar they said they're going to revise it, because it's such a moving target. The first is data boosting and imputation, which you can call augmenting synthetic data. Basically, you already have some data, and you use synthetic data to augment or boost your original collection. That's generally done in quantitative studies. To give a simple example: you collect 500 surveys, generate 500 more, and you get a thousand. Why would you do it? Probably just because it's cheaper. It's not really faster with online research, but they say it's cheaper. So that's one way to think about it: imputation, sample boosting and so on. The second bucket is fully synthetic data. Here you don't have any previous data; you just generate synthetic data, either quantitative or qualitative. The third is called synthetic personas, which are used a lot in UX. Unlike the first two, which are not really interactive, these are generally interactive tools that a company can build for you. They generate these personas, and then you can talk to them. Let's say you generate a persona of a specific user type for a product or a market segment; then you talk to it as you would in a qualitative interview and try to get some insights. So those are the three types we see in the market, and it's actually a good way to think about it. And here the discussion is endless.
Because we don't have endless time, I wondered whether some good soul, I would call it a good researcher, had spent the time to read all the papers that have come out so far and written some kind of summary. I found one, so thank you to this researcher. It's a paper published just last year in Psychology & Marketing, titled "Using Large Language Models to Generate Silicon Samples in Consumer and Marketing Research: Challenges, Opportunities, and Guidelines." Its overall verdict is that the results are mixed across domains. The authors use the term silicon samples, which is more common in the academic literature, instead of synthetic data, but the two are very similar. They say that studies comparing silicon and human samples show some replications but many non-replications, so LLMs cannot be assumed to mimic human behavior reliably across items and across countries. Here I'll cite another paper, published in Humanities and Social Sciences Communications, called "Performance and Biases in Large Language Models in Public Opinion Simulation." It used the World Values Survey, a gigantic survey conducted in many countries, and picked six countries, the US among them. One finding is that the models perform better in Western, English-speaking, developed nations like the United States than in the others. There were also disparities across demographic groups, showing biases related to gender, ethnicity, age, education and social class. So that's just a quick summary of the overall verdict. The researchers behind the big summary study suggest using it for pretests and pilots: they say it's promising for qualitative pretesting and pilot studies. My personal view is that I prefer to talk to real people if I can, and now there are so many tools that allow you to do that, including, obviously, UserTesting.
So I would prefer to do that, but that's my personal view. Their last point is that using it for quantitative studies remains problematic. If you read many papers, there are some common trends. For example, there's prompt sensitivity: you change the prompt slightly and the results are very different. Another interesting one is reduced variability. Many papers, mostly about surveys, compare a survey distribution generated entirely by LLMs to a high-quality, gold-standard survey, and they see a reduction in variability. That's a problem, because we want to capture the nuances of human attitudes and opinions. We don't just want the mean; if you're a subject matter expert, you should already know roughly where the topic is going and how people think about it. But one of the issues with LLMs seems to be this reduction in variability. And that's at the overall level. When you start doing subgroup analysis, it breaks down very quickly, and you see far more bias, especially in less represented groups, which we obviously need to remember. Besides prompt engineering, the other thing to discuss in general is how this data is actually generated. Lots of people are studying biases in the training data, where the data comes from and all these issues, which is probably beyond the scope of this podcast, but it's important.
Amrit Batu
Nice. The listeners can't see it, but I'm nodding away as you speak.
Nathan Isaacs
Yes, yes.
Amrit Batu
From what I'm taking away, Mario, it sounds like there's quite a lot of work that still needs to be done in the synthetic user space before we get to a point where we can lean on it reliably and consistently.
Mario Callegaro
Yes. And the other thing I can mention is that many companies are sitting on lots of data they've already collected. Obviously there are lots of issues in terms of privacy and confidentiality; especially if you work with multiple customers, you just cannot use that data. But say you're a company that collects data on your own customers, and you have all the permissions and the privacy settings in place. Can we use the data we've already collected from real, human respondents to help design the next study? I'm not saying generate everything from scratch, but you have a good starting point to move from. That's one way to do it. Another very simple idea: think about how many research reports sit around in a company, in different folders and different databases. The marketing team does research, the UX team does research, and many times these two teams don't talk to each other, so you have this gold mine that we sometimes don't use enough because it's scattered all over. Can we use AI to go over all these research reports? Especially when a new researcher comes in and wants to know about a specific topic, that person can easily get a good summary of all the studies done without reading 200 research reports. Eventually you will read some of them, but it's a starting point, and you leverage the research that's been done before instead of reinventing the wheel.
Amrit Batu
Nice. Awesome. I know we're coming up on time as we reach the end of the podcast. Last question for you: if you had to give one piece of advice to UX or CX teams experimenting with AI tools right now, what would it be?
Mario Callegaro
Yes, yes, I would say do not wait to experiment. Don't wait, but do it carefully. First, always read the terms and conditions to see where your data is going. Especially if you're feeding the AI internal data, you want to make sure it's not used to train the model. So never use a personal license; always use commercial licenses, and talk to your lawyers about the terms and conditions. That's absolutely table stakes. The other thing I would say: as a starting point, try to reproduce research you already did, so you know everything about it, and see how good the AI is. A very simple and quick example: how many of us have already collected endless open-ended answers, either from a survey or from open-ended interviews? You already have a human who wrote a research report, or, for open-ended answers, you already coded them, so you have a coding scheme and some kind of frequency of codes. Now, can you use AI to recode the same open-ended answers? You use the AI as another researcher and compute intercoder reliability. How close are the two? Can you recode them and get very similar results? There are already some papers out there with good examples, and it seems to be getting better and better. So start doing that, but use examples where you already know the quality of the result. And then keep messing around. In my latest blog post, I tried to reproduce a chart from a website that I thought was a good example. It was wine production around the world in 2024, and it turns out Italy is number one in terms of liters produced. I thought it was France, but it's actually Italy.
They produced a really nice visualization: a bunch of grapes, with each grape labeled with a number, billions of liters or something like that. I took the spreadsheet from the website and fed it into one of these tools to generate an image of it. The image looked great. It's something I would not be able to do; I'm not a designer or anything. I prompted as best as I could. Then I looked at the numbers. Even when I told the tool to read the numbers from this specific table and this column, the first numbers were all correct, but the numbers at the bottom, for the countries with lower wine production, were wrong. I prompted many times, saying, "No, actually, this is not the number for this country. This is the number of liters." After an hour and a half, I gave up. They were still wrong. So test the quality of these tools with small examples. Was the image great? Yes, absolutely. I'm not able to make that image, and it looks great, but the numbers were wrong. Maybe the next version will be great. So keep actually testing the tools, because if you don't test them, [unclear]. Also, the more you test, the more you learn about prompting and everything like that.
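The recoding check Mario suggests, treating the AI as a second coder and measuring how closely it agrees with the human coder, can be sketched with Cohen's kappa, which corrects raw agreement for chance. Mario does not name a specific statistic, so this choice is an assumption (Krippendorff's alpha is another common option); the function name, code labels and answers below are hypothetical, for illustration only.

```python
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa for two coders who each assign one code per answer."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of answers")
    n = len(coder_a)
    # Observed agreement: share of answers where both coders chose the same code.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal share, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    codes = set(freq_a) | set(freq_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in codes) / (n * n)
    if p_expected == 1.0:
        # Both coders used one identical code for everything; agreement is trivial.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for six open-ended answers (illustrative only).
human_codes = ["price", "price", "ux", "ux", "support", "price"]
ai_codes = ["price", "ux", "ux", "ux", "support", "price"]

print(f"kappa = {cohens_kappa(human_codes, ai_codes):.3f}")
```

With these made-up codes the two coders agree on five of six answers (about 0.83), but kappa works out to 17/23, about 0.74, because some of those matches would be expected by chance given how often each coder used each code.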
Amrit Batu
I think that's a fantastic point to finish on. Mario, I'm sure you agree we could sit here and talk for another couple of hours about all these topics, and maybe we can get you back again later.
Mario Callegaro
Yeah, I'd be happy to do that.
Amrit Batu
But in the meantime, thank you so much for being on the show. As you can tell, I've really enjoyed the conversation. How does someone learn more about you and your thought leadership?
Mario Callegaro
Yeah, very simple. You can follow me on LinkedIn, or visit my website, which is calar.io. I try to post blogs and news every couple of weeks. I generally post on LinkedIn and the website at the same time, and sometimes only on the website if it's less of a LinkedIn kind of topic. But yes, that's the easiest way to do that.
Amrit Batu
Excellent. Thank you so much, Mario.
Podcast Narrator
Want to keep the conversation going? You can find the show notes at usertesting.com/podcast. If you haven't already, don't forget to follow us on Apple Podcasts, Spotify, Overcast, or Google Play, so you never miss an episode. And if you enjoyed today's show, please share it with a friend or leave us a rating and review on Apple Podcasts. And until next time, this is Insights Unlocked, an original podcast from User Testing.
Episode: Why AI in user research isn’t replacing real people (yet)
Guest: Mario Callegaro (Founder, Callegaro Research; former Google, QUANT UX board member)
Hosts: Amrit Batu (Principal CX Consultant, UserTesting), Nathan Isaacs (Principal Content Marketing Manager, UserTesting)
Date: December 15, 2025
Length: ~47 min
This episode explores how artificial intelligence (AI) is reshaping user research: its potential, its limitations, and why humans remain irreplaceable (for now). Mario Callegaro brings his deep expertise from Google, academia, and consulting to outline how AI tools fit into the research process, the challenges of relying on synthetic users, and practical advice for research and experience teams navigating the fast-evolving AI landscape.
[02:34–04:33]
“We had a lot of...internal discussion. We were also testing the tool early on as dog fooders, as we say at Google.”—Mario Callegaro [02:41]
[04:34–08:06]
“They were very...strict about the quality of the answers...But at the same time, they were also very forgiving.”—Mario Callegaro [04:58]
[08:06–16:13]
Mario introduces a framework (from Yong Wei Yang, Google DeepMind)—three phases in research:
A. Planning
B. Execution
C. Activation
“If you have all the research done, you can feed it to an AI...the output was actually pretty good...it would have taken me a lot of time to write something from 5,000 words of a more academic paper to a blog post in more conversational language...”
—Mario Callegaro [14:17]
[16:13–23:13]
“Prompting is like a different kind of language...Now the prompt...makes a massive difference in the quality of the answers...we need to learn this new language.”—Mario Callegaro [16:44]
[21:26–29:46]
“Let’s say you give a transcript to an AI to analyze. Well, you lose all the audio piece, the emotion piece. And sometimes you can say the same thing with a different tone. It means the opposite. But yeah, the transcript doesn’t catch that.”
—Mario Callegaro [28:35]
[29:46–39:45]
Mario outlines the three big buckets of synthetic data:
Risks and issues:
“Studies comparing silicon and human samples show some replications but many non-replications. So LLM cannot be assumed to mimic human behavior reliably across items and across countries.”—Mario Callegaro [35:38]
[40:07–42:01]
[42:01–46:00]
“Do not wait to experiment, don’t wait, but do it carefully...try to reproduce research which you already did...and then keep messing around.”—Mario Callegaro [42:22]
On the Limits of AI-Generated Insights:
"Are we diluting down the message or missing some insights? That might be very important."
—Mario Callegaro [29:44]
On Prompt Engineering as a Vital Skill:
“Prompting is like a different kind of language...the way you prompt makes a massive difference in the quality of the answers, in the depth, in the length.”
—Mario Callegaro [16:44]
On Synthetic Data’s Reliability:
“LLM cannot be assumed to mimic human behavior reliably across items and across countries.”
—Mario Callegaro [35:45]
On the AI ‘Helper’ Role:
“The AI can help with each of the different phases.”
—Amrit Batu [16:13]
The conversation is candid and accessible, blending deep technical insights with practical, real-world advice. The hosts embrace Mario’s expertise with curiosity—asking both high-level and nuanced questions—and Mario balances optimism about AI’s potential with clear-eyed realism about its current limits.
Learn more about Mario Callegaro: [LinkedIn] or [calar.io]
For more resources, recordings, and future episodes, visit [usertesting.com/podcast].