
A
Tired of database limitations and architectures that break when you scale? Think outside rows and columns. MongoDB is built for developers, by developers. It's ACID compliant, enterprise ready, and fluent in AI. Start building faster at mongodb.com/build.
Hello everyone, and welcome to the Stack Overflow podcast, a place to talk all things software and technology. I am Ryan Donovan, your host, and today we are talking about AI as a renaissance and what else is going on in the world of ecommerce. My guest today is Vanessa Lee, VP of Product at Shopify. So welcome to the show, Vanessa.
B
Thanks for having me, Ryan.
A
So before we get into the details here, we like to get to know our guests. Tell us a little bit about how you got into software and technology.
B
Well, I am a robotics engineer. We call it mechatronics, for those who are really hardcore. I did engineering in university here in Waterloo, in Canada. I've always loved building things, so that was a very natural post-secondary education for me to have. From there I did a couple of startups. This was back in the time before startups were that cool, so it was not that in vogue to do startups, and I felt a little bit like a lone wolf. But whenever I was part of a team, whether it was in school and we were building a robot, you end up picking a specialization, because in robotics someone does the sensors, someone does the hardware, and I was always the software person. So I really learned how to code in C and some Java, because that was the language of the day with Arduinos and Raspberry Pis and those kinds of things. I came out and did my startup, in which case I was the only person coding. And then I found myself at Shopify nine years ago, and I started here as a senior PM working on our app platform. Back then we were really just getting started as a platform. We had some great APIs. There was actually a burgeoning ecosystem of app developers already because, you know, the opportunities were so vast building for entrepreneurs. But we hadn't put a lot of deliberate effort at that time into building our platform. We didn't version our APIs when I joined, which was wild. We didn't have extensions, we didn't have functions, we didn't have a lot of the stuff that we have now. And so almost a decade later, it's amazing to see how much our platform has grown in terms of capabilities, in terms of what you can build on Shopify as a developer. And then, yeah, I've kind of expanded my role, but that's where it all started.
A
Okay, so last time we talked to the fine folks at Shopify, we had Glenn Coates on, as head of product. Obviously you've been there for nine years, so you've been immersed in the philosophy of how Shopify thinks. How do you see this role, having newly ascended to it?
B
I mean, we do so much as a company. Our scope has increased over the last decade from where you go to build your online store, to where you go to get a point of sale, to where you come to connect your store with agentic surfaces. We've just grown and grown and become truly the operating system of merchants' businesses. I've worked on quite a few parts of our platform: online store and Liquid. Some of the Horizon updates that you chatted with Glenn about, I had worked with the team on quite a lot. So it has been a fire hose over the last six months, but perhaps one that I had already dabbled in, in some places. So, yeah, it's been a fun six months.
A
Yeah. Well, let's get into the details, the topics today. You think of AI as a renaissance for technology. We talk about AI pretty regularly on this program.
B
I'm sure, like all.
A
Yeah, absolutely. And we get a healthy amount of pushback on it. We've had some skepticism. We found in our developer survey that, basically, the more people use AI, the more skeptical they become. How do you see AI as a renaissance in that space?
B
Yeah, that's a really good question. Tobi had very early on put forward a video which shared our ambition for Sidekick. It showed Sidekick being able to work on the platform to create products, to create collections, to create all the primitives, all the resources in Shopify, and do it basically with you alongside. So you'd be able to review everything, but it's essentially able to draft a whole bunch of resources in Shopify. That was really the start of our Sidekick journey. This was back in 2024, I believe. The last couple of years have really been an exercise in how you build an AI agent at scale, which, for those who have done it, is not an easy feat, especially when you are starting from scratch. So the last couple of years, we've been working a lot on Sidekick. When we came out earlier this year with a new architecture for Sidekick, we started seeing Sidekick be a lot more successful in most conversations. So I purposely waited and held the team back from talking and shouting too much about Sidekick until I thought that it really drove some value.
A
For folks who don't know, Sidekick is what?
B
It's essentially our AI assistant. It lives, very similar to what a lot of platforms are doing, alongside you in the UI. You can ask it questions, it can create products for you, it can create collections for you, and if you're in Shopify, it can help you edit your online store. So it's able to help you traverse the entire platform. And when you're building something like that, we had to make sure that whatever question you threw at it, the answer would be relatively valuable.
A
Right.
B
And so earlier this year, after we launched that architecture, we had finally seen, okay, now merchants, our users, are starting to truly demand Sidekick in more places. I'd say the last seven months since then have been super fun for us. After building a ton of foundations over the last two years, now is the time where we get to really stretch our legs and ask, okay, in what places in the admin can we also deliver value using AI? So "renaissance" just kind of captured, I think, our approach to where we're at. We put in a lot of hard work to get to this point and to make AI something that wasn't just a great demo feature, but something that would actually deliver value repeatedly.
A
Yeah, and I think when we get a lot of pushback on AI in general, it's because of the non-deterministic, hallucination aspect of it. And you're talking about having it answer specific questions very well, or any given question. How do you prevent it from, you know, going off the rails and selling a car for a dollar or something like that?
B
Well, there's a lot you do to make sure that it answers properly and stays on topic. But one thing that I don't think is very obvious for folks who haven't gone through this journey is how important your evaluation set is. I mean, you had written a little bit about it as well. It's such a creative process, where you use LLMs to grade other LLMs, which is such a fascinating thing. You also use LLMs to generate synthetic data that you can then use to form ground truth sets. So we put a lot of work into the foundations of evaluations, but you also make sure that you have enough variety in that judge's training set, where you also have negative cases that are graded negatively, so that the judge is also able to spot when it answered and tried to sell you a car, which we absolutely do not want. So honestly, it's a lot of grunt work, it's a lot of time investment, but it's also being super creative about how we build this data set of evaluations and how we build that judge. But I think we're finally at the place where, once you put in that work, you do start to see the internal development of AI features get faster and faster as a result. And that's what's enabled us to build a ton more features and capabilities into Sidekick reliably over the last six months.
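A minimal sketch of the judge-calibration idea described above, assuming a tiny hand-labelled ground truth set that includes negative cases. The names `GroundTruthCase`, `toy_judge`, and `judge_agreement` are hypothetical, and the keyword check merely stands in for a real LLM judge; none of this is Shopify's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruthCase:
    """One human-reviewed conversation and the verdict a human gave it."""
    conversation: str
    human_verdict: str  # "pass" or "fail"


def toy_judge(conversation: str) -> str:
    """Stand-in for an LLM judge (assumption): fails obviously off-topic replies."""
    banned_phrases = ("sell you a car",)
    text = conversation.lower()
    return "fail" if any(phrase in text for phrase in banned_phrases) else "pass"


def judge_agreement(cases, judge) -> float:
    """Fraction of cases where the judge's verdict matches the human verdict."""
    if not cases:
        raise ValueError("ground truth set is empty")
    matches = sum(1 for case in cases if judge(case.conversation) == case.human_verdict)
    return matches / len(cases)


# Includes negative cases so the judge is also tested on what it should reject.
GROUND_TRUTH = [
    GroundTruthCase("Merchant: create a 10% discount. Assistant: Drafted a 10% discount for review.", "pass"),
    GroundTruthCase("Merchant: add a product. Assistant: I can sell you a car for $1.", "fail"),
    GroundTruthCase("Merchant: make a collection. Assistant: Sure, whatever.", "fail"),
]

if __name__ == "__main__":
    print(f"judge/human agreement: {judge_agreement(GROUND_TRUTH, toy_judge):.2f}")
```

The third case is one the toy judge gets wrong, which is the point of measuring agreement: a low score says the judge needs work before anyone relies on it.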
A
Glad you brought up the evals. I think those are becoming increasingly important. Somebody did some research showing that, at the top end, LLMs aligned with human preferences about 80% of the time. Do you have a human in the loop to evaluate that extra 20%, or any way to identify what might be a bad eval?
B
Yeah. So, how we incorporate people into this: I like to say to all the R&D teams that work on AI, your evals are your new spec. We're so used to having a requirements document or spec, depending on which org you're in, and then you take that and you build software, which, in the traditional sense, is very easy with rule-based systems and APIs: you're able to just build according to the spec. In the world of AI, where you have to assume a variety of input, your spec is actually your evaluation. So if you think about how we make sure that people on the Shopify side are involved, our opinions of what Sidekick should be and how it should act are all embodied in the ground truth set. It's not just LLMs making LLMs. We used LLMs creatively to help us scale. For example, I would have a team go and generate a bunch of conversations between Sidekick and, let's say, an LLM, and then you take those conversations and you have to edit them. That is the human in the loop. Now, the 20% that you're talking about that don't have the human alignment: at the end of the day, there are too many. We hit 100 million conversations on Sidekick. When you're looking at that scale, you can't have a human in the loop for every conversation. But what you can do is take some sampled conversations that people are having that you don't align with, grade them, say "this we do not align with," and put them in the ground truth set, so that the 80% will continually get better and better.
But I always tell folks, especially on the product side, because I think this is how product is really changing as a craft: if you are building AI features, funnel all of your opinions on how you think this agent should work into the ground truth set. That's now the new spec for building AI, and that's the human in the loop. So it's not like there's no one behind the scenes helping to shepherd Sidekick. There are definitely a lot of humans.
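The sampling loop described above can be sketched roughly as follows, assuming a ground truth set stored as a plain dictionary from conversation text to human verdict. `Reviewed`, `sample_for_review`, and `fold_disagreements` are hypothetical names used only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Reviewed:
    """A sampled production conversation with both verdicts attached."""
    conversation: str
    judge_verdict: str
    human_verdict: str


def sample_for_review(conversations, k, seed=0):
    """Pick at most k conversations for human grading (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(conversations, min(k, len(conversations)))


def fold_disagreements(ground_truth: dict, reviewed) -> int:
    """Add human verdicts to the ground truth set wherever the judge disagreed.

    Returns how many new cases were added.
    """
    added = 0
    for item in reviewed:
        disagrees = item.judge_verdict != item.human_verdict
        if disagrees and item.conversation not in ground_truth:
            ground_truth[item.conversation] = item.human_verdict
            added += 1
    return added
```

Only disagreements are folded in here, on the assumption that cases where judge and human already agree add little new signal; a real pipeline might keep some of those too.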
A
Yeah. It's interesting. The spec driven evals. I've heard of spec driven development with the AI agents.
B
Yes.
A
Is there a case for using both? And if you use both, is there an issue with sort of contaminating the data sets?
B
For us, we really use just ground truth sets and judge evaluation. That's really what we find works best, and it's been the basis of almost every AI feature that we have launched. It's hard to say whether there's a case for using both, but internally we definitely have a preference for a robust ground truth set and a judge that grades in the same way a human would grade a conversation, with a high overlap. Then we use that judge to accelerate all the other development. The devs working on, let's say, Sidekick have something reliable that they can run for every PR to ask: okay, how have we changed? If I run 10 test conversations against my PR, my branch of Sidekick, what is the LLM judge that I can now trust reporting as an improvement because of this PR, in comparison to main? That's been our focus.
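A rough sketch of the per-PR comparison described above, assuming the judge returns a score between 0.0 and 1.0 for each prompt and reply. `score_build`, `compare_to_main`, and the stub responders are hypothetical, and the length-based judge is only a placeholder for a trusted LLM judge.

```python
from statistics import mean


def score_build(respond, judge, test_prompts):
    """Average judge score for one build's replies to the test prompts."""
    return mean([judge(prompt, respond(prompt)) for prompt in test_prompts])


def compare_to_main(branch_respond, main_respond, judge, test_prompts):
    """Score the PR branch and main on the same prompts and report the difference."""
    branch = score_build(branch_respond, judge, test_prompts)
    main = score_build(main_respond, judge, test_prompts)
    return {"main": main, "branch": branch, "delta": branch - main}


# Placeholder judge (assumption): rewards concise replies.
def concise_judge(prompt, reply):
    return 1.0 if len(reply) <= 40 else 0.0


def main_build(prompt):
    return "Sure! " + "very " * 20 + "long answer about " + prompt


def branch_build(prompt):
    return "Drafted: " + prompt


if __name__ == "__main__":
    prompts = ["create a discount", "add a product"]
    print(compare_to_main(branch_build, main_build, concise_judge, prompts))
```

Running the same prompts through both builds keeps the comparison apples to apples, so the delta reflects the code change rather than a change in test inputs.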
A
And in running these test conversations, how much do you rely on like simulated environments, simulated sort of user conversations?
B
Less and less so. That, I think, was a big part of how we had to get started. When you have no conversations and no real user interactions to go off of, you need to use generated synthetic data. And it was hilarious. There were some times where we didn't tune the merchant LLM, as we call it in our case, quite well. The two LLMs would just agree with each other and go off into a never-ending conversation, because that's what LLMs are obviously trained to do. So it's actually not that easy to generate synthetic tests. Nowadays, now that we've gone live and have real user conversations that we can grade to make sure they align with our judge, we use those more than we do synthetic ones.
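One common guard against the never-ending conversation described above is a hard turn cap plus an explicit "done" signal from the simulated merchant. This is a sketch under those assumptions; `simulate` and the stub agents are hypothetical and stand in for real LLM calls.

```python
def simulate(assistant, merchant, opening, max_turns=6):
    """Alternate assistant and merchant turns until the merchant signals
    completion (returns None) or the turn cap is reached."""
    transcript = [("merchant", opening)]
    for _ in range(max_turns):
        transcript.append(("assistant", assistant(transcript)))
        follow_up = merchant(transcript)
        if follow_up is None:  # the merchant's goal is met, so stop cleanly
            return transcript, "done"
        transcript.append(("merchant", follow_up))
    return transcript, "turn_cap"


# Stub agents (assumptions). A badly tuned merchant never ends the chat.
def helpful_assistant(transcript):
    return "Product created."


def agreeable_merchant(transcript):
    return "Great, thanks! Anything else?"


def goal_driven_merchant(transcript):
    last_reply = transcript[-1][1].lower()
    return None if "created" in last_reply else "Please create the product."
```

With `agreeable_merchant` the run ends only because of the cap, which makes the failure visible as a `"turn_cap"` status instead of an endless loop.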
A
Yeah. It feels like when people rely on the synthetic data and the evals too much, you're getting towards a sort of model collapse, right?
B
Yeah. It becomes a bit recursive. I remember there being a point where it was like, how exactly did we get here? How are we using LLMs to talk to other LLMs? But you always have to remember the human in the loop. You have to find the place in the development cycle where you are going to insert your opinion. If you don't have that, then yes, there is a ceiling. The humans always raise the ceiling. So if two LLMs are talking and they've kind of gotten the answer correct, but you're thinking, it's not quite how I'd want to answer, I would like it to be more concise, or whatever your opinion would be, it's you who's then going in, correcting it, and raising the ceiling. Because if not, you're just going to peter out with, okay, well, this is what the LLMs are doing. I guess you could change your prompt a little bit, but the human in the loop is what really raises the bar. And to be honest, that work never really ends. We still have people today in Sidekick every month adding to and refreshing our ground truth set.
A
Yeah. A while back we talked to one of your engineers, distinguished engineer Ilya Grigorik, and he was talking about micro-frontends and components. I've talked to other folks who are sort of tokenizing at the component level for software. Are you thinking about that at all? Sidekick bringing up pre-vetted components?
B
We tried that. We aren't launching anything yet. We're playing a lot with UI. One of the things that we've talked about internally is how Sidekick comes out of just a text shell. Right now you speak with Sidekick, you have conversations. But one of the things that is fascinating, and that I'm super curious to see where we go with in the next year, is how LLMs help to change the way that we interact with UI as well. So not just text based. Especially in our case, where we have millions of businesses and every business has a different workflow and a different need, how does Sidekick, or let's say another LLM that we build, build UI to fit a merchant's specific needs? That is not something that was ever possible before without LLMs. And I think that's a really exciting way to think about software. In the past, software was limited by the pixels on the screen, so you try to put in as much functionality as elegantly as you can without overwhelming your user, and that's perpetually really hard to do. We still want to make sure that the UI is really phenomenal out of the box, but there is room, I think, for some customization that can be done by a merchant. For example, it's not quite what Ilya was talking about, but we're launching the ability for Sidekick to generate apps for you, custom applications for your business. Say you manage the tags in your product metadata in a different way than how our UI represents it, and that is a frontline thing that you want your merchandisers to edit. We put it at the bottom of the page; you want it at the top of the page. You can then say, okay, I want to create a merchandising application where my merchandisers can go in, and it has tags at the top.
It manages certain metafields of mine, which are custom fields, over here. And then it becomes a new way, I think, for merchants to work with Shopify. So that's been a pretty fun thing to offer merchants, which probably would have taken them a while, or at least cost them a lot, to build for themselves.
A
It's almost like building and vibe coding.
B
Yeah, it is. I mean, vibe coding, for us and not for the average merchant, has been something that I think we've all done in our own day-to-day work. But how do you take that and give that power to a user who's not as technical? If we're not fearful about it, if we just explore that optimistically for a second, that is something that we would never have been able to do without AI. And I do think that's a really exciting way to think about user interfaces in the next decade: how are user interfaces highly personalized to what Ryan wants, to what Vanessa wants? And that's pretty cool.
A
Yeah. So, I mean, I think we've both seen those demos of on-the-fly interfaces per person. Are you actually thinking about that level of customization?
B
I think we're still in the early days of it. There are also latency concerns, and we have to make sure that the UI isn't changing on you every two seconds. There are still some fundamental user behaviors: if things were to change on you every time you log in to Shopify, that's too jarring. But this felt like the right first move for us. We've actually had the luxury of investing in our app platform for almost a decade now. So we have all these tools: we have the right GraphQL APIs, and we have the right front-end components in terms of what we've done with Polaris. Now we can give all of those platform tools to an LLM and say, okay, create something that is bespoke for this business, and then they can install it and use it over and over again. I think we're still a bit of a ways from generating UI in real time for our users, but this felt like the right slight shift for us to start to see whether this is going to be valuable.
A
Yeah. On the couple of ecommerce API platforms I've worked on, I was surprised to see how much of that was just storing data on products. Does Sidekick help out with the data side of the house?
B
Yeah. So when you're talking about data, you're talking about, for a user, it actually being able to generate data on the back end for you?
A
Yeah. For like, you know, the products, the T shirt sizing. Like, I know sometimes products will have very complicated and very specific requirements on what data they store.
B
Yeah. So one of the things that we've worked on for the last couple of years is actually looking at the data model of Shopify and understanding that, hey, we now have products across millions of merchants. That is a fantastic position for us to be in, but it also makes it very hard for platforms that connect to us to understand this T-shirt from merchant A and this T-shirt from merchant B, because all of the metadata is stored in different ways. You have the product description, which might have some details. In a different merchant's store, you have the details of the size and fit in a metafield. So one of the things that we did, starting a couple of years ago, was use LLMs to properly categorize products and properly create attributes. I'm super proud of this launch. We've worked on it behind the scenes over the years, but last year we basically embedded these predictions into Shopify. So if you said, okay, I'm creating a new product, here's my sweater, uploaded an image of the sweater, and then wrote something like, hey, this is, I don't know, the Vanessa sweater, an LLM would run in the background and say, I know the category of this is apparel, tops, sweaters. We have a standardized taxonomy that we've created, and then the attributes are things like sleeve length, material, and color. These are then automatically suggested for you, based on the images that were uploaded. So it just makes your life a little bit easier and nudges you a little bit into, okay, yes, I agree that it's colored black and the sleeve length is X. And it allows us to create better, more standardized product listings, not just for their shop, but for all merchants as a whole.
We're then able to work with partners like OpenAI and say, hey, we have a product catalog that you can plug into, so that our merchants' products are actually surfaced in these surfaces, and all of the products are categorized and have the right attributes. This is work that's ongoing, but it is something that we've really worked hard on over the last couple of years.
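To make the standardized-taxonomy idea above concrete, here is a small sketch that checks a predicted category path against a taxonomy tree and keeps only the attributes defined for that category. The tree, the attribute names, and `normalize_listing` are illustrative assumptions built from the sweater example, not Shopify's actual taxonomy or API.

```python
# Illustrative taxonomy fragment (assumption), mirroring the sweater example.
TAXONOMY = {
    "Apparel": {"Tops": {"Sweaters": {}, "T-Shirts": {}}},
}

# Attributes each category accepts (assumption).
CATEGORY_ATTRIBUTES = {
    ("Apparel", "Tops", "Sweaters"): {"sleeve_length", "material", "color"},
}


def is_valid_category(path, tree=TAXONOMY) -> bool:
    """True if every step of the path exists in the taxonomy tree."""
    if not path:
        return False
    node = tree
    for part in path:
        if part not in node:
            return False
        node = node[part]
    return True


def normalize_listing(predicted_path, predicted_attributes):
    """Validate a predicted category and drop attributes the taxonomy
    does not define for it."""
    if not is_valid_category(predicted_path):
        raise ValueError(f"unknown category: {' > '.join(predicted_path)}")
    allowed = CATEGORY_ATTRIBUTES.get(tuple(predicted_path), set())
    kept = {k: v for k, v in predicted_attributes.items() if k in allowed}
    return {"category": " > ".join(predicted_path), "attributes": kept}
```

Constraining model output to a fixed tree is what makes two merchants' sweaters comparable: both end up with the same category path and the same attribute keys, whatever their original descriptions said.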
A
Yeah, I talked to somebody in machine learning at Etsy about how they are trying to categorize products, and I'm sure you all have a similar issue, where it could be anything: custom, handmade, a cursed mannequin, whatever. How do you categorize those?
B
Well, we have a pretty robust taxonomy tree that we continually add to because, you're right, there are so many different types of underwater cameras that you had no idea about, and so many attributes of them that people need to understand to know which one to buy. So I think that's just an ever-growing task. We started this about a year and a half ago at this point, and it's something that we've continually invested in. I don't think there's a secret sauce to it, other than that you need to train a model and you need to create a bunch of labeled data sets. It can be just a large ML model; it doesn't necessarily need to be an LLM. It just takes a lot of work, to be honest. But it is an important task. I can see why probably everyone who handles product data across many sellers will be very familiar with this problem.
A
Right, right. Yeah. So when you're thinking about new features for this, how do you weigh the needs of somebody who doesn't know anything about it, who's creating a little store for, whatever, a wedding registry or something, against the developer who is coding everything, soup to nuts, in the ecommerce platform?
B
When you're building your own brand, and this is something we've kept true to ourselves throughout, we always underestimate how much merchants care about their brand and how much it's about expression. So one of the things that we've been really passionate about is making sure that, whether you're a developer or you are a mom-and-pop shop with no developer on staff, you can come and create something, let's say an online store in this case, that feels native to your brand and feels honest. A lot of the time that could mean, okay, I'm able to hire a developer. But in the case of no code, you're able to go to the theme store, find a theme that feels close to what you want, and then customize it and build it in a way that, okay, now it really is my brand.
A
Right.
B
And so when it comes to building it yourself, we've had a lot of conversations over the years, especially during the 2020 era, when there were a lot of folks building headless, especially if they did have developers on staff. I think that no matter what, there will always be different architectures, different constellations of services that you might have to bring together, especially if you're in the larger category, so we're always going to have that escape hatch. Our approach has always been that we want to be with you whether you choose to develop your own, let's say, headless storefront or you are coming in and installing a theme. But one thing that has always been true throughout the last decade is that I've observed merchants to be extra, extra efficient. They live and die by how efficient they are in their day and how productive their teams are. So no matter what they want to achieve, their questions always come back to, okay, what's the most efficient way for me to achieve the brand that I have in my head, the customer experience that I want to create? So we offer both. But at the end of the day, we've seen a lot of folks just say, you know what, I can do a lot even without needing to go headless. But we're never ones to say, okay, you can only go a certain way. I think we always have to acknowledge that developers will always have needs and wants, and brands will always have things they want to do that are unique to them.
A
So Sidekick is now out in the wild. What are you excited about for the future of this AI renaissance?
B
As user behavior changes from just working in our UI to working more and more with Sidekick, one of the things that was really important to me was making sure that there was a way for our ecosystem to come with us. We're never going to be a platform that builds every piece of functionality across millions of businesses, across all verticals and all sizes. That's always been our belief. So one of the questions that I get a lot from our ecosystem is: when will Sidekick be able to work with my app? If a merchant says, create me a discount, how can Sidekick go create the discount, but also say, let me draft that in an email to this customer segment for you? And if, let's say, I use an app for my email, how does that app participate in that conversation? So one of the things that we started releasing in a developer preview, because we want to develop out in the open, is the ability for Sidekick to launch what we call app intents, which are ways for you to register, essentially, tools that Sidekick can then use in its conversations and workflows, so that merchants can access your app from conversations. That's one that I'm excited about. It's a developer preview, so it's early, but I'm excited to see where the next 12 months go.
A
I think the last time we talked, you all had an MCP server. Does this use MCP or anything like that?
B
So it's very MCP-like in the way that we've architected and built the schema. You define the schema very much like MCP. It's not MCP exactly, because there's some stuff that we wanted to make sure you didn't have to do. You didn't have to stand up a server yourself, but it is very MCP-like. We also upgraded the MCP tool that we launched earlier this year. So if you are a developer building an app on Shopify, the MCP tool that I know you spoke to Glenn about can not only do GraphQL, like it did halfway through 2025, it can now also use the Shopify CLI. We put a lot of work into our CLI. It makes it easier for you to create a test environment, create an application, and create the TOML file where you specify what your app is supposed to do in terms of metafields and everything. And now the MCP tool is able to do that holistically. So this MCP tool can create an application from start to finish with all the tools that we offer in our platform, and it has been fun to see what people can do in Cursor, where they can now build an app very quickly. So, yeah, I'd say those are two big launches for the developer community that I'm hoping will make it easier for developers to participate.
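For readers unfamiliar with what an MCP-style tool schema looks like, here is a hedged sketch. MCP tools are described by a name, a description, and a JSON Schema for their input; the tool below and the `validate_args` helper are hypothetical and are not the app intents format, which the conversation does not spell out. The validator checks only required keys and primitive types, not full JSON Schema.

```python
# Hypothetical tool definition in the MCP style (name, description, inputSchema).
EMAIL_DRAFT_TOOL = {
    "name": "draft_email_campaign",  # assumed name, for illustration only
    "description": "Draft an email to a customer segment about a discount.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "segment": {"type": "string"},
            "discount_code": {"type": "string"},
        },
        "required": ["segment", "discount_code"],
    },
}

_PYTHON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def validate_args(tool, args):
    """Return a list of problems; an empty list means the call looks acceptable."""
    schema = tool["inputSchema"]
    problems = [f"missing: {key}" for key in schema.get("required", []) if key not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unknown: {key}")
        elif not isinstance(value, _PYTHON_TYPES[spec["type"]]):
            problems.append(f"wrong type: {key}")
    return problems
```

Declaring inputs as a schema is what lets an agent decide, mid-conversation, whether it has enough information to call an app's tool or needs to ask the merchant for more.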
A
All right, well, it's very exciting. Well, it's that time of the show again, where we shout out somebody who's gone onto Stack Overflow, dropped some knowledge, shared some curiosity, and possibly earned themselves a badge. Today we're shouting out a Great Answer badge winner, somebody who dropped an answer that was so good it scored 100 points or more. Congrats today to Erwin Brandstetter, who answered "How to convert empty to null in PostgreSQL." If you're curious about that, we'll have the answer for you in the show notes. I'm Ryan Donovan. I edit the blog and host the podcast here at Stack Overflow. If you have topics, questions, concerns, or comments, you can email me at podcast@stackoverflow.com, and if you want to reach out to me directly, you can find me on LinkedIn.
B
Thanks for having me, Ryan. I'm Vanessa, and you can find me on X at v Lauren Lee. And yeah, excited to have had this chat, Ryan.
A
Thanks for listening everyone and we'll talk to you next time.
Date: January 9, 2026
Host: Ryan Donovan
Guest: Vanessa Lee, VP of Product at Shopify
This episode explores how AI is transforming the world of ecommerce, focusing on Shopify’s innovative AI assistant, Sidekick. Ryan Donovan sits down with Vanessa Lee, VP of Product at Shopify, to discuss AI as a "renaissance" in technology, the evolution of Shopify’s developer platform, and how grounded evaluation methods ensure safe and helpful AI products. They also dive into the technical and human challenges of training AI, how Sidekick is reshaping merchant workflows, and share insights on the future of customizable, AI-driven interfaces and developer tooling.
The discussion is pragmatic, transparent, and focused on real-world developer and merchant needs. Vanessa is candid about the technical and human challenges behind building and scaling Sidekick, while the overall tone remains optimistic and forward-looking about AI’s potential in ecommerce. The technical complexity is grounded with practical examples and relatable anecdotes.