
A
Stripe's network handles on average about 50,000 new transactions every minute. To put that in perspective, because it's a lot of zeros: that's about 1.3% of global GDP.
B
Welcome back to the Mad Podcast. Today I'm sitting down with Emily Glassberg Sands, Head of Information at Stripe. Once a payments API startup, Stripe has become one of the most legendary companies of this generation and a full financial infrastructure platform that moves 1.3% of the world's GDP online. We talked about why Stripe decided to build its own AI foundation model and what it learned in the process.
A
Stripe is a little bit different. We have really differentiated data. OpenAI doesn't have that data. Anthropic doesn't have that data. Our first instinct was actually full on wrong.
B
We also discuss the brave new world of agentic commerce, where agents will buy and sell on our behalf, and what it means for payments and new infrastructure like MCP servers.
A
Who's doing the buying is different, and where they're doing the buying is different. It's pretty clear that MCP is becoming the default way that any single service, Stripe or GitHub or Notion, talks to an LLM.
B
We close the conversation covering fun Stripe data about the incredible rise of this generation of AI startups.
A
They are monetizing faster than any previous generation of startups that we've seen. Those that already hit 30 million in annualized revenue got there in about a year and a half. For comparison, the fastest growing SaaS startups on Stripe took five and a half years to hit that same mark.
B
We're living in an era where AI is increasingly rewriting commerce, money movement, and risk. And this episode is a great way to make sense of where the world is going. Please enjoy this terrific conversation with Emily Glassberg Sands. Emily, welcome. Thanks for spending time with us.
A
Delighted to be here. Thanks for having me.
B
All right, so everyone in tech obviously knows Stripe, which is a monster of a company. But maybe for context, what is the latest and greatest way of describing the full breadth of what the company does and maybe the latest stats?
A
Well, Stripe builds programmable financial infrastructure. So, put kind of less buzzwordy, we are giving any business, whether it's a 20-year-old selling a Figma template or now more than half of the Fortune 100, the rails and the intelligence to move money online and to grow faster. You asked about the numbers. Last year, companies processed about $1.4 trillion on Stripe. To put that in perspective, because it's a lot of zeros, that's about 1.3% of global GDP. And that number grew 38% year over year in what many experienced as kind of a rocky macro climate. Stripe's network handles on average about 50,000 new transactions every minute. So those are the transactions that are adding up to $1.4 trillion in payments volume processed annually. And every one of those transactions is training data for some of the AI systems that we will talk about today, because of the flywheel. Stripe is no longer just the payments API. If we were talking 10 years ago, we'd be talking about a payments company. But in practice, we're now optimizing the entire payments lifecycle: the checkout user experience, fraud prevention, bank routing, automatic card update retries, even how you handle disputes as a business. And that's all in service of merchants' profits, right, growing their revenue and reducing their costs. And so I think of the tools we're creating as generating a structural tailwind for the internet economy, for growth in any environment. And we're already seeing it. Businesses on Stripe grew seven times faster last year than the S&P 500. So it's that infrastructure creating a structural tailwind for growth. That's our primary focus.
B
Amazing. All right, so we're going to unpack some of this. Before we do that: you are Head of Information at Stripe. What does that mean? What does your remit cover?
A
Yeah, our Information org is really focused on three things. One is how do we use data effectively, and that's end to end: how do we do the data engineering and analytics and internal science, how do we build ML-powered applications for our users. The second thing the Information org works on is growth and the self-serve business. So millions of businesses run on Stripe; the vast, vast majority of them, and almost all of the SMBs and startups, get going directly in our product. And so building that product-led-growth front-door experience for users is sort of our second focus area. And then the third thing we work on is experimental projects. I have mixed feelings on this name, because I think innovation and experimentation is so important and it can and should and does happen everywhere. But the concept of an experimental projects team is really just having a couple dozen standout engineers and PMs who can go run ahead at really big, perishable, meaty opportunities that we couldn't easily staff from within any of our current product verticals. So Information is data, self-serve, and experimental projects.
B
Very cool. Experimental projects sounds like a very fun job for the right person. And you came from the data science world, right? You were at Coursera before this, and Harvard. Maybe walk us through your journey and why you chose Stripe.
A
I think I've kind of just always chased puzzles where better data, better understanding, unlocks kind of outsized social impact. That's what drew me into academia. So Harvard: I was an econ PhD and ran a bunch of field experiments that exposed hidden frictions. Right? Like, why do referrals dominate hiring? Why are female playwrights so underproduced? And, you know, I got a lot of pleasure from seeing policy shift, decision making shift, incentives shift once the evidence was clear. Going to Coursera for me was really about translating that impulse into product. Right. It was 2014, I was in my fourth year of the PhD program. I graduated a little bit early, so coming up on graduation I said, hey, where do I think this obsession with better data unlocking outsized social impact is most going to matter? Is it going to be in writing papers, or is it going to be in diving into, in this case, ed tech? And Coursera was super small at the time, less than 40 folks. But what it turned into was AI-driven learning paths and skills-based hiring tools that opened opportunity for tens of millions, eventually hundreds of millions, of learners around the globe. I was there about eight years, and the transition to Stripe is really the same mission at economic scale. Stripe is about equalizing access to creating a company and reaching customers globally for businesses everywhere. And then, you know, I'm an economist by training, so I really care a lot about incentives. And I think a thing that struck me from my first conversation with Patrick was how aligned incentives are between what Stripe wants and what the businesses running on Stripe want. So, like, if a coffee roaster in Berlin sells more, right, Stripe grows and so does the internet GDP. And so that ability to build and ship any product that makes a business more successful, without even really needing to worry about first-order monetization of that product. Right.
Because we in most cases already sit on monetization of the payments infrastructure, that was just really exciting for me, kind of kid-in-a-candy-shop. And that's all manifested over the last almost four years now. The only other thing I'll add about the Stripe pull was just the data set here. It's kind of like looking at a macro MRI: a real-time image of the global economy that we can then actually action and improve. And so that's a little bit of economist catnip.
B
Awesome. All right, so the big news that you announced a few weeks ago now is the launch of your own foundation model, which I find fascinating in so many ways, including for starters, the fact that if you listen to the general zeitgeist on Twitter or on AI panels, a lot of people say, well, it's a silly idea to create your own foundation model these days because the large general foundation models will do all things to all people or for all people. So it's interesting to start with from that perspective, maybe walk us through the thinking of experimenting with the idea of a foundation model and then launching it.
A
We've, I think, all seen and are all experiencing this sort of explosion of impact from foundation models that are trained on broad data and that can then be adapted for a bunch of downstream tasks. Right. So, you know, GPT for language, or diffusion for images, or TimeGPT for time series. And in each case the trick is kind of the same, which is there's a transformer and it soaks up incredibly diverse data, it learns a kind of dense embedding space, and then later you fine-tune or prompt it for whatever job you need. To your point earlier, I think if you're doing a pretty standard image thing or a pretty standard language thing, you should for sure use out-of-the-box models with some prompting or some fine-tuning. And maybe we'll talk later about sort of the AI economy that we're seeing, but there is just a wealth of really cool applied AI companies solving vertical problems that start out just as pretty simple wrappers. And "wrappers" is sometimes said in kind of a derogatory way, which I think actually misses the point. These businesses are bringing real context and real relationships and real incremental data to turn that wrapper into a differentiated product experience. But I totally agree with the general sentiment that many-slash-most businesses, and certainly many-slash-most startups, who don't have access to any kind of proprietary data should start with out-of-the-box LLMs. Stripe is a little bit different, right? We have really differentiated data, data at the scale I was talking about earlier, like $1.4 trillion a year in payments volume flowing through us. OpenAI doesn't have that data. Anthropic doesn't have that data. And it's a pretty different problem in some ways, not in all ways, but in some ways, than a language problem, and certainly quite different than an image problem. And this isn't our first time putting that data to use.
It's been well over a decade that Stripe has relied on specialized ML systems. We have Radar for fraud; we have adaptive acceptance for soft declines. But each of those is a sort of narrow, single-task model, and each of those models historically only saw kind of a sliver of reality. And so last year we were stepping back and looking at what foundation models can do, and recognizing that we're logging tens of billions of transactions. And at that density, payments, while a different problem than language, start to look like language in some ways. There's an agreed-upon syntax: there's the BIN and the MCC and the amount. There are sort of some longer-range semantics: is this device reuse, what's the merchant history, where is it in the card life cycle? In a similar way to how language transformers learn an embedding space where words with similar meanings cluster together, we thought, hey, intuitively, at our scale and given how payments data is structured, we could probably learn payments embeddings as well. Or it was at least worth a shot.
B
Yeah. And just to double click on this, since you're on the topic, that's one of the things I find particularly interesting about the idea of creating this foundation model is that as you said, in credit card data, there is a lot that looks like language, but equally there is a lot that looks very different. Right. The data is presumably sparser, there's no grammar to it the way you would find in language or code. So I'm curious about how you thought about that kind of, you know, two sides, that heterogeneity of the data.
A
I would say the thing that's most interesting to me about the analogy between language and payments is that in language, words have a meaning in relation to the other words around them, and in much the same way, a payment has a meaning in relation to the other payments around it. And so with our foundation model, what we're really asking is: what if every charge got its own vector in a similar space? And then as each new charge comes in, you place it in that many-dimensional space and understand where it sits in relation to, for example, a known card-testing attack, or known fraud, or a known merchant issue. The other thing I'll note about learning these embeddings is it doesn't require any labels. Right? It's fully unsupervised. So, jumping back to the specialized models: fraud, auth, disputes, those work because of the labels. But being able to do a fully unsupervised approach means you can actually use all of the tens of billions of transactions. You can adopt it at very large scale; you don't have to constrain to the subsets of data where you have relevant labels. And so I guess the simple description of why a payments foundation model has turned out to work is, first, how much data we can learn from: literally all of Stripe's history, not just some task-specific subset. Second, how richly we learn: these very dense embeddings capture subtle interactions and similarities among charges that manual features or counter features will totally miss. And then the third, and this is more kind of operational, but I think it matters given the pace of AI, is just how efficiently we can build. We now have these shared embeddings. They're available in Shepherd, which is our shared feature store, which we actually co-built with Airbnb and have open sourced under the name Chronon. It makes spinning up a new model a weekend project, not a quarter project, because you get these embeddings kind of out of the box.
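The idea described here, place each new charge in an embedding space and read off its nearest known cluster, can be sketched in a few lines of Python. Everything below is illustrative: the centroids, the tiny three-dimensional vectors, and the cluster names are made up, and a real system would use far denser learned embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical cluster centroids learned without labels; a handful of
# labeled examples is enough to name what each cluster represents.
centroids = {
    "card_testing": [0.9, 0.1, 0.0],
    "legitimate": [0.1, 0.9, 0.2],
}

def nearest_cluster(charge_embedding):
    """Place a new charge in the space and return the closest known cluster."""
    return max(centroids, key=lambda name: cosine(charge_embedding, centroids[name]))

print(nearest_cluster([0.8, 0.2, 0.1]))  # sits closest to the card-testing centroid
```

The key property Emily points out survives even in this toy version: the clustering itself needs no labels, and a few labeled points per cluster suffice to name what the cluster means.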
B
One aspect that I find particularly fascinating is that tension between traditional machine learning and generative AI foundation models. My takeaway from spending a lot of time in the space is that the end result of the current phase we're in is more of an ensemble approach, where you have foundation models for certain things and traditional machine learning models for other things, typically stuff that fits a bit more precisely in rows and columns. What I'm getting a sense in this discussion is that effectively the foundation model just outperformed what traditional machine learning models were supposed to be best at, to the point that the foundation model would replace the machine learning models. Is that the right impression or am I jumping to conclusions?
A
So, yes, and I think we will get to a point where it fully replaces them. Today it is, as you put it, an ensemble, but it's an even more nuanced ensemble: it's an ensemble within a problem space. So take the example of card testing. Card testing is when a fraudster is trying to find cards that work, either so that they can use them later for fraudulent purchases or so that they can sell them to other fraudsters to use. There are labeled examples of card testing. There are traditional machine learning models that Stripe has, and has invested in substantially, to identify and block card testing. But there are important slices of card testing that traditional methods just literally can't see. So if you think about a global online retailer, they might see hundreds of thousands of legitimate purchases in an hour. Fraudsters might slip in, I don't know, a few 37-cent authorizations, way too dilute for any of your traditional models to catch. The foundation model is basically watching the sequences in the way that you'd watch frames in a movie. It sees 200 near-identical requests, same low-entropy user agent, maybe rotating the proxy IPs, maybe spaced 40 seconds apart or something. These light up this red island that denotes card testing and can get blocked. And so what's unique about that is the number of clusters can be very large; there are a lot of different card-testing attacks that can be happening. But the number of labels that are needed to correctly classify a cluster is actually quite small. If the cluster is tight enough, you really just have to know that there's some evidence of card testing there to know that the whole cluster is card testing. Given the size of the Stripe network, we can find labels for even very small clusters, which is what boosts our recall.
So in this case, we ensembled together the existing traditional card-testing models with a classifier over sequences of these foundation model embeddings. And our detection rate on large users went from 59% to 97%. Will we move to a world where eventually all card testing is detected by the foundation model? Maybe. But what's more interesting to us right now is solving the problems that couldn't previously be solved.
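A minimal sketch of that kind of ensemble, with hypothetical score names and thresholds: the charge is blocked if either the existing label-trained model or the new sequence classifier over foundation-model embeddings calls it risky.

```python
def ensemble_block(traditional_score, sequence_score,
                   trad_threshold=0.8, seq_threshold=0.7):
    """Block the charge if either detector fires: the traditional
    labeled card-testing model, or the classifier that reads sequences
    of foundation-model embeddings and catches dilute attacks the
    traditional model can't see."""
    return traditional_score >= trad_threshold or sequence_score >= seq_threshold

# A dilute card-testing charge: invisible to the traditional model,
# but the sequence classifier lights up, so the ensemble blocks it.
print(ensemble_block(traditional_score=0.12, sequence_score=0.93))
```

The OR shape is what makes the ensemble strictly additive: recall can only go up relative to the traditional model alone, at the cost of whatever false positives the new classifier introduces.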
B
So how does one go about building a foundation model? Walk us through the history of this: when you guys started thinking about it, what you did next, and what team did it.
A
Yeah, well, so first of all, our first instinct was actually full-on wrong, right? Which was, let's just throw bigger transformers at single payments. As I said earlier, what's interesting about payments, sort of similar to language, is words only matter in relation to the words around them, and payments only matter in relation to the payments around them. But actually that wasn't ex ante obvious to us. A lone payment record is, as you mentioned, kind of sparse. It's also kind of boilerplate. And after something like a billion tokens, the loss curve kind of flattened. Scaling wider wasn't going to be the answer. And so we actually had to change the question. Instead of treating a payment as an isolated atom, we stitched charges together into these short histories represented as sequences. There are lots of different types of sequences: everything the same card did in the past few minutes, everything that flowed through the same device on a Friday night, everything that this merchant's new BIN saw during some pre-sale frenzy. And kind of the moment we trained on sequences, the model had fresh signal to learn, and the curve started dropping again. And so the backbone that we ended up with is a BERT encoder. And by the way, we also tried decoder-only model architectures, like GPT. But BERT is just better for understanding tasks. Right? What we're really trying to generate is the embedding, the understanding of the payment, and then we put it in relation to other payments. GPT-style models are better for generation, and we're not actually trying to generate in the first stage.
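The stitching step might look like the sketch below: group charges by a shared key (card, device, merchant BIN are the examples given) and keep a short trailing window as each training sequence. The field names and window size here are assumptions for illustration.

```python
from collections import defaultdict

def build_sequences(charges, key="card", window=5):
    """Turn a time-ordered stream of charges into short histories:
    for each new charge, emit the last `window` events that share
    the same key (e.g. same card or same device)."""
    histories = defaultdict(list)
    sequences = []
    for charge in charges:
        hist = histories[charge[key]]
        hist.append(charge["token"])
        sequences.append(list(hist[-window:]))
    return sequences

charges = [
    {"card": "c1", "token": "charge_a"},
    {"card": "c2", "token": "charge_b"},
    {"card": "c1", "token": "charge_c"},  # second event on card c1
]
print(build_sequences(charges))
```

In a real pipeline you would build several of these keyed views in parallel (card, device, BIN) and feed the resulting sequences to the encoder, which is where the "payments as language" signal comes from.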
B
So it's all based on BERT versus GPT. Fascinating.
A
Yeah, it's a BERT encoder. And you asked who did the work. We actually just originally had three MLEs who we put in a little bubble. They'd worked on risk-related problems in sort of previous instantiations of their careers at Stripe. But we put them in a little bubble and said: think about the broad set of problems Stripe faces that might be solved by a foundation model, choose a couple of steel threads, and then go see how much progress you can make against those steel threads. These folks were protected from day-to-day operational load, were protected from incidents, weren't running any production-grade systems at the time, and really operated more like a research team.
B
Are they part of that experimental group that you mentioned up front?
A
It actually wasn't because the experimental group has only been around about a year and a half now. So we started this shortly before that. But same concept, right? They don't happen to report into that. They report into our ML foundation team. But structurally it's the same idea and was part of actually the motivation for then scaling up experimental projects.
B
And because it's BERT-based, was that less of a massive compute, data-crunching effort, or was it still intense?
A
I mean, less of, yes, and still intense, yes. It definitely wasn't all smooth on the infrastructure side. We had to build a custom tokenizer and optimize it for Stripe events. We had to scale our data pipelines to grow to the very large data sizes I mentioned earlier; previous models just hadn't trained on such large amounts of unstructured data all at once. We also had to build custom data loaders to make sure that GPU utilization was high. Earlier versions actually resulted in pretty low GPU utilization because the data loaders became the bottleneck. And so, yes, that made training more expensive, but it also made it slower. This was something bigger than we'd trained before. We had to add a bunch of checkpoints to make our runs more robust to intermittent failures, the kind of stuff that you would be doing anyway if you were an AI lab. But we are not first and foremost an AI lab, and so those were all sort of progressive builds for us.
B
Any other bottlenecks or parts that felt harder than they should have been, whether that was, I don't know, data quality or any other part.
A
When it came time to actually running the model in shadow, and we run all of our ML in shadow before we roll it out, running in shadow was relatively straightforward. The first experiment we ran in product, though, had a bunch of latency and reliability requirements that put pressure on some of our systems. As you can imagine, these decisions have to be made in the charge path, so in real time you have maybe dozens of milliseconds to make the decision. And actually, part of the reason that we were totally happy to start with this kind of ensemble model is we had a full fallback to the existing model in cases where we couldn't meet the latency requirements. But yes, plenty learned in the journey.
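The fallback behavior described here can be sketched as follows. The function names and the 30 ms budget are hypothetical, but the shape, try the new model and fall back to the existing one on error or budget overrun, is what the rollout relies on.

```python
import time

def score_with_fallback(charge, foundation_score, legacy_score, budget_ms=30):
    """Score a charge in the real-time path. If the foundation model
    errors out or blows the latency budget, fall back to the legacy model."""
    start = time.monotonic()
    try:
        score = foundation_score(charge)
        if (time.monotonic() - start) * 1000 <= budget_ms:
            return score, "foundation"
    except Exception:
        pass  # treat failures the same as a missed budget
    return legacy_score(charge), "legacy"

def flaky_foundation(charge):
    # Stand-in for a model call that times out.
    raise TimeoutError("model took too long")

score, source = score_with_fallback({}, flaky_foundation, lambda c: 0.42)
print(score, source)
```

Because the legacy model is always available, the worst case of the new path is exactly the old behavior, which is what makes shipping an ensemble into the charge path low-risk.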
B
How do you think about transparency? In the world of financial data, given the absolute mission criticality of what you do, and also from a regulatory standpoint, the concept of a quote-unquote black box AI may be something that people raise an eyebrow about. How do you think about transparency and explainability?
A
My first reaction to that is LLMs are actually getting quite good at explainability. And so to the extent that the model is seeing patterns, even patterns that humans couldn't enumerate, an LLM on top can say something like: high-velocity CVC mismatches on a new device are the explainable reason, sort of the summary of this cluster. But I really do think of all of these defenses as a two-step dance. There will always be room for rules. Rules provide speed; rules provide clarity. We ultimately put our users in the driver's seat. Users can write Radar rules. They can say: never accept first-time cards from this country over $1,000. And we actually, about a year and a half ago, released a tool called Radar Assistant that lets them type that in plain English and test it and ship it instantly, without even having to write code. But then the models are really needed for nuance, for seeing the patterns that humans can't. When they conflict, historically the rule won; merchants keep ultimate veto power. But a few weeks ago we actually updated our systems to blend the two even better. We call it dynamic risk-based rules. How it works is, instead of the user writing a brute-force rule like "block every CVC mismatch" or "block every postal mismatch," the rule can be blended with the model: block every CVC mismatch if the real-time model or the issuer score calls it risky beyond some threshold. What that allows is kind of the best of both worlds, right? There's always some good customer who fat-fingered, and they should be able to get through, but the sketchy traffic is still stopped. So I don't think transparency or explainability is yet 100% there. I think we will continue to use rules and models in parallel. And then there are, of course, just engineering and logging best practices around making sure you are storing the features that were used by the model and the model output, so that ex post, if a user or a regulator comes and wants to understand what drove the decision beyond what you've logged, you can always reconstruct that cleanly.
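A dynamic risk-based rule of the kind described, block a CVC mismatch only when the real-time model also scores the charge risky, might be sketched like this. The field name and threshold are illustrative, not Stripe's actual API.

```python
def should_block(charge, model_risk, threshold=0.7):
    """Brute-force rule: block every CVC mismatch.
    Dynamic rule (shown here): block a CVC mismatch only when the
    real-time model (or issuer score) also calls the charge risky."""
    cvc_mismatch = charge.get("cvc_check") == "fail"
    return cvc_mismatch and model_risk(charge) >= threshold

fat_fingered = {"cvc_check": "fail"}               # good customer who mistyped
print(should_block(fat_fingered, lambda c: 0.1))   # low model risk: let through
print(should_block(fat_fingered, lambda c: 0.95))  # high model risk: blocked
```

The conjunction is the whole point: the human-written rule keeps its clarity and auditability, while the model score removes the rule's false positives.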
B
You mentioned Radar and the long history that Stripe has had of building it. I'm curious about how you think about where to deploy machine learning and AI across products. Obviously we are in that moment in tech when everybody wants to do "problem X plus AI equals magic." But I think it would be very interesting for people to hear how somebody like you, at the very edge of the space, thinks about: okay, this is a problem for AI, and this is a problem where AI should actually not be included at all.
A
There's so much enthusiasm about the latest models and the latest methods, and I think it's really easy to start with what the models and the methods can do and then try to come up with a product from that. We like to start at the opposite end of the spectrum, which is the simple business task: what is the user pain that we are hearing or seeing, and what metric best captures that user pain? If we were to build an AI or ML solution that nudges this metric by even a single percentage point, right, when you're talking about Stripe scale, a single percentage point of improvement is a lot of money back to the businesses that run on us and the internet economy. Does moving that metric matter? It sort of starts with the user pain and the business need. Then we look at the data. It has to be plentiful; it has to be already flowing through Stripe's pipes. That doesn't mean we can't think expansively about what other data we'd like to be collecting over time, but you're not going to turn on an AI solution today if you don't have the data. And it has to either be amenable to unsupervised approaches, or we have to be able to label it well enough that the model can learn. Is there the data, and is it structured in a way that's useful? And then finally we like to ask whether Stripe has a built-in advantage: is this something we can do uniquely well because of our network? And that usually comes down to the shape of the data that enables it, and the fact that we have that data in a way that other people may not. So a recent example that might bring that to life a little more is our Smart Disputes product, which we announced just a few weeks back. So, chargebacks. Let's start with the user pain and the business need. Chargebacks are really painful. Merchants lose about $55 billion a year to chargebacks. And fighting disputes is also really costly for the business.
Fighting a single dispute can mean putting together a 12-page evidence packet: digging up receipts, looking at IP logs, tracking down delivery confirmations, pasting everything into this dozen-page PDF. Most businesses only bother for the biggest-ticket items. And lean teams, which includes basically all of the startups out there, rarely bother; it's just not worth their time, and they don't have the expertise in house. I was talking to a friend of mine the other day who runs a jobs marketplace, and she's one of the few marketplaces that monetizes off of the job seeker instead of monetizing off of the employer. And she's just getting crushed by disputes. She told me, hey Emily, it's crazy that these people are disputing, because they're saying that they never used my service, but they've literally uploaded their resume. Nobody else has their resume; nobody else benefits from uploading their resume. It's called friendly fraud, but that's kind of a misnomer, because it's not friendly. And she literally doesn't fight them. She has all the evidence, but she doesn't fight them. And if you ask her, she's like: it's just not worth my time to put together these crazy packets. Okay, so a small improvement in dispute win rates would translate into hundreds of millions of dollars across the Stripe network. So this satisfies the first bar: there's a real user pain, and there's real business opportunity here. Then the second is: do we have the data? Well, we already see which disputes are being won and lost of those that are being fought. We already store most of the data an issuer would want to see when it decides a chargeback. So it's a great candidate, which is why we launched Smart Disputes. And it's basically just a classifier that grades every incoming chargeback as it comes through on its likelihood of success.
And if the model thinks that the merchant can win, then we overlay this LLM-powered agent that goes out and gathers the right proof, right: IP address matches for digital services, screenshots of the usage, whatever the issuer historically prefers. And then it just bundles that evidence into the format that the bank expects and files the response, without any human having to touch the case. And then, of course, it watches the ruling and feeds the outcome back into training, so it keeps getting smarter. Vimeo and Squarespace were our first two adopters, and they're recovering 13% more revenue on disputed charges from adopting it. And they're doing that with zero extra labor. You literally don't even have to click a button; you just toggle once to turn it on. And the impact is even greater for these tiny merchants who never used to contest chargebacks at all, and who now have kind of this AI paralegal working for them. And so you weren't asking about Smart Disputes, you were asking about the mental model. But it's basically: big user pain, abundant Stripe-only data, a clear model-driven fix. And that's how we decide where Stripe AI should go next.
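The flow as described, grade the chargeback, and only if it looks winnable have the agent gather evidence and file, can be sketched as a small pipeline. All the function names below are hypothetical stand-ins for the classifier and the LLM-powered agent.

```python
def handle_chargeback(dispute, win_probability, gather_evidence, file_response,
                      threshold=0.5):
    """Grade an incoming chargeback on its likelihood of success; if the
    merchant is likely to win, assemble the evidence packet and file it
    with no human touching the case."""
    p = win_probability(dispute)
    if p < threshold:
        return {"action": "skip", "p_win": p}
    packet = gather_evidence(dispute)  # receipts, IP matches, usage screenshots
    file_response(dispute, packet)
    return {"action": "filed", "p_win": p}

filed = []
result = handle_chargeback(
    {"id": "dp_1"},
    win_probability=lambda d: 0.8,                 # classifier stub
    gather_evidence=lambda d: {"ip_match": True},  # agent stub
    file_response=lambda d, packet: filed.append((d["id"], packet)),
)
print(result["action"], filed)
```

The feedback loop mentioned in the conversation would sit outside this function: the eventual ruling becomes a new label for `win_probability`'s training set.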
B
And a lot of what we talked about so far has had to do with stopping bad things from happening: fraud, card testing, illegitimate chargebacks. Are there examples where you use AI to generate revenue? I guess the example that you just mentioned does generate increasing revenue, but whether that's, I don't know, a smarter route or faster checkout, any of those things.
A
For sure. And by the way, fraud done well also generates revenue, in the sense that the alternative is usually doing fraud poorly, which has a bunch of false positives, which means you're blocking some good users. But the way we think about it is we use AI across every stage of the payments lifecycle, from the second a customer lands on the checkout page all the way through to handling refunds and disputes. And if you think about that lifecycle, there are kind of five meaty steps. There's checkout, there's authentication, there's fraud detection, which is where we've spent most of our time talking, there's authorization, and then there are the downstream events like the refunds and disputes. Checkout is sort of the easiest for you, or for pre-Stripe me, to reason about, because we all experience it as consumers. And I think we could all agree that checkout experiences feel pretty staid and inefficient. No matter who you are, no matter where you're shopping from, no matter how you like to pay, you usually get roughly the same old form. It doesn't adapt; it doesn't know you. And a lot of times that's all it takes for a customer to drop off at the finish line. Some of that is little stuff, but some of that is big stuff. Like, if I only have an Amex on me and Amex isn't shown, I literally would have to text my husband to get a Visa card. And if I'm in another country and have no access to any of the payment methods that are listed, then you've basically shut off my market entirely. So we've been working a lot on fixing that in checkout. AI is our magic wand here. We call it Stripe's optimized checkout suite, and it's just about making the checkout experience increasingly personalized for our users' customers, right, dynamically tailoring that experience to each of the end users, again our users' users, in real time. So, like, Turo, maybe you've used it there.
Like the world's largest car sharing marketplace. They moved over to our checkout suite and saw a 5% increase in recaptured revenue, which for them was, I think, like a hundred and some million dollars a year. Payment methods are a really interesting subcomponent of checkout. There's been a proliferation of payment methods in the world, which from a market efficiency perspective is probably a great thing. Stripe now supports well over 100 payment methods, so like Apple Pay, iDEAL, buy now pay later. And in the optimized checkout suite, more payment methods is better for business, because it comes kind of out of the box for businesses; they can reach more customers with what they need. But actually showing more payment methods to their customers is suboptimal, because people get choice anxiety. If they don't see what they need in the first three, they give up. And so we provide all these payment methods, but then we automatically surface the most relevant payment methods based on who the customer is and what they're buying. And it works: businesses that show at least one relevant payment method beyond just cards see like a 12% increase in revenue and a more than 7% lift in conversion. So conversion goes up and the size of the transaction goes up, and that's a really big deal for something as small as the order of buttons on a screen. So that's checkout.
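The payment method surfacing described above, rank what's relevant and cap what's shown, can be sketched in a few lines. This is a toy illustration with hypothetical relevance scores, not Stripe's actual system, which conditions on far richer signals than a country lookup:

```python
# Toy sketch: surface the most relevant payment methods first, capped at a
# few visible options to avoid choice anxiety. The per-country priors here
# are invented for illustration; a production system would use a model.

SUPPORTED = ["card", "apple_pay", "ideal", "klarna", "alipay", "sepa_debit"]

def rank_payment_methods(buyer_country: str, max_shown: int = 3) -> list[str]:
    # Hypothetical relevance priors by buyer country.
    priors = {
        "NL": {"ideal": 0.9, "card": 0.6, "apple_pay": 0.4},
        "US": {"card": 0.8, "apple_pay": 0.7, "klarna": 0.3},
        "CN": {"alipay": 0.9, "card": 0.5},
    }
    scores = priors.get(buyer_country, {"card": 0.5})
    ranked = sorted(SUPPORTED, key=lambda m: scores.get(m, 0.0), reverse=True)
    return ranked[:max_shown]

print(rank_payment_methods("NL"))  # ['ideal', 'card', 'apple_pay']
```

A Dutch buyer sees iDEAL first; an unknown market falls back to cards, mirroring the "show at least one relevant method beyond cards" idea.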
B
Can we nerd out on data infra for a few minutes? I'd love to talk about lessons learned operating data infrastructure, specifically for data science, machine learning and AI at this scale. What tools you use, what worked, what didn't, any lessons around scaling and operating at that level.
A
We use ML infrastructure that we've developed over time at Stripe, and that relies on open source where available, and third-party buy solutions where it's not differentiated for us and where there's a third party that meets our reliability and latency and cost considerations. For example, the data scientists and MLEs and even some of the software engineers here use notebooks for experimentation. We use Databricks notebooks. We use Flyte for orchestrating our training runs. We use Nvidia GPUs and PyTorch for model training. Feature computation, including those LLM embeddings, and feature serving are done in Shepherd, which we built in partnership with Airbnb and have since open sourced under the name Chronon. Shepherd is new for us, actually. We just completed the full migration to Shepherd a month and a half ago. That migration took on the order of about six months. But one lesson learned is to really make sure that we're investing sufficiently in the horizontal infrastructure layer, so that individual product teams snap to the same infrastructure, versus allowing their sort of golden workflows to diverge and everyone to spin up their own. The original feature computation and feature serving system we built, which was called Semblance, had a number of limitations. It was pretty hard to develop on, and as a result, one of our largest machine learning groups at Stripe decided to fork and buy Tecton, a third-party solution. We couldn't adopt Tecton across Stripe because Tecton was only useful for batch solutions and didn't meet the latency and reliability requirements of the charge path. So you could use it, for example, to score merchant risk at onboarding, because you have a couple minutes to make that decision, but you couldn't use it to score a charge, because you have tens of milliseconds to make that decision.
And we ended up in this fractured world, which led to all sorts of issues. For example, one of the most valuable signals for understanding whether a merchant is fraudulent is looking at the transactions happening on that merchant, because there are certain patterns of transactions. Oh, many of your buyers are from the same IP, or there's a big jump in prices: you used to be selling everything at $2 and suddenly you're selling everything at $2,000. That in and of itself indicates that the merchant is fraudulent, and those features actually couldn't be shared because we were bifurcated. Plus, just from an investment perspective, you basically have mini ML infra teams within the applied teams that are operating kind of inefficiently. And so we brought all that together under Shepherd. It was a bit of a long journey, but it was definitely worth doing. And then we put enough work into it that we figured we should just open source it and make sure other people can build on it as well.
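The merchant-level fraud signals described above, many buyers from one IP, a sudden jump in prices, are essentially windowed aggregations over a merchant's charges. A toy sketch of that computation, not the Chronon API itself:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Charge:
    merchant: str
    ip: str
    amount_cents: int
    ts: float  # epoch seconds

def merchant_features(charges: list[Charge], merchant: str,
                      now: float, window_s: float = 3600.0) -> dict:
    """Windowed features over one merchant's recent charges."""
    recent = [c for c in charges
              if c.merchant == merchant and now - c.ts <= window_s]
    if not recent:
        return {"n": 0, "top_ip_share": 0.0, "max_over_median": 0.0}
    ips = Counter(c.ip for c in recent)
    amounts = sorted(c.amount_cents for c in recent)
    median = amounts[len(amounts) // 2]
    return {
        "n": len(recent),
        # Share of charges coming from the single most common IP:
        # close to 1.0 when "many of your buyers are from the same IP".
        "top_ip_share": ips.most_common(1)[0][1] / len(recent),
        # Largest charge over the median: a $2 shop suddenly selling
        # $2,000 items shows up as a large value here.
        "max_over_median": amounts[-1] / median if median else 0.0,
    }
```

In a shared feature platform, features like these, computed once, can be served both to merchant-risk scoring and to the charge path, which is exactly the sharing the bifurcated setup blocked.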
B
You have obviously a key real time aspect to what you do. You need to detect fraud in real time. Is there a specific way this translates into infrastructure tools that you use for that real time component?
A
It's a combination of the latency requirement and the reliability requirement. We run on five to six nines of reliability. You can't have downtime. And that's not just downtime of the core payments APIs. Downtime of the Radar API is super, super costly to the businesses that run on us. And so the SLAs needed for us to be able to buy are quite high. There are also pretty stringent security requirements. So there are often new startups, less so on the infrastructure side and more so on the applied side, who we would love to buy from, partner with, but they don't have the security protocols and controls in place for us to feel comfortable operating in their stacks. And so I do think the nature of what we are doing, yes, the timeliness requirements, but also the reliability requirements and the security requirements, and I'm a big proponent of only building where you have a core competitive advantage, does on the margin push us a little bit more towards build for ML infra than we would be in other contexts.
B
All right, let's switch to the rise of agentic commerce. So obviously agentic is one of the big words of the last year or so. How do you all envision this? Do you view autonomous shopping agents as a part of that future and where do you fit?
A
Well, reasoning models are on the rise, and with that, AI is no longer just about getting answers to your questions. Right? It's starting to do things for you. I think most individuals first felt that, like our individual aha moment was maybe with the shift from ChatGPT to Operator: from answering questions to going out and executing tasks in a browser. But that shift from knowing to doing is a big deal. And I think one of the earliest places we're seeing it's going to change things is commerce. We've all seen those cool demos of agents buying stuff for people. At Stripe, we started leaning into this about a year ago. And back in November we launched a toolkit that makes it easy for agents to transact on someone's behalf. So I like coffee. I drink a lot of coffee, as you might be able to tell by the pace of speaking. But there's this barista agent that is out there today, and you tell it what kind of coffee you like, and then it just scours the Internet for the best beans and then it buys them for you. But what's interesting about the barista agent is it's not a traditional coffee shop. It doesn't own any of the inventory. It is literally just doing the discovery matching, plus, I'll talk a little bit about the payments flows. Like, that is the entirety of the app. And I think that's just a glimpse of how who is doing the buying is starting to shift: agents are buying on behalf of humans. And then there's another big shift happening in parallel. And by the way, both of these are early, but given the pace at which we're seeing things change, they'll probably move pretty quickly. That second shift is where the buying happens. More people and more businesses are spending time inside AI tools, and with that, product discovery and browsing and now even buying are starting to happen in those tools. So like Perplexity, you may have seen that they recently launched hotel discovery and booking in the app, and it's powered by Stripe.
But unlike most hotel discovery and booking surfaces you might think of, you're not linked out to a merchant website. You aren't taken to separate checkouts. You stay within the Perplexity app. And I think that kind of in situ commerce is really interesting. We're also working with Hipcamp. It's summer season, so maybe a good time to mention this. They use agents to book campsites at state or national parks on the camper's behalf, even off platform: the agent goes and completes the booking. They do it really safely with these virtual cards in terms of the money flow. And it just gives campers access to sites that aren't normally all bookable in one place.
B
So behind the scenes, how does that translate into requirements? Whether that's, I don't know, speed, data formats, authentication, the checkout experience. Does that require you guys to change the way Stripe works, or not?
A
Early days, the biggest change is around the money flows. But I would caveat, because people get really jumpy about, oh, an agent buying for me, that sounds super scary. I'd argue that in practice, agents have actually been buying for us for years. They were just human agents, right? Like when I order my salad from DoorDash, DoorDash charges my credit card and then it issues a single-use virtual card to the driver, right? The driver is my human agent who goes and buys the salad on my behalf. And they can only buy at Sweetgreen, and they can only buy for $25, and they can only buy in this two-hour window in my town. It is very controlled. And that single-use virtual card in the DoorDash case happens to be powered by Stripe. And so what we're doing here, in sort of the first, most simple iteration, your mental model should be: swap out the human agent for an AI agent. And that's how the barista agent works, right? It's just using a single-use card from Stripe Issuing to make the purchase, just like the DoorDash driver does. So the transaction's controlled and your data stays safe. Now, I don't think that will be the only mechanism for agentic commerce or the limit to what gets done, but it is sort of the first instantiation that we're seeing: just a difference in money movement, replicating human-agent money movement with machine agents. The other thing that's kind of interesting: those were all B2C, like consumer, examples. But just like you and I are spending a bunch more time in ChatGPT or Perplexity or whatever we like to use, developers are spending a lot more time in Cursor and various AI dev tools to code faster. And so another example of agentic commerce, which maybe isn't the first thing that comes to mind for people, is: you're in Cursor, you're building your product, you want to set up some front end thing, bot protection, whatever, something like Vercel.
Normally what do you do? You stop coding, open a new tab, go to Vercel, sign up, get your API keys, bring them back to Cursor. Total context switch. But now you can just buy Vercel from inside Cursor, right there in the code editor, so you don't have to break your flow. It saves time for the user. It also creates a whole new channel for Vercel and Cursor to sell software directly, right where the work is happening. So I mentioned in situ commerce for the consumer, but this is in situ commerce for the developer, or B2B, and Stripe enables those transactions too. So I don't know. I think there's a brave new world of agentic commerce, and who's doing the buying is different and where they're doing the buying is different. But there's a bunch of other stuff that's going to need to evolve too.
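The DoorDash-style constraints described above, one merchant category, a fixed cap, single use, map onto spending controls on a virtual card. A sketch of how those constraints might be expressed, using Stripe Issuing-style fields; treat the exact field and category names here as illustrative and check the Issuing API reference before relying on them:

```python
def single_use_card_params(limit_cents: int, category: str) -> dict:
    """Controls for a one-purchase virtual card: one merchant category,
    one spending cap per authorization. Field names follow the shape of
    Stripe Issuing's spending_controls; values are illustrative."""
    return {
        "type": "virtual",
        "currency": "usd",
        "spending_controls": {
            # Only merchants in this category can charge the card...
            "allowed_categories": [category],
            # ...and only up to the cap, checked on every authorization.
            "spending_limits": [
                {"amount": limit_cents, "interval": "per_authorization"},
            ],
        },
    }

# With the stripe library, an API key, and a cardholder, the call itself
# would look roughly like this (not run here):
#   import stripe
#   stripe.api_key = "sk_test_..."
#   card = stripe.issuing.Card.create(
#       cardholder="ich_...",
#       **single_use_card_params(2500, "eating_places_restaurants"))
params = single_use_card_params(2500, "eating_places_restaurants")
```

Whether the holder of the card is a delivery driver or an AI agent, the controls are the same, which is the point of the "swap out the human agent" mental model.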
B
And as you push the reasoning further and think of multiple agents that need to coordinate and everything happens through code, do you then get into a different world where Stripe needs to sort of behave differently?
A
I don't know if you know Daphne Koller, but she hired me at Coursera. She co-founded Coursera back in the day, and I remember talking to her in the parking lot one night. She was notorious for staying very late, so it was always dark when we talked in the parking lot, and also for driving very quickly, so you really had to get out of the parking lot before she got in her car. But she said this to me in the parking lot late one night. Everyone was so young, and like half those people ended up married to each other; we were there at all hours. Anyway, we were talking about, and this was literally 2014, where do we need to evolve the learning platform and teaching platform to be? And I think her words were something like: the first movie was just filming a play on stage, and then you think about today's latest Hollywood release, and it's this whole set of experiences that are only possible because it's on film. And the analogy she was drawing is: the very first MOOC, massive open online course, literally what we had in 2014, was recording Andrew Ng up at the front of his Stanford classroom, right? But today, companies like Coursera and Khan Academy have actually built learning experiences that are only possible because of the data, because of the technology, because of what you can do through this new medium. And just bear with me: I kind of think it's going to be the same for commerce, right? So the earliest versions of agentic commerce have looked a lot like swapping in an AI agent for a human agent, right? Like instead of the DoorDash driver doing it, the barista agent is doing it. And actually, we didn't talk about order intents, but one of the things that we're also enabling is right down to the agent navigating a web browser and filling out the human-optimized checkout form.
And that feels like a very reasonable place to start. But that's not what agentic commerce will be, right? Imagine you're no longer selling to a person who's scrolling through your site. You're selling to a piece of software that has already read all the reviews and has price-compared the market and is now in a hurry to tick payments off its list. That's how the AI agent is going to feel. And it's going to buy very differently from you and me. And so we're still working through a ton of this; a ton of this is yet to be built. But you asked what it's going to demand of Stripe, and I think there are some high-level design principles. The first is just that intent is the interface, right? Humans click around; agents just declare what they want. So in Perplexity, a traveler will type, find me a flight to New York under $300. But Perplexity is going to turn that sentence into a single JSON blob, like origin, destination, and budget cap, and fire that at the seller. And so every merchant API is probably going to need one canonical intent endpoint that accepts those structured desires, instead of this UI-click world we live in today. Second, I think it's pretty clear that product data is going to have to be machine readable. I don't know if you've ever played around with United's fare database. It is not perfect for humans, sometimes it's intentionally opaque for humans, but it is definitely useless for code. And so I think early adopters who want to sell through agentic channels are going to need to expose an open product schema: the SKU and the inventory and the price and the constraints, and maybe even the wedge you're willing to give to the agent facilitating the commerce. Like, that's not CSS, that's not JavaScript.
And then the agent's going to be able to run a SKU-level search and know with cryptographic certainty that, say, flight UA263 for $250 is still available. So I think that'll change. I think latency budgets are going to shrink to machine time. We talked about latency budgets in the context of the charge path, but people will wait three seconds for a spinner; I think an agent's just going to retry somewhere else after a couple hundred milliseconds. And so it's all going to have to be pretty fast. And then, we touched on this briefly, but a ton is going to have to evolve in the risk space. Today, human buyers think of themselves as owning their credentials: I own my card numbers. Credentials are going to have to move from being possessed, being owned, to being permissioned. Someone gets a one-time, scope-limited token to spend that $250 on United Airlines before midnight, and that token can't be replayed at another provider, and it evaporates after use, and it has all sorts of limits. Trust is going to have to be super programmable. Some developer IDE is going to be buying GPUs on behalf of 50 different startups and will want to attach a verifiable business profile, including a risk score, so the downstream sellers can accept or refuse the purchase. We're going to need a lot more observability. If you take the Hipcamp example, right, its camping bot should be able to book federal park campsites, but it also needs to expose real-time logs so that hosts can reverse anything that looks odd. And then, it's probably obvious, but good bots need to be very distinguishable from bad bots, and a lot of the classic fraud tools might mistake a good bot for a bad bot, or just consider any bot to be bad. Like, speed, data formats, auth are all going to change when the buyer's a bot.
And I think it's just going to require designing for intent, publishing those structured catalogs, signing and scoping every credential, and instrumenting everything. And then we're all going to have to teach the risk stack to tell the good bots from the bad.
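Two of those design principles, intent as the interface and permissioned, single-use credentials, can be sketched together. Everything below (field names, the token shape) is hypothetical, not a real Stripe or airline API:

```python
import secrets
import time

# "Intent is the interface": the agent sends a structured desire, not clicks.
flight_intent = {
    "origin": "SFO",
    "destination": "JFK",
    "budget_cap_cents": 30000,  # "under $300"
}

def issue_scoped_token(merchant: str, max_cents: int, expires_at: float) -> dict:
    """A one-time, scope-limited spending credential (hypothetical shape)."""
    return {
        "token": secrets.token_urlsafe(16),
        "merchant": merchant,
        "max_cents": max_cents,
        "expires_at": expires_at,
        "used": False,
    }

def authorize(token: dict, merchant: str, amount_cents: int, now: float) -> bool:
    """Seller side: accept only in-scope, unexpired, unused tokens."""
    ok = (not token["used"]
          and token["merchant"] == merchant
          and amount_cents <= token["max_cents"]
          and now < token["expires_at"])
    if ok:
        token["used"] = True  # evaporates after use, so replay fails
    return ok

tok = issue_scoped_token("united.com", 25000, expires_at=time.time() + 3600)
assert authorize(tok, "united.com", 25000, now=time.time())      # in scope
assert not authorize(tok, "united.com", 25000, now=time.time())  # replay blocked
```

The token carries its own scope (merchant, cap, expiry), so the seller can enforce the limits without trusting the agent, which is the "possessed to permissioned" shift in miniature.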
B
Fascinating. Where does MCP fit in that picture? MCP being the emerging agent-to-agent protocol, and Stripe was early in setting up your own MCP server. Where does that fit, and any lessons learned from your experimentation with MCP so far?
A
I think there are two bits. One is our own MCP server, and the other is how we enable MCP payments. They're different, but I think they're both interesting in their own right. On the former: we talked a bunch about commerce-related examples, but there are AI agents out there now, probably a greater number actually than commerce agents, that are helping you run your business. So not the transactional commerce part, but the running-your-business part, like doing the boring admin stuff you hate. So generating the invoices off of messy spreadsheets, and updating cards on file, and changing billing plans, and analyzing business metrics, and doing support stuff. And they're doing that without needing a human. MCP, the Model Context Protocol, is a critical enabler here. Yes, it can be agent to agent, but MCP can also be a translator between LLMs and SaaS APIs, more deterministic SaaS APIs. In the simplest version, the LLM reads a menu of tools, for example, a menu of Stripe tools. And then when you ask a question, the model picks the right tool and fills in the JSON, and then the MCP server fires, in this case, the actual Stripe call. It's the same principle as a browser hitting a REST endpoint, but the client is a bot instead of a person. What does this actually let you do today? Stripe's MCP server lets you do all the most common, kind of low-risk tasks that you can do on Stripe or through our API. So list customers, or create customers, or find your product prices, or spin up a payment link, or issue a refund, or pull up your balance. All of the kind of boring but essential stuff you do 100 times while you're wiring up your Stripe integration or trying to serve your customers. One of the most interesting use cases I saw recently was actually Decagon. Are you familiar with them? The customer AI company.
And in less than one week, one engineer at Decagon built an integration with Stripe through our MCP server that just lets Decagon's customer support agents securely access all of the info for their users. Right? So their users' invoicing info and subscription cancellations and whatever. So now Decagon's customer support agent can, on behalf of the business, find the business's customers' invoicing info, or cancel their subscription, or deliver their refund, directly from their customers' Stripe accounts. And the first Decagon customer they released this to reported a 65% drop in support costs. It's kind of striking how much of support is cancel my subscription, give me a refund, explain my invoice. Right? Stuff that actually can be done in a fully automated way if you have clean access to your Stripe systems. So where do we think this is going to go? I mean, it's pretty clear that MCP is becoming the default way that any single service, Stripe or GitHub or Notion, talks to an LLM. And so naturally I think MCP also needs to support monetization, which is why we've enabled MCP payments, so you can seamlessly monetize your MCP server using Stripe as well.
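The menu-of-tools flow described above, the model reads tool descriptions, fills in JSON arguments, and the server fires the call, can be sketched as a toy dispatcher. This mimics the shape of the idea only; it is not the MCP wire protocol or Stripe's actual server, and the tool names and handlers are made up:

```python
import json

# A "menu of tools" the LLM can read, with rough parameter specs.
TOOLS = {
    "create_refund": {
        "description": "Refund a payment, fully or partially.",
        "params": {"payment_id": "string", "amount_cents": "integer"},
        # Stand-in handler; a real server would call the actual API here.
        "handler": lambda args: {"refund_id": "re_demo",
                                 "amount_cents": args["amount_cents"]},
    },
    "list_customers": {
        "description": "List customers, newest first.",
        "params": {"limit": "integer"},
        "handler": lambda args: {"customers": [], "limit": args["limit"]},
    },
}

def call_tool(request_json: str) -> dict:
    """Server side: parse the model's tool call and fire the handler."""
    req = json.loads(request_json)
    tool = TOOLS[req["tool"]]
    return tool["handler"](req["arguments"])

# Having read the menu, the model emits something like:
result = call_tool('{"tool": "create_refund", '
                   '"arguments": {"payment_id": "pi_123", "amount_cents": 500}}')
```

The model never touches the API directly; it only fills in structured arguments, and the server decides what actually executes, which is what makes the "low-risk tasks" scoping possible.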
B
We previewed talking about the new AI economy earlier in the conversation. Stripe, for the reasons you describe, has a very unique vantage point into what companies do and their growth and all the things, and you release from time to time really interesting stats; perhaps we'll put some of those as a link in the show notes. To start at a high level, what do you see that's different in this generation of AI companies, from your vantage point?
A
One of the things, from my economist's hat, that I love about working here is just kind of this front-row seat to, hey, what's the growth trajectory of each successive wave of startups? And the current wave, of course, is AI. We work with AI companies across the stack. So when I talk about the AI companies on Stripe, you should think of this as everything from infrastructure and modeling to full-blown applications. OpenAI, Anthropic, Suno, Perplexity, Cognition, Eleven Labs, Decagon, Sierra, and a long tail of others. We recently looked at the Forbes AI 50, and 78% of them are Stripe users. That 78% reflects 100% of the Forbes AI 50 that accept online payments. And I think there's a lot of hype around AI tech, and fair questions around the monetization. And so we took a look at, hey, with this current wave of AI startups, what do we see in their monetization trends and in their growth trajectories? The long and short of it is they are monetizing super fast. They are monetizing faster than any previous generation of startups that we've seen. We focused in, just for concreteness, on the top hundred highest-grossing AI companies on Stripe, and we asked, okay, for the median in that cohort, how long did it take them to hit various revenue milestones, and what did their customer base look like, and what did their monetization strategy look like? Those that already hit 30 million in annualized revenue got there in about a year and a half. For comparison, and many of us were around five years ago, the fastest-growing SaaS startups on Stripe took five and a half years to hit that same mark. So this AI wave is scaling revenue at, you can think of it as, 3x the speed of the SaaS boom. And it's not just the big players. If you look at the newest AI startups, the ones just getting going, they're ramping even faster.
Of the ones that hit a million, the median gets there in five months; they're earning 4x more in their first year than peers who launched just a couple years earlier. I was at Stripe Tour Paris a week and a half ago and was looking at some of the European breakouts. Lovable, out of Stockholm, hit 50 million ARR in six months and is now for sure the fastest-growing startup in Europe. Cursor, which of course we mentioned earlier, helps developers code with AI. They only launched two years ago, and they recently announced that they're over 300 million in ARR. So just really astounding growth rates. That doesn't mean it comes without cost, including inference costs. But this is a real wave of businesses building real value in the market, else they wouldn't be able to monetize it. And they're doing that way faster than we've seen in any previous tech cycle.
B
You mentioned Paris and Europe. Do you find that those companies are global earlier in their life as well?
A
For sure, these AI companies are going global way faster than their predecessors. If you look at that AI 100 group, the median is in 55 countries in its first year and 80 countries by its second year. And that is twice the internationalization of equally promising earlier SaaS companies at the same stage of their evolution. And it's real money that they're getting cross-border. Today, these companies generate the majority of their revenue, I think the median is 56% of revenue, from international customers. Back to France: Photoroom is very cool. It's one of the darlings of France, the AI photo editor; it helps you clean up images. They went from, I think, 0 to 50 million ARR in three years. They already sell into 184 markets, and you don't have to go that deep into your geography background to know there aren't that many more markets to sell into. Right? Well, some of it is that they're selling infrastructure and models and digital art and music and stuff that just works across borders. Some of it is that LLMs are good at translation. But some of it, honestly, to our conversation earlier on the optimized checkout suite, is just that the bar to going global has gone down, right? So almost all of these guys adopt our optimized checkout suite. It comes with over 100 payment methods out of the box. That gives you global reach and conversion. But there's also all the hassle of global for managing tax and regulations, and a bunch of our solutions, like Stripe Tax, help these businesses scale up globally with very lean teams. Because that's another trend we didn't talk about yet: these folks are building very real businesses with 10, 20, 30 people, in a way that's quite striking and actually never been seen before.
B
And look, 100%. I mean, not that you need praise from me, but you guys should absolutely take a victory lap for enabling a whole generation of startups around the world. With a combination of AWS, Stripe, companies like Deel, and others, you can just launch your business globally in a few days after you incorporate the company. Which is insane, and which is partly the reason why you see this generation of companies growing so fast. Yes, AI is hot, but the enabling layer now exists in a way it didn't before. And I would add that on top of that you've got the global communication layer, where everybody, at least in tech, is on X, and this whole world of problems that were just abstracted away in a way that was completely unimaginable 15 years ago.
A
So I have a hypothesis about a second-order effect from that, which I haven't robustly validated, but I'm going to say it anyway because I'd love you to chew on it. Which is: an interesting corollary of being so global from day one is that today's vast Internet markets enable and reward specialization. The markets today are so much bigger than they were a decade ago. And correspondingly, what people are building with AI is starting to look a lot like what we saw with SaaS: first horizontal and now vertical, right? SaaS was first Salesforce and then Toast. AI was first broad tools like ChatGPT and then highly specialized, industry-specific applications in healthcare, in real estate, in architecture, in restaurants. But the switch from horizontal to vertical, which definitely happened in SaaS, happened so much faster with AI. And part of that is for sure that the models, we talked earlier about wrappers, enable these specialized products to spin up quickly and find product-market fit without having to invest a bunch in upfront research. But also I think the fact that they are global provides additional tailwinds, which is: when stuff is truly borderless, specialization is rewarded, because the markets are bigger, and so even a very specialized niche is a very large business.
B
Yeah, no vertical is too narrow when you can do it globally. Really interesting. I love that thought. I think part of it is also seeing the LLMs go up the stack and go from being foundation models to increasingly application companies covering a lot of the broadly horizontal stuff, which pushes people to the vertical aspect of things. But I love that thought, that being international in a given vertical makes your vertical market very large. What are you seeing in your world that's different with AI companies in terms of billing, pricing, business models?
A
Okay, so selling software used to be: you build it once, you incur a fixed cost of building it once. I'm slightly oversimplifying, obviously you continue to do R&D, but it is a high fixed cost of building, and then you sell it by seat over and over and over again at very high margins, because the marginal cost of providing the software is low. Okay, that is not true with AI. As products get more AI-centric, at least today, inference costs are more meaningful. And so companies are shifting away from this sort of per-seat billing. And by the way, if the AI does really well, there might also be fewer human users who need such seats, so it's not clear per-seat billing was going to get you the revenue anyway. But companies are shifting to usage-based billing, first to align pricing with costs. And then second, a trend that's earlier, but I think it's where the market equilibrium, where clearing will actually happen two, three, five years from now, is experimenting with new pricing models like outcome-based pricing. And actually increasingly using outcome-based pricing, which provides flexibility and really only charges you for the stuff that works, as a competitive differentiator. It can be hard to evaluate whether AI is going to work or not. And if you can go in and say, look, we're only going to charge you for what works, that is a much lower-risk proposition for the business than saying we're going to charge you per seat or we're going to charge you for usage. Right? It's like, well, what if I use it but it doesn't work well enough? Then I'm paying the inference cost, but it's not moving the needle for my business. And so Intercom is an Irish-founded company, and they're also reinventing customer service. There's a lot of interesting stuff in the customer service space, but they're moving their support product from charging per seat, right.
The olden-days model, which is how most SaaS is built, to charging per resolved case, which aligns incentives with their customers and is actually an outcome-based version of pricing. And so, just stepping back, AI is changing everything. It's increasing productivity. We think you don't want a pricing model that is static. You don't want a pricing model that depends on your customers hiring ever more people. You also don't want a pricing model that assumes near-zero marginal costs, given inference costs. And so we do see these businesses iterating very quickly to figure out where supply and demand intersect. And correspondingly, we're sort of arm in arm with them, working on our billing solutions, including usage-based billing and outcome-based billing, and really partnering with this current wave of AI startups to make sure that their pricing and monetization approaches (a) work for the market and (b) can be very fast-evolving and highly unconstrained.
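The three pricing models being contrasted, per seat, usage-based, and outcome-based, differ in what the customer is actually billed for. A toy comparison with hypothetical numbers:

```python
def per_seat(seats: int, price_per_seat: float) -> float:
    # Classic SaaS: flat per user, regardless of work done.
    return seats * price_per_seat

def usage_based(units: int, price_per_unit: float) -> float:
    # Tracks inference cost: the customer pays for every unit consumed,
    # whether or not it moved the needle.
    return units * price_per_unit

def outcome_based(attempted: int, resolved: int,
                  price_per_resolution: float) -> float:
    # Intercom-style per-resolved-case: failed attempts, where the vendor
    # still paid inference, cost the customer nothing.
    return resolved * price_per_resolution

# A support team: 10 agents, 1,000 cases attempted, 650 resolved.
print(per_seat(10, 99.0))              # 990.0
print(usage_based(1000, 0.50))         # 500.0
print(outcome_based(1000, 650, 1.0))   # 650.0
```

With outcome-based pricing the vendor absorbs the inference cost of the 350 unresolved cases, which is why it's the lower-risk proposition for the buyer and a competitive differentiator for the seller.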
B
So maybe as a last theme to close the conversation, a topic du jour is how companies use AI internally. It's a little bit of AI coding, vibe coding on the one hand, and on the other hand the Tobi memo about AI literacy, and then you saw Aaron at Box do the same, and the CEO of Zapier, and so on and so forth. How do you all think about this in terms of building or governing AI literacy inside Stripe?
A
For us, I think it really starts with a culture of experimentation. And I actually like to tell the story of how, back two years ago now, a couple of engineers hacked together a little internal beta of an LLM explorer. The basic idea was: hey, let's get a ChatGPT-like interface into the hands of thousands of talented Stripe employees and just have them figure out how to apply it to their work. And Stripe is coming into this from a long-running culture of bottoms-up experimentation, one that Patrick and John and leaders here have very intentionally crafted. We think a lot about sustaining experimentation and innovation internally as we grow. And so in the case of LLMs, for us this was like, hey, let's just quickly unlock internal experimentation. And obviously that needs to be done safely, right? People are going to experiment, the enthusiasm was palpable, and they had better not be doing it in their personal ChatGPT accounts, especially given the sensitivity of Stripe data. So, you know, we decided fairly early on to organize cross-functionally and just set up the tools and policies so that any Stripe employee could safely play with LLM capabilities. We also decided early on to decouple from any one model, because we saw the models evolving quickly. So the first version of this LLM Explorer had just GPT-3.5 and GPT-4, but today we serve dozens of models through the tool. We assumed collaboration: people are very social, and the returns you get from building something are almost never worth it if that thing only works for you. And so we enabled these things called presets, which are basically shareable prompts. And basically overnight, the Stripe community developed hundreds of these reusable LLM interaction patterns. And I think from there we were kind of off to the races, and we had a bunch more to do.
Like, hey, let's make sure that we build an LLM proxy: any engineer should be able to hit a standard API to get access to LLMs and build their production-grade applications. We actually only relatively recently GA'd an agent builder internally that hooks up to what we call Tool Shed, so it has access to the MCP servers for Google Cloud and Jira and Slack and whatever else. But it started with just a small number of engineers saying everyone at Stripe should have access to LLMs. They should be able to share what they build with LLMs, then they should be able to access those LLMs programmatically, and then they should be able to build agents on top.
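The pattern described here, one standard interface decoupled from any single model, plus shareable prompt "presets", can be sketched as follows. Every name in this sketch (`LLMProxy`, `make_preset`, the echo backend) is hypothetical and does not reflect Stripe's real internal tooling.

```python
# Hypothetical sketch of a model-agnostic internal LLM proxy, in the spirit
# of the tooling described above. Nothing here reflects Stripe's real
# internal APIs; model names and handlers are stand-ins.

from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class LLMProxy:
    """Routes requests to any registered model behind one standard interface."""
    _backends: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, model: str, handler: Callable[[str], str]) -> None:
        # New models can be added without changing callers: the
        # "decouple from any one model" property discussed above.
        self._backends[model] = handler

    def complete(self, model: str, prompt: str) -> str:
        if model not in self._backends:
            raise ValueError(f"unknown model: {model}")
        return self._backends[model](prompt)

# A "preset" is just a shareable, parameterized prompt template.
def make_preset(template: str) -> Callable[..., str]:
    def preset(**kwargs: str) -> str:
        return template.format(**kwargs)
    return preset

proxy = LLMProxy()
proxy.register("echo-model", lambda prompt: f"echo: {prompt}")

summarize = make_preset("Summarize this dispute in one sentence: {text}")
print(proxy.complete("echo-model", summarize(text="chargeback on order 123")))
```

The design choice worth noting is the registry: callers depend on one `complete` signature, so swapping GPT-3.5 for GPT-4, or adding dozens of models later, requires no changes downstream.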
B
Zooming out: anything you can talk about in terms of roadmap? What are you currently working on, and what should we expect in the next 12 to 18 months? Anything you can share?
A
I mean, it's a lot of going big on what we talked about today: deploying our foundation model across applications, building really robust risk-as-a-service, helping our users prepare for commerce in an AI era. I can't share any super specifics, but you can kind of see where we're headed with foundation models, with MCP, with order intent, with the Perplexity shopping example. And you can expect to see more of that from us in the coming months.
B
Brave new world. All right, thank you so much. This was fantastic. Love the conversation. Thank you so much for spending time with us.
A
Super, thanks for having me, Matt.
B
Hi, it's Matt Turck again. Thanks for listening to this episode of the MAD Podcast. If you enjoyed it, we'd be very grateful if you would consider subscribing, if you haven't already, or leaving a positive review or comment on whichever platform you're watching or listening to this episode from. This really helps us build the podcast and get great guests. Thanks, and see you at the next episode.
The MAD Podcast with Matt Turck
Guest: Emily Glassberg Sands, Head of Information, Stripe
Date: July 10, 2025
This episode features a deep-dive conversation with Emily Glassberg Sands, Head of Information at Stripe, exploring the company's AI-driven transformation, the rise of "agentic commerce" (software agents transacting on our behalf), and the broader implications for payments infrastructure and the AI startup economy. The discussion covers Stripe’s proprietary foundation model, its unique approach to AI, real-time infrastructure demands, business model evolution, the global spread of AI startups, and the emerging landscape where autonomous agents conduct commerce.
Stripe’s scale and impact:
"Businesses on Stripe grew seven times faster last year than the S&P 500." — Emily ([04:00])
Role of the Information Org:
Emily’s journey:
Why build a proprietary foundation model?
“Stripe is a little bit different. We have really differentiated data… OpenAI doesn't have that data.” — Emily ([09:58])
Language-like signals in payments:
Embeddings power unsupervised learning, overcoming label limitations ([13:55]):
"It doesn't require any labels... you can actually use all of the tens of billions of transactions." — Emily ([14:36])
Traditional ML vs. Foundation Model:
"Detection rate on large merchants went from 59% to 97%." — Emily ([19:43])
Development journey:
"Our first instinct was actually full on wrong." — Emily ([20:18])
Operational challenges:
Explainability and regulatory demands:
"LLMs are actually getting quite good at explainability... we will continue to use rules and models in parallel." — Emily ([25:51])
Where to apply AI?
Revenue generation through AI:
"Businesses that show at least one relevant payment method beyond just cards see like a 12% increase in revenue." — Emily ([36:55])
ML and data infra stack:
Real-time requirements:
"You can't have downtime... even Radar API downtime is super, super costly." — Emily ([41:35])
Definition and current applications:
"AI is no longer just about getting answers to your questions. It's starting to do things for you." — Emily ([43:17])
Behind the scenes:
Systemic implications:
"Humans click around, agents just declare what they want." — Emily ([49:50])
Fraud and risk changes:
What is MCP and why it matters:
“It’s pretty clear that MCP is becoming the default way any single service… talks to an LLM.” — Emily ([58:54])
Future:
Stripe’s vantage point:
“They are monetizing faster than any previous generation of startups that we’ve seen.” — Emily ([60:28])
Hyper-global from day one:
"They already sell into 184 markets... you don't have to go that deep into your geography background to know there aren't that many more markets to sell into." — Emily ([63:41])
Market specialization and business model innovation:
“Specialization is rewarded. Because the markets are bigger and so even a very specialized niche is a very large business.” — Emily ([66:49])
"As products get more AI centric… companies are shifting from this sort of per seat billing… to usage based billing… and experimenting with outcome based pricing." — Emily ([68:00])
“We enabled these things called presets, which are basically shareable prompts. And basically overnight the Stripe community developed hundreds of these of reusable LLM interaction patterns.” — Emily ([72:18])
| Topic | Time |
|---------------------------------------------|---------------|
| Stripe’s scale and infra | 00:00–04:06 |
| The Information Org's scope | 04:17–05:37 |
| Emily’s background and joining Stripe | 05:56–08:53 |
| Why Stripe built its own foundation model | 09:39–13:16 |
| Foundation model vs. traditional ML | 16:24–20:07 |
| Model development & challenges | 20:18–24:25 |
| Explainability and transparency | 25:24–28:37 |
| Where to use AI (Smart Disputes) | 29:18–34:10 |
| Revenue-generating AI (Optimized Checkout) | 34:32–37:46 |
| Stripe's ML/data infrastructure | 37:46–41:22 |
| Real-time and reliability considerations | 41:22–42:50 |
| The agentic commerce paradigm | 43:13–49:35 |
| Technical requirements for agentic commerce | 49:50–55:44 |
| MCP protocol and server | 56:02–59:30 |
| AI startup economy trends | 60:06–64:56 |
| Specialization, verticalization | 65:52–67:18 |
| Pricing and billing transformations | 67:58–70:57 |
| AI literacy & internal culture at Stripe | 71:32–74:04 |
| Roadmap and next 12–18 months | 74:04–74:44 |
This conversation is a must-listen for anyone tracking the intersection of AI, fintech infrastructure, and the next frontier of digital commerce. Emily brings clarity, specificity, and candor, with real-world examples from Stripe’s vantage point at the center of the new AI economy.