![[State of AI Startups] Memory/Learning, RL Envs & DBT-Fivetran — Sarah Catanzaro, Amplify — Latent Space: The AI Engineer Podcast cover](https://substackcdn.com/feed/podcast/1084089/post/186610557/d01dd7cecc754e03e5567d7830eb093e.jpg)
A
Okay, we're here with Sarah Catanzaro from Amplify. Welcome.
B
Thank you. First time on the pod. It took too long, I know, I know. We've known each other for so long. Yeah. Never made an appearance.
A
You also made the transition from data to AI, I guess. I don't know if I did. I don't know if you were always, like, as deep on AI, but obviously there's a lot of simpatico.
B
Yeah. I've always actually kind of oscillated between data and AI. Sure. Like, arguably, I started my career in quote, unquote, AI. It was just more like symbolic systems back then. But as you said, I think they're so symbiotic, it's almost hard to divorce them. That's actually what brought me into data. I was like, I. I want to better understand what happens when I write a SQL query.
A
Yeah. Let's briefly touch on data, because obviously that's where you and I first met. dbt-Fivetran. That was so cool. I mean. Or.
B
Yeah.
A
How do you think about the end of the modern data stack?
B
Okay, so a lot of people look at the dbt-Fivetran merger and talk about the end of the modern data stack, and I think that is a fundamentally wrong take. Both of these companies were growing, you know, very healthily. Both of these companies.
A
Did you fund dbt?
B
We funded dbt. So both of the companies were actually beating their revenue targets. I think what you're seeing is more an IPO environment wherein companies are expected to have far more than, you know, like, 100 million in revenue. And so what would you say to.
A
The bar is now 300?
B
No, like, above 600. Yeah, yeah.
A
And the combined company is 400.
B
I believe that they'll actually be close to 600. I don't have the exact number, but.
A
They're clearly just getting ready for IPO.
B
So, you know, basically, like, the merger was a way to accelerate that path to liquidity, as you might remember.
A
And they were the presumptive winners in their categories anyway.
B
Exactly, exactly. You know, I think one of the things that has actually pleasantly surprised me, and this speaks to, again, the symbiotic relationship between, you know, data and AI: many of the big frontier labs are actually using both dbt and Fivetran. I recall talking to folks at Thinking Machines, like, within weeks of the company's formation, and dbt was already an important part of their stack. Certainly, training datasets need to be managed. We need insight into what users are doing on these platforms. And in fact, the way in which you would analyze interactions with an agent or analyze interactions with an LLM is even more complicated. And so while I think perhaps the demand for analytics engineers, the demand for data scientists didn't explode in the way that some people thought, like analytics engineers are not one third of personnel, that doesn't actually mean that the demand for the tools is not still very prevalent.
A
But you got what you wanted. You wanted to democratize things, you got it.
B
Yeah, yeah, I mean, I guess we democratized things by perhaps reducing the need for the people. I don't know whether or not that is a good thing, but honestly I do think that the fact that it is easier than ever from a tooling standpoint for people to make data-driven decisions is probably a step in the right direction. And I've actually become convinced that while every company does need analytics engineers and does need data scientists, they probably don't need armies of them. And probably having a moderately sized data and analytics team is a good thing.
A
Yeah. So you touched on an interesting thing. I wasn't planning to ask, but this is interesting. So I come from the data field. Data was synonymous with analytics.
B
Yeah.
A
But you're now saying that dbt and Fivetran are being used for training data. Are there any notable differences in the workloads or the requirements?
B
Undoubtedly there will be. I mean, I think one of the things that we saw with analytics that was surprising to some of the people in the data infrastructure space was that the workloads were actually quite predictable. They were quite predictable because many of them were actually not being generated by humans, but rather by deterministic systems. So a lot of it was like BI dashboards, Tableau that is actually hitting your database, or maybe not Tableau, but like Looker or, you know, Hex or something like that. I think with analyzing, curating, preparing data sets, it's a bit more ad hoc, and so undoubtedly it will be less predictable. I don't know if that really changes the way that we approach developing data infrastructure. I've talked to some people who are still quite interested in things like learned indexes, learned optimizers, and it's a bit easier to build a learned optimizer if you have more predictable workloads. And so it could change the way that we approach things like that.
A
Yeah. Data catalogs, do they become more important? Are they transferred?
B
Oh man, like straight to the gut. So, so that was something I got wrong.
A
I, I'm sorry. I don't know the background. What, what did you, I, I just.
B
I really believed that data catalogs were going to become an important part of, you know, the modern data stack.
A
And the players are Atlan. I, I, she's Singaporean.
B
So I, yeah, yeah, there was data.world, Metaphor within our portfolio.
A
They've all struggled as a, have struggled.
B
A bit as a category. Many of them have been acquired subsequently, which suggests that this was not perhaps a standalone category. As a data scientist, I spent so much time working on data catalogs and so I kind of felt like this was the thing I wanted, I didn't want to have to build the, more.
A
To the point, also, pre-training data. You have a lot more heterogeneous data all over the place.
B
Yeah.
A
And like you need to keep on top of it and, and you need to make it discoverable, accessible and all that. So why didn't it work?
B
So I think there were a couple of things. I think we have seen some consolidation in the modern data stack, particularly around, you know, some of the key components. Whether it was, you know, Fivetran or dbt or, you know, Hex or, you know, Snowflake, many of these products offered kind of like data cataloging capabilities as a feature. And I think for humans that was good enough. Like the data catalog that you had available in Snowflake was good enough. The data cataloging capabilities available in dbt, like those were good enough. They did dbt.
A
Like obviously as they didn't build the cloud, they were going to build it.
B
Yeah.
A
What else do you do?
B
I mean, it's actually funny. In fact, my colleague Barr at Amplify was the product lead on these kind of metadata services. I think it's still not obvious to me, but I think one opportunity that might have existed and/or could have been realized was the opportunity to build data catalogs not for humans, but for machines. This would look a little bit more like metadata services. I don't just mean for agents, although I think that opportunity is arising more. But even like microservices and things like that.
A
Okay, yeah.
B
So I do wonder at times like if we built data catalogs for the wrong people and potentially even you know, for the wrong use cases. Like I think a lot of data cataloging companies ended up focusing on like discoverability when perhaps like the real market opportunity was in governance.
A
Governance, very important. Any other comments? Just about what you know so far about the data stacks of the large labs. I guess obviously a lot of data people who might be listening would want to sell into them.
B
Yeah, I mean, a couple of observations. One is that they are actually paying careful attention to their data stacks. I think they're thinking about problems ranging from data discoverability to data preparation to even things like the efficiency of data loading. If you're unable to load data to a GPU efficiently, then the GPU is going to sit idle and that's going to be kind of like. Yeah, yeah, exactly.
A
So what solution handles that? I don't actually.
B
I mean, I get to. To talk about. Yes, exactly. Plug my portfolio companies. We have a portfolio company called Spiral that has developed a file format called Vortex and they make data loading like super efficient.
A
Specifically GPUs.
B
Specifically to GPUs.
A
Okay, yeah, yeah, good to know.
B
One of the things that has surprised me though is actually that like so much data infrastructure has actually scaled quite elegantly to meet the AI use case. You would hope you would. But like, the scale of these AI companies, it's incredible.
A
It's not as big as ads.
B
Maybe. Maybe. Yeah. I think that could change as agents actually become kind of like more prevalent and are interfacing with each other and therefore perhaps like the number of transactions explodes. I have a friend who works on transactional databases at OpenAI and I was like, so you must be like building databases. This is like a paradigm shift in terms of the scale that databases are going to need to handle. And he's like, no, we use Rockset.
A
That's the one that they acquired, right?
B
Yes, exactly.
A
Yeah, yeah, very cool. Okay, let's just talk about funding around it because obviously that's like a big theme this year. What comes to mind in terms of looking back at 2025, what stands out?
B
It was crazy. Yeah.
A
You can give anonymized examples of what does crazy look like?
B
Yeah, I mean, I think crazy looks like raising upwards of $100 million seed. Like upwards of $100 million in a seed round where you have a long term vision but not a near term roadmap. Yeah, this is something that I'm seeing happening not just occasionally, but quite frequently.
A
Yes.
B
And it definitely makes me anxious because firstly, like when founders are asking me, you know, how much should I raise? I'm typically saying like three, like five. Well, like, what do you need to do? Like, what are your milestones for the next, let's call it like 12 to 24 months. What resources do you need in terms of, you know, head count, compute equipment to unlock those milestones and then like maybe add like a 20% buffer or something like that. But Doing that analysis requires you to like, understand what you're going to build in the next zero to. Let's call it like 24 months. I've talked to some companies and they're like, we're building a frontier lab for X. And I'm like, okay, cool. Like, I get the long term vision. There is an opportunity to, you know, make AI more secure, make AI more humane, make AI more data efficient, whatever it might be. So, so like I'm bought into the long term vision and that, that, you know, for me as an investor is super important. Like, so let's talk about like, what your team's going to work on in the next six months. They're like, maybe we might build a consumer app. Like, you know, we're feeling.
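The sizing rule described in this answer (cost out the milestones for the next 12 to 24 months, then add roughly a 20% buffer) can be written down in a few lines. This is only an illustrative sketch: the function name `suggested_raise` and every dollar figure below are hypothetical and do not come from the conversation.

```python
def suggested_raise(costs: dict[str, float], buffer: float = 0.20) -> float:
    """Sum the resources needed to unlock the milestones, then add a buffer.

    `costs` maps a resource category (headcount, compute, equipment, ...)
    to its estimated cost over the planning window.
    """
    base = sum(costs.values())
    return base * (1 + buffer)


# Hypothetical 24-month plan; all figures are made up for illustration.
plan = {"headcount": 2_400_000, "compute": 1_000_000, "equipment": 100_000}
print(f"${suggested_raise(plan):,.0f}")  # prints $4,200,000
```

The point of the exercise is the input, not the arithmetic: the calculation is only possible if the founder can list what will be built in the planning window.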
A
I know exactly the company you're talking about.
B
But, but, but like, I wish I was talking about like one specific company. I'm actually talking about like several companies. And look, I'd be a hypocrite to say that I've never done investments like that, but I've done investments like that when I really know the people. And I'm like, they're gonna figure it out. What is frightening about this funding environment is that you meet a founder, they're like, I'm raising $100 million. I'm raising like a billion dollars maybe at times. And you need to make a decision in seven days. And I can't tell you what I'm gonna do for the next six months. And so you have no way of even gaining conviction that they'll figure it out because you only have like seven days to get to know them. I think what some of the founders are missing is like, you only have seven days to get to know me. If you haven't figured it out, like, you probably want a partner who's going to be working closely with you to help you figure it out.
A
I mean, they're absolutely viewing it as transactional. Right. Like they don't care.
B
No, they care about, you know, the most money at the highest valuation. I mean, the crazy thing is that they don't even seem to care about dilution. It's just like the most money at the highest valuation.
A
Yeah. And, and, but you know, it does send a signal that helps.
B
So, I mean, yes, I think it does right now. Send a signal.
A
Okay. I'll tell you how it affects me. And I hate it. I hate it. Right. Antithesis came out of stealth this week. Right. And the only thing I know about them is they do something, something in AI testing and Jane Street led a seed round of $100 million.
B
We invested in it too. I can tell you what they do: they do deterministic simulation testing.
A
The thing that is the lead is the money.
B
Yeah.
A
And then like, okay, well who else uses it other than Jane Street? Like, what do you do that's innovative?
B
Palantir.
A
Okay.
B
WarpStream.
A
Anyway, so maybe Antithesis is a bad example because they're actually legit. But like, you know, there's a lot of similar examples where they just lead with the money and like, there's not much substantiation behind it. Maybe it's just bad storytelling. And that's why I, as a podcaster, get to talk to them. I just talked to General Intuition. And like, once you spend some time with them, then you're like, okay, this is why they raised $100 million. But like, without that context, it's really hard to understand anything.
B
Well, and like, I think there are some companies that are raising $100 million or more because they need it. Like, a good example might be like, Periodic, in addition to. Yeah. They need to build out a wet lab, and designing a wet lab that can support high-throughput biology, which is absolutely critical to, you know, their goals, that's costly. So, like, I understand why they need that funding, but again, there are others where they don't have these near-term milestones. I think the thing that is a little bit, you know, perturbing to me: many of them are doing it because it makes it easier for them to hire because, you know, there are all of these candidates who want to work at a company that is like a unicorn or a near unicorn.
A
They're pitching because the alternative is work at a big lab where, you know, it's. Yeah. The prestige and the money is there.
B
Yeah. Well, or the alternative is like work at like an early stage startup. But, but, but, but, but like, there's something about like the big valuation that becomes enticing.
A
Yeah.
B
They're also kind of pitching candidates. They, they have a compelling equity pitch where they're like, okay, maybe you're getting, you know, less than zero point, like 1% of the company. But like, given the valuation, the value of your equity is already, you know, like 10 million or something like that.
A
And, and they also guarantee the dollar value of the equity.
B
You, you, you mean that, like, they'll offer them a loan to, to pay.
A
A buyback if, if, if it goes. Yeah, if you want to sell it.
B
Yeah. But, but, but, but, but, like, because.
A
They have so much cash.
B
Like, but the thing though is that like the valuation is a made up number. Like valuation until a company exits, it is an entirely made up number. So like I could just be like, you know what, the Latent Space pod, that is worth $5 billion and we could agree, like we, like I as an investor could say like, that is the price. And now, now the company is worth $5 billion. Like, do you think that, like if you were to.
A
Yeah, it's not real. It's not, it's not actual in any volume.
B
And given the funding amounts that they're raising too, like if they spend that and they, you know, get acquired for less than that amount, then like their teams are getting nothing. I wish people were kind of like more sensitive to this and thinking more about like what is the upside associated with the company and, you know, more fundamentally, like, do I deeply believe in this vision? Because I think like joining companies because they have a billion dollar valuation, it's just, it's not the right way to choose a job.
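A small worked example makes the gap between paper value and payout concrete. The sketch below assumes a single investor class holding a 1x non-participating liquidation preference, which is a common structure but is not stated in the conversation; the function name `employee_payout` and all numbers are hypothetical.

```python
def employee_payout(
    exit_price: float,
    invested: float,
    investor_ownership: float,
    employee_ownership: float,
) -> float:
    """Payout to one employee, assuming a 1x non-participating preference.

    Assumption: investors take the greater of (a) their money back, capped
    at the exit price, or (b) their pro-rata share if they convert to common.
    Whatever remains is split among the other holders pro rata.
    """
    preference = min(exit_price, invested)
    as_converted = exit_price * investor_ownership
    investor_take = max(preference, as_converted)
    remaining = exit_price - investor_take
    # The employee's slice of everything not owned by the investors.
    return remaining * employee_ownership / (1 - investor_ownership)


# Hypothetical round: $100M raised for 10% of the company ($1B post-money).
raised, investor_pct, employee_pct = 100e6, 0.10, 0.001
paper_value = employee_pct * (raised / investor_pct)  # about $1M on paper

# Acquired for less than the amount raised: the preference absorbs everything.
print(employee_payout(80e6, raised, investor_pct, employee_pct))

# Acquired well above the round price: investors convert, equity pays out.
print(employee_payout(2e9, raised, investor_pct, employee_pct))
```

Under these assumptions, an exit below the amount raised leaves nothing for the team regardless of the headline valuation, which is the scenario described above.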
A
I hear you. Okay, so there, obviously we can go about that forever.
B
Oh yeah.
A
And there's, there's a lot of, there's also some stuff with like cyclical funding and all that stuff. But I do want to be more relevant to engineers and researchers. What are the themes that are really strong? So one thing I'll point out is world models just in general are a really strong bet, I would say. So every NeurIPS I go to this group of researchers and we take a vote on the top themes of the year. Everyone's extremely skeptical about world models. I think it's a trailing indicator because LLMs have been so enormously successful. You're like, I don't need anything else. I don't know if you have a take on world models or any other top theme of the year.
B
My take on world models is that we have not yet defined what a world model is.
A
Oh yeah, there's three definitions right now.
B
Yeah, I think there's a lot of confusion about what a world model is and therefore what it should be used for. We're already seeing plenty of market potential for video models, including for things as perhaps banal as video editing. I think, you know, we're already seeing some applications of world models to things like autonomous driving and potentially even coding. But again, it really hinges upon how you are defining world models. And I think one challenge that people have seen is that world models perhaps designed for one specific use case might not generalize to others. So as an example of this, world models for video game generation might not generalize to factory settings or robotics. I use the word "might" strategically because I think it is potentially a research problem that might be figured out.
A
Yeah. So, yeah, that's part of the General Intuition podcast that we did, is that they have some evidence.
B
Yeah, yeah, I think, like, it is possible. It's just we're not there yet today.
A
Yeah.
B
A theme that I've been spending a lot of time thinking about is memory management and continual learning. I work with a lot of same startup. Okay. I think I know what startup you're thinking about as well. But I actually like. I see like, a lot of market potential for memory management and continual learning. My interest in this is actually more driven by conversations with practitioners. Personalization is so important right now. I think what we're seeing is that, like a lot of AI application companies, they're growing really quickly, but they suffer from, you know, relatively low retention, relatively high churn. So if you're developing an app like Cursor, how do you ensure that your users don't switch over to Windsurf. Yes. Or Claude Code or Cognition or whatever else when they release new features?
A
Yeah. Cursor rules isn't enough. Right. It's like the shittiest form of memory.
B
Yeah.
A
And it's great. But. Yeah, I agree with that. But also it's like, as a. I've publicly mused about this before where memory is very poorly implemented today in a lot of surfaces. Like, even ChatGPT. I wouldn't say people are particularly excited about it. Okay. All right.
B
Yeah. Yeah.
A
You feel stronger about it than I do.
B
Yeah, yeah. I mean, I wish ChatGPT had much better leading one.
A
I don't know. So. And then I think, like, just in general, it makes product management harder because what is the product? It's a combination of U plus memory and like, when you have a bug, is it the memory or is it something core? And that's as a user, especially if it's consumer. It's. There's going to be zero patience for any of this.
B
I agree. But that said, like, consumers seem to be, like, tolerating products with, like, no implementation of memory today. So I think, like, better is still probably better than, like, what, what. What exists now. Better is better than nothing, I guess.
A
Would you agree with the statement that basically, let's say a key theme of 2026 is this personalization? I would call it kind of like the consumerization of AI in the Same way that consumerization of enterprise was a trend like 10 years ago.
B
Yeah, I mean, I think that is a good way of putting it too. Like, I don't think, for what it's worth, that this is just a consumer or prosumer phenomenon. If you are an enterprise that is adopting, again, like a Devin or Augment or something like that, you probably also want your models to kind of like learn the.
A
Like, I'm not little learned. Yeah, like you start to like K factor. I had to explain what that is to so many founders and you know, like this, these, like, if you're in normal SaaS, this is what you obsess over. And to AI founders, they're like, what do you mean? Growth doesn't just show up like.
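For readers who have not met the term, the K factor mentioned here is the viral coefficient from consumer growth analysis: the average number of invites each user sends multiplied by the fraction of invites that convert. The sketch below uses that standard definition; the function names and the sample numbers are hypothetical and are not from the conversation.

```python
def k_factor(invites_per_user: float, conversion_rate: float) -> float:
    """Viral coefficient: new users generated by each existing user."""
    return invites_per_user * conversion_rate


def users_after(initial: int, k: float, cycles: int) -> float:
    """Total users after a number of invite cycles, ignoring churn."""
    total = float(initial)
    newest = float(initial)
    for _ in range(cycles):
        newest *= k  # each new cohort recruits k users per member
        total += newest
    return total


k = k_factor(invites_per_user=4, conversion_rate=0.1)
print(k)                        # below 1.0, so growth from referrals tapers off
print(users_after(1000, k, 3))  # cohort sizes shrink each cycle when k < 1
```

A value above 1 means each cohort is larger than the one that recruited it; below 1, growth has to come from somewhere other than referrals.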
B
Yeah, yeah. I mean, it has though, but I think it has because for a while, you know, AI has just felt magical. But now we're getting more accustomed to the magic and it's no longer enough. And I think, you know, we need to revert to some of the old tips and tricks for retaining people and, you know, bringing them in. Personalization is one of them. I always kind of intermingle memory and continual learning because I think one interesting element of personalization is not just learning facts about you or your preferences, but actually learning new skills from interactions with you and, you know, learning as the world changes. Like there are new versions of languages and frameworks and, you know, other repos that are coming out all the time. The world is changing all the time. Human intelligence is incredibly dynamic. And yet artificial intelligence is just so static today.
A
Yeah, but like, so it must update weights. Yeah, for you.
B
But, but, but that also means that like, it's an interesting kind of like systems problem because like, if you must update weights, then like, you know, weights become stateful and today like inference is not stateful. So. So, you know, I think, I think there's going to be like a lot of kind of fun, gnarly problems to figure out as we figure out things like personalization and continual.
A
That's also a fascinating infrastructure problem because you have to load and unload and you know, cache and all the, all the good stuff.
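One way to picture the load, unload, and cache problem raised here is a least-recently-used cache of per-user weights sitting in front of slower storage. The sketch below is purely hypothetical: `AdapterCache`, the `load` callback, and the idea of storing weights as bytes are all assumptions for illustration and do not describe any specific system mentioned in the conversation.

```python
from collections import OrderedDict
from typing import Callable


class AdapterCache:
    """LRU cache for per-user weights (hypothetical illustration)."""

    def __init__(self, capacity: int, load: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.load = load  # stand-in for fetching weights from slow storage
        self.loads = 0    # counts cache misses, i.e. expensive loads
        self._cache: OrderedDict[str, bytes] = OrderedDict()

    def get(self, user_id: str) -> bytes:
        if user_id in self._cache:
            # Cache hit: mark this user as most recently used.
            self._cache.move_to_end(user_id)
            return self._cache[user_id]
        # Cache miss: load the weights, then evict the least recently used.
        weights = self.load(user_id)
        self.loads += 1
        self._cache[user_id] = weights
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)
        return weights


cache = AdapterCache(capacity=2, load=lambda uid: f"weights-for-{uid}".encode())
for uid in ["alice", "bob", "alice", "carol", "bob"]:
    cache.get(uid)
print(cache.loads)  # prints 4: "bob" was evicted by "carol" and reloaded
```

In a real serving stack the eviction policy, the unit being cached, and where updates are written back would all be design decisions; the sketch only shows why stateful weights turn inference into a caching problem.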
B
Yeah.
A
Exactly. One more thing. I think we have time for one more take on RL environments. Huge topic. Is it just a Docker container with some custom software loaded and logging stuff out? What are the good ones like and what are the average ones like?
B
So I know I'm going on record on this and like, I'm actually Okay to be wrong, but I think RL Environments is just a fad. Oh God.
A
Oh no. They're all, they're all fake. I mean like, I mean people like, like. Okay, the thing that makes me take it seriously, the labs I know are paying 7, 8 figures for RL environments for other like. And they could build it in house. They're not and I don't understand why.
B
I mean they were paying seven to eight figures for like piss poor data annotation too.
A
Yeah.
B
And then data labeling before. Like the labs have a lot of money. I think perhaps RL environments could create some value in the short term. But I think to the point about what makes a good RL environment and what makes a bad RL environment, I think the best RL environment is, you know, the real world. Why would I, you know, want to buy a DoorDash clone when I can just use logs and traces from, you know, DoorDash itself? It doesn't mean that we don't need.
A
To spend in parallel.
B
Yeah, I mean, I think using the real world, using real apps as an RL environment is in fact the best thing. And this is what Cursor does. They actually do use, you know, real user activity on their platform to significantly improve both their coding agents as well as Tab. And I think that's one of the approaches that has made the platform so compelling. You still need to figure out the right rubrics. You still need to figure out the right set of tasks. So there are some aspects of RL environment design, you know, at least as we're talking about it today, that I think are going to remain incredibly relevant. But just building a clone of an app I think is not that useful.
A
Yeah, yeah, okay. Yeah, that is a hot take. We have maybe three minutes for any other stuff that you think about. Just the state of startups in general, state of funding.
B
Yeah. So maybe I can talk about like just the archetype startup that is like most exciting to me.
A
Yes. Restartives.
B
Yeah. Yeah. I love investing in, you know, infra tools, platforms, et cetera. And as we talked about with continual learning, I think there will be opportunities for new tools, platforms and infra in the future. I've spent a lot of time thinking about applications today and specifically the relationship between research and applications. As an example, I think there were a lot of advances in RAG and the biggest beneficiaries of these advances were the application companies for whom, you know, retrieval was a critical unlock. So as an example of this, you know, like Harvey, Hebbia.
A
I knew you were going to say Harvey.
B
Yeah, I mean, they have really interesting RAG implementations. They have hired researchers, like really good researchers, to kind of advance the state of the art, and that enables them to build a better product. I feel this way very much about rule following and customer support. Rule following is a hard research problem, but if you solve rule following, then you unlock, you know, better customer support. And I think a lot of Sierra's success can be attributed to their focus on this. So I've been thinking about, even for something like continual learning or memory, what is the killer use case, where you can either offer a dramatically better experience by having a good memory implementation, or you can do something that was just not possible today. I think you can also think about this in the inverse. And often the best companies emerge in this way. They're like, I'm trying to do this thing, but in order to actually do it, I need to solve this hard technical problem. That's kind of like the story of Runway. I don't think they would have built models if they didn't have to, but I love that combination of we're delivering something that is better for consumers, better for users, but we're doing so by solving these really gnarly research and engineering problems.
A
Yeah, I don't want to. Yeah, there's so much that I want to sort of dig into there, but we're short on time.
B
Just.
A
Thank you. In general, I don't know if you have a general call to startups for a page somewhere that you can. You want to point people to Twitter.
B
X, whatever it's called. Yeah, you can find me there or in South Park with the one-eyed dog. I'm easy to spot.
A
Okay, well, thank you so much for your time. I know you got to go, but appreciate it.
B
Of course. It was great seeing you and thanks for having me.
A
Yeah, thanks.
Episode: [State of AI Startups] Memory/Learning, RL Envs & DBT-Fivetran — Sarah Catanzaro, Amplify
Date: December 30, 2025
Guest: Sarah Catanzaro (B), Amplify
Host: Latent.Space (A)
In this episode, Sarah Catanzaro, partner at Amplify, offers her perspective on the evolution of the data and AI startup landscape in 2025. The conversation explores the intersection of the modern data stack with AI workflows, the reality behind high-valuation funding rounds, trends in memory and personalization, and candid takes on "hot" areas such as RL (Reinforcement Learning) environments. Catanzaro shares both insights and provocative viewpoints, making this episode a lively resource for practitioners, founders, and investors following rapid changes in the field.
"Many of the big frontier labs are actually using both dbt and Fivetran..."
— Sarah Catanzaro [02:17]
"I do wonder at times like if we built data catalogs for the wrong people and potentially even for the wrong use cases."
— Sarah Catanzaro [07:59]
"It definitely makes me anxious… when founders are asking me, 'How much should I raise?' I'm typically saying, like, three, like five..."
— Sarah Catanzaro [10:57]
"The thing though is that like the valuation is a made up number... until a company exits, it is an entirely made up number."
— Sarah Catanzaro [16:04]
"Personalization is so important... AI application companies are growing quickly, but they suffer from relatively low retention, relatively high churn."
— Sarah Catanzaro [19:56]
"We have not yet defined what a world model is... world model for video game generation might not generalize to... robotics."
— Sarah Catanzaro [17:45; 18:47]
"I think RL Environments is just a fad... The best RL environment is... the real world."
— Sarah Catanzaro [23:41; 24:49]
Sarah Catanzaro brings sharp, experience-driven commentary to the current and future state of AI startup infrastructure, funding trends, and product innovation. She argues for building infrastructure and products that deeply integrate hard research problems, rather than chasing hype cycles or inflated valuations. For founders and engineers, her advice points toward investing in memory/personalization, leveraging the real world as a training environment, and staying focused on applications that genuinely need cutting-edge science to succeed.
For more resources and full show notes, visit: latent.space