
A
Okay. Hi.
B
So this is another lightning pod, with Joe Christian Burgum. Did I get it right? You're over in Norway?
A
I'm over in Norway. Trondheim, Norway, in the center of Norway, yes.
B
What should people know about Trondheim?
A
It's a small city, it's easy to get around. There's a great technical university here. The climate sucks a little bit, but it's easy to get things done in the winter.
B
So, yeah, I've never been over. I've been to Øredev, I think, which is over near you guys. But what we're here to talk about is generally your hot takes on RAG, search, vector databases, all that stuff. I think you've taken to publishing a lot more recently on X and that's gone really well. So I'll just go into the main thing that everybody knows you for, which is your piece on the rise and fall of vector databases. Maybe give us the background of why you felt compelled to write this.
A
Yeah. First of all, I have to go a little bit back, right? I have a long background in search and working on infrastructure for search. I've been working on search systems for 20 years, at Yahoo, and also Fast Search & Transfer here in Trondheim, Norway, and also working on embeddings, neural search, all of those things. Right. Leading up until the ChatGPT moment, November 2022. And then there was some kind of cookbook, I think, from OpenAI where they said, okay, this is how you can connect ChatGPT with your data, and here's embeddings. And I think then a lot of developers got into: this is how we can build search, this is how we can do RAG. And I think there was this unnatural connection, meaning that retrieval in RAG had to be vector embeddings.
B
By the way, small role in that. I actually was the one who wrote the Chroma example in the OpenAI Cookbook. You did?
A
Okay.
B
I was an angel investor in Chroma before they became a vector database. And then I was just helping out.
A
I'm actually a huge fan of Jeff and Anton from Chroma. I think Anton left, but they've done a great job at promoting retrieval for AI and infrastructure and did a lot of great things. So I really enjoy talking to them on X anyway. And then we had the whole vector database thing. I think Pinecone was one of the pioneers framing it as a new infrastructure category: if you need to work on embeddings, you have to use a vector database. And naturally then, if you want to do anything in AI, you need to have a vector database. That was my primary motivation for writing that piece: looking a little bit back at what happened, where we are now, and how I see it. So that was the pure motivation.
B
Okay. And the general thesis, I guess, if you wanted to recap that: it was a very fast rise and fall. Pinecone was a dominant player for a long, long time. And I don't know my exact sources because there's a lot of rumors going back and forth, but apparently they went up to like $100 million ARR very, very quickly. They raised a big round. And then suddenly a lot of people started leaving; it went from cool to uncool very quickly. And I don't understand why.
A
I don't understand that either. And I think also they repositioned a little bit, going back to their core messaging. If you go to their website now, it looks more developer focused. It's not the memory for AI, it's not enterprise-ish, it's more towards developers now. So I take it that they're trying to go back to their original roots, and I think that's a good thing. But also, of course, there's been a lot of competition in this space, a lot of new companies. One of the upcoming stars is turbopuffer: kind of the same SaaS model, a little bit different pricing, and they really talk to developers. And I'm not saying that the companies are dying, right? I'm saying that the separate infrastructure category is dying, because you have vector search capabilities in almost any DB technology nowadays, and you have it also in more traditional search engines like Elasticsearch, Solr, Vespa. So there's convergence on features on both sides. And then you have things like pgvector in Postgres. A lot of people get confused: okay, I already have a DB, it has vector search, why do I need another DB, like a vector DB? So the whole database concept. There's lots of great technology here, don't get me wrong. I'm not saying that the companies are dying, but I'm saying that the category is dying. There's this distinction, and I think a lot of people overlooked that and came at me, because they had some kind of hate around some of these companies, and said, yeah, you know, fuck Pinecone or whatnot, right? But I'm actually saying that the category is dying, and I actually want to call these new companies what they are: search engines. And I want to go back to that, because I think that's a more natural abstraction for connecting AI with knowledge, and for all the arguments for doing RAG, the natural concept there is search.
And one of the insights I have: I use Windsurf a lot. I love Windsurf. There's the Cascade mode, and if you ask it what tools it has available, it lists like 17, 18 tools, like edit files. But there are also things like search code base, search the web, grep, and these are search abstractions, right? And I love that idea, where you just connect the reasoning model with these tools that are essentially search tools, and that can help the agent or the LLM to actually formulate the query. Should I do a grep, or should I do more of a semantic search, or more of a keyword search, or should I just search the web? So I think that's the more natural abstraction, instead of jumping into vectors. How you represent that is more of a detail of how you implement search.
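The "search as a tool" idea described above can be sketched minimally. Everything here (tool names, the routing heuristic) is illustrative and invented, not from Windsurf or any real agent framework; a real agent would let the LLM itself choose which tool to call.

```python
# Toy sketch: three "search tools" and a router, illustrating the abstraction
# where the model picks between grep, keyword, and other search modes.
import re

def grep_tool(pattern: str, files: dict[str, str]) -> list[str]:
    """Exact regex match over file contents, like a grep tool."""
    return [name for name, text in files.items() if re.search(pattern, text)]

def keyword_tool(query: str, docs: dict[str, str]) -> list[str]:
    """Naive keyword search: rank docs by query-term overlap."""
    terms = set(query.lower().split())
    scored = {name: len(terms & set(text.lower().split()))
              for name, text in docs.items()}
    return [n for n, s in sorted(scored.items(), key=lambda kv: -kv[1]) if s > 0]

def choose_tool(query: str) -> str:
    """Toy router: a real agent would let the LLM pick the tool."""
    if any(ch in query for ch in "()[]\\.*"):  # query looks like a regex
        return "grep"
    return "keyword"

files = {"a.py": "def load_index(): ...", "b.md": "hybrid search with bm25"}
print(choose_tool(r"def load_\w+"))       # routed to grep
print(grep_tool(r"def load_\w+", files))  # ['a.py']
print(keyword_tool("hybrid search", files))  # ['b.md']
```

The point is that "grep", "keyword", and "semantic" are interchangeable behind one search interface, which is what makes them natural tools for an agent.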
B
It's interesting that we fixated a lot on vectors, like dense embeddings and all that, and I think now we're broadening out. I will also mention that Chroma, I think, has always said from the start that they're going after information retrieval and not so much the narrow sense of RAG. I think broadly this is the consensus.
A
That.
B
The category was never really going to last that long. There was just a brief period of time. One of my favorite early tweets in AI, in this post-ChatGPT phase: I summed up all of the fundraising that happened in vector databases for 2023, and it was something like $230 million put into all the vector databases. And that was more than the entire lifetime fundraising of MongoDB.
A
Right.
B
So basically they cannot all win, because they've already taken more money than supported one of the de facto winner companies in NoSQL.
A
Yeah, interesting. I think also on MongoDB, right, they brought a new category, NoSQL. But nowadays all the other database players have caught up, and now even MongoDB has relational SQL. So there's always this convergence, but MongoDB kind of sticks. I don't think that Pinecone, which was originally leading that movement, will stick in the same way. It's too narrow. But I would like to say one more thing about embeddings. People say, okay, Joe, but embeddings are really important, and I also think embeddings are really important, because you can represent more data than ever before, multimodal, whatnot. Run it through a neural network, get an embedding representation, and then you can move this embedding representation around in vector space and adjust it to your domain or whatever you're doing. So it's really important. But what happened was that it went mainstream. The big tech companies like Google, Yahoo, Facebook, all of them, had been working on embeddings for a long time for a lot of different tasks, but post-ChatGPT, with the embedding APIs from OpenAI, it suddenly became mainstream. Every developer could start using embeddings, similarity search and so forth. So I'm not against embeddings; embeddings are here to stay. It's just that it's not only about similarity search in this kind of embedding space. And then I think more people actually realized that you need something more than just an embedding and a cosine similarity to do search well: things like freshness or authority and all the other signals that really play a role in web search. And I remember one of the OpenAI guys wrote, you can embed the whole web and then you can build the next generation of web search.
And I thought, okay, just looking at semantic similarity, that's not going to play out too well. So yeah, I mean, they're trying to.
B
Sell you their model, right? So they're going to say those very hypey things. Yeah, the way that I put it is: you're always going to want to do a hybrid query, you always want to add metadata and do all that stuff. I think my question to you is maybe an ageless question, which is: should they all be the same system? With a search system like Elasticsearch, you typically duplicate whatever your main store of record is, and then you have that search index that is basically almost a complete duplicate; you just copy over the documents. Do you believe in that? Do you think there's a convergence here?
A
This is a fantastic question. For a lot of use cases, if you're already using some database like Postgres, it has this great extension, pgvector. And I know that I tweeted things about pgvector that were true at the start, around its limitations, but there was a rally around pgvector: adding new algorithms, actually introducing two algorithms, both IVFFlat and HNSW, adding halfvec, adding binary vectors. So what you can see is pgvector doing more in the capabilities of vector search than some of the real vector database players. So if you're only looking at vector search capabilities, you already have your data in Postgres, and you're operating at a reasonable scale, I think it's fair to use Postgres, or one database. If you're not operating at a really large scale, you have some vector-search-related workloads, and you also use the database for other types of workloads, then it might make sense to just keep the data there. But if you're actually building something that really depends on search quality, and your business depends on it, then definitely I think you should consider using a real retrieval or search engine to represent the data there. Yeah.
B
And how closely entwined are recsys and search in your mind?
A
Yeah. And that's the thing with embeddings, right? Embedding-based retrieval has been used for a long time in recommender systems, large-scale recommender systems like, you know, TikTok or Yahoo News or things like that.
B
Apparently TikTok published their recsys recently, which is kind of interesting.
A
Yeah, it's a cascade. In systems that operate at a really large scale, there's always a cascade of different stages, where you first retrieve over the candidate pool, typically using embedding-based retrieval, and then you have re-ranking layers, so that finally you end up with a hundred candidates or something like that that you actually present to the user. So definitely there's convergence, in that embedding-based retrieval is now also common for search systems. So there's convergence there in how it's actually solved on the technology spectrum.
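The cascade described above — cheap embedding-based candidate retrieval over the whole pool, then a more expensive re-ranking stage over the survivors — can be sketched with toy data. The vectors and the "expensive" scorer here are invented stand-ins:

```python
# Two-stage cascade: stage 1 scores everything cheaply, stage 2 applies a
# costlier scorer only to the k candidates that survived stage 1.

def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_vec, corpus, k):
    """Stage 1: cheap similarity over the whole candidate pool."""
    scored = sorted(corpus.items(), key=lambda kv: -dot(query_vec, kv[1]))
    return [doc_id for doc_id, _ in scored[:k]]

def rerank(query_vec, candidates, corpus, expensive_score, k):
    """Stage 2: costlier model applied only to the stage-1 survivors."""
    ordered = sorted(candidates, key=lambda d: -expensive_score(query_vec, corpus[d]))
    return ordered[:k]

corpus = {"d1": [0.9, 0.1], "d2": [0.4, 0.6], "d3": [0.1, 0.9]}
q = [1.0, 0.0]
cands = retrieve(q, corpus, k=2)            # ['d1', 'd2']
final = rerank(q, cands, corpus, dot, k=1)  # ['d1']
print(final)
```

In production the stage-1 scorer would be an ANN index over embeddings and stage 2 a cross-encoder or learned ranker, but the shape of the pipeline is the same.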
B
Yeah, yeah. Any other thoughts on, I guess, the confusion for a lot of folks who are newer to this? They understand now that you cannot have embeddings only and cosine similarity only. It's just the sequencing: what should I do first, what second, what third? Everyone says re-ranking is super important, but it adds maybe 3 to 4% to your results, and maybe that's the lowest-hanging fruit. So I'm always trying to figure out what I should recommend people start with: like a Postgres or MongoDB as their transactional and, initially, vector store, then they can split it out to maybe use Elasticsearch or Vespa. I don't know if that was the recommendation there. Redis, I think, is also trying to push themselves there very, very hard. And then you add the recsys. Is that a good sequence?
A
I think it's really hard to come up with general recommendations without knowing what you're doing. But if you're looking to build a RAG application, and I think most people are interested in something related to RAG, you have some data and you have to transform it. I think it's Hamel who always talks about look at your data; everyone's talking about look at your data. So first of all, get your data in a cleaned-up way, if it's PDFs or whatnot. Then I think a very strong baseline is the classical BM25 algorithm that's been around for 30 years. It's keyword matching, but it offers a very useful baseline for a lot of different search use cases. Then you can start looking at using an off-the-shelf embedding model, and more or less all of the engines have some kind of hybrid search capabilities, so start to play with that. And then, if you can afford it both from a latency perspective and a cost perspective, you can look at adding a re-ranking layer on top of that. How you stitch that together depends on your framework of choice, but I think you can stitch this together with multiple different APIs, depending on your budget. Yeah.
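As a rough illustration of that sequence, here is a from-scratch BM25 baseline combined with a second ranking via reciprocal rank fusion, a common way to get hybrid search. The parameters `k1=1.5`, `b=0.75` and the RRF constant `60` are conventional defaults, not tuned values, and the "semantic" ranking is faked rather than produced by a real embedding model:

```python
# BM25 keyword baseline plus reciprocal rank fusion (RRF) for hybrid search.
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every doc against the query with classic BM25."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores

def rrf(rankings, k=60):
    """Fuse multiple ranked lists of doc ids with reciprocal rank fusion."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] += 1.0 / (k + rank + 1)
    return [doc_id for doc_id, _ in fused.most_common()]

docs = ["sparse keyword retrieval with bm25",
        "dense vector embeddings for search",
        "cooking pasta at home"]
kw = bm25_scores("bm25 retrieval", docs)
keyword_ranking = sorted(range(len(docs)), key=lambda i: -kw[i])
semantic_ranking = [1, 2, 0]  # pretend output of an embedding model
print(rrf([keyword_ranking, semantic_ranking]))
```

A re-ranking layer would then be applied to the top of the fused list, exactly as described above.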
B
And you know, I always tend to recommend people to do this offline as much as possible, like batch offline, whatever. Most people don't need fully online systems.
A
Yeah. And that's a friction point, because I've been used to working on constrained online systems at pretty significant scale, where everything is online and needs to be low latency, and I have problems adjusting when you want to do things at a much lower scale. So I'll give you an example: calling out to some embedding API to get JSON floats. It's not something you want to do if you're running at thousands of QPS. You don't want to add that dependency; you want something local, something that is faster. So I've always been like, okay, I'm going to call out to this endpoint, it's going to take 300 milliseconds to get these floats, and I shrug at that. But now I'm shifting towards: it's easy, it's an API-based service, you don't have to think about it, it's just there. It's much easier to build from something that is API-based. So I'm trying to embrace that.
B
I see, I see. No, when I say offline, I mean more like not in the critical path, like batch systems. And it's interesting. I don't know if you've looked at PostgresML for running the models alongside the database. Are you bullish on that kind of stuff?
A
No, I'm not. I'm sorry, I'm not. We've also seen other players that try to move a lot of the logic into the database: embedding inference, agentic stuff, LLMs, whatnot. I think the right direction is to keep that infrastructure a little bit separate, because they have different scaling properties. People can stitch those two things together instead of trying to do everything with one single platform. So no, I'm not bullish on that, because I don't believe in the developer experience of writing these huge SQL statements for transforming data, embedding it, and writing it back, and expressing all of this in the database. What does this do to my database? Is it calling out? What's going on? I tend to want more control over cost and performance than just writing some really large SQL to execute.
B
Yeah, it's interesting. I think there's this constant tension between what should live in the database versus what is an external system. I don't think it's clear cut, like the classic cron service, which we have in Supabase. Okay, so cool. Any other hot takes? What are the biggest criticisms that you got after you published this? What do you agree with? What do you disagree with?
A
Yeah. If something goes semi-viral, after a few days you discover there are a lot of replies that you didn't see. But one of the things that stood out was that people said Joe is saying RAG is dead because vector database infrastructure is dead. And I think that was a misunderstanding, and it comes from people making the connection between RAG and vector databases so strongly. So when I say the vector database infrastructure category is dead, it reads as: RAG is dead. And RAG is definitely not dead. Augmenting AI with retrieval or search is still going to be relevant, and I think it's going to be relevant for a very long time. So that was one of the things.
B
I mean, the.
A
And I saw that, you know, now we have 10-million-token context models, and the same cycle repeats every time.
B
Every time. For me, I put out this cryptic tweet: Llama 4 is going to reignite the long context versus RAG debate. And it will actually resolve the debate, but not in the way that you want.
A
Hold on, this is too cryptic to me.
B
No, it's just that there are like five other guys saying, oh, long context kills RAG, RIP RAG. I'm just like, guys, you're idiots, or you're basically engagement farming.
A
Most.
B
Most likely they know what they're doing and they're just saying nonsense to have fun, and people who don't know take them seriously.
A
Yeah. But there's also nuance to this, right? I've seen people do RAG when there's no need to do RAG. Meaning, if you have one PDF, with visual information and things, and you want to chat with that, and you don't have high QPS and things like that, you probably don't need it. So there are nuances here. I had a call with someone that had like 300 articles, and I said, you know, this will just fit into the context window of one of these Gemini models; you don't have to have a vector database for this case. And they were so surprised when I said this: can you really do that? But look at it: we had like a 4K context window, and now 10 million, and that happened fast. And people are still running their initial demos from early January 2023, where you were dealing with 4K or 8K. So some parts of it are just not relevant now, because we have longer context windows. But retrieval, of course, is going to be there for a long time. One example I love to bring up: one of these small toy datasets, TREC-COVID, is like 170,000 documents, and that's already 36 million tokens. You're not going to load all of that for a single query.
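The TREC-COVID arithmetic above is easy to check as a back-of-the-envelope calculation. The 170,000-document and 36-million-token figures are from the conversation; everything else is derived:

```python
# Does a "small" retrieval corpus fit in a long context window?
corpus_docs = 170_000        # TREC-COVID document count, per the conversation
corpus_tokens = 36_000_000   # total token count, per the conversation
context_window = 10_000_000  # a "10M token" long-context model

tokens_per_doc = corpus_tokens / corpus_docs
print(f"~{tokens_per_doc:.0f} tokens per document")
print("fits in one context window?", corpus_tokens <= context_window)
# Even a 10M-token window holds well under a third of this small corpus,
# so per-query retrieval still has to select what goes into context.
```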
B
Yeah. Awesome. Do you have a take on knowledge graphs in Graph Rag?
A
Well, I have a lot of takes around Graph RAG. Graph databases solve one particular problem, and they do it well: traversing the edges in a graph, random access, jumping across. But the core issue is actually building the knowledge graph in the first place: the entities, the relationships. So if you say graph databases or Graph RAG is going to kill vector RAG and all that discussion, the first issue is to actually build the knowledge graph. And if you use a search engine or a dedicated graph DB to speed up and accelerate the traversals, okay, fine. But people think, okay, if I'm going to do Graph RAG, then I need a graph database. And I hate that connection between doing something and then tying it to some specific technology; a lot of people do that. You jump from some concept into some technology. You can also do graph exploration with a search engine; you don't need a specific technology to do it. And can Graph RAG be better than vector RAG? Yeah, for sure. In some cases it might make sense, or a hybrid, or whatnot. But people get caught up in some specific technology all the time.
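The point that graph exploration doesn't require a dedicated graph database can be sketched in a few lines: any store that can look up outgoing edges by subject — a search engine filter query at scale, or here just a dict — supports the traversal. The triples below are invented examples:

```python
# Multi-hop expansion over (subject, relation, object) triples using a plain
# adjacency index, standing in for a search engine's subject-filter query.
from collections import defaultdict, deque

triples = [
    ("alice", "works_at", "acme"),
    ("acme", "located_in", "oslo"),
    ("oslo", "capital_of", "norway"),
]

# Adjacency index keyed by subject.
edges = defaultdict(list)
for s, r, o in triples:
    edges[s].append((r, o))

def neighborhood(start, hops):
    """Breadth-first expansion of the knowledge graph from a start entity."""
    seen, frontier = {start}, deque([(start, 0)])
    facts = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # don't expand past the hop limit
        for rel, obj in edges[node]:
            facts.append((node, rel, obj))
            if obj not in seen:
                seen.add(obj)
                frontier.append((obj, depth + 1))
    return facts

print(neighborhood("alice", hops=2))
# [('alice', 'works_at', 'acme'), ('acme', 'located_in', 'oslo')]
```

The hard part, as the conversation notes, is producing good triples in the first place, not storing or traversing them.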
B
Yeah, I think that's okay. But I'm still trying to validate the presence of knowledge graphs in LLM applications, because with LLMs it is obviously much easier to create these entity triplets and all that. So theoretically it should be better. In the past, knowledge graph has been a dirty word, but now maybe it's not.
A
Maybe, maybe. I think with LLMs you can do a lot more around data generation in general. Generating those triplets has been a bottleneck, and now you have LLMs. So I agree: now it can be easier to actually build what matters, which is those triplets.
B
Okay, awesome. Any other opportunities that you see? I know that you mentioned Jina; I think they're a prominent European startup in RAG. And then over here, Voyage just got acquired by MongoDB. Anything on the embedding side? Do we need much better embedding models? Is what we have from the big labs good enough?
A
Oh, I hope to see more. Voyage was really leading the pack on domain-specific embedding models, like for legal PDFs, and what I want to see is more embedding models in that direction, where you essentially represent a PDF as an embedding, or multiple embeddings, for the legal domain or finance or health. I hope to see that grow, so you have a better starting point than just those text models. And I've been a huge believer in using visual language models as the backbone for embedding models, where you essentially take a screenshot of a page. You don't have to go through OCR, so you get a much richer representation without these complex processing pipelines. So I hope to see more innovation. I'm not sure if it's going to happen, because I think it's a difficult business model to be in: you have to run an API-based service, you have to do batching, you have to make up for the compute, and then, are people willing to pay for it? Maybe that's why Voyage got acquired. I think Jina is also doing a lot of great things in this space now, especially in European languages. But every company is trying to move up the value ladder: they want to move into enterprise search or in a different direction. So yeah, but I do hope that we will see more and better general embedding models.
B
Yeah, yeah, I mean, I'm sure. I think the Voyage guys are very happy because it seems like they got acquired for a lot.
A
Yeah.
B
Okay, cool. Anything else before we wrap? Any calls to action? Any parting rants on the topics of the day?
A
No, I would love, I mean, if you want to connect with me, for the audience, you can find me on X. I'm under the handle jobergum there. So give me a shout on X; I hang out there quite a lot. Yeah.
B
I mean, it's where the AI community is, you know. Although I'm always trying to grow on LinkedIn or YouTube; there are a lot more people there. Twitter is this like echo chamber.
A
Yeah, but it's not the same. I mean, we wouldn't have this meeting, me and you, without X, right? So it's a great place, really high signal to noise. And I think the AI community there is really great.
B
So, yeah. Awesome. Well, thank you.
A
Thank you so much for having me, swyx. This has been awesome.
Episode: ⚡️The Rise and Fall of the Vector DB Category
Date: May 1, 2025
Guest: Joe Christian Burgum
Host(s): Latent.Space
This “lightning pod” episode features Joe Christian Burgum, a veteran of search systems and AI infrastructure, discussing the rapid emergence—and equally swift decline—of the "vector database" (vector DB) as a standalone infrastructure category. The conversation explores the origins of vector DB hype in the AI community, why it became a dominant topic in RAG (Retrieval Augmented Generation), and the market and technical convergence that’s led to its fading distinctiveness. The discussion offers historical context, first-hand insight into embedding-based search, and practical guidance for developers and companies navigating today’s retrieval, RAG, and search infrastructure landscape.
"pgvector is doing more in the capabilities of vector search than some of the real vector database players." — Joe [10:47]
On category collapse:
"I’m not saying the companies are dying, but the category is dying." — Joe [03:47]
On search vs. vector DBs:
"That’s a more natural abstraction for connecting AI with knowledge...the arguments for doing RAG—the natural concept is search." — Joe [04:29]
On application advice:
"A very strong baseline is the classical BM25 algorithm...it gives you that baseline...Then you can start looking at embedding models...then, if you can afford it, add a re-ranking layer." — Joe [15:00]
On mixing AI/ML infra with databases:
"I don’t believe in the developer experience of writing these huge SQL statements for transforming data from this and then embedding it and then writing it back and expressing this in the databases." — Joe [17:45]
On hype and cycles:
"Now we have 10-million-token context models and the same cycle repeats every time." — Joe [19:39]
On infinity context windows vs. retrieval:
"Some parts of it is still not...relevant now because we have longer context windows. But...already 36 million tokens...You’re not going to load all of that for a single query." — Joe [21:27]
For more show notes and resources visit:
https://www.latent.space