
Peter Levine speaks with Ash Ashutosh, CEO of Pinecone, about the launch of Nexus and the shift from vector databases to knowledge engines. As agents become the primary users of software, they discuss why traditional retrieval systems break down and how AI systems need to evolve to support machine-to-machine interactions. The conversation explores how agents currently spend most of their time retrieving and reasoning over data, why that approach is inefficient, and how moving reasoning closer to the data can dramatically improve performance, accuracy, and cost. Ash also explains how Pinecone is rethinking the stack for agentic applications, introducing new abstractions, query languages, and developer workflows.
Ashutosh Kulkarni
About eight, nine months ago, we started seeing a massive shift in who our users are. It turns out it wasn't a human being anymore. It wasn't a different persona. It was an agent. 85% of the agent's work is just retrieving knowledge, and only 15% is the models. The models aren't the problem. The problem is the underlying system that you're trying to get information from. We brought it down from 40,000 to about 2,000. Wow. It's under 500 milliseconds, down from a minute to two minutes. Most importantly, the accuracy dramatically goes up: I think best case was about 68%. We're well over 90% accuracy. And that is just version one. We finally understood why these things were taking so long: they were fundamentally running on a system that was designed for human beings.
Podcast Host
What happens when software is no longer built for humans, but for agents? For years, systems like databases and search were designed around human interaction. A person asks a question, evaluates the response, and decides what to do next. But with the rise of agents, that model starts to break down. Agents don't have context. They brute force their way through systems, issuing dozens of queries, consuming tokens, and often failing to complete tasks. This creates a new bottleneck, not in the models themselves, but in how data is retrieved, structured, and understood. In this episode, Peter Levine speaks with Ash Ashutosh, CEO of Pinecone, about the shift from vector databases to knowledge engines and what it takes to build systems that actually work for agents.
Peter Levine
Hey, Ash.
Ashutosh Kulkarni
Welcome. Hey, Peter. Been a while.
Peter Levine
Yeah, it has been a while. Good to see you. So we're here to talk about, at least I'm here to talk about, Pinecone's new launch.
Ashutosh Kulkarni
Yeah.
Peter Levine
And I'm a board member with you, and we've been on this journey together now for a bit, and you all have been working on this new product called Nexus. I'd love to hear more about it, the genesis of it, what's happening at the launch, and where to from here.
Ashutosh Kulkarni
Yeah, I think we've been talking about it at our board level for several months now. About eight, nine months ago, we started seeing a massive shift in who our users are. It turns out it wasn't a human being anymore. It wasn't a different persona, it was an agent. And that shift fundamentally changed how we thought about the best way to serve this new user in the world of retrieval. If you think about what we had done for five or six years since we first pioneered the vector database market, the idea was you provided an interface to a human being who did a query and got a response back, and it was the human being who provided the context about whether the response was accurate, whether they had to re-ask the question, and they would finally take the action based on whether they verified the information or not. Unfortunately, agents don't have that luxury. The human gives them a task and the agents go and start trying to perform the task. And they spend a ton of time going through this brute force loop of querying, getting some chunks of data back.
Peter Levine
And when you say, just so I have the context, when you say agents spend a lot of this brute force, what are they actually, let's say right now, before this, before Nexus is launched, what are they actually doing in the background there? Like querying what and what's the nature of that whole data flow?
Ashutosh Kulkarni
Yeah, so you give a task to the agent to say, hey, is this product under warranty?
Peter Levine
Okay. Somebody asks them right now, let's say without Nexus.
Ashutosh Kulkarni
Okay, customer service. The agent comes in and says, can you let me know if this product is under warranty?
Peter Levine
Right.
Ashutosh Kulkarni
The agent does something called query expansion: it breaks up the query and then says, okay, let me go figure out what this product is. It goes to five or six different systems. It might be a sales order system, a product definition system, things about warranty information, and it sends out different queries just like a human being would, because that's the interface we provided as part of the database. So here's an agent trying to solve a problem without having any context, with a system built for human beings. And so it goes out and issues six or seven different queries before it first starts to get an idea about the first, think of it as a first line of code, effectively. Sometimes it could be 40 different queries
Peter Levine
and it might be an internal system or external, all over the place, whatever, stuff.
Ashutosh Kulkarni
Right? But the idea most of these guys, most of these agents do, is they do a ton of retrieval, figure out, oh, I don't have enough information, let me go ask more questions. Oh, I have a conflict here with this information.
Peter Levine
Got it.
Ashutosh Kulkarni
And this reasoning goes on until they finally figure out, okay, I'm done with the task, let me report back to my human that the task is complete. And the human has to actually examine it, because most of the time, it turns out, the task completion rate is less than 50%. So half the tasks, these agents don't actually complete, right? And they take a ton of time. In fact, there's a research study that came out of UC Berkeley which showed 85% of the agent's work is just retrieving knowledge and only 15% is the models. The models aren't the problem. The problem is the underlying system that you're trying to get information from.
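As a rough illustration of the brute-force loop described here, the following toy sketch has an agent with no context query every backend with every key it has learned so far, until nothing new turns up. The system names, records, and lookup strategy are all hypothetical and are not taken from any real agent framework; the point is only how many retrieval calls pile up around a handful of useful facts.

```python
# Toy illustration only: the systems, records, and loop are hypothetical.
SYSTEMS = {
    "sales_orders": {"order-1001": {"product_id": "P-7", "purchase_date": "2024-03-01"}},
    "products": {"P-7": {"name": "Router X", "warranty_months": 24}},
    "warranty_policies": {"Router X": {"covers": "hardware defects"}},
}


def search(system, key):
    """Stand-in for one retrieval call against one backend system."""
    return SYSTEMS[system].get(key)


def brute_force_lookup(order_id):
    """Query every system with every known key until a full pass finds nothing new."""
    known_keys = [order_id]
    facts = {}
    queries = 0
    progress = True
    while progress:
        progress = False
        for system in SYSTEMS:
            for key in list(known_keys):
                queries += 1
                record = search(system, key)
                if record is None or (system, key) in facts:
                    continue  # a miss, or something we already had
                facts[(system, key)] = record
                progress = True
                # Without context, any string value might be a key somewhere else.
                for value in record.values():
                    if isinstance(value, str) and value not in known_keys:
                        known_keys.append(value)
    return facts, queries


facts, queries = brute_force_lookup("order-1001")
print(f"{len(facts)} useful records found with {queries} retrieval calls")
```

Most of the calls in this toy run are misses or repeats, which is the pattern the conversation attributes to agents running against interfaces designed for people.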
Peter Levine
I see.
Ashutosh Kulkarni
They were built for human beings. You're asking agents to come back and pretend. Like we talked about before: when machines are talking to machines, why do we have an interface that looks like it's for a human being?
Peter Levine
Right. Yeah. You and I have talked about that
Ashutosh Kulkarni
for a while, and this is the same problem. It just happens to be agents performing specific tasks. And that change in our user led to what it means to fundamentally change how retrieval is done by Pinecone. Yeah. And that's what we're calling Nexus.
Peter Levine
So maybe to help me and help to put the context here, Pinecone, of course, you know, built and defined the vector database category. Okay. So now we're talking about this Nexus. You know, it's a. You call. I believe we call it a knowledge engine.
Ashutosh Kulkarni
Yep.
Peter Levine
So is this just a marketing term? Like, you have a vector database, and instead of calling it that, we'll call it something else, but it's really the same thing. So just help me to understand: is it one where, you know, you just put some lipstick on and it looks different.
Ashutosh Kulkarni
Right.
Peter Levine
Or there's a really. There's a different approach built on vectors. Not built on vectors. Like, what's the evolution of Pinecone into this? And maybe a second question on that is, how did you actually bump into this? I mean, what were users doing that informed the company that this shift was occurring and that Pinecone was a viable solution for this? So sorry to break, like, maybe both of those.
Ashutosh Kulkarni
Yeah. I think the distinction is absolutely real in terms of what a knowledge engine is and what a vector database provides to knowledge engines. Think of a vector database like a library.
Peter Levine
Yeah.
Ashutosh Kulkarni
There's tons of information out there. A human being asks for some information. Appropriate books and pages and documents are given to you. And you read through this stuff, figure out the knowledge out of it, and go ahead and perform a task. Now, you allow the same vector database to operate with an agent. It has to do the same thing, except it doesn't have the context, so it spends it.
Peter Levine
When you say, yeah, the agent. Okay.
Ashutosh Kulkarni
Right.
Peter Levine
Go ahead.
Ashutosh Kulkarni
Yeah. And it has to go through everything, where you read all the pages that are relevant and you synthesize across them. You hope it got the right answer. And that's the brute force approach, because agents are very, very good at reasoning. They can spin up more queries in a millisecond than you can do in an entire day.
Peter Levine
Right, right.
Ashutosh Kulkarni
And so they brute force their way through, which is why you see a ton of tokens consumed for even the smallest of the building applications. Now, a knowledge engine is more like an expert. An expert in some task you're performing. You want to get some task done. Let's say you have a medical billing task agent, and a knowledge engine for medical billing, for that specific task, is an expert in figuring out the medical billing part. It may not care about your prescriptions. It may not care about, let's just
Peter Levine
say billing, just billing, and then go ahead.
Ashutosh Kulkarni
That same knowledge engine uses the exact same data, which is, you know, let's say in a hospital, and may have a very different persona, very different context. When a doctor uses it.
Peter Levine
Sure.
Ashutosh Kulkarni
Versus when a hospital administrator uses it. Sure. And that's the difference. I think a vector database treats all data like it's a pool of data, like a library. And you need that. That is essential, but you need something else on top that can. That literally creates a context, Very, very specific context.
Peter Levine
So we'll get back to the other one. I want to follow up on point B here on that. When you.
Ashutosh Kulkarni
When.
Peter Levine
So you have, I get the library analogy, a bunch of books. And I also understand, I think I do, the knowledge engine, which is as if you've read the books and it gives you the context back. I'm trying to distinguish between an LLM, which I kind of thought did some of that stuff, versus what added things Pinecone did to turn the library into the knowledge engine. And then, you know, without having an LLM, what is the contextualization? I mean, we can use the example of the billing service.
Ashutosh Kulkarni
Right.
Peter Levine
Okay, so now how does Pinecone know the context itself? Where does it learn that? I guess that's the question.
Ashutosh Kulkarni
You know, I think fundamentally today, all of the reasoning is done at the retrieval level, which means once you get the data, you got the LLM, you throw it in. Sure.
Peter Levine
Yeah.
Ashutosh Kulkarni
Let me figure out the answer.
Peter Levine
Yeah.
Ashutosh Kulkarni
May or may not be the right answer. I don't even know if you have all the data, but all I reasoned over was based on the data you gave me.
Peter Levine
Yeah.
Ashutosh Kulkarni
Okay. When you move the reasoning closer to where the data is, closer to the curation of the data, where the actual processing of the data is happening, you can do a lot more things. For instance, you can get the right kind of data, because now you know what context I'm addressing. More importantly, you can start citing and attributing. So you can actually say, this is the citation of why and where this answer came from, as opposed to an LLM, which would not. It just probably talks to some MCP server, gets some information, and brute forces its way to some answer, whether it's right or wrong. So when you move the reasoning from retrieval to curation, closer to the source, closer to the data, significant differences happen. And what you would do is you would tell Nexus: I have this data, and typically these are the answers I expect to see. This is my context. So you give it the appropriate data
Peter Levine
and when you say "you," that's typically a human that sets it up,
Ashutosh Kulkarni
you're effectively kind of training, we call it building. Okay, go ahead. Training the context of the knowledge engine to say, with this data, here are the answers I expect to have.
Peter Levine
Got it.
Ashutosh Kulkarni
So based on this test data, this is where the interesting part is. Very similar to a compiler that we remember. You write code, it compiles to generate some code. This one is a continuous compiler, iterative compiler, that says, okay, you gave me this data and you want this output. I want to match it. So I want to keep figuring out how to curate, how to break up this data in a way, create new artifacts. In fact, we actually create completely different artifacts.
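To make the "iterative compiler" idea concrete, here is a minimal sketch, assuming a toy setup in which the build step keeps reshaping source records into smaller artifacts until the expected answers supplied at build time can all be read back. The record fields, the expected-answer tuples, and the `compile_context` function are hypothetical illustrations, not how Nexus works internally.

```python
# Hypothetical sketch of an iterate-until-it-matches build loop.
RECORDS = [
    {"patient": "A. Rao", "doctor": "Dr. Kim", "bill": 1200,
     "research_notes": "trial 7", "prescription": "drug Z"},
    {"patient": "B. Lee", "doctor": "Dr. Kim", "bill": 300,
     "research_notes": "", "prescription": "drug Y"},
]

# Expected answers given at build time: (patient, field asked about, expected value).
EXPECTED = [("A. Rao", "bill", 1200), ("B. Lee", "doctor", "Dr. Kim")]


def build_artifacts(records, fields):
    """Project each record onto the current artifact schema."""
    return {r["patient"]: {f: r[f] for f in fields} for r in records}


def score(artifacts, expected):
    """Fraction of expected answers that can be read from the artifacts."""
    hits = sum(1 for patient, fld, value in expected
               if artifacts.get(patient, {}).get(fld) == value)
    return hits / len(expected)


def compile_context(records, expected, max_turns=5):
    """Grow the artifact schema one field per turn until every expected answer matches."""
    fields = []
    for turn in range(1, max_turns + 1):
        artifacts = build_artifacts(records, fields)
        if score(artifacts, expected) == 1.0:
            return artifacts, turn
        # Add the first field that an unanswered expected question needs.
        for patient, fld, value in expected:
            if artifacts.get(patient, {}).get(fld) != value and fld not in fields:
                fields.append(fld)
                break
    raise RuntimeError("artifacts did not match the expected answers")


artifacts, turns = compile_context(RECORDS, EXPECTED)
print(turns, artifacts)
```

The resulting artifacts keep only the fields the expected answers need and drop the rest, which mirrors the billing example in the conversation, where the research part of the hospital data is not carried into the compiled context.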
Peter Levine
And is this happening all within the Pinecone system?
Ashutosh Kulkarni
Yeah, the entire reasoning has been moved inside.
Peter Levine
I see, okay.
Ashutosh Kulkarni
And that's where you start looking at, you gave me this data, but this is the output you want. Let me find the most effective way to just completely break up this data.
Peter Levine
Got it.
Ashutosh Kulkarni
Into new artifacts. So, for example, in case of billing, you might give it the entire hospital data, but what you care about is just the patient, the doctor, and the bill.
Peter Levine
Got it.
Ashutosh Kulkarni
Maybe you don't care about the research part.
Peter Levine
Got it.
Ashutosh Kulkarni
After we break that up, that's when we embed that data back into Pinecone.
Peter Levine
I see.
Ashutosh Kulkarni
And so the fundamental shift here is the first build phase, which is you are now compiling the context very specifically for the knowledge engine. That's one part. So as the new data comes in, it gets converted into this new format that is very close. Got it. It gets cited back to where the source is. Yep. It gets put back into Pinecone's vector database. That's one part. The second part is on the retrieval side. Now the agent says, not only did I give you the data, I want to get some information. And don't give me a poem. Don't give me an image. That's cute for a human being. Give me very structured data. Tell me exactly, in a very structured format. Because I'm a machine. I understand structure.
Peter Levine
Yeah, machines.
Ashutosh Kulkarni
Yeah, I understand machines.
Peter Levine
We don't care about images or whatever.
Ashutosh Kulkarni
So that's the second part you define as part of your definition of context. Not only do you define the data and what kind of outputs you expect, you also define the format of the output. Because the format for billing might be very different from the format for the doctor.
Peter Levine
Got it.
Ashutosh Kulkarni
But very different from the hospital administrators.
Peter Levine
And. And how hard would it be for somebody to set this up? Let's say that, you know, you start with the human. They kind of organize things like how what. And then we'll get back to how customers actually bump into this. But what are you, you know, what's the, what's the presentation and complexity that a user has to go through?
Ashutosh Kulkarni
Literally, in fact, we are working on an internal one for our own contract management. We've done hundreds of contracts. What we did was to say, okay, why don't we take all the contracts we did. On one side, let's take the successful contracts. On the other, let's look at the input, all the contracts with redlines. This is your source data. This is your destination. Figure out how I can get something approved from here to there. And we just load it into the build phase. It runs about three to five turns, takes a few minutes, and you create an entire new set of artifacts. Wow. This is literally, I hate to use the words training a model, but you're training a knowledge engine.
Peter Levine
Yeah.
Ashutosh Kulkarni
In a very, very different way.
Peter Levine
Right. It's almost like you're training data to be present. You know, you're training data to. You are using data to train a knowledge engine.
Ashutosh Kulkarni
Exactly. And the data is the foundation, the output and the format of the output. And there are several things. And we'll talk about the new protocol that we have defined to make sure the agents can actually define how they want to get responses back. This is literally the massive gap that we've had between models that have spent a ton of time building reasoning capabilities and people have completely ignored where the real value is, which is on the data side, not its side.
Peter Levine
And then let's say in this case, the agent. Now come, you know, let's say we have the knowledge engine. Agent queries the knowledge engine and comes back and you know, query understandable. Sorry, an agent understandable language.
Ashutosh Kulkarni
Yeah.
Peter Levine
Would the agent still use an LLM in that case afterwards? Is that how this works? And so, I mean, my takeaway from that is it will simplify or reduce the number of tokens actually used for the backend LLM system and all that, because my data is much more prescriptive when it gets to the LLM. Is that fair?
Ashutosh Kulkarni
Absolutely. And three things happen. One, the task completion rate. The success rate of a task, which has been on average about 50%, maybe 60% on a good day, goes up well above 90%. You actually have agents finishing a task. This is even more important, because there's no point giving someone a task, even if they did it for free. Who cares? You just did the wrong thing.
Peter Levine
And if it fails, that's the biggest.
Ashutosh Kulkarni
That's even worse. So number one is task completion rate goes up dramatically. Number two is the time it takes to complete the task.
Peter Levine
Right.
Ashutosh Kulkarni
It used to take, if you run any of the tasks today, it takes minutes. And part of the reason is spending a ton of time, 85% of the time, trying to just retrieve knowledge. That dramatically goes down. And in our own internal applications that we've been building on Nexus, tokens have gone down, depending on how badly or how well the application was written: a 40 to 90% reduction in frontier model tokens.
Peter Levine
Wow.
Ashutosh Kulkarni
And that is, I mean, that's a big.
Peter Levine
That's a cost savings, performance saving, the whole thing.
Ashutosh Kulkarni
I mean, ultimately the ability for you to come back and have, quote, unquote, an expert who gives you precise answers very quickly at the lowest cost. That's huge. Yeah, that is huge.
Peter Levine
I mean, it's really, it's accuracy, performance and cost. It's like all of those benefits come together.
Ashutosh Kulkarni
Yeah. And the problem for users hasn't been the models. Right. That hasn't been the problem. That's why you get demos really quickly.
Peter Levine
Right, right, right.
Ashutosh Kulkarni
It takes four hours to put a demo together. Sure. But then you have to understand why it is taking so long for people. And, interestingly, I think the difference here is people have traditionally been using ETL pipelines, right? We've been taking your data through just like the old database. This is not an ETL pipeline anymore. This is context compiling, completely on the fly.
Peter Levine
Yeah, I love that. I love that concept of context compiling
Ashutosh Kulkarni
completely on the fly.
Peter Levine
I understand that. That makes sense to me.
Ashutosh Kulkarni
Yeah.
Peter Levine
So Ash, I had asked before and the multi, multi question, multiple Questions like, what were customers that you talked about, current customers started doing this, and that's how Pinecone recognized that there was an opportunity. So maybe talk about a customer who had Pinecone and then what were they doing? Like, different. Like, how did you know that this was a real opportunity based on customers?
Ashutosh Kulkarni
Yeah, let's take, maybe customer zero was Pinecone, actually. Okay. Because we had started building an operations agent that allowed us to run our business without dashboards. We just banished the dashboards and moved to a model that kept the entire company's knowledge alive and accessible everywhere.
Peter Levine
Right.
Ashutosh Kulkarni
So we had this. We still have this agentic backplane called Ask Data. And every query we put out there would take six to 10 queries to come back with the result. Take about 45 seconds or sometimes a couple of minutes. And oftentimes you would come back and actually validate that that was the right answer. And in the process, we also noticed it would take us about 40,000 tokens.
Peter Levine
Wow.
Ashutosh Kulkarni
And you're looking at this and saying, this is a small application. Now, it's bringing data from all kinds of places: our data warehouse, our Slack, our Gong, our claim, all kinds of sources. And then you start looking at what it was doing. It turns out our agentic application and the frontier model just went out and blasted, tried to get everything possible, put it through the agents, and kept doing this over and over again, like you were saying.
Peter Levine
Yeah.
Ashutosh Kulkarni
So once we moved that to Nexus, we literally took out 90% of the token usage. We brought it down from 40,000 to about 2,000. Wow. It's under 500 milliseconds, down from a minute to two minutes. Most importantly, the accuracy dramatically goes up: I think best case was about 68%. We're well over 90% accuracy. And that is just version one. And that, I think, was our first revelation that, okay, we finally understood why these things were taking so long: they were fundamentally running on a system that was designed for human beings. And then we have a customer support agent that somebody had built to answer, you know, is ACME in warranty? Are they in support? And you would go to three different sources, like we talked about: the customer record, the sales record, the product record, and you'll watch this whole thing take a lot longer than it should. And so that was our first principle: figure out that maybe we need a system that brings a lot more of the context much, much closer to the data, rather than trying to push it into the LLM up there.
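A quick back-of-the-envelope check of the figures quoted here, using only the numbers stated in the conversation (40,000 to about 2,000 tokens, one to two minutes down to under 500 milliseconds, about 68% to over 90% accuracy). The helper name is made up for this sketch, and the latency and accuracy results are bounds rather than exact values, since the conversation only gives "under" and "over".

```python
def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) * 100 / before


# Tokens per query: about 40,000 before, about 2,000 after.
token_reduction = percent_reduction(40_000, 2_000)

# Latency: one to two minutes before, under 0.5 seconds after,
# so these are lower bounds on the speedup.
speedup_low = 60 / 0.5
speedup_high = 120 / 0.5

# Accuracy: best case about 68% before, over 90% after (gain in percentage points).
accuracy_gain_points = 90 - 68

print(f"token reduction: {token_reduction:.0f}%")
print(f"speedup: at least {speedup_low:.0f}x to {speedup_high:.0f}x")
print(f"accuracy gain: more than {accuracy_gain_points} points")
```

Taken at face value, the token figures work out to a reduction of about 95%, so the "90%" mentioned in passing reads as a conservative rounding of the same result.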
Peter Levine
Yeah. I mean, again, it's this compilation of data to provide context and knowledge is super, super important.
Ashutosh Kulkarni
And with the same data set, you might have different contexts. Totally. And it's important to make sure that the artifacts that we created were created completely on the fly. Like we talked about, it's a context compiler, but unlike a regular compiler, it keeps iterating until it gets to the right artifacts and says, yeah, for your context, for the knowledge engine you want to build for this particular agent, this is the right format.
Peter Levine
Right. So you mentioned that there's now this new language that the agent uses to talk to Pinecone, for Nexus, and all of that. How does all that work? And what was the innovation there?
Ashutosh Kulkarni
Yeah. So once we built Nexus, you have an engine where an agent could define what its task was and what kind of knowledge engine it needed, but it didn't have a way to specify that. There needed to be a language that both a knowledge engine and an agent could actually talk. So we defined something called NoQL. It's a knowledge engine query language, or knowledge query language. And the intent was to put it into three buckets and six basic parameters. One was intent: what is the intent of this query? I want to be able to say specifically that this particular query has some intent on what my ask is and what the scope of the data is. Second is time. I need this response in 45 milliseconds. Don't take an hour to come back; figure out the best way to deliver the response within a certain time. And the third one was governance. How much of the data set am I going to access? Don't give me the entire data set, because I need to be able to put governance across the board and have explainability. It's not just about being a knowledge engine. It's about being a trusted knowledge engine. That makes a big difference in how you deliver in the enterprise.
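The three buckets described here (intent, time, governance) can be pictured as a structured request that an agent fills in. The sketch below is purely illustrative: the actual NoQL syntax and its six parameters are not spelled out in this conversation, so every class name and field is an assumption made up for this example.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical request shape; none of these names come from a published NoQL spec.
@dataclass
class Intent:
    ask: str     # what the agent wants to know
    scope: str   # which slice of the data the ask applies to


@dataclass
class TimeBudget:
    max_latency_ms: int  # e.g. "I need this response in 45 milliseconds"


@dataclass
class Governance:
    allowed_sources: list           # limit how much of the data set is touched
    require_citations: bool = True  # ask for attributable, explainable answers


@dataclass
class KnowledgeQuery:
    intent: Intent
    time: TimeBudget
    governance: Governance

    def validate(self):
        """Reject requests that leave a bucket effectively empty."""
        if not self.intent.ask:
            raise ValueError("intent.ask must not be empty")
        if self.time.max_latency_ms <= 0:
            raise ValueError("time.max_latency_ms must be positive")
        if not self.governance.allowed_sources:
            raise ValueError("governance.allowed_sources must not be empty")
        return self

    def to_json(self):
        """Serialize to a structured, machine-readable payload."""
        return json.dumps(asdict(self), sort_keys=True)


query = KnowledgeQuery(
    intent=Intent(ask="Is this product under warranty?", scope="customer support"),
    time=TimeBudget(max_latency_ms=45),
    governance=Governance(allowed_sources=["sales_orders", "products"]),
).validate()
payload = query.to_json()
print(payload)
```

Keeping the request as plain structured data fits the point made earlier in the conversation that machines want structure rather than prose, on both the query side and the response side.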
Peter Levine
Right. So what about the economics of this, and how do we think about that? I mean, you mentioned the completion rate and other things. If I'm a company, I'm going to go build, let's say, build agents, right?
Ashutosh Kulkarni
Yeah.
Peter Levine
Can I quantify this up front or do I just wait and see and say, hey, like, you know, I'm going to use Pinecone Nexus and we'll see what happens. Is there a way to say you're going to get 90% completion. It's going to be, you know, 40,000 to 2,000, that kind. How do. Is there a certain class of data where we know or you know that that is going to be the outcome? Does it happen on all data? Like what?
Ashutosh Kulkarni
Yeah.
Peter Levine
How do you think about that and how should customers think about it?
Ashutosh Kulkarni
Firstly, if you think about where the cost is today, every vertical application is building its entire knowledge retrieval stack. It's like going back in time and saying every database application was writing its own query language, building its own database, or even further back, saying, I'm building my own operating system, my own solution. So one, from a user's perspective, even with our own SaaS data, we saw an 85% reduction in the actual code required, because that whole part is gone. So that's number one in terms of ROI and TCO. Second is, for the same data, how many knowledge engines do you want to provide? So the larger the data set, the bigger it becomes. If it's a small data set, by definition, it's pretty constrained. A model can do fine. In fact, it will load up the entire data set in the context
Peter Levine
Yeah.
Ashutosh Kulkarni
of the model, and it'll be fine. But in this case, it was important for us to go after large data sets with lots of knowledge engines, lots of tasks and agents running across the board, and the bigger they are, the exponentially higher the overall benefits are.
Peter Levine
Right,
Ashutosh Kulkarni
Right.
Peter Levine
Pinecone is an infrastructure company, that makes sense. And infrastructure requires applications or agents, stuff to get built on top of the infrastructure. Yeah. So, you know, how do teams think about this? How should they go about building apps and agents on top of this? And how does one build this in, and think about it in terms of the global stack? We're basically rewriting the stack here for agents. And so what should that stack be, and how do I, as an enterprise, actually leverage this as quickly as possible?
Ashutosh Kulkarni
Yeah, that's a big.
Peter Levine
Because everyone is saying, oh, we got to go do AI, Right. So, you know, everyone's demanding, I mean, you know, the leadership of companies do AI. Right. So the faster you get it done, the better.
Ashutosh Kulkarni
So firstly, I think if you go back to the DNA of Pinecone, it was started as, and continues to be, a developer-centric company. You have somewhere between 35,000 and 40,000 developers who continue to sign up, who learn about vector databases, and it is those same developers who are moving to and building agentic applications. For us, the starting point continues to be making NoQL public to these developers.
Peter Levine
And does that now come with, let's say, for the 40,000 people signing up, is it just built in right up front? Or is there an added, like, how does that work? Yeah, how do I know about NoQL?
Ashutosh Kulkarni
So one, we have to continue to partner with the agent harness companies.
Peter Levine
Yeah. Okay.
Ashutosh Kulkarni
And we may have to put out things like skill.md files for Claude to define a whole interface.
Peter Levine
Okay.
Ashutosh Kulkarni
No different than how we promoted the existing APIs.
Peter Levine
Got it.
Ashutosh Kulkarni
We have to start partnering with some of these folks. Okay. So number one is getting NoQL adopted by the same development community that adopted the Pinecone vector database. Now, as they move up to agentic applications, they use a whole new AI API across the board. Second is partnering. We intend to make NoQL an open standard.
Peter Levine
Got it.
Ashutosh Kulkarni
So we are partnering with some of the industry standards bodies at the right time. I think we need to get enough adoption to make sure this thing becomes an industry standard. So just like you had SQL for databases and GraphQL for APIs, you can expect to have NoQL for agentic applications.
Peter Levine
Got it.
Ashutosh Kulkarni
In addition, there's one more part we are also working on, which is to create a standardized agentic stack. What does the agentic app stack look like right now? If you think about it traditionally, agents are the applications, the LLM is the new operating system, and Pinecone is the disk in between. Now you have one more thing, called knowledge, that becomes part of the standard stack.
Peter Levine
Got it.
Ashutosh Kulkarni
And to make it very easy, obviously we have the core database, and now we have this knowledge engine. Plus, we are also opening up something called Pinecone Marketplace that we will be announcing. That makes it very easy for someone to have a prepackaged, complete solution. You want time to value? You can go to Marketplace and look at either an app that we built
Peter Levine
or a third party as a blueprint to see how it works.
Ashutosh Kulkarni
Or you can just use it. It might be production ready.
Peter Levine
Yeah.
Ashutosh Kulkarni
Or you might want to customize it. The idea is for you to start as the hardcore developer of the database or as an agent application with the Knowledge engine or as an end user with a full fledged stack based solution that you can interface. And that part is both ours and the third party partners.
Peter Levine
So let's say, just so I'm clear here, we have, or Pinecone does, 40,000 new people trying out the Pinecone vector database. Okay. Now let's just say I want to try Nexus.
Ashutosh Kulkarni
Yeah.
Peter Levine
Is that. Do I, how do I do that as a developer? Is it a new thing that I add on? Is it embedded in. And like what, how do I get that?
Ashutosh Kulkarni
It's just another API service. It's a fully managed service, just like the Pinecone database.
Peter Levine
Okay.
Ashutosh Kulkarni
So all you have to do is get your agentic applications to use NoQL.
Peter Levine
Got it.
Ashutosh Kulkarni
It completely changes the economics. And the most important part here is, once we start working with some of these other partners so that it becomes even easier for the agentic harnesses that build agentic applications to use it directly, the friction gets even lower. Got it.
Peter Levine
I have this crazy question. The crazy question is, the knowledge layer with NoQL, is that dependent on Pinecone being there, the vector database, or can this work with any kind of database?
Ashutosh Kulkarni
The whole idea is NoQL is supposed to be an industry standard. No.
Peter Levine
But will our implementation of it be that way? It can work with any underlying.
Ashutosh Kulkarni
Well, I think Nexus is going to be built on the Pinecone vector database.
Peter Levine
Got it.
Ashutosh Kulkarni
NoQL is supported by Nexus, but somebody else could build NoQL.
Peter Levine
Understood. Okay, so yeah, that's a good distinction. But Nexus is the full.
Ashutosh Kulkarni
You have to have synology.
Peter Levine
Nexus has both sides of it. The top part, then the disk part and the knowledge part together.
Ashutosh Kulkarni
Absolutely. And the disk part. And then there's also in. There is also the auto ingest part, being able to connect to all kinds of sources of data.
Peter Levine
Right, right, got it.
Ashutosh Kulkarni
So you can almost imagine, tomorrow you can have a vertical application. Somebody has a great idea.
Peter Levine
Yeah.
Ashutosh Kulkarni
You don't go through trying to build your own database, your own operating system. You just point us to the data sources, point us to what context and what knowledge you want, and what task you're trying to accomplish. And that's it. After that, you've got a vertical application you can focus on.
Peter Levine
Now let's look out two or three years.
Ashutosh Kulkarni
Yeah.
Peter Levine
What is this? You know, what does it look like when all this is working and you sort of explain what becomes, you know, maybe what's possible that's not possible today. You sort of explain that. But let's say in two or three years from now, how does this all look?
Ashutosh Kulkarni
Very similar to the Cambrian explosion that happens every time somebody standardizes the most common layer, whether it is an operating system, whether it's a SQL interface. Now you'll have an explosion of vertical AI applications or agentic applications that now don't have to worry about what kind of tokenomics you're dealing with, the speed, the accuracy. All you have to do is point us to what data sources you want us to engage with and suddenly you can focus on the real vertical application, the real vertical business case that you're trying to focus on, rather than the infrastructure underneath. And like we said before, 85% of the agent work today is knowledge retrieval. So suddenly you're out of the business of dealing with 85%.
Peter Levine
Right.
Ashutosh Kulkarni
You take all that effort, put it back into where the vertical is.
Peter Levine
Right.
Ashutosh Kulkarni
Second part is, more importantly, if you truly are deploying in large enterprises, trust becomes important. So not only do we have a knowledge engine, but you actually have a trusted knowledge engine that gives you an entire trace of how we reason to get this answer, gives you the citation of where the data came from so that you have an explainable AI. At the same time, it's not just the economics of using a model; you're also getting out of the business of building ETL pipelines. You're building knowledge engines completely on the fly. The old model of extract from a source, transform it, load it into a vector database one time, that's gone. Now you have context compiling on the fly as you require. And that's a big change in how people go back and deploy today. If you think about it, the demo is great. It comes out very quickly. Everybody runs an AI agentic application and then they stall. They have to go through these ETL pipelines. They have to worry about trust, they're worried about security. You have removed all those barriers. You've just dramatically simplified and dropped down the cost.
Peter Levine
So speaking of cost, how. What does the pricing look like for, you know, and how. How is Pinecone thinking about evolving pricing relative to what we're talking about here?
Ashutosh Kulkarni
We have a first draft of it and we continue to work with several partners to identify what the right pricing is. But it will be more aligned with how knowledge is curated, knowledge is extracted and tasks are completed, and less about infrastructure. It'll not be about reads and writes, it'll be at a level that is more about task completions, what kind of knowledge you want us to curate. So we'll continue to evolve that one.
Peter Levine
Nice.
Ashutosh Kulkarni
Yeah, yeah. Sometimes we thought about just, it could be as simple as how many tokens we are saving you. Yeah, it could be as simple as that one. But it turns out that itself is not a good metric, because somebody could give you a product for $0, but if the trust is terrible or the accuracy is terrible, then that's useless. So we tried to combine both of those. I think one other thing we have done is, now that we're opening up an entire new interface for agents, where you expect a thousand x more agents than human beings, human users, probably more.
Peter Levine
Yeah.
Ashutosh Kulkarni
It was important for us to also change the economics of the underlying platform itself. The vector database itself needed to enable the economics so that you have a vector database, you have a knowledge engine, you could stack all of them at the same kind of pricing and margins. So we also are announcing an entirely new price point that allows for this entire knowledge engine to be much more successful in terms of adoption. So part of the announcement will be the first step in changing the cost structure for the core database itself, and we will be continuing that for the rest of the year. So not only are you democratizing the access, but you're also opening up the economics for a lot more use cases.
Peter Levine
Got it. Got it. Yeah. That's exciting. The fascinating element here, and I'll say this. It's hard to believe that this knowledge, you know, Nexus, the knowledge engine here and the compiling of data to make context and all that has such a dramatic impact on the number of tokens used. Right. It's astounding.
Ashutosh Kulkarni
Yeah.
Peter Levine
And if you just think of, like, I mean, this is. This is. It's all. It's. It's sort of revolutionary in the way, I mean, we talk about it, you're like, oh, it's casual. Just put this thing in and you'll say, go from 40,000 to 2,000. I mean, that's a fricking major, major shift. And it's hard for. I mean, just intellectually, it's hard for me to believe that, you know, Pinecone, like, actually has this.
Ashutosh Kulkarni
Yeah. And,
Peter Levine
you know, I guess it's.
Ashutosh Kulkarni
You and I have seen this parallel before. There was a time when I/O interfaces, all of the I/O code, used to run on CPUs. Yeah. And CPUs are expensive. Everybody worried about the cycles you use.
Peter Levine
Right.
Ashutosh Kulkarni
And then you started offboarding that onto dedicated processors.
Peter Levine
Yeah.
Ashutosh Kulkarni
Like I/O boards, I/O cards, if you remember. Yeah, yeah. Networking. Same thing.
Peter Levine
Ops was another. I mean, all of those.
Ashutosh Kulkarni
All those different industries are based on offloading some specialized functions. That is exactly what we did. It's history repeating itself. To say much of this stuff, you're putting on very expensive frontier models. You're offboarding that to very specialized things.
Peter Levine
Right.
Ashutosh Kulkarni
And allowing applications, people.
Peter Levine
I mean, you know, it really strikes me, and this is good for the industry and good in general, that we're really at the very early innings here of this whole transformation. Because if you think, like, okay, it's expensive, there's tokens, now we're going to optimize. It's kind of like all these industries, you know, like there are past examples, graphics, whatever, networking, all that. There were whole industries that got created by optimizing the first order, right. So the first order was everything runs on a CPU, right? You know, it's, oh my God, we gotta, you know, have more CPUs and all this. But then it was like, no, no, no, we're gonna offload that from the CPU and go do other specialized things. And, I mean, of course, like, entire industries were created out of that, with a lot of the same use case being the fundamental, like you gotta move bits around on the network or you gotta show graphics or whatever. It's just that the cost load shifted to a more appropriate area. And that's what we're seeing here. And I will venture to say, no pun intended, there's going to be a lot of this. I mean, whether it's Pinecone or other areas of the industry, right. We're like in the first inning of a multi, you know, multi-inning game that goes into overtime. You know, like it's just, it has been done.
Ashutosh Kulkarni
I mean, it has been tried. We looked at this one for some time. We knew the problem, we knew the solution. We also spent a lot of time wondering, are we the right people to do this? Yeah. The first one was, why don't we just have Claude do this thing, have LLMs do this thing? And you realize they are too far away from the data. To them it's just data.
Peter Levine
Right.
Ashutosh Kulkarni
They just brute-force everything, trying to figure it out. They were too far away. And not only that, each of us uses, each agentic application uses multiple models within a single task. So what am I going to do? Load up LLMs with the data? That's unwieldy. So ultimately it comes back to first order stuff. If you are talking about getting knowledge and the knowledge is being derived from data.
Peter Levine
Yep.
Ashutosh Kulkarni
You have to be as close to the data as possible.
Peter Levine
Yeah, yeah.
Ashutosh Kulkarni
And be at the closest point.
Peter Levine
Yeah, yeah, that's. I mean it's awesome. I mean and I think, yeah, there's going to be a, there's going to be a lot of opportunity. I mean I just think a lot of opportunity, you know, Pinecone aside, to optimize the, like, AI is, you know, it's incredible, it's magical and all that, but it's. It's a very blunt instrument right now, you know, and like, yeah, we're going to sharpen a lot of things up over the next, you know, the. The long tail of this is to, you know, optimize a lot of things.
Ashutosh Kulkarni
The biggest one continues to be around trust and security. Yeah, for sure, for sure.
Peter Levine
That's an opportunity in and of itself.
Ashutosh Kulkarni
Right.
Peter Levine
But all these other bits, I mean, you know, and if you look at sort of, you know, the past history of computing, a lot of these things repeat themselves in terms of the importance of offloading processes, the importance of security, the importance of data governance, the importance of, you know, applications having the right access. I mean, all of these bits and pieces sort of come together.
Ashutosh Kulkarni
I'll give an example of what else we're doing, which is MCP interfaces, which have become the de facto way. In fact, I posted this yesterday or the day before. As we looked at that, they were the first way to define access, a standard access for a model to access any source of data. Gen 1. Great. Nobody cared. It made it very easy. And now you're finding out each MCP interface sucks up a lot of tokens because they're not optimized. So now you get to the point, where can I put an MCP interface optimization behind access? Or maybe somebody else designs a router?
Peter Levine
Yeah, for sure.
Ashutosh Kulkarni
So these are definitely very early innings. I think we've found one part of the stack that we are focusing on, and then we'll continue to have other partners.
Peter Levine
Well, you know, I'm looking forward to seeing how all this evolves.
Ashutosh Kulkarni
Yeah, we love it. Love it. This changes on a daily basis.
Peter Levine
Yeah, for sure.
Ashutosh Kulkarni
This is amazing. Yeah, awesome.
Peter Levine
Great time to be in the business. All right, I agree.
Ashutosh Kulkarni
Thank you, Peter.
Peter Levine
Okay, brother, thanks.
Podcast Host
Thanks for listening to this episode of the a16z podcast. If you like this episode, be sure to like, comment, subscribe, leave us a rating or review and share it with your friends and family. For more episodes go to YouTube, Apple Podcasts and Spotify. Follow us on X at a16z and subscribe to our Substack at a16z.substack.com. Thanks again for listening and I'll see you in the next episode. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax or investment advice or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.
Title: From Vector Databases to Knowledge Engines: The Next Layer of AI
Podcast: AI + a16z
Date: May 5, 2026
Host: Peter Levine (a16z Partner)
Guest: Ashutosh Kulkarni (CEO of Pinecone)
In this episode, Peter Levine and Ashutosh Kulkarni explore the evolution of data infrastructure powering AI, particularly the shift from traditional vector databases to advanced “knowledge engines.” They discuss Pinecone’s launch of Nexus, a new system purpose-built for AI agents rather than humans, and explore how the developer and enterprise landscape is being transformed by these innovations in retrieval-augmented AI.
On the magnitude of improvement:
Peter Levine: “If you just think of, like, I mean, this is. This is... It's astounding... Go from 40,000 to 2,000. I mean, that's a fricking major, major shift.” [39:26]
On the industry disruption:
Ashutosh Kulkarni:
“History repeating itself. To say much of this stuff, you're putting on very expensive frontier models. You're offboarding that to very specialized things.” [40:17]
On democratizing trusted AI for enterprises:
Ashutosh Kulkarni:
“Not only do we have a knowledge engine, but you actually have a trusted knowledge engine that gives you an entire trace of how we reason to get this answer, gives you the citation of where the data came from so that you have an explainable AI.” [35:22]
On agent-first infrastructure:
Ashutosh Kulkarni:
“85% of the agent work today is knowledge retrieval. So suddenly you're out of the business of dealing with 85%. You take all that effort, put it back into where the vertical is.” [35:18]
| Timestamp | Segment Description |
|-----------|---------------------|
| 00:00–03:49 | Introduction to changing user base: agents, not humans |
| 04:09–06:51 | Problems with current systems & agent inefficiency |
| 07:13–10:31 | Defining vector databases vs. knowledge engines |
| 11:15–13:37 | How contextualization and reasoning move into Nexus |
| 14:39–16:11 | Building knowledge engines & “training” context |
| 17:24–18:34 | Quantified results: completion, latency, token use |
| 20:02–23:26 | Case studies: Pinecone’s own adoption |
| 23:41–30:27 | Launch and explanation of NoQL, open standards |
| 30:51–34:02 | Marketplace and developer experience |
| 34:04–36:43 | Future outlook: explosion of vertical AI |
| 36:57–39:26 | Economics and anticipated cost structure |
| 40:13–44:52 | Industry analogies, offloading, early innings analysis |
| Challenge in Old Model | Solution/Outcome with Nexus | Result/Benefit |
|------------------------|-----------------------------|----------------|
| Agents brute-force, lack context | Nexus knowledge engines contextualize data | Higher accuracy, less token use, speed |
| Completion rates below 50% | Above 90% with tailored knowledge engine | Tasks are completed reliably |
| High costs due to token overuse | 40,000→2,000 tokens per task | Dramatic cost reduction |
| Human-centric outputs | Structured, agent-oriented responses | Agents process more efficiently |
| Fragmented query interfaces | NoQL knowledge query language | Standardized, easy agent integration |
| Ad-hoc app engineering | Pinecone Marketplace with blueprints | Accelerated time-to-value |
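To put the token figures quoted in the episode in perspective, the drop from 40,000 to 2,000 tokens per task can be worked out directly. The following is only a rough sketch: the function names are hypothetical, and the per-token price and task count passed to `cost_usd` are illustrative assumptions, not Pinecone or model-provider pricing.

```python
def token_savings(before: int, after: int) -> tuple[float, float]:
    """Return (fractional reduction, reduction factor) for per-task token use."""
    if before <= 0 or after <= 0:
        raise ValueError("token counts must be positive")
    return 1 - after / before, before / after


def cost_usd(tokens_per_task: int, tasks: int, price_per_million_usd: float) -> float:
    """Model spend for a batch of tasks at a caller-supplied (hypothetical) price."""
    return tokens_per_task * tasks * price_per_million_usd / 1_000_000


# Figures quoted in the episode: 40,000 tokens per task before, 2,000 after.
reduction, factor = token_savings(40_000, 2_000)
print(f"{reduction:.0%} fewer tokens, {factor:.0f}x reduction")

# ASSUMPTION: 1,000 tasks at $3 per million tokens, chosen only for illustration.
cost_before = cost_usd(40_000, 1_000, 3.0)
cost_after = cost_usd(2_000, 1_000, 3.0)
print(f"${cost_before:.2f} -> ${cost_after:.2f}")
```

Whatever price is assumed, the ratio between the two costs stays at 20x, which is why the reduction is framed in the conversation as a change in economics rather than a marginal saving.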
The episode presents a clear and actionable vision for the next layer of AI infrastructure. By using knowledge engines and new agent-native interfaces like NoQL, Pinecone and similar companies are targeting the bottleneck in AI applications—not the model, but the structure, retrieval, and contextualization of data itself. This new paradigm promises faster, cheaper, and more reliable AI-powered automation for enterprises, and signals the emergence of a robust ecosystem for agentic applications in the years ahead.
Ashutosh Kulkarni:
"This is literally the massive gap that we've had between models that have spent a ton of time building reasoning capabilities and people have completely ignored where the real value is, which is on the data side, not its side." [16:11]