Episode Overview
Title: From Vector Databases to Knowledge Engines: The Next Layer of AI
Podcast: AI + a16z
Date: May 5, 2026
Host: Peter Levine (a16z Partner)
Guest: Ashutosh Kulkarni (CEO of Pinecone)
In this episode, Peter Levine and Ashutosh Kulkarni trace the evolution of the data infrastructure powering AI, particularly the shift from traditional vector databases to advanced “knowledge engines.” They discuss Pinecone’s launch of Nexus, a new system purpose-built for AI agents rather than humans, and examine how these innovations in retrieval-augmented AI are transforming the developer and enterprise landscape.
Key Discussion Points & Insights
1. The Shift from Human Users to AI Agents
- Old Paradigm: Traditional databases and search systems were built for humans—users posed queries, evaluated the response, and provided both context and judgment.
- Emerging Paradigm: The vast majority of current users are “agents” (LLM-powered software) that interact differently: they issue many queries in rapid succession, lack human context, and depend entirely on data structure and retrieval systems.
- Ashutosh Kulkarni: "About eight, nine months ago, we started seeing a massive shift of who our users are. It turns out it wasn't a human being anymore. It was an agent." [00:00]
2. Bottlenecks in Current Systems & Agent Behavior
- Inefficiency: Agents operate by brute force—issuing dozens of queries, repeatedly consuming large quantities of tokens, and often failing to complete tasks (completion rates <50%).
- Ashutosh Kulkarni: "Turns out the task completion rates is less than 50%. So half the task... these agents don't actually complete." [05:34]
- "85% of the agent[']s work is just retrieving knowledge and only 15% is the models. The problem is the underlying system that you're trying to get information from." [05:16]
3. From Vector Database to Knowledge Engine: Defining the Difference
- Vector Database: Works like a library—given a query, retrieves relevant documents for a human to interpret.
- Knowledge Engine: Acts as an expert—synthesizing, curating, and contextualizing data automatically for an agent performing a specific digital task (e.g., medical billing).
- Ashutosh Kulkarni: "A vector database treats all data like it's a pool of data, like a library... you need something else on top that can... create a context, very, very specific context." [10:10]
- "A knowledge engine is more like an expert. An expert in some task you're performing." [09:18]
4. Technical Approach: Building and Retrieving with Nexus
- Curation Phase: Pinecone’s Nexus compiles raw data into new artifacts specifically fit for a given application context—breaking up data for the needs of billing, contract management, etc.
- "You are now compiling the context very specifically for the knowledge engine... as new data comes in, it gets converted into this new format that is very close... It gets cited back to where the source is." [13:13]
- Retrieval Phase: Nexus serves highly-structured, agent-friendly responses rather than human-oriented outputs, improving speed, accuracy, and cost.
- "Don't give me a poem. Don't give me an image... Give me very structured data. Tell me exactly in a very structured format. Because I'm a machine. I understand structure." [14:36]
- Ease-of-use: Setting up requires providing examples of source data and desired outputs ("training the context of the knowledge engine"). The build phase is highly automated and fast.
- "Runs about three to five turns, takes a few minutes, and you create an entire[ly] new artifact... you're training a knowledge engine." [16:01]
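To make the curation/retrieval split concrete, here is a minimal sketch of the idea described above: raw records are compiled into task-specific artifacts that carry source citations, and retrieval returns structured, machine-readable matches rather than prose. All names here (`curate`, `retrieve`, the artifact fields, the `emr://` sources) are hypothetical illustrations, not Pinecone's actual Nexus API, which is not shown in the episode.

```python
# Hypothetical sketch of the curation + retrieval phases described in the episode.
# None of these names come from a real Pinecone API.

def curate(records, task):
    """Compile raw records into task-specific artifacts, each citing its source."""
    artifacts = []
    for rec in records:
        artifacts.append({
            "task": task,  # e.g. "medical-billing", the application context
            "fields": {k: v for k, v in rec.items() if k != "source"},
            "citation": rec["source"],  # every artifact cites back to its origin
        })
    return artifacts

def retrieve(artifacts, task):
    """Return structured, agent-friendly matches instead of human-oriented text."""
    return [a for a in artifacts if a["task"] == task]

# Example raw data (invented for illustration)
raw = [
    {"patient_id": "p1", "code": "99213", "source": "emr://visit/123"},
    {"patient_id": "p2", "code": "99395", "source": "emr://visit/456"},
]
arts = curate(raw, "medical-billing")
hits = retrieve(arts, "medical-billing")
```

The point of the sketch is the shape of the output: an agent gets fielded records with provenance attached, which matches the "give me very structured data" requirement quoted below.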
5. Quantified Gains: Dramatic Improvements Delivered by Nexus
- Task Completion: Task success rates jump from ~50% to above 90%.
- Ashutosh Kulkarni: "The success rate of a task... it goes up well above 90%. You actually have agent[s] finishing a task." [17:24]
- Latency: Response time drops, e.g., 1-2 minutes down to under 500 milliseconds.
- "We brought it down from 40,000 to about 2,000... It's under 500 milliseconds, [down] from a minute to two minutes." [21:28]
- Token Usage: Dramatic reduction in AI infrastructure load, from 40,000 tokens per task down to 2,000 in the cited example (a 95% cut); the broader claim is 40-90% savings.
- "Put in 40 to 90% reduction in Frontier model tokens... big cost, performance saving, the whole thing." [18:01]
6. New Interface: NoQL Knowledge Query Language
- Pinecone introduces NoQL, a query language for agents to communicate intent, data scope, time constraints, and governance requirements to the knowledge engine.
- Intent: Clearly specify the query purpose.
- Time Constraints: Request answers within strict latency SLAs.
- Governance: Specify data access, provenance, and explainability.
- "We defined something called NoQL... it's a knowledge engine query language... intent of this query... time... governance." [23:41]
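The episode does not show NoQL's actual syntax, so the following is only a hypothetical sketch of the three components described above (intent, time constraints, governance), expressed as a plain JSON payload. Every field name here (`intent`, `max_latency_ms`, `governance`, and its sub-keys) is an assumption for illustration, not the real language.

```python
import json

# Hypothetical NoQL-style query: field names are invented to illustrate the
# three requirements named in the episode, not taken from a real spec.
query = {
    "intent": "resolve denied medical-billing claim",  # purpose of the query
    "max_latency_ms": 500,                             # strict latency SLA
    "governance": {
        "data_scope": ["claims", "payer_contracts"],   # which data may be used
        "provenance": True,                            # require source citations
        "explainable": True,                           # require a reasoning trace
    },
}
payload = json.dumps(query)
```

The design point, per the discussion, is that an agent declares *what* it needs and under *which* constraints, and the knowledge engine owns the how.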
7. Impact on Development, Standards, and the Stack
- For Developers: Nexus and NoQL lower the code and expertise bar for building agentic applications, with prebuilt artifacts and marketplace solutions.
- Open Standard Ambition: Pinecone intends NoQL to become the “SQL for agentic applications.”
- "We intend to make NoQL a[n] open standard... just like you had SQL for databases, GraphQL for APIs, you expect to have NoQL for agentic applications." [30:10]
- Marketplace: Pinecone will offer a marketplace for ready-to-use or customizable agentic solutions, accelerating enterprise adoption.
- "We are... opening up something called Pinecone Marketplace... makes it very easy for someone to have a prepackaged complete solution." [30:51]
8. Economics & Future of AI Apps
- Cost Structure: New pricing will be based on knowledge curation and task completion, rather than classic infrastructure metrics (reads/writes/tokens).
- "It will be more aligned with how knowledge is curated, knowledge is extracted and tasks are completed... not about reads and writes, it'll be at a level that is more about task completions..." [36:57]
- Vision: As the infrastructure standardizes, the industry will see a "Cambrian Explosion" of vertical AI applications.
- "Very similar to the Cambrian explosion... you'll have a[n] explosion of vertical AI applications or agentic applications..." [34:24]
Notable Quotes & Memorable Moments
- On the magnitude of improvement:
  Peter Levine: “If you just think of... It's astounding... Go from 40,000 to 2,000. I mean, that's a fricking major, major shift.” [39:26]
- On the industry disruption:
  Ashutosh Kulkarni: “History repeating itself. Today, much of this stuff you're putting on very expensive frontier models. You're offloading that to very specialized things.” [40:17]
- On democratizing trusted AI for enterprises:
  Ashutosh Kulkarni: “Not only do we have a knowledge engine, but you actually have a trusted knowledge engine that gives you an entire trace of how we reason to get this answer, gives you the citation of where the data came from so that you have an explainable AI.” [35:22]
- On agent-first infrastructure:
  Ashutosh Kulkarni: “85% of the agent work today is knowledge retrieval. So suddenly you're out of the business of dealing with 85%. You take all that effort, put it back into where the vertical is.” [35:18]
Timeline of Important Segments
| Timestamp | Segment Description |
|-----------|--------------------------------------------------------|
| 00:00–03:49 | Introduction to changing user base—agents, not humans |
| 04:09–06:51 | Problems with current systems & agent inefficiency |
| 07:13–10:31 | Defining vector databases vs. knowledge engines |
| 11:15–13:37 | How contextualization and reasoning move into Nexus |
| 14:39–16:11 | Building knowledge engines & “training” context |
| 17:24–18:34 | Quantified results: completion, latency, token use |
| 20:02–23:26 | Case studies: Pinecone’s own adoption |
| 23:41–30:27 | Launch and explanation of NoQL, open standards |
| 30:51–34:02 | Marketplace and developer experience |
| 34:04–36:43 | Future outlook: explosion of vertical AI |
| 36:57–39:26 | Economics and anticipated cost structure |
| 40:13–44:52 | Industry analogies, offloading, early innings analysis |
Summary Table: Key Innovations and Takeaways
| Challenge in Old Model | Solution/Outcome with Nexus | Result/Benefit |
|----------------------------------------|--------------------------------------------|----------------------------------------|
| Agents brute-force, lack context | Nexus knowledge engines contextualize data | Higher accuracy, less token use, speed |
| Completion rates below 50% | Above 90% with tailored knowledge engine | Tasks are completed reliably |
| High costs due to token overuse | 40,000→2,000 tokens per task | Dramatic cost reduction |
| Human-centric outputs | Structured, agent-oriented responses | Agents process more efficiently |
| Fragmented query interfaces | NoQL knowledge query language | Standardized, easy agent integration |
| Ad-hoc app engineering | Pinecone Marketplace with blueprints | Accelerated time-to-value |
Final Thoughts
The episode presents a clear and actionable vision for the next layer of AI infrastructure. By using knowledge engines and new agent-native interfaces like NoQL, Pinecone and similar companies are targeting the bottleneck in AI applications—not the model, but the structure, retrieval, and contextualization of data itself. This new paradigm promises faster, cheaper, and more reliable AI-powered automation for enterprises, and signals the emergence of a robust ecosystem for agentic applications in the years ahead.
Ashutosh Kulkarni:
"This is literally the massive gap that we've had between models that have spent a ton of time building reasoning capabilities and people have completely ignored where the real value is, which is on the data side, not [the model] side." [16:11]