Podcast Summary: Redis and AI Agent Memory with Andrew Brookins

Podcast: Software Engineering Daily
Host: Sean Falconer
Guest: Andrew Brookins, Principal Applied AI Engineer at Redis
Date: August 26, 2025

Overview

In this episode, Sean Falconer speaks with Andrew Brookins, Principal Applied AI Engineer at Redis, about the complexities of building agentic AI systems. The conversation focuses on engineering memory systems for AI agents, the expanding role of Redis in the agentic stack, the distinctions between memory types, hybrid versus vector-only search, and the pieces still missing from today’s infrastructure for truly autonomous AI agents.

Key Discussion Points & Insights

1. Why AI Agent Memory Is a Hard Problem

Statelessness and Limited Context: LLMs are stateless and have limited context windows, making it challenging to maintain continuity and reliability across sequential interactions.
- Quote:
  
  “LLMs don’t model state transitions like that for environments. That’s where they tend to break down.” – Andrew (02:24)
From Chat to Agency: Demonstrations and proof-of-concepts tend to mask these weaknesses, but productionizing agents that act in, and predict changes in, real environments exposes the limits of traditional LLMs.
Mapping Problems: Building effective agents requires deep understanding of which tasks are suited for agentic approaches and what extra engineering (especially around prediction and state) is required.

2. The Anatomy of Memory in AI Agents

Layers of Memory:
1. Message History (Short-term/Working Memory): Storing recent conversational exchanges.
2. Summarization: Condensing those histories to fit context windows; beyond storage, involves task-specific engineering.
3. Long-Term & Reference Memory: Extracting significant facts over time to persist and retrieve for future interactions.
  - Metaphor:
    
    “A cognitive system interacting with people all day… their brain in a background thread picks things out that are important.” – Andrew (06:26)
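
The three layers above can be sketched in a few lines of plain Python. This is a minimal illustration, not Redis or framework code: `WorkingMemory`, `max_messages`, and `extract_facts` are hypothetical names, the compaction step stands in for an LLM summarization call, and the "I am ..." rule stands in for LLM-based fact extraction.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    """Toy working memory: recent messages plus a running summary."""
    max_messages: int = 4  # hypothetical budget, not a Redis setting
    summary: str = ""
    messages: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)
        if len(self.messages) > self.max_messages:
            self._compact()

    def _compact(self) -> None:
        # Stand-in for an LLM summarization call: fold the oldest half
        # of the history into the summary and keep the newest half.
        cut = len(self.messages) // 2
        old, self.messages = self.messages[:cut], self.messages[cut:]
        self.summary = " | ".join(filter(None, [self.summary, *old]))


def extract_facts(message: str) -> list[str]:
    # Hypothetical rule standing in for background fact extraction
    # into long-term memory.
    marker = "i am "
    lower = message.lower()
    if marker in lower:
        return [lower.split(marker, 1)[1].strip(" .")]
    return []
```

In a real system the message list and summary would live in a data store keyed by user or session, and the extracted facts would be indexed for later retrieval.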
Difference Between Knowledge Base and Long-Term Memory:
- Knowledge base = developer-supplied facts; long-term memory = runtime-learned facts by the agent.
- Quote:
  
  “Retrieval looks probably similar… but long-term memory is stuff the agent learned at runtime, while RAG or knowledge base is what developers inject.” – Andrew (09:03)
Durable State & Workflow: True agentic applications require not just LLM memory, but tracking environment state, tool calls, event-driven logic, and durable execution, sometimes requiring checkpointing and dynamic workflows.
- Quote:
  
  “We're really talking about durable execution at some level; dynamic workflow that... is going to die in the middle of something and have to restart.” – Andrew (10:24)
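
A minimal sketch of the checkpointing idea, assuming a plain dictionary stands in for durable storage. The function name `run_workflow` and the step names are hypothetical; this is not how LangGraph or any specific workflow engine implements it.

```python
from typing import Callable


def run_workflow(
    steps: list[tuple[str, Callable[[], str]]],
    checkpoints: dict[str, str],
) -> dict[str, str]:
    """Run steps in order, skipping any whose result is already checkpointed."""
    for name, step in steps:
        if name in checkpoints:
            continue  # completed before the crash; do not re-execute
        checkpoints[name] = step()  # record the result before moving on
    return checkpoints
```

If a step raises partway through, calling `run_workflow` again with the same `checkpoints` mapping resumes at the failed step instead of repeating earlier tool calls.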

3. Redis in the Agent Memory Stack

Filling Multiple Roles: Redis is used for:
- Fast storage/retrieval of working memory (message history, context blobs)
- Vector search and indexing for both knowledge base and long-term memory (fast retrieval)
- Streams for managing workflow and orchestrating agent state
- Quote:
  
  “Redis is absolutely a great fit [for working memory]. It’s super fast… and as a vector DB, it can serve retrieval too.” – Andrew (12:16)
Recent Innovations:
- A query engine for vector indexing and dynamic queries, previously available as a module and now a core component of Redis 8.
- New vector sets data structure for more native vector operations.
- Semantic Caching: Caching based on similarity, not just keys, to avoid redundant LLM calls.
  
  “Semantic caching is caching based not on a deterministic pattern, like a key… instead on similarity.” – Andrew (18:19)
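
A minimal sketch of similarity-based lookup, assuming a toy bag-of-words embedding in place of a real embedding model. `SemanticCache` here is an illustrative class written for this summary, not the Redis or RedisVL API, and the `threshold` value is an arbitrary assumption.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):  # hypothetical cutoff
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        # Return the cached response for the most similar stored prompt,
        # but only if it clears the similarity threshold.
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None
```

On a hit, the application can skip the LLM call entirely; on a miss, it calls the model and stores the new prompt and response with `put`.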
Integration and Ecosystem:
- Redis Vector Library (RedisVL) provides drop-in components for agent frameworks (LangGraph, LangChain).
- Move toward plug-and-play memory providers in frameworks, though complete standardization is difficult due to nuanced database behaviors.

4. Engineering and Modeling Memory

Schema and Recency: Storing structured data (timestamps, metadata) is crucial to supporting episodic (time-bound) and semantic (general fact) memory.
- Example: Storing both “likes apples” (semantic) and “preferred Cosmic Crisp this week” (episodic).
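
One way to picture the schema point is a record that carries more than text. This is a sketch under assumptions: the field names (`kind`, `created_at`, `event_date`) and the dates in the usage note are illustrative, not a schema from the episode.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MemoryRecord:
    text: str
    kind: str  # "semantic" or "episodic" (labels assumed for this sketch)
    created_at: date  # when the agent stored the memory
    event_date: Optional[date] = None  # the time the user was talking about


def timeline(records: list[MemoryRecord]) -> list[MemoryRecord]:
    """Return episodic memories ordered by the time they refer to, newest first."""
    episodic = [r for r in records if r.kind == "episodic" and r.event_date]
    return sorted(episodic, key=lambda r: r.event_date, reverse=True)
```

With "likes apples" stored as semantic and the weekly apple preferences stored as episodic with an `event_date`, `timeline` surfaces the most recent preference first while the stable fact stays available separately.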
Prioritization in Context Construction: Recent information often overrides older long-term memory, but prioritization logic is still an open engineering challenge.
- Quote:
  
  “I feel like the answer is still out there… I’m just trying to screw with the prompt until I get better retrieval many times.” – Andrew (30:44)
Hybrid Search: Combining vector (semantic) and keyword (exact) search, and choosing between them based on task—e.g., code navigation (semantic) vs. variable renaming (keyword).
- Quote:
  
  “It often tends to be task-specific… the demands on search are different.” – Andrew (34:34)
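
A minimal sketch of hybrid scoring, assuming each document already has a precomputed embedding and that a single weight `alpha` blends the two signals. The function names, the document layout (`text`, `vec`), and the weighting scheme are assumptions for illustration, not the Redis query engine's behavior.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    # Fraction of query terms that appear verbatim as tokens in the text.
    terms = query.lower().split()
    tokens = set(text.lower().split())
    return sum(t in tokens for t in terms) / len(terms) if terms else 0.0


def hybrid_search(
    query: str, query_vec: list[float], docs: list[dict], alpha: float = 0.5
) -> list[str]:
    """Rank docs by alpha * vector similarity + (1 - alpha) * keyword overlap."""

    def score(doc: dict) -> float:
        semantic = cosine(query_vec, doc["vec"])
        exact = keyword_score(query, doc["text"])
        return alpha * semantic + (1 - alpha) * exact

    return [d["text"] for d in sorted(docs, key=score, reverse=True)]
```

Setting `alpha` near 1 favors semantic similarity (useful for loosely specified questions), while setting it near 0 favors exact matches (useful for tasks like renaming a variable).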

5. “World Models” as the Next Frontier

Limitation of Context & Retrieval: Agents can memorize and retrieve facts, but without the ability to predict environment state changes, they fall short of autonomy.
- Quote:
  
  “Agents need to predict how the environment will change… agents that are playing text games, you can do a lot of context engineering, but they just don’t improve that much.” – Andrew (44:15)
The Need for World Models: Inspired by DeepMind’s research, agents must be able to model and predict the consequences of their actions—true environment state transitions—beyond “just” managing memory.

Notable Quotes & Memorable Moments

On the challenge of agent memory:

“It's actually quite difficult at a certain point when you have enough messages to try to figure out what exactly is relevant to the incoming question or the input.” – Andrew (05:32)
On durable execution:

“When you realize… you’re really talking about durable execution at some level… like checkpointing, right? … it’s been around for a while.” – Andrew (10:24)
On open standards for “memory”:

“Open standard for memory… doesn’t always work because all databases have slightly different traits… especially things like filtering the vector search.” – Andrew (20:55)
On hybrid search merits:

“Vector search is great… when the input is less specific, you want to find clusters of related things… But when I tell the agent to rename a variable, I just need keyword search and exact matches.” – Andrew (34:34)
On schema and recency:

“You definitely do want to store more than just the text… The time that the user referenced in the memory… Then you could order information by the times that users were talking about.” – Andrew (27:58)
On the missing piece in AI agents:

“General agents need world models…to predict how an environment will change. That’s fundamentally not about predicting the language that will represent that afterward.” – Andrew (44:15)

Important Timestamps

Statelessness & Prediction Gaps: 02:01–03:53
Defining Agent Memory Layers: 04:26–09:48
Durable State and Workflows: 09:48–11:47
Redis for Working, Long-Term, and Knowledge Memory: 12:16–16:50
Semantic Caching: 18:16
Plug-in Memory Frameworks, Standards: 20:35–23:06
Modeling Memory Types and Recency: 24:27–29:51
Prioritizing Short- vs. Long-Term Memory for Context: 29:51–31:54
Hybrid and Task-Specific Search: 32:53–37:46
World Models and Agentic Limits: 44:15–47:04

Final Thoughts

Andrew and Sean’s conversation paints a vivid picture of the cutting-edge engineering challenges at the intersection of AI agents, memory, and data systems like Redis. Redis is no longer just a cache; it sits at the heart of next-gen agentic architectures, powering everything from fast working memory to hybrid vector-structured search to workflow orchestration. The path to robust, autonomous agents will require not just better memory systems and smarter context engineering, but crucial advances in world-modeling—the ability to understand and predict the consequences of actions in dynamic environments.

“That sounds like a project that absolutely is going to fail. And I am all in because that’s what's exciting. If I don’t see the path yet to that working really well, I’m very excited to find it.”
– Andrew (47:41)

Transcript

[00:01]
A
A key challenge with designing AI agents is that large language models are stateless and have limited context windows. This requires careful engineering to maintain continuity and reliability across sequential LLM interactions. To perform well, agents need fast systems for storing and retrieving short-term conversations, summaries, and long-term facts. Redis is an open-source, in-memory data store widely used for high-performance caching, analytics, and message brokering. Recent advances have extended Redis's capabilities to vector search and semantic caching, which has made it an increasingly popular part of the agentic application stack. Andrew Brookins is a Principal Applied AI Engineer at Redis. He joins the show with Sean Falconer to discuss the challenges of building AI agents, the role of memory in agents, hybrid search versus vector-only search, the concept of world models, and more. This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.
[01:19]
B
Andrew, welcome to the show.
[01:20]
C
Thank you. Thanks for having me. I'm a big fan, so this is fun.
[01:25]
B
Nice. Yeah. Well, I'm glad you could be here. Glad we could work it out. Always good to have a fan on the show as well.
[01:31]
C
Absolutely.
[01:32]
B
So I wanted to kind of start with the big picture or a big picture question. A lot of people are saying that 2025 is going to be this breakout year for AI agents. It's the year of the agent. There's a lot of hype going on in the market. Right now we're moving beyond just basic chat. So from your perspective, what makes building these more autonomous agentic systems hard? And why does memory or other components kind of play such a central role here?
[02:01]
C
Yeah, well, I think I've been thinking a lot about this, of course, and one of the reasons I think it's so difficult is that many of the tasks that we can put into a POC to show off what an agent can do backed by an LLM, they satisfy the weaknesses of the LLM. Right. They draw on information and training. They use that generative ability plus context engineering to produce information effectively. And LLMs can do that really well. And we've done a lot of work now to make agents be able to do that as well. Right. But the tricky part is, I think when the agent has to integrate in any kind of environment and do something, actually change something, and crucially be able to predict the outcome of the change. And that's the part that, you know, LLMs just don't, don't model. Actually, they don't. They don't model state transitions like that for environments. That's where they tend to break down, where agents tend to break down.
[03:01]
B
And you think that is that primarily that, you know, companies get stuck in sort of that, you know, demo PoC mode where they just can't move to a productionization standpoint because of that limiting factor?
[03:13]
C
No, I think that's. That's a whole other problem and is also a problem. I think it's more along the lines of it's really easy to think about, let's build an agent for this problem, but it's harder to think about the problem and map that back to what the agent will actually be good at and what you will need to make it good at certain tasks. So tackling certain problems with an agent and with an LLM requires more thought about the predictive component beyond just, you know, chatbots and things like that. And then there's the whole side of just getting out of the POC phase and getting things into production in any meaningful capacity. And that itself is also a large problem for other reasons.
[03:54]
B
Yeah. In terms of some of the challenges around memory or the value of memory when building some of these agentic systems, every sort of call to the model's stateless. And then of course, they have a limited capacity in terms of how much information you can be sending to them in the context. So what are sort of the components of memory you need to be thinking about and how do those influence context? And how do you work through sort of the optimization challenges of feeding it the right context at the right time?
[04:26]
C
Yeah, absolutely. Great question. So I think thinking about memory, you know, even in your question, right. There's some implicit assumptions about what it is, exactly what type of thing it is. This happens all the time when I talk to folks about memory. And it's really interesting. So the most basic level, right, if we just are not even talking about agents, if we're just talk like a web application, I'm a user, I go to a web application, I do something, I come back, I expect it to continue where we left off. Right. It's just like fundamentally there's stateful interactions. Most things are just a stateful interaction where it continues. And just the fact that the LLM being this component inside of more complex applications is itself stateless. Right. So we tend to think about that a lot, but really it's just kind of like, well, so are all of the web servers that we've been building applications on. What do we have to do to make that work? Well, we have to store data, we have to use a database. And so for me that part is the one that's like, okay, this has to be a given. We have to start the process of building something with an LLM with the assumption that we'll have to store data. And that is a form of memory. And kind of you can map this into like the traditional, traditional, the 2025 version of like, what are the small pieces inside of that memory box? One of them is typically message history of some kind. Because obviously, like, if what you're continuing is a conversation, you need the messages that were sent in the past, right? So messages are. They tend to be this, like, lowest level or, you know, fundamental part that's more about storage because we don't have to think a lot about storing that. We can just store messages. It's not a big deal until it is a big deal. Because, like, the person never started a new chat. They never actually started a new thing. They just started a new conversation within the same, same data conversation, right? 
I think many of us have seen this, right? Many of us know, like the, the, the whole point that we talk about now, context engineering versus prompt engineering, which was last month, probably is like, it's actually quite difficult at a certain point when you have enough messages to try to figure out what exactly is relevant to the, the incoming question or the input. And so then it becomes a question of, okay, this has gone on for some time. You know, we know the limit of this model, but we also know that research suggests, right, even with long context models, it's still important for us to send as little as possible, right? Because they still get lost. So it's a matter of compacting that or summarizing the conversation. And now we're talking about a different thing. This is just my opinion, right? So messages are just data. We're just storing what, what we have. Summarization starts to become an engineering problem beyond just storage, where we have to actually figure out how to get this conversation history for this application and this user summarized correctly so that we can reduce the amount of context we're sending. And then it goes from there, right? So then I'll wrap it up, right? But that's like the next level. Beyond that, if you imagine a person or a cognitive system, let's say, that's interacting with people all day, you know, what's happening is they go to sleep and their brain in a background thread, let's say, or a process picks things out that are important and tends to remember them. The other thing that can happen is that somebody asks about tacos 57 times or 57 people ask about tacos and the person also remembers the taco thing, even though maybe that's not that important. But like it's so frequent that they, their brain usually like stamps it in. So you get this thing that's extracting long term facts and putting them somewhere in the brain and then pulling them back out later. 
And that usually is how we think about cognitive system of an agent. Right. So there's that third thing of taking pieces out of this conversation, putting them somewhere we can use them later. Typically when the person comes back, the user and wants to interact again. Well, we know something. We don't have to look at all of the summarized message histories. We just know that they're vegan and so of course we would give them a vegan recipe. Right. So, so those are three big areas that I think about with memory.
[08:41]
B
Yeah. And then there's also the sort of reference data that serves as this, you know, potential long term memory. Like whether I'm looking that up in a vector database or some other type of data store, I might want to pull that in form a particular, be part of essentially a particular context which then gets factored into this sort of loop, compaction loop, summarization loop that also might be used in a later cycle.
[09:04]
C
Yeah, absolutely. It's really interesting to think about what's the difference between the knowledge base and long term memory. For example, because retrieval looks probably similar, we're going to use like hybrid search or vector or keyword search to pull stuff out. So it tends to be, you know, just a factor of time. So typically like long term memory. And again, I'm not even, I'm not contradicting you even. I'm just like, this is just how I think about it. Right. So long term memory is like stuff that we learned, the agent learned at runtime, let's say. Whereas RAG or the knowledge base tends to be stuff that we, the developers of the agent knew to include or we dynamically injected, but still we knew that we wanted to inject it. So there's a slight difference, but the result tends to be very similar. We store it, retrieve it.
[09:49]
B
Yeah. And then even outside of the direct interactions with the model, thinking about the whole piece of software that's like an agent because these are really full body systems. It's not like you're just interacting with a model. The model's telling you to go and call a tool where the tool is maybe going to communicate with an API. Then there's other sort of memory systems that are part of this outside of the Model of how do you maintain state to make sure that if the API call fails or it succeeds, you're not making that call multiple times. These types of sort of key characteristics of event driven systems or durable execution and so forth too.
[10:24]
C
Oh yeah, absolutely. Yeah. That's really fascinating. Right, so this is why I think people get tripped up thinking about this. Because it's so much more complicated than it sounds. Because it, you know, you tend to like. Well, personally I tend to even knowing a lot of this stuff, right. I'll tend to think about or talk about an agent and be like the model, the agent, the model, just interchangeably. Right. But actually it's neither of those things. It's a complex system that involves lots of different types of state. And when folks think about building out an agent that does complex things, multiple tool calls, a deep research agent, let's say, you know, often what they don't realize perhaps until too late maybe, or they realize and understand in the right amount of time, is that we're really talking about durable execution at some level. At some level we're talking about a dynamic workflow that typically in production is going to die in the middle of something and then have to restart or the user's going to be like, whoa, actually I was wrong. It's not a dog, it's a cat that I was looking at. So just back up and start at the other point where we were at and so restarting from that other point, similar to the workflow crashed and we need to rerun it but not re-execute every step. So there's all those like checkpointing, right? Is what LangGraph would call that. But it's been around for a while, right? It's been around in all these different workflow systems. So yes, a big yes.
[11:47]
B
So in terms of, you know, Redis's role in all this, if we look at just starting, even with short term memory, which is kind of where we first started talking about, you know, what are these different memory systems. LLMs have these limited context windows. You have agents that are maybe communicating over sort of the same session or thread. If it's interacting with another agent or interacting with a user that needs to be tracked. You have those sort of message histories that you're passing back and forth. Where does Redis fit into that architecture?
[12:17]
C
So I view it as very similar to a web application that is not agentic or deterministic web application. Let's say you've got the where does it fit from my perspective? And then you've got also where does it fit from the Perspective of random production engineer. They tend to overlap, but there are some ways I tend to think about it differently maybe. So I'll give you my perspective first. First of all, we're talking about agents. And of course, so like sticking closely to agents, there is this concept of often we'll call it short term memory, but lately I think about it more as working memory. Just like I need working memory to solve problems as a human being, we need somewhere to put the stuff that we're juggling right now. And that tends to be for these applications, a message history or message history, you know, and a summary of the past message history from six months ago that we summarized already. So Redis is absolutely a great fit for that. It's super fast from just accessing it through the direct data structures that we have in Redis that have been around forever through the query engine, which is kind of new for the open source version, I think as a core component of redis, it's in Redis 8 Community Edition. And actually, yeah, so it's in Redis 8. Before that though, it was still available in modules and things. But really what we're saying is data structures, or if you want to define like a schema and make queries with a query language, you can also do that with Redis. In both of those ways, it's extremely fast, faster than most things you're going to use for data. So for working memory, it's a great fit, especially because you can also think about working memory as being like key value lookups. We know exactly what we need. We need that thing for that user and we'll just pull it out. And that blob is what the agent's working with in working memory. However, as a vector database, it can also serve a purpose in retrieval, right? 
Retrieving stuff from the knowledge base that you can incorporate in the idea of memory or from long term memory. So if you actually store these extracted facts over time in long term memory, it's great at retrieval as well also very quickly. So. And that's not even it, right? So actually just final finishing where I view Redis is streams. I think streams are a really overlooked thing perhaps about Redis, right? So in the production agent that I most recently worked on at Redis, it's all about background tasks and streams. We track the state of this workflow in streams with a background task library called docket. So Redis is right there helping us manage like orchestrating the state of this dynamic workflow that is an agent. It's also there in short Term memory and in long term memory and in retrieval for knowledge based stuff. Now of course I work for Redis, so, you know, I'm incentivized to use it for everything as possible. But I actually came back to work for Redis. I worked here once before and there was a day I was building out code retrieval for, you know, a RAG application basically, and I was using Postgres and I had been working with Postgres quite a bit at that time and like at extreme scale levels of Postgres and had been crashing in production like a ton. I was really angry at Postgres and I realized that you could do everything I was doing with Redis and it would scale out the way that I wanted and it would be really fast. So anyway, we have all these things and I view them as showing up in all these different ways.
[15:37]
B
So I mean, Redis has been around for a number of years. Were there specific things that had to be extended in the product to support some of these kind of new workloads that we're seeing from agents or other types of applications in the AI realm that weren't already available in prior versions of Redis?
[15:58]
C
Not in immediate recent history, let's say this year. I don't think we've been forced to introduce something new that unlocks one of these use cases with agents, but I mean, in the fullness of time. The query engine itself was a module we added to Redis, not a core data structure. And that's the big one for me when I think about what Redis has added to enable some of these use cases. And that's the number one thing, being able to index vectors, query them dynamically with a query language that's like number one, but more recently as an accompaniment to that, or like a different approach, a new approach that we're experimenting with. Salvatore, the creator of Redis, worked on vector sets. And that's a core data structure that sits alongside things like sets and lets you do, you know, similar things to what you can do with the query engine right now.
[16:50]
B
In terms of some of the support around, like, vector search. What is supported in terms of getting those vectors actually into Redis? Like, am I mostly doing that work, like the kind of pipeline work that I would need to get those vectors in outside of Redis, and then Redis ends up being the landing point for indexing and allowing me to retrieve, or is there other things where Redis kind of owns part of that pipeline?
[17:11]
C
So the answer is yes, it is mostly a client-side concern in Redis's current architecture. If you want to put something into Redis as a vector, you know, then you're going to be generating the embedding yourself and it's landing in Redis, as you said. We also have a new product this year called LangCache, in private preview, I believe. It's like a slightly higher level abstraction over using Redis for what we call like semantic caching. And we do have plans to experiment with including the embedding model like within that pipeline of just send things to that product and it will do that part for you. But core Redis and what we're talking about today, as far as what's available, you do it yourself. So that's the first answer. I forget your other point there.
[17:55]
B
Is it primarily a landing zone where the pipeline is being built externally, or is there essentially functionality in Redis that allows me to build some of that pipeline directly within Redis, not having to leverage some other third party tools?
[18:07]
C
Gotcha. Yeah, yeah. Then I think that is the answer to your question, right? Most of that stuff you're building to produce the vectors and then Redis is helping you to store them and retrieve them quickly.
[18:17]
B
You mentioned semantic caching. What is semantic caching?
[18:20]
C
Semantic caching is what we would call caching based not on a deterministic pattern, like a key necessarily, which is how most we all do caching for many years, right? And instead on similarity. So the similarity to an input being the thing that lets us pull something out of the cache. And that's really helpful for questions that we have the answer for already from the LLM. So we've got the exact same question, very close to it, many, many times. We don't have to keep paying a vendor to produce the same answer. So we can use semantic caching for that.
[18:55]
B
Right? If I'm building out an agent to solve some particular problem and I'm using Redis for basically the service kind of the memory for my agent, Am I building, sort of writing the code and the business logic to leverage Redis for some of the stuff, like let's say I want to make sure that I'm not going and overspending on tokens because the same question has come in multiple times or the same task has come in multiple times, I can leverage the semantic caching. Am I essentially building that hook into Redis or is there something where I can hook Redis into, I don't know, existing framework and that's kind of abstracted away.
[19:30]
C
Do we pay you to ask this question? Because this feels like a great plan. You don't have to write that code yourself. You could though. It's quite fun to write it now with Claude you can just let them go or let it go and do a lot of stuff. However, if you're using many of the popular frameworks that exist for creating agents like LangGraph or you're just doing LangChain stuff and not even using LangGraph, then we have drop-in components and some of those are available through external or I should say separate repositories like langgraph-redis, to get some of our like open source LangGraph parts. But there is also a library that we maintain, a team at Redis that I'm on maintains, called Redis Vector Library or RedisVL and that's available on PyPI. We've got a lot of components in there that are drop-in and a little bit more general purpose. So you might not necessarily be using a particular framework or you've got a Frankenstein of them and you want, you know, message history or semantic cache object, you want to drop into like a Python project. You can do that with that library. So we've got a few different options for you.
[20:36]
B
Do you think that we'll end up moving into a world with a plethora of agent frameworks that are out there where memory will be sort of more of this pluggable thing where it's like, hey, I want to use Redis as my memory provider. I can just plug that in and these frameworks support it almost like some kind of open standard for memory.
[20:55]
C
Open standard for memory. I think it's a really interesting topic because I don't have a good answer on that. However, if you look at the way that frameworks like LangGraph work, I think you can see what I would expect to shape up more or less. Right. So similar to how LangChain has a vector store interface that standardizes just dropping in a vector store into a chain, you can also swap out the memory provider for your LangGraph agent. And I think that makes a lot of sense. Having worked with those abstractions, I can tell you it doesn't always work because all databases have these slightly different personality quirks and traits, especially if you think about things like filtering the vector search. So of course vector search, that's fine if we're just talking about just do a similarity search, but when we're talking about do a similarity search and include these 25 different faceted search points around that, that's where it starts to be like, okay, well this database doesn't even support that at all. Or this one has this particular query type for that, but it's different. So it's not always quite as, as smooth as it could be. So that's where a standard could be useful for sure. I guess what I'm really interested in though is just to be like totally brutally honest is stuff like Cognee and Mem0 and agent memory, like, products, agent memory frameworks and thinking about that as like, what is that like? Because I've worked on systems like that now and it's just a really interesting area where you can take, you take more of that thing that is related to memory out of the. Just drop in a component and use something place and put it in somewhere central. So anyway, it's a really interesting area of innovation, I think.
[22:45]
B
Yeah. I mean in a lot of ways you do need essentially a layer that's above the storage layer that's agent specific. Kind of like you have a query engine, but you need some sort of, I don't know, memory engine for agents that understand the summarization compaction needs of an agent that go above and beyond those things that are typically supported by like a, you know, a conventional database.
[23:07]
C
Yeah, absolutely. And so I think where that falls currently, and maybe what we'll probably see, is agent frameworks are going to try to carve that out, I think. Right. You can see LangGraph already trying to do that with LangMem, where, you know, storage is sort of a swappable component. And then the intelligence and the engineering around managing all that memory stuff and the cognitive system aspect of it is more like part of the framework's job, or the job that it's taking on and trying to do. But it is a big job. Having now worked on that, it's quite a lot. And I think you look at these products that I mentioned before and, I mean, you can see why it's a whole system separate from the workings of the agent, which tends to be more like durable execution or workflow execution, which itself is more complicated than it sounds too.
[23:59]
B
Yeah. Although, you know, you can rely on at least, I don't know, a decade or so of engineering that's been poured into some of those topics.
[24:08]
C
Yeah, absolutely.
[24:09]
B
In terms of long-term memory, are there different sorts of classes of how we think about long-term memory? It's not just, you know, here's the data I want to keep track of. But do you think about those as being in sort of different buckets that you need to be able to manage and treat differently as, you know, agents progress through their work?
[24:28]
C
Yeah, absolutely. I think if you think about, for whatever reason, when I'm testing out an agent and its memory particularly, I use, like, apples. I really like apples, I guess. So I'm constantly talking to agents about apples. Everybody has their own weird thing that they do. But I think the apples thing is an interesting one. So I can tell an agent that I like apples, right? And that's just a fact. I did tell it at some point that I liked apples. But you could look at it as a fact about me. In fact, if you were to look at whether or not it's stable over time, about me in particular, you would find it's pretty stable. It's probably just a general fact. It might be a semantic fact that I like apples, in the words of some of the papers that break this down into different types of memory. Whereas, you know, if you were talking to my wife about me and apples, she would tell you, well, he likes apples, right? But which type of apple this week? Because it changes every week, which is the apple that I prefer. And that is more like a fact that is very tightly bound to time or duration. So, yeah, last week I did like Honeycrisp. This week it's more about Cosmic Crisp. You know, I need a little bit more tartness in the apple. And that's more like what you would consider an episodic memory. Or again, if we were to break these different facts you could store down into types, that could be one type, a type of memory that is time bound. And the time-bound aspect is really important because when the agent goes and retrieves, you know, like I said, we're really talking about this process of context engineering, where by the end of the process of our engineering and the runtime portion of it, we arrive at the right context for the LLM to give the user of the application or the agent a decent response. Right.
And so when we think about how to get to that outcome, how do we get to the outcome that the agent that I'm asking to buy my groceries or whatever is going to go out and buy the right apple? I don't want it to buy Honeycrisp. It needs to know that I want Cosmic Crisp this week without me having to go in and tell it every single time, because that erases the value of autonomy. So it needs to store the fact that I like Cosmic Crisp apples with the time, so that when it retrieves stuff to build that context, it can order them by time and overlay the most recent facts that it has that are episodic, perhaps. Now most people, I think, would say, well, wait, what is episodic exactly? And I think you could also look at episodic in a different way, which is to say, you know, last summer I visited Orcas Island. Last summer is also a time-bound fact. I visited Orcas Island. So they kind of are very similar, because really the only difference with the first one is we just didn't store the time and it's really general. It's, like, not that useful. Right. So anyway, those are some things. I actually have a kind of spicy take on procedural memory, which is another type of memory. We could talk about that separately, but that's my first response.
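[Editor's sketch] The "store the fact with the time, then order by time and overlay the most recent" idea can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the memory system discussed in the episode: the `Memory` record, its `topic` grouping key, and the `overlay_latest` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Memory:
    topic: str                      # hypothetical grouping key, e.g. "preferred_apple"
    text: str
    event_time: Optional[datetime]  # None means a general fact stored without a time


def overlay_latest(memories: list[Memory]) -> dict[str, Memory]:
    """For each topic, keep the most recent time-bound memory.

    A general (untimed) fact is used only when no time-bound memory
    exists for that topic.
    """
    chosen: dict[str, Memory] = {}
    for m in memories:
        current = chosen.get(m.topic)
        if current is None:
            chosen[m.topic] = m
        elif m.event_time is not None and (
            current.event_time is None or m.event_time > current.event_time
        ):
            chosen[m.topic] = m
    return chosen
```

With an untimed "likes apples" fact, a Honeycrisp memory from last week, and a Cosmic Crisp memory from this week under the same topic, the Cosmic Crisp memory is the one that reaches the context.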
[27:23]
B
Yeah, well, in terms of what goes into modeling this stuff with an agent, though, if I need to keep track of when these types of things happen, do I have to basically go through the process of modeling this like a schema, where it's like, okay, well, I need to track the timestamps, and then I might need to factor in some sort of, I don't know, decay scenario on this knowledge over time, so that I'm not, you know, telling Andrew about something that happened six years ago that he no longer cares about and is no longer relevant? Like, how do you essentially inform the agent to be able to take into account all these different things?
[27:58]
C
Yeah, absolutely. So I remember just a moment ago you were like, well, but we have, you know, with workflows and durable execution and stuff, we have all these decades of knowledge about how to do it. What I love about AI stuff is that pretty much every problem, in some way, almost every problem boils down to something that we have decades of experience with. So in particular this one: the fact that context engineering and retrieval are all really a form of information retrieval means that we have tons of experience with this problem. Exactly this problem. So we've been putting things into search engines and trying to pull them out, and pull out the most relevant things, for a long time. And this is one that's, you know, been around. If you've ever worked on any kind of product with search, you'll know that at some point somebody is going to come to your desk and be like, hey, you know, the thing is that videos are on sale this week on the site. So I kind of need this boost. The engine should just push videos up, right? It's a dynamic boost based on the type of content that's in the search index. Typically we would also have a stable boost for recency. Right. That starts to essentially downvote stuff that's older. The same thing is true for retrieving information. So the answer is yes, you definitely do want to store more than just the text. You want to store structured data, like the time, and I would say probably as much as possible. Right. So the time that the user referenced in the memory, if you can extract that into a time, then we can include that in queries later. Then you could order information by the times that users were talking about in the memory, which of course tends to be more important to them than the time that you created it in the memory system, but that is also important. Right. So yeah, that's my, like, excited response there: yes, schemas, dates.
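[Editor's sketch] A "stable boost for recency" can take many forms; the episode does not name a formula. The Python below assumes exponential decay with a half-life, which is one common choice, and applies it to the time the user was talking about (the event time) rather than the time the record was created. The function name and the 30-day default are illustrative assumptions.

```python
from datetime import datetime, timedelta


def recency_boost(relevance: float, event_time: datetime, now: datetime,
                  half_life_days: float = 30.0) -> float:
    """Scale a relevance score down as the referenced event gets older.

    Assumption: exponential decay, so the score halves every
    `half_life_days`. Events dated in the future are treated as age zero.
    """
    age_days = max((now - event_time).total_seconds() / 86400.0, 0.0)
    return relevance * 0.5 ** (age_days / half_life_days)
```

A memory about something from six years ago would score close to zero with a 30-day half-life, so it would rarely displace recent memories unless its base relevance were far higher. Whether that trade-off is right is something to measure per application.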
[29:52]
B
We have limited context and even if we're having, you know, we're essentially going back and forth in some sort of session, even sort of the short term interaction that's important to that session could eventually grow outside of the context window and we'll have to trim it in some fashion and so forth. But then at the same time, we also have to take into account sort of what is stored in this kind of long term memory, wherever we're storing that, and we want to take that into account as well. So how do you think about being able to prioritize the sharing of that information? Because presumably, just like a person, recent conversation I had, I probably want to take into account at a higher priority than maybe something that is sort of in my long term memory store, but that's still important. So how do I figure out in sort of the construction of the context, the context engineering piece, how to actually force a prioritization, the short term versus long term?
[30:45]
C
That's a great question because I would say, you know, I feel like the answer is still out there, because I'm just trying to, you know, screw with the prompt until I get better retrieval, many times.
[30:58]
B
Yeah. So still a little vibe prompting.
[31:01]
C
Yeah, I mean, honestly, every time I'm manipulating the prompt for quote, unquote, context engineering, I do feel as if at the end of the day I'm writing English text to try to help someone understand. Although that's not even what's happening. Right. So, I mean, literally the answer is, you know, I will tend to try to structure the prompt in different ways so that I'm attempting to communicate in the prompt, you know, this information is from long-term memory. We consider it durable, or we consider it very important. However, you know, the user just said this stuff, this is the immediate conversation. So you should consider this canonical if they override something else that they said. Whether or not that'll produce the right result, though, it's not always doing it exactly the way I just described. It's sort of a, you just hope and measure the results.
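[Editor's sketch] One way to picture "structure the prompt so the model knows which part is long-term and which part is the immediate conversation" is a small Python helper that labels the two sections. The section labels and the function name are assumptions made for illustration; the episode does not give specific wording, and, as noted above, the effect of any such wording has to be measured.

```python
def build_context(long_term: list[str], recent_turns: list[str]) -> str:
    """Assemble a prompt fragment with labeled memory sections.

    Long-term facts come first and are marked durable; the current
    conversation comes last and is marked as taking precedence on conflict.
    """
    lines = ["[LONG-TERM MEMORY: durable, but may be out of date]"]
    lines += [f"- {fact}" for fact in long_term]
    lines.append("")
    lines.append("[CURRENT CONVERSATION: canonical; overrides long-term memory on conflict]")
    lines += [f"- {turn}" for turn in recent_turns]
    return "\n".join(lines)
```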
[31:54]
B
I think we're getting a little bit better at turning some of these things into more of an engineering discipline. But there's still a lot of this kind of like iterate test cycles of just sort of, for lack of a better word, figuring out how to manipulate the model to get the output that you need it to produce. In terms of using some of these various places where we're going to bring in additional information, whether that's like a vector store or some sort of database. Two years ago, or maybe it was a year and a half ago, everyone was always talking about RAG and vector databases and things like that. And it feels like we've gotten to a place where we realize that vectors and semantic search isn't the only thing that we need to do. We also need to take into account some other things. So can you talk a little bit about why was there a gap with using or relying solely on vector databases and how do some of these hybrid search techniques play a role with all this?
[32:54]
C
Yeah, this is a great topic as well. I think, again, getting back to what is the engineering that we're doing in context engineering? Part of it is this sort of English language, or not even, you know, language manipulation within the prompt. That's not all, right? That would be prompt engineering. That's part of the challenge of writing a good prompt. But the other engineering parts tend to be around information retrieval. And so, you know, that's how I think about this problem. So I think if you just look back at what it is that exactly happened back then, this is just my, like, I've been doing this too long kind of take: I observed a lot of people trying to use a vector database for data that is fundamentally text based, or, you know, where we're fundamentally going to get the input that we need to do an effective keyword search. However, it was very popular, and a lot of this stuff, I think, feels much newer than it really is. So we tend to think, you know, oh right, of course we would need something new that not everybody uses, like a vector database, to do this search for AI stuff, because AI is new. But at the end of the day, you know, I think Claude Code is a really good example of this question or, you know, how this really plays out. So if you look at Claude Code and you look at Cursor, well, let's just say some other. Anything really. If you just put anything on the other side and you say, well, okay, this team is building this thing and they chose vector search and they leaned hard into vector search. Vector search is great in cases where the input is going to be less specific, right? And you still want to try to find clusters of related things, right? With vectors, that's what it's good at. Or it's good when the input is non-language, or we want to do something like image search. But it tends to be not so much application specific or, let's say, data specific.
Which of these things is going to be more effective? It often tends to be task specific. So a really good example is I open up my editor or my code agent, whatever it is, whichever of these two I'm using, and I'm exploring a new code base. What I really need is something to help me identify the clusters of related things. Because in projects they're often stored in all these files in other places, even though they're actually conceptually related. So you could take a slice through the entire application in a bunch of different files and different directories. Those are the things that are related, not the things in the directory, or they're related in a different dimension. And that's the one I care about. So semantic search or vector search can be very useful in that case. I'm exploring a new repo or whatever. But then there's the time that I tell the agent to rename a variable. I definitely don't need fuzzy search for that. I just need it to use keyword search and do exact matches on that variable name. So like I said, right, it's the same data, it's a repository. I'm the same user, I'm using the same application probably, but the task is different and so the demands on search are different. So I think in some cases like that you can really break it down easily into, this is the search type that's appropriate for this task. But there are other cases where we're doing something like, you know, our knowledge base has lots of different structured and tiered or hierarchical data. Maybe that sounds fancy, but I'm just talking about a book. A book is a book and it has chapters. Maybe I indexed several books, right? And they all have chapters. Those chapters have paragraphs. And so there's a paper called the RAPTOR paper where they went through and recursively summarized each of these things when they broke them up to go into the index.
And when you look at that, then we're talking about, well, what do I really need, exactly? Because, you know, in their case, right, by the end of the paper, sorry, spoiler alert, what they found is if you do it as a hierarchical search, after doing all that work, you've got everything in the database. In their case, what they have in the database is embedded summaries of each of those layers. So they have embedded summaries of the collection of things, the embedded summary of the part of the collection of things, and embedded summaries of the leaves, the tiny paragraphs or whatever. Or sorry, in their case specifically, they have the actual paragraph at the leaf node embedded directly, and then the other things are summaries. But when, as a user, I go and need to use this application that uses something like that, which I have worked on, the search that we want to do from the agent side isn't so cleanly one or the other, because one of those things is going to benefit from exact matches. That's going to be the text that's directly in there. And it's usually directly in the database in a real application. Right. Not just as vectors, but also as text, so that we can do both types of searches in some kind of hybrid search. And it tends to be effective, when you've laid the data out like that, with summaries of some things but actual content of the others, to do both searches and fuse the ranks.
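[Editor's sketch] "Do both searches and fuse the ranks" does not name a fusion method. Reciprocal rank fusion (RRF) is one widely used option and is assumed here: each document scores the sum of 1 / (k + rank) over every result list it appears in. The function name, the tie-break by document id, and the conventional k = 60 default are choices made for this illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears
    in, with ranks starting at 1. Ties are broken by document id so the
    output is deterministic.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, given keyword results `["a", "b", "c"]` and vector results `["b", "d", "a"]`, the fused order is `["b", "a", "d", "c"]`: the two documents found by both searches rise to the top, and `"b"` wins because its combined ranks are better.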
[37:46]
B
In terms of making this choice of how to do the lookup based on the task, does that have to be an engineering choice that's essentially predetermined, or is it something that you can rely on the model to be able to intelligently decide that, okay, in this circumstance, I realize I can do a key based lookup. So I'm going to talk to some sort of tool endpoint that can do the key based lookup versus something that's more like semantic in nature?
[38:12]
C
It depends. I would say we talk to people about this a lot on my team. This is why measurements are so important. So the first thing that you should work on in this project is trying to measure the quality or accuracy, you know, the things that matter to you about this thing that you're building, so that you can experiment, because it will depend. But my opinion is it also depends on the database that you're using. This is actually another reason why I was like, you know what? Redis is in this position where, this is a good thing, I'm going to go back and work on AI stuff. Because if I have to make two slow searches to do a fusion, or, you know, these days a lot of databases will just do the fusion on their side, still their data is on disk. Right. So the search is still going to be slow, even if they manage some part of that on the server side, on the database side. If that's true, it could be a problem either way. Right. So then we could add latency by doing too many searches to try to improve accuracy. And if we do too many searches and we do them every single time, because we do it deterministically and we don't let the model choose, that's a real problem, because it's every single time, every turn, every interaction between the user and the agent that spawns off a series of tool calls from the model or whatever happens. But as a tool call, that often works really well. I think it depends on the model, you know, but I've had a lot of success with moving things into tool calls so that you can break it up, and it will often make good decisions. But you don't know unless you've measured, you know, how that performs over time.
[39:47]
B
Right? Yeah. I mean, all of these things come down to, hopefully you're not just kind of, you know, putting your finger in the air, in the wind and measuring it that way. But yes, it's sort of more formal way of testing and actually evaluating it. In terms of what you were talking about before, around hybrid search for any of these kind of memory systems that we're talking about, do you really need to be sort of building these, thinking about what you're building from scratch and sort of modeling it from the beginning, or can you leverage, if you had this data already, some of this data already somewhere in a database, can you leverage those systems or do you really need to be thinking about how to get that data in a form that can be easily consumed and serve the AI use case?
[40:30]
C
I think you do. I think you do need to be thinking about it because, like with other data engineering specific problems, data is almost never in the right form for anything. Application developers like me will just cowboy through and put a bunch of junk in the database. And you just expect that that's good. Operationally it's fine. But then you try to use it for anything and it's like, why did you put all this stuff with commas in one field? It's just ridiculous. There's no way to split these up without knowing. So it tends to be in the wrong format. And that's particularly true with AI stuff where, you know, like I was describing earlier, we could imagine that I have this data set of books that I have licensed and purchased. They are not, you know, pirated books. And I'm putting them in a database, right? So I could literally just put them all in one record, one document, you know, in my document database or Redis, and it's all just text. But, you know, that's not going to work for many, many different things, but especially not for AI stuff. So you really need to look at what you're doing with AI. You know, we would call it chunking, right? So like with the RAPTOR approach, where they break these things up into pieces and then recursively summarize the different levels of them, that's really the kind of thing you need to be thinking about. Even if the data already exists, taking the data that already exists and chunking it out like that, depending on, you know, the type of search that your agent is going to be making.
[42:02]
B
Do you have to be, especially with vectors, thinking about what perhaps your re-indexing strategy is as well, where if you need to reprocess, let's say, a website or web page, the web page updates, then you need to reprocess it? What are the strategies for actually updating the vector store that has the chunks of that page? Do I need to blow away the original copy of it, inject the new data and re-index, or is there a better way of handling that update or upsert?
[42:34]
C
So it depends on, you know. Well, the answer is it really depends on the database and even within the database, how you're storing the data. And I'm thinking specifically of Redis, right? So with Redis, you know, you can use a hash and store multiple fields, in which case the vector is just one of the fields. And then if you have the ability to, given a particular input, know that the representation of that in Redis, let's say as a hash, is different from the source now, right? So if you like, sorry to reuse hash, but if you can hash a set of inputs that's like stable, it will tell you that that's the same web page but the content has changed. Then yes, I mean you can just reindex, you can just change part of the hash and if you're using like the query engine in Redis, it will re index that hash with like the new, the changed value. For example, JSON is the same thing. So Redis now has a JSON data type and it works pretty much the same way. Right. We can, you can index specific fields in the JSON. So there's that. But obviously, like I just said, there's a couple things you have to think about. How do you, how do you map them? Usually it's hashing stuff to get the same, you know, record and then whether the database can support that. And honestly I would struggle to think of a database that couldn't support that strategy.
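[Editor's sketch] The "hash a stable set of inputs to find the same record, then update only if the content changed" strategy can be outlined in Python. A plain dict stands in for the database here so that no real Redis command is misrepresented; in Redis the record would be a hash or JSON document with the vector as one indexed field. The key format, the field names, and the `embed` callable are all hypothetical.

```python
import hashlib
from typing import Callable


def stable_key(url: str, chunk_index: int) -> str:
    """Derive the same record key for the same page chunk on every crawl."""
    raw = f"{url}#{chunk_index}".encode("utf-8")
    return "chunk:" + hashlib.sha256(raw).hexdigest()[:16]


def upsert_chunk(store: dict, url: str, chunk_index: int, text: str,
                 embed: Callable[[str], list[float]]) -> bool:
    """Write the chunk only if it is new or its content changed.

    Returns True when a write (and therefore a re-embed) happened.
    """
    key = stable_key(url, chunk_index)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    existing = store.get(key)
    if existing is not None and existing["digest"] == digest:
        return False  # same page chunk, unchanged content: skip the embedding call
    store[key] = {"url": url, "text": text, "digest": digest,
                  "embedding": embed(text)}
    return True
```

This sketch does not handle a page that shrinks: chunks beyond the new length would remain in the store and would need a separate cleanup pass.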
[43:48]
B
What do you think is the next frontier around these like memory systems for agents? Like what is kind of missing from besides some of the things that we talked about of just like, hey, we gotta like sort of vibe our way through the context engineering but like, you know, is there sort of core pieces of like the data infrastructure that's missing that would help us solve some of these problems with getting some of these agents to perform really well in production systems?
[44:15]
C
Yeah, absolutely. There's the big missing piece. I think there's a paper this year that Google did, Google DeepMind research, and they did a paper a lot of people probably have read. I don't know how that works. Only some people stay up at night reading academic papers, I guess. If you do: "General Agents Need World Models." This paper is haunting me. I don't know why. Probably everybody at work is like, would you shut up about the paper? It's really not that long. We don't need to hear about it every day. But it haunts me because just for fun, earlier this year, I was like, I want to make an agent that just plays text games, because those are fun, like making a text game for an agent and then being able to show people, oh, this is a text-game-playing agent and it uses memory to learn. That sounds like a lot of fun. And it was fun until the agents didn't actually improve all that much. And so this is what got me thinking about this, and then it aligned perfectly with what I see in this paper. So the problem is what I actually started this conversation with, because it's always on my mind. I'm thinking about building, you know, an agent over the next few months that'll interact with live infrastructure and make changes. That's very similar to an agent playing a text game. And the thing that's similar is, to survive that environment, to do things successfully, the agent needs to predict how an environment will change. That's fundamentally not about predicting the language that will represent that afterward in ways that it's going to actually succeed at predicting the next state change if it does something. So agents that are playing these text games, you could do a lot of context engineering. You can improve their performance. But by golly, with the amount of work that you could pour into these things, they should be able to pick up a game and generalize on what they've learned from a past game. And often that's not true.
And that's the same problem that any agent is going to run into.
[46:05]
B
It's almost like a reinforcement learning, but for exactly the agent. Yeah, right.
[46:09]
C
Well, that's the problem. So it's true. And in fact TextWorld, this framework I use for this agent, is for reinforcement learning. Because the thing is, you know, if you know exactly what the agent's going to do and the environment it's going to work with, or in other words, in a game-playing agent, the specific game it plays, then you can generate interaction data, you can generate the, you know, state transition data to be able to do reinforcement learning. And it will become better at that game, but not necessarily at other games. And that's the whole "general agents need world models" point, because it's just not realistic to think that every time a general agent encounters a new problem, it's like, well, what are we going to do? Are we going to just shut everything down and spin up a new reinforcement learning cycle and teach it how to do that one thing? Because things change all the time, and even within one environment they change a lot. So I think we still have to figure that out.
[47:04]
B
Yeah. And then I think also, you know, going back to your management of infrastructure example, besides the learning that would have to happen, there's also probably a number of other memory challenges that are going to go on with that, where it's going to be this kind of long-running continuous thing where it probably has long wait cycles for certain things to happen. You know, there's a lot, both from a distributed systems standpoint and also from how you're managing the state with the agent over these long-running processes, that you would have to figure out. There's a lot of complexity with making that agent actually be something that you feel confident unleashing within your infrastructure.
[47:41]
C
Yeah. Somebody was talking about this agent, whether this sounds like a good project. And I was like, that sounds like a project that absolutely is going to fail. And I am all in because that's what's exciting. If it's like, I don't really see. I don't see the path yet to that working really well. So I'm very excited to find it. But, yeah, I agree with you.
[48:01]
B
Yeah. Well, awesome. Andrew, as we start to wrap up here, is there anything else you'd like to share?
[48:06]
C
No, I think we covered a lot of things that are just on my mind a lot about agents and memory, and we had a really good conversation. It was awesome to chat with you and kind of dig into the specifics and talk about my fears and my dreams.
[48:17]
B
Fantastic. Well, thank you for sharing your expertise. I enjoyed the conversation as well. Cheers.
[48:21]
C
Excellent. See you.