
Daniel Shalef
Contextual memory in AI is a major challenge because current models struggle to retain and recall relevant information over time. While humans can build long-term semantic relationships, AI systems often rely on fixed context windows, leading to loss of important past interactions. Zep is a startup that's developing a memory layer for AI agents using temporal knowledge graphs, enabling agents to retain long-term contextual information. It was founded in 2023 and was part of the Y Combinator Winter 2024 batch. Daniel Shalef is the founder of Zep. He joins the show with Kevin Ball to talk about the challenge of contextual memory in AI, temporal knowledge graphs, ambient AI agents, and more. Kevin Ball, or K. Ball, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI Discussion Group through Latent Space. Check out the show notes to follow K. Ball on Twitter or LinkedIn, or visit his website, K Ball LLC.
Kevin Ball
Daniel, welcome to the show.
Daniel Shalef
Thanks for having me Kevin.
Kevin Ball
Yeah, excited to get into this. So let's maybe start. Do you want to introduce yourself and Zep and what you all are all about?
Daniel Shalef
Yeah. So I'm a software engineer turned founder, and this is my second startup and my first AI startup, like many. We're just under two years old, and our focus is on enabling the agentic future. And the way we're going to do that, or the only way we can do that, is to ensure that agents have the right information available to them at the right time. And so Zep is a memory layer for agentic applications, and we focus on mid-sized companies in the enterprise, and it's a very exciting space to be in.
Kevin Ball
So let's maybe define some terms here because while some of our audience is familiar with all of these different things, maybe not everyone is. So when you say the agentic future, what do you mean by that? I feel like agent is a term that different folks mean different things by that. So how do you define agent?
Daniel Shalef
Yeah, it is so nebulous, right? There is a definition for agents. It's just become pretty loose. And the way I think about it is that AI agents today have an LLM as their brain, but they also have the ability to autonomously interpret instructions and make decisions, which is a very important aspect of agents, and then take actions to achieve goals that have been set for them. So those are the three high-level attributes that I look at when I think about what agents are. When you dig a little bit deeper into it, though, there are several components an agent needs to be able to actually do what I just described. They have tools, for example, for taking action. A tool might be something like querying the web, so searching the web for answers. A tool might be actually taking action in a line-of-business application, so generating an invoice. And the other important aspect is that for an agent to be able to reason and understand what to do next and make a decision, it needs to have a broad understanding of the environment that it exists in. So what is the user's world, if there's a human in the loop? Or what is the business world, if they're purely reacting to changes in business state or our personal state, like our home? And they're going to have to have a very broad understanding of that. And so memory is important: being able to recall what they did in the past and then plot a way forward based on the new stimulus that they've received. And memory really enables planning. And that's the other big important aspect. And there's something called a perception-action loop when we talk about agents. So agents look at a stimulus, maybe a human's request or some event that is coming from their environment, they look at their memory, they perceive this, and then they decide to take action based on that using the tools that they have. 
So that's just a very high level view of the way I kind of look at what agents are.
Kevin Ball
That's really helpful. So let's maybe dig a little bit deeper on what those pieces are. Tools, I think, is not going to be the big focus of what we're talking about here. But maybe first defining a little bit that's essentially function calls. Right. Making action using.
Daniel Shalef
Exactly.
Kevin Ball
Somewhere.
Daniel Shalef
Yeah.
Kevin Ball
Okay. And then reasoning: LLMs, we're all kind of familiar with what these things enable. Memory, that's where you are focused. And I feel like it might be worth us kind of defining that a little bit more carefully. Is this just context that you're dumping into your LLM prompt in some way? Like, what do you mean by memory?
Daniel Shalef
Yeah, so I like to think of memory as quite expansive, and there are multiple types of memory that we look at when we think about agents. There's short-term memory, and short-term memory is merely what is happening in the current moment. If you're talking about a human-in-the-loop agent, what is happening in the current conversation? What has the user just asked? What did they say previously? So what do I know? You can think of it as a human's short-term memory. But we also need long-term memory. And if you think about how the human brain works, long-term memories are often processed in some way, and they're put into a big data bank or memory bank for recall. And there are various different types of long-term memory as well. We could remember how to do things, so there's procedural memory. But then there's also semantic memory, where we build relationships between different events in our lives or things that we perceived and connect them all up, and those go into the data bank as well. And so those are various types of long-term memory. There are more, but those are kind of the big, high-level aspects of long-term memory. And if we double-click on semantic memory, this is where things get really challenging, because humans have an uncanny ability to draw connections between various parts of our lives and our experiences. Sometimes we get it wrong. You know, I forget people's names all the time. I forget what I ate for breakfast. But most of the time we're able to file it away and recall it over very long periods, which is a marvel. And agents are going to have to have something that approximates that to be able to work effectively. Not only that, but the promise of AI is the ability to process vast amounts of data in a very short period of time and make sense of far more than humans are able to conceive or understand.
Kevin Ball
Okay, so that makes sense. And I think I understand why you double-clicked on that particular area, because that ties into this concept of a knowledge graph, which is, as I understand it, what you all are building. So let's maybe talk about knowledge graphs. What are they? Is this the implementation of those semantic links?
Daniel Shalef
Yeah. So knowledge graphs are data structures that allow you to semantically model complex relationships. And they typically contain something called a triple, where you have two entities, or what are called nodes in the graph, and those are things. So you could say a person is an entity, or even a concept could be an entity. So you have two of those, and then between them you have a relationship, and this is called an edge in the graph. The edge describes what the entity-to-entity relationship is about. So that's a knowledge graph in a nutshell. And why they're very useful from a memory perspective is that we can build very dense and well-described semantic data sets that are a very good fit for retrieval. So graphs allow all sorts of interesting approaches to retrieving data. There's actually a very mathematical approach that one can take to retrieving data from a graph and traversing the relationships that are available. And there are all sorts of wonderful algorithms that have been developed that allow you to interrogate the data and also do so in a very intuitive way. So I absolutely love knowledge graphs. I think they're an amazing data structure for condensing information. The challenge with building knowledge graphs has always been defining the ontology: the types of things that you care about, and what sort of relationships between those things are allowed in a knowledge graph. Defining ontologies has always been a very challenging endeavor. And then also, building the knowledge graph has previously had to be done manually or in some sort of not very flexible way. But with LLMs, we have an amazing new opportunity to interpret data at scale and extract entities from unstructured text or even structured data and understand their relationships in ways we've not been able to do before. And again, at scale, because previously it was hand-building knowledge graphs.
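Editor's note: the triple structure described above (two entity nodes joined by a labeled edge) can be sketched in a few lines of Python. This is only an illustration, not code from Zep; the `Triple` class, the `neighbors` helper, and the sample facts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One knowledge-graph triple: node -> labeled edge -> node."""
    source: str    # entity (node)
    relation: str  # relationship (edge label)
    target: str    # entity (node)


# Hypothetical sample data, loosely based on examples from the conversation.
triples = [
    Triple("Daniel", "LOVES", "Adidas shoes"),
    Triple("Daniel", "FOUNDED", "Zep"),
    Triple("Zep", "BUILDS", "memory layer"),
]


def neighbors(entity: str) -> list[tuple[str, str]]:
    """Return (relation, target) pairs for every edge leaving `entity`."""
    return [(t.relation, t.target) for t in triples if t.source == entity]


print(neighbors("Daniel"))
```

A real graph store would index these edges rather than scan a list, but the shape of the data is the same.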
Kevin Ball
So let's maybe dive into that. Let's talk about ontology, right? So defining your categories of things. When you say with an LLM you can just sort of automate that away, is the LLM defining your categories? Like, how are you thinking about that piece of this?
Daniel Shalef
Yeah. So let me speak to Zep's open source Graphiti library. What's unique about Zep's Graphiti library is that it has a temporal dimension. But let's not dive into that yet. What I will speak to, however, is how Graphiti is able to build an ontology on the fly for data, which is very, very useful when you're working in a domain that isn't necessarily as contained or bounded as we've seen in the past. So if you are building agents that interact with humans, humans say the darndest things.
Kevin Ball
Yes. So do LLMs for that matter. Like you get all sorts of bizarre things coming out of them?
Daniel Shalef
Well, yes, yeah. And so if you are, for example, trying to understand human preferences as an agent that maybe controls a home, building a well-defined ontology can be extremely challenging. And that's one of the things that has encumbered developers in the past. It's very challenging to fit your data into an ontology. And so Graphiti builds this on the fly by doing named entity recognition across the data that it is seeing. So it is determining what the things are and then intelligently understanding those relationships. Now that could become very, very problematic at scale if you're not being clever about deduplicating the things. So first, these two things are alike: they might be differently named, but conceptually they're the same thing, so we should use the same type. Secondly, these relationships are very similar, and so we should use the same label for the relationships rather than generating an entirely new label, because otherwise your graph becomes very difficult to query if there are many different types of entities that are very much related.
Kevin Ball
So let's maybe talk about type, because I think one of the things I saw on Zep's website is you talked about pulling out strongly typed data from raw chat information. How do you think about that type extraction, and are those types themselves then morphable? Because I might have a conversation where I name three things about this entity, and then sometime later the agent is having another conversation and discovers, oh, there's a whole slew of additional data that's attached to this type of entity.
Daniel Shalef
Yeah, yeah. So I spoke a little bit about Graphiti's organic development of an ontology. Graphiti also allows developers to define custom types that are well described, i.e., a developer can specify what the type is about. So this is a person, and a person does XYZ, or this is a customer, and a customer is defined as follows: a customer has a company name, a first name, a last name, and several other fields. And that takes things to the next level in terms of being able to have a well-understood ontology. Graphiti is written in Python, and you can do so very simply using Pydantic models, which is something that developers have become very accustomed to when working with LLM APIs such as OpenAI's structured output. And what that allows developers to do is build an ontology that is better structured for their particular use case. But Graphiti will still extract entities that don't match existing types. And so that allows you to get the benefits of this organic development of a well-maintained ontology, but also have structured types that make sense for your particular business.
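Editor's note: to make the idea of a described custom type concrete, here is a minimal sketch. The speaker says the library accepts Pydantic models; to stay dependency-free this sketch swaps in a standard-library dataclass as a stand-in, so it shows the shape of such a type rather than the library's actual API. The `Customer` fields come from the conversation; the docstring text and the `describe` helper are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    """A customer is a company contact that buys from us (hypothetical description)."""
    company_name: str
    first_name: str
    last_name: str


def describe(entity_type: type) -> dict:
    """Collect the name, description, and field names an extractor could be shown."""
    return {
        "name": entity_type.__name__,
        "description": entity_type.__doc__,
        "fields": [f.name for f in fields(entity_type)],
    }


print(describe(Customer))
```

The point of the description and field list is that an LLM doing extraction can be told what a "customer" is, instead of inventing its own label.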
Kevin Ball
Now, if you have, for example, a user type that you've defined, and Graphiti is interacting with an end user of some sort, and it discovers, oh, there's a whole bunch of other contextual information that keeps showing up around users that it thinks is probably useful, does it create a new type overlaying users? Does it extend your defined user? Like, how do you think about the fluidity of this? When you're interacting with people, when you're dealing with the LLM extracting things, it can come up with all sorts of stuff.
Daniel Shalef
Yep. So it would likely create associated entities with the user entity that describe an additional relationship. So for example, in the customer type that I described earlier, if we hadn't put first name and last name of a particular contact at a company, it might create a contact entity type. And that contact entity type has a name.
Kevin Ball
That makes a ton of sense. So you have a set of types that you've defined as a developer that you are pretty guaranteed these are going to have these fields in this way. And then there's a set of types that Graphiti is interpreting, creating on the fly, relating to other types. If, for example, it's a Graphiti-owned type and it discovers here's some new fields, will it modify the underlying type, or will it do the same trick of like, okay, this is a contact, and now we have a contact phone tree because we know there's a set of phones? I don't know what the additional entities might be, but yeah.
Daniel Shalef
It would probably, the thing would be a phone number. So a phone number would be a thing; it would create an entity for the phone number and relate it to the user or the company. So it's pretty clever that way. You mentioned changing data though, which is where Graphiti really shines. And I hinted at that a little bit earlier: Graphiti is a temporal knowledge graph. And, you know, it's a little bit of a mouthful, but Graphiti is specifically designed to deal with temporality and dynamic data. And this is very unlike other RAG frameworks. And in fact, I like to think of a major shift coming in terms of how we view building LLM-based applications. Well, it's actually here already, and that is we've shifted from these Q&A chatbots, so question answering over a document corpus powered by semantic databases providing RAG, and RAG frameworks. RAG frameworks work really well with static data. But the way I describe agents, and the way I strongly believe agents exist in this new world, is in a sea of dynamic data. And I think we're in a post-RAG era now. Look, RAG didn't last very long, three years. But we're in a post-RAG era now, and Graphiti is a post-RAG framework. It deals with dynamic data, and it supports a constant stream of unstructured text or structured objects like JSON. And it's able to integrate this data into the graph in a way where it is understanding whether the new knowledge that it has created from the data is in conflict with existing knowledge in the knowledge graph. So, for example, just a very stylized example: I purchased a pair of Adidas shoes six months ago from an e-commerce agent, and the shoes fell apart. Sorry, Adidas. I shouldn't have actually used the brand. And I send them back using the return system. That's not part of this particular agent that I spoke to. It's a different agent that manages returns. And I sent a nastygram back. I'm very upset that the shoes fell apart. 
And so my previous brand preference, because I'd had a conversation with the e-commerce agent and said, hey, I love Adidas shoes, was noted as being Adidas. But now I've sent these shoes back with a nastygram saying, I'll never buy your shoes again. I'll never buy Adidas again. My brand preference has changed. And so when we integrate that knowledge, the stream of JSON from the returns agent, into our memory store, we've got to update that brand preference. And that's what Graphiti does. It understands that the brand preference has changed and that it needs to invalidate that previous relationship, so that the fact "Daniel loves Adidas shoes" is no longer valid.
Kevin Ball
So this is interesting and there's a bunch of different pieces we can explore on this. I guess. First off, just like at a very vanilla layer under the covers, does this essentially look like timestamps on any piece of data of when it was invalidated or when it became known and when it became invalidated? Or like, how is this implemented?
Daniel Shalef
So Graphiti, and I'm going to get a little bit technical here, has a bi-temporal model, and it has both episodic memory and semantic memory. And in Zep's implementation of Graphiti, we've also implemented procedural memory. So episodic memory is an event or a chat message or similar, and you send it to Graphiti, and it has a created date, the date of the event that created that episode. And you can think about it as, you know, the episode might be a conversation that we're having today, or it might be a single utterance in the transcription of this particular conversation. And it has a timestamp, and that becomes the created timestamp of any relationships that we add to the knowledge graph. But I also might have mentioned that I purchased a pair of Adidas shoes six months ago. And so there are two entities there, Daniel and Adidas shoes, and a relationship, purchased. And that relationship has a created date from today, but it has a valid-at date from six months ago. And then when we now send the shoes back and we say we're really angry, we create a new episode that's a JSON data event coming from the returns agent. And we update the knowledge graph by integrating this new information. In Graphiti parlance, there's another date that we can add to the edge or relationship. Daniel no longer loves Adidas shoes, so we can invalidate that fact, which sets an invalid-at date. And this is how we capture the time dimension of changing state. And it allows Graphiti to reason with the new event data that it receives.
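Editor's note: the bi-temporal bookkeeping described above can be sketched as an edge carrying three timestamps. This is a minimal illustration under stated assumptions, not the library's data model: the `Fact` class, its field names, and the specific dates are hypothetical, chosen to mirror the shoe example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    """An edge with bi-temporal metadata (hypothetical field names)."""
    source: str
    relation: str
    target: str
    created_at: datetime                   # when the system learned the fact
    valid_at: datetime                     # when the fact became true in the world
    invalid_at: Optional[datetime] = None  # when the fact stopped being true

    def is_valid(self, at: datetime) -> bool:
        """True if the fact held in the world at time `at`."""
        started = self.valid_at <= at
        not_ended = self.invalid_at is None or at < self.invalid_at
        return started and not_ended


# Hypothetical dates: "today" and "six months ago".
today = datetime(2025, 3, 1)
six_months_ago = datetime(2024, 9, 1)

# Stated in an earlier conversation, so learned and valid at the same time.
loves = Fact("Daniel", "LOVES", "Adidas shoes",
             created_at=six_months_ago, valid_at=six_months_ago)

# Mentioned today, but true since six months ago.
purchased = Fact("Daniel", "PURCHASED", "Adidas shoes",
                 created_at=today, valid_at=six_months_ago)

# The returns episode arrives today and contradicts the preference,
# so the old edge is invalidated rather than deleted.
loves.invalid_at = today

print(loves.is_valid(datetime(2024, 12, 1)), loves.is_valid(today))
```

Keeping the invalidated edge, instead of overwriting it, is what lets an agent later reason that a preference used to hold.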
Kevin Ball
So I love this distinction between semantic memory and episodic memory. When you put those timestamps in place, is there also a link to the episode that created this change?
Daniel Shalef
Okay, yeah.
Kevin Ball
Okay, interesting.
Daniel Shalef
So in the graph, you have an episodic node, then you have entity nodes that are related to that episodic node. And so what you end up with is really interesting. You have multiple episodes that link to the same entities, and you can see state changes longitudinally over time. And very importantly, Graphiti then allows your agent to reason with state changes. Oh, Daniel used to love Adidas shoes, but he's back again and he wants to purchase some shoes. I can still see that he is a road runner, pronates, and has wide feet, but the fact that he loves Adidas shoes is now invalid, so I won't recommend those to him. And he just mentioned that he thinks he wants to try out Puma shoes. So we can add another entity to the graph noting that there's potentially a preference for Puma shoes.
Kevin Ball
Yeah, yeah, yeah. Really interesting. Okay, so next question related to this is how do you conceptualize, like, multiplayer or things like that? Right. So, like, in this case, let's go back and just say, you haven't sent the nasty gram yet. So we know Daniel said Adidas is great. I love Adidas. Maybe Kevin said Adidas is terrible. Is that tied to each individual? Like, how do you consolidate knowledge? And is there a way in which, like, if you have specific user graphs, can they be consolidated or shared in some way across a set of users or an organization? Like, how would you deal with those layers?
Daniel Shalef
Yeah. So in Graphiti, you would, in the data that you provide, ensure that Graphiti understood that there was a different speaker or that the data was related to a different user. You could do that in the JSON if you're passing a JSON event in, or you can do it in unstructured text in a transcription format. Maybe this is a good segue to speak to what Zep is versus Graphiti and how things work in Zep. So Graphiti is a framework for implementing temporal knowledge graphs. It can be used as a memory layer, but it's a very generalized framework. Zep has first-class support for projects, users, sessions or chat history threads, role-based access control, and data governance and privacy functionality that's all layered on top of Graphiti, plus SDKs for Python, TypeScript, and Go. So it's an end-to-end memory service that is framework agnostic, and we have the ability to implement Zep within LangGraph agents, within AutoGen, or without any agent framework. In Zep you can have user-based knowledge graphs, and by default, if you have single-user agent interactions, the data would go into a user knowledge graph, and it's well contained and ensures that user data is managed appropriately from a privacy and security perspective. But Zep also has the concept of group graphs that can be used for multiplayer scenarios. So, for example, you can stream Slack messages into a group graph and query the group graph independent of specific users, so get results back from multiple users. And so there's a lot of flexibility there. So my answer is yes.
Kevin Ball
Let's maybe talk a little bit about then what this looks like from a developer standpoint. Because I think one of the things that was interesting to me was this idea about, like, yeah, automatically inferring, changing things. Like, what am I sending to Graphiti or to Zep and what do I get back? How do I query? Like, what does that actually look like for a software developer?
Daniel Shalef
Yeah, so in Graphiti, and I'll speak to Graphiti because I love the fact that it's open source, it's easy for developers to implement as well. So let's talk about Graphiti quickly. Any unstructured text, JSON, or even transcriptions you can send to Graphiti. It's a very simple API, and you can create graphs on the fly. You can have multiple graphs in Graphiti, so you can approximate what we do with Zep in terms of putting a firewall between users by namespacing your user graphs. As for what it looks like on the query end, let me speak to a design decision that we took. Developers don't have to learn Cypher or some other graph query language to query Graphiti. There are two reasons why we did this. One, we think that we can provide developers with a very powerful search framework without requiring immersion in a language like Cypher, which is a common query language for knowledge graphs originally developed by Neo4j. And the other design decision that we made is we're not going to have an agent or LLM in the query path, because that's super slow. So GraphRAG can take tens of seconds to get a response back. GraphRAG is that static RAG framework that is built around a knowledge graph and was developed by Microsoft Research. And it is slow because there are all sorts of things that it's doing in the query path with LLMs, and that doesn't really scale in production. It means that you can't use something like GraphRAG for voice agents, and those are becoming more and more common. We don't want to be sitting at a keyboard and typing. We want to talk to a device, for example. And so Graphiti offers a number of different ways to retrieve data, and we do so in a way that's super scalable. So the entry point into a graph is typically through an index, not through a graph search query where you have to then traverse the graph to find the things that you're looking for. And these indexes are semantic and full-text indexes. 
And so what we've done in Graphiti is, every time you add new data to the graph, as I mentioned, you have entities and you have edges between those entities. We place a fact onto the edge: Daniel loves Adidas shoes. And that gets indexed both semantically and, from a full-text perspective, with BM25 indexing. We also generate summaries for the entities in the graph, and those summaries basically are summaries of all the relationships that a particular entity has. So Daniel loves Adidas shoes. Daniel pronates. Daniel has wide feet. Daniel is a road runner. And we index the summary. And so that allows you then to search edges and nodes semantically, with full text, or both. And your entry into the graph is in near-constant time as a consequence. And so you're querying subgraphs rather than the entire graph.
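Editor's note: the idea of entering the graph through an index rather than by traversal can be illustrated with a toy inverted index over edge facts. This sketch uses plain token lookup as a simplified stand-in; it is not BM25 and not semantic search, and the `facts`, `index`, and `search` names and sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical edge facts keyed by edge id.
facts = {
    0: "Daniel loves Adidas shoes",
    1: "Daniel has wide feet",
    2: "Kevin organizes a discussion group",
}

# Build the index once, at ingestion time: token -> ids of edges mentioning it.
index: dict[str, set[int]] = defaultdict(set)
for fact_id, text in facts.items():
    for token in text.lower().split():
        index[token].add(fact_id)


def search(query: str) -> list[int]:
    """Return ids of edges containing any query token, via dictionary lookups only."""
    hits: set[int] = set()
    for token in query.lower().split():
        hits |= index.get(token, set())
    return sorted(hits)


print(search("adidas shoes"))
print(search("daniel"))
```

Each lookup costs a hash access per query token, which is why the entry point does not grow with the size of the graph; the edges it returns then serve as starting points for any further graph search.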
Kevin Ball
Got it. Okay, so let me make sure I understand this. So first data entry, you throw over your stream of messages or documents, whatever. This doesn't have to be fast. So this is when you use an LLM to do extraction or do whatever other things that you're doing there.
Daniel Shalef
It's done asynchronously.
Kevin Ball
Yes, done asynchronously. You can do a lot of essentially precomputation of all these different things. You precompute a set of summaries, you put things into an index, and the index also then points to the nodes. Then at query time, you go straight to this fast index. It loads up for you a summary of a set of things that you can throw into your agent's context right away, and a subgraph. If you want to do more searching, that's there for you?
Daniel Shalef
Yep. Well, it's not just the summaries, it's also the facts on the edges, which are pretty well defined. And then you can also do graph search based on that. So things like retrieving the nearest nodes to the node that was a hit, using something called breadth-first search, which allows you then to get a more comprehensive view of the relationship that you have retrieved from the graph. So you can see adjacent entities and their relationships.
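Editor's note: breadth-first search from a hit node, as mentioned above, can be sketched as follows. The adjacency list and the `bfs_neighborhood` helper are hypothetical illustrations, not the library's implementation.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency list.
graph = {
    "Daniel": ["Adidas shoes", "wide feet"],
    "Adidas shoes": ["Daniel", "returns agent"],
    "wide feet": ["Daniel"],
    "returns agent": ["Adidas shoes"],
}


def bfs_neighborhood(start: str, max_hops: int) -> list[str]:
    """Return nodes within `max_hops` of `start`, nearest first (excluding `start`)."""
    seen = {start}
    order: list[str] = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, depth + 1))
    return order


print(bfs_neighborhood("Daniel", 1))
print(bfs_neighborhood("Daniel", 2))
```

Capping the hop count keeps the result a small subgraph around the hit, which matches the goal of pulling in adjacent entities without walking the whole graph.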
Kevin Ball
And is that done explicitly, like I will get results back and I can then do this, or is that done as a part of that internal first query?
Daniel Shalef
So Graphiti has a number of different recipes that have been developed that are kind of like best-practice pipelines, but you can also create your own pipelines. And the types of things that you can do are pull together BM25 and vector search, use something like reciprocal rank fusion to join the two search results, and then use an LLM reranker to rerank the results. Or we have graph-based rerankers as well. So you can, for example, rerank results by distance from a centroid node. So if you want to get facts back specifically about Daniel, you can say: looking at the semantic search results, rerank by distance from the Daniel node. And so these are all baked in as simple recipes in Graphiti, and they're super powerful. In Zep's implementation of Graphiti, where we run our own embedding and reranker services on our own GPUs, you can search a graph in under 200 milliseconds. And the most expensive pipelines have a P95 of 300 milliseconds.
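Editor's note: reciprocal rank fusion, named above as one way to join BM25 and vector results, scores each item by summing 1 / (k + rank) over the rankings it appears in. The sketch below is a generic version of that formula with the commonly used constant k = 60; the function name, the constant, and the sample result lists are assumptions for illustration, not the library's code.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists; each item scores sum(1 / (k + rank)) over the lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by item name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical hits from a full-text search and a vector search.
bm25_hits = ["fact_a", "fact_b", "fact_c"]
vector_hits = ["fact_b", "fact_d", "fact_a"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['fact_b', 'fact_a', 'fact_d', 'fact_c']
```

Because it only uses ranks, the fusion needs no calibration between BM25 scores and embedding similarities, which is why it is a common first step before a more expensive reranker.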
Kevin Ball
This is super cool. I'm doing some pieces of this type of work right now, and listening to what you're saying, I'm like, holy smokes, I got to get my team on Graphiti. This is cool.
Daniel Shalef
Yeah. So we actually benchmarked. We published a paper last month that describes how Zep works and a deep dive into Graphiti. It's available on arXiv, and I think we can probably link to it in the notes of this podcast. And we benchmarked Zep, and Zep's implementation of Graphiti, against the prior state of the art in memory, which is MemGPT. We actually found that the MemGPT evaluation suite was too trivial. We used that to compare Zep to MemGPT, and Zep was a far stronger contender there. But we also selected a far more comprehensive and larger evaluation benchmark called LongMemEval. And as the name implies, it's for long memories and evaluating performance of retrieval, of recall, across long memories. And by long I actually refer to 100,000-token memories, so filling up a context window. And Zep outperformed the baseline of putting the entire conversation and business data into the context for both GPT-4o and GPT-4o mini. It outperformed the GPT-4o baseline by 18%, and the GPT-4o mini baseline by a larger margin, across a battery of 20 or so evaluations. And in some evaluations where temporal understanding was required, it outperformed recall over the entire context window by almost 100%.
Kevin Ball
This makes total sense because you can essentially pre compute the relevance and you're only feeding in what is actually going to be really useful and relevant to the LLM so it doesn't get distracted with all the other noise and all these different. No, it makes perfect sense.
Daniel Shalef
Yeah. So the needle-in-the-haystack problem has not yet been solved. Look, I don't like betting against models and the future of LLMs or any other architecture, whether it's diffusion or some other unknown architecture. However, today the recall problem is not solved. And even putting aside improved recall, what we found is that with stronger models, Zep performed better. So having your agent running with GPT-4o versus GPT-4o mini improved its understanding of the context that Zep provided. So there's the promise of being able to offer far denser sets of information to the LLM and for it to then be able to make more cogent decisions based on that.
Kevin Ball
Absolutely. And you can load up only the relevant parts of the context for it.
Daniel Shalef
Exactly like that.
Kevin Ball
I agree about not betting against models, but I mean, if you think about the way that we work as humans, like we get distracted when you put too much in front of us as well. Like I don't expect the models to be that much better at it.
Daniel Shalef
Exactly. In fact, there are mental health conditions where, if you have a storm of memories come back and other things that flood your consciousness, we also have issues. But the other dimensions that are very important besides recall are latency and cost. And so as you described, if you're only pulling out the most important and relevant information that the agent needs at that moment, you can reduce cost and latency dramatically. So the other dimensions that we benchmarked in the paper were latency and cost: reducing latency by 90% and reducing token cost by 98%.
Kevin Ball
This makes a ton of sense. So all this makes sense. I love it. I'm actually really excited learning about this. What are the big problems you still see in agentic memory? What's not solved yet, and what are you kind of working on looking forward?
Daniel Shalef
Yeah, so we're very focused on production and not research. And so when you're building systems for research, you don't have to worry about the real-world consequences of putting services in the wild. And so what the benchmarks demonstrated to us is that there's still work that we need to do on improving Graphiti and Zep on a number of different dimensions. So there's ongoing work there. We also recognize that in large enterprises in particular, pushing data into memory is fraught. And so in production you have all sorts of other things you need to wrap around a service like a memory service that touch privacy, that touch data governance, that touch security, and touch many different parts of the business that you need to get buy-in from. And so the real-world consequences of building comprehensive memory for enterprise agents aren't just in the technology itself, but in being able to demonstrate that the technology works along many, many different dimensions. Not just that it creates the right memories, but that you ensure that memory data gets removed when it's supposed to get removed, that it's secure, that it is safe, et cetera.
Kevin Ball
So in a lot of ways what I'm hearing is your focus right now is navigating the gap between cool prototypes and production.
Daniel Shalef
Exactly, yeah. We have a lot of earlier-stage, pre-IPO companies as customers, and many of them have compliance requirements as well. So we're SOC 2 Type 2 certified, and we're working on HIPAA certification as well. We've seen a lot of interest from the healthcare domain. There is so much opportunity there for automation. But in the enterprise, we're still seeing folks moving from that prototype phase to stack selection, building out reference stacks that can be rolled out globally across these large enterprises. It's pretty early for large enterprises and agentic adoption.
Kevin Ball
Yeah. This highlights one of the reasons it takes a long time for all these amazing things that we're seeing with LLMs to trickle out into impacting the whole world. Now with those customers that you have right now, you probably have kind of an inside view on what's happening in the agent world. Can you share anything about like what are the domains in which we're really seeing cutting edge agentic applications?
Daniel Shalef
Yeah. So something that I'm finding most exciting is the rise of ambient agents. We've spoken a lot about human in the loop: Daniel wants to buy some shoes, et cetera. Ambient agents are super cool because they're just monitoring their environment and taking actions based on changes in that environment. So you can think about an agent that is monitoring telemetry from a car, understanding that telemetry, and providing proactive support to the driver around what they should do when something goes wrong. You can think about a household environment: home automation that learns intelligently from occupancy sensors and from your personal calendars as to when you're home and when you're not, coupled with presence sensors for pets, and that can turn lighting on and off, adjust the AC, in the future make food on time, and respond to events like presence when there isn't supposed to be presence. And so ambient agents offer these amazing opportunities for human agency, because they're taking over a bunch of work that we previously needed to do, and they can extend our consciousness as well, our understanding of what's happening in our environment. But they're also one of the biggest concerns when it comes to what the future holds from an AGI perspective. A lot of the time in the popular media, when people talk about AGI, they're actually talking about ambient agents: agents taking actions, not necessarily with human oversight. And so that's a lot of fun and very interesting, and we've actually been talking to some larger companies thinking about ambient agents. Then, as I mentioned, we've also seen a lot of development in healthcare around automating processes that were very, very expensive to run, things like insurance claims and understanding the coding of insurance claims.
Kevin Ball
I don't know if humans understand that part, right? Like, much less...
Daniel Shalef
In a prior life, I investigated medical billing, and it's an incredible environment, such an adversarial relationship between healthcare providers and insurance companies that they actually have teams in competition with each other. But that's a different discussion. We're seeing a lot of use of agents across B2B and B2C, in everything from mental health applications for consumers to e-commerce applications, you name it. In the B2B world, there are some really exciting companies that we've been working with that are building, and this again is one of the frightening things about AI, analyst tools to comb through very large amounts of data for Fortune 500s and the national security industry. So basically analyst work suites that operate autonomously to develop reports and to take actions based on vast amounts of both open source and proprietary data. And so that's really interesting as well.
Kevin Ball
Yeah. Wow. All right, so we're coming close to the end of our time here today. Is there anything we haven't talked about that you think would be important to touch on before we wrap up?
Daniel Shalef
What I want to circle back on is the reframing of what agents are, getting a little more precise about that, and in particular talking about memory again. We have a lot of folks come to us thinking in terms of RAG and static document corpuses. I think that's a solved problem already. And there is so much thinking that needs to be put into a post-RAG world where we're dealing with more dynamic data. There are both technological challenges there and human challenges, some of which I mentioned in terms of compliance, but others in terms of technology. There are areas that we're still pretty uncertain about in terms of how we offer really cutting-edge capabilities in production environments in a post-RAG world. And, you know, we as humans each perceive the world very differently, so our semantic understanding of the world is very different. That's why having different people's opinions is so helpful in decision making, because we see the world from different perspectives. So how are we going to help agents understand the world if we ourselves have different perspectives on what reality is? That gets into some of the philosophy of reality, and that's an area that I'm very excited to see people explore. I think it is incredibly necessary. It ties into AI safety, it ties into the impact on labor, et cetera.
Kevin Ball
No, there's something really interesting there, and it reminds me of work I was pushing for at a previous job that didn't end up getting there. When you're deriving these, I was going to call them facts, but they're not, these knowledge entities, and we start talking about trying to perceive reality, we need to move from a single-user view to a multi-user view. And you almost want a Bayesian approach: we have these different supporting factors that lead us to an 80% belief that this is likely, and here are the things that might validate or invalidate our priors on this. Essentially having probabilities and support structures associated with our facts.
Daniel Shalef
Yeah, I actually like framing it as a Bayesian problem. If one looks at the Graphiti architecture, it is conceptually doing something similar when it does reconciliation, where it looks at priors to understand how to form new memories or facts. But a formal Bayesian layer over something like Graphiti would be very interesting, so adding some sort of Bayesian reasoning. Probability is a very challenging thing for humans to conceive.
Kevin Ball
Absolutely.
Daniel Shalef
And as a consequence, you know, LLMs are incapable of perceiving probability. But there is an architectural aspect of how LLMs work that can help us understand why an LLM produced what it produced, and that is looking at things like logits, the probability of the tokens that were produced. And so there's opportunity there. It would be interesting to explore.
Kevin Ball
Yeah, there are kind of two sides there, right? There's the question of what's the probability on knowledge based on multiple views of a thing coming in, multiple interpretations. And then there's even just, what's the probability on the evidence that we've extracted? Because the LLM itself is probabilistic.
Daniel Shalef
Exactly. Yeah. Honestly, it's interesting. We see that with customers: "This looks wrong." And you sit down with them and you say, "But this is what it says. This is the input data and this is what it's built." "Oh, yeah, yeah. No, I see it now." And so then how do we get it into the structure that you'd like? That's why we've done things like build custom entity types, et cetera, because we each perceive the world so differently.
Kevin Ball
There's a piece of this too. One of the core ways I think about LLM applications, because they're probabilistic, is that if you need reliability, or even if you don't, it's useful to think of them in terms of an inference step and then a validation step, and then you go to another inference step or something like that. That validation might be a human in the loop, it might be some sort of formal checker, it might be a second LLM or some other thing. But what does that look like in a probabilistic world? And how do you insert these validation steps as you build?
Daniel Shalef
So Graphiti does implement reflection on things like entities it's extracted, conflicts it's identified, et cetera. We've done this because we started building Graphiti before reasoning models. We actually don't think at this stage that reasoning models are necessary for Graphiti. They're costly and slow, and again, we look at how we scale things in production, and at cost. I think folks in your audience who have worked with knowledge graphs and LLMs, and maybe with GraphRAG, will know that it's very costly to produce graphs of any size. And so that is something that we're working on, because we want to be able to commercialize this at scale, not just for large enterprises but also for startups. And so there's a lot of work that goes on there.
Kevin Ball
In terms of the inference that you're doing, you have a stream coming in and you're inferring out these knowledge graphs of different sorts. Can that be done on a range of models? Do you need the top cutting-edge models to get good results? What does that look like?
Daniel Shalef
Yeah, so we use frontier models for our work. We have not fine-tuned any models. We might do so if we have some very domain-specific use cases, but frontier models have been able to perform well enough for our use case. We do run our own inference infrastructure for some use cases, but not for graph building, because of just the sheer scale of it. So we do use Microsoft and other inference services for our use cases. You can use smaller models with Graphiti, depending on your domain and the complexity of your data. So you could run, for example, GPT-4o mini, or you could use Anthropic Haiku, or Llama 3.1 70B or 3.2 70B, et cetera. If you have a pretty constrained domain, you may even get away with using much smaller models. The prospect of using smaller models is very exciting. I actually think that we're going to start seeing more complex product architectures from inference providers, where what we perceive as a model actually isn't just a single model. It is a set of different models combined together at different layers of what we would traditionally view as a model, each solving different problems.
Kevin Ball
Absolutely. All right, awesome. Well, this has been super fun. Thank you, Daniel, for joining me today.
Daniel Shalef
Thank you, Kevin. It was a lot of fun.
Kevin Ball
And we'll call that a wrap.
Episode Information:
In this episode of Software Engineering Daily, host Kevin Ball engages in an insightful conversation with Daniel Shalef, the founder of Zep, a startup focused on developing a memory layer for AI agents using temporal knowledge graphs. Released on March 25, 2025, the discussion delves deep into the challenges of contextual memory in AI, the implementation and advantages of temporal knowledge graphs, and the future of agentic applications.
Daniel Shalef opens the discussion by highlighting a fundamental issue in current AI models:
[00:00] Daniel Shalef: "Contextual memory in AI is a major challenge because current models struggle to retain and recall relevant information over time."
He contrasts human memory's ability to build long-term semantic relationships with AI systems' reliance on fixed context windows, which often leads to the loss of crucial past interactions. This limitation hinders AI agents from maintaining coherent and contextually aware interactions over extended periods.
Shalef introduces Zep, his second startup, which aims to pave the way for an agentic future by ensuring that AI agents have access to the right information at the right time. He elaborates:
[01:32] Daniel Shalef: "Zep is a memory layer for agentic applications, and we focus on mid-sized companies in the enterprise. It's a very exciting space to be in."
The goal is to enable AI agents to autonomously interpret instructions, make decisions, and take actions to achieve set goals, all while retaining and utilizing long-term contextual information.
Kevin Ball seeks clarity on the term "agent" as used by Shalef. Daniel provides a comprehensive definition:
[02:22] Daniel Shalef: "AI agents today have an LLM as their brain, but they also have the ability to autonomously interpret instructions and make decisions... and take actions to achieve goals that have been set for them."
He breaks down the essential components of AI agents:
Shalef emphasizes the importance of memory in facilitating effective decision-making and planning within AI agents.
The conversation shifts to the core of Zep's solution: knowledge graphs. Shalef explains their significance in modeling complex relationships:
[08:00] Daniel Shalef: "Knowledge graphs are data structures that allow you to semantically model complex relationships... they can be used as a memory layer because they enable dense and well-described semantic datasets that are ideal for retrieval."
He highlights the advantages of knowledge graphs in enabling efficient data retrieval and the challenge of defining ontologies (the categories and relationships within the graph). With the advent of large language models (LLMs), Zep leverages these models to automate and scale the construction of knowledge graphs, overcoming the limitations of manual or inflexible methods.
Kevin Ball probes deeper into how Zep automates ontology creation. Daniel introduces Graphiti, Zep's open-source library:
[10:55] Daniel Shalef: "Graphiti builds the ontology on the fly by doing named entity recognition across the data it processes, intelligently understanding relationships and deduplicating entities."
Graphiti's ability to dynamically create and manage ontologies allows Zep to handle diverse and unpredictable data inputs, ensuring that the knowledge graph remains consistent and query-friendly. This adaptability is crucial for applications where data varies significantly, such as in human-agent interactions.
Ball and Shalef discuss the extraction and management of entity types within Graphiti:
[13:10] Kevin Ball: "How do you think about that type extraction and are those types themselves then morphable?"
[15:18] Daniel Shalef: "Developers can define custom types that are well-described... Graphiti will still extract entities that don't match existing types, allowing for organic ontology development while supporting structured types for specific business needs."
Shalef explains that Graphiti not only extracts entities but also allows for the creation of custom, well-defined types using Python's Pydantic models. This flexibility ensures that as new data emerges, the knowledge graph can adapt without compromising its structural integrity.
A significant advancement introduced by Zep is the incorporation of temporality into knowledge graphs:
[16:50] Daniel Shalef: "Graphiti is a temporal knowledge graph... it can handle dynamic data by understanding when information becomes valid or invalid."
Shalef illustrates this with an example where a user's preference changes over time. Graphiti records both the creation and invalidation of relationships, enabling the AI agent to maintain an accurate and current understanding of user preferences.
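The idea of recording both when a relationship becomes valid and when it is invalidated can be sketched in a few lines. This is only an illustrative toy, not Graphiti's implementation or API: the `Fact` and `TemporalGraph` classes, the `valid_at` and `invalid_at` field names, and the brand values are all hypothetical, and the rule that any new fact with the same subject and predicate supersedes the old one is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    """A relationship between two entities with a validity interval (hypothetical)."""
    subject: str
    predicate: str
    obj: str
    valid_at: datetime                      # when the fact became true
    invalid_at: Optional[datetime] = None   # when it stopped being true, if ever


class TemporalGraph:
    """Toy store that invalidates superseded facts instead of deleting them."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def add_fact(self, subject: str, predicate: str, obj: str, at: datetime) -> None:
        # Assumption: a new fact with the same subject and predicate
        # supersedes any fact that is still open. History is preserved.
        for fact in self.facts:
            if (fact.subject, fact.predicate) == (subject, predicate) and fact.invalid_at is None:
                fact.invalid_at = at
        self.facts.append(Fact(subject, predicate, obj, valid_at=at))

    def facts_at(self, subject: str, predicate: str, when: datetime) -> list[str]:
        # A fact holds at `when` if it started on or before `when`
        # and had not yet been invalidated at that moment.
        return [
            f.obj
            for f in self.facts
            if f.subject == subject
            and f.predicate == predicate
            and f.valid_at <= when
            and (f.invalid_at is None or when < f.invalid_at)
        ]


graph = TemporalGraph()
graph.add_fact("user", "prefers_brand", "BrandA", datetime(2024, 1, 10))
graph.add_fact("user", "prefers_brand", "BrandB", datetime(2024, 6, 1))

past = graph.facts_at("user", "prefers_brand", datetime(2024, 3, 1))
now = graph.facts_at("user", "prefers_brand", datetime(2024, 7, 1))
```

Because the older fact is closed rather than removed, the same store can answer both "what does the user prefer now?" and "what did the user prefer in March?".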
When discussing the technical implementation, Shalef emphasizes Graphiti's scalability and efficient querying mechanisms:
[27:13] Daniel Shalef: "Graphiti uses semantic and full-text indexing, allowing for near-constant time queries and the ability to retrieve relevant subgraphs quickly."
He mentions that Zep has benchmarked Graphiti against existing solutions like MemGPT and found superior performance, especially in handling long-term memory recall with reduced latency and cost.
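When results come from two separate indexes, one semantic and one full-text, they have to be merged into a single ranking. The episode summary does not say how this merge is done, so the sketch below assumes one widely used technique, reciprocal rank fusion (RRF). The function name, the fact identifiers, and the constant `k=60` are illustrative choices, not details from the source.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of IDs into one list using RRF.

    Each item scores 1 / (k + rank) for every list it appears in,
    so items ranked highly by more than one index rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical results from the two indexes, best match first.
semantic_hits = ["fact_2", "fact_1", "fact_3"]
full_text_hits = ["fact_1", "fact_4", "fact_2"]

fused = reciprocal_rank_fusion([semantic_hits, full_text_hits])
```

RRF uses only rank positions, so it does not require the similarity scores of the two indexes to be on a comparable scale.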
Shalef shares insights into the emerging domains leveraging agentic applications:
[40:53] Daniel Shalef: "Ambient agents are particularly exciting... they monitor environments and take proactive actions, such as home automation or vehicle telemetry monitoring."
He also touches on significant interest in sectors like healthcare, where automating processes like insurance claims analysis can yield substantial efficiency gains. Additionally, Zep is exploring applications in national security and enterprise analytics, where autonomous agents can process vast data sets to generate actionable reports.
Despite Zep's advancements, Shalef acknowledges ongoing challenges:
[37:57] Daniel Shalef: "We're focusing on bridging the gap between prototypes and production, ensuring compliance, data governance, and security for enterprise clients."
He notes that deploying comprehensive memory systems in real-world environments involves not only technological hurdles but also navigating organizational requirements and ensuring data integrity and privacy.
Towards the end of the discussion, Shalef delves into the philosophical aspects of creating agentic memory systems:
[45:15] Daniel Shalef: "We're dealing with dynamic data and how agents understand the world, which ties into AI safety and the broader impact on society."
He expresses keen interest in exploring how agents can reconcile different human perspectives and the potential for integrating Bayesian reasoning to enhance probabilistic understanding within knowledge graphs.
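The Bayesian framing raised in the conversation, where supporting factors lead to something like an 80% belief that can later be validated or invalidated, can be illustrated with a plain Bayes' rule update. This is a sketch of the general idea only; the function name and the likelihood values are made up for illustration, and nothing here reflects how Graphiti reconciles facts.

```python
def update_belief(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Return the posterior probability that a fact is true after one observation.

    prior: current belief that the fact is true, between 0 and 1.
    p_evidence_if_true: chance of seeing this evidence if the fact is true.
    p_evidence_if_false: chance of seeing this evidence if the fact is false.
    """
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator


# Start undecided, then see evidence that is four times more likely if the fact is true.
belief = update_belief(prior=0.5, p_evidence_if_true=0.8, p_evidence_if_false=0.2)

# A second source weakly contradicts the fact, so the belief drops but does not collapse.
revised = update_belief(prior=belief, p_evidence_if_true=0.3, p_evidence_if_false=0.7)
```

Storing the likelihoods alongside each fact would give the support structure described in the discussion: a record of which observations raised or lowered the belief.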
The episode concludes with a reflection on the transformative potential of temporal knowledge graphs in creating robust, memory-enabled AI agents. Daniel Shalef's insights provide a comprehensive overview of the technical innovations and practical applications driving the next generation of intelligent systems. As Zep continues to refine Graphiti and navigate the complexities of production deployment, the future of agentic memory in AI looks promising, poised to revolutionize how machines interact, learn, and assist in diverse domains.