Latent Space: The AI Engineer Podcast
Episode: Long Live Context Engineering – with Jeff Huber of Chroma
Date: August 19, 2025
Episode Overview
This episode dives deep into "context engineering"—the emerging discipline of designing, optimizing, and maintaining the information presented to AI models during inference—with Jeff Huber, Founder & CEO of Chroma. Chroma is a leading open-source vector database and search infrastructure for AI applications. The discussion covers Chroma’s evolution, technical innovations, the philosophy of context windows in modern AI, real-world problems in retrieval, the pitfalls of “RAG” dogma, and the future of memory in AI systems. The conversation is both illuminating and pragmatic, featuring lessons from startup building, developer experience, distributed systems, and personal reflections on meaning and mission.
Key Discussion Points & Insights
1. Chroma’s Mission and Evolution
[00:22–02:56]
- Origins & Motivation:
- Chroma began as a response to the messy gap between ML demos and reliable production systems.
- Jeff points to the “alchemy” in AI engineering, striving for a more systematic approach.
- “The gap between demo and production didn’t really feel like engineering. It felt a lot more like alchemy.” (Jeff, 00:49)
- Chroma aims to make the experience of deploying retrieval and search systems for AI feel like real engineering, not trial-and-error.
- Chroma’s Focus:
- They build a modern retrieval engine/search infrastructure for AI applications—distinct from classic search.
- Keys to their approach:
- Written in Rust, fully multitenant, with object storage as the key persistence tier.
- Emphasis on developer experience and reliability.
2. Modern Search Infrastructure for AI
[03:04–04:30]
- What’s Different about AI Search?
- “Modern” refers to leveraging new distributed systems primitives (e.g., separation of storage and compute).
- “For AI” means:
- The technology is different.
- The workload is different.
- The developer persona is different.
- The consumer of search results is now often a machine (language model), not a human.
- “Humans can only digest 10 blue links. Language models can digest orders of magnitude more.” (Jeff, 03:56)
3. Building in the AI Infrastructure Gold Rush
[05:03–08:08]
- Staying Focused
- Jeff discusses avoiding hype-driven pivots and the importance of “maniacal focus” on developer experience over racing to market with incomplete products.
- Hiring philosophy: build slow, prioritize culture and craftsmanship over headcount or trend-following.
- “We want Chroma’s brand and the craft expressed in our brand to be extremely well known.” (Jeff, 05:50)
4. Chroma Cloud and Developer Experience
[08:33–12:16]
- Product Metrics:
- Over 5 million monthly downloads, 20K GitHub stars, top usage in LangChain and LlamaIndex communities.
- Cloud Launch & Zero-Config Ethos:
- Chroma’s onboarding: pip install chromadb—no friction, just works everywhere.
- Cloud experience mirrors the local simplicity: no node-sizing, no “tuning many knobs.”
- “It needed to be like 0 config, 0 knobs to tune. It should just be always fast, always very cost effective and always fresh without you having to do or think about anything.” (Jeff, 10:20)
- Usage-based billing, true serverless.
5. Defining "Context Engineering" and Critique of RAG
[12:16–15:33]
- The Meme of Context Engineering:
- Context engineering = the task of deciding what information goes into the context window for each LLM generation, and how this process improves over time.
- “Context engineering is the job of figuring out what should be in the context window any given LLM generation step.” (Jeff, 13:10)
- Both “inner loop” (per prompt) and “outer loop” (longitudinal improvement).
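The "inner loop" described above can be pictured as a budgeted selection problem: from a pool of scored candidate chunks, pack the highest-value ones into a fixed token budget. The sketch below is illustrative only; the greedy scoring and word-count token estimate are stand-ins, not anything from Chroma's API.

```python
# Sketch of the context-engineering "inner loop": given candidate chunks
# already scored for relevance, pack the best ones into a fixed token
# budget. Scoring and token counting are crude illustrative stand-ins.

def build_context(candidates, budget_tokens):
    """candidates: list of (score, text); returns concatenated context."""
    chosen = []
    used = 0
    # Greedy: take highest-scoring chunks first, skip any that overflow.
    for score, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        n = len(text.split())  # crude token estimate
        if used + n <= budget_tokens:
            chosen.append(text)
            used += n
    return "\n\n".join(chosen)

candidates = [
    (0.9, "Chroma is a retrieval engine for AI applications."),
    (0.4, "Unrelated trivia that wastes the model's attention."),
    (0.7, "Context rot: more tokens can mean worse reasoning."),
]
context = build_context(candidates, budget_tokens=16)
```

The "outer loop" would then be everything that improves the scores and candidate pool over time: logging, golden datasets, and re-labeling.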
- RAG is Dead, Long Live Retrieval:
- Jeff dismisses the RAG (Retrieval Augmented Generation) label as confusing and unhelpful.
- “The term RAG. We never use the term RAG. I hate the term RAG.” (Jeff, 13:08)
- Advocates just calling it “retrieval” and pushing for clear, unambiguous primitives.
- Agent vs. Non-agent Systems:
- Jeff does not see a clear boundary between agent and non-agent applications in context engineering.
- “I don’t actually know what agent means still...Words, they matter.” (Jeff, 15:00)
6. Context Rot and Model Performance Realities
[15:33–19:20]
- Context Rot:
- As more tokens are stuffed into LLM context, attention and reasoning suffer—contrary to vendor marketing.
- “As you use more and more tokens, the model can pay attention to less and also can reason sort of less effectively. I think this really motivates the problem. Context rot implies the need for context engineering.” (Jeff, 13:50)
- Chroma published technical research on context rot, showing real performance degradation across models as input length grows.
- Marketing vs. Reality:
- Vendors push “our model works with 1 million tokens!” claims, but real agent/system designers encounter limits quickly.
- Call to the Community:
- Chroma intentionally does not push a product in this research; it’s about highlighting problems so the community can collaborate.
- “We intentionally wanted to make very clear that we do not have any commercial motivations in this research. We do not posit any solutions. We don’t tell people to use Chroma.” (Jeff, 19:20)
7. Patterns and Best Practices in Context Engineering
[23:14–29:36]
- Field Observations:
- Many teams still “yeet” everything into the context window, ignoring cost and efficiency.
- The fundamental optimization: from N candidate chunks, pick the best Y for the current step.
- Two-Stage Retrieval Rising:
- First stage: classic signals or vector/metadata search narrows to hundreds.
- Second stage: LLM-based reranking (even brute force over 300) to pick the final context.
- “LLMs become 100,000 times faster and cheaper...people are just going to use LLMs for re-rankers...” (Jeff, 25:34)
- Dedicated reranker models may be replaced over time as LLMs get cheaper and faster.
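The two-stage pattern above can be sketched in a few lines. Here the first stage uses lexical overlap as a cheap stand-in for vector or metadata search, and `llm_rerank` is a stub heuristic where a real system would prompt an LLM to judge each candidate; both functions are hypothetical, not Chroma APIs.

```python
# Two-stage retrieval sketch: a cheap first stage narrows a large corpus
# to a shortlist, then a (stubbed) reranker picks the final context.
# Both scoring functions are illustrative stand-ins, not real APIs.

def first_stage(query, corpus, k=100):
    """Cheap recall pass: rank documents by query-term overlap."""
    q = set(query.lower().split())
    scored = [(len(q & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def llm_rerank(query, shortlist, k=3):
    """Stub for an LLM reranker: a real system would prompt a model to
    score each candidate. Here: prefer higher-overlap, shorter docs."""
    q = set(query.lower().split())
    key = lambda d: (len(q & set(d.lower().split())), -len(d))
    return sorted(shortlist, key=key, reverse=True)[:k]

corpus = [
    "Chroma separates storage and compute using object storage.",
    "Context rot degrades reasoning as token counts grow.",
    "Regex search still dominates code retrieval workflows.",
]
shortlist = first_stage("context rot and reasoning", corpus, k=2)
final = llm_rerank("context rot and reasoning", shortlist, k=1)
```

The design point from the episode is that the second stage can afford to be brute force (even an LLM call per candidate over a few hundred items) because the first stage already bounded the cost.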
- Code Context is Special:
- Regex and classic code-search paradigms are still dominant.
- Chroma supports regex search and forking/indexing per code commit, enabling very fast reindexing and versioned search.
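To make the per-commit forking idea concrete, here is a hypothetical stdlib-only sketch (not the Chroma API): each commit forks the previous snapshot and overlays only its changed files, so "reindexing" a commit costs only its diff, and regex search runs against any version.

```python
import re

# Hypothetical sketch of versioned code search: each commit forks the
# previous index (shallow copy) and overlays only changed files, so
# reindexing touches just the diff. Not the actual Chroma API.

class VersionedIndex:
    def __init__(self):
        self.commits = {}  # commit id -> {path: source}

    def fork(self, base, new, changed_files):
        """Create `new` from `base`, overlaying only the changed files."""
        snapshot = dict(self.commits.get(base, {}))
        snapshot.update(changed_files)
        self.commits[new] = snapshot

    def grep(self, commit, pattern):
        """Regex search within one commit's snapshot."""
        rx = re.compile(pattern)
        return sorted(path for path, src in self.commits[commit].items()
                      if rx.search(src))

idx = VersionedIndex()
idx.fork(None, "c1", {"main.rs": "fn main() {}", "lib.rs": "pub fn add() {}"})
idx.fork("c1", "c2", {"lib.rs": "pub fn add_all() {}"})
hits = idx.grep("c2", r"fn add_all")
```

The same query against commit "c1" returns nothing, which is the point: search results are pinned to a code version rather than to whatever was indexed last.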
8. Embeddings, Developer Experience, and Golden Datasets
[30:02–34:03]
- Embeddings in Code Search:
- For code, regex and lexical search handle 85–90% of queries; embeddings can add incremental improvements for top teams.
- Who queries and their knowledge/expertise dictate the best approach.
- Chunk Rewriting & Ingestion-Time Enrichment:
- Suggestion: use LLMs to generate text descriptions of code, embed both source and generated text for richer search.
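A minimal sketch of that enrichment idea: index both the raw code and a natural-language description of it, and let a query match either representation. The descriptions here are hand-written stand-ins for LLM output, and word overlap stands in for embedding similarity; none of this is a real pipeline.

```python
# Ingestion-time enrichment sketch: store each code chunk alongside a
# generated description, so English queries can still find the code.
# Descriptions stand in for LLM output; overlap stands in for embeddings.

def overlap(a, b):
    """Stand-in for embedding similarity: count shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def index_chunk(store, code, description):
    """Index the raw code and its generated description together."""
    store.append({"code": code, "description": description})

def search(store, query):
    """Score each chunk by the better of its two representations."""
    best = max(store, key=lambda c: max(overlap(query, c["code"]),
                                        overlap(query, c["description"])))
    return best["code"]

store = []
index_chunk(store, "def retry(fn, n): ...",
            "retries a failing function call up to n times")
index_chunk(store, "def memo(fn): ...",
            "caches results of repeated function calls")
hit = search(store, "how do I retry a failed call")
```

Note that the query never matches the code text itself; the generated description carries the match, which is exactly why enrichment helps.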
- Golden Datasets:
- Small, high-quality, hand-labeled datasets are underrated yet essential for benchmarking and iterative improvement.
- “Looking at your data is important. Having golden datasets… call it the Ten Commandments of AI Engineering...” (Hosts, 33:47)
- Data labeling parties are commonplace at successful AI organizations.
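A golden dataset harness can be very small and still useful. The sketch below, with a toy corpus and a trivial stand-in retriever, computes recall@k over hand-labeled query-to-relevant-id pairs; everything here is hypothetical scaffolding for the practice the hosts describe.

```python
# Golden-dataset harness sketch: a few hand-labeled (query -> relevant
# ids) pairs, and recall@k over whatever retrieval function you are
# iterating on. The retriever is a trivial stand-in.

GOLDEN = [
    {"query": "context rot", "relevant": {"doc2"}},
    {"query": "object storage", "relevant": {"doc1"}},
]

DOCS = {
    "doc1": "Chroma uses object storage as its persistence tier.",
    "doc2": "Context rot means long inputs degrade model reasoning.",
    "doc3": "Hiring distributed systems engineers is hard.",
}

def retrieve(query, k=2):
    """Stand-in retriever: rank docs by shared lowercase words."""
    q = set(query.lower().split())
    ranked = sorted(DOCS, key=lambda d: len(q & set(DOCS[d].lower().split())),
                    reverse=True)
    return ranked[:k]

def recall_at_k(golden, k=2):
    """Fraction of labeled queries whose relevant doc appears in top k."""
    hits = sum(bool(set(retrieve(g["query"], k)) & g["relevant"])
               for g in golden)
    return hits / len(golden)

score = recall_at_k(GOLDEN, k=2)
```

The value of the harness is the loop it enables: change the retriever, rerun, and watch one number instead of eyeballing outputs.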
9. Future Architectures, Retrieval, and Memory
[34:16–41:53]
- From Encoder-Decoder to Direct Latent Space:
- Jeff muses on the “crude” state of current systems, predicting future architectures may keep operations in embedding space rather than reverting to natural language.
- “Why are we going back to natural language? Why aren't we just passing the embeddings directly to the models?” (Jeff, 35:20)
- Continuous/Iterative Retrieval:
- Moving from fixed “retrieve then generate” to systems that continuously retrieve as needed during generation.
- Offline Processing and Compaction:
- Concepts like memory “compaction” and batch processing borrowed from database systems likely will become fundamental in AI infra.
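In LSM-tree style databases, compaction means merging many small segments into fewer larger ones in the background; the analogue for AI memory would be condensing accumulated raw memories into compact summaries. The sketch below is a loose illustration of that shape, with a deduplicating stub where a real system might call an LLM to summarize.

```python
# Memory "compaction" sketch in the LSM-tree spirit: small level-0
# memory segments accumulate, and a background pass merges them into one
# larger level-1 segment. The condenser is a stub where a real system
# might call an LLM to summarize the merged text.

def condense(entries):
    """Stub condenser: deduplicate, keeping order of first appearance."""
    seen, out = set(), []
    for e in entries:
        if e not in seen:
            seen.add(e)
            out.append(e)
    return out

def compact(levels, threshold=3):
    """Merge level-0 segments into level 1 once there are too many."""
    if len(levels[0]) >= threshold:
        merged = [e for seg in levels[0] for e in seg]
        levels[1].append(condense(merged))
        levels[0] = []
    return levels

levels = {0: [["user likes Rust"], ["user likes Rust"], ["prefers dark mode"]],
          1: []}
levels = compact(levels)
```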
10. On Meaning, Motivation, and Building for the Long Term
[45:25–49:53]
- Personal Purpose and Reflection:
- Jeff draws on past startup experiences, emphasizing working only on things you care about, with people you admire, for users you enjoy serving.
- “...doing that work with people that you love spending time with and serving customers that you love serving is a very useful North Star.” (Jeff, 45:46)
- Faith, Conviction, and Societal Values:
- Discusses the rarity of projects and people oriented toward multi-generational impact and conviction (religious or otherwise), contrasting this with the shallowness and irony that pervade modern tech and startup culture.
- “It used to be commonplace that people would start projects that would take centuries to complete, and now that's less and less the case.” (Jeff, 48:23)
11. Company Values and Craft
[49:59–52:19]
- Intentionality in Everything:
- Chroma’s renowned design and developer experience are the product of deliberate curation and care for details at every touchpoint.
- “How you do one thing is how you do everything...it feels intentional and thoughtful.” (Jeff, 50:18)
- Jeff embraces being “curator of taste,” ensuring the brand has a coherent voice and experience.
12. Hiring Philosophy and the Elusive Distributed Systems Engineer
[52:35–56:09]
- Team Focus:
- Looking for deeply skilled product designers and distributed systems engineers (Rust, consensus algorithms, simulation testing, etc.).
- Acknowledges the small talent pool and the importance of focus and quality, not just headcount.
- Creative approaches, such as the SF Systems Group reading circle, help bring together the right talent.
Notable Quotes & Memorable Moments
- On shifting from demo to production:
  “The gap between demo and production didn’t really feel like engineering. It felt a lot more like alchemy.” – Jeff Huber (00:49)
- On mislabels in AI:
  “The term RAG. We never use the term RAG. I hate the term RAG.” – Jeff Huber (13:08)
- On context engineering’s core:
  “Context engineering is the job of figuring out what should be in the context window any given LLM generation step.” – Jeff Huber (13:10)
- On model marketing vs. practical reality:
  “There was this bit of this sort of implication where like, ‘Oh look, our model is perfect on this task, needle in a haystack, therefore the context window you can use for whatever you want.’ …That is not the case today.” – Jeff Huber (17:02)
- On brute force LLM reranking:
  “Application developers who already know how to prompt are now applying that tool to reranking. …This is going to be the dominant paradigm.” – Jeff Huber (25:34)
- On analogies and new AI concepts:
  “I always get a little bit nervous when we start creating new concepts and new acronyms for things. …If you squint, they’re all the same thing.” – Jeff Huber (40:48)
- On working and living with intention:
  “Only doing work that you absolutely love doing and only doing that work with people that you love spending time with and serving customers that you love…is a very useful North Star.” – Jeff Huber (45:46)
- On multi-generational building:
  “It used to be commonplace that people would start projects that would take centuries to complete, and now that’s less and less the case.” – Jeff Huber (48:23)
- On the company’s curation and taste:
  “How you do one thing is how you do everything and just ensuring that there’s a consistent experience of what we’re doing.” – Jeff Huber (50:18)
Key Timestamps
- 00:49 – Chroma’s origin story and philosophy
- 03:04 – What makes “search for AI” different
- 05:03 – Startup building in the vector database boom
- 08:33 – Adoption numbers and developer experience
- 10:20 – Zero-configuration principles in Chroma Cloud
- 13:08 – RAG critique and context engineering defined
- 15:33 – Context rot and why it matters
- 19:20 – Role of open research vs. marketing
- 23:14 – Practical state of context engineering
- 25:34 – LLM reranking as new paradigm
- 30:06 – Embeddings in code search
- 34:16 – Future architectures: staying in latent space
- 35:20 – Why not pass embeddings directly?
- 45:46 – Reflections on meaningful work and personal philosophy
- 48:23 – Multi-generational impact and conviction
- 50:18 – Design, brand, and curating taste
- 52:35 – Hiring, distributed systems, and community building
Final Thoughts
This episode provides both strategic and intensely practical insight into the bleeding edge of AI infrastructure, the emerging field of context engineering, and the mindset required to build high-impact developer tools in an uncertain and noisy space. Jeff Huber’s mix of technical rigor, startup wisdom, and human perspective will be invaluable for AI engineers, founders, and anyone navigating the world of modern software and intelligent systems.
For more, visit latent.space.
