
Hosted by Weaviate · EN

Başak Eskili joins the Weaviate Podcast to explore how one of the world’s largest travel platforms adopted vector search, retrieval-augmented generation, and agentic AI at production scale. The conversation begins with Booking.com’s shift from keyword matching to semantic retrieval as internal teams needed embeddings, similarity search, and eventually GenAI RAG workflows. Başak explains why OpenSearch was a practical first step on AWS, how adoption grew across teams, and why hundreds of millions of embeddings, strict latency requirements, complex filtering, and rising concurrency pushed the platform toward Weaviate. The discussion then moves into Booking.com’s partner-to-guest messaging agent, a production GenAI system that helps accommodation partners answer guest questions about check-in, parking, special requests, and reservation details. Başak breaks down the tool-calling architecture, where Weaviate retrieves relevant response templates while GraphQL APIs fetch property and booking context. The agent can suggest templates, craft grounded replies, or decline to answer and leave the conversation to a human, highlighting the practical role of human-in-the-loop design. Evaluation spans offline datasets, LLM-as-a-judge scoring, A/B testing, and live partner feedback. From there, Başak describes the platform engineering behind AI at Booking.com: a central MCP server for internal APIs and external tools, a GenAI gateway for model access, PII reduction, guardrails, prompt injection detection, logging, traceability, and cost tracking across large-scale LLM usage.
She also details Booking.com’s evaluation process of Weaviate, including 100 million embeddings, filtered vector search, multi-threaded concurrency testing, reads during writes, and cost-efficient infrastructure provisioning. The episode closes with Başak’s path from computer science and NLP to MLOps and AI platforms, then looks ahead to practical AI, personalized travel agents, and memory systems that capture user preferences, session context, semantic memory, and long-term personalization for future agentic travel experiences.

Dr. Nandan Thakur returns to the Weaviate Podcast fresh off defending his dissertation to discuss the evolution from neural retrieval to agentic search and his new work on Orbit, a synthetic training data pipeline for search agents. The conversation opens with reflections on his PhD journey, tracing the field's shift from ColBERT-style models and sparse retrievers through RAG and into today's agentic search paradigm where LLMs iteratively search, reason, and refine. The discussion dives deep into how Orbit generates multi-hop, riddle-style training queries using DeepSeek's API on a personal laptop over four to six months, making high-quality search agent training data accessible without massive compute budgets. Thakur draws a sharp distinction between deep research (broad, multi-tool report generation) and search agents (focused on search and browse tools to answer specific questions), then connects Orbit's multi-hop queries to BrowseComp's filter-style riddles where each clue narrows the answer space like a funnel. The conversation explores the design of deep research harnesses, chunking strategies, Anthropic's contextual retrieval for entity disambiguation, context compaction to manage bloated agent contexts, and memory services like Weaviate's Engram for compressing search results between reasoning rounds. From there, the episode tackles sequential versus parallel search trajectories, the pass@K approach to rollouts in GRPO training, and whether isolated trajectories should share progress through message passing. Thakur makes a compelling case for training search agents to produce keyword-focused queries optimized for BM25 versus semantic queries for dense retrieval: the idea that one query does not fit all search engines. The conversation closes on future directions: efficiency-focused Pareto frontiers for search agents, long-form report generation evaluation through TREC RAG, and the coming wave of multilingual and multimodal search benchmarks.

Zijian Chen and Xueguang Ma from the University of Waterloo join the Weaviate Podcast to discuss AgentIR and why retrieval systems need to be redesigned from the ground up for AI agents. The conversation opens with a striking reframe: agents have become the primary consumers of search, inserting themselves as middleware between humans and information. Humans used to query search engines directly; now they delegate to ChatGPT, which searches on their behalf. This means retrieval algorithms are no longer optimized for their actual users. The discussion distinguishes reasoning-intensive retrieval from reasoning-aware retrieval. Reasoning-intensive tasks like BRIGHT involve single-hop queries where the connection between query and document is obscure but still one step. AgentIR tackles a fundamentally different problem: extremely multi-hop queries from benchmarks like BrowseComp-Plus, where each hop strictly depends on the previous one. The key insight behind AgentIR is that agents reveal their entire reasoning process in their reasoning traces, unlike humans who never write out their thought process. Existing retrievers discard this rich signal entirely. AgentIR jointly embeds the query and reasoning trace, training a retriever from scratch to exploit this agent-specific context. From there, the conversation covers BrowseComp-Plus, which extends OpenAI's BrowseComp with a fixed corpus to enable disentangled evaluation of agents and retrievers separately, something impossible when both the web and the search provider are black boxes. Building the corpus required over 400 hours of human annotation to ensure every hop in every reasoning chain had its supporting documents present. The discussion then moves into agent context management, contrasting compaction approaches with just-in-time memory retrieval from paged memory, referencing InfoFlow and the AgentFold paper.
Xueguang shares a provocative take that neither single-vector nor multi-vector representations are optimal, arguing the field needs embeddings at the right granularity based on information density. The episode closes with Steven introducing AICI, Agent-Computer Interaction, as the successor to HCI, and Xueguang framing the open question of scaling search along two dimensions: deeper (more turns) versus wider (more parallel queries).

Shreya Shankar from UC Berkeley joins the Weaviate Podcast to discuss data agents, the Data Agent Benchmark, and DocETL. The conversation opens by defining what a data agent actually is: not just text-to-SQL over a single table, but an AI system that can reason across dozens of heterogeneous databases, flat files, and knowledge repositories to answer complex organizational questions. Shreya explains why this multi-database reality makes existing benchmarks insufficient, motivating the Data Agent Benchmark where the best-performing agent achieves only 34–37% pass@1 accuracy. From there, the discussion dives into where agents fail. They don't explore data properly, they generate broken regex patterns, they struggle with different SQL dialects, and they give up when datasets get large. Interestingly, agents tend to pull data into Pandas rather than use database operators directly, likely because LLMs are more fluent in Python than in the nuances of each SQL dialect. The conversation moves into semantic operators, natural-language variants of relational algebra operators (filter, map, join, aggregation), where predicates like "Is this a sports article?" replace handwritten regex, with implementations ranging from per-row LLM calls to synthesized code. Shreya then presents DocETL, a declarative system for processing unstructured data that uses LLM agents to propose query rewrite strategies like chunking, splitting, and map-then-reduce decompositions, optimizing for both accuracy and cost on long documents. This leads into a broader discussion of declarative versus imperative agent design: the tradeoff between letting agents write arbitrary Python and constraining them within frameworks that handle optimization and caching. The conversation also explores tribal knowledge (structuring learned facts about data quality into retrievable tables so agents can reuse discoveries across queries) and connects to recent work on using LLMs to discover new database query rewrite rules.
The episode closes with a reflection on how classical database principles like query optimization and cardinality estimation are finding new life in the age of LLM-powered data systems.
0:05 What are Data Agents?
2:10 Multi-Database Systems
9:44 Semantic Operators
13:18 Querying Databases with Python
17:05 DocETL
24:34 Advanced Text-to-SQL
29:30 Claude Code and Databases
34:34 Self-Driving Databases
42:00 Agent Memory for Querying Databases
53:48 Exciting Directions for AI

Amélie Chatelain and Antoine Chaffin from LightOn are leading the way in the next generation of search powered by Multi-Vector representations and Late Interaction. The podcast begins with what motivates them to work on Multi-Vector Search, then turns to specifics such as the combination of lexical and semantic search, and how Late Interaction pairs bi-encoder speed with cross-encoder accuracy. The discussion goes on to share insights about training multi-vector models and how they differ from their single-vector predecessors. The conversation then covers particular successes of Late Interaction such as code, reasoning-intensive, and multimodal retrieval. Agents are great at searching with grep, but they are even better with ColGrep! Reasoning-Intensive Retrieval is a step change in how we think about search systems, beautifully enabled by both Late Interaction models and Agentic Search. Further, Multimodal Search, such as matching text with videos, is seeing massive benefits from Multi-Vector representations. The podcast then dives into the cost of MaxSim and how efficient methods such as MUVERA and PLAID can help. The podcast concludes with a presentation of their recent work on ColBERT-Zero, pre-training with Late Interaction instead of Single-Vector Dense Embedding models. LightOn are also the developers of PyLate, the world's leading open-source library for training these kinds of models.
Chapters
0:00 An Introduction to Multi-Vector Search
6:00 Multi- vs. Single-Vector
8:55 Comparison with Cross Encoders
15:55 ColGrep for Coding Agents
30:34 Reasoning-Intensive Retrieval
42:02 Multimodal Multi-Vector
48:34 The Cost of Multi-Vector
53:26 MUVERA and PLAID
1:06:18 ColBERT-Zero and PyLate
1:08:35 ColBERT-Zero and PyLate
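For readers new to Late Interaction, here is a minimal sketch of the MaxSim score mentioned in this episode. It is written in plain Python rather than with PyLate, and the helper names (`normalize`, `dot`, `maxsim`) and the toy 2-dimensional token vectors are illustrative assumptions, not anything taken from the episode or from a real model.

```python
from math import sqrt

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: for each query token vector, keep its best
    match among the document token vectors, then sum those maxima."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)

# Toy token embeddings (hypothetical values, 2 dimensions for readability).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has an exact match per query token
doc_b = [[1.0, 1.0]]                          # one vector that only partly matches

print(maxsim(query, doc_a), maxsim(query, doc_b))
```

Each score needs one dot product per (query token, document token) pair, which is the cost that approaches such as MUVERA and PLAID aim to reduce.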

Doug Turnbull and Trey Grainger join the Weaviate Podcast to discuss all things AI-Powered Search! The conversation kicks off with designing search experiences: not all search queries are the same! Sometimes the user knows exactly what they want (a product ID, a specific file), other times they're exploring a broad category, and other times they need to compare and contrast options. AI is now making it possible to dynamically construct UIs around search results, moving toward what Trey describes as a "Minority Report"-style future where visualizations adapt on the fly to the query and the data. From there, the discussion dives into query understanding and domain modeling. Doug and Trey break down how LLMs can classify queries against existing taxonomies (like NAICS codes or Google's product taxonomy), while Trey explains a multi-tier RAG approach, using the index itself as grounding for query interpretation before executing the final retrieval. The conversation moves into agentic search, exploring whether iterative LLM-driven search loops reduce the need for ever-better embedding models, or whether simple tools like BM25 and grep are sufficient when paired with strong reasoning. Trey introduces wormhole vectors, a technique for traversing between sparse (lexical) and dense (semantic) vector spaces by treating query results as document sets with shared meaning, enabling exploration across vector spaces rather than treating them as orthogonal.
The discussion also covers reflected intelligence, the idea of making search systems self-learning by mining user behavioral signals (clicks, purchases, skipped results) to continuously improve relevance through techniques like signals boosting, collaborative filtering, and learning to rank. The episode wraps with a conversation about how coding agents are changing the way Doug and Trey work, and Trey's philosophy of designing intentional agentic workflows with atomic agents rather than just handing an LLM a bag of tools.
AI Powered Search (Discount Code = "weaviate")
https://aipoweredsearch.com/live-course?promoCode=weaviate

Thomas van Dongen is the head of AI engineering at Springer Nature and the creator of Pyversity! Pyversity is a fast, lightweight open-source Python library for diversifying retrieval results. Retrieval systems often return highly similar items. Pyversity efficiently re-ranks these results to encourage diversity, surfacing items that remain relevant but less redundant. It implements several popular diversification strategies such as MMR, MSD, DPP, and Cover with a clear, unified API.
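To make the idea of diversification concrete, here is a minimal sketch of greedy MMR (Maximal Marginal Relevance), one of the strategies named above. It is plain Python and does not use Pyversity's API; the function names, the `lam` trade-off parameter, and the toy scores and embeddings are all illustrative assumptions.

```python
from math import sqrt

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def mmr(relevance, embeddings, k, lam=0.5):
    """Greedy MMR: repeatedly pick the item that maximizes
    lam * relevance - (1 - lam) * (max similarity to items already picked)."""
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical retrieval results: items 0 and 1 are near-duplicates.
relevance = [0.9, 0.85, 0.6]
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]

print(mmr(relevance, embeddings, k=2, lam=0.5))  # [0, 2]: skips the near-duplicate
print(mmr(relevance, embeddings, k=2, lam=1.0))  # [0, 1]: pure relevance ranking
```

Lower values of `lam` weight diversity more heavily; `lam=1.0` reduces to the original relevance order.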

Matthew Russo is a Ph.D. student at MIT where he is researching the intersection of AI and Database Systems. AI is transforming Database Systems. Perhaps the biggest impact so far has been natural language to query language translations, or Text-to-SQL. However, another massive innovation is brewing. AI presents new Semantic Operators for our query languages. For example, we are all familiar with the WHERE filter. Now we have AI_WHERE, in which an LLM or another AI model computes the filter value without needing it to be already available in the database! `SELECT * FROM podcasts AI_WHERE “Text-to-SQL” in topics` Semantic Filters are just the tip of the iceberg: the roster of Semantic Operators further includes Semantic Joins, Map, Rank, Classify, Groupby, and Aggregation! And it doesn’t stop there! One of the core ideas of Relational Algebra, and of how it has influenced Database Systems, is query planning and finding the optimal order in which to apply filters. For example, let’s say you have two filters: the car is red and the car is a BMW. Now let’s say the dataset contains only 100 BMWs, but 50,000 red cars!! Applying the BMW filter first will limit the size of the set for the next filter! So many interesting nuggets in this podcast, loved discussing these things with Matthew, and I hope you find it interesting!
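The filter-ordering example above can be sketched in a few lines of Python by counting how many predicate evaluations each order needs. The 100 BMWs and 50,000 red cars come from the description; the total of 200,000 cars, the overlap of 10 red BMWs, and all function names are assumptions made purely for illustration.

```python
def make_cars():
    """Build a toy dataset of (make, color) tuples.
    Totals: 100 BMWs, 50,000 red cars, 200,000 cars overall (assumed)."""
    cars = []
    cars += [("BMW", "red")] * 10          # assumed overlap
    cars += [("BMW", "blue")] * 90
    cars += [("Other", "red")] * 49_990
    cars += [("Other", "blue")] * 149_910
    return cars

def is_bmw(car):
    return car[0] == "BMW"

def is_red(car):
    return car[1] == "red"

def run_filters(rows, predicates):
    """Apply predicates in the given order.
    Returns (surviving rows, total number of predicate evaluations)."""
    evaluations = 0
    for pred in predicates:
        evaluations += len(rows)           # every remaining row is checked once
        rows = [r for r in rows if pred(r)]
    return rows, evaluations

cars = make_cars()
bmw_first, cost_bmw_first = run_filters(cars, [is_bmw, is_red])
red_first, cost_red_first = run_filters(cars, [is_red, is_bmw])

print(len(bmw_first), cost_bmw_first)  # 10 200100
print(len(red_first), cost_red_first)  # 10 250000
```

Both orders return the same rows, but the more selective filter first does less work. If each evaluation were an LLM call, as with a semantic filter, the gap would translate directly into model cost and latency.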

Xiaoqiang Lin is a Ph.D. student at the National University of Singapore. During his time at Meta, Xiaoqiang led the research behind REFRAG: Rethinking RAG-based Decoding. Traditional RAG systems use vectors to retrieve relevant context with semantic search, but then throw away the vectors when passing the context to the LLM. REFRAG instead feeds the LLM these precomputed vectors, achieving massive gains in long context processing and LLM inference speed! REFRAG makes Time-To-First-Token (TTFT) 31x faster and Time-To-Iterative-Token (TTIT) 3x faster, boosting overall LLM throughput by 7x while also being able to handle much longer contexts! There are so many interesting aspects to this and I really loved diving into the details with Xiaoqiang! I hope you enjoy the podcast!

This episode dives into Weaviate's partnership with SAS! We are super excited about our recent collaboration on the SAS Retrieval Agent Manager (RAM), featuring a first-party integration with Weaviate! The podcast covers many aspects of Enterprise AI adoption, from what has changed to what has NOT changed with recent breakthroughs in AI systems!