Software Engineering Daily
DeepMind’s RAG System with Animesh Chatterjeet and Ivan Solovyev
Date: March 12, 2026
Host: Shawn Falconer
Episode Overview
This episode explores the design philosophy, challenges, and future of DeepMind’s “File Search” — a fully managed Retrieval Augmented Generation (RAG) system integrated into the Gemini API. Guests Animesh Chatterjeet (Engineering Lead) and Ivan Solovyev (Product Manager) explain the technical decisions behind File Search, how it addresses the complexity and costs of RAG adoption, innovations in embeddings, trade-offs of abstraction vs. configurability, and the push toward multimodal retrieval. The discussion offers practical insight for engineers implementing or migrating RAG pipelines.
Key Discussion Points & Insights
1. What is DeepMind’s File Search and What Problem Does It Solve?
- **Simplifying RAG Adoption**
- File Search is “an integrated RAG solution that makes it super easy for you to take loads and loads of data, text, PDFs, code, whatever you have, upload it into Gemini and start asking questions about your data.” (Ivan, 02:35)
- Eliminates infrastructure burdens: “We removed a lot of complexity … you don’t need to set up your database, you don’t need to set up your infrastructure.” (Ivan, 02:52)
- **Radically Transparent Pricing**
- “The pricing is usually fairly complex…what we did was we decided to simplify the whole model.” You pay for indexing when uploading, and for tokens when querying. No storage or miscellaneous fees. (Ivan, 03:30–04:10)
- “The price is actually much cheaper. So it's a good competitive advantage.” (Ivan, 04:35)
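The two-part pricing model described above can be sketched as a small cost estimator. The rates below are hypothetical placeholders, not published Gemini API prices; the point is only the shape of the model: a one-time indexing charge at upload, per-token charges at query time, and no storage fee.

```python
def estimate_cost(index_tokens: int, query_tokens: int,
                  index_rate: float = 0.15, query_rate: float = 0.30) -> float:
    """Two-component cost model: one-time indexing plus per-query tokens.

    Rates are hypothetical (USD per 1M tokens). There is deliberately no
    storage term, matching the pricing model described in the episode.
    """
    return (index_tokens / 1e6) * index_rate + (query_tokens / 1e6) * query_rate

# A 2M-token corpus indexed once, then a single 10k-token query:
cost = estimate_cost(index_tokens=2_000_000, query_tokens=10_000)
print(f"${cost:.4f}")
```

Because indexing is paid once per upload, the per-query marginal cost stays small even for large corpora.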
2. RAG: Evolution, Relevance, and Technical Progress
- **RAG is Not Dead—It’s Foundational**
- “RAG is a fundamental capability. RAG has been there from the very beginning … it was always useful to some extent.” (Ivan, 05:37)
- Valuable for enterprise-scale: “Having RAG becomes very, very beneficial … costs become much better with RAG; if you put everything into the context, it becomes expensive very, very fast.” (Ivan, 06:53)
- **Technique Evolution**
- Modern LLMs haven’t killed RAG. Instead, “they have encouraged RAG to cover even more use cases.” Models can struggle with ‘lost in the middle’ syndrome in long contexts, so smarter chunking is required. (Animesh, 07:30–07:50)
- Notable mention of recent academic work: “There was a recent paper last year, which is called a ReFRAG…they are trying to embed the chunks and give all these embeddings to the model and let the model decide which … to expand.” (Animesh, 07:55)
- **Collaboration, Not Competition**
- “RAG is in some sense a tool that we give it to the model ... I wouldn’t really see them as competitors.” (Animesh, 09:14)
3. Technical Details: Chunking, Embeddings, and Retrieval
- **Chunking Strategy**
- Uses Gemini embedding models; default chunking and configuration optimized based on internal evals. Most users don’t need to tune. (Animesh, 10:19)
- “We have run a bunch of evals to kind of find the sweet spot in terms of latency of how many chunks we want to retrieve versus the quality we see.” (Animesh, 10:45)
- “Somewhere around five chunks returned … was serving fairly well.” (Ivan, 12:20)
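The retrieval loop described above — embed the query, score it against chunk embeddings, keep roughly the top five — can be sketched in plain Python. This illustrates vector-based semantic search in general, not File Search's internal implementation; the `k=5` default mirrors the "around five chunks" sweet spot mentioned in the episode.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, chunk_vecs, k=5):
    """Return indices of the k chunks most similar to the query vector."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

In a real pipeline the chunk vectors would come from an embedding model and live in a vector index; the ranking logic is the same.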
- **Configurability vs. Simplicity**
- “80% of quality is embeddings, 20% is your configuration … playing with those configurations will not yield significant improvement.” (Ivan, 13:17)
- Advanced users may still need full control and can opt for more configurable (“complex”) pipelines. (Ivan, 13:55)
- **File Types & Multimodal Support**
- “We are indexing text files mostly … PDFs, docs, code files … we are currently doing OCR on images … we are actually working on getting the multimodal support.” (Ivan, 14:25)
- Current chunking is generic; “things like code are working fine” but work remains for complex structures like graphs/tables. (Animesh, 15:44; Ivan, 16:13)
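As a rough illustration of what a generic, structure-unaware chunker looks like, here is a fixed-size sliding-window splitter with overlap. File Search's actual chunking strategy is not public; the sizes here are arbitrary, and real systems typically split on sentence or token boundaries rather than raw characters.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size chunks with overlap between neighbors.

    Overlap keeps context that straddles a boundary retrievable from
    either side; it is why generic chunking handles prose and code
    reasonably but struggles with tables and graphs.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

For example, `chunk_text("abcdefghij", size=4, overlap=1)` yields three chunks whose edges share one character.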
4. Indexing, Corpus Management, and Updates
- **Handling Updates**
- Efficient ingestion via parallelization and Spanner for immediate consistency. (Animesh, 19:46)
- “You are not updating the document, you are inserting a new version … if they want, [developers] could delete the earlier document. … We are not doing data diff at our end.” (Animesh, 20:48–21:17)
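The update model Animesh describes — insert a new version rather than mutate in place, with no server-side diffing — can be sketched as a tiny versioned store. The class and method names are invented for illustration.

```python
class VersionedCorpus:
    """Toy sketch of insert-only document versioning.

    Each upload appends a new version; callers decide whether to delete
    older versions. No diffing is done on the store's side.
    """

    def __init__(self):
        self._versions: dict[str, list[str]] = {}

    def upload(self, doc_id: str, content: str) -> int:
        """Insert a new version and return its version number."""
        self._versions.setdefault(doc_id, []).append(content)
        return len(self._versions[doc_id]) - 1

    def latest(self, doc_id: str) -> str:
        return self._versions[doc_id][-1]

    def delete_version(self, doc_id: str, version: int) -> None:
        """Explicit cleanup of an older version, at the caller's option."""
        self._versions[doc_id].pop(version)
```

The design choice is the same trade-off the episode notes: insert-only semantics keep ingestion simple and consistent, at the cost of leaving cleanup to the developer.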
- **Retrieval Method**
- “Right now it's purely semantic search, which is vector based.” Possible future support for hybrid/keyword search; graph-based retrieval is not a fit for the product’s simplicity. (Animesh & Ivan, 21:47–22:22)
5. Citations and Attribution
- **Mapping Answers to Sources**
- “The models are trained to cite the responses … every sentence that they have used the original corpus to generate from.” (Animesh, 23:48)
- Each returned chunk is uniquely indexed; citations map directly to the original source via index. (Animesh, 24:32)
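The index-based citation scheme described above can be sketched as a lookup from citation indices back to source files. The table contents and file names here are invented for illustration; the mechanism (a unique index per retrieved chunk) is what the episode describes.

```python
def resolve_citations(cited_indices, chunk_table):
    """Map citation indices from a generated answer back to source files.

    chunk_table: list of (source_file, chunk_text) in the order the
    retrieved chunks were handed to the model; each chunk's position is
    its unique citation index.
    """
    return [chunk_table[i][0] for i in cited_indices]

# Hypothetical retrieved chunks and the indices the model cited:
table = [("handbook.pdf", "…"), ("api_guide.md", "…"), ("handbook.pdf", "…")]
sources = resolve_citations([2, 1], table)
```

Because the mapping is positional, no fuzzy matching between generated text and source text is needed to attribute a sentence.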
6. Real-World Use Case: Beam
- **AI-Driven Game Platform**
- Beam uses File Search to index engine code and docs, allowing new users to “very quickly pull all the relevant documentation into the context and present … how it works.” (Ivan, 25:30)
7. Performance: Latency, Quality, and Optimization
- **Retrieval Latency**
- “Latency is somewhat in line with the model latency. Couple seconds for the retrieval.” (Ivan, 26:49)
- **Retrieval Quality**
- “Up to like 85% depending on the use case of the retrieval… correct hits.” (Ivan, 26:47)
- **Improving Accuracy**
- Strategies: use best embedding models, optimize retrieval vs. latency, filter results post retrieval, prompt verification, simple cutoff rather than complex re-ranking. (Animesh, 27:24; Ivan, 28:13; 29:03)
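The "simple cutoff rather than complex re-ranking" strategy can be sketched in a few lines: drop retrieved chunks whose similarity score falls below a threshold instead of running a second ranking model. The threshold value is an illustrative assumption, not a documented default.

```python
def filter_by_cutoff(scored_chunks, min_score=0.6):
    """Post-retrieval filtering with a plain similarity cutoff.

    scored_chunks: list of (chunk, score) pairs from retrieval.
    min_score is hypothetical; in practice it is tuned against evals.
    Keeps retrieval cheap compared to a learned re-ranker.
    """
    return [(chunk, s) for chunk, s in scored_chunks if s >= min_score]
```

The trade-off discussed at 29:03 is exactly this: a cutoff costs almost nothing at query time, while a re-ranker adds latency for a quality gain that may not matter for most use cases.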
- **Fine-Tuning Embeddings?**
- “We have a general recommendation … that people shouldn’t do fine tuning. … The speed of progress of the models is so much faster than what individual smaller labs can do.” (Ivan, 30:05)
- “If your use case is very, very niche” then possibly, but gains quickly disappear with new model versions. (Ivan, 30:45)
8. Innovations in Embedding Models
- **Recent Model Improvements**
- “We've added the multimodal embedding support now … we started representing embeddings … as this Matryoshka representation” — you can truncate an embedding vector to trade off size vs. performance. (Animesh, 32:48)
- Noted improved multilingual capabilities; English works well, still expanding for other languages. (Animesh, 34:07)
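Matryoshka-style truncation, as described above, means the leading components of an embedding carry most of the signal, so you can keep only a prefix of the vector and re-normalize. A minimal sketch of the consumer-side operation (the training technique that makes prefixes meaningful lives in the embedding model itself):

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` components of an embedding and re-normalize.

    With Matryoshka-trained embeddings, the truncated vector remains a
    usable (smaller, slightly less accurate) representation, trading
    storage and compute for quality.
    """
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]
```

A 3072-dimensional vector truncated to 768 dimensions cuts vector-store size by 4x while keeping cosine similarities meaningful.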
9. Current Limitations & Future Work
- **Hard Problems Ahead**
- Multimodal (native image/video/audio) support, better handling of structure (tables, graphs), and improved internationalization are major open areas. (Animesh, 34:07; Ivan, 34:46)
- **Scalability**
- 1TB quota per user, 100MB per file, 20GB recommended corpus size for best latency. Multiple corpora can be queried in parallel. (Animesh, 38:14; Ivan, 37:59)
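Querying multiple corpora in parallel can be sketched as a fan-out/merge over stores, which is one way to stay under the ~20GB-per-corpus size recommended for best latency. Here `search_fn` is a stand-in for a single File Search call, not a real API:

```python
from concurrent.futures import ThreadPoolExecutor

def query_stores(query, stores, search_fn):
    """Fan a query out to several corpora concurrently and merge results.

    search_fn(store, query) -> list of (chunk, score) pairs; it stands in
    for one retrieval call against one store. Results are merged and
    re-sorted by score so the caller sees a single ranked list.
    """
    with ThreadPoolExecutor(max_workers=len(stores)) as pool:
        futures = [pool.submit(search_fn, store, query) for store in stores]
        results = []
        for f in futures:
            results.extend(f.result())
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Since each sub-query is independent, the overall latency is roughly that of the slowest store rather than the sum.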
- **Getting Started**
- “Go to AI Studio and play with the file search applets … start hitting the Gemini API.” (Ivan, 38:52–39:06)
- Code samples and generated code support in AI Studio. (Animesh, 39:08; Ivan, 39:16)
Notable Quotes & Memorable Moments
- **On RAG’s Role:**
  “RAG is a fundamental capability. RAG has been there from the very beginning.”
  — Ivan Solovyev (05:37)
- **On Simplicity:**
  “We made some opinionated decisions … The tool is just there. You just upload your data and you can use it right away.”
  — Ivan Solovyev (02:52)
- **On Pricing:**
  “We removed most of the things that you are paying for and we focused on two simple aspects … you're not paying for storage, you're not paying for anything else.”
  — Ivan Solovyev (03:30)
- **On Model Progress:**
  “The speed of progress of the models is so much faster than what individual smaller labs can do … fine-tuning won’t be that relevant anymore.”
  — Ivan Solovyev (30:05)
Important Timestamps
| Timestamp | Segment |
|-----------|---------|
| 02:35 | What is File Search – simplicity & pricing |
| 05:37 | RAG's foundational role, cost/scale dynamics |
| 07:30 | Evolution & new RAG techniques (ReFRAG) |
| 10:19 | Chunking, embedding selection, and error reduction |
| 12:20 | Default config typically "just works" |
| 13:17 | Trade-offs: configurability vs. ease of use |
| 14:25 | File types, OCR, and future multimodal plans |
| 19:46 | Indexing updates, parallelization, Spanner use |
| 21:47 | Search: vector/semantic, future hybrid roadmap |
| 23:48 | Citations and mapping generated answers to sources |
| 25:30 | Beam use case: education & code/document retrieval |
| 26:47 | Performance: latency and retrieval quality |
| 29:03 | Is re-ranking worth it? |
| 30:05 | Fine-tuning embeddings—worthwhile or not? |
| 32:48 | Matryoshka embeddings for space/accuracy trade-off |
| 34:07 | Remaining hard problems: multimodal, structure, intl. |
| 37:59 | Storage limits and scaling advice |
| 38:52 | How to get started and available resources |
Closing Thoughts
- Migration is easy: “You upload your data … [I] recommend using the embeddings model first … then try file search directly.” (Ivan, 35:50)
- General availability: File Search is GA with Gemini 2.5, in preview for 3.x; API access is open to all users. (Ivan, 36:38)
- DeepMind’s design priorities: simplicity, transparent pricing, broad accessibility, and continuous model and feature improvements.
“I just want to say that it's been really exciting to see the adoption that we're getting for file search. We actually received quite a lot of great feedback from developers and … it's been really nice to see how this works for their use cases.”
— Ivan Solovyev (39:32)
