
Narrator
Engineering teams around the world are building AI-focused applications or integrating AI features into existing products. The AI development ecosystem is maturing, which is accelerating how quickly these applications can be prototyped. However, taking AI applications to production remains a notoriously complex process. Modern AI stacks demand LLMs, embeddings, vector search, observability, new caching layers, and constant adaptation as the landscape shifts week to week. Increasingly, the data layer has become both the foundation and the bottleneck to AI app productionization. MongoDB has been expanding beyond its core document database into a full AI-ready database platform with integrated capabilities for operational data, search, real-time analytics, and AI-powered data retrieval. The company also recently acquired Voyage AI to provide accurate and cost-effective embedding models and re-rankers to its users. Fred Roma is a veteran engineer and is currently the SVP of Product and Engineering at MongoDB. He joins the show with Kevin Ball to talk about the state of AI application development, the role of vector search and re-ranking, schema evolution in the LLM era, the Voyage AI acquisition, how data platforms must evolve to keep up with AI's breakneck pace, and more. Kevin Ball, or K. Ball, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow K. Ball on Twitter or LinkedIn, or visit his website, Kball LLC.
Kevin Ball
Fred, welcome to the show.
Fred Roma
Hey Kevin, great to be here. Thanks for having me.
Kevin Ball
Yes, I am really excited to dig in with you on this. So let's maybe start with a quick background of you and how you got to where you are today at Mongo and then maybe we can use that as a way in.
Fred Roma
Yeah, absolutely. So I started as a software developer. It was a long time ago, in France, in Paris. I evolved, I became a manager, and I also worked on the product side, and I worked on different continents, at small startups and big companies. The biggest one was AWS. The small ones were probably French startups you haven't heard of. And I would say most of my career has been in the cloud, even before we called it cloud, back when it was more application service providers: plugging in some servers and giving customers access to them. It was mostly around security, things like payments, identity, and encryption, and more lately at MongoDB on data management and AI. So I'm having some fun here.
Kevin Ball
Yeah, so let's dive straight in there. We're talking about data management and AI, I think everyone's trying to figure out what is it to build an effective AI application right now with these new tools. What's your take on the core pieces you need?
Fred Roma
Yeah, I mean, it's never been easier to vibe code an application, that's for sure. And that's exciting. I mean, I don't know about you, but I just love it. It's going so fast and that's so thrilling. What we see, though, is that when you want to build something that is really for production, be it a large consumer application or a large enterprise application, it's still very hard. And I think we could go through a couple of reasons. There are other blockers later on, when you want to launch it in production, but the first one, when you build, is just how complex the stack is. Now you need an LLM, you need vector search, you need different kinds of AI models, you need an AI framework, you need new caching mechanisms. When you leave the vibe coding piece and you really want to build that in a professional manner, that can be a bit scary for a developer.
Kevin Ball
Yeah, absolutely. Well, let's talk about the angle that I think I understand you all are addressing, which is the data side of it. Because to your point, right, I have my database, I might have to do some embedding. I need a vector search situation. I need all those different pieces. So how do you think about the data stack for AI? What needs to be in there first.
Fred Roma
How do we think about the data stack? I think there are three things, and you'll tell me which one you want to dive into first. The first one: we think it should be simplified as much as possible. The second one is that you need to make sure it's really accurate and cost effective, because information retrieval can be pretty expensive if you don't take care of it. And the third is that you want to make sure it can evolve quickly. Things are going so fast, you never know which is the best LLM. You unplug for two days and there's a new one. You never know which tool you need to use. You never know how fast your data will grow. So I think it would be these three things. We want the stack to be simple. We want the accuracy of the information retrieval to be really, really good. And we want to make sure that if you change your mind, or if the ecosystem is changing, you can touch your application and make it evolve easily. Those would be the three key points.
Kevin Ball
Let's actually talk about that last piece, which is the evolvability piece, because this is one of the things I see a ton as we're doing AI agents internally, the things that I'm working on, schemas are not nearly as durable as they once were.
Fred Roma
Yeah, no, absolutely. I mean they are not durable either because you change your mind or you pivot your application as a developer and that's totally fine. But they're not durable as well because the ecosystem is changing so fast. When you want to connect to a new partner or new integration points or even using a new LLM framework, maybe you will have a new field to account for or maybe you want to adopt this new observability for LLM to do real good evaluation. So yeah, that is changing super fast. Absolutely.
Kevin Ball
I think there's even an aspect that I'm starting to see which is LLM derived schemas.
Fred Roma
Right.
Kevin Ball
Instead of having one mega schema that the developers come to, let the LLM choose it.
Fred Roma
No, absolutely. Yeah, it's definitely a trend right now. Plus, when you say let the LLM choose it, the reality is that you may have to play with several LLMs. So you may have to be able to handle several schemas, either because the LLM that was best this month is different from the one that was best last month, or because for some tasks you still need some specialization. And I think that will be more and more of a trend. You will have LLMs that are really good at some specific industries or use cases, and others that are really good at other things. But no, absolutely. I mean, it's been MongoDB's value proposition forever: go with a document model, and you don't have to stress about any change you will want to make. So I mean, we love it. Just to be clear, we love that the full AI world is speaking JSON, and we love that the full AI world is coming with all of these changes. Being able to evolve quickly was already important before, but that's even more the case in the AI world.
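To make the schema point concrete, here is a minimal sketch, assuming two hypothetical support-ticket documents written at different times by different extraction models. The field names (`llm`, `summary`, `tags`) and the helper `summary_of` are invented for illustration. The sketch only shows the general idea that documents of different shapes can sit side by side and that reading code can tolerate both.

```python
from typing import Any, Dict

# Two hypothetical documents describing the same kind of entity.
# The second one carries extra, model-generated fields the first never had.
ticket_v1: Dict[str, Any] = {
    "_id": 1,
    "title": "Router setup",
    "body": "How do I configure the router?",
}
ticket_v2: Dict[str, Any] = {
    "_id": 2,
    "title": "VPN drops",
    "body": "Connection resets every hour",
    "llm": {
        "model": "model-b",  # hypothetical model name
        "summary": "Intermittent VPN resets",
        "tags": ["vpn"],
    },
}


def summary_of(doc: Dict[str, Any]) -> str:
    """Return the model-written summary when present, else fall back to the title."""
    return doc.get("llm", {}).get("summary", doc["title"])


for doc in (ticket_v1, ticket_v2):
    print(doc["_id"], summary_of(doc))
```

Neither document had to be migrated when the new fields appeared; the reader simply treats them as optional.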
Kevin Ball
So let's talk a little bit then about the second step that you talked about, in terms of, okay, we want to take this to production, we want to be able to scale, we want to be able to deal with all of these different things. Because I think Mongo has long been, at least in the front-end world where I used to live a lot, the database of choice for rapid prototyping. And then at some point sometimes people would say, oh well, now we've got to switch over, we've arrived where we're going. But I think at this point you all scale all the way up.
Fred Roma
Yeah, oh yeah. Now we have, I think it's 75% of the Fortune 500. I've been at MongoDB for a year and a half, and it started far before me. But I can tell you we always speak about the big four, security, durability, availability, and performance, to be able to speak to large enterprises. And we see more and more of these very large workloads. You can totally manage transactions, including very demanding transactional financial workloads. So I think it's a, I mean, MongoDB is the open source. I was a customer of MongoDB long before even considering joining the company, and so I also had that initial impression: oh yes, I remember, we were going so fast. But it's a very serious database for scale, for sure.
Kevin Ball
So let's then talk about how to effectively use it in an AI application because I think this is a space where, okay, the model layer is pretty well understood if changing very, very rapidly, right? You have these LLMs, you can throw text at them, they love JSON, all these different pieces and then you have this kind of, okay, there's these cool applications being developed, but all those middle pieces and as you highlighted the like complex stack that's going into that is very much in flux.
Fred Roma
Absolutely. Yeah, absolutely. So even past what we discussed before, this document model and JSON that is, again, very well adapted and optimized for AI, you still need search and vector search, because you don't have any serious AI application that will really give you a lot of value if you just plug in an LLM. The value of these AI applications is: how do I connect what the LLM knows with what my company knows? And if you want to do that, you usually need search, and vector search, by the way. Maybe we can come back to that as well. But if, I don't know, you are building an e-commerce website and you are looking for red shoes, you probably want to see these burgundy sneakers as well. So you need search, you need vector search, and you need AI models, embedding models and re-ranking models, because this is how you get very good information retrieval and make sure that, whether you are building your RAG application, your semantic search, or your agentic system, the right information and grounded results are provided to your user. Depending on what you are building, you may need some stream processing if you have events. You may need many of these things. And so as a customer you can choose to stitch together different solutions: a database, vector search, search, a re-ranker, an embedding model, et cetera. I don't think you should do that, and I don't think you should tell your best friends to do that. But you could stitch these things together, connect your identity providers multiple times, and create pipelines to transfer the data. What we are really betting on, and what we see bringing value to customers, is just making it super simple. We don't try to own the whole stack you need for AI, but on the data layer you have a database that is becoming a data platform. 
And so you can do, yes, storing of your information, querying of your information, but also information retrieval and data in motion, and all of these AI model optimizations to make sure that you will get good results.
Kevin Ball
Let's go a little bit deeper in what that takes on search, because I love that you brought up search. I feel like to me, one of the things that I'm seeing is everybody's thinking of these as chat products. They're not chat products, they're search products at their core. They're about surfacing the correct information. And LLMs help you interpret that and put it into context for someone. So it's not as simple as throw it at an open source embedding model or OpenAI's embedding model, run naive queries and just go, there's a lot of pieces that go into effective search.
Fred Roma
Absolutely, absolutely. We can maybe take a concrete example, a simple one. Let's say you are a bank and you are building an application for your customer support, and your bank's customers will be able to ask questions about their account, their credit card, et cetera. Obviously you need an LLM, because the full interaction, the conversation, is an LLM interaction. So you need that for sure. But if you only have that and you don't have access to the internal documents of your bank, it's probably a bit weak as an experience. So you need search. I would say you need both: keyword search and vector search, which is really the semantic search. Because if I'm asking, okay, how much money do I have in my account, maybe the documents don't say "how much money" but "what is the balance of my account". These are very different words. So you will need search and vector search. You will probably also need some kind of reaction to events. Because if you see that something has been paid in the last two minutes and we have been discussing for five, you want to take that into account. So you need all of these pieces to come together: the search, the vector search, the stream processing.
Kevin Ball
So let's talk about the pieces that go into vector search, because I think once again, for folks who are coming into this at the beginning, they think, okay, what do I need to know? Right? I mean, the first interactions I had with search were back in the days of Solr and Lucene, right? And these things.
Fred Roma
Absolutely.
Kevin Ball
There was no vector search. It was all keyword based, but you had some synonyms and things like that. But to me as an end user, I was like, okay, throw it all at Solr, do a couple of configurations and I'm good. It's golden. It serves things up. I think there's more nuance than that. If you just throw all of your documents at OpenAI's embedding model and assume it's going to work, it's not going to just work. So what are the different pieces that go into that?
Fred Roma
Yes, there are a couple of different pieces. I will start with the basics. Different models will have different accuracy, meaning the quality of the results, and it's always a trade-off between how fast you want the results and how good you want these results to be. And that's exactly why we acquired Voyage AI a bit more than six months ago now. I mean, they were doing, and they're still doing, the best embedding models out there. They are more accurate than the OpenAI one that you mentioned. And you actually want the best result with a reasonable latency, because you probably have a user waiting for results, maybe behind a chatbot or maybe behind an agentic application. So accuracy is a big one. Multimodal is a big one as well. Most embedding models will be able to do a good job comparing text or maybe pictures, but the real world is messy. You have images and text and videos combined, and you will have PDFs, and as a developer you probably don't want to break that thing down and call different models. So that's also one: what are the formats that you can support? I would mention two more things. So, yeah, accuracy, multimodal, and then the ability to understand context is also very important. For instance, let's say you have this support chatbot application, and maybe now you are a networking company. And I say, okay, how do I configure this router? If you find somewhere in your corpus of data an exact sentence or blurb of text that explains how to configure that router, most embedding models will be super happy: oh, I found the information. The best embedding model will be able to say, well, wait a minute, I found a sentence that looks exactly like what the user is asking for, but is it part of recent documentation, or is it part of a ticket that is five years old where the setting is totally outdated? So I'd say context is very important.
And last on my list, but probably sometimes first in the customer's mind, is the cost of it. People sometimes don't realize it, but these embeddings can be even bigger than the data that they represent. If you have a great model that is able to do all of that, good accuracy and multimodal and good context, and you don't have many of them, but the embeddings have to be so big to achieve these results, it will just be an awful ROI for your application. So you also want a model that is able to do that with really short embeddings that are cheaper to store and cheaper to query.
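A quick back-of-the-envelope calculation shows why an embedding can outweigh the text it represents. The numbers below are assumptions chosen for illustration (a chunk of about 500 characters, vectors of 1024 or 256 dimensions, 4-byte floats or 1-byte quantized values). They are not the specifications of any particular model.

```python
def embedding_bytes(dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for one embedding vector, ignoring index overhead."""
    return dimensions * bytes_per_value


# A hypothetical chunk of text: 100 repetitions of "word " is 500 ASCII characters.
chunk_text = "word " * 100
text_bytes = len(chunk_text.encode("utf-8"))

full_float32 = embedding_bytes(1024)        # 1024 dims, 4 bytes each
short_float32 = embedding_bytes(256)        # 256 dims, 4 bytes each
short_int8 = embedding_bytes(256, 1)        # 256 dims, 1 byte each (quantized)

print(f"text chunk:          {text_bytes} bytes")
print(f"1024-dim float32:    {full_float32} bytes")
print(f"256-dim float32:     {short_float32} bytes")
print(f"256-dim int8:        {short_int8} bytes")
```

Under these assumptions the long vector is roughly eight times larger than the chunk it describes, while the short quantized vector is about half the size of the chunk. Multiplied across millions of chunks, that difference shows up directly in storage and query cost.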
Kevin Ball
Yeah. So that's interesting. I kind of want to explore a few different of those aspects and maybe we can explore them from the context of Voyage, because that is the recent acquisition and that was kind of, as I understand it, the secret sauce there. So first, starting with this multimodal piece, right? Because if I think about an application I'm building, I am probably doing a fair amount of pre processing, right. I'm like, oh, this is an image. I've got to translate this image to text. And now I've got to send this text over to my embedding model and now I've got to take that and do my vector search. And it's like this whole pipeline of things.
Fred Roma
Absolutely.
Kevin Ball
But it sounds like what you're saying is they've got a multimodal model that you can just throw whatever at it and it's going to translate it.
Fred Roma
Yeah. No, you nailed it. That's exactly what customers were doing. And when we speak to customers and they are describing that, let's go back to the PDF example, they say, oh, I have all of this pipeline: I will extract my pictures and my text from my PDF, and then I will run them through different kinds of embedding models, and I will try to reconcile the results afterwards. With the Voyage multimodal model, you just throw your PDF at the embedding model and you will have an embedding. By the way, the result will be even better than when you are doing the whole pipeline, because when you are breaking your document down, you will lose some context, you will lose some interaction. Where was this picture exactly? Was it above this text or below this text, that kind of thing. So the result will be better. But the big benefit is also that, as a developer, you can go super fast: just use your document and your embedding model and you're done.
Kevin Ball
Yeah, no, the simplicity definitely appeals to me. So let's explore the context piece a little bit more, because what I kind of hear you describing is that your embedding is now taking into account not just the text, or in the case of the PDF, text and images, but it sounded like things like metadata, updated timestamps, all these different things. What does this API call look like? What do I pass to it?
Fred Roma
Yeah, it's even more than that. So what you described, by the way, is the fact that you also want to take into account the metadata and some other information in addition to purely semantic search. It's super important and top of mind for customers, and that's why they're using vector search and search combined, by the way. And that's why it's so important for search and vector search to be where your operational data is. If you're using a separate vector search, you will have to ask: oh, what is all this data or metadata that I also have to synchronize over there? When it's in your database, you can do all this stuff that you are mentioning. The context is a bit different. It's part of the embedding model. It's just the way the model is trained. And then at inference time, instead of just isolating a chunk of text from your document... again, maybe I should step back. When you are running an embedding model on a text, you don't give it two pages of document. You are cutting the document into small sentences or blurbs or things like that. We call those chunks. Then you will have an embedding for each of these chunks, but a chunk doesn't really know what the chunk before and the chunk after are. What the Voyage context model is doing, and that's a specific one that we released, is that it will parse the full document and preserve some context in addition to the specific chunk. So, yes, you will know, for instance, that this sentence really explains how you can configure this router, this security configuration maybe. But you also know that it is part of an old ticket, because you also see, maybe three or four chunks above, that it looks like a support ticket and that it is six years old, and that you probably shouldn't give it too much importance.
Kevin Ball
Got it. So conceptually, if I'm going to just try to like map this out, if I were building this with a much more naive model, it would look something like, okay, I have a summary of the whole document with maybe some additional things, and then I have each chunk and then those two things are getting put together kind of in each set. Interesting.
Fred Roma
Yeah, that's a great way to look at it.
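The naive version Kevin describes, a document-level summary attached to every chunk, can be sketched in a few lines. This is only that do-it-yourself approximation, not how a contextual embedding model works internally. The function names, the fixed word-count chunking, and the bracketed prefix format are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    index: int
    text: str          # the isolated chunk, as a plain embedding model would see it
    with_context: str  # the same chunk prefixed with document-level context


def chunk_words(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def contextualize(doc_context: str, text: str, size: int = 50) -> List[Chunk]:
    """Pair each chunk with a short description of the document it came from."""
    return [
        Chunk(index=i, text=piece, with_context=f"[{doc_context}] {piece}")
        for i, piece in enumerate(chunk_words(text, size))
    ]


chunks = contextualize(
    doc_context="Support ticket, closed six years ago",
    text="To configure the router open the admin page and set the security mode",
    size=5,
)
for chunk in chunks:
    print(chunk.with_context)
```

You would then embed `with_context` instead of `text`, so that a chunk about router configuration also carries the signal that it comes from an old ticket.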
Kevin Ball
Fascinating. Well, and you alluded to another piece of this, which is combining search, and that gets us into this topic of re-ranking and all of that. So can you maybe lay out what that looks like, just broadly, for context for folks who haven't built these applications before, and then what the Voyage take is on it?
Fred Roma
Yes, it really depends on what you are trying to achieve in your application. But what we see most of the time is this: when you want to have the best result, the most accurate result, when your user is asking for something through a chatbot, through an agent, et cetera, you combine a keyword search, looking for documents with the exact keywords that were part of the query, with the semantic search, meaning the documents that may have very different keywords but are speaking about the same topic. Combining the two is how you get the best results. I'll go back to my red shoes as an example. Say I'm looking for Nike red shoes. For red shoes, it's totally okay to go with burgundy sneakers, because that's almost the same. Now, if as a user you made the effort to mention Nike, it may be very important to you, and you really want to make sure that you are matching the keyword Nike. So let's look at the keyword Nike, but let's only look at the semantic meaning of the red shoes, and maybe burgundy sneakers are perfectly fine. So this is just a very simple example and it doesn't fit all use cases, but for most of them, the best accuracy comes from combining both. And that's why having search and vector search and the database in the same place is a big deal in terms of.
Kevin Ball
You remove a lot of round trips to do that.
Fred Roma
Absolutely. Yeah, you're right. Sorry. That's a very important point that you are touching on. And I'm not saying that you couldn't do it with multiple pieces. You could, but then you have to run your keyword search, you have to run your semantic search, and you have to build your own algorithm to decide how you are ranking those results. So yes, that's exactly the kind of hurdle that you are removing.
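For a sense of what "build your own algorithm" involves when the two searches live in separate systems, here is a minimal sketch of reciprocal rank fusion, one common way to merge ranked lists. It is a generic illustration and not a description of what any particular database does internally. The document IDs, the optional per-list weights, and the constant `k = 60` are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def reciprocal_rank_fusion(
    rankings: Dict[str, List[str]],
    weights: Optional[Dict[str, float]] = None,
    k: int = 60,
) -> List[str]:
    """Merge ranked ID lists: each list adds weight / (k + rank) to a document's score."""
    weights = weights or {}
    scores: Dict[str, float] = defaultdict(float)
    for name, ranked_ids in rankings.items():
        weight = weights.get(name, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest score first; ties are broken alphabetically so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query "Nike red shoes".
keyword_hits = ["nike_red_shoe", "nike_blue_shoe", "nike_cap"]
vector_hits = ["adidas_burgundy_sneaker", "nike_red_shoe", "nike_burgundy_sneaker"]

fused = reciprocal_rank_fusion({"keyword": keyword_hits, "vector": vector_hits})
print(fused)

# Leaning on the keyword list pushes exact "Nike" matches further up.
keyword_heavy = reciprocal_rank_fusion(
    {"keyword": keyword_hits, "vector": vector_hits},
    weights={"keyword": 2.0},
)
print(keyword_heavy)
```

The document that appears in both lists comes out on top in either case, and the weights control how much an exact keyword match counts relative to semantic similarity.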
Kevin Ball
So implementation wise, if I then was using your API, am I able to specify run this against these two searches? Re rank in this way? What are the knobs that I have available as a developer?
Fred Roma
Yeah, so we didn't reinvent the wheel, by the way, with either search or vector search. You use the MongoDB aggregation pipeline, the one you are already using with your database to just query data. But we created new operators, $scoreFusion and $rankFusion. I may not go into the details, because these are just slightly different ways to merge the results. So first, you have one operator where you can combine keyword search and vector search, but you have full control over how you want to do that. I mean, many customers are just happy with a basic way to combine, but some say, okay, I want to have different weights, I want to over-index a little bit on this keyword search and a bit less on the other one. So it's up to you. But you just use the MongoDB aggregation pipeline.
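As a rough sketch of the weighting knob described here, the function below builds an aggregation pipeline as plain Python data. The stage layout follows my reading of MongoDB's documentation for the `$rankFusion` stage and should be checked against the docs for your server version. The index names (`products_text`, `products_vector`), the field names (`description`, `embedding`), and the candidate counts are hypothetical.

```python
from typing import Any, Dict, List


def build_hybrid_pipeline(
    query_text: str,
    query_vector: List[float],
    keyword_weight: float = 1.0,
    vector_weight: float = 1.0,
    limit: int = 10,
) -> List[Dict[str, Any]]:
    """Build a hybrid keyword + vector pipeline; index and field names are hypothetical."""
    return [
        {
            "$rankFusion": {
                "input": {
                    "pipelines": {
                        # Keyword branch: exact terms such as a brand name.
                        "keyword": [
                            {
                                "$search": {
                                    "index": "products_text",
                                    "text": {"query": query_text, "path": "description"},
                                }
                            },
                            {"$limit": 20},
                        ],
                        # Semantic branch: nearest neighbours of the query embedding.
                        "vector": [
                            {
                                "$vectorSearch": {
                                    "index": "products_vector",
                                    "path": "embedding",
                                    "queryVector": query_vector,
                                    "numCandidates": 200,
                                    "limit": 20,
                                }
                            }
                        ],
                    }
                },
                # Relative influence of each branch on the fused ranking.
                "combination": {
                    "weights": {"keyword": keyword_weight, "vector": vector_weight}
                },
            }
        },
        {"$limit": limit},
    ]


pipeline = build_hybrid_pipeline("nike red shoes", [0.1, 0.2, 0.3], keyword_weight=2.0)
print(pipeline[0]["$rankFusion"]["combination"])
```

With a driver such as PyMongo, a list like this would be passed to `collection.aggregate(pipeline)`. The query vector would come from whichever embedding model was used to populate the `embedding` field.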
Kevin Ball
Maybe it's worth actually stepping back and talking about that aggregation pipeline a little bit, because that is a capability that doesn't exist in all databases.
Fred Roma
Yeah, absolutely. I mean, if you have used MongoDB, that's something that customers usually really like. It really gives you the ability to run several operations on your database one after the other, and the result of one operation can be used as the input for the next operation. So it can be really, really powerful. And that's the case for search. Again, you can use the combined operators as I was describing, but you can also decide to do some search, then do some re-ranking somewhere else, and then do some other stuff. So it's really a pipeline that you can implement to play with your data.
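For readers who have not used it, a small example of the stage-by-stage idea may help. The pipeline below uses four standard aggregation stages; the `orders` collection and its fields (`status`, `customer_id`, `amount`) are hypothetical.

```python
from typing import Any, Dict, List

# Each stage receives the documents produced by the stage before it.
top_customers_pipeline: List[Dict[str, Any]] = [
    # 1. Keep only paid orders.
    {"$match": {"status": "paid"}},
    # 2. Sum the order amounts per customer.
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
    # 3. Largest totals first.
    {"$sort": {"total": -1}},
    # 4. Keep the top five customers.
    {"$limit": 5},
]

stage_names = [next(iter(stage)) for stage in top_customers_pipeline]
print(" -> ".join(stage_names))
```

With PyMongo this would run as `db.orders.aggregate(top_customers_pipeline)`, and the whole chain executes inside the database rather than in application code.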
Kevin Ball
Yeah. So just to sort of echo back, right. If you were using another database, you might have an external pipeline tool where you're defining a series of stages with dependencies and data transfer and kind of moving that around. And with the aggregations pipeline, you can do that all inside of the database.
Fred Roma
You can do that all inside the database, because this is natively integrated. Search and vector search are natively integrated with the database. You don't have to move the data around, you don't have to use a different aggregation pipeline, and you don't have to use a different CLI. You can really have all of that as a single experience. It's not like an extension where you have to be careful about how it will be supported or how you can plug things together. The same tool is giving you access to all of this.
Kevin Ball
And just so that we're clear, those things can be defined on the fly. If someone, for example, was giving the LLM the keys to the kingdom, it could write its own aggregation pipeline and run it.
Fred Roma
The developer.
Kevin Ball
Oh.
Fred Roma
So yeah, we did release an MCP server. That's really trendy these days, and it's actually really effective as well. The adoption is pretty nice. I'm mentioning the MCP server as an example because you mentioned LLMs, and more and more that's how developers are interacting with our database. With it you can absolutely create clusters and run operations, but also configure your aggregation pipelines.
Kevin Ball
Yeah, let's talk a little bit about security. You mentioned security as an area that you had dealt with. And I think you know this question of what are LLMs allowed to see, what are they not allowed to see? All of this is definitely top of mind for those of us building applications here. What are the primitives that are, I'm guessing, baked into the database to allow you to build secure AI applications on top of Mongo.
Fred Roma
So I want to step back to the overarching principle, because you touched on LLMs and what an LLM can see. It's exactly because you probably don't want to train an LLM on your private data that the pattern, the architecture that is winning out there, is: no, I will not fine-tune or post-train my LLM on my data. I mean, there are exceptions. I will use an LLM that is really good at everything LLMs are great at. But for my use case, when a user wants something, I will connect what this LLM knows with what my company or my application knows. That's not about giving anything to the LLM. It's about your application having very good information retrieval. Okay, my user is asking me, again, what is the credit limit on my credit card? A lot of things can be handled by the LLM in terms of how to answer a user asking a question. But if I really want this information, I also have to go and find my bank's private information about what the real limit is, and I will provide this information to my user, combined. But the LLM will never see this private information. So that's maybe just to clarify the overall pattern before going into the security. Well, that is actually important.
Kevin Ball
I think that is very important in terms of what sets of data are. I think the term I sometimes use is moderated through the LLM. So even if the LLM, the LLM may be the UX delivery, but do I load this data, pass it to my model and have it presented or do I sidestep around the LLM? Because this has to be right and I can't count on it not hallucinating something about it.
Fred Roma
Yeah, yeah, absolutely. And there are different patterns there. Most of the time what customers will do is, again, if a user or an agent wants to do something that requires information retrieval, you will first look for the information that is relevant. I'll stick with the same example: what is the internal document that explains the limits on credit cards? And then you will insert that in the prompt of the LLM. So yes, that will go through the LLM, in the prompt and in the answer of the LLM. But it's not stored anywhere on the LLM side. It has not been part of the training of the LLM.
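The retrieve-then-prompt pattern described here comes down to string assembly once retrieval has happened. The sketch below assumes the relevant passages were already fetched from your own data store. The function name, the instruction wording, and the sample passage are invented for illustration.

```python
from typing import List


def build_prompt(question: str, passages: List[str]) -> str:
    """Insert retrieved passages into the prompt so the model answers from them."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical passage, standing in for the result of a search over internal documents.
retrieved = ["Standard credit cards have a monthly limit set per customer tier."]
prompt = build_prompt("What is the limit on my credit card?", retrieved)
print(prompt)
```

The private text reaches the model only as part of this one request. Where that request is served (a hosted API, a provider under a specific agreement, or a self-hosted model) is a separate deployment decision.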
Kevin Ball
Sure, yeah.
Fred Roma
And then that's up to you as a customer: where do you want this LLM to be hosted, and where do you want these queries and these tokens to be served? You can decide depending on your security sensitivities. Many customers say, okay, I'm totally fine with having my LLM on AWS or OpenAI or Azure, et cetera. Some customers will say, I want to do that, but I want a specific security agreement with these providers to make sure that my data is never shared. And some customers say, you know what, I want to host my own LLM. You can do that if you want. What is important is that there is nowhere a blending, if you want, of how the LLM is trained and your private information. Yeah.
Kevin Ball
So coming back though, to building applications with these pieces, I think I'm curious to understand how you're seeing people defining these lines or barriers. Is it changing at all in terms of how you're managing security at the data layer?
Fred Roma
I mean, security has always been very important. Customers trust us with their data as a data platform, so it's always been top of mind anyhow. But I would say that with these AI-specific applications, we see even more discussions about this LLM integration, the one you spoke about before. And you know, one of the big values of MongoDB is that you can run it anywhere. When we say run anywhere, sometimes people tell us, oh, you are cloud agnostic, you can run it on AWS and GCP and Azure, et cetera. Yes, that's true. You can also run it on premises. We have an enterprise offering, and you can run that in your own data center. And we see customers that are doing that. What is very interesting, I think, in this AI world is that we see customers saying, you know what, for this use case I'm totally fine if it's in the cloud, but for this other use case I really want to make sure that my data is never in any cloud provider and never touched by any LLM provider either, and so I will run it on prem. And again, I think it's so early and things may evolve, but the fact that they have a choice, and they can decide what to rely on, a cloud provider or their own data center, I think it's pretty interesting. And I do believe it's more and more of a discussion topic.
Kevin Ball
That's fascinating. And when you're seeing that, are you seeing them doing this within the context of the same application? So you're having to kind of have security boundaries and federation and however that's working or like, how does that work?
Fred Roma
I'm not sure I can extract one pattern, one answer to that. You have customers that will say, okay, all of my data will be on prem, and then some applications will be in the cloud and some won't. I see some customers saying, you know what, I want my data to be in my data center, so I want MongoDB Enterprise Advanced and that's how I manage it. But I'm still okay to do a call to an LLM outside of my boundaries, because they believe that they can control the prompt, and they are right; in many cases they can. So it's okay to send some information as long as it's something that your application controls. But at least nobody's seeing the raw data. So I really see different kinds of patterns, and I wouldn't be able to tell which one wins. I think what is very important though is that overall there is really this intent to remain as flexible as possible. Going back to the previous point: maybe this LLM is good for me right now, but in six months that will be another one. Maybe this cloud hosting is good for me right now, but then I will want to go on prem for some regulation or specific concern later on. I think there is really a willingness to remain as flexible as possible and to have options on the table.
Kevin Ball
I would say that gets to kind of a somewhat different topic. But you know, I think one of the things that these tools are doing is they're changing the speed at which people are operating. They're changing kind of how fast we're moving, they're changing how adaptable we need to be. How are you both internally and with your customers, like rethinking the way that we organize the teams doing this work?
Fred Roma
Oh yeah, yes, that's okay. I mean, there is a product angle. I thought we were initially going to the product angle, and we'll just go there super, super quickly. But I want to go to the team organization. I think that's super important as well. But from a product angle, I'll go back to this: if you want to go fast but you have to move your data around, I mean, any of us who have been software developers or architects at one point know that when you have to optimize for latency and performance and iterate quickly with network layers and data to transfer, it's just a nightmare. So I would say that's one of the key arguments, not just for MongoDB. I think that's a big value of MongoDB, but overall, having the database, vector search and search in the same place, so you don't have to do this ETL, this kind of difficult network configuration in between, I think that's a big one for your question about the speed of development and iterating. Now if your question is going more to the team organization, I think that's a very good question as well. Two things. I think what's happening right now is that it can be very thrilling when you are a product manager or business person to say, oh, I can build it myself, I can build fast. And personally, there is something I love about that. What I love about that is that instead of trying to debate about maybe some text and some PowerPoint, you can really show. I love that. I think the risk is to believe that it's easy. Yes, it's easy to show something. That's where engineering and product managers sometimes have to align. I think it's great if you use that well. If you use that as a way to say, oh, just put it to production, well, for some stuff, yes. But how will you scale? Because that's the thing with all of these tools, right? They are making writing code super fast, while reviewing code is not faster, and making your security assessment of this code is not faster, and defining the right architecture is not faster yet. So I think it can be awesome if you know what it is. It can be a bit dangerous if you believe that that's it and I can just push it. Sorry, I don't know if I maybe pivoted a little bit from your question, but that just made me think of this point.
Kevin Ball
No, I think it is key and it kind of gets to a couple of different questions. Some of them related to the product piece and some not. So like one of these things is going from zero to a prototype I can show is now very fast.
Fred Roma
Absolutely.
Kevin Ball
And if you put some restrictions in place, some of those things, if I was hearing you correctly, you can then actually take out to production. Some things are able to ship relatively easily, but others are not. So I guess the place I would start to ask you is, how do you draw those lines, and are there ways in which having, for example, a database that can handle all these different pieces together makes it easier to bridge that gap?
Fred Roma
Yeah. And I will obviously be biased because of working at MongoDB. Right now, we are not vibe coding a database. There's too much at stake.
Kevin Ball
You're not?
Fred Roma
No, we are not. We are not vibe coding. Now that doesn't mean that we are not leveraging AI for many things. For prototypes, 100%, for this product and engineering alignment. By the way, I didn't answer your question about team organization, but maybe we can put a pin on that for later. But also how you handle tickets, how you onboard. I think people, even very senior engineers, when you ask them, say, oh, I have to discover this new part of my code base that I haven't touched in a while, and I can go so much faster now to understand it. So I think there are many, many benefits as well for real products. But the code that is written for production, I think that we are still doing it very manually, for the core database for sure. And then I would say even for more internal things or stuff that is maybe a bit less sensitive, where you can for sure go faster with, I don't know if vibe coding is the right term, but AI-assisted coding, I still believe that with the security audit and the observability, there is still a lot. It's not ready yet for these AI tools, in my opinion. But they can help you prototype and align on tickets and align on requirements. I think that's pretty impressive.
Kevin Ball
So do you have a line in your product, like: within this, it must be handwritten; outside of this, AI assist?
Fred Roma
Okay, yeah. All of the core database and core product of MongoDB, we don't vibe code it, we don't AI it. The code is written manually each time by a developer. And we are doing the code review as we did, and we are doing the security review as we did. That is for sure. Yeah, we can use, again, some AI tools for some security finding kind of things. I mean, we can get some help, as many companies do. But I would say most of what we experiment with in AI is a lot more the management tools around that: all the internal tools, all the before-coding and after-coding pieces, to go faster. We use AI more and more, but we don't vibe code the core MongoDB database, for sure.
Kevin Ball
So that's obviously probably pretty different than a very new not so security focused startup. So like thinking about that, has it changed anything about how you're internally organizing your teams or does it still take just as many people because the core has to be so kind of solid and locked down.
Fred Roma
Yeah, yeah, no, that's a great one. So there's something, first, as maybe a cultural kind of thing, that I came to really enjoy about MongoDB, and I saw that in my previous company at AWS as well, which is that even people that are not engineers are pretty deep technically. So we have product managers trying to build their MCP experimentation. Culturally speaking, that's the case, and you can see that because you have people moving from engineering management to product management roles easily. So I just want to say yes in this context, because if you are in a different context, which can also have pros and cons, maybe that's different. But I think in that case it can really fast-track the alignment on what you want to build. Because, to take the example you gave before, instead of just describing in a long document, oh, that's the UI I want, you can just do it, and then you can align, and then you can discuss how to do that. So yes, it does fast-track the alignment between product and engineering, no doubt. We went pretty fast in terms of organization. We made the decision a few months ago: now we don't even have an engineering and a product organization anymore. We have a product and technology organization. It's not just because of AI, but that was part of it. I think there were two big objectives. One, how to make sure we can fast-track the decision making, the alignment, the sharing of information between the product decision and the engineering decision, because they are the same eventually. And number two, how to make sure that everyone is customer obsessed. Even if you're an engineer, you should be customer obsessed. And I do believe AI helps with that, by the way. You can really see what your product manager has in mind. We are using that to really show some reports from customer discussions, advisory boards, these kinds of things.
So we really made this leap, because you are touching on the organization. I guess we probably would have done that anyway, but with these AI tools and ways to collaborate, I think it's helping product and engineering to be in the same kind of smaller teams, where before there were separate product and engineering organizations.
Kevin Ball
Yeah, no, I think there is definitely something there, and I have been in a lot of conversations talking about this convergence between engineering and product. Whether that looks like more technically minded product people, or whether that looks like more product-minded technology people. As code becomes, maybe not in a database core, but in many contexts, more of a commodity, it means that product mindset is more and more important.
Fred Roma
I think that's spot on. I would have a bit of a nuanced take on this one, which is like I love that product people can be a bit more engineering or engineering people a bit more product or the wording you used. I love that because again, I think that's a great way to be customer obsessed and to be focused on the outcome and what you are trying to achieve more than just trying to articulate what you even think kind of thing. However, the expertise doesn't go away. Like if you are a product person and you are meeting many, many customers a week, you will have an expertise about reading between the lines and understanding, reading the room in a meeting. If you're an engineering person, it's not just about this prototype that your product manager colleague can build. It's really about how will that scale, how will that evolve? Where do we expect my growth to be and my new changes to be and my next security audit to require? So I think, yes, it's great to bring people a bit closer to the other side somehow and the brackets. But I do believe we should respect expertise. There's still a lot of expertise in what it is to build a system at scale for real production usage and what it is to really understand customer intimacy and what is the need of a market. And we should respect that. We shouldn't believe that because we have tools that are helping a little bit. This expertise doesn't matter.
Kevin Ball
Yes, I think that that is. If I were to summarize, one of my big lessons about LLMs is they're incredible tools, but you cannot turn your brain off. Your brain as an expert is super necessary still.
Fred Roma
Oh, I love that. And I don't remember who said that, I would love to have been smart enough to say it, but someone said something like, oh, I don't understand: my LLM is super smart on all the topics I don't know, but when it's a topic I really know, it's not that smart. I love that because, you know, your expertise is still important. When you know a topic deeply, you do realize that the LLM is wrong sometimes. And to take that back to the MongoDB case, grounding that with real data, real knowledge, is important. But even overall, expertise matters, even when using it.
Kevin Ball
I find I'm better able to guide these tools in areas I know well than in areas I don't.
Fred Roma
Yeah, I love that.
Kevin Ball
So coming back a little bit to this piece around the data layer under LLM applications, what do you see as the big unsolved problems? What are the things that your team is working on, looking forward for the next. I don't know how long we're allowed to project in the AI era. Two weeks, six months, something like that, right?
Fred Roma
Yeah, it may be. No, I think, you know, everyone out there is developing an AI application, right? I mean, it's pretty rare to see a company that isn't. But you don't have so many. I think we are at this turning point right now where they are really going to production. I mean, there was a famous MIT paper like three or four months ago saying, oh, 95% of these AI applications don't make it to production, or when they make it to production, they disappoint. They don't bring the ROI they were expected to bring. But I do believe it's changing. I do believe more and more of these AI applications are getting production ready. And to your question about what we see, I believe that customers really understand more and more how important the quality of the grounded response is, how important the quality of the information retrieval is, to make sure that your LLM will not just be LLM smart, it will be company smart. It will know what the company knows. I think for some use cases, just being 2 or 3% more accurate means that you are totally reducing hallucinations. So I think people are really realizing that even a little bit of impact on hallucination is a big deal for the user experience. And I do believe as well that people realize that, well, actually this stuff is expensive. These AI models can be pretty expensive. And even right now, when, you know, they are subsidized by a lot of VC money and all of that, that's still expensive. So if you are able to only call the LLM when you need to call the LLM, and if you're able to optimize the length of your prompt because you did a good job at finding the relevant information in your corpus of data before, and if you are using embedding and reranking models that are pretty short and cost efficient, that can totally change the ROI of an AI application. So accuracy and cost of these AI applications, I think that will be a big topic for the years to come, in my opinion.
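The point about optimizing prompt length can be made concrete with a small sketch. It assumes retrieval has already produced chunks sorted best-first, and it uses a crude word count as a stand-in for a real tokenizer; the function names `estimate_tokens` and `select_context` are hypothetical, not part of any product discussed in the episode.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: one token per whitespace-separated word.

    Assumption: a real system would use the tokenizer of the target LLM.
    """
    return len(text.split())


def select_context(ranked_chunks: list[str], max_tokens: int) -> list[str]:
    """Keep the best-ranked chunks that fit inside a token budget.

    `ranked_chunks` is assumed to be sorted from most to least relevant.
    Chunks that would overflow the budget are skipped, so a smaller,
    lower-ranked chunk can still be included afterwards.
    """
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > max_tokens:
            continue
        selected.append(chunk)
        used += cost
    return selected


if __name__ == "__main__":
    chunks = ["one two three", "four five six seven", "eight"]
    print(select_context(chunks, max_tokens=4))
```

The better the ranking, the smaller the budget can be while still carrying the relevant information, which is where retrieval quality and cost meet.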
Kevin Ball
In terms of accuracy then, I mean, I think we've talked about what it takes in terms of searching, in terms of reranking and what you surface. Are there any other best practices you've seen or that you recommend to folks in terms of what you're actually putting in there? Are you doing preprocessing? How are you navigating those different levers?
Fred Roma
Yeah, absolutely. I mean, that could be a very long discussion. But I would say, one, the quality of your data. If you don't clean your data regularly, handle metadata, know what's important, it won't work; the quality of your data is key. Then, how you are preparing your data for AI. We spoke about chunking: how do you cut your long text into smaller texts that make sense? I think there is a science, and an art, to that. So your data has to be clean, it has to be prepared for AI, and then the question is really what the right information retrieval strategy is for you. Do you want results that will be super fast? Do you want results that will be super accurate, because you are a legal company providing advice, or a financial company that can impact a tax return or a legal document? Then you will want to use the best embedding models, all the context and all the length, et cetera. And that's okay if they are more expensive. Or maybe if you're an e-commerce company, you will want to go super fast and make sure that you have 10 or 12 good results, and it doesn't have to be the only one. So I think it's about your strategy, about what the right trade-off is for you between the usual quality and cost and speed and all of that.
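As a rough illustration of the chunking step mentioned here, the sketch below splits a long text into overlapping windows of words. This is only one simple strategy among many, and not a description of how MongoDB or Voyage AI chunk documents; the function name `chunk_words` and its default sizes are assumptions chosen for the example.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears with some surrounding context in the next chunk.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")

    words = text.split()
    step = max_words - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h i j", max_words=4, overlap=1):
        print(chunk)
```

Splitting on sentence or section boundaries instead of fixed word counts is usually the next refinement, which is part of the "art" side of the trade-off.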
Kevin Ball
Yeah, the interactivity is an interesting one. Even within an application you might have different workflows. Like I was talking to someone who was doing an agent and they were saying, yeah, when I know that the user is right there, I bias towards interactivity and speed and getting it up in front of them. And if it's an async workflow, now I care more about accuracy, now I care more about, you know, I can take my time.
Fred Roma
Oh absolutely. And we see that. An example, not a public reference, but we have an e-commerce customer, and how you make your trade-off for your e-commerce piece, like where the user will go to find their red shoes or sneakers, versus how you will manage your stock behind the scenes: these are different kinds of requirements and latencies and costs and things like that. So absolutely, depending on the workload you will have a different sensitivity.
Kevin Ball
So I think a lot of our listeners are now familiar with how, in some of the AI coding tools, you can dial up your budget: oh, I want you to think longer, I want you to reason more, I want the mini versus high versus whatever. What are the equivalent knobs that you have in the embedding models and the reranker and all these different parts that you have with Voyage?
Fred Roma
Oh, that's a great one. I didn't think about this parallel before, by the way, so I love it: the thinking knob with the LLM. You can definitely go faster and cheaper with, I would say, a basic text embedding model, though even among the basic ones you can have really good ones versus not so good ones. And one of the values of the Voyage models is that they come in different sizes, so you can decide how long your embeddings will be, and even the type: do you want to go with float or binary? So there is definitely a first decision there about a cost versus accuracy trade-off. Then, we discussed multimodal and context models. These are heavier models; they can take a bit more time, a bit more compute, but they will give you better results. So is your use case worth this investment? And last, we didn't speak much about reranking, but reranking is an additional layer that more and more customers are using. The embedding model will basically give you, oh, these are like the 10 to 100 best documents in your corpus of data for this query. That's pretty fast and you optimize for that. If you want to know the best document for this query, then it has to be compute intensive. You have to go beyond the embedding; you have to go back to the document itself. And that's what rerankers are doing. So that's another layer in the thinking-versus-LLM analogy, which I like. You can also decide whether you need a reranker to have a very optimized ranking of your results.
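The float versus binary choice mentioned above is a storage and speed versus accuracy trade-off. The sketch below shows one generic way to binarize an embedding (keep only the sign of each dimension) and compare binary vectors by matching bits. It is an illustration of the general idea, not the Voyage AI or MongoDB implementation; the helper names and the 4-bytes-per-float assumption are mine.

```python
def binarize(vector: list[float]) -> int:
    """Pack the sign of each dimension into one bit of an integer.

    Positive values become 1, zero and negative values become 0.
    The first dimension ends up in the most significant bit.
    """
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming_similarity(a: int, b: int, dims: int) -> float:
    """Fraction of dimensions on which two binarized vectors agree."""
    differing = bin(a ^ b).count("1")
    return 1.0 - differing / dims


def storage_bytes(dims: int, binary: bool) -> int:
    """Approximate storage per embedding, assuming 32-bit floats."""
    return (dims + 7) // 8 if binary else dims * 4


if __name__ == "__main__":
    query = binarize([0.5, -0.2, 0.1, -0.9])
    doc = binarize([0.4, -0.1, -0.3, -0.7])
    print(hamming_similarity(query, doc, dims=4))
    print(storage_bytes(1024, binary=False), storage_bytes(1024, binary=True))
```

Under these assumptions a 1024-dimension embedding shrinks from 4096 bytes to 128 bytes, at the price of discarding the magnitude of every dimension, which is why a reranking pass is often paired with compact embeddings.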
Kevin Ball
Now you mentioned that for a lot of these in MongoDB you would put them in an aggregation pipeline. I'm thinking about use cases that I've had in building these things. Oftentimes I'll do things in layers, where I'll show the user something quick, fast, but then I might do more behind the scenes. Okay, I'm going to rerank, I'm going to resurface this, I'm going to bump this. I'll do things like that. If I were to do that in your system, can I get those kinds of intermediate results streamed out to me in some way, or how does that end up working?
Fred Roma
Yeah, you can. Well, what you describe is that in a single query you could first go with a quick search and then a bit of a longer one. I don't have many use cases in mind doing that. What I have, though, is a developer that will start, maybe in the first iteration, with an embedding model, and that's about it. And then when they really want to go to production and they have real users and real data, et cetera, they will upgrade their model to a more powerful one to improve the accuracy of the result. Or they will add a reranker. But thinking out loud, what you describe is totally possible. You could totally do a first search and then do a reranking in parallel, for instance. I haven't seen it, but it's such a fresh space. I kind of like the idea still. You could show, oh, these are the 10 pairs of shoes that look like your red Nike shoes, and you show them immediately, but after a few seconds you rerank them to make sure that the one that is very accurate comes to the top. You could totally build something like that. The technology doesn't prevent you from building something that dynamic. I don't know if the financial balance will make sense.
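The "show fast results, then rerank" idea discussed here could look roughly like the sketch below. It is a toy, in-memory illustration: the first pass ranks by cosine similarity of embeddings, and the second pass uses a word-overlap score as a stand-in for a real reranker model that reads the document text. All names, and the shape of the `docs` records, are assumptions made for the example.

```python
import math
from typing import Iterator


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overlap_score(query: str, text: str) -> float:
    """Stand-in reranker: fraction of query words present in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def progressive_search(
    query: str, query_vec: list[float], docs: list[dict], k: int
) -> Iterator[tuple[str, list[str]]]:
    """Yield a fast embedding-based ranking first, then a reranked one."""
    candidates = sorted(
        docs, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True
    )[:k]
    yield "fast", [d["id"] for d in candidates]  # can be shown immediately

    reranked = sorted(
        candidates, key=lambda d: overlap_score(query, d["text"]), reverse=True
    )
    yield "reranked", [d["id"] for d in reranked]  # replaces the first view


if __name__ == "__main__":
    docs = [
        {"id": "a", "text": "blue running shoes", "embedding": [1.0, 0.0]},
        {"id": "b", "text": "red nike sneakers", "embedding": [0.9, 0.1]},
        {"id": "c", "text": "green garden hose", "embedding": [0.0, 1.0]},
    ]
    for stage, ids in progressive_search("red nike sneakers", [1.0, 0.0], docs, k=2):
        print(stage, ids)
```

Whether the second pass is worth its compute cost is the open question raised in the conversation; the structure itself is simple.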
Kevin Ball
Yeah, it'll depend for sure on the application involved. But yeah, I think these types of latency trade offs are all over the place in these types of applications. And so then there is this question of okay, how much value is there in showing something to the user versus getting the right answer in front of them? And maybe there's value in each.
Fred Roma
Yeah, absolutely. Absolutely. At least what's important I think is to have options. Whether you use that in the same flow as you were describing or you use that because you have different phases of your project and at one point you just want to improve accuracy and et cetera, or you do that because for specific application or workload accuracy is so important. And for another kind of workload, maybe you have a free tier and you are okay that your customers have good result, but when they're paying and you have a premium tier, you want your customers to have excellent results. So I think having the freedom of doing that easily because you can change your schema, you can change your AI model, you can optimize this thing is what customers are looking for right now.
Kevin Ball
Yeah, and having that flexibility within the same API, the same interaction, so I don't have to say, now I have to go and get a different thing. No, that's definitely valuable.
Fred Roma
And the same document model. I'm coming back to that. We didn't invent anything new to store the embeddings, for instance; they are part of your document model. They are there, and that works. And we didn't reinvent the wheel, because what people usually love about MongoDB is the horizontal scaling and the fact that you can have these shards and replication. We did the same for search and vector search. You can have your own search nodes, and you can decide that they will be a bit more memory intensive, and they will not impact your database. So it's just using the same principles of the document model and of the distributed architecture, applied to these embeddings, as you were mentioning.
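To make "the embeddings are part of your document" concrete, here is a hedged sketch of a document carrying its embedding alongside regular fields, plus an aggregation pipeline using the `$vectorSearch` stage from MongoDB Atlas Vector Search. The index name `vector_index`, the field names `embedding` and `title`, and the helper name are hypothetical; the pipeline is only built here, not executed.

```python
def build_vector_search_pipeline(
    query_vector: list[float], limit: int = 10, num_candidates: int = 100
) -> list[dict]:
    """Build an aggregation pipeline that runs a vector search.

    Assumes a vector search index named "vector_index" (hypothetical)
    defined on the "embedding" field of the collection.
    """
    if num_candidates < limit:
        raise ValueError("num_candidates should be at least as large as limit")
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",      # hypothetical index name
                "path": "embedding",          # field that stores the vector
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Return the title plus the similarity score of each match.
        {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


# The embedding lives next to the operational fields in the same document.
example_document = {
    "title": "Red running sneakers",
    "price": 89.0,
    "embedding": [0.12, -0.03, 0.44],  # illustrative values, not real output
}

if __name__ == "__main__":
    print(build_vector_search_pipeline([0.1, 0.0, 0.4], limit=5))
```

With a driver such as PyMongo, a pipeline like this would typically be passed to `collection.aggregate(pipeline)`; further stages such as `$match` or `$lookup` can follow the search stage in the same query.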
Kevin Ball
Awesome. Well, we're coming close to the end of our time. Is there anything we haven't talked about that you think would be important to discuss before we wrap?
Fred Roma
No, I think we touched on it, but I just want maybe to double down on it. Things are changing so fast. The quantity of data is changing so fast. It's not even just humans now generating data and consuming data, it's agents generating data. The ecosystem is changing so fast. There are new players, and some of them are amazing, and some of them look amazing but won't be here in six months, because we go super fast. The LLM race, I love it, by the way. I love when it's like, oh, Google is coming with this great one, and then OpenAI. But it's changing so fast. Your sensitivity to, oh, that should be in this cloud provider and that should be on prem, is changing so fast. I think it's super important to go with a data platform that can handle this flexibility, so that if something changes, you can change. You don't have to say, oh, I have to rebuild this data pipeline, and I have to change my schema, and I have to integrate this new identity system with these new players. I think it's really important to at least over-index on flexibility.
Kevin Ball
I love it. Let's wrap there.
Fred Roma
Awesome. Thank you, Kevin.
Software Engineering Daily | January 27, 2026
Host: Kevin Ball (K. Ball)
Guest: Fred Roma, SVP of Product and Engineering at MongoDB
This episode dives deep into the complexities and evolving landscape of taking AI applications from prototype to production. Kevin Ball interviews Fred Roma, SVP of Product and Engineering at MongoDB, about the challenges of building “production-grade” AI systems, particularly at the data layer. The discussion covers topics including vector search, schema evolvability in the LLM era, embedding models, the impact of MongoDB’s acquisition of Voyage AI, real-world security concerns, and how engineering/product teams are reorganizing to keep pace.
“When you leave the vibe coding piece and you really want to build that in a professional manner, that can be a bit scary for a developer.”
— Fred Roma [04:02]
Fred outlines three foundational requirements for a production AI data stack (04:34):
“We want the stack to be simple. We want the accuracy of the information retrieval to be really, really good. And we want to make sure... you can touch your application and make it evolve easily.”
— Fred Roma [04:51]
“We love that the full AI world is speaking JSON... you don’t have to stress about any change you will want to do.”
— Fred Roma [06:53]
“They're not chat products, they're search products at their core.”
— Kevin Ball [10:52]
“With the Voyage multimodal model, you just throw your PDF in the embedding models and you will have an embedding. ...The result will be even better than when you are doing all the pipeline.”
— Fred Roma [16:20]
“Voyage context model... will parse the full document and... preserve some context in addition to the specific chunk.”
— Fred Roma [18:03]
“Having search and vector search and the database at the same place, it's a big deal... you remove a lot of round trips.”
— Kevin Ball & Fred Roma [21:06-21:09]
“For this use case, I really want to make sure that my data is never in any cloud provider and never touched by any LLM provider... so I will run it on prem.”
— Fred Roma [28:22]
“Even people that are not engineers are pretty deep technically.”
— Fred Roma [36:10]
On Flexibility:
“Things are changing so fast... It’s super important to go with a data platform that can handle this flexibility and that if something change, you can change.”
— Fred Roma [50:32]
On the Limits of LLM Automation:
“They're incredible tools, but you cannot turn your brain off. Your brain as an expert is super necessary still.”
— Kevin Ball [39:41]
On Product/Engineering Convergence:
“As code becomes... more commodity, it means that product mindset is more and more important.”
— Kevin Ball [38:03]
“We are not vibe coding a database. There's too much at stake.”
— Fred Roma [33:52]
This episode underscores the breakneck pace and high complexity of taking AI applications to production. Fred Roma emphasizes that while prototyping with AI tools is easier than ever, building robust, evolvable, and secure systems for scale requires careful architectural thought—especially at the data layer. MongoDB’s strategy centers on integrating advanced vector search, flexible schema support, and the latest in multimodal embedding (via the Voyage AI acquisition) into a unified data platform. The conversation also explores best practices for balancing speed, accuracy, cost, and security, all while rethinking how product and engineering teams should function in this new era.