
Podcast Summary: The MAD Podcast with Matt Turck

Episode: Mistral AI vs. Silicon Valley: The Rise of Sovereign AI
Date: February 12, 2026
Host: Matt Turck
Guest: Timothy Lacroix (CTO & co-founder, Mistral AI)

Overview

In this episode, Matt Turck sits down with Timothy Lacroix, co-founder and CTO of Mistral AI—a French frontier AI lab making waves as a nimble, pragmatic European alternative to US AI giants. The conversation dives deep into Mistral’s evolution from an AI model development shop to a full-stack provider: building its own supercomputing clusters, offering enterprise and sovereign state AI infrastructure, and developing specialized tooling and agent workflows with a keen focus on control, customization, and trust. Throughout, Timothy shares candid insights about competition with hyperscalers, running European data centers, agent autonomy, pragmatism versus AGI hype, and the realities and timing of enterprise adoption.

Key Discussion Points & Insights

1. Mistral’s Full-Stack Evolution and the Vision for Sovereign AI

Mistral began as an AI lab focused on models but quickly evolved into a full-stack platform (models, deployment, compute, and tooling) tailored for enterprise and sovereign customers.
The model: giving customers modular building blocks to retain control, customization, and privacy.
- “The stack being modular is really important to us as it gives full control to enterprise and our clients as to which part of the stack they decide to own and control.” — Timothy Lacroix [03:22]
The “sovereignty” dimension refers not just to European data residency, but the customer’s ability to own, modify, and govern their AI infrastructure.

2. Building European Supercomputing: Mistral Compute

Mistral is constructing its own data center south of Paris. Motivated by a need for stability and deep expertise in large-scale training workloads, they found traditional partners lacking.
- “Our use of AI compute for large scale training was not necessarily well understood by a lot of providers… So we saw a way for us to build our own data centers.” — Timothy [04:14]
The facility serves both Mistral’s own needs and, via managed platforms, will provide compute to other European customers.
Challenges: Synchronizing trades and logistics for a “huge building with hundreds of people,” coordinating for energy and stability, and planning far ahead—very different from software timelines.
- “It’s a lot more long term planning than a few software features…” [07:46]
France & Europe: Benefit from cleaner, affordable energy mix (nuclear and renewables), but grid expansion is a challenge.
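The stability argument in this section is largely arithmetic. In the episode, Lacroix notes that the margin for error is much larger on a few GPUs than when training on thousands at once. A rough sketch of why, assuming (this is not from the episode) that each GPU fails independently with the same small daily probability and that a synchronous job is interrupted when any one GPU fails. The failure rate and cluster sizes are made-up illustrative values.

```python
def job_survival_probability(num_gpus: int, daily_failure_rate: float) -> float:
    """Probability that a synchronous job sees no GPU failure in one day.

    Assumes independent failures with an identical per-GPU daily rate,
    and that a single failed GPU interrupts the whole job.
    """
    return (1.0 - daily_failure_rate) ** num_gpus


if __name__ == "__main__":
    rate = 0.001  # hypothetical: 0.1% chance a given GPU fails on a given day
    for n in (8, 256, 4096):
        p = job_survival_probability(n, rate)
        print(f"{n:>5} GPUs: {p:.1%} chance of an uninterrupted day")
```

Under these assumptions the 8-GPU job almost always gets through a day untouched, while the 4096-GPU job rarely does, which is one way to read the emphasis on stability at training scale.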

3. Competing with AI Giants: Efficiency over Infinite Capital

Unlike labs affiliated with hyperscalers and massive capital pools (OpenAI, Anthropic, Google, Meta), Mistral focused first on efficiency and lean, impactful investment, proving it could deliver competitive models with less.
- “We’ve been focused on efficiency from the start ... there’s so much to be unlocked in enterprise that I don’t think my main focus today would be into the gigawatts of power.” [09:46]
Mistral partners deeply (SAP, Nvidia) and is integrated with US cloud platforms (Google, AWS, Azure), but maintains independence.

4. Enterprise Reality: Modularity, Control, and Deep Customization

Full-stack deployment can integrate natively wherever the customer’s data sits—on-prem, in VPC, or hybrid.
- “We can deploy all of our stack on the client’s choice of deployment methods ... it lets clients build where their data is and without having to shuffle things around…” [11:07]
Mistral’s approach is highly “white-glove”: collaborative, with applied scientists and AI engineers embedding at clients to find high-value use cases (knowledge management, workflow automation, code modernization, etc.)
Customization:
- Large-scale continued pre-training (for major language or knowledge shifts)
- Fine-tuning (for efficiency, specialized tasks, edge deployment, or domain adaptation)
- “Fine tuning is a tool of choice ... when you want really fast, really cheap models that will be really good at a specific task…” [13:36]
Core differentiation: “Control.” Clients retain ownership of their data, expertise, and IP—unlike standard SaaS-style generative AI.

5. Agents, Workflows, and Trust over Autonomy

Lacroix favors thinking in terms of complex workflows—chaining agents together for real-world enterprise automation.
- Use case: Shipping container release automation for CMA CGM, integrating agent applications directly into daily harbor work. [18:27]
Agent “autonomy” is less important to Mistral than “trust”—ensuring that as agents interoperate on critical, privacy-sensitive tasks, there is governance, observability, and confidence at scale.
- “To me, the better question is how much you trust the agents… It’s really a new way to develop where the parts of your workflows have to be trusted.” [19:51]
Key agent tooling: workflow builders (not yet GA), registries for connectors/components, robust versioning, observability, and support for easy upgrades.
- “The software suite built for software development over years isn’t there yet in the AI world. That’s what we’re building.” [22:52]
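The upgrade loop Lacroix describes in the episode (create a new agent, run it on the same set of inputs and outputs, confirm nothing broke, then deploy) can be sketched as a small regression gate. This is a minimal illustration, not Mistral's tooling: `Agent`, `EvalCase`, `regression_gate`, `promote`, and the toy agents are hypothetical names, and exact string matching stands in for whatever evaluation a real workflow would use.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: an agent is modeled as a plain function from input to output.
Agent = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    """One recorded input and the output the workflow is expected to produce."""
    prompt: str
    expected: str


def regression_gate(candidate: Agent, cases: List[EvalCase]) -> List[EvalCase]:
    """Run the candidate on every recorded case and return the cases it fails."""
    return [case for case in cases if candidate(case.prompt) != case.expected]


def promote(current: Agent, candidate: Agent, cases: List[EvalCase]) -> Agent:
    """Return the candidate only if it passes every case; otherwise keep the current agent."""
    return candidate if not regression_gate(candidate, cases) else current


# Toy stand-ins for two versions of the same agent.
def agent_v1(prompt: str) -> str:
    return prompt.strip().upper()


def agent_v2_broken(prompt: str) -> str:
    return prompt.upper()  # forgets to strip whitespace


if __name__ == "__main__":
    cases = [EvalCase(" release ", "RELEASE"), EvalCase("hold", "HOLD")]
    failures = regression_gate(agent_v2_broken, cases)
    print(f"{len(failures)} failing case(s)")
    deployed = promote(agent_v1, agent_v2_broken, cases)
    print("deployed:", deployed.__name__)
```

Keeping the recorded cases separate from any one agent version is the design choice that matters here: the same cases can be replayed whenever a newer model might let a workflow be simplified.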

6. Context Graphs and the Importance of Enterprise Context

Mistral’s internal “context engine” focuses on efficiently amortizing and accruing organizational knowledge (tables, systems, connections, etc.), making context accessible to agents.
- “Knowledge of the company and the context that’s available to the agent accrues and is maintained…if you want this to be efficient, you need to give access to the agent system to the entire data of your enterprise.” [24:43]
Current stumbling block: Making data available, cleaning it, and securely connecting it—still much work before agents are widely useful.

7. The State of Enterprise AI Adoption

Most enterprises are in the early “plumbing” phase—connecting systems and standardizing data.
- “There is still that phase of work that is just work to connect everything and then be able to build on it.” [27:02]
Lacroix predicts broader adoption within a “year, singular”—not multiple years—though widespread, seamless use is not yet here.
- “The real success is when you’re confident enough to give all of that control back to the company’s employees at large.” [28:18]
When agentic workflows reach full trust and automation, token usage and demand may explode.
- “Once you are not bound anymore by humans asking questions ... the amount of tokens generated for the enterprise will completely jump...” [29:16]

8. High-ROI Enterprise Use Cases

Coding (especially on legacy, enterprise-specific codebases) is a proven high-ROI use case, but requires customization.
Knowledge worker acceleration—the “magic” of asking your enterprise anything—is a coming but not-yet-realized leap.
Industry-specific data (e.g., seismic data in oil & gas, CAD in engineering) represents major future value for customized models.
- “If we manage to build a system where, in a light touch way from us, it’s all self-serve for the customers…then I’ll be super happy.” [32:34]

9. Edge Computing and Defense Applications

Edge AI is essential for offline operation, privacy, and very specific “voice to action” or defense use cases.
Mistral adapts models for smaller, on-device scenarios; this is valuable in settings from trains to defense robotics.
- “The more focused your use case is, the smaller you can make the model…” [33:35]
Defense: Active partnerships in France and Germany, with focus on control, validation, and well-defined, critical workflows.

10. Model Innovation: Mixture of Experts (MoE), Efficiency, and Reasoning

Mistral 3 leverages MoE architectures for efficient, high-performance training (advantageous for resource-constrained compute).
- Not always ideal for on-prem deployment (needs a lot of GPUs); dense models are better for some cases.
- “Mixture of experts and their lower flops are very interesting.” [37:34]
Model progress focuses on structural advances, context handling, and real-world integration—not “AGI for AGI’s sake.”
Reasoning is a major post-training focus (e.g., the Magistral model), with reinforcement learning producing richer reasoning traces and improved tool orchestration.
- “There’s no real difference between creating a new thinking trace or calling the right tool—it’s all the same to me…” [46:04]
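The MoE trade-off noted at the top of this section (lower FLOPs, but a lot of GPUs for on-prem deployment) follows from the gap between active and total parameters. A back-of-the-envelope sketch, using the common approximation of about 2 FLOPs per active parameter per token for a forward pass and 2 bytes per parameter for 16-bit weights. The model sizes are invented for illustration and do not describe any Mistral model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    total_params: float   # all weights that must be held in memory
    active_params: float  # weights actually used for each token


def forward_flops_per_token(spec: ModelSpec) -> float:
    """Approximate forward-pass compute: about 2 FLOPs per active parameter."""
    return 2.0 * spec.active_params


def weight_memory_gb(spec: ModelSpec, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone (no KV cache or activations), in GB."""
    return spec.total_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the shape of the trade-off.
    dense = ModelSpec("dense-40B", total_params=40e9, active_params=40e9)
    moe = ModelSpec("moe-400B-a40B", total_params=400e9, active_params=40e9)
    for spec in (dense, moe):
        print(
            f"{spec.name}: {forward_flops_per_token(spec):.1e} FLOPs/token, "
            f"{weight_memory_gb(spec):.0f} GB of weights"
        )
```

With these made-up numbers the two models cost the same compute per token, but the MoE needs ten times the weight memory, which is consistent with the point that dense models can be the better fit for some on-prem deployments.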

11. Developer and Coding Products (Devstral, Vibe CLI, OCR)

Devstral is Mistral’s agentic coding model, built for “vibe coding” (interactive coding via agent) with enterprise-scale codebases.
Vibe CLI productizes this agentic workflow; also ported into Le Chat (chat assistant).
OCR3 provides lightweight, accurate document understanding—critical for KYC workflows and many enterprise automation tasks.
Most Mistral models are now multimodal (images, text, audio); research ongoing into video, initially from a robotics angle.

12. The Pragmatic Approach: Efficiency, Team-Building, and Expansion

Key to Mistral’s success: ruthless focus on highest-leverage areas given their resources (especially data quality), and staged evolution of team expertise (researchers first, then infrastructure and specialization).
- “Any improvements on the data quality would 10x the improvements that we would get by really improving on the model architecture…” [51:36]
Team and global expansion: Paris HQ with international offices (Palo Alto, Singapore), customer-centric global approach while championing European “sovereign” independence.

13. The Future: AI ROI, Democratization, & AGI Skepticism

Mistral’s future is about eliminating doubts on AI ROI, quickening time-to-value, and democratizing AI-driven tooling for all employees.
- “It should be easy and most people should be able to accelerate themselves through the use of AI…” [55:21]
On AGI and the “hype”: Enterprise adoption will always depend on robust infrastructure, governance, and trust—even if models with AGI-level intelligence arrive.
- “Even if I had some AGI model on my servers right now… if I were to go into a large bank and say, ‘Here is a thing, please let it control everything for you,’ they wouldn’t be happy…” [56:36]

Notable Quotes & Memorable Moments

On Model and Platform Control
“The software stack, once deployed, is in the hands of our customers. They own the model changes that we make... your expertise and what makes your company valuable stays yours.” — Timothy Lacroix [00:07, restated at 16:19]
On Building Infrastructure vs. Software Timelines
“You have to plan for the space to be available and on time. And so it’s a lot more long term planning than a few software features.” — Timothy [07:49]
On Value of Trust over Agent Autonomy
“What worries me when building those kind of workflows is… you might have governance concerns where some agent is acting on something very critical… So to me… the problems we’re solving are about how you trust what you’ve built.” [19:51]
On the Hype-to-Reality Gap in Enterprise AI
“Most of the enterprise value of AI will happen once you've gone through that first building phase … the reality is… it's still not easily available in the format and at the scale that we need for the true ROI of AI to happen.” [26:54]
On AI Token Demand
“The expectation is that demand… for the enterprise will completely jump once you are not bound anymore by humans asking questions…” [29:17]
On AGI versus Enterprise AI
“Even if I had some AGI model on my servers right now… if I were to go into a large bank and say, ‘Here is a thing, please let it control everything for you,’ they wouldn’t be happy.” [56:36]
On Mistral’s Focus
“We're trying to get the best models that we can and the model that's most useful for the use cases that we cover in enterprise.” [38:13]

Timestamps for Key Segments

[02:18] Mistral’s evolving vision: Enterprise & sovereignty
[04:11] Why build your own European data center?
[09:32] Competing with hyperscalers and “pockets of money”
[10:55] How enterprise and sovereign customers deploy Mistral
[13:08] Model customization: fine-tuning, pre-training, adaptation
[16:15] Data sovereignty and the value of “control”
[17:10] Agents vs. workflows; “trust” as the priority
[18:27] Example: Automating shipping container release
[19:48] Trust, governance, reuse, and observability in agent workflows
[24:03] Context graphs and the challenge of enterprise context
[26:24] Are we early in enterprise AI? Plumbing and building
[29:14] Token demand: When agents run in the background
[31:00] High-ROI use cases: Coding, knowledge, domain data
[33:29] Edge, privacy, offline, and defense applications
[35:43] Model strategy: MoE, dense, architecture flexibility
[38:13] What’s the “ultimate goal” for Mistral’s models?
[45:12] Reasoning, Magistral, RL, and new capabilities
[46:36] Devstral & Vibe: Agentic coding for enterprise
[48:30] OCR3 and document understanding
[49:54] Multimodal focus and where video fits
[51:16] Mistral’s team-building and efficiency philosophy
[54:12] Operating across France, Europe, US, and Asia
[55:16] The next few years: democratizing custom AI tools
[56:27] AGI hype: Why pragmatism & trust still rule enterprise

Tone and Style

Timothy Lacroix is relentlessly pragmatic, focusing on what it takes to make AI infrastructure and value work now—not lost in AGI dreams.
The conversation is candid, technical, and non-hyped. There’s humor in the “plumbing” metaphors and an understated confidence driving the vision for sovereign, trusted AI.

Conclusion

Mistral AI is building a formidable European alternative to US AI hyperscalers, with a dogged emphasis on customer control, trust, and real-world enterprise adaptation. Whether through pioneering full-stack sovereignty, pragmatic agent tooling, or innovative model architectures, Timothy Lacroix and his team prioritize enablement, not just intelligence—the plumbing over the promise. The future, as painted here, is less about AGI headlines and more about making AI boringly reliable and democratically empowering across the world’s enterprises.


Transcript

[00:00]
A
I think the expectation is that demand and amount of tokens generated for the enterprise will completely jump once you are not bound anymore by humans asking questions or reading them. As soon as you have enough trust to have agents running in the background, you're not really limited by the number of tokens. The term we use is control. The software stack, once deployed, is in the hands of our customers. They own the model changes that we make. And I think it's really important as a customer to consider that your expertise and what makes your company valuable stays yours.
[00:41]
B
Hi, I'm Matt Turck. Welcome back to the MAD Podcast. Today we have a special episode with Timothy Lacroix, the CTO and co-founder of Mistral, the company that proved that you could build frontier models with a fraction of the compute of the US giants. But recently Mistral has quietly evolved into a much more ambitious full-stack industrial power, building not just the models, but the platform, the deployment stack and their massive supercomputing clusters. We covered a lot of ground in this one: the engineering behind Mistral 3, what sovereign AI actually means in practice, and Tim's contrarian view on why trust matters more than autonomy for agents. If you're tired of the AI hype, Tim is refreshingly no-nonsense. Please enjoy this great conversation with Timothy Lacroix.
[01:25]
C
Hey Timothy. Welcome.
[01:27]
A
Hey.
[01:27]
C
So as I was prepping for this, I was struck by how much has been going on at Mistral over the last few months. I think most people probably know Mistral as a provider of open source models. It seems that you guys evolved from an AI lab to more of a full-stack solution focused on enterprise and sovereign customers. So just to set it up, in the last year you guys raised a 1.7 billion euro Series C led by ASML at an 11.7 billion post-money valuation. You launched a bunch of models, which we're going to talk about. Is the big vision behind all of this that enterprises and sovereign states are going to need their own AI infrastructure and Mistral is going to be the provider?
[02:18]
A
So the big vision has been evolving and, as you stated, we started as a company that built models because, with Arthur and Guillaume, this was what we knew how to do at the start. The premise on which we built Mistral AI was immediately solving for enterprise needs, and we started with open-weight models. After this, working with enterprise, we realized the need for basically the rest of the stack. So we built the serving platform because infrastructure was needed, and then all of the tooling around it was also something that we saw was missing. More than the tooling, it also requires a lot of work and expertise still to get deep into an enterprise's workflows and really help that transformation. And so we built that FDE function, and more recently with Mistral Compute, we're going a bit lower in the stack as well. So we've done all of this because it was required for enterprise success while still continuing on our models journey. All of this stack being modular is really important to us, as it gives full control to enterprise and our clients as to which part of the stack they decide to own and control, which is maybe more involved, or which part they decide to have serverless. It's basically this modularity that we like.
[03:44]
C
All right, so let's take some of those modular components in order.
[03:48]
B
Let's start with Mistral Compute.
[03:50]
C
So that was a big announcement, I guess in June of 2025, in partnership with Nvidia to help with this effort. What's the current status? Is it live yet? Are you building it? You know, how does one go about building data centers or leveraging data centers in Europe?
[04:12]
A
Maybe first, to go into the reasons why we decided to start building our own data centers: we tried a lot of different partners over the years and we realized that our use of AI compute for large-scale training was not necessarily well understood by a lot of providers. And there is our need for stability: when you run inference on a few GPUs or small-scale trainings on hundreds of GPUs, the margin for error is a lot larger than when you run trainings on thousands of GPUs at the same time. And so to address this need for stability, we saw a way for us to basically build our own data centers and maintain them with our understanding of what quality looks like. That was why we launched Mistral Compute. And when we decided to do it, we also realized, well, maybe others will benefit from it. So we launched into basically a bigger development than what was previously intended. This was announced in June, as you said, and since then the building of the facility has progressed quite well. It's in the south of Paris and we are right now running through the stabilization of the first tranche. It's quite a large data center, so delivery doesn't happen in one day. The first part of this data center is something that we are working on as we speak. We have a few jobs running and we're fine-tuning basically all of the last things to run at speed and with the right stability.
[05:48]
C
Okay, great. And did I understand correctly, it's going to be for your customers and your own needs around training, but also you'll be providing it as a service to others in Europe and beyond.
[06:00]
A
Yeah, exactly. So we will use part of that capacity for ourselves as one of our training clusters, but we will also provide a managed Kubernetes and managed Slurm stack on top.
[06:12]
C
Okay, any lessons learned so far? I mean, as you said, you guys come from a very deep background in AI and AI research. It's a whole different thing to build a whole data center facility. How have you gone about it and what are some things that surprised you?
[06:27]
A
As with most new experiences as a founder, I relied on the knowledge of others, and so I was lucky to have a few seasoned HPC experts and a lot of cloud software experts as well to build that solution. For me personally, and it's one of the things I love about my position at Mistral, I get to discover so many new things and so many new problems I hadn't thought possible: having to learn all of the different parts of building a data center, all of the different trades that you have to coordinate, all of the potential synchronization between all of the different trades. I mean, it's a huge building, it involves hundreds of people working on it. Then when you stand up the thing, you have to question what works. You have to filter through the blades that are faulty. It's just an entire new area of work where I get to see experts in their field go through things and try to explain to me what their daily work is. It's always fascinating to see an expert in their field do something that you don't know how to do. I think the logistics of it and the timelines are also quite different from what I'm usually dealing with in software and research. For new capacity to be built, you have to plan around having energy available. You have to plan for the space to be available and on time. And so it's a lot more long term planning than a few software features.
[08:03]
C
How do you guys go about power, since you mentioned energy?
[08:08]
A
In what we've been doing in Europe so far, it hasn't been a huge blocker, although there are constraints. I think the grid in various parts of Europe is not necessarily easily extensible. I know it's an issue in France. A lot of the sites are contended, so we'll see how it all develops. We are lucky in Europe to have very clean and affordable energy, with green energy in the Nordics and nuclear in France. So it's been relatively okay for us today.
[08:42]
C
As you describe this, what comes to mind is the gigantic amounts of money that are being invested in the US around data centers. How do you guys go about that from a financing standpoint? And perhaps even more, taking a step back, if you think about the race between the big AI labs globally, whether that's the OpenAIs and Anthropics of the world and xAI, it seems that all of them are affiliated with a gigantic pocket of money somewhere. Obviously there's Gemini and Google to add to the list, and Meta. I'm just curious, where do you guys stand on that? You have a bunch of partnerships with SAP and Nvidia, but you don't have one of those gigantic companies on your cap table. So how do you think about competing in that general context?
[09:33]
A
So with those companies, the hyperscalers, there are two parts to the game, and we've played the partnership part quite well with them. We're integrated within Google's Vertex, Amazon Bedrock and Azure AI Studio. And that is the choice that we've made. In terms of having access to gigantic pockets of money, we've been focused on efficiency from the start, and I think we've done quite well at building models that are competitive with the investments that we've put in. For us, it's important to build the company as efficiently as we can. And I deeply believe that with the capabilities that we have today in the models, there is so much to be unlocked in enterprise that I don't think my main focus today would be going into the gigawatts of power. We still need to build so much with our clients and unlock so much value with the capacities that we have.
[10:38]
C
All right, so let's go into the enterprise reality of all of this. So if I'm an enterprise or if I'm a sovereign and I want to deploy a Mistral open source model, what is it that I do these days with everything that you've built?
[10:55]
A
The way we work with enterprise: as you mentioned, we have a few of our models that are open source and Apache, and all of our clients are welcome to use them as they need. What we have seen in terms of success is that, given the current stack, it still requires a lot of expertise to get to actual value and things that go to production. Basically the way we interact is that we usually stand up our Mistral AI Studio, which is our platform, and we can deploy all of our stack on the client's choice of deployment methods. So it can be on-prem, it can be in their VPC, it can be in several places. The reason we do this is that it lets clients build where their data is, without having to shuffle things around, which, as I've learned as a CTO, is something that you don't want to do ever, because it raises a lot of questions and it's quite a stressful thing to do. So once this is deployed, we then work with the business units to understand where their pain points are. Sometimes it's knowledge management, and I think it's the most well-known use case from outside of the enterprise world. But it's also around automating core workflows for the enterprise. It's some tooling that you wouldn't expect: one thing that we've done is around code modernization, where you turn a bunch of Excel sheets into an actual Python app. And if you have many, many of those sheets, then potentially you want to use AI for this. So once the infrastructure is built, we basically look for what's the most valuable to the customer and we start accruing value inside a stack of AI assets that then accelerates all of the other developments with that customer.
[13:00]
C
And is part of the idea that you do actual model work at the customer and for the customers in particular? Fine tuning?
[13:09]
A
Yes, we customize in various ways. We have done continued pre-training, and this is most useful when you want to change the capabilities of a model more deeply. So we've done this to sometimes change the mix of languages in a model, to get something that's a lot better at Southeast Asian languages, for example. Or you could require this if your internal data, which doesn't appear on the public web, is something that's so new that you need a large amount of tokens to get a model that understands it and becomes fluent with it. So we do these kinds of continued pre-training. Fine-tuning we also like, and this is more for an efficiency reason. When you get to smaller models, you have to make trade-offs. The models won't be as good in their knowledge of the world. And so when you lose a lot of things, you have to focus on what you really care about. This is typically important if you want really fast, really cheap models that will be really good at a specific task. It's also useful if you want models that run on the edge and get very, very tiny. And so for all of these, fine-tuning is a tool of choice. Another reason to do fine-tuning can be to adapt to data that's not necessarily massive, but that's also not available on the web. Typically, in coding, what happens is that you will have massive codebases, sometimes accrued over decades, that the model will need to be able to work with, in terms of having Vibe deployed on them, typically. And so being able to come in, not move the codebase, and learn an actual coding agent for that codebase is really powerful as well.
[15:04]
C
And who does all of this? You have evolved towards an FDE model.
[15:09]
A
So we have indeed a large FDE section. It's a mix of software and FDEs, and we split our FDEs into what we call AI engineers and applied scientists. Applied scientists will tend to use the tools that we've just talked about, so fine-tuning, continued pre-training and the like, whereas AI engineers will focus more on adaptation to the enterprise environment and figuring out what workflows to automate and all of this. They work with the customers to make sure that the use cases are indeed providing value and going to production. But it's also a fantastic way for us to understand what matters in an enterprise context and be faster at building the right platform.
[16:00]
C
And again, those customers are the kind of customer for whom customization and privacy are essential. How do you position against the OpenAIs and Anthropics of the world that are going very hard at the enterprise? Is that data sovereignty? Is that customization?
[16:16]
A
The term we use is control. The value that we see is both in our expertise and the software stack that we provide. The software stack, once deployed, is in the hands of our customers and they can change it. They can add to it. They own the model changes that we make. And I think it's really important as a customer to consider that your expertise and what makes your company valuable stays yours. It takes effort to build an AI advantage today, and so having this effort built into something that you own is, I think, a choice that makes sense.
[16:58]
C
Let's talk about agents. Obviously part of the overall effort at Mistral. How does that work? How do you build an agent and what key use cases have you seen so far?
[17:10]
A
Personally, I think I've moved from agents to workflows, which is I guess an abstraction on top. Agents are, I think, the building blocks, where you have a given expected input, a set of tools, and a goal that you want to reach. The set of inputs that we've enabled are images, text and audio. When you build an agent, to me it's really important that you build it on a focused task, with a data set that you understand and that you can iterate on and improve. What we see in enterprise is rarely things that are solved with agents, because that's not necessarily where you would expect an FDE to be most useful. Those ideally would be built on our platform by the customers directly. Where there is more value is in more complex workflows, where you will have several agents interact through a workflow to automate something slightly more complex. And so that's what we've been focusing on.
[18:26]
C
What would be an example?
[18:27]
A
An example is something that we've built with the shipping company CMA CGM, where we've automated the container release process. It's a use case where, I don't know how familiar you are with shipping, I wasn't at first, but a container reaches a port (a harbor, probably, in English) and a decision has to be made that this container is ready for release to the next person on the line to handle it. There are lots of checks that need to be run and data to be accessed in the backend before that decision is made. As you can imagine, some of those containers are extremely valuable and you can't really afford a mistake. And so what we've done in this case is an application that's integrated into how these harbor workers work. It automates a lot of the manual work that they did to check the data, and they make the final decision given all of the evidence.
[19:33]
C
Okay, this is super interesting. Obviously the key question about agents these days, especially when they are combined into workflows, is the question of autonomy. How do you guys think about it? How autonomous are those agents in your deployments?
[19:49]
A
I don't know if it's the way I think about it. To me, the better question usually is how much you trust the agents. And there are a few dimensions around this. What worries me when building those kind of workflows is that typically if you want the value to accrue and if you want to build faster and faster, the more workflows that you build, what you will want to do is reuse assets and make them reusable by others. As soon as you do this with agents, you then start to ask the question, well, this agent has access to some data that is privileged, but maybe this other agent is publishing it to something that's public. You might have governance concerns where some agent is acting on something very critical and you don't know necessarily that the data that it got has been approved or something like this. It's really a new way to develop where the parts of your workflows have to be trusted. Each of them to be trusted requires quite a lot of tooling and quite a lot of observability to get confidence and to basically enable this at scale in an enterprise. So the question that you're asking about autonomy, to me, this is something that I see happening when I vibe code. Sure. Like longer running tasks and making and improving on this is going to be critical and we're working on it daily. But today the problems that we're solving on the software side of things are really about how you trust what you've built and how you improve it and how you allow an entire company to build on it with confidence.
[21:26]
C
Maybe describe some of the things that you guys have built in AI Studio around governance, as you mentioned, and trackability and registry, all those things. What are the key components of a modern agent suite?
[21:40]
A
So workflows, as I mentioned, is something that we've worked a lot on with our customers, and it's not GA yet, so look out for this sometime in the future. But it's also one of the benefits of working with enterprise. We can have a lot of design partners, and once we're confident with the solution, we make it GA. So a workflow solution is critical. Workflows are built on various model capabilities, so vision, audio, text and reasoning. It is important to have a registry of connectors and MCPs, and so for this we have our connections. Observability is an area we're still working on. It's important for me to be able to iterate and really define precisely what an agent does, control each of its goals and see how it's progressing, being able to maintain evaluations and build on them. What is difficult in this entire sea of complexity is that you also have to maintain proper versioning and tagging and think about how you're going to deploy and improve upon what you've built. So let's say you've built a kickass workflow based on a lot of agents and models that Mistral has released in the past. Then a few months pass and there are new sets of models that are out. Maybe you can simplify that workflow. Maybe the next Mistral 4 is good enough that you can factor out a few agents. Basically, what you need to be able to do is create a new agent, run it on the same set of inputs and outputs, check that you haven't broken anything, and then deploy it in the wild. All of this software suite, basically, which has been built for software development over years, I feel it isn't there yet in the AI world. And that's what we're building, as I'm sure you've seen.
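Illustrative sketch (not from the episode): the "create a new agent, run it on the same set of inputs and outputs, then deploy" loop described above could look roughly like the following. Everything here is hypothetical: the `Agent` type, the `check_candidate` helper, the version tag, and the recorded container-release examples are invented for illustration and are not Mistral APIs. Exact string matching stands in for whatever evaluation a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: an agent is modelled as a plain function from input text to output text.
Agent = Callable[[str], str]


@dataclass
class RegressionReport:
    candidate_tag: str
    passed: List[str]
    failed: List[str]

    @property
    def safe_to_deploy(self) -> bool:
        # Promote only when no recorded case changed its answer.
        return not self.failed


def check_candidate(candidate: Agent, tag: str, golden: Dict[str, str]) -> RegressionReport:
    """Replay recorded inputs through the candidate and compare with approved outputs."""
    passed: List[str] = []
    failed: List[str] = []
    for prompt, expected in golden.items():
        (passed if candidate(prompt) == expected else failed).append(prompt)
    return RegressionReport(tag, passed, failed)


# Made-up input/output pairs recorded from the workflow currently in production.
GOLDEN = {
    "release container ABC123?": "HOLD: customs check missing",
    "release container XYZ789?": "RELEASE",
}


def simplified_agent(prompt: str) -> str:
    """Stand-in for a workflow rebuilt on a newer model with fewer agents."""
    return "RELEASE" if "XYZ789" in prompt else "HOLD: customs check missing"


report = check_candidate(simplified_agent, "v2-simplified", GOLDEN)
print(report.candidate_tag, report.safe_to_deploy, report.failed)
```

Tagging each candidate and keeping the recorded cases under version control is what lets a later model upgrade be checked against the same evidence before it reaches production.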
[23:40]
C
For the last few weeks in startup and venture circles, this whole idea of the context graph as an infrastructure has been making the rounds. Is that something that you think about, a layer that would basically enable one to know how the agents made a decision and how those decisions relate to one another?
[24:03]
A
I've seen this indeed. And I think there are two levels to that discussion. Take the part that you mentioned at the end: when we talk about understanding how an agent came to a decision or an action, the game is really to understand how a human agent made this decision. It's understanding how an enterprise does what it does. And it's certainly interesting. What keeps me up at night, and what I really want to solve first, is just the basic idea of gathering a workable enterprise context. Right now, with any model and with a lot of effort, you will be able to get some connections to tools, and you will ask a question and your agent will do a bunch of things. It will realize that, oh, by doing five API calls and three joins, I can probably get what Timothy asked immediately. What should happen is that all of that discovery and all of that intelligence should be stored somewhere to be reused. It's not really how things happen; it's just basic knowledge about what the infrastructure of the company is. So knowing where the tables are, what they contain, how they're joined. All of this is compute that should be amortized, basically. And to me, the entire game with the context engine, as we call it internally, is to be in a setup where over time knowledge of the company and the context that's available to the agent accrues and is maintained. The second-order thing of, oh, how was that decision reached? Sure, it's going to be super interesting and it's important, but right now I feel we're not even in a place where it's easy for an enterprise to have any worker in it be able to build an agent that has access to the right context for this to happen. You have huge data privacy concerns. If you want this to be efficient, you need to give the agent system access to the entire data of your enterprise, and there are going to be RBACs everywhere, and you need to make this safe.
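Illustrative sketch (not from the episode): the idea that discovery work "should be stored somewhere to be reused" amounts to caching what the agent learned about the company's systems, so the expensive exploration runs once. The `ContextStore` class, the `explore_schema` function, and the query steps below are all invented for illustration; the internal "context engine" mentioned above is not described in enough detail to reproduce.

```python
from typing import Callable, Dict, List


class ContextStore:
    """Hypothetical store that remembers how a kind of question was answered before."""

    def __init__(self) -> None:
        self._plans: Dict[str, List[str]] = {}
        self.discoveries = 0  # number of times the expensive exploration actually ran

    def plan_for(self, topic: str, discover: Callable[[str], List[str]]) -> List[str]:
        # Run discovery only on a cache miss; later calls reuse the stored plan.
        if topic not in self._plans:
            self._plans[topic] = discover(topic)
            self.discoveries += 1
        return self._plans[topic]


def explore_schema(topic: str) -> List[str]:
    """Stand-in for an agent probing APIs and tables; the steps are made up."""
    return [
        f"GET /catalog?search={topic}",
        f"JOIN orders ON {topic}.id = orders.{topic}_id",
    ]


store = ContextStore()
first = store.plan_for("customer", explore_schema)
second = store.plan_for("customer", explore_schema)
print(store.discoveries, first == second)
```

A real system would also need invalidation when schemas change and access checks on who may read a stored plan, which is where the RBAC concern raised above comes in.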
[26:25]
C
Speaking of which, what's the current reality of enterprise deployments of generative AI, from your perspective? Just listening to some of the concerns, it sounds like we're very early to me.
[26:38]
A
We are still in the building phase. And I think the frustrating thing for enterprise is that when you come to a chat assistant, you feel that it's magic and it's all going to work. But as with most things that have value in life, there is still work to be done to get to them. And so most of the enterprise value of AI will happen once you've gone through that first building phase of just setting up all of the machinery. You've got to set up all of the connections, you've got to make all of that data available. And the reality is, even despite a lot of work recently to make data more available in enterprise, it's still not easily available in the format and at the scale that we need for the true ROI of AI to happen. And so when we come in, there is still that phase of work that is just work to connect everything and then be able to build on it.
[27:36]
C
So do you think we are years away from generative AI actually being deployed in the enterprise?
[27:41]
A
Not years. I think year, singular. It's also, to be fair to us, we've started working... I mean, the company started two years ago. And so most of our...
[27:53]
C
It's a good reminder. It's a good reminder that you guys have done all of this. And the company was started in June '23, right, if I recall.
[28:01]
A
Yeah. And so for most of our clients, we started working with them recently. The tooling for everyone is still in its infancy. And so I hope that the tooling will stabilize and I hope that we will have true value. True value to me is really okay. We've gone through that first phase of building connections, and now employees of that enterprise are able to use everything that we've built. Right now, I think we're in a phase where we build siloed things because we're scared of data going through walls and everything. And so to me, the real success is when you're confident enough to give all of that control back to the company's employees at large and they start really building on it.
[28:43]
C
You're talking about Mistral in particular, about the industry in general. Right? Do I understand this correctly? Because obviously that's the big question, right? We're all collectively building this whole thing, data centers and models, and pouring in billions. And I think it's pretty clear that for personal use cases, or maybe for some discrete coding use cases, the demand is very clear. But the big question is whether demand is going to materialize at the same level as the extraordinary level of supply we're building.
[29:14]
A
Yeah, around this I think the expectation is that demand, and basically the amount of tokens generated for the enterprise, will completely jump once you are not bound anymore by humans asking questions or reading them. As soon as you have enough trust to have agents running in the background, as soon as you've set them to run a bunch of ETLs, as soon as you've got them running lots of workloads and consolidating data and knowledge across your entire company, then you're not really limited by the number of tokens that humans can create or read. And so I think everyone in the industry expects the demand to jump at that point. And the reality is, for this to happen, you just need a lot of boring software and control and things like this.
[30:09]
C
It's amazing how much all of this is engineering. Right. Versus just sheer performance of models.
[30:15]
A
Yeah, it's a lot of plumbing and the goal is to make all of this plumbing easy and easier and to make it faster.
[30:21]
C
All right, and you said we were about a year away.
[30:24]
A
I'm not the most optimistic person. It might be faster, who knows.
[30:28]
C
And we talked about use cases a bit already, but let's just put that one to bed because it's such an important question. What do you think are the banger use cases in the enterprise? Let's assume all agents work in the workflow kind of way that you described. Based on either your industry watch or, more specifically, talking to your customers, what is it that is going to generate an amazing ROI, beyond coding, which is pretty established at this stage?
[31:00]
A
Yeah, there are several dimensions to this. Coding is an obvious one. And to me, to get the full ROI of coding you need customization, because a lot of ROI is unlocked on sprawling codebases that are completely impossible to know for something that's been trained on the web. If you've got an enterprise that's been building its own domain-specific languages for years, you'll need some customization for an agent to come in and be competent in that respect. So coding is definitely a big one. If everything comes true as I hope, I think there is still a huge jump in how we accelerate knowledge workers, and I believe the magical experience of going to your chat assistant, having it connected to your systems, and being able to ask it anything about the enterprise just hasn't been realized yet. And it's really obvious when you see the kind of queries that people are making, expecting them to just work. And to me, who's building the system, it feels like magic. Like if you need to somehow send an email to three people and coordinate a meeting and also gather data from some BI system, it's just something that requires a lot more plumbing and capabilities than we have today. So that's going to be a huge lift. And I think the last one, which is maybe closer to my heart, is really when we start to customize models to a kind of data that is particular to an industry. So typically, if we work in oil and gas, they will have seismic data that we can help understand and make sense of. If we work with computer-assisted design, they might have full databases of specific data formats that are not widely understood by the most general models yet. And if we manage to build a system where, in a light-touch way from us (in my dream world, we don't really have to intervene, it's all self-serve for the customers), they can consolidate that data and then build themselves a model that really understands what their actual private IP is made of and makes sense of it, then I'll be super happy. And I think there is huge value to unlock there.
[33:25]
C
Great. Where does the edge fit in all of this?
[33:29]
A
There are a few reasons to go edge. First, there are some regions where it's more convenient to be able to work without Internet, and there are also a lot of capabilities that don't necessarily require a huge model. So if you just need something that goes voice-to-action on any device, today, typically with the Voxtral models that we develop, this is doable. Again, it's an area where the more focused your use case is, the smaller you can make the model, through fine-tuning or through just distillation into an even smaller architecture. I think voice-to-action is going to be a big use case. I think it will simplify the current stacks for these types of things a lot. There are also some privacy things, where you could imagine all of the context consolidation stays on your personal device. For most things you can deal with a small model that answers a lot of your questions, and then you can potentially gate what goes out to another, cloud-based model. I myself take the train a lot. I like having coding assistants. Having Devstral run on my laptop while I code on the train is comfortable despite the bad Wi-Fi.
[34:53]
C
And presumably there are some defense use cases as well. So you guys do quite a bit of defense work as I understand it, with France, with Germany. I think you mentioned some partnership with Helsing Isai on drones and that kind of stuff. Is that a reality?
[35:10]
A
A reality? It's something that we work on, yes. We have a robotics division that works with these partners. Having very well-defined use cases makes us able to really take the model down to lighter sizes. And these are of course use cases where control is super critical and you need to be able to really validate the solution.
[35:43]
C
All right, let's switch to the model part of the discussion. In December you guys released Mistral 3, which was a big release, still with the MoE architecture, which is at the core of what you guys have been doing. You mentioned efficiency earlier in the conversation. Maybe walk us through the general thinking and approach. In a highly competitive world of AI models, both closed source but also very much open source and all the Chinese labs, what is it that you guys are trying to do, and how do you position?
[36:21]
A
Yeah, so we've released Mistral Large 3, which is an MoE. MoEs are really nice systems to train because of the lower amount of FLOPs, which makes us able to push performance a lot more during training. They are not necessarily the best format for on-prem deployment because, as of today, if you want to get the best efficiency out of a mixture-of-experts model, you require a lot of volume, because you're usually looking at deployments across dozens of GPUs. And to justify that amount of GPUs, you need to have the right throughput. We are training large MoEs to get the best performance with the most efficiency during training. We are also continuing to train dense models at other scales because, depending on the environments in which our clients want to deploy, this might be the more cost-efficient solution. I think both architectures are still valuable. On edge as well, sometimes you just don't have the RAM capacity to deploy something like a sparse mixture of experts, and so going dense is helpful there too. But yeah, definitely for training, mixtures of experts and their lower FLOPs are very interesting.
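Illustrative sketch (not from the episode): the trade-off described above can be seen with back-of-the-envelope arithmetic. A sparse mixture-of-experts model computes with only its active parameters per token, but every expert still has to sit in memory. The parameter counts below are hypothetical round numbers, not the specifications of any Mistral model; the "2 FLOPs per active parameter per token" figure is a common rule of thumb for a forward pass, and the GPU count considers weights only, ignoring KV cache and activations.

```python
import math


def weight_memory_gb(total_params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Memory for the weights alone; all experts must be resident, active or not."""
    return total_params_billion * bytes_per_param  # billions of params * bytes = GB


def forward_gflops_per_token(active_params_billion: float) -> float:
    """Rule of thumb: roughly 2 FLOPs per active parameter per token."""
    return 2.0 * active_params_billion  # billions of params * 2 = GFLOPs


def gpus_for_weights(memory_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Minimum GPU count to hold the weights (ignores KV cache and activations)."""
    return math.ceil(memory_gb / gpu_memory_gb)


# Hypothetical models: a sparse MoE and a dense model with the same active size.
moe_total, moe_active = 600.0, 40.0
dense_total = dense_active = 40.0

moe_memory = weight_memory_gb(moe_total)
dense_memory = weight_memory_gb(dense_total)

print("MoE:  ", moe_memory, "GB,", forward_gflops_per_token(moe_active), "GFLOPs/token,",
      gpus_for_weights(moe_memory), "GPUs")
print("Dense:", dense_memory, "GB,", forward_gflops_per_token(dense_active), "GFLOPs/token,",
      gpus_for_weights(dense_memory), "GPUs")
```

Under these assumptions the two models cost about the same compute per token, while the MoE needs many more accelerators just to hold its weights, which is why it only pays off at high request volume.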
[37:48]
C
What is the ultimate goal of the model effort? I mean, clearly you guys are a frontier AI lab, but are you trying to create the best models and solve AGI, or are you trying to be the best open source model compared to the Chinese labs or whatever open source eventually comes out of the US? What is it that you're trying to do?
[38:14]
A
We're trying to get the best models that we can, and the models that are most useful for the use cases that we cover in enterprise. And so typically, with the rise of agentic behavior, one thing that's very important is how you deal with various contexts, how you deal with various documents being added to the input. And so having the capability to do architecture iterations, really trying new things in terms of model training, is critical. So we're pushing the boundaries of what the current models can do with the compute capacity that we have. But we're also trying to focus on the things that are most annoying in our deployments today. And so one of the considerations that has been solved with a few harness tricks is the context of those agentic systems. It's visible typically in vibe coding, but it's definitely applicable to a lot of other use cases where, through all of the tool calls, you'll have to consolidate and summarize the context to be able to fit everything and have the model focus on the right parts. To me, this is just an artifact of the current architectures. We're trying to fit things in linear context windows, where essentially the questions that we're asking aren't necessarily all linear. And so we rely today on the file system for this. And I think that was the big realization through vibe coding: agents are good enough at manipulating file systems that they can use this as a replacement for their context window. Basically, they can select parts of what they want to read, they can select parts of the tool results, and this minimizes the context length requirements. This is the state today. I think we can do much better, and I think there is a lot of improvement to be made on those types of questions.
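Illustrative sketch (not from the episode): using the file system "as a replacement for the context window" can be pictured as writing a large tool result to disk and letting the agent pull in only the lines it needs. The helper names `find_lines` and `read_slice`, the file name, and the log contents are invented for this example and do not describe any specific agent harness.

```python
import tempfile
from pathlib import Path
from typing import List


def find_lines(path: Path, needle: str) -> List[int]:
    """Return 1-based line numbers whose text contains `needle`."""
    with path.open(encoding="utf-8") as handle:
        return [number for number, line in enumerate(handle, start=1) if needle in line]


def read_slice(path: Path, start: int, count: int) -> str:
    """Return `count` lines beginning at 1-based line `start`."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[start - 1 : start - 1 + count])


with tempfile.TemporaryDirectory() as workdir:
    # A large tool result is written to disk instead of being pasted into the prompt.
    log = "\n".join(
        f"line {i}: ERROR disk full" if i == 7421 else f"line {i}: ok"
        for i in range(1, 10001)
    )
    result_file = Path(workdir) / "tool_result.log"
    result_file.write_text(log, encoding="utf-8")

    # The agent searches first, then reads a three-line window around the hit.
    hits = find_lines(result_file, "ERROR")
    snippet = read_slice(result_file, hits[0] - 1, 3)

print(hits, len(log), len(snippet))
```

Only `snippet` would be placed in the model's context, so the context cost depends on the size of the slice and not on the size of the tool result.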
[40:32]
C
Do your agents run on sandboxes?
[40:34]
A
It depends on the types of agents, but the answer would be yes if it's coding agents. Usually we have sandboxes that will let the agent iterate and run. I think the depth of the isolation will depend on the use case. Typically, if the file system is just representing textual context and you're not expecting the agent to do much action on it, then you don't really need a full sandbox. You just need some representation of that context as a file system. And it can be any sort of abstraction. But if you are, I don't know, typically running asynchronous code development, then yes, you need a sandbox.
[41:15]
C
Great. What is the current constraint that you guys are facing to make Mistral 4, when it eventually comes out, do much better than Mistral 3? Is that a question of Mistral compute or is that a question of data? And in particular, are you guys doing anything around synthetic data that you can talk about?
[41:37]
A
Definitely compute. And the current deployment that we have will help, as it's going to give us a lot more Grace Blackwell capacity than we had in the past. And so that's something that we're very excited about. And when you add compute, you also have to add data. And so we've been hard at work making sure that our data mixtures are as high quality as ever and growing in size. But as you mentioned, one of the ways to do this is through synthetic data. In terms of where we use synthetic data the most, I think a lot of the interesting work that's happening is for the post-training part, where we can build environments that look similar to an enterprise and then try to synthetically create queries that are hard and that will require multiple hops. And so all of this work, in addition to the coding work and the reasoning work, is really what makes the final model able to perform in the various environments that we work in. So before, it was about accruing world knowledge, and the web helps a lot with this. Now it's more and more about acquiring know-how. And for this it's really about trying to find what our customers are trying to do, trying to replicate it inside of our training environment, and letting the model run, basically.
[43:12]
C
You mentioned post-training, and that's one of the key topics of the last 12 months in particular, this evolution of LLMs into systems with both pre-training and post-training and a lot of reinforcement learning. Where do you guys fall in that spectrum? Are you pushing a lot of reinforcement learning? Do you believe that pre-training still has room to grow? How do you think about it?
[43:38]
A
Yeah, everything still has room to grow. What I'm interested in as the CTO is really how you make all of the steps of the pipeline work well together, how everyone can develop most efficiently. Typically what happens in post-training is that you will have a team that's working on improving code, you will have another team that's improving different enterprise behaviors, you will have another team that's improving instruction following. And so all of this at some point has to come together, because customers aren't happy if you require them to deploy five different models to get their job done. There is really an internal engine and capability around making all of these work streams come together in the way that you expect. That is super interesting to build. But yeah, internally we're building and improving all of the parts of the stack. I think post-training is very rich because it also touches all of the new use cases of LLMs, and I think it's been very exciting to see all of the new use cases that pop up every day. Anytime someone on Twitter finds a new exciting thing that they've done, then suddenly you've got to turn this proof of concept into potentially a base capability on which your model will perform well. And that's potentially an entire stream of work. And you've got to do this efficiently and prioritize.
[45:12]
C
Well, where does reasoning fall in all of this? You guys launched a reasoning model called Magistral a few months ago. Is that a big priority?
[45:23]
A
So reasoning is a big priority. And the interesting thing about reasoning was really how you can train models with reinforcement learning. It was first shown through reasoning because the system would learn to create better reasoning traces to get to better results. But the system is the same whether you create reasoning traces, iterate on the tools that you call, or mix both. And so I think more and more the way to train all of this is going to come together. Sometimes you'll have reasoning traces, sometimes they'll be long, sometimes they'll be short, sometimes there won't be any because it's not necessary. And there's no real difference between creating a new thinking trace or calling the right tool. It's all the same to me, because what you're optimizing at the end is what is the best output for the model to create before it gets results.
[46:22]
C
Great. Let's talk about Devstral 2 and the Vibe CLI. So walk us through those products and what they do and why people should use them.
[46:37]
A
Sure. So Devstral is our agentic coding model. It's something that you typically vibe code with, and you are more than welcome to vibe code with it through our CLI, aptly named Vibe. On the value of vibe coding and why we focus on it: coding is a huge use case in enterprise, and a lot of our clients in particular have large codebases where it's helpful for us to take our system and customize it to their codebase to let our agent run. Now, Devstral and agentic coding are not only about vibe coding. The same system, when you run it asynchronously, can be used to review PRs. It can be used to check code for specific conditions, and it can be used to modernize code. So its applications, even in coding, are quite wide. As I alluded to as well, having a system that is good at handling a file system is more generally very interesting. Even if you're not using it to code, you can use it to reason about enterprise knowledge. You can use it to connect to enterprise systems. And to me, it's really the basis of the enterprise intelligence that we're starting to build. And so the big news is that those systems are going GA. We've got an offer where chat users, so users of Le Chat, our assistant, will also get the ability to use Vibe and the associated models, and we're trying to basically make that usage as wide as possible.
[48:18]
C
Another thing that you released reasonably recently, I believe, is OCR 3. What does that do? Does it enable you to just scan any form, any document?
[48:30]
A
Yeah, OCR is a huge use case in enterprise. A lot of our customers have... I mean, the typical example is KYC, where someone will submit a form and you need to input that information in a structured way in your systems, or you need to reason about it. And so OCR, interestingly, is not the type of system where I would have expected LLMs to really make large strides. But the visual reasoning and the visual understanding have gotten so good that it's just an easier way to process things. In my mind, you have any sort of input and you can get the data that you care about. As I mentioned, when you build agents, you have different types of inputs for the task that you're trying to solve. Documents and visual information are just a very, very frequent kind of input. Sometimes it's a lot cheaper to use a small OCR model to just get the text that you care about, and then potentially process it or deal with it with another system, than to run it through a large multimodal model that will basically do the same thing, but at a higher cost.
[49:41]
C
Yeah, you mentioned multimodal. To what extent is Mistral multimodal, and to what extent are voice and video something that you guys either do or think about, or is that just not a big enterprise use case?
[49:54]
A
So to answer the first part of the question, on whether we build multimodal models: yes. It's always a balance between exploring in a direction, getting good capabilities and getting the first model out there, and then integrating it into the trunk, the main model that we use for everything else. And so those will always happen at separate times. But for audio we have Voxtral, as I mentioned. And all of our main models understand images and can reason about them. For videos, it's a subject that we tackle through the lens of robotics first, and so we're doing our first explorations on that topic.
[50:32]
C
Okay, well, again, the velocity has been super interesting to watch. I again appreciate your reminding us that you guys have been doing this for only a couple of years, so just very impressive altogether. Maybe taking a step back and thinking about all of this in terms of engineering and lessons for builders: as we alluded to a couple of times through the conversation, you guys are doing a lot with comparatively (it's very relative in the world of AI) fewer resources. How have you been able to do this from an efficiency standpoint?
[51:17]
A
We focused on the parts that we knew would provide the most impact, and we focused on basically what we could afford at different times. So when we started, we had enough resources to train a few models, and so we focused on getting the data perfect, because we knew this was potentially not the most exciting part of the work, but it was absolutely critical, and any improvement in data quality would 10x the improvements that we would get by improving the model architecture or things like this. And so I think it's about focusing the right effort depending on the scale of the company.
[52:05]
C
And from a team-building perspective, how have you gone about it? The three of you, the three co-founders, have a deep background in AI. Are you these days focused mostly on building an FDE team, or are you still building this large research lab effort, and how do you think about the right ratio?
[52:31]
A
We are growing all of our teams: research, FDEs, product, engineering, infrastructure for compute. And all of the teams have their own challenges in how you build and what order you recruit people in. It's been important to me at the start, I mean to me and Guillaume and Arthur... the three of us were good AI practitioners. We knew how to train models and we knew how to code. And so we started with people like us to get the models trained the fastest. But that doesn't work as you scale. It is critical to build the right infrastructure for research. This takes different skill sets, and it's something that we've been building over the years as well. And it's fascinating, as someone who used to do research at a smaller scale, to see the kind of systems that are involved and the gains that you can have at scale. In terms of engineering, it's kind of the same story really, where you start with a team that's broad in its knowledge and self-sufficient and can iterate fast. And then more and more you bring in experts, or people that have seen larger scale and will tell you, well, this won't work in six months and so we should fix that now. So it's been super interesting growing the company and seeing all of the successive things that break at each scale, and overcoming them through either changing the system, changing the organization, or building new things.
[54:12]
C
How have you navigated the whole Europe-to-US-and-rest-of-the-world dimension of this? You're very much the pride of France, and the pride of Europe as well. Equally, this is a global race. How have you made it work?
[54:29]
A
So we work on all three continents. We have offices in Palo Alto, and we have offices in Singapore as well. Most of our employees work from Paris. It's a good representation of what we're trying to build, which is a solution that's independent and that people control. And for this target, it doesn't really matter where we're from or who we're building for. We provide the tools, and the end customer then owns everything that's built on them. And so I think it hasn't really been something that I've spent much thought on.
[55:10]
C
So what should we expect from Mistral over the next couple of years?
[55:16]
A
Over the next couple of years, I would say diminishing doubts on the ROI of AI, ideally. So faster time to success, larger and larger use cases being built, and really democratization of building tools with AI in enterprise. I think this is really what I target for our customers. It should be easy and most people should be able to accelerate themselves through the use of AI. I think we've seen this happen quite impressively for coding, and it should be something that happens a lot more widely.
[55:58]
C
I was struck throughout this conversation by how pragmatic you are and focused on precise goals around enterprise success. What do you make of the whole rush to AGI conversation and people being AGI pilled in San Francisco and other places? Is that something that you see happening or does that to some extent not matter from your perspective?
[56:27]
A
I mean, it matters, because the better your systems are, the more impressive things you'll be able to do, and it'll become easier and easier. But the requirements I see for control and governance in enterprise make me think that even if I had some AGI model on my servers right now, if I were to go into a large bank and say, here is a thing, please let it control everything for you, they wouldn't be happy to let it do it. And so I think building the infrastructure properly is quite key to following the progress of these models and really being able to quickly unleash all of their capabilities. So to me, it's two directions that are both necessary. You need to improve the capabilities of the model, and it's super exciting to do so. But the journey of making it trivial and easy for everyone to unleash those models on their enterprise workflows, without really wondering what's going to happen, is equally important and honestly super fun to develop as well. There are lots of super interesting questions.
[57:42]
C
Wonderful. Well, Timothy, thank you so much for doing this deep dive on Mistral with us. It's been fascinating. Congratulations on everything that you've built, again, in this very short period of time, and I'm excited for what's coming next. So thank you for spending time with us.
[57:59]
A
Thanks. It was a pleasure.
[58:01]
B
Hi, it's Matt Turck again. Thanks for listening to this episode of the MAD Podcast. If you enjoyed it, we'd be very grateful if you would consider subscribing if you haven't already, or leaving a positive review or comment on whichever platform you're watching or listening to this episode from. This really helps us build the podcast and get great guests. Thanks, and see you at the next episode.