A
Mistral AI has just come out with a brand new AI model called Voxtral. There are a bunch of interesting things about this model. In particular, it's an open speech model, and I think the benchmarks on its word error rate and its price are particularly interesting. I'm going to break all of this down, especially on the heels of a potential $1 billion funding round that Mistral is looking at, the rumors swirling about Apple acquiring the company, what the founder has said about that, and what the plans are for the future of the company. This is an interesting time for Mistral, no doubt, and we're going to get into all of that. But before we do, I wanted to mention that if you want to try out all of the latest models from Mistral, along with the latest models from a lot of other companies (the top 40 AI models), you can go check out my own startup, AI Box. Over there we have Codestral, the compact Mistral 3B, Pixtral Large for vision, Mistral Small, and all of the other Mistral models, plus a ton of interesting models from Anthropic, Cohere, DeepSeek, Google, Meta, Microsoft, Nvidia, OpenAI, Qwen, Grok from xAI, and all of the other interesting companies. So you can check out all of those models, and a bunch of other image, speech, and audio models, at AI Box. It's all for one subscription, 20 bucks a month, so you don't have to add subscriptions to all of these different platforms. You can try them all out, and there's a link in the description. All right, let's get into what Mistral is doing. The most interesting thing they've announced is that Voxtral is an open model. It does transcription, but it goes further than that: it can take in audio files, understand what the audio is saying, and respond to it. It's a competitor to ElevenLabs and a lot of these other players.
What they're saying is that they've actually built three models for three different use cases, and that this is much more efficient, and way cheaper, than using something like ElevenLabs or other players. So this is pretty interesting. They have a chart that plots price in USD per minute against word error rate, which is basically how often the model gets the words wrong. They put a couple of competitors on this chart: one of them is ElevenLabs Scribe, which is super expensive, and there are some others on the other end that are interesting. The key point is that it's an open model, so you can take the model and run it locally on your own devices or servers. For a lot of companies, having that capability is quite exciting. Right now, if you want to use these speech-to-text models, you have to use something like ElevenLabs. OpenAI has some options, but it's all going to be things where you pay for API usage. So with open models, being able to run them locally is pretty interesting for a lot of companies. They even have a super stripped-down version that lets you run it locally on a device. This is what a lot of people were saying Mistral was going to be doing with Apple, and why Apple would want to acquire them: they have a bunch of these tools that are stripped down and able to run on device. You could imagine a tool like this being incredibly useful for something like Siri, where you could essentially run an edge model. The one in particular is called Voxtral Mini. Voxtral Mini's error rate is not the best, but it's still better than Whisper large-v3 from OpenAI and a little bit better than Gemini 2.5 Flash.
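Word error rate is usually defined as the number of word substitutions, deletions, and insertions needed to turn a model's transcript into a reference transcript, divided by the number of words in the reference. A minimal Python sketch of that standard definition, using a word-level edit distance, is below; `word_error_rate` is a hypothetical helper, not Mistral's evaluation code, and real benchmarks typically also normalize casing and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "mistral released voxtral an open speech model"
    hyp = "mistral released vox troll an open speech model"
    # One substitution (voxtral -> vox) plus one insertion (troll) over 7 words.
    print(f"WER = {word_error_rate(ref, hyp):.3f}")  # prints "WER = 0.286"
```

Because insertions count as errors, WER can exceed 1.0 when a model hallucinates many extra words, so it is a rate of errors rather than a percentage of words wrong.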
It's not as good as GPT-4o mini Transcribe, but it's way cheaper and it can run on your device. They also have one called Voxtral Mini Transcribe, which is also super cheap and has a much better word error rate. In any case, they have all of these different models, and they're able to run locally on devices. So for Apple, on the iPhone, they could grab one of these models if they acquired the company, or maybe partner with them, put it on your iPhone, use it to power Siri, and even without the internet Siri would still be able to understand what you're saying. They would probably need another model to back it up, maybe an open-source model from Mistral or another player. But by using this in conjunction with that, they could essentially run Siri with no internet, which would be really, really crazy, and I think that's something Apple would be interested in doing. People have been talking about these rumors that Apple is interested in acquiring Mistral. The CEO of Mistral said they have no interest. They weren't specifically talking about Apple, but they said they have no interest in being acquired and would like to IPO the company. Mistral really is the crown jewel of Europe. It's the number one AI company coming out of Europe, and it has raised the most money. Europe has backed it and given it a lot of resources, whether that's compute or special deals. So it has largely benefited from a lot of programs in Europe, and I think people want to see it stay owned and operated inside Europe. Overall, it's definitely building a lot of really interesting tools that would be very useful for a lot of people.
What's interesting is that Mistral says Voxtral can transcribe up to 30 minutes of audio and, because its LLM backbone is Mistral Small 3.1, it can understand up to 40 minutes of audio, which is honestly fantastic. The majority of conversations I ever have are shorter than that. So you can ask questions about audio content, you can generate summaries, and you can turn voice commands into real-time actions like calling APIs or running functions. It's also multilingual. You can imagine a lot of use cases. Live conversation is less what this is for; the main use is that you upload an audio file and it understands what's in it. A big use case for this technology would be something like YouTube, where you have a transcript of every single video on the side. YouTube uses its own transcription models for this, since Google obviously has its own tools. But players that aren't Google and don't have that kind of in-house tooling, maybe Vimeo or another video platform, or companies that just want to transcribe a lot of the content on their platform, would need to use something. Facebook and LinkedIn all need that functionality. So there are a lot of people who need it. As for languages, it handles English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian, which is a bunch of languages right off the bat. They have two variants of their speech understanding model. The first is Voxtral Small, a 24-billion-parameter model for production-scale deployments. It's apparently competitive with ElevenLabs Scribe, although it's way cheaper.
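The episode quotes a 30-minute limit for transcription. For recordings longer than that, one common approach (not something Mistral describes in the episode) is to split the audio into time windows that fit the limit and transcribe each window separately. The sketch below only plans the windows; `plan_chunks`, `Chunk`, and `MAX_CHUNK_SECONDS` are hypothetical names, the 30-minute cap is taken from the episode rather than from Mistral's documentation, and cutting the audio and calling a transcription API are left out.

```python
from dataclasses import dataclass

# Assumption: a 30-minute cap per transcription request, taken from the figure
# quoted in the episode; check Mistral's documentation for the real limit.
MAX_CHUNK_SECONDS = 30 * 60


@dataclass(frozen=True)
class Chunk:
    start: float  # seconds from the start of the recording
    end: float    # exclusive end, in seconds


def plan_chunks(duration_seconds: float,
                max_chunk: float = MAX_CHUNK_SECONDS,
                overlap: float = 0.0) -> list[Chunk]:
    """Split a recording into consecutive windows no longer than max_chunk seconds.

    A small overlap keeps a word that straddles a boundary from being cut in half;
    the duplicated text then has to be merged after transcription.
    """
    if duration_seconds <= 0:
        return []
    if not 0 <= overlap < max_chunk:
        raise ValueError("overlap must be non-negative and shorter than max_chunk")
    chunks = []
    start = 0.0
    step = max_chunk - overlap  # how far each new window advances
    while True:
        end = min(start + max_chunk, duration_seconds)
        chunks.append(Chunk(start, end))
        if end >= duration_seconds:
            break
        start += step
    return chunks


if __name__ == "__main__":
    # A hypothetical 75-minute recording, split with a 10-second overlap.
    for c in plan_chunks(75 * 60, overlap=10):
        print(f"{c.start / 60:.2f} min -> {c.end / 60:.2f} min")
```

With `overlap=0` the windows simply tile the recording; a nonzero overlap trades a little duplicated transcription work for fewer words lost at window edges.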
It's also competitive with GPT-4o mini and Gemini 2.5 Flash, and on their chart it's both cheaper and has a better word error rate than all of those. Then they have Voxtral Mini, a 3-billion-parameter model, which is the one I've been talking about that Apple might be interested in; it's for local and edge deployments. And then they have an ultra-cheap, stripped-down, very fast API version of the 3-billion-parameter model called Voxtral Mini Transcribe. It's optimized only for transcription, and Mistral says it can outperform OpenAI's Whisper at less than half the price. So this is definitely interesting. People can go try this right now: you can download the model weights for free on Hugging Face, use it through the API, or test it on Mistral's website in their chatbot, Le Chat. So, very interesting. This is obviously one of the big AI firms out of Europe, and a reported $1 billion equity investment from Abu Dhabi's MGX fund looks like it's going to happen soon. So this is the perfect time for them to roll out these tools and get competitors or business partners interested in acquisitions or in making deals with them. If they got an Apple deal, that would be absolutely incredible, even if it's not an acquisition by Apple (it sounds like Mistral doesn't want to go in that direction) but some sort of partnership. We know Apple is talking to more than just OpenAI, which is powering Apple Intelligence; now they're looking at Anthropic's Claude. It seems like Apple feels quite behind. Their investors and their board are quite upset with Apple's slowness to adopt these AI features, and many see it as a failure in that department.
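Saying a model is both cheaper and has a better word error rate than its competitors is a dominance comparison on two axes, price per minute and WER, where lower is better on both. A minimal sketch of that comparison, plus the set of models that no other model beats on both axes (the Pareto front), is below. `Point`, `dominates`, and `pareto_front` are hypothetical names, and the numbers in the usage example are placeholders, not values from Mistral's chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    name: str
    price_per_minute: float  # USD per minute of audio, lower is better
    wer: float               # word error rate, lower is better


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on price and WER, and strictly better on at least one."""
    no_worse = a.price_per_minute <= b.price_per_minute and a.wer <= b.wer
    strictly_better = a.price_per_minute < b.price_per_minute or a.wer < b.wer
    return no_worse and strictly_better


def pareto_front(points: list[Point]) -> list[Point]:
    """Models that no other model beats on both price and word error rate."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]


if __name__ == "__main__":
    # Placeholder numbers for illustration only; NOT the figures from Mistral's chart.
    models = [
        Point("model_a", 0.001, 0.07),
        Point("model_b", 0.006, 0.06),
        Point("model_c", 0.004, 0.08),
        Point("model_d", 0.003, 0.05),
    ]
    print([p.name for p in pareto_front(models)])  # prints ['model_a', 'model_d']
```

A model on the front is not necessarily the best choice; it only means there is no option that is both cheaper and more accurate, so picking among front models is a trade-off between the two axes.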
So acquiring a company like this might be good, but if not, Apple may be interested in working with Mistral. It'll be very interesting. I'll keep you up to date on everything happening with this startup and others as people get their hands on Mistral's new tool and start incorporating it into products. I think it's going to be interesting, and we'll make sure to get it up on AI Box in the not-too-distant future. Thanks so much for tuning in to the podcast. Make sure to leave a rating and review if you enjoyed the episode or learned anything new about what's going on over at Mistral. Thanks so much for tuning in, and I will catch you in the next episode.
The Mark Cuban Podcast: Detailed Summary of "Mistral's Latest Release: Voxtral"
Release Date: July 20, 2025
In this episode of The Mark Cuban Podcast, titled "Mistral's Latest Release: Voxtral," the host discusses Mistral AI's new model, Voxtral. The host covers how the model works, its competitive edge on price and word error rate, rumors of an Apple acquisition, and the company's prospects amid a potential $1 billion funding round. This summary captures the key points, insights, and conclusions from the episode.
The episode begins with an overview of Mistral AI's latest offering, Voxtral, which the host introduces as an open speech model for transcription and audio understanding.
Notable Insight: “At the heart of Voxtral is its ability to transcribe audio files with impressive accuracy while maintaining a lower cost compared to existing solutions” (Speaker A, 02:30).
The host compares Voxtral against other market players, highlighting its competitive word error rate (WER) and low price per minute of audio.
Quote Highlight: "[Voxtral Mini Transcribe] can outperform OpenAI's Whisper and it's less than half the price" (Speaker A, 15:45).
This comparison underscores Voxtral’s advantage in providing high-quality transcription services at a fraction of the cost, making it an attractive option for businesses and developers.
A significant portion of the discussion focuses on Voxtral’s open model capabilities, which allow users to deploy the AI locally. This feature is particularly appealing for companies seeking to maintain data privacy and reduce dependency on external APIs.
Notable Quote: “The ability to run Voxtral locally is a game-changer for companies prioritizing data security and operational efficiency” (Speaker A, 10:20).
The host addresses the rumors that Apple is interested in acquiring Mistral AI and explores how Voxtral's on-device capabilities could fit Apple's ecosystem, particularly in letting Siri understand speech without an internet connection.
CEO's Stance: Despite the acquisition rumors, Mistral AI's CEO has said the company has no interest in being acquired (without naming Apple specifically) and would prefer to pursue an IPO (initial public offering). The host adds that many would like to see Mistral remain owned and operated in Europe.
Quote Highlight: "We have no interest in being acquired." The host also noted that Mistral "would like to IPO the company" (Speaker A, 25:50).
Mistral AI is reportedly on the cusp of securing a $1 billion funding round from Abu Dhabi’s MGX Fund. This substantial investment is poised to accelerate the rollout of Voxtral and other innovative tools, reinforcing Mistral's position as Europe’s premier AI company.
Notable Insight: “With this new funding, Mistral is perfectly positioned to expand its offerings and potentially attract strategic partnerships or acquisitions” (Speaker A, 35:10).
The host provides an in-depth look at Voxtral’s technical aspects and its versatile applications across various sectors.
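Capabilities: transcription of up to 30 minutes of audio and understanding of up to 40 minutes; question answering and summarization over audio content; turning voice commands into real-time actions such as API calls or function execution; multilingual support for English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian. The lineup includes Voxtral Small (24B parameters, for production-scale deployments), Voxtral Mini (3B parameters, for local and edge deployments), and Voxtral Mini Transcribe (a low-cost API version optimized for transcription).
Use Cases: transcripts for video platforms (YouTube uses Google's own models, but others such as Vimeo would need a solution); transcribing large volumes of content on platforms like Facebook and LinkedIn; on-device speech understanding, for example letting Siri work without an internet connection.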
Quote Highlight: “Voxtral can transcribe up to 30 minutes of audio, which covers the majority of conversational needs” (Speaker A, 40:15).
The episode wraps up with optimism about Mistral AI’s trajectory and the transformative potential of Voxtral. The host anticipates that Voxtral will attract significant attention from various sectors, leading to widespread adoption and possibly strategic partnerships.
Final Thoughts: “As Mistral continues to innovate with Voxtral, we can expect to see its integration into numerous products and services, revolutionizing how we interact with audio and transcription technologies” (Speaker A, 50:00).
The host assures listeners that updates on Mistral and Voxtral will be provided as the technology evolves and gains traction in the market.
This episode provides a comprehensive analysis of Mistral AI’s Voxtral, underscoring its potential to disrupt the AI speech model industry and reshape the future of voice-enabled technologies.