A
Mistral AI has just come out with a brand new AI model, and it is called Voxtral. There are a bunch of interesting things about this model; in particular, it is an open speech model. I think the benchmarks, essentially the data on its word error rate and its price, are particularly interesting. I'm going to break all of this down, especially on the heels of a potential $1 billion funding round that Mistral is looking at, all of the rumors swirling about Apple acquiring the company, what the founder has said about that, and what the plans are for the future of this company. This is an interesting time for Mistral, no doubt. We're going to get into all of that, but before we do, I wanted to mention that if you want to try out all of the latest models from Mistral, along with the latest models from a lot of other companies (the top 40 AI models), you can go check out my own startup, AI Box. Over there we have Codestral, Ministral 3B (a compact model), Pixtral Large for vision, Mistral Small, and, well, all of the Mistral models, plus a ton of interesting models from Anthropic, Cohere, DeepSeek, Google, Meta, Microsoft, Nvidia, OpenAI, Qwen, and Grok from xAI: all of the interesting companies. So you can check out all of those models and a bunch of other image, speech, and audio models at AI Box, all for one subscription at 20 bucks a month. You don't have to add subscriptions to all of these different platforms; you can try them all out, and there's a link in the description.
All right, let's get into what Mistral is doing. With Voxtral, the most interesting thing they've announced is that it's an open model. It does transcription, but more broadly it can take in audio files, understand what the audio is saying, and respond in text. So it's a speech-understanding model rather than a voice generator, and it competes with ElevenLabs and a lot of these other players.
And what they're saying is that they've actually built three models for three different use cases, and that this is much more efficient and way cheaper than using something like ElevenLabs or other players. So this is pretty interesting. They have a chart with price in USD per minute on one axis and word error rate on the other; word error rate is basically how often the model gets the words wrong. They put a couple of competitors on this chart: one of them is ElevenLabs Scribe, which is super expensive, and on the other side they have some others, which are interesting. The key thing is that it's an open model, so they're allowing you to take the model and run it locally on your own devices or server, and I think for a lot of companies that's quite exciting. Right now, if you want to use these speech-to-text models, you've got to use something like ElevenLabs. OpenAI has some options, but they're all things where you have to pay for API usage. So with open models, being able to run them locally is pretty interesting for a lot of companies. And they even have a super stripped-down version that lets you run it locally on a device. This is kind of what a lot of people were saying Mistral was going to be doing with Apple: the idea is that Apple wanted to acquire them because they have a bunch of these tools that are stripped down and able to run on device. You could imagine a tool like this being incredibly useful for something like Siri, where you could essentially run an edge model. They have one in particular called Voxtral Mini, and Voxtral Mini's error rate is not the best, but it's still better than Whisper large-v3 from OpenAI and a little bit better than Gemini 2.5 Flash.
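Since the whole price chart hinges on word error rate, here is a minimal Python sketch of how WER is commonly computed: the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a model's output, divided by the number of reference words. The function name is hypothetical, and the lowercase-and-split normalization is a simplifying assumption; published benchmarks usually normalize text more carefully, so this is not Mistral's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Simplifying assumption: text is normalized only by lowercasing and
    splitting on whitespace (no punctuation or number normalization).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# One deleted word ("a") and one substitution ("model" -> "models"):
# 2 errors over 6 reference words.
print(word_error_rate("mistral released a new speech model",
                      "mistral released new speech models"))
```

A lower WER is better, which is why the chart's most attractive corner is low price and low error rate at the same time.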
It's not as good as GPT-4o mini Transcribe, but it's way cheaper and it can run on your device. They also have one called Voxtral Mini Transcribe, which is also super cheap and has a much better word error rate. In any case, they have all of these different models, and they're able to run locally on devices. So Apple could essentially grab one of these models, if they acquired the company or made a partnership with them, put it on the iPhone, and use it to power Siri, so that even without the Internet, Siri would still be able to understand what you're saying. They'd probably have another model to back it up, maybe something from Mistral or from another player, maybe an open-source model from Mistral. But using this in conjunction with that, they could essentially run Siri with no Internet, which would be really, really crazy, and I think that'd be something Apple would be interested in doing. So people have been talking about these rumors that Apple is interested in acquiring Mistral. The CEO of Mistral said they have no interest. They weren't specifically talking about Apple, but they said, basically, we have no interest in being acquired; they would like to IPO the company. Mistral really is kind of the crown jewel of Europe. It's the number one AI company coming out of Europe, and it's raised the most money. Europe has backed it and given it a lot of resources, whether that's compute or special deals, so I think it has benefited a lot from programs in Europe, and people want to see it stay owned and operated inside Europe. But overall, it's definitely building a lot of really interesting tools that would be very useful for a lot of people.
What's interesting is that Mistral says Voxtral can transcribe up to 30 minutes of audio, and with Mistral Small 3.1 as its LLM backbone it can understand up to 40 minutes of audio, which is honestly fantastic. The majority of conversations I ever have are going to be shorter than that. So essentially you can ask questions about audio content, generate summaries, and turn voice commands into real-time actions like calling APIs or running functions. It's also multilingual. You can imagine a lot of use cases. Live conversation is probably not what this is mainly used for; instead, you upload an audio file and it can understand what's in it. One big use case for this technology would be something like YouTube, where you have the transcript of every single video on the side. YouTube uses its own transcription models for this, since Google obviously has its own tools, but other players that aren't Google and don't have that kind of tool would need to use something: maybe Vimeo or another video platform, or companies that just want to transcribe a lot of the content on their platform. Facebook, LinkedIn, all of them need that functionality, so a lot of people need it. And it's multilingual: it does English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian, which is a bunch of languages right off the bat, which is pretty cool. They have, of course, two variations of their speech understanding model. First, they've got Voxtral Small, which is a 24 billion parameter model for production-scale deployments; it's apparently competitive with ElevenLabs Scribe, although it's way cheaper.
It's also competitive with GPT-4o mini and Gemini 2.5 Flash, although on their chart it's cheaper and has a better word error rate than all of those. Then they have Voxtral Mini, which has 3 billion parameters, and this is the one I've been talking about that maybe Apple would be interested in; it's for local and edge deployments. And then of course they have an ultra-cheap, super stripped-down, very fast API version, a 3 billion model called Voxtral Mini Transcribe. It's optimized only for transcription, but they say it can outperform OpenAI's Whisper and it's less than half the price. So this is definitely something interesting, and people can go try it right now. You can download the models for free on Hugging Face, or you can use the API. You can also test the model on their website: Mistral's chatbot, Le Chat, has it. So very, very interesting. This is obviously one of the big AI firms out of Europe, and they have this big, quote unquote, one billion dollar equity investment that looks like it's going to happen soon from Abu Dhabi's MGX fund. So this is kind of the perfect time for them to start rolling out these tools and perhaps get competitors or potential business partners interested in acquisitions or deals with them. If they got the Apple deal, that would be absolutely incredible, even if it's not an acquisition by Apple (it sounds like they don't really want to go in that direction) but some sort of partnership. We know Apple is talking to more than just OpenAI, which is powering Apple Intelligence; now they're looking at Anthropic's Claude. It seems like Apple feels quite behind. Their investors are quite upset with Apple's slowness to adopt these AI features, and what they've seen is kind of a failure in that department.
So acquiring a company like this might be good, but if not, they may be interested in working with Mistral. So it'll be very interesting. I'll keep you up to date on everything happening with this startup and with others as people get their hands on this new tool from Mistral and start incorporating it into products. I think it's going to be interesting, and we'll make sure to get it up on AI Box in the not-too-distant future. So thanks so much for tuning in to the podcast. Make sure to leave a rating and review if you enjoyed the episode or learned anything new about what's going on over at Mistral. Thanks so much for tuning in, and I will catch you in the next episode.
Podcast Summary: Joe Rogan Experience for AI
Episode: Debuting Voxtral
Release Date: July 20, 2025
In the episode titled "Debuting Voxtral," the host of the Joe Rogan Experience for AI delves into the latest advancements from Mistral AI, a leading European AI firm. The focus is on Mistral’s newly launched AI model, Voxtral, its potential impact on the AI landscape, and the surrounding rumors of a significant funding round and possible acquisition by tech giant Apple.
Mistral AI has introduced Voxtral, an innovative AI model designed to excel in transcription and speech-related tasks. The host emphasizes the novelty of Voxtral being an open speech model, highlighting its accessibility and flexibility for developers and businesses alike.
“Mistral AI has just come out with a brand new AI model and it is called Voxtral... it can run locally on your device.”
— Host, 00:00
Voxtral stands out due to its multifaceted capabilities:
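Transcription and Understanding: Voxtral can transcribe up to 30 minutes of audio and understand up to 40 minutes. Built on Mistral Small 3.1 as its LLM backbone, it can also answer questions about audio, summarize it, and turn voice commands into actions such as API or function calls.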
Multilingual Support: The model supports multiple languages, including English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian, making it versatile for global applications.
Local and Edge Deployments: One of Voxtral’s standout features is its ability to run locally on devices, eliminating the need for constant internet connectivity. This is particularly beneficial for applications requiring offline functionality, such as virtual assistants like Siri.
Cost Efficiency: Voxtral is priced below competitors such as ElevenLabs and OpenAI while, according to Mistral's own charts, matching or beating many of them on word error rate (WER).
“Voxtral is this open model, so they're allowing you to take the model, run it locally on your own devices or server... [It] can run on your device.”
— Host, 00:05
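Because Voxtral is described as handling up to 30 minutes of audio for transcription, a longer recording (a full podcast episode, say) would need to be split before being sent to the model. The Python sketch below only plans the segment boundaries; the helper name, the 5-second overlap, and the idea of overlapping segments are illustrative assumptions rather than part of Mistral's tooling, and the actual audio slicing and model calls are left out.

```python
def plan_segments(total_seconds: float,
                  max_seconds: float = 30 * 60,
                  overlap_seconds: float = 5.0) -> list[tuple[float, float]]:
    """Split [0, total_seconds] into windows no longer than max_seconds.

    Hypothetical helper: consecutive windows overlap by overlap_seconds so
    that words straddling a cut appear in both neighbouring segments.
    """
    if total_seconds <= 0:
        return []
    if not 0 <= overlap_seconds < max_seconds:
        raise ValueError("overlap must be non-negative and shorter than a segment")

    segments = []
    start = 0.0
    while True:
        end = min(start + max_seconds, total_seconds)
        segments.append((start, end))
        if end >= total_seconds:
            return segments
        # Step back by the overlap so the next window re-covers the seam.
        start = end - overlap_seconds


# A 75-minute episode becomes three windows of at most 30 minutes each.
for start, end in plan_segments(75 * 60):
    print(f"{start / 60:.2f} min -> {end / 60:.2f} min")
```

The overlap trades a little duplicated work for robustness at the cuts; the repeated text at each seam would still have to be reconciled when the segment transcripts are stitched back together.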
The host provides an in-depth comparison of Voxtral against other industry players:
Price vs. Word Error Rate: Voxtral is positioned as a cost-effective alternative with a lower WER, making it a competitive option in the AI transcription market.
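Model Variants: Mistral has developed three specific models under the Voxtral umbrella:
Voxtral Small: a 24-billion-parameter model for production-scale deployments, described as competitive with ElevenLabs Scribe, GPT-4o mini, and Gemini 2.5 Flash at a lower price.
Voxtral Mini: a 3-billion-parameter model for local and edge deployments.
Voxtral Mini Transcribe: an ultra-cheap, fast API version optimized solely for transcription.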
“[Voxtral Mini Transcribe] can outperform OpenAI's Whisper and it's less than half the price.”
— Host, 00:20
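The price-versus-WER chart the host describes is essentially a two-objective comparison: one option beats another outright only if it is no more expensive and no less accurate, and strictly better on at least one of the two. A minimal Python sketch of picking out the non-dominated (Pareto-optimal) options follows; the model names and numbers are placeholders invented for illustration, not figures from Mistral's chart or any benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    usd_per_minute: float  # price, lower is better
    wer: float             # word error rate as a fraction, lower is better


def dominates(a: Option, b: Option) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.usd_per_minute <= b.usd_per_minute and a.wer <= b.wer
    strictly_better = a.usd_per_minute < b.usd_per_minute or a.wer < b.wer
    return no_worse and strictly_better


def pareto_front(options: list[Option]) -> list[Option]:
    """Keep only the options that no other option dominates."""
    return [o for o in options if not any(dominates(p, o) for p in options)]


# Placeholder figures for illustration only, not published benchmark results.
options = [
    Option("model_a", 0.010, 0.060),
    Option("model_b", 0.004, 0.070),
    Option("model_c", 0.006, 0.080),  # pricier and less accurate than model_b
]
front = pareto_front(options)
print([o.name for o in front])  # ['model_a', 'model_b']
```

Options on the front represent genuine trade-offs (cheaper versus more accurate); a model positioned as "cheaper with a lower WER" than a competitor is claiming to dominate it in exactly this sense.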
A significant advantage of Voxtral is its open-source nature, allowing businesses to run the model locally without reliance on third-party APIs. This not only reduces costs but also enhances data privacy and control.
“When you have the open models it's pretty interesting being able to try and run them locally for a lot of companies.”
— Host, 00:15
Amidst Voxtral’s launch, rumors have surfaced about Apple’s interest in acquiring Mistral AI. The host explores the implications of such a move:
Strategic Fit: Apple could integrate Voxtral into its ecosystem, enhancing Siri’s capabilities by enabling on-device processing, thus improving speed and privacy.
Mistral’s Stance: The CEO of Mistral has publicly expressed disinterest in acquisition, favoring an IPO instead.
“The CEO of Mistral said they have no interest in being acquired... they would like to IPO the company essentially.”
— Host, 00:35
Mistral AI is reportedly on the brink of securing a $1 billion funding round from Abu Dhabi's MGX fund. This infusion of capital is poised to accelerate the development and deployment of Voxtral and other AI tools.
“They have this big, you know, quote unquote one billion dollar in equity investment looking like it's going to happen from Abu Dhabi's MGX fund.”
— Host, 00:50
The versatility of Voxtral opens up numerous applications across various industries:
Content Platforms: Platforms like YouTube, Vimeo, Facebook, and LinkedIn can utilize Voxtral for accurate and cost-effective transcription of video and audio content.
Virtual Assistants: Integration with voice assistants (e.g., Siri) can enable offline functionality, enhancing user experience and privacy.
Business Solutions: Companies requiring transcription services can leverage Voxtral's multilingual and high-accuracy capabilities for documentation, customer service, and more.
“You can imagine a big use case of this technology would be like YouTube where you have the transcription of every single YouTube video on the side.”
— Host, 00:40
The episode wraps up with the host expressing enthusiasm about the potential impact of Voxtral and Mistral AI's strategic moves. The anticipation surrounding the funding round and possible partnerships, whether with Apple or other tech entities, indicates a promising trajectory for Mistral.
“It'll be very interesting... We'll make sure to get it up on AI Box in not too distant of the future.”
— Host, 00:55
Listeners are encouraged to stay tuned for further updates on Mistral AI’s developments and the evolving capabilities of Voxtral.
Key Takeaways:
For those interested in exploring Voxtral and other AI models, the episode highlights AI Box as a resource to access a variety of AI tools under a single subscription.
End of Summary