
A
Feels kind of like we're getting swept up in this AI tornado. Right? I mean, every single day it's a new announcement, some crazy breakthrough. It's all so exciting, but, like, how do you even keep up with it all, you know, without losing your mind entirely? That's the question.
B
Oh, absolutely. It can be a lot.
A
So, for this deep dive, we did some serious sifting, went through the latest news from AI Deep Dive.
B
And.
A
And we found four big developments that really caught our attention.
B
Definitely some interesting stuff going on.
A
Yeah. Really significant insights, I think. And today we're going to try to unpack it all, figure out what it means, you know, about where AI is at right now and where it's headed, especially for folks who want to get the big picture without needing to go back to school for computer science.
B
That's the goal. Make it accessible to everyone.
A
Exactly.
B
So we'll be looking at the release of OLMo 2 32B and at Sesame's CSM-1B model.
A
Ooh, the voice AI stuff. That's always cool.
B
It is, yeah. We'll also look at Google's decision to replace Assistant with Gemini.
A
Big move for them.
B
Huge. Yeah. And then finally their policy proposal, which is something we should all be paying attention to.
A
Definitely. All right, so let's dive in, shall we?
B
Let's do it.
A
Let's start with this OLMo 2 news. It's making waves in the open source AI world. We'd heard whispers about earlier versions, the 7B and 13B models, but this new 32B parameter version is turning heads. Now, when we say parameters, think of them as the little knobs and dials inside the AI, the things that let it learn and make connections. The more parameters, the more powerful the model, generally speaking.
B
It's a good analogy.
A
Yeah.
B
And this 32B version is being hailed as a real game changer. It's the first fully open model of this scale.
A
And by fully open, you mean?
B
I mean, they're sharing everything: the data they trained it on, the code behind it, even the model weights, which are like the core of its knowledge.
A
Wow. That's pretty unusual, right? To be that transparent.
B
Very unusual, yeah. And what's really impressive is that it's actually beating out some of the big names in a series of academic benchmarks. GPT-3.5 Turbo, even a mini version of GPT-4o.
A
No kidding.
B
Yeah. It's holding its own against those industry giants.
A
That's amazing. And I heard it's also comparable to other open weight models out there.
B
It is. Performance-wise, it's right up there with models like Qwen 2.5 32B.
A
Okay, so what's the big deal then? Why is everyone so excited about this one?
B
Well, the cost. It cost a third of what those other models cost to train.
A
Really?
B
Yeah. Think about that. A third of the resources needed.
A
That's a massive difference.
B
It is. And this is where it gets exciting. That lower cost could make powerful AI accessible to way more people. Right? Smaller teams, organizations that couldn't even dream of building something like this before.
A
You're talking about democratizing AI, basically.
B
Exactly.
A
So for someone like you who's actually working with this technology every day, what does this accessibility mean?
B
It's a game changer, honestly. Even the full 32B version of OLMo 2 can be fine-tuned. Fine-tuning is like giving it extra training for a specific job. And they're saying you can do that using just a single H100 GPU node. GPUs are these specialized processors that are really good at AI stuff.
A
Ah, okay.
B
And to make it even easier, all the models are available on the Ai2 Playground, this platform for AI research. So suddenly there are all these possibilities opening up for experimenting, for tailoring AI to very specific needs. It's incredible.
A
It is exciting. And it's not just the size of this model, right? I was reading about the data it was trained on: 6 trillion tokens. And then further training using something called Tülu 3.1. I don't even know what that means, to be honest. I know tokens are those little pieces of language that the AI processes. Six trillion of them. That's like.
B
It's an astronomical amount of text and code.
A
It boggles the mind. So what's the significance of that beyond just the sheer scale of it all?
B
Well, here's the thing. The team behind OLMo 2, they're not just releasing a powerful model. They're giving you the entire recipe for how they made it.
A
Really? From scratch?
B
From scratch. The data prep, the training process, every single step. This kind of transparency is almost unheard of. And it's incredibly valuable for the research community. It helps us understand how these models learn and how to make them better.
A
That makes sense. Okay, let's shift gears for a minute and talk about something a little different.
B
Sounds good.
A
Voice AI. Remember Maya, that super realistic virtual assistant everyone was talking about? Well, Sesame, the company behind Maya, just released the AI model that powers it, CSM-1B. It's smaller than OLMo 2, only 1 billion parameters.
B
Still a lot of parameters, though.
A
Oh, for sure. But the key here is the open source license, Apache 2.0. That means anyone can use it commercially with very few restrictions, which is huge.
B
Because it opens up all kinds of possibilities for developers.
A
Absolutely. So they describe CSM-1B as generating something called RVQ audio codes. Can you break that down for us non-audio nerds? What's going on under the hood?
B
Sure. RVQ stands for Residual Vector Quantization. Think of it like this. The AI chops up audio into tiny digital chunks, almost like individual notes in a song.
A
Okay.
B
Then it learns how to manipulate those chunks, how to string them together in different ways to create speech. It's like a digital composer, basically.
A
Interesting. So it's not just recording and playing back existing voices, it's creating new ones from scratch.
B
Exactly. And this RVQ technique is being used in other cutting-edge audio AI tech like Google's SoundStream and Meta's EnCodec. So it's definitely a trend to watch.
A
It is, yeah. Now, what about the architecture of this model? They're not starting from zero, are they?
B
Nope. They based CSM-1B on Meta's Llama family of models, which are well respected in the AI community. And then they added a special audio decoder component. It's like taking a powerful engine and attaching it to a speaker system. That fine-tuned combination is what gives Maya its incredibly realistic voice.
A
I see. And this CSM-1B that they've released, is it like a general purpose voice creator?
B
Yeah, it's the base model. So it can produce a variety of voices, but it hasn't been trained to sound like any specific person.
A
So theoretically, you could train it to sound like anyone?
B
Theoretically, yes, but there are some limitations.
A
Like what?
B
Well, they mentioned that it doesn't work very well with languages other than English. Probably because the training data had some non-English stuff mixed in, which can mess things up.
A
Right. And speaking of training data, they weren't very specific about what exactly they used to train CSM-1B.
B
Yeah, that's a bit of a black box at the moment, which is interesting because transparency was such a big deal with OLMo 2.
A
It's a good point. Maybe they'll release more info down the line. Okay, let's move on to some big news from Google. They're replacing Google Assistant with Gemini on Android phones. This feels like a major shift in their AI strategy.
B
It's huge. It's like saying goodbye to an old friend.
A
Yeah, Google Assistant has been around for a while, but it sounds like Gemini is their new flagship AI assistant.
B
Absolutely. They're going to be upgrading more users to Gemini over the coming months, and eventually the Google Assistant app will just disappear from the app stores. You won't be able to use it anymore.
A
Wow, that's a bold move.
B
It is, yeah. But they're not stopping there. They're bringing Gemini to all sorts of devices. Tablets, cars, headphones, smartwatches, even smart home stuff like speakers and TVs. It's going to be everywhere.
A
It's like they're building a whole Gemini ecosystem.
B
Exactly. A seamless, unified AI experience across all your devices. That's the vision.
A
And they're being smart about the transition too. I read that they're adding back some of the popular features from Google Assistant that weren't initially available in Gemini, like music playback, timers, that sort of thing.
B
Yeah, they're trying to make it a smooth experience for users. They don't want people to feel like they're losing functionality.
A
Smart move. And this all ties in perfectly with the launch of their new Pixel 9 phone. Right. Gemini is going to be the default assistant on that.
B
Yep, it's all coming together. They're really pushing Gemini as their most advanced AI offering, with all those new features: Gemini Live, Deep Research. It's definitely a step up from Google Assistant.
A
It is, yeah. They're clearly betting big on Gemini.
B
They are.
A
Alright, let's wrap up with Google's AI policy proposal. This comes on the heels of OpenAI's proposal. So it's clear that these big AI companies are trying to influence how governments regulate this technology.
B
Oh, absolutely. They're not just building the tech, they're trying to shape the rules of the game.
A
Which makes sense. So what are Google's main recommendations?
B
They're focusing on two key areas. First, they want more flexibility when it comes to copyright restrictions for AI training. And second, they're advocating for what they call balanced export controls on AI tech.
A
Let's start with the copyright issue. Google's arguing for something called fair use and text and data mining exceptions. What does that mean in plain English?
B
Basically, they want the law to clearly state that they can train their AI models on any publicly available data, even if it's copyrighted.
A
So they could use books, articles, code, anything that's out there.
B
Exactly. Without having to get permission from every copyright holder or pay them royalties.
A
That sounds pretty controversial.
B
It is, yeah. There are lots of legal battles going on right now about whether this kind of AI training is legal.
A
Right. There was that case with.
B
Yeah, there have been a few. Google's involved in some of them. They're arguing that this kind of data use is essential for progress in AI. Otherwise they'd have to negotiate with thousands, maybe millions of copyright holders, which would be a nightmare.
A
It would slow things down for sure.
B
Massively. But of course, the copyright holders, they want to be compensated for their work. They don't want their creations being used without their consent.
A
So it's a tricky situation.
B
Very tricky. And the courts haven't really figured out how to apply copyright law to this new AI world yet.
A
I see. So Google's trying to influence that legal landscape with this proposal.
B
Exactly.
A
Okay, what about the export controls? The Biden administration has restricted the sale of advanced AI chips to certain countries. What's Google's take on that?
B
They're a little worried, actually. They think some of these controls could actually hurt American companies, make it harder for them to compete globally.
A
Interesting. What's their reasoning?
B
Well, they say these controls put unnecessary burdens on US cloud providers, which are the companies that provide the infrastructure for a lot of AI development. And they contrasted their position with Microsoft, which seems to be fine with the current rules.
A
I guess it depends on how reliant each company is on those specific chips in those specific markets.
B
Exactly. And it's worth noting that there are exemptions in place for certain trusted businesses. But Google seems to be pushing for a less restrictive approach overall.
A
Makes sense. They want to be able to operate freely in the global marketplace.
B
Yeah, and beyond those two big issues, they also had some other recommendations. They're really pushing for continued government funding for AI research and development in the US.
A
Which is interesting, because I know there's been talk of cutting back on some of that spending.
B
Right? Exactly. But Google's saying that would be a big mistake. They want the government to release more of its own data sets for AI training and to fund early stage AI research. And they want to make sure that researchers have access to the computing power and the AI models they need. They're basically saying, don't stifle innovation.
A
That's a good message. And what about their stance on AI legislation in general?
B
They're definitely in favor of federal action. They want a comprehensive national AI law, something that covers privacy, security, the whole shebang.
A
Makes sense. We're seeing a lot of different AI laws popping up at the state level. It's getting messy.
B
Exactly. Google's saying we need a unified approach, clear guidelines that apply across the board.
A
But they're also warning against certain types of regulation. Right? Specifically anything that would hold developers liable for how their AI systems are used.
B
Right. They don't want to be held responsible if someone uses their AI to do something bad. Their argument is that they can't control how their models are used once they're out in the world.
A
Which is a fair point. Once the genie's out of the bottle.
B
Exactly. And they've historically opposed laws that try to pin down liability for AI developers. Like that one proposed in California a while back.
A
Interesting. And I read that they're also not thrilled about some of the disclosure requirements that are popping up in places like the European Union.
B
Yeah, they think some of those rules are too broad. They're worried they might have to reveal trade secrets which could help their competitors. Or even worse, they could be forced to share information that could compromise the security of their AI systems.
A
So it's a delicate balancing act. Right. Trying to promote transparency without jeopardizing security or innovation.
B
Absolutely. And all of this is happening against the backdrop of these increasingly common laws that demand more transparency from AI developers, like California's AB 2013 and the EU's AI Act. It's a complex landscape.
A
It is, yeah. A lot to think about. So, looking back at these four big developments, OLMo 2, CSM-1B, Google's move to Gemini, and their AI policy proposal, what are the key takeaways for someone who's trying to understand where AI is headed?
B
Well, I think the biggest takeaway is that AI is evolving rapidly on multiple fronts. We're seeing major advancements in the core technology, like with OLMo 2. And we're seeing new applications emerge in areas like voice AI with models like CSM-1B.
A
Right. It's not just about making AI smarter, it's about finding new ways to use it.
B
Exactly. And these big tech companies, they're weaving AI deeper and deeper into their products. Google ditching Assistant for Gemini, that's a clear sign of that. And at the same time, we're seeing a big push to figure out how to regulate AI, how to set the rules of the road. Google's policy proposal is just the latest salvo in that ongoing debate.
A
It is. It's like a wild west out there.
B
It is, yeah. And it's exciting and a little scary at the same time.
A
Totally agree. So as we wrap up this deep dive, I want to leave you with a thought provoking question. We've talked about a lot of different aspects of AI today. Openness, voice tech, the role of big companies, the challenges of regulation. If you had to pick one thing, just one that you think will have the biggest impact on your life in the next few years, what would it be? What are you most excited about? Or maybe most concerned about. It's worth spending some time pondering that question.
B
It really is. And maybe it'll even inspire you to explore some of these areas further.
A
Exactly. That's what we hope for with these deep dives. To spark curiosity, to get people thinking about the future of AI because it's a future that's being shaped right now.
B
It is. And we all have a role to play in shaping it.
A
Well said. Thanks for joining me on this deep dive.
B
My pleasure.
A
And thanks to all of you for listening. Until next time, stay curious.
AI Deep Dive Podcast Summary
Episode: OLMo 2 32B, Sesame’s CSM-1B, and Google Replaces Assistant with Gemini
Release Date: March 15, 2025
Host: Daily Deep Dives
In this episode of the AI Deep Dive podcast, hosts A and B navigate the whirlwind of recent advancements in artificial intelligence. With the AI landscape rapidly evolving, they dissect three major developments: the release of OLMo 2 32B, Sesame’s CSM-1B voice model, and Google's strategic shift from Assistant to Gemini. Additionally, they delve into Google's latest AI policy proposal, exploring its implications for the industry.
Unveiling a Powerful Model
The episode opens with an in-depth discussion on OLMo 2 32B, a groundbreaking model in the open-source AI community. Host A remarks, “[00:07]... how do you even keep up with it all...” highlighting the pace of AI advancements. OLMo 2 32B, boasting 32 billion parameters, surpasses its predecessors, the 7B and 13B models, and stands as the first fully open model of its scale.
Transparency and Performance
Host B emphasizes the model's unique transparency: “[01:48]... they're sharing everything... the data they trained it on, the code behind it, even the model weights...” This level of openness is highly unusual at this scale, enabling researchers and developers to fully understand and utilize the model. Remarkably, OLMo 2 32B outperforms GPT-3.5 Turbo and a mini version of GPT-4o on a series of academic benchmarks, as highlighted by Host B at [01:55].
Cost Efficiency and Accessibility
A key advantage discussed is the cost-effectiveness of training OLMo 2 32B. “[02:25]... it cost a third of what those other models cost to train,” Host B explains, underscoring the potential for democratizing AI. This reduced cost lowers barriers for smaller teams and organizations, fostering broader innovation and application.
Future Implications
Host A further explores the significance of OLMo 2’s extensive training data: “[03:33]... 6 trillion tokens...” B elaborates on the transparency of the training process, stating, “[04:06]... they're giving you the entire recipe for how they made it.” This openness not only advances research but also promotes collaborative improvement within the AI community.
Introduction to CSM-1B
Transitioning to voice AI, the hosts discuss Sesame’s CSM-1B model, which powers the highly realistic virtual assistant Maya. Host A notes, “[04:27]... Sesame, the company behind Maya, just released the AI model that powers it.”
Open Source Licensing and Flexibility
A standout feature of CSM-1B is its Apache 2.0 open-source license. “[04:43]... anyone can use it commercially with very few restrictions,” Host A points out. This openness accelerates innovation, allowing developers to integrate and customize the model without significant legal barriers.
Technical Insights: RVQ Audio Codes
Host B breaks down the technical aspects: “[05:05]... Residual Vector Quantization. Think of it like this. The AI chops up audio into tiny digital chunks...” This method enables CSM-1B to generate speech by recombining these chunks, creating new, lifelike voices instead of merely mimicking existing ones.
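The "chunks" idea above can be made concrete with a minimal sketch of residual vector quantization. Everything here is illustrative: the two tiny codebooks (`COARSE`, `FINE`), the 2-dimensional "audio frame", and the function names are assumptions chosen for readability, not anything taken from CSM-1B, whose codec learns its codebooks from data. The point is only the mechanism: each stage picks the closest codebook entry, and the next stage quantizes whatever error is left over.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = Sequence[Vector]


def nearest(codebook: Codebook, v: Vector) -> int:
    """Return the index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def rvq_encode(v: Vector, codebooks: Sequence[Codebook]) -> List[int]:
    """Quantize v in stages; each stage encodes the residual left by the previous one."""
    residual = list(v)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage only sees the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct an approximation by summing the chosen entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse stage followed by a finer correction stage.
COARSE = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
FINE = [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]]

frame = [1.2, 0.8]  # stand-in for one encoded audio frame
codes = rvq_encode(frame, [COARSE, FINE])
approx = rvq_decode(codes, [COARSE, FINE])
print(codes, approx)
```

The list of integers in `codes` is the kind of compact "audio code" a model can learn to predict, in the same way a text model predicts tokens. Adding more stages shrinks the leftover error at the cost of one more integer per frame.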
Architecture and Capabilities
Built upon Meta’s respected Llama models, CSM-1B incorporates a specialized audio decoder: “[05:42]... They based CSM-1B on Meta's Llama family of models...” This combination gives Maya its realistic voice and lets the base model generate diverse, natural-sounding speech.
Limitations and Future Potential
Despite its strengths, CSM-1B has limitations, particularly with non-English languages. Host B mentions, “[06:18]... it doesn't work very well with languages other than English.” However, the model’s foundational architecture and open licensing position it as a versatile tool for future enhancements and applications.
Transitioning from Assistant to Gemini
A significant portion of the episode focuses on Google's transformative decision to replace Google Assistant with Gemini across Android devices. Host B describes this as “[06:57]... like saying goodbye to an old friend,” underscoring the magnitude of this shift.
Integration and Ecosystem Expansion
Gemini is not limited to smartphones; Google plans to integrate it across a range of devices, including tablets, cars, headphones, smartwatches, and smart home devices such as speakers and TVs. “[07:27]... it's like they're building a whole Gemini ecosystem,” Host A observes, highlighting Google's vision for a unified AI experience.
Strategic Rollout and User Experience
To ensure a smooth transition, Google is reintroducing popular features from Google Assistant into Gemini. “[07:34]... adding back some of the popular features... like music playback, timers,” Host A notes. This approach aims to retain user familiarity and functionality during the shift.
Launch and Future Prospects
The rollout is synchronized with the launch of the new Pixel 9 phone, positioning Gemini as Google's flagship AI offering. “[07:58]... Gemini is going to be the default assistant on that,” Host A adds, signaling Google's commitment to embedding Gemini deeply into its hardware ecosystem.
Context and Importance
The episode also delves into Google’s AI policy proposal, released in the wake of a similar proposal from OpenAI. Host B remarks, “[08:14]... they're trying to shape the rules of the game,” emphasizing the strategic importance of these proposals.
Key Recommendations
Copyright Flexibility: Google advocates for clear fair use and text and data mining exceptions, allowing AI models to train on publicly available data without needing individual permissions. “[08:55]... they can train their AI models on any publicly available data, even if it's copyrighted,” Host B explains.
Balanced Export Controls: Google urges nuanced export controls that do not hinder American companies. “[10:08]... they think some of these controls could actually hurt American companies,” Host B summarizes, highlighting concerns over global competitiveness.
Rationale and Controversies
Google’s stance on copyright is contentious, as it navigates the balance between innovation and protecting creators' rights. “[09:06]... without having to get permission from every copyright holder or pay them royalties,” Host B points out the logistical challenges of stringent regulations.
Regarding export controls, Google argues that restrictions might burden US cloud providers and impede global market participation. “[10:17]... these controls put unnecessary burdens on US cloud providers,” Host B notes, contrasting Google’s perspective with that of Microsoft.
Support for Government Funding and Legislation
Google also pushes for increased government funding for AI research and seamless access to data and computing resources. “[11:00]... they want the government to release more of its own data sets for AI training and to fund early stage AI research,” Host B states, advocating for continued public investment.
Furthermore, Google calls for comprehensive national AI legislation to streamline regulations across states. “[11:29]... they want a comprehensive national AI law,” Host B explains, emphasizing the need for a unified legal framework amidst a patchwork of state laws.
Opposition to Liability and Over-Regulation
Google opposes regulations that would hold developers liable for AI misuse, arguing that it cannot control how its models are used once they are released. “[11:54]... they don't want to be held responsible if someone uses their AI to do something bad,” Host B says, highlighting the company's defensive position against broad liability laws.
Additionally, Google is wary of stringent disclosure requirements, such as those in the EU, which could force it to reveal proprietary information. “[12:23]... they think some of those rules are too broad,” Host B notes, pointing to the tension between transparency and the protection of trade secrets.
Rapid AI Evolution
The hosts conclude by synthesizing the discussed developments, emphasizing the multifaceted growth of AI. “[13:15]... AI is evolving rapidly on multiple fronts,” Host B observes, noting advancements in both core technologies and their applications.
Integration and Regulation
Big tech companies, exemplified by Google, are deeply embedding AI into their ecosystems while simultaneously striving to influence regulatory frameworks. “[13:32]... big tech companies, they're weaving AI deeper and deeper into their products,” Host B says, pointing out the dual focus on innovation and regulation.
Balancing Excitement and Caution
The episode wraps up with a reflection on the dual nature of AI’s rapid advancements, which bring both excitement and apprehension. “[13:49]... it's exciting and a little scary at the same time,” Host B says, and Host A urges listeners to contemplate AI's impact on their lives.
Final Thoughts
The hosts encourage listeners to stay curious and engage with the ongoing developments in AI, reinforcing the podcast’s mission to inform and inspire. “[14:26]... to spark curiosity, to get people thinking about the future of AI,” Host A concludes, inviting the audience to actively participate in shaping AI’s trajectory.
Host A [00:07]: “Feels kind of like we're getting swept up in this AI tornado... without losing your mind entirely. That's the question.”
Host B [01:48]: “They’re sharing everything... the data they trained it on, the code behind it, even the model weights.”
Host B [02:25]: “It cost a third of what those other models cost to train.”
Host A [04:27]: “Sesame, the company behind Maya, just released the AI model that powers it.”
Host B [05:05]: “RVQ stands for Residual Vector Quantization... It’s like a digital composer, basically.”
Host B [08:25]: “They’re not just building the tech, they’re trying to shape the rules of the game.”
Host B [09:06]: “They can train their AI models on any publicly available data, even if it's copyrighted.”
Host B [10:17]: “These controls put unnecessary burdens on US cloud providers.”
Host B [11:29]: “They want a comprehensive national AI law, something that covers privacy, security, the whole shebang.”
Host B [13:49]: “It's exciting and a little scary at the same time.”
This episode of AI Deep Dive offers a comprehensive exploration of pivotal AI advancements and strategic shifts within the industry. From the open-source triumph of OLMo 2 32B and the innovative voice model CSM-1B to Google's bold move towards Gemini and its influential policy proposals, listeners gain valuable insights into the current state and future trajectory of artificial intelligence. The hosts adeptly balance technical explanations with strategic analysis, making complex topics accessible and engaging for all audiences.