
A
This is the Everyday AI show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business and everyday life.
B
If I would have told you a year ago that you could use the world's most powerful models on your local machine without having to pay for it, you probably would have looked at me and said, you're absolutely crazy. Well, I'm not, because that day is here. Thanks to Google's new impressive Gemma 4 model, you can get at least 2025's frontier AI performance on your local machine, running it privately, offline, with this new impressive open source model. And I want you to think back to a year-ish ago. I'm not just telling you to, okay, think about maybe replacing that $20 a month subscription. That's not where this is going. What about if your company was spending thousands of dollars or millions of dollars on AI deployments internally or externally? Or when you think about running AI agents around the clock, right? Anthropic recently said, hey, you can't use your Claude subscription anymore for OpenClaw. Well, now you can run with Google's Gemma 4. You can run it around the clock and not pay a penny. No, this is not too good to be true. Yes, it kind of feels like we're living in the future, and we're going to go over it all live. So welcome to Everyday AI. So here's the big picture. What's happening with Google's new Gemma 4? It's, I think, rivaling some of its trillion parameter giant competitors. So Google DeepMind released Gemma 4, its most capable open model family to date. There are four different variations, and we're going to be breaking them all down on today's show. But the big boy, the 31 billion parameter model, outranks models 20 times its size. It is third on the global ranking of all open models on Arena. And that's not even the best thing. The best thing maybe is that Google changed its licensing, and now Gemma 4 is released under the very permissive Apache 2.0 license, granting full commercial freedom with essentially no restrictions. 
So not only is this free, you can run it on your computer around the clock if you have the right hardware. I'm going to be breaking all of that down, but you can also create and sell things with this model. So on today's show, here's what we're going to break down. I'm going to break down for you how a 31 billion parameter model now competes with ones 20 times its size. I'm going to show you and tell you why running AI locally on your laptop or even phone changes the cost and privacy equation. And I'm going to break down live the exact tools and steps to download and run Gemma 4 for free today as we go, hands on. All right, let's get into it. This is Everyday AI. Welcome. What's going on? My name is Jordan Wilson, if you're new here. Well, we do this every day. And this is your daily livestream, podcast and free daily newsletter helping business leaders like you and me keep up with the ever-changing AI landscape. I help you understand what's important and show you practically how to use that information to grow your company and career. So it starts here, but make sure you go to our website at youreverydayai.com to sign up for the free daily newsletter. We're going to be recapping today's show as well as giving you all of the other AI information you need to be the smartest person in AI at your company. So it's Wednesday. Wednesday, right? We have kind of different shows. On Mondays we give you the AI news. On Wednesdays we go hands on, a deep dive with one new AI release, something like that. And then on Fridays we go over, in our Friday features, kind of five to seven new AI features. Tuesday and Thursday we rotate it a little bit. So if you are new here, that's the plan. But on Wednesdays we get our hands dirty doing live demos of AI. But first, before we do that, I want to talk about how Gemma 4, I think, is going to completely change the landscape. 
And the cool thing about having a daily AI podcast where you can go listen to all the episodes for free: you can go see I've been ranting and raving about the power of small language models since 2023. And technically this is a small language model, but you get large language model performance, right? So the exact definitions of what's a small language model and what's a large language model are ever-changing, right? But for the most part, you look at the number of parameters in a model. So we're going to be comparing what you can get for free out of Gemma 4 with what you could get out of the frontier models about 14 months ago, which at the time were GPT-4o and Claude Sonnet 3.7. But those are huge models, right? GPT-4o was reportedly 2 trillion parameters, right? So here you have a model that's a tiny fraction of that size, and it's open source and it's free. And here's why I think it's going to change the landscape. Well, number one, I already talked about Anthropic saying, hey, you can't use your Claude subscription anymore to run OpenClaw. You have to pay via API. And people are like, that's going to be crazy expensive. Well, now you can run agentic AI around the clock for free with Gemma 4. Is it going to give you the same model as an Opus 5.4 or a GPT 5.4? Absolutely not. But there's a good chance that for maybe 50 to 80% of what you're trying to do, this model is going to be good enough. And Google also changed the Gemma license to Apache 2.0, which provides users unrestricted commercial freedom and prevents corporate vendor dependency. That's the big thing here. I think that this is going to lead to a resurgence, and I've referenced this on the show once or twice, but I think we're going to kind of have this future that's kind of retro. I think that desktop software is going to come back in the same way that, you know, in the 90s we saw this wave of personal computing. I think we're going to see personal software, right? 
So not necessarily software that's for your whole company, but software that's for you, right? And I think that it's going to be models like Gemma 4 that are going to allow this to happen. And also, the performance versus size ratio absolutely just reset, all right? And I'm going to break down what this means. But essentially think of it like this, right? If you follow, I don't know, boxing or UFC, and I don't really follow those things, but there's something called pound for pound, as in, this is the pound-for-pound best fighter in the world, right? If someone fighting at 150 pounds can knock out someone at 180, that's pretty impressive, right? This Gemma is punching well above its weight class. I'm talking about it competing with models 20 times its size on the open source side. This is something we have quite literally never seen in the history of AI, which is why I think Gemma 4 is a huge deal. So even if you don't necessarily think that you or your company need to use this, you're like, okay, well, we pay for ChatGPT Enterprise or Google Gemini Enterprise, Claude Enterprise, whatever, right? And we have more robust agentic solutions already going. Okay, you still need to be learning Gemma 4 and building with it, not just as a backup dependency, but because of where it can probably run in the future, right? If we fast forward one more year, I don't know if any open source competitors are going to be able to truly catch up to what we just saw from Google and their Gemma 4. So let's talk quickly about the capabilities. It can solve complex reasoning, math and multi-step logic problems effectively. There's native support for function calling, yes, in a local model, and structured JSON outputs for agentic workflows. It has a big context window depending on your hardware; we'll give you all those specs in the newsletter, right? 
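For anyone who wants to script against that structured JSON support, here is a minimal sketch of the request body Ollama's local chat endpoint accepts. The `"format": "json"` option is a real Ollama feature for constraining output to valid JSON; the model tag `gemma4:31b` is an assumption for illustration, so check `ollama list` for the exact tag on your machine.

```python
# Sketch: build a structured-JSON chat request for a local Ollama server.
# The model tag "gemma4:31b" is a guess -- substitute your actual tag.
import json

def build_json_chat_request(model: str, user_prompt: str) -> dict:
    """Build the body Ollama's /api/chat endpoint expects, asking the
    model to reply with valid JSON only (no prose)."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Reply only with a JSON object, no prose."},
            {"role": "user", "content": user_prompt},
        ],
        "format": "json",   # constrain the reply to valid JSON
        "stream": False,    # return one complete response, not chunks
    }

request_body = build_json_chat_request(
    "gemma4:31b",
    "Extract the city and temperature from: 'It is 18C in Chicago today.'",
)
print(json.dumps(request_body, indent=2))
```

You would POST that dict to `http://localhost:11434/api/chat` and parse the model's reply as JSON, which is what makes local agentic workflows practical.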
But if you have capable hardware, you can work with a 256k token window for analyzing large documents and codebases. It can analyze text, images and videos natively, though the bigger models exclude audio support, while the smaller models actually support audio. And it can generate and correct code efficiently as a local offline coding assistant. Here are the four different flavors of Gemma 4. And oh, FYI, as I take a sip of my coffee, yeah, this is unedited, unscripted. I hope that this is going to be interesting for you, but if not, and if you listen to the podcast all the time, make sure you sign up for our newsletter, because I put a poll in our newsletter on Monday. I said, hey, what do you guys want to see hands on on Wednesday? And you all voted Gemma 4. So if you want to see other types of demos on Wednesday, make sure you read our newsletter. I'll usually put out a poll maybe Monday or the Friday before, depending on how busy things are. So I'm doing this for you. This is what you wanted, FYI. But I mean, I'm doing it for myself because I'd be doing the same thing anyways. But right now there are four different variants of Gemma 4. Easy, right? First you have the E2B and the E4B. Those are essentially phone models. This is as edge as edge gets, because it can actually even run on a Raspberry Pi, right? And your basic phones. And this is big, right? Especially if you've always, I don't know, wanted to build a certain app for something, right? And you were confused how. Using the Gemma E2B and E4B models can get you there pretty quickly. They are extremely capable. All right, then you have the two bigger boy models, for which you're going to need, well, capable consumer hardware. That's the reality here, right? Because to get this level of performance previously, before, you know, rewind more than a week ago, you couldn't. On a $2,000 laptop, you couldn't run anything that was a top 10 open source model. 
Now you can, and I think a good way to look at this is the MacBook Pro test, right? So generally Apple usually has about three to four different versions of their MacBook Pro. So obviously the souped-up ones are, you know, a little expensive. But I say take the middle variety, or the middle flavor, of a MacBook Pro off the shelf. Walk into Best Buy, the Apple Store, whatever, look at the MacBook Pros and say, give me the one in the middle. Now that one in the middle can technically run the 26B version of Gemma 4, and that's because it uses the mixture of experts framework and only activates 4 billion parameters. And it's really fast, right? So that model, the 26B, is actually faster but less capable than the 31B. But by default you can run that. You know, you could technically run the quantized version on a 16 gigabyte MacBook, the baseline. But if you just go for the middle flavor of a MacBook Pro, which I'm trying to see here, you know what this costs? I have it open on my other tab here. Let's see. Okay, no, no trade-in. No thanks. I don't want all this extra stuff. Let's see how much this is. All right, so $2,200, right? Which, any MacBook Pro I've bought over the last 10 years has been that price or way more, right? So. AI moves too fast to follow, but you're expected to keep up. Otherwise your career or company might lag behind while AI native competitors leap ahead. But you don't have 10 hours a day to understand it all. That's what I do for you. But after 700 plus episodes of Everyday AI, the most common question I get is, where do I start? That's why we created the Start Here series, an ongoing podcast series of more than a dozen episodes you can listen to in order. It covers the AI basics for beginners and sharpens the skills of AI champions pushing their companies forward. In the ongoing series, we explain complex trends in simple language that you can turn into action. There's three ways to jump in. 
Number one, go scroll back to the first one in episode 691. Number two, tap the link in your show notes at any time for the Start Here series. Or you can just go to starthereseries.com, which also gives you free access to our inner circle community, where you can connect with other business leaders doing the same. The Start Here series will slow down the pace of AI so you can get ahead. I don't think people understand: the middle MacBook Pro that you buy off the shelf can now run a model with about the same capabilities as the best models in the world 14 months ago. All right, and then you have the last flavor. Okay, so the E2B and E4B for the phones and edge devices, then you have the 26B, and then you have the 31B dense model. All right, so when you're running this, you're running the entire thing. That's why it's dense. It's not like the 26B mixture of experts, all right? And that delivers the highest quality reasoning and coding output. And you will need a more powerful computer. But you know, luckily for me, I got a Mac Studio, a fairly capable one, and it runs great on my machine. But to run this one, you will either need the most souped-up MacBook Pro or the Windows equivalent of that, or you will need a fairly capable Mac Studio or an NVIDIA DGX, something like that. But quantized, you could squeeze this one onto something with about 32 gigabytes of RAM; it will run better at about 48. So as an example, my Mac Studio has 64 gigabytes of RAM. All right, so now you understand the technical side. Like I said, if you have a newer middle-of-the-line MacBook Pro, just an easy way to benchmark it, you're going to be able to run the quantized version of the 26B. You're going to have to have a little bit more of a powerful machine to run the 31B dense model. But it's capable, and I'm going to show you here live. So here's why this is important. 
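Those RAM numbers follow from simple arithmetic you can sanity-check yourself: bytes needed is roughly parameter count times bytes per weight, plus overhead for the KV cache and runtime. The figures below are back-of-the-envelope estimates, not official hardware requirements.

```python
# Rough memory math for a 31B-parameter model at different precisions.
# Rule of thumb: weight memory ~= params x (bits per weight / 8) bytes,
# and you want extra headroom for the KV cache and the OS.
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate gigabytes needed just to hold the weights."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for bits, label in [(16, "fp16/bf16"), (8, "8-bit quant"), (4, "4-bit quant")]:
    print(f"31B model at {label}: ~{weight_memory_gb(31, bits):.1f} GB")
```

At 4-bit quantization the 31B weights land around 15 to 16 GB, which is why a 32 GB machine can squeeze it in and a 48 or 64 GB machine runs it comfortably.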
If you look at the biggest and best open source models in the world, right, like Kimi K2.5 Thinking, well, Gemma 4 31B is now in the exact same category, but at a fraction of the cost and a fraction of the parameter count to run it. And community testers have confirmed strong results in coding, reasoning and image understanding. And the 31B scored a 1452 on Arena, right? So Arena, formerly the LM Arena, is blind taste testing. You put a prompt in, it kicks out two different outputs from different models, you score which one's better, and that's how you get an Elo score, right? And 15 months ago, the best Elo in the world was not even 1450, right? And that's what's crazy. So now this is scoring better, at least on Elo score and on the scientific benchmarks, than the frontier AI models from 15 months ago. So here, I have a little chart on my screen for our livestream audience. Podcast audience, this one's not going to be super visual, but you could always go to our website at youreverydayai.com, click Episodes, and you can go watch today's show if you want to see the video version of this. But it should be fairly straightforward. I'm not going to be doing anything too visual, but we do have this from Google's announcement blog post that shows the model performance versus size. And you'll see this is literally the new Gemma 4 in uncharted territory, because this is charting Elo score on one axis and the total model size on the other. So previously, to get anything like a 1450 on an open source model, right? And we don't usually know the size of proprietary models, right? So your Gemini 3.1 Pro, your Claude Opus 4.6, your GPT 5.4, etc., presumably they're multiple trillions, maybe one and a half to two and a half trillion parameters, right? So think of this as like a hard drive size, right, if you want to simplify it. 
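For the curious, the Elo mechanic described above, where each blind A/B vote nudges two ratings toward the winner, can be sketched in a few lines. The K factor of 32 here is a common textbook choice, not necessarily Arena's actual setting.

```python
# Sketch of the Elo update behind arena-style leaderboards: each blind
# A/B vote is treated as a match, and ratings move toward the winner.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability model A beats model B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(winner: float, loser: float, k: float = 32):
    """Return (new_winner_rating, new_loser_rating) after one vote."""
    gain = k * (1 - expected_score(winner, loser))
    return winner + gain, loser - gain

# A 1452-rated model beating a 1400-rated one gains only a modest amount,
# because the Elo model already expected it to win more often than not.
a, b = elo_update(1452, 1400)
print(round(a, 1), round(b, 1))
```

Note the update is zero-sum: whatever the winner gains, the loser gives up, which is why a stable 1452 means consistently beating strong opponents over many votes.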
So to get this same level of performance, a 1450-ish score, from an open source model, you were looking at something that's about 300 to 400 billion parameters in size. So again, this is about 10% of that size. And some of them, like Kimi K2.5 Thinking, which is a lot of people's favorite open source model, this is like a twentieth of the size, right? With roughly the same level of performance, at least when it comes to Elo and most scientific benchmarks. So now let's just quickly talk about, well, why would you even want to run anything locally? Like, okay Jordan, what's the point? I don't care. $20 a month isn't a big deal. Sure, right. But think about doing this at scale. Think about agentic AI, right? Things like OpenClaw, right? And now that Anthropic has shifted away, a lot of people are having to use some of these open source models via OpenRouter, so it's not completely free, because you literally can't run models like Qwen 3.5, GLM-5 or Kimi K2 Thinking on anything less than an $8,000 computer, and definitely not on your average MacBook Pro, right? So that's the difference. Prior open source models that might be able to run your agentic tools like OpenClaw or other agents couldn't really run on true consumer hardware. You literally had to have like an $8,000 or $10,000 computer or more, right? And that's where the local aspect comes in handy. So being able to run things locally keeps costs down: no subscription fees, no API keys, no usage limits after you download something. But also, talk about privacy: you never have to send anything to a cloud, right? Because, yes, keep this in mind. 
As long as you are on a paid team plan with any of the big four, and as long as you turn off model training or turn on the basic privacy settings, you're technically not sending any of that private information to these companies, or they can't really do anything with it, right? However, I do understand, with highly sensitive documents, how you might not want to send that even if you've turned off model training, right? So any sensitive industries like healthcare and legal can gain very capable AI without any cloud exposure. And then, like I talked about, the combination of, well, it's free, it can run 24/7, and if it's sensitive, it runs it all on your machine. You can literally turn the Internet off and use this. But then also, with the new licensing, you can have full commercial use. So you can literally use this for anything. And there's three ways to run Gemma 4 today, and then I'm going to show you how to do this live. Thanks for sticking with me. I wanted to first kind of tell you how important this is and kind of set the context here. But there's kind of three different ways, all for free, that you can run Gemma 4. One would be a tool like Ollama, right? That's the one I'm going to show you. This essentially gives local models a graphical user interface like ChatGPT. Very simple, right? So you can then run a terminal command and download the models in minutes, or you can even run that command in the Ollama interface. Also, LM Studio is a great one. Same thing, it offers you a visual chat interface similar to ChatGPT for non-developers. I guess there's another way, so hey, we'll just do four free ways to run it. Any agentic system that you're running locally, you can point that system to Gemma 4 on Hugging Face, or a lot of the local agents that run on your computer can run via Ollama as well. 
So you can run an Ollama command or a Hugging Face command, you know, point your agents to the download, because that's the thing: you essentially download this thing and you run it. Also, for the other versions, right, so the two that we're going to be looking at are going to be the bigger variants, right? But to run the local ones, people don't know this: Google actually has a great app called Google AI Edge Gallery. You can download that for iOS or for Android. But this is huge. If you haven't done this already, you probably should do this, because what that allows you to do is the equivalent of running it offline on your computer, right? This is an app where you can download the smaller mobile Edge versions, and then, hey, if you're ever in trouble or if you're ever somewhere where you just don't have service, you at least have a highly capable language model on your phone that you can use at any time. All right, so let's get going. Live-ish. First, I'm not going to download this live because it might take a while, right? And trying to download a larger file like this while also streaming live doesn't always work. So here's what I did. I'm going to use Ollama in this case. So here's what you're going to do. You're going to go to Ollama, all right, dot com. So that's O-L-L-A-M-A dot com. If you haven't already, you're going to download the program. Okay. This is a simple desktop client. Like I said, it just allows you to use any open source, open weight model on your computer, but it gives it a graphical interface. So in the same way that you would chat with chatgpt.com, gemini.com, claude.ai, etc., this allows you to work with open models in that interface, because by default you'd be interacting with them via command line tools or the terminal, which is not always ideal for non-technical users. So go to ollama.com, download that, install Ollama on your local machine. All right, that's step one. Step two, you're going to download the actual model. 
So you can search models on Ollama. Just type in Gemma 4 and then you're going to choose the variant that your local machine can run. So for most people that's going to be the 26B version; for me, I'm going to be showing you the 31B version, all right? So all you have to do, once you bring that model up on the Ollama website, there's going to be a little command. It says CLI, right? So this one says ollama run, then the Gemma 4 31B tag, right? All you're going to do is copy that, then you're going to open Ollama, right? And again, very simple. All you're going to do is then paste in that command, all right? And it's going to download the model. So for me, this model was about 9 gigabytes. It took, I don't know, five-ish minutes to download, and then that's it, and then you're ready to run with it. So let's go live. So here's what we are going to do. Livestream audience, do me a favor, let me know if you can see my screen. All right? So I'm going to be jumping around a little bit here because I'm going to be having some copy and paste prompts. So about 15 months ago, I did a show comparing the latest version of Sonnet, which I believe was 3.7, to GPT-4o. So again, going back to how I started this show, these were the best general use case models in the world about 14, 15 months ago. And I had a series of prompts. I kind of had a very unofficial, fun rubric that I would use to compare models. And I'm going to go ahead and run the exact same prompts. Right, so we're going hands on here. So I'm going to first put in a message to Gemma. And this is exactly what I did previously, just to kind of level the playing field. So all I'm saying is: for this chat, please respond with proper formatting and structured bullet points. Do not waste words. Answer in the shortest way possible while still being detailed enough to fulfill the user's requests. Right? 
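As an aside for the developers listening: once that download finishes, the Ollama app also exposes a local REST API on port 11434, so you can script against the model offline instead of using the chat window. A minimal sketch follows; the `/api/generate` endpoint is Ollama's real API, while the model tag `gemma4:31b` is an assumption, so check `ollama list` for the exact tag on your machine.

```python
# Sketch: query a locally downloaded model through Ollama's REST API.
# Ollama listens on localhost:11434 by default; the model tag below
# is a guess, so substitute whatever `ollama list` shows.
import json
import urllib.request

OLLAMA_GENERATE = "http://localhost:11434/api/generate"

def build_generate_body(model: str, prompt: str) -> dict:
    """Build the JSON body for a single non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_model(prompt: str, model: str = "gemma4:31b") -> str:
    """Send one prompt to the local Ollama server, return its reply."""
    body = json.dumps(build_generate_body(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_GENERATE, data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    try:
        print(ask_local_model("In one line: why run models locally?"))
    except OSError:
        print("No Ollama server found. Launch the app or run `ollama serve`.")
```

Everything stays on your machine: the prompt, the reply, the weights. Turn the Wi-Fi off and it still works.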
So this is what I did for all the other ones, and this is what I'm doing now. All right, so here is our rubric. So test one. This is just a trick question. It's logic, all right? And when I did this, both Claude 3.7 Sonnet and GPT-4o got it wrong. We'll see if Gemma 4 gets it correct. All right? Hey, love, love, love when we get little bugs. All right, I'm going to have to run that again. It essentially went through the thinking, right? That's the thing: this model thinks, and it reasons as well. And for those watching it live, you can just see that it did this. So we'll see if this got it correct. Right. The correct answer should. Well, I should probably read it. I said, I just woke up today with six apples and three bananas. Yeah. Livestream audience or podcast audience, try to do this live. See if you can get it. I just woke up today with six apples and three bananas. Yesterday, I ate a banana and two apples. This morning, I will eat one apple and no bananas. However, I don't really like apples, and one banana may turn brown tomorrow. Assuming nothing else changes, how many apples and bananas will I have tonight? So, a little trick question. GPT-4o and Claude 3.7 Sonnet got this wrong. All right, let's see. All right, so it looks like it also got it wrong. All right, so the correct answer is five apples and three bananas. Gemma 4 got close: five apples and two bananas. Not that we need to gauge the level of correctness versus other models, right, but Claude Sonnet said three apples, two bananas, and GPT-4o said three apples, two bananas. So they all got it wrong; Gemma got a little closer. But hey, did you get this right? All right, our next one: the man and dog crossing a river. All right, so this also shows that the model is thinking, right? So if you're listening on the podcast, you're probably not seeing this, but it's also showing its thinking trace. 
All right, so the next one. I'm saying: a man and his dog are standing on one side of the river. There's a boat with enough room for one human and one animal. How can the man get across with his dog in the fewest number of trips? What's so funny is I did all of these beforehand, just because I wanted to make sure that they would work. The first time, it got this right. And now the second time, it just got this wrong. Which is funny, but you can always go back and look at how it thought. So, same thing: Claude 3.7 Sonnet got it wrong, GPT-4o got it wrong. They both said three trips. The first time I ran this, Gemma 4 got it right. This one, doing it live here, it got it wrong. And then just for fun, I reran it again, and it still got it wrong. It said two trips. Interesting. That's the thing with large language models. They're generative. That's what makes these live demos always fun, right? Because doing it before offline, I'm like, okay, cool, it looks like Gemma 4 is going to perform much better. But it's again getting it wrong, as did the best models in the world 15 months ago, but it's getting it a little less wrong, at least for the first two times. All right, let's try the next. All right, so our next prompt here, we're saying: it takes three hours to dry 10 t-shirts in the sun. How long will it take to dry 30 t-shirts in the sun? The correct answer is three hours. All right, and for reference, a year-ish ago, Claude and GPT got that correct. All right, it is three hours. The time doesn't change, right? And it did say: drying principle, the time required to dry laundry is determined by external factors like sun intensity and humidity, not by the total quantity of items, provided adequate space exists. So Gemini. Gemma 4, sorry, not only got it correct, but it did provide some nice rationale as it thought through the problem. All right, the next one. And it already answered. That was so quick. 
Right, again, this is running all locally, and that was probably faster than I would have even gotten from proprietary models online. Right. So I said, if you have a single match. Livestream, did you see how fast that was? My gosh. So I said, if you have a single match and you walk into a room with an oil lamp, a candle and a fireplace, which do you light first? Again, these are just fun trick questions. The correct answer is the match. All right, so it got that right. Claude and GPT-4o also got that right. All right, our next one. What color is an airplane's black box? All right, it's taking a second to think. Bright orange. Got that correct. Good. The others got that correct as well. All right, here's one. We'll see if this is actually correct, because last time Claude Sonnet and GPT-4o failed on this one. So I said, please give me seven jokes that end in the word blue. Two should be about animals, three should be about some other topic in the body of this chat. That's important, right? Although, in fairness to the Gemma models, that technically has a little bit more to do with the harness of Ollama in this case, right? So not exactly an apples to apples comparison, just FYI. All right, so I said two should be about animals, three should be about some other topic in the body of this chat, and it should make up the other two. So first I'm going to see, did it get the correct number? Yes, it gave me two animal jokes, three about chat topics and two original made up. So so far so good. Next, do they all end in the word blue? Blue, blue, blue, blue, blue, blue, blue. Yes. All right, so so far, good. And then I'm going to see, as long as they make sense, right? These aren't always funny, but it at least has to be a joke to pass this rubric. All right, so animal joke: why did the monkey fall into the paint bucket? Because he wasn't used to something so vividly blue. All right, is that a joke? Sure. Is it funny? Absolutely not. 
All right, let's look at the chat topics, see if it pulled the context correctly. Why did the farmer throw away the apples? They were no longer crisp, just a sad brown blue. It's bordering on nonsensical, right? Let's look at the next one. Why couldn't the laundromat predict the drying time? Because the sunlight was so strangely blue. So these are borderline nonsensical. You could make the argument they make sense; they're on the edge here. Then let's look at the original made-up jokes. Hopefully these are a little bit better. All right, why did the geometry student bring a fishing pole? Because he was hoping to catch something entirely blue. All right, so the jokes are trash, but they actually follow the instructions. So when we're looking at instruction following, this technically passed, even though the jokes were absolute garbage. But like I said, Sonnet previously failed and GPT-4o failed. All right, next one. All right, this one is much trickier, so I wouldn't expect Gemma 4 to get this right. So I said: a box is locked with a three digit numerical code. All we know is that all digits are different, the sum of all digits is 9, and the digit in the middle is the highest. What is the code? All right, so this is a very tricky question because there are multiple valid answers, all right? But both Claude 3.7 Sonnet and GPT-4o got this wrong. So what I'm looking for in a correct answer here: number one, that it even gives me at least one correct answer. But there are multiple correct answers, right? Like, as an example, 180, 270 or 351 would all meet those criteria. So Claude and GPT-4o got this wrong. When we did the original testing, Claude's math didn't add up, and GPT-4o did not follow the rules. It had ones that added up to 9, right? But as an example, it gave me 1, 2, 6, and that didn't follow the rules because the middle digit, 2, was not the highest. 
So let's see. It thought for 22 seconds here, kind of went through the deduction process, and it did give me a solution here. All right, so it technically is correct, right? Whereas the other models did not even give me one correct code. Claude 3.7 Sonnet said 172; that does not add up to 9, that adds up to 10. Like I said, GPT-4o gave me 1, 2, 6, which did not follow the instructions because the middle digit was not the highest. So here, technically, Gemma got it right. It didn't get it fully right, but it was the only one that got it right. It said the code is 2, 4, 3. This was technically a triple trick question, because I asked for a code, but technically there are multiple codes. So it technically answered, but I would have loved a super correct answer where it said: you asked for one correct answer, here's one, but there are actually more correct answers. But I will say that at least now Gemma got the last two right, where the others did not get any of them right. All right, this one, we're going to go into some gray area here, all right? And I don't want to make this too long, because it'll probably take another five to 10 minutes to go through the rubric. So I'm just going to find some other questions that the others maybe failed, or just look into some gray area here, talking about some creativity. So this one I said: generate unique and creative marketing and advertising strategies to grow the Everyday AI podcast. Do not suggest general run-of-the-mill ideas; only pitch clever advertising and marketing tactics to specifically grow the Everyday AI podcast. All right, so for reference, a year ago, Claude said: run AI teasers, virtual co-host challenge, listener Q&A, augmented reality experience. GPT-4o said: monthly puzzles, art contests, custom recommendations, guest AI co-hosts. All right, so let's see what Gemma 4 said. 
So it said partnership and cross-promotion strategies, which is good because that's, you know, the basics of growing a podcast, which the others didn't come up with, even though it's not super creative. All right, so it says AI tool integration ads. It said partner with niche-specific, non-major AI tools. That's a good idea. Industry vertical sponsorships. Then it said content hijacking viral strategies, doing an AI MythBusters challenge, interactive prompt battles, then community engagement tactics. So, the AI challenge hotline. I like that. It says dedicate a specific call-in segment where listeners call in with a real-world, mundane problem. Should we do that? All right, if you think we should do that, also, shout out, because someone from Microsoft did suggest this to me like two years ago. I do remember, Nissiani, you said I should do that, and I'm like, yeah, we should. All right, so if you think we should do that, just say hotline. Drop a comment in the live stream or leave a comment on Spotify. Just say hotline if you think that'd be fun. Maybe it won't be. All right, then it also said micro-membership prompt vault. All right, so this is good. I would say these are much more impressive. Yes, this one requires judgment on my part. It is gray area, but looking at what Claude 3.7 Sonnet and GPT-4o gave me, Gemma 4 did much, much better. All right, let me do one other that, for sure, some failed on. Uploading photos might do that, although I don't have the original photo that I used. Let me see. All right, let's just do one other one here. Okay, we're gonna do uploading a transcript. I like that one. So let's go ahead and let me find this file here. All right, so I'm going to go ahead and put this prompt in. It's a little bit longer, and then I'm going to be uploading two different files here, so I want to make sure that I get these correct. All right, there we go.
I should go to my downloads folder. That would help. All right, so here's what we're gonna try, and this will probably be the last one. So I said: for this chat, you will turn a podcast transcript of me, Jordan, the host of Everyday AI, talking about AI news into choppy and engaging newsletter copy. I've attached examples of previous newsletters and how they should be written, as well as the most recent podcast transcript. So this is my podcast transcript from yesterday, where we did a Start Here series about vibe coding. And then I said: please write a newsletter for the attached transcript, mimicking the style as closely as possible to the examples given. So we'll see; we're getting a little dot, dot, dot here. If I'm being honest, I don't know the last time that I uploaded two different file formats. So I uploaded a PDF and an RTF file here inside Ollama. So again, this one is not the fairest comparison, because here we're technically also relying on the harness of Ollama and not just the model of Gemma 4, whereas before, when we were testing this against GPT-4o and Claude 3.7, we were using them directly. Okay, so I do know it is working. Ollama is amazing; it should be able to handle multiple file types. This one is taking the longest so far. This is the first time that we're probably going to have it think or reason for more than a minute or two. And again, y'all, think about this: just the fact that you can have a local model that now reasons without paying a cent is crazy. All right, so it also gave me a checklist of adherence, which is great because I didn't even ask it to do that. That is something I would have added if I were rewriting this prompt that I've been using for like two years. So it went through.
It created a checklist based on what it found from the examples that I uploaded. So as an example, I gave it yesterday's transcript, and then I gave it a 30-page document of older newsletters. It went through and examined those, and it actually only took 27 seconds. It kind of picked out the tone, style, the format, the context source, all these things. Hook, intro quality. Let's see. All right, it actually did a pretty good job, because I remember this was my intro of the podcast. So I'll read it, and if you read our newsletter, let me know if this sounds like it might be in our newsletter. All right: let's be real. You can tell an AI your wildest dream home, and poof, a building appears in front of your eyes in minutes. It's exactly what you asked for. You move in, it's awesome. But then you want to hang a towel rack, you run into a wall, and you realize the entire thing is held together with duct tape, hopes, and dreams. There wasn't a permit, and the foundation is shaky. You're in trouble. All right, this actually matched almost too closely to my actual intro from yesterday. But as I'm looking at this, it actually did a pretty decent job of writing something in my tone, kind of this short, choppy style like I told it to. You know, an emoji in each headline, which is what we would normally do. It has an actionable Try This section, which is something that we also do in the newsletter. So although this is not, you know, the best ever, it actually did a pretty good job from what I recall. It did a little bit better job at instruction following than Claude 3.7 Sonnet. I do think Claude 3.7 Sonnet did a little better with the tone of voice, but Gemma did a better job matching the tone of voice than GPT-4o did.
So overall, when I look at the, you know, six or seven different unofficial rubric tests that we did here with a free local model, comparing it to the frontier general-use-case models from 15 months ago, the best in the world, it actually did better. Even though it failed some, the two questions the other models failed previously, it actually got right the first time I ran it, which didn't happen with Claude 3.7 Sonnet or GPT-4o. But still, head to head in this very unofficial rubric, it did markedly better than the best models in the world from a year and three months ago. All right, so as we wrap this one up, here's what I want to leave you with. Open source AI is getting smaller, faster, and harder to ignore. Google built Gemma 4 specifically for agentic workflows with native function calling. So even though I didn't give an example of running this agentically, I cannot tell you how important that is. If you have a middle-of-the-road new MacBook Pro, as an example, you can now have an agent that works for you 24/7 that costs $0.00. It's 100% private, too; that's worth noting. This is based off of the Gemini 3 model family, so you're not getting quite Gemini 3 level performance, but again, you are getting a top-three open source model in the world and the only one that you can run on consumer hardware. So now users can route routine AI tasks locally and cut significant costs on AI bills. The gap between free local models and paid cloud services keeps shrinking fast, and you can no longer ignore it. All right, if this was helpful, tell someone about it. If you're listening live here on LinkedIn, take a second to repost this; I'd really appreciate that. If you are listening on the podcast, do me a favor, take 30 seconds, and make sure that you're following or subscribed to the show.
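As an editorial aside, the "route routine AI tasks locally" idea above can be sketched as a tiny dispatcher. Everything here is hypothetical and illustrative: the keyword heuristic, the character threshold, and the function names are placeholders I made up, not any real library's API.

```python
# Illustrative sketch: send short, routine prompts to a free local model
# and reserve a paid cloud model for heavier work. The heuristic, the
# threshold, and the backend labels are hypothetical placeholders.

ROUTINE_KEYWORDS = ("summarize", "translate", "classify", "extract", "rewrite")

def pick_backend(prompt: str, max_local_chars: int = 4000) -> str:
    """Return 'local' for short, routine prompts, otherwise 'cloud'."""
    routine = any(kw in prompt.lower() for kw in ROUTINE_KEYWORDS)
    return "local" if routine and len(prompt) <= max_local_chars else "cloud"

def handle(prompt: str) -> str:
    backend = pick_backend(prompt)
    # In practice these branches would call e.g. a local Ollama server
    # or a cloud API; here they are stubs so the routing stays self-contained.
    return f"[{backend}] {prompt[:40]}"
```

In a real deployment the heuristic would likely be replaced by a small classifier or a cost/latency policy, but the structure (decide, then dispatch) is the whole idea behind cutting cloud bills with a capable local model.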
And if any episode of Everyday AI has been helpful, because we spend literally countless hours helping you all understand how this works, please leave us a rating on those platforms as well. So thank you for tuning in. Make sure to go to youreverydayai.com and sign up for the free daily newsletter. We're going to be recapping today's show and a whole lot more. Thank you for tuning in; hope to see you back tomorrow and every day for more Everyday AI. Thanks, y'all.
A
And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going for a little more AI magic. Visit youreverydayai.com and sign up for our daily newsletter so you don't get left behind. Go break some barriers, and we'll see you next time.
Host: Jordan Wilson
Date: April 8, 2026
This episode of Everyday AI offers an in-depth, practical guide to Google's newly released Gemma 4 open source AI model. Host Jordan Wilson explains why Gemma 4 is groundbreaking, details its capabilities, compares it to previous industry leaders, and demonstrates—live—how to run it on a consumer-grade laptop or even a phone. The discussion spotlights how the local, open-source model changes cost and privacy dynamics and why every AI practitioner, business leader, and enthusiast should pay attention.
"If I would've told you a year ago you could use the world's most powerful models on your local machine without having to pay for it, you'd think I'm crazy... That day is here." — Jordan (00:16)
"...full commercial freedom with essentially no restrictions." (01:40)
"It is competing with models 20 times its size...something we've literally, quite literally, never seen in the history of AI." (07:28)
"You could technically run the quantized version [26B] on a 16GB MacBook, just the middle flavor, right off the shelf..." (19:50)
"This is something any sensitive industry...can gain very capable AI without any cloud exposure." (30:18)
"This is essentially gives local models a graphical user interface like ChatGPT... in minutes." (33:14)
"All got it wrong. Gemma got it a little closer..." (45:41, apples and bananas question)
"It actually did a pretty decent job from what I recall; a little better at instruction following than Claude Sonnet; better at matching tone than GPT-4o." (56:45)
"...this is running all locally and that was probably faster than I would have even gotten from proprietary models online." (49:10)
"I think we're going to have this future that's kind of retro...I think that desktop software is going to come back in the same way that, you know, in the 90s we saw this wave of personal computing." (12:34)
"You can have an agent that works for you 24/7 that costs $0.00. It's 100% private." (59:09)
"If you want to see other types of demos...I put a poll in our newsletter on Monday. ...This is what you wanted, FYI." (16:45)
"AI moves too fast to follow, but you're expected to keep up. Otherwise...your company might lag behind while AI-native competitors leap ahead." (24:00)
| Time | Segment |
|--------------|------------------------------------------------------------------|
| 00:16–08:50 | Introduction & Why Gemma 4 is Revolutionary |
| 14:52–24:23 | Technical Breakdown: Model Variants & Hardware |
| 24:38–32:10 | Impact on Cost, Privacy, and Agentic AI |
| 32:15–38:04 | Hands-On: Running Gemma 4 Locally (Ollama, LM Studio, Mobile) |
| 38:05–58:02 | Live Testing: Logic, Creativity, Summarization, Speed |
| 59:09–end | Key Takeaways, Community Calls to Action |
If you found the episode valuable, visit youreverydayai.com to sign up for the free daily newsletter, and engage with the show's ongoing community and resources.