(BNS) Simon Willison And SWYX Tell Us Where AI Is In 2025 - Tech Brew Ride Home

Summary (7 min read)

Techmeme Ride Home: Simon Willison and Swix Discuss the Future of AI in 2025

Release Date: January 11, 2025
Host: Brian McCullough
Guests: Simon Willison, Swix

Introduction

In the first bonus episode of Techmeme Ride Home for 2025, host Brian McCullough welcomes tech analyst Simon Willison and AI guru Swix to discuss the current state and future trajectory of artificial intelligence. The conversation delves into advancements in AI models, cost reductions, multimodal capabilities, the evolving role of AI agents, and the integration of AI into creative industries.

The State of AI in 2025

Simon Willison opens the discussion by highlighting significant trends in AI over the past year:

Performance Enhancements: AI models have become significantly faster, cheaper, and more efficient. "Everything's got really good and fast and cheap," he remarks ([01:30]).
Multimodal Capabilities: Beyond text, models now adeptly handle images, video, and audio, expanding their usability across various applications.
Stable Model Improvements: Contrary to expectations, 2024 brought no dramatic GPT-5-style leap. Instead, GPT-4-class models became much cheaper, gained far longer context lengths, and added multimodal input.

AI Model Efficiency and Cost Trends

Simon discusses the breaking of the GPT-4 barrier:

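Cost Reduction: Simon notes that OpenAI’s current models are 100 times cheaper than GPT-3, its best available model roughly two and a half years ago ([06:57]).
Increased Accessibility: At least 18 organizations have now released models that beat the original GPT-4. Hosted models such as Google’s Gemini 1.5 Flash cost a fraction of last year’s prices, and some GPT-4-class models run locally: Simon ran Microsoft’s Phi-4, a 14 GB download, on his MacBook Pro ("I ran another GPT4 model on my laptop this morning"). Deepseek V3, the best open-weights model on benchmarks, is too large for his laptop ([04:22]).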
Training Costs: Deepseek’s demonstration of training a top-tier model for just $5.5 million challenges the belief that AI training requires exorbitant investments, potentially democratizing AI development ([10:30]).
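
To make the per-token pricing concrete, here is a minimal arithmetic sketch. The $0.075 per million tokens figure is the Gemini 1.5 Flash price Simon quotes in the episode; the workload size and the function names are hypothetical, chosen only for illustration.

```python
def prompt_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in US dollars of processing `tokens` tokens at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


def price_after_drop(old_price: float, factor: float) -> float:
    """Price after an N-fold reduction, e.g. factor=100 for '100 times cheaper'."""
    return old_price / factor


# Price quoted in the episode for Gemini 1.5 Flash (USD per million tokens).
GEMINI_15_FLASH_USD_PER_MILLION = 0.075

# Hypothetical workload: 10,000 prompts of 2,000 tokens each.
workload_tokens = 2_000 * 10_000
cost = prompt_cost_usd(workload_tokens, GEMINI_15_FLASH_USD_PER_MILLION)
print(f"Workload cost: ${cost:.2f}")
```

At that price, a 20-million-token workload costs about a dollar and a half, which is why the guests switch to quoting prices in cents per million tokens.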

Competitive Landscape and Industry Impact

Asked whether competition explains the price cuts, Simon agrees it is part of the story but argues the low prices can also be sustainable:

Sustainable Pricing: Simon says he has it "on good authority from sources I trust that Google Gemini is not operating at a loss," and that an Amazon executive has said the same of Amazon Nova. The caveat is that this excludes training and research staff costs ([09:14]).
Open Weights and Efficiency: The emergence of open-weight models allows for broader innovation and competition, further driving down costs and enhancing capabilities.

AI Agents: Potential and Challenges

The conversation shifts to AI agents, where both guests express cautious optimism:

Current Limitations: Simon criticizes the reliability of autonomous agents due to their propensity to "believe anything you tell them." He emphasizes the importance of human oversight in decision-making roles ([21:18]).
Proven Successes: Research assistant agents and coding tools that iterate based on error feedback have shown tangible benefits, contrasting with fully autonomous agents that handle sensitive tasks like booking travel ([25:00]).
Future Outlook: While certain agent functionalities are advancing, fully autonomous agents capable of making complex decisions independently remain a long-term goal.
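
The human-oversight point can be sketched in code. This is a hypothetical illustration, not how any real agent framework works: the action names, the allow-list, and the `approve` callback are all assumptions made for the example. The idea is that read-only steps run automatically, while anything else waits for a person to approve it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProposedAction:
    """One step an agent wants to take (names here are hypothetical)."""
    name: str
    argument: str


# Hypothetical allow-list of read-only actions that need no approval.
SAFE_ACTIONS = {"search_flights", "read_page"}


def run_with_oversight(
    actions: List[ProposedAction],
    approve: Callable[[ProposedAction], bool],
) -> List[str]:
    """Run safe actions automatically; ask a human before anything else."""
    log = []
    for action in actions:
        if action.name in SAFE_ACTIONS or approve(action):
            log.append(f"ran {action.name}({action.argument})")
        else:
            log.append(f"blocked {action.name}({action.argument})")
    return log


# Example: a web page tells the agent to download and run a file.
proposed = [
    ProposedAction("read_page", "https://example.com"),
    ProposedAction("download_and_run", "installer.exe"),
]
result = run_with_oversight(proposed, approve=lambda action: False)
print(result)
```

A gate like this does not fix gullibility itself. It only limits what a misled model can do without a person noticing.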

Multimodal AI Advances

Simon and Swix explore the strides made in multimodal AI:

Video and Audio Integration: Models now seamlessly handle simultaneous audio and visual inputs, enabling applications like real-time video analysis and interactive media creation.
Creative Applications: AI tools are beginning to integrate into film production, assisting with tasks like visual effects and storyboard generation. Simon envisions AI empowering creative teams to achieve ambitious projects with enhanced efficiency ([35:24]).
Tools and Models: Innovations like Google’s Gemini 1.5 Pro and Alibaba’s Qwen demonstrate the growing capability of AI to handle complex multimedia tasks, including reasoning and artistic creation ([16:04]).

AI in Creative Industries

The integration of AI into creative workflows is a focal point:

Film Production: AI tools are being adopted by top-tier creative teams to streamline effects and production processes. Swix notes collaborations between AI companies and film studios to enhance movie-making capabilities ([35:24]).
Content Generation: AI-generated assets are increasingly used for background elements and special effects, reducing the need for extensive manual labor while maintaining high production standards.

Credibility and Trust in AI

A critical discussion revolves around the credibility of AI-generated content:

Human Oversight: Simon stresses the necessity of human review to ensure the reliability and trustworthiness of AI outputs. "If a human being has reviewed it and said, you know what? This is actually worth other people's time," he explains ([44:02]).
Defining "Slop": Unreviewed and unsolicited AI-generated content is termed "slop," emphasizing the importance of editorial control to maintain quality and credibility ([44:09]).
Ethical Considerations: The integrity of information sources is paramount in an era saturated with AI-generated content, urging creators to uphold transparency and trustworthiness.

User Interface Innovations for LLMs

The need for improved user interfaces for large language models (LLMs) is highlighted:

Prompt-Driven UIs: Simon envisions interfaces where LLMs generate interactive elements like custom dashboards or sliders based on user prompts, enhancing interactivity and usability ([50:55]).
Collaborative Platforms: Tools like OpenAI’s ChatGPT Canvas and Bolt demonstrate early steps toward more intuitive and interactive AI interfaces, enabling users to collaborate with models in real-time on visual and functional tasks.

Local vs. API-Based AI Models

The debate between local AI models and cloud-based APIs is revisited:

Advancements in Local Models: Recent efficiency improvements have made local models more viable, allowing users like Simon to run powerful AI on personal devices without exorbitant hardware requirements ([55:57]).
Recommended Tools: Simon recommends tools such as MLC Chat for iPhones, Ollama for laptops, and LM Studio for user-friendly interfaces, emphasizing the increasing accessibility of high-performance local models ([61:35]).

Future Trends and AI Wearables

The guests discuss emerging trends, particularly in AI-enabled wearables:

Smart Glasses and Earbuds: Innovations in AI wearables are making devices like smart glasses and advanced earbuds more capable and affordable, with potential applications in areas like perfect memory aids and enhanced productivity ([75:03]).
Privacy Concerns: The integration of AI into daily wearables raises significant privacy issues, necessitating thoughtful regulation and societal discourse on acceptable usage ([77:35]).

Perspectives on OpenAI and Competitors

Simon provides insights into the competitive dynamics of AI companies:

OpenAI’s Position: While still a major player, OpenAI faces stiff competition from Google’s Gemini and Anthropic’s Claude, which have made significant inroads in model performance and cost-efficiency ([66:13]).
Talent Retention: Challenges such as talent retention are affecting OpenAI’s leadership position, though strategic innovations like its o3 models help maintain its market relevance ([66:13]).

Importance of Better Criticism of LLMs

Simon advocates for more nuanced and constructive criticism of LLMs:

Balanced Discussions: Moving beyond binary narratives of AI being either destructively overhyped or useless, he calls for high-quality conversations that explore both the benefits and challenges of AI.
Addressing Real Issues: Topics such as environmental impact, data privacy, and the implications of AI on various professions require thoughtful debate and actionable solutions ([67:40]).

Recommendations of AI Tools

Both guests share their preferred AI applications:

Simon’s Picks:
- MLC Chat: Ideal for iPhone users interested in local AI models.
- Ollama & LM Studio: Recommended for laptop users seeking robust local model interfaces.
- MacWhisper: Facilitates seamless transcription and integration with AI tools for content creation ([61:57]).
Swix’s Picks:
- Superwhisper: Enhances voice transcription with AI-driven refinements.
- Rosebud: Focused on AI for journaling and mental health applications.
- HeyGen: Utilized for creating AI-generated avatars for content creation ([61:57]).

Conclusion

The episode concludes with a forward-looking perspective on AI’s evolution. Simon and Swix express optimism about ongoing advancements while acknowledging the challenges that lie ahead, such as improving AI credibility, enhancing user interfaces, and navigating regulatory landscapes. Brian McCullough emphasizes the importance of embracing AI innovations thoughtfully to maximize their potential benefits while mitigating risks.

Key Takeaways:

AI models are becoming more efficient, affordable, and multimodal, making advanced capabilities accessible to a broader audience.
The competitive landscape is driving significant cost reductions and innovation, challenging dominant players like OpenAI.
AI agents hold promise but face reliability and ethical hurdles that require human oversight and refined definitions.
Multimodal AI and integrations into creative industries are expanding the scope and impact of AI technologies.
Enhancing user interfaces and ensuring credible, human-reviewed AI outputs are crucial for widespread adoption and trust.

Notable Quotes:

Simon Willison ([01:30]): "Everything's got really good and fast and cheap."
Simon Willison ([04:22]): "So everything's trending smaller and faster and more efficient."
Simon Willison ([44:02]): "If a human being has reviewed it and said, you know what? This is actually worth other people's time."
Simon Willison ([50:55]): "I think there's so much scope for innovation... why should you just be communicating with text when it can build interfaces on the fly?"

By synthesizing the comprehensive discussion between Simon Willison and Swix, this summary provides a thorough overview of the key points and insights shared about the evolving landscape of AI in 2025.


Transcript

[00:05]
Brian McCullough
Welcome to the first bonus episode of the Techmeme Ride Home for the year 2025. I'm your host as always, Brian McCullough. Listeners to the pod over the last year know that I have made a habit of quoting from Simon Willison when new stuff happens in AI from his blog. Simon has become a go-to for many folks in terms of analyzing things, criticizing things in the AI space. I've wanted to talk to you for a long time, Simon, so thank you for coming on the show.
[00:41]
Simon Willison
No, it's a privilege to be here.
[00:43]
Brian McCullough
The person that made this connection happen is our friend Swix, who has been on the show even going back to the Twitter Spaces days, but also an AI guru in their own right. Swix, thanks for coming on the show.
[00:59]
Swix
Also thanks. Happy to be on and have been a regular listener, so just happy to contribute as well and a good friend.
[01:08]
Brian McCullough
Of the pod, as they say. All right, let's go right into it, Simon. I'm going to do the most unfair, broad question first, so let's get it out of the way. The year 2025, broadly, what is the state of AI as we begin this year? Whatever you want to say, I don't want to lead the witness.
[01:30]
Simon Willison
Wow. So many things, right? I mean, the big thing is everything's got really good and fast and cheap. Like that was the trend throughout all of 2024. The good models got so much cheaper, they got so much faster. They got multimodal, right? The image stuff isn't even a surprise anymore. They go at growing video, all of that kind of stuff. So that's all really exciting. At the same time, they didn't get massively better than GPT4, which was a bit of a surprise. So that's sort of one of the open questions is are we going to see. But I kind of feel like that's a bit of a distraction because GPT4, but way cheaper, with much larger context lengths, and it can do multimodal, is better, right? That's a better model, even if it's.
[02:11]
Brian McCullough
Not what people were expecting or hoping. Maybe not expecting is not the right word, but hoping that we would see another step change. Right, right. From like GPT 2, 3 to 4. We were expecting or hoping that maybe we were going to see the next evolution in that sort of.
[02:28]
Simon Willison
We did see that, but not in the way we expected. We thought the model was just going to get smarter and instead we got massive drops in price. We got all of these new capabilities. You can talk to the things now, right? They can do simulated audio input, all of that kind of stuff. And so it's kind of. It's interesting to me that the models improved in all of these ways we weren't necessarily expecting. I didn't know it would be able to do an impersonation of Santa Claus. I could talk to it through my phone and show it what I was seeing by the end of 2024. But yeah, we didn't get that GPT5 step and that's one of the big open questions is, is that actually just around the corner and we'll have a bunch of GPT5 class models drop in the next few months?
[03:10]
Brian McCullough
If you were a betting man and wanted to put money on it, do you expect to see a phase change, step change in 2025?
[03:18]
Simon Willison
I don't particularly for that, like the models, but smarter. I think all of the trends we're seeing right now are going to keep on going, especially the inference-time compute, right? The trick that o1 and o3 are doing, which means that you can solve harder problems but that costs more and it churns away for longer. I think that's going to happen because that's already proven to work. I don't know. I don't know. Maybe there will be a step change to a GPT5 level. But honestly, I'd be completely happy if we got what we've got right now, but cheaper and faster and more capabilities and longer contexts and so forth. That would be thrilling to me.
[03:53]
Brian McCullough
Digging into what you've just said, one of the things that, by the way, I hope to link in the show notes to Simon's year-end post about what things we learned about LLMs in 2024. Look for that in the show notes. One of the things that you did say, that you alluded to even right there, was that in the last year you felt like the GPT4 barrier was broken. Like, i.e., other models, even open source ones, are now regularly matching, sort of, the state of the art.
[04:22]
Simon Willison
Well, it's interesting. So the GPT4 barrier was, a year ago, the best available model was OpenAI's GPT4 and nobody else had even come close to it. And they'd been in the lead for like nine months. Right. That thing came out in what, February, March of 2023. And for the rest of 2023, nobody else came close. And so at the start of last year, like a year ago, the big question was why has nobody beaten them yet? What did they know that the rest of the industry doesn't know? And today I've counted 18 organizations other than GPT4 who've put out a model which clearly beats that GPT4 from a year ago thing. Maybe they're not better than GPT-4o, but that barrier got completely smashed. And yeah, a few of those I've run on my laptop, which is wild to me. It felt very clear to me a year ago that if you want GPT4, you need a rack of $40,000 GPUs just to run the thing. That turned out not to be true. This is that big trend from last year of the models getting more efficient, cheaper to run, just as capable with smaller weights and so forth. I ran another GPT4 model on my laptop this morning. Microsoft Phi-4 just came out and, if you look at the benchmarks, it's up there with GPT-4o. It's probably not as good when you actually get into the vibes of the thing, but it runs on my... it's a 14 gigabyte download and I can run it on a MacBook Pro. Like, who saw that coming? The most exciting, like the close of the year on Christmas Day just a few weeks ago, was when DeepSeek dropped their DeepSeek V3 model on Hugging Face without even a readme file. It was just like a giant binary blob that I can't run on my laptop. It's too big. But in all of the benchmarks it's now by far the best available open weights model. Like it's beating the Meta Llamas and so forth. And that was trained for five and a half million dollars, which is a tenth of the price that people thought it cost to train these things. So everything's trending smaller and faster and more efficient.
[06:23]
Brian McCullough
Well, okay, I kind of was going to get to that later. But let's combine this with what I was going to ask you next, which is you're talking also in the piece about the LLM prices crashing, which I've even seen in projects that I'm working on. But explain that to a general audience because we hear all the time that LLMs are eye wateringly expensive to run. But we're suggesting, and we'll come back to the cheap Chinese LLM. But first of all, for the end user, what you're suggesting is that we're starting to see the cost come down sort of in the traditional technology way of cost coming down over time.
[06:57]
Simon Willison
Yes, but very aggressively. I mean, my favorite example here is if you look at GPT3, so OpenAI's GPT3, which was the best available model in 2022 and through most of 2023, the models that we have today, the OpenAI models, are 100 times cheaper. So there was a 100x drop in price for OpenAI from their best available model like two and a half years ago to today.
[07:22]
Brian McCullough
Just to be clear, not to train the model but for the use of.
[07:26]
Simon Willison
Tokens and exactly for running prompts through them. And then when you look at the... really the top tier model providers right now I think are OpenAI, Anthropic, Google and Meta, and there are a bunch of others that I could list there as well. Mistral are very good. The DeepSeek and Qwen models have got great. There's a whole bunch of providers serving really good models. But even if you just look at the sort of big brand name providers, they all offer models now that are a fraction of the price of the models we were using last year. I think I've got some numbers that I threw into my blog entry here. Yeah, like Gemini 1.5 Flash, that's Google's fast, high quality model, is how much is that? It's $0.075 per million tokens. Like, these numbers are getting.
[08:17]
Swix
So we just do cents per million now. Cents per million.
[08:20]
Simon Willison
Cents per million makes a lot more sense. Yeah. They have one model, 1.5 Flash 8B, the absolute cheapest of the Google models is 27 times cheaper than GPT 3.5 Turbo was a year ago. Like that's a GPT 3.5 Turbo. That was the cheap model right. Now we've got something 27 times cheaper. And this Google one can do image recognition, it can do million token context. All of those tricks. It really is startling how inexpensive some of this stuff has got.
[08:51]
Brian McCullough
Now are we assuming that happening is directly the result of competition? Because again, OpenAI and probably they're doing this for their own almost political reasons, strategic reasons. Keep saying we're losing money on everything, even the $200. So they probably wouldn't. The prices wouldn't be coming down if there wasn't intense competition in this space.
[09:14]
Simon Willison
The competition's absolutely part of it. But I have it on good authority from sources I trust that Google Gemini is not operating at a loss. Like the amount of electricity to run a prompt is less than they charge you. And the same thing for Amazon Nova. Somebody found an Amazon executive and got them to say, yeah, we're not losing money on this. I don't know about Anthropic and OpenAI but clearly that demonstrates it is possible to run these things at these ludicrously low prices and still not be running at a loss if you discount the army of PhDs and the training costs and all of that kind of stuff.
[09:47]
Brian McCullough
One more for me before I let Swix jump in here, to come back to DeepSeek and this idea that you could train a cutting-edge model for $6 million. I was saying on the show six months ago that if we are getting to the point where each new model cost a billion, 10 billion, 100 billion to train, that at some point almost only nation states would be able to train the new models. Do you expect what DeepSeek and maybe others are proving to sort of blow that up? Or is there like some sort of a parallel track here that maybe I'm not technically, I don't have the know-how to understand the difference. Are the models going to go up to $100 billion or can we get them down sort of like DeepSeek has proven?
[10:31]
Simon Willison
So I am the wrong person to answer that because I don't work in a lab training these models. So I can give you my completely uninformed opinion, which is, I feel like the DeepSeek thing, that was a bombshell. That was an absolute bombshell when they came out and said, hey, look, we've trained one of the best available models and it cost us five and a half million dollars to do it. I feel. And one of the reasons it's so efficient is that we put all of these export controls in to stop Chinese companies from buying GPUs. So they were forced to go as efficient as possible. And yet the fact that they've demonstrated that that's possible, I think it does completely tear apart this mental model we had before that. Yeah, the training runs just keep on getting more and more expensive and the number of organisations that can afford to run these training runs keeps on shrinking. That's been blown out of the water. So, yeah, again, this was our Christmas gift. This was the thing they dropped on Christmas Day. Yeah. It makes me really optimistic that we can. It feels like there is so much low hanging fruit in terms of the efficiency of both inference and training. And we spent a whole bunch of last year exploring that and getting results from it. I think there's probably a lot left. I think there's probably. I would not be surprised to see even better models trained spending even less money over the next six months.
[11:46]
Swix
Yeah. So I think there's an unspoken angle here on what exactly the Chinese labs are trying to do because DeepSeek made a lot of noise around the fact that they train their model for $6 million and nobody quite believes them. It's very, very rare for a lab to trumpet the fact that they're doing it for so cheap. They're not trying to get anyone to buy them. So why are they doing this? They make it very, very obvious that their lab, DeepSeek, is about 150 employees. It's an order of magnitude smaller than at least Anthropic and maybe more so for OpenAI. And so what's the end game here? Are they just trying to show that the Chinese are better than us?
[12:37]
Simon Willison
So DeepSeek, it's the arm of a hedge fund, it's a quant fund. Right. It's an algorithmic quant trading thing. So I would love to get more insight into how that organization works. My assumption from what I've seen is it looks like they're basically just flexing. They're like, hey, look at how utterly brilliant we are with this amazing thing that we've done. And it's working. Right? Is that it? Is this just that kind of like, this is why our company is so amazing. Look at this thing that we've done. Or I don't know. I'd love to get some insight from within that industry as to how that's all playing out.
[13:14]
Swix
The prevailing theory among the LocalLLaMA crew and the Twitter crew that I index for my newsletter is that there is some amount of copying going on. It's like Sam Altman tweeting about how they're being copied. And then also there are other sort of OpenAI employees that have said stuff that is similar, that DeepSeek's rate of progress is how US intelligence estimates the number of foreign spies embedded in top labs. Because a lot of these ideas do spread around, but they surprisingly have a very high density of them in the DeepSeek V3 technical report. So it's interesting. We don't know how much tokens, I think, that people have run analysis on how often DeepSeek thinks it is Claude, or thinks it is OpenAI's GPT4. And we don't know, we don't know. I think for me, we basically will never know as external commentators. I think what's interesting is where does this go? Is there a logical floor or bottom? By my estimations, for the same amount of Elo, start of last year to the end of last year, cost went down by 1000x for GPT4 intelligence. Do they go down 1000x this year?
[14:32]
Simon Willison
That's a fascinating question. Yeah.
[14:36]
Swix
Is there a Moore's Law going on or did we just get a one off benefit last year for some weird reason?
[14:43]
Simon Willison
My uninformed hunch is low hanging fruit. I feel like up until a year ago, people hadn't been focusing on efficiency at all. It was all about what can we get these weird shaped things to do? And now once we've hit that, okay, we know that we can get them to do what GPT4 can do. When thousands of researchers around the world all focus on, okay, how do we make this more efficient? What are the most important? Like, how do we strip out all of the weights that have stuff and that doesn't really matter, all of that kind of thing. So yeah, maybe that was it. Maybe 2024 was a freak year of all of the low hanging fruit coming out at once and we'll actually see a reduction in that rate of improvement in terms of efficiency. I wonder. I mean, I think we'll know for sure in about three months time if that trend is going to continue or not.
[15:28]
Swix
I agree. I think the other thing that you mentioned, that DeepSeek v3 was the gift that was given from DeepSeek over Christmas, but I feel like the other thing that might be underrated was Deepseek R1, which is a reasoning model you can run on your laptop. And I think that's something that a lot of people are looking ahead to this year.
[15:50]
Simon Willison
They released the weights for that one.
[15:51]
Swix
Yeah.
[15:52]
Simon Willison
Oh my goodness, I missed that. I've been playing with Qwen. So the other big Chinese AI lab is Alibaba's Qwen.
[15:59]
Swix
Actually. Yeah, I'm Sorry, I missed R1. Is API available?
[16:04]
Simon Willison
Exactly. Qwen. That's really cool. So Alibaba's Qwen have released two reasoning models that I've run on my laptop now. The first one was QwQ and then the second one was QVQ, because the second one's a vision model. So you can give it vision puzzles and a prompt. These things, they are so much fun to run because they think out loud. It's like the OpenAI o1 sort of hides its thinking process. The Qwen ones don't. They just churn away. And so you'll give it a problem and it will output literally dozens of paragraphs of text about how it's thinking. My favorite thing that happened with QwQ is I asked it to draw me a pelican on a bicycle in SVG. That's like my standard stupid prompt. And for some reason it thought in Chinese. It spat out a whole bunch of like Chinese text onto my terminal on my laptop. And then at the end it gave me quite a good sort of artistic pelican on a bicycle. And I ran it all through Google Translate and yeah, it was contemplating the nature of SVG files as a starting point. And the fact that my laptop can think in Chinese now is so delightful. It's so much fun watching it do that.
[17:15]
Swix
Yeah, I think Andrej Karpathy was saying, we know that we have achieved proper reasoning inside of these models when they stop thinking in English and perhaps the best form of thought is in Chinese. But yeah, for listeners who don't know Simon's blog, he always whenever a new model comes out, I don't know how you do it, but you're always the first to run Pelican Bench on these models.
[17:37]
Simon Willison
I just did.
[17:38]
Swix
And you post up the results.
[17:39]
Simon Willison
Yeah.
[17:41]
Swix
So I really appreciate that you should check it out. These are not theoretical like Simon's blog actually shows them.
[20:13]
Brian McCullough
Let me put on the investor hat for a second because from the investor side of things, a lot of the VCs that I know are really hot on agents and this is the year of agents. But last year was supposed to be the year of agents as well. Lots of money flowing towards agentic startups. But in your piece that again we're hopefully going to have linked in the show notes, you sort of suggest there's a fundamental flaw in AI agents as they exist right now. Let me quote you and then I'd love to dive into this. You said, "I remain skeptical as to their ability, based once again on the challenge of gullibility. LLMs believe anything you tell them. Any systems that attempt to make meaningful decisions on your behalf will run into the same roadblock. How good is a travel agent or a digital assistant or even a research tool if it can't distinguish truth from fiction?" So essentially what you're suggesting is that the state of the art now that allows agents is still, it's still that sort of 90% problem, the edge problem. Getting to the or is there a deeper flaw? What are you saying there?
[21:19]
Simon Willison
So this is the fundamental challenge here and honestly my frustration with agents is mainly around definitions. Like if you ask anyone who says they're working on agents to define agents, you will get a subtly different definition from each person. But everyone always assumes that their definition is the one true one that everyone else understands. So I feel like in a lot of these agent conversations, people are talking past each other because one person's talking about the sort of travel agent idea of something that books things on your behalf, somebody else is talking about LLMs with tools running in a loop with a cron job somewhere, and all of these different things. You ask academics and they'll laugh at you because they've been debating what agents mean for over 30 years at this point. It's like this long running, almost sort of an in-joke in that community. But if we assume that for the purpose of this conversation, an agent is something which you can give a job and it goes off and it does that thing for you like booking travel or things like that, the fundamental challenge is it's the reliability thing, which comes from this gullibility problem. And a lot of my interest in this originally came from when I was thinking about prompt injection as a source of this form of attack against LLM systems, where you deliberately lay traps out there for this LLM to stumble across, at.
[22:28]
Brian McCullough
Which I should say you have been banging this drum that no one's gotten very far, at least on solving this, that I'm aware of. Right? That's still an open problem.
[22:37]
Simon Willison
We've been talking about this problem, and a great illustration of this was Anthropic released Claude Computer Use a few months ago. Fantastic demo. You could fire up a Docker container and you could literally tell it to do something and watch it open a web browser and navigate to a web page and click around and so forth. Really, really, really interesting and fun to play with. And then one of the first demos somebody tried was: what if you give it a web page that says download and run this executable? And it did, and the executable was malware that added it to a botnet. So the very first, most obvious dumb trick that you could play on this thing just worked. Right. So that's obviously a really big problem if I'm going to send something out to book travel on my behalf. I mean, it's hard enough for me to figure out which airlines are trying to scam me and which ones aren't. Do I really trust a language model that believes the literal truth of anything that's presented to it to go out and do those things?
[23:36]
Swix
Yeah, I definitely think there's... It's interesting to see Anthropic doing this, because they used to be the safety arm of OpenAI that split out and said, we're worried about letting this thing out in the wild. And here they are enabling computer use for agents. It feels like things have merged. I'm also fairly skeptical about this always being the year of Linux on the desktop. And this is the equivalent of this being the year of agents, that people are not predicting so much as wishfully thinking and hoping and praying for their companies and agents to work. But I feel like things are coming along a little bit. To me it's kind of like self driving. I remember in 2014 saying that self driving was just around the corner. And I mean it kind of is, you know, like in the Bay Area.
[24:27]
Simon Willison
And then you get in the Waymo and you're like, oh, this works.
[24:30]
Swix
Yeah, it's a slow cook. It's a slow cook. Over the next 10 years we're going to hammer out these things and the cynical people can just point to all the flaws. But there are measurable or concrete progress steps that are being made by these builders.
[24:43]
Simon Willison
There is one form of agent that I believe in. I mostly believe in the research assistant form of agents. Yes, the thing where you've got a difficult problem and I've got, like, I'm on the beta for the Google Gemini 1.5 Pro with Deep Research, I think.
[24:58]
Swix
It's called... these names.
[25:00]
Simon Willison
These names, right. But I've been using that. It's good, right? You can give it a difficult problem and it tells you, okay, I'm going to look at 56 different websites and it goes away and it dumps everything to its context and it comes up with a report for you and it's not. It won't work against adversarial websites.
[25:18]
Brian McCullough
Right.
[25:18]
Simon Willison
If there are websites with deliberate lies in them, it might well get caught out. Most things don't have that as a problem. And so I've had some answers from that which were genuinely really valuable to me. And that feels to me like I can see how, given existing LLM tech, especially with Google Gemini with its, like, million-token context, and Google with their crawl of the entire web, and they've got search, they've got a cache of every page and so forth. That makes sense to me. And what they've got right now, I don't think it's as good as it can be, obviously, but it's a real useful thing which they're going to start rolling out. So Perplexity have been building the same thing for a couple of years. That I believe in. If you tell me that you're going to have an agent that's a research assistant agent, great. The coding agents, I mean, ChatGPT Code Interpreter, nearly two years ago that thing started writing Python code, executing the code, getting errors, rewriting it to fix the errors. That pattern obviously works. That works really, really well. So yeah, coding agents that do that sort of error message loop thing, those are proven to work and they're going to keep on getting better, and that's going to be great. The research assistant agents are just beginning to get there. The things I'm critical of are the ones where you trust this thing to go out and act autonomously on your behalf and make decisions on your behalf, especially involving spending money. Like, that I don't see working for a very long time. That feels to me like an AGI-level problem.
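The "error message loop" pattern mentioned above for coding agents (write code, execute it, read the error, rewrite) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how ChatGPT Code Interpreter is actually implemented: `stub_model` is a hypothetical stand-in for a real LLM call, and `run_with_retries` is an invented helper name.

```python
import traceback
from typing import Optional


def stub_model(task: str, previous_error: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM call; returns Python source code."""
    if previous_error is None:
        # First attempt contains a deliberate bug (misspelled variable name).
        return "result = sum(numbrs)"
    # Pretend the model read the error message and corrected the code.
    return "result = sum(numbers)"


def run_with_retries(task: str, context: dict, max_attempts: int = 3):
    """Generate code, execute it, and feed any error back to the model."""
    error = None
    for attempt in range(1, max_attempts + 1):
        source = stub_model(task, error)
        namespace = dict(context)  # fresh copy so failed runs leave no state
        try:
            exec(source, namespace)
            return namespace["result"], attempt
        except Exception:
            # Keep only a short traceback to pass back as the "error message".
            error = traceback.format_exc(limit=1)
    raise RuntimeError(f"gave up after {max_attempts} attempts: {error}")


value, attempts = run_with_retries("sum the numbers", {"numbers": [1, 2, 3]})
print(value, attempts)
```

The loop works here because success is checkable: the code either runs or raises. Executing model-written code with `exec` is only reasonable inside a sandbox, which ties back to the gullibility concern raised earlier in the conversation.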
[26:48]
Swix
It's funny because I think Stripe actually released an agent toolkit, which is one of the things I featured, that is trying to enable these agents each to have a wallet that they can go and spend and have. Basically, it's a virtual card. It's not that. Not that difficult.
[27:03]
Simon Willison
With modern infrastructure, if you stick a $50 cap on it, then at least everyone can't lose more than $50.
[27:08]
Brian McCullough
You know, I don't know if either of you know Rafat Ali. He runs Skift, which is a travel news vertical, and he constantly laughs at the fact that every agent thing is, we're going to get rid of booking a plane flight for you. And I would point out that historically, when the web started, the first thing everyone talked about is you can go online and book a trip, right? So it's funny, for each generation of, like, technological advance, the thing they always want to kill is the travel agent, and now they want to kill the webpage travel agent.
[27:45]
Simon Willison
I use Google flight search. It's great, right? If you gave me an agent to do that for me, it would save me, I mean, maybe 15 seconds of typing in my things, but I still want to see what my options are and go, yeah, I'm not flying on that airline, no matter how cheap they are.
[27:59]
Swix
Yeah, for listeners. Go ahead. No, go for listeners. I think, you know, I think both of you are pretty positive on NotebookLM. And, you know, we actually interviewed the NotebookLM creators, and there are actually two internal agents going on internally. The reason it takes so long is because they're running an agent loop inside that is fairly autonomous, which is kind of interesting.
[28:19]
Simon Willison
For one definition of agent loop, if you pick one definition. And you're talking about the podcast side of this, right?
[28:25]
Swix
Yeah, the podcast side of things. They have a. There's going to be a new version coming out that we'll be featuring at our. At our conference.
[28:34]
Simon Willison
That one's fascinating to me. Like NotebookLM, I think it's two products, right? On the one hand, it's actually a very good RAG product, right? You dump a bunch of things in, you can run searches that it doesn't always was. And then they added the podcast thing as a bit of a... it's a total gimmick. Right? But that gimmick got them attention, because they had a great product that nobody paid any attention to at all. And then you add the unfeasibly good voice synthesis of the podcast. Like.
[29:03]
Brian McCullough
It's the lesson of, like, Midjourney and stuff like that. If you can create something that people can post on socials, like, you don't have to lift a finger again to do any marketing for what you're doing. Let me dig into NotebookLM just for a second. As a podcaster, as a gimmick, it makes sense. And then obviously you dig into it. It sort of has problems around the edges. Like, it does the thing that all sort of LLMs kind of do where it's like, oh, we want to wrap up with a conclusion. I always call that, like, the 8th grade book report paper problem, where it has to have an intro and... But that's sort of a thing where, because I think you spoke about this again in your piece at the year end, about how things are going multimodal in ways that you didn't expect, like, you know, vision and especially audio. So that's another thing where at least over the last year there's been progress made that maybe you didn't think was coming as quick as it came.
[30:04]
Simon Willison
I don't know. I mean, a year ago we had one really good vision model. We had GPT-4 Vision, which was very impressive. And Google Gemini had just dropped Gemini 1.0, which had vision, but nobody had really played with it yet. Like, people weren't taking Gemini seriously at that point. I feel like it was 1.5 Pro when it became apparent that actually they got over their hump and they were building really good models. And yeah, to be honest, the video models are mostly still using the same trick, the thing where you divide the video up into one image per second and you dump that all into the context. So maybe it shouldn't have been so surprising to us that long context models plus vision meant that video was starting to be solved. Of course, what you really want with video is to be able to do the audio and the images at the same time. And I think the models are beginning to do that now. Like, Gemini 1.5 Pro originally ignored the audio. It just did the one frame per second video trick. As far as I can tell, the most recent ones are actually doing pure multimodal. But the things that opens up are just extraordinary. The ChatGPT iPhone app feature that they shipped as one of their 12 days of OpenAI, I really can be having a conversation and just turn on my video camera and go, hey, what kind of tree is this? And so forth. And it works. And for all I know, that's just snapping a picture once a second and feeding it into the model. But the things that you can do with that as an end user are extraordinary. I don't think most people have cottoned on to the fact that you can now stream video directly into a model, because it's only a few weeks old. But, wow, that's a big boost in terms of what kinds of things you can do with this stuff.
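The "one image per second" trick described above amounts to picking evenly spaced frames and placing each one in the model's context. A minimal sketch of the sampling step, assuming a fixed frame rate; the function names and the `tokens_per_frame` value are illustrative assumptions, not figures from any model's documentation:

```python
import math


def sample_frame_indices(total_frames: int, fps: float, every_seconds: float = 1.0):
    """Return the index of one frame per `every_seconds` of video."""
    duration = total_frames / fps
    count = math.ceil(duration / every_seconds)
    # Clamp to the last valid frame in case of rounding at the end.
    return [min(int(k * every_seconds * fps), total_frames - 1) for k in range(count)]


def estimate_tokens(frame_count: int, tokens_per_frame: int) -> int:
    """Rough context budget; tokens_per_frame is a made-up placeholder."""
    return frame_count * tokens_per_frame


# A 10-second clip at 30 frames per second yields 10 sampled frames.
indices = sample_frame_indices(total_frames=300, fps=30)
print(indices)

# One hour of video at one frame per second, with an assumed 250 tokens per frame.
print(estimate_tokens(frame_count=3600, tokens_per_frame=250))
```

Under that assumed per-frame cost, an hour of sampled video lands just under a million tokens, which is one way to see why long context plus vision made video tractable.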
[31:50]
Swix
Yeah. For people who are not that close, I think Gemini Flash's free tier allows you to do something like capture one photo every second or every minute and leave it on 24/7, and you can prompt it to do whatever. And so you can effectively have your own camera app or monitoring app that you just prompt. And it detects changes, it detects alerts or anything like that, or describes your day. And the fact that this is free, I think it also leads into the previous point of the prices having come down a lot.
[32:30]
Simon Willison
Even if you're paying for this stuff. A thing I put in my blog entry is I ran a calculation on what it would cost to process 68,000 photographs in my photo collection and, for each one, just generate a caption. And using Gemini 1.5 Flash 8B, it would cost me $1.68 to process 68,000 images, which is... I mean, that doesn't make sense. None of that makes sense. Like, it's 1/400th of a cent per image to generate captions now. So you can see why feeding in a day's worth of video just isn't even very expensive to process.
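The arithmetic behind the "1/400th of a cent" figure can be checked directly from the two numbers quoted above ($1.68 total, 68,000 images). The variable names are illustrative only:

```python
total_cost_usd = 1.68   # quoted total for the whole photo collection
image_count = 68_000    # quoted number of photographs

cost_per_image_usd = total_cost_usd / image_count
cost_per_image_cents = cost_per_image_usd * 100

# How many images one cent buys, i.e. the denominator in "1/N of a cent".
images_per_cent = 1 / cost_per_image_cents

print(f"{cost_per_image_cents:.5f} cents per image")
print(f"roughly 1/{round(images_per_cent)} of a cent per image")
```

The result comes out at a little over 400 images per cent, consistent with the rounded figure in the conversation.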
[33:08]
Swix
Yeah. I'll tell you what is expensive: it's the other direction. So here we're talking about consuming video. And this year we also had a lot of progress. Like, probably one of the most anticipated launches of the year was Sora. We actually got Sora, and less exciting...
[33:25]
Simon Willison
We did. And then Veo 2, Google's Sora, came out like three days later and upstaged it. Like, Sora was exciting until Veo 2 landed, which was just better in general.
[33:35]
Swix
I feel the media, or social media, has been very unfair to Sora, because what was released to the world, generally available, was Sora Lite, the distilled version of Sora. Right.
[33:45]
Simon Willison
I did not realize that.
[33:47]
Swix
You're absolutely comparing the most cherry-picked version of Veo 2, the one that they published on the marketing page, to the most embarrassing versions of Sora. So of course it's going to look bad.
[33:56]
Simon Willison
Well, I got access to Veo 2, I'm in the Veo 2 beta, and I've been poking around with it and getting it to generate pelicans on bicycles and stuff.
[34:04]
Swix
I would absolutely believe that Veo 2 is actually better.
[34:06]
Simon Willison
Is Sora... so is full-fat Sora coming soon? Do you know when we get to play with it?
[34:12]
Swix
No one's mentioned anything. I think basically the strategy is let people play around with Sora Lite and get info there, but keep developing Sora with the Hollywood studios. That's what they actually care about. Like the rest of us don't really know what to do with the video anyway.
[34:29]
Simon Willison
I mean, that's my thing, is I realized that for generative images and video, like, images we've had for a few years and I don't feel like they've broken out into the talented artist community yet. Like, lots of people are having fun with them and producing stuff that's kind of cool to look at. But what I want... that movie Everything Everywhere All at Once, right? Won a ton of Oscars. Utterly amazing film. The VFX team for that were five people, some of whom were watching YouTube videos to figure out what to do. My big question for Sora and Midjourney and stuff: what happens when a creative team like that starts using these tools? I want the creative geniuses behind Everything Everywhere All at Once. What are they going to be able to do with this stuff in, like, a few years' time? Because that's really exciting to me. That's where you take artists who are at the very peak of their game, give them these new capabilities and see what they can do with them.
[35:24]
Swix
I know a little bit here. So I should mention that that team actually used Runway ML. So there was, in that movie. Yeah. I don't know how much. So, you know, it's impossible to overstate this, but there are people integrating AI-generated video within the workflow, even pre-Sora.
[35:42]
Brian McCullough
Right. Because it's not the thing where it's like, okay, tomorrow we'll be able to do a full two hour movie that you prompt with three sentences. It is like for the very first part of video effects in film. It's like if you can get that three second clip, if you can get that 20 second thing that they did in the Matrix that blew everyone's minds and took a million dollars or whatever to do. It's the little bits and pieces that they can fill in now that it's probably already there.
[36:07]
Swix
Yeah, I think actually having a layered view of what assets people need and letting AI fill in the low value assets. Right. Like the background video, the background music and you know, sometimes the sound effects that, that maybe may be more palatable. Maybe also changes the way that you evaluate the stuff that's coming out. Because people tend to in social media try to emphasize foreground stuff, main character stuff. So you really care about consistency and you really are bothered when like for example, Sora botches image generation of a gymnast doing flips, which is horrible. It's horrible. But for background crowds, like, who cares?
[36:54]
Brian McCullough
And by the way, again, I was a film major way, way back in the day. That's how it started. Like things like Braveheart where they filmed 10 people on a field and then the computer could turn it into a thousand people on a field. Like that's always been the way it's around the margins and in the background that, that, that first comes in.
[37:12]
Simon Willison
Right?
[37:12]
Brian McCullough
Yeah.
[37:12]
Simon Willison
The Lord of the Rings movies were over 20 years ago, although they had those giant battle sequences which were very early. Like, I mean you could almost call it a generative AI approach. Right. They were using very sophisticated like algorithms to model out those different battles and all of that kind of stuff. Yeah, I know very little, I know basically nothing about film production, so I try not to commentate on it. But I am fascinated to see what happens when, when these tools start being used by the, the people at the top of their game.
[37:42]
Swix
I would say, like, there's a cultural war that is being fought here more than a technology war. Most of the Hollywood people are against any form of AI anyway, so they're busy fighting that battle instead of thinking about how to adopt it. And it's very fringe. I participated here in San Francisco in one generative AI video creative hackathon, where the AI-positive artists actually met with technologists like myself and then we collaborated together to build short films. And that was really nice. And I think I'll be hosting some of those at my events going forward. One thing that I think I want to give people a sense of is, this is a recap of last year, but then sometimes it's useful to walk away as well with, like, what can we expect in the future? I don't know if you got anything. I would also call out that the Chinese models here have made a lot of progress. Hailuo and Kling and God knows who else in the video arena are also making a lot of progress. I think maybe actually China is surprisingly ahead with regards to open weights at least, but also just specific forms of video generation.
[38:50]
Simon Willison
Wouldn't it be interesting if a film industry sprung up in a country that we don't normally think of as having a really strong film industry, that was using these tools? Like, that would be a fascinating sort of angle on this.
[39:04]
Brian McCullough
Agreed. Oh, sorry. Go ahead.
[39:09]
Swix
Just for people's... just to put it on people's radar as well: HeyGen. There's a category of video avatar companies that don't specialize in general video. They only do talking heads. Let's just say HeyGen's doing very well.
[39:26]
Brian McCullough
Swix, you know that that's what I've been using, right? Yeah. Right. So if you see some of my recent YouTube videos and things like that, where... because the beauty part of the HeyGen thing is I don't want to use the robot voice. So I record the MP3 file for my clips every single day and then I put that into HeyGen with the avatar that I've trained it on, and all it does is the lip sync. So it looks, it's not 100% uncanny valley beatable, but it's good enough that if you weren't looking for it, it's just me sitting there doing one of my clips from the show. And yeah, so by the way, HeyGen, shout out to them.
[40:05]
Swix
So I would... in terms of the look ahead, reviewing 2024, looking at trends for 2025, they basically call this out. Meta tried to introduce AI influencers and failed horribly because they were just bad at it. But at some point there will be more and more, basically, AI influencers, not in a way that Simon is, but in a way that they are not human.
[40:32]
Simon Willison
The few of those that have done well, I always feel like they're doing well because it's a gimmick. Right. It's novel and fun, like the AI Seinfeld thing from last year, the Twitch stream, you know. If you're the only one or one of just a few doing that, you'll attract an audience because it's an interesting new thing. But I just, I don't know if that's going to be sustainable longer term or not.
[40:53]
Brian McCullough
Like, I'm going to tell you, because I've had discussions, I can't name the companies or whatever, but so think about the workflow for this. Like, now we all know that on TikTok and Instagram, holding up a phone to your face and doing, like, an "in my car" video, or a walk and talk, you know, that's very common. But also, if you want to do a professional sort of talking head video, you still have to sit in front of a camera, you still have to do the lighting, you still have to do the video editing. Versus, if you can just record what I'm saying right now, the last 30 seconds, if you clip that out as an MP3 and you have a good enough avatar, then you can put that avatar in front of Times Square, on a beach or whatever. So, like, again, for creators, the reason I think, Simon, we're on the verge of something: I think it's not, oh, we're going to have AI avatars take over. It'll be one of those things where it takes another piece of the workflow out and simplifies it.
[41:49]
Simon Willison
I'm all for that. I always love this. I like tools, tools that help human beings do more ambitious things. I'm always in favor of. That's what excites me about this entire field.
[42:00]
Swix
Yeah, we're looking into basically creating one for my podcast. We have this guy, Charlie, he's Australian, he's not real, but he opens every show and we're going to have him present all the shorts.
[42:14]
Brian McCullough
Yeah, go ahead.
[42:16]
Simon Willison
The thing that I keep coming back to is this idea of credibility. In a world that is full of AI generated everything and so forth, it becomes even more important that people find the sources of information that they trust and find people and find sources that are credible. And I feel like that's the one thing that LLMs and AI can never have, is credibility. Right. ChatGPT can never stake its reputation on telling you something useful and interesting because that means nothing. Right. It's a matrix multiplication. It depends on who prompted it and so forth. So I'm always, and this is when I'm blogging as well, I'm always looking for, okay, who are the reliable people who will tell me useful, interesting information, who aren't just going to tell me whatever somebody's paying them to tell them, who aren't going to type a one sentence prompt into an LLM and spit out an essay and stick it online? And that, to me, earning that credibility is really important. That's why a lot of my ethics around the way that I publish are based on the idea that I want people to trust me. I want to do things that gain credibility in people's eyes so that they will come to me for information as a trustworthy source. And it's the same for the sources that I'm consulting as well. I've been thinking a lot about that sort of credibility focus on this thing for a while now.
[43:25]
Swix
Yeah, you can layer or structure credibility or decompose it. So one thing I would put in front of you, I'm not saying that you should agree with this or accept this at all, is that you can use AI to generate different variations. And you, as the final sort of last mile person, you pick the last output and you put your stamp of credibility behind that. Everything's human reviewed instead of human origin.
[43:49]
Simon Willison
That's the thing. If you publish something, you need to be able to be proud of publishing it. You need to be able to say, I will put my name to this, I will attach my credibility to this thing. And if you're willing to do that, then, then that's great.
[44:02]
Swix
For creators, this is huge because there's a fundamental asymmetry between starting with a blank slate versus choosing from five different variations.
[44:09]
Brian McCullough
Right. And also the key thing that you just said is like, if everything that I do, if all of the words were generated by an LLM, if the voice is generated by an LLM, if the video is also generated by the LLM, then I haven't done anything. Right. But if one or two of those, you take a shortcut. But it's still. I'm willing to sign off on it. I feel like that's where I feel like people are coming around to like, this is maybe acceptable, sort of.
[44:39]
Simon Willison
This is where I've been pushing the definition. I love the term slop. Where I've been pushing the definition of slop as AI generated content that is both unrequested and unreviewed. And the unreviewed thing is really important. The thing that elevates something from slop to not slop is if a human being has reviewed it and said, you know what? This is actually worth other people's time. And again, I'm willing to attach my credibility to it and say, hey, this is worthwhile.
[45:02]
Brian McCullough
It's the curatorial and editorial part of it, that no matter what the tools are to do shortcuts, to do, as Swix is saying, choose between different edits or different cuts. But in the end, if there's a curatorial mind or editorial mind behind it, I want to wedge this in before we start to close. One of the things coming back to your year end piece, that has been something that I've been banging the drum about is when you're talking about LLMs getting harder to use. You said most users are thrown in at the deep end. The default LLM chat UI is like taking brand new computer users, dropping them into a Linux terminal and expecting them to figure it all out. I mean it's literally going back to the command line. The command line was defeated by the GUI interface. This is what I've been banging the drum about is this cannot be the user interface. What we have now cannot be the end result. Do you see any hints or seeds of a GUI moment for LLM interfaces?
[46:11]
Simon Willison
I mean, it has to happen. It absolutely has to happen. The usability of these things is turning into a bit of a crisis, and we are at least seeing some really interesting innovation in little directions. Just like OpenAI's ChatGPT Canvas thing that they just launched. That is at least going a little bit more interesting than just chat, chats and responses. Exploring that space where you're collaborating with an LLM, you're both working on the same document. That makes a lot of sense to me. That feels really smart. One of the best things is still, who was it who did the UI where you could... they had a drawing UI where you draw an interface and click a button? tldraw, with their "make it real" thing. That was spectacular. Absolutely spectacular. Like, an alternative vision of how you'd interact with these models. So I feel like there is so much scope for innovation there and it is beginning to happen. Like, I feel like most people do understand that we need to do better in terms of interfaces that both help explain what's going on and give people better tools for working with models.
[47:19]
Brian McCullough
I was going to say I want to.
[50:04]
Brian McCullough
Dig a little deeper into this, because think of the conceptual idea behind the GUI, which is, instead of typing into a command line "open word.exe", you click an icon, right? So that's abstracting away, again, the programming stuff, so that a child can tap on an iPad and make a program open. But the problem, it seems to me, right now with how we're interacting with LLMs is it's sort of like a dumb robot, where it's like you poke it and it goes over here, but no, I want to go over here. So you poke it this way and you can't get it exactly right. What can we abstract away from the current what's-going-on that makes it more fine-tuned and easier to get more precise? You see what I'm saying?
[50:55]
Simon Willison
Yes. This is the other trend that I've been following from the last year, which I think is super interesting. It's the prompt-driven UI development thing. Basically, this is the pattern where Claude Artifacts was the first thing to do this really well. You type in a prompt and it goes, oh, I should answer that by writing a custom HTML and JavaScript application for you that does a certain thing. And when you think about that, and since then it turns out this is easy, right? Every decent LLM can produce HTML and JavaScript that does something useful. So we've actually got this alternative way of interacting where they can respond to your prompt with an interactive custom interface that you can work with. People haven't quite wired those back up again. Ideally I'd want the LLM to be able to ask me a question where it builds me a custom little UI for that question and then it gets to see how I interacted with that. I don't know why, that's just such a small step from where we are right now, but that feels like such an obvious next step. Like, why should you just be communicating with an LLM with text when it can build interfaces on the fly that let you select a point on a map or move, like, sliders up and down?
[52:07]
Brian McCullough
All of that. Knobs and dials. I keep saying knobs and dials.
[52:11]
Simon Willison
We can do that, and the LLMs can build... Claude Artifacts will build you a knobs and dials interface. But at the moment they haven't closed the loop. When you twiddle those knobs, Claude doesn't see what you were doing. They're going to close that loop. I'm shocked that they haven't done it yet. So, yeah, I think there's so much scope for innovation and there's so much scope for doing interesting stuff with that model, where anything you can represent in HTML, JavaScript and SVG, which is almost everything, can now be part of that ongoing conversation.
[52:42]
Swix
Yeah, I would say the best executed version of this I've seen so far is Bolt, where you can literally type in, make a Spotify clone, make an Airbnb clone, and it actually just does that for you. Zero shot with a nice design.
[52:58]
Simon Willison
Did you see there's a benchmark for that now? The LMArena people now have a benchmark that is zero-shot web app generation, because all of the models can do it. I've started figuring out how I'm building my own version of this for my own project, because I think within six months it'll just be an expected feature. If you have a web application, why don't you have a thing where, oh, look, you can add a custom... For my Datasette data exploration project, I want you to be able to do things like conjure up a dashboard just via a prompt. You say, oh, I need a pie chart and a bar chart, and put them next to each other, and then have a form where submitting the form inserts a row into my database table. And this is all suddenly feasible. It's not even particularly difficult to do, which is utterly bizarre, that these things are now easy.
[53:44]
Swix
I think for a general audience, that is what I would highlight: that software creation is becoming easier and easier. Gemini is now available in Gmail and Google Sheets. I don't write my own Google Sheets formulas anymore. I just tell Gemini to do it. And so I think those are... I almost want to basically somewhat disagree with your assertion that LLMs got harder to use. Yes, we expose more capabilities, but they're in minor forms, like using Canvas, like web search in ChatGPT, and like Gemini being in Excel sheets or in Google Sheets. Like, yeah, we're getting...
[54:21]
Simon Willison
No, no, no, no. Those are the things that make it harder, because the problem is that each of those features is amazing if you understand the edges of the feature. If you're like, okay, so in Google Gemini Excel formulas, I can get it to do a certain amount of things, but I can't get it to go and read a web... You probably can't get it to read a web page, right? But there are things that it can do and things that it can't do, which are completely undocumented. If you ask it what it can and can't do, they're terrible at answering questions about that. So my favorite example is Claude Artifacts. You can't build a Claude artifact that can hit an API somewhere else, because the CORS headers on that iframe prevent accessing anything outside of cdnjs. So good luck learning CORS headers as an end user in order to understand why. Like, I've seen people saying, oh, this is rubbish, I tried building an artifact that would run a prompt and it couldn't, because Claude didn't expose an API with CORS headers. All of this stuff is so weird and complicated. And yeah, the more tools we add, the more expertise you need to really understand the full scope of what you can do. And so the question really comes down to: what does it take to understand the full extent of what's possible? And honestly that's just getting more and more involved over time.
[55:43]
Swix
Yeah. I have one more topic that I think you're kind of a champion of and we've touched on it a little bit, which is local LLMs and running AI applications on your desktop. I feel like you are an early adopter of many, many things.
[55:58]
Brian McCullough
Wow.
[55:59]
Simon Willison
I had an interesting experience with that over the past year. Six months ago I almost completely lost interest. And the reason is that six months ago, with the best local models you could run, there was no point in using them at all because the best hosted models were so much better. There was no point at which I'd choose to run a model on my laptop if I had API access to Claude 3.5 Sonnet. They just weren't even comparable. And that changed basically in the past three months, as the local models had this step change in capability where now I can run some of these local models and they're not as good as Claude 3.5 Sonnet, but they're not so far away that it's not worth me even using them. The continuing problem is I've only got 64 gigabytes of RAM, and if you run something like Llama 3 70B, most of my RAM is gone. So now I have to shut down my Firefox tabs and my Chrome and my VS Code windows in order to run it. But it's got me interested again. The efficiency improvements are such that now if you were to stick me on a desert island with my laptop, I'd be very productive using those local models, and that's pretty exciting. And if those trends continue... also, I think my next laptop, when I buy one, is going to have twice the amount of RAM, at which point maybe I can run almost the top tier of open-weight models and still be able to use it as a computer as well. Nvidia just announced their $3,000, 128 gigabyte monstrosity. That's a pretty good price. You know, that's, that's if things is...
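A rough back-of-the-envelope sketch of the RAM squeeze Simon describes. It assumes the weights dominate memory use and ignores context cache and runtime overhead; the function name and the bit widths are illustrative choices, not figures from the conversation.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (10^9 bytes).

    params_billion * 1e9 weights, each taking bits_per_weight / 8 bytes,
    divided by 1e9 bytes per gigabyte: the 1e9 factors cancel.
    """
    return params_billion * bits_per_weight / 8


# Compare a 70B-parameter model at a few assumed quantization levels
# against a 64 GB laptop.
for bits in (16, 8, 4):
    size = weights_gb(70, bits)
    verdict = "fits" if size < 64 else "does not fit"
    print(f"70B at {bits}-bit: ~{size:.0f} GB ({verdict} in 64 GB)")
```

Under these assumptions only the 4-bit version fits, at roughly 35 GB, which is more than half of a 64 GB machine before the browser and editor get any memory.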
[57:30]
Swix
Custom OS and all.
[57:34]
Simon Willison
If I get a job, if I have enough of an income that I can justify blowing $3,000 on it, then yes.
[57:40]
Brian McCullough
Okay, let's do a GoFundMe to get Simon one of it.
[57:43]
Swix
Come on. You know you can get a job anytime you want. This is just purely discretionary.
[57:48]
Simon Willison
I want a job that pays me to do exactly what I'm doing already and doesn't tell me what else to do. That's the challenge.
[57:54]
Swix
I think Ethan Mollick does pretty well whatever it is he's doing. But yeah, basically I was trying to bring in also, you know, not just local models, but Apple Intelligence is on every Mac machine. You seem skeptical. It's rubbish.
[58:10]
Simon Willison
Apple Intelligence is so bad.
[58:12]
Swix
Like it does one thing well.
[58:14]
Simon Willison
Oh yeah, what's that?
[58:15]
Swix
It summarizes notifications and sometimes it's humorous.
[58:18]
Brian McCullough
Are you sure it does that well? And also, by the way, the other, again, from a sort of a normie point of view. There's no indication from Apple of when to use it. Like everybody upgrades their thing and it's like, okay, now you have Apple Intelligence and you never know when to use it ever again.
[58:35]
Swix
Oh yeah, you consult the Apple docs, which is MKBHD.
[58:40]
Simon Willison
The one thing I'll say about Apple Intelligence is one of the reasons it's so disappointing is that the models are just weak. But now, like, Llama 3B is such a good model in a 2 gigabyte file. I think give Apple six months and hopefully they'll catch up to the state of the art in the small models, and then maybe it'll start being a lot more interesting.
[58:59]
Swix
Yeah. Anyway, this was year one, and just like the first year of the iPhone, maybe not that much of a hit. And then year three they had the App Store, so I would say give it some time. And I think Chrome is also shipping Gemini Nano this year, which means that every web app will have free access to a local model that just ships in the browser, which is kind of interesting. And then I think I also wanted to just open the floor for any of us. What are the apps, the AI applications, that we've adopted that we really recommend? Because these are all apps that are running in a browser or apps that are running locally that other people should be trying. I feel like that is one thing that is helpful at the start of the year.
[59:51]
Simon Willison
Okay, so for running local models, my top picks. Firstly, on the iPhone, there's this thing called MLC Chat, which works and is easy to install, and it runs Llama 3B and it's so much fun. It's not necessarily a capable enough model that I use it for real things, but my party trick right now is I get my phone to write a Netflix Christmas movie plot outline where, like, a jeweler falls in love with the king of Sweden or whatever, and it does a good job and it comes up with pun names for the movies. And that's deeply entertaining. On my laptop, most recently I've been getting heavy into Ollama, because the Ollama team are very, very good at finding the good models and packaging them up and making them work well. It gives you an API. My little LLM command line tool has a plugin that talks to Ollama, which works really well. So Ollama is, I think, the easiest on-ramp to running models locally. If you want a nice user interface, LM Studio is, I think, the best user interface for that. It's not open source. It's good. It's worth playing with. The other one that I've been trying recently, there's this thing called... what's it called? Open WebUI or something. The UI is fantastic. If you've got Ollama running and you fire this thing up, it spots Ollama and it gives you an interface onto your Ollama models. And that's really nicely done. That's my current favorite open source UI for these things. But yeah, so there's lots of good options. You do need a lot of disk space. The models start at 2 gigabytes for the 3B models that are actually worth playing with. The really impressive ones tend to be in the sort of 20 to 30 gigabyte range, in my experience.
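A minimal sketch of what "it gives you an API" can look like in practice, using only the Python standard library. It assumes Ollama is running on its default local port (11434) and that a model has already been pulled; the model tag `llama3.2:3b` and the helper names are illustrative assumptions, not something named in the conversation.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="llama3.2:3b", url=OLLAMA_URL):
    """Build (but do not send) a POST request for Ollama's generate endpoint."""
    payload = {
        "model": model,    # assumed tag; use whatever model you have pulled
        "prompt": prompt,
        "stream": False,   # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3.2:3b"):
    """Send the prompt to the local Ollama server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example usage (requires a running Ollama server):
# print(generate("Outline a Netflix Christmas movie about a jeweler."))
```

Building the request separately from sending it keeps the network call optional, so the payload can be inspected without a server running.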
[61:35]
Swix
Yeah. I think my struggle here is I'm not that much of an absolutist in terms of running things locally, like I'm happy to call an API.
[61:44]
Simon Willison
Same here.
[61:45]
Swix
Okay. Yeah, I just, I do it to play.
[61:48]
Simon Willison
Yeah, it's my research interest. Yeah.
[61:50]
Brian McCullough
But when people get so excited, answer your own question. Give us more apps that you want to.
[61:58]
Swix
Yeah, sometimes it's just nice to recommend apps. So I use superwhisper now. I tried Wispr Flow, which didn't really work for me. Superwhisper is one of them, which basically replaces typing. Like, you should just talk most of the time, especially if you're doing anything long form. I hold down caps lock and I talk. And then when I'm done, I lift it up. And it's not just about writing down your transcripts, because I make ums and ahs all the time. I restate myself all the time. But it uses GPT-4 to rewrite, and that's what these guys are doing. They're all doing some form of state-of-the-art ASR, automatic speech recognition, and then an LLM to rewrite. And then I think I would also recommend for people to check out Rosebud for journaling. I think AI for mental health is quite unexplored. And it's not because we are trying to build AI therapy. I think the therapists really hate that. You'll never be on the level of a therapist.
[62:56]
Brian McCullough
That gets back to the human thing that we were discussing. You know, on some level, there are certain things and disciplines that require the human touch. And that might be.
[63:05]
Swix
Sure, but the human touch costs me $300 an hour. Yes, right. And this thing's $3 a month. Like, you know, so there's a spectrum of people for whom that will work. And I think it's cheap now to try all these things.
[63:23]
Simon Willison
I'm going to throw in a quick recommendation for an app. MacWhisper is my favorite. That's a desktop app. I love that thing. It runs Whisper and you can do things like paste in the URL to a YouTube video and it'll pull the audio and give you a transcript. So that's how I watch YouTube now. I slap it into MacWhisper, and then I hit copy and paste into Claude, and then I use the Claude web app to do things. But MacWhisper works with MP3 files. Every time I'm on a podcast, I dump the MP3 into MacWhisper. Then I dump the transcript into Claude and say, what should I put in the show notes? And it spits out a bullet point list where it says, oh, you mentioned Datasette, you should link to that, that kind of thing. MacWhisper. I use it several times a day, to be honest. It's great.
[64:07]
Swix
Yeah.
[64:09]
Brian McCullough
I'm going to say one that is incredibly super basic. And again, coming back to just my workflow. But we are currently recording this on Riverside. Riverside is a great tool for recording video, audio, things like we're doing right now. But I always use this as an example to folks when they're like, well, what will AI do for me? We're recording three different channels right now, right? You guys are recording locally. So there's three audio files, three video files. And when I first started using Riverside, you had to pump three tracks into Adobe and then edit. Okay, now we focus on Simon. Now we focus on Swix. Now we focus on Brian. Now we do all three. And then one day a tool popped up that says, hit this button and it's smart edit. And then the AI determines: okay, Simon has been talking for 30 minutes, so go to the full shot of him. And Brian is now talking, or there's over-talk, so let's have all three talking heads. With one button. For anything I posted, it saved me three or four hours' worth of work. That to me is like, again, if normies are listening.
[65:18]
Simon Willison
Riverside has that feature now.
[65:20]
Swix
Yeah, yeah. I don't use it.
[65:22]
Simon Willison
Oh, that sounds fantastic.
[65:24]
Swix
I still use a human editor.
[65:26]
Brian McCullough
The day it came out, I was running around the house telling my wife, telling anyone that would listen, you don't know. I just saved three hours because they had a new feature. Like, that's.
[65:35]
Swix
That's Brian's basically crying with joy right now.
[65:40]
Brian McCullough
All right, let's try to bring this to a landing a little bit. Simon, I have about maybe two or three more. We can do these rapid fire. One of my shows, one of the things of my show is it's sort of like Silicon Valley writ large. So it's sort of like the horse race of who's up and who's down or whatever. To the degree that you're interested in pontificating on this. OpenAI as a company in 2025, do you see challenges coming? Are you bearish? Bullish. I almost am doing a CNBC sort of thing. But, like, how do you feel about OpenAI this year?
[66:14]
Simon Willison
I think they're in a bit of trouble. They seem to have lost a lot of talent. Like, they're losing and they don't have that... If it wasn't for o3, they'd be in massive trouble, because they'd have lost that top-of-the-pile thing. I think o3 clawed them back up again. But one of the big stories of 2024 is OpenAI started as the clear leader and now Google Gemini is really good. Google Gemini had an amazing year. Anthropic's Claude 3.5 Sonnet is still my personal favorite model, and that feels notable. A year ago, nobody would argue OpenAI were not the leader in all of this stuff. And today they're still doing great, but they're not as far ahead as they were.
[66:57]
Brian McCullough
Next question. And maybe this couldn't be as rapid fire, but I loved, finally, from your piece, the idea that LLMs need better criticism, which I'd love you to expand on, because as I sort of straddle this world of tech journalism and creator and investor and all that stuff, I thought that you had a really interesting thing to say about how. And we even alluded to this about like Hollywood being against it, like, better criticism in the sense that as I took it, everybody is sort of. They've got their hackles up, they're trying to defend their livelihoods and things like that. But it's either this is going to destroy my job and destroy the world, or like, I'm sorry, I'm again leading the witness. What did you mean by LLMs need better criticism?
[67:41]
Simon Willison
So this is a frustration I have: if I read a discussion thread somewhere on this topic, I can predict exactly what everyone's going to say. People talk about the environmental impact. They talk about the plagiarism of the training data, the unlicensed training data. There's often this sort of, oh, and these things are completely useless thing. That's the one that I will push back against. The other things are true, right? The idea that LLMs are just completely useless: the argument I always make there is they are very useful if you understand how to use them, which is distinctly unintuitive. Like, you have to learn how to deal with something that will just wildly hallucinate and make things up and all of those kinds of things. If you can learn what they're good at and what they're bad at... I use them dozens of times a day and I get enormous value out of them. So I'll push back on people who say, no, they're just useless. But the other things, you know, the environmental impact, the way the training data works. I feel like the training data one's interesting because it's probably legal under fair use, but it's clearly unfair if somebody takes your work without your permission and trains a model which then competes with you in the marketplace. Like, legal or not, I understand why people are upset about that. That's a reasonable thing to be upset by. And I also feel like the impact that this stuff can have on society, especially as it starts undermining all sorts of jobs that we never thought were going to be undermined by technology... Like, who thought it would come for artists and lawyers first? Right? That's bizarre. We need to have really high quality conversations where we help people figure out what works and what doesn't work. We need people to be able to make good decisions about what to do with their careers, to embrace this stuff and all of that sort of stuff.
And if we just get distracted by saying, yeah, but it's useless, plagiarism-driven, environmentally catastrophic... Even though those things represent quite a lot of truth, I don't think that that's a useful message to lead with. I want to be having the much more interesting high-level conversations. Okay, well, if there are negatives, what do we do to counter those negatives? If there are positives, how do we encourage those? How do we help people make good decisions about how to use this technology?
[69:49]
Swix
I think where I see this the most is for people who are kind of very internal. Like, you and I are immersed in this every single day, so we're frankly tired of the same debates being recycled again and again. I think what might be more useful or more impactful is the level at which it starts to hit regulation. Last year we had a couple of very notable attempts at the White House level and at the California level to regulate AI, and those did not come to pass. But at some point these criticisms bubble up to law, to matters of national security or national scientific progress. And I feel like there needs to be more information or enlightenment there, if only because they tend to be very trailing. Like my favorite example to pick on, which is very unfair of me, but whatever. The California SB 1047 act tried to cap compute at 10 to the power 25, which is exactly DeepSeek. Exactly. Well, it also is exactly the point at which we pivoted from training GPT-5 to o1, where there is no longer scaling of pre-training compute. What I'm saying is, we're always trying to regulate the last war, and I don't think that works in a field that is basically eight years old.
[71:15]
Simon Willison
There are two areas of regulation I'm super interested in. One of them is I do think that regulating the way these things are used can work. The big example is I don't want somebody's insurance claim denied by a black box LLM where nobody can explain what it did.
[71:32]
Swix
Oh, we have laws for that.
[71:33]
Simon Willison
This is like redlining. Those laws: take those laws, reinforce them, update them for modern capabilities. And then the other one: there's some really interesting stuff around privacy. We've got this huge problem right now where people will refuse to use any of these tools because they don't trust that the things they say to it won't be trained on and then exposed to other people. And there are lots of terms and conditions that you can read through and try and navigate around. I would love there to be just really straightforward laws that people understand, where they know that it's not going to train on their input because there's a law that says under these circumstances that can't happen. That sort of stuff. It's basically taking our existing privacy laws and giving them a few more teeth and just reinforcing them without introducing cookie banners a la the European Union. Right. It's always very risky to try and get this stuff right, because you can have all sorts of bad results if you don't design the laws correctly. But there's space for that, I think.
[74:16]
Brian McCullough
Yeah, when I read that piece, and then when Swix just said we're in the weeds on this every single day so we're tired of hearing these arguments, it reminds me of folks that are always into politics, and they're mad at the people that don't care about politics until it's an election year. And then they're like, well, you're a low information voter because all you know is that the factory in your town got shut down or there's inflation or whatever. And so you vote one way or the other, but you haven't been paying attention. But that's kind of the point: you shouldn't expect normal people to pay attention, except for the fact that, oh, this might lose me my job. So you can't blame them for being, I don't know if reactionary is the word, or emotional. But.
[75:03]
Sponsor
Right.
[75:04]
Brian McCullough
If you're in the weeds, it's harder to keep everybody informed. And this is gonna touch everybody. So I don't know. Okay, so this is the very last one and then we can wrap and do plugs and everything. But Simon, this is for you. It was kind of alluded to a little bit and you might not have one, but is there something this year that a generalist like me is not aware of, that is coming down the pike, that you think is going to be big in the AI space? And maybe, Sean, if you've got one too, what do you think it would be?
[75:37]
Simon Willison
I think it's for most people who haven't been paying attention. We know these things already. We know that the models are now almost free to run things against. The fact that you can now do video, like stream video to a model. The one that I've not played with nearly as much is the thing where you can share your entire screen with a model and get feedback there. That's going to be really useful. Again, the privacy side of things really matters, though. I do not want some model just training on everything that it sees on my screen. But no, I feel like the stuff that is now possible as of a few months ago, that's enough. I don't need anything new. That's going to keep me busy all year.
[76:14]
Brian McCullough
Swix, you got one?
[76:15]
Swix
Simon's always too content and then he sees the next thing and he's like, oh, yeah, that's great too.
[76:21]
Simon Willison
Yep.
[76:23]
Swix
Okay. I love trying to be contrarian by saying, what does everyone hate right now? Remember this time last year, we just had CES. We had the Rabbit R1. We had the Humane wearables.
[76:36]
Brian McCullough
Wearables? Yeah.
[76:37]
Swix
Those are completely in the gutter. No one will touch them. They're toxic nuclear waste. Okay. This year is the year of wearables.
[76:44]
Brian McCullough
Yeah. Yeah, I agree with you, by the way. That cycle always works out where, like you go to a CES and it's everything hype hype, hype, hype. And then three years later, it becomes the thing. Unless it's 3D TVs, in which case that was a mistake.
[76:59]
Simon Willison
Anyway, transparent TVs are the big thing. The last couple is. What the hell.
[77:05]
Swix
Yeah, yeah. You know, I think Simon may have got one of these, but there are a lot of people working on AI wearables here in SF. They are surprisingly cheap, surprisingly capable, with decent battery life, and they do useful things. We have to work out the privacy aspect, of course, but people like Limitless, which used to be called Rewind, I think they're shipping one of these wearables that, based on your voice, only records your voice. So you opt in.
[77:36]
Simon Willison
Interesting, right?
[77:37]
Swix
Right. And so you can have perfect memory if you want. You can have perfect memory at work. Your employer can buy these for you. It only applies at work. And it's fine. It's just a meeting aid. Lots of people use Granola or Fireflies or some of these meeting recorders, only for online meetings. But what about in-person meetings? What about conversations and locations that you've been to? And some of that should be a choice. Right now, you have zero choice. And I think these wearables will enable some of that. And it's up to us as a society to determine what's acceptable and what's not. I really like these gray areas where we still don't know yet. Whenever I tell people about this, they're like, I don't know. I guess it's as though you have perfect memory, but some people have better memory than others. Where's the line? And there will be a lot more of these.
[78:30]
Brian McCullough
I would add to that, because Swix, as you know, because you listen to my show, AI has taken smart glasses and completely changed everyone's mind about that as a product category and form factor. And I should say this from things that I've been looking at investing in: wait till you see what they can add on to earbuds. The earbuds in your ear can do a lot more things than they're doing now. And then you combine that with smart glasses, and you combine that with an LLM that you can access, maybe with a phone as the mothership. There's some interesting things. CES next year is going to be crazy if you think wearables, AI wearables, are a thing. Anyway.
[79:16]
Swix
This year they were not a thing. There were very much no wearables at CES.
[79:22]
Simon Willison
This one's interesting as well, because the thing that makes these interesting is multimodal: audio input, video input, image input, which a year ago was hardly a thing and now it's dirt cheap. So yeah, we're much better positioned now than we were 12 months ago to build the software behind this stuff.
[79:37]
Brian McCullough
All right, let's bring this to a landing. Swix, go first. Tell everybody about obviously your podcast, which hopefully we're simulcasting, but also your conferences, events, everything.
[79:52]
Swix
Sure, yeah, you can find my work on Latent Space. It's the AI Engineer podcast. Much more sort of focused on serving engineers and developers than the general audience. But feel free to dive in to the deep end with us. And we are also hosting a conference in New York in February, the AI Engineer Summit, where we gather people. And this one is entirely focused on agents. As much as people like to make fun of the idea that every year is the year of agents at work, I think people at least want to gather to figure out what are the open problems to solve. And so these are the community of builders that get together, they show their latest work. I have Instacart coming to show how to use agents for their recommendation system and their sort of background jobs and internal jobs. And we have a whole bunch of sort of financial tech company, fintech or finance companies also showing off their work that I cannot name yet, but it'll be lots of fun. We do high quality events that sometimes people like Simon speak at.
[80:55]
Brian McCullough
Right. As I said, or I think I said online or on air that I saw Simon speak at one of your events last year. Wait, Swix, just say again, it's in February, it's in New York City. I'm going to be there if that matters to anybody, if that's an attraction. But what's the dates on that and how to apply it?
[81:11]
Swix
Yes, sir. I'm horrible at this. February 20th and 21st. The 20th is the leadership day for management, like VPs of AI and CTOs. And the 21st is the engineer day: the individual contributors, hands-on-keyboard people. And that's when I'll have the big labs. So DeepMind, Anthropic, Meta, OpenAI, all coming to share their agents work. And then we'll have some new launches as well that you haven't heard of.
[81:35]
Brian McCullough
And to sign up to attend. What website can I go to?
[81:38]
Swix
Yeah, it's apply.ai.engineer.
[81:42]
Brian McCullough
All right, Simon, I'm going to hold hand you or handhold you even more. Your web blog is SimonWillison.net but what else would you like us to know or go find out about what you're doing?
[81:53]
Simon Willison
Yeah, I was going to say my blog. My day job, I call it a job, is I work on open source tools for data journalism. That's my project, Datasette, spelt like the word cassette, but data: datasette.io. And that's beginning to grow some interesting AI tools. Originally it was all about data publishing, exploration and analysis. And now I'm like, okay, well, what plugins for that can I build that let you use LLMs to craft queries and build dashboards and all sorts of bits and pieces like that? So I'm expecting to have some really interesting product features along those lines in the next few months.
[82:28]
Brian McCullough
And I'll end by saying if anyone's listening to this on Swix's show, I do the Techmeme Ride Home every single weekday. A 15 minute long tech news podcast. Look up Ride Home on your podcast app of choice. Techmeme Ride Home. Gentlemen, thank you for your time. Thank you. This was fantastic. What a great way to start the year for this show.
[82:50]
Simon Willison
Well, thanks a lot for having me. This has been really fun.
[82:53]
Swix
Yeah, thanks for having us. Honored to be on.