B
Welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we will summarize and discuss some of last week's most interesting AI news, and I think some that's a little bit older. We are in a two-week phase right now, but soon we'll be back to weekly, I promise. Anyway, as always, you can also check out the Last Week in AI newsletter at lastweekin.ai, which will send a whole bunch more news to your email. I am one of your regular hosts, Andrey Kurenkov. I studied AI in grad school and now work at Astrocade. And once again we have a guest co-host, Michelle Lee.
A
Hey everyone, I am Michelle. I also studied AI in grad school with Andrey, and now I am the founder and CEO of a company called Medra, where we are building physical AI scientists.
B
Yeah, you co-hosted for the first time just a couple of weeks ago, and it was super fun. So you graciously agreed to co-host again, which I think will be a great time. And this episode will be fairly low-key, I think. Surprisingly, there's not been any crazy big news in the world of AI lately, but there is a smattering of different things going on as usual. So we've got some model releases: GPT-5.1, Ernie, a few other things. In business stories, there are always more billions being spent on data centers, and a lot of stuff going on with self-driving cars, which we'll touch on again. Moving on to open source, we are still getting more and more models coming out of China, which is a trend we'll touch on a bit. In research and advancements, there's some pretty interesting research from OpenAI and, again, other trends going on. And we will touch on a bit of policy and a bit of media and art. So it should be a fun, relatively short episode, and let's kick it off with tools and apps. We've got GPT-5.1, which was just announced. There's GPT-5.1 Instant, which is meant to be a warmer and more intelligent model, and GPT-5.1 Thinking, which is faster on simple tasks and will take longer on complex tasks. And this will be a replacement for the previous GPT-5 models. They'll be available for another few months and then they'll become legacy models. So it's a bit of an interesting development. A lot of people seemed to not like GPT-5 when it came out because they liked GPT-4o, and in fact were angry that GPT-4o was taken away. So this seems to be OpenAI trying to thread the needle of making it friendly like GPT-4o, but also not as sycophantic as it was.
A
Yeah, and it looks like there are now more personality presets for the new models too. So people can choose between friendly, quirky, nerdy, cynical and a couple of other ones. So it gives users a lot more options to decide between the personalities that they want for their models.
B
Yeah, which I think is interesting. I always think back to the mind-blowing fact that ChatGPT has 800 million weekly users or something, and, you know, the two of us presumably are just using LLMs and chatbots for work, to be productive. But I can definitely see a lot of people just chatting with these chatbots, and in that sense having these friendly, quirky, efficient, cynical, different personalities makes a lot of sense. And I could even see myself talking to a nerdy ChatGPT and having a bit of fun with that.
A
Well, something I started doing recently is, on my drive to work, I will just turn on ChatGPT and talk to it and ask it to teach me new things. Or, since I've recently been really into poker, I'll ask it to give me poker drills that I can do in the car. And it's surprisingly really fun and helpful. And I do sometimes find some of the preset personalities pretty annoying when I'm talking to it. And so I do think that as we are using LLMs more and more, even in everyday situations, treating them as a teacher the way I do, the different personalities will start to matter.
B
Yeah, for sure. And onto another big model release, but not from OpenAI. This is from Baidu. They have announced Ernie 5.0 and various other products in their AI offerings. There was actually Baidu World 2025, which just happened, and that's where they announced this model. It is now part of Ernie Bot, and it's going to enterprise customers. And interestingly, it seems to have been met with some disappointment. Baidu's stock went down by 10% after this. This was seen as a bit of an incremental move, in a way similar to GPT-5 from what I can tell. But yeah, there's a variety of announcements here. There's also news about the autonomous ride-hailing service Apollo Go, which is now completing quite a few trips, actually 17 million rides. Baidu search is starting to include more AI capabilities. So I always try to keep an eye on these kinds of products in China, which are huge and will have a lot of impact, but which our listeners, who are mostly in the West, may not be as familiar with.
A
I think part of this is probably just that this past week so many new models came out of China, and I don't think Ernie was the most impressive one. A lot of today's episode will be going over the different models, and unfortunately I don't think Ernie was as impressive as some of the others that came out. But what was really interesting was I read that Baidu's ride-hailing service is now operational in 22 cities. That's pretty crazy. If you look at Waymo and Robotaxi and now even Zoox, they are really operating in far fewer cities compared to Apollo Go, which is Baidu's service.
B
Yeah, I think Waymo is now trying to get up to something like eight, and Tesla is now trying to get to two. So definitely, yeah, quite a bit ahead on that one. Well, next up we've got another release from China. ByteDance has their Volcano Engine, and they're releasing Doubao Seed Code, which is a competitor to Claude Code, Cursor, Cline, all these various coding agents. And one of the big deals with it is that it's pretty cheap. So the initial price is only $1.3 for the first month. After that it's going to be $5.5, and it seems to be quite good. So I haven't looked into this too deeply, but I did see some discussion online that this is not, let's say, as good as OpenAI Codex or Claude Code, but it's pretty good and it's much cheaper relative to how much you would pay. So again, we've seen a ton of competition on the coding agent front and the coding tooling front. And it makes sense that we are now getting these kinds of releases from Chinese model developers as well.
A
More to come, I'm sure.
B
All righty. And now getting back to the West, we've got some news from Google. They are adding a whole bunch of shopping features to AI Mode, AI Mode being their kind of full-on AI search in Google. So they had this blog post titled "Let AI do the hard parts of your holiday shopping," which came out just earlier this week. And it goes over a bunch of things. You can shop conversationally, so it will do a search for you and show you products. It adds support for actually making calls for you and giving you the summary at the end. There's agentic checkout that is starting to roll out. So basically the AI can buy stuff for you, which is similar to what we've seen from OpenAI, which is also adding some of these kinds of features and being able to integrate with Shopify. So this seems to be a killer money generator that Google and ChatGPT and some of these other companies are betting on.
A
Nice. I guess this is just in time for Black Friday and also for the holidays. And we shall see how this works.
B
Yeah, apparently the agentic features are starting with some specific merchants like Wayfair, Chewy, Quince and some Shopify sellers. So it's probably not full-on "buy anything for you" yet. You're not going to be able to go to Facebook Marketplace and buy some used products. But you know, if you have a lot of people to get Christmas gifts for, I could see people using it. I don't know.
A
Yeah, I wonder how this will affect adoption of AI by the actual merchants. I know Amazon had issues, maybe. Was it Perplexity, where they didn't want Perplexity to shop on Amazon? But if, you know, you don't let your website be connected to these agentic checkout features and insist on using your own, I wonder how that's going to evolve.
B
Yeah, I think, given the rise of AI browsers and AI search, it really seems like a natural next step. People are going to start finding what they want to buy via search. They're going to have agents in their browser or whatever. Why not just let the agent do the final bits of work there?
A
Yeah.
B
And now moving away from chatbots to world models. The story is that Fei-Fei Li's World Labs has released Marble, their first commercial product. So World Labs has been out of stealth for about a year now. A couple of months ago they introduced a beta release of their work on world models, where, at the time, with a prompt you were able to generate a 3D environment to navigate around. And it was pretty impressive. You know, you had relatively large worlds, quite detailed. Well, now, two months later, they are releasing an actual paid product where you can start doing a lot more. So they're expanding the generation to be multimodal: now, in addition to text, you can use images and videos as your input. They also have some interesting ways to edit the environments with, for instance, coarse 3D settings, saying "here's a cube, make it into a chair" or something like that. And they are introducing various tiers: there's a free tier, a pro tier, a max tier, lots of things. So it will be interesting. I think this is one of these things that isn't prevalent yet but seems like a no-brainer for things like video games and VR, perhaps even VFX or movies. So it's still a bit nascent, but definitely starting to mature.
A
Yeah, I guess. Andrey, you're in the video game world. How do you feel about the development of world models?
B
Yeah, I think it's generally related to 3D as a whole, right? So generative AI has had 2D imagery cracked for a while and is now starting to crack video, which is in a sense a world model. It's getting quite good at 3D models themselves, but that has still been a really tough case to crack. And similarly, 3D worlds are quite tricky because, you know, it's big, right? As you go 3D, there's a lot of detail you need. So it's been kind of a case of gradual progress for years now. And we're getting to a point where you're probably going to be able to start integrating it into actual professional processes. I don't think this is necessarily particularly useful for a hobbyist, but for industry purposes it's likely to start being useful. Although I don't know exactly how people in these spaces do their work, I suppose. And just one last story, also on OpenAI, kind of a quirky one. Sam Altman tweeted that they made it possible for you to tell ChatGPT not to use em dashes. So quite literally the announcement was that if you put it in the custom instructions for ChatGPT, supposedly it will no longer use em dashes. And immediately after seeing that tweet, I saw someone else reply with an example of that not working. So it's kind of funny. And, you know, this made me think: why is this a problem? Why are em dashes a thing in these models? It's not like em dashes are statistically prevalent in the training corpus of online text. So it's such a weird kind of problem to have for these models.
A
Yeah, I don't know. I do think it's kind of nice to have this telltale sign of AI, because it's now pretty easy to tell when people are using AI to write things and don't even, you know, take out the em dashes. So yeah, it's going to get harder and harder to tell what's written by AI.
B
I would really love it if someone did a deep dive, a real investigation into the root cause of the em dash writing style, because you could have some very interesting hypotheses. Like, the journalism in the training corpora uses them more often, so high-quality data kind of misleads you. I don't know. I'm actually very curious, now that I think about it, why it likes em dashes so much.
A
Yeah, maybe it's an RLHF thing: people prefer text with em dashes. And then it will change soon, once our opinions about em dashes also change.
B
Yeah. Well, I think this is a good test for AGI. If you're told not to use em dashes and you still use em dashes, you're probably not at the AGI level.
A
Or maybe you actually can make the decision of when em dashes are appropriate.
B
So I guess as humans, we could make a mistake. If you're told not to use like commas, you'll still use commas. So maybe it's unfair.
A
Maybe it's unfair, yeah.
B
Onto applications and business. First up, we've got Anthropic announcing a $50 billion partnership to build data centers. This is a partnership with the UK-based neocloud (I don't know what that means) provider Fluidstack. And they'll be building data centers in the US, in Texas and New York. So we've seen a ton of these deals popping up in recent months, primarily from OpenAI. All of them have these crazy numbers attached to them. I don't know, 50 billion, 200 billion, all these numbers. We haven't seen as much coming out of Anthropic, but this signals that they are very much also in this game, as it seems everyone is. All the frontier labs, which is Meta, Google, Anthropic, OpenAI and just a couple of other players, are now at a point where they believe, at least, that they need to be investing to build all these data centers.
A
Yeah, but it does seem like the 50 billion is a lot smaller compared to some of the other deals, where Meta committed to building $600 billion worth of data centers over the next three years. And Stargate, which is OpenAI's partnership with SoftBank and Oracle, is planning 500 billion.
B
Yeah, and this is also Anthropic's first investment in custom infrastructure. So they've been kind of depending on Amazon and Google and other partnerships of this sort. So them making the move to actually take the next step of building their own data centers probably also signals a lot of confidence on their front. Basically, they are apparently projecting rising to 70 billion in revenue and even positive cash flow in 2028. So this is another signal of that confidence, I guess. And speaking of hardware, we've got another story from Baidu. They have teased their next-generation AI training and inference accelerators. So the chips are the M100, which is their inference-optimized chip, designed in part to enhance the performance of mixture-of-experts models. And as with Nvidia chips and so on, they'll come in configurations of like 256 of them all at once that you can put in data centers. They're also announcing a training-optimized chip, the M300, which is in development and is set to debut in 2027 with the intent to support multi-trillion-parameter model training. So this is quite notable. China is still locked out of using top-of-the-line hardware from Nvidia. And geopolitically it's very unclear where things are likely to be headed. But I think at this point, more than ever, it's important for them to create this domestic capability to have hardware to compete. And this, to my knowledge, is the most promising effort on that front.
A
Yeah, it is interesting that they are specifically designed to enhance performance of mixture-of-experts models, because I don't think that has been the main kind of architecture that other frontier labs are investing a lot in.
B
My guess is that it is influenced by recent trends, particularly over the past year. So DeepSeek V3, R1, Qwen, basically, especially out of China, because you need to be more efficient with your inference, it seems to be the trend that all the models are MoE models, and that allows you to be much more efficient with your compute.
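To make the efficiency point concrete, here is a minimal sketch of top-k mixture-of-experts routing in plain Python. It is a toy illustration, not how DeepSeek, Qwen, or Baidu's chips implement it: the expert functions, the router scores, and the names `route_top_k` and `moe_forward` are all made up for this example. The idea it shows is that a router scores every expert, but only the k best-scoring experts actually run for a given input.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by weight."""
    out = 0.0
    for idx, weight in route_top_k(router_logits, k):
        out += weight * experts[idx](x)
    return out


# Hypothetical toy setup: 8 "experts", each of which just scales its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Hypothetical router scores for one input token.
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_top_k(router_logits, k=2)
active_fraction = 2 / len(experts)  # share of experts that actually ran
output = moe_forward(1.0, experts, router_logits, k=2)
```

With 8 experts and k=2, only a quarter of the expert parameters are touched per input, which is the sense in which an MoE model can have a very large total parameter count while keeping per-token compute much lower.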
A
That's a really good point. Yeah.
B
Also, interestingly, this article highlights that Jensen Huang apparently admitted last week that efforts to sell their Blackwell accelerators in China have stalled and there are no active discussions. So yeah, essentially it's an open market for someone to come in and provide something that Nvidia is not providing in that market.
A
Yeah, I mean, if there's a threat of these GPUs being pulled away from the market in the future, I can see why they want to do everything homegrown.
B
Yeah. Next up, a bit of light drama with people, I guess, big names and what they're doing as far as startups go, as we often have. This time it's Yann LeCun. There are seemingly rumors, and I'm not entirely sure how confirmed this is, but the news is he is planning to exit Meta and launch a new startup led by him. So Yann LeCun, for reference, is a very big name in AI research. He has been active for decades and was the inventor of convolutional neural nets, which for basically a decade were the predominant way to do computer vision and were a big part of why deep learning and large neural nets took off over a decade ago now. And he's been with Meta, I think, for something like a decade, and built out their entire AI research division, which was doing really advanced research. Not quite as far as Google, but they put out a lot of papers. And as we've covered over the past months, a lot has been going on over at Meta. They hired Alexandr Wang, and they established this whole superintelligence division, which is separate from what Yann LeCun is doing. He is the chief AI scientist. Alexandr Wang is the chief of something else, their superintelligence efforts. So anyway, the news is perhaps unsurprising now: LeCun is looking to exit and focus on what he thinks is more promising, which is world models, something that is not necessarily just LLMs and trying to achieve superintelligence in the way that at least Zuckerberg and others think it needs to be done, which is more of the same.
A
Yeah, I mean, Yann is a researcher at the end of the day, and for many years he espoused deep learning when it was incredibly unpopular. And he ended up being right. And so I don't think he is at all shy about espousing perhaps a different unpopular opinion and trying to really think about what the future of true intelligence would be. And I think it's great that he can then go and explore this. And I'm sure, given his background and his pedigree, he will be able to raise the money necessary. And I think it would be good for AI in general that researchers are investigating multiple different ways to improve AI.
B
Yeah, exactly. And I think it's maybe the case that there's a perception that he is an LLM hater of some kind. And from what I've seen, you know, he understands the value and strength of the technology. He has a somewhat nuanced take with regards to whether it will lead to AGI, however you define AGI. So the direction he wants to go in is working on models that learn from video and spatial understanding rather than just language, which, you know, makes a lot of sense. He often makes the point that modern AI isn't as intelligent as a cat, because cats, of course, can do very advanced, very intelligent things with regard to locomotion, spatial reasoning, vision, et cetera. So in that sense, right, it's true that LLMs are not truly generally intelligent. And whatever research he is hoping to do as a startup, I think, will be very exciting. Moving on, and actually touching on something you mentioned earlier, Michelle: the story is that Amazon has demanded Perplexity stop letting its browser make purchases on Amazon. So they are actually suing Perplexity AI. They sent a cease and desist, and they are essentially saying the agent in the Comet browser that Perplexity has launched should not be allowed to check out and purchase stuff on their site. And their case is, you know, they have a set of acceptable use policies, they have conditions regarding data mining, robots, data gathering tools and so on. So basically the terms and conditions, according to them, make it so you're not allowed to do it. And I guess it makes sense for Amazon in that they are also working on chatbots and bots to do shopping for you. They have Alexa as a thing that exists. You know, I'm not sure if anyone's using Alexa, but it's a thing. So yeah, it'll be interesting to see where this goes.
A
Yeah, I'm actually pretty surprised that they're blocking it. I guess maybe they were seeing degradation in user experience. But I kind of see this as only a net positive for Amazon if these agents are shopping on it, right? If they really believe that Amazon has the best products and the best prices, then they should welcome this new way of being able to shop. Though I do think, you know, Amazon asking Perplexity to stop and Perplexity not stopping is a big issue. But I think long term, allowing agents to shop on Amazon, if the technology works and the user experience is positive, should be positive for Amazon itself as a company.
B
Yeah, I think the point being made in this Bloomberg article is something I hadn't thought about: that shopping agents could pose a threat to advertising on Amazon. So it's not just about people buying stuff on Amazon. As you search for stuff, they do have promoted things that you see with your eyeballs. Well, if the agent goes and buys stuff for you, you are no longer being exposed to that advertising. So in that sense, I suppose this could make a lot of sense.
A
Yeah, yeah. Though I mean, in general, with advertisements and how agents will interact with the web, I think it's an open question how people will monetize.
B
Moving on to a couple of stories about startup fundraising. First we've got Gamma. It's a company that is aiming to kill PowerPoint, or at least create a product that allows you to make much fancier presentations. They are hitting a $2.1 billion valuation. They say they have a hundred million in annual recurring revenue and 70 million users, and they are announcing a Series B funding round of $68 million, with their product having launched in 2022. So this is, I think, maybe one of the lesser-known big successful startups in the space. Like, everyone talks about Cursor, of course, and all the chatbot ones, but Gamma is one of these very practical things: have a chatbot make a very nice-looking presentation for you and then use it. And I actually have tried it and I found it pretty good. So this is definitely a kind of product where AI makes sense and it can be lucrative and profitable, which cannot necessarily be said of vibe coding at this point.
A
Yeah, it's pretty impressive that they only have 50 people in the company and they've already reached 100 million ARR. And honestly, the amount they're raising for a Series B at 100 million ARR is not a lot. Like, 68 million is a relatively small number, which just means that, especially with such a small team size, they're probably very profitable right now.
B
Yeah, they're in a very powerful position where they probably don't need that much money, because they don't need that much manpower. It's a purely software product. Anyway, it's one of the definite successes of the past few years.
A
Yeah, definitely.
B
Onto a slightly newer startup, we've got Inception. This is a startup that we covered some months ago. They had this very cool demo of a somewhat powerful diffusion model for coding where, instead of doing the standard kind of left-to-right autoregressive generation, you could generate everything all at once, and it looks really cool and interesting. Well, they have now raised $50 million to do that, to build diffusion models for code and text. And this is led by, and I actually didn't know this, Stefano Ermon from Stanford, which is really cool and, yeah, really exciting. You know, if you look at how diffusion models generate text, it's very different from the way it works with autoregressive LLMs, and it could be very powerful if they crack the challenge of training and whatever else is making this hard.
A
It's interesting, though, that they are still only being integrated into tools like ProxyAI and Buildglare, which are not development tools that I or my team use. So I'm curious when they'll integrate with more popular tools like Cursor, and if so, I'm excited to try it.
B
Yeah, I think we're at a point where the models are fast but not necessarily as good with diffusion. So hopefully this money will allow them to do all sorts of experiments to figure out how to make them good in addition to being fast. Next, we've got Cursor raising $2.3 billion just a few months after having previously raised. So they have raised their valuation to $29.3 billion with this. And this is coming after their Series C raise of $900 million back in June. So this is an interesting case, I think, where you are a provider of a coding interface. It doesn't seem clear to me why you need this much money unless your subscription service is operating at a loss. And they do allow you to subscribe for $20 a month to get some amount of autocompletion and other LLM support.
A
But I think Cursor has been starting to train their own models, and given that they're also doing reinforcement learning, doing a lot of the training kind of online, and improving the quality of Composer, I think it does make sense if they want to go head to head against OpenAI and Anthropic, especially because OpenAI and Anthropic are also releasing their own coding tools. For Cursor to rely only on external AI models, I think, is actually pretty dangerous for them and puts them at existential risk. So I think it's a really good idea that they are raising more so they can build their own models and continue to improve their training, and also some really cool things that they've been building, like reinforcement learning directly in production and kind of learning online.
B
Yeah, yeah, that's a good point. Somehow I forgot that just on the previous episode we covered the release of Cursor 2.0 and Composer, with Composer being their first fully in-house model. So in that sense it makes a lot of sense. I think the other thing is I'm fairly certain that Claude Code and Codex and a lot of these other tools are also operating at a loss. Like, Anthropic is burning a lot of money letting people use their Max plan, which is like $200 per month. Somehow it's actually very far from being profitable. So if Cursor wants to compete with Claude Code and the broader market of many players, this kind of war chest will definitely help. Next, a couple of stories about self-driving cars. As we promised, a bit more on Baidu. Baidu, as we mentioned previously, said that their Apollo Go service is now in 22 cities. They are also saying that it's running 250,000 robotaxi rides a week, which is the same as Alphabet's Waymo, or at least the same as Waymo as of earlier this year. Apollo Go operates in Wuhan, Beijing, Shanghai and Shenzhen and is expanding to Hong Kong, Dubai, Abu Dhabi and Switzerland. So major big cities, similar in a sense to Waymo, which is in San Francisco, Los Angeles, Phoenix and now also Austin and Atlanta. So exciting times for robotaxis.
A
Right.
B
We love to see it. And I haven't been able to find reports or any sort of first-hand impressions on the quality of a ride, or any comparison between Waymo and Baidu, but I wouldn't be surprised if Baidu has similarly cracked the problem of really reliable self-driving.
A
Yeah, it's also really exciting that they're going multi-country already, or they're planning to very soon. I don't know what the plans are for Waymo and Zoox and Tesla to go beyond the United States, and it's very cool to see the expansion.
B
And one more story on the self-driving front from China. Pony AI, another driverless tech firm, has raised $863 million in their Hong Kong listing. They sold 42 million shares, basically went public. And that's exciting, right? The whole space is maturing. Pony AI is one of the leaders in the space, and that's about all I have to say. Like, we're getting to a point where these companies are IPOing and, you know, are at least promising to be profitable in the near-ish term.
A
Yeah, I think they're aiming for profitability in 2028 or 2029, which is coming up.
B
Yeah, pretty soon. I remember the hype we had like a decade ago for self-driving cars. Back in 2015 was when we were being promised it's going to be here in a year or two, and it's finally here. It's finally here. Onto projects and open source. We've got, I think, one really exciting release in the past week or two: Kimi K2 Thinking. So Moonshot AI previously released Kimi K2, I think a couple of months ago, back in August. A very nice model, just an LLM, competitive with the other open source models from China like Qwen. And now there's Kimi K2 Thinking, which is of course optimized for tool use, for coding, for complex problems. It's also on the larger side: it has a total of 1 trillion parameters and 32 billion active parameters per inference, and it is really good. So they say it is able to perform 200 to 300 sequential tool calls without intervention. It's getting really good numbers on benchmarks for tool use like BrowseComp, LiveCodeBench and SWE-bench Verified. Overall it seems to be really good. And this is while also being very affordable, much cheaper than GPT-5 or even MiniMax M2.
A
My team actually tested this model in the past week and found a lot of its capabilities better than those of the other tool-use agents that we have used in the past or are currently using. So it's really exciting that not only is it beating all the other open source models, it's even matching or surpassing the models from the large frontier labs.
B
Yeah, and this is while being under a modified MIT license. So the MIT license basically means you can do whatever you want with it. All right, don't come at us, we are freeing our hands. They do say that if the software or any derivative product serves over 100 million monthly active users or generates over 20 million USD per month in revenue, the deployer must prominently display "Kimi K2" on the product's user interface. Which is, I suppose, fair enough. Actually, the Claude Agent SDK also has some of these things where you have to actually say Claude agent. But yeah, this is a very permissively licensed, very advanced model. You'll be able to fine-tune it on your own data, and I am excited to also be able to try it out. It's not out on many of the serving platforms yet, but once it hits things like Groq, it might be a very intelligent and very fast model. And onto research and advancements. Moving away from models for a bit, we've got, I think, a very interesting paper titled "Remote Labor Index: Measuring AI Automation of Remote Work." So this is essentially a benchmark. They say this is a standardized empirical measurement to evaluate AI's capability to automate remote work, focusing on economically valuable tasks sourced from online freelance platforms. So essentially, you know, if you go on, I guess, Fiverr, or I'm blanking on the names of these services, you can hire web developers, designers, editors. We actually hire an editor from one of these services. This is going to try and track whether AI is able to replace, or at least do, that work. What they say in the initial evaluation is that current AI agents are quite limited. The best performing models only achieve a 2.5% automation rate, and they do all sorts of comparisons showing that while AI models are improving, they're still very far from human performance on remote labor tasks. So yeah, I think this, you know, is kind of a key question with regards to a lot of things like AGI.
One of the definitions of AGI is being able to automate the majority of economically valuable work. This kind of index basically tracks whether we're at AGI or not, in some sense.
A
I think, though, the agents do keep improving, and I'm curious how this is going to change, even six months from now when new agents are released, and how good they will be at this. I think the inability of agents right now to keep context over a long time horizon is probably the reason why remote work is still going to be really difficult, because most remote work can span days, if not weeks. And so if an agent cannot keep track of all the new updates, the work that comes in and the feedback that comes in, it is going to be really hard to scale. But again, these are things that keep improving.
B
Yeah, I think they are still limited in this evaluation with regards to things like teamwork, for example. But digging into the details a little bit, they take a lot of the tasks primarily from Upwork, which is an actual freelance service. And they have a pretty wide variety of categories of work from the Upwork taxonomy. So there's video work, graphic design, game development, audio, product design and a bunch of other ones. And they actually sourced the projects from Upwork and some other ones. They also recruited 358 freelancers with verified Upwork accounts and specializations and used these freelancers to collect the projects, to basically take their work samples and then use those as the tasks that need to be fulfilled. So essentially, these are about as real as you can get for a benchmark, as far as I can tell. Like, even going beyond SWE-bench Verified, this is actual work that humans were paid to do. And so it will be very interesting to see how rapidly AI agents are going to be able to improve. I think on some of these tasks, likely with regards to maybe product design, we might see rapid improvement. Some other things like CAD development, architecture, I don't know. It'll be interesting to see where this goes.
A
Yeah, and it's a fun paper to scroll through because they give a bunch of examples of where AI succeeded and where AI failed. And the failures are pretty funny. Like creating a 2D design educating viewers on things, and then there are just spelling mistakes, you know, in the same ways that AI-generated words and art oftentimes have spelling mistakes. And also there's a lot of creating 3D products where the AI just creates completely jank-looking 3D products. A diamond ring is just a ring drawn out in some CAD software and another oval-shaped thing on top of it that's supposed to represent the diamond.
B
Yeah. Pro tip: if you open up the paper, go to section C6 in the appendix, page 27, for a lot of these screenshots. And on that note, actually, just to give a couple of actual concrete examples of projects, there's for instance a data visualization project where the brief is to build an interactive dashboard for exploring data from the World Happiness Report. The requirements are to use data from a provided Excel file and provide an overview, map, detail, score, breakdown. They have another task for an animated video: create a 2D animated video advertising the offerings of a tree services company. It has to use a provided voiceover file, flat design, no subtitles. And they actually give examples of the human deliverable for these project briefs that you can see. So, you know, you can do a straight side-by-side comparison against what a solid deliverable would be if you were hiring someone. Next, we've got some research from OpenAI regarding interpretability with sparse transformers. So we've covered interpretability a decent amount this year. It has been a trend going on for a while now, where we've gotten to a point where you can find kind of groups of neurons that activate together to represent a concept of some kind. And the way this has been done is to take a bunch of outputs of a neural net that has already been trained and compress those outputs to find a smaller space. And in that smaller space you're able to find things like comma or Golden Gate Bridge or sarcasm, et cetera. And we've made quite a lot of progress on that front, to a point where you're able to control models. You can make Claude obsessed with the Golden Gate Bridge, et cetera. So yeah, that approach has been quite successful. This paper is doing something a little bit different. Basically the argument is: what if, instead of taking a dense transformer and then trying to find a sparse representation with a sparse autoencoder to then kind of map back onto the initial set of weights, 
we try to just train a sparse transformer in the first place, and then within that sparse transformer find these combinations of units, what they call circuits, to find out how things work? So that's the gist of the paper. And as you might expect, they show some early results on that front. They show that if you train a sparse transformer, which means you have to train an entire new model from scratch in a way that's different from the way that GPT-5, for instance, has been trained, you're then able to find a path through the neural net, more or less, that explains how you get to, I don't know, parentheses or quotes or whatever. So it has some advantages in that it's arguably simpler to find these circuits, or it's more directly interpretable what is going on within the neural net, just because you have fewer units that are active. So it's a little simpler. But on the other hand, you need to train an entire sparse transformer from scratch, as opposed to being able to train something on top of an existing model to explain its behavior. So that's kind of a main challenge here.
A
And then sparse models are just a lot more inefficient to train and deploy. I believe in this paper the models were around the size of GPT-2. So the paper itself acknowledges that these models are extremely inefficient to train and deploy and are unlikely to ever reach frontier capabilities. They do have a section in the paper with preliminary results on using bridges at the layers to be able to use this kind of weight-sparse training to better understand existing dense models. So you can couple a weight-sparse model, bridged, with a dense model. So that's still a new area of research with still very preliminary results. And I think this paper is just suggesting a new way of being able to interpret these weight-sparse models, and hopefully it gives us a scientifically valuable understanding of how these models work mechanistically.
B
Yeah, I think this is very much a research kind of paper, where it's an early idea and there are some initial results that aren't necessarily usable in practice. They have Section 5, limitations and future work, which takes up most of a page. But that also makes it exciting, because we've seen some very impressive progress with sparse autoencoders and this could potentially work alongside that and help it, you know, gain more fundamental insights. Could be cool. Next we've got a new paper about a potential alternative to the standard transformer model architecture. This is actually from the Kimi team, but it's not Kimi K2 Thinking. It's a research paper on how to train more efficient models. So the paper is Kimi Linear: An Expressive, Efficient Attention Architecture. There's been years and years of work on efficient attention. So the standard attention formulation, going back to 2017 with the first transformer paper, has famously quadratic scaling, where everything connects to everything. So it's very costly to get big. And there have been many ways to approximate full attention, many ways to make it have linear complexity, and yet all these works haven't managed to be quite good enough that you can replace true full attention. Here the team is saying that, for the first time, a linear attention architecture is able to outperform full attention in fair tests on actual RL scaling benchmarks under identical training budgets. This would mean that you're able to get similarly good performance at much, much higher efficiency. So they reduce memory usage by up to 75%. They get six times faster decoding at 1 million tokens. This is at a relatively large model scale. So they have a 48 billion parameter mixture-of-experts model with 3 billion active parameters per forward pass. Not huge. This is still on the smaller scale of models. So there is a question there of whether, if you scale up to a trillion parameters with tens of billions of active parameters, that will still work as well. 
But if truly we're able to get to a more efficient architecture that is able to scale this well, that would be very exciting.
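To make the quadratic versus linear contrast concrete, here is a minimal Python sketch of generic, unnormalized causal linear attention. It is not Kimi Delta Attention, which the episode describes only at a high level as a gated linear attention variant; it only illustrates, under that simplifying assumption (no softmax, no gating), why a fixed-size running state can reproduce what the pairwise formulation computes. The function names and toy inputs are made up for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def quadratic_causal_attention(qs, ks, vs):
    """Unnormalized causal attention, computed pairwise.

    Each query is compared with every earlier key, so the work grows
    quadratically with sequence length.
    """
    outputs = []
    for i, q in enumerate(qs):
        out = [0.0] * len(vs[0])
        for j in range(i + 1):
            weight = dot(q, ks[j])
            for d in range(len(out)):
                out[d] += weight * vs[j][d]
        outputs.append(out)
    return outputs


def linear_causal_attention(qs, ks, vs):
    """Same outputs via a fixed-size running state.

    state[a][b] accumulates k[a] * v[b] over the tokens seen so far, so
    each new token costs the same amount of work regardless of how long
    the sequence already is.
    """
    dk, dv = len(ks[0]), len(vs[0])
    state = [[0.0] * dv for _ in range(dk)]
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        for a in range(dk):
            for b in range(dv):
                state[a][b] += k[a] * v[b]
        outputs.append(
            [sum(q[a] * state[a][b] for a in range(dk)) for b in range(dv)]
        )
    return outputs


if __name__ == "__main__":
    # Toy inputs (hypothetical): 3 tokens, key dim 2, value dim 2.
    qs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
    ks = [[1.0, 0.0], [2.0, 1.0], [-1.0, 4.0]]
    vs = [[1.0, 1.0], [0.0, 2.0], [3.0, -1.0]]
    print(quadratic_causal_attention(qs, ks, vs))
    print(linear_causal_attention(qs, ks, vs))
```

Per token, the second function only touches the key-dimension by value-dimension state, so its memory does not grow with context length. Standard softmax attention is not reproduced exactly by this rearrangement, which may be part of why hybrid designs keep some full attention layers.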
A
Yeah. I wonder if Kimi K2 Thinking is built on Kimi Linear.
B
Yeah, I'm pretty sure that's not the case. I think this has just happened to come out around the same time, because this is still research. I might be wrong, so please do correct me if I'm incorrect. But if we try to get into the details, by the way, this is just gonna be impossible. I'm gonna totally ruin it. They introduce Kimi Delta Attention, which is a gated linear attention variant, and it gets very jargony very quickly. There's a lot of math, there's a lot of algorithmic details here. In a sense, this relates to things we've discussed a lot last year, like SSMs. Yeah. There's been a lot of work on recurrent models as well that are promising.
A
It also looks like an RNN architecture. So they are able to use a linear, RNN-like architecture but still get efficacy similar to transformer models.
B
Right. And similarly to other successful things, this is a hybrid model. So they combine full attention with this kind of linear attention, and that combination, where they have a 3 to 1 ratio of delta attention layers to full attention layers, seems optimal for balancing quality and speed. So it's been an open question in my mind for years whether we will transcend the standard transformer architecture, which really hasn't changed dramatically since GPT-3. And it could be that we're getting there after many years of research. Just a couple more stories. The next one is not a full-on paper but a fun kind of announcement from DeepMind. They have announced and showcased a lot of videos of a new agent called SIMA 2, which is their general purpose kind of game-playing AI. So they have this very long line of research on training agents within simulated environments. They released SIMA, I think, a couple of years ago. And it operates within a variety of actual games you can buy and play, such as No Man's Sky, I believe Minecraft as well, a whole bunch of them. So SIMA, Scalable Instructable Multiworld Agent, we first saw in March 2024. That was the first time they had this agent that operated within a whole bunch of game worlds and was able to, given a text prompt, go off and try to do some stuff. Now they have SIMA 2, which is the same idea. You give it a text prompt and the agent is able to go off and try to do it in whatever world it happens to be in, whether it's Goat Simulator or No Man's Sky or Minecraft or, interestingly for me, they have an example also of Genie 3, where it's an entirely new world that is being created by their generative world model Genie, and they have this actual kind of AGI-ish agent operating within this world. So we were talking a bunch about world models earlier. Yeah, this is a world agent, I suppose you could say, that actually interacts with the world. It has to navigate, it has to do things like jumping or searching. 
You really have to go and look at some videos to understand. One of the key things they said is there's a lot more reasoning going on. So the agent internally reasons through a set of actions it needs to take. So it's more agentic broadly. Anyway, no full paper. So I just read their blog post and saw a bunch of cool videos.
A
I think it's really interesting there. Jane Wang, who's one of their senior staff research scientists at DeepMind, said that they're really training this so that it becomes a training ground for potentially transferring the skills to real world environments one day. So again, if we want truly physical AI, or to train these AI agents to be working in the physical world, to actually be embodied one day, then perhaps video games can help, which is not a new concept. Right. Like, in robotics we have the idea of training in simulation and doing sim-to-real. But now we can also train in video games, and if you train on enough data, can you then do sim-to-real for this physical AI agent that can interact in the real world? So I think these are all very interesting ideas of how we can use world models and agents in 3D environments to actually turn into embodied physical AI.
B
Exactly. And this really is something that DeepMind has been investing in, I guess, basically from the start: learning within games and trying to generalize more and more to open-ended environments and do reinforcement learning. And they are making substantial progress. If you look at these videos, these agents are being told to go and find a beehive or go and build a house in Minecraft or whatever, and then they go off and do it. And they are kind of like an LLM, but one that can actually walk around and do stuff in a simulated 3D environment, which is not something you'll find from GPT-5 or Claude or any of them. One last story. Not a research paper, but actually a meta story about research. The story is that arXiv is changing rules after getting spammed with AI-generated research papers. So the new rule is that arXiv will no longer accept computer science review articles and position papers. So review articles are basically summary papers. You summarize the state of a given field and call out all the papers. It's quite useful, if you do research, to be able to kind of keep up with what's going on in a subspace. And position papers are what it sounds like. You kind of set forth a position rather than a result. And I am, I guess, not surprised that arXiv, which by the way is kind of this open platform where anyone can post papers and browse them, the standard place to post computer science and AI papers, is now getting spammed by low-effort stuff.
A
Yeah, well, you know, with how good LLMs are these days, arguably we don't need review papers anymore. Right. Like, why read a review paper if you can just ask, you know, ChatGPT to create your own review paper every time you're doing paper reviews?
B
Yeah, I think that's another reason for this.
A
Yeah, yeah, it's just not like it's not as necessary anymore.
B
arXiv posted a pretty lengthy article explaining this. It's actually technically not a change in rules. They have ways for you to still have your review article or position paper go up on arXiv after some review. So it's not a huge deal. It's not like a dramatic kind of tragedy or anything, but it is an interesting example of where we're at in the world, where arXiv, which is this niche thing for researchers, is now getting spammed by research papers. Onto policy and safety. First up, on the legal front, we have Stability AI largely winning a UK court battle against Getty Images over copyright and trademark. So Britain's High Court has ruled for Getty on trademark infringement, specifically for Stability AI images with Getty's watermark, but dismissed the broader copyright infringement claims. So this is essentially dismissing a major part of what Getty has accused Stability of, and Justice Joanna Smith concluded that Stable Diffusion, or Stability AI, did not infringe copyright because it does not store or reproduce copyrighted works. Which then kind of speaks to a lot of the legal questions about generative AI, and especially, I guess, text-to-image AI, where technically the model isn't reproducing or storing any image, but it's being trained on copyrighted works without permission. So it's very interesting to finally see some results on this front. These kinds of lawsuits have been ongoing for years, and it's, I think, still to this day a fairly open question as to what will shake out with regards to copyright and training AI models. But it's looking more and more like the fair use argument, that you can just train on anything, might win out.
A
I'm just surprised about Stability AI. I haven't heard of many releases from them, honestly, in many years now.
B
So it sounds like they've definitely gone quiet. They used to be, you know, all over in 2023. They were one of the first kind of big providers of open models back in the day. It used to be that Stable Diffusion was a big deal. Now not so much, but they're still around, and I guess good for them. They're not going to have to pay out to Getty, perhaps, too much. Another article on the copyright front. This one is about OpenAI and about a court in Germany saying that OpenAI has violated German copyright law. So this is a lawsuit filed by GEMA, a German collective that manages music rights, and the court has ruled in favor of this organization and has told OpenAI to pay an undisclosed amount in damages. So this is another significant case, I guess, in Europe and Germany, where this is kind of the opposite outcome. Right. Where copyright infringement was found for OpenAI training on some organization's work. Apparently this society manages the rights of composers, lyricists and music publishers and has approximately 100,000 members. This lawsuit was filed back in November of 2024. And so this is specifically about lyrics that ChatGPT has learned from. So, yeah, it's one of these very open questions of whether it's actually an issue. And in this case the finding was for these songwriters.
A
Well, it's interesting because it's lyrics, it's not actually the full song but even then because they train on these lyrics, OpenAI now has to pay them, which I mean lyrics is.
B
It is copyrighted.
A
So it is copyrighted. Yeah.
B
Anyway, it's still a whole big mess. What is going to be the actual law and precedent with regards to copyright? We are starting to get some initial legal results. Finally, it's kind of going in different directions in different countries, it looks like. And one story not on copyright, but instead on more kind of geopolitical fronts. Microsoft is investing $15.2 billion in building out their hardware capacity in the UAE over the next four years. So they'll be shipping advanced Nvidia GPUs to the UAE. The US has granted Microsoft a license to do that. And apparently this will be one of the first kind of cases where there's a license to do that. And the US is now forming this tighter relationship with the UAE, with these kinds of scale of investments and with having AI hardware in the country. As we've covered before, this has been a bit of a touchy area, with certain organizations in the UAE having ties to China. Anyway, complex geopolitics going on here. But the direction seems to be that the US is getting tighter and closer to the UAE.
A
Which, I mean, in many ways we have seen the Middle East actually being a source of funding for a lot of the major AI companies, or at least investors there being limited partners in VCs that end up investing in the major AI companies. So it's, yeah, it's interesting that now the US is also investing back into the UAE and, you know, part of the Middle East, to expand the country as a key player in US AI diplomacy.
B
Yeah, and I think that's a good thing to call out. The UAE in general is trying to kind of position itself as a key player in the space. For some very basic context, this is the United Arab Emirates. It's a very rich country due to oil. The country has a whole bunch of money, and they have been trying to diversify and not depend entirely on oil for quite a while. And part of that strategy has been to invest very heavily in AI. And so with Microsoft's work here, they're saying they are not just going to build data centers. They're also going to be pairing this infrastructure with investing in local talent, training and governance, and trying to make Abu Dhabi a regional hub for AI research and model development. And we've also seen some of that, some models coming out of organizations there. So certainly an ambitious effort by the UAE, and it seems to be at least having some success. Onto the last section, synthetic media and art. We've got just a couple of stories here, both dealing with AI-generated songs that are starting to hit some charts. The first one is an AI-generated country song that is topping a Billboard chart. This is Walk My Walk by virtual artist Breaking Rust. And it has topped the Billboard Country Digital Song Sales chart. Now, I did see some people comment that these charts are very specific and kind of gameable. I don't know the full details of it, but the song has gained significant traction, with 1.8 million monthly listeners on Spotify. And Michelle, we were just chatting about this. You took a listen, and you think that it's actually an enjoyable song.
A
Before this recording I was listening to some AI-generated songs, and I have to say most of them are pretty bad, or maybe just in genres I don't normally listen to. But Walk My Walk is actually a banger. It's actually quite good.
B
Yeah. I think it's a very kind of traditional-sounding song. It's not very complex, but the quality of it is representative of the quality of text-to-song in general, where all the artifacts, the little bits of noise and crunchiness and other audio weirdness, are increasingly gone, such that you're not necessarily going to be able to tell whether something is an AI song, and you are going to be able to create some good-sounding music. It's just that you need to pick kind of the right type of music with the right prompting and so on. And so it's kind of unsurprising that you can get something fun-sounding. Spotify also famously has a lot of sort of ambient music and other electronic music that appears to be made with AI. There are a lot of playlists now that are dominated by AI songs. I think there are problems with spam now, where people are trying to upload a whole bunch of AI songs to Spotify to get some revenue, to a point that Spotify has had to start cracking down on it. So as you might expect, this is a bit of a big deal, because it is charting on some kind of measure of economic success. And you can see various takes on it, where most people commenting on it online are not happy and are not big fans of the idea of AI music, which I think is a bit of a complex topic. I don't necessarily dislike AI music, but I think it should be used in a specific way that isn't kind of replacing traditional music, let's say.
A
I think it's just another tool, personally. I think it's a tool, and it's also in many ways like a new genre in and of itself. And I think we should be treating it that way. I think when new genres have emerged, especially ones that utilize a lot of technology, they've always in the past been slammed for somehow not being creative or utilizing technology too much. But I think that's just another technical advance. Of course there are a lot of differences here, because you can go into a genre. It's not like techno, where it's a completely different genre. You can go into the genre and compete with other people, like real artists in that genre.
B
Yeah, I think the particular aspect of the story is that this is charting on Billboard and getting a lot of listens on Spotify. You could make a definite case that this aspect of it, aside from the artistic merit of it, is concerning and might be a bigger issue, regardless of whether it's good music or not. Like, should AI-generated music be competing with people who try to make a living as musicians? That is kind of a key question. And the next story actually is also dealing with that. We've got this AI artist, AI singer, with the name Xania Monet, that has become the first AI artist to earn enough radio airplay to debut on a Billboard radio chart. So apparently this AI artist, which is of course being run by some actual human being, has signed a multimillion dollar record deal with Hallwood Media. This is created by poet Telisha "Nikki" Jones, and she's using Suno, making music in the genres of gospel and R&B. She has already released a full-length album and an EP. Definitely a question of whether this will become a normal thing or not. Is this a flash in the pan that will go away, or are there going to be more of these sort of people creating AI musicians?
A
Yeah, behind the scenes, I think so. And I think it's telling again that like the creator is herself a poet and an artist. So maybe the argument is like, in the future anyone can do this and you no longer have to be an artist yourself to create this. But I think right now it does seem like a tool that people can then use to create their own art.
B
Yeah, so there have been examples of kind of bands with, I guess you could say, artificial or fictional characters, like Gorillaz. There's Hatsune Miku. There are cases of sort of real people behind the scenes creating an artist persona that they are operating through. And you know, that's been an interesting kind of example. That is obviously not the norm. With AI, that could become much more of a normal thing, where the artist is a person creating a persona, a character. And this artist has an Instagram, has a face, has, you know, music videos attached to it. It's certainly like, I don't know, you could write a lot of papers, a lot of humanities papers, about this.
A
Or I think you could, you could use AI to write a lot of papers about this.
B
Yeah, you can make some really good video essays on YouTube discussing the implications of it and so on.
A
Just AI all the way down.
B
Well, we actually do have one last story on text-to-audio, but this time it's about text-to-voice. So ElevenLabs now has a marketplace that lets brands use famous voices for ads. It's called the Iconic Voice Marketplace. And basically, if you're someone famous, you can provide your consent and formal licensing terms for companies, presumably for marketing, to use your voice. And for example, it seems that Michael Caine has supported the initiative, and Michael Caine of course is a famous actor and has a notable voice. Other offerings on the marketplace include Judy Garland and Alan Turing. So I guess you could also do some historical characters.
A
There's actually a lot of historical characters. I'm looking at this list like Mark Twain, Thomas Edison.
B
Yeah, I'm not sure how you could get the voice of Mark Twain.
A
Yeah, I don't, I don't know. And also, it sounds like they have gotten permission. So it looks like they are referencing historical archival audio, and they probably have gotten permission from the estates of these people. But yeah, it's kind of weird. Not most of them, but many of them are actually historical figures.
B
Right? Yeah. Michael Caine is one of a few living celebrities to lend his voice. So for all these other characters it would be their estates. You know, there are people who have the rights to allow people to do this, and maybe that's why they started with historical voices. Yeah, the estates are maybe more friendly to the idea than actual living actors. Yeah. But you know, it's certainly better than illegally cloning Michael Caine's voice and using it. So I wouldn't be surprised if this becomes kind of a major marketplace, actually. Well, that is it for this episode. A whole bunch of kind of smaller, non-crazy news that was quite a lot of fun to discuss. Thank you, Michelle, for guest hosting.
A
Thanks for having me here.
B
And thank you to all the listeners, as always, for listening and hopefully forgiving me for not actually publishing Last Week in AI every week for the last while. As I've been mentioning, Jeremy, our usual co-host, is on leave until December, and we should be able to get back to actually being weekly, hopefully soon. As always, we appreciate it if you subscribe, if you share the podcast and if you leave reviews on Apple Podcasts or somewhere else. I will definitely keep an eye out. But more than anything, we appreciate you listening. So be sure to keep tuning in.
C
Tune in, tune in, when the AI news begins, it's time to break, break it down. Last Week in AI, come and take a ride, get the lowdown on tech and let it slide. Last Week in AI, come and take a ride. I'm at to the streets AI's reaching high new tech emerging Watching surgeon fly from the labs to the streets AI reaching high Algorithm shaping up the future sees. Tune in, tune in, get the latest with ease. Last Week in AI, come and take a ride, get the lowdown on tech and let it slide. Last Week in AI, come and take a ride. I'm the last to distribute streets he has reaching high. From neural nets to robots, the headlines pop, data-driven dreams, they just don't stop. Every breakthrough, every code unwritten, on the edge of change, with excitement we're smitten. From machine learning marvels to coding kings, futures unfolding, see what it brings.
Podcast: Last Week in AI
Hosts: Andrey Kurenkov (B), Michelle Lee (A, guest)
Episode Date: November 21, 2025
Main Theme:
A roundup of the latest in AI news—model releases, business developments, research papers, legal issues, and cultural trends—with a focus on major global players and tangible impacts across tech, business, and society.
"I do sometimes find some of the preset personality is pretty annoying if I'm talking to it." (A, 04:25)
"This is also Anthropic's first investment in custom infrastructure." (B, 17:26)
AI-Generated Music Makes Billboard
AI Singer’s Chart Debut
ElevenLabs Famous Voice Marketplace
Chatbot Personae Matter:
"I can definitely see a lot of people just like chatting with these chatbots...having these friendly, quirky, efficient, cynical...personalities makes a lot of sense." (B, 03:30)
Open Model Transparency:
"Not only is it beating all the other open source models, it's even matching or surpassing the models from the large Frontier Labs." (A, 38:21)
On AGI & Economic Tasks:
"Current AI agents are quite limited...they're still very far from human performance in remote labor tasks." (B, 41:00)
On AI-Created Music:
"Walk My Walk is actually a banger. It's actually quite good." (A, 67:33)
"I think it's just another tool, personally...when new genres have emerged, ... they've always in the past been slammed for somehow not being creative or utilizing technology too much." (A, 69:38)
This episode underscores the relentless pace and global scope of AI advancement, balancing technical achievements (model performance, hardware, architecture) with real-world business, culture, and regulatory ramifications. Open models, international competition, and the friction between legacy and emerging creative forms illustrate a field in flux—where what’s cutting-edge one month feels routine the next.
Memorable Outro:
"Just AI all the way down." (A, 73:32)