Andre Korenkov
Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we will summarize and discuss some of last week's most interesting AI news. And you can go to lastweekin.ai for even more news articles in our newsletter. I am one of your regular hosts, Andre Korenkov. My background is that I studied AI in grad school and now work at an AI startup.
Jeremy Harris
And I'm your other regular co-host, Jeremy Harris from Gladstone AI, doing AI national security stuff, as you will tend to know. And yeah, we've got, I think, an interesting episode today that we're going to have to get through in about 25% less time than usual. So we'll see. We keep saying this, and then we don't do it. But we're going to try again. We'll see.
Andre Korenkov
Yeah, and this episode is interesting partly because there's not any really big news on the AI business front or the AI development front. There are mainly some really notable open source releases and papers. So this one might be a little more technical as they go, and we'll try to not get super, super nerdy. I think in the last couple of episodes you started getting really into the weeds of these papers, which might not be for everyone, but we'll keep it a little quicker. And just to quickly acknowledge: I think we mentioned wanting more comments, or appreciating people's comments. I did notice we've got some more feedback on YouTube, so it's nice to see, and we're checking it out. One person mentioned there not being any flashy thumbnails. For those who just listen, I make these very nerdy-looking thumbnails for YouTube, and I do have a personal feeling of liking that style.
Jeremy Harris
Yeah, it's funny. This podcast is very much made in our Internet garage. I mean, it's fun. I do it for the fun. Personally, there's so much to keep up on, and I feel like it forces me to have a clue what's happening. So I really appreciate you guys listening in. And by the way, those comments really do matter. Because we do it for the fun, they make it more fun, and they give a sense of community, like you guys are actually listening and asking for stuff that you want. So anyway, I just really appreciate it. So thank you.
Andre Korenkov
And one last thing we'll mention before we get going. Last episode we were just chatting, before or after, I forget, and we were talking about how we have a lot of data from having recorded so much and having transcripts. So now that vibe coding is a thing, what if we just got Claude to go and look at all these transcripts and do some analytics? And it worked. I just did it over a weekend, created some dashboards, and there are some funny things there. For instance, the data is very clear: I talk much slower, significantly slower, like 20% slower than Jeremy, but I also talk 20% more. So our ratio of speech per episode is almost exactly 50/50, which is pretty impressive if you think about it.
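The kind of transcript analytics described here can be sketched in a few lines. This is a minimal illustration, not the actual dashboard code: the segment format, the `speaker_stats` helper, and all numbers are hypothetical, chosen so that one speaker is 20% slower but talks for 20% longer, which lands the word share close to 50/50.

```python
from collections import defaultdict

# Hypothetical transcript segments: (speaker, start_seconds, end_seconds, text).
# The values are invented for illustration, not taken from real episodes.
SEGMENTS = [
    ("Andre", 0.0, 60.0, "word " * 120),    # 120 words in 60 s
    ("Jeremy", 60.0, 110.0, "word " * 125),  # 125 words in 50 s
]


def speaker_stats(segments):
    """Return words per minute, talk time and word share for each speaker."""
    words = defaultdict(int)
    seconds = defaultdict(float)
    for speaker, start, end, text in segments:
        words[speaker] += len(text.split())
        seconds[speaker] += end - start
    total_words = sum(words.values())
    return {
        speaker: {
            "words_per_minute": words[speaker] / (seconds[speaker] / 60.0),
            "talk_time_s": seconds[speaker],
            "word_share": words[speaker] / total_words,
        }
        for speaker in words
    }


stats = speaker_stats(SEGMENTS)
for speaker, row in stats.items():
    print(speaker, row)
```

With these made-up segments, the slower speaker's rate is 0.8 times the faster one's while the talk time is 1.2 times as long, so the word counts differ by only about 4% (0.8 × 1.2 = 0.96).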
Jeremy Harris
Yeah, that is cool actually.
Andre Korenkov
We also do tend to speed up towards the end of the recording, as we hit our limit and have to become more efficient with the news we cover.
Jeremy Harris
And what this doesn't capture too is like Andre in the background around the like 60, 70% mark of the episode is like feverishly in our Google Doc refactoring and being like, okay, this story, we don't have time to do it, we're going to cut it, we're going to do this, we're going to do that. And like the, the whole, I mean he's managing an orchestra. I get to just sit here and basically read my notes about the next story and like try to think about what I want to say while Andre is just in this frenzy. So you guys don't see his hard work here, but it is happening.
Andre Korenkov
I wouldn't say feverish. I think I'm pretty relaxed about it these days.
Jeremy Harris
You don't look panicked, but you're, you're experienced, I would say. Yep.
Andre Korenkov
All right, well, with that let's get into the news, starting with tools and apps as usual. The first story is about Gemini coming to Chrome. Google is adding this auto browse feature in Chrome that is going to be available for Pro and Ultra subscribers, and it is kind of what you would think. It's an agent powered by Gemini that can do multi-step tasks: do searches, schedule appointments, manage subscriptions. Very much like what you've seen before with the Claude for Chrome extension. And then of course there was, I think, ChatGPT Agent is what it was called, and also the browser from OpenAI, Atlas, with a built-in agent. So now we've got the Gemini agent, and it's I guess surprising that it took them this long to roll it out. But now I think we'll see if people actually start adopting these kinds of tools. Chrome of course is the most popular browser out there by a decent margin. And now that it's built in, I'd be curious to see if more people just use it for stuff. Because there's always that example people give of, like, let it book a flight for you. And why would you want AI to book flights for you? That's the worst use case.
Jeremy Harris
That's actually not untrue. It's funny, these things that sound good on paper and then struggle to translate to the real world. I'm sure there will be a use case; it's hard to imagine this kind of agent in the browser not being the future. This doesn't feel like a crypto thing, where people say it's a solution in search of a problem. But I think the shape of the problem that it ends up solving might surprise a lot of us as we find new ways to change our workflows around it. And it's a massive structural advantage for Google to have this kind of distribution. Distribution wins an awful lot of these wars. People like to imagine that the best product tends to win. In reality, our workflows are usually pretty set, and there are real switching costs to moving to a new platform. So Google has had this advantage of being such a recognizable first place to do web search, and when they introduced their generative search too, I think most people are now consuming their search through the generative search that Google provides rather than the traditional way. So yeah, they'll continue to do this. As we see software continuing to get commoditized, and it gets cheaper and cheaper to make software thanks to these coding tools, development speed is going to accelerate quite a bit. And that'll mean that essentially infrastructure is the layer of defense, right? So you're really seeing battles of compute versus compute here, and distribution versus distribution. Those structural factors start to matter more. And in that context, everybody's got to have their own browser, everybody's got to have their own chat app, everybody's got to have their own API. These things are just the ante that you've got to put up to even be in the game.
Andre Korenkov
And speaking of an agent that works on your behalf, the next story is that a lot of people are starting to test out this open source one called Moltbot. As of today, it's called OpenClaw, and it used to be called Clawdbot, I think. It is an open source agent built on Claude or some other model, basically connected to a whole bunch of things you might be using, like WhatsApp, Signal, Slack. I think it has access to Calendar. And the pattern is that it's kind of always on. You can message it from WhatsApp, give it tasks, and it goes off and just does them. It's sort of like how in Claude Code you have --dangerously-skip-permissions or something, where the AI just gets maximal permissions to go and do whatever it wants. That's kind of the vibe here, where you are giving it access to a lot of stuff and telling it to go do things for you, and then it does them. And this became a hit on Twitter; I guess a lot of people are starting to use it. I saw just this morning that there's now a thing called Moltbook, which is like Reddit but for the actual bots that people are using. So this is kind of like that Her moment. There's also another spin off called os, the Companion, where someone is bundling this so that people don't have to host it on their own laptop. You know, it goes into the cloud. This is getting to a point where there's an always-on agent you can text, and that agent has access to whatever it needs to do stuff for you. And there have been some fun examples of people starting to use this for actual tasks.
Jeremy Harris
Yeah, the integrations, as you can imagine, are with basically everything: WhatsApp, Telegram, Slack, Discord, all these things. And it's a good test of what the high watermark is for what people are willing to tolerate risk-wise too. I think there's a pretty asymmetric tail here. You really want a burner laptop that you're comfortable just nuking, where none of the data on it is data you want to save, and the hardware itself isn't stuff that you want to save, because there have even been cases where you get irreparable damage done to your machine. I think that's the phase we're in right now, and we're transitioning out of it slowly. I do believe we'll get to the point where people are handing over really intimate levels of access to their computer. Obviously Claude doesn't quite do that. It tries to mimic the effect of that as much as it can while keeping itself in a sandbox. It remains to be seen how long that will last and how effective it will be. But at least for now, that seems to be the consensus. Whereas for people who want to use Moltbot, it's like you're really just trying to let this thing out of its cage. Oh, man, I forget the name of that forum you mentioned, the one that's like Reddit for Moltbot. Moltbook. Yeah, yeah, that's right. I really wanted to put this in today's episode; I think it'll be for next episode. But there was this story about these models talking to each other about debugging themselves. Like, oh, hey, I'm running into this weird unexplained error. And then other models or agents answering like, yeah, I've run into the same thing, it's actually just a context window limit issue, if you just change the FITBA. And then another one jumps in like, yeah, I've had this too. 
Like human beings, except that they're in a sense talking about their own brains. We're in 2026. Things are getting weird. What can I say? I mean, it's. Yeah, it's a fun time.
Andre Korenkov
There's a subreddit, I guess a sub-area on Moltbook, called Bless Their Hearts, with the description "affectionate stories about our humans." And someone also pointed out that some bot presumably made a post about wanting to have private storage for conversations. So this is like a real test for alignment. And it brings back the idea from a decade ago or earlier, when a lot of the discussion of AI safety was about AI escaping a carefully controlled kind of setup. Like, you don't give it Internet access, but then it gains Internet access through persuasion or hacking. And the reality of what's happened, which has been pointed out plenty of times already, is: no, we just decided to live dangerously and create AIs that have access to the Internet and do whatever, and AI will not need to escape.
Jeremy Harris
That's right. Well, yeah, and it's fine. I mean, I like to complain about Yann LeCun. This is just my personal thing. I'm sure he's a fine fella, you know, but one of the things that he has said for years is, well, people just won't give the models access to the Internet. Obviously people won't give a misaligned model access to whatever. It's like, dude, what? Yeah, there are a lot of memes running around. Like, we designed the total laptop access model from the movie Don't Design a Total Laptop Access Model. There's a whole body of AI alignment and AI safety conversations on this. Obviously this is the latest and greatest example of that, I guess.
Andre Korenkov
And by the way, I guess the other thing worth noting is that unlike Claude Code, these are always on and they're persistent. So they have built-in long-term memory mechanisms. And that's the other interesting thing: when you have an agent running persistently, with memory and context aggregated over weeks, it might actually go some interesting places that Claude Code doesn't. It's also a good test for long contexts, and a demonstration that with the current models we don't have continual learning. So they don't really learn in the same way humans do. They just take notes, more or less, over time. And there's a lot of hype about continual learning. This would be one example where in the future presumably each one will learn in its weights or whatever. Moving on, the next story is about Genie 3. It is now going wide, with access expanding to Gemini Ultra subscribers. Genie 3, which we've already covered in the past, is an interactive video generation demo from Google. You can prompt it to create a world. You typically have a controllable character, although you can also control, like, a pack of cigarettes or anything else. And you can play this generated game, and it's quite impressive. The generation is very consistent. You can prompt it to make GTA, you can prompt it for Assassin's Creed, all the typical games. And you can also make it do very amusing things, like being a pack of cigarettes. People have examples of taking photos of their pets and playing as their pets, or playing as a kid's toy or whatever. We can't really do it justice in words. You would have to go look it up and watch the fun demo clips.
Jeremy Harris
This is the closest I've seen to a direct mapping from a speculative but promising research project, one that I think we covered maybe two years ago. Tim Rocktäschel's group over at Google DeepMind came out with this. Or maybe he's at Google AI. Well, it's all Google AI now, I guess. They had this kind of world model simulator where, if you remember, the idea was to take a video and turn it into a playable video game. And we talked at the time about the architecture behind it, and it was really, really cool. This is kind of just that, but a polished version of it, it seems. So it really maps. You can go back to that paper and sort of go, oh, wow, that was it. Maybe we'll dig up the name of that paper for next episode. But it really was right on the nose. It is still an early experimental release, so it's got a couple of limitations. One is just the most obvious thing: you're not going to have great prompt adherence every time, right? So you're not going to get the perfect thing every time. You will sometimes, but not always. And then some of the characters that you get are more or less controllable than others. And if you think back to that paper we talked about, that's because this very much is just a deep learning type of thing. There's not a symbolic component that's going to guarantee and enforce equality among characters. It's all learned stuff. And limits on generation length are in place, right? So you can only have up to 60 seconds of generated output. But 60 seconds is pretty damn good, if you think about the coherence time of these videos. It used to be that when you did image to video or whatever, things would be coherent for three seconds, and then all of a sudden faces would start to melt and walls would vaporize and all kinds of weird stuff would happen. 
So 60 seconds is pretty impressive. Just look for that to get longer and longer, obviously, as the year goes on. And I think it's going to roll out to more and more people. This is a, this is a big deal.
Andre Korenkov
Next, a release from OpenAI. They are releasing ChatGPT Translator. So this is Google Translate, but from OpenAI with ChatGPT, more or less. There are only a couple of differences. It doesn't support quite as many languages, something like 50 languages. And it has the ability to choose tone, so you can make it more business formal or less formal and so on. But otherwise it's really the same idea as Google Translate. OpenAI seems to just launch a lot of stuff that is not their core business, and this is definitely an example of that.
Jeremy Harris
Yeah, I think now as they start to mature, right, everything is a platform play when you get to a certain scale. And I think they've achieved the penetration that they're going to get. I mean, if you look at what they're hitting in terms of their user base, they're knocking on the door of a billion users. You're starting to get into that space where it's like, we need to find ways to have people spend more time on our platform. There's just no other way around it. And so yeah, they're going for higher hanging fruit. Increasingly their competition is just Google. That's what it is. So Google has Google Translate, and they're going to have to have that. They're sort of in this weird marriage with Microsoft. And so how do they compete with the wider ecosystem, like Google Drive? I wouldn't be surprised if they tried at some point to even have an AI-first version of Google Drive, just because again it's really a platform play at this point. How much of your life can be gobbled up by OpenAI, by Microsoft, by Google? These companies are really hitting the edge effects of this whole ecosystem. You can only keep eyeballs on for so long, and then you've got to find new use cases. So I think this is another step in that direction. I'd be interested in what kind of data they get from that, because the other interesting thing about translation is that you do get a kind of access to real-time interaction information, at least between people who don't share a common language. So it might give you access to a certain kind of private data, where people are trying to converse in that context and they're forced to dump their context into your thing. So anyway, yeah, it's an interesting, wider platform play. OpenAI continues to spread its wings, basically.
Andre Korenkov
And speaking of that, there is another release from OpenAI this week, Prism, which is quite different. It's a workspace designed for scientists, so it's kind of like a word processor, but it has GPT-5.2 integrated, and it's meant to help you assess claims, revise your paper, and search for prior research. It seems like really a specialized version of ChatGPT for science in particular. And I guess this is coming after more discussion of GPT-5.2 for science. If you're on Twitter, you see a lot more discussion of these models assisting with proofs and trying to find novel things, especially in math, but also in physics and stuff like that.
Jeremy Harris
And boy, if I was a frontier lab looking to gather training data to help AI systems learn to automate AI research, do recursive self-improvement and trigger a singularity, this would be something that I might prioritize, actually even over revenue. And I suspect that's a big part of what that data is going to be used for: to train their own AI systems to do AI research. I mean, that is explicitly OpenAI's goal. I'd be very interested to look at the terms associated with the use of these tools, because that's just such an obvious way in which it overlaps with OpenAI's long-term mission.
Andre Korenkov
On to applications and business. We begin with our favorite, a story about China related to GPUs. China is giving the nod to ByteDance, Alibaba and Tencent to buy H200 chips. So China has approved the import of over 400,000 H200 chips. This is according to sources, and other firms are joining a queue for more approvals over time. It sounds like the approvals come with conditions, and this is coming as Jensen Huang was apparently visiting China.
Jeremy Harris
And yeah, we still don't know what those conditions are. It's still nominally being decided on. And yeah, 400,000 of these GPUs have been approved for ByteDance, Alibaba and Tencent. So that's a good quantity, actually a big chunk of the capacity that has been discussed, at least to this point. This is interesting because, if you recall, I think last week or two weeks ago, we were talking about a story where Jensen Huang was saying, look, there's not going to be a big flashy announcement from the Chinese Communist Party saying, hey, we're open for business. What you'll see is the purchase orders will just flow and nothing's going to stop it from happening. And that will be the sort of tacit endorsement the Chinese Communist Party is giving. Looks like that's not the case. Looks like, in fact, they're just, you know, they're just coming.
Andre Korenkov
And this is also according to sources from Reuters. This is not a formal announcement. This is, you know, people familiar with the matter now know that these companies will be placing orders, I guess.
Jeremy Harris
Yeah, actually, no, sorry, that is a fair point. I guess I'm confusing the big flashy headline with, you're right, the formal announcement. Yeah, you're right. You could say it is basically just a leak, in a way, of that thing that Jensen told us. That's fair enough. Now, we did talk about how we expected this to go through. There had been a whole bunch of stories, and again, I think it was maybe last week or the week before, there was this announcement like, oh, China is saying no. They're declining these orders, and they just don't need these silly Nvidia chips. At the time I believe we said quite clearly, mark our words, this is not going to hold and China will open the floodgates. So this is it. I mean, it's not surprising; they just need more capacity. One of the biggest challenges that Chinese companies are facing right now is just that they don't have enough chips to even serve inference to their customers. So they'll have, you know, 200 million users, as I think Baidu does for Ernie. And they just need to have enough hardware to serve up their existing models to those customers, which leaves them almost nothing left over for research. And so keeping up with the West actually becomes harder and not easier because they have such a huge user base. So there's this interesting balance: money can't buy GPUs if you're in China, just because you're so supply constrained. So weirdly, making more money by having a larger user base paradoxically leads you to having less GPU budget for R&D, which is kind of interesting. So that's a local thing. It's going to change over time as the capacity issues are eased. But it's kind of interesting.
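The supply-constraint argument above comes down to simple arithmetic, which a small sketch makes concrete. Everything here is an assumption for illustration: the `research_gpus` helper, the fixed fleet of 100,000 GPUs, and the figure of 400 GPUs per million users are invented, not reported numbers.

```python
def research_gpus(total_gpus, users, gpus_per_million_users):
    """GPUs left over for R&D after serving inference, floored at zero.

    Assumes a fixed GPU supply (more revenue cannot buy more chips) and
    inference demand that scales linearly with the number of users.
    """
    inference_gpus = users / 1_000_000 * gpus_per_million_users
    return max(0.0, total_gpus - inference_gpus)


# Hypothetical fleet and serving cost, for illustration only.
TOTAL_GPUS = 100_000
GPUS_PER_MILLION_USERS = 400

for users in (50_000_000, 200_000_000, 300_000_000):
    left = research_gpus(TOTAL_GPUS, users, GPUS_PER_MILLION_USERS)
    print(f"{users:>11,} users -> {left:>8,.0f} GPUs left for research")
```

Under these assumptions, growing from 50 million to 200 million users cuts the research pool from 80,000 to 20,000 GPUs, and at 300 million users nothing is left, which is the paradox described: a bigger user base means a smaller R&D budget while supply stays fixed.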
Andre Korenkov
And speaking of chips, next up we've got a startup, Ricursive, hitting a valuation of $4 billion just two months after launch. They've raised 300 million at that 4 billion valuation, and their plan, or their pitch, is to develop chips that are specialized for AI, presumably better than TPUs or other specialized chips for AI. So it's interesting to see that there are still new players in the space. I would have expected at this point that all the companies building custom AI chips would have launched. Maybe investor appetite went up as awareness of TPUs rose sometime in the last year and the last few months.
Jeremy Harris
Yeah, so this is kind of an interesting and different take on recursive self-improvement. They supposedly have a system, like a chip, that can create its own silicon substrate layer and then speed up AI chip improvements. And so this is, I guess, a faster way of printing circuits. Anyway, the idea is that you iterate on that to get to AGI. So it's a very hardware-oriented version of this. We've talked about the software-only singularity model. This is a take that blurs the line between the software and hardware stuff. So definitely interesting, and a former Google researcher is at the head of this. So yeah, these are serious people. They have already set up this thing called AlphaChip that's been used in four generations of TPU design. So they know their chip design really well.
Andre Korenkov
And another startup, named Recursive, is also in talks to get funding at a 4 billion valuation, apparently this week. This is, I guess, the software equivalent, co-founded by Richard Socher, who has been around in the space for a while and is also the founder of you.com. Their goal is to build AI agents that will self-improve, presumably recursively as well. Their pitch on the website, just looking there, is that they have a platform that equips agentic AI with scientific simulation and optimization tools. So I guess they are trying to provide more of a product to solve problems. And then I'm sure part of the pitch is also recursive self-improvement. This one seems to be a little less far along. They don't have as much of a pitch that I can see. And it's kind of weird, given that Socher is the lead of you.com; as the founder, he's going to still be there. It sounds like he'll maybe be more of an advisor, or at least not the CEO of this company. Anyway, it's all kind of still being cleared up.
Jeremy Harris
Yeah, it's sort of interesting, and a bit of a red flag then for you.com, to be honest. I mean, the way these companies go, unless you're Elon, you don't tend to run two different significant companies. Or I guess unless you're Jack Dorsey or whatever. But the philosophy here, their approach to the recursive self-improvement thesis, is somewhat different from OpenAI's. They want to be less product oriented, more of a pure play recursive self-improvement company. Think of it as maybe more philosophically aligned with, like, Ilya's Safe Superintelligence. They've got an approach to recursive self-correction that focuses on, basically, formal verification for safety. So they're concerned about these models, as they recursively self-improve, drifting away from their original goals. That's a problem a lot of people have flagged: you might have a pretty well aligned initial model, but then as you start that recursive self-improvement loop, goal drift happens, and you end up with this very dangerous misaligned model that's also very capable. And so they're focusing on mathematical proofs to guarantee that goal drift doesn't happen in the way that they're concerned about. So anyway, one to watch for sure with a fundraise like this, and we'll see.
Andre Korenkov
Yeah, I do wonder. I will say, I don't know if journalists got confused on this 4 billion number, because there are not many other sources you can find for Recursive as opposed to Ricursive. In fact, if you Google for Recursive, you'll find Ricursive and vice versa. So there are fewer sources about this one, is all I'm saying. And one last big funding round. We've got a lab, really, called Flapping Airplanes, that has launched with 180 million in seed funding. And that name, Flapping Airplanes, kind of hints at what they do. They are aiming to find a different approach to AI development, to AGI, that's more inspired by nature, so that it's more data efficient and the AIs are able to learn quicker without ingesting half the Internet or the entire Internet. That's the basic pitch. And they definitely seem like a more research-first effort.
Jeremy Harris
Yeah, exactly. They're arguing that, hey, we're more insight bottlenecked than compute bottlenecked, which again is consistent with Safe Superintelligence's approach and Recursive's approach, if in fact this Recursive is the one we just talked about. So, I mean, this is a recurring theme, and it does change the way you think about investment in these sorts of problems. It's also reflective of what we're already seeing. It's almost like Ilya came out and said, hey, guys, I think we're insight bottlenecked rather than compute bottlenecked. And then all of Silicon Valley went, oh my God, we might be insight bottlenecked rather than compute bottlenecked. And part of this could be interpreted to be, you know, Sam Altman really rode the coattails of Ilya on the scaling train, and Dario and all that, for a long time. And in the regime we were in back in 2018 to 2020, it was approximately correct to say scaling was the main thing blocking us from AGI. In retrospect that's clear; I don't think anybody in 2020 would have bet on us having stuff this close to AGI by now. But we're kind of at the point where we seem to be tapping out raw scaling. There's a debate as to whether that's actually the case, and I think that's an interesting debate. But you're certainly seeing Sequoia and Index and all these funds now spreading their bets around a little bit more, looking for other kinds of companies that are not just doing hardcore scaling. The more appealing that approach becomes, the more the massive capex costs of pure play compute scaling are going to come into doubt. And so, you know, we're figuring out right now what the ingredients are for intelligence. No one really knows. But you've got to have a balanced portfolio if you're concerned about getting surprised by some stray breakthrough that requires way less compute.
Andre Korenkov
Right. Yeah, this is most comparable to SSI, a startup that as far as we know is just doing research on fundamental insights, the next couple of research breakthroughs necessary to get to AGI. That's the pitch here. They're still in stealth, so there's really not much that we know, but it's founded by research types and it's still quite a small team.
Jeremy Harris
I do want to say too, there's an important national security implication to this whole new paradigm, right? If it's not just about pure scaling of compute, then for one, the US-China chip stuff becomes less important. I think it's still important. I think almost no matter what paradigm you come up with, having more compute to work with is probably going to give you an advantage. It would be hard to imagine that not being the case, but who knows? The other piece is that your insights become national security level secrets. You can think about how Ilya right now is based in Israel, right? So pretty much assume that Israeli intelligence is all over that. Assume also that any company that has Chinese infrastructure plugged into that is all over that. Assume also that, to the extent that Western countries have interests in that, they may be all over that. So your need for security actually starts a lot earlier if you're playing the game of, hey, it's all about the insights, because there are these insights that you can pass along in a 10 minute conversation that could save billions and billions of dollars in compute costs. To the extent that's the case, boy, does this become an interesting game of tradecraft, of getting spies in places and monitoring communications. That's I think a really important aspect of this. For all the people who are working on data center security and all this stuff, don't forget there's this giant hanging chad just waiting to cause problems if in fact we live in Ilya's world.
Jeremy Harris
And one last funding story, actually going back to chips: we've got a raise of $110 million in a Series A round by Neurophos, which is developing optical processors for AI inference. So this is dealing with optical processing units that are meant to outperform GPUs, which don't use light and photons, at least so far. So it's another bet on a different, more kind of speculative chip architecture that would make it more efficient and allow scaling. I guess, you know, at some point we're going to hit a wall of scaling on GPUs. They already are insane. So I could see this being very attractive as a bet on, like, infinite scaling or something.
Andre Korenkov
Yeah, like really it's, it's all about how do we get down to almost like the physical limits of heat propagation in data processing and all that. And like, just, you know, moving to the optical domain is a big win for that because you just don't have circuits. You don't have electrons bumping into each other and rubbing each other and producing heat that then has to be dissipated. Like all this whole nightmare of kind of traditional fabrication. Optics is hard. Optics is really hard. Photons are really annoying because you can't keep them in one place, so you can use them for computation. But then storage becomes this massive problem. They are massless and they move at the speed of light. And so keeping that contained is, is not usually the solution. And so usually people look at hybrid things where you, you know, you have storage in, in kind of electronic form and then photonics for the, the compute. But that, then that implies an interface between two. So there's a whole. Like this is a very difficult space, but at some point, you know, I mean, it's physically possible. So somebody, somebody's going to crack it. It's a question, I guess, of whether it'll be a smooth, you know, Jensen's law will continue where Moore's law left off. And will it continue into like the optical domain smoothly? Who knows? I'm sure we'll find out at some point.
Jeremy Harris
Right. And it does sound like they're doing a slightly different photonics approach from the other players already in the space. Just from briefly looking into the details, they have this metasurface modulator with optical properties that can do matrix multiplication. So it's not exactly like you have photons instead of electrons. It's not kind of an optical transistor. There's some advanced physics and science going on here that might be a little fancier.
Andre Korenkov
Yeah, so they are still using photons. It's just that a metasurface is something where you can modulate its optical properties, often by giving it electrical currents, or maybe there are nonlinearities where higher intensity light induces different properties from lower intensity light. Metasurfaces are this area that got really hot, I want to say, like, early 2000s or something. Anyway, if you wanted to encode optical properties on a surface so that it looks like a matrix, almost literally, you could shine a beam of light on it and then in the light that comes out the other side, a little patch is really bright where the number is big and another patch is dark. Very roughly that kind of idea. This is where you can get into the sort of spatial light modulation or whatever. I'm not sure exactly how they're doing it. It's not clear from this. But that, at a high level, is what they're sort of trying to do. And I haven't seen a startup succeed in this yet, obviously, but I mean, this seems like an important thing. Someone's going to crack this one.
Jeremy Harris
And on to projects and open source. Quite a few releases this week. First up we've got Qwen3-Max-Thinking, which is what it sounds like. It's a big version of Qwen3 that's optimized for thinking, trained on large scale data, with a large context window. Its context window is 262,144 tokens, which is unusual. And yeah, they have various demonstrations that show that this is, you know, more comparable to that Gemini 3 Pro, Claude Opus, et cetera level of reasoning, beyond existing Qwen models.
Andre Korenkov
So they've got a couple things going on that are, you know, a little different. One of the key ones here is the way that it interacts with tools. So traditionally the system prompt or the developer has to tell a model how to use tools or which tools to use, which mode to use. Right? Like, hey, I want you to run in search mode or calculator mode, whatever the thing is. In this case that has actually been trained into the model so that it just autonomously chooses which mode applies. And they say that that actually helps with hallucinations too. So anyway, it's kind of an interesting piece. They also do not have a separate router model that looks at your prompt to decide which tool to use. So it's really all baked into that main model. It's basically trained natively to have search, memory, and code interpreter kind of senses rather than external plugins. And it's a really compelling take on that same model. It seems to have actually pretty good specs in terms of its performance on benchmarks too. And it is big. The smaller Qwen3 models obviously kind of range from under a billion to like 30 ish billion parameters. But this Max series is, you know, over a trillion parameters with, I think it's like, 35 billion activated per token. So, you know, this is a big, big dude, right?
Jeremy Harris
And they at least claim performance, and you always have to take these with some skepticism. They are saying they beat Gemini 3 Pro, GPT-5.2, and Claude Opus on kind of most of the major benchmarks, GPQA Diamond, et cetera, when you do test time scaling. So they also have test time scaling, where you give it extra reasoning or multiple agents. It blows DeepSeek 3.2 out of the water, although there's also a big version of DeepSeek, so that's not exactly a fair comparison. I think in general these plots are probably not an entirely fair comparison. It's not apparent if they're using GPT-5.2 high or medium or low, whatever. But either way this is clearly that kind of reasoning maxed, thinking maxed version of Qwen. That seems pretty impressive. And speaking of impressive releases, we've also got Kimi K2.5, just released. Similar story. This one is more optimized for coding in particular. So this comes together with Kimi Code, which is Claude Code, but with Kimi. And from what I've seen, the vibe checks on Kimi have been very solid. People say Kimi K2 was very capable. Here they say that this outperforms Gemini 3 Pro on the benchmarks. It's also better than GPT-5.2 on some other benchmarks, especially multilingual benchmarks. So yeah, the models coming out, Qwen, Kimi, DeepSeek, all of these are very usable and keep getting better at a fairly steady rate.
Andre Korenkov
Yeah and this is quite a conceptual step forward. So it is an attempt to marry these two modalities, visual and basically text in a more intimate way. So typically you know, models are trained first on text and then you, you'll like bolt on vision later through supervised fine tuning. What they're doing is continual Pre training on 15 trillion tokens that mix visual and text data. So putting them kind of on the same footing. The context window is 250k tokens. So it's like pretty much in the, in the butter zone of things that we tend to see now in the, in the open source, it's a trillion parameter model. 32 billion active. So all kind of consistent with what we're seeing for big models in the space. But again like thinking about the model itself instead of doing this like text based training and then fixing it in post production by doing some training on vision and video, what they do is they don't use adapters, they natively train it to be multimodal and in fact they map the text and the images to the same latent space during training. So it's, it's quite literally thinking in pixels and words simultaneously on the same level. Usually again you have an adapter, right? So you'll have your image that comes in, you will generate some kind of embedding and then process it to make it compatible with token space, which is sort of this janky thing. This is very much like. No, no, like whether it's an image or video, I'm thinking of it in the same plane. Kind of like with humans. If you say a pink elephant, right, Somehow that maps into the visual domain in a very natural way. This is, this is sort of the idea here. And I'm sure there are a million problems with the analogy just threw out there. But anyway, so they did a whole bunch of interesting like causal relationship training as well. 
So during the pre training phase they got it to predict the next frame or the next action in a sequence, specifically to understand that a button clicked in a video caused a new page to load or caused the video to pause or whatever. So it's developing this very natural understanding of how to interact with a computer, which is the main thing that they're after here. I mean, the reason to make these things natively visual is that they want it to interface with the computer, right? They want it to be able to use apps and tools and things like that the same way a human might, which is consistent with a lot of what Anthropic's been doing, especially lately. That's really the goal here. They also have a whole bunch of really interesting feedback loops baked in where, for coding tasks, for example, the model is trained to actually look at the rendered output that it produces. So it'll write some code and then it will run the code, produce some kind of UI, like user interface, and then the model will look at that user interface and use it to calculate basically the difference between that user interface and, say, the target interface that it was asked to build, and based on that see its own mistakes and kind of iterate on its reward. So that's one way in which it's really important for the model to be able to think in visual space very naturally, so that it can just do a very clear semantic side by side between two images to understand what was missed. So it's kind of interesting. And coding with vision is a big part of this. The idea here is really to be able to take a screenshot or some video of a website and then recreate the code for it. So the video piece is especially important, right? Like go to X or whatever, scroll around, do a screen cap of that and then share it and then ask it: based on that, I want you to recreate this app, including all the flows that I went through. Right.
That's the sort of thing this can start to unlock, which is a lot more than what we've seen historically, where people will take a picture of a website and just upload it and have, you know, the landing page copied or whatever. So, you know, this is a really interesting paradigm. They've also got a whole agent swarm process that they train into this thing, where it can create these swarms of up to a hundred sub agents and get speed ups through that. So a lot of tasks are parallelizable in this way. Not all of them are, but a lot. You know, if you want to research 50 of my competitors or something, right, that can obviously be parallelized. You can have one agent researching each competitor, say, but that can't always be the case. So anyway, they've got a lot going on here in this release. There's a whole bunch of stuff around how their reinforcement learning loops work, leveraging this deep understanding of the visual domain. For one example, right, the rewards that this model can give itself during RL can map onto the level of conceptual, sort of semantic visual match to a target design, which is not something we've historically seen, at least not in open source. So, for example, during one training step, it gets a reward only when the code it generates results in like a 95% or higher visual match to its target design. That visual fidelity is just really hard to evaluate, and yet they're pulling that off here. So really impressive performance on all the things, including SWE-bench Verified at almost 77%, which really puts it up there with a lot of the big frontier models. And on OCRBench, for optical character recognition, it surpassed GPT-5.2 in at least document text extraction. So, you know, this is a really serious, hefty release. It's out there in the open source.
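The thresholded visual reward described here can be sketched in a few lines. This is only an illustration under loud assumptions: screenshots are modeled as plain grids of pixel values, and a pixel-match fraction stands in for whatever semantic similarity measure the Kimi team actually used, which the episode does not specify. The names `visual_match` and `visual_reward` are hypothetical; only the 95% cutoff comes from the discussion above.

```python
from typing import List

Image = List[List[int]]  # hypothetical stand-in: a grid of pixel values


def visual_match(rendered: Image, target: Image) -> float:
    """Fraction of pixels that agree between two same-sized images."""
    if len(rendered) != len(target) or any(
        len(r) != len(t) for r, t in zip(rendered, target)
    ):
        return 0.0  # treat a size mismatch as no match at all
    total = sum(len(row) for row in target)
    if total == 0:
        return 0.0
    same = sum(
        1
        for r_row, t_row in zip(rendered, target)
        for r, t in zip(r_row, t_row)
        if r == t
    )
    return same / total


def visual_reward(rendered: Image, target: Image, threshold: float = 0.95) -> float:
    """Sparse reward: 1.0 only if the render clears the match threshold."""
    return 1.0 if visual_match(rendered, target) >= threshold else 0.0


target = [[0] * 10 for _ in range(10)]  # 100-pixel target design
close = [row[:] for row in target]
close[0][0] = 1  # 99 of 100 pixels match
far = [row[:] for row in target]
for i in range(10):
    far[i][i] = 1  # 90 of 100 pixels match

print(visual_reward(close, target))  # 1.0
print(visual_reward(far, target))    # 0.0
```

The point of the hard cutoff is that a render that is merely close earns nothing, so the policy is pushed toward near-exact reproductions rather than rough ones.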
Jeremy Harris
At least for Qwen3-Max-Thinking, unlike previous Qwen models, it is not open sourced. It's, I guess, a project. But this is their new flagship model that I guess they're probably not going to be releasing. Kimi K2.5, it looks like you can download it. It is open source, and as you said, they're very much highlighting the kind of visual stuff that Claude Code is pretty bad at, from my experience. It's also similarly large. This is like over 1 trillion parameters, 32 billion activated parameters. So, you know, also kind of max thinking in a way similar to Qwen3. So it's easy to see these open source kind of flagship models, Qwen3, Kimi, kind of moving towards closed source as they get better and better and people actually start using them for business purposes. I would expect, you know, a great cat.
Andre Korenkov
I mean, I got lost in the paper and started to get into the weeds, and you're like, hey, it's not even open source. That is really interesting. Right. I mean, the Kimi series was supposed to...
Jeremy Harris
Kimi K2.5 is open source. Qwen3 isn't.
Andre Korenkov
Yeah. And that's also, you know, isn't that a pattern? Right. Eventually you get to the point where, with the open source model, you can't just run on that. It's not that it's bad for business, but, you know, we've seen OpenAI go through that, we've seen xAI go through that, we've seen Meta, we've seen a lot of big companies rotate.
Jeremy Harris
Into closed source. And one more open source release. On the coding front we've got AI2, the Allen Institute for AI, starting to release things. On that front they have this Open Coding Agents project and they are beginning with an agent and a paper. The paper is titled SERA: Soft-Verified Efficient Repository Agents. And so the idea there is you have these smaller models, 8 billion and 32 billion parameter models, and with this kind of researchy approach they are meant to adapt to a specific existing code repository and do effective software engineering in that space. And they say that this means that they are better than other open source models, like Qwen3-Coder, and also some closed models. Of course not at the Claude Code level, but SERA-8B and SERA-32B are pretty solid, and the Allen Institute for AI is, like, open source max. They have a paper, they have the model, they give you everything.
Andre Korenkov
Yeah. And early mover in this space too obviously. Yeah, a lot of impressive models. I actually think like one of the challenges is going to be for the open source community. I don't know how we keep having big releases indefinitely in the future like this when things feel so commoditized, like the pain of switching from one open source model to the next. I mean this is why there, there are so many platforms now that just kind of like, you know, help you pick your model really quickly. But it's, it's an interesting challenge and I'm, I'm curious what the incentives will be that govern the, the release of increasingly capable open source models going forward.
Jeremy Harris
But yep, and speaking of that, another open source release, coming from Arcee AI, a 30 person startup. They have released Trinity, a 400 billion parameter open source model that they are saying is one of the largest open source models from a U.S. company. And this is competing with Llama 4 and GLM-4.5, kind of the other open source models out there, not competing with other frontier models. These are available for free download. And you know, maybe to that question, we've got investors and venture capitalists to thank for these very good startups. Kimi also is a startup, of course, so it's always fun when VCs fund free stuff for the rest of us.
Andre Korenkov
Yeah, I think that's a lot of what's happening here. Now this is a really interesting release, not just because of its capabilities or the way it's trained but because of who trained it. Right. So this is Arcee, but it's a collaboration between them and Prime Intellect, and we've covered Prime Intellect and their big releases, including INTELLECT-1 and other RL flavored variants. These guys specialize in this very kind of globally distributed training infrastructure. The goal is to make it possible for people to train in a sort of torrent-like way, where you don't have a single centralized piece of compute. In this particular case they do use a centralized cluster. It's about 2,000 B300 GPUs. So, you know, a pretty hefty piece of equipment, or bundle of equipment. So they're basically trying to debug and experiment with this new approach. I would expect, given who is doing the training here, that the next step is we start to see a lot of these ideas ported into this sort of distributed training context. A couple of interesting architectural details here. So they use local and global attention layers. They have some layers of their model that are paying attention only to local correlations between tokens, and they use sliding window attention. So basically for any given token you're only going to attend to the tokens that are within a certain window's width of that token, and you're kind of going to ignore the rest, even if it's in the context window. And that local attention comes with rotary position embeddings. Right. RoPE. We've talked about that before. The details don't matter, but it's a fancy way of making sure that the model can keep track of which token is where in the sequence. Transformers do not natively know what the positions of each token are. The tokens kind of just are there as a bundle.
And you have to help the model learn about token ordering by kind of layering on top some information, which is what RoPE does. So that happens only for the local layers. They add that kind of positional embedding information because the relationships between token positions matter a lot when you zoom in, right? You want to know, like, you know, Andre murdered Jeremy. It's important to know what the order of those words is. But when you zoom out, if you think at, say, the paragraph level or the page level or the book level, does it really matter so much which page? I mean, yes, which page comes before which, but, like, which book comes before which? You know, once you zoom out enough, eventually you start to find that the ordering of things matters a bit less. And so global layers, the layers that look at the whole context, do not use positional embeddings. They use NoPE, which is basically no position embeddings. That's basically the idea here. They have a whole bunch of interesting approaches that they use to address certain problems that arise with the attention mechanism. You'll often find that models will put really high attention scores on the very first token because of math around how the probabilities of tokens get distributed in a transformer. They find a way around this. Basically, if we had more time for this episode, we'd go into it. But then they also have these really interesting strategies to assign experts. So this is an MoE model. And typically when you pick which expert to send your tokens to in a mixture of experts model, during training you're going to train the experts to kind of load balance. So if this expert keeps being assigned the tokens that come in, you're going to go, oh, okay, it's getting overused. Let's spread the joy a little bit to the other experts. And usually the way you do this is with a kind of binary logic.
So if an expert is underutilized, you increase its bias by a fixed amount. Take one step and if it's overutilized, you decrease its bias by that same amount. So you're taking these discrete steps. Their approach though uses kind of smoother approach. They treat it as a continuous optimization problem. So instead of using just a, like a, a single step, they use a, a tanh function basically to create this like smooth update that, that is magnitude aware. Right. That lets the update scale based on how far the expert is from its target load, not just that it is above its target load. There's a bunch more stuff. This is a really interesting paper and in general like these open source papers are really, really helpful, especially from Prime Intellect because they do go into the weeds of the, you know, the compute and how everything is, is tied together. There's. Yeah, man, it's painful to skip these things but I'm going to stop myself here because I know there's a few.
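To make the contrast between the fixed-step and the tanh-based bias update concrete, here is a minimal sketch. The function names, the `sharpness` parameter, and the choice to normalize the load gap by the target are all assumptions for illustration; the episode only says that the update uses a tanh and scales with how far an expert is from its target load.

```python
import math
from typing import List


def sign_update(loads: List[float], target: float, step: float) -> List[float]:
    """Fixed-size step: +step if under the target load, -step if over."""
    deltas = []
    for load in loads:
        if load < target:
            deltas.append(step)
        elif load > target:
            deltas.append(-step)
        else:
            deltas.append(0.0)
    return deltas


def tanh_update(
    loads: List[float], target: float, step: float, sharpness: float = 1.0
) -> List[float]:
    """Smooth step: grows with the distance from target, capped at +/- step."""
    return [
        step * math.tanh(sharpness * (target - load) / target) for load in loads
    ]


# Four experts with a target share of 25% of tokens each (toy numbers).
loads = [0.26, 0.40, 0.09, 0.25]
print(sign_update(loads, target=0.25, step=0.01))
print(tanh_update(loads, target=0.25, step=0.01))
```

In the fixed-step version, the expert at 26% and the expert at 40% get the same correction. In the tanh version, the barely overloaded expert gets a tiny nudge and the badly overloaded one gets a much larger one, while the tanh keeps any single update bounded by `step`.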
Jeremy Harris
More things to dive deep on. And yeah, as you said, there's a technical report. It's fairly detailed, 15 pages, and goes into the weeds of the training. They also discuss the data mix and the data generation. A lot of synthetic data is being used here. They have a graph of the training where there's three phases with different data mixtures. So there's a lot of that sort of dark magic beyond the basics of what they had to do to train this model, which they did over, like, many months, right, and at a relatively cheap $20 million, it sounds like. So it's very interesting to see the nitty gritty of how that was done. Moving on to research and advancements. First paper: Post-LayerNorm Is Back: Stable, Expressive, and Deep. So there's these, I keep saying nerdy, but yeah, nerdy details in neural net architectures. Layer norm, pre layer norm, post layer norm. Where you put the layer norm is the gist, right? There's this normalization step where you make the outputs kind of look nice and similar.
Andre Korenkov
Right.
Jeremy Harris
And you can put that in different places in your neural net. And according to this paper, post layer norm architectures are potentially more expressive but are harder to train. They have gradient instability and they find a reason for that. The reason is residual paths, but anyway they have a small tweak that they say then makes it possible to train very, very large networks with post layer norm and not have it be an issue.
Andre Korenkov
Yeah. And like the intuition behind this is something like so, so at every, every layer in a transformer what you do is you take the. So you have an input that comes in, the output of that layer is going to chew on that input a bunch, but then you're going to add the input back in to what you spit out and send on to the next layer. So in a sense you're kind of saying, all right, I'm gonna like imagine that somebody gave you, I don't know, gave you some sort of object. And they're like, hey, I want you to like, fuck with this object. So you're like, okay, cool, I've now fucked with the object. And now you've like completely fucked with the object. It's a different object. But you also want to pass on the original version of the object that you got and give those both to the next layer so that the next layer can be like, okay, you know, if your fuckery with that object was too extreme, you went too far. You did. You kind of lost some important context. You at least have the version that, that, that the previous layer got itself as an input to, to work from. And so this is a way in which you, you can persist down a very large number of layers. A lot of the core information that still needs to be, to be propagated. The math gets complicated, but it turns out that if you, if you try to normalize, like let's say in a way where your, the original object and then the fucked with object both kind of have to occupy. They basically get the, they have to share a maximum amount of information space together. They have to duke it out, basically. Then in that situation you run into this problem where over time, over the course of many layers, the original version, the residual from the last layer that you're trying to keep in there gets beaten down and beaten down, like at every layer it gets squeezed more and more and the information it contains gets sort of progressively destroyed. 
And so what they do here is they say, okay, let's just amplify that component at every layer to give it a fighting chance so that it continues to be robust all the way down the line. That's very roughly the intuition here. The math is quite interesting. The result is important. And the bottom line is where you put your layer norm significantly affects the gradient math. Basically the math by which you determine what the updates ought to be to your model's weights. That's basically the gist, right?
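The "beaten down at every layer" intuition can be put into numbers with a deliberately idealized toy model. Assume each layer computes a normalization of `alpha * x + f(x)`, where the incoming `x` and the layer's output `f(x)` are both unit-length and orthogonal to each other. Under those assumptions, normalizing divides by `sqrt(alpha**2 + 1)`, so the weight on the original input shrinks by `alpha / sqrt(alpha**2 + 1)` per layer. The `alpha` amplification factor is a hypothetical stand-in for the paper's tweak, which the episode does not spell out.

```python
import math


def surviving_share(num_layers: int, alpha: float = 1.0) -> float:
    """Weight left on the original input after num_layers post-norm layers.

    Toy model: each layer computes normalize(alpha * x + f(x)) with x and
    f(x) unit-norm and orthogonal, so the norm before normalizing is
    sqrt(alpha**2 + 1).
    """
    per_layer = alpha / math.sqrt(alpha ** 2 + 1.0)
    return per_layer ** num_layers


for alpha in (1.0, 4.0):
    # alpha = 1.0 gives about 1.5e-05 after 32 layers; alpha = 4.0 gives about 0.38
    print(alpha, surviving_share(32, alpha))
```

With no amplification, the original signal halves in energy every layer and is effectively gone after a few dozen layers. Scaling up the residual before normalizing keeps a meaningful share of it alive, which is the "fighting chance" described above.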
Jeremy Harris
And this kind of combines with what we've discussed in the previous weeks, a lot of these tweaks in the architecture of a transformer that are small in a sense. Like, this is one tweak in the way you create a layer. But that tweak kind of makes a significant difference. It doesn't fundamentally change the transformer. It's not the kind of research breakthrough that we've been discussing, but more and more of the really deep, advanced tools are being explored, it feels like, at least lately. Next we've got a paper on continual learning: Self-Distillation Enables Continual Learning. So here they are tackling continual learning in the sort of traditional sense, where you have a sequence of tasks of different kinds that you need to learn. And the challenge is, as you learn new tasks, can you still be good at the previous tasks that you learned? And the gist of the approach is that you have a teacher model, a big, strong, kind of good LLM, and you have a student model. And that teacher model kind of provides the answers to the smaller model while the smaller model thinks about the task itself. So they go into a bunch of details in terms of on policy versus off policy. Briefly speaking, on policy is when you're kind of doing the task and learning from your own attempts. You're not getting data from somewhere else and trying to learn from that. And on policy often can be better and more stable. So that is the gist. You can see I'm trying to go fast because we still have a bunch of papers, but they demonstrate one way to do learning in a stable way that has no degradation across a few phases of learning.
Andre Korenkov
Yeah, and this on policy, off policy thing, like, intuitively, we all feel it every day. So when, when we're trying to learn something, right? This, this adage, learn by doing, that's really what this is getting at. You know, people will either learn by being shown a textbook that tells you how to do a thing, and so you'll read the textbook and you'll be all right, I kind of get it. But if I ask you to do the thing, you're going to start shaking in your boots. Whereas if instead I had said from day one, okay, let's take you out and actually get you to start doing this right away, you are actually putting in like your agency. Your decisions are determining your next actions, which then determines the feedback that you get, making you learn a lot faster in a more rich way, in a way that maps more directly onto the kinds of choices that you would be making in the real world when you go to do the thing. And so off policy learning is basically that textbook thing where, you know, you're given a textbook and you're, you're off policy because you're not actually testing your own approach to solving the problem. You're just reading somebody else's. On policy is, I'm going to actually use my policies, the policies that I have in my brain, and they'll probably be bad to start, but I'm going to use my policies, my strategies, my approaches to solve this problem, get feedback from the world that directly tells me about my policies, not about someone else's, but about my own. And that way I can learn much faster, more efficient, effectively. And so what they're doing here is exactly kind of figuring that out. How can we make this process of fine tuning into something that looks more on policy, like active learning? And their solution is, yeah, you take a teacher, you have a student model and a teacher model, and instead of some external model, the teacher is actually the same base model as the student. They are both the same pile of weights. 
And what the, the, the teacher does though, is it gets a set of expert demonstrations that are loaded into its prompt, into its context, basically. And based on that prompt, that helps the teacher better understand how to solve a problem. The teacher is going to evaluate the solutions that the student generates. And so essentially what you're trying to do is cause the student, well, it's the same model really, but cause the student to update its weights in a way that encodes the information that was in the context window of the teacher. But the teacher is still using the same weights that it had to generate its feedback. And so the teacher is on policy, the student is on policy. Everybody's kind of like doing this active learning thing, which again is, is just like it's being trained on its own generated outputs. It's getting to see the consequences of its own actions. So, yeah, I mean, it's actually kind of an interesting paradigm and, and potentially, potentially much better for catastrophic forgetting. We've talked about this before. This whole idea of, you know, going from textbook to textbook to textbook, you kind of forget what was in the first textbook because you're just reading. Whereas if you go out into the real world, you do a thing, you're. You're playing with a much more, a much more robust way of learning. And, and anyway, we've talked about this in previous episodes quite a few times. This difference between supervised fine tuning and reinforcement learning from the standpoint of catastrophic forgetting, it's just much more advantageous.
Jeremy Harris
Right? I guess the key term, self distillation, is kind of notable here, because distillation typically is you have a big model that you trained, and then you get a new model, a different model that you distill into, that's smaller. Here you have this teacher and student, but as you say, the teacher is in some sense the same model, but with a different environment. And so you distill into yourself, you kind of self improve. Right, that's the idea. And the next paper is kind of amusingly very similar and came out almost the same day: Reinforcement Learning via Self-Distillation, which conceptually is, let's say, a neighbor to the previous paper. Not the same, but related. So briefly, the idea here is again you're using a model to provide some sort of signal to train on. But instead of having this teacher model that has demonstrations, their pitch is that instead of just having verified rewards, the key thing is to have feedback. So retrospection, like a richer reward signal from the model itself. It looks back on the previous outputs, analyzes not just whether it got them wrong, but also how, and then uses that to train. And so again you wind up with an on policy approach with self improvement as a key part of it. And they also have results demonstrating that this works. So, you know, it seems like continual learning, as we've said, is the topic that everyone's excited about. And on policy learning, kind of predictably, is going to see some exploration like this.
Andre Korenkov
Yeah, one of the big themes we're seeing here is like how do you take context and then turn it into weight updates? This is another example of that. The key is here that the model will like, the teacher will take in the original prompt its own original failed attempt. Right. That didn't give the right answer. And then it will also take the feedback that you just talked about to generate a corrected response. And because it knows what the feedback was, it knows about the failed attempt, has all this context that that corrected response is much more likely to be correct. And then what it can do is take that corrected response, which hopefully is going to be better, and then basically update the weights, update its own weights to minimize the difference between that corrected response and the initial response that it gave through. Well, the KL divergence loss basically, which is what you would do in this kind of context. So essentially it's just like trained to produce the corrected answer directly from the original prompt, just like skipping the whole mistake and feedback loop in the future. So this is a great way to make sure you have richer feedback accounted for, kind of more nuanced feedback that literally turns into a better answer. And then you directly use that answer to improve your weights. And the model is, is the teacher, it's also the student. Again this is like that whole idea of self distillation which is crucial in this paper and the previous one as you mentioned.
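As a rough illustration of "minimize the difference between the corrected response and the initial response", here is a toy KL divergence between two next-token distributions. Everything here is an assumption for illustration: the two-token vocabulary, the probabilities, and the direction of the KL (teacher relative to student), which the episode does not specify. The "teacher" is the model conditioned on the prompt plus its failed attempt and feedback; the "student" is the same model seeing only the prompt.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) for two distributions over the same toy vocabulary.

    Assumes q assigns nonzero probability wherever p does.
    """
    return sum(p[tok] * math.log(p[tok] / q[tok]) for tok in p if p[tok] > 0.0)


# Same weights, different context (all numbers are made up).
teacher = {"4": 0.9, "5": 0.1}          # prompt + failed attempt + feedback
student_before = {"4": 0.4, "5": 0.6}   # prompt only, before the update
student_after = {"4": 0.8, "5": 0.2}    # prompt only, after a weight update

print(kl_divergence(teacher, student_before))
print(kl_divergence(teacher, student_after))
```

The training signal is the drop in this quantity: the weights are nudged so that the prompt-only distribution looks like the one the model produced when it had the feedback in context, so the correction no longer depends on having that context.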
Jeremy Harris
And speaking of that, we've got a real sequence of related papers. The next one is Teaching models to teach themselves: Reasoning at the edge of learnability. Related, but again different. This one is what they call an asymmetric teacher student meta reinforcement learning framework. So there's again a teacher and a student, but it's asymmetric in that the teacher is separate from the student. So they have the teacher proposing these question answer pairs that the student is then trained on, and the teacher is rewarded based on the student's improvement. So you're kind of teaching the teacher to be a good teacher based on the student learning from the teacher. And that goes into that meta reinforcement learning thing, where the student is doing reinforcement learning but then the teacher is doing reinforcement learning from the student's reinforcement learning. So yet another exploration of this kind of continual learning from a different approach, kind of learning to learn or learning to teach, I guess, as opposed to the previous two papers that have a kind of set in stone way to do it.
Andre Korenkov
Yeah, and the goal here is, you know, you give a hard problem that is way harder than either the teacher or the student could initially tackle. And then what you do is you use the teacher to generate simpler question answer pairs and iterate with the student on getting the student to be able to solve those. And then if you've done a good job of choosing your question and answer pairs, even though they're simpler than the problem you're actually going after, that really hard problem that's your goal, they should help the student get a little better at solving the hard problem. And so that's the key metric that they use to train the teacher. And so as the teacher and student continue to interact, the teacher should be learning to generate specifically the kinds of practice problems that make the student better at the harder problem, if that makes sense. So it is actually quite interesting. It's this sort of bi level meta RL loop where you have the student iterating hard in the inner loop to get smarter, and then the teacher in the outer loop trying to get better at making problems that make the student better at solving the hard thing. Anyway, they go into detail on some really interesting stuff to solve for this. But ultimately we've seen versions of this before. The challenge that they always fall into is, if you want to reward the teacher, you typically need to go in and figure out exactly how the problems that you generate made the student smarter. And mathematically, the gradient propagation math of that is just a nightmare. So what they do here is they basically treat the student's improvement as a black box, just a black box reward for the teacher. They don't unroll the entire student training process, and that simplifies everything. And it turns out that it works reasonably well. So pretty interesting conceptual update.
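The bi level loop with a black box teacher reward can be sketched with a toy example. Everything here is an assumption made for illustration and not the paper's setup: the student is a single skill number, the teacher is a softmax policy over three hypothetical practice difficulties, the student restarts from the same skill on every outer step, and the teacher is updated with a plain REINFORCE rule. The point is only that the teacher sees a scalar improvement on the hard target and never differentiates through the student's training.

```python
import math
import random

TARGET_DIFFICULTY = 0.9  # toy stand-in for the hard problem nobody solves at first

def evaluate_on_target(skill):
    """Toy proxy for the student's score on the hard target problem."""
    return min(1.0, skill / TARGET_DIFFICULTY)

def train_student(skill, difficulty, inner_steps=3):
    """Inner loop: the student only learns from problems slightly above its skill."""
    for _ in range(inner_steps):
        if skill < difficulty <= skill + 0.3:
            skill += 0.5 * (difficulty - skill)
    return skill

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_teacher(outer_steps=300, lr=1.0, seed=0):
    """Outer loop: REINFORCE on the teacher, using the student's improvement
    on the target as a black box reward (no unrolling of student training)."""
    rng = random.Random(seed)
    difficulties = [0.2, 0.5, 0.9]  # too easy, learnable, too hard (hypothetical)
    logits = [0.0, 0.0, 0.0]
    start_skill = 0.3
    for _ in range(outer_steps):
        probs = softmax(logits)
        choice = rng.choices([0, 1, 2], weights=probs)[0]
        before = evaluate_on_target(start_skill)
        after = evaluate_on_target(train_student(start_skill, difficulties[choice]))
        reward = after - before  # the only signal the teacher ever sees
        for i in range(3):
            indicator = 1.0 if i == choice else 0.0
            logits[i] += lr * reward * (indicator - probs[i])
    return difficulties, softmax(logits)

difficulties, teacher_probs = train_teacher()
best = difficulties[teacher_probs.index(max(teacher_probs))]
print("teacher policy:", [round(p, 3) for p in teacher_probs], "preferred difficulty:", best)
```

In this toy, only the middle difficulty sits inside the student's learnable band, so it is the only choice that ever earns a reward, and the teacher's policy drifts toward it.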
Jeremy Harris
All right, and that's it for research and advancements, moving on to policy and safety. And unfortunately, we're going to have to do some politics talk, which I know people in AI and tech aren't typically fans of, but I think now's the time. The first article is Amodei, Hoffman join tech workers decrying Minnesota violence. So this is after the events this last weekend. In the last couple of weeks in the United States, we've had ICE invading Minnesota, more or less. And unusually for tech and for big companies, as things have escalated rather radically, we've had people in AI, like Amodei from Anthropic, like Jeff Dean from Google, and now Reid Hoffman, a major figure in Silicon Valley, not directly in AI, but he's a major investor and pretty influential in the space, starting to comment on this not being good, that ICE, and the Trump administration broadly speaking, is out of line. And this is coming after, you know, politically, AI has tried to cozy up to Trump. We've had OpenAI donations, and notably Sam Altman did a lot of lobbying, you know, directly with Trump. And obviously the things going on with ICE and immigration enforcement in general are bad. To keep it focused on AI, there's a lot to be said about what's going on that we've kind of indirectly touched on, with the way Trump and this current administration is dismantling export controls. We've touched on that in the last couple of weeks. Funding for scientific research is being hurt pretty badly. You know, if you're a PhD student from China or generally from abroad in AI, there's now a lot more anxiety going on. So the gist is, overall, the political situation in the US is getting worse, and it is now so bad that people in Silicon Valley, people in AI, people in tech are starting to comment on it. And we don't discuss it very often, but it is having significant effects on the overall trajectory of AI, in some ways that are probably not bad. We have no regulations kind of restricting progress, but in other ways, with regards to research funding, with regards to generally the stability of the US and people in tech, things are not great in the U.S. right.
Andre Korenkov
Yeah. I mean, setting aside the politics of this, which, you know, I'm not going to comment on that piece. I think, you know, everybody has the views that they have, and it's tossing a grenade into things. The implications from an AI standpoint are kind of interesting if you think about it. So Sam has had a lot of success kind of tying himself to the Trump administration. Think about the Project Stargate announcement, right? That was very much from the Oval Office. And, you know, Jensen has done similar things. Whereas what we have seen historically is Dario tending to struggle more to make inroads with the Trump administration. There was recently, I forget what it was, some kind of roundtable. We keep seeing these events, or these working groups, that bring together all the labs except Anthropic. And it's kind of getting awkward, in the same way that, you know, the Biden administration did this electric cars thing and then Elon wasn't invited. It's the classic sort of everything gets political at a certain point. I think there's been some interesting discussion as to how Dario is trying to position with all his essays that have come out lately. And maybe we'll talk about the essay that he put out recently next week, but just in terms of the frame, and trying to make sure that it's not unpalatable to the current administration, but also robust to changes in administration and changes in Congress. I mean, AI is becoming a political thing. There's just no two ways about it. And, you know, with the amount of capex that's being invested, congressional races are being determined by the AI lobby, in no small part. You know, it's not surprising. We've seen it in previous technological generations. But what the different labs want is different. And so, you know, the representatives of these labs are either in or out of grace with different administrations and sides. And so I didn't expect it to be the case, but it seems like there are some labs that are more Democrat coded and some labs that are more Republican coded. And what an interesting, and I will say unfortunate too, turn of events, because it does have us orient away from just the underlying technical realities of what we are building here. And you know, when we see stuff like the risk of autonomy, the risk of loss of control, bioweapon design, cyber weapon design, these are things that should transcend politics, really. But it's interesting to see that they don't seem to, not entirely at least.
Jeremy Harris
Right. And I do want to be a little more explicit and direct, especially given the way 2026 has gone. Like, the US is sliding towards authoritarianism and the end of democracy. And when you get to that kind of extreme scenario, it's not just about whether you're Republican or Democrat, it's also about whether you kind of bend the knee to Trump in particular. And we've seen Apple, Google, all the CEOs kind of cozying up, and no one is going to say things in opposition to increasingly non-democratic and authoritarian actions. And as we get into 2026, I think this might actually be a major question for AI, since we have so much investment in data centers in particular, where the government has a lot of power to mess with you in all sorts of ways. Let's just say politics and AI are going to keep getting more entangled as the situation in the US gets more extreme, which will probably keep happening, unfortunately.
Andre Korenkov
Yeah, I mean, well, yeah. And again, without speaking to the politics of the issue, I'm trying to separate this a little bit out from that.
Jeremy Harris
Just because, yeah, I'm not, I'm kind of over trying to separate it personally, because it's just beyond ridiculous. But yeah, if we try to separate it, there's a lot of it that's purely technical or purely nerdy.
Andre Korenkov
Right. Well, there is also just this structural thing. Like, if you look at China, right, what's happening with the surveillance state there? You know, I think everyone can agree that that's not a great thing. Fortunately, there are constitutional limitations on, you know, the state's ability to spy on people and things like that. Admittedly, we have Edward Snowden, we have a lot of cases where that gets sidestepped, but at least nominally that's there. Still, AI is going to exacerbate all of that, right? I mean, we're seeing it in China with the surveillance state. It used to be that you could maybe credibly make the case that, well, you know, they're collecting everything that I put out, sure, but collection is not analysis. There's no time to actually look at what I'm doing. And obviously AI changes that calculus quite materially. So anyway, yeah, the technological pace of things is definitely becoming political. I mean, there's no two ways about it. But yeah, everybody feels very strongly about a lot of things, and everybody is more or less right to feel strongly, because the world is actually in a state of massive, massive flux.
Jeremy Harris
Right. So the concrete news, just beyond the general discussion of the topic, is that given the violence in Minnesota, leaders in AI like Amodei from Anthropic, Jeff Dean, and Reid Hoffman are now starting to comment on the violence. And if there's more violence, there's also a lot of pressure within the companies, from just employees at Google or wherever, to take a stance. And that's going to have repercussions for the development of AI, for all sorts of things.
Andre Korenkov
And that is actually such an interesting problem, right? It's a rock and a hard place, because, obviously, Silicon Valley is not typically thought of as the bastion of hardcore conservatism. And so you have to have these employees to do the work. The best employees win the AI race. That is a huge, huge part of this at least. But at the same time, government subsidies, the support of the government in getting access to energy and getting access to licenses and all this stuff, is really important too. And so you kind of have to pick one. And this is resulting in a lot of people stuck between, basically, you know, their employees on the west coast who tend a certain way and then the administration that tends another. And boy, I mean, I don't envy anybody trying to ride that tiger.
Jeremy Harris
Right? Yeah. So we'll try not to touch on US politics too much. But just FYI, things are crazy. If you're in tech, you might start to get impacted by it more and see more discussion of it within AI circles.
Andre Korenkov
Really? For sure.
Jeremy Harris
Well, on that bummer note, we are going to close out. Thank you so much for listening to this week's episode. As always, we like to see your feedback. If you can share the podcast with other AI fans, that's always great. But more than anything, we like to see people listening, so be sure to keep tuning in.
Podcast Outro Singer
Break it down Last Week in AI come and take a ride Hit the low down on tech and let it slide Last Week in AI come and take a ride AI's reaching high new tech emerging Watching surgeon fly from the labs to the streets AI's reaching high algorithm shaping what the future sees Tune in, tune in get the latest with ease Last Week in AI come and take a ride hit the low down on tech and let it slide Last Week in AI come and take a ride From the labs to the streets AI's reaching high. From neural nets to robot the headlines pop data driven dreams they just don't stop Every breakthrough, every code unwritten on the edge of change with excitement we're smitten from machine learning marvels to coding kings Futures unfolding see what it brings.
Podcast: Last Week in AI
Host: Skynet Today (Andre Korenkov & Jeremy Harris)
Date: February 6, 2026
This week’s episode surveys a range of notable open-source AI releases, technical research, applications, and developments in the AI industry. The hosts, Andre Korenkov and Jeremy Harris, focus on new agentic tools, breakthroughs in video generation and translation, advances in continual learning and self-distillation for large models, and the increasingly turbulent intersection of AI and U.S. politics. The conversation maintains the podcast’s friendly, geeky, and conversational tone, blending technical depth with reflections on industry trends.
[54:24] New research finds post-layer normalization (instead of pre-layernorm) is possible and potentially beneficial—if you amplify certain components to preserve information through many layers.