Francois Chollet
I think we're probably looking at AGI around 2030, around the time that we're going to be releasing maybe ARC 6 or ARC 7. You're not going to stop AI progress; I think it's too late for that. So the next question is: okay, AI progress is here, and it's actually going to keep accelerating. How do you make use of it? How do you leverage it? How do you ride the wave? That's the question to ask.
Interviewer 2
We're joined by Francois Chollet, founder of the ARC Prize, a global competition to solve the ARC-AGI benchmark. His latest project is Ndea, a lab exploring a new paradigm in frontier AI research. Francois is one of the best people in the world to help us understand the current AI moment and where all of this is going. Francois, thank you so much for joining us today, and congrats on the launch of ARC-AGI-3.
Francois Chollet
Thanks so much for having me. I'm super excited to be here. Super exciting time to talk about AI.
Interviewer 3
So, Francois, tell us a little bit about Ndea. What exactly is it, and what are you trying to achieve?
Francois Chollet
Right. So Ndea is this new AGI research lab, and we are trying some very different ideas. Our goal is basically to build this new branch of machine learning that will be much closer to optimal, unlike deep learning.
Interviewer 2
All of us right now are sort of taken by what's going on with code. I'm having this viral moment right now where I got to 40,000 stars this morning on GStack. So it's like, oh, this open-source project is now one of the biggest ones, and I have more than 100 PRs from contributors to deal with. I guess you're one of the best people to talk to about this, because you're literally coming up with a totally different pathway.
Francois Chollet
That's right. So what we're doing at Ndea is program synthesis research. When I talk about program synthesis, people often ask me, oh, so are you doing codegen? Are you building an alternative to coding agents? And it's actually not at all what we are doing. We are working at a much, much lower level than that. We are trying to build a new branch of machine learning, an alternative to deep learning itself, rather than coding agents. Coding agents are this very high-level, last-layer piece of the stack, and we're trying to rebuild the whole stack on different foundations. So we're building a new learning substrate that's very different from parametric learning, from deep learning. If you go back to the problem of machine learning, you have some input data and some target data, and you're trying to find a function that will map the inputs to the targets and that will hopefully generalize to new inputs. If you're doing deep learning, you have this parametric curve that serves as your function, as your model, and you're trying to fit the parameters of the curve via gradient descent. We are doing basically the same thing, except we are replacing the parametric curve with a symbolic model that is meant to be as small as possible: the simplest possible model that explains the data, that models what's going on. And of course, if you're doing that, you cannot apply gradient descent anymore. So we are building something that we call symbolic descent, which is the symbolic-space equivalent of gradient descent. The idea is to build this new machine learning engine that gives you extremely concise symbolic models of the data you feed into it, and then to make it scale.
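To make the contrast concrete, here is a minimal Python sketch built on a toy dataset and a tiny expression grammar, both invented for illustration. `fit_parametric` fits the two parameters of a fixed line by gradient descent; `fit_symbolic` instead enumerates expressions from smallest to largest and keeps the first one that reproduces the data. The enumeration is only a brute-force stand-in: the conversation does not describe how Ndea's symbolic descent actually searches, and it is presumably meant to be far smarter than exhaustive search.

```python
# Toy dataset (assumed for illustration): targets follow y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]


def fit_parametric(xs, ys, lr=0.01, steps=5000):
    """Deep-learning style: fit the parameters of a fixed curve y = a*x + b
    by gradient descent on mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Symbolic alternative: search over programs instead of over parameters.
LEAVES = ["x", "0", "1", "2", "3"]
OPS = ["+", "*"]


def expressions(size):
    """Yield every expression string built from exactly `size` leaves."""
    if size == 1:
        yield from LEAVES
        return
    for left_size in range(1, size):
        for left in expressions(left_size):
            for right in expressions(size - left_size):
                for op in OPS:
                    yield f"({left} {op} {right})"


def fit_symbolic(xs, ys, max_size=3):
    """Return the smallest expression that reproduces every (x, y) pair exactly,
    trying small programs before large ones; None if nothing fits."""
    for size in range(1, max_size + 1):
        for expr in expressions(size):
            # eval is acceptable here only because we generated the strings ourselves.
            if all(eval(expr, {"x": x}) == y for x, y in zip(xs, ys)):
                return expr
    return None


a, b = fit_parametric(xs, ys)
print(round(a, 3), round(b, 3))  # 2.0 1.0
print(fit_symbolic(xs, ys))      # (x + (x + 1))
```

The gradient-descent fit needs the model family (here, a line) chosen in advance and only tunes numbers inside it; the symbolic search picks the form itself, at the cost of a search space that grows very quickly with expression size.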
And so everything you're doing with machine learning today with parametric curves, we should be able to do with symbolic models in the future, in a way that will be much, much closer to optimality. Closer to optimality in the sense that you're going to need much less data to obtain the models. The models are going to run much more efficiently at inference time because they're going to be so small. And because they are so small, they will also generalize much better and compose much better. You know the minimum description length principle: the model of the data that is most likely to generalize is the shortest one. And I think you cannot find a model like this if you're doing parametric learning. You need to try symbolic learning.
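A toy illustration of the minimum description length idea, using an invented encoding: a fixed 17-symbol vocabulary where every symbol costs log2(17) bits. Both candidate models below fit the same five training pairs perfectly, so under a two-part code the "data given model" term is zero for both and only the cost of writing the model down differs. The shorter model, the rule, is also the only one that says anything about an unseen input. Real MDL codes are more careful than a flat per-symbol cost; `model_bits`, `predict_rule` and `predict_table` are hypothetical names for this sketch.

```python
import math

# Hypothetical symbol inventory shared by both candidate models (17 symbols).
VOCAB = set("0123456789x+*():,")


def model_bits(model_text):
    """Cost of writing a model down: log2(|VOCAB|) bits per non-space symbol."""
    symbols = [c for c in model_text if c != " "]
    assert all(c in VOCAB for c in symbols), "symbol outside the vocabulary"
    return len(symbols) * math.log2(len(VOCAB))


# Two models that both reproduce the training pairs (0,1) (1,3) (2,5) (3,7) (4,9)
# exactly, so only the model-cost term of the two-part code differs.
rule = "(x + (x + 1))"
table = "0:1,1:3,2:5,3:7,4:9"
lookup = {int(k): int(v) for k, v in (pair.split(":") for pair in table.split(","))}


def predict_rule(x):
    # eval is acceptable here only because `rule` is a fixed, trusted string.
    return eval(rule, {"x": x})


def predict_table(x):
    return lookup.get(x)  # a memorized table has no answer for unseen inputs


print(round(model_bits(rule), 1), round(model_bits(table), 1))
print(predict_rule(10), predict_table(10))  # 21 None
```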
Interviewer 2
That's fascinating.
Interviewer 4
So the rest of the industry is just pouring more and more billions of dollars down an approach that was set years ago. Can you help make the case for why you think that it's the right thing to explore alternate approaches instead of just to keep putting more money into the current approach?
Francois Chollet
I mean, everybody is building on top of the LLM stack these days, which makes sense, because the returns are there; it's actually working. So it would seem very sensible for everybody to just be doing what seems to be the most productive path right now. But it's actually counterproductive to have everybody working on the same thing. I personally don't think that machine learning or AI in 50 years is still going to be built on this stack. It's a very nice stack, maybe it even gets us to AGI, but it's not as efficient as it should be. I think it's inevitable that the world of AI will trend over time towards optimality, and so I'm trying to leapfrog directly to optimality, to build the foundations of optimal AI today. In general, our vision is very ambitious, and I'm not saying that we're going to be successful. We have maybe a 10 or 15% chance of success, but that is enough that it's worth trying. And I think, in general, for listeners: if you have a big idea with a very low chance of success, but if it works it's going to be big, and no one else is going to be working on it because it's not popular, so that if you don't do it no one else will (and this is basically our situation), then you should take a chance. You should go and work on it.
Interviewer 4
I mean, that's almost like the mission statement of Y Combinator, the thing that you just said.
Francois Chollet
Yeah. The reason it's important is that again, if we don't do it, no one else will do it. Right. So it's worth trying. Even if we don't succeed, it's worth trying.
Interviewer 3
Has the success of the coding agents built on top of the LLM stack, very specifically, surprised you at all, say over the last six months or so?
Francois Chollet
Yeah, absolutely. I think it has surprised many people; it definitely surprised me. If you look at why everything is starting to work so well with coding agents, it's really because code provides you with a verifiable reward signal. I think right now we're in a situation where, for any problem where the proposed solutions can be formally verified, you can actually trust the reward signal; it's not just some guess made by a model. Any domain like this can be fully automated with current technology, with the LLM-based stack. Code is the first domain to fall, but there will be many others in the future. I think mathematics is also primed to see a revolution in the next few years, for the same reason: the domain gives you verifiable rewards.
Interviewer 1
I guess the challenge for a formally verified domain is that you have to somehow take a domain and make it verifiable, which is the trick. Code is very natural: you can test whether there are bugs, whether it compiles, et cetera, and mathematics as well, whether all the theorems and proofs work out. I guess it becomes more nebulous when you go a couple of degrees off, to fields that are not naturally formally verifiable, where you need to come up with some sort of function that produces a reward and makes them verifiable. With very fuzzy things, like, let's say, the English language and composing the perfect essay, how do you make that formally verifiable?
Francois Chollet
Yeah, absolutely. Writing essays is the typical example of a domain that's not verifiable. What you're going to see is that progress of reasoning models and base LLMs on this type of domain is going to be very slow, because the stack we're using, the LLM stack, is very, very reliant on its training data. It's basically just operationalizing the training data. And for writing essays, the training data comes from human experts annotating essays, and that's costly. So you're going to see very slow progress; maybe it's even going to stall. But for any verifiable domain, take code for instance, the big unlock was when people started creating code-based training environments for post-training, where the reward signal, the verification signal, is provided by things like unit tests. That means the model was not just working from human-provided annotations. It was actually trying its own things, verifying the answers, and generating a lot more training data in the process: a much denser coverage of the problem space. And not just coverage in terms of whether the answer is right or wrong; the models were also starting to build models of the execution traces, so that they could start incorporating an execution model. It's very much the way human programmers, when they look at code, are sort of executing the code in their minds. They keep track of the values of variables and so on, and that is also what the models are trying to do now. This is why it's working so well, and it's possible because you're working with this very formal, fully verifiable environment. You cannot do that with essays, or with many other problems.
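A minimal sketch of a unit-test-based verification signal, using a hypothetical `verify` helper rather than any lab's actual training environment: the candidate program is executed and checked against known input/output pairs, and the reward is 1.0 only if every check passes. A real post-training environment would also sandbox execution and enforce time limits, which this toy does not.

```python
def verify(candidate_src, func_name, tests):
    """Return 1.0 if the candidate passes every test case, else 0.0.

    Toy verifier: it runs untrusted code in-process with no sandbox or timeout.
    """
    namespace = {}
    try:
        exec(candidate_src, namespace)  # define the candidate function
        fn = namespace[func_name]
        passed = all(fn(*args) == expected for args, expected in tests)
    except Exception:
        return 0.0  # syntax errors, crashes and missing functions earn no reward
    return 1.0 if passed else 0.0


tests = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]
good = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"
print(verify(good, "add", tests))   # 1.0
print(verify(buggy, "add", tests))  # 0.0
```

The reward is binary here for simplicity; a partial-credit variant (fraction of tests passed) is an equally common design choice.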
Interviewer 1
I really like how you define intelligence and how to measure it, which brings up the question of having you share the history of ARC-AGI.
Francois Chollet
Yeah. So, about my definition of general intelligence: many people around the industry these days say AGI is going to be a system that can automate most economically valuable tasks. To me, that definition is about automation. It's not about intelligence, and it's not about general intelligence. My definition is that AGI is going to be a system that can approach any new problem, any new task, any new domain, and make sense of it, model it, and become competent at it with the same degree of efficiency as a human could. Meaning it's going to need basically the same amount of training data and training compute as a human would, which is very little. Humans are really, really data efficient. So general intelligence is human-level skill-acquisition efficiency on the same scope of tasks that humans could potentially learn to do.
Interviewer 4
Do you think it's possible that we will accomplish the first definition of AGI, the automate most economically useful work, before we accomplish your definition?
Francois Chollet
Absolutely. I think that's the trajectory we're on right now. And I think it's already true that, in principle, current technology can fully automate, at human level or beyond, any domain where you have verifiable rewards, with code being the first one. And figuring out AGI, figuring out human-level learning efficiency over arbitrary tasks, is probably going to take a different sort of technology, a different mindset, a different approach.
Interviewer 4
Do you think that LLMs can be bent to have the same sample efficiency as humans? Or do you think it's fundamentally impossible, and we need a new approach, and that's the thing that you're hoping to solve?
Francois Chollet
With enough compute, everything starts looking like everything else. Compute is a great equalizer: every approach starts looking the same. And I think it's possible in principle to build something that looks a lot like AGI on top of the LLM stack, but it's not going to be LLMs per se. It's going to be a new layer, perhaps even a few layers above, not just one. You can build it on top of LLMs because LLMs are a kind of computer. I do believe, however, that this would be the wrong thing to do, because it would be very inefficient. I think AI research will have to trend towards not just efficiency but, in fact, optimality over time. For this reason, AI a few decades from now is not going to be a harness on top of a reasoning model on top of a base LLM; it's going to be built much, much lower in the stack.
Interviewer 4
To Diana's question, do you want to talk about how you actually designed ARC-AGI, and why it's a good barometer of that?
Francois Chollet
I mean I've been doing deep learning for a very, very long time. And initially my take, my mindset was that deep learning was going to be able to do everything.
Interviewer 1
You were the creator of Keras, before all the other frameworks even became very popular.
Francois Chollet
That's right. I was training deep learning models for natural language processing, in fact, in 2014. From that work I started developing this open-source library, which I released exactly 11 years ago, in March 2015. That was Keras. Then it got popular, and I ended up doing less of the research that I started Keras for and more work on the framework itself, just because it had really good product-market fit. My take around that time, around 2015, 2016, was that deep learning was extremely general: that you could do everything with deep learning and didn't need anything else. It was Turing complete. My take was basically that deep learning was differentiable programming, so for anything you would do with software, you could in principle train a deep learning model on the right inputs and outputs to do the same thing. In 2016 I was doing research at Google Brain on trying to train deep learning models to help with reasoning problems, in particular first-order logic problems, theorem proving, and so on. And I started finding that you could not really get gradient descent to encode reasoning-style algorithms. It was not because the models could not represent these algorithms; it was because gradient descent could not find them. So the problem wasn't deep learning not being Turing complete or anything like that. The problem was gradient descent. Gradient descent would not find generalizable programs. It would instead end up doing overfit pattern matching over sequences of input tokens.
Interviewer 2
Which, I guess, people could argue is what's happening.
Francois Chollet
I mean, it's useful. It's happening today in a slightly higher-level version of it.
Interviewer 1
It's with a lot of data, so it doesn't feel like overfitting, because the data covers a lot more of the distribution.
Francois Chollet
With a lot more data. And also, I think models today are a lot more compressive of the data, which is why they generalize better.
Interviewer 2
All models are wrong, but some models are useful. And I guess what I'm hearing is that your method might find the right model.
Francois Chollet
That's right. That's where the idea came from. At the time, back in 2016, 2017, I thought: okay, we're going to need a benchmark to capture these ideas. We're going to need a program synthesis benchmark. My mental model for that was ImageNet: I'm going to make the ImageNet of reasoning. So I started brainstorming a few ideas around 2017 and explored many different things. In particular, I tried working with cellular automata, a setup where you show a model cellular automata outputs and it must recreate the program that generated them, that sort of thing. Eventually I settled on the ARC-AGI format around early 2018. I was doing this on the side; it was a side project. My main project was developing Keras at Google, so I wasn't moving very fast on it. In summer 2018 I wrote the ARC task editor, and I started making lots of tasks by hand. About one year later I had made 1,000 tasks. So I wrote up the paper explaining what this was about and what the big idea was, intelligence as skill-acquisition efficiency, and I published all of that in 2019.
Interviewer 1
In parallel, GPT-3 came out in 2020 and started to show signs, until the ChatGPT moment around the end of 2022, and the industry took off with that. And this was one of the benchmarks where models were performing really badly. It was very obscure. I don't think many people knew about it; it was mostly niche research communities that maybe read your paper.
Francois Chollet
Yeah, people who worked on program synthesis knew about it, but a lot of people who worked on deep learning, on scaling up LLMs, didn't really care for it. Part of the reason is that LLMs did not work well, or at all, on the benchmark. For a benchmark to capture the attention of the research community, it needs to start working a little. If it's too hard, people are just going to dismiss it.
Interviewer 2
You were just ahead of your time, clearly, because we're not on ARC-AGI-1 anymore, 2 is reaching saturation, and 3 is out now.
Francois Chollet
Yes.
Interviewer 1
And I think the cool thing about ARC-AGI is that it has been a very good barometer of the big changes happening in the industry. Because V1 was not working at all for a long time, until 2025 when reasoning models came out, right?
Francois Chollet
Yeah, absolutely. Look at frontier performance on ARC V1 first, and then V2. Base LLMs were scoring extremely low on V1, basically sub-10%. That was true of the original GPT-3, which scored zero, but it's even true of the latest base LLMs today, as of March.
Interviewer 2
Without reasoning.
Francois Chollet
Without reasoning, yeah. So the base models. Performance of base LLMs on V1 stayed very, very low, even though in the meantime we had scaled up these models by 50,000x. So it was really telling you that scaling up pre-training alone was not going to crack the benchmark; it was not enough to demonstrate that the model had fluid intelligence. The moment models started performing well on ARC 1 was with the first reasoning models, in particular OpenAI's o1 and then o3 models, which, by the way, OpenAI demonstrated on ARC because it was the one unsaturated reasoning benchmark that really showed this model was different, that it had new capabilities we had not seen before. So with reasoning models you start seeing this sudden step-function change on ARC 1. ARC 1 was really the benchmark that signaled that, at this moment in time, something was happening.
Interviewer 1
Something big.
Francois Chollet
Yeah, something big. New capabilities were emerging. Reasoning was new and different, and it was actually not obvious at the time. I don't know if you remember when o3-preview was announced by OpenAI.
Interviewer 1
That was the end of 2024 actually.
Francois Chollet
Yeah, December 2024. Sure, it was huge step-function progress on ARC, but it was very expensive; it did not really have product-market fit. But if you looked at the ARC results, you knew that this was big and important. Then we released ARC 2, which was the same format but more difficult, with more composition at the level of the reasoning chains. What happened is that the earliest reasoning models started very, very low on ARC 2. Then, around the same time as coding agents started working, very recently, just a few months ago, you saw this very, very fast saturation of ARC 2. So again, ARC 2 signaled that there was this new set of capabilities emerging. The benchmark did a really good job at capturing the advent of reasoning models and then the advent of agentic coding: this new pattern where, if you have verifiable rewards, you can basically fully automate the domain. Which, by the way, is true of ARC. ARC does provide a verifiable reward.
Interviewer 2
I guess, for V2, what caused it? V1 was clearly reasoning. And a benchmark doesn't care how you solve it, which I guess is embedded in what you said. Were people using codegen to solve it?
Francois Chollet
That's right. Not necessarily codegen per se, but the frontier labs have been targeting ARC-AGI-2, and the progress you saw on ARC-AGI-2 is actually a result of this very large-scale targeting. What you can do to solve ARC-AGI-2 is ask your reasoning model to make more tasks like those in the benchmark, and then try to solve them using, let's say, program induction, still using your reasoning model. Then you verify the solution; again, it's verifiable, so you can trust the answer. Then you fine-tune the model on the successful reasoning chains, and you keep repeating: you generate new tasks, you solve them, you verify the solutions, you fine-tune the model on the reasoning chains, and you can keep doing this millions of times. You just need to spend more money.
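The generate, solve, verify, fine-tune cycle can be caricatured in a few lines of Python. Everything here is a hypothetical stand-in: a four-program `PRIMITIVES` library plays the role of the task space, uniform sampling plays the role of the reasoning model, and the fine-tuning step itself is omitted. The sketch only shows how verification filters many attempts into a training set of correct (task, solution) pairs.

```python
import random

# Hypothetical toy task family: each task hides one list transformation.
PRIMITIVES = {
    "double": lambda xs: [2 * v for v in xs],
    "reverse": lambda xs: xs[::-1],
    "increment": lambda xs: [v + 1 for v in xs],
    "sort": lambda xs: sorted(xs),
}


def generate_task(rng, n_examples=3):
    """Step 1: synthesize a new task (input/output pairs) from a hidden program."""
    hidden = rng.choice(sorted(PRIMITIVES))
    inputs = [[rng.randint(0, 9) for _ in range(4)] for _ in range(n_examples)]
    return [(x, PRIMITIVES[hidden](x)) for x in inputs]


def propose(rng, k):
    """Step 2: stand-in for sampling k candidate programs from a reasoning model."""
    return [rng.choice(sorted(PRIMITIVES)) for _ in range(k)]


def verify(program, examples):
    """Step 3: the reward is checked, not guessed: run the program on every example."""
    return all(PRIMITIVES[program](x) == y for x, y in examples)


def collect_training_data(rounds, k=4, seed=0):
    """Repeat steps 1-3, keeping only verified (task, program) pairs.
    Step 4, fine-tuning the model on these pairs, is left out of this sketch."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(rounds):
        examples = generate_task(rng)
        for program in propose(rng, k):
            if verify(program, examples):
                dataset.append((examples, program))
                break  # one verified solution per task is enough here
    return dataset


data = collect_training_data(rounds=200)
print(len(data), "verified examples out of 200 generated tasks")
```

Because verification only checks the given examples, a program other than the hidden one can be accepted if it happens to fit them (for instance, `sort` and `reverse` agree on some inputs). Real verifiable-reward setups share this property: the signal is only as good as the checks behind it.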
Interviewer 2
This is the RL loop that's happening. Yeah, exactly.
Francois Chollet
And the new paradigm in AI is basically that in any domain where this is true, where you have the ability to generate these true verification signals, you can run this kind of loop. And if you can run this kind of loop, you can effectively brute-force mine the entire space and get extremely high performance. This is basically the process through which ARC 2 was saturated. So what it tells you is not so much that the models have higher fluid intelligence than the first reasoning models did; it's that you have this new paradigm of post-training. And this is exactly what led to agentic coding. So it does matter. It is valuable, it is useful.
Interviewer 2
It's not that the models are smarter, it's that they're suddenly more useful. And it's clearly possible to be more useful in particular domains without being smarter, which means good things for me, because I'm not getting any smarter right now. At age 45, you know, I can learn how to do things. And that's sort of what's happening with the models as of late.
Francois Chollet
Yeah, absolutely. When it comes to competency, there's always a trade-off between intelligence and knowledge. If you have more knowledge, if you have better training, you need less intelligence to be competent. And that's exactly what happened with the rise of coding agents. The models don't have higher fluid intelligence per se; they don't have a higher IQ, so to speak. They're just way better trained, in two ways. First, they're not just trying to autocomplete code anymore; they're trained via trial and error in these RL post-training environments with true reward signals. And second, they're trained to embed a model of code execution, where they learn to keep track of the values of variables over an execution cycle. That's what's leading to the extremely strong product-market fit of agentic coding today, and it's completely changing software engineering.
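To make "keeping track of the values of variables" concrete, this sketch uses Python's standard `sys.settrace` hook to record an execution trace: the local variables visible at each executed line of a small function. It only illustrates what such a trace contains; it is not how coding models are trained on traces, and `trace_locals` and `running_sum` are hypothetical names.

```python
import sys


def trace_locals(func, *args):
    """Run func(*args), recording (line offset from the def line, locals snapshot)
    just before each line of func's body executes."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore frames of any other function
        if event == "line":
            steps.append((frame.f_lineno - code.co_firstlineno, dict(frame.f_locals)))
        return tracer  # keep receiving line events for func's frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the hook, even if func raises
    return result, steps


def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


result, steps = trace_locals(running_sum, 3)
for offset, local_vars in steps:
    print(offset, local_vars)
print("result:", result)
```

Each printed row pairs a line of `running_sum` with the variable values at that moment, which is the kind of state a programmer simulates mentally when reading code.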
Interviewer 1
This happened not too long ago. On the saturation: we actually had the founders of Poetiq come and speak about their approach, and this new way of getting LLMs to perform really sounds like building an agent harness, right? The harness is basically structuring a problem domain into something that can be formally verified. They did that for ARC-AGI-2, and when they released it, they were at the top of the benchmark. But the crazy thing is that I worked with a company in the Winter '26 batch, not too long ago, called Confluence Lab, which ended up saturating V2 with 97%, and I think their cost per task was a lot lower too. The approach they took is similar: I think they built harnesses on top in order to get the LLMs to build different tasks and program through them. That made me think, wow: during the batch, they only worked on it for a couple of months, and they were able to saturate this benchmark that has been around for a long time. Something special is happening.
Francois Chollet
Yeah, there's a lot of progress right now that's driven by custom harnesses around the task. The harness is basically a way for the human programmer to input higher-level solution strategies into the model. To me, the fact that you need humans to engineer these harnesses is also a sign that we're short of AGI today. If we had AGI, it would just make its own harness; it would not need to be told how to solve a problem, it would just figure it out. But harnesses are very effective. I don't think they get us closer to AGI in any sense, but it's a very valuable area of research because it can lead to task automation at scale.
Interviewer 1
Can you tell us, then, what V3, which just got released, is going to measure?
Francois Chollet
Yeah, absolutely. If you look at V1 and V2, they were really focusing on your ability to produce causal models of a pattern that was just given to you. The data was given to you, so it was static, it was passive, and really focused on modeling. V3 is completely different. We are trying to measure agentic intelligence, so it's interactive, it's active: the data is not provided to you, you must go get it. Your agent is dropped into a new environment, which is kind of like a mini video game. It's not provided with instructions; it's not told what to do, what the goal even is, or what the controls even are. It must figure out everything on its own via trial and error. So we are not just measuring the AI's ability to model its environment. We are also looking at its exploration efficiency, its ability to acquire goals on its own, like goal setting, and of course its ability to plan using the model of the environment it has created, and to execute the plan. Together, we call all of these abilities agentic intelligence. We are looking for AI systems that could learn to play these games and crack them with the same degree of action efficiency as a human. If you look at humans, they are dropped into a new environment, they try a few things, and they start understanding how things work. They can solve the environment in a few hundred to a few thousand actions. We're looking for AI systems that could match this efficiency. And by the way, we know that all of the test environments in ARC 3 are solvable by humans with no prior training, because we actually tested them on regular people. At first you just see this screen and you have these keys available, but you don't know what they do, and you must figure out everything from scratch. Humans are really good at that, by the way: they're really good at exploring efficiently, at making sense of something new, and eventually cracking the game.
And frontier models today, they are not very good at it.
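ARC-AGI-3's exact scoring formula is not given in this conversation, so what follows is a hypothetical per-level action-efficiency score in Python, meant only to illustrate that solving a level is not enough. An unsolved level scores zero, and a solved level scores the ratio of a human baseline's action count to the agent's, capped at 1. `LevelResult`, `efficiency` and `game_score` are invented names, not the benchmark's API.

```python
from dataclasses import dataclass


@dataclass
class LevelResult:
    solved: bool
    agent_actions: int
    human_actions: int  # hypothetical human baseline for the same level


def efficiency(result):
    """Hypothetical per-level score: 0 if unsolved, else human/agent action ratio capped at 1."""
    if not result.solved or result.agent_actions <= 0:
        return 0.0
    return min(1.0, result.human_actions / result.agent_actions)


def game_score(results):
    """Average the per-level scores over all levels of one game."""
    return sum(efficiency(r) for r in results) / len(results)


levels = [
    LevelResult(solved=True, agent_actions=200, human_actions=100),   # 2x slower: 0.5
    LevelResult(solved=True, agent_actions=80, human_actions=100),    # faster: capped at 1.0
    LevelResult(solved=False, agent_actions=5000, human_actions=100), # unsolved: 0.0
]
print(game_score(levels))  # 0.5
```

Under this toy rule, an agent that brute-forces a level in 50,000 actions against a 500-action human baseline would score 0.01 on it, which is why exhaustive exploration does poorly even when it eventually succeeds.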
Interviewer 4
If reasoning models cracked V1 and reinforcement learning environments cracked V2, do we need a new advance to crack V3? Do even the best current techniques not work?
Francois Chollet
Yeah, I'm pretty curious to see how the frontier labs are going to react to V3 and how they're going to start targeting it. It is designed to be more resistant to the kind of targeting strategy we saw for V2 in particular. Of course, you can try to just make more ARC 3-like games and then train your agents on them. But we've deliberately tried to create a private set of environments that is significantly different from the public set. Looking at the public set doesn't actually give you that much information about what's in the private set: in the private set you will have very different games with very different concepts. Also, the public set is meant to be substantially easier, so your performance on the public set is not representative of how well the system will do on the private set. For this reason it's going to be harder to target, and that makes it a better test of fluid intelligence, as opposed to a test of how much effort you put into cracking it.
Interviewer 4
I'm so curious, how do you come up with these games? They're so creative.
Francois Chollet
Yeah. We set up an entire video game studio to create them. We've got over 250 games, and they're pretty quick to play: each game takes maybe 10 minutes or a bit less to play from scratch on first contact. We set up this very productive game studio where, in any given week, we had multiple games in progress. We had a pipeline including design, implementation, review, human testing, and many iteration cycles to make sure each game comes out right.
Interviewer 4
Who's working in the studio?
Francois Chollet
Right. Yeah. We hired a team of game developers and we built our own game engine.
Interviewer 4
Wow. So it's actually people who previously worked in the video game industry.
Francois Chollet
That's right. One thing to keep in mind, though, is that the games in ARC 3 are unique. We try not to borrow elements or concepts from previous video games. They're built entirely on top of core knowledge priors: elementary knowledge like basic physics, understanding of objects, and the notion of agents, for instance, where an agent is an object with goals and intentions. But we are not incorporating any language or any cultural symbols, like arrows, for instance, or the color green meaning go and red meaning stop. There's no external knowledge involved in these games.
Interviewer 1
It's like one of those IQ tests that are just pattern matching, but now with a time series.
Francois Chollet
Yeah, it's not just a time series, it's interactive. You must create your own path through game space. In an IQ-test-like problem, like ARC 1 and 2, the data that you must model is provided to you. You already have the data; you just need to find the causal rule that explains it. With ARC 3 you actually must gather the data, and you must do so efficiently. Of course, you could say, well, I'm just going to brute-force mine the space of every possible game state and then find a solution. You cannot do that: even if you managed to solve the level that way, you would score extremely low, because you're scored on your efficiency, and you must match human-level efficiency.
Interviewer 2
It's funny, it's almost coming full circle. ARC-AGI with games is sort of the matched pair to early OpenAI. I mean, Tom Brown, one of the co-founders of Anthropic, had to write the harness code to allow pre-GPT AI at OpenAI to play StarCraft.
Francois Chollet
Yeah. OpenAI worked in particular on Dota 2 with the OpenAI Five model, which, if I recall correctly, was not just pre-GPT but also mostly pre-Transformer, because they were working with a stack of LSTM layers. And even before OpenAI, DeepMind worked a lot on solving video games via deep RL. They were the first to do Atari games, back in 2013. They were very early, very visionary, to work on this problem so early with these methods, which are still very modern methods. The big difference is that with Atari games, for instance, you're training on the same environment that you use for testing. So effectively you're just trying to memorize the best strategies: at training time, you're trying to explore the full space of possible game states and productionize, operationalize that knowledge into the model, and then at inference time you're basically just recalling that knowledge. That's explicitly what we're trying to avoid with ARC 3. You're not playing games that you've seen before, games you've been trained on for millions of hours. The OpenAI Five model, for instance, was playing a restricted version of Dota 2, and it was trained on tens of thousands of hours of gameplay, maybe even millions. It's just an insane amount. With ARC 3, you're being evaluated on games that you're seeing for the very first time, and every action you spend exploring counts towards your efficiency score.
Interviewer 2
Right.
Francois Chollet
So we're really focused on measuring fluid intelligence: your ability to efficiently explore, efficiently build a world model of the environment, and then use this model to infer goals, plan toward these goals, and eventually crack the game.
Interviewer 2
One of the arguments for Ndea is efficiency: an ARC task might cost something like 0.3 cents, while the same task on an LLM-based foundation model costs a dollar to $10. And then there's this other trend we've been tracking, where it seems like more and more intelligence, at least on the LLM side, can be distilled into smaller and smaller models. So on the one hand they're scaling up, but on the other they're distilling smarter and smarter small models. I guess your approach suggests that achieving AGI might not take billions of parameters, that it might not be inherently a scale thing at all. Is there a Platonic ideal of the Ndea model that achieves AGI? Do you ever think about it in terms of, like, well, it would fit on a floppy disk?
Francois Chollet
Well, okay, there are two things to separate. There's the fluid intelligence engine. I think it's going to be a very, very small code base with a very small set of models associated with it, probably on the order of megabytes. And then you have the knowledge base, so to speak, that's going to be layered below this fluid intelligence engine. Fluid intelligence has to draw on some knowledge, and that knowledge is going to take up a lot more space. I think it's important to differentiate the two. I do believe that when we create AGI, it will retrospectively turn out to be a code base of less than 10,000 lines of code, and that if you had known about it back in the 1980s, you could have built AGI back then using the compute resources available at the time.
Interviewer 4
Wow, that's a crazy prediction.
Francois Chollet
I think retrospectively this will turn out to be true.
Interviewer 4
So it was just, like, hiding under our noses in plain sight for, like, 40 years. It took us, like, 40 years to figure it out.
Francois Chollet
That's right. That's right.
Interviewer 2
Well, that second thing sounds like Douglas Lenat's Cyc project. Or is that the wrong way to think about it? It's like there's sort of knowledge about the world and then there are methods, like the program. What I hear is, like, the program might be 10,000 lines, and then it operates on a knowledge base that's very large.
Francois Chollet
So the problem with Cyc, I mean, there were many issues with it, but one of the big issues is that there was no learning involved.
Interviewer 2
Yeah, it's just the knowledge.
Francois Chollet
The knowledge was handcrafted.
Interviewer 2
It was, like, purely symbolic knowledge, and it was probably inaccurate.
Francois Chollet
The way you want to be building AGI is to remove humans from the improvement loop as much as possible. You don't want a system where every improvement in capability has to involve a human engineer doing something. That's actually the strength of deep learning and foundation models: you can just scale up the knowledge base. An LLM is effectively a knowledge base. It's a bank of modular vector programs that map patterns of input tokens to patterns of output tokens. And you can scale up that knowledge base just by adding training data and training compute, with no further human involvement. I mean, of course there's still a little bit of human involvement in making sure the training job completes, but it's minor. You've managed to remove humans from the improvement loop as much as possible. And that's also what we want for our system. We want a system that's self-improving, where the improvements compound, meaning that every time the system increases its capabilities, it's also increasing the rate at which it increases its capabilities.
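The contrast between human-bounded progress and compounding self-improvement can be written as two simple growth models. This formalization is illustrative and was not given in the interview. C(t) stands for capability, h for a fixed rate of human-driven improvement, and k for a self-improvement coefficient; all three are assumed symbols. The compounding case uses the simplest assumption, that the rate of improvement is proportional to capability.

```latex
% Human-bounded improvement: the rate is fixed by the human effort h put in.
\frac{dC}{dt} = h \quad\Longrightarrow\quad C(t) = C_0 + h\,t

% Compounding improvement: the rate grows with capability (simplest case: proportional).
\frac{dC}{dt} = k\,C \quad\Longrightarrow\quad C(t) = C_0\,e^{k t},
\qquad \frac{d}{dt}\!\left(\frac{dC}{dt}\right) = k\,\frac{dC}{dt} = k^{2} C > 0
```

In the first model, each unit of progress costs the same human input. In the second, every increase in C also raises the rate dC/dt, which is the property of compounding improvement described above.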
Interviewer 2
I think this is a PG-ism. It's like, I'm sorry the essay is so long; if I had more time I would have made it shorter.
Francois Chollet
Yeah, when you're looking at a hard problem, it's actually harder to produce a short, elegant, concise solution than a messy, over-engineered one.
Interviewer 2
Yeah, you can brute force it, but you know, the more elegant version is very, very short. And that's kind of like what you said with how this might come about.
Francois Chollet
Yeah, this is literally the shape of the AI approach we are creating. And I think this is also the shape of science itself. Science is fundamentally a symbolic compression process: you're looking at a big mess of observations, like the positions of planets in the sky, and you're compressing that down to a very simple symbolic rule. You're saying, all these thousands of observations actually follow this one simple equation. That's symbolic compression. And to do this, by the way, you need the model to be symbolic. You could not fit a curve and say, well, that curve is my model. That would never be optimal; it would never be concise or elegant enough. And that's not what science is doing. Science is not about curve fitting. Science is about finding the equation, finding the most compressive symbolic model of your pile of observations. And that's the process we are trying to recreate in software form. You could say that with the Ndea approach to program synthesis, we are building science incarnate: the scientific method in algorithmic form.
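The idea of compressing many observations into the shortest symbolic rule can be illustrated with a toy minimum-size search. The search enumerates expressions in order of size and keeps the first one that reproduces every observation. This is a hypothetical illustration, not Ndea's method. The grammar (`LEAVES`, `OPS`) and the function names are assumptions. Plain enumeration like this is also exactly what stops scaling as expressions grow.

```python
from __future__ import annotations

import itertools
from typing import Callable, Iterator, Optional

Fn = Callable[[int], int]

# Hypothetical toy grammar: leaves (the variable x and small constants) and binary operators.
LEAVES: list[tuple[str, Fn]] = [
    ("x", lambda x: x),
    ("1", lambda x: 1),
    ("2", lambda x: 2),
    ("3", lambda x: 3),
]
OPS: dict[str, Callable[[int, int], int]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def expressions(size: int) -> Iterator[tuple[str, Fn]]:
    """Yield every expression tree with exactly `size` nodes as (text, function) pairs."""
    if size == 1:
        yield from LEAVES
        return
    # One node is the operator; the remaining size - 1 nodes are split between the children.
    for left_size in range(1, size - 1):
        right_size = size - 1 - left_size
        pairs = itertools.product(list(expressions(left_size)), list(expressions(right_size)))
        for (ltxt, lf), (rtxt, rf) in pairs:
            for sym, op in OPS.items():
                yield f"({ltxt} {sym} {rtxt})", (lambda x, lf=lf, rf=rf, op=op: op(lf(x), rf(x)))


def shortest_model(observations: list[tuple[int, int]], max_size: int = 7) -> Optional[str]:
    """Return the smallest expression that reproduces every (x, y) observation exactly."""
    for size in range(1, max_size + 1):
        for text, f in expressions(size):
            if all(f(x) == y for x, y in observations):
                return text
    return None


# Eleven observations generated by y = x*x + 1 stand in for a pile of measurements.
data = [(x, x * x + 1) for x in range(-5, 6)]
print(shortest_model(data))
```

The number of candidate expressions grows combinatorially with size. That growth is why brute-force enumeration alone cannot reach large programs.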
Interviewer 4
I'm curious if you compare it to biology. Clearly LLMs don't learn the way that humans do, because no baby reads the whole Internet. Do you think program synthesis is closer to the way that humans learn, or do you think there's yet a third branch, where even if program synthesis is correct, there will be some as-yet undiscovered third way to do it, which is the thing that we do?
Francois Chollet
I think so. I do think humans do some amount of program synthesis. I think the way humans learn and the way the human mind works is very messy. It's not like there's one simple, elegant principle behind it all. It's an implementation of the fundamental principles of intelligence, and I think we can identify these principles and reimplement intelligence from scratch, from first principles, in a way that will be much more efficient than the human brain. The human brain is messy, and it can be a good source of inspiration for AI, but I think it would be counterproductive to just try to observe it, reimplement it, and make it biologically plausible. That's not what we're trying to do at Ndea. We're really trying to find the first principles of intelligence and the system that would best implement them. But yeah, I do believe the human mind does, at the highest level, something that looks a lot like program synthesis. We're constantly building causal models of our surroundings. We describe our surroundings in our mind as a set of objects and agents and relations between them that are fundamentally symbolic and causal in nature. This is exactly the process that lets us generalize so well and adapt so well to novelty on the fly.
Interviewer 3
I'm curious about Ndea, the company, and what it's been like to build it. We've all heard the OpenAI founding story. Something that's always stuck with me is that both Sam and Greg say it was a little odd in the early days, because they didn't actually know what to do. It was sort of just a bunch of people hanging out in an apartment. I would love to hear what that's been like for Ndea. What did day one look like? And for people who are interested in starting these alternative approaches but don't have a research background, how should they think about that?
Francois Chollet
Yeah, so we started on day one with the symbolic learning vision. We basically knew that we wanted to do symbolic program synthesis, that we wanted to create a new approach to machine learning where you replace parametric curves with the shortest possible symbolic models. And then the big question was, okay, how do we find these models? We started from the base idea, which is still the idea we are following today: deep-learning-guided program search. You have a symbolic search space to explore, and it's big; in fact, it's combinatorial. You're not going to make progress if you just use brute force. It's not going to scale. You have to break through the combinatorial wall, and the way to do it is to add deep learning guidance. It's actually very similar to the principles that underlie something like AlphaGo or AlphaZero. That was our starting point. We also didn't have clear ideas about how to build it, so we tried many different ideas, and it took us roughly half a year to get to good foundations where we could start building a system that compounds. And I think that's what's really important when running a lab like this: you don't want to be in a situation where you're constantly trying something new and not reusing any learnings or findings from the previous approaches. You want a compounding stack: you want to build reusable foundations, then the next layer, and then the next layer. Of course, you want to be building on the right foundation. So don't commit to the foundation layer too early, but also make sure that at some point you're building this compounding structure. And that's the situation we're in now.
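Guided program search can be sketched as best-first search over a combinatorial space of programs, with a scoring function deciding which candidate to expand next. This is a minimal sketch under stated assumptions, not Ndea's system. The DSL in `PRIMITIVES` is hypothetical. `guidance` is a hand-written heuristic standing in for the trained neural network that would do this job in a deep-learning-guided searcher.

```python
from __future__ import annotations

import heapq
import itertools
from typing import Callable, Optional

# Hypothetical DSL: each primitive maps a list of ints to a new list of ints.
PRIMITIVES: dict[str, Callable[[list[int]], list[int]]] = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
    "inc": lambda xs: [x + 1 for x in xs],
    "drop_first": lambda xs: xs[1:],
}

Example = tuple[list[int], list[int]]


def run(program: tuple[str, ...], xs: list[int]) -> list[int]:
    """Apply the primitives in order to the input list."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return list(xs)


def guidance(program: tuple[str, ...], examples: list[Example]) -> float:
    """Stand-in for a learned model: lower cost means 'expand this candidate sooner'.

    Hand-written heuristic: program length plus how far the current outputs are
    from the targets (length mismatch plus elementwise absolute error).
    """
    cost = float(len(program))
    for inp, target in examples:
        got = run(program, inp)
        cost += abs(len(got) - len(target))
        cost += sum(abs(a - b) for a, b in zip(got, target))
    return cost


def guided_search(examples: list[Example], max_len: int = 4,
                  budget: int = 10_000) -> Optional[tuple[str, ...]]:
    """Best-first search: always expand the candidate the guidance ranks highest."""
    tie = itertools.count()  # tie-breaker so heapq never compares program tuples
    frontier = [(guidance((), examples), next(tie), ())]
    expanded = 0
    while frontier and expanded < budget:
        _, _, program = heapq.heappop(frontier)
        expanded += 1
        if all(run(program, inp) == target for inp, target in examples):
            return program
        if len(program) >= max_len:
            continue
        for name in PRIMITIVES:
            child = program + (name,)
            heapq.heappush(frontier, (guidance(child, examples), next(tie), child))
    return None


# Target behaviour: sort descending, then map x -> 2x + 1.
examples = [([3, 1, 2], [7, 5, 3]), ([5, 4], [11, 9])]
print(guided_search(examples))
```

In the AlphaGo/AlphaZero-style setup, the heuristic would be replaced by a trained model that predicts which extension is likely to succeed. The search loop itself could stay the same.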
Interviewer 4
Is ARC 3 the end, or will there be an ARC 4, 5, 6? Can you keep making it harder?
Francois Chollet
Yeah, yeah, I think there will absolutely be ARC 4 and ARC 5. I mean, we're currently planning ARC 5. The point of the ARC-AGI benchmark series is not to say, well, you know, here's this test, and if you pass it, this is AGI. Instead, we're targeting the residual gap in capabilities as the frontier advances. We're saying, well, if you compare it to human abilities, there are all these tasks, all these things, it's not doing well, so we are going to create a benchmark to target that. So it's a moving target, right? It's not a fixed point. There will be ARC 4, which will be in the spirit of ARC 3 but more focused on continual learning and curriculum learning at longer time scales. You're going to have fewer games, but they're going to have way more levels. And the levels are going to be compounding, meaning that for each level you need to reuse stuff that you've learned before. Then there's going to be ARC 5, and I'm actually really excited about ARC 5. It's very new and different. It's all about invention, and, I mean, you will see what that means. Eventually, I expect we'll run out of things to test as we get closer to AGI. Eventually there will be no measurable difference between human capabilities, in particular human learning efficiency, and frontier AI. And when that happens, when it becomes effectively impossible to measure the gap, this is the AGI moment.
Interviewer 2
Well, then the machines will take over and then they will create ARC ASI 1. Yes, ARC ASI 1, and then it'll continue from there. If you had to put a guess on your timeline to AGI, I mean, years, decades, months?
Francois Chollet
If you just try to extrapolate from the current rate of progress and the amount of investment that's going into not just the LLM stack, but also side ideas, side bets that might work out, like Ndea, for instance, I think we're probably looking at AGI around 2030, early 2030s most likely. So around the time that we're going to be releasing maybe ARC 6 or ARC 7, that's probably going to be AGI.
Interviewer 4
You guys are taking a different approach from LLMs. Do you think there's room for more startups to explore other new approaches? And are there any other ones that you think are promising but that you don't have time to explore yourself?
Francois Chollet
Yeah, absolutely. I mean, there are many different approaches that you could try. I've said that compute is a great equalizer. If you look at the amount of compute and resources that we've thrown at deep learning and gradient descent and scaling that up, if you had thrown the same amount of investment at almost anything else, you would also have seen extremely exciting results. Genetic algorithms, for instance. If you scaled up genetic algorithms, I'm sure you could do incredible things with that. You could in fact probably do new science, because that's based on search, and search is the best fit for automating the scientific method. Right now there are also approaches that build on top of the current stack but are slightly alternative, like state space models, for instance, or the xLSTM architecture. Basically, current frontier AI is a stack of things, and you can take any layer in the stack and try to propose an alternative. You can propose an alternative architecture, for instance more recurrent models instead of Transformers, or you can go even lower level and say, okay, we're still going to be training parametric curves, but we're going to get rid of gradient descent.
Interviewer 2
Right.
Francois Chollet
We're going to use search. Maybe we're going to do neuroevolution. That's lower level. And the lowest level is where we're operating, where we're saying, well, actually, forget about curves, forget about parametric learning, forget about gradient descent. We're going to do something completely different. And I think if you want to build optimal AI, you're kind of forced to go back to the foundation of the stack. It cannot be one more layer added on top of the pile.
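Training a parametric curve by search rather than gradient descent can be shown with a minimal neuroevolution-style sketch. A (1+λ) evolution strategy fits a two-parameter line using only mutation and selection, so no gradients are ever computed. The toy model, the data, and the hyperparameters (`generations`, `offspring`, `sigma`) are illustrative assumptions. Real neuroevolution evolves neural network weights, often with more elaborate methods.

```python
from __future__ import annotations

import random


def mse(params: list[float], data: list[tuple[float, float]]) -> float:
    """Mean squared error of the line y = a*x + b on the data."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)


def evolve(data: list[tuple[float, float]], generations: int = 300, offspring: int = 30,
           sigma: float = 0.3, seed: int = 0) -> tuple[list[float], float]:
    """(1+lambda) evolution strategy: no gradients, only mutation and selection.

    Each generation samples `offspring` Gaussian mutations of the parent and
    keeps the best child only if it beats the parent (elitist selection).
    """
    rng = random.Random(seed)
    parent = [0.0, 0.0]
    parent_loss = mse(parent, data)
    for _ in range(generations):
        children = [[p + rng.gauss(0.0, sigma) for p in parent] for _ in range(offspring)]
        best_child = min(children, key=lambda c: mse(c, data))
        child_loss = mse(best_child, data)
        if child_loss < parent_loss:
            parent, parent_loss = best_child, child_loss
    return parent, parent_loss


# Noise-free samples of y = 3x - 2 on [-1, 1].
data = [(i / 10, 3 * (i / 10) - 2) for i in range(-10, 11)]
params, loss = evolve(data)
print(params, loss)
```

The model is still a parametric curve; only the optimizer changed. That places this approach one layer above the fully symbolic, non-parametric level described above.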
Interviewer 1
So do you think aspiring researchers who want to start a new neolab with a different approach should be reading research papers from the 70s or 80s and going deep on approaches that aren't as heavily invested in nowadays?
Francois Chollet
That is actually a great idea because earlier in the history of the AI research timeline, people were exploring more things and very different things. You've had this sort of like collapse of everything into one approach. It's actually kind of a bad idea. Consider that not too long ago, like about 20 years ago, we had the
Interviewer 1
collapse into SVMs too.
Francois Chollet
Yeah, I mean, I wouldn't describe it as a collapse, because there weren't that many people doing SVMs and AI was a much, much smaller field back then. But there was this widespread understanding that neural networks were a failed approach, that neural networks didn't work and it was a waste of time to keep trying.
Interviewer 1
In the 90s, right?
Francois Chollet
Yeah, no, even in the late 2000s that was still the case. Basically, when I got into AI, people were telling me, hey, neural networks, don't try that. I was like, yeah, but it looks a lot like what the brain is doing, and I'm interested in that. If everybody is working on the same thing, you are discarding ideas that will actually turn out to be very productive. And yeah, back in the 70s, back in the 80s, people were trying more things. I think genetic algorithms are actually a very good example of that. This is an approach that has a tremendous amount of potential, but not many people are looking into scaling it up.
Interviewer 2
Are there any characteristics that you would be looking for? I mean, is it as simple as: if there's a scaling law, then it's promising even if it's a different approach? Or is that too much thinking by analogy?
Francois Chollet
I think you are looking for approaches that scale. Yeah, I think it's a non-starter if you're working on something where the only way to increase the capabilities of the system is to have human engineers and researchers spend time on it. That will not work, because even if the idea is very clever and very elegant and works really well, its capabilities are going to be bounded. They're going to be bounded by human investment. You want to be in a setup where the system can improve its capabilities with no human in the loop, with no human involved.
Interviewer 2
So you would say don't just do it the way we did it 10 years ago. Do it with the idea that recursive self improvement is baked in at the beginning.
Francois Chollet
Yeah, not necessarily recursive self-improvement, because deep learning, for instance, is not recursively self-improving, but with the idea of scaling up with no human bottlenecks. You want to remove the human from the improvement loop. The great strength of deep learning is that the models got better and better simply by adding training compute and training data. I mean, that's a bit of a caricature, because of course just adding these factors requires a lot of human involvement. But basically the idea is that you have this decoupling between the improvement curve and the amount of human effort that needs to be injected into the system.
Interviewer 4
Or the human effort had already happened. Because LLMs do actually require an enormous amount of human effort. It's just the human effort to build the Internet, and we'd already built it.
Francois Chollet
Yeah. Actually, less and less now that we are doing training in interactive, verifiable environments, because then you only need a small amount of human effort to create the environment, and from that small amount of effort you're creating exponentially more training data. But at first, I think, to prime the machine, you need this tremendous amount of human-generated abstractions encoded in text data. If you don't start from that, you cannot get the system into this loop.
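A verifiable environment turns a one-time piece of human work into an unbounded stream of automatically graded training episodes. The sketch below illustrates that asymmetry under stated assumptions; it is not any lab's actual training setup. `AdditionEnv`, `collect_episodes`, and `solver_policy` are hypothetical names. The addition task was chosen only because its answers can be checked programmatically.

```python
from __future__ import annotations

import random
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    prompt: str
    a: int
    b: int


class AdditionEnv:
    """Hypothetical verifiable environment: it generates addition problems and checks answers.

    The human effort is writing this class once; every call to sample() then yields a
    fresh, automatically graded episode.
    """

    def __init__(self, seed: int = 0, max_operand: int = 999) -> None:
        self.rng = random.Random(seed)
        self.max_operand = max_operand

    def sample(self) -> Task:
        a = self.rng.randint(0, self.max_operand)
        b = self.rng.randint(0, self.max_operand)
        return Task(f"What is {a} + {b}?", a, b)

    def reward(self, task: Task, answer: str) -> float:
        """Programmatic verifier: 1.0 for a correct integer answer, 0.0 otherwise."""
        try:
            return 1.0 if int(answer.strip()) == task.a + task.b else 0.0
        except ValueError:
            return 0.0


def collect_episodes(env: AdditionEnv, policy: Callable[[str], str],
                     n: int) -> list[tuple[str, str, float]]:
    """Roll out a policy and record (prompt, answer, reward) triples as training data."""
    episodes = []
    for _ in range(n):
        task = env.sample()
        answer = policy(task.prompt)
        episodes.append((task.prompt, answer, env.reward(task, answer)))
    return episodes


def solver_policy(prompt: str) -> str:
    """Placeholder for a model: parses the two operands and adds them."""
    a, b = (int(n) for n in re.findall(r"\d+", prompt))
    return str(a + b)


env = AdditionEnv(seed=1)
episodes = collect_episodes(env, solver_policy, n=5)
print(episodes[0])
```

All of the reward signal here comes from the one-time effort of writing `sample` and `reward`. A real setup would plug a trainable model in where `solver_policy` sits.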
Interviewer 2
Do you have any advice for me on starting an open-source project? Things to do, things not to do in the AI space? Because I am not sure how I signed up for this in the last 14 days, but I think I have, I don't know, on the order of 10,000 to 30,000 people using GStack every day.
Francois Chollet
That's wild.
Interviewer 2
Yeah, I don't know. Like, I have a job, I guess. Like, you know, what was it like to start Keras? And how did you keep maintaining it? What's a good maintainer? Like, what did you learn from that? I don't know. This might be a whole hour.
Francois Chollet
Yeah, I mean, lots of learnings from growing Keras. Right now I'm less involved with it. There's a big team at Google that's working on it, and they're doing an amazing job.
Interviewer 2
So it is possible to not, you know, to put people together to, like.
Francois Chollet
It is possible to start something. Yeah, it is possible to start something, then get more people involved, and at some point it becomes its own thing. It used to be your baby, but now it's all grown up, it's an adult, and it's going on with its own life. If you ask me about the factors that really made Keras successful, first of all there was this big focus on making the API simple and intuitive. There was this big focus on usability, and this was inspired by scikit-learn. scikit-learn was sort of the OG machine learning library for Python, and what made it successful was that it was so easy to get started with. So at first I was like, okay, I'm going to package all this functionality I've created under a really, really simple API. It's going to be like the scikit-learn API. That was the big idea. The focus on usability is not just making sure the API is simple; it's also making sure the entire onboarding experience is nice and easy. The docs should be very informative. The docs should not just tell you how to use this thing; they should actually teach you about the domain in the first place. Because the folks who land on your website are not going to be deep learning experts already. They're going to be people looking to maybe start using deep learning. So you have to teach them not just how to use the tool, but what the tool is good for and the entire field around it. And then you have to put a lot of investment into community building. One thing we did a bit at Google (in fact, Google made it kind of difficult, and I was sad about that) is hire your power users. Hire your fans. This is a really, really good idea. Find the most enthusiastic users from your community and just hire them onto your team.
Interviewer 2
Amazing.
Francois Chollet
Yeah. And these are always the best people, right?
Interviewer 2
All right, time to start gstack.org, put in a bunch of my own money, and then hire a bunch of people to work on it. That sounds good. I think you've been a leader and a pioneer, and we're so lucky to have you sit with us. There are people watching who are at the beginning of their adulthood, certainly at the beginning of their professional careers, or just people around the world who are trying to understand what this means as intelligence becomes broadly applicable. What would you tell someone who's 18 right now?
Francois Chollet
Yeah, I mean, there are a lot of people today who have very pessimistic, very negative takes about the rise in AI capabilities. They say, oh, you know, I'm going to be out of a job soon, there's going to be mass unemployment, AI is just going to take over completely. My take is actually that the more expertise you have about things like programming, for instance, the better you're able to use and leverage these tools for your own benefit. With the right kind of expertise, all this AI progress is actually empowerment; it's something that you can leverage for yourself. I mean, that's exactly what you did with your project, right? And more people should have this mindset of trying to learn as much as possible, not just about AI but about the domain they want to apply AI to, and they should seek to turn this new development into an opportunity, into a tool they can use to improve their own lives. I think that's the right mindset, because you're not going to stop AI progress. I think it's too late for that. So the next question is, okay, AI progress is here, and it's actually going to keep accelerating. How do you make use of it? How do you leverage it? How do you ride the wave? That's the question to ask.
Interviewer 2
I wish we could keep going for a couple hours, because I'm sure we could. Francois, thank you so much for spending time with us.
Francois Chollet
Thanks so much for having me.
Date: March 27, 2026
Guest: François Chollet
Main Theme:
A deep dive into alternative approaches to Artificial General Intelligence (AGI), focusing on François Chollet’s Ndea lab, the ARC Prize and benchmark, and the broader landscape of AI research beyond large language models (LLMs).
In this episode, the Y Combinator team interviews François Chollet, founder of the Ndea AGI research lab and creator of the ARC Prize, which challenges teams globally to solve the ARC AGI benchmark. Chollet discusses why he believes current deep learning approaches are not the ultimate path to AGI, details Ndea's radically different machine learning paradigm, and explains the motivations and evolution behind the ARC AGI benchmarks (now in their third version). The discussion is both philosophical and highly technical, providing perspective on the future of intelligence, the inefficiencies of current methods, and why exploring alternative AI architectures is vital.
AGI by 2030?
Chollet predicts that AGI could arrive around 2030 – early 2030s, paralleling the release of ARC AGI versions 6 or 7.
"I think we're probably looking at AGI 2030, early 2030s most likely. So around the time that we're going to be releasing maybe arc 6 or arc 7." (00:00, 45:50)
The Wave of Progress
It’s too late to "stop" AI—adoption and acceleration are inevitable. The right question is how to ride and benefit from the wave.
"You're not going to stop AI progress. I think it's too late for that. And so the next question is...how do you leverage? How do you ride the wave? That's the question to ask." (00:00, 55:44)
Ndea’s Goal
Ndea aims to build a fundamentally new branch of machine learning: optimal symbolic learning, not neural net-based parametric learning.
"We are trying to build this new branch of machine learning that will be much closer to optimal. Unlike deep learning." (01:08)
Program Synthesis Explained
Instead of fitting huge parameterized models via gradient descent, Ndea is building new learning engines that generate concise symbolic models (short programs) to explain data. The aim is to reduce reliance on raw scale and large amounts of training data.
"We're building a new learning substrate...replacing the parametric curve with a symbolic model that is meant to be as small as possible...you need to try symbolic learning." (01:51)
Symbolic Descent
Ndea’s symbolic descent is a "symbolic space" analog of gradient descent, meant to yield highly efficient, generalizable models.
"We're building something that we call symbolic descent, which is like the symbolic space equivalent of gradient descent." (01:51)
Efficiency and Optimality
Symbolic models require less data and are more interpretable and efficient at inference time—they generalize better due to their simplicity.
"Much closer to optimality in the sense that you're going to need much less data...the models are going to run much more efficiently...they will also generalize much better." (01:51)
Dangers of Monoculture in Research
It’s counterproductive for everyone to work on the same LLM stack. Chollet doubts LLMs will be the long-term foundation for AI/AGI.
"I personally don't think that machine learning or AI in 50 years is still going to be built on this stack...I think it's inevitable that the world of AI will trend over time towards optimality." (04:39)
On Defining Intelligence and AGI
Widely used definitions (automating most valuable work) miss the essence of intelligence. True general intelligence is measured by how efficiently a system can learn new tasks like a human.
"AGI is...a system that can approach any new problem...and become competent at it with the same degree of efficiency as a human could." (09:52)
LLMs Lack Sample Efficiency
Current LLMs are not sample-efficient like humans—even if they achieve similar task automation, they do it through massive data and compute.
"LLMs aren’t fundamentally sample-efficient. I do believe, however, [building AGI on LLMs] would be the wrong thing to do because it would be very inefficient." (11:41)
ARC AGI started as a reasoning and program synthesis benchmark highlighting deep learning’s limitations. LLMs performed poorly on v1 and v2 until reasoning/agentic models and verifiable reward harnesses were introduced.
"Base LLMs were scoring extremely low on v1, like sub 10%...scaling up pre training alone was not going to crack the benchmark." (17:55)
Agentic reasoning and fine-tuning using verifiable signals (like in code or math) allow models to perform well in and automate narrower domains efficiently, but they don't equate to fluid intelligence.
"It's not that the models are smarter, it's that they're suddenly more useful...the models don’t have higher fluid intelligence per se, it’s just that they’re way better trained." (22:36, 23:00)
ARC v3 moves from static, passive pattern matching to interactive, agent-in-environment tests (think ‘mini games’).
"V3...is completely different. We are trying to measure agentic intelligence. So it's interactive, it's active. Like, the data is not provided to you. You must go get it." (26:08)
To succeed, an AI agent must explore, discover goals, build a model of its environment, plan, and act efficiently—just like humans do when playing a new game with no instructions.
"Your agent is dropped into a new environment, which is kind of like a mini video game...and it must figure out everything on its own via trial and error." (26:08)
ARC v3 focuses on agentic intelligence, exploration efficiency, and true fluid reasoning. It's much harder to "brute force" via massive compute or overfitting, and is scored on action efficiency.
"If you try to brute force mine the space of every possible game state...you would score extremely low. Even if you solve the level...you're scored on efficiency. You must match human level efficiency." (31:20)
Development required building a full video game studio and creating hundreds of unique, knowledge-neutral environments to avoid test leakage.
"We set up an entire video game studio...We hired a team of game developers and we built our own game engine." (29:37, 30:20)
"The point...is not to say, here's this test. If you pass it, this is AGI. Instead...we're targeting the residual gap of fair capabilities...Eventually there will be no measurable difference between human capabilities and frontier AI...this is the AGI moment." (44:04)
Conciseness is Ultimate Intelligence
The shortest, most elegant models generalize best: science as symbolic compression, not curve fitting.
"Science is fundamentally a symbolic compression process...you're compressing that down to a very simple symbolic rule...we are building science incarnate: science, the scientific method in algorithmic form." (39:40)
Learning from the Past
Chollet argues revisiting and scaling up old ideas (genetic algorithms, 1980s research, etc.) could yield breakthroughs—current research focus is too narrow.
"Earlier in the history of the AI research timeline, people were exploring more things and very different things. I think genetic algorithms are actually a very good example of that." (48:41)
Remove Humans from the Improvement Loop
Successful future systems scale with compute, not human engineering; improvement must be automated and self-compounding wherever possible.
"You want to be in a setup where the system can improve its capabilities with no human in the loop, with no human involved." (50:22)
Advice to New Builders
Deep focus on usability, onboarding, and cultivating the community are key to long-lasting open-source impact. Hire your super-users!
"There was this big focus on making the API simple and intuitive...focus on usability...But you have to put a lot of investment into community building...Hire your power users." (53:12)
Actionable Mindset for the Next Generation
AI is not to be feared but leveraged. Those with expertise—especially in programming—will be able to ride the wave and create new opportunities.
"The more expertise you have...the better you're able to use and leverage these tools for your own benefit...AI progress is actually empowerment." (55:44)
On Foresight and Long-Shot Bets in Research:
"If you have a big idea and it has a very low chance of success, but if it works, it's going to be big, and no one else is working on it... then you should try." (06:09)
On Recapturing Symbolic Insight:
"Science is not about curve fitting. Science is about finding the equation, finding the most compressive symbolic model of your pile of observation." (39:40)
On What AGI Will Eventually Be:
"When you create AGI retrospectively, it will turn out that it’s a code base that’s less than 10,000 lines of code and that if you had known about it back in the 1980s, you could have done AGI back then..." (36:30)
On the Human Element:
"The fact that you need humans to engineer these harnesses is also a sign that we're short of AGI today. Because if we had AGI, AGI would just make its own harness." (25:07)
| Timestamp | Segment/Topic |
|-----------|---------------|
| 00:00 | François Chollet’s AGI timeline and inevitability of progress |
| 01:08 | Ndea’s symbolic program synthesis: moving beyond deep learning |
| 04:39 | Why not just keep building on the LLM stack? |
| 09:52 | Defining AGI: Efficiency in skill acquisition |
| 17:55 | How ARC benchmarks revealed LLM and reasoning limitations |
| 26:08 | ARC AGI v3: Measuring agentic intelligence with interactive games |
| 31:20 | Why brute-forcing ARC v3 won’t work: efficiency as the bar |
| 39:40 | Science as symbolic compression; Ndea’s core insight |
| 44:04 | The future of ARC AGI: a moving, expanding benchmark |
| 48:41 | Value in revisiting overlooked AI techniques from previous eras |
| 50:22 | Automating improvement: removing the human bottleneck |
| 53:12 | Chollet’s lessons on open-source projects and community |
| 55:44 | Advice to the next generation: treating AI as empowerment |
Closing Note:
François Chollet’s approach is a reminder that radical progress in AI might come from off the beaten path—from those willing to reimagine the entire stack and rethink what "intelligence" really means.