Dr. Sebastian Risi
We can take the example of how nature evolved intelligence and use evolution instead. The nice thing is that it doesn't only have to be the weights, as in standard backpropagation; it can be the architecture. We can start from a completely random network. The only thing it has is the learning rules; otherwise the weights are completely random. And in a few steps the network can self-organize. When you use a static, fixed network that doesn't change its weights during its lifetime, if you cut off a leg, it will probably fail because it can't adapt. But these Hebbian networks change their weights all the time. It's basically a continually learning, updating system: you can cut off a leg and often it will still be able to function, even though it has never seen this kind of variation during training.
Interviewer
So we're going to talk about evolutionary strategies, or neuroevolution, which is the subject of your book, and about your work at Sakana.
Dr. Sebastian Risi
Sure.
Interviewer
To begin, you have a new book out, Neuroevolution, and it's about evolutionary strategies in machine learning and AI. It pulls together several different strategies, or streams of thought, that you work on. So maybe you can start by explaining what neuroevolution is in AI.
Dr. Sebastian Risi
Sure. Neuroevolution is the idea of combining evolutionary algorithms with neural networks. Instead of training networks with gradient descent or reinforcement learning, we can take the example of how nature evolved intelligence and use evolution instead. So you're applying genetic algorithms, evolution strategies, many different flavors of evolutionary algorithms, to optimize some part of a neural network. And the nice thing is that it doesn't only have to be the weights, as in standard backpropagation: it can be the architecture, it can be a learning rule, it can be hyperparameters, it can be many different parts, and it doesn't have to be differentiable. So it's quite versatile in how you can apply it.
Interviewer
What do you mean, it doesn't have to be differentiable?
Dr. Sebastian Risi
When you typically train a neural network with supervised learning, you need to be able to differentiate through the network. Most machine learning is built on an algorithm like backpropagation, which differentiates through the network to work out, based on the loss (how well it performs), how much to change each weight. So the network needs certain properties, like differentiability, for you to be able to apply this algorithm. You have to have specific activation functions that are differentiable; the architecture has to be differentiable; everything has to be smooth instead of discrete. And if you want to do something discrete, like discrete actions, then you have to use some tricks. If you don't have that, it's more complicated to apply something like backpropagation. Evolution basically still works: it doesn't really care whether anything about the network is differentiable or not.
Interviewer
Yeah, I'm not quite getting what differentiable means.
Dr. Sebastian Risi
Basically, if you have a function, you can take its derivative, and that's what you're doing when you train a neural network: you view the whole network as one big function. If you take its derivative, like in high-school math, it tells you the slope, meaning which direction you have to push the weights for the error to get lower. That's all it does. It gives you the slope of the function, so it tells you whether to move a weight in this direction or the other: in one direction the error increases, and in the other the error goes down. So let's say you have a network with just three weights, a three-dimensional weight space. Depending on how you vary those weights, you get an error surface, and the slope tells you which way you should go to get down.
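The "follow the slope downhill" idea can be made concrete with a small sketch. It assumes a toy linear model with three weights and a mean-squared-error loss, standing in for the three-weight error surface described above; the data, names, and step size are illustrative and not taken from any of the work discussed.

```python
import numpy as np

# Toy setup (hypothetical): a linear "network" with three weights, fit by least squares
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 training inputs
true_w = np.array([1.5, -2.0, 0.5])    # weights that generated the targets
y = X @ true_w

def error(w):
    # Mean squared error: the height of the error surface at weights w
    return np.mean((X @ w - y) ** 2)

def gradient(w):
    # Analytic derivative of the error: the slope of the surface at w
    return 2 * X.T @ (X @ w - y) / len(y)

w = np.zeros(3)
learning_rate = 0.1
for _ in range(200):
    w -= learning_rate * gradient(w)   # step downhill, against the slope
```

Backpropagation is an efficient way of computing exactly this kind of gradient for a deep network; the loop itself is the same "take a small step downhill" idea.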
Interviewer
Right?
Dr. Sebastian Risi
And if you have a network with a million parameters, it's a million-dimensional space. Backpropagation is very good: if you can use it, it's great for finding that point, the minimum or maximum. But if you can't, it's very difficult to navigate that space, and that's what you can use evolution for. Basically, the difference with evolution is that you don't need the gradient, because you have a population that is distributed across this landscape, and you don't need the error signal. Take evolution strategies, for example. You're somewhere on that surface, and what you do is sample: you slightly change the weights, creating, say, 100 different mutations around you, and then you move in the direction where they did better. So you sample locally, but because you have a population, you're sampling not only here but in many places at the same time, and that can give you a direction to go. And the nice thing is that you don't only need mutations that go slightly in one direction; you can also make big jumps by doing crossover. The idea of crossover is that you take the genes of one parent and the genes of another parent, and by combining them maybe you get the best of both worlds, because each one has good building blocks, and together you get something even better. But when you do that with neural networks, you need to be careful. You can't just randomly take half of one network and half of another and assume the result works well. So neuroevolution researchers have developed algorithms that allow sensible crossover. You don't want to suddenly have two left hands; you want a left and a right hand.
And the same applies to neural networks.
Interviewer
So it can combine them in a way that's still executable.
Dr. Sebastian Risi
Right, exactly. In a way that makes sense, where you don't lose functionality.
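The sampling idea Risi describes (perturb the current weights with about 100 random mutations, score each one, and move toward the ones that did better) can be sketched as a basic evolution strategy. This is a minimal sketch under stated assumptions: the objective, step sizes, and function names are hypothetical, and the only information used is fitness values, never a derivative. Crossover is not covered here.

```python
import numpy as np

TARGET = np.array([3.0, -1.0, 2.0])  # hypothetical optimum, unknown to the optimizer

def fitness(w):
    # Black-box score (higher is better); the ES only calls it, never differentiates it
    return -np.sum((w - TARGET) ** 2)

def evolution_strategy(f, dim, pop_size=100, sigma=0.1, lr=0.05, generations=300, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)  # current position on the fitness landscape
    for _ in range(generations):
        # Sample pop_size small random mutations scattered around w
        noise = rng.normal(size=(pop_size, dim))
        scores = np.array([f(w + sigma * n) for n in noise])
        # Mutations that beat the population average pull w toward them,
        # below-average ones push it away
        advantages = scores - scores.mean()
        direction = noise.T @ advantages / (pop_size * sigma)
        w = w + lr * direction
    return w

w = evolution_strategy(fitness, dim=3)
```

Subtracting the mean score is one simple baseline; practical ES implementations often use rank-based fitness shaping instead, which makes the update insensitive to the scale of the fitness values.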
Interviewer
So that's neuroevolution as it applies to AI. And you've been working on an area that fascinates me: plasticity of networks, self-growing networks, and to some extent recursive self-improvement. Are you working on that?
Dr. Sebastian Risi
Yeah, exactly. We're basically trying to see which building blocks from nature are missing from our current systems and might make them a lot better. One of those is the plasticity you mentioned. One of the mechanisms by which our brains learn is that if two neurons always fire together, the connection between them gets stronger. It's a local learning rule, unlike backpropagation, which is an outside process that changes everything about the network. So what we have been working on is: what if we train only those learning rules? For each synapse we train a local learning rule instead of having a global signal, and we train it through evolution. We did experiments where we trained Hebbian learning rules that take into account how much the presynaptic neuron (the source neuron) fires and how much the postsynaptic neuron fires. Depending on how much they fire, separately or together, the learning rule says to make the connection stronger or weaker. And for every connection in the network, we evolve its own rule. We showed that if you do that, starting from when the agent is born, we can start from a completely random network. The only thing it has is the learning rules; otherwise the weights are completely random. And within a few steps the network can self-organize, because the learning rules are trained by evolution to self-organize it into a network that can, for example, drive a car around or control a quadruped robot. And the interesting part is this: with the quadruped, when you use a static, fixed network that doesn't change its weights during its lifetime, if you cut off a leg, it will probably fail because it can't adapt.
But these Hebbian networks change their weights all the time. It's basically a continually learning, updating system: you can cut off a leg and often it will still be able to function, even though it has never seen this kind of variation during training. Now we're trying to extend this to more complicated tasks, more continual-learning tasks. But the main idea is that the weights never stop changing. Your brain doesn't freeze at some point; it keeps changing. So I think this is a promising direction toward continually learning agents based on their own evolved learning rules, which could, for example, be optimized through some kind of meta-learning to facilitate continual learning.
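Per-synapse evolved Hebbian rules can be sketched as follows. This assumes a generalized Hebbian update of the "ABCD" form used in the neuroevolution literature, with one coefficient set per connection. The class name, activation function, and random coefficients are hypothetical placeholders. In the approach described, evolution would search over the coefficients, while the weights start random and are never optimized directly.

```python
import numpy as np

class HebbianLayer:
    """A layer whose weights start random and change on every step via local rules.

    Each connection (i -> j) has its own coefficients (eta, A, B, C, D), applied as
        dw_ij = eta * (A * pre_i * post_j + B * pre_i + C * post_j + D)
    Evolution would optimize these coefficients; here they are random placeholders.
    """

    def __init__(self, n_in, n_out, rng):
        self.w = rng.uniform(-0.1, 0.1, size=(n_in, n_out))  # random initial weights
        # Per-synapse rule coefficients: the "genome" evolution would search over
        self.coeffs = rng.normal(0.0, 0.1, size=(5, n_in, n_out))

    def step(self, pre):
        post = np.tanh(pre @ self.w)           # activity with the current weights
        eta, A, B, C, D = self.coeffs
        outer = np.outer(pre, post)            # pre_i * post_j for every connection
        self.w += eta * (A * outer + B * pre[:, None] + C * post[None, :] + D)
        return post

rng = np.random.default_rng(0)
layer = HebbianLayer(n_in=4, n_out=3, rng=rng)
w_before = layer.w.copy()
for _ in range(5):
    observation = rng.normal(size=4)  # stand-in for sensor input
    action = layer.step(observation)  # weights change on every step
```

Because the update uses only the activity of the two neurons a connection links, it is local in the sense Risi describes, and it keeps running after "training" is over.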
Interviewer
And the problem, of course, with continual learning in a fixed network is that you overwrite weights and forget information. How do you deal with that?
Dr. Sebastian Risi
Yeah, that can still happen with these networks. One thing people have been experimenting with: in the brain we have Hebbian learning, but we also have neuromodulation. Neuromodulation is another system in the brain that tells some parts of the brain when they should learn. It does many other things, but one function is to tell parts of the brain when to switch learning on and when to switch it off. So we and others have been experimenting with adding another type of neuron to a neural network that can tell other parts of the network when learning should be switched on or off. That's one route toward more continually learning systems, where the system itself learns: okay, maybe I should overwrite this part; these weights are fine; the other parts shouldn't be changed.
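The on/off switching can be sketched as a modulatory signal that multiplies the Hebbian update. This is a minimal, hypothetical version: a single modulatory neuron produces one scalar per step for the whole layer, whereas models in the literature can be richer, for example with modulation per neuron.

```python
import numpy as np

def modulated_hebbian_step(w, pre, w_mod, eta=0.1):
    """One step of neuromodulated Hebbian learning (simplified sketch).

    w:     (n_in, n_out) plastic weights
    pre:   (n_in,) presynaptic activity
    w_mod: (n_in,) weights of a single modulatory neuron reading the same input
    """
    post = np.tanh(pre @ w)
    m = np.tanh(pre @ w_mod)                   # modulatory signal in (-1, 1)
    # The plain Hebbian term pre_i * post_j is gated by m:
    # m near 0 switches learning off, its sign chooses strengthen vs. weaken
    w_new = w + eta * m * np.outer(pre, post)
    return w_new, post, m

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))
pre = rng.normal(size=4)
w_frozen, _, m_off = modulated_hebbian_step(w, pre, w_mod=np.zeros(4))  # m = 0: no learning
w_learned, post, m_on = modulated_hebbian_step(w, pre, w_mod=np.ones(4))
```

If the modulatory weights are themselves evolved, the network can learn which inputs should trigger plasticity and which should leave existing weights alone.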
Interviewer
Interesting. And that's in a fixed network?
Dr. Sebastian Risi
Yes. The other thing we have been working on, as part of the EU project Grow AI, is learning not just in a fixed network but actually learning to grow a network, taking more inspiration from neurogenesis and morphogenesis in nature. We're not simply given our brains; they grow. That's a step we skip in machine learning: we just say, this is the neural network you have. But in nature, things are grown, starting from a single cell. So we're trying to replicate that process, starting with one neuron and growing from there. We have a system that can do that, and currently it works for simple tasks. But ultimately the idea is to also take the environment into account during the growth process. There is already some information in the environment, so why not use it while the network is created and develops? That's something we're working on. We call this a neural developmental program. You're basically learning another small neural network, copies of which run in every neuron of a normal neural network. That small network can then decide when another node should be created, or how the connection between two nodes should change based on their activations. So in principle it can learn any type of learning rule, which also makes it harder to optimize. It's basically a graph-neural-network-type system, but one that is dynamic and can change as the agent is born and interacts with its environment.
Interviewer
In your experiments, how many... You're growing parameters, right?
Dr. Sebastian Risi
Yeah, in effect we're growing parameters, but not trainable parameters. The trainable parameters are like the DNA: the small program. The big network also has parameters, but those are ones it has to learn to change by itself.
Interviewer
Right.
Dr. Sebastian Risi
Like our DNA and the developmental process. But then your brain no longer relies on evolution; it relies on whatever learning mechanism is running inside that system.
Interviewer
Can we talk about it at a concrete level? You have to understand, I'm a journalist, so I don't really understand this stuff. Let's say you have a three-layer network with four nodes in each layer.
Dr. Sebastian Risi
Right.
Interviewer
So in the hidden layer, the middle layer, one of the neurons gets more information than it can handle and then splits off a new neuron.
Dr. Sebastian Risi
Yeah, except that it would be up to the system. There are two things. One is that it could just grow without getting any information at the start. Before our cells get sensory information from the outside, you could have a fixed program where the nodes communicate with each other, exchange information, and figure out: okay, you should grow five times and I'll grow two times. That happens without any outside activation; it's the first process that can run. But they could also figure out: okay, I have too much information. The system itself learns to do this. We're not telling the system what to do when it has too much information. Instead, each node runs another network, basically a recurrent network executing this genetic, developmental program, and it could figure out: I'm getting all this information, so maybe I should split the cell so that we have more capacity. That is something we don't program in; the algorithm has to figure it out by itself. The way it would figure it out is that genetic programs that do this get higher fitness than genetic programs that don't. So the other ones would be selected out, and programs that do this even a little would initially get better fitness and be selected. And that is part of the challenge, because the space of what you could learn is so large. You could learn some very weird developmental program. So to evolve really complicated things, it would probably require a good curriculum of tasks: first you do some small task, and then we make the task more and more complicated.
There's also some research we discuss in our Neuroevolution book, like a system called POET, where you evolve the environment and the agent together so that they can scaffold off each other. Something like this is probably required to get that approach to work for really complicated problems.
Interviewer
And just on a very concrete level, you're dealing with computer code, right? So is there a function in the code that replicates a neuron and then adjusts that new neuron's weights?
Dr. Sebastian Risi
Yeah, basically the small developmental program is a neural network with an output that says "create another node." When that output is over some threshold, we add another node to that parent node. Then there's another output: if you give the network the states of two nodes as input, it tells you how much you should change the connection between them. So it's an extra output of the developmental program network.
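The two outputs described here can be sketched as follows: a "grow" output compared against a threshold, and an output that maps the states of two nodes to a change in their connection. This is a heavily simplified, hypothetical version. The shared program is a pair of random linear maps rather than an optimized recurrent network, nodes do not exchange messages, and all names are illustrative.

```python
import numpy as np

class DevelopmentalProgram:
    """Hypothetical tiny program shared by (copied into) every node.

    grow_w: maps a node's state to a "create another node" output.
    edge_w: maps the states of two nodes to a change in their connection.
    In the actual approach these parameters would be optimized, e.g. by evolution;
    here they are random placeholders.
    """

    def __init__(self, state_dim, rng, threshold=0.5):
        self.grow_w = rng.normal(size=state_dim)
        self.edge_w = 0.1 * rng.normal(size=2 * state_dim)
        self.threshold = threshold

    def wants_to_grow(self, state):
        return np.tanh(state @ self.grow_w) > self.threshold

    def edge_update(self, s_i, s_j):
        return np.tanh(np.concatenate([s_i, s_j]) @ self.edge_w)


def develop(program, state_dim, steps, rng):
    states = [rng.normal(size=state_dim)]   # start from one single node
    weights = np.zeros((1, 1))              # weights[i, j]: strength of i -> j
    exists = np.zeros((1, 1), dtype=bool)   # which connections exist
    for _ in range(steps):
        # 1) Growth: every node whose output crosses the threshold adds a child
        parents = [i for i, s in enumerate(states) if program.wants_to_grow(s)]
        for p in parents:
            states.append(states[p] + 0.1 * rng.normal(size=state_dim))
            child = len(states) - 1
            weights = np.pad(weights, ((0, 1), (0, 1)))
            exists = np.pad(exists, ((0, 1), (0, 1)))
            weights[p, child] = 1.0         # connect the parent to its new child
            exists[p, child] = True
        # 2) Plasticity: the second output adjusts every existing connection
        for i, j in zip(*np.nonzero(exists)):
            weights[i, j] += program.edge_update(states[i], states[j])
    return states, weights, exists

rng = np.random.default_rng(0)
program = DevelopmentalProgram(state_dim=4, rng=rng)
states, weights, exists = develop(program, state_dim=4, steps=4, rng=rng)
```

Only the program's parameters would be trained; the grown network's size and connection weights are produced by running it, which matches the distinction between "DNA" parameters and grown parameters drawn above.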
Interviewer
Yeah. And so far, how many nodes have you grown? I mean the network has grown from what to what?
Dr. Sebastian Risi
Yeah, so the networks are grown from a single node to... We tried it on some robotic tasks, but I think the biggest one was a small version of MNIST, with maybe a few thousand nodes, something like that. So it's orders of magnitude smaller than current networks.
Interviewer
But it grows by orders of magnitude.
Dr. Sebastian Risi
Yeah, yeah, yeah, yeah.
Interviewer
And if you could scale this up... I mean, first of all, are you confident enough? Are you at the point in your research where it's time to scale it, or are you still working on the base algorithms?
Dr. Sebastian Risi
I think it could almost be time. There are some challenges with balancing growth and plasticity. What it sometimes likes to do is grow almost as much as it can, then use this whole network and just change the connections between nodes. So there has to be some kind of pressure toward being sparse, toward not using too much. In nature you have energy consumption, so you can't just grow and use all that energy. It's a bit hard to find that balance. But in neuroevolution you often use multi-objective optimization: you can say one important objective is fitness, and another is, for example, how many nodes there are, some pressure on how big the structure is. And that helps. But there are still a few things we need to figure out. One thing that's still a little tricky: you want to learn to grow something, and then you want to be able to elaborate on it. Nature is very good at that. It learned how to make a butterfly, and then it learned how to make a butterfly with eye spots, and with different eye spots. These types of representations are also called indirect encodings, because you have an indirect way of making a network. One difficulty is that you want it to grow a certain network and then learn more and add another structure, but without forgetting how to grow the first part. So there is a kind of continual learning problem in developmental systems too, and we're still trying to figure out how best to approach it.
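The multi-objective idea (reward fitness, penalize size) is often handled with Pareto dominance rather than a single weighted score. Here is a minimal sketch, assuming each candidate is summarized as a hypothetical `(fitness, node_count)` pair. Full algorithms such as NSGA-II build sorting and diversity measures on top of this basic test.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on both objectives and better on one.

    Each candidate is (fitness, node_count): higher fitness is better,
    fewer nodes is better (a stand-in for an energy or size cost).
    """
    fit_a, size_a = a
    fit_b, size_b = b
    no_worse = fit_a >= fit_b and size_a <= size_b
    strictly_better = fit_a > fit_b or size_a < size_b
    return no_worse and strictly_better

def pareto_front(population):
    # Keep every candidate that no other candidate dominates
    return [p for p in population if not any(dominates(q, p) for q in population)]

population = [(0.90, 120), (0.90, 40), (0.70, 10), (0.50, 50), (0.95, 300)]
front = pareto_front(population)
```

Here `(0.90, 120)` is dropped because `(0.90, 40)` scores the same with fewer nodes, and `(0.50, 50)` loses to `(0.70, 10)`. The surviving trade-offs range from small-but-weaker to large-but-stronger networks, so a network that "grows as much as it can" no longer wins automatically.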
Interviewer
I see. If you can get this to work, what scaling laws apply? Could you then grow this indefinitely, or does something start happening beyond a certain network size?
Dr. Sebastian Risi
I think if we really figured out how to grow it, we should be able to really scale it up. There's some interesting work that was just presented at NeurIPS on reinforcement learning. Normally you have quite small networks for reinforcement learning compared to language models. But they showed that if you really scale a network up, to hundreds of layers, you actually get much more interesting dynamics out of the system by itself, if you train it in a supervised way. They did this on some robotics tasks. And that is just scaling up a typical feed-forward network. So imagine there might be really interesting dynamics hidden if you scale up, but where the structure is grown and determined by this process rather than being a typical feed-forward architecture. I would like to grow something where, I don't know, maybe it discovers how to grow a cortical column, and then it can replicate that cortical column many times, or some other structure that's important in biological systems. Then I think we could get really interesting dynamics out of the system. The difficulty is having a system that can learn to grow important neural motifs, copy them, and maybe create slight variations of them, because maybe it wants to use this kind of cognitive map for something and then slightly change it to use it for another modality. But if we figure out how to make this really efficient, there are a lot of interesting things we can do.
Interviewer
Yeah. I mean, growing is one problem, right? Training is another.
Dr. Sebastian Risi
Right.
Interviewer
And then inference is another. So when you grow this network, you're starting with a trained network, right? A pre-trained one?
Dr. Sebastian Risi
So what is trained is the program that grows it, right?
Interviewer
But can you then train that network the way you would a GPT model?
Dr. Sebastian Risi
You can also train it in a more supervised way, or through reinforcement learning; we tried some approaches. It just seems to be easier to train it with evolution. But there's another issue with this approach. In machine learning and evolutionary computation in general, there's this issue of deception: it's easy to get a decent score, but to get all the way to the goal, you might first have to take a path that decreases your performance before it gets better. There's the classic example of a maze where you can get very close to the goal, but to actually reach it you have to go all the way around. So getting a decent score is easy, but if you want a really good score, you have to get worse first. These kinds of problems likely require approaches that can deal with deception. That's why neuroevolution researchers have been developing more open-ended search methods: methods that don't just go for one target, under the umbrella term quality diversity. You want an approach that explores much more of the space but also takes the quality of the solutions into account. So for this growing approach to work really well, we have to combine it with these quality diversity approaches. All these things work together, because otherwise it's quite difficult to explore the space. We also did some work back in the day showing how difficult learning to learn with plasticity is. Imagine a typical experiment from biology, the T-maze. You have a maze shaped like a T, and the mouse goes to one arm of the maze and has to learn to remember: was the big reward here, or was it over there?
When it has collected the reward, you put it back at the start of the maze. We did experiments where we used Hebbian learning for that, and what often happens is this: imagine you learned to always go to the small reward. You go to the small reward, you get put back, and the high reward is here, so you learn to go to the small reward. This is the worst thing you can do. It's worse than always going to one side of the maze, because then you would at least get 50%. But in terms of how close that network is to actually learning, it's closer than a network that just doesn't react and stupidly always goes to the right side. If you use a traditional approach, the first network would score worst and be sorted out immediately. So you need different methods for evolving these more cognitive skills than just saying "this is the fitness," because otherwise you will get stuck stupidly going to one arm of the maze and collecting 50%. Everything has to work together. You want to learn to learn, you want to learn to grow, you want to do everything at the same time, and that's the challenge.
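MAP-Elites is one well-known quality-diversity algorithm, and a minimal version shows why such methods help with deception. It keeps the best solution found for each *kind* of behavior instead of only the single best overall, so an unusual but low-scoring stepping stone (like the network that reliably goes to the small reward) is not discarded. This is a hypothetical sketch, not the method used in the T-maze work; the toy task and behavior descriptor are invented for illustration.

```python
import numpy as np

def map_elites(evaluate, dim, bins, iterations, rng, sigma=0.2, n_init=10):
    """Minimal MAP-Elites: one elite per behaviour niche.

    evaluate(x) returns (fitness, descriptor), where descriptor in [0, 1]
    says *how* the solution behaves, independent of how well it scores.
    """
    archive = {}  # behaviour bin -> (fitness, solution)

    def try_insert(x):
        fit, desc = evaluate(x)
        b = min(int(desc * bins), bins - 1)   # which niche this behaviour falls in
        if b not in archive or fit > archive[b][0]:
            archive[b] = (fit, x)              # new or better elite for this niche

    for _ in range(n_init):
        try_insert(rng.uniform(-1, 1, size=dim))
    for _ in range(iterations):
        keys = list(archive)
        parent = archive[keys[rng.integers(len(keys))]][1]  # any elite can be a parent
        try_insert(parent + sigma * rng.normal(size=dim))
    return archive

def toy_evaluate(x):
    # Hypothetical task: fitness prefers x[1:] near 0; the descriptor is where x[0] lies
    fitness = -float(np.sum(x[1:] ** 2))
    descriptor = (np.tanh(x[0]) + 1) / 2       # squashed into [0, 1]
    return fitness, descriptor

archive = map_elites(toy_evaluate, dim=3, bins=10, iterations=2000,
                     rng=np.random.default_rng(0))
```

The result is a map of good solutions across behaviors rather than one answer. Novelty search is a related option that drops the fitness term entirely and rewards only new behaviors.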
Interviewer
What I was referring to with learning: right now we have large models that have been trained on a tremendous amount of data. They're notable. And if you do too much post-training, you end up overwriting, right? The idea here is that you could have a network that's trained, but as it operates in the world during inference, it learns new things, and rather than overwriting, it grows new nodes and stores that learning in those nodes.
Dr. Sebastian Risi
Yeah.
Interviewer
Is that right?
Dr. Sebastian Risi
Yeah, I think that would be the ultimate goal: to take what we learn from these smaller-scale experiments and ultimately allow language models to do this kind of continual learning. There is some work at Sakana that goes a little in this direction, called evolutionary model merging. We have many networks that are already trained; there are thousands of language models you can download online, so why not take advantage of all of them? In this model merging approach, you take some layers from one network and some layers from another, merge them together, and let evolution figure out how to do it. Colleagues at Sakana have done this: you can take a model that is good at Japanese and a model that is good at math, and let evolution figure out how to combine them into a model that's good at Japanese and math. The next step could then be: can you have a model that's good at Japanese and teach it incrementally to be good at math, or something else? We're not there yet, but I think this is where the field is moving.
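The core loop of evolutionary merging can be sketched with toy "models" stored as dictionaries of layer weights. This is a simplified, hypothetical illustration: it uses plain layer-wise linear interpolation and a simple hill-climbing (1 + λ) evolutionary loop over one mixing coefficient per layer. It is not Sakana's actual recipe, which used more elaborate merging schemes and optimizers. The `score` function stands in for evaluating the merged model on benchmarks.

```python
import numpy as np

def merge(model_a, model_b, alphas):
    """Layer-wise interpolation: alpha = 1 keeps A's layer, alpha = 0 keeps B's."""
    return {name: a * model_a[name] + (1 - a) * model_b[name]
            for name, a in zip(model_a, alphas)}

def evolve_merge(model_a, model_b, score, generations=200, offspring=16, sigma=0.1, seed=0):
    """Simple (1 + lambda) evolution over the per-layer mixing coefficients."""
    rng = np.random.default_rng(seed)
    best = np.full(len(model_a), 0.5)                  # start from a plain 50/50 average
    best_score = score(merge(model_a, model_b, best))
    for _ in range(generations):
        for _ in range(offspring):
            cand = np.clip(best + sigma * rng.normal(size=best.shape), 0.0, 1.0)
            s = score(merge(model_a, model_b, cand))   # e.g. accuracy on a benchmark
            if s > best_score:
                best, best_score = cand, s
    return best, best_score

# Toy stand-ins: each "model" is a dict of layer weights
model_a = {"layer1": np.ones((2, 2)), "layer2": np.ones((2, 2))}
model_b = {"layer1": np.zeros((2, 2)), "layer2": np.zeros((2, 2))}
target = {"layer1": np.ones((2, 2)), "layer2": np.zeros((2, 2))}  # layer1 from A, layer2 from B

def score(model):
    # Hypothetical evaluation: closeness to a model known to do well on both tasks
    return -sum(float(np.sum((model[k] - target[k]) ** 2)) for k in model)

alphas, final_score = evolve_merge(model_a, model_b, score)
```

Evolution only ever sees scores, so the same loop works whether the evaluation is differentiable or, as with most benchmarks, not.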
Interviewer
Yeah, model merging is fascinating. So if you had the source code of GPT-5 and Grok, whichever version they're on now, you could combine them. I mean, theoretically.
Dr. Sebastian Risi
Theoretically, yeah. Those would probably be a little big, but if you had the resources, you could do that. The trick is that you have to be able to tell whether the combination is better; you have to be able to evaluate it. You can evaluate it on many of the benchmarks you have, and then you could merge these really big models. There could be some other challenges, though. In the approaches people have tried so far, on slightly smaller models, we know that one model is not good at math and another is not good at Japanese, whereas for the really big models it's harder to even know what they're not good at. So the evaluation is a little trickier there. But in principle you could do it.
Interviewer
Yeah, you could distill smaller models
Dr. Sebastian Risi
Yeah.
Interviewer
From the parents and then merge, right?
Dr. Sebastian Risi
Yeah, yeah, right.
Interviewer
How large are the models that you merged?
Dr. Sebastian Risi
They're a few... I don't know, maybe a hundred million parameters? Actually, I don't remember exactly. But they're not big compared to the large models.
Interviewer
And where do you see all of this going? You're also doing some really interesting work in evolving, what is it called, virtual life, right?
Dr. Sebastian Risi
Yeah, yeah. What we call artificial life.
Interviewer
Artificial life, yeah.
Dr. Sebastian Risi
Right.
Interviewer
So is that related to this or is that completely separate?
Dr. Sebastian Risi
No, that is very related. In artificial life, the idea is that we know one instance of life, but artificial life is about life as it could be. People simulate things that have lifelike properties, and one lifelike property is growth. Self-organization, growth, and self-replication are essential to life. We explore those with these growing networks, but we also explore them with what are called neural cellular automata. These are also basically neural networks, with a copy in every cell. Imagine replacing the traditional rules of a cellular automaton like the Game of Life, which has fixed rules: if you have three neighbors, a new cell is created; if you have four, the cell dies. You can replace those rules with a neural network, so that each cell asks the neural network: what should I do? What state should I become next? We were able to scale that to 3D; we have a paper where we grow Minecraft structures with it. The fun thing is that you can grow a salamander in Minecraft, cut it in half, and it grows into two salamanders. The nice thing is that you can train these with supervised learning: if you have a target, say you want to grow a house or a tree, you can teach it to grow that. But sometimes you don't have a target. We also used it to train soft robots, squishy robots, where we don't know what a good morphology for locomotion is. There you can use evolution: you tell it to grow a structure, put it in an environment, and see how well it works. If it doesn't work, we throw it out. Through this process we can grow structures that are able to locomote, and then we're also able to damage them.
Again, you can cut off parts of the structure, and if it's trained to recover, it can regrow based only on local information. It doesn't need any other information; it just needs to sense its local neighborhood. Just as a salamander can regrow its tail, these methods can do that. There's an artificial life community, and a few people at Sakana are also working on these ideas. It's an interesting direction that's a bit outside mainstream machine learning, but I think there's a lot of promise in taking some of these properties from biological systems and putting them into our systems. One is resilience. Biological systems are incredibly resilient, whereas with deep learning you often find weird examples where it completely fails. So I think there's a lot of promise in these systems that self-organize based on local communication. They have a built-in resilience that I think we could exploit to make deep learning systems more robust and more adaptive.
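The contrast between fixed cellular-automaton rules and a neural cellular automaton can be sketched side by side. `life_step` implements Conway's Game of Life. `nca_step` is a minimal, hypothetical neural CA in which every cell runs the same small network on its own state plus the average of its neighbors. Published NCA models use richer perception, "alive" masking, and training, none of which is shown here, and the weights below are untrained.

```python
import numpy as np

def neighbour_sum(grid):
    # Sum over the 8 surrounding cells (edges wrap around like a torus)
    return sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))

def life_step(grid):
    # Conway's Game of Life: fixed, hand-written rules
    n = neighbour_sum(grid)
    born = (grid == 0) & (n == 3)                    # exactly three neighbours: new cell
    survives = (grid == 1) & ((n == 2) | (n == 3))   # otherwise the cell dies
    return (born | survives).astype(int)

def nca_step(state, w1, w2, rng, fire_rate=0.5):
    """Neural CA: every cell runs the same small network on local information only.

    state has shape (H, W, C). Each cell sees its own channels plus the mean of its
    8 neighbours, maps them through a tiny two-layer network (w1, w2), and adds the
    result to its state. A random mask lets only some cells update on each step.
    """
    perception = np.concatenate([state, neighbour_sum(state) / 8.0], axis=-1)
    delta = np.maximum(0.0, perception @ w1) @ w2    # ReLU hidden layer, then output
    mask = rng.random(state.shape[:2])[..., None] < fire_rate
    return state + delta * mask

rng = np.random.default_rng(0)
w1 = rng.normal(size=(8, 16))        # 2 * C inputs -> 16 hidden units (C = 4 channels)
w2 = 0.1 * rng.normal(size=(16, 4))  # 16 hidden units -> C channel updates
state = rng.normal(size=(8, 8, 4))
state = nca_step(state, w1, w2, rng)
```

Because each update depends only on a cell's 3x3 neighborhood, damage in one region is repaired using local information alone. In a trained NCA, that locality is what lets the regrowth behavior Risi describes emerge.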
Interviewer
Yeah, and adaptive, that's another area. A couple of things: evolutionary systems that use evolutionary strategies to find novel solutions to a problem, and systems that can combine models based on the quality of their outputs and improve over generations. It seems like that would be very applicable to scientific research, because with gradient descent, as we were discussing earlier, you're heading toward a local minimum and hoping it's the global one. But you were talking about being able to cover a much larger landscape. Can you talk about the implications for scientific research?
Dr. Sebastian Risi
Yeah, that's also something we're exploring at Sakana, with the idea of an AI scientist, for example, but also with something we call ShinkaEvolve, which is similar to AlphaEvolve. It's a fruitful combination of evolution and large language models. Large language models are good at generating code and generating ideas, for example, but to explore that space it can be really useful to use evolution. Basically, you can use the language model as a mutation operator. One example is circle packing: you have a space and a number of circles, and you want to fit the maximum number of circles into that space. You ask a language model to give you a new solution, or multiple solutions, and then you evaluate those solutions based on fitness, that is, the score each one gets for packing the circles. Then you start again from the best one, or from a number of the best individuals, and ask the language model again for variations of those solutions. You do this over and over until you find a good solution that packs the most circles into the space. So you use evolution to navigate the space, with the language model acting as the mutation operator. You can do this for scientific ideas as well: start with one idea, let the model generate variations of it, and the only thing you need is some way to score them with a fitness function.
That's easier with circle packing and harder with a scientific idea, where it's more complicated to say whether it's a good idea or a bad one. This is a direction a lot of Sakana is pursuing, and other companies are too: combining evolution, because it's creative in what it can discover, with a language model as the mutation operator, which keeps it a bit more grounded. People in evolutionary computation have long evolved programs with genetic programming, but those systems were always hand-tailored to the kind of problem at hand. Now you can use a language model: let it output code, ask it to modify the code, and navigate that space, applying all the lessons we have learned from neuroevolution, like more open-ended setups and quality diversity, to explore the space and hopefully not get stuck in too many local optima. I think it will change how science is done: you'll have an AI scientist or co-scientist that you can exchange ideas with, that navigates some space and gives you hypotheses to test. I think this is the direction things are moving in.
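The loop described above can be sketched as follows. This is not the ShinkaEvolve or AlphaEvolve API: `propose_variation` is a hypothetical stand-in for "ask the language model for a variation" (here it just adds or nudges a circle at random so the example runs offline), and the fitness, the fixed radius, and the population settings are our own assumptions, following the "most circles in the space" framing.

```python
import random

RADIUS = 0.1  # fixed circle radius in a unit square (toy assumption)

def fitness(centres):
    """Count circles inside the unit square that do not overlap any circle
    accepted before them; overlapping or out-of-bounds ones are ignored."""
    accepted = []
    for x, y in centres:
        inside = RADIUS <= x <= 1 - RADIUS and RADIUS <= y <= 1 - RADIUS
        clear = all((x - ax) ** 2 + (y - ay) ** 2 >= (2 * RADIUS) ** 2
                    for ax, ay in accepted)
        if inside and clear:
            accepted.append((x, y))
    return len(accepted)

def propose_variation(parent, rng):
    """Hypothetical stand-in for the language-model mutation operator.
    A real system would send the parent (e.g. as code) to an LLM and parse
    its reply; here we add a random circle or nudge an existing one."""
    child = list(parent)
    if not child or rng.random() < 0.5:
        child.append((rng.random(), rng.random()))
    else:
        i = rng.randrange(len(child))
        x, y = child[i]
        child[i] = (x + rng.gauss(0, 0.05), y + rng.gauss(0, 0.05))
    return child

def evolve(generations=200, population_size=20, elite=5, seed=0):
    rng = random.Random(seed)
    population = [[] for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the best-scoring candidates as parents.
        parents = sorted(population, key=fitness, reverse=True)[:elite]
        # Variation: ask the (stand-in) mutation operator for new candidates.
        children = [propose_variation(rng.choice(parents), rng)
                    for _ in range(population_size - elite)]
        population = parents + children
    return max(population, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print(f"{fitness(best)} circles packed")
```

Replacing the random mutation with an LLM call would change only `propose_variation`; scoring and selection stay the same, which is the division of labor described in the answer.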
Interviewer
And where are you at Sakana in that research? Have you...
Dr. Sebastian Risi
Right.
Interviewer
Are you still at the architecture level, or have you run experiments to see whether it will output something useful?
Dr. Sebastian Risi
Yeah, I think it's going that way. There's the AI Scientist from Sakana, and the new version was actually able to generate a paper that got accepted at a workshop. That shows it can generate some genuinely interesting things. Now the question is how far you can push this approach to generate something truly groundbreaking. I think that's the holy grail of the field, and it's still an open question. It can definitely generate some new things, but how far from the training distribution can it go, and how creative can it be? Is it just about prompting it and getting those ideas out, or does it need to run its own experiments, with us fine-tuning it on those results and iteratively making it better? There's also the idea of self-improvement: the model itself gets better and better, and it gets better at getting better. Those are ideas we're also exploring.
Interviewer
On the paper that was submitted, I think to ICML.
Dr. Sebastian Risi
Yeah.
Interviewer
Or was it ICML or ICLR? I can't remember which one. It wasn't a very interesting idea. I mean, it got accepted.
Dr. Sebastian Risi
Right, right.
Interviewer
As I recall, it was kind of proving a negative. The conclusion was that whatever the hypothesis was would not work.
Dr. Sebastian Risi
Yeah.
Interviewer
Right. And how did that start? Did it come up with that? And what was the question? I can't remember.
Dr. Sebastian Risi
Yeah. I wasn't part of that paper. I think the main thing is that this is probably the worst it will ever be. With an older model you got this paper, but if you replicated it now using Gemini or some other model, it would probably push further. The nice thing about the framework is that you can keep the same framework and switch out the language model you're using, so the better the language models become, the better the papers they should be able to write. I think that's the main point: not necessarily what ideas it generated then, but showing that you can automate the whole pipeline and that it will get better and better with better models. For me, though, the interesting part is how we can use this as a kind of co-scientist, because I think in the future, or at least for some time, humans and AIs will be working closely together. How can you make sure it takes both sides' ideas into account? How can you make sure AIs and humans speak the same language? There was an interesting keynote by Melanie Mitchell where she showed that even when a model's solution looks great on the benchmarks, looks like the right solution, and gets a good score, the model may have solved the problem in a very different way than the humans intended. It exploited some feature of the domain that nobody had deliberately built in. So if we want to collaborate, we have to find a common language, a common ground. There is already some common ground: natural language. It's trained on text and you can communicate with it, but you might not be sure about its intentions.
So I think to collaborate well, we need to do a lot of work that goes beyond it just being able to write its own papers. How do we best combine what humans are good at with what machines are good at? I've always been interested in co-intelligence, or hybrid intelligence: how can we combine the best of both worlds? Before, it was a bit easier; it was clearer what humans and computers were each good at. Now it's becoming less clear, so I think that's something we need to figure out.
Interviewer
Yeah. You said something interesting before. Not that everything isn't interesting, but you said that its creativity is constrained by the training data: is it really going to come up with ideas that aren't in some way embedded in the training data? Two questions on that. One, maybe that's the case, but the training data can be very rich in scientific ideas that have never been explored, right?
Dr. Sebastian Risi
Yeah.
Interviewer
I mean, even evolutionary AI didn't get the attention it probably deserved until very recently, because people get attracted to other things. AI isn't like that; it'll look at everything.
Dr. Sebastian Risi
Yeah.
Interviewer
So aren't there insights to be discovered within the training data, within the body of scientific research as it exists?
Dr. Sebastian Risi
I think there are some results that show it can generate something new. It's just a question of how far it can be pushed beyond what was in the training data. So I wouldn't say it cannot produce anything new; for me, it's just less clear how far outside of what it has seen it can go. There are mathematicians, I think Terence Tao and others, who have used it, and it found an existing proof that humans had forgotten or didn't know about. So it can certainly be helpful. People have also applied it to optimize robot morphologies, and it came up with new morphologies that didn't exist before. But the question is how far we can push it outside of that.
Interviewer
And what you're saying is that human knowledge contains only a finite portion of all possible knowledge.
Dr. Sebastian Risi
Right.
Interviewer
And how do you go beyond current human knowledge?
Dr. Sebastian Risi
Let's assume, I don't know, 50 years ago, you had a model that was trained only on data up to that point. Would it have invented the iPhone at some point?
Interviewer
Yeah, I've heard this, like...
Dr. Sebastian Risi
Yeah. I don't think the current ones would. They would probably invent other things, and then you could imagine that once they invent something, they could be trained on those inventions, and maybe ultimately they would get there. But maybe they
Interviewer
would come up with something.
Dr. Sebastian Risi
Something else.
Interviewer
Yeah, yeah. The iPhone certainly was revolutionary, but there was no groundbreaking tech in it. It was an engineering exercise where you brought together an MP3 player and telephony and...
Dr. Sebastian Risi
Yeah.
Interviewer
So given the right target, it might evolve an iPhone, but it might also evolve something better than an iPhone.
Dr. Sebastian Risi
I think it also probably has to be combined with the ability to run its own experiments. If, 50 years ago, you had only thought in your head about what you could invent, I don't know how far people would have gotten. You have to be able to run an experiment, or manufacture something and see how it does. Some people are working on combining these now, like Lila Sciences, which combines these ideas with robots and manufacturing, I think ultimately to test what the model comes up with and use that to automate science, or like
Interviewer
What kind of factory? Because, you know, I'm thinking of Insilico, I think it's Insilico Medicine, the drug discovery company.
Dr. Sebastian Risi
Yeah, yeah.
Interviewer
That is using AI for compound discovery, right? But they have it attached to an automated wet lab that synthesizes the molecules. In this case, what kind of manufacturing is it?
Dr. Sebastian Risi
Yeah, actually I don't know exactly what it looks like, but I could imagine robots that try mixing things together and seeing what happens. But I'm not a chemist. I think they're doing many different things. There are a few labs, a few companies, moving in that direction, automating material science as well. And I guess that's what probably has to happen, because if it can only output language, I don't know how much it will be able to do. I think it has to be able to affect the world or run experiments somehow, because that's how we humans, at least, learn. And I think things are moving in that direction.
Interviewer
Yeah. And so we talked about artificial life, we talked about growing or plastic.
Dr. Sebastian Risi
Yeah.
Interviewer
Neural networks, and about evolutionary strategies. What are the other areas you're focused on?
Dr. Sebastian Risi
Yeah, a lot on open-endedness as well.
Interviewer
That's right.
Dr. Sebastian Risi
How can you create a system that keeps innovating, that keeps producing interesting new things? That's very much tied to the current LLMs, and people are using language models more and more for it. But there are a lot of techniques that the neuroevolution community has already developed that are now being augmented with language models. One, for example, is evolving both the environment and the agent at the same time.
Interviewer
Yeah, yeah, you mentioned that. What does that mean, evolving both?
Dr. Sebastian Risi
Right. One example is an algorithm called POET, where the agent might be a bipedal robot and the environment is the terrain. On flat terrain it's easy for the robot to walk, but then you can introduce gaps or obstacles, and the agent has to learn to deal with those. In this approach you start with a very simple environment and make it more and more difficult over time. At the end it's solving crazy environments where it has to go down or jump over things; it's really impressive what it can do in the end. The interesting part is that if you had started with the really complicated final environment, it wouldn't have been able to solve it. You need to go through these stages for it to discover the stepping stones in behavior that let it do the final thing. People have also extended that, and we talk about this in the book, with an approach called OMNI, where you extend this to environment generation in Unity, for example. Instead of a flat two-dimensional landscape, you can have a language model produce code that creates an environment. Initially it might create a very simple environment, and then more and more complex ones. I think that's really interesting and could also allow neuroevolution to scale up to really complicated tasks. You can even imagine combining this with the neural networks we now have that can simulate whole 3D worlds. If you combine neuroevolution with a controllable world that you can just prompt with what the world should look like,
whether it should be a simple world or a very difficult one, with more predators, more of this and that, then I think we could really see an increase in what the agents we evolve this way can do.
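The stepping-stone argument can be illustrated with a deliberately tiny toy. This is not POET itself, which keeps a population of environment-agent pairs, checks that new environments are neither too easy nor too hard, and transfers agents between them; it is a single-agent curriculum in the same spirit. The one-number "skill" and "difficulty", the reward shape, and the hill climber are all our assumptions.

```python
import random

def score(skill, difficulty):
    """Toy reward: 1.0 once skill covers the difficulty, partial credit
    within one unit of it, and no signal at all beyond that."""
    gap = difficulty - skill
    if gap <= 0:
        return 1.0
    return max(0.0, 1.0 - gap)

def optimise(skill, difficulty, rng, steps=200, sigma=0.1):
    """(1+1) hill climber: keep a mutation only if it scores strictly higher."""
    best = score(skill, difficulty)
    for _ in range(steps):
        candidate = skill + rng.gauss(0, sigma)
        s = score(candidate, difficulty)
        if s > best:
            skill, best = candidate, s
        if best >= 1.0:
            break
    return skill, best

def curriculum(target=10.0, step=0.5, seed=0):
    """Start with an easy environment; each time the agent solves it,
    generate a slightly harder one (the 'environment mutation')."""
    rng = random.Random(seed)
    skill, difficulty = 0.0, step
    while True:
        skill, solved = optimise(skill, difficulty, rng)
        if solved < 1.0 or difficulty >= target:
            return skill, difficulty
        difficulty = min(target, difficulty + step)

if __name__ == "__main__":
    # Straight to the hardest environment: no reward signal, no progress.
    print(optimise(0.0, 10.0, random.Random(0)))  # (0.0, 0.0)
    # Through the stepping stones: the final environment gets solved.
    print(curriculum())
```

Because the hard environment gives zero reward everywhere near the starting skill, the direct attempt can never accept a mutation, whereas the curriculum always keeps the next challenge within reach.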
Interviewer
Yeah. And Sakana's building world models, right?
Dr. Sebastian Risi
Yeah.
Interviewer
And are they explicit? You know, I had Fei-Fei Li on recently.
Dr. Sebastian Risi
Right.
Interviewer
And I played with her Marble world model; the output is an explicit representation of a world. I'm interested in how you use an internal representation of a world for the model to learn as it interacts with the external environment, so that it understands physics and whatever else it learns from the external world.
Dr. Sebastian Risi
Right.
Interviewer
Which one is Sakana working on?
Dr. Sebastian Risi
David Ha, our CEO, wrote one of the first papers on world models, I think in 2018, where he trained what would now be considered a simple world model, and then he could train the agent inside this dream and have it get better and better. He trained it on a 3D Doom task. I think one of the most interesting things was that even back then, when the simulation wasn't perfect and you could see it was a hallucinated world, you could already train the agent inside the world model and then have it be better in the real world. So that's something David has explored in the past. Something we have looked into a little is when you need a world model and when you can just ask a language model for the answer. If there's a question about physics, when do you just need to ask, and when do you actually need to run a simulation? Those are some things we have been looking at. But basically, as you said, it's answering the question: when is a world model useful? I might not need to ask a world model at every little step I take. But if I'm thinking about jumping over the table, maybe I should ask the world model whether I'm likely to get to the other side. Can you learn when to run which process?
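The "train in the dream, then act in the real world" loop can be shown with a toy that has nothing to do with the original world-models setup, which learned far richer models from observations. Here the "real world" is an assumed one-dimensional linear system, the world model is a least-squares fit from random real transitions, and the "agent" is a one-parameter controller whose gain is chosen purely by rollouts inside the learned model. All constants and names are hypothetical.

```python
import numpy as np

A_TRUE, B_TRUE = 1.1, 0.5  # hidden "real world" dynamics (toy assumption)

def real_step(x, u):
    """The real system: unstable on its own (A_TRUE > 1) unless controlled."""
    return A_TRUE * x + B_TRUE * u

def fit_world_model(rng, samples=200):
    """Learn x' ~ a*x + b*u by least squares from random real transitions."""
    x = rng.uniform(-1, 1, samples)
    u = rng.uniform(-1, 1, samples)
    x_next = real_step(x, u) + rng.normal(0, 0.01, samples)  # noisy observations
    (a, b), *_ = np.linalg.lstsq(np.stack([x, u], axis=1), x_next, rcond=None)
    return a, b

def rollout_cost(step_fn, gain, x0=1.0, horizon=30):
    """Total squared distance from 0 under the policy u = -gain * x."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        x = step_fn(x, -gain * x)
        cost += x * x
    return cost

def train_in_dream(a, b, gains=np.linspace(0.0, 4.0, 81)):
    """Choose the gain that looks best inside the learned model only."""
    def dream_step(x, u):
        return a * x + b * u
    return min(gains, key=lambda k: rollout_cost(dream_step, k))

if __name__ == "__main__":
    a, b = fit_world_model(np.random.default_rng(0))
    k = train_in_dream(a, b)
    print(f"gain {k:.2f}: real cost {rollout_cost(real_step, k):.4f}, "
          f"uncontrolled cost {rollout_cost(real_step, 0.0):.1f}")
```

The real system is queried only to collect the initial transitions; every candidate controller is evaluated in the model, which is what makes training in the dream cheap when the model is accurate enough.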
Interviewer
That's interesting. So you're not using Marble-style generated environments?
Dr. Sebastian Risi
Not that I know, no.
Interviewer
And I was talking to Risto about how, you know, I spoke to Julian in 2018, maybe.
Dr. Sebastian Risi
Yeah.
Interviewer
But evolutionary AI has not been front and center. The transformers took over, and everything's been about optimizing transformer architectures or looking beyond the transformer architecture. Is there a reason why evolutionary AI is swinging back into focus?
Dr. Sebastian Risi
Yeah, I think the thing is, I mean I think.
Interviewer
Or am I wrong? Maybe it always has been and I'm just not attending the right conferences.
Dr. Sebastian Risi
No, I think it definitely wasn't the main focus, but I think it's just a very good pairing: LLMs with evolutionary algorithms are a really good match, and that's why I think more people are now interested in combining generative AI with evolution. Before, in evolution, you always had to think about what your representation looked like, and a particular representation might restrict the type of solutions you get. Now that you can use a language model, you don't have to worry as much; it can come up with the representation or be the representation. Also, in that setting it's difficult to use gradient descent. When you use the language model to give you ideas or generate artifacts and then search that space, evolution is beneficial because it's hard to backpropagate through that space. I think that's one reason why it's becoming more popular now; approaches like AlphaEvolve and model merging just fit really well together. That's also something Sakana is very much looking into. We want to go beyond the transformer. One of our co-founders, Llion Jones, is one of the authors of the Transformer paper, and we think this is not where we should stop; we should go beyond what the transformer can do. One of the approaches we just presented here is the Continuous Thought Machine. It's not just input-output, where you get something in and put something out; the network itself can decide to think about a problem for a longer time, so the internal thought process matters more than the external input.
We made an architecture that allows the model to do that, again incorporating some more biologically inspired ideas, like an activation memory of what comes into each neuron, which makes each neuron more complex. So it's not just the simple neuron models that are used most of the time; each neuron is its own small neural network. There's also the idea that biological brains often use synchronization and oscillations for processing. I'm not a neuroscientist, but they seem to play a really important role in biological brains: how neurons oscillate or synchronize together. That's one of the key components of the Continuous Thought Machine: you look at how much the neurons synchronize and use that as a representation.
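The idea of "how much the neurons synchronize, used as a representation" can be sketched with a toy recurrent network. This is not the Continuous Thought Machine's implementation, whose neuron-level models, synchronization computation, and readout differ in many details; we simply record each neuron's activation over a number of internal ticks and use the pairwise correlation of those traces as a feature vector. The network, the tick count, and the function names are assumptions.

```python
import numpy as np

def activation_traces(x, weights, ticks=20):
    """Run a toy recurrent network for `ticks` internal steps on one fixed
    input and record each neuron's activation: shape (neurons, ticks)."""
    n = weights.shape[0]
    state = np.zeros(n)
    history = np.empty((n, ticks))
    for t in range(ticks):
        state = np.tanh(weights @ state + x)
        history[:, t] = state
    return history

def synchronisation_features(history, eps=1e-8):
    """Correlation between every pair of neurons' activation traces,
    returned as a flat vector (upper triangle, diagonal excluded)."""
    centred = history - history.mean(axis=1, keepdims=True)
    unit = centred / (np.linalg.norm(centred, axis=1, keepdims=True) + eps)
    corr = unit @ unit.T                   # entries in [-1, 1]
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    return corr[rows, cols]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 6                                  # hypothetical number of neurons
    w = rng.normal(0, 1.0 / np.sqrt(n), (n, n))
    traces = activation_traces(rng.normal(size=n), w)
    features = synchronisation_features(traces)
    print(features.shape)                  # (15,): one value per neuron pair
```

A downstream readout (omitted) would map this vector to an output; the `eps` term keeps a neuron with a constant trace from producing a division by zero.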
Interviewer
And when they synchronize, are you talking about the Hebbian "fire together, wire together"?
Dr. Sebastian Risi
In this case it's basically about how much they fire together, how much they correlate, so it's very related to the Hebbian idea. And these systems you can train end to end with gradient descent. So it's not that we only use evolution, or that I would say you should only use evolution. I think it really depends on the type of problem you have: where is evolution useful, and where is gradient descent useful? There's also other work we published at ALIFE where we used gradient descent to optimize neural cellular automata. So I think both techniques are very complementary and useful for different aspects, and we talk about that in the book as well.
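The point that the right tool depends on the problem can be illustrated with a toy objective whose gradient is zero almost everywhere: a floored squared distance to an arbitrary target, chosen by us. A finite-difference gradient gives no signal, but a basic evolution strategy with mirrored (antithetic) sampling, which estimates a search direction from fitness differences between random perturbations, still makes progress. The objective, target, and hyperparameters are assumptions for illustration, not taken from the book or from Sakana's work.

```python
import numpy as np

TARGET = np.arange(1.0, 6.0)  # [1, 2, 3, 4, 5]: arbitrary optimum for the toy

def loss(theta):
    """Piecewise-constant objective: its gradient is zero almost everywhere."""
    return float(np.floor(np.sum((theta - TARGET) ** 2)))

def finite_difference_grad(f, theta, eps=1e-4):
    """Central-difference gradient, as a stand-in for what gradient descent sees."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

def es_step(f, theta, rng, sigma=0.5, population=100, lr=0.05):
    """One update of a basic evolution strategy with mirrored sampling:
    evaluate f at theta +/- sigma*eps for many random eps and move against
    the fitness-weighted average of the perturbations."""
    eps = rng.standard_normal((population, theta.size))
    f_plus = np.array([f(theta + sigma * e) for e in eps])
    f_minus = np.array([f(theta - sigma * e) for e in eps])
    grad_estimate = ((f_plus - f_minus) / 2) @ eps / (population * sigma)
    return theta - lr * grad_estimate

if __name__ == "__main__":
    theta = np.full(5, 0.5)
    print(finite_difference_grad(loss, theta))  # all zeros: no gradient signal
    rng = np.random.default_rng(0)
    for _ in range(200):
        theta = es_step(loss, theta, rng)
    print(loss(theta), np.round(theta, 2))
```

On a smooth, differentiable loss, plain gradient descent would usually be the cheaper option, since each ES step here costs 200 function evaluations; that trade-off is the complementarity described above.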
Date: April 2, 2026
Host: Craig S. Smith
Guest: Dr. Sebastian Risi (Co-founder, Sakana AI; Professor, ITU Copenhagen)
This episode explores a paradigm shift in artificial intelligence: moving from "training" static neural networks via gradient descent, to "growing" adaptable, evolvable networks inspired by nature's principles. Dr. Sebastian Risi shares insights from his research, discusses the power of neuroevolution and plasticity, and unpacks how evolutionary strategies and developmental growth could lead to more resilient, continually learning, and creative AI systems. The conversation spans neural plasticity, evolving architectures, model merging, artificial life, open-endedness, and the future role of AI in scientific discovery.
“You don’t need to backpropagate gradients. You have a population that’s distributed on this landscape... you kind of sample, like an evolution strategy, and can locally sample in many places at the same time. That gives you a direction to go.” — Sebastian Risi (04:13)
“When you use a static fixed network that is not changing...if you cut off a leg, it will probably fail...But these Hebbian networks, they change the weights all the time... oftentimes it will still be able to function, even though it has never seen this kind of variation during training.” — Sebastian Risi (00:00 and 08:00)
“Nature is very good at it—learned how to make a butterfly, then learned how to make a butterfly with different eye spots. In these representations… you want it to grow a certain network and then learn more and add structure, but not forget how to grow the first part.” — Sebastian Risi (17:32)
“If we figured out how to grow it, we should be able to really scale it up… there might be really interesting dynamics hidden if you scale it up, not letting structure be fixed, but letting it be grown and determined by this process.” — Sebastian Risi (19:40)
“Natural biological systems are incredibly resilient, and deep learning often…completely fails on weird examples. Using these systems that can self-organize…could make deep learning systems more robust and adaptive.” — Sebastian Risi (32:18)
“You can use a language model as a mutation operator… you start with one example, then ask it for variations… then you evaluate and repeat. It’s a fruitful combination of evolution and large language models.” — Sebastian Risi (35:00)
“How do we best combine what humans are good at and what machines are good at? … Now it’s becoming less clear. So I think that’s something we need to kind of figure out.” — Sebastian Risi (41:40)
Risi communicates with the excitement and humility of a scientist pushing into the unknown, constantly drawing analogies to biological systems, often qualifying statements with ongoing challenges, promising directions, and the collaborative spirit of combining evolutionary and deep learning approaches.
Dr. Sebastian Risi makes a compelling case that true artificial intelligence may need to be grown, not merely trained. By harnessing biological principles—evolution, plasticity, growth, and open-ended innovation—AI systems can become more resilient, creative, and capable of continual learning and discovery. As fields blend, and with platforms like Sakana pushing the boundaries, the next AI leap may come not from bigger transformers, but from networks that evolve, adapt, and grow within ever-changing worlds.