Ian Fisher
The world is changing so quickly. This is probably a little bit obvious, but you should just try things and, like, every day do something with AI. Last summer I took a weekend and used GPT-5 to help me build an iPhone app. I hadn't done that in a decade.
Podcast Host
So fast.
Ian Fisher
Yeah, it's so fast and so easy. And that was, you know, an age ago. That was like eight months ago. Now it's even faster and easier. Don't limit yourself. Like it's anything that you imagine. You should just try to use AI and see how far you can get with it and you'll be making the world better.
Podcast Host
Welcome to another episode of the Light Cone. Ian Fisher is the co-founder and co-CEO of Poetic, which is building recursively self improving AI reasoning harnesses for LLMs. Previously he spent a decade as a researcher at Google DeepMind and founded a mobile dev tools company through YC years ago. Welcome, Ian.
Ian Fisher
Thank you. I'm so happy to be here.
Podcast Host
What is Poetic? How's it different than RL, you know, how's it different than context engineering?
Ian Fisher
At Poetic, what we're building is a recursively self improving system. And so recursive self improvement is this, you know, kind of the holy grail of AI where the AI is making itself smarter. The core insight that we had is that we could do recursive self improvement far faster and cheaper than all of the other ways that people had been proposing to do this. So obviously I can't go into details about what our particular approach is, but most of the approaches out there require you to train a new LLM from scratch. And training LLMs from scratch costs hundreds of millions of dollars and takes months of effort.
Podcast Host
And then Anthropic or OpenAI will come along and just eat your lunch in the next model release.
Ian Fisher
Right, right. And of course Anthropic and OpenAI and Google, they're exploring recursive self improvement, but typically at that level of, you know, having to train a new model for every step of self improvement that they do.
Podcast Host
I mean, that seems like actually the defining thing that a startup really, really wants. Like I know that I want to take advantage of whatever the next model is, but the second you're in fine tuning land, I'm spending, you know, millions to hundreds of millions of dollars and then guess what, I just lit it on fire because, you know, the next version of the frontier model comes out and I'll never catch up. Whereas working with your systems means that I will always have the thing that is better than the thing that's out of the box, and that's sort of like the holy grail.
Ian Fisher
Yeah. We think that this is incredibly valuable to anybody who's building on top of large language models. And we don't view the, you know, the frontier models as competitors. They're, you know, they're the ones that we're building stilts to stand on top of. If we didn't have that foundational layer, then, you know, Poetic couldn't exist.
Podcast Host
Yeah, I mean, being the smartest model, you know, it's a game of inches actually. And like, so those inches matter a lot.
Ian Fisher
Right, right.
Podcast Host
How do we actually get started? I mean, you've built something that basically any startup could use that it's sort of like stilts really.
Ian Fisher
We have built a system that can automatically generate systems for your particular problem that will always outperform the underlying language models, and without the massive expense. As you're saying about the bitter lesson, what would you have done without Poetic? You probably would have said, okay, we're going to first collect a large data set, like tens of thousands of examples for the particular problem that we're working on, and we're going to fine tune the best model we can get our hands on. Maybe that's one of the frontier models, or maybe it's an open weights model. It doesn't particularly matter. You're going to spend a lot of money on that fine tuning. The compute is so expensive. And then at the end of it you have something that works better than the thing that you fine tuned on top of. But by then a new model has come out and it's better than the thing that you fine tuned. You fine tuned three years ago on top of GPT-3.5 or whatever, and then GPT-4o comes out and it just blows you out of the water. And so are you going to do that again or are you going to go out of business? In some cases, the latter. With Poetic, what we end up giving you is, you know, people are calling these things harnesses now, or an agentic system or whatever you want to call it, that sits on top of one or more language models and it just performs better than them. And when the new model comes out, that same harness is perfectly compatible with it and you don't need to change anything to get, you know, an even bigger performance bump. Additionally, we can continue to optimize for this new model, whatever the new model is that you want to use, and make it even better, but you don't lose out on hundreds of millions of dollars. In fact, we do this so much more cheaply than fine tuning would cost as well.
Podcast Host
And you've done this actually a bunch of times, right? I remember when you first came out with your paper in December of last year. You shot to the top of ARC AGI v2, and then you've done this a bunch of times for other benchmarks too. What was that like?
Ian Fisher
ARC AGI v2 was kind of us coming out of stealth, letting people know that we could tackle these really hard problems. And in particular, we wanted to show that our system, what we call the Poetic Meta system, can generate reasoning systems that are highly effective. Gemini 3 Deep Think had just come out and they were really quite dramatically at the top of the leaderboard at 45%. And two days later we released our results where we were showing that we could get a lot higher than that.
Podcast Host
So they come out with SOTA and then you come in right above them every single time, which is wild to see, honestly. That's what it's like to have stilts, you know, like whatever model comes out, you can be taller than that one with Poetic, which is like, that's so awesome.
Ian Fisher
Yeah. So the interesting thing is that we were half the cost of Gemini 3 Deep Think because we were building on top of Gemini 3 Pro, which is a much cheaper model. But we still got in the end a 9 percentage point improvement on the official verification. So they were at 45% and like 70 something dollars per problem, and we were at 54% and $32 per problem.
Interviewer 2
So recently you guys just announced some incredible results for Humanity's Last Exam. Can you tell us more about those?
Ian Fisher
Humanity's Last Exam is a set of 2,500 really, really hard questions written by experts in many different domains. They're meant to be challenging even for PhDs in those fields. AI hasn't passed it yet, but we got to 55%, which is almost 2 percentage points higher than the previous state of the art, which came out just last week from Anthropic with Claude Opus 4.6. They got 53.1% and we got 55% on it.
Interviewer 2
And one thing that Humanity's Last Exam doesn't publish is the cost of getting those results. In your case, this run was done for under six figures. How much was it?
Ian Fisher
We didn't publish any cost for this, but I can say that the optimization cost us less than 100k. Yeah.
Interviewer 2
Which is impressive because the training runs for each of these big foundation models are in the hundreds of millions of dollars. And you guys, as a company, you're only seven people.
Ian Fisher
That's right, yeah. Seven research scientists and research engineers.
Interviewer 2
Yeah, that's impressive. And I think the thing that's very interesting about your approach is that you're taking a very scientific approach to the emergent behaviors that a lot of the best founders are exploiting with models. I think a lot of founders that get very good results for agents, they treat the underlying model as a common layer that you can switch in between. And there's certain tasks, for example, very hard to verify bugs get sent to GPT-5.2, versus architecture, which gets sent to Claude 4.6. But you're kind of doing this automatically instead of having a human conducting, which is very impressive. I think there's something more special going on underneath. Can you tell us a bit about how it works?
Podcast Host
Yeah, it sounds magical. What can you tell us?
Ian Fisher
Right, so you're getting at a really core thing. These harnesses, they are code, prompts, and data built on top of one or more language models. And so this is something that in principle you can build by hand or with Claude Code or whatever, but in practice it takes a lot of work to have all the insights to make these work well. So the core technology that we've developed at Poetic is recursive self improvement. So we have a recursively self improving system which we call the Poetic Meta system. The output of that system is systems that solve hard problems, where a hard problem is something that, if you gave it to GPT-5.2, just to use an example, it would struggle to give you reliable, robust results. So this is a very big advantage for us. We can generate these systems in a much more automated manner, which means that we can do it much more quickly and much more cheaply than if you hired a team yourself to try to make your own agent to solve your particular task. But not only that, since this is really an automated optimization process, if you already have done that work, say you're a startup that's going after a particular vertical, you think you understand your problem pretty well, you've put together your agent and maybe it's working pretty well, but you know you can get something better or you really need something better, then you can bring that to us and we can optimize that entire agent or pieces of that agent. We could optimize just the prompts, just the reasoning strategies. There's a lot of different things that we can do depending on your particular needs.
Interviewer 2
It sounds like this is a completely different paradigm than RL, because we went through the S curve of regular pre training, then RL when OpenAI released o1, and now this feels like a new one. It sounds special. It rhymes a lot with RNNs, which is a whole different paradigm than RL. Right?
Ian Fisher
It's going to depend on the particular task, the particular type of problem that we're trying to solve, and the underlying models that we're working with. But effectively, you could say each model or each set of models that we're working with will have their own S curve. The Poetic meta system itself is also going to have its own S curve. And so as the Poetic meta system gets better, and as the underlying models get better, you'll find that the S curve that you're dealing with keeps shifting higher and higher until ultimately either you saturate, like, reach AGI. Yeah, reach AGI, reach superintelligence.
Podcast Host
Yeah, given it's stilts, you might, like, hit the ceiling first then.
Ian Fisher
That's the goal, right? Yeah, you want to hit the ceiling first with Poetic.
Podcast Host
I think a lot of startups that we work with, and I in my spare time, you know, do a bunch of context engineering. And then the thing is we're sort of tuning it, tuning evals, like we're context stuffing ourselves. What does that even feel like, to have a, you know, recursively self improving version of prompt engineering and context engineering?
Ian Fisher
We don't spend a lot of time looking at the particular data that we're working with. Instead we're letting the Poetic meta system look at that data. And so the meta system, you know, if it thinks that it needs to put more things into context, do more context stuffing or whatever, it'll do that. If it needs to generate a bunch of examples to get better performance, it'll do that for you. Right. It was pretty interesting to look at the prompt output in particular, I'd say for ARC AGI, in that I think you can read those and say, well, that's pretty clearly not what a human would have written. And there's some unexpected stuff, and it made some really simple examples. And one of the examples is actually wrong. But we didn't change it. We were like, well, this is the thing that it output. We'll just leave it be. We don't want to go in and monkey around with things. And so historically in machine learning the rule was always that you have to know your data set really well. But now we're kind of outsourcing that to the AI itself, where it is the AI's job to understand the data set and figure out where are the failure modes and where are the kind of robust reasoning strategies that the agent could use to get better performance.
Podcast Host
How much of it is that the output is much better prompts, and how much of it is the harness itself, context stuffing or summarizing in the right way or re-ranking in the right way, so that you have some number of mega LLM calls, and then how do you get the most out of each of those calls?
Ian Fisher
Yeah, and so that definitely varies per problem. But what we've seen, in fact our last paper at DeepMind was not doing this recursive, self improving stuff, but we were showing that you could build these harnesses manually to solve really hard problems. And what we saw there is that we manually optimized the prompts really hard for these very hard problems, and that got us a little bit of the way. In this particular case, on the hardest task we were working on, we got to 5% performance with Gemini 1.5 Flash. This was a while ago. And then when we added on the reasoning strategies, we went from 5% to 95%. And so this is typically what we see. You know, everybody's out there, I wouldn't say everybody, but many people are out there doing some amount of automated prompt optimization. You know, GEPA is this very popular paper. Everybody's kind of reimplementing that. That will get you some performance improvements, but it's very far from everything that you can get if you actually think about these reasoning strategies, which are really going to be written in code rather than in just better prompts.
Interviewer 3
So if startups want to use Poetic
Podcast Host
to put their agent on stilts, what should they do?
Ian Fisher
Yeah, so right now we haven't released anything yet, but if you go to Poetic AI, there is a button you can click to sign up for early access. And if you're a startup or a company who has a really hard problem and you've tried everything that you can to make it reliable and robust and you just can't get all the way there, you need something more, then let us know. We're looking for problems like that. So just tell us what it is that you're working on and we'll reach out. You'll be the first to know when we're ready to work with you.
Podcast Host
I mean, if you're at the top of Humanity's Last Exam, then I mean that's pretty big. So you're already all the way out there at SOTA, and then I guess the stilts basically let any agentic company become SOTA.
Ian Fisher
That's the idea. Yeah. Yeah. And you know, we view the ARC AGI results and the Humanity's Last Exam results as showing kind of two different capabilities we have. We can really improve your reasoning and we can really improve deep knowledge extraction from these models.
Podcast Host
And then you're just totally vaccinated against the bitter lesson.
Ian Fisher
Exactly.
Podcast Host
YC's next batch is now taking applications. Got a startup in you? Apply at ycombinator.com/apply. It's never too early, and filling out the app will level up your idea. Okay, back to the video.
Interviewer 3
Slight sort of change of topic, but something I was curious about. So you arrived at Google over a decade ago when they acquired your first YC startup, Apportable. Apportable was porting mobile apps cross platform.
Interviewer 2
Right.
Interviewer 3
Like Android or whatever. It's quite different to recursive self improving AGI. How did you make that leap? What happened once you got to Google? What made you think that you maybe wanted to shift and do something different? I'd just love to hear that story.
Ian Fisher
The acquisition was this amazing opportunity to reflect on what I really wanted to be doing next. Right. Like Google itself is, you know, a place where you can do so many different things. So I spent some time thinking about where I wanted to go next in my journey. I realized that the problems that I was most excited about were really actually AI and robotics. And the best people in the world, many of them in those fields, were at Google at the time. And so I went and talked to them. They let me come join a new AI robotics team in Google Research, which was this amazing opportunity for me since that wasn't my background. My background was computer security and then this cross platform mobile, you know, systems building stuff. I was able to join this team. And I'll tell you the truth, I very quickly realized that hardware is hard and I didn't really want to be doing robotics. It was more aspirational at that moment. But I was really passionate about machine learning. So I just made a very hard switch into just doing machine learning research and did that for about a decade at Google and then DeepMind.
Interviewer 3
What's maybe some advice that you have today for engineers who want to get into sort of more of the AI side, probably the applied AI and build startups around AI. How should they think about that?
Ian Fisher
The world is changing so quickly. This is probably a little bit obvious, but you should just try things and every day do something with AI. Always try to push yourself to find the boundaries of what these models are capable of and build the things that you want to build. Even for me, last summer I took a weekend and used GPT-5 to help me build an iPhone app. I hadn't done that in a decade.
Podcast Host
So fast.
Ian Fisher
Yeah, it's so fast and so easy. And that was an age ago. That was eight months ago. Now it's even faster and easier. Don't limit yourself. Anything that you imagine you should just try to use AI and see how far you can get with it and you'll be making the world better.
Podcast Host
That's all we have time for today. But, Ian, thank you so much for giving us all stilts. We can't wait to use it at YC. I can't wait to use it for Garry's List. I mean, there's just so much to do.
Ian Fisher
Yeah. Thank you for having me. This was a lot of fun.
Date: February 27, 2026
Guest: Ian Fisher, Co-Founder and Co-CEO of Poetic
Host(s): Y Combinator Podcast Team
This episode delves into the rapidly evolving landscape of artificial intelligence, with a focus on recursively self-improving AI systems as embodied by Poetic, a YC company founded by Ian Fisher. The conversation explores how startups can leapfrog ahead in performance and cost-effectiveness by building agentic "harnesses" on top of frontier language models, ultimately accelerating the path to superintelligence without the need for massive resources.
Ian Fisher introduces Poetic’s core mission: developing recursively self-improving AI harnesses that autonomously optimize and outperform the underlying large language models (LLMs).
Unlike traditional approaches requiring expensive and time-consuming retraining of new LLMs, Poetic’s system builds agentic layers (“harnesses”) atop existing models for rapid, cost-effective improvement.
"Recursive self improvement is this... kind of the holy grail of AI, where the AI is making itself smarter. The core insight that we had is that we could do recursive self improvement far faster and cheaper than all of the other ways."
— Ian Fisher [01:06]
Fine-tuning custom models can cost millions, only to be quickly outdated by new model releases. Poetic’s approach sidesteps this by making harnesses compatible across new models instantly.
A single harness can exploit the latest frontier LLMs without intensive retraining or sunk costs.
“When the new model comes out, that same harness is perfectly compatible with it and you don’t need to change anything...”
— Ian Fisher [04:20]
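One way to picture why a harness can carry over to a new model is to treat the model as a swappable function. The Python sketch below is an illustration only, not Poetic's system, whose details are not disclosed in the episode. It assumes a model is any callable from prompt text to answer text, and it uses majority voting as a stand-in strategy; `make_harness` and `make_toy_model` are hypothetical names, and the toy models replace real LLM clients.

```python
import itertools
from collections import Counter
from typing import Callable, Sequence

# Assumption: a "model" is any function from prompt text to answer text.
Model = Callable[[str], str]


def make_harness(model: Model, samples: int = 5) -> Model:
    """Wrap a model with a majority-vote strategy (hypothetical example)."""

    def harness(question: str) -> str:
        prompt = f"Answer concisely.\nQuestion: {question}\nAnswer:"
        answers = [model(prompt).strip() for _ in range(samples)]
        # Keep the answer that appears most often across the samples.
        return Counter(answers).most_common(1)[0][0]

    return harness


def make_toy_model(outputs: Sequence[str]) -> Model:
    """Toy stand-in for an LLM: cycles through fixed outputs."""
    stream = itertools.cycle(outputs)
    return lambda prompt: next(stream)


# The same harness code wraps an "older" and a "newer" toy model unchanged.
old_harness = make_harness(make_toy_model(["41", "42", "41"]))
new_harness = make_harness(make_toy_model(["42", "42", "40"]))
print(old_harness("What is 6 * 7?"))
print(new_harness("What is 6 * 7?"))
```

The harness never inspects which model it wraps, so replacing the model is a one-line change; any further tuning for the new model would happen on top of that.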
Poetic gained industry attention by quickly surpassing results from top AI labs (like Google and Anthropic) on difficult benchmarks, including ARC AGI v2 and Humanity’s Last Exam, at a fraction of the cost.
“Gemini 3 Deep Think had just come out and they were really quite dramatically at the top... And two days later we released our results where we were showing that we could get a lot higher than that.”
— Ian Fisher [05:19]
On Humanity’s Last Exam, Poetic achieved a state-of-the-art 55%, almost 2 percentage points higher than the 53.1% Anthropic reported for Claude Opus 4.6, with optimization costing under $100k.
"AI hasn't passed it yet, but we got to 55%... previous state of the art... 53.1% and we got 55%."
— Ian Fisher [06:44]
"The optimization costs us less than 100k."
— Ian Fisher [07:31]
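The benchmark gains quoted in the episode are differences in percentage points, which is not the same as a relative improvement. The short Python calculation below uses only the scores stated in the conversation (55% vs. 53.1% on Humanity's Last Exam, 54% vs. 45% on ARC AGI v2); the function name `gains` is introduced here purely for illustration.

```python
def gains(new_score: float, old_score: float) -> tuple:
    """Return (percentage-point gain, relative gain in percent)."""
    point_gain = new_score - old_score
    relative_gain = point_gain / old_score * 100
    return point_gain, relative_gain


# Scores as quoted in the episode, in percent.
hle_points, hle_relative = gains(55.0, 53.1)
arc_points, arc_relative = gains(54.0, 45.0)

print(f"HLE: +{hle_points:.1f} points, {hle_relative:.1f}% relative")
print(f"ARC AGI v2: +{arc_points:.1f} points, {arc_relative:.1f}% relative")
```

On these figures the Humanity's Last Exam result is about 1.9 points higher (roughly 3.6% relative), and the ARC AGI v2 result is 9 points higher (20% relative).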
Poetic automates the creation and optimization of agentic harnesses—code, prompts, and reasoning strategies that sit atop LLMs—to reliably solve specific, hard problems.
The system can be used to optimize not just new agents but also agents that startups have already built, even targeting specific pieces such as the prompts or the reasoning strategies.
“We can generate these systems in a much more automated manner, which means we can do it much more quickly and cheaply than if you hired a team yourself...”
— Ian Fisher [09:16]
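To make "automated optimization" of an existing agent concrete, here is a deliberately simple Python sketch that scores candidate prompt templates on a small evaluation set and keeps the best one. This is a generic selection loop under stated assumptions, not Poetic's method, which is not described in the episode. `accuracy`, `pick_best_template`, and `toy_model` are hypothetical names, and the toy model stands in for a real LLM call.

```python
import re
from typing import Callable, Sequence, Tuple

Model = Callable[[str], str]
Example = Tuple[str, str]  # (question, expected answer)


def accuracy(model: Model, template: str, evals: Sequence[Example]) -> float:
    """Fraction of eval questions answered exactly right with this template."""
    hits = sum(model(template.format(question=q)).strip() == a for q, a in evals)
    return hits / len(evals)


def pick_best_template(
    model: Model, templates: Sequence[str], evals: Sequence[Example]
) -> Tuple[str, float]:
    """Score every candidate template and return the best one with its score."""
    scored = [(accuracy(model, t, evals), t) for t in templates]
    best_score, best_template = max(scored, key=lambda pair: pair[0])
    return best_template, best_score


def toy_model(prompt: str) -> str:
    """Toy stand-in for an LLM: only adds correctly when told to be careful."""
    numbers = [int(n) for n in re.findall(r"\d+", prompt)]
    if "carefully" in prompt:
        return str(sum(numbers))
    return str(numbers[0]) if numbers else ""


evals = [("2 + 3", "5"), ("10 + 7", "17")]
templates = ["{question} =", "Add carefully: {question} ="]
best, score = pick_best_template(toy_model, templates, evals)
print(best, score)
```

A real optimizer would generate and revise candidates rather than pick from a fixed list, but the evaluate-and-select core is the same, and it applies equally to a prompt, a reasoning step, or a whole agent.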
The Poetic Meta System offers a new optimization paradigm, emphasizing reasoning strategies over brute force fine-tuning or RL.
Harnesses are optimized to extract more from each LLM call, shifting performance curves (S-curves) higher with each new generation of models and recursive improvements.
“As the Poetic meta system gets better, and as the underlying models get better, you'll find that the S curve... keeps shifting higher and higher until ultimately either you saturate, like, reach AGI. Yeah, reach superintelligence.”
— Ian Fisher [10:47]
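A reasoning strategy "written in code rather than in just better prompts" can be as simple as a control loop around the model. The Python sketch below shows one common pattern, propose, verify, and retry with feedback, as a hedged example of the general idea. It is not taken from Poetic or from the DeepMind paper mentioned in the conversation; `solve_with_verification`, `toy_propose`, and `toy_verify` are hypothetical, and the toy functions stand in for an LLM call and a task-specific checker.

```python
from typing import Callable, Optional

Proposer = Callable[[str, str], str]   # (task, feedback) -> candidate answer
Verifier = Callable[[str, str], bool]  # (task, candidate) -> passes check?


def solve_with_verification(
    propose: Proposer, verify: Verifier, task: str, max_attempts: int = 3
) -> Optional[str]:
    """Ask for an answer, check it in code, and retry with feedback on failure."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = propose(task, feedback)
        if verify(task, candidate):
            return candidate
        feedback = f"Previous answer {candidate!r} failed verification."
    return None  # No candidate passed within the attempt budget.


def toy_propose(task: str, feedback: str) -> str:
    """Toy stand-in for an LLM: wrong at first, right once given feedback."""
    return "5" if feedback else "4"


def toy_verify(task: str, candidate: str) -> bool:
    """Checker for tasks of the form 'a+b': recompute the sum in code."""
    return sum(int(part) for part in task.split("+")) == int(candidate)


print(solve_with_verification(toy_propose, toy_verify, "2+3"))
print(solve_with_verification(toy_propose, toy_verify, "2+3", max_attempts=1))
```

The prompt never changes here; the gain comes entirely from the loop deciding when to accept an answer and when to ask again.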
Historically, ML practitioners were expected to know their data sets intimately; now, Poetic’s meta system analyzes the data itself and devises prompting and reasoning strategies, reducing the need for manual intervention.
“We don't spend a lot of time looking at the particular data that we're working with. Instead we're letting the Poetic meta system look at that data... the AI's job [is] to understand the data set and figure out where are the failure modes and where are... robust reasoning strategies.”
— Ian Fisher [11:52]
On Model Leapfrogging:
“That's what it's like to have stilts, you know, like whatever model comes out, you can be taller than that one with Poetic, which is like, that's so awesome.”
— Podcast Host [05:59]
On Cost Disruption:
“So they were at 45% and like 70 something dollars and we were at 54% and $32 per problem.”
— Ian Fisher [06:13]
On Automation vs. Manual Optimization:
“If you already have done that work... you can bring that to us and we can optimize that entire agent or pieces of that agent.”
— Ian Fisher [09:49]
On Reaching SuperIntelligence:
“As the Poetic meta system gets better... you'll find that the S curve that you're dealing with keeps shifting higher and higher until ultimately either you saturate, like, reach AGI. Yeah, reach superintelligence.”
— Ian Fisher [10:47]
On Practical Startup Advice:
“The world is changing so quickly. This is probably a little bit obvious, but you should just try things and every day do something with AI.”
— Ian Fisher [18:28]
Experiment Relentlessly:
“You should just try things and every day do something with AI... anything that you imagine, you should just try to use AI and see how far you can get with it and you'll be making the world better.” — Ian Fisher [18:28, 19:03]
For Startups Seeking an Edge:
Poetic is seeking hard problems from companies ready to put their agents “on stilts”—applications for early access are open.
Poetic’s recursively self-improving harnesses represent a new paradigm for extracting superhuman performance from LLMs at startup speed and efficiency. By shifting the locus of innovation from simply scaling models to building and optimizing adaptive agentic layers, Ian Fisher and team are enabling startups to outpace even AI giants, democratizing access to "superintelligence"—one benchmark leap at a time.