
What comes after vibe coding? Maybe vibe researching. OpenAI’s Chief Scientist, Jakub Pachocki, and Chief Research Officer, Mark Chen, join a16z general partners Anjney Midha and Sarah Wang to go deep on GPT-5—how they fused fast replies with long-horizon reasoning, how they measure progress once benchmarks saturate, and why reinforcement learning keeps surprising skeptics. They explore agentic systems (and their stability tradeoffs), coding models that change how software gets made, and the bigger bet: an automated researcher that can generate new ideas with real economic impact. Plus: how they prioritize compute, hire “cave-dweller” talent, protect fundamental research inside a product company, and keep pace without chasing every shiny demo.
Jakub Pachocki
The big thing that we are targeting is producing an automated researcher. So automating the discovery of new ideas. The next set of evals and milestones that we're looking at will involve actual movement on things that are economically relevant.
Mark Chen
I was talking to some high schoolers and they were saying, oh, you know, actually the default way to code is Vibe coding. I do think, you know, the future hopefully will be vibe researching.
Podcast Host (Narrator)
What does it take to build an automated researcher? And can AI discover new ideas on its own? OpenAI's Chief Scientist Jakub Pachocki and Chief Research Officer Mark Chen joined a16z general partners Anjney Midha and Sarah Wang to unpack GPT-5's reasoning push, why evals must shift to economically meaningful benchmarks, and the march towards an automated researcher. We get into long-horizon agency, why RL keeps working, the new Codex for real-world coding, research culture versus product, and why, for now, compute is destiny. Let's get into it.
Anjney Midha
Thanks for coming, Jakub and Mark. Jakub, you're the Chief Scientist at OpenAI. Mark, you are the Chief Research Officer at OpenAI, and you guys have both the privilege and the stress of running probably one of the most high-profile research teams in AI. And so we're just really stoked to talk with you about a whole bunch of things we've been curious about, including GPT-5, which was one of the most exciting updates to come out of OpenAI in recent times. And then, stepping back, how you build a research team that can do not just GPT-5, but Codex and ChatGPT and an API business, and can weave all of the many different bets you guys have across modalities, across product form factors, into one coherent research culture and story. And so to kick things off, why don't we start with GPT-5. Just tell us a little bit about the GPT-5 launch from your perspective. How did it go?
Mark Chen
So I think GPT-5 was really our attempt to bring reasoning into the mainstream. Prior to GPT-5, we had two different series of models. You had the GPT-2, 3, 4 series, which were these instant-response models, and then we had the o-series, which essentially thought for a very long time and then gave you the best answer it could give. Tactically, we don't want our users to be puzzled by which mode they should use, and that involves a lot of research in identifying what the right amount of thinking for any particular prompt looks like and taking that pain away from the user. We think the future is more and more about reasoning, more and more about agents, and we think GPT-5 is a step towards delivering reasoning and more agentic behavior by default.
Jakub Pachocki
There are also a number of improvements across the board in this model relative to o3 and our previous models. But our primary focus for this launch was indeed bringing reasoning out to more people.
Sarah Wang
Can you say more about how you guys think about evals? I noticed even in that launch video there were a number of evals where you're inching up from, you know, 98 to 99% and that's kind of how, you know, you've saturated the eval. What approach do you guys take to measuring progress and how do you think about it?
Jakub Pachocki
One thing is that the evals we've been using for the last few years are indeed pretty close to saturated. And so, for a lot of them, inching from 96 to 98% is not necessarily the most important thing in the world. Another thing that's maybe even more important, but a little bit subtler: when we were in this GPT-2, GPT-3, GPT-4 era, there was kind of one recipe. You just pretrain a model on a lot of data, and you use these evals as a yardstick of how this generalizes to different tasks. Now we have these different ways of training, in particular reinforcement learning on serious reasoning, where we can pick a domain and really train a model to become an expert in this domain, to reason very hard about it. That lets us target particular kinds of tasks, which means we can get extremely good performance on some evals, but it doesn't indicate as great generalization to other things. So the way we think about it in this world, we definitely think we are in a little bit of a deficit of great evaluations. And I think the big things that we look at are actual marks of the model being able to discover new things. For me, the most exciting trend and actual sign of progress this year has been our models' performance in math and programming competitions, although I think they are also becoming saturated in a sense. And the next set of evals and milestones that we're looking at will involve actual discovery and actual movement on things that are economically relevant.
Sarah Wang
Totally. You guys already got number two in the AtCoder competition, so there's only number one left.
Mark Chen
Yeah, yeah. I mean, I think it is important to note that these evals, like IOI, AtCoder, and IMO, are actually real-world markers for success in future research. A lot of the best researchers in the world have gone through these competitions and have gotten very good results. And yeah, I think we are preparing for this frontier where we're trying to get our models to discover new things.
Sarah Wang
Yeah, very exciting.
Anjney Midha
Which capability from GPT5 before the release surprised you the most? When you were working through the eval bench or using it internally, were there any moments where you felt like this was starting to get good enough to release because it was useful in your daily usage?
Mark Chen
I think one big thing for me was just how much it moved the frontier in very hard sciences. We would try the models with some of our friends who are professional physicists or professional mathematicians. And you already saw some instances of this on Twitter, where you can take a problem and have it discover maybe not very complicated new mathematics, but some non-trivial new mathematics. And we see physicists and mathematicians repeating this experience over and over, where they're trying GPT-5 Pro and saying, wow, this is something that previous versions of the models couldn't do. It is a little bit of a light-bulb moment for them. It's able to automate what could take one of their students months of time.
Jakub Pachocki
While GPT-5 is a definite improvement on o3, for me, o3 was definitely that moment where the reasoning models became actually very useful on a daily basis, especially for working through a math formula or a derivation. It got to a level where it is fairly trustworthy, and I can actually use it as a tool for my work. It is very exciting to get to that moment. But now, as we're seeing these models able to automate, like we're saying, solving contest problems over longer time horizons, I expect that was quite small compared to what's coming over the next year.
Anjney Midha
What is coming in the next one to five years at whatever level you're comfortable sharing, what does the research roadmap look like?
Jakub Pachocki
So the big thing that we are targeting with our research is producing an automated researcher. So automating the discovery of new ideas. And of course, a particular thing we think about a lot is automating our own work, automating ML research. But that can get a little bit self-referential, so we're also thinking about automating progress in other sciences. I think one good way to measure progress there is looking at the time horizon on which these models can actually reason and make progress. Now, as we get to a level of near mastery of these high school competitions, let's say, I would say we get to maybe on the order of one to five hours of reasoning. And so we are focused on extending that horizon, both in terms of the model's ability to plan over very long horizons and its ability to retain memory.
Mark Chen
And back to the evals question. That's why I think evals of the form of how long does this model autonomously operate for are of particular interest to us.
Sarah Wang
And actually, maybe on that topic, there's been this huge move toward agency in model development. But at least in the state that it's in currently, users have observed a trade-off: too many tools or planning hops can result in quality regressions, versus something that has a little bit less agency, where the quality is, at least as observed today, a bit higher. How do you guys think about the trade-off between stability and depth? The more steps the model is undertaking, maybe the less likely the tenth step is to be accurate, versus you ask it to do one thing and it can do it very, very well. Having it keep doing that one thing better and better, versus more complex things, there's that trade-off. But of course, to get to full autonomy, you are taking multiple steps, you're using multiple tools.
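Editor's note: the intuition in this question, that a long chain of steps is less reliable than any single step, can be made concrete with a small sketch. It assumes every step succeeds independently with the same probability, which is a deliberate simplification and not a claim about how any OpenAI model behaves; the function name chain_success and the sample probabilities are hypothetical, chosen only for illustration.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps succeed.

    Assumption (illustrative only): each step succeeds independently
    with the same probability p_step, so the chain succeeds with
    probability p_step ** n_steps.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps


if __name__ == "__main__":
    # Compare how per-step reliability compounds over longer task chains.
    for p in (0.90, 0.99, 0.999):
        row = ", ".join(
            f"{n} steps: {chain_success(p, n):.3f}" for n in (1, 10, 100)
        )
        print(f"per-step {p}: {row}")
```

Under this toy model, small gains in per-step reliability matter much more as the number of steps grows, which is one way to read the link between consistency and long-horizon operation discussed in the answers that follow.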
Jakub Pachocki
I think a lot of the ability to maintain depth is being consistent over long horizons, so I think these are very related problems. In fact, with the reasoning models, we have seen the models greatly extend the length over which they are able to reason and work reliably without going off track. This remains a big area of focus for us.
Mark Chen
Yeah. And I think reasoning is core to this ability to operate over a long horizon because you imagine kind of yourself solving a math problem. You try an approach, it doesn't work and you have to think about what's the next approach I'm going to take, what are the mistakes in the first approach. And then you try another thing and the world gives you some hard feedback. And then you keep trying different approaches. And the ability to do that over a long period of time is reasoning and gives agents that robustness.
Sarah Wang
We talked a lot about math and science. I was curious to get your take: do you think some of the progress that we've made can extend similarly to domains that are less verifiable, where there's less of an explicit right or wrong?
Jakub Pachocki
Oh yeah. This is a question I really like. I think if you truly want to extend to research and discovering ideas that meaningfully advance technology on the scale of months and years, these questions stop being so different. It is one thing to solve a very well-posed, constrained problem on the scale of an hour, where there's a finite number of ideas you need to look through, and that might feel extremely different from solving something very open-ended. But even if you want to solve a very well-defined problem on a much longer scale, say, prove this Millennium Prize Problem, that suddenly requires you to think about: what are the fields of mathematics or other sciences that might possibly be relevant? Are there inspirations from physics that I must take? What is the entire program that I want to develop around this? Now these become very open-ended questions. And even for our own research, if all we cared about was reducing the modeling loss on a given dataset, measuring progress on that, asking whether we are actually asking the right questions in research, becomes a fairly open-ended affair.
Mark Chen
Yeah. And I think it also makes sense to think about what the limits of, you know, open ended means. I think a while back, Sam tweeted about some of the improvements that we were making in having our models write more creatively. And you know, we do consider the extremes here as well.
Sarah Wang
Right, right.
Anjney Midha
Let's talk about RL, because it seems like since o1 came out, RL has been the gift that keeps giving. Every couple of months OpenAI puts out a release and everyone goes, oh, that's great, but this RL thing is going to plateau. We're going to saturate the evals, the models won't generalize, or there's going to be mode collapse because of too much synthetic data. Everybody's got a laundry list of reasons to believe that the gains in performance from RL are going to tap out, and somehow they just don't. You guys just keep coming out and putting out continuous improvements. Why is RL working so well? And what, if anything, has surprised you about how well it works?
Jakub Pachocki
RL is a very versatile method, and there are a lot of ideas you can explore once you have an RL system working. At OpenAI, we started on this a long time ago, before language models. We were thinking, okay, RL is this extremely powerful thing, of course on top of deep learning, which is this incredible general learning method. But the thing that we struggled with for a very long time is: what is the environment? How do we actually anchor these models to the real world? Should we simulate some island where they all learn to collaborate and compete? And then of course came the language modeling breakthrough. We saw that if we scale deep learning on modeling natural language, we can create models with this incredible new understanding of human language. Since then we've been seeking how to combine these paradigms and how to get RL to work on natural language. And once you do, you have the ability to actually execute on these different ideas and objectives in this extremely robust, rich environment given by pretraining. So yes, I think it's been perhaps the most exciting period in our research over the last few years, where we've found so many new directions and promising ideas that all seem to be working out, and we're trying to understand how they compare.
Anjney Midha
One of the hardest things about RL for folks who are not practitioners of RL is the idea of crafting the right reward model. And so especially if you're a business or an enterprise who wants to harness all this amazing progress you guys are putting out, but doesn't even know where to start, what do the next few years look like for a company like that? What is the right mindset for somebody who's trying to make sense of RL to craft the right reward model? Is there anything you've learned about best practices, or a way of thinking about using this latest family of reasoning techniques? What is the right way I should think about even approaching reward modeling as a biologist or a physicist?
Jakub Pachocki
I expect this will evolve quite rapidly. I expect it will become simpler. Maybe two years ago we would have been talking about what is the right way to craft my fine-tuning dataset. I don't think we are at the end of that evolution yet, and I think we will be inching towards more and more human-like learning, which RL is still not quite. So I think maybe the most important part of the mindset is to not assume that what is now will be forever.
Sarah Wang
So I want to bring the conversation back to coding. We would be remiss not to say congrats on GPT-5 Codex, which just dropped today. Can you guys say a little bit more about what's different about it? How it's trained differently? Maybe why you're excited about it.
Anjney Midha
Yeah.
Mark Chen
So I think one of the big focuses of the Codex team is to take the raw intelligence that we have from our reasoning models and make it very useful for real-world coding. A lot of the work they've done is consistent with this. They are working on having the model be able to handle more difficult environments. We know that real-world coding is very messy, so they're trying to handle all of the intricacies here. There's a lot of coding that has to do with style, with softer things like how proactive the model is, how lazy it is, and being able to define, in some sense, a spec for how a coding model should behave. They do a lot of very strong work there. And as you've seen, they're also working on much better presets. Coders have some notion of how long they're willing to wait for a particular solution. I think we've done a lot of work to dial that in: for easy problems, being a lot lower latency; for harder problems, the right thing is actually to be even higher latency and get you the really best solution. It's about being able to find that preset.
Sarah Wang
What's the sweet spot for if you were to say easier problems versus harder?
Mark Chen
What we found is that the previous generation of the Codex models were spending too little time solving the hardest problems and too much time solving the easy problems. And I think that is probably just what you might get out of the box from o3.
Sarah Wang
Maybe just on the topic of coding, since you guys are both competitive coders in prior lives. I know you've been at OpenAI for almost a decade now, but I was struck by the story of Lee Sedol, the Go player who famously quit Go after he lost to AlphaGo multiple times. And I think in a recent interview you guys were both saying that now the coding models are better than your capabilities, and that gets you excited. But say more about that. And how much would you say you code now? If you're hands-on-keyboard, or you can talk about OpenAI generally, how much code is written by AI now?
Jakub Pachocki
In terms of coding models being better, I think it is extremely exciting to see this progress. I think the programming competitions are a nice, encapsulated test of the ability to come up with some new ideas in a boxed environment and timeframe. I do think if you look at things like, I guess, IMO problem 6, or maybe some of the very hardest programming competition problems, there's still a little bit of headway to go for the models, but I wouldn't expect that to last very long. I do code a little bit. (Another speaker: He's being humble.) Historically, I've actually been extremely reluctant to use any sort of tools. I just used Vim, pretty much old school. Eventually, especially with the latest coding tools like GPT-5, I really felt like, okay, this is no longer the way. You can do a 30-file refactor pretty much perfectly in 15 minutes. You kind of have to use it. And so I've been learning this new way of coding, which definitely feels a little bit different. I think it is a little bit of an uncanny valley still right now, where you kind of have to use it because it is accelerating so many things, but it's still not quite as good as a coworker. So I think our priority is getting out of the uncanny valley. But yeah, it's definitely an interesting time.
Mark Chen
Definitely. To speak to the Lee Sedol moment, I think AlphaGo for both of us was a very formative milestone in AI development. At least for me, it was the reason I started working on this in the first place. And maybe partly because of our backgrounds in competitive programming, I had this affinity for building models which could do very, very well in these forms of contests. Going from solving 8th grade math problems to, a year later, hitting our level of performance in these coding contests, it's crazy to see that progression. And you like to think that you feel some of the feelings Lee Sedol felt too, like, wow, this is really crazy. What are the possibilities? This is something that I took decades to do, and it took a lot of hard work to get to the forefront of. So you really do feel that an implication of that is: these models, what can't they do? And I do feel like it has already transformed the default for coding. This past weekend I was talking to some high schoolers and they were saying, oh, you know, actually the default way to code is vibe coding. They would consider that maybe sometimes, for completeness, you would go and actually do all of the mechanics of coding it from scratch yourself. But that's just a strange concept to them. Like, why would you do that? You just vibe code by default. And so, yeah, I do think the future hopefully will be vibe researching.
Anjney Midha
I have a question about that, which is what makes a great researcher when you say Vibe researching. A big part of Vibe coding is just having good taste in wanting to build something useful. And interesting for the world. And I think what's so awesome about tools like Codex is if you've got a good intuition for what people want, it helps you articulate that and then basically actualize a prototype very fast. With research, what's the analog? What makes a great researcher?
Jakub Pachocki
Persistence is a very key trait. I think the special thing about research is that you're trying to create something or learn something that is just not known. It's not known to work. You don't know whether it will work. And so you are always trying something that will most likely fail. I think it is about getting to a place where you are in the mindset of being ready to fail and being ready to learn from these failures. And of course with that comes creating clear hypotheses and being extremely honest with yourself about how you're doing on them. I think a trap many people fall into is going out of their way to prove that it works, which is quite different. I think believing in your idea is extremely important, and you want to persist with it, but you have to be honest with yourself about when it's working and when it's not, so that you can learn and adjust.
Mark Chen
Yeah, I think there are just very few shortcuts for experience. Through experience you learn what's the right horizon to be thinking about for a problem. You can't pick something that's too hard, and it's not satisfying to do something that's too easy. And I think a lot of research is managing your own emotions over a long period of time too. There are just going to be a lot of things you try, and they're not going to work. Sometimes you need to know when to persevere through that, and sometimes when to switch to a different problem. And I think interestingness is something you try to get through reading good papers and talking to your colleagues, and you maybe distill their experience into your own process.
Anjney Midha
I'm a failed machine learning researcher. When I was in grad school for bioinformatics, a big part of my research advisor's thrust was about picking the right problems to work on, such that you could then sustain and persist through the hard times. And you said something interesting, which was that there's a difference between having conviction in an idea and being maximally truth-seeking about when it's not working. Both those things are sometimes in tension, because you kind of go native on a topic or a problem that you have deep conviction in. Are there any heuristics you've found useful at the taste step, at the problem-picking step, that help you arrive at the right set of problems, where conviction and truth-seeking are not as much in zero-sum tension as with other kinds of problems?
Jakub Pachocki
Yeah, to be clear, I don't think conviction and truth-seeking are really in a zero-sum tension. I think you can have a lot of belief in an idea and you can be very persistent in it while it's not working. It's just important that you're honest with yourself about how much progress you're making, and that you're in a mindset where you're able to learn from the failures along the way. I think it's important to look for problems that you really care about and really believe are important. One thing I've observed in many researchers who inspired me has been really going after the hard problems, looking at the questions that are widely known but not really considered tractable, and just asking: why are they not tractable? Or what about this approach? Why does this approach fail? You're always thinking about what is really the barrier to the next step. If you're going after problems that you truly believe are important, then that makes it so much easier to find the motivation to persist with them over years.
Anjney Midha
And during the training phase of GPT-5, for example, were there any moments where there was a hard problem, the initial attempts to crack it weren't working, and yet somebody persisted through that? And for any of those stories that come to mind, what was it that worked well that you wish other people and other researchers did more of?
Jakub Pachocki
I think on the path there, along the sequence of models, both the pretrained models and the reasoning models, one very common theme is bugs. Both silly bugs in software, which can stay in your software for months and invalidate all your experiments a little bit in a way that you don't know, where identifying them can be a very meaningful breakthrough for your research program. But also bugs in the sense that you have a particular way of thinking about something, and that way is a little bit skewed, which causes you to make the wrong assumptions. Then it is about identifying those wrong assumptions and rethinking things from scratch. Both for getting the first reasoning models working and for getting the larger pretrained models working, I think we've had multiple issues like that that we've had to work through.
Sarah Wang
As leaders of the research org, how do you think about what it takes to keep the best talent on your team? And on the flip side, creating a very resilient org that doesn't crumble if a key person leaves?
Mark Chen
I think the biggest thing that OpenAI has going for it in terms of keeping the best people motivated and excited is that we are in the business of doing fundamental research. We aren't the type of company that looks around and says, oh, what model did company X build, or what model did company Y build? We have a fairly clear and crisp definition of what it is we're out to build. We like innovating at the frontier, we really don't like copying, and I think people are inspired by that mission. You are really in the business of discovering new things about the deep learning stack, and I think we're building something very exciting together. Beyond that, a lot of it is creating very good culture. We want a good pipeline for training people up to become very good researchers. I think we have historically hired the best talent and the most innovative talent, so I just think we have a very deep bench as well. And yeah, I think most of our leaders are very inspired by the mission, and that's what's kept all of them there. When I look at my direct reports, they haven't been affected by the talent wars.
Sarah Wang
I was chatting with a researcher recently and he was talking about wanting to find the cave dwellers. And these are often the people who are not posting on social media about their work for whatever reason they may not even be publishing. They're sort of in the background doing the work. I don't know if you would agree with this concept but how do you guys hire for researchers and are there any non obvious ways that you look for talent or attributes that you look for that are non obvious.
Jakub Pachocki
So I think one thing that we look for is having solved hard problems in any field. A lot of our most successful researchers started their journey with deep learning at OpenAI and worked in other fields in the past, like physics, theoretical computer science, or finance. We look for strong technical fundamentals coupled with the intent to work on very ambitious problems and actually stick with them. We don't purely look for who did the most visible work or who is the most visible on social media.
Anjney Midha
As you were talking, I was thinking back to when I was a founder running my own company and we would recruit great engineering talent. Many of the attributes you described were ones that were on my mind then. And Elon recently tweeted that he thinks this whole researcher-versus-engineer distinction is silly. Is he just being semantically nitpicky, or do you think these two things are more similar than they actually look?
Mark Chen
Yeah, I mean, I do think researchers don't just fit one shape. We have certain researchers who are very productive at OpenAI who are just so good at idea generation, and they don't necessarily need to show great impact through implementing all of their ideas. I think there's so much alpha they generate in just coming up with, oh, let's try this, or let's try that, or maybe we should think about this. And there are other researchers who are just very, very efficient at taking one idea and rigorously exploring the space of experiments around that idea. So I think researchers come in very different forms. Maybe that first type wouldn't necessarily map into the same bucket as a great engineer, but we do try to have a fairly diverse set of research tastes and styles.
Anjney Midha
And say a little bit about what it takes to create a frontier, winning culture that can attract all kinds of shapes of researchers and then actually grow them, help them thrive, and make them win together at scale. What do you think are the most critical ingredients of a winning culture?
Mark Chen
So I think actually the most important thing is just to make sure you protect fundamental research. I think you can get into this world, with so many different companies these days, where you're just thinking about, oh, how do I compete on a chat product or some other kind of product surface. You need to make sure that you recognize the research for what it is and give researchers the space to do it. You can't have them being pulled in all of these different product directions. So I think that's one thing that we pay attention to within our culture.
Jakub Pachocki
Especially now that there's so much spotlight on OpenAI, so much spotlight on AI in general and on the competition between different labs, it would be easy to fall into a mindset of, oh, we're racing to beat this latest release or something. And there are definitely areas where people start looking over their shoulder and start thinking about, oh, what are these other things? I see it as a large part of our job to make sure that people have the comfort and space to think about what things are actually going to look like in a year or two. What are the actually big research questions that we want to answer? And how do we get to models that vastly outperform what we see currently, rather than just iteratively improving in the current paradigm?
Sarah Wang
Just to pull on that thread more on protecting fundamental research. You guys are obviously one of the best research organizations in the world, but you're also one of the best product companies in the world. How do you balance, especially now that you've brought on some of the best product execs in the world as well, that focus between the two and, while protecting fundamental research, also continue to move forward the great products that you have out?
Mark Chen
Yeah, I mean, I think it's about kind of delineating a set of researchers who really do care about product and who really want to be accountable to the success of the product. And they should of course very closely coordinate with the research org at large. But I think just kind of people understanding their mandates and what they are rewarded for, that's a very important thing.
Jakob Pochotzky
One thing that I think is also helpful is that our product team and broader company leadership is bought into this vision of where we are going with research. And so, you know, nobody is assuming that, oh, the product we have now is the product we'll have forever and we'll just kind of wait for, you know, new versions from research. We are able to think jointly about what the future looks like.
Ajahnay Miha
One of the things that you guys have done is let such a diversity of different ideas and bets flourish inside of OpenAI that you then have to figure out some way, as research leaders, to make it all make coherent sense as part of one roadmap. And you've got, you know, people over here investigating the future of diffusion models and visual media. And over here you've got folks, you know, investigating the future of reasoning when it comes to code. How do you paint a coherent picture of all that? How does that all come together when there might be, at least naively, some tension between giving researchers the independence to do fundamental research and then somehow making that all fit into one coherent research program?
Jakob Pochotzky
Our stated goal for our research program has been getting to an automated researcher for a couple years now. And so we've been building most of our projects with this goal in mind. And so this still leaves a lot of room for kind of bottom-up idea generation, for fundamental research on various domains. But we are always thinking about how these ideas eventually come together. We believe, for example, that reasoning models go much further. And we have a lot of explorations on things that are not directly reasoning models, but we are thinking a lot about how they eventually combine and what this innovation will look like once you have something that is out there and thinking for months about a very hard problem. And so I think this clarity of our long-term objectives is important, but it doesn't mean that we are prescriptive about, oh, here are all the little pieces. We definitely view this as a question of exploration and learning about these technologies.
Mark Chen
Yeah, I think you want to be opinionated and prescriptive at a very kind of coarse level. But a lot of ideas can bubble up at a finer level.
Ajahnay Miha
And have there been any moments where those things have been in tension at all recently? Well, one provocative example could be, recently, you know, this new image model came out, which is Nano Banana, right, from Google. It's extraordinarily valuable, and it's shown that lots of everyday people can unlock a lot of creativity when these models are good at understanding editing prompts. And I could see how that would create some tension for a research program that may not be prioritizing that as directly. If somebody talented on your team came and said, guys, this thing is so clearly valuable in the world out there, we should be spending more effort and more energy on this, how do you reason about that question?
Jakob Pochotzky
I think that's definitely a question that we've been kind of thinking about for quite a while at OpenAI. I mean, if you look at GPT-3, right, once we kind of saw this is where language models are going, we definitely had a lot of discussions about, well, clearly there are going to be so many magical things you can do with AI, and you will be able to go to these extremely smart models that are out there pushing the frontiers of science. But you will also have this incredible media generation and these incredibly transformative entertainment applications. And so how we prioritize among all these directions has definitely been something we've been thinking about for quite a while.
Mark Chen
Yeah, absolutely. And the real answer is, like, we don't discourage someone from being really excited by that. And it's just, if we're consistent in the prioritization and our product strategy, then it just will naturally fall in. And so for us, we do encourage a lot of people to be excited about building this, or building kind of agentic products, whatever kind of products they're excited by. But I think it's important for us to also have a separate group of people whom you protect, whose goal is to create the algorithmic advances.
Sarah Wang
How does that translate, just to build on Anja's question, into a concrete framework around resourcing? Do you think about, okay, X percent of compute resources will go to longer-term, very important, but maybe a bit more pie-in-the-sky exploration? There's also obviously current product inference, but then there's sort of this thing in the middle, where it's achievable in the short to medium term.
Mark Chen
Yeah. So I think that's a big part of both of our jobs. Just this portfolio management question of how much compute do you give to which project? And I think historically we've put a little bit more on just the core algorithmic advances versus kind of the product research. But it's something that you have to feel out over time. Right. It's dynamic. I think month to month there could be different needs. And so I think it's important to stay fairly flexible on that.
Sarah Wang
And if you had 10% more resources, would you put it toward compute, or is it data curation, people? Where would you stick that, from like a marginal...
Mark Chen
Good question.
Jakob Pochotzky
Honestly, yeah, I think compute today.
Mark Chen
Fairly reasonable answer here. Yeah, yeah. I mean, honestly, I do think, kind of to your question of prioritization, it's like, in a vacuum, any of these things you would love to go and excel and win at. I think the danger is you end up, like, second place at everything and not clearly leading at anything. So I think prioritization is important. Right. And you need to make sure there are some things where you're clear-eyed: this is the thing that we need to win.
Ajahnay Miha
Yeah, but I think it makes sense to talk about it for just a little bit more, which is that compute is so much of destiny, in a way, right, at a research organization like OpenAI. And so a couple of years ago I think it became very fashionable to say, oh, okay, we're not going to be compute constrained anytime soon because there's a bunch of CMS that people are discovering and we're going to get more efficient and all the algorithms are going to get better, and then eventually we'll really just be in a data-constrained regime. And it seems like a couple of years have come and gone and we're still in this sort of very compute-constrained environment. Does that change anytime soon, you think, or...
Jakob Pochotzky
I mean, I think we've seen for long enough how much we can do with compute. Yeah, I haven't really bought that much into the "we'll be data constrained" claim. And yeah, I don't expect that to change.
Mark Chen
Yeah, anyone who says that should just step into my job for a week and just, there's no one who's like, oh, you know, I have all the compute that I need. Right. Yeah.
Ajahnay Miha
You know, the job of advancing fundamental research has historically been largely a mandate that universities have had. Partly for the compute reasons you just described, that hasn't been the case for frontier AI. You guys have done such an incredible job kind of channeling the arc of frontier AI progress to help the sciences out. And I'm wondering, when those worlds collide, the fundamental world of university research today and the world of frontier AI, what comes out?
Mark Chen
So I guess I personally started as a resident at OpenAI, and it's a program that we had for people in different fields to come in, learn quickly about AI and become productive as a researcher. And I think there's a lot of really powerful elements in that program, and the idea is just, could we accelerate something that looks like a PhD in as little time as possible? And I think a lot of that just looks like implementing a lot of very core results. And through doing that you're going to make mistakes, you're going to build intuition for, oh, wow, if I set this wrong, that's going to blow up my network in this way. And so you just need a lot of that hands-on experience. I think over time there have been curriculums developed at probably all of these large labs in optimization and architecture and RL, and yeah, probably no better way than to just kind of try to implement a lot of those things and read about them and think critically about them.
Jakob Pochotzky
Yeah, I think maybe one other nice thing that you get to experience in academia is this persistence of like, oh, you have a few years and you're kind of trying to solve a problem and it's a hard problem and you've never dealt with such a hard problem before. And yeah, I do feel like this is a thing that's like, well, currently the pace of progress is very fast. Maybe also the ideas tend to work out a little bit more often than they did in the past because deep learning just wants to learn. And getting your hands on a more challenging problem for a little bit, maybe being part of a team attacking an ambitious challenge, and getting that feeling of what it feels like to be stuck and what it feels like to finally be making progress, I think is also something that's very useful to learn.
Sarah Wang
How does external perception or reception of a particular product launch impact how you prioritize something? In the case where perception and usage are married, obviously there's probably a clear directive there. But in a case where maybe they're divorced a bit, does that impact how you think about roadmap or where you emphasize resources?
Jakob Pochotzky
So we generally have some pretty strong convictions about the future, and so we don't tie them that closely to the short-term reception of our products. Of course we learn based on what is going on, we read other papers and we look at what other labs are working on. But generally we act from a place of fairly strong belief in what we're building. And so of course that is for our long-term research program. When it comes to product, I think the cycle of iteration is much faster.
Mark Chen
I think with every launch we are trying to aim it to be something that's wildly successful on the product side. And I think from a fundamental research perspective, we're trying to create models with all of the kind of core capabilities needed to build a very rich set of experiences and products. And there are going to be people who have some vision of one particular thing they could build, and we'll launch it, and everything we launch, we really hope it goes wildly successful. And you know, we get that feedback, and if it's not, we'll kind of shape our product strategy a little bit. But yeah, we are definitely also in the business of launching very useful, wildly successful products.
Ajahnay Miha
Yeah, it feels like because of the sort of completely unbridled pace of progress that we've just spent a lot of time talking about, a lot is going to change over the next few years. Right. It gets really hard to predict, I imagine, 10 years out, let alone 10 months out. And so my question, I guess, is, through all that change that the frontier of AI is going to bring, what are some priors that you actually think should stay constant? Is there anything? Well, one clearly is that we don't have enough compute. Is there anything else that you think doesn't change, that you would treat as strong, reasonably held priors, as constants?
Jakob Pochotzky
I think more broadly than compute, there's physical constraints of energy. But also at some point, not too far, robotics will become a major focus. I think thinking about the physical constraints is going to remain important. But yeah, I do think on the intelligence front, I would not make too many assumptions.
Sarah Wang
Very few startups can get to the scale that you have, both from an employee count perspective but also revenue, and maintain that breakneck speed that you probably had, I mean, seven, eight years ago when you both joined. What's the secret sauce to doing that? And how do you continue to maintain this pressure almost to ship as quickly as possible, even though you're kind of on, you know, top now?
Mark Chen
I think one of the clearest markers that we have really good research culture, at least in my mind, is, you know, I've worked at different companies before and there is a real thing which is a learning plateau, right. You go to a company, you learn a lot for the first one or two years, and then you just find, kind of like, you know, I know how to be fairly efficient in this framework, and my learning kind of stops. And I've really never felt that at OpenAI. Just that experience you described of all these really cool results bubbling up, learning so much week over week, and it is a full-time job to kind of stay on top of all of it, and that's just been very fulfilling. So, yeah, no, I think that's a very accurate description. We just want to generate a lot of really high-quality research, and it's almost a good thing if you're generating enough that you're barely able to keep on top of it.
Sarah Wang
Yeah, exactly.
Jakob Pochotzky
I think definitely the development of technology is a driving force here, where maybe we would kind of become comfortable after a few years working in a given paradigm. But we are always on the cusp of that new thing and trying to reconfigure our thinking around the kind of new constraints and new possibilities that we're going to be faced with. And so I think that kind of creates this feeling of constant change and the mindset of always kind of learning the new thing.
Ajahnay Miha
Well, one thing that came up in our research about things at OpenAI that have not changed through a lot of the change is the trust that the two of you guys have in each other. Because I think there was an article or profile of you guys recently in the MIT Tech Review, and that was also one of the highlight themes: that your chemistry, your trust with each other, is something a lot of the people at OpenAI have come to treat as a constant. So what's the backstory? How did you guys build trust there?
Jakob Pochotzky
How did that happen?
Ajahnay Miha
Just like asking you to. Have you ever seen that? When Harry Met Sally? I feel like you're on the couch and now you gotta.
Jakob Pochotzky
What's your make you.
Sarah Wang
Yeah, exactly.
Mark Chen
Well, I do think, you know, we started working together a little bit more closely when we kind of had the first seeds of working on reasoning. I think at the time, that wasn't a very popular research direction to work on. And I think both of us kind of saw glimmers of hope there, and we were kind of pushing in this direction, kind of figuring out how to make RL work. And, yeah, I think over time, kind of growing a very small effort into an increasingly larger effort. And I think that's kind of where I really got to kind of work with Jakob in depth. I think he's just really a phenomenal researcher. I think any of these rank lists, he should be number one. Just his ability to take any very difficult technical challenge and almost, like, personally just kind of think about it for two weeks and just crush it. It's incredible that he has kind of the wide range that he does in terms of understanding, as well as that kind of depth, that he can go and just personally solve a lot of these technical challenges.
Ajahnay Miha
Now you get to say some nice stuff about him.
Mark Chen
Say anything nice about me.
Jakob Pochotzky
Thanks, Mark. Yeah, I think the first big thing that we did together was we started seeing, okay, we think this algorithm is going to work. And so I was thinking, okay, how do we direct people at this? And we were talking with Mark, like, oh, we should establish a team that's actually going to make this work. And then Mark went and actually did this. He actually got a group of people working on very different things, got them all together, and created a team with incredible chemistry out of this whole disparate group. And that was such an impressive thing to me. And, yeah, I'm really grateful and inspired to get to work with Mark and kind of experience that. I think it's this incredible capacity to both understand and engage and think about the technical matter of the research itself, coupled with this great ability to lead and inspire teams and create an organizational structure that, in this whole kind of mess of chaotic directions, actually is coherent and able to gel together. Yeah, very, very inspiring.
Sarah Wang
That's awesome.
Ajahnay Miha
Well, on that note, great note to end on.
Sarah Wang
Yeah.
Ajahnay Miha
Some of the greatest discoveries in science, especially in physics, have often come from a pair of collaborators, often across universities, across fields. And it seems like you guys have now added to that tradition. And so we're just super grateful that you guys made the time to chat today. Thanks for coming by.
Jakob Pochotzky
Thank you.
Sarah Wang
Thanks for being with us.
Podcast Host (Narrator)
Thanks for listening to this episode of the A16Z podcast. If you like this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at a16z and subscribe to our Substack at a16z.substack.com. Thanks again for listening, and I'll see you in the next episode. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.
Episode: From Vibe Coding to Vibe Researching: OpenAI’s Mark Chen and Jakub Pachocki
Date: September 25, 2025
Host: Andreessen Horowitz (Anjney Midha & Sarah Wang)
Guests: Jakub (Jakob) Pachocki, Chief Scientist, OpenAI & Mark Chen, Chief Research Officer, OpenAI
This episode delves into OpenAI’s evolving research culture and technical direction, focusing on the long-term mission of building an “automated researcher”—AI that can autonomously discover new ideas and make economically meaningful contributions. Mark Chen and Jakub Pachocki discuss the research strategy behind GPT-5 and Codex, trends in “vibe coding” and “vibe researching”, benchmarks and evaluation strategies, RL’s persistent relevance, managing a world-class research organization, and maintaining a balance between product and fundamental research.
"The big thing that we are targeting is producing an automated researcher. So automating the discovery of new ideas."
"We think the future is about Reasoning—more and more about reasoning, more and more about agents. And GPT-5 is a step towards delivering reasoning and more agentic behavior by default."
"I think the big things that we look at are actual marks of the model being able to discover new things."
Anjney Midha & Jakub Pachocki (11:18-12:58):
"RL is a very versatile method...once you have an RL system working a long time, you have the ability to actually execute on these different ideas in this extremely robust, rich environment."
Jakub Pachocki (13:54): The future for reward modeling is that it becomes more human-like and less about carefully hand-crafting datasets:
"We will be inching towards more and more human-like learning, which RL is still not quite."
Mark Chen (14:35): Codex is now about real-world messy coding:
"Real world coding is very messy...We've done a lot of work to dial in on [presets]: for easy problems, much lower latency; for harder problems, higher latency gets you the best solution."
Jakub Pachocki (16:43): Coding models now often surpass top human competitors—open modeling alleviates intense labor (e.g., “30 file refactor in 15 minutes”). He's moving beyond old-school approaches (just Vim!) due to the gains.
Mark Chen (18:31): “Vibe coding” is now the student default—AI-first. The aspirational analog is “vibe researching”—lowering research’s barrier to entry.
Jakub Pachocki (20:36):
"The special thing about research...is you're trying to create something or learn something that is just not known."
Mark Chen (21:34): Experience, the ability to manage one’s emotions amid repeated failures, and learning problem-interest from colleagues and literature.
"I think over time, kind of growing a very small effort into increasing larger effort..." — Mark Chen
"Mark... got a group of people working on very different things, got them all together and created a team with incredible chemistry." — Jakub Pachocki
"The big thing that we are targeting is producing an automated researcher. So automating the discovery of new ideas."
"We think GPT-5 is this step towards delivering reasoning and more agentic behavior by default."
"...For a lot of [evals], inching from 96 to 98% is not necessarily the most important thing in the world [...] Now we have these different ways of training, in particular reinforcement learning on serious reasoning, where we can pick a domain and we can really train a model to become an expert in this domain, to reason very hard about it..."
"This past weekend I was talking to some high schoolers and they're saying, oh, you know, actually the default way to code is Vibe coding."
"Persistence is a very key trait... you're trying to create something... that is just not known. It's not known to work. You don't know whether it will work. And so always trying something that will most likely fail."
"We have a fairly clear and crisp definition of what it is we're out to build. We like innovating at the frontier, we really don’t like copying..."
"Anyone who says [we’re not compute constrained] should just step into my job for a week... there's no one who's like, oh, you know, I have all the compute that I need."
"More broadly than compute, there's physical constraints of energy. But also at some point, not too far, robotics will become a major focus."
This conversation offers a rich, behind-the-scenes look at OpenAI’s DNA—how an obsession with deep reasoning, persistent frontier-seeking, and culture of autonomy and trust has positioned the organization for another leap, beyond “vibe coding” to “vibe researching”. The mission: AI that can truly discover, invent, and progress humanity’s knowledge across any field.