
A
Conversations with Tyler is produced by the Mercatus Center at George Mason University, bridging the gap between academic ideas and real-world problems. Learn more at mercatus.org. For a full transcript of every conversation, enhanced with helpful links, visit conversationswithtyler.com. Hello everyone, and welcome back to Conversations with Tyler. Today I'm sitting here chatting with Brendan Foody at the offices of Mercor. Mercor is an AI company which dates from early 2023; we'll get into more detail soon enough. Brendan is the CEO and co-founder. I believe he's the youngest unicorn founder ever. Mercor by some estimates is the fastest-growing company ever, for instance with the quickest speed to $400 million. Brendan, also, at age 22, is the youngest Conversations with Tyler guest ever.
B
My proudest achievement.
A
There's more we'll get to soon enough, but Brendan, welcome.
B
Thank you so much for having me, Tyler. Excited to be here.
A
Now, I saw an ad online not too long ago from Mercor, and it said $150 an hour for a poet. Why would you pay a poet $150 an hour?
B
That's a phenomenal place to start. For background on what the company does, we hire all of the experts that teach the leading AI models. So when one of the AI labs wants to teach their models how to be better at poetry, we'll find some of the best poets in the world who can help measure success by creating evals and examples of how the model should behave. And one of the reasons we're able to pay so well to attract the best talent is that when these phenomenal poets teach the models how to do things once, the models are then able to apply those skills and that knowledge across billions of users, hence allowing us to pay $150 an hour for some of the best poets in the world.
A
So the poets grade the poetry of the models or they grade the writing or what is it they're grading?
C
It could be some combination depending on the project. But an example might be similar to how a professor in an English class would create a rubric to grade an essay or a poem from their students. We could have a poet who creates a rubric to grade how well the model is creating whatever poetry you would like, and whether the response would be desirable to a given user.
A
How do you know when you have a good poet or a great poet?
B
That's so much of the challenge of it, especially with these very subjective domains in the liberal arts, right? So much of it is this question of taste, where you want some degree of consensus, of different exceptional people believing that they're each doing a good job. But you probably don't want too much consensus, because you also want to get all of these edge-case scenarios of what the models are doing that might deviate a little bit from the norm.
A
So you want your poet graders to disagree with each other some amount.
C
Some amount, exactly. But still a response that is consistent with what most users would want to see in their model responses.
A
Are you ever tempted to ask the AI models, how good are the poet graders?
B
We often are. We do a lot of this: we'll have the humans create what's called a rubric, or some sort of eval, to measure success, and then have the models give their perspective, because you actually can get a little bit of signal from that, especially if you have an expert. We have tens of thousands of people working on our platform at any given time, so oftentimes there'll be someone who is tired or not putting a lot of effort into their work, and the models are able to help us with catching that.
A
So you had a recent project: you hired Larry Summers, I believe, for finance and economics.
B
That was a little bit of a unique deal.
A
He's been a guest on this podcast. Cass Sunstein for law. He's been a guest twice on this podcast. Eric Topol for medicine. I've been a guest on his podcast. How do you pick those people? Obviously they're highly accomplished, but what makes them good at doing this other than just being smart, productive people?
C
Absolutely. Well, I'll step back and provide a little bit of context on Apex, or the AI Productivity Index, and why we chose them to help with it. The largest disconnect that we were seeing in AI research is that everyone was focused on academic evals, like GPQA for PhD-level reasoning or IMO for Olympiad math, which were wholly disconnected from the outcomes that customers actually care about: how do we get the model to automate a medical diagnosis, or a legal draft, or preparing a certain financial analysis of a company? And so we chose legal experts, medical experts, finance experts, people that have a broad economic perspective, to see what is the right methodology for measuring success across each of these domains, working with them on segmenting: what are all of the different industries within law, what are all the different types of law, and how do we leverage our marketplace of all of these experts to best capture and measure how well models have automated all of those domains?
A
So it's because they've had real world experience and they're not only academics. Is that the way to think about it?
C
I think that's part of it. I think a lot of them obviously have meaningful real-world experience, but also this broad vantage point of the entire industry, right? Not just someone who specializes in a particular type of law or a particular industry in big law, but rather someone having this very large perspective on how we should structure the project, how we should think about the rigorous processes associated with curating the data sets, setting up the reviews, et cetera.
A
And the paper you did with that group of people as your researchers and many others, I should add, what's the main thing you all learned from that exercise and from the paper?
C
I think the largest takeaway is that the rate of model improvement at economically valuable tasks is incredible. If you look at the level that GPT-4o, a frontier model a year ago, scored on this, and compare that against GPT-5 today, the delta is profound. And so it often.
A
Can you put a number on that or somehow.
C
25 to 30% improvement per year. Per year, exactly. Well, now GPT-5 is at 64%, so maintaining that would definitely be challenging. But it gets my mind wondering: what will this technology be able to do in another year or two? And how will that have this profound impact on the economy that so many of us have been wondering about for a while?
A
But when you give these numbers, to what extent are you measuring how well they do on the test versus how much economic value are they creating?
C
Well, I'll walk through the methodology and how we derive that. Essentially, within each industry, we start out with surveys of hundreds of experts. So think within consulting: we get experts that were previously at McKinsey, Bain, BCG and other top consulting firms. And then we survey how they spend their time: what percentage of their time is in customer meetings, in online research, in analysis, in preparing deliverables for customers. Then within each of those buckets, we ask them to write the corresponding prompts and rubrics associated with how they spend their time, using their time as the best proxy we have for the economic value associated with their salary or what customers are willing to pay for. And it's incredible to see, right? The model scoring 64% on that is pretty profound. Obviously, there's some complexity in mapping that to economic impact, because in certain industries like medicine, you can't have a 30% failure rate; you need to have near perfect, sort of similar to driverless cars in some ways. But in other industries, like an initial legal draft or a consulting analysis, this technology is already starting to have a profound impact, and it's only accelerating.
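One way to picture the time-as-proxy idea described above is a time-weighted average of per-bucket scores. The sketch below is only an illustration under assumptions: the bucket names, time shares, and scores are hypothetical, and the conversation does not say how the actual index aggregates its results.

```python
def time_weighted_score(time_shares, bucket_scores):
    """Average per-bucket scores, weighting each by the share of expert time.

    time_shares:   bucket name -> fraction of working time (hypothetical survey data)
    bucket_scores: bucket name -> model score in [0, 1] on that bucket's rubrics
    """
    total_time = sum(time_shares.values())
    return sum(
        (share / total_time) * bucket_scores[bucket]
        for bucket, share in time_shares.items()
    )


# Hypothetical consulting example; none of these numbers come from the paper.
shares = {"customer_meetings": 0.25, "online_research": 0.25, "analysis": 0.5}
scores = {"customer_meetings": 0.4, "online_research": 0.8, "analysis": 0.7}
overall = time_weighted_score(shares, scores)
print(round(overall, 2))
```

Dividing by the total time makes the result insensitive to whether the surveyed shares sum exactly to one.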
A
But isn't there something about switching from task to task which the models can't do at all? So the model would beat me on a test. The model might even run better podcast questions than I do. But somehow combining those all in a single entity, which I can do, even the best model is still basically at zero, as far as I can tell. So the economic value is, in a way, still at zero.
C
Well, so it's interesting. I think what you're getting at is there's sort of two key things the models struggle at that humans tend to be very good at. The first is these longer horizon tasks of not just something that we could do in a few hours, but something that might take us 50 or 100 hours to do. And then the second thing is integrating multiple tools with our response and going about doing these things, maybe interacting with people as one of those elements. And I think that that's coming very soon. And the next version of Apex.
A
And what does very soon mean, your best guess?
C
Well, I'll talk about it in terms of Apex and then I'll talk about it in terms of model advancement, because there is a large correlation between the two. We're doing a lot to measure all of those capabilities, how models interact with the entire workspace and how models do these very long-horizon tasks, in an eval that we're launching in the next couple of months. And very quickly, once researchers are able to measure those capabilities, they'll be able to hill climb them. So I would be shocked if we don't have enormously capable models across those dimensions, of lots of tool use with very long-horizon tasks, in the next six to 12 months.
A
Let's just take the body of knowledge alone. Forget about the long horizon. Just an on-the-spot test. Let's say I'm Cass Sunstein. I know Cass. He has an incredibly impressive body of knowledge in many areas. When are we at the point where basically Cass cannot ask a question that the best models cannot answer?
C
Wow, that's an interesting question. Well, I think it depends domain by domain, but in law, I think it's going to be a long time. And the reason is that there's so much taste involved in legal responses that effectively getting all of the taste that Cass has into the model is going to be difficult. I do think we'll very quickly get to the point where Cass has a really hard time finding a mistake the model makes, right, where he has to spend maybe a week just trying to probe it.
A
How far away is that?
B
That might be about two or three years. It wouldn't surprise me for a question and response.
A
I would think it's six months away would be my guess. But maybe if he asked it a thousand questions, I think he could induce an error. But 50 questions, I think in less than a year we might be there.
B
It depends also a little bit on how tightly you define an error. He might have all sorts of knowledge of niche areas of the law that the model isn't strong at. And so there's some question of how you measure this. But I hold Cass in very high regard with respect to his niche knowledge of the law and ability to stump the models.
A
And what would be an area where the human expert is relatively strong and an area where the human expert compared to the model is relatively weak?
C
There are, interestingly, a lot of areas in law where the right way of approaching something is not written down or codified, at least not explicitly. It exists more in the heads of experts. And I think it's those domains, where there's a lot of taste that isn't well documented, that the models will struggle immensely with, because they either need those tokens in the pre-training data, from doing these web-scale training runs, or they need it in the post-training data, from having a legal expert from us create those data sets. And if they don't have those, then the model will inevitably struggle with that particular problem.
A
Now, I've argued in economics that the leading economics journals should take their referee reports and the submissions and send them somewhere, arguably here. Would that be useful to you?
C
It certainly would. It's something we've talked about a bunch in the past. But I think that the largest way that these deep domain experts can help contribute to the advancement of AI is defining the evals. When we have these phenomenal tests for model capabilities, whether in economics, law or other domains, it's amazing how fast the researchers can hill climb them and optimize around them. And so more help in building these tests and sending them to us and other labs is extremely impactful.
A
So those are nonprofits, those institutions. Why don't they just send it to you now for free? Do you have a theory of this?
C
I'm not sure exactly.
A
It would improve science, right?
C
It would improve science. I think maybe two things. One is awareness of this. I think that while evals are the thing that everyone's talking about in Silicon Valley and the AI labs, it feels like most people in the rest of the country couldn't quite describe exactly why you need an eval. And I think the second is a little bit of fear, right? Everyone worries about how AI is going to impact their jobs, their work, their ability to contribute to the economy and be meaningful. And I think that's always top of mind, even for nonprofit organizations that want to contribute and preach this world of abundance.
A
So let's say we took the live economics or legal whatever seminars, let's say the top 10 top 20 schools recorded them all, somehow anonymized the data, but you had the comments in transcript and sent that to you. Would that be useful?
C
It would be very useful. One thing I will say, though, is that a good way of thinking about it is that there are sort of two kinds of data. The first kind of data is just the output: you have some curriculum that the model is reading and learning from. The second kind of data is some way of measuring success, where you have the rubric for the response, you have the test question and answer, you have the unit tests in code. And that second kind of data is the most valuable, where we're able to have the models attempt the problem many, many times, score those responses, and learn from them. But both are incredibly impactful and things we would love to get support with.
A
So on your wish list, just to make this more concrete: you can have some kind of data. Forget about realism, you just get it for free. What is it you most want for, say, social science?
C
I think that we tend to focus a lot on what's economically valuable. And so if people have tests that the models are bad at, that map to a meaningful amount of economic value, you know, and it could be an academic domain that can be applied to create a lot of value in other areas. That's super exciting for us. Maybe a good heuristic is if we could build a model that without seeing this test and reading through it, could max out the test, how much economic impact would that add? Whatever test is able to measure that the best is most helpful, right? And so maybe in medicine, it's, you know, a test around how well the model is doing a certain diagnosis in a particularly difficult domain where we think the models can add a ton of impact. Maybe in economics, it's, you know, areas of analysis and modeling of businesses that aren't well codified but could meaningfully impact the way that we underwrite businesses. Those types of things are what's going through my head.
A
And let's say it's poetry. Let's say you can get it for free, grab what you want from the known universe. What's the data that's going to make the models working through your company better at poetry?
C
Well, I think that it's people that have phenomenal taste for what users of the end products, users of these frontier models, would want to see: someone who understands, when a given prompt is given to the model, what type of response people are going to be amazed with. How we define the characteristics of those responses is imperative. And so probably more than just poets who have spent a lot of time in school, we would want people who know how to write work that gets a lot of traction from readers, that gains broad popularity and interest, that drives the impact, so to speak, in whatever dimension we define it within poetry.
A
But what's the data you want concretely? Is it a tape of them sitting around a table? Students come bring their poems. The person says, I like this one. Here's why. Here's why not. Is it that tape or is it written reports or what's like the thing that would come in the mail when you get your wish?
C
The best analog is a rubric, if you have some.
A
A rubric for how to grade.
C
A rubric for how to grade. So if the poem evokes this idea that is inevitably going to come up in this prompt, or that is a characteristic of a really good response, we'll reward the model a certain amount. If it says this thing, we'll penalize the model. If it styles the response in this way, we'll reward it. Those are the types of things, in many ways very similar to the way that a professor might create a rubric to grade an essay or a poem. Poetry is definitely a more difficult one, because I feel like it's very unbounded. With a lot of essays that you might grade from your students, it's a relatively well-scoped prompt where you can probably create a rubric that's easy to apply to all of them, versus I can only imagine in poetry classes how difficult it is to both create an accurate rubric and apply it. And so the people who are able to do that the best are certainly extremely valuable and exciting.
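The reward-and-penalize rubric described above could be sketched roughly as follows. This is a toy illustration, not Mercor's format: the `Criterion` class, the weights, and the string-matching checks are all hypothetical, and in practice each check would be a judgment made by an expert or a grading model rather than a substring test.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    description: str
    check: Callable[[str], bool]  # True if the response meets the criterion
    weight: float                 # positive rewards, negative penalizes


def score_response(response: str, rubric: List[Criterion]) -> float:
    """Sum the weights of every criterion the response satisfies."""
    return sum(c.weight for c in rubric if c.check(response))


# Hypothetical rubric for a prompt asking for a short poem about the sea.
rubric = [
    Criterion("evokes the sea", lambda r: "sea" in r.lower(), 2.0),
    Criterion("leans on a cliche", lambda r: "roses are red" in r.lower(), -1.0),
    Criterion("stays within four lines", lambda r: len(r.strip().splitlines()) <= 4, 1.0),
]

poem = "The sea keeps its own counsel,\nsalt on the morning air."
total = score_response(poem, rubric)
print(total)
```

Keeping each criterion separate, with its own weight, is what lets the same rubric be applied to many model attempts at the same prompt.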
A
But to get all nerdy here, Immanuel Kant in his third critique, the Critique of Judgment, said, in essence, taste is that which cannot be captured in a rubric. And if the data you want is a rubric, and taste is really important, maybe Kant was wrong, but how do I square that whole picture? By invoking taste, aren't you being circular and wishing for a free lunch that comes from outside the model, in a sense?
C
Well, there are other kinds of data they could do if it can't be captured in a rubric. Another kind is RLHF, where you could have the model generate two responses, similar to what you might see in ChatGPT, and then have these people with a lot of taste choose which response they prefer, and do that many times until the model is able to understand their preferences. And so that could be one way of going about it as well.
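The pairwise-preference idea described above can be illustrated with a very small preference model. This sketch swaps in a simple Bradley-Terry style fit over a fixed set of responses, which is a simplification: real RLHF pipelines typically train a neural reward model on such comparisons and then optimize the language model against it. The function name, learning rate, and comparison data here are hypothetical.

```python
import math


def fit_preferences(pairs, n_items, lr=0.1, epochs=500):
    """Learn one score per response from (winner, loser) comparisons.

    Uses stochastic gradient ascent on the Bradley-Terry log-likelihood,
    where P(i preferred over j) = sigmoid(score[i] - score[j]).
    """
    scores = [0.0] * n_items
    for _ in range(epochs):
        for winner, loser in pairs:
            p_win = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            grad = 1.0 - p_win  # small when the model already agrees with the rater
            scores[winner] += lr * grad
            scores[loser] -= lr * grad
    return scores


# Hypothetical rater choices among three candidate poems (indices 0, 1, 2).
pairs = [(0, 1), (0, 1), (1, 0), (0, 2), (1, 2)]
scores = fit_preferences(pairs, 3)
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Note that raters are allowed to disagree here: poem 1 beats poem 0 once, and the fit simply reflects that poem 0 wins more often.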
A
I'm sure you know these studies where there are some AI-generated poems and some human-generated poems, and often the humans prefer the AI-generated poems, even though to people with, quote unquote, taste, they're worse. I mean, whose side do you take there?
B
Well, it depends what you're optimizing for. I think that generally we're in the mindset of: for the power users of these AI products, what are the types of responses that they would want to see and be happy with? But it's challenging, because that sometimes deviates from the types of responses that the top 1% of experts in poetry might say is a broadly good poem. And so striking that balance is really up to a lot of the researchers and product leaders at the labs: what do they think good looks like, and how do we act as their partner in defining that?
A
If you could model a much older poet, William Wordsworth, Blake, John Milton, Rilke. Some of my friends say there are no truly great poets left anymore. The best poets were way back when. Is it a goal to model the older poets and figure out what they would think? So that rather than having Larry Summers and Cass Sunstein come in, you have some AI-generated model of John Milton, maybe?
B
Well, I will say it ties back to the goal of Apex, which is that we saw people were too focused on a lot of these purely academic domains and not focusing enough on how people will actually use the models in the economy. But I certainly do think that, especially as we start to automate more industries and there's more liberal arts and these kinds of domains where people want to spend time on poetry, building the tools to help them create phenomenal poems and make them happy and their readers happy is definitely the way we would go about it. I'm not sure if it would be using the archetypes of these former poets. How would you go about it, Tyler?
A
I don't know. I don't trust contemporary poets, frankly. There aren't many of them I like to read. Maybe Geoffrey Hill would be one. Some are too postmodern, some maybe are too woke, some are too identity driven. I love older poetry. So it's not that I don't like poetry, but I worry about putting them, well, they're not quite in charge, I get that, but about giving them so much leeway.
C
Yeah, it does evoke this really interesting idea of how we want to teach models and measure success of these models. Is it via consensus? Is it via a handful of the top experts in that given domain? And there's really no correct answer. And I think that different AI labs, different researchers will go down different routes and that will frame the ways that these products feel and the things that they ultimately achieve.
A
Like, maybe we should only enshrine the current age, when the current age is at a peak. Like Scott Sumner says, the best movies were maybe made in the 1960s and 70s, whether or not you agree. But you could have movie evaluators be only from that time. There are some still alive. If you think the best heavy metal, say, comes from the 1980s, well, you wouldn't have, like the current evaluators. You would pick evaluators from the 80s. The best poetry does seem by most people's standards to be really quite old. And we can't resurrect those individuals. But the notion that you enshrine current taste, when taste changes so much, it's a very interesting decision.
C
It certainly is. My guess is that on a long enough time horizon we'll enshrine taste from every different decade and every different era. And then the model will be able to learn what taste you have and how it should pull on each of those knowledge bases to best personalize it to your preferences.
A
How much of society, ideally should become a big reinforcement learning machine? We sort of tape everyone, everything, every debate people have over the coffee table.
B
I think it will become an immense amount very quickly. There's obviously still going to be the personal conversations over the coffee table that people don't want recorded. But my firm belief is, especially for economically valuable tasks, we'll move towards a world where people do things once. Instead of the investment banker redundantly analyzing a data room to prepare an analysis of a company every couple of weeks for a new project and a new customer, they'll teach the model how to do that once in the particular domains that they operate in. And similar to building software once, they'll be able to use that many times as they use their agent. Instead of the customer support rep monotonously responding to tickets every day, they'll find the mistake that the agent makes, they'll turn that into an RL environment, and then all of a sudden the agent will be able to solve that problem many times. And so I think in many ways the economic incentives, and how knowledge work will change, have a lot of similarities to software. We'll move towards these fixed-cost investments of teaching an agent how to do something, building an RL environment for something, and then being able to use agents as many times as we want to perform that activity. And that's why I believe that a huge portion of the economy will become an RL environment machine.
A
And do you think pendants or Meta-like glasses will be more important than that?
B
Oh, I.
A
Are we gonna do both or.
B
I think a lot of both.
A
If I take myself, I don't do that much small talk. Say you attached a little pendant to me and you got the tape of all my conversations. You could feed it in. What's the social value of that? Is it like $5, $50? A bit more? How valuable is that?
C
Well, it certainly depends a lot by person. I would imagine yours are quite valuable.
A
But quite like what's.
B
Side hustle?
A
Yeah, I'm not asking for an offer, but how much actually would you pay?
B
Well, I would pay a lot, just out of pure curiosity, but if I were trying to think about how valuable it would be to our customers in our business, I imagine it would be something on the order of... it's hard because it changes over time. Certainly tens of thousands, if not many hundreds of thousands of dollars a year, and how that evolves over time. But my guess is that the vast majority of people will still care a lot about privacy. And so maybe that data will be collected to personalize their individual agent, but they're not going to be comfortable with that getting added to the broader model weights to customize the base model that billions of users are.
A
But that's easy. So you can take me with my pendant, I run it through my AI and I say, take out anything I don't want Mercor to hear, and it will do that quite well. Maybe not perfectly. And then you get what's left over: all the debates about elasticities and tax incidence, maybe.
B
I suspect you're probably more comfortable with it than most people. Most people would probably say, well, you're asking the AI to be the layer of trust to remove the sensitive information, but it's going to have bias in doing so. And so I think there's always going to be some level of sensitivity around these topics. And I actually believe that some of the companies that have done a very good job around their brand of privacy are going to have an advantage in it. Like I think Apple, while maybe not totally at the frontier of AI yet, has done such a good job in their brand around privacy. And that's going to allow them to have a lot of trust from users, in a way that they're able to collect all this personalized information.
A
Say three to five years out, when the top models will be both clearly better than virtually all human experts, or maybe all human experts, and recognized as such. The latter we certainly don't have. What do you think the reputation of expertise is like in that world? Now, one view is no one respects the experts because the machines are better. But I think an alternative possibility is the machine, by not being tied to a personality, is less disliked, and people actually respect the experts more because they get this impersonal distillation of the experts. Like, oh, the experts did that. They're so amazing and they're not annoying me like on the late night TV show. What will happen to the status of human experts?
B
I think that I definitely am already at the point where there are certain domains where I trust ChatGPT, or whatever model I'm using, more than I trust a particular expert in that industry, for a very quick medical perspective, even, in some cases, or whatever it is. And so I think that there's some element of it being highly competent, and there's some element of it not having a face to it, that causes us to place this high trust. But I do think the point you made at the beginning evokes the question of at what point these models will be able to do everything that experts are able to do. And my read on the market is that models are advancing very, very quickly in being able to automate, call it 50% or 75% of what humans and experts are able to do, but will really struggle with that last 25%. And I think that for a very long time, human expertise will be imperative to help accomplish that last 25%, as the ultimate bottleneck to more economic prosperity and productivity.
A
How long until the best models can write a poem as good as the median Pablo Neruda poem?
B
Oh, I think that's probably not too far off.
C
I think the most I would say.
A
Less than a year. When you say too.
B
I think less than a year.
A
Less than a year, yeah. How about the very best Pablo Neruda poems?
B
I'm not too calibrated on poetry, so I'd have a hard time saying, but.
A
I think it's much further out. And is that your intuition? I agree.
C
I think that's consistent with my intuition as well. But I think that this longer tail of advancement is generally the most difficult. The other heuristic I have for it goes back to this dimension of the time horizon of the task: models are in some ways superhuman with what you can do in a chat window, right, with your chatbot. But they still can't draft an email for us, they still can't schedule a meeting. And those things will come. But I think that there's a long way before we're able to tell a model, "Go off and build a startup for 90 days," and there's going to be an immense amount of human expertise associated with how we get to that, across every knowledge work vertical that we want the models to operate in.
A
Insofar as we turn society into this big engine for reinforcement learning, what new jobs get created by doing that?
C
Well, I think the most interesting part of our business is that everyone else in Silicon Valley is talking about how we automate away jobs, versus we're very focused on how we build this new job category of people training agents, building RL environments to help teach models. And that's what I believe it'll converge to. Instead of the investment bankers doing the analysis, they'll build RL environments and train agents, and it'll be the same across consulting and software engineers and customer support and pretty much every knowledge work vertical. And so it's hard to say the exact pace at which that'll happen. But I would not be surprised if within five years a majority of high-end knowledge workers are training models, whether in their full-time jobs or through our marketplace, to help improve agents at whatever workflows they want to automate.
A
And to hold those jobs, how much technical AI will a person need to have? Or do they just have to know about the thing?
C
They just need to know about the thing. The only element of technical AI that they'll need is to find where the model makes a mistake. So long as they can find where the model makes a mistake, and sort of understand in some ways the frontier of the model and its capabilities, how you can push it to its limit, then it's relatively easy to create some criteria or way of measuring that mistake so that the model can learn from it. And I think we'll have that across every different vertical, with every different tool, with these very long horizons, whether it's 100 hours or 100 days that we want the model to work on something. And that's going to very quickly become the primary bottleneck to model improvement.
A
Is the demand for software price elastic?
C
I think it's extremely price elastic. In fact, I think that the elasticity is the exact right thing to hone in on with respect to how job displacement will evolve in these domains. Like, I think if we make software engineers 10 times more efficient, we'll have even more software engineers. Maybe we'll have 10 times as many software engineers and build 100 times as much software. Right. Versus other domains. Maybe that's not the case. Right. Maybe we only need so much accounting in the world or we only need so much customer support. But I think software engineers, certainly we'll be able to do so much more.
A
Where else do you think of as price elastic?
C
I think that building businesses is also, so a lot of the product and distribution associated with software is certainly going to be something we see a lot more of. I think there are a lot of domains. Even if you think about investing, obviously it's not as price elastic as software, but I do think that there's still enormous inefficiency with respect to how we allocate capital in the economy. If I think back to the early days of Mercor, we were having a hard time getting our $10,000 of working capital for our initial seed investments. And then very quickly, once you get to a reasonable scale, the markets are very, very capitalized. And so I think a lot of this early capital allocation, as well as even just better understanding how companies will develop over time, is going to be really interesting. And also how that information and analysis manifests itself within companies, right? An operator sort of has this investing problem: what are all the different bets that they have within their company, and how do they allocate capital and resources associated with that? And so I think that there's so much elasticity with respect to how we build more products, how we distribute those products, and how we allocate resources within companies more effectively.
A
What will education look like five to 10 years out?
C
I think education is one of the things I'm most excited about. A good heuristic is: if everyone has Sal Khan as their personal tutor, available 24/7 to teach them whatever topic they want to learn, it'll be much easier for them to motivate themselves. It's much better access to information, much better ways of explaining that information, and that'll be profoundly impactful.
A
But that seems less price elastic, right? Like only so many hours of Sal Khan a day. No slight intended to him. Yeah, but it's not going to be 27 hours a day, right?
B
Yeah, that's true. That's true.
A
So employment for teachers, researchers might shrink.
C
I think in some ways areas of that may shrink, but I also think that there's a large element to teaching that exists in personal relationships of which the model will be able to do part, but not all of it of how does the teacher act as guiding the student through their journey and helping them to improve both in their curriculum as well as their emotional development. And so I think teachers will still play an important role in the economy and ideally able to just provide higher touch points of contact with all the students and smaller class settings.
A
So this is October 2025. How many people work at Mercor?
C
Right now we have just over 300 people across the world as our full time employees.
A
How did you hire so many good people so quickly?
B
Well, we used our technology and our platform to help with it a bunch. I mean, the origin story of the company was automating all of the ways that we would review resumes, conduct interviews, and decide who to hire. And so the ways that we assess talent, the ways that we optimize funnels to build out teams, is really ingrained in the DNA of the company and a top priority of me and my co-founders. And so I'm extremely grateful for everyone that we have on the team. And they make it look easy.
A
How do other people do interviews wrong?
B
I think that one of the... Well, this is something we've talked about a bunch, because you obviously wrote a phenomenal book on talent. I think one of the largest problems that people make is that they don't measure the actual skills and capabilities that they want someone to exhibit on the job. Instead of focusing on how do we measure how well this person does their investment analysis of the data room, they have this vibes-based conversation of, you know, where did the person grow up? How similar are they? Do they think they would enjoy hanging out together? And obviously that's still important if you're having a working relationship. But I think that they often over-index on that relative to the skills that people actually exhibit.
A
So just give them a project, give them a project and grade them.
C
In essence, I think that's the cleanest way to do it.
A
Let's say it's not programming. As the company gets bigger, the major AI companies, a lot of them now are quite large and most of the people who work there don't do AI at all. They do jobs that are not so dissimilar from what they might do at Coca-Cola, which is fine. That's just part of growth. They're legal, they're communications, they do events, whatever. When you're trying to hire people like that, say, like what's the test, what's the project? Or what is it you look for?
C
I think that that's definitely more difficult. I think you probably want to look for cases in their life where they've worked in similar roles, because you can't curate a project that's as similar to exactly what they've done. And so you would see the, you know, the best proxy for that and then really drill in to understand the details of that working environment, how similar it is, how well they performed in that, talking to people that previously worked with them in that environment to get a gauge for it. But it definitely is more difficult to measure someone's slope, and how they'll develop on the job over a six-month time horizon, than it is to measure their y-intercept. And so I think that's one trend that we've found in talent assessment.
A
Do you think body language in an interview is predictive?
C
I think it can be, but I also think it can be a false signal because I've definitely had cases where I over index on, oh, this person feels a little bit awkward or whatever it is, but they do a phenomenal job at the actual work. And so I think it's important to be very cautious around which of these signals are actually correlated with performance and which ones aren't.
A
Articulateness: overrated or underrated?
C
Depends a lot on the job. Depends a lot on the job.
A
Let's say 10 years from now when we can really measure pretty well the performance of people we're interviewing today. Less than 10 years, but say 10 years, let's say you have a company such as Amazon, does a very large number of interviews and let's say they're all taped and you run them through the best AI models. How good a predictor do you think that will be in your opinion?
C
I think that it will be certainly superhuman because humans aren't very good at it.
A
Right.
C
But it's still such a difficult problem that there's going to be variance. And I think that for roles like the one you described, what's going through my head is there's a lot of confounding variables. Did the person have an issue in their family that caused them to be off their game or not show up to work? Did they get sick during the interview process and maybe weren't full of energy? There's all these things that just add noise to that problem. But I do believe that as we're able to get all of that data in context, to have all the notes from the manager around, what was happening in this person's life both during the interview process as well as on the job, that will allow it to over time become phenomenal. And so maybe we have that on a ten year time horizon.
A
How can we make labor markets more efficient?
C
I think that one of the largest inefficiencies in labor markets is that everything is disaggregated, and that when one of our friends is applying to a job, they would apply to a couple dozen jobs. And when companies are considering who to hire, they'll consider a fraction of a percent of people in the economy. And it feels like there needs to be a structural change there, where there's an aggregator that everyone applies to and every company hires from, facilitating this perfect flow of information.
A
But we need a very good AI for that to work.
C
I think a very good AI will help with that working. And the reason I think it doesn't happen today is that there's a very difficult matching problem. And let me give LinkedIn as an analog. LinkedIn has all the distribution to pretty much every company and every candidate. But at the very same time, it's incredibly difficult to understand, based on someone's LinkedIn profile, whether they'll actually perform well at a given job. And so I believe that in that case it's very much a matching problem, less so a distribution and aggregation problem, to facilitate this effective flow of information and aggregation within knowledge markets. But I think it's also in line with the fact that the nature of jobs is changing dramatically. Right. Previously everyone would think about this problem in the context of full-time roles. But as we trend towards this world of everyone building RL environments and being able to do work remotely and train models in this fractional way, that also will shift the dynamics of enabling more aggregation, enabling more globalized matching, and how that will impact the economy.
A
Some of my friends think that mentors and nepotism will make a good comeback. And they say everyone will submit a perfect cover letter, have an optimized LinkedIn profile, they'll even have practiced with an AI doing the interview. They won't all get up to speed, but a lot of them will. And there'll be this large mass of apparently pretty qualified candidates and what you'll actually do is resort to the old tried and true. Well, do you know this guy's uncle or something else who can recommend them? Agree, disagree.
C
I think in some companies and industries that will happen. I agree with it. My hope is that we have models that are helping to run companies in a very thoughtful, efficient way that are data driven about it, where the models have an eval set of all of the performance reviews of people in that given company and they're able to make an accurate prediction over whether this reference or that piece of nepotism should actually be considered, or maybe as a counter signal. Right. And so that's my hope. But it'll probably play out with some combination of both over time.
A
In the AI sort of run labor market, let's say it's more efficient. But do you think there are fewer second chances and late bloomers in that world? You get scored too early, so to speak, and then you're tracked. It's a bit more like how European schooling systems can differ from American.
C
I think there will be a lot of second chances, and the reason that there will be is that oftentimes they're effective, and so the models will identify that and realize that maybe someone wasn't the right fit for that first role. There's another role that they could be a really good fit for, because I do think that there are jobs in the economy that almost everyone would excel at. And it's really just this matching problem of finding the intersection of something that they're excited about where they'll also add an immense amount of economic value.
A
As you know, there are AI services now where, when you're doing an interview, across the top or the bottom of your screen the AI can give you advice, answers. Does that work at all? What do you think of those?
B
We run up against a lot of those. One thing I found in talent assessment is that initially people tried to work against AI, similar to what we do in academic settings where people would try to say we're going to have you write the essay on paper so that you're not able to use ChatGPT to help you with the essay. When really the right way of approaching it is seeing what people can do when using all those tools. If we tell them, hey, use all of these phenomenal codegen tools and record your screen while building a product, to see what you're able to do over the course of an hour, that's a far better predictor of this person's ability to actually deliver impact than it is to say, don't use the tools at all. And so I think that's one shift that we're going to see, and it will likely frame the relevancy of a lot of these AI cheating tools over the coming few years.
A
Can someone fool you by using an AI cheating tool, or do you feel you more or less always know?
C
I think that there were cases where people could fool us, but now we're quite good at figuring it out. We're quite good at figuring it out and also moving towards assessments where we almost encourage it. Right. And are comfortable with the fact that they're using these tools because we want to see what they're able to do with them.
A
So you were a Thiel Fellow. Right. And you dropped out. How could they improve their methods?
B
Well, this is something we've talked with them a lot about, because the Thiel Fellowship is constrained by that exact matching problem that we were talking about earlier, where they can only consider and interview a fraction of a percent of the people in the world that they think would be a good fit for the Fellowship. And so we've worked with them on building out AI interviews that are able to better assess Thiel Fellows, and using models to analyze the transcripts of those recordings to see what are the signals to better select Thiel Fellows and all of that, which I find very interesting.
A
But isn't it part of their strength? Say, Peter, he's quite controversial politically and otherwise. Being a Thiel Fellow has a certain brand that's distinct from anything political, but it's a very particular thing. Not everyone wants to do it. Doesn't it work well because it's an extremely local market and you get people with a certain kind of orneriness, and selecting from that pool just goes pretty well? And you don't want to be in the bigger pool of people.
B
Maybe. I think that you're right in the element that referrals are very important. Right. Oftentimes great people know great people, and so they'll always need to leverage referrals. But at the same time, I think they rightfully care a lot about people that think unconventionally and come from unconventional backgrounds, the people from every part of the world that might otherwise not get a meeting with a venture capitalist or some of these more traditional institutions. And so ensuring that they're able to consider those candidates and to give them this opportunity and incorporate them into the Fellowship is incredibly important and part of the mission.
A
Could it be scaled 10x?
C
Absolutely.
A
100X?
B
I think so.
A
100,000X?
B
Well, I think it sort of ties to what we were talking about earlier, of the elasticity of demand for better investors. Right. Because in some ways hiring has so much overlap with investing. Imagine if we could have Peter interview everyone in the world when they're 18 or 20 or whatever the age is, and make a decision around whether he wants to give them a 100k check. That would probably be very powerful with respect to economic mobility and how many companies we're able to create. And so I think that will happen, and it's just a matter of time, of building the right technology and the right focus to enable it.
A
But is the following possible? Let's say Peter is a just tremendous interviewer. That's easy to believe, but he's really a great interviewer for the subset of people attracted to him. And if you just put him out in the broader pool, who's going to be a lifeguard at the swimming pool or something? Maybe he's just not that good an interviewer for that.
B
I agree with that. I think that's certainly the case. And so imagine if you had a panel of domain experts across every industry that were able to perform these interviews, because certainly the best models will be better than any single best individual. But I would expect that the aggregate sum of all experts in each domain will likely remain better than the models for a long time.
A
Now you dropped out of school, now you're doing the company. Obviously you're very busy, but imagine as an act of magic, you could have a free year just inserted between today and tomorrow and you come back and nothing has changed. To go off and do anything you want with literature, with art, with travel, with music, with climbing the Alps. I don't know, what would you do with the year?
C
That's a fascinating question.
B
Can it be AI related?
A
No, cannot be company related.
B
Let's see. I would love to travel.
C
I think that sounds like it'd be a lot of fun because as you can imagine, in running the company, I've worked 100 hours a week for the last three years and I love doing it and I'll continue doing that. But I do think that seeing the world and getting more of this understanding of how do perspectives vary by country and geography? How are people thinking about AI differently elsewhere is really interesting. I really like that. I remember after ChatGPT came out, Sam did this world tour of going to all the different places, seeing what they thought about AI how they viewed it impacting their world. And I think that global perspective is incredibly valuable and informative.
A
Where do you want to go the most?
C
I want to go to Japan a lot. I've never been to Japan, so I'll have to make it out there. That's probably my top pick.
A
It's a great visit. One thing I found, since I have traveled a lot, obviously I'm older and in some ways less busy than you are, is that it helps me interview people quite a bit, because people more and more come from all over. And it's like, if your model has the poetic taste of different eras, of John Milton, Wordsworth, Shakespeare, whatever, traveling is an individual's way to get some version of that. Yeah. So if, say, you hire a lot of people from India, I suspect you do. It's a populous country, a lot in the Bay Area. Going to India then becomes very important because you get a better sense of just where they're coming from.
C
Yeah, I completely agree. I think also being able to connect with those individuals very quickly around, hey, I've been to this place, and I'm very familiar with India and all these different things is really helpful in building relationships and setting up trust across all the different people that we work with and interact with.
A
How did your 8th grade donut company go?
B
One of my favorite topics. Well, I could tell the story, which is: I initially realized that Safeway donuts were selling for $5 a dozen, and my 8th grade mind was thinking, that is such a deal. I would pay $2 a donut, and I bet my friends would as well. And so I would bike down to Safeway and I would buy Safeway donuts for $5 a dozen, go to my middle school, sell them for $2 each. Eventually, my middle school called me into the principal's office to shut me down because I was scaling up my operations. And then I moved my donut stand about 50 ft over, off of school campus, so they couldn't police me. I paid my mom $20 a week to drive me in her minivan to be able to bring more donuts to and from school.
A
She charged you 20 bucks?
B
She charged me 20 bucks exactly.
A
Is that underpriced or overpriced?
B
I think it was about right. I anchored it on the cost of an Uber, and I was like, I'm not going to pay more than an Uber, but I need the car to wait long enough that I'm able to load up, you know, 10 or 20 dozen donuts. And so I did this. I'd pay my friends in donuts because I perceived the cost of the donuts as, you know, my cost basis, versus they perceived it as $2 each. And so I had a little bit of arbitrage in the salaries. I had competition pop up where they would sell Chex donuts, which are higher end donuts, but they had a $1 cost basis. And so I dropped my prices to $1 for two weeks to drive them out of business, before I had learned anything about anticompetitive laws. And so those were just a few of the stories from my 8th grade Donut Dynasty, is what we called it.
A
Other than just intelligence, what makes a person good at extemporaneous speaking? And you won awards for this, right?
B
I did. Well, actually I won awards for it, but I wasn't nearly as good as my co-founders. So in high school we all did speech and debate together.
A
So you knew each other from high school, like age 14, right?
C
Age 14, exactly. And we were on the policy debate team together. We also did national extemporaneous speaking, and they were the winningest speech and debate team of all time in policy debate, the most competitive event, where they won the Tournament of Champions, the National Speech and Debate Association, and NDCA, the three largest national tournaments, which no other team has ever done. And I did okay, but I'm dyslexic, and so I would always stumble over words or mix things up and wasn't quite the same level as them. But I think there's a few things that go into the answer of what makes someone phenomenal at it. I think that high clarity of thought often correlates very strongly with people that speak very well. And so as you mentioned, intelligence plays another role. I think a second thing is confidence, someone that's willing to speak and improve and iterate on it, because oftentimes it's just doing more of that activity that allows you to improve on it. And then maybe a third one is more than just intelligence. It's also the speed of thought. And I think about those as different dimensions. There are certain people I think of as having very high aptitude but thinking very deeply and slowly about a given thing. And other people that I think of as having, you know, reasonably high aptitude or medium aptitude, but being able to be quick on their feet. And so I definitely think there's some innate element of that.
A
And which are you?
B
I tend to think I'm more in the slower, deeper thinking bucket.
C
But it depends a little bit on how much coffee I've had.
A
So you started the company, so you were 19?
C
Yeah.
A
Why is it there's a positive statistical correlation between being dyslexic and entrepreneurship and there is one in some published papers. What's the mechanism?
C
It's shockingly strong, actually. I'm not sure exactly, but I find that one unique thing is that it feels like my brain works a little bit differently and that there are certain things that people are so much better at than I am, where, you know, they're reading through evidence in a debate round very quickly and I could never do that. But there are certain ideas or ways of approaching a problem that are just different, that enable more creativity, potentially being unconventional in doing so. And I think that that is one advantage I've had. And one of our early investors, actually Scott Sandel, is dyslexic and backed a lot of dyslexic entrepreneurs. And so we've talked about this a little bit.
A
One of my hypotheses is that quite early on you have to learn how to delegate. And that's a skill that when people are not forced to learn, often very competent people don't become good at it until much later. But the dyslexic person is good at it right away.
B
Totally, yeah. Asking people to help read something for them.
A
That's right. Could you please do this for me?
B
That certainly could be the case.
A
And focusing on bigger picture in some useful ways, at least for being a founder. Not good for every job, of course. Totally.
C
But I think one thing I really came to appreciate, especially during high school, is that there are certain things that some people are phenomenal at and others are horrible at. Like, there were areas of debate and reading through evidence quickly where I felt extremely unintelligent, and it was super humbling. And so much of finding success in your career is just understanding what are your strengths and how do you leverage those, and much less about what are your weaknesses. And so that's something that I've sort of taken with how I approach Mercor, but also how I encourage our employees to think about their roles within the business: what are the things where they have these comparative advantages and phenomenal strengths, and how do they leverage those most effectively?
A
How much do you feel you're in touch with the general culture of intelligent 22 year old men in the United States or are you just so in the company you have no idea what's going on?
B
I'm so in the company. I don't know.
C
I think that I obviously was in college for a couple of years before I dropped out and so I had some people around me that so much of our company is 22 plus or minus a couple of years. So I guess I have that heuristic. But I certainly don't think that I have spent as much time with people my age as if I had stayed in school. As another comparison.
A
This is not a question about you because we don't ask personal questions. But a good tech friend of mine, you've probably heard of him, he says to men in that age bracket, 22, 23, that there truly is a dating crisis, that something has gone wrong. Not about you, but just America in general. The very smart, possibly nerdy person in that age group. Is there a dating crisis?
B
I think certainly in San Francisco. Not in New York, but certainly in San Francisco.
A
And you think it's just gender imbalance or the country screwed up more generally?
C
I haven't thought too much about this. I think it's probably gender imbalance in San Francisco, especially in certain industries. But I think that dating apps are probably generally in society. I don't use dating apps, but are generally in society helping to drive a lot more efficiency in solving this matching problem.
A
So you're pro dating app. Most of the people I know are against them.
B
No, I think. I think they're good.
C
I'm very much a proponent of better technology to solve these matching problems and enable people to be happy in their lives.
A
Your last name is Foody. Should I believe in nominative determinism? Are you a foodie?
B
I certainly love... I'd die to get good food. It's funny, my dad always loved cooking growing up and was certainly much more of a foodie than I am, but a little bit of it rubbed off on me, and so while I'm not as much into cooking, I love eating good food.
A
Where in San Francisco should people eat? Or nearby?
C
Lots of good restaurants. I think there are sort of the everyday restaurants that I think are very good, and then higher end. Everyday, I love Mexican food, so El Metate is a great Mexican restaurant. Also, for higher end food, I like Cotogna and Californios, Quince. Lots of good restaurants like that.
A
At the meta level, what's the thing people should know about eating out here, like where I live? I would just say you need to know to go to the suburbs. May or may not be true here, but here what do they need to know other than particular names of restaurants?
C
I find that Beli, the app for food ratings, is really accurate in San Francisco because there's a high density of users, and so if you use Beli as your guide, you'll generally find good spots.
A
Why is the company called Mercor?
C
Mercor means marketplace in Latin. And we want to build the largest marketplace in the world, so we named it Mercor.
A
We're from Mercatus. Do you know what Mercatus means in Latin? It's a variant.
B
Yeah, it means market. Okay, there you go.
A
So, yeah, we're from the same named institution.
B
Exactly.
A
In that sense.
B
Well, it's funny, in high school, my co-founders and I went to a Jesuit school, and my co-founder, Surya, studied Latin. And so we've always certainly thought a lot about Latin roots and Latin words.
A
Your family wasn't Catholic, I believe, right?
C
That's correct.
A
Did going to a Jesuit school help you think or what? What did that add to the mix?
B
Well, none of the three of us were Catholic, despite going to Catholic school, which was a little bit funny. But one interesting story is that my mom was concerned about whether I would start selling drugs when I was doing my donut stand in eighth grade because, you know, it's an easy step. And so I like to think that, you know, Catholic school helped instill good values in what I should care about, and being very focused on school at the time, on speech and debate, on building companies, and so I'm very grateful for that education.
A
Last two questions. First, what's the next goal you have for the company?
C
The next goal for the company is really in scaling up a lot of these super realistic evaluations that I've talked about. How do we measure the ways that models use all sorts of different tools on trajectories that would take someone days or weeks to do? That is a big focus for us, and especially how that impacts enterprise. Right. I think that so far, for the last two years, people have been very focused on the idea of intelligence rather than the idea of models being useful. Bridging the gap between what enterprises actually want to use, how we measure that, and how we get those capabilities in models is, to me, the most exciting thing that I could work on.
A
And what do you want to learn next, work related or otherwise?
C
That's an interesting question. I feel like Mercor is at the intersection of labor markets and AI research. And we grew up with the DNA in labor markets of thinking all about how do we aggregate all these people on our platform, how do we match them? We hired people that are deep domain experts in labor markets, like Sandeep Jain, who was the Chief Product Officer, Chief Technology Officer at Uber. But I am most fascinated by all of the advancements in AI research: how do we apply human talent and human labor to all of these problems at the frontier in more efficient ways to train models? And what are the specific rubrics or data types that are driving the most model improvement? And so I've been most interested in how to learn that.
A
Brendan Foody, thank you very much.
C
Thank you so much for having me, Tyler.
A
Thanks for listening to Conversations with Tyler. You can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. If you like this podcast, please consider giving us a rating and leaving a review. This helps other listeners find the show. On Twitter, I'm @tylercowen and the show is @cowenconvos. Until next time, please keep listening and learning.
Date: January 7, 2026
Guest: Brendan Foody, CEO & Co-founder of Mercor
Host: Tyler Cowen
In this episode, Tyler Cowen sits down with Brendan Foody, the 22-year-old CEO and co-founder of Mercor, an AI company transforming how advanced models are trained and evaluated in knowledge work. Their conversation covers expert involvement in teaching AI, the pace of AI’s economic impact, the future of labor and education, how evaluation data shapes AI model abilities, and what it means for expertise, hiring, and human flourishing. They also touch on Brendan's entrepreneurial background and personal experiences, offering a rich look at the near future of knowledge work alongside the realities facing today’s tech founders.
“When we have these phenomenal poets that teach the models… once they’re then able to apply those skills and that knowledge across billions of users, hence allowing us to pay $150 an hour for some of the best poets in the world.”
— Brendan Foody, 01:22
“The largest takeaway is the rate of model improvement at economically valuable tasks is incredible.”
— Brendan Foody, 05:57
“Models still can’t draft an email for us, can’t schedule a meeting… There’s a long way before we’re able to tell a model: ‘Go off and build a startup for 90 days.’”
— Brendan Foody, 28:42
“A huge portion of the economy will become an RL environment machine.”
— Brendan Foody, 24:46
“They just need to know about the thing. The only element of technical AI they’ll need is to find where the model makes a mistake.”
— Brendan Foody, 30:36
On Taste and Rubrics:
“Immanuel Kant… said, in essence, taste is that which cannot be captured in a rubric. And if the data you want is a rubric, and taste is really important, maybe Kant was wrong, but how do I square that?”
— Tyler Cowen, 17:55
On Reinforcement Learning Society:
“Instead of the investment banker redundantly analyzing a data room… they’ll teach the model how to do that once… Similar to building software once, they’ll be able to use that many times as they use their agent… That’s why I believe a huge portion of the economy will become an RL environment machine.”
— Brendan Foody, 24:46
On AI’s Next Leap:
“I would be shocked if we don’t have enormously capable models across those dimensions… long horizon tasks, in the next six to twelve months.”
— Brendan Foody, 09:04
On Future Data Needs:
“If people have tests that models are bad at, that map to a meaningful amount of economic value… that’s super exciting for us.”
— Brendan Foody, 14:43
On Human-Machine Comparison in Expertise:
“My read on the market is that models are advancing very quickly at automating 50–75% of what humans and experts are able to do, but will really struggle with that last 25%.”
— Brendan Foody, 27:09
| Timestamp | Segment/Topic |
|:---:|:---|
| 01:01–03:15 | Why pay $150/hr for poets? Mercor’s expert-driven AI model |
| 04:14–06:36 | AI Productivity Index (Apex), expert selection, measuring economic impact |
| 09:45–12:05 | Limits of AI: Taste, long-horizon tasks, when AIs can rival experts |
| 13:50–16:58 | The value of evaluation data, rubrics vs. “taste,” ideal data for social science/poetry |
| 18:18–19:22 | RLHF and subjective domains, optimizing for user vs. expert preferences |
| 21:32–22:30 | Teaching AI taste from different eras, personalization of style |
| 24:46–25:25 | Society as a RL engine, shift from repetitive work to AI agent training |
| 27:09–28:17 | Status of experts as AI gets better—“last 25%” challenge |
| 29:35–31:18 | New jobs: AI trainers, requirements for expertise |
| 33:18–34:39 | AI tutors, future of teaching and teacher roles |
| 34:52–37:26 | AI-driven hiring, skills over vibes, challenges for non-technical roles |
| 39:20–41:03 | Making labor markets efficient, global matching/aggregation |
| 41:31–42:06 | Nepotism, mentors, and the AI-driven job market |
| 49:43–51:19 | Brendan’s 8th grade donut company—entrepreneurial lessons |
| 51:19–52:59 | Extemporaneous speaking, debate, and cognitive styles |
| 53:22–55:17 | Dyslexia and entrepreneurship: creativity and strengths |
| 56:12–56:42 | Cultural/dating issues among young tech founders |
| 57:10–57:46 | Favorite food spots in San Francisco |
| 58:00–58:10 | Why the company is named Mercor; roots in Latin |
| 59:08–60:43 | Next goals for the company, intersection of labor markets and AI |
Find more episodes and full transcripts at conversationswithtyler.com. To learn more about the Mercatus Center, which produces the show, visit mercatus.org.
Summary compiled from the episode transcript. All quotes are attributed and timestamped.