Transcript
Podcast Host (0:00)
At a high level.
David (0:00)
How do you explain Micro One?
Ali (0:02)
Micro One is the AI platform for human intelligence. What that means is we vet highly skilled people, mainly PhDs, professors and industry experts, mostly in medical, finance and legal but also many other domains, and we help train frontier large language models. You could think of it this way: the AI labs improve their model capabilities by gathering net new human data for their post-training pipelines, and we help them gather that net new human data.
David (0:33)
And who are your customers?
Ali (0:35)
Our customers are the frontier labs that build foundational models, and we also have enterprise customers, the Mag 7 and the Fortune 500 broadly, that are building foundational models too, but also building enterprise agents that we help them evaluate and get ready for production.
David (0:55)
One of the reasons I wanted to chat today is because upstream of everything, there are these improvements to the models themselves. Maybe you could unpack why LLMs are improving, and how much of that is recursive AI improving itself versus the PhDs and these other professionals?
Ali (1:17)
It's almost entirely humans that teach the models in some way or another. Of course that started with the pre-training phase, where humans taught models by first creating the Internet. That was the largest set of human data we had initially, which the models trained on via an unsupervised route, and that was the initial stage of the foundational models. Afterwards, where the models really got useful is when humans started to do a bunch of preference labeling, choosing which answer is better and so forth based on the model responses. And once we passed that phase, now we're in this era of expert data training, where humans are creating really complex data from scratch, whether it's doctors, lawyers, or finance experts in investment banking and other areas.
David (2:13)
2025 was supposed to be the year of AI agents. Some people think it's going to happen in 2026. What needs to happen for AI agents to gain traction in the general market?
Ali (2:25)
Really, there's just one fundamental bottleneck that needs to be resolved, and that is enterprises need to dedicate large portions of their product budget to, and really implement in their product workflow, this notion of evaluations. What I mean by that is, if you think about what product development looks like in any given enterprise, or just any company in general, there's usually a phase of design. You design whatever software you're trying to build, there are some approval processes, and then you get into development: the programmer develops it, there's full-stack work, back-end development, front-end development, et cetera. Then you put that into a QA engineering phase, where usually one QA engineer goes in, tests the product, and says, okay, this works. It's kind of a binary thing, the software either works or it doesn't, and then it goes into production.

That needs to change, and the part that needs to fundamentally change is the QA part. There can no longer be just one QA engineer going in and saying, okay, this software works and we can put it into production. Instead, there needs to be an evaluation framework for each of the actions the probabilistic software, in other words the agent, needs to take, because there's no notion of the agent simply working or not working. Instead the questions are: what is the action space of this agent? What are all the things that I want it to do, and what are all the things that it should do? And basically what the experts do is create human data to measure exactly the capabilities of each of those functions. Then, once the threshold is met, the agent can move into production with confidence.

What's happening right now is that there are a lot of good demos, because if the agent works 1 out of 5 times or 1 out of 10 times, you'll just record that one success and it looks really impressive.
But then it doesn't work four out of the five times, and you cannot have that in production.
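The per-action evaluation gate Ali describes can be sketched in a few lines of code. This is a minimal illustration, not Micro One's actual framework: the agent stub, the action names, the test cases, and the 90% threshold are all assumptions made up for the example. The key idea is that each action in the agent's action space gets its own set of expert-written cases and its own pass rate, and the agent ships only when every action clears the bar.

```python
# Hypothetical per-action evaluation gate for an agent.
# The agent, actions, cases, and threshold are illustrative assumptions.

def toy_agent(action: str, case: dict) -> str:
    # Stand-in for a real (probabilistic) agent. To simulate partial
    # success, it handles "summarize" correctly but echoes input
    # unchanged for everything else.
    return case["input"].upper() if action == "summarize" else case["input"]

# Expert-written test cases, one list per action in the action space.
eval_suite = {
    "summarize": [
        {"input": "hello", "expected": "HELLO"},
        {"input": "world", "expected": "WORLD"},
    ],
    "translate": [
        {"input": "bonjour", "expected": "hello"},  # the toy agent fails this
        {"input": "hi", "expected": "hi"},
    ],
}

THRESHOLD = 0.9  # required pass rate per action before production

def evaluate(agent, suite, threshold):
    # Compute a pass rate for every action, then gate on the worst one.
    results = {}
    for action, cases in suite.items():
        passed = sum(agent(action, c) == c["expected"] for c in cases)
        results[action] = passed / len(cases)
    ready = all(rate >= threshold for rate in results.values())
    return results, ready

rates, ready = evaluate(toy_agent, eval_suite, THRESHOLD)
print(rates)  # per-action pass rates
print(ready)  # False here: "translate" is below threshold
```

Note the contrast with binary QA: the harness never asks "does the agent work?", only "what fraction of each action's cases does it pass?", which is exactly the 1-out-of-5 versus 4-out-of-5 distinction a demo hides.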
