Podcast Summary: Datumo Challenges Scale AI’s Market Grip
Podcast: The AI Podcast
Episode Date: August 17, 2025
Host: The AI Podcast
Episode Overview
This episode explores the rise of Datumo, a Seoul-based startup taking on Scale AI’s dominance in the AI data services sector. The conversation moves from Datumo’s origins and innovative approach to data labeling, to its new ventures in AI safety benchmarking, and the broader market dynamics shaped by industry giants like Meta and their investment in Scale AI. The host unpacks Datumo’s rapid growth, unique strategy, and the implications of its recent $15.5 million funding round led by Salesforce Ventures.
Key Discussion Points and Insights
1. The Massive Market for AI Data Services (01:00–03:00)
- Host sets the stage by highlighting the scale and drama surrounding Scale AI, especially following Meta’s recent $14.3B investment.
- Emphasizes the sheer size and potential of the AI data industry.
2. Datumo’s Market Entry and Positioning (03:10–06:00)
- Datumo is based in Seoul, South Korea, primarily serving Korean conglomerates (Samsung, LG, Hyundai, SK Telecom).
- Key Insight: Many companies feel unprepared to use AI safely (referencing a McKinsey report), underscoring the need for tools like Datumo’s.
“Most companies... say that they're not prepared to use AI in a safe and responsible way. …A lot of executives at companies just don't feel like they, they don't know what they don't know.” — Host (04:00)
3. Addressing AI Safety and Trust Issues (06:00–08:00)
- 40% of survey respondents see explainability as a significant risk, but only 17% are actively addressing it.
- Datumo steps in by expanding from pure data labeling to safety benchmarking and model evaluation, helping clients monitor and improve AI models.
4. Datumo’s Origin Story and Crowdsourced Approach (08:10–10:20)
- Founded by David Kim, a former AI researcher frustrated with slow, expensive data labeling.
- Built a reward-based app that crowdsources human labelers, letting anyone label data in their spare time for payment.
- Early market validation: tens of thousands of dollars in pre-contract sales before the app was even complete.
“Anyone can get on there and... if you have free time, you can sit there and label data... and you get paid for it. …It’s obviously a huge business.” — Host (09:15)
- First-year revenue: $1M, with rapid client adoption.
5. Client Base and Expansion (10:30–12:45)
- 300+ clients, primarily Korean tech giants.
- Revenue last year: $6M.
- Planning international expansion, particularly into Japan and the US.
6. Moving Beyond Data Labeling: Model Evaluation (13:00–14:40)
- Clients began requesting model output scoring and benchmarking, which Datumo identified as a new product line.
- Growth from annotation to full model evaluation services.
“They wanted us to score AI model outputs to compare them to other models… We started in data annotation and then expanded into pre-training datasets and evaluations as the LLM ecosystem matured.” — Michael Huang, Co-Founder (13:40)
7. Market Drama: Scale AI, Meta, and Competitive Shifts (15:00–17:10)
- Discusses Meta’s $14.3B deal for a 49% stake in Scale AI, likening it to Microsoft’s OpenAI investment.
- Notes that some clients, including OpenAI, left Scale AI after the deal over competitive trust concerns.
8. Differentiators in Datumo’s Business Model (17:20–19:30)
- Datumo licenses its own proprietary datasets (notably, data crawled from published books for enhanced reasoning abilities in AI).
- Bundles data labeling with licensing and benchmarking.
- Introduces a no-code evaluation platform (Datumo Eval) for non-developers, aimed at compliance, policy, and safety teams.
“Apparently reading books is a good way for [AI models] to... learn how to reason through problems, which I thought was absolutely fascinating.” — Host (18:30)
9. The Salesforce Ventures Funding Story (19:40–21:30)
- Unusual funding origin: Salesforce discovered Datumo after a LinkedIn post featuring a fireside chat with DeepLearning.AI’s Andrew Ng.
- The funding journey took around eight months from initial contact to closing.
“Hosting some big famous person and then posting it on LinkedIn is like the best way for a startup to raise money. Thought that was very interesting.” — Host (21:10)
10. Future Roadmap and Expansion Plans (21:40–22:50)
- New funds are allocated to developing automated evaluation tools for enterprises and to a global go-to-market (GTM) strategy.
- Further hiring/growth in Silicon Valley and plans for broader expansion.
Notable Quotes & Memorable Moments
- On industry readiness: "Most companies... say that they're not prepared to use AI in a safe and responsible way." — Host (04:00)
- On Datumo’s business model: "Anyone can get on there and... label data... and you get paid for it. …It’s obviously a huge business." — Host (09:15)
- On pivoting to model evaluation: "They wanted us to score AI model outputs to compare them to other models… We started in data annotation and then expanded..." — Michael Huang, Co-Founder (13:40)
- On funding serendipity: "Hosting some big famous person and then posting it on LinkedIn is like the best way for a startup to raise money." — Host (21:10)
- On dataset innovation: "Reading books is a good way for [AI models] to... learn how to reason through problems..." — Host (18:30)
Timeline & Timestamps for Key Segments
- (01:00) - Episode topic and industry landscape
- (03:10) - Datumo’s background and market context
- (06:00) - Issues of AI explainability and Datumo’s new service
- (08:10) - Datumo founder’s story and crowdsourced approach
- (10:30) - Client list and revenue milestones
- (13:40) - Quote from Michael Huang, co-founder
- (15:00) - Meta’s investment in Scale AI and market implications
- (17:20) - Datumo’s proprietary dataset licensing and evaluation tools
- (19:40) - Salesforce Ventures investment process
- (21:40) - Future plans and global expansion
Conclusion
The episode paints Datumo as a nimble, innovative challenger capitalizing on gaps left by market leaders, especially around AI model safety and explainability. The host is optimistic about Datumo’s trajectory, citing its agile product development and client-driven evolution. Listeners gain insight into the competitive, rapidly evolving world of AI infrastructure, where global expansion and trust are just as crucial as technical innovation.
