[State of Evals] LMArena's $1.7B Vision — Anastasios Angelopoulos, LMArena
Latent Space: The AI Engineer Podcast
Date: January 6, 2026
Guest: Anastasios Angelopoulos, LMArena
Host: Latent.Space
Episode Overview
This episode features Anastasios Angelopoulos, co-founder of LMArena (now simply "Arena"), a leading consumer platform for evaluating AI models through real-world usage and feedback. The discussion explores Arena’s origins, vision, mechanics, funding, competition, recent controversies, and principles for building a trustworthy evaluation platform at scale. The conversation covers behind-the-scenes decisions, the role of community, and Arena’s outlook for the future as a central evaluation and benchmarking engine for the AI world.
Key Discussion Points & Insights
1. The Origins and Naming Evolution of Arena
- Arena (originally "LMArena") spun out from the LMSYS group at Berkeley, with funding and incubation support from notable backers such as On, Sequoia, and a16z.
- "The way the company started was as an incubation by On... he found us at Berkeley and picked us out of the basement." (01:41)
- The "LM" (Language Model) prefix was dropped to broaden the brand and vision.
- “We started as LMSYS... but we wanted to broaden a little bit... and we were the first Arena, so we feel like let's kind of try to own that.” (00:57)
- On (investor/founder) was particularly impactful, giving grants and supporting the team before company formation.
- "He sort of... formed an entity for us... [and said] you guys can walk away at any time... a very aggressive investment move." (01:41)
2. The Decision to Build a Company
- The founders debated other formats (academic, nonprofit) but opted for a company to scale and meet real infrastructure requirements.
- "It became clear that the only way to scale what we were building was to build a company out of it. The world really needed something like Arena." (02:48)
3. Funding and Growth Metrics
- Arena raised $100 million, primarily to ensure they could experiment and iterate, not necessarily to spend all at once.
- "The purpose of money at a company is to give you cards to flip... if your first bet fails, you can make another bet..." (03:48)
- The platform funds inference for all users, paying market rates minus some enterprise discounts (04:17).
- Arena's scale:
- Over 250 million conversations to date.
- 5-6 million MAUs, with mid-tens of millions of conversations monthly—largest in the space after ChatGPT.
- 25% of users are professional software developers, based on surveys and prompt analysis.
- "It's one of the largest consumer platforms for LLMs... 25% of the people on our platform... do software for a living." (05:03, 05:08)
4. Competition, Differentiation, and Platform Integrity
- Main competitors: Artificial Analysis, Artificial Arena (crypto-originated).
- Distinction: Arena uniquely uses organic, user-generated queries and feedback, while some competitors aggregate benchmarks or use synthetic data.
- "The thing that distinguishes our platform... is that the users are actually inputting their own use case... That gives a level of realism..." (07:23)
- Won’t allow pay-to-play or pay-for-removal on leaderboards: Platform integrity is paramount.
- "You can't pay to get on the public leaderboard... You can't pay to take it off either." (17:11)
5. Technical Scaling and Platform Decisions
- Arena migrated from Gradio (by Hugging Face) to React to enable more custom front-end development and easier hiring.
- "[Getting] off Gradio... Gradio, incredible platform... scaled us to a million MAU... Eventually... it became time for us to move off of that and go to React." (08:42–09:01)
- "Primarily [funding goes to] inference that funds the free usage of the platform and... hiring, of course." (09:55)
6. The Leaderboard Illusion Controversy
- Discussed recent criticisms ("Leaderboard Illusion" paper) claiming Arena enabled inequity via undisclosed private testing/pre-release evaluation.
- "Leaderboard Illusion is a paper that critiques Alam Marina... the claim is... we were doing undisclosed, quote unquote, private testing on our platform..." (10:23–10:28)
- Arena's official response identified multiple factual errors in the paper, which the authors subsequently corrected.
- "Our response to that paper... is essentially pointing out a series of factual mistakes in the paper that question the validity of the claims." (11:42)
- The community enjoys codenamed pre-release models (e.g., "Nano Banana").
- "Our community loves it. They love basically getting like secret code names." (12:11)
- "Nano Banana was a sensation. That moment alone changed Google's roadmap." (13:13, 13:27)
7. Multimodality & Market Influence
- Image and multimodal models (e.g., Nano Banana) have huge consumer and commercial impact—useful for marketing/design.
- "I think that these multimodal models are going to become some of the most economically valuable aspects of AI..." (14:27)
- Generating paper-quality diagrams in seconds is now possible, speeding up research and content creation (15:04–15:35).
8. Community and Platform Principles
- Core principle: Provide a "North Star" for the industry—real benchmarks reflecting organic use, continuously updated and resistant to overfitting.
- "We want to provide the North Star of the industry and center the use cases of real users." (15:51)
- Public leaderboard is "a charity, a loss leader"; integrity will not be compromised (17:11–17:53).
- Arena releases open-source datasets of real-world usage to help the community.
9. Product Roadmap & What’s Next
- Recent launches: Code Arena, Expert Arena.
- Upcoming: More occupational/expert verticals (medicine, legal, creative), continued focus on multimodal/video.
- "We're able to show the performance of these models in all these different verticals..." (18:42)
- API potential considered, but focus remains on primary evaluation functions.
- "There's obviously a need for an API... we really should be doing one thing well." (19:18)
10. Community Building and Retention
- Arena's approach: Provide real value, learn from users, drive features like persistent history to boost sign-in and retention.
- "The way I think about it is every user is earned... You have to earn them every single day." (20:43–20:45)
- "Sign-in was a big driver of retention." (21:33)
- Key to success: Exceptional community managers (shout out to Greg), and a high-performance, expert team.
11. Partnerships & Collaboration
- Arena works straightforwardly with all major model labs; always looking for more talent and relevant partners.
- "If you are one of the best people in the world in your area... We need you at Arena." (21:49)
- "Of course [we] partner with all of the major model labs." (22:25)
- Open to evaluating agents (not just models) and integrating new harnesses (e.g., Devin agent). (22:44–23:36)
Notable Quotes & Memorable Moments
- On platform vision:
- "The goal is to create a benchmark that is constantly fresh, that does not suffer overfitting... gives the whole world sort of ground truth for how real users are using these models." — Anastasios (15:51)
- On funding philosophy:
- "The purpose of money at a company is to give you cards to flip." — Anastasios (03:48)
- On community-generated innovation:
- "Nano Banana was a sensation. That moment alone changed Google’s roadmap." — Anastasios (13:27)
- On data release and openness:
- "[We] have probably released more data than basically anybody on the real world use cases of AI." — Anastasios (16:38)
- On maintaining leaderboard integrity:
- "It's not like a Gartner in that sense... Never going to be like that. Models are going to be listed on the leaderboard whether or not the providers pay... you can’t pay to take it off either." — Anastasios (17:11)
- On user retention:
- "The way I think about it is every user is earned. You have to earn them every single day." — Anastasios (20:43)
- On multimodal future:
- "I think that these multimodal models are going to become some of the most economically valuable aspects of AI, both in consumer and also an enterprise." — Anastasios (14:27)
Timestamps of Important Segments
- 00:43 — Genesis of Arena and how it spun out from Berkeley
- 02:48 — The rationale behind building a company vs. staying academic
- 03:36–05:16 — Funding and growth stats; diversity of user base
- 06:05–07:46 — Differentiating Arena from competitors (organic vs. synthetic benchmarks)
- 10:09–13:38 — The "Leaderboard Illusion" controversy and community’s embrace of pre-release models
- 14:27–15:35 — The economic importance of multimodal models and content/marketing applications
- 17:11–17:53 — Policies that guarantee public leaderboard’s integrity
- 18:42–19:08 — Expansion into expert and vertical categories, future product pipeline
- 20:43–21:33 — Insights on user retention, growth, and sign-in impact
- 21:49–22:44 — Partnerships, hiring philosophy, and openness to agent evaluation
Conclusion
This episode offers a transparent, insider look at Arena—how it became the go-to place for LLM benchmarking, why organic real-world evaluation is crucial, and how the founders insist on integrity and openness even as the product and community scale rapidly. With ambitious plans, embrace of multimodality, and community-driven features, Arena aims to sustain its impact and trust as the “North Star” for AI evaluation.