The MAD Podcast with Matt Turck
Episode: The AI That Beat the World’s Hardest Math Test (Putnam 2025) — Carina Hong
Date: February 26, 2026
Guest: Carina Hong, Founder & CEO of Axiom
Episode Overview
This episode dives deep into the development and implications of Axiom, an AI mathematician built to be 100% correct, addressing the AI hallucination problem (in math, at least). Host Matt Turck and guest Carina Hong discuss how Axiom aced the Putnam exam (widely considered the world's hardest math test), autonomously solved open research conjectures, and what this means for the future of math, proof verification, AI, and science. They also explore Carina's personal journey through competitive math and academia, and her founding of an ambitious tech company at age 24.
Key Discussion Points & Insights
Why Build an AI Mathematician?
[01:25]
- Carina’s Vision: The world is at a threshold of a “mathematical renaissance,” enabled by AI agents that can solve theoretical problems across industries.
- “Our worldview is mathematical reasoning is a true reasoning layer of AGI. From math, you kind of get to code, and from code, you can run a lot of real world experiment.” — Carina Hong [00:00]
- Applicability: Math extends to verification, optimization, physics, logic, and more.
- “If you solve math, you can have physics, you can have a lot of logic and property and you can extend to a lot of things. The world needs more math.” — Carina Hong [01:32]
Understanding the Difficulty of Math & Axiom’s Achievements
[02:02], [03:06], [04:13]
- Putnam Exam Explained:
- Notoriously difficult; >50% of participants score zero; only 5 perfect human scores in 100 years.
- “We got the exam from the proctor...threw it to the [Axiom] prover and we announced that we got a perfect score.” — Carina Hong [03:34]
- Axiom solved 12 out of 12 problems, 8 within the time limit.
- Autonomous Research Conjectures:
- Solved 4 open research-level math problems from professors globally, end-to-end, without human intervention.
- “It’s probably the first AI to solve a research conjecture completely end-to-end and self-verified. The outputs are fully verified, 100% correct.” — Carina Hong [04:13]
What Makes Math Hard?
[02:10], [05:31]
- Two Axes of Difficulty:
- Abstractness (e.g., qualifying exams) vs. Creativity (e.g., IMO/Putnam).
- “There are some really important mathematical breakthroughs that happen once in a decade...But there are a lot of other questions that are sort of proficiently, routinely applying the standard bag of tricks.” — Carina Hong [05:31]
On Novelty in AI Approaches ("Move 37" Moments)
[07:19]
- AI Solutions vs. Human Solutions:
- Axiom’s proofs are produced in Lean (formal language for math) and sometimes use different, more mechanistic approaches than humans.
- “A lot of the solutions actually differ from the human solution. Since it’s a Lean-based system, it is really good at routine bookkeeping...” — Carina Hong [07:19]
Lean and Formal Proofs: Foundation of Axiom’s Approach
[08:59], [09:10]
- What is Lean?
- Functional programming language designed for math proofs; allows code and proof to be the same.
- “It’s like both the C language and the GCC compiler—two in one...you can run it and then they will see a check mark which shows that it's logically correct proof.” — Carina Hong [09:10]
- Key to providing formal, verifiable proofs.
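To make the "code and proof are the same" point concrete, here is a minimal Lean 4 sketch (standalone, no Mathlib required; the theorem names are illustrative, not from the episode). If the file compiles, the kernel has checked the proofs, which is the "check mark" Carina describes:

```lean
-- A concrete fact, checked by computation: `rfl` asks the
-- kernel to evaluate both sides and confirm they are equal.
theorem two_add_three : 2 + 3 = 5 := rfl

-- A general statement, closed by a core-library lemma.
-- Compiling this file IS the verification step: success means
-- the proof is logically correct; any gap is a compile error.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

This is the sense in which Lean is "both the C language and the GCC compiler": the same artifact is the statement, the proof, and the certificate.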
Differentiating from OpenAI, DeepMind, and Others
[10:50], [11:07], [14:28]
- Contrast in Approaches:
- Earlier automated theorem provers (ATPs) predate deep learning and were largely symbolic, without LLMs. Google DeepMind's AlphaGeometry and AlphaProof are examples of formal systems targeting specific domains.
- OpenAI and DeepMind have mostly focused on informal, natural-language reasoning ("brute force").
- “We kind of take the approach of, you know, we think that it's important for the system to be able to reason both informally and formally and in a way, sort of bridge across these different abstraction from high level intuitions to low level.” — Carina Hong [13:38]
Formal vs. Informal Reasoning in Mathematics
[14:28], [16:06], [16:16]
- Definitions:
- Informal: Natural language (English) arguments—what mathematicians have classically written.
- Formal: Programs/proofs written in Lean or similar, checked mechanically.
- “Mathematicians have been coding in English for centuries... Formal language means Lean and the output will be Lean machine code, wouldn’t be super readable to humans.” — Carina Hong [14:30]
- Auto-formalization:
- Hardest problem—translating informal arguments into formal, verifiable code.
- “Harder than translation... You're translating something that cannot be verified [natural language] into something that can be.” — Carina Hong [15:06]
- OpenAI/DeepMind Target Informal, Axiom Targets Formal:
- “OpenAI and Google DeepMind approach would fall into the informal...we'd be falling into the formal.” — Matt Turck [16:06]
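As an illustration of what auto-formalization must produce, here is a hedged sketch pairing an informal claim ("the sum of two even numbers is even") with one possible Lean 4 formalization; the statement shape and names are my own, not Axiom's:

```lean
-- Informal (English): "the sum of two even numbers is even."
-- One possible formalization; evenness is spelled out as ∃ k, m = 2 * k.
theorem even_add_even (m n : Nat)
    (hm : ∃ k, m = 2 * k) (hn : ∃ k, n = 2 * k) :
    ∃ k, m + n = 2 * k :=
  match hm, hn with
  | ⟨a, ha⟩, ⟨b, hb⟩ =>
    -- witness: a + b, since 2 * a + 2 * b = 2 * (a + b)
    ⟨a + b, by rw [ha, hb, Nat.mul_add]⟩
```

The hard part is exactly what the quote above says: choosing definitions and a statement that faithfully capture the English, when the English side itself cannot be machine-checked.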
Verification — The “Missing Layer” for AI
[18:00]
- Hallucination Problem & The Need for Proof:
- “If you want to have a reasoning engine that really truly masters at logic and mathematical reasoning, then you need to somehow get verifiable reward for the proof steps.” — Carina Hong [19:13]
- Reward hacking is possible if you only check the answer, not the steps.
- RL with Verifiable Rewards (RLVR):
- Essential for real progress in mathematical AI.
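A toy Python sketch (my own illustration, not Axiom's code) of the reward-hacking point: if the reward checks only the final answer, a trajectory with bogus intermediate steps scores the same as a correct one; a step-level verifier (standing in for the Lean kernel) catches it.

```python
# Toy illustration of why step-level verifiable rewards resist
# reward hacking while answer-only rewards do not.

def answer_only_reward(final_answer, expected):
    # Rewards any trajectory that lands on the right answer,
    # even if the intermediate "proof" is nonsense.
    return 1.0 if final_answer == expected else 0.0

def check_step(step):
    # Stand-in for a proof checker (e.g., the Lean kernel):
    # a "step" here is just an (lhs, rhs) pair that must be equal.
    lhs, rhs = step
    return lhs == rhs

def verified_reward(steps, final_answer, expected):
    # Reward only if EVERY step passes the checker AND the answer is right.
    ok = all(check_step(s) for s in steps)
    return 1.0 if ok and final_answer == expected else 0.0

good_proof = [(2 + 2, 4), (4 * 3, 12)]
hacked_proof = [(2 + 2, 5), (0, 1)]  # bogus steps, right final answer anyway

# Answer-only grading cannot tell the two apart:
assert answer_only_reward(12, 12) == 1.0
# Step-verified grading rejects the hacked trajectory:
assert verified_reward(hacked_proof, 12, 12) == 0.0
assert verified_reward(good_proof, 12, 12) == 1.0
```

This is the gap RLVR closes: the checker supplies a reward signal that cannot be satisfied by a lucky or faked final answer.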
Generalizability and Impact
[20:46], [22:21], [23:45]
- Potential Beyond Math:
- “Our worldview is math reasoning is a true reasoning layer of AGI...From math you go to code, and from code, a lot of real-world experiment in the software stack...” — Carina Hong [20:46]
- Start with math; extend to code, verification, software, hardware and potentially broader scientific fields.
- Limits:
- The method is most naturally expanded in domains expressible in code or logic—less so in non-formal domains.
Carina’s Personal and Academic Journey
[24:53], [25:41], [29:53], [32:00]
- Early Years:
- Grew up in China as a “competitive math kid.” Math Olympiad training gave resilience, addiction to “pain and suffering,” and a deep love for math.
- “Failure is like a given... just the exam keeps getting harder and the number of people competing keeps shrinking...” — Carina Hong [26:19]
- Influential Experiences:
- Ross Math Camp in the US introduced the idea of “constructing math yourself from axioms”—a formative step in her math philosophy.
- “It was like...every day I will be learning cool research math...the curiosity and discovery is a basic human need.” — Carina Hong [28:18]
- Academic Diversity:
- MIT undergrad in 3 years, Rhodes Scholar at Oxford (neuroscience), then a joint math PhD and JD at Stanford, drawn to IP law and law and economics.
- “The law professors marry AI and copyright law... I spent one year being a diligent law student... then decided to fully focus on Axiom.” — Carina Hong [32:00]
Under the Hood: How Axiom Works
[34:06], [35:28]
- Three Main Components:
- Conjecturer: Suggests new mathematical statements to prove (LLM-based, still under development).
- Prover: Core of the system, takes statements and produces formal Lean proofs. Ensemble of models, deterministic Lean-based tools, and a proprietary (partially synthetic) data set.
- Knowledge Base: Large, ever-growing store of already formalized mathematics.
- “Your knowledge base is like...to look up your knowledge base, to make sure that this is indeed uncharted territory.” — Carina Hong [34:06]
- Mechanical Insights:
- Generate a huge amount of synthetic, verifiable Lean code (easier for math than for, say, finance or law, because correctness is decidable).
- Combining CPU (Lean) and GPU (LLM) efficiently is a challenge, but less compute-intensive than full LLM pretraining.
- Announcement:
- Releasing “Axiom, Lean Engine XL”: infrastructure tools for Lean proof generation, 100x faster than open-source counterparts. [36:51]
Frontier and Bottlenecks
[43:10], [45:19]
- Pushing Limits:
- The prover now succeeds on journal-level problems (e.g., Journal of Algebra) but not yet Annals of Mathematics; the team is scaling both the depth (“up”) and breadth (“out”) of proofs.
- “On the easy end of the Putnam problem, we have 40 nodes. On the hard end of research questions in house, we currently have a problem with thousands of nodes.” — Carina Hong [44:34]
- Fundamental Bottlenecks:
- Scarcity of Lean data (addressed via synthetic generation), hardware engineering, scaling inference, and transfer to new domains (e.g., from math to code and beyond).
Can AI Win a Fields Medal?
[46:32]
- “We really want Axiom prover to be able to solve one long-standing problem...so that you can objectively say, even if it's an AI, that would be in the [Fields Medal] shortlist.” — Carina Hong [46:36]
- Philosophical and procedural questions remain about awarding prizes to AI, but real advances would be undeniable.
The Broader Vision: Mathematical & Scientific Renaissance
[47:49], [52:20]
- On the Brink of a Mathematical Renaissance:
- “We are now realizing that we are at the threshold of mathematical renaissance. We could also be at the threshold of theoretical discoveries in science—massive scientific discovery at the theory level.” — Carina Hong [47:49]
- All Theoretical Problems Solvable by AI:
- “Everything that human mind can conjecture, find interesting, find tasteful, to be solved by AI, hopefully majority by Axiom Prover.” — Carina Hong [48:17]
- Math Enables Science & Code:
- Recursive flywheel: mathematics → science → code → technology → new mathematics.
- “He [Shubo, Axiom’s CTO] believes code is math… you can try to fulfill the dream of Knuth’s ‘literate programming,’ where programmers can enjoy the luxury of mathematicians where they can reason in natural language.” — Carina Hong [50:05]
Industry Impact: Verification as an Enabler for AI Deployment
[53:25], [55:19]
- Industries with Zero Tolerance for Error:
- Fields like aero/astro, defense, hardware require provable guarantees before deploying AI solutions.
- “There is no partial credit for mostly verified GPU. Mostly flying plane.” — Matt Turck & Carina Hong [55:19]
- Axiom as an Enabler:
- “We want to see the AI for math movement that Axiom is hopefully leading transfer to these domains and try to solve some of those problems as well.” — Carina Hong [55:47]
Reception Among Mathematicians
[56:01]
- Verification = Trust:
- The main reservation from mathematicians is the unverifiability of informal AI-generated proofs; formal, machine-checked proofs could solve this.
- “If you have a formal certificate, like a stamp certifying that this is a correct Lean proof, I think people are a lot more receptive to it.” — Carina Hong [56:01]
Leadership, Culture, and Team-Building
[57:48], [59:16], [60:42], [61:07], [62:48]
- Transition from Math to CEO:
- “I learned that ego is a really, really bad thing and we should just basically get rid of it.” — Carina Hong [59:16]
- Hiring Principles:
- “We hire people for taste, but not for ego...when we are excited, we just go after that person.” — Carina Hong [59:46]
- Assembled a founding team of world-class mathematicians and AI researchers, prizing alignment and a culture of intellectual excitement and adventure.
- Company Culture:
- Flat and non-hierarchical, with debate encouraged; the recruiting bar is continually raised, successfully attracting a “tribe” of like-minded talent.
Notable Quotes & Memorable Moments
- On the core challenge:
- “Building an AI that is 100% correct 100% of the time. Completely solving the hallucination problem or the stochastic issue—make AI perfect, the perfect prover.” — Matt Turck [20:18]
- On the future:
- “I think this is going to be amazing. Ideas can diffuse between different fields...you have calculus, you have the industrial revolution...then you have the prototype of computer science. The rest is history.” — Carina Hong [52:20]
- On discovery:
- “The curiosity and discovery is a basic human need and that definitely exists in...being used to motivate and inspire mathematical learning. That was a very beautiful process.” — Carina Hong [28:18]
Timeline & Timestamps
| Timestamp | Segment / Topic |
|-----------|-----------------|
| [00:00–02:02] | The need for AI mathematicians & scope of math |
| [03:06–04:05] | Putnam exam: Setup and Axiom’s achievement |
| [04:13–05:07] | Solving open research conjectures with AI |
| [05:31–06:58] | The role of creativity, intuition, and routine in mathematics |
| [07:19–08:59] | Differences between AI and human mathematical reasoning |
| [09:10–10:50] | What is Lean and why it matters |
| [11:07–13:38] | The history of AI in math and comparative approaches (OpenAI, DeepMind, Axiom) |
| [14:28–15:06] | Formal vs. informal reasoning; auto-formalization challenges |
| [16:06–19:13] | Verification, RLVR, and eliminating hallucinations |
| [20:46–23:45] | How generalizable is Axiom’s approach; moving from math to other scientific domains |
| [24:53–28:18] | Carina’s personal journey: China, MIT, Ross Math Camp |
| [29:53–32:00] | Transition to neuroscience, law, and founding Axiom |
| [34:06–36:47] | Axiom’s technical architecture and new product launch |
| [38:40–40:14] | Data scarcity & synthetic Lean data generation |
| [43:10–45:19] | Internal benchmarks and scaling bottlenecks in proof AI |
| [46:32–47:49] | Can AI win a Fields Medal? The real challenge in research math |
| [47:49–52:20] | The future: mathematical and scientific renaissance |
| [53:25–55:47] | Industry applications of verified AI: going beyond research |
| [56:01–57:25] | Math community response; formal proofs increase trust |
| [57:48–63:22] | Leadership, hiring, and founding an elite, adventurous team |
This episode offers a rare, detailed look at the future of automated mathematical reasoning and proof, the technical, philosophical, and societal shifts it will drive, and the remarkable journey of one of its pioneers.
