
Last week, OpenAI published a press release titled "An OpenAI model has disproved a central conjecture in discrete geometry." They were talking specifically about the planar unit distance problem, which was first posed by Paul Erdős in 1946. Now, this is actually a pretty simple problem to state. It basically asks: in a set of n points in a flat plane, what is the maximum number of pairs of points that can be exactly one unit of distance apart? Back in the 1940s, Erdős proposed an answer to this question. He couldn't prove it, but he thought he knew what the answer was. Last week, OpenAI essentially announced that they had used an LLM to prove that Erdős's proposed answer was in fact incorrect. The OpenAI press release was accompanied by a video that featured dramatic music and a group of researchers writing earnestly on a comically small blackboard as they explained why this was a big deal. Here, let's play a clip of that video. "This is the first mathematical breakthrough due to an AI. It's been described as the most well-known problem in combinatorial geometry. So for a whole subfield of mathematics, it's maybe the best-known problem there is." The mainstream press soon picked up on this story with enthusiasm. Here's the New Scientist headline: "Mathematicians stunned by AI's biggest breakthrough in mathematics." People on X predictably went even more wild. Peter Diamandis tweeted the following: "An OpenAI model just proved an 80-year-old math conjecture from Paul Erdős, one of the most prolific mathematicians in history. We're going to solve everything." All right, so what's actually going on here? Did AI just reach genius level? Has math as a discipline just been automated? As a theoretical computer scientist who has published a lot of applied mathematics research in my day, and someone who proudly boasts an Erdős number of three, which you can look up if you don't know what that means,
I am, for obvious reasons, particularly interested in these questions. Well, it's Thursday, which means it's time for an AI Reality Check episode of this show, which is the perfect opportunity to seek some answers. So that's exactly what we're going to do. As always, I'm Cal Newport, and this is Deep Questions, the show for people seeking depth in a distracted world. All right, so we need to start by getting more specific about what exactly OpenAI actually did, and then we can get into the implications for the rest of us. We're looking at the planar unit distance conjecture. Erdős was convinced that he had identified the answer to the question. I don't want to get too mathy here, but just to say it quickly: Erdős thought that if you were placing n points into the plane, the maximum number of pairs of points that you could get to be a unit distance apart would be upper bounded by n raised to the power of 1 plus some constant C divided by the double log of n. Now, as you're probably noticing as you listen to me, that second term in the exponent is going to tend towards zero as n increases, asymptotically speaking. So the answer is going to approach plain linear growth as the point set increases. That's a really elegant answer. Erdős was convinced it was right. A lot of other mathematicians just assumed it was right, because Erdős is usually right. And so people tried for a long time to prove that this was indeed the fundamental limit. Now, what OpenAI did was release a paper that said no, that's wrong. We actually have a counterexample. We have a way of placing points that is going to feature more pairs at unit distance than that limit, even as n increases. I believe the actual bound is something like n raised to the power of 1 plus some small constant epsilon that stays fixed as you increase n, as opposed to approaching zero. So they had a counterexample.
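For readers who want the bounds above in symbols, here is a sketch. Write u(n) for the maximum number of pairs at distance exactly 1 among n points in the plane; c, C and ε stand for unspecified positive constants. The ε form of the new result is how the episode recalls it, not a statement quoted from the paper, and the n^{4/3} upper bound (Spencer, Szemerédi and Trotter, 1984) is background the episode does not mention.

```latex
% u(n): maximum number of unit-distance pairs among n points in the plane
u(n) = \max_{P \subset \mathbb{R}^2,\, |P| = n} \bigl|\{\{p, q\} \subseteq P : \lVert p - q \rVert = 1\}\bigr|

% Erdős (1946), scaled square lattice: a lower bound of this form
u(n) \ge n^{1 + c/\log\log n}

% Erdős's conjecture: the lattice is essentially optimal
u(n) \le n^{1 + C/\log\log n}
\quad\Longrightarrow\quad
u(n) = n^{1 + o(1)}, \text{ since } C/\log\log n \to 0

% Counterexample, as recalled in the episode: some fixed \varepsilon > 0 with
u(n) \ge n^{1 + \varepsilon} \quad \text{for infinitely many } n

% Background upper bound (Spencer, Szemerédi, Trotter 1984)
u(n) = O\bigl(n^{4/3}\bigr)
```

Because any fixed ε > 0 eventually exceeds C/log log n, a construction of the last kind rules out the conjectured upper bound for every choice of C.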
They didn't say: here's the right answer, here is what the limit is, here's the most pairs possible. So they didn't replace Erdős's conjecture with a better conjecture. They didn't prove Erdős's conjecture, but they provided a counterexample construction that showed the thing he thought was the right answer couldn't possibly be right. Now, how did they do this? Well, they used a reasoning LLM. That's an LLM that has been tuned to essentially think out loud and wander with its thoughts. We first saw reasoning models back in 2024 with o1 and the o-series models; DeepSeek has a reasoning model as well. Basically, reasoning models are a way of taking LLMs, which are static and have no memory, and having them approximate something like more dynamic computation with memory. As the model rambles, it's looking at everything it has said so far when it produces a new token. So if you're rambling, thinking out loud, you can use all of that thinking in producing the next token. It's like you have some memory, and this wandering can be somewhat dynamic. You can get some basic iterative or looping type of thinking in it. So they used a reasoning model. I don't know how many times they prompted it or on what questions, but on one of the times they prompted it about this particular problem, the model spit out a very long transcript of an answer, and a team of expert mathematicians pored over this answer. In this long chain-of-thought transcript, they identified the core idea that became the counterexample. These mathematicians then pulled that counterexample idea out of the transcript, polished it, wrote it properly, elaborated it, and put it into a short, concise, much more human-readable paper. And that's what OpenAI actually posted.
So the LLM did not produce this elegant five-page paper or what have you; human mathematicians did. But they got the idea for writing this paper out of the really long chain-of-thought transcript that the model produced when cogitating about the Erdős problem they asked it about. So that's what happened. Let's now get into the core questions that people have about this. Question number one: is this result really important? There have been issues before where AI companies claimed they had solved important math problems, and then other mathematicians came along and said, well, that's already been solved, the model probably just saw it in its training data, or that problem is really minor. Is this the same, or is this actually a really important problem? The answer is yes, it is important. It is important in the sense that it's a well-known problem and everyone just assumed that Erdős's proposed answer was correct. So it really was surprising to mathematicians to learn that he was wrong. I don't know how many people were actually trying to find a counterexample, but no one had come up with this one before. Also, it's a result that you could publish in a top venue. And they're going to, I assume. I mean, there are some complexities about how you cite this and how you do authorship. But if a human had come up with this exact result without the help of an LLM: boom, Annals of Mathematics. It's a big deal to show that Erdős was wrong. So it is an important result, and it surprised a lot of people. All right, question number two that a lot of people are asking: does this mean LLMs are now smarter than human mathematicians? The short answer here is no, but I want to provide you a long answer as well.
I want to provide some context about what is and is not actually happening in this particular example. I want to start here by reading some comments from the mathematician Thomas Bloom, who is a world expert on Erdős's open problems. OpenAI released a companion commentary paper that collected comments from eminent mathematicians on this result, and Thomas Bloom was one of the mathematicians they asked. So I want to read to you from Bloom's comments in that commentary paper. He says a lot of things, but the quote I'm going to read starts with Bloom saying: "If the result of this paper was a proof of the unit distance problem, that would be truly incredible." Just to step back here, what he's saying is that if the LLM had found a way to prove that Erdős's conjecture was correct, that's a much harder thing than constructing a counterexample. That really would have been incredible. He goes on to say: "While I was still very surprised to hear of this result, this was dampened slightly when I learned it was a construction of a counterexample and still further when I learned that the nature of the construction being, with the benefit of hindsight, a natural, albeit highly non-trivial generalization of the original lattice-based construction of Erdős's." All right, so again, let me translate the math-speak here. He's saying: when I first heard this problem was solved, I thought, oh my God, that's incredible. And then when he learned that it wasn't solved, that there was instead a counterexample to the existing proposed solution, his enthusiasm was dampened.
And then when he saw that the actual counterexample construction wasn't some new, original leap of mathematics, it was dampened further. The construction actually took the original one that Erdős had proposed, the one he thought matched his answer, and, and I might say this wrong, the LLM essentially asked: what would happen if we applied a sort of standard algebraic embedding, I guess would be the right word, to the original construction? Generalizing it this way was non-trivial, but it turns out the result has more pairs at unit distance than Erdős thought. So the answer was kind of nearby; no one had found it before. All right, let me go on. I'm going to read some more from Bloom: "On examining the construction, it becomes more clear how people had missed this before. It requires the confluence of several different unlikely events, that a good mathematician is one, spending significant time in thinking about the unit distance conjecture in the first place; two, seriously trying to disprove it despite the oft-repeated belief of Erdős that it is true; three, believes that there is mileage in generalizing the original construction to other number fields and so is willing to expend significant time in exploring such constructions; and four, sufficiently familiar with the relevant parts of class field theory to recognize that the appropriately phrased question about infinite towers of number fields with appropriate parameters can be solved using existing theory. The AI met all of these criteria, and its success here echoes previous achievements. It often produces the most surprising results by persevering down paths that a human may have dismissed as not worth their time to explore, combining superhuman levels of patience with familiarity with a vast array of technical machinery."
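To make the "lattice-based construction" concrete, here is a toy sketch of the scaled-square-lattice idea behind Erdős's original lower bound: place n = k² points on a k × k integer grid, find the squared distance r that the most pairs share, and shrink the grid by 1/√r so those pairs become exactly one unit apart. The function names are hypothetical, and the brute-force search over r is an assumption made for illustration. Erdős's actual argument picks r number-theoretically (an integer with many representations as a sum of two squares), grids this small cannot show the asymptotics, and nothing here reproduces the new number-field generalization.

```python
import math
from collections import Counter
from itertools import combinations


def grid_points(k):
    """All n = k*k points of the integer grid {0, ..., k-1} x {0, ..., k-1}."""
    return [(x, y) for x in range(k) for y in range(k)]


def best_squared_distance(k):
    """Return (r, count): the squared distance r shared by the most pairs of
    grid points, and how many pairs share it. Integer arithmetic keeps this exact."""
    counts = Counter(
        (ax - bx) ** 2 + (ay - by) ** 2
        for (ax, ay), (bx, by) in combinations(grid_points(k), 2)
    )
    return max(counts.items(), key=lambda item: item[1])


def scaled_lattice(k):
    """Shrink the grid by 1/sqrt(r) so the most common distance becomes 1."""
    r, _ = best_squared_distance(k)
    s = 1 / math.sqrt(r)
    return [(x * s, y * s) for x, y in grid_points(k)]


def unit_distance_pairs(points, tol=1e-9):
    """Count pairs at distance 1, up to floating-point tolerance."""
    return sum(
        1
        for p, q in combinations(points, 2)
        if math.isclose(math.dist(p, q), 1.0, abs_tol=tol)
    )


if __name__ == "__main__":
    for k in (5, 10, 20):
        r, count = best_squared_distance(k)
        print(f"n={k * k:4d}  r={r:3d}  unit-distance pairs={count}")
```

For example, `best_squared_distance(10)` examines all 4,950 pairs of a 100-point grid, and `unit_distance_pairs(scaled_lattice(10))` re-counts those pairs geometrically after rescaling, a cheap check that the rescaling step does what the construction needs.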
All right, so he's explaining here that a lot of things had to come together to find this answer, which is why no one had before. It's not that hard of an answer once you see it, so why hadn't we found it? He gives these reasons: people weren't looking for it, because they thought Erdős was correct; there were two concepts that had to come together, and you would have to know both of them; and some perseverance was required. But there's a phrase in here that I want to pick up on, because it tells us more generally what's happening in mathematics. Bloom said this result "echoes previous achievements." So what's he talking about? Well, in recent years there's actually been an explosion in professional mathematicians using computer-aided math tools. These types of tools have been around for a long time, but what's happened more recently is that we've augmented them with LLMs. LLMs plus existing computer-aided math tools have created an explosion of new results. They're uncovering a lot of results that tend to share the same characteristic, namely that they are too tedious for most humans to fruitfully pursue. The results these LLM-augmented tools go after often require a systematic search of some type of space, which would be so boring you wouldn't even want to give it to a new grad student, and it would take forever. Also, these tools can draw on a vast knowledge of many different existing results and techniques that they've been trained on, many more than most mathematicians can keep in their head. So they're willing to systematically explore answers, mix and match different approaches, and see what works. A lot of these systems also use a formal proof verifier, so they can try a bunch of stuff and check what actually works. So it's been really big for mathematics.
So what Thomas Bloom is saying is that this result is not some brand-new capability that we didn't know AI had. He's saying it's on the trajectory of the existing types of results we've been seeing over the last few years. It falls in that sweet spot where AI-enabled math tools work really well. Now, I want to be fair. There are two things about OpenAI's result that do separate it from this recent explosion in AI-augmented, computer-aided math work. Number one, this was done, at least they claim, purely with an LLM prompt. The tools that mathematicians are using tend to rely on modular architectures with many different types of models hooked together. You'll have an LLM; you'll have a formal proof verifier, usually using a formal verification language like Lean, which LLMs can speak very well; you'll have some sort of complicated control logic; and you'll have specialized training of the LLMs on very specific types of math techniques relevant to the fields you're looking at. This was not that. There was no elaborate scaffolding. It was just a prompt to a reasoning model that talked for something like 150 pages, and in there they found an answer. So that is different about this result. I want to give a caveat here, though. I think the fact that they're using a pure LLM to do this is more marketing than utility. The modular architecture tools are the right way to use LLMs plus computers to help with math. I would assume it's inefficient, expensive, and complicated to just use regular LLMs, especially since, God knows how many transcripts they had to produce, how many problems they looked at, and how much work was required to dig through this long, rambling chain of thought to find an insight. I don't think this is the right way to get this type of result. I think they're using this to promote a new model they're about to release.
It's just like how Anthropic claimed that Mythos could find bugs no other model could find before, even though it turns out that, largely, other models could, so that you would be more willing to use their Mythos model. In general, I think this is OpenAI saying, hey, it solved a math problem, and hoping companies will now pay to use this model. I think that's what's going on here. I want to point out, for example, that right after this OpenAI announcement, Google DeepMind put out their own paper announcing Alpha Proof Nexus, which is a modular architecture system of the type I'm talking about: you have LLMs tuned on math, you have proof solvers, you have really complicated control logic, you have agents and sub-agents. The control logic, if I understand it correctly, helps systematically guide the types of prompts sent to the LLMs and which spaces to explore, and then takes the answers, runs them through the proof solver, and gives feedback. This is the cutting-edge modular architecture. They just announced that they ran it on 353 open Erdős problems, the same type of problem this one came from, and it solved nine of them. It couldn't solve the others, but it solved nine, and it was pretty cheap to run. These are small models, not 20-trillion-parameter general reasoning LLMs. So that is probably the way to do math. This doesn't discount the importance of the problem; I'm just saying the fact that it's a pure LLM is new, but maybe not that important. Okay. The second thing that made this OpenAI result novel is that it was an important problem. That's by far its biggest distinguishing factor: people knew this problem. If you're a combinatorial geometer, as I know so many of you in my audience are, this was a well-known problem, and so this is the first cool problem that was solved.
This technique of using LLMs and computer tools to help find proofs or disproofs has been going on for a little while now, but this is the first major problem that was solved. So that's another feather in OpenAI's cap. Again, there's a caveat here. As Bloom was emphasizing, there may have been a bit of luck, in the sense that this well-known problem happened to have a relatively easy counterexample that existed in the type of space these tools are really good at. It doesn't mean we can now solve all these types of open problems, as shown by Alpha Proof Nexus, which is probably a higher-powered, better application of this type of thinking, much more automated and much more systematic, and it could only solve 9 of the 353 similar problems they looked at. So again, most problems, for whatever reason, are not solvable this way: a lot of them because they're actually, literally unsolvable, and a lot of them because their solutions don't fall into the style of solution space where these tools are well suited. So there are some caveats, but that's what's going on here. All right, question number three: does this mean all equally hard challenges will now be conquered by AI? I definitely got that sense on X. Mathematicians aren't saying this. I don't think OpenAI is saying it either; they're implying it, but not saying it. But the feeling we have online is: well, that's a really hard thing to do. It's a problem humans tried to solve but couldn't. Does that mean we have a superhuman intelligence that can now give us superintelligent performance on all sorts of things I might care about? Is it time to learn, like Sarah Connor in Terminator 2, how to cock a shotgun with one arm and show off our awesome guns, by which I mean biceps? The answer here is likely no as well.
Remember, in a previous AI Reality Check episode of this podcast, I said there are two mental models for thinking about AI capabilities. One is the rising water model: capabilities grow like a rising water level, and you have mountains that represent problems of different difficulties. As the water rises, all mountains below that height become solvable. So if the water is high enough to cover the peaks of the Erdős open problems, then it's also covering all these other, easier problems that are relevant to our everyday life, and therefore we're about to have really big, disruptive impacts. That, however, is the wrong way to think about AI development. I said the better way to think about AI capabilities is that you have a main river and you're exploring tributaries off of it. Each tributary is another type of problem or application. In some of these tributaries you're able to make a lot of progress, and in others you almost immediately get stuck because it becomes non-navigable. Making progress on one tributary doesn't really tell you anything about whether another tributary down the river is going to be equally explorable. That is what I've been arguing is the right model, and I think it helps us better place these math results in context. Remember, ever since ChatGPT put generative AI and large language models on people's radar, computer scientists have been saying there are two areas in particular in which LLMs are going to be very well suited: computer programming and mathematical reasoning. That's because those two problems share four elements. They deal with highly structured language, either computer code or mathematical notation. They have clear notions of correctness: Does this program compile and pass the tests? Is this proof true? Is this math result right? And there's endless data to train on.
For programming, we can tune models on the endless examples of code online where people ask questions and other people give answers. Math is even better, because you can synthetically create data, example after example of many different math problems and correct proofs, and tune on it again and again. So these models are really good at particular types of mathematical reasoning. And in both cases, programming and math, you have expert users who are willing to use hard tools and massage good results out of them. If you told someone, I have this great AI tool, it's going to spit out 150 pages, and you have to comb through it and try to piece together whether there's a useful insight in there for your business problem, the average business person is going to say, I'm still trying to figure out where the paperclip in Microsoft Word went. But a mathematician says, oh yeah, sure, we'll do this. We love hard tools. So you can think about programming and math as the two major tributaries of the GenAI river that have proven to be very navigable right now. That's therefore where we're putting a lot of our explorers and continuing to push progress, because if you're an AI company, you want to report progress on something. But being able to explore that tributary tells you nothing about the others. So the absolute wrong mental model is to think that anything easier than solving an Erdős problem is something AI can now do. If anything, and I pointed this out, and Gary Marcus quoted me saying this in a Substack post he put out soon after the announcement, the fact that, with an IPO looming and revenue concerns mounting, the use case OpenAI is crowing about is helping professional mathematicians create discrete geometry proofs is a huge vindication for the tributary mental model.
There could be few fields less lucrative than professional mathematicians working on proofs. There's no money spent in that field, because it doesn't generate money. It's incredibly esoteric: knowledge for knowledge's sake. So if you really wanted to impress investors and say we're going to be okay, we're going to make money, you wouldn't be bragging about the least lucrative possible application. You would be bragging about: look at this application, which is saving companies on average $100 million; we just generated a billion dollars of revenue; we just helped these companies cut their operating costs in half; this is going to be more important than electricity; everyone needs this tool. That's what you'd want to be announcing. Not: oh, there's an interesting algebraic number field in which, if you move a standard two-dimensional square lattice into it, you're able to squeeze out a super-linear number of pairs at unit distance. That's not what you would announce. If it were true that solving that problem meant you now had genius-level AI, you would use it to do all the economically useful things you could do with an automated, genius-level artificial mind. So I think the fact that this is what we're focusing on vindicates the idea that being good at one thing doesn't make you better at other things, or we'd see more economically productive examples. All right, question number four: what does this tell us about the future of math? The future of math is exciting. Let me be really clear about this. Computer programming now has LLM-based tools deeply embedded in it. We just haven't figured out exactly how to use them; we haven't stabilized that yet. There's still a lot of nonsense like agents supervising agents who supervise the agent supervisors, and we look up at the end of the day and they've produced broken Hello World code or whatever.
We're still figuring it out, but LLM tools are completely changing how programmers work. Something similar has been afoot in professional mathematics for about the last year or so. Computer-aided tools have always been there and have been very powerful, but they're too annoying to work with. When you add LLMs into the picture, especially LLMs that are tuned for this type of mathematical reasoning, and especially in the very smart modular architectures that Google DeepMind is producing, it really helps. Now, I have published professional applied mathematics research; that was my field. I'm a distributed algorithms theorist, an applied mathematician within distributed algorithms theory. A few years ago, I stopped doing mathematical research to help start up the Center for Digital Ethics at Georgetown and to do a lot more of this public-facing technology criticism. But I look at these tools and think, man, that would have helped. If I return to applied math, I'll be using the hell out of those tools. Just knowing what I know from doing this type of work, I think they would make me 2x more effective in terms of the quality, comprehensiveness, and speed with which I produce results. I don't think people understand how much mathematics is a combination of creative insight plus a lot of tedious work: learning and applying results, trying different things, working out the details, getting stuck. Where you often get stuck in mathematics, at least in applied mathematics, is that you have the idea, but it's the algebra. It's often like: I've got to bound this sum of random variables. I need to bound it at this level to show that I can apply a union bound, or I can't apply a union bound here because there are dependencies; they're correlated, but not quite negatively correlated. But we could apply a martingale here to try to keep the correlation under control. And I don't quite remember how that works.
And the best applied mathematicians just spent more time learning results, practicing results, and reading results. If you could have an AI help you with that, it really would make a big difference. I'm telling you from experience. If you could have an AI explore proof spaces for you, it's a fantastic tool, and it's going to be a big part of mathematics. I think what's going to happen in the near future is an explosion of people producing low-hanging-fruit results like we're seeing in these papers: I'm searching subspaces within my specialty; I'm very carefully tuning the AI and giving it the right questions, like, here's the space we're looking at; and I'm going to refine a lot of results that exist in the field. There's a little gap here, we can close this gap, or we find a lot of counterexamples to existing conjectures. A lot of it's not going to be that interesting, and there's going to be a problem of too many results to be refereed. But in the medium to longer term, I think we're going to see the average quality of high-end math results jump up. I produced a few interesting results in my time, and I think they could have been more interesting with these tools. So I think the future for mathematics here is exciting, and it kind of makes sense. Math and computer programming: that's what we were told from the beginning would be the LLM sweet spot. And we're getting custom tools: in programming, these complicated coding harnesses, and in mathematics, these modular, architected proof exploration systems. They're making a difference, and I actually think that's exciting. It also, by the way, vindicates my vision of what you could call either distributed AGI or narrow AGI.
This is the idea that the only people who believe we're going to just keep scaling these massive, singular LLMs until they're HAL 9000, so that you can just build lightweight harnesses on top of them to do everything, are the AI companies that want that to be their moat; and that the future of AI is instead going to be bespoke systems with modular architectures, tuned to particular problems, types of problems, or domains, built with a lot of input from people in those fields, that do really well in those domains. That is the future of AI. I really like that future of AI, by the way. It's way more resource efficient. It's incredibly alignable, because modular architectures built to do one thing are very controllable; it's not just querying an LLM and doing what the LLM says. It's more economically diverse, and it's more responsive to customers, because you're building bespoke tools for bespoke places. And it gets us away from this pervasive sense of building an alien mind that's out of our control and God knows what it's going to do. I'm so tired of that type of dialogue. So, to me, this is exciting. What's happening in math is important. There are caveats around this particular result that I think make it less of a massive new thing than OpenAI is letting on, but what is happening in math is very interesting. As a mathematician myself, I say it's cool. All right, this brings me to my concluding thoughts. I don't like the way this announcement is being talked about, and I think it's an indictment of the way we're talking about AI more generally. This should be narrow and exciting: hey, new tools in math are really improving what we're able to solve. It should be something that we say, and most people respond, shut up, nerd, I'm watching the NBA championships, and that's that. But instead you see all this chatter. This has to be cast as some sort of signal that something terrible is brewing.
Somehow we had to take something as specific and bespoke as math tools and transform it into a source of digital ick. Here is my plea. Not everything about AI has to be placed in some sort of Manichaean system of the people versus the machines, where we have to evaluate every news release by asking: is this evidence of people triumphing, or evidence that the machines will eventually take us all out? This is a ridiculous dichotomy. It's gladiators, it's the Colosseum; it's people online cheering because it gives their lives meaning. It's cynical CEOs at these companies who like the ick and the despair and the anxiety of this dichotomy of darkness versus good because it makes their product seem exciting, and they're going to get a better IPO. I'm so tired of it. Can we talk about AI as a normal technology that can be used in specific products? If I had told you five years ago, I have a new computer-aided math-solving tool that's really sweeping through math and opening up all sorts of problems we weren't able to solve before, and mathematicians are excited about it, you would lose negative five minutes of sleep over that. You'd say, that's kind of cool, I'm not a nerd, and you would move on with your life. And yet that exact same announcement today, because it's placed in this Manichaean framework, makes people think, oh God, I guess I've got to be doing Sarah Connor pull-ups in my jail cell so that I'm ready to fight the robots when they attack. So that's what I want to say. I'm exhausted by this discourse. Could we just treat AI like a normal technology and allow math nerds like me to say, this is a cool tool, I can't wait to use it? That's how we should be thinking about much more of what's happening with AI. Nothing terrible just happened. This is only good news. It's narrower than you think. If you're not a mathematician, you don't have to care about it.
If you are, you should be excited about it. Fields evolve. Math has evolved massively with technology over the years. That's just how these things work. Can we all just chill out? All right, that's my final plea. That's all we have for this week. I'll be back on Monday with an advice episode of this show, where I give advice about how you can join the fight for depth in a distracted world. Sign up for my newsletter at calnewport.com if you want my personal dispatches about what I'm currently thinking about in this fight for depth in a distracted world. And most Thursdays I have an AI Reality Check episode, so next Thursday I'll probably have another one of these, because God knows there are a lot of claims that need a little bit of reality injected into them. All right, that's it for today. Remember to care about AI, but not everything you read about it.
Episode: Did AI Just "Solve" Math? (Let’s Take a Closer Look) | AI Reality Check
Date: May 28, 2026
Host: Cal Newport
In this AI Reality Check episode, Cal Newport examines OpenAI's recent claim that its new AI model disproved an important 80-year-old conjecture in discrete geometry, the planar unit distance problem originally posed by Paul Erdős. The episode aims to unpack what actually happened, assess how significant the achievement is, explore its implications for mathematics and AI, and critique how such AI news is reported and interpreted by both technical and popular audiences.
No:
“While I was still very surprised to hear of this result, this was dampened slightly when I learned it was a construction of a counterexample and still further when I learned that the nature of the construction being, with the benefit of hindsight, a natural, albeit highly non-trivial generalization of the original lattice-based construction of Erdős’s.” — Thomas Bloom [~31:00]
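For readers wondering what a “lattice-based construction” looks like, here is a minimal Python sketch of the general idea behind Erdős's original construction: place points on a square integer grid, pick a distance whose square can be written as a sum of two squares in many ways, and rescale the grid so that this distance becomes one unit. The function names are hypothetical, the brute-force pair count is only practical for small grids, and this is not the construction from the OpenAI paper.

```python
import math
from itertools import combinations


def grid_points(k: int) -> list[tuple[int, int]]:
    """Return the k*k points of the integer grid {0..k-1} x {0..k-1}."""
    return [(x, y) for x in range(k) for y in range(k)]


def count_pairs_at_squared_distance(points: list[tuple[int, int]], d2: int) -> int:
    """Count unordered point pairs whose squared Euclidean distance equals d2.

    Rescaling the whole grid by 1/sqrt(d2) turns every such pair into a
    unit-distance pair, so this is the unit-distance count of the rescaled set.
    Brute force over all pairs: O(n^2), fine for small illustrative grids.
    """
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2
    )


def representations(d2: int) -> int:
    """Number of integer vectors (a, b) with a^2 + b^2 == d2 (signs and order count)."""
    r = math.isqrt(d2)
    return sum(
        1
        for a in range(-r, r + 1)
        for b in range(-r, r + 1)
        if a * a + b * b == d2
    )


if __name__ == "__main__":
    pts = grid_points(10)  # n = 100 points
    for d2 in (1, 25):
        # squared distance, number of lattice directions, number of pairs at that distance
        print(d2, representations(d2), count_pairs_at_squared_distance(pts, d2))
```

On a 10×10 grid, choosing squared distance 25 (12 lattice vectors: 5² + 0², 3² + 4², and their sign and order variants) yields 268 pairs, versus 180 for squared distance 1 (only 4 lattice vectors). Picking distances with ever more representations as the grid grows is what pushes the count above linear in this kind of construction.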
The AI shines where:
Recent trends:
There’s been a surge of AI-augmented math, especially combining LLMs with computer-aided proof tools (e.g., Google DeepMind’s Alpha Proof Nexus, formal verification languages).
LLMs are good at some types of math, especially those amenable to vast training data and systematic search.
“LLMs plus existing computer-aided Math tools have created an explosion of new results... uncovering results that tend to have the same characteristics: too tedious for most humans to fruitfully pursue.” — Cal Newport [~36:00]
What's novel here?
OpenAI used a “pure LLM” without established modular architectures or verification scaffolding.
However, Cal argues this is probably more about marketing and not a sign of a fundamentally new AI capability.
“I think that this is really... more marketing than utility.” — Cal Newport [~41:00]
Comparison to other tools:
Cal attacks the “rising water” notion (the idea that once AI gets better at one challenge, it can suddenly solve every lesser challenge), supporting instead the “tributary” model (progress is domain-specific; solving one problem doesn’t guarantee progress elsewhere).
Programming and mathematics are particularly suited to progress with LLMs, because of:
“Do not think [this]: the absolute wrong mental model is [that] anything that is easier than solving an Erdős problem is something that AI can now do.” — Cal Newport [~51:00]
Evidence for this: If LLM math breakthroughs heralded general AI genius, companies would be touting transformative business applications, not esoteric geometry solutions.
“If it was true that solving that problem meant that you now had genius level AI, you would use that to solve all the economically useful things you could do with an automated genius level artificial intelligent mind.” — Cal Newport [~53:30]
Math is poised for a productivity and insight leap as LLMs are integrated into proof exploration tools:
Cal foresees continued modular, custom AI systems producing domain-specific advances—the “distributed AGI” or “narrow AGI” future—rather than one giant multi-purpose LLM.
“I really like that future of AI, by the way. It’s way more resource efficient. It’s incredibly alignable... It gets us away from this pervasive sense of building an alien mind that’s out of our control...” — Cal Newport [~61:00]
Cal calls out sensationalist reporting and the framing of every AI advance in apocalyptic or triumphalist terms, especially around “machines vs. people.”
Pleads for a more “normal” view of AI as a specialized technology:
“Not everything about AI has to be placed in some sort of Manichean system of the people versus the machines... Could we just treat AI like a normal technology and allow math nerds like me to say, this is a cool tool, I can’t wait to use it. And that’s how we should be thinking about much more of what’s happening with AI.” — Cal Newport [~66:00]
Points out that the same result would have elicited little public concern or excitement if announced as a new computer-aided math tool five years ago.
“If you could have an AI help you with that, it really would make a big difference. I'm telling you from experience.” — Cal Newport [~59:00]
“The future of AI is going to be bespoke systems with modular architectures that are tuned to particular problems...” — Cal Newport [~62:00]
“Nothing terrible just happened. This is only good news. It's narrower than you think. If you're not a mathematician, you don't have to care about it. If you are, you should be excited about it.” — Cal Newport [~68:00]
Cal Newport offers a nuanced, skeptical, yet optimistic analysis of OpenAI’s mathematical “breakthrough,” situating it in the broader landscape of AI progress. He highlights that while this is a notable and surprising result for mathematicians, it does not signify general intelligence or the imminent ubiquity of “AI genius.” Instead, it exemplifies how LLMs—especially when paired with expert humans and soon modular, domain-tuned architectures—can meaningfully augment technical progress in niche but important fields.
Final message:
Everyone should “chill out” about mass-market AI hype and panic. Advances like this are beneficial, narrow, and ultimately exciting—for those in the relevant field. Math and tech evolve, and so, too, should public conversation about these tools.