Transcript
A (0:00)
If you're an HVAC technician and a call comes in, Grainger knows that you need a partner that helps you find the right product fast and hassle free. And you know that when the first problem of the day is a clanking blower motor, there's no need to break a sweat. With Grainger's easy to use website and product details, you're confident you'll soon have everything humming right along. Call 1-800-GRAINGER, click grainger.com, or just stop by Grainger, for the ones who get it done.
B (0:30)
If you're the purchasing manager at a manufacturing plant, you know having a trusted partner makes all the difference. That's why, hands down, you count on Grainger for auto reordering. With on-time restocks, your team will have the cut-resistant gloves they need at the start of their shift, and you can end your day knowing they've got safety well in hand. Call 1-800-GRAINGER, click grainger.com, or just stop by Grainger, for the ones who get it done.
C (1:00)
I was recently watching a video by Mo Gawdat. It was a keynote he was giving. He's a former Google X executive, and he was saying that AI is no longer just writing code, it's actually correcting human math. He gives this really incredible example where he says that for the last 56 years, everyone has been using the same matrix multiplication method in code. This is something that's been very standard, that people have agreed on for a very long time. And he said that recently he was telling an AI basically to improve itself, and when he did, the AI realized that the matrix multiplication method it relied on was flawed. So instead of trying to optimize the software that had been built for AI, which is what he assumed it would do, it invented a completely new way of doing the math to essentially optimize itself. And he said that new invention resulted in a 26% performance boost and the removal of hundreds of millions of dollars in cost and energy use for Google. A massive uptick in optimization. This is a fascinating concept. When I first saw that, I was really struck by the fact that AI is definitely getting better at math. But beyond just solving math the way that we might solve it, it's creating new ways to solve math, coming up with completely new methods when it thinks that our methods are flawed. So today on the podcast I want to get into AI and math and where it is today, because there's also a whole bunch of really interesting news about math problems that have been solved recently. It's easy to talk about AI hallucinations and how AI can't do XYZ, but honestly, beyond the hype, what I'm actually seeing in my day-to-day use of AI is that it is getting startlingly good, and it's improving very quickly.
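To give a concrete taste of how a "standard" matrix multiplication method can be beaten, here's a minimal Python sketch of Strassen's 1969 trick, which multiplies 2x2 matrices with 7 scalar multiplications instead of the schoolbook 8, a saving that compounds when applied recursively to big matrices. This is the classic textbook illustration of the general idea, not the specific method from Gawdat's story.

```python
# Schoolbook 2x2 matrix multiply uses 8 scalar multiplications;
# Strassen's method uses only 7 (at the cost of extra additions).

def schoolbook_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return [[a*e + b*g, a*f + b*h],
            [c*e + d*g, c*f + d*h]]          # 8 multiplications

def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)                          # 7 cleverly chosen products
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]    # same answer, one fewer multiply

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(schoolbook_2x2(A, B))  # [[19, 22], [43, 50]]
print(strassen_2x2(A, B))    # [[19, 22], [43, 50]]
```

For over half a century, researchers have been shaving the exponent down from Strassen's starting point, which is exactly the kind of long-standing "agreed" method the story is about.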
And I think a lot of that isn't necessarily the models getting better, it's the tooling we're adding. Anyway, we're going to get into all of it on the podcast. Before we do, I wanted to say: if you want to, go check out the latest updates I've done to AI Box AI, which let you build any AI tool you want without knowing how to code. You just prompt it to build something, and it will link together all of the AI models, put in the prompts, and build something cool. Most recently I saw someone create a Bible story graphic novel generator, a really cool tool that I'm sure my children will love. But there are so many different options. If you want to go check it out, there's a link in the description to AI Box AI. You can go try to build something and check out a whole bunch of things that other creators are building. All right, let's get into the state of AI and math today. I wanted to start off by saying that AI models right now are starting to crack a whole bunch of high-level math problems. I was recently on X and saw a tweet from Bartosz Naskręcki where he said GPT-5 Pro solved, in just 15 minutes and without any Internet searches, the open problem known as Yu Tsumura's 554th problem. He said this is the first model to solve this task completely, and he expects more of these kinds of results. The model showed that it had a really strong grasp of elementary abstract algebra reasoning. So these models are getting better and better at solving problems, but they're also doing really well in math competitions and other areas. Deedy recently posted on X and said AI just achieved a perfect score on the hardest math competition in the world. The Putnam has 12 problems, each worth 10 points. The highest score last year was 90, and the median was 0. Axiom's AI prover, working in Lean, scored 120 out of 120 and shared all of the solutions. Huge milestone in AI.
I saw another really interesting story where, over the weekend, Neel Somani, who is a software engineer, a former quant, and now a startup founder, was testing how well OpenAI's newest AI model could handle really difficult math problems. And he saw something that really surprised him. Essentially, he pasted in a really long unsolved math problem. There are these lists online, by the way: there's one Hungarian mathematician who has around a thousand problems posted online that have never been solved. People take them and put them into AI models to see if the AI can solve them, and when one gets solved it's like, oh my gosh, AGI is here. We had one last year that Google's AI model solved. So anyway, it's always an exciting thing when they get solved. So he posted an unsolved math problem into ChatGPT and let it run for 15 minutes. He came back, and it had a solution for him. But you never know, right? Maybe it was just hallucinated. So he goes and checks the solution, and it turns out that it was actually right. There are these online verification tools; one of them in particular is called Harmonic, and it's basically designed to make sure that the logical arguments in a solution to a math problem are sound. Apparently, once he pasted in ChatGPT's response, everything checked out. It said it was accurate. This is a quote from him: "I wanted to get a sense of where AI systems can actually solve open math problems and when they still get stuck." So I think he was really surprised that it had actually solved this problem. Problems that were previously out of reach, he thinks, are now solvable. Okay, but I want to talk about how it solved this problem, its line of reasoning. Because what's cool with ChatGPT and with reasoning is that you can go and look through its chain of thought.
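Verification tools like this generally work by requiring the argument in a formal language where a computer can check every step mechanically. As a tiny illustrative sketch of what machine-checkable math looks like (a toy statement in Lean 4, not Harmonic's actual interface or the problem from the story):

```lean
-- A toy machine-checkable proof in Lean 4: the kernel verifies the
-- statement and every step, so a reader doesn't have to trust prose.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If any step were wrong, the proof simply wouldn't compile, which is why a formal check is so much stronger than eyeballing a natural-language writeup.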
And since he was formerly a quant and is a huge math nerd, he can go and understand its chain of thought. And it was fascinating. So this is how ChatGPT got to solving this very complex, previously unsolved math problem. Basically, it pulled out a bunch of well-known ideas from mathematicians. So first it does research, it gathers all of these ideas, and it tries to connect them in a logical way. In this particular case, it went and found information about Legendre's formula, Bertrand's postulate, and the Star of David theorem. And then, what absolutely blew my mind: it went and found an old MathOverflow post from 2013 where a Harvard mathematician named Noam Elkies had a really interesting solution to a related problem. Not the same problem at all, but a related one. It found that, and then, instead of just copying how that Harvard mathematician solved his problem, it took a completely different approach and ended up producing a much more complete answer to the question, one that was connected to some of the work of Paul Erdős, one of the most influential mathematicians of the 20th century. So for anyone who is skeptical about machine intelligence, this is amazing, because this isn't just one isolated example. AI tools are already being used by a lot of different researchers; they're helping with a lot of different things, like searching through academic papers and checking complex arguments. But since GPT-5.2 came out, which is what Somani was using, he says he has seen a huge shift in its reasoning compared to earlier versions. A lot of this is because of tooling. When you give these models tools, different formulas and algorithms and calculators, they're just better.
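For reference, Legendre's formula, one of the ingredients the model reportedly pulled in, gives the exponent of a prime p in n! as the sum of floor(n/p^k) over k ≥ 1. A quick Python sketch:

```python
# Legendre's formula: the exponent of prime p in n! is
#   sum over k >= 1 of floor(n / p^k)
# (the sum is finite, since p^k eventually exceeds n).

def legendre(n: int, p: int) -> int:
    exponent, power = 0, p
    while power <= n:
        exponent += n // power   # multiples of p^k contribute one factor each
        power *= p
    return exponent

# 10! = 3628800 = 2^8 * 3^4 * 5^2 * 7
print(legendre(10, 2))  # 8
print(legendre(10, 5))  # 2
```

It's a small, mechanical rule, which is exactly the kind of building block a model can combine with deeper results like Bertrand's postulate.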
But its reasoning was so good: it was going and doing research, finding old posts, finding solutions to similar problems, adapting them to this problem, and writing a more complete version, which was incredible. So I think we're going to start seeing way more breakthroughs where AI is being used to do this. It's kind of interesting, because AI isn't just out there in a vacuum. It's not just running around solving all the math problems of the world. You have to point it in a direction; it needs a person to point it in a direction, for now anyway. And so it's amazing, as we just pick what directions to point it in, how it's able to solve so many things. Something that was interesting to me is that Somani was looking specifically at Erdős problems. It's this famous list of a thousand unanswered math questions, like I was telling you about, which has been online for years. But what's interesting is the range of the problems. There are some simple puzzles and some extremely difficult challenges, and that basically makes it a really popular benchmark for testing human and machine problem solving. Since Christmas, 15 problems on the Erdős list have moved from open to solved. Open meaning: here's the problem, no one has solved it; to having been officially solved. In 11 of those cases, the published solution explicitly mentioned AI tools as part of the process. So whether that was an AI model solving the problem entirely on its own, or, most likely in a lot of these cases, a human using AI tools to help them, 11 out of 15 of the Erdős problems solved since Christmas involved AI tools. In my mind, there's just no doubt that this is really pushing the field forward in really novel, interesting new ways. That said, not everyone is claiming that AI can now replace mathematicians. Terence Tao is one of the world's most respected mathematicians.
He's basically tracked the progress carefully. And he says that in a whole bunch of different cases, AI systems produce meaningfully new ideas on their own, while in other areas they help by finding relevant past research that humans can build on. Right, like in the case we were talking about earlier: it was finding some work that a Harvard mathematician had done and kind of adapting it to its problem. But to be honest, that's still amazing, because it did come up with something new and it did adapt it in a new way. So obviously it's drawing on something. A lot of mathematicians say, look, it's still using human research. Well, of course it's using human research. Where do you think its training data came from? What was it fed? How can it do this? It's using human knowledge to be able to do this, but it is coming up with new, novel things, which is interesting, because at that rate, eventually it could theoretically come up with things without human knowledge. That's kind of the interesting part. Fully independent AI mathematicians are still a long way off, according to Tao, but he does say that these tools are already making a huge difference. In a recent post he also suggested that AI might be especially good at tackling less famous, overlooked problems. A lot of those questions are not unsolvable; they just simply never get enough attention from human experts. And because AI systems can work really methodically, they can search through thousands of possibilities. And, if I'm being a hundred percent honest, it's also because they don't get bored. Because of this, Tao basically argued that people using AI might be able to solve problems that humans have not solved on their own.
Not because humans are incapable, but because, you know, we're humans, and solving really long, complex math problems is maybe boring for some people. Okay, so another reason I think progress is accelerating a lot is a growing focus on making math arguments easier to check. Traditionally, proofs are written in natural language, which can hide a lot of small mistakes or unclear steps. So there's a bunch of new software that allows researchers to translate those arguments into a really precise format that can be automatically verified. This process is slow and tedious by hand, but if you're using an AI system, and of course these things are getting increasingly better at helping with it, it becomes way easier to confirm results and then build on them. Because the faster you can check that the AI model, or even a person, solved a problem correctly, the faster you can say, okay, here's how we solve this type of problem now; we'll use this as a building block for future problems in this direction, and there are more problems you can solve. There was an interesting comment made by the founder of Harmonic, the tool that checks the math problems; his name is Tudor Achim. Something he said that's really interesting is that the most important signal is not how many problems actually get solved, but who's willing to use the tools. Here's a quote from him: "What matters is that serious math and computer science professors are actually using them. These are people whose careers depend on being careful and credible. So when they say they rely on AI tools, that says a lot." And honestly, I think it really does. I think the real-world implications are going to go a lot further than just math. Right?
Because the same abilities that help an AI reason through, for example, an abstract math problem can actually be applied in fields like engineering, economics, medicine, and science. Progress in a lot of those fields is often slow because problems are really complex and hard to verify. So as AI systems get better at exploring ideas, checking work, and connecting past knowledge, they are going to dramatically speed up research and innovation. So for me, the advancements being made in AI and math aren't just exciting for math, but for so many other areas that are going to benefit. The reasoning capability of these models improving means that so many different areas are going to improve, including software engineering, which personally is getting me really excited this week. All right guys, thank you so much for tuning into the podcast. If this was interesting, it would mean the world to me if you could leave a rating and review on the show. It helps other amazing people like yourself find the show, and it helps me rank in the algorithm on Apple Podcasts, so it's the number one way you can say thank you. As always, make sure to go check out AI Box AI if you want to build tools without knowing how to code and want to try over 40 of the top AI models in one place; 20 bucks a month saves you a ton of money. The link is in the description. All right, have a great rest of the day, guys.
