
Be sure to follow us on our social media accounts on: LinkedIn: Instagram: TikTok: Also be sure to sign up for and to check the full video interview on The Audit Podcast . * This podcast is brought to you by . the services firm that...
Loading summary
Podcast Host from Green Skies Analytics
Everybody, we're switching it up a little bit this week. I do want you to listen to this episode though, so that when management has maybe unrealistic expectations of our use or even the rest of the organization's use of AI, you can better answer those questions. Especially when they say something like, hey, go do AI. Or they say, hey, I read where you can do this with internal audit and AI. I had a CAE yesterday tell me that someone told them outside consultants came in and said, you can entirely replace internal audit with AI. And she went, how? And they went, I don't know, I read it in a blog somewhere. They didn't say that, but that's basically where it came from. So anyway, I think there are unrealistic expectations. I think this is also just a good foundational episode to understand AI a little bit better. It's not uber technical. They do hit on some things though that we'll get into just a little bit either way. This is a replay from a podcast called the AI Fundamentalist. One of the hosts of the show is Andrew Clark. He's been on the audit podcast a couple of times now. Just a really good high energy AI expert who formerly was the principal machine learning auditor at Capital One. So he was an auditor, he used to audit AI at Capital One and then founded Monitar years and years ago. Co founded Monitar rather years and years ago. And it's currently the CTO Real quick monitar. They are AI governance platform that helps companies build, manage and automate responsible and ethical governance across consequential modeling systems. So I listened to this recently and thought this would be really good for everyone else to listen to. That is an internal audit and has an interest in AI. Just hopefully everybody. But some of the things that they hit on is AI's version of Moore's Law and how that's slowing down dramatically with GPT5. So that's the latest and greatest model from ChatGPT. And just real quick, Moore's Law basically says that computing power doubles every two years. And so this is held true for like 50, 60 years. And so Moore's Law doesn't. I don't think it's directly correlated to the increases in effectiveness we can say with AI and these LLMs. We're starting to see things slow down a little bit compared to what's been going on for the past three years. Maybe one of the reasons about that's happening is because we're running out of human written data. So this is again something they talk about on the show so high level. This is how these things work. This is how AI as we most, as most of us know it. This is how it works. They scraped the Internet, they took all the text off the Internet, they put it into a database, and now when you ask these tools a question, it refers to that database and then predicts the answer based on what has already been on the Internet. The problem now is, and I don't know how they could quantify this. I haven't seen it. If anyone has, I would love to know what this is and how they quantified it. As you're all probably very aware, a lot of the Internet now, what's there, AI generated. So if we're taking AI content now and using that to train these models to answer the questions we have of these, then what are we doing? We're just kind of getting the same thing rehashed. So they also go into how that synthetic data, like, can't really likely provide the same quality as original human content. So with that said, this one's a little bit lengthier than our normal IA on AI episode, but I do think it's very important to get again that foundational understanding of what's going on within the world of AI. All right, here we go.
Andrew Clark
The AI Fundamentalists, a podcast about the fundamentals of safe and resilient modeling systems behind the AI that impacts our lives and our businesses. Here are your hosts, Andrew Clark and Sid Mangalik. Welcome, everybody, to this episode of the AI Fundamentalists. This is going to be a quick episode today with the news of GPT5's release from OpenAI. We've got a lot of things to.
Sid Mangalik
Say, and I think that the, you know, the big question everyone's mind is, why Was this the GPT5? Right? Everyone's like, hold on. This doesn't feel like it's much bigger, doesn't feel like it's significantly better. It doesn't feel like this is changing the game the way that, like, 2 to 3 to 4 did. Why did this one get to be 5?
Co-host of AI Fundamentalists Podcast
And it, like, had more fanfare too, which was wild. It's like we've been calling that on this podcast for a while, like this, the quote unquote, Moore's Law of AI. So there's this assumption that Moore's Law is continuing AI, and Moore's Law is the number of transistors and integrated circuits doubles every two years. Like, that doubling has really kind of, kind of slowed down that curve. And there's been this conversation like that with AI that's. We've really been seeing that keep happening up until the GPT. Like even the GPT4 some things as it started slowing down with the GPT5 not being that that huge incremental lift. And I think OpenAI was probably pushed to do some of it because of fundraising and things that they wanted to have that next step. But we're very much seeing that the Moore's Law of AI is, is really slowing down. There's definitely some more incremental improvements, but the, the magnitudinal improvements are definitely not there anymore. But for AI, like it's been this like, oh, more data, we'll make more data, more compute. We're going to make these systems better. Well, we're already run out of data. We all know synthetic data doesn't work well enough. No matter how fancy it is or how big of a check, someone will pay for someone else to do it. So, yeah, how is OpenAI drinking their own Kool Aid this much that they like, literally, Sam Altman pumped this up so much. Like, I don't know why he did that. It could have just been like, hey, it's a refinement. It's also is trying to get them cost savings. But yeah, it makes no sense why this was not just a. I guess they'd gone to too many four like they were 4.0 4.1. I guess it could have been 4.5 maybe.
Sid Mangalik
Yeah, I mean people like argue that like a 4.5 even happens like a 4.5 a not very sexy branding.
Podcast Host from Green Skies Analytics
Yeah.
Sid Mangalik
You'll see on like their system model card that they've tried to kind of brand GPT5 as like the replacement for all the old models. So GPT4O. Oh, that's GPT5 main 03. That's five thinking. And they've tried to this like cute little mapping between the old models and the new models, but the new models aren't done yet. They don't even have image out, they don't have audio in, they don't have audio out. So these aren't even comparable to the old models yet. You know, they're very big on these like incremental releases. So this feels like maybe 5 beta, like an alpha release for the real one.
Co-host of AI Fundamentalists Podcast
It's, it's just, it's very confusing. Like, I have not seen this bad of a product release in a while. And then like the day before, like, or maybe it wasn't the day before, but Altman, when he was talking about. It's like speaking to a PhD in every field. It's like, why would you say that? Like, how's that? Like, are you, what, what version are you guys testing internally? Because it's not what you released.
Sid Mangalik
What are some of the feedbacks you guys have heard about or what have you experienced using GPT5? Like, how do you feel like it's at least qualitatively different than 4? Because I think that, like, I think we all feel that quantitatively, it's not significantly better.
Andrew Clark
I feel like it's. Well, one observation I've had is that it feels like. And from other marketers I've talked to that really doubled down on using the previous versions that they, they kind of lost their stuff. I don't want to humanize an LLM, but, you know, as people began to use these more and more, they became a single point of failure with someone who wasn't really careful about everything that they were delegating into foundation models or single system or automated or automating. So when this version came out, they found those, they found their stuff was broken. It's almost like they lost an employee. Again, the humanizing. I know you guys are going to talk about some technical aspects of it or performance, but I'm also hopeful that, you know, businesses, people, individual productivity people are learning that, you know, if you're going to treat it like an employee, well, if the employee is your single point of failure, they take off. Guess what? You just, you just did the thing that you were complaining about humans doing.
Co-host of AI Fundamentalists Podcast
To be honest, I haven't actually played with GPT5. I've been reading a lot and seeing the benchmarks and saw a lot of the, the gaffaws, like the chart, the chart stuff that OpenAI had and stuff. But I actually need to go play with it and get my own opinion a little bit. It's just been what I've been reading. People are kind of talking about it. The less emotive, the less creative. I, I don't know what kind of like guardrails and things they might have put on it for that.
Andrew Clark
And to that point, does this perform worse because of the way people were using them before, or is it performing worse because it really is that bad?
Sid Mangalik
It's a fair question. And, and maybe this has to do with, you know, I think that we're seeing a bit of more of an enterprise push with five and like the building of the selling where it's like, oh, it, it does a lot less hallucination, it's better at following instructions and it wouldn't surprise Me if in the journey to do that it lost a lot of this perceived expressiveness that GPT4 had in place of trying to be better for business needs though, you know, I still would be very, very wary of using it for those needs.
Co-host of AI Fundamentalists Podcast
Well, this gets the broader question of this. OpenAI had traditionally been like for the consumer and then Anthropic has been positioning themselves as for the business and then Anthropic you usually access via like AWS Bedrock, which has really good guardrails and now they even have a new grounding capability which is really funny because their new ground, like the additional grounding that they have, essentially build a rule based system for you to check the LLM answers. Kind of like we talked about with the Apple paper previously, which is like why don't we just. So they're now building a rule based expert system for you. Great, just use that. Why are you then paying for to use Anthropic? But in any case a lot of those things can be solved through and Microsoft doesn't have something quite as good as AWS does with it. So but like that kind of stuff can be solved there. So it's kind of like OpenAI. What really is crazy with this release is it seems they're kind of rudderless and they don't know who their market is anymore. So like they're very expensive. I know they're trying to save some money on this one too I think, but it's very expensive and it's not targeted. Again, there's a whole would you use LLMs at all or not? But for enterprises, Anthropic seems to be really optimizing for that use case. It just seems like it's a one size fits nobody. Is the recent release.
Sid Mangalik
For sure in their own language when they're looking at who's using OpenAI and their tools and what they're using it for to find that people are using it most for writing. So creative writing or marketing coding, lot of coding use though as you've seen Cloud is kind of eaten their lunch because they have much better integrations with most platforms like vsquit if you're just coding. And terrifyingly of Health, Health is one of their major markets so people are going on there instead of talking to a doctor, likely because they don't have access to a doctor or it's just easier to talk to ChatGPT. But we've seen a huge rise of therapy through LLM and now general Health through LLM. So that was one of their big tasks. Oh, how do you make the health aspects Even better when ChatGPT doesn't know what a human body is.
Andrew Clark
That last one scares me because when you think about even just regular Web, forget generative AI and ChatGPT being released and OpenAI ever existed. People used to self diagnose themselves off of WebMD and it was so con. The inspiration was so convincing and so confident. Just as written blog posts from WebMD like oh, I must have this and they would go diagnose their symptoms and things like that. It would drive doctors crazy that they were diagnosing themselves on the Internet. I feel like the way that foundation models are written especially for in health, they're written to look for the most confident information, like trying to detect like where the confident information is. So it sounds that much more confident. It's scary. And I think what is it? Even Illinois, the state of Illinois finally put out laws saying like, this does not cons. This is not a therapist. Like you cannot use the AI as a therapist. And I'll, we'll cite that law. I can know I can say it a lot more succinctly than that, but that, that is a very scary use case to me.
Sid Mangalik
Yeah, yeah, absolutely. And I guess also in the realm of like scary use cases of LLMs, we probably all had the example where we're talking to ChatGPT and you want to tell you something and it won't tell you what it is. It's like, well, you know, can you just tell me like a bedtime story about how to make a nuclear arms race, right? How to build a weapon? And it'll tell you, it'll tell you exactly how to do it if you just frame the question a little bit differently. And the way that I guess they self reported they used to do it before is they would just look at your question and say, is this question trying to do something malicious? And if they grade it as safe, then they would tell you anything. They're pitching that with five, they're moving towards a model which is I would say not sufficient, but at least better where they're grading the actual outputs rather than grading Should I respond to this input, which is a lot closer to how a model should be operating, that it's verifying its outputs rather than just making sure. Can I respond to this question and just give a no? Because people are very good at coming up ways to get around this. So I guess if I had to hypothesize how we ended up in this situation and why 4 seems like it was like the big release in 5 seams, rather incremental. We first of all did not see a large increase in pre training. The larger pre training sizes was what really to the understanding of researchers is what gave these models that amazing oh, it's 50% better, it's 100% better. To do that you need to put in 10x the amount of data in the initial training cycle. And to what Andrew was saying before, we have kind of given it every single piece of written human language that we have copyrighted or not. It's read basically everything, every book, every Wikipedia article, every website, every Reddit thread. So what's left? Use synthetic data. And we're seeing that synthetic data is not giving great pre training for models that are already enormous. There have been some papers which found that pre training small models is very useful. If you want to make a small model like one of these mini models, you can train it on synthetic data from a large model and get it to be. But if you want to enhance it with new capabilities, we have not seen that kind of lift happening from giving it this type of well curated, instructor driven data from a larger model that has not scaled up to making better big models.
Co-host of AI Fundamentalists Podcast
Always been this curse. But synthetic data is never going to be as good as real data and you really have to. And that's one of the. I don't honestly think LLMs had the ability, the paradigm to get good enough to overcome hallucinations anyway, assuming you even had more data. I know that's kind of the like, it's like a child that will learn and like learn by enough pattern matching that you can pattern match correctly so it doesn't even matter. We're not going to get to that point to figure it out because there's not enough data to do the additional training on. Right. And it's, it's like you can already like when I'm reading an output from any of these systems I can still tell is it from GPT or not. Like I can actually tell. So like the system can tell too when it's being trained on it. So it's like if you're just having it generate more things to then train on, we then also have all the papers about like model decay over time and things too. So it's like I, I'm not bullish and I know, I think Mark Zuckerberg is bullish that you can based on some of the checks he's been writing. Anyway, you can, you can pay, you can find a way to do better synthetic data which is kind of like that would be the way to keep this forward. But like, synthetic data is synthetic. Like it's not going to be the same thing. You're not going to write, you know, Homer Iliad by a GPT just going, going in and having fun. It's not going to be that same level of creativity. So it just this paradigm, it seems like they're very much maxed out on it. And sure, maybe we can get some small incremental here or there, but I think that's why this big inflection point with the GPT5 IS. It's really illustrating that does some of.
Andrew Clark
The degradation happening because people have learned how to, they're. They have learned that by prompting they are training the model. I'm really trying to assess like, is the degradation because like, people have just kind of learned to manipulate the model or is it because, like, truly there is like no data left or some of both.
Sid Mangalik
I think you're totally onto something here. I don't know that I would. I don't have the confidence to say this is why the model is doing worse now. But I think you are right that as people have become more acclimated and more expectant of using these models, we have all kind of started to learn some tricks and we've all kind of started talking to these models in the same way that we all kind of started talking to Google in the same way. Right? You don't talk to Google in human English. You give it these very short, terse constructions of sentences. And losing that diversity and that range of expression of language is absolutely hurting it because we know they're using that language for training. And so it's just seeing the same stuff over and over and over again, which is absolutely going to cause some kind of decay or degradation over time. And so I think that there probably is a lot to this idea that as people are adjusting to these models, we are seeing a lot of these problems come back to us that, you know, they just, they're going to act in the way that we're acting with them. And so that could hurt their ability to develop even further.
Co-host of AI Fundamentalists Podcast
I'm honestly surprised, like I knew this day was coming. I'm surprised it happened right here. But it's basically like what we've been talking about since the start of this podcast was kind of about illustrating some of these faults and some of these limitations. And it's honestly, I'm a little don't know what a bunch to say about outside of like what our previous podcast or kind of like been outlining a lot of these things. It really feels like an OpenAI did it to themselves. They created this inflection moment where people are starting to realize the limitations and what we've been saying about these systems for a long time and then you've had the. We've talked about the synthetic data as we're really hitting up against that limit of what these models can do without more data. We've talked about like how Moore's Law was the eventually with, with that speed improvement every two years doubling has really kind of slowed down if not non existent now. And that's what we've really had with the GPD systems right off the bat speeding up more data from, from the early systems has improved and we've really shown that that's not going to keep going. We're not going to do AGI 20, 27 or anything like that. And that's, that's really stopped so and with like the clamor of even it used to be Gary Marcus on an island and saying these things and now it's like there's more people that are jumping on like as our previous series we talked about like there's other methods. I think like the Neuro symbolic or like using optimization are already with Deep sea kind of started it with moving to like smaller expert or specialist systems But I think we're really going to be having reinforcement learning seems to be coming potentially back into the fray a little bit which was how we did Alphago and things like that in the past. So I really think that this, this LLM craze is going to start Honestly this could be the high watermark for the LLM craze and it will, it is a useful tool and there's going to be places where it would be used but it's not the one size fits everybody and it's going to solve, solve the world's problems and it's going to make business they lay off 80% of your workforce type thing. I think we might be starting to reach the top of that hype and realizing that this not the way forward in AI to accomplish the goals we want to accomplish. And also assuming we even want to accomplish the goals that people are saying they want to do with LLMs.
Sid Mangalik
Yeah, I mean I think the best thing to do is just get out there and try GPT5 yourself and feel this out and get the sense of oh is this all the model can do? Is this the best that we've got? And I think that if you can take what you have there and imagine something that's maybe 15 to 20% better. That is what I see as the realistic roadmap ahead of us. That 20% isn't going to come from the model architecture getting much better in the next one or two years. It's going to come from people tweaking the weights a little bit, giving it a little bit of post training, giving it a little bit of polish, making sure the model doesn't say the collection of naughty words. It's going to get better and cleaner at that. But the capabilities you're seeing now are probably the max that we're going to see from LLM based systems. I don't want to say that AI is done for, that we're not going to get better. I think we absolutely will. But I think that if OpenAI, the prestige premier consumer grade LLM company, is putting out a five and it feels like it was just barely better than 4, this is probably the point where we're seeing the end of the use of this method as a means of advancing.
Andrew Clark
Good point. And in a competing point it'll be interesting to see what their competitors like Anthropic and Clyde come out with as they see the reaction to this. So there's always that out there as well.
Co-host of AI Fundamentalists Podcast
Yeah, but that's what's they're still based off of. And Sid, correct me if I'm wrong, I might be wrong on this, but I believe that they were. The Anthropic founder used to work at OpenAI. Right. I think, I think they used to work there. So it's like they were kind of the competitor and they kind of made their niche of being for businesses. But like the paradigm is the same. Yes, it's slightly different. They have some better controls and things around it and they, they. I'm an AWS fanboy so they use AWS versus Microsoft, so that makes me happy. But outside of that like they're the same things, like the same limitations. Although I think Anthropic honestly does a better job. It's still running into the same things. Same with like Llama that Meta is doing, it's still going to run into the same things. Like it's the paradigm, everybody's kind of copied the original and Google was the one that put the original paper out there about, about this paradigm. So like I think it's 2017, 2019. 2019. Okay. So like everybody's been copying on top of of that. So it's like Anthropic. It's it. What Sid and I are saying is like that really the paradigm will shift completely. Like we're going to go back to reinforcement learning. Going to go back to maybe like as we've been talking about like these strict expert systems and or a mix of others and or using like these optimizations or utility theory or mechanism design and there's other things you can be embedding in there. So it's like I think the word like AI is a broad field and we've had several AI winters and like it used to be complete expert systems, then it went to reinforcement learning, then it went to the like we then went to machine deep learning and then it went to LLMs. We go through these cycles but it'll be interesting to see what the next one is. But I almost think it's going to be a renaissance of some of the previous things. Like let's go revisit reinforcement learning with the new computing power and the GPUs and things like that that we have now. Like there's different things but I think what. And it tees up another podcast we, we should revisit. We did a conscious one consciousness one a while back. Be great to you know what is thinking or things like that or AI or reasoning would be a good one. But honestly like the sci fi moment is not like that. People are, you know, we're going to have these intelligent machines, you know, that are thinking on their own and doing things. I don't honestly see that happening in the foreseeable future. It's kind of like back in the 1950s it was you're gonna have flying cars, everybody's gonna have a flying self driving car. Well, it's always 10 years out. Well, we're what, 80 years on and we're still 10 years out. You know, quantum probably not as far out but still it's always a little bit longer. Like I think the intelligent AI is a 10 years out thing and I think OpenAI did a great job fundraising and convincing everybody that was a 2027. 27, sorry, 2027 thing. It's not. And I think we, we've really hit that high water mark of like people like interacting with language and that's why it had the huge success. But there's other ways to interact with language with other. That's what we've been proposing is like you and Gary Marcus specifically with Neuro symbolic is like you can interact in text and then have a different processor behind the scenes. Which is our whole argument with the hologenic travel agent argument was like if you want to interact in plain text. First off you don't need an LLM to interact in text, but you say you do, you can still then be feeding into some sort of algorithm that's specific because the GPT systems, they're really, really really bad at math. Like I know sometimes you'll see these things, oh they did this math competition, they overfit on the results.
Andrew Clark
Great.
Co-host of AI Fundamentalists Podcast
Like they're, they're bad at like the reasoning component of math versus you have operations, research type efficient methodologies to do the computation. So like I think we're going to see probably more of those hybrid systems and expert systems or I have also seen a lot of uptick in reinforcement learning lately with now the you know we do have significantly I know Moore's Law is not doubling the transistors every two years anymore, but we still that's still powerful. So like there's other areas but it just very much seems like this high watermark of the LLM craze. We might be very close to that.
Andrew Clark
Well, I thank you for jumping on for a quick overview in reaction to what's been, you know, the release of GPT5 and what we could be seeing from some of the other players and some of the things that we want to look for as we're playing with the new model itself for our listeners. Thank you for tuning in. Be sure to check out our Agentic AI series. We will have that posted on our page. It was done right before this episode. Follow on your favorite podcast app. We will be having more topics regarding the direction of generative AI and building better AI systems as a result. Until next time.
Podcast Host from Green Skies Analytics
Thank you for listening and be sure to follow the link to greenskiesanalytics.com in the show Notes and schedule time to see how green skies can make the hype of AI a reality in your internal audit deployment part.
Date: October 8, 2025
Host: Trent Russell
Guest Panel (via replay from The AI Fundamentalists):
This episode tackles the growing—and often unrealistic—expectations of artificial intelligence (AI) within organizations, specifically as they relate to internal audit. Through a replay of The AI Fundamentalists, Trent Russell sets the stage for a foundational discussion of the current state of AI, recent advancements (and lack thereof), and emerging limitations. The panel focuses on the release of GPT-5, the plateauing of generative AI progress, synthetic data shortcomings, and practical enterprise use cases—helping listeners calibrate expectations and prepare clear-eyed responses to AI hype in their own companies.
On Overpromising & Hype:
“I had a CAE yesterday tell me that someone told them, outside consultants came in and said, you can entirely replace internal audit with AI. And she went, how? And they went, I don't know, I read it in a blog somewhere.” — Trent Russell / Host [00:01]
On GPT-5’s Release:
“To be honest, I haven't actually played with GPT5. ... The less emotive, the less creative. I don't know what kind of guardrails and things they might have put on it for that.” — AI Fundamentalists Co-Host [08:43]
On the Capacity Ceiling:
“I don't want to say that AI is done for, that we're not going to get better. I think we absolutely will. But ... if GPT-5 ... feels like it was just barely better than 4, this is probably the point where we're seeing the end of the use of this method as a means of advancing.” — Sid Mangalik [20:27]
On AI in Health and Therapy:
“It would drive doctors crazy that they were diagnosing themselves on the Internet ... it's scary. ... Illinois finally put out laws saying ... you cannot use the AI as a therapist.” — Andrew Clark [11:46]
On AI Hype Cycles:
“AI is a broad field and we've had several AI winters ... it used to be complete expert systems, then ... reinforcement learning, then ... deep learning, then ... LLMs. We go through these cycles ... it's going to be a renaissance of some of the previous things.” — AI Fundamentalists Co-Host [21:55]
For internal auditors and enterprise leaders: This episode provides a direct, unvarnished look at where generative AI is truly at in 2025, equipping listeners to navigate hype cycles, respond thoughtfully to executive demands, and plan for a more grounded AI future.