Steve Gibson (124:25)
Yes. And that said, aside from the fact that the recent, truly astonishing advances in AI are going to directly impact everyone's lives outside of the security sphere, I'm also very certain that we're going to be seeing AI's impact upon the security of our software and operating systems. And we may not need to wait long. So over the course of the next few years, I'm sure that the topic of AI will be re-emerging. And I'm not saying I'm never going to talk about it again, because, you know, it'll just be fun to talk about the major advances that I expect we're going to be seeing. One of them, actually, which I'll be talking about in a second, is only about a month away. Our listeners have been following my journey through this topic, and it's not been a straight line. More than anything else, I endeavor to be an honest researcher, and an honest researcher will readily revise their entire belief system as required when presented with new facts and information. Clutching to obsolete dogma simply because it's familiar and comfortable is not the way of science, and it was because I was puzzled and confused by what I was experiencing firsthand that I went searching for that information. I believe I found it, and I believe I understand it at least as much as is possible without actually implementing it myself. And I've got other work to do, so that's not going to happen. And I've been changed by what I learned. As I said three weeks ago, I might have something to say about this before we met again today, and if so, I would probably enjoy sharing that with this audience in a special email over the holidays. Now, the possibility of that happening induced more than 1,100 of our listeners who had not already signed up for the Security Now mailing list to do so. So for that reason alone, due to that declaration of interest, I felt I had to say something today. I have much more to say on the topic than I did nine days ago, last Monday, December 30th, when I sent that out. But let's start with what those 15,060 subscribers received from me last week. Then I'll expand a bit on what I think are the most important points and what I've continued to learn since.

So what I wrote then was: When I first set about writing this email, my plan was to share what I had learned during the first half of our three-week hiatus from the podcast. But it quickly grew long, even longer than this, because I've learned quite a lot about what's going on with AI. Since I suspect no one wants to read a podcast-length piece of email, which I would largely need to repeat for the podcast anyway (which is what I'm doing now), I'm going to distill this into an historical narrative to summarize a few key points and milestones. Then I'm going to point everyone to a 22-minute YouTube video that should serve to raise everyone's eyebrows. So here it is.

First, everything that's going on is about neural networks. This has become so obvious to those in the business that they no longer talk about it. It would be like making a point of saying that today's computers run on electricity. Duh. Okay. AI computation can be divided into pre-training and test time, also called inference time. Pre-training is the monumental task, and it is monumental, of putting information into a massive and initially untrained neural network.
Information is put into the network by comparing the network's output against the expected or correct output, then back-propagating tweaks to the neural network's vast quantity of parameters to move the network's latest output more toward the correct output. A modern neural network like GPT-3, which is already obsolete, had 175 billion parameters interlinking its neurons, each of which requires tweaking. This is done over and over and over, many millions of times, across a massive body of "knowledge," which I have in quotes, to gradually train the network to generate the proper output for any input. Counterintuitive though it may be, the result of this training is a neural network that actually contains the knowledge that was used to train it. It is a true knowledge representation. Now, if that's difficult to swallow, consider human DNA as an analogy. DNA contains all of the knowledge that's required to build a person. The fact that DNA is not itself intelligent or sentient doesn't mean that it's not jam-packed with knowledge. In fact, the advances that have most recently been made, which I'll get to in a bit, are dramatic improvements in the technology for extracting that stored knowledge from the network. That's why I titled today's podcast AI Training and Inference. Inference is the second half. The implementation of neural networks is surprisingly simple, requiring only a lot of standard multiplication and addition, pipelined with massive parallelism. This is exactly what GPUs were designed to do. They were originally designed to perform the many simple 3D calculations needed for modern gaming. Then they were employed to solve hash problems to mine cryptocurrency. But now they lie at the heart of all neural network AI. Now, even when powered by massive arrays of the fastest GPUs rented from cloud providers, this pre-training approach was becoming, and is, prohibitively expensive and time-consuming.

But seven years ago, in 2017, a team of eight Google AI researchers published a truly groundbreaking paper titled "Attention Is All You Need." The title was inspired by the famous Beatles song "All You Need Is Love," and the paper introduced the technology they named Transformers. Actually, it was named that because one of the researchers liked the sound of the word. The best way to think of transformer technology is that it allows massive neural networks to be trained much more efficiently, in parallel. This insightful paper also introduced the idea that not all of the training tokens being fed into the network (the long string of data being fed into a model during one training iteration) needed to be considered with equal strength, because they were not all equally important. In other words, more attention could be given to some than others. These breakthroughs resulted in a massive overall improvement in training speed, which in turn allowed vastly larger networks to be created and trained in reasonable time. Basically, that paper solved the problem they were hitting six and seven years ago: training just took too long, which limited the size of the networks, which limited the quality of the networks. What happened then was that, thanks to this breakthrough, it became practical and possible to train much larger neural networks, which is what gave birth to today's LLMs, large language models. Now, the GPT in ChatGPT stands for Generative Pre-trained Transformer.
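Just to make that attention idea a bit more concrete, here's a rough back-of-the-envelope sketch in Python. This isn't anyone's production code, and the token vectors are random stand-ins; it's just the scaled dot-product attention arithmetic the paper describes, showing how each token ends up weighing some of the other tokens more heavily than others.

```python
# Toy sketch of scaled dot-product attention: each token builds a weighted
# view of every other token, with the weights (the "attention") unequal and
# data-dependent rather than fixed and evenly split.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # how much each token relates to every other
    weights = softmax(scores, axis=-1)    # each row sums to 1: unequal attention
    return weights @ V, weights           # weighted blend of the value vectors

# Four tokens, each represented by an 8-number vector; the values here are
# random stand-ins, where a real model would use its trained parameters.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
output, weights = scaled_dot_product_attention(tokens, tokens, tokens)
print(np.round(weights, 2))  # each row: how strongly one token attends to the others
```

The thing to notice is that each row of weights comes out unequal and depends on the data; in a real transformer the vectors feeding those weights come from the trained parameters, which is what lets the network decide which tokens deserve more attention.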
Pre-trained is the training; transformer is this technology. But over time, once again, researchers began running into new limitations. They wanted even bigger networks, because bigger networks provided more accurate results. But the bigger the network, the slower and more time-consuming, and thus costly, was its training. It would have been theoretically possible to keep pushing that upward, but a better solution was discovered: post-training computation. Traditional training of massive LLMs was very expensive. The breakthrough transformer tech that made LLM-scale neural networks feasible for the first time, well, now that was being taken for granted. But at least the training was a one-time investment. After that, a query of the network could be made almost instantly, and therefore for almost no money. But the trouble was that even with the largest practical networks, the results could be unreliable, known as hallucinations. Aside from just being annoying, any neural network that was going to hallucinate and just make stuff up could never be relied upon to build chains of inference, where its outputs could be used as new inputs to explore consequences when seeking solutions to problems. Being able to reliably feed a network's output back into its inputs would begin to look a lot like thinking, and thus inference for true problem solving.

Then a few years ago, researchers began to better appreciate what could be done if a neural network's answer was not needed instantly. They began exploring what could be accomplished post-training if, when making a query, some time and computation, and thus money, could be spent working with the pre-trained network. This is known as test-time computation, and it's the key to the next-level breakthrough. By making a great many queries of the pre-trained network and comparing multiple results, researchers discovered that the overall reliability could be improved so much that it would become possible to create reliable inference chains for true problem solving. Using the jargon of the industry, this is often called chains of thought, although I still object to giving too much credit, you know, to imbuing the technology involved with too much of the human brain. So inference chains would allow for problem-solving behavior by extracting the stored knowledge that had been trained into these networks, and the pre-trained model could also be used for the correction of its own errors. Now, I should note that the reason asking the same question multiple times results in multiple different answers is that researchers had also long ago discovered that introducing just a bit of random noise, which is called the temperature, into neural networks resulted in superior performance. And yes, if this all sounds suspiciously like voodoo, you're not wrong. But it works anyway.

OpenAI's recently released O1 model, which I talked about at the very end of last year, is the first of these more expensive test-time inference chain AIs to be made widely available. It offers a truly astonishing improvement over the previous ChatGPT-4o models that we were using. Since O1 is expensive for OpenAI to offer on a per-query basis, subscribers are limited to seven full queries per day. But the O1-mini model, which is faster and still much better, but not as good, can be used without limit.
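Before we get to the big news, here's a toy sketch, in Python, of those two ideas: temperature and test-time sampling. Everything in it is made up for illustration; the ask_model function is a hypothetical stand-in for a real query of a pre-trained model, and the candidate answers and scores are invented. It just shows how a little injected randomness makes repeated queries disagree, and how asking many times and keeping the consensus answer, one simple form of spending test-time computation, can buy back reliability.

```python
# Toy illustration only: made-up numbers, hypothetical ask_model() stand-in.
import math
import random
from collections import Counter

def sample_with_temperature(scores, temperature=0.8):
    # scores: the model's raw preference for each candidate answer.
    # Dividing by the temperature before normalizing flattens or sharpens
    # the preferences; higher temperature means more randomness.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(range(len(scores)), weights=probs, k=1)[0]

def ask_model(question):
    # Hypothetical stand-in for one full query of a pre-trained network.
    # A real model would produce these candidates and scores itself.
    candidates = ["42", "41", "43"]
    scores = [2.0, 1.2, 0.8]
    return candidates[sample_with_temperature(scores)]

def self_consistent_answer(question, n_samples=20):
    # Test-time computation in its simplest form: ask many times,
    # compare the answers, and keep the one given most often.
    votes = Counter(ask_model(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(self_consistent_answer("What is 6 x 7?"))
```

Real chain-of-thought systems do far more than vote, but the basic trade is the same: spend more queries, and thus more money, at inference time to get a more reliable answer out of the same pre-trained network.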
But wait, there's more. The big news is that during their celebration of the holidays, OpenAI revealed that they have an O3 model that blows away their brand-new O1 model. It's not yet available, but it's coming soon. What is available are the results of its benchmarks, and that's why I believe you need to make time to watch this YouTube video. I created a GRC shortcut with this episode number, which is 1007. So grc.sc/1007 will bounce you to, I think it's a 22-minute, YouTube video talking about the independent benchmarks that have been run against this O3 model. Okay, so is it AGI? OpenAI is saying not quite, but there's little question that they're closing in on it. As you'll see in that video, the performance of OpenAI's latest O3 model, when pitted against independent evaluation benchmarks designed specifically to measure the general reasoning strength of AIs confronted by problems that were absolutely never part of the AI's training set, demonstrates reasoning abilities superior to most humans'. You need to watch the video: grc.sc/1007. Even if it were AGI, and we're probably not far from that, people are saying it is, I don't care. But that doesn't mean it's taking over. The AGI designation is only meant to indicate that, over a wide range of cognitive problem-solving tasks, an AI can outperform a knowledgeable person. Computers can already beat the best chess, Go, and poker players. I think it's very clear that today's AIs are not far from being superior to humans at general problem solving. That doesn't make them Frankenstein's monster, to be feared. It only makes AI a new and exceedingly useful tool. Many years ago I grabbed the domain clevermonkeys.com just because I thought it was fun. It occurs to me that it takes very clever monkeys indeed to create something even more clever than themselves. All the evidence I've seen indicates that we're on the cusp of doing just that.

Okay, so with that, with a little bit of editing to improve it, that's what our listeners received from me over the holidays. If you take nothing else away from this discussion of AI today, here is the one point I want to firmly plant into everyone's mind, because this is the sticking point that I see everywhere: Nothing that was true about this field of research yesterday will remain true tomorrow. Nothing. This entire field of AI research is the fastest-moving target I have ever experienced in my nearly 70 years of life. There are a number of consequences to this fact. For one, no book about AI that was written a year ago, or six months ago, or even last month will be usefully up to date about what's happening today. Books written in the past can definitely be useful for describing the history of AI and as a snapshot of a point in time, but even their predictions will prove to have been wildly wrong. The guys at OpenAI, who are working on this and ought to know, believed two years ago that at least another decade, another 10 years, would be needed to achieve what they announced last month and are getting ready to unveil. They thought it would take 10 years. It took two. One of the factors facilitating this astonishing speed of development is that it turned out that much of what was needed was scale. And a weird side effect of cloud-side computing is that it's massively scalable. If you can pay to rent it, you get to use it. So investor dollars were pumped into the training of ever more complex models, and they kept seeing surprising improvements in performance.
Leo's original appraisal of large language models as fancy spelling correctors was an accurate and useful from-the-hip summary of OpenAI's ChatGPT 3 model. That's their take on it, too. ChatGPT 3 produced grammatically correct language, but it only coincidentally and occasionally produced anything highly meaningful. If it was left to keep talking, it would soon get lost and wander off course to produce grammatically correct nonsense. Even so, back then, highly creative people who operate on the cutting edge, like MacBreak Weekly's Alex Lindsay, were using the ChatGPT 3 model as a source of new ideas and inspiration. As I wrote this, I was reminded of how popular formal brainstorming once was, where sometimes random ideas were just tossed out without any filtering, and that was, you know, that was the entire point: to say something as a means of inspiring some new perspective. So even ChatGPT 3 was useful for the nonsense that it sometimes produced. But as a consequence of everything I've learned over the past three weeks, and of the events which have transpired since our previous podcast three weeks ago, podcast 1005, titled The Wizard of Oz.