Steve Gibson (85:32)
Yes, his posting provides a long, deep and detailed glimpse into the inner workings of Cloudflare's bot behavior discovery, detection and traffic routing system. So for anyone who may be interested and curious about the inner workings of one of the Internet's premier bandwidth providers, I commend Matthew's entire posting, which will satisfy even the most deeply curious among us. A link to it is in today's show notes, but for most of us, understanding just a little something about the nature of that cord someone tripped over will likely suffice. Fortunately, Matthew, or whoever may have assembled this posting for the public, you know, to which he applied his name. I don't know if he writes his own stuff. I mean, hopefully he doesn't have time. But whoever it was is a skilled writer who began that detailed posting with a very nice summary of the cord-tripping-over adventure. So here's what the world learned last week. They wrote: On 18 November 2025 at 11:20 UTC. Now, that would have been 3:20 am for us on the West Coast or 6:20 am on the East Coast of the US. They wrote: Cloudflare's network began experiencing significant failures to deliver core network traffic. This showed up to Internet users trying to access our customers' sites as an error page indicating a failure within Cloudflare's network. And even the failure message was nice and fair. It showed three icons. You know, "You," meaning the browser, had a green check mark. It's like, yep, your browser is working. Then at the other end the icon showed a server and said the host is working. That's good too. In the middle was a red check, you know, a cross, that showed a Cloudflare error, and the big title on that was Internal Server Error. So something was wrong. They wrote: The issue was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind.
Instead, it was triggered by a change to one of our database systems' permissions, which caused the database to output multiple entries into a feature file used by our bot management system. That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network. The software running on these machines to route traffic across our network reads this feature file to keep our bot management system up to date with ever-changing threats. The software had a limit on the size of the feature file that was below its doubled size. That caused the software to fail. After we initially wrongly suspected the symptoms we were seeing were caused by a hyper-scale DDoS attack, we correctly identified the core issue and were able to stop the propagation of the larger-than-expected feature file, replacing it with an earlier version of the same file. Core traffic was largely flowing as normal by 14:30. So that would have been a little over three hours after the initial collapse. We worked over the next few hours to mitigate increased load on various parts of our network as traffic rushed back online. As of 11, oh, I'm sorry, as of 17:06, all systems at Cloudflare were functioning normally. So that would have been two and a half hours more. We are sorry for the impact to our customers and to the Internet in general. Given Cloudflare's importance in the Internet ecosystem, any outage of any of our systems is unacceptable. That there was a period of time where our network was not able to route traffic is deeply painful to every member of our team. We know we let you down today. This post is an in-depth recount of exactly what happened and what systems and processes failed. It is also the beginning, though not the end, of what we plan to do in order to make sure an outage like this will not happen again.
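To make that failure mode concrete, here is a minimal Python sketch of a loader with a hard cap on feature-file entries, and of what happens when duplicate rows double the file. This is an illustration only, not Cloudflare's code: the names `MAX_FEATURES`, `load_features` and `load_or_keep_previous`, and the limit of 4, are hypothetical and chosen purely for the example.

```python
# Illustrative sketch only. All names and the limit are hypothetical,
# not Cloudflare's actual code or values.
MAX_FEATURES = 4  # hypothetical hard cap on entries in the feature file


class FeatureFileTooLarge(Exception):
    """Raised when a feature file has more entries than the loader allows."""


def load_features(entries, limit=MAX_FEATURES):
    """Strict loader: refuses any file with more entries than the limit."""
    if len(entries) > limit:
        raise FeatureFileTooLarge(
            f"{len(entries)} entries exceeds limit of {limit}"
        )
    return list(entries)


def load_or_keep_previous(entries, previous, limit=MAX_FEATURES):
    """Defensive loader: on an oversized file, keep the last known-good set."""
    try:
        return load_features(entries, limit)
    except FeatureFileTooLarge:
        return previous


good_file = ["f1", "f2", "f3"]
doubled_file = good_file * 2  # duplicate rows double the file to 6 entries

current = load_or_keep_previous(good_file, previous=[])
current = load_or_keep_previous(doubled_file, previous=current)
print(current)  # ['f1', 'f2', 'f3']
```

The strict loader stands in for software that simply fails once the file passes its limit. The defensive variant shows one possible alternative, falling back to an earlier version of the file, which mirrors the recovery step described in the summary above.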
And then at the bottom of page 11 of the show notes, where we are, I have a link to this beautiful, very lengthy posting. So something broke in the deep infrastructure of Cloudflare's systems and a huge portion of the Internet went dark for between three and five and a half hours. A critic might ask how they could not have some backup system in place to keep this from happening. But I believe that the fairer observation would be that the world has grown so dependent upon the world-class services Cloudflare provides specifically because events such as these, while not the first time and probably not the last, are few and far between and have been relatively brief. Cloudflare has competitors. It's true there are alternatives and someone could move. But for the sites that seek shelter behind the protections provided by Cloudflare's attack-absorbing size, there's no reason to believe that anyone else would be able to offer a better solution. A full reading of Matthew's explanation of the event will leave anyone with a deep appreciation of just how much complexity is required to offer the attack resilience and reliability that keeps Cloudflare's customers from wondering whether there may be greener pastures. To me, that seems unlikely, although I'll admit, I know, to have become something of a fanboy for Cloudflare. That's only and entirely because they have gradually earned my fandom over many years due to their ethics, their communication and, as you said, Leo, their transparency. I find no fault with them. So yeah, you know, they had an oopsie, and the oopsie knocked a huge chunk of the Internet down for a painful three to, between three and five and a half hours. But you know, they understand what happened, they fixed it, and they're back up. And we noted that there have been a number of major outages in the last couple of weeks. These systems have become very complex, and with complexity comes frailty.
I mean, they become brittle, and small mistakes have a tendency to explode. So that's what we saw here. Okay, so it appears to be human nature to feel the need to find someone to blame when something bad happens. And during event recovery is often the worst time to make big changes, since overreaction appears to be another common human foible. We saw this effect in the U.S. state of Mississippi where, following the tragic suicide of 16-year-old Walker Montgomery, which was precipitated by his interaction with scammers on social media, Mississippi enacted the Walker Montgomery Protecting Children Online Act, which requires anyone of any age accessing any social media service within the state to provide acceptable, unspoofable proof of their age and, in the case of any minors, to obtain the permission of a parent or guardian. Everyone believes Mississippi's regulation, their law, is like a huge overreaction to what happened. But overreaction is what we do. And you know, while this remains a focus for this podcast since it turns on First Amendment rights, the need for robust privacy-preserving online age verification, and the potential for the use of VPNs for geo-relocation as a measure to avoid whatever state-level blocks or filters may be erected, that's not what made me think of this Mississippi overreaction today. I was reminded of that previous overreaction to events due to what appears to be happening in the United Kingdom in the wake of what we all agree was a shockingly significant Jaguar Land Rover cyberattack-driven outage. They have to be held accountable for this outage. And we learned that they didn't have cyber attack insurance. No one's really explained why that's the case, but you know, it took them down for a long time and there was a ripple effect out to their suppliers, because they stopped being able to purchase anything through their supply chain.
And so lots of their smaller suppliers, who didn't have any ability to withstand an order shortage, were on the verge of bankruptcy. So, reported yesterday in The Record is their coverage with the headline: Software Companies, get this, Leo, Must Be Held Liable. Software Companies Must Be Held Liable for British Economic Security, Say the MPs. Okay, now, our longtime listeners know that I've often noted with some surprise that since the earliest days software has enjoyed a unique position with regard to product liability. Under the license by which software is used, its users agree to hold software publishers harmless in the event of anything whatsoever that might happen, even as a direct consequence of the software's use, misuse, or of its complete failure of any sort. It really is somewhat amazing to see what the entire software industry has gotten away with so far. But as the world grows ever more dependent upon software, and as the major vendors of that software grow ever more rich and wealthy without consequence or liability, and as Western legislators appear to be losing whatever shyness they may have once felt toward the big mystery that is software, one is led to wonder whether the strength of this long-enjoyed exception to the rule may be waning. The Record writes: An influential committee of lawmakers warned on Monday that a lack of liability for software vendors, get this, is among the most pressing issues putting Britain's economic and national security at risk. A lack of liability for software vendors is among the most pressing issues putting Britain's economic and national security at risk. Wow. The report by the Business and Trade Committee says economic threats facing the United Kingdom are multiplying and in the years ahead will grow exponentially, leading to a huge increase in the private ownership of public risk.
While calling on the government to take action to manage these threats more broadly, the committee identified three specific measures to address cyber security risks, quote, introducing liability for software developers, incentivizing business investment in cyber resilience, and mandatory reporting following a malicious cyber incident. Those are the three. The report follows a series of cyber incidents in the UK, including a cyber attack on Jaguar Land Rover, which the committee's chair, Liam Byrne, described as a cyber shock wave ripping through our industrial heartlands. The attack on Jaguar Land Rover, as well as a spate of ransomware incidents affecting grocery retailers, quote, highlighted not just the disruptive impact but also the potential public cost of increasingly frequent cyber attacks, warned the committee's report. So what of software liability? Since the industry's early days, software has been sold to users. This is their report. Software has been sold to users either as a service or as licensed intellectual property, not as a product with traditional liability standards for defects. Supporters of the current system, including the Business Software Alliance, the BSA trade association which includes Microsoft, Oracle and Amazon Web Services among its membership, have lobbied against introducing, oh, you bet they have, a liability regime by arguing it would damage the economy by stifling businesses' ability to innovate. Okay, now I'll just interject to note that this would be an astonishing, nearly unimaginable change. Can you imagine Microsoft being held responsible for all the specific instances of damage caused by bugs and security failures in their software? Wow. Or Cisco, or Google with Chrome? As I said, it would be a truly unimaginable change to the software industry, and a strong argument could be made that accountability would indeed kill the golden goose.
The Record continues their reporting, writing: Critics of the status quo, including the National Cyber Security Centre's, Britain's NCSC, Chief Technology Officer Ollie Whitehouse, argue that the current system is already causing economic damage. The issue, as Whitehouse explained earlier this year, is the economic concept of a negative externality, a cost caused by one party but financially incurred or received by another, such as a factory emitting dangerous pollutants. The current situation externalizes the cost of insecurity onto the users of the software, rather than internalizing it by forcing the developers to accept the costs of designing better software. Whitehouse said, quote, the reality is that in 2025 we know how to build secure products and services, unquote. And we know he's kind of right, right? This podcast has articulated a number of simple policy changes, not even fewer bugs, but in the deliberate design and deployment of devices, which would have the effect of dramatically changing the security profile of the Internet over time. But for example, since no one can hold Cisco accountable when anyone anywhere accesses their devices' insecure remote management consoles, they have no incentive to implement a change that would also likely increase the technical support burden on them. So, as Ollie Whitehouse here correctly noted, the costs of Cisco's failures are externalized onto their customers. The Record says a liability model would push the cost currently borne by society back onto the companies themselves, rather than allow those companies to profit from the systemic risks their insecure products disperse throughout society. Ouch. Despite some interest in the idea in the US under the Biden administration, President Donald Trump has signaled a dislike of the concept, signing an executive order. Well, he saw who he was surrounded by during his inauguration. Signing an executive order earlier this year.
Scrapping requirements for software companies who sell to the government to attest their products are secure. We don't want them to have to do that. Alongside its work in the U.S., the BSA also lobbied to change the liability regime being introduced in the European Union's Cyber Resilience Act. Oh. Although the law does not create an EU-wide civil liability regime, it includes, I'm sorry, it introduces the power for European regulators to fine companies who fail to develop secure software up to 2.5% of their global revenue. They'll feel that. The British government maintains a Software Security Code of Practice through the NCSC, but compliance with that code of practice remains voluntary. The committee recommended that the government require that companies follow the code as a matter of law, with enforcement agencies able to levy penalties against firms that fall short of the rules. Wow. So we learn that just as our previously bemused legislators have awoken to the fact that they can attempt to regulate the selective use of encryption and age-gated access to Internet content, they're also beginning to wonder whether the get-out-of-jail-free card that's been long held and used by the software industry may need revisiting. Like I said, unimaginable. But you know, maybe. Okay, a quick sci-fi note and then we will get into our main topic here. The second trailer, you know, what we once called a preview, of the movie made from Andy Weir's Project Hail Mary sci-fi novel appeared last Tuesday on YouTube, and since then, Leo, get this, when I checked, I guess it was yesterday, it has been viewed, this second official trailer, 15,727,169 times, and two of them were me. On the occasion of the first trailer, I created a GRC shortcut to make that first trailer easy to find for our viewers. That was grc.sc/hailmary, H-A-I-L-M-A-R-Y.
And you know, since YouTube has become a bit of a mess, there's a whole bunch of, like, weird knockoffs and people commenting on Hail Mary and so forth. Anyway, that'll get you to the first official trailer. I've done the same for the second trailer, but I gave this one an even shorter title: grc.sc/phm2, Project Hail Mary numeral 2, PHM2. Now, I do need to caution everyone about spoilers. Whereas the first trailer disclosed the essence of the dilemma faced by our reluctant hero, this one goes significantly further. And I won't say how, because even that would be a spoiler. I have a very good friend, as I noted, who loves movies and science fiction as much as I do, and he refuses to view trailers or learn anything about a movie that he knows he will eventually see. He doesn't read books, so he won't have read the book, you know, whereas in this case I've read it twice. And that brings me back to the dilemma posed by this novel, which I've read twice, being made into a feature-length film. It is a wonderful bit of science fiction. I mean, it is really great. Yet I believe that it must represent a huge lost opportunity. It should probably have been made into what has now become the standard-ish eight-part limited series, you know, as a streaming release. The book is so full of vivid detail, it is so fun and it's so rich and so much happens that I cannot see how it could possibly be crammed into a single feature-length theatrical release as a movie. But what do I know? I was also bitterly disappointed that so much of the original Jurassic Park novel failed to make it onto the screen. And that didn't seem to hurt its success any. So perhaps the preservation of an author's original pure intent is just for fiction geeks. You know, like many of us. And Leo, you said that you had heard or believed that they'd actually had to change the nature of what the story is.