Meta Apps and JavaScript Collusion
Leo Laporte
It's time for Security Now. Steve Gibson is here. We're going to talk about an Apple research paper that explores whether these new large reasoning models are really doing any thinking. I'll give you a hint: they're not. Goodbye to Bill Atkinson. And then, what is the Linux Foundation's response to the WordPress kerfuffle? All that and more coming up next on Security Now.
Steve Gibson
Podcasts you love from people you trust.
Leo Laporte
This is TWiT. This is Security Now with Steve Gibson, episode 1029, recorded Tuesday, June 10, 2025: "The Illusion of Thinking." It's time for Security Now, the show where we cover your security, your privacy and everything else. Frankly, we're going to do AI today with this guy right here. He's the king of the hill as far as security goes and, frankly, as far as geekiness goes. Mr. Steve Gibson. Hello, Steve.
Steve Gibson
Hello Leo. Great to be with you again.
Leo Laporte
Nothing is more geeky than the clock with the milliseconds behind me. Show everybody. Let's...
Steve Gibson
Just take 50 minutes to discuss this. Oh, show us how it's a clapper. We need to see the clapper.
Leo Laporte
It's a clapper. It does all this stuff. It's really... I think it's very cool. This is the new clock, because people want a clock behind me for some reason. Yeah.
Steve Gibson
It came yesterday, right?
Leo Laporte
Yeah.
Steve Gibson
What's coming tomorrow?
Leo Laporte
You'll see.
Steve Gibson
Okay, we've got a great episode. Something happened with Meta and also Yandex which needs to get some attention, mostly because it's really interesting. We get to do a deep dive. It bears on a lot, as we'll see, and that's how we're going to largely start today's podcast. We're going to finish it by looking at some research that Apple's guys did, and I don't think this is sour grapes because, you know, they kind of missed the AI train. They did some tests using something other than math, which they argue is not a really good way to measure a reasoning model's ability to reason, because if a model is just really good at matching patterns, it can score better than people can, because we're not that good at math anyway. So, lots of fun stuff to talk about. We are going to start, as your programs have, Leo, by remembering somebody. I've got that on the first page of the show notes because he was amazing. We're going to talk about Meta's native apps and JavaScript colluding behind their users' backs. The EU has, believe it or not, rolled out their own DNS service, which, the good news is, works much better in the EU than it does here in the States. They didn't, you know, create it for some guy in Southern California with a benchmark, which is a good thing, because it's not good over here. Also, Ukraine DDoSed Russia's railway DNS, and we're going to pause on that briefly. The Linux Foundation has created an alternative WordPress package manager, because apparently there's some politics over in WordPress land that created some schisms. Oh, and a court has told OpenAI that they must not delete anyone's chats. Anyone's, not just select people's. We're going to dig into that. Also, there is a very bad, well, depending upon who uses Erlang/OTP's SSH library, vulnerability. If you do, hopefully you already know about this CVSS 10.0. There have been some questions raised about whether Russia is able to intercept Telegram messages. Seems like maybe, which would be a surprise.
Spain's ISPs blocked Google sites. Whoops. Reddit is suing Anthropic. Twitter's new encrypted DMs are apparently as lame as the old ones were. Also, it seems that the login.gov site doesn't have backups. What could possibly go wrong? Wow. And then we're going to look at an interesting way that Apple came up with to generate some really good metrics about to what degree this next generation, like o3 and Claude 3.7, the so-called, not large language models but large reasoning models, are actually reasoning, or whether they're just better at language than reasoning. So I think maybe this time, Leo, for podcast number 1029, we've actually got some interesting stuff to talk about.
Leo Laporte
Finally after a thousand podcasts we're getting.
Steve Gibson
The hang of it I think actually.
Leo Laporte
I'm really... this Apple paper is a research paper that's very, you know, scientific and deep, and I'm really thrilled that you want to dissect it, because I need some help. So we got charts, we got charts.
Steve Gibson
Charts coming up, charts coming up.
Leo Laporte
Of course, we count on Steve every week, and aren't we glad you're here? This is why we're here. Thank you, Steve. We'll get right to the meat of it in just a bit. But first, a word from our wonderful sponsor. We really appreciate the people who sponsor the show. They make it all happen, along with you Club TWiT members. Our sponsor for this segment is Hoxhunt. Not Fox Hunt: Hox, H-O-X, Hoxhunt. Look, if you're the security leader in your company, wow, you have a lot of respect. You get paid to protect your company against cyber attacks. These days, though, that is a tough job. There are more cyber attacks than ever. You've got phishing emails, and what's worse is now they're using AI. So you can't just say, if it looks ungrammatical, throw it out. You know, they all look very real, which means your legacy, one-size-fits-all awareness training programs don't really stand a chance. They send, you know, at most four generic trainings a year. Most employees go, "this again," ignore them and learn nothing. When somebody actually clicks on one of these fake emails, they're forced into embarrassing training programs that, honestly, if you've ever done it, you know, feel like punishment. You're being punished. And no one learns if they're being punished. That's why more and more organizations are trying Hoxhunt. Hoxhunt is so cool. I had a great conversation with them. Basically, they've gamified this training, right? It goes well beyond security awareness. It actually changes how people behave, and it does it by doing something humans really dig: rewarding good clicks and coaching away the bad. If you want to see it at work, you can see it at the website, hoxhunt.com/securitynow. You'll really get this. When an employee suspects an email might be a scam, they click a button. Hoxhunt will tell them right away, hey, nice job, you found it.
That provides a dopamine rush that gets your people, not to be more suspicious, but more, you know, paying attention and motivated to click and learn and, of course, protect your company, which is what you really need. As an admin, you'll love it, because Hoxhunt makes it easy to automatically deliver phishing simulations across not just email, but Slack and Teams. You can use AI to mimic the latest real-world attacks, so you can really compose some humdingers. Simulations are personalized to each employee ("Hi, this is your mom") based on department, location and more. This is so brilliant. You get these micro-trainings, not the big long "oh, you gotta watch a slideshow for an hour," but micro-trainings which really solidify and drive lasting, safe behaviors. This is the way it ought to be. You can trigger gamified security awareness training that awards employees with stars and badges. That sounds silly, but it's true. We all want stars. We do. Boosting completion rates, ensuring compliance. There's a huge library of customizable training packages you can choose from, but you can also generate your own with AI. Hoxhunt. This is so good. It has everything you need to run effective security training in one platform. Every company needs this. You can measurably reduce your human cyber risk at scale, but you don't have to take our word for it. Over 3,000 user reviews on G2 make Hoxhunt the top-rated security training platform for the enterprise: easiest to use, best results. It's also recognized as a Customers' Choice by Gartner. Thousands of companies use it, some of the biggest, you know, Qualcomm, AES, Nokia. They use it to train millions of employees all over the globe. Visit hoxhunt.com/securitynow right now to learn why modern secure companies are making the switch to Hoxhunt. And there's a great demo there. You'll see why it's fun. And you know what? If learning is fun, people learn.
hoxhunt.com/securitynow. We thank them so much for supporting Security Now. Actually, every advertiser on Security Now is a company that I really think you ought to check out, and this is one of them. hoxhunt.com/securitynow. Thank you, Hoxhunt. Steve, let's talk. We got a picture of the week.
Steve Gibson
We do. And I gave this one the caption: "If your kitchen oven challenges you to prove you're human, something has gone very wrong somewhere."
Leo Laporte
No, not a CAPTCHA. Wait a minute, we gotta look at this one. A CAPTCHA on an oven?
Steve Gibson
What the what? As Leo would say.
Leo Laporte
Oh, no.
Steve Gibson
And you can see this lady. You can see the reflection of her face, with her wearing glasses, in the screen of the oven. Somehow she is being asked... The CAPTCHA that is there on her oven screen is saying "click all the buttons that contain traffic lights." You can see the words "traffic lights."
Leo Laporte
Yeah, this is a SmartThings. This must be a Samsung oven. That's crazy.
Steve Gibson
It is not so smart.
Leo Laporte
But what's not smart is that they build browsers into their appliances. That's just dumb. And by the way, I have a close friend who has a Samsung refrigerator. She can't use the browser because it's out of date, and that happens so quickly. Oh, but this is even worse. Oh yeah.
Steve Gibson
So gee, all I want to do is warm up the pie and I've. Let's see, where are the traffic lights? And does it.
Leo Laporte
Oh my God.
Steve Gibson
Anyway, if the kitchen oven challenges you to prove you're human, well, it has...
Leo Laporte
You got the wrong oven.
Steve Gibson
A little more technology than it should have.
Leo Laporte
Yeah.
Steve Gibson
So I wanted to take a moment to note with sadness.
Leo Laporte
Yeah.
Steve Gibson
As a huge... well, the Internet has really responded to the passing of Bill Atkinson, who died last Thursday, June 5, after losing his battle with pancreatic cancer. And as you noted, that's also, of course, what took Steve Jobs 14 years ago, back in 2011. Steve was only 56 at the time. Still, Bill went too soon. He was 74, born in 1951. I got a kick out of what he wrote about himself, in the third person, for the About page of his website. He modestly said: "Aside from being a nature photographer, he is also well known in the world of software design. Years ago, as a member of the original Macintosh team at Apple, he helped design much of the initial Macintosh user interface and wrote the original QuickDraw, MacPaint and HyperCard software." And as I said, talk about modest. So Bill received his undergrad degree from UC San Diego, which is where he also met the now also famous Apple alumnus Jeff Raskin, who also died of pancreatic cancer. And isn't that kind of bizarre, Leo? Like, were they all drinking the same strange potion that Jobs came up with, or what? How could all three of these guys... I don't know. It just seems bizarre. What are the chances? But on the other hand, you know, we tend to see patterns even when they don't exist. Anyway, Jeff was employee number... Bill Atkinson was... I'm sorry, Jeff Raskin was Apple employee 31.
Leo Laporte
Right.
Steve Gibson
So, and he met...
Leo Laporte
Was 51. So.
Steve Gibson
Yes. Yeah, and he met Jeff at UC San Diego, where Jeff was one of his professors. Then later, Bill Atkinson studied neuroscience, neurochemistry, at the University of Washington. Then, while he was closing in on his PhD, Raskin invited Atkinson to visit Apple, where, of course, Steve Jobs got his hooks into him and persuaded him to forget about school. "You don't need one of those degrees, you know, who needs that? Join the company and, you know, change the world." And of course, Jobs can be very persuasive when he wants to be. So Atkinson became employee 51 at Apple. Bill became the principal designer and developer of the GUI for Apple's Lisa, and later became one of the first 30 members of the original Apple Mac dev team, where he also principally designed the Mac's UI. He is the author of MacPaint which, at the time, and I'm sure you remember this, Leo, made our jaws drop. I mean, MacPaint was an astonishing piece of work. No one could believe it. And it was built upon the foundation of the QuickDraw toolbox, which Bill had first written for the Lisa, and...
Leo Laporte
1978, just to put it in perspective. I mean, this was a long time ago.
Steve Gibson
And then ported that to the Mac. And need I note that QuickDraw was 100% pure Motorola 68000 assembly language, because that's the only way you could get these machines to perform. I mean, in order to create a reasonably priced consumer PC back then, you basically had a processor and some fancy hardware to map some memory onto the screen. But there was no GPU. It was all bit-banging, as we called it, in order to...
Leo Laporte
...draw all this. In order to, you know... Bill studied with Raskin at UC San Diego, where of course UCSD Pascal came from. And Bill really wanted Pascal on the Macintosh. And everybody said no, you can't put UCSD Pascal on the Macintosh. Bill went home and, in six days...
Steve Gibson
I'm trying to get six fingers on.
Leo Laporte
The screen. And he wrote it and made it work on the Macintosh. So I wrote 68000 code on my original Mac, but I also remember very well Macintosh Programmer's Workshop and being able to write in Pascal.
Steve Gibson
That was what impressed Steve and forever changed Jobs' opinion of Atkinson. And the key to this was that UCSD Pascal was based on a pseudo-machine, a p-machine. So this was the brilliant thing that Bill Atkinson realized: all he had to do was implement the UCSD Pascal p-code pseudo-machine in Motorola 68000 code, and then all of the rest, the compiler, the editor that was part of UCSD Pascal, and all the apps and everything, would start running. So it was the perfect thing to do in under a week in order to say, okay, we've got UCSD Pascal now. And actually, it was a very nice Pascal. I didn't mess with it on the Mac, but I did on the Apple II, because the Apple II also had UCSD Pascal. Maybe it was using a SoftCard. I don't quite remember now, but I remember that I wrote something that solved some sort of puzzle. I think it was one of the peg-jumping puzzles of the time, and I did it recursively.
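The portability trick described here, rewriting only the pseudo-machine interpreter for a new processor so that already-compiled programs run unchanged, can be sketched in Python. This is a toy stack machine with made-up opcodes (PUSH, ADD, SUB, MUL); it is not the real UCSD p-code instruction set, and `run_pseudo_machine` is a hypothetical name.

```python
# A toy stack-based pseudo-machine, in the spirit of UCSD Pascal's p-machine.
# The opcodes are hypothetical, NOT real UCSD p-code. The point: only this
# interpreter loop must be rewritten for each new CPU, while the compiled
# pseudo-code programs it runs stay exactly the same.


def run_pseudo_machine(program):
    """Execute a list of (opcode, *operands) tuples and return the final stack."""
    stack = []
    for instruction in program:
        opcode, *operands = instruction
        if opcode == "PUSH":
            stack.append(operands[0])
        elif opcode in ("ADD", "SUB", "MUL"):
            right = stack.pop()  # top of stack is the right-hand operand
            left = stack.pop()
            if opcode == "ADD":
                stack.append(left + right)
            elif opcode == "SUB":
                stack.append(left - right)
            else:
                stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")
    return stack


# (2 + 3) * 4, "compiled" once into portable pseudo-code
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
print(run_pseudo_machine(program))  # prints [20]
```

Porting this to another processor would mean rewriting only `run_pseudo_machine`; the `program` list, standing in for compiled p-code, stays the same.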
Leo Laporte
I think, now that you say that, the port was for the Apple II, not the Mac, right? I don't know.
Steve Gibson
Oh, maybe. I think it was. I'd have to go back and look. It may have been. Yeah, that would certainly make sense.
Leo Laporte
Yeah, it kind of makes more sense. What was interesting is that the interface for the Macintosh, and this is Inside Macintosh, the volume, is all in Pascal. So if you wanted to write to the Apple ROM, you could do that. That's right.
Steve Gibson
So Pascal would have existed, it would have been well in place by then.
Leo Laporte
So I think he did it for the Apple II.
Steve Gibson
And that makes sense too because MacPaint was a hybrid of Pascal and assembly language.
Leo Laporte
Some of those low-level QuickDraw routines obviously had to be in assembly, right?
Steve Gibson
Yeah. Pretty impressive, anyway. And of course, Bill also then famously designed and implemented HyperCard, which gave non-programmers access to programming and database design. And in fact, years later, in 1994, Bill Atkinson received the EFF Pioneer Award for his contributions in the field of personal computing. You interviewed him at length on Triangulation.
Leo Laporte
We did four with him. One of them was a five hour interview.
Steve Gibson
Yeah, yep. And you chopped it up into two pieces. So I wanted to let our listeners know that you guys did a great job with Bill Atkinson. For anybody who wants to listen to him and watch him being interviewed by you, you know, it was great.
Leo Laporte
I felt very fortunate to be able to spend so much time with somebody that I admired so much.
Steve Gibson
Well, and all of his PhotoCard stuff that you talked about for years was incredible. Yeah.
Leo Laporte
If you go to my blog, leo.fm, John "JammerB" Slanina took a bunch of pictures of Bill, and I took a few of him on our set. And you'll notice, by the way, he brought the sidekick, he brought the Macintosh, he brought a lot of stuff. And I think it was Alex Gumpel who had an unopened copy of HyperCard that Bill signed for us. Just incredible. I have a link there to all of the interviews that we did with Bill over a period of time. The first one was in 2016 at the Brick House, and the last one was in 2018 in the East Side studio. Yeah, that was the one. We spent five hours together. For some reason, this one really hit me.
Steve Gibson
Well, he was a good guy.
Leo Laporte
He really was.
Steve Gibson
He was also, you know, 74. Let's not be leaving at age 74. I'm certainly not planning.
Leo Laporte
We're both getting close, and that's maybe another reason. But also, it hit me, I think, because this is a generation. Yes, Steve Jobs, Jeff Raskin, Bill Atkinson were a generation of people who changed computing forever. And we owe them so much, you know.
Steve Gibson
Yeah, Lorrie and I were talking about someone the other day who's, I don't know, 10, and they will never know a world that didn't have the Internet. They probably won't really be aware of a world that didn't have AI assistant stuff. I mean, they're growing up in an entirely different environment than we did. There's no comparison.
Leo Laporte
I guess that's why I feel it's incumbent on us to remind them of their elders, the people who made it all possible.
Steve Gibson
And then we just seem old. Oh, you.
Leo Laporte
Yeah, back in the day.
Steve Gibson
Oh well. Oh, okay. So, if anyone might be at all unsure about just how determined the likes of Meta are to surreptitiously track their users' movements around the Internet for the purpose of secretly profiling them, the news I have to share about a recent super-sneaky tracking discovery, something we've never talked about before, will disabuse anyone of any doubts along those lines. To quickly lay out what it does and how it works, the write-up begins with a quick overview. The guys who found this wrote: "We disclose a novel tracking method by Meta and Yandex potentially affecting billions of Android users." And I'll just say for the record, it's not only Android; this is cross-platform, but it's being done on Android. "We found that native Android apps, including Facebook and Instagram, and several Yandex apps, including Maps and Browser," get this, "silently listen on fixed local ports for tracking purposes." Okay, now I'll just interrupt to note that that's actually kind of diabolically brilliant, although I'm not endorsing it. It's not completely new. For example, my own native Windows SQRL client, and the other SQRL clients that people created, run in the user's machine and open and listen on port 25519. Of course, I chose that port because 25519 is the crypto that I used. It listens for connections from SQRL script running on login pages. The SQRL login JavaScript on a website's login page would send the SQRL client app, which is running on the user's machine, a unique token by opening a TCP connection to the localhost IP where the resident SQRL client app was listening. The SQRL client app would then connect to the remote site at the URL provided by the website, which contained a unique token. It would identify its user, i.e., the SQRL client app would identify its user, and use the unique token to perform a secure public key authentication.
Upon authentication success, the remote site would return a URL, which the SQRL client would then forward to the waiting web browser, which would then jump the user to the logged-on page at the site. Thus, essentially presto, without doing anything, the user would be logged in with complete security that could not be hacked, spoofed or intercepted. So that's how I used this feature, which is controversial at best, allowing script running in the browser to connect to something listening on the localhost IP, you know, 127.0.0.1. So the idea of allowing a website's JavaScript to talk to a local native app is not entirely new. But of course, what SQRL was doing was above board and fully documented as part of the protocol. That is decidedly not the case with Meta and Yandex, who were doing this purely for tracking. And oh, is this powerful for tracking, because it bypasses everything. During the development of SQRL there was some worry about this handy facility disappearing, since Microsoft was aware of the potential for its abuse, and for a while they tried to shut down access to the localhost IP from within the web browser. But it turns out there are many other legitimate use cases for this, too. So much so that too many things broke when Microsoft tried to do this, and they were forced to backpedal and leave the facility in place on Windows. And it's obviously there on Android. So the guys who discovered Meta's and Yandex's abuse explained: "These native Android apps receive browsers' metadata, cookies and commands from the Meta Pixel," which is what they call it, it's actually JavaScript, "and Yandex Metrica scripts embedded on 5.8 million websites. These JavaScripts load on users' mobile browsers and silently connect with native apps running on the same device through localhost sockets."
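The localhost handoff described above can be sketched in Python with two sockets on the loopback interface. This is a simplified illustration under stated assumptions, not SQRL's actual protocol: the function names are hypothetical, the "app" binds to an OS-chosen port instead of 25519 so the demo runs anywhere, and the remote public-key authentication step is replaced by a stub that just returns a URL.

```python
import socket
import threading


def _recv_all(conn):
    """Read from a TCP connection until the peer closes its sending side."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def start_local_client_app(handle_token, host="127.0.0.1", port=0):
    """Hypothetical stand-in for a native client listening on loopback.

    port=0 asks the OS for a free port; a real client would use a fixed,
    documented port. Serves exactly one connection, then closes.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def serve_once():
        with server:
            conn, _ = server.accept()
            with conn:
                token = _recv_all(conn).decode("utf-8")
                # A real client would authenticate to the remote site here;
                # this stub just maps the token to the URL to hand back.
                conn.sendall(handle_token(token).encode("utf-8"))

    thread = threading.Thread(target=serve_once, daemon=True)
    thread.start()
    return bound_port, thread


def page_script_send_token(token, port, host="127.0.0.1"):
    """Plays the login page's script: send a token, receive a URL back."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(token.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that we're done sending
        return _recv_all(sock).decode("utf-8")


port, thread = start_local_client_app(lambda t: f"https://example.com/welcome?session={t}")
print(page_script_send_token("one-time-token-123", port))
# prints https://example.com/welcome?session=one-time-token-123
thread.join(timeout=5)
```

Nothing in this exchange leaves the machine, which is exactly why a page's script reaching a local listener is so useful for login handoff and, in the wrong hands, for tracking.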
"Since native apps have access to device identifiers like the Android Advertising ID, or directly handle actual user identities, as in the case of Meta apps, this method effectively allows these organizations to link mobile browsing sessions and web cookies to real-world user identities, hence de-anonymizing users visiting sites embedding their scripts. This web-to-app ID sharing method," and this is them writing, "bypasses all typical privacy protections such as clearing cookies, Incognito Mode and all of Android's permission controls. It also opens the door for potentially malicious apps eavesdropping on users' web activity," because nothing prevents other apps from also saying, oh, let's monitor all of these Meta Pixel JavaScripts which are going to be trying to connect to localhost. So what we have here is an interesting and extremely privacy-invasive hack. The point is that this is not leveraging some bug that can be found, fixed and eliminated. As I noted, Microsoft previously tried and failed to eliminate this capability. I think it was when they were heading toward IE11, as I recall. I think that was the IE that was going to be saying no more of this localhost business, and they had to back away. Maybe it was 10, I don't know. Anyway, so that everyone's clear about this, the problem Microsoft had with cutting off their browser from all access to the local machine is that it has always been possible to do this. And as we've often seen, any time something is possible, it will eventually be done. And once applications have become dependent upon some available mechanism, it's extremely difficult to take it back. For example, many web developers run local web servers on their machines, and they test their web code locally in web browsers running on the same machines. It's entirely practical and easier than needing to set up some second, external web server somewhere and talk to it.
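As an illustration of the developer use case just mentioned, here is a minimal Python sketch of a web server bound only to the loopback address and a client on the same machine fetching from it. The handler name, helper name and page content are hypothetical; only the standard library's `http.server` and `http.client` modules are used.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class LocalUIHandler(BaseHTTPRequestHandler):
    """Serves a tiny page, as a local app or dev server might for a browser."""

    def do_GET(self):
        body = b"<h1>Hello from an app on this machine</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_local_ui(host="127.0.0.1", port=0):
    """Bind to loopback only, so nothing off-machine can reach it."""
    server = ThreadingHTTPServer((host, port), LocalUIHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_local_ui()
host, port = server.server_address
conn = http.client.HTTPConnection(host, port, timeout=5)
conn.request("GET", "/")
response = conn.getresponse()
print(response.status, response.read().decode("utf-8"))
conn.close()
server.shutdown()
server.server_close()
```

A browser on the same machine could fetch the same URL, which is the very capability that can't be taken away without breaking workflows like this one.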
Another example is that web browsers have become so powerful that a local application might be written to be headless, without its own desktop UI or presence. Instead, it will just launch the system's web browser to perform all communication with the user. The user experiences it as a website, but they're actually communicating with an application running on their own local machine. This is done by running a web server on the local machine, which the browser communicates with. So Meta and Yandex are both abusing this deliberate and formally supported ability of web browsers to connect not only to faraway remote servers out on the Internet, but also to little local servers set up and running inside any application on the same machine. And there's no obvious way any user can know this is going on, let alone prevent it from happening. Since this problem is not going away, let's take a closer look at what these researchers found. They wrote: "While there are subtle differences in the way Meta and Yandex bridge web and mobile contexts and identifiers, both of them essentially misuse," again, this is them writing, "the unvetted access to localhost sockets. The Android OS allows any installed app with the INTERNET permission," which will be all Android apps except maybe Calculator, "to open a listening socket on the loopback interface (127.0.0.1). Browsers running on the same device also access this interface without user consent or platform mediation. This allows JavaScript embedded on web pages to communicate with native Android apps and share identifiers and browsing habits, bridging ephemeral web identifiers to long-lived mobile app IDs using standard Web APIs. The Meta (Facebook) Pixel JavaScript, when loaded in an Android mobile browser, transmits the first-party _fbp cookie using WebRTC to UDP ports 12580 through 12585, to any app on the device that's listening on those ports."
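The researchers describe the Pixel script pushing the `_fbp` value at UDP ports 12580 through 12585 so that whichever app happens to be listening receives it. Here is a rough Python sketch of that fan-out idea. Assumptions: a plain UDP datagram stands in for the WebRTC traffic the researchers describe, Python stands in for the page's JavaScript, the function name is hypothetical, and the cookie value is made up.

```python
import socket

# The UDP range the researchers reported for the Meta Pixel's WebRTC traffic.
PIXEL_UDP_PORTS = range(12580, 12586)  # 12580 through 12585 inclusive


def send_to_loopback_ports(value, ports=PIXEL_UDP_PORTS, host="127.0.0.1"):
    """Fire one datagram at every port in the range; any listener receives it.

    Returns how many datagrams were handed to the OS. UDP is fire-and-forget,
    so the sender learns nothing about which port, if any, had a listener.
    """
    payload = value.encode("utf-8")
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for port in ports:
            try:
                sock.sendto(payload, (host, port))
                sent += 1
            except OSError:
                pass  # defensively ignore a per-port send error and keep going
    return sent


# Demo: a stand-in "app" listens on an OS-chosen loopback port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(5)
listen_port = receiver.getsockname()[1]
send_to_loopback_ports("fb.1.1700000000000.123456789", ports=[listen_port])
data, _ = receiver.recvfrom(1024)
print(data.decode("utf-8"))  # prints fb.1.1700000000000.123456789
receiver.close()
```

The sender never needs to know which app, if any, is listening, which is part of what makes this channel so hard for a user to notice.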
They said: "We found Meta-owned Android apps Facebook and Instagram, available on the Google Play Store, listening on this port range." So here's the step-by-step of this in detail. First, in their normal course of use, the user opens their native Facebook or Instagram app on their device, you know, any Android smartphone. The app is eventually switched away from and sent to the background, and it creates a background service to listen for incoming traffic on TCP port 12387 or 12388 and on a UDP port, the first unoccupied port in the range from 12580 through 12585. Users must be logged in with their credentials on the apps, so the user is identified to the app, Facebook or Instagram. The user then opens their web browser and visits any one of 5.8 million websites which integrate the Meta Pixel JavaScript. Websites may ask for consent, depending upon the website, the visitor's location and the local requirements for them to do so. The Meta Pixel script sends the _fbp cookie to the native Instagram or Facebook app using the WebRTC protocol. The Meta Pixel script simultaneously sends the same _fbp value, the same cookie it's sending to the local app, to www.facebook.com/tr. Do you think that maybe "tr" might be short for "track"? The URL's query tail contains other parameters, such as the page's URL, website and browser metadata, and even the event type, like PageView, AddToCart, Donate, Purchase, whatever. The Facebook or Instagram app, which has received that _fbp cookie from the Meta Pixel JavaScript running in the browser, then transmits it to graph.facebook.com/graphql along with other persistent user identifiers, which links the user's _fbp cookie ID with their Facebook or Instagram account, thus bypassing all of the other privacy controls which the industry has created over, you know, the most recent 10 years or so.
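The "first unoccupied port in the range" behavior attributed to the app can be sketched in Python like this. The function name is hypothetical, and this only illustrates the port-selection idea; it is not Meta's code.

```python
import socket


def bind_first_free_udp_port(first=12580, last=12585, host="127.0.0.1"):
    """Return (socket, port) for the first port in [first, last] that binds.

    No SO_REUSEADDR is set, so a port another socket already holds normally
    makes bind() fail, and we simply move on to the next port in the range.
    """
    for port in range(first, last + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()
            continue
        return sock, port
    raise RuntimeError(f"no free UDP port in {first}-{last}")


# Two "apps" starting one after the other land on different ports.
first_sock, first_port = bind_first_free_udp_port()
second_sock, second_port = bind_first_free_udp_port()
print(first_port, second_port)  # two different ports within 12580-12585
first_sock.close()
second_sock.close()
```

Trying a small fixed range, rather than a single port, lets a listener coexist with whatever else may already hold one of those ports, while the sender only has to cover six ports.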
The researchers explain: "According to Meta's Cookies Policy, the _fbp cookie 'identifies browsers for the purpose of providing advertising and site analytics services and has a lifespan of 90 days.' The cookie is present on approximately 25% of the top million websites," and, as we saw, 5.8 million overall, "making it the third most common first-party cookie of the web, according to the Web Almanac 2024." They said: "A first-party cookie implies that it cannot be used to track users across websites, as it is set under the website's domain. That means the same user has different _fbp cookies on different websites." Right? That's the way it's supposed to be. "However," they write, "the method we disclose allows the linking of the different _fbp cookies to the same user, which bypasses existing protections and runs counter to user expectations." Okay, so just to be clear, this entire surreptitious surveillance system was specifically designed to explicitly and deliberately bypass not only all user-expressible anti-tracking wishes, but also to circumvent all of the work the browser vendors have invested to limit and control cross-site tracking. This neatly circumvents all of the explicit first-party, domain-tied cookie isolation and stovepiping that our web browsers have recently added specifically to prevent this abuse. So it is really evil, and it has no other purpose. It's doing nothing other than this. There is no other reason, and the...
Leo Laporte
Only way to really remove it is to remove Facebook and Yandex apps from your phone.
Steve Gibson
Yeah, it is. And this behavior is entirely indefensible.
Leo Laporte
I just deleted Facebook from everything. Everything. Unbelievable.
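To make the cross-site linking the researchers describe concrete: once the app has relayed each `_fbp` value alongside a logged-in account, joining those reports with the per-site hits seen at the tracking endpoint collapses many separate first-party cookies into one profile. The sketch below is a hypothetical Python illustration of that join; the function name, site names, cookie values and account ID are invented, and nothing here describes how Meta actually processes the data.

```python
from collections import defaultdict


def link_sites_to_accounts(web_hits, app_reports):
    """Join per-site _fbp cookies to accounts via the app's relayed reports.

    web_hits:    (site, fbp_cookie) pairs seen at the tracking endpoint.
    app_reports: (fbp_cookie, account_id) pairs relayed by the logged-in app.
    Returns {account_id: sorted list of sites}.
    """
    cookie_to_account = dict(app_reports)
    sites_by_account = defaultdict(set)
    for site, fbp in web_hits:
        account = cookie_to_account.get(fbp)
        if account is not None:  # unlinked cookies stay per-site, as intended
            sites_by_account[account].add(site)
    return {account: sorted(sites) for account, sites in sites_by_account.items()}


web_hits = [
    ("news.example", "fb.1.1700000000000.111"),
    ("shop.example", "fb.1.1700000000001.222"),
    ("clinic.example", "fb.1.1700000000002.333"),
]
app_reports = [
    ("fb.1.1700000000000.111", "user-42"),
    ("fb.1.1700000000001.222", "user-42"),
]
print(link_sites_to_accounts(web_hits, app_reports))
# prints {'user-42': ['news.example', 'shop.example']}
```

The two different first-party cookies, which the browser deliberately kept separate, end up attached to one account; only the site whose cookie was never relayed stays unlinked.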
Steve Gibson
So that's what Meta has been up to. How does the Russian service Yandex compare? The researchers write: "Since 2017, the Yandex Metrica script initiates HTTP requests with long and opaque parameters to localhost through specific TCP ports: 29009, 29010, 30102 and 30103. Our investigation revealed that Yandex-owned applications such as Yandex Maps, Navigator, Search and Browser actively listen on these ports. Furthermore, our analysis indicates that," get this one, Leo, oh boy, "the domain yandexmetrica.com," y-a-n-d-e-x-m-e-t-r-i-c-a dot com, "is resolving to the loopback address." I put it into nslookup yesterday because I couldn't believe it, and sure enough, it came up 127.0.0.1. What? It resolves to localhost. Yes, in order to be extra sneaky, and I'll explain that in a second. "And the Yandex Metrica script transmits data via HTTPS to local ports 29010 and 30103. This design choice," they wrote, "obfuscates the data exfiltration process, thereby complicating conventional detection mechanisms." Okay. In other words, it's quite sneaky to have a public domain like yandexmetrica.com resolving to the localhost IP 127.0.0.1, since script code analyzers would likely look for the string "localhost" or the IP 127.0.0.1. But Yandex embeds a public-appearing domain name to further obscure what's actually going on, and their use of HTTPS means that any communication is also obscured and less easy to intercept, monitor and analyze. And then Yandex gets even trickier. The researchers explain: "Yandex apps contact a Yandex domain (startup.mobile.yandex.net or similar) to retrieve the list of ports to listen to. The endpoint returns a JSON object containing the local port numbers," you know, 30102 and so on, "and," get this, "a first_delay_seconds parameter, which we believe is used to delay the initiation of the service. On one of our test devices, first_delay_seconds roughly corresponded to the number of seconds it took for the Yandex app to begin listening on local ports, which was around three days." The only possible reason for this is to avoid detection and to prevent any researchers from easily discovering this deliberately concealed behavior. It's really despicable.
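Two tricks in that passage, a public-looking hostname that resolves only to loopback and a server-supplied delay before the app starts listening, are easy to illustrate. The Python sketch below is hypothetical: the function names are invented, the JSON field name `ports` is an assumption, and `first_delay_seconds` follows the researchers' description rather than a confirmed schema. Numeric IP addresses are used in the demo so it needs no DNS lookup.

```python
import ipaddress
import json
import socket
from datetime import datetime, timedelta, timezone


def resolves_only_to_loopback(hostname):
    """True if every address the name resolves to is a loopback address."""
    addresses = {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    # Strip any IPv6 scope suffix such as "%eth0" before parsing.
    return bool(addresses) and all(
        ipaddress.ip_address(addr.split("%")[0]).is_loopback for addr in addresses
    )


def parse_port_config(body):
    """Parse a port-list config; the field names are hypothetical stand-ins."""
    config = json.loads(body)
    ports = [int(p) for p in config["ports"]]
    delay = timedelta(seconds=int(config.get("first_delay_seconds", 0)))
    return ports, delay


def should_listen(now, installed_at, delay):
    """Stay dormant until the server-chosen delay has elapsed."""
    return now >= installed_at + delay


print(resolves_only_to_loopback("127.0.0.1"))  # True
print(resolves_only_to_loopback("8.8.8.8"))    # False

ports, delay = parse_port_config('{"ports": [29009, 30102], "first_delay_seconds": 259200}')
installed = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(ports, delay)  # [29009, 30102] 3 days, 0:00:00
print(should_listen(installed + timedelta(days=2), installed, delay))  # False
print(should_listen(installed + timedelta(days=3), installed, delay))  # True
```

Per the researchers and Steve's nslookup, `resolves_only_to_loopback("yandexmetrica.com")` would have returned True at the time of the report, while a scanner searching only for the literal string "localhost" would have missed it.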
Leo Laporte
After receiving Facebook wouldn't do anything the Russians would do.
Steve Gibson
That's right. They said, after receiving the localhost HTTP requests from the Yandex Metrica script, the mobile app responds with a Base64-encoded binary payload embedding and bridging the Android Advertising ID, among other identifiers accessible from Java APIs, like Google's Advertising ID and UUIDs that are potentially Yandex-specific. As opposed to Meta's Pixel case, all of this information is aggregated and uploaded together to the Yandex Metrica server, mc.yango.com, by the JavaScript code running in the web browser rather than by the native app. In the case of Yandex, the native app acts as a proxy to collect native Android-specific identifiers, then transfers them back to the browser context through localhost sockets. Okay, in other words, Meta has their native Facebook or Instagram app doing the communicating with the Meta mothership, whereas the various Yandex apps run native servers that the Yandex JavaScript communicates with specifically to obtain whatever device-specific information Yandex may wish. That information is then returned to the browser from the little local Yandex servers, and the Yandex JavaScript then forwards it to Yandex. The researchers point out an additional problem under their heading Additional risk: browsing history leak. And Leo, I note that we're at 40 minutes in, so let's take a pause, and then we're going to look at the additional problems that doing this creates. And there are several.
Leo Laporte
So now we should mention that they've stopped doing this, right?
Steve Gibson
The day this report was published, they went, oopsie. That's admitting it. The day it came out, it suddenly stopped.
Leo Laporte
Oh, we don't do that.
Steve Gibson
What are you talking about?
Leo Laporte
What are you talking about? Omg.
Steve Gibson
I know.
Leo Laporte
If I use the Facebook app on a computer, is it doing the same thing or like the website?
Steve Gibson
Well, it would be interesting to see if you ran the Facebook app on Windows.
Leo Laporte
Oh, would it do the same thing?
Steve Gibson
You could do a netstat.
Leo Laporte
Right.
Steve Gibson
And get the names of the applications that are opening and listening on localhost, and see whether the Facebook and Instagram apps are listening on localhost. I don't know if they are. I'm not running any of that.
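As a rough, portable complement to netstat, here is a minimal Python sketch that tries to open a TCP connection to each of the Yandex ports the researchers list, on 127.0.0.1. It assumes those four ports and only reports whether something is accepting connections, not which application it is; for that you would still use netstat (for example, netstat -b from an elevated prompt on Windows) or ss -ltnp on Linux.

```python
import socket

# Loopback TCP ports the researchers report the Yandex Metrica script contacting.
YANDEX_PORTS = (29009, 29010, 30102, 30103)


def listening_on_loopback(ports, host="127.0.0.1", timeout=0.5):
    """Return the ports on host that accept a TCP connection (i.e. something is listening)."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                # connect_ex returns 0 on success rather than raising for a refused connection.
                if sock.connect_ex((host, port)) == 0:
                    open_ports.append(port)
            except OSError:
                pass  # treat timeouts and other socket errors as "not listening"
    return open_ports


if __name__ == "__main__":
    print(listening_on_loopback(YANDEX_PORTS) or "nothing listening on those ports")
```

An empty result only means nothing was listening at the moment you checked; per the researchers, the Yandex apps could wait days before opening their ports.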
Leo Laporte
Right. Yeah, for good reason. Holy cow.
Steve Gibson
It is a spy on anyone's machine.
Leo Laporte
Willfully bypassing every indication that you as a user have made that you want privacy.
Steve Gibson
Yes. And willfully bypassing all of the browsers' well-meaning attempts to say, okay, we'll allow this to happen, but we're going to keep you from tracking with it. And Facebook's...
Leo Laporte
Can I block these ports?
Steve Gibson
We're going to be talking about that.
Leo Laporte
Okay, that's coming after break.
Steve Gibson
After our break.
Leo Laporte
Obviously I have many questions, all of which will be answered soon. Oh my goodness.
Steve Gibson
Yeah, it's just evil.
Leo Laporte
This is why we listen. What a great show. Our sponsor for this segment of Security Now, another great product, love these guys: ThreatLocker. ThreatLocker is Zero Trust done easy, done right, and done affordably. I don't even need to do the ad; I should just tell you that, and that's all you need to know. Right? But I'll give you some extra information about it. You know, if you listen to the show, that ransomware is just killing the business world everywhere. Not just businesses: schools, infrastructure, local city governments. Phishing emails, infected downloads, malicious websites, RDP exploits. Look, you don't want to be the next victim. You need ThreatLocker's Zero Trust platform. How does it work? Really simple. It takes a proactive, and these are the three words you care about, deny-by-default approach. A deny-by-default approach blocks every unauthorized action: unless it's explicitly authorized, it does not happen. That protects you from both known and unknown threats, threats nobody's ever heard of, right? Because they can't do anything unless they're explicitly authorized. That's why global enterprises like JetBlue trust ThreatLocker. The Port of Vancouver's infrastructure trusts ThreatLocker. ThreatLocker shields you, and them, from zero-day exploits and supply chain attacks while providing complete audit trails for compliance. ThreatLocker's innovative ring-fencing technology isolates critical applications from weaponization, stops ransomware, and limits lateral movement within your network. And the good news is it works in every industry. It supports PCs and Macs, so your network can be protected in its entirety. You get great 24/7 support from US-based support folks. And, in a way it's a side effect, but it's a great one: ThreatLocker enables comprehensive visibility and control. Mark Tolson is the IT director for the city of Champaign, Illinois, another, you know, very important, mission-critical IT operation.
He says, and this is a direct quote, ThreatLocker provides that extra key to block anomalies that nothing else can do. If bad actors got in and tried to execute something, I take comfort in knowing ThreatLocker will stop that. Stop worrying about cyber threats. Get unprecedented protection quickly, easily and cost-effectively with ThreatLocker. We've talked about Zero Trust on the show before. It's a really great technique, and this is the best, simplest, easiest way to implement it. Visit threatlocker.com/twit to get a free 30-day trial and learn more about how ThreatLocker can help mitigate unknown threats and ensure compliance. That's threatlocker.com/twit. Yes, Windows and Mac. All right, so I want to hear more. Let's go.
Steve Gibson
Yeah. So under their heading Additional risk: browsing history leak, they wrote: using HTTP requests for web-to-native ID sharing, which is what these guys are doing, may expose users' browsing history to third parties. A malicious third-party Android application that also listens on the aforementioned ports can intercept HTTP requests sent by the Yandex Metrica script and Meta's communication channel by monitoring the Origin HTTP header, which is the website's domain. Thus any app on the platform is able to use this. Basically, the user's web browser has now been turned into a leaking sieve which is broadcasting everywhere the user goes that has either a Yandex or a Meta JavaScript tracker, and anybody is able to listen for it. They said: we developed a proof-of-concept app to demonstrate the feasibility of this browsing history harvesting by any malicious third-party app. We found that browsers such as Chrome, Firefox and Edge are susceptible to this form of browsing history leakage in both default and private browsing modes. You can't hide from this. The Brave browser was unaffected by this issue due to their block list and the blocking of requests to localhost, and DuckDuckGo was only minimally affected due to missing domains in their block list. I didn't understand what they meant by that, but it's interesting that Brave does have localhost blocked. While the possibility for other apps to listen to these ports exists, we have not observed any other app not owned by Meta or Yandex listening to these ports. Due to Yandex using HTTP requests for its localhost communications, any app listening on the required ports can monitor the websites a user visited with these tracking capabilities, as demonstrated by the video above. And they had a video on their site showing it. They said: we first open our proof-of-concept app, which listens to the ports used by Yandex, and send it to the background. Next we visit five websites across different browsers.
Afterwards, we can see the URLs of these five sites listed in the app. In other words, once this local system abuse is present, there's nothing to prevent other apps from establishing their own competing services, little servers, and hooking into this illicit extra-browser communication to obtain, for their own purposes, the same Internet-wide tracking and monitoring that the Meta and Yandex apps are deliberately employing. Finally, summarizing things, they wrote: this novel tracking method exploits unrestricted access to localhost sockets on the Android platform, including most Android browsers. As we show, these trackers perform this practice without user awareness, as current privacy controls, sandboxing approaches, mobile platform and browser permissions, web consent models, incognito modes, resetting mobile advertising IDs or clearing cookies, are all insufficient to control and mitigate it. We note that localhost communications may be used for legitimate purposes such as web development. However, the research community has raised concerns about localhost sockets becoming a potential vector for data leakage and persistent tracking. To the best of our knowledge, however, no evidence of real-world abuse for persistent user tracking across platforms has been reported until our disclosure. Our responsible disclosure to major Android browser vendors led to several patches attempting to mitigate this issue, some already deployed, others currently in development. We thank all participating vendors, Chrome, Mozilla, DuckDuckGo and Brave, for their active collaboration and constructive engagement throughout the process. Other Chromium-based browsers should follow upstream code changes to patch their own products. However, beyond these short-term fixes, fully addressing the issue will require a broader set of measures, as they are not covering the fundamental limitations of platforms' sandboxing methods and policies.
These include user-facing controls to alert users about localhost access, stronger platform policies accompanied by consistent and strict enforcement actions to proactively prevent misuse, and enhanced security around Android's inter-process communication mechanisms, particularly those relying on localhost connections. So I'll add that while these guys are only focused upon mobile platforms, as I said earlier, this is not a mobile-only problem. As I said, my implementation, and others, of this legitimate intra-platform communication for SQRL's use works cross-platform, everywhere, on both mobile and desktop. So we know that there are currently no controls for this. My own feeling is that no browser should allow this by default. It's just too dangerous to permit out of the box. So the default should be for browsers to block and notify their user when any website they visit attempts to open a backdoor channel to something running, perhaps surreptitiously, on their own local machine. Any legitimate use of this, such as for web development, would then expect and permit this, and a browser might offer some configuration. There might be, for example, three settings: block and don't notify, request permission, or always allow. And as another option, since Firefox, for example, certainly appears to have no upper limit on the number of fine-grained configuration settings it's able to manage, a user might permit this localhost network communication only over certain ports, such as the standard web ports 80 and 443, to permit local web server access while blocking all other high ports that apps might use. And technology aside, this sort of makes one shake one's head, Leo. And I know your head's been shaking for the last half hour.
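Steve's proposed browser behavior can be written down as a small decision function. This is a hypothetical Python sketch of his suggestion, not how any shipping browser handles localhost access. The default is to block and notify, the user can pick one of his three alternative settings, an optional port allowlist such as 80 and 443 lets local web servers through, and a navigation the user typed themselves is always allowed, a distinction that comes up again a few minutes later in the conversation.

```python
from enum import Enum
from typing import FrozenSet, Optional


class Policy(Enum):
    """User-selectable settings (hypothetical), with Steve's suggested default first."""
    BLOCK_AND_NOTIFY = "block and notify"
    BLOCK_SILENTLY = "block and don't notify"
    ASK = "request permission"
    ALLOW = "always allow"


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    BLOCK_AND_NOTIFY = "block and notify"
    PROMPT = "prompt"


_ACTION_FOR_POLICY = {
    Policy.BLOCK_AND_NOTIFY: Action.BLOCK_AND_NOTIFY,
    Policy.BLOCK_SILENTLY: Action.BLOCK,
    Policy.ASK: Action.PROMPT,
    Policy.ALLOW: Action.ALLOW,
}


def decide(port: int,
           policy: Policy = Policy.BLOCK_AND_NOTIFY,
           allowed_ports: Optional[FrozenSet[int]] = None,
           user_typed_url: bool = False) -> Action:
    """What to do when a page tries to reach a loopback address on the given port."""
    if user_typed_url:
        # The user deliberately navigated to localhost; nothing covert is happening.
        return Action.ALLOW
    if allowed_ports is not None and port in allowed_ports:
        # e.g. {80, 443}: let local web servers through without asking.
        return Action.ALLOW
    # Everything else, including the high ports native apps listen on, follows the policy.
    return _ACTION_FOR_POLICY[policy]
```

With the defaults, a script reaching for port 29010 gets blocked with a notification, while the same request under an allowlist of 80 and 443 on the ASK setting produces a permission prompt.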
Leo Laporte
No kidding.
Steve Gibson
You know, Yandex is Russian, so they're not friends of the West, or of privacy, right? And they're certainly not on any friendship trajectory toward the West, right? But Meta is a huge and, we would wish, responsible US corporation that would like to have, and deserve, the trust of its users. But the design and installation of these covert backdoors in their apps, which can only have the purpose of communicating with matching user-tracking web scripts spread across 5.8 million Internet sites, really deserves the attention, I think, of US authorities. You know, and as you noted, Meta knows this was wrong, because this horrifying behavior was immediately shut down the same day as the publication of this research. They got caught bypassing all user choice and anti-tracking browser enforcement and immediately turned it off. They're able to do this since those JavaScripts are all being sourced by their own content delivery network, so it was only a matter of changing the code being sent out from the mothership. But their apps will still be opening and listening for any local web browser connections. Who's to say where, when and how they might attempt to resume this behavior in the future? Who would know?
Leo Laporte
Yeah, I'm sure they'll try something else. These guys are smart and this just.
Steve Gibson
Demonstrates how determined they are. They insist on profiling their own users.
Leo Laporte
Well, if there's any question in anybody's mind about whether Facebook was evil, there should not be any question. Evil's maybe a strong term.
Steve Gibson
Not your friend, amoral.
Leo Laporte
Yeah, I mean, I'm sure in their minds it's justified, because they need that tracking to sell ads and that's their revenue model. I think it's really good that you've exposed them, and these guys have exposed them, and everybody should know this. So there are a couple of things. One point somebody made, Paul Holder, your friend and ours: as soon as everybody knew this, we could have reverse-abused them and flooded them with fake sites and IDs, which is true. As soon as it becomes public, it's easy to fake. Another person, Out of Sync, also very smart, pointed out that it would be nice if you'd get a pop-up when the browser is accessing localhost, because that's definitely questionable behavior. There are times when you do that, I do it, but you know you're doing it. If it's happening and you haven't done it on purpose...
Steve Gibson
Exactly.
Leo Laporte
That's not good.
Steve Gibson
Or I would say, if the user puts localhost into the URL address bar, then they're deliberately going to a localhost server. If script tries to access localhost, that's something else. Oh, and boy, that tricky setting up of yandexmetrica.com to resolve to 127.0.0.1. Oh.
Leo Laporte
So who... what domain registrar would allow that? I guess you just change the DNS to point there.
Steve Gibson
Exactly. It's just the DNS pointing there.
Leo Laporte
Yeah. Wow. Unbelievable.
Steve Gibson
Yeah, I mean there, there is no excuse for this. They got caught. And I mean, their own guilt is demonstrated by the fact that they immediately turned it off. It's like, oops, bad idea, guys.
Leo Laporte
Wow, what a story. Thank you for that.
Steve Gibson
Yeah. Let's take another break, since we're now at an hour, and then we're going to look at the new DNS service that's been set up in the EU by the EU.
Leo Laporte
I think that's fascinating.
Steve Gibson
Yeah.
Leo Laporte
Boy, that's really interesting.
Steve Gibson
Just as long as the US...
Leo Laporte
As long as you trust them, it's okay. It's good.
Steve Gibson
Actually, there's been some question.
Leo Laporte
Yeah. Why would they. It's a service.
Steve Gibson
Yes. And no one makes anyone, you know, use their DNS, so I think it's above board. Anyway, we'll get to that in a second.
Leo Laporte
Yeah. Our show today is brought to you by US Cloud. You know the name; I've been talking about it for some time now. I admit when I first heard of them, I said, are you a cloud company? They said no, we're the number one Microsoft Unified Support replacement. But we can help you in the cloud, and I'll explain how in a minute. Their business is to replace expensive Microsoft support with better, less expensive, faster support. US Cloud is the global leader in third-party Microsoft support for enterprises. They support 50 of the Fortune 500. And one of the reasons? Yes, switching to US Cloud can save you a lot, 30 to 50% over your Microsoft Unified and Premier support. That's big savings. But it can't just be less expensive; it has to be as good. Oh, how about this: it's better. Certainly it's faster, twice as fast on average time to resolution versus Microsoft. Okay, that's good. And now US Cloud's excited to tell you about a new offering that will save you money, and this is the cloud right here: Azure Cost Optimization. So let's talk. When was the last time you evaluated your Azure usage? You've been thinking about it, but if it's been a while, you've undoubtedly got some Azure sprawl, a little, you know, spend creep going on. The good news is saving on Azure is easier than you think with US Cloud. US Cloud offers an eight-week Azure engagement powered by VBox that, in eight weeks, two months, will identify key opportunities to reduce costs across your entire Azure environment. With expert guidance, you'll get access to US Cloud's senior engineers. That's the other way US Cloud's better: these guys are the best, with an average of over 16 years with Microsoft products. At the end of the eight-week engagement, you'll get an interactive dashboard which will identify where you have rebuild or downscale opportunities or unused resources.
Which means you can reallocate them and put those precious IT dollars towards needed resources. May I suggest you keep the savings going and invest that Azure savings in US Cloud's Microsoft support? That's what a few US Cloud customers have done: eliminate your Unified spend, and the savings just continue on. Sam, the technical operations manager at Bede Gaming, B-E-D-E Gaming, gave US Cloud 5 stars. He gave this review, and I'm quoting: we found some things that had been running for three years which no one was checking. Three years. These VMs were, I don't know, 10 grand a month. Not a massive chunk in the grand scheme of how much we spend on Azure, but, you know, once you get to 40 or $50,000 a month, it really starts to add up. It's simple: stop overpaying for Azure, identify and eliminate Azure creep, and boost your performance. And you can do it all in eight weeks with US Cloud. Visit uscloud.com right now, book a call, and find out how much your team can save. That's uscloud.com. Book a call today. Get faster and better, much better, Microsoft support for a lot less. uscloud.com. We thank them for supporting Security Now. Okay, Steve, let's go.
Steve Gibson
Okay, so, Leo, you can go to joindns4.eu: join DNS, numeral 4, EU.
Leo Laporte
Funny that it's in English.
Steve Gibson
Last week the European Union launched its own multi-flavor DNS service.
Leo Laporte
They call it a safe space.
Steve Gibson
Join the European Safe Digital Space. So there are flavors for government, for telcos and for home users. The service is designed to provide secure and privacy-focused DNS resolvers for the EU bloc as an alternative to US and other foreign services.
Leo Laporte
So they. They want their own?
Steve Gibson
Yeah, they want their own.
Leo Laporte
Okay.
Steve Gibson
The project was first announced back in October 2022 and was built under the supervision of the EU Cybersecurity Agency, ENISA. It's currently managed by a consortium led by the Czech Republic security firm Whalebone, W-H-A-L-E-B-O-N-E, and members include cybersecurity companies, CERTs and academic institutions from 10 EU countries.
Leo Laporte
Sounds good.
Steve Gibson
I confirmed the Whalebone ownership, since I immediately dropped the various DNS resolver IPs into GRC's DNS Benchmark, and the benchmark's ownership tab showed they were all within a network owned by Whalebone s.r.o. Now, naturally, these EU resolvers include built-in DNS filters for malicious and malware-linked domains, filtering them out to prevent users from connecting to known bad sites. The lists are managed from a central location by EU threat intel analysts. And none of this costs anything for EU users, or anybody for that matter, nor for companies or any governments that might decide to adopt the service. The pitch to governments and telcos is that having the EU offer a trusted DNS service can eliminate the costs and overhead associated with running their own DNS infrastructure. And to the degree that independent DNS services required security personnel to manage and filter the directory, you know, the upkeep and all that, that can now be offloaded to this dedicated DNS4EU team. The variations that are offered for DNS which are targeted to home users give people a choice of different profiles. You know, malicious domains can be removed, adult content, ad filtering, interestingly.
Leo Laporte
So this is like NextDNS or OpenDNS or 4444, like Cloudflare?
Steve Gibson
Exactly. So on their page for home users, they say choose the resolver that fits your needs. So at 86.54.11.1, that's the protective resolution that removes questionable and malware domains. If you use 86.54.11.12, you get protective plus child protection, so it removes adult content. Or if you use 86.54.11.13, you get protective plus ad blocking. 86.54.11.11 gives you all of that: protective, child protection and ad blocking. Or if you go to 86.54.11.100, you get unfiltered DNS, all of the domains that are available on the net. Now, while it would be nice to have government-backed free DNS web content filtering, you know, I have a DNS benchmark, so I immediately dropped those five resolver IPs in, wondering how they would fare on the benchmark, and I was not impressed. I included a clip from the benchmark showing the performance, where the word atrocious comes to mind. But stop, because people in the EU have since confirmed they work great over there. And of course that's what you'd expect, right? For me in Southern California, their average response time ranged from 163 to 173 milliseconds, which is very slow. For comparison, Cloudflare's DNS came in at 20 milliseconds on the same benchmark. But again, I want to make sure everybody understands they didn't do this for me in Southern California.
Leo Laporte
Yeah.
Steve Gibson
The EU is not suggesting that someone located in Southern California should be using their DNS at all.
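To see why location matters, here is a deliberately tiny Python sketch of what a DNS benchmark does at its core: send one query over UDP and time the reply. The resolver addresses are the DNS4EU home-user ones read out above; example.com as the test name, one query per server, and the 2-second timeout are assumptions. GRC's DNS Benchmark does far more than this (for example, separate cached and uncached lookups and reliability checks), so treat the numbers as a rough indication only, and expect them to look very different from Southern California than from inside the EU.

```python
import random
import socket
import struct
import time

# DNS4EU home-user resolvers, by filtering profile.
DNS4EU_HOME = {
    "protective": "86.54.11.1",
    "protective + child protection": "86.54.11.12",
    "protective + ad blocking": "86.54.11.13",
    "protective + child protection + ad blocking": "86.54.11.11",
    "unfiltered": "86.54.11.100",
}


def build_query(name: str, qtype: int = 1, txid: int = 0) -> bytes:
    """A minimal RFC 1035 query: 12-byte header (recursion desired) plus one question."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE (1 = A), QCLASS (1 = IN)


def time_query(server: str, name: str = "example.com", timeout: float = 2.0) -> float:
    """Milliseconds for one UDP round trip to server; raises on timeout."""
    txid = random.randrange(0x10000)
    query = build_query(name, txid=txid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(query, (server, 53))
        while True:
            reply, _ = sock.recvfrom(512)
            if reply[:2] == query[:2]:  # same transaction ID, so it answers our query
                return (time.perf_counter() - start) * 1000
```

Looping over DNS4EU_HOME and calling time_query on each address gives a crude version of the comparison Steve ran; running the same loop from two different continents is the quickest way to see the effect he describes.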
Leo Laporte
Is it typically the case that if it's geographically closer, it's, it's faster?
Steve Gibson
Yes, because the packets have to travel all that distance.
Leo Laporte
So it's at the speed of light, Steve.
Steve Gibson
I mean, yeah, but it turns out it's got to go across the ocean, through the undersea cable, you know.
Leo Laporte
Does it go through other servers too on the way, or is it...
Steve Gibson
Yeah, it is bouncing, but I am connecting directly to that server. The reason that Cloudflare is so fast anywhere is that they're a CDN, right? You use a Cloudflare IP, what is it, 1.1.1.1? Well, you're not actually going to one machine; you know, that's a pseudo IP. You're actually being routed to some very local Cloudflare DNS server that is physically close to you. Even though I use that IP and people in the EU use that IP, they're getting a Cloudflare server near them.
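Leo's speed-of-light question can be answered with a little arithmetic. Light in optical fiber travels at roughly two-thirds of its vacuum speed, about 200,000 km per second, so distance alone sets a floor on round-trip time. The distances here are assumptions for illustration: roughly 9,500 km between Southern California and a resolver in Central Europe, and a hypothetical 50 km to a nearby Cloudflare site.

```python
# Approximate signal speed in optical fiber: about two-thirds of c.
FIBER_KM_PER_SECOND = 200_000


def rtt_floor_ms(distance_km: float, km_per_second: float = FIBER_KM_PER_SECOND) -> float:
    """Minimum round-trip time in milliseconds over a straight fiber path of this length.

    Real routes are longer than the great-circle distance, and every router and
    the resolver itself add time, so measured numbers will always be higher.
    """
    return 2 * distance_km * 1000 / km_per_second


print(rtt_floor_ms(9_500))  # assumed Southern California to Central Europe: 95.0
print(rtt_floor_ms(50))     # assumed nearby anycast site: 0.5
```

A 95 ms floor before any routing detours or processing is consistent with the 163 to 173 ms Steve measured, while the floor for a nearby site is well under a millisecond, which is why a service that answers from a server close to you can come in around 20 ms from almost anywhere.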
Leo Laporte
So it makes sense that Whalebone would be slow from Southern California.
Steve Gibson
And again, I want to make sure everybody understands. I posted to GRC's DNS dev newsgroup, where we've all been testing this evolving next-generation DNS Benchmark code, and I asked anybody who's located in the EU to give the same set of DNS IPs a run. Because of the time zone difference, I didn't hear back by the time I posted today's show notes. Since then I have, and they report that anybody in the EU is getting great performance. They're getting the same 20-millisecond-ish performance from those.
Leo Laporte
Yeah.
Steve Gibson
Yes. And frankly, that's why GRC's benchmark is so valuable: I don't get the same results as when somebody else runs it. It matters where you're running it from, and the DNS server you want to choose is the one that performs best from that location. So the DNS services are available under all protocols: IPv4 and IPv6 over UDP, but also DoH and DoT, where you get privacy-enforcing secure DNS over TCP with TLS. The benchmark also showed them in green, which indicates that they support DNSSEC, so they will support cryptographically signed DNS records to prevent anyone from spoofing or altering those records. So anyone in the EU wishing to explore this further should jump their browser over to joindns4.eu, where you'll find all the information.
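For the DoH flavor, the query itself is the same binary DNS message a UDP client would send. RFC 8484 carries it over HTTPS; for GET requests it is base64url-encoded, with padding removed, in a dns= query parameter, and a message ID of 0 is recommended so responses cache well. The Python sketch below builds such a URL. The endpoint is a placeholder, not DNS4EU's actual DoH address, which you would take from joindns4.eu.

```python
import base64
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Wire-format DNS query (RFC 1035) with ID 0, recursion desired, one question."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)  # ID 0, as RFC 8484 recommends
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE (1 = A), QCLASS (1 = IN)


def doh_get_url(endpoint: str, name: str, qtype: int = 1) -> str:
    """RFC 8484 GET form: the binary query, base64url-encoded with padding removed."""
    token = base64.urlsafe_b64encode(build_query(name, qtype)).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={token}"


# Placeholder endpoint; use the real DoH address published by the resolver operator.
print(doh_get_url("https://dns.example/dns-query", "www.example.com"))
# https://dns.example/dns-query?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB
```

The request is sent with an Accept: application/dns-message header, and the HTTPS response body is an ordinary DNS message, so everything the DNSSEC signatures protect survives the trip unchanged.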
Leo Laporte
It's free, right?
Steve Gibson
Yeah, it's free.
Leo Laporte
And see, I understand if you're in the eu, you might want to use this.
Steve Gibson
If.
Leo Laporte
If our government decided to make a DNS server, I don't think I'd use it. No, I just don't think I want to use the DOGE DNS server.
Steve Gibson
While we're on the topic of DNS, I noted that Ukraine's military intelligence agency claims that it took down the DNS service of the Russian railways using a 6-gigabit, 2.5-million-packet-per-second DDoS attack. The reporting was in Ukrainian news, in Ukrainian, and I didn't bother to dig any further. It's unclear to me what that accomplished. You know, it was fun, we could do it. Yeah.
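The two figures in that claim also say something about the attack itself. Assuming "6 gigabit" means 6 gigabits per second, dividing the bandwidth by the packet rate gives the average packet size:

```python
def average_packet_bytes(bits_per_second: float, packets_per_second: float) -> float:
    """Average packet size in bytes implied by a bandwidth figure and a packet-rate figure."""
    return bits_per_second / packets_per_second / 8


print(average_packet_bytes(6e9, 2.5e6))  # 300.0
```

About 300 bytes per packet suggests an attack built on packet rate rather than sheer volume, which fits the sense in the conversation that 6 gigabits is not that big by today's standards.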
Leo Laporte
As we know, trains do not run on time now.
Steve Gibson
Yeah. Any attack on DNS would need to be sustained until the local DNS caches expired. At that point things would begin to collapse, but it wasn't clear what would collapse. Would the trains no longer run at all? Would the scheduling and the ticket sales fail? I don't know. Now, that said, using a large number of inexpensive, stealthily inserted autonomous drones to remotely take out many extremely expensive Russian cruise-missile-launching warplanes, now that's something to write home about.
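The point about caches can be made concrete. If every cached answer was fresh at the moment the attack began (a simplifying assumption; in practice some entries will already be closer to expiry), a name keeps resolving from cache until its TTL runs out. The record names and TTL values in this Python sketch are hypothetical.

```python
def names_lost(ttls_seconds: dict, attack_seconds: float) -> list:
    """Names whose cached answers expire before the attack ends, assuming each was
    cached at the moment the attack began and cannot be refreshed while it lasts."""
    return sorted(name for name, ttl in ttls_seconds.items() if ttl <= attack_seconds)


# Hypothetical records with short, medium and long TTLs.
records = {"tickets.example": 300, "schedule.example": 3600, "www.example": 86400}
print(names_lost(records, attack_seconds=3600))  # ['schedule.example', 'tickets.example']
```

Records with day-long TTLs would ride out an hour-long attack untouched, which is one reason a short DNS DDoS may accomplish less than it sounds like.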
Leo Laporte
And a 6-gigabit attack is not that big.
Steve Gibson
No, that's like, okay, wow. Yeah, okay, I guess it wasn't a very... well, it's probably some server in a closet somewhere that started to smoke, but okay, who cares? It's the Russian railway anyway.
Leo Laporte
It's the Russian railway.
Steve Gibson
That's right. The Linux Foundation has launched what they call the FAIR, F-A-I-R, WordPress package manager. Given the astonishing number of websites that use the WordPress core as their content management system, their CMS, I always want to keep our listeners abreast of any important WordPress-related news. So when the Linux Foundation announces the launch of their replacement for WordPress.org's own package manager, that makes the news cut. I haven't kept up to date on the politics surrounding WordPress and Automattic, but the reporting that I saw said, quote, the new system is a decentralized alternative to the WordPress.org plugin and theme ecosystem, developed with help from veteran WordPress developers who were pushed out from the main WordPress project last year during a power grab by Automattic and Matt Mullenweg, unquote.
Leo Laporte
Oh. Ow.
Steve Gibson
So there.
Leo Laporte
Yeah.
Steve Gibson
So what I do know is that this replacement looks pretty sweet.
Leo Laporte
The.
Steve Gibson
It's called the FAIR PM page, at github.com/fairpm. They explain: The FAIR Package Manager is an open source initiative backed by the Linux Foundation. Our goal is to rethink how software is distributed and managed in the world of open web publishing. We focus on decentralization, transparency and giving users more control. Our community brings together developers, infrastructure providers, and open web contributors and advocates who all share the same mission: to move away from centralized systems and empower site owners and hosting providers with greater independence. FAIR is governed through open working groups and consensus-driven processes, ensuring that its development reflects the needs of the broader community. Whether you're a contributor, a host or an end user, FAIR invites participation at every level, from writing code and documentation to community organization and governance. As a community-led project, we aim to build public digital infrastructure that's both resilient and fair. The FAIR Package Manager is a decentralized alternative to the central WordPress.org plugin and theme ecosystem, designed to return control to WordPress hosts and developers. It operates as a drop-in WordPress plugin and seamlessly replaces existing centralized services with a federated, open source infrastructure. And then they finished with: There are two core pillars to the FAIR system. First, API replacement. It replaces communication with WordPress.org APIs, such as update checks and event feeds, using local or FAIR-governed alternatives. Some features, like browser version checks, are handled entirely within the plugin using embedded logic, for example, Browserslist. And second, decentralized package management. FAIR introduces a new package distribution model for themes and plugins. It supports opt-in packages that use the FAIR protocol and enables hosts to configure their own mirrors for plugin and theme data using AspirePress or their own domains.
While stable plugins currently use mirrors of WordPress.org, future versions will fully support FAIR-native packages. So anyway, this seems like a useful addition to the Internet's number one web authoring and delivery system.
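The mirror idea is the part that removes the single point of failure. FAIR's actual protocol isn't described here, so what follows is only a generic, hypothetical Python sketch of the underlying pattern, not FAIR's code or API: a host lists the mirrors it trusts, in order, and a package is fetched from the first one that can supply it. The fetch callback is a stand-in for whatever transport a real implementation would use.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def fetch_from_mirrors(
    package: str,
    mirrors: Iterable[str],
    fetch: Callable[[str, str], Optional[bytes]],
) -> tuple[str, bytes]:
    """Return (mirror, data) from the first mirror that has the package.

    fetch(mirror, package) returns the package bytes, None if that mirror does
    not have it, or raises OSError if the mirror is unreachable.
    """
    problems = []
    for mirror in mirrors:
        try:
            data = fetch(mirror, package)
        except OSError as exc:
            problems.append(f"{mirror}: {exc}")
            continue
        if data is not None:
            return mirror, data
        problems.append(f"{mirror}: not found")
    raise LookupError(f"{package!r} unavailable from every mirror: " + "; ".join(problems))
```

The design point is simply that no single server's availability, or policy, decides whether a site can install or update a plugin.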
Leo Laporte
Yeah, kind of a rebuke to Matt Mullenweg.
Steve Gibson
Yeah, yeah. Especially when it was created by people who were pushed out, who were, you know, old WordPress hands.
Leo Laporte
Right.
Steve Gibson
So they said, okay, fine, we'll do our own.
Leo Laporte
However you feel about Matt, it does seem appropriate that WordPress should not be dependent entirely on WordPress.org for its libraries. I think it's just too important.
Steve Gibson
It's gotten... I mean, it's too big a success, essentially. Okay. I was reminded of my recent discovery and reporting of the privacy-preserving, I mean explicitly and deliberately privacy-preserving, and unfiltered conversational AI which we talked about a couple of weeks ago, Venice AI, when I saw Ars Technica's headline, OpenAI slams court order to save all ChatGPT logs, including deleted chats, with the subhead, OpenAI defends privacy of hundreds of millions of ChatGPT users. Yikes. And when Ars says all ChatGPT logs, they mean all of every user's ChatGPT logs, not just those of selected users, not just users that some court order might say, you know, under subpoena, you must save. So this is everyone's ChatGPT interactions, period.
Leo Laporte
Even if you explicitly say delete this interaction, which is, yes, a big problem here.
Steve Gibson
They are not currently legally allowed to actually delete people's chats. So it seems clearly better for ChatGPT never to have any logs to save in the first place, which is one of the features of that Venice AI service. To understand what's going on here, I think the details are worth sharing. So here's what Ars reported. They said, OpenAI is now fighting a court order to preserve all ChatGPT user logs, including deleted chats and sensitive chats logged through its AI business offering, after news organizations suing over copyright claims accused the AI company of destroying evidence. OpenAI explained in a court filing demanding oral arguments in a bid to block the controversial order, quote, before OpenAI had an opportunity to respond to those unfounded accusations, the court ordered OpenAI to, quote, preserve and segregate all output log data that would otherwise be deleted on a going-forward basis until further order of the court, in essence, the output log data that OpenAI has been destroying, unquote. In the filing, OpenAI alleged that the court rushed the order based only on a hunch raised by the New York Times and other news plaintiffs. And now, without any just cause, OpenAI argued, the order, quote, continues to prevent OpenAI from respecting its users' privacy decisions, unquote. That risk extended to users of ChatGPT Free, Plus and Pro, as well as users of OpenAI's application programming interface. OpenAI said the court order came after news organizations expressed concern that people using ChatGPT to skirt paywalls might be more likely to delete all their searches to cover their tracks. What? Okay, I mean, even that seems kind of far-fetched to me.
Do people even know that what they're getting from ChatGPT skirted a paywall? OpenAI said that evidence to support that claim, news plaintiffs argued, was missing from the record because so far OpenAI had only shared samples of chat logs that users had agreed that the company could retain. Okay, they're being responsible, right? Respecting their users' privacy concerns. Sharing the news plaintiffs' concerns, the judge, Ona Wang, ultimately agreed that OpenAI likely would never stop deleting that alleged evidence absent a court order granting news plaintiffs' request to force the preservation of all chats. OpenAI argued that the May 13 order was premature and should be vacated until, quote, "at a minimum, news organizations can establish a substantial need for OpenAI to preserve all chat logs." They warned that the privacy of hundreds of millions of ChatGPT users globally is at risk every day that the "sweeping, unprecedented" order continues to be enforced. OpenAI argued, "As a result, OpenAI is forced to jettison its commitment to allow users to control when and how their ChatGPT conversation data is used, and whether it is retained." Meanwhile, there's no evidence beyond speculation yet supporting claims that OpenAI had intentionally deleted data, OpenAI alleged. And supposedly there is "not a single piece of evidence" supporting claims that copyright-infringing ChatGPT users are more likely to delete their chats. And to me that seems reasonable. OpenAI argued, "OpenAI did not 'destroy' any data, and certainly did not delete any data in response to litigation events. The Order appears to have incorrectly assumed the contrary." At a conference in January, Wang, the judge, raised a hypothetical in line with her thinking on the subsequent order. She asked OpenAI's legal team to consider a ChatGPT user who "found some way to get around the paywall" and "was getting the New York Times content somehow as the output."
"If the user then hears about this case and says, 'Oh, whoa, you know, I'm going to ask them to delete all of my searches and not retain any of my searches going forward,'" the judge asked, "wouldn't that be directly the problem that the order would address?" OpenAI does not plan to give up this fight, alleging that the news plaintiffs have fallen silent on claims of intentional evidence destruction and that the order should be deemed unlawful. For OpenAI, risks of breaching its own privacy agreements could not only damage relationships with users but could also risk putting the company in breach of contracts and global privacy regulations. Further, the order imposes significant burdens on OpenAI, supposedly forcing the ChatGPT maker to dedicate months of engineering hours at substantial costs to comply, OpenAI claimed. It follows, then, that OpenAI's potential for harm "far outweighs News Plaintiffs' speculative need for such data," OpenAI argued. Quote, "While OpenAI appreciates the court's efforts to manage discovery in this complex set of cases, it has no choice but to protect the interests of its users by objecting to the Preservation Order and requesting its immediate vacatur," OpenAI said. Millions of people use ChatGPT daily for a range of purposes, OpenAI noted, "ranging from the mundane to profoundly personal." People may choose to delete chat logs that contain their private thoughts, OpenAI said, as well as sensitive information, like financial data from balancing the house budget or intimate details from workshopping wedding vows. And for business users connecting to OpenAI's API, the stakes may be even higher, as their logs may contain their companies' most confidential data, including trade secrets and privileged business information. "Given that array of highly confidential and personal use cases, OpenAI goes to great lengths to protect its users' data and privacy," OpenAI argued. It does this partly by honoring its privacy policies and contractual commitments to users.
And the article goes on, but, you know, everyone has the idea. So anyway, it's a mess. The bottom line is that for the time being, and since this began, no one's ChatGPT logs have actually been deleted. Since May 13, OpenAI has been forced by court order to retain everyone's everything. And I don't mean to make more of this than it is. I'm not suggesting that we should be terrified. I have no doubt that ChatGPT will treat these logs with, you know, as much respect as possible. But "deleted" now needs to be put in air quotes. It doesn't actually mean that it's truly gone. So for what it's worth, if you are someone who cares about maintaining as much absolute privacy as possible, you'll want to look at something such as this Venice AI, whose entire architecture is designed in TNO mode so that they never have any logs to either keep or delete. I should mention, though, that after I talked about Venice AI, I did some side-by-side comparisons against OpenAI's o3 model, which blows Venice AI away.
Leo Laporte
o3 blows pretty much everyone away. It's pretty amazing.
Steve Gibson
It's just, it's astonishing. Yeah. So it's not like they are at parity. But unfortunately ChatGPT, being the big guy in town, has become a target of the advertisers, I mean of the content producers, and they're saying, hey, you know, our content's being slurped up and users are getting it for free by asking ChatGPT what happened today.
Leo Laporte
What model? Oh, it's all using open source models. Venice is like Llama and fl.
Steve Gibson
Yeah. And actually it's distributed open source, and they're not using the ChatGPT API. They can't, obviously. Right, yeah, exactly. Because they are completely uncensored.
Leo Laporte
Actually, somebody can, which is Apple. Apple claims that they don't send any personal information to ChatGPT when you use it on an iPhone. So presumably you could use ChatGPT, maybe not its strongest models, but you could use it on an Apple device.
Steve Gibson
Especially given what we heard at WWDC yesterday. They're engaging ChatGPT all over the place.
Leo Laporte
Yeah, but it doesn't. It'll send the prompt, it has to, but it won't send any personal information. So they've obviously made a deal of some sort with OpenAI to do that. Yeah.
Steve Gibson
You mean it won't identify who you are? So it's anonymizing your prompt.
Leo Laporte
It can't.
Steve Gibson
Right, but.
Leo Laporte
So if you send it your tax returns, you're out of luck. But if you send it just a simple prompt, it doesn't know who it is.
Steve Gibson
Got it. Okay. Erlang. I don't know anybody who uses Erlang, but when you get a CVSS of 10.0...
Leo Laporte
Oh, that's not good.
Steve Gibson
You know, the four people who do use it really need to pay attention.
Leo Laporte
It's actually widely used, because it was written by Ericsson for mobile phone systems, so there are a lot of embedded and interesting uses of Erlang.
Steve Gibson
In that case, CVSS 10 is a big deal. Yeah. And it's in an SSH server, and it's an authentication bypass. It got a 10.0. That's the official CVSS. The description says Erlang/OTP is a set of libraries for the Erlang programming language. There are three version threads that have been fixed, 27, 26 and 25 (25.3.2.20 on the 25 thread), and those versions are safe. Prior to those, an SSH server may allow an attacker... And, you know, when they say "may" and then give it a 10.0, we should read between the lines. It's not much of a may. It probably actually should say an SSH server already did allow an attacker.
Leo Laporte
It did it. Yes.
Steve Gibson
The attacker already has what they want. "...to perform unauthenticated remote code execution (RCE). By exploiting a flaw in SSH protocol message handling, a malicious actor could," and we know they mean did, "gain unauthorized access to affected systems and execute arbitrary commands without valid credentials. A temporary workaround involves" pulling the plug. No, "involves disabling the SSH server or preventing access via firewall rules." Meaning don't let anybody use your SSH server anyway. Even though no one talks about using Erlang, as I wrote in the show notes, apparently it's out there. And Leo, you've confirmed the Ericsson mobile phone connection. Does Ericsson still make mobile phones?
Leo Laporte
No, but they made Erlang, so there you go.
Steve Gibson
They made Erlang.
Leo Laporte
OTP implies it's a one-time password. Oh, no, that's actually the name. Erlang is Erlang/OTP. Okay, so it's not a library, it's Erlang. Okay, wow.
Steve Gibson
Anyway, 10.0, kiddies, so unplug it if you've got it.
Leo Laporte
Holy cow.
Steve Gibson
Yikes. Yes. Can Russia intercept Telegram messages? There's a report that appears to allege that Russia now has some means for intercepting Telegram messages. My most pressing question is whether this applies to two-party, one-to-one messages. Here's what the reporting says. The Russian human rights NGO known as First Department warned on Friday, just this past Friday, that Russia's Federal Security Service, the infamous FSB, has learned to intercept messages sent by Russians to bots or feedback accounts associated with certain Ukrainian Telegram channels, potentially exposing anyone communicating with such outlets to treason charges. Russia's principal domestic intelligence agency, again, the FSB, has gained access to correspondence with Ukrainian Telegram channels, including Crimean Wind and Vision Vishnun, according to First Department, which said that the FSB's hacking of Ukrainian Telegram channels had come about during a 2022 investigation into Ukrainian intelligence agencies "gathering information that threatens the security of the Russian Federation" via messengers and social networks, including Telegram. The case is being handled by the FSB's investigative department, though no suspects or defendants have been named in the case. According to First Department, when the FSB identifies individual Russian citizens who have communicated with or transmitted funds to certain Ukrainian Telegram channels, it contacts the FSB office in their region, which then typically opens a criminal case for treason against the implicated person. First Department said, "We know that by the time the defendants in cases of state treason are detained, the FSB is already in possession of their correspondence, and the fact that neither defendants nor a lawyer are named in the main case allows the FSB to hide how exactly it goes about gaining access to that correspondence."
First Department stressed that their findings highlighted the various security risks inherent in using Telegram for confidential communication, especially in cases where the contents of such private messages could result in criminal charges. Dmitry Zerbek, the head of First Department, said that materials from Telegram have already been used as evidence in, quote, "a significant number of cases," unquote, adding that in most cases they have been accessed due to compromised devices. "However, there are also cases in which no credible technical explanations consistent with known access methods can be identified." So this guy does sound like he knows what he's talking about. He said, "This could indicate either the use of undisclosed cyber espionage tools or Telegram's cooperation with the Russian authorities, obvious signs of which we see in a number of other areas." So, you know, we've been watching Pavel Durov's previously adamant stance soften somewhat over time, particularly after he was arrested and charged in France last summer. Has he allowed Telegram to be compromised? You know, it's certainly not a messaging system that can be trusted. And remember that an audit of its homegrown crypto technology did raise additional concerns several months ago. So it's not what I would recommend anybody use. Leo? What I would recommend, oh, everybody...
Leo Laporte
Yes, our next sponsor is...
Steve Gibson
What do you know?
Leo Laporte
And if you knew what our next sponsor was, you would recommend it, because our next sponsor is the Thinkst Canary. Yay. Yay.
Steve Gibson
Monitoring is what you got to do.
Leo Laporte
Oh, I love this. It's a honeypot you can easily configure. Here's my Thinkst Canary. And as you can see, it's not big. It's the size of, I don't know, an external USB drive. It's got an Ethernet port on it and a power connection. That's it. You plug it in, you put it on your network, you register it with a console, and then, let me fire up the console, because this is... Oh, somebody's been scanning my ports, it looks like, quite frequently as well. Wonder who that could be. Let's just check in on this alert. Oh, it's coming from inside the house. I've got Fing, I guess it's called, running on my computer, which is called "left." That's why it's saying "left." And Fing does port scanning from time to time of all the devices on our network. So this is an example. I turned on port scan alerts just so you could see it, which is actually a good thing to have on. You see, there are 13 pages of this, but these are just the alerts from Fing scanning it. Now it says, you have more alerts than normal, would you like to mass acknowledge them? Yes, because I know this is not a threat. I know exactly who this is. I could turn it off. And now there are no new alerts. I just showed you that because normally I have that turned off; I turned it on so that you could see that it does work. There's the Thinkst Canary hardware, which I've got right here, currently configured as a Windows server, but there are also Canary tokens. These are files you can create with the Thinkst Canary that you can put anywhere you want. I've got four tokens. These are on Google Drive. This is the interesting thing: it's not just your local hard drive. If you want to know, hey, is somebody snooping around my Google Drive? You could put tokens there. You could put tokens almost anywhere. What can a token be? Well, of course, documents. But it also can look like...
You can have it be a DNS hostname that alerts if somebody queries it, like yandexmetrica.com. You could have it be a credit card that alerts you when somebody uses it, an AWS API key that alerts you when somebody uses that, and on and on. Look at this, a WireGuard client config that will alert you if somebody connects to your WireGuard. You know this is bogus, just a fake file; the bad guys don't know that. And that's the beauty of the Thinkst Canary. You can have all the perimeter defenses in the world, but once somebody gets into your network, how do you know they're in there? Well, the Thinkst Canary is a honeypot. It doesn't look vulnerable, it looks valuable. It's a honeypot they can't resist. Once somebody gets into your network, maybe it's a bad guy who's penetrated your defenses, maybe it's a malicious insider, they can't resist brute-forcing that fake internal SSH server. But then you'll get an alert that says you have a problem, and you'll know exactly what the problem is. No false alerts, just the alerts that matter. You can have them sent by email, text, Slack, webhooks. It supports an API and syslog. So pretty much any way you want to get alerted, you will get alerted. So that's the thing. You choose a profile, register it with the hosted console for monitoring and notifications, and then you sit back and you wait. Attackers who've breached your network, malicious insiders, any adversaries cannot help but make themselves known, because they're going to open those files, they're going to use that credit card, they're going to try to hit that fake SSH server. I think this is such a brilliant idea. A big bank might have hundreds spread all around; a casino back end, you know, at every possible spot. A small business like ours might just have a handful. Let's say you needed five Thinkst Canaries. Okay? You go to canary.tools/twit. Five of them cost 7,500 bucks a year.
You're going to get five of them. You're going to get your own hosted console, upgrades, support, maintenance, everything. It's all in there for a year. Oh, I can save you a little bit. If you use the code TWIT in the "How did you hear about us" box, you're going to get 10% off the price, and not just for that first year, for as long as you own your Canaries. So that savings will really add up. If you're even the slightest bit reluctant, maybe I could throw one more thing in. They have a very generous return policy. You can always return your Thinkst Canary with their two-month money-back guarantee for a full refund. 60 days. I should mention that we've been doing these ads for eight years. In all the time we've done these ads, the folks at Thinkst say that refund guarantee has never been claimed, because once you get one of these, you go, oh, I love it, I need more. Visit canary.tools/twit. Don't forget the offer code TWIT in the "How did you hear about us" box to save 10% off for life. This is such a great idea. It's a honeypot that's easy to deploy and very effective. By the way, these guys know their stuff. It's super secure too. canary.tools/twit. We thank them so much for their support over all these years. They're big fans of yours, Steve. That's why they wanted to be on the show, I believe.
Steve Gibson
Eight years. Yeah.
Leo Laporte
Isn't that great? Haroon and his team are fantastic. We saw them at RSAC, the RSA Conference, and they're just really smart guys who have created something that is super, super valuable. Isn't it fun to see? The first time I saw all those port scans, I went... and then I realized, wait a minute, it's all coming from Fing. So I turned off that monitor. So it's really cool. On we go with the show.
Steve Gibson
Okay, so I had to double-check the date on this news when I read that Spanish ISPs had accidentally blocked Google domains while attempting to crack down on illegal soccer live streams. The double-check was required, of course, because this is not the first time this has happened, nor the first time we've noted what a lame and harebrained approach it is to force specific ISPs to locally filter large chunks of the Internet for only their own subscribers. Right? I mean, everybody else can see what they want. Maybe someday we'll learn, but I'm not holding my breath. I did note that Reddit has sued Anthropic for scraping and using Reddit comments to train its Claude AI chatbot. And I guess this is just going to be a thing, Leo, for a while. We just talked about OpenAI in trouble with the New York Times and other plaintiffs, and now Anthropic. You know, Reddit's upset, and we know there are sites that specifically say, oh no, don't worry, AI is not allowed in. So I would just say obey those robots.txt files, folks. You know, behave yourselves. A recent analysis of Twitter's new encrypted XChat messaging appears to leave as much to be desired as you might imagine. The researcher who looked into it wrote, "When Twitter launched encrypted DMs a couple of years ago, it was the worst kind of end-to-end encrypted: technically end-to-end encrypted, but in a way that made it relatively easy for Twitter to inject new encryption keys and get everyone's messages anyway. It was also lacking a whole bunch of features, such as sending pictures, so the entire thing was largely a waste of time." But, he wrote, a couple of days ago Elon announced the arrival of XChat, a new encrypted messaging platform, quote, "built on Rust." It actually isn't; it's written in C. Oh, and "with Bitcoin style encryption, whole new architecture," unquote. What? So the guy says maybe they got it right this time.
And then a little bit later he says the TL;DR is: no, use Signal. Yeah. He said Twitter can probably obtain your private keys, admits that they can man-in-the-middle you, and has full access to your metadata. So anyway, the analysis goes deeper, and to me it looked kind of interesting. It might merit some additional attention and a deeper dive for the podcast, so I may return to that next week. We'll see. In the meantime, I would follow this investigator's recommendation and not assume that what Elon has brought us in this new XChat is actually secure, because they apparently were in a hurry. They didn't actually write it in Rust.
Leo Laporte
And you know, that's hysterical that he would even claim that.
Steve Gibson
I know because I guess. Woo.
Leo Laporte
Rust makes it better. Makes it better. And what does it even mean to say Bitcoin style encryption?
Steve Gibson
I don't know.
Leo Laporte
Is it. Bitcoin's not encrypted, by the way.
Steve Gibson
Exactly. It's a public ledger that everyone can look at.
Leo Laporte
So I guess what they're admitting is oh yeah, we. There's no encryption, but I think it.
Steve Gibson
Just like throw in some more buzzwords, maybe the messages.
Leo Laporte
All the DMs are put on the blockchain.
Steve Gibson
You would think he would have been into Dogecoin, but I guess not. So, yeah. Meanwhile, Thundermail, the worst-named service ever, please, will have email servers located in the European Union for increased privacy. Yeah, okay, fine, whatever. But could you please change the damn name?
Leo Laporte
How about Lightning Mail? Do you like that better?
Steve Gibson
That's better than Thundermail.
Leo Laporte
It is.
Steve Gibson
I mean, Thundermail just sounds so bad. I don't know what it is.
Leo Laporte
It's from Thunderbird, that's why. Right.
Steve Gibson
I mean, I get it. Yes. And on Thunderbird, that seems fine. I don't know why, but you can't change the bird to mail and have it still be good.
Leo Laporte
Something about a message and Thunder that just don't go together. I don't know. Yeah.
Steve Gibson
In other happy news, the GAO, the U.S. Government Accountability Office, has a report out which incidentally noted in passing that the Login.gov service has no policy to verify that its backups are working. So a cyberattack, a mistake, or any other IT issue could completely crash the U.S. government's entire login and identity system for, I don't know, days, weeks, or even months until it's restored.
Leo Laporte
This is how I get into my Social Security account.
Steve Gibson
Yeah. Well you better log in and hope you stay logged in because apparently it could go away.
Leo Laporte
Yeah.
Steve Gibson
And Lord knows, I mean, you know.
Leo Laporte
Oh, also Global Entry. My Global Entry account's there. My IRS account, actually, they use ID.me. That really makes me nervous. They use a third party...
Steve Gibson
Yeah.
Leo Laporte
System.
Steve Gibson
Maybe it's better to send it somewhere else.
Leo Laporte
Maybe.
Steve Gibson
I would imagine ID.me probably actually has backups. Okay, so let's talk about The Illusion of Thinking and Apple's work on this. We have one more break, but we'll get to that halfway through this.
Leo Laporte
It's a quick break. So yeah, okay. Yeah.
Steve Gibson
A couple of days ago I added an AI group to GRC's long-running, text-only NNTP newsgroups. In my inaugural post to that group I wrote, "Everyone, I've learned not to haphazardly create groups that do not have enduring value, since it's more difficult to remove groups than to create them, and endless group proliferation is not ideal. But I think it's way beyond clear that artificial intelligence is in the process of rapidly changing the world, and I cannot imagine any more important and worthwhile new group to create." Then just this past Sunday, upon discovering this just-released research from Apple, thanks to feedback from one of our listeners, Urs Rao, I posted the following into our brand new AI newsgroup. I said "The Illusion of Thinking" is how the title of their well-assembled paper begins. The entire title is "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity." And so I wrote in this posting into GRC's newsgroup: Is this just sour grapes engendered by Apple finding themselves behind the rest of the industry in AI deployment? I don't think so. This looks like an exploration that adds to our understanding of what we have today. And it's not suggesting that what we have today is not useful, nor that Apple might not wish they had some of their own. What it's doing is exploring the limits of what we are now calling artificial intelligence and suggesting what many of us have intuited, which is that while a massive problem space can be solved with powerful pattern matching, when there are no patterns to be matched, today's systems are revealed not to be exhibiting anything like true problem understanding. In other words, Leo, your earliest take on this, which was that AI was little more than fancy spell correction, carried an essential kernel of truth onto which Apple has just placed a very fine point.
I think everyone should listen carefully to what Apple's research paper abstract explains. They wrote, "Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers." And Leo, you and I were just talking about o3, and yes, it is astonishing. They said, "While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood." And as I said a week or two ago, researchers are going to be studying what we have. It's not something that happens overnight, but we're going to begin to get answers that tell us more about what it is we have. This is one such set of answers. They wrote, "Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces' structure and quality. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs 'think.'" And they have that in air quotes. "Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities." There's a cliff. "Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget." Meaning, we're letting you think about this as much as you want. Keep going. But they don't. They wrote:
"By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes." First, low-complexity tasks where standard models surprisingly outperform LRMs. Second, medium-complexity tasks where additional thinking in LRMs demonstrates advantage. And third, high-complexity tasks where both models experience complete collapse. "We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles. We also investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models' computational behavior, shedding light on their strengths and limitations, and ultimately raising crucial questions about their true reasoning capabilities." Okay, now, as I've cautioned before, anything and everything that's believed to be known about AI definitely needs to carry a date stamp and probably also a best-used-by expiration date. What this means for us here is that Apple is showing us some interesting and probably previously underappreciated features of today's LRMs, large reasoning models. It's worth reminding ourselves that if Apple had written the same paper a year ago, before the appearance of LRMs, and only challenged LLMs, the results would have been similar, though significantly less impressive for the AI side. The question then is whether, and if so to what degree, even larger reasoning models in the future will be able to eclipse the performance of today's large reasoning models. In other words, since what we all want to know today is what's going to happen with AI in the future, to what degree is Apple's research able to speak to any fundamental underlying limitations that might limit any future AI? That is, will this current linguistic, neural-network-based approach hit a wall? To answer that question, we need to see what Apple's research discovered.
Here's how Apple's researchers set up the question. They wrote, "Large Language Models (LLMs) have recently evolved to include specialized variants explicitly designed for reasoning tasks, Large Reasoning Models (LRMs) such as OpenAI's o1/o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking. These models are new artifacts, characterized by their 'thinking' mechanisms such as long Chain-of-Thought with self-reflection, and have demonstrated promising results across various reasoning benchmarks. Their emergence suggests a potential paradigm shift in how LLM systems approach complex reasoning and problem-solving tasks, with some researchers proposing them as significant steps toward more general artificial intelligence capabilities. Despite these claims and performance advancements, the fundamental benefits and limitations of LRMs remain insufficiently understood." And, you know, also, they're very new, right? So, okay. "Critical questions still persist: Are these models capable of generalizable reasoning, or are they leveraging different forms of pattern matching? How does their performance scale with increasing problem complexity? How do they compare to their non-thinking standard LLM counterparts when provided with the same inference token compute? Most importantly, what are the inherent limitations of current reasoning approaches, and what improvements might be necessary to advance toward more robust reasoning capabilities?" They wrote, "We believe the lack of systematic analysis investigating these questions is due to limitations in current evaluation paradigms. Existing evaluations predominantly focus on established mathematical and coding benchmarks, which, while valuable, often suffer from data contamination issues and do not allow for controlled experimental conditions across different settings and complexities. Moreover, these evaluations do not provide insights into the structure and quality of reasoning traces."
"To understand the reasoning behavior of these models more rigorously, we need environments that enable controlled experimentation. In this study, we probe the reasoning mechanisms of frontier LRMs through the lens of problem complexity. Rather than standard benchmarks," meaning math problems, "we adopt controllable puzzle environments that let us vary complexity systematically, by adjusting puzzle elements while preserving the core logic, and inspect both solutions and internal reasoning." Then we see, to my delight, the paper's diagram of one of the puzzle tests Apple's researchers chose, which is the famous Towers of Hanoi. This is a classic puzzle with very simple rules, which is what makes it such a great puzzle. I received a beautiful wooden version one Christmas as a child from my annoying aunt, who was always trying to stump me. Okay, now, for those who are not familiar... I love it.
Leo Laporte
I had one when I was a kid too, and that's how I learned recursion. I think it's why I was able to grok recursion right away.
Steve Gibson
Yep.
Leo Laporte
Isn't that fascinating?
Steve Gibson
Yeah. For those who are not familiar, the puzzle consists of three pegs in a line, with one of the pegs holding a stack of disks of decreasing diameter, from the largest disk on the bottom up to the smallest disk on top. The challenge is to move all of the disks from the starting peg to the peg at the other end of the three by moving only one disk at a time from any peg to any other peg, while never placing a larger disk over a smaller disk. It's a truly lovely puzzle, because those are all the rules. The rules are simple, but the solution requires patience, repetition and grasping a deeper solution concept. That's what makes this such a perfect puzzle for testing reasoning. Okay, now I should note that the puzzle is also a joy to solve by computer using traditional coding methods, and that the most elegant coding solution employs recursion, since this puzzle itself is deeply recursive. For anyone who has an age-appropriate child or nephew, Amazon has a large selection, pages of them, of beautifully rendered wooden and colorful versions of this famous puzzle. Now, what's so clever about Apple's choice of this puzzle is that its complexity can be uniformly scaled simply by changing the number of disks. So first imagine that we just have one disk. We can simply move it to its destination peg. If we have two disks, the smaller disk must first be placed on the middle peg so that the bottom, larger disk can be placed on its destination peg at the other end of the puzzle. Then the smaller disk can join the larger disk on the end peg, and the two-disk puzzle is solved. Switching to three disks requires a bit more work. So visualize three pegs and three disks. The smallest disk temporarily goes onto the third, destination peg. The middle disk goes to the middle peg. Now the smallest disk can go on top of the middle disk on the middle peg. This frees up the third peg to receive the largest, bottom disk, which is now all alone on the original peg.
So you move that over to the third peg. The smallest disk is then moved to the first peg, which uncovers the middle-size disk on the middle peg, which can now be placed onto the third, destination peg. And the smallest disk can then join the others to complete the stack and solve the three-disk puzzle. It is quite satisfying to do this. And note that the two- versus three-disk puzzle may hopefully teach the astute puzzler which peg should first receive the smallest disk, based upon whether the disk count is even or odd. And that would be confirmed by solving the four-disk puzzle. Now I should mention that if anyone who is listening is planning to make a gift of one of these, please encourage its recipient to start out this way, rather than just jumping into the very frustrating deep end using all of the 8 or 10 disks that these puzzles provide. Solving the puzzle with very few disks will provide the encouragement and stamina that will eventually be needed to tackle and solve this very gratifying full puzzle.
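Since the most elegant coding solution uses recursion, here is a minimal Python sketch of the standard recursive algorithm. It is the textbook method, not code from Apple's paper, and it assumes pegs numbered 0 (start), 1 (middle) and 2 (destination); the function name `hanoi` is illustrative. For three disks it yields the same seven moves described above.

```python
def hanoi(n, src=0, aux=1, dst=2, moves=None):
    """Return the list of (from_peg, to_peg) moves that transfer n disks
    from peg `src` to peg `dst`, using `aux` as the spare peg."""
    if moves is None:
        moves = []
    if n == 0:
        return moves                        # nothing left to move
    hanoi(n - 1, src, dst, aux, moves)      # park the n-1 smaller disks on the spare peg
    moves.append((src, dst))                # move the largest remaining disk to its target
    hanoi(n - 1, aux, src, dst, moves)      # restack the n-1 smaller disks on top of it
    return moves


if __name__ == "__main__":
    print(hanoi(3))
```

Running it prints `[(0, 2), (0, 1), (2, 1), (0, 2), (1, 0), (1, 2), (0, 2)]`: smallest disk to the destination peg, middle disk to the middle peg, smallest disk onto it, and so on. Each call repeats the same three steps on a smaller tower, which is why the puzzle teaches recursion so well.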
Leo Laporte
Then make them write it in Python. And now you got something.
Steve Gibson
And again, that little trick about noticing which peg to start out with will definitely save the day.
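The peg-choice trick can be written down as the well-known iterative solution. In this Python sketch (same assumed 0/1/2 peg numbering; the function name is illustrative, not from the paper), the smallest disk moves on every other turn, cycling toward the destination first when the disk count is odd and toward the middle peg first when it is even. On the turns in between there is exactly one legal move that doesn't involve the smallest disk.

```python
def hanoi_iterative(n):
    """Solve the n-disk puzzle without recursion.
    Pegs: 0 = start, 1 = middle, 2 = destination. Disks are numbered 1 (smallest) to n."""
    pegs = [list(range(n, 0, -1)), [], []]   # each list runs bottom -> top
    # Parity rule: with an odd disk count the smallest disk heads for the
    # destination first (0 -> 2 -> 1 -> 0 ...); with an even count it heads
    # for the middle peg first (0 -> 1 -> 2 -> 0 ...).
    step = 2 if n % 2 == 1 else 1
    small = 0                                # peg currently holding disk 1
    moves = []
    for turn in range(2 ** n - 1):
        if turn % 2 == 0:
            # Every other move, starting with the first: advance the smallest disk.
            nxt = (small + step) % 3
            pegs[nxt].append(pegs[small].pop())
            moves.append((small, nxt))
            small = nxt
        else:
            # In between: make the only legal move between the two other pegs.
            a, b = (p for p in range(3) if p != small)
            if pegs[a] and (not pegs[b] or pegs[a][-1] < pegs[b][-1]):
                src, dst = a, b
            else:
                src, dst = b, a
            pegs[dst].append(pegs[src].pop())
            moves.append((src, dst))
    return moves, pegs
```

For n = 3 the first move is `(0, 2)`, smallest disk straight to the destination; for n = 4 it is `(0, 1)`, to the middle peg, which is the odd/even rule in action.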
Leo Laporte
And you need it. It's recursive, so you need it each time you start the next thing. Yeah.
Steve Gibson
So I think that Apple's choice of the Towers of Hanoi is brilliant by reason of the puzzle's lovely scalability of difficulty.
Leo Laporte
Yeah.
Steve Gibson
In all, they used four different, somewhat similar, sequential combinatorial puzzles: Towers of Hanoi, checker jumping on a linear strip of squares, something that they call Blocks World, and also river crossing. So here's what Apple explained. They said these puzzles, first, offer fine-grained control over complexity; second, avoid contamination common in established benchmarks; third, require only the explicitly provided rules, emphasizing algorithmic reasoning; and fourth, support rigorous simulator-based evaluation, enabling precise solution checks and detailed failure analysis. Just very clever that they did this. They said: Our empirical investigation reveals several key findings about current large reasoning models, LRMs. First, despite their sophisticated self-reflection mechanisms learned through reinforcement learning, these models fail to develop generalizable problem-solving capabilities for planning tasks. Yeah. And look at these charts here in the middle of page 19, Leo.
Leo Laporte
Yeah.
Steve Gibson
With performance collapsing to zero beyond a certain complexity threshold. Second, our comparison between LRMs and standard LLMs under equivalent inference compute reveals three distinct reasoning regimes. And that's what I talked about before. They said: For simpler, low-compositional problems, standard LLMs demonstrate greater efficiency and accuracy. You know, there's this overthinking problem. As problem complexity moderately increases, thinking models gain an advantage. So that's what we're now seeing, right, in what o3 is doing; we're seeing this improved advantage. However, when problems reach high complexity with longer compositional depth, both model types experience complete performance collapse. And we see that in the chart that I've got on page 19 on the left. They said: Notably, near this collapse point, LRMs begin reducing their reasoning effort, measured by inference-time tokens, as problem complexity increases, despite operating well below generation limits. That's shown in the middle diagram. They said this suggests a fundamental inference-time scaling limitation in LRMs' reasoning capabilities relative to problem complexity. And they said finally, our analysis of intermediate reasoning traces, or thoughts, reveals complexity-dependent patterns. In simpler problems, reasoning models often identify correct solutions early but inefficiently continue exploring incorrect alternatives, an overthinking phenomenon. At moderate complexity, correct solutions emerge only after extensive exploration of incorrect paths. And that's fair. And beyond a certain complexity threshold, models completely fail to find correct solutions. In other words, they're not really reasoning. This indicates LRMs possess limited self-correction capabilities that, while valuable, reveal fundamental inefficiencies and clear scaling limitations.
These findings highlight both the strengths and limitations of existing LRMs, raising questions about the nature of reasoning in these systems with important implications for their design and deployment. They then list their key contributions from this research which we're going to go into after our final break.
Leo Laporte
All right, you got me thinking. And I just ordered a Towers of Hanoi, because I remember this with such fondness from my childhood.
Steve Gibson
It's just pleasant and gratifying. Yeah.
Leo Laporte
And once you understand it, it's pretty straightforward. But it's fun. Yeah.
Steve Gibson
But for a 5 year old or an 8 year old.
Leo Laporte
I hadn't really thought about this, but I think the fact that that was on our coffee table when I was a little kid and I did figure out how to solve it probably prepared me well for understanding recursion because you repeat the same algorithm over and over.
Steve Gibson
Exactly.
Leo Laporte
And planning because you have to start on the right peg to make it most efficient. There's a few things in there and.
Steve Gibson
You're able to give yourself simpler versions of it in order to kind of get the hang of it.
Leo Laporte
Right. Because you're just repeating it. Yeah. There's really nothing to say, except if you enjoy this show and you want to support what Steve does here, the best way to do that is to join the club, Club TWiT. Look, I'm a little round thing. Is this a tribute to Apple? I don't know. The club. What if I press that? The club is our way of enhancing our revenue. We started it during COVID four years ago because ad revenue was declining, and we didn't want to stop doing what we do; we wanted to do more of it. And this was always my thought: I always wanted this to be a listener-supported network. When we first started 20 years ago, the tools weren't there to make that easy. And even though we tried, we never made quite enough money to grow, and so we ended up doing ads. But I think now the tools are there. We use Memberful, which is a Patreon company, for our club membership. That's made it very easy for you to join the club. And I think the testament is that we have, I don't know what the last number is, but at least 12,000 people in the club. That's fantastic. We really appreciate it. Now, we did recently raise the cost. Those of you who are already in the club continue to pay the price you paid when you joined, but for new members it's 10 bucks a month. I think that's very fair. What do you get? Well, you get ad-free versions of all the shows, because I always hated it when they charge you and then still show you ads. That always bugged me. So of course, because we're charging you, you're supporting us; you don't need to hear the ads. You wouldn't even hear this plug for the club. You also get access to the Club TWiT Discord, which is the members' home on the Internet. Now, you don't need to join the Discord to get the benefits of the club. Lots of people don't.
But I should tell you that the Discord is where all these special shows, like yesterday's WWDC keynote, the Google I/O keynote, the Microsoft Build keynote and future keynotes, will all happen, inside the club. So that's an important reason to join. Also our special shows: Friday I'll be doing Photo Time with Chris Marquardt at 1 p.m. Micah's Crafting Corner is June 18th. That's always fun. I'm going to do some vibe coding when Microsoft does Microsoft. Oh, Micah does some Lego. I'm going to call Micah Microsoft from now on. We also do other things. We've got Stacy's Book Club in here, and the AI User Group on the first Friday of every month. We will have a Stacy's Book Club soon; we haven't scheduled the date yet, that's all. But all of this happens inside the Club TWiT Discord. So that's another benefit. You also get the special wonderful feeling that you're supporting the work we do here at TWiT, because about 25% of our operating revenue comes from you, our viewers and listeners. Without that, we would have to cut back, frankly. We'd have to let people go. We'd have to cut back on shows. I don't want to do that. In fact, I'd like to grow, which is why I'd like you to join the club, if you would: twit.tv/clubtwit. You will have my eternal gratitude, and I will see you in the Discord. twit.tv/clubtwit. The 1,000,000th sign-up gets a free doll, according to Pretty Fly for the CIS guy. You want that? We can make it for you. It's cute. twit.tv/clubtwit. And thanks in advance.
Back to you, Mr. Gibson.
Steve Gibson
Okay, so they say these are their key contributions from the research. We question the current evaluation paradigm of LRMs on established math benchmarks and design a controlled experimental testbed by leveraging algorithmic puzzle environments that enable controllable experimentation with respect to problem complexity. We show that state-of-the-art LRMs (o3-mini, DeepSeek-R1, Claude 3.7 Sonnet Thinking) still fail to develop generalizable problem-solving capabilities, with accuracy ultimately collapsing to zero beyond certain complexities across different environments. We find that there exists a scaling limit in the LRMs' reasoning effort with respect to problem complexity, evidenced by the counterintuitive decreasing trend in the thinking tokens after a complexity point. We question the current evaluation paradigm based on final accuracy and extend our evaluation to intermediate solutions of thinking traces with the help of deterministic puzzle simulators. Our analysis reveals that as problem complexity increases, correct solutions systematically emerge at later positions in thinking compared to incorrect ones, providing quantitative insights into the self-correction mechanisms within LRMs. And finally, we uncover surprising limitations in LRMs' ability to perform exact computation, including their failure to benefit from explicit algorithms (we'll get to this, but at one point they told it how to do the Towers and it still couldn't; they gave instructions, here's how you solve this, anyway), and their inconsistent reasoning across puzzle types. Okay, so for those listening to this without the advantage of the performance charts in the show notes, the Claude 3.7 thinking versus non-thinking model performance on the Towers of Hanoi puzzle was interesting. Everyone understands the Tower of Hanoi now.
Both the earlier large language model and the later large reasoning models performed perfectly, returning success 100% of the time when only one or two disks were used. And we saw how simple those were. Both models still did very well after a third disk was added, but interestingly the fancier thinking model underperformed the simpler LLM by about 4%.
Leo Laporte
That's wild.
Steve Gibson
Yeah, but when that first peg was stacked with four disks, the deeper thinking model's performance was restored, whereas the simpler Claude 3.7 LLM collapsed to finding the solution only 35% of the time, while the thinking model held at 100%. As the disk count then increases above 4, both models' performance continues to drop, but the LRM holds a huge lead over the LLM until they get to 8 disks. The LLM is never able to solve that one, whereas the thinking model finds the 8-disk solution about 1 out of every 10 tries, about 10%. But 10 disks is beyond the reach of either. The full research paper has lots of interesting detail about the various models' performance on the four puzzle types. I noted, however, that the nature of the other three puzzles seemed to be pretty much beyond the grasp of any of this so-called AI. One of their more interesting findings was the appearance of what they term the three complexity regimes. Paraphrasing from the paper, on the question of how complexity affects reasoning, they wrote: Motivated by these observations, to systematically investigate the impact of problem complexity on reasoning behavior, we conducted experiments comparing thinking and non-thinking model pairs across our controlled puzzle environments. Our analysis focused on matching pairs of LLMs with identical model backbones, specifically Claude 3.7 Sonnet with and without thinking, and DeepSeek-R1 versus V3. For each puzzle, we vary the complexity by manipulating problem size N, where N represents the disk count, the checker count, the block count, or the crossing elements. Results from these experiments demonstrate that, unlike observations from math... And that's probably one of the most significant things here, you know. We keep seeing, oh, these things do better than a math PhD, and it's like, okay, how about frogs jumping over each other? Oh, well, no, can't do frogs. No.
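The disk counts in these results map to solution lengths through the standard Tower of Hanoi recurrence: moving n disks means moving n − 1 disks out of the way, moving the largest disk once, then moving the n − 1 disks back on top of it. This is the textbook derivation of the minimum move count, not something taken from Apple's paper.

```latex
T(1) = 1, \qquad T(n) = 2\,T(n-1) + 1 \quad (n \ge 2)
T(n) + 1 = 2\,\bigl(T(n-1) + 1\bigr) = 2^{\,n-1}\bigl(T(1) + 1\bigr) = 2^{\,n}
\quad\Longrightarrow\quad T(n) = 2^{\,n} - 1
```

So a perfect answer needs 15 moves for four disks, 31 for five, 255 for eight and 1,023 for ten: each extra disk roughly doubles the length of the move sequence a model must produce without a single mistake.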
So they said there exist three regimes in the behavior of these models with respect to complexity. In the first regime, where problem complexity is low, we observe that non-thinking models are capable of obtaining performance comparable to, or even better than, thinking models with more token-efficient inference, meaning it's cheaper to do them. In the second regime, with medium complexity, the advantage of reasoning models capable of generating long chains of thought begins to manifest, and the performance gap between the model pairs increases. The most interesting regime is the third regime, where problem complexity is higher and the performance of both models has collapsed to zero. Results show that while thinking models delay this collapse, they ultimately encounter the same fundamental limitations as their non-thinking counterparts. I think it's important to address their decision to use puzzles as an evaluation mechanism versus math problems. They gave this a lot of thought, and on the math versus puzzle environments question, they wrote the following: Currently, it is not clear whether the performance enhancements observed in recent reinforcement learning (RL)-based thinking models (all of the LRMs we've been talking about) are attributable to increased exposure to established mathematical benchmark data, to the significantly greater inference compute allocated to thinking tokens, or to reasoning capabilities developed by RL-based training, that is, the reinforcement learning training. Recent studies have explored this question with established math benchmarks by comparing the upper-bound capabilities of RL-based thinking models with their non-thinking standard LLM counterparts. They've shown that under equivalent inference token budgets, non-thinking LLMs can eventually reach performance comparable to thinking models on benchmarks like MATH-500 and AIME24.
We also conducted our comparative analysis of frontier LRMs like Claude 3.7 Sonnet with and without thinking, and DeepSeek-R1 versus V3. Our results confirm that on the MATH-500 dataset, the performance of thinking models is comparable to their non-thinking counterparts when provided with the same inference token budget. However, we observed that this performance gap widens on the AIME24 benchmark and widens further on AIME25. This widening gap presents an interpretive challenge. It could be attributed to either increasing complexity requiring more sophisticated reasoning processes, thus revealing genuine advantages of the thinking models for more complex problems, or reduced data contamination in the newer benchmarks, particularly AIME25. Interestingly, human performance on AIME25 was actually higher than on AIME24, suggesting that AIME25 might be less complex. Yet models perform worse on AIME25 than AIME24, potentially suggesting that data contamination during the training of frontier LRMs is occurring. That is, there's more contamination in the older benchmarks, because there's been more time for the contamination to happen, as compared to the newer testing benchmarks. Given these non-justified observations, and the fact that mathematical benchmarks do not allow for controlled manipulation of problem complexity, we turned to puzzle environments that enable more precise and systematic experimentation. Okay, so we have the very real problem of data contamination. That makes judging what these AI models are actually doing difficult, since the models may have previously encountered the problems during their training and simply memorized the answer. So they're not actually reasoning; they're not thinking or solving new problems. They're pattern matching at a very high level and just regurgitating. But even puzzles like the Towers of Hanoi and river crossing exist on the Internet and are also presumably in the training data.
The researchers talk about this under the heading Open Questions: Puzzling Behavior of Reasoning Models. They write: We present surprising results concerning the limitations of reasoning models in executing exact problem-solving steps, as well as demonstrating different behaviors of the models based on the number of moves. And here again, this is what I was talking about: In the Tower of Hanoi environment, even when we provide the algorithm to be used in the prompt, so that the model only needs to execute the prescribed steps, performance does not improve, and the observed collapse still occurs at roughly the same point. This is noteworthy because finding and devising a solution should require substantially more computation for search and verification than merely executing a given algorithm. This further highlights the limitations of reasoning models in verification and in following logical steps to solve a problem, suggesting that further research is needed to understand the symbolic manipulation capabilities of such models. Moreover, we observe very different behavior from the Claude 3.7 Sonnet Thinking model. In the Tower of Hanoi environment, the model's first error in the proposed solution often occurs much later, around move 100 for 10 disks, compared to the river crossing environment, where the model can only produce a valid solution until move 4. Note that this model also achieves near-perfect accuracy when solving the Tower of Hanoi with five disks, which requires 31 moves, while it fails to solve the river crossing puzzle with just N equals 3, which has a solution in only 11 moves. This likely suggests that examples of river crossing with N greater than 2 are scarce on the web, meaning LRMs may not have frequently encountered or memorized such instances during training.
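To see how small that river crossing instance is, here is a breadth-first search in Python that finds the fewest crossings. The rules are assumptions, not quoted from the episode: n actors and their n agents, a boat holding at most 2 people, and no actor may be in the presence of another actor's agent unless their own agent is also present, checked on both banks and in the boat. The function names are hypothetical. Under those rules, the search finds the 11-crossing minimum for n = 3.

```python
from collections import deque
from itertools import combinations


def safe(group):
    """An actor may not be with another actor's agent unless their own agent is present."""
    agents = {i for role, i in group if role == "agent"}
    return all(i in agents or not agents for role, i in group if role == "actor")


def min_crossings(n, boat_capacity=2):
    """Breadth-first search for the fewest boat crossings; None if unsolvable."""
    everyone = frozenset([("actor", i) for i in range(n)] + [("agent", i) for i in range(n)])
    start = (everyone, True)  # (people on the left bank, boat is on the left)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (left, boat_left), crossings = queue.popleft()
        if not left:
            return crossings  # everyone has reached the right bank
        bank = left if boat_left else everyone - left
        for k in range(1, boat_capacity + 1):
            for riders in combinations(sorted(bank), k):
                boat = frozenset(riders)
                new_left = left - boat if boat_left else left | boat
                # The boat group and both banks must all be safe after the crossing.
                if safe(boat) and safe(new_left) and safe(everyone - new_left):
                    state = (new_left, not boat_left)
                    if state not in seen:
                        seen.add(state)
                        queue.append((state, crossings + 1))
    return None
```

`min_crossings(3)` returns 11, against 31 moves for the five-disk tower, so sheer solution length doesn't explain why the model handles one and not the other.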
In other words, it is very, very difficult to test these models, because you need clean models that have not absorbed contaminating information that allows them to appear to be creating new thought as opposed to just finding something from the past. So this work by Apple's researchers is full of terrific insights that I want to commend to anyone who's interested in obtaining a more thorough understanding of where things probably stand at this point in time. I've got a link right under the title at the beginning of this in the show notes. So here's what the researchers conclude. They said: In this paper, we systematically examine frontier large reasoning models through the lens of problem complexity using controllable puzzle environments. Our findings reveal fundamental limitations in current models. Despite sophisticated self-reflection mechanisms, these models fail to develop generalizable reasoning capabilities beyond certain complexity thresholds. So I'm going to repeat that, since I think that's the essence of this entire paper. Our findings reveal that despite sophisticated self-reflection mechanisms, these models fail to develop generalizable reasoning capabilities beyond certain complexity thresholds. So the models are doing much better at doing what their simpler LLM brethren have been doing, but the difference is fundamentally quantitative, not qualitative. Apple continues: We identified three distinct reasoning regimes. Standard LLMs outperform LRMs at low complexity, LRMs excel at moderate complexity, and both collapse at higher complexity. Particularly concerning is the counterintuitive reduction in reasoning effort as problems approach critical complexity, suggesting an inherent compute scaling limit in LRMs. Our detailed analysis of reasoning traces further exposed complexity-dependent reasoning patterns, from inefficient overthinking on simpler problems to complete failure on complex ones.
These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning. Finally, we presented some surprising results on LRMs that lead to several open questions for future work. Most notably, we observed their limitations in performing exact computation. For example, when we provided the solution algorithm for the Tower of Hanoi to the models, their performance on this puzzle did not improve. They gave them the answer, and it didn't help. Moreover, investigating the first failure move of the models revealed surprising behaviors. For instance, they could perform up to 100 correct moves in the Tower of Hanoi, but fail to provide more than five correct moves in the river crossing puzzle. We believe our results can pave the way for future investigations into the reasoning capabilities of these systems. And then finally, under limitations, they said: We acknowledge that our work has limitations. While our puzzle environments enable controlled experimentation with fine-grained control over problem complexity, they represent a narrow slice of reasoning tasks and may not capture the diversity of real-world or knowledge-intensive reasoning problems. You know, they're algorithmic, not knowledge based. It is notable that most of our experiments rely on black-box API access to the closed frontier LRMs, limiting our ability to analyze internal states or architectural components. Furthermore, the use of deterministic puzzle simulators assumes that reasoning can be perfectly validated step by step. However, in less structured domains, such precise validation may not be feasible, limiting the transferability of this analysis to other, more generalizable reasoning. So in other words, this is only what it is. It may or may not be more widely applicable, and it may not even have any meaning or utility beyond the scope of these problems.
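The deterministic puzzle simulator and the first-failure-move analysis are easy to picture in code. This is a hypothetical Python checker, not Apple's actual simulator: it replays a proposed list of (from_peg, to_peg) moves for an n-disk tower, assuming pegs 0, 1 and 2 with peg 2 as the destination, and reports the 1-based index of the first illegal move (or None) plus whether the puzzle ended solved.

```python
def check_hanoi(n, moves):
    """Replay (from_peg, to_peg) moves on an n-disk tower with pegs 0, 1, 2.
    Returns (first_bad_move, solved): first_bad_move is the 1-based index of
    the first illegal move, or None if all moves were legal."""
    pegs = [list(range(n, 0, -1)), [], []]   # peg 0 starts with disks n..1, bottom -> top
    for step, (src, dst) in enumerate(moves, start=1):
        if not (0 <= src < 3 and 0 <= dst < 3) or src == dst:
            return step, False               # nonexistent peg or a no-op move
        if not pegs[src]:
            return step, False               # nothing to pick up
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return step, False               # larger disk onto a smaller one
        pegs[dst].append(pegs[src].pop())
    return None, pegs[2] == list(range(n, 0, -1))
```

Because every move is checked against the rules, a checker like this can say exactly where a long answer first goes wrong (move 100 of a ten-disk attempt, say) instead of merely marking the final answer right or wrong. That step-by-step validation is also the assumption the authors flag as hard to carry over to less structured domains.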
There's not a great deal of real-world need, you know, for stacking disks on poles, after all. But for what it's worth, it does track with the intuition many of us have about where the true capabilities of today's AI fall. You know, terms like comprehend or understand or even reason really don't seem to apply. They're used by AI fanboys. Maybe they're just a lazy shorthand, but I don't feel that they're helpful. In fact, I think they're anti-helpful. So what I think we need is some new anti-anthropomorphic terminology to accompany this new technology. There's zero question that scale-driven computation has changed the world forever. Everyone is asking ChatGPT and other consumer AI more and more questions every day, and that's only going to accelerate as the benefits of this become more widely known. AI does not need to become AGI or self-aware to be useful, and frankly, I would strongly prefer that it did not. To that end, I doubt that we have anything to worry about anytime soon, and perhaps not even for the foreseeable future. Thus the title of today's podcast, The Illusion of Thinking, because I believe that the fairest conclusion is that that's all we have today. It's useful, but it's not thought.
Leo Laporte
Yeah. And, you know, Anthony Nielsen's asking a legit question: how much did they coach the LRM? You can say to it, for instance, use code. And it might well have been able to do better had they said use code. There are things you can say, like think harder, that actually make a difference. But it doesn't change your main point, which is, no, they're not thinking. Maybe they can do better, but even if they did better, that wouldn't necessarily mean they're thinking by any means.
Steve Gibson
And I think, in the same way that we were initially astonished when these things started to, like, talk and appeared to understand us...
Leo Laporte
Yeah.
Steve Gibson
It's like, holy... astonishing. Yeah. And so I think now what we're underappreciating is the amount of knowledge that is captured by these. And when we ask them to think more, think longer, think harder, we get more of that captured knowledge out, what appears to be understanding but actually isn't. We squeeze the sponge harder and we get more out of it.
Leo Laporte
And that's, of course, what these companies are doing as fast as they can, because everybody's competing to come out with the smartest solution. We should also note that this paper was written with the older Claude models. They have Claude 4 out now.
Steve Gibson
Right. And as I said, also, this is all a moving target. I mean, absolutely. And that's really the point, though, Leo: does it matter which model, or how far into the future this goes?
Leo Laporte
Probably not. You know, fundamentally, they're not thinking.
Steve Gibson
Exactly. I don't think they're going to. I think they're just going to be able to squeeze the sponge harder and get more of the juice out. But at some point, you know, they're not creating new juice.
Leo Laporte
Right. These are exciting times. We'll see. I don't know myself. It was a great paper. I'm glad you explained it. I appreciate it, as always. Every week I say, oh, I can't wait till Tuesday. I wonder what Steve's going to say about it.
Steve Gibson
And again, if anyone has a youngster around, look at how gorgeous those puzzles are. Aren't they beautiful?
Leo Laporte
Fantastic.
Steve Gibson
Yeah, yeah.
Leo Laporte
There is a story just breaking that you might be interested in. This is from The Register. Security researchers have managed to access the live feeds of 40,000 Internet-connected cameras worldwide. These are not cameras intentionally made public. These are cameras that were improperly secured, and they did it with a browser. So just be... The US had 14,000 of those feeds, allowing access to the insides of data centers, health care facilities, factories and more. Wow. I imagine we'll be talking about that next week. This is why you've got to see every episode of Security Now. We do it Tuesdays right after MacBreak Weekly, 1:30 Pacific, 4:30 Eastern, 20:30 UTC. You can watch live. If you're a member of the club, you get behind-the-velvet-rope access in the Club TWiT Discord. But there's also, for everybody, YouTube, Twitch, TikTok, X.com, Facebook, LinkedIn and Kick. So there are plenty of places you can watch. Most people don't watch live. 99% of the audience watches after the fact, because it's a podcast. So we make copies available, both audio and video, on our website at twit.tv/sn. Steve actually has some unique versions of the show. He's the only guy who has a 16-kilobit audio version, which, if you don't know, is about a quarter of that.
Steve Gibson
But.
Leo Laporte
He's also got a 64-kilobit audio version, which sounds perfectly fine. That's good quality, but it is half what we do, because for technical reasons we need to do 128. He also has the transcripts. No one has those except Steve, carefully crafted by Elaine Farris. So that's nice. Not AI, but a real...
Steve Gibson
She's going to a family member's graduation, by the way, at the end of the week. So this week's transcript will be a little delayed. Of course, you won't know that until you're reading it and you come to the end of the transcript. So at least now you know why.
Leo Laporte
That and the show notes and a lot more are available at Steve's website, GRC.com. When you get there, you might want to go to GRC.com/email, because you can sign up so that Steve will not reject your emails out of hand. You'll whitelist your email address, and you can even check boxes there. They're unchecked by default, because Steve's a good guy. But you can, if you wish, subscribe to his newsletter, his weekly show notes newsletter, which you'll usually get the day before the show comes out. And he also does a very rare mailing when something new comes out, which I know you want to know about, like his DNS Benchmark coming out soon.
Steve Gibson
I have something to say about that soon.
Leo Laporte
Yeah, no hurry. I'm not rushing you. I just mention it. Just saying. While you're there, you might also want to pick up SpinRite. This is the way Steve makes a living. It's his bread and butter. It's the world's best mass storage maintenance, recovery and performance-enhancing utility, suitable for both spinning drives and SSDs. The current version is 6.1, and it just got updated. You can find out more and buy a copy, which you should, at GRC.com. There's also a YouTube channel with video from the show. It's a great way to share clips. If you hear something and say, God, I have to tell the boss about this, or my aunt, or my friend, whatever, it's a good way to do it. Everybody has access to YouTube. You can clip it out in YouTube and send it to them. Also, of course, because it is a podcast, you can subscribe in your favorite podcast client and get it automatically. If you do that, leave us a five-star review so that everybody knows how great this show is. And whether or not you're in the club, you might want to subscribe to our free newsletter. We have one too. It comes out every week at twit.tv/newsletter. It's free, and it will keep you up to date on what's coming up on this show and all the other shows that we do. Steve, have a great week. We'll see you next time.
Steve Gibson
We'll see you on the 17th. Yay.
Leo Laporte
Security Now.
Podcast Summary: Security Now Episode 1029 – "The Illusion of Thinking"
Release Date: June 11, 2025
Hosts: Leo Laporte & Steve Gibson
The episode opens with a heartfelt tribute to Bill Atkinson, a pivotal figure in Apple's history and a member of the original Macintosh development team. Steve Gibson shares his admiration for Atkinson, highlighting his contributions to early Mac software such as QuickDraw, MacPaint, and HyperCard. Atkinson's untimely passing due to pancreatic cancer has resonated deeply within the tech community.
Notable Quote:
"Bill was a principal designer and developer of the GUI for Apple's Lisa and later became one of the first 30 members of the original Apple Mac dev team."
— Steve Gibson [13:38]
Leo Laporte reminisces about his interviews with Atkinson, emphasizing the generosity and brilliance of his late friend.
Steve Gibson delves into a concerning discovery involving Meta (Facebook) and Yandex. Recent research unveiled that native Android apps from both companies are silently listening on fixed local ports to track user activities. This method enables them to link mobile browsing sessions and web cookies directly to real-world user identities, effectively bypassing traditional privacy protections.
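To make the mechanism concrete, here is a minimal Python sketch of the general idea, not Meta's or Yandex's actual code: a native app binds a socket on the phone's loopback address, and a script running in a web page sends its browser identifier to that same local port, where the app can pair it with the account it knows is logged in. The function names, the identifier string, and the use of plain UDP datagrams are all assumptions made for illustration; the summary doesn't specify the transport the real scripts used. The sketch lets the OS pick a free port so it runs anywhere, whereas the apps described here used fixed ports, which is what lets a web script know where to send without any coordination.

```python
import socket

# Hypothetical illustration of a "localhost channel": the native app listens on
# 127.0.0.1, and an in-page script sends a browser identifier to that port.
# Plain UDP stands in for whatever transport the real scripts used.

def start_app_listener(port: int = 0) -> socket.socket:
    """Native-app side: bind a UDP socket on loopback (port 0 lets the OS pick one)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))
    return sock

def browser_script_send(web_id: str, port: int) -> None:
    """Web-page side: push a browser/cookie identifier to the local port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(web_id.encode("utf-8"), ("127.0.0.1", port))

def app_link_identity(sock: socket.socket, account: str) -> dict:
    """Native-app side: receive the web identifier and pair it with the logged-in account."""
    sock.settimeout(2.0)
    data, _addr = sock.recvfrom(1024)
    return {"web_id": data.decode("utf-8"), "account": account}

if __name__ == "__main__":
    listener = start_app_listener()
    port = listener.getsockname()[1]
    browser_script_send("hypothetical-cookie-123", port)
    print(app_link_identity(listener, "alice@example.com"))
    listener.close()
```

The hop from page to app never leaves the device, which helps explain how this approach sidesteps the traditional privacy protections described above.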
Key Points:
Notable Quotes:
"Meta has been up to... the design and installation of these covert backdoors in their apps which can only have the purpose of communicating with matching user tracking web scripts spread across 5.8 million Internet sites."
— Steve Gibson [56:28]
"This is an interesting and extremely privacy invasive hack."
— Steve Gibson [38:05]
Discussion Highlights:
The European Union has launched its own DNS service, named "DNS4EU" (at joindns4.eu), aiming to provide secure and privacy-focused DNS resolvers as an alternative to foreign services such as those based in the US.
Key Features:
Notable Quotes:
"These EU resolvers include built-in DNS filters for malicious and malware-linked domains that is filtering them out that prevent users from connecting to known bad sites."
— Steve Gibson [65:17]
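As a rough illustration of what resolver-side filtering means, the Python sketch below refuses to answer for any name on a blocklist, including subdomains of a listed name, and otherwise returns an answer from a stand-in upstream table. The blocklist entries, the upstream dictionary, and the use of None to represent a blocked (NXDOMAIN-style) response are hypothetical; the EU resolvers' actual filtering sources and response behavior aren't described here.

```python
from __future__ import annotations

# Hypothetical blocklist; a real filtering resolver would draw on curated threat feeds.
BLOCKLIST = {"malware.example", "phish.example.net"}

def is_blocked(domain: str, blocklist: set[str] = BLOCKLIST) -> bool:
    """True if the domain or any parent domain appears on the blocklist."""
    labels = domain.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c", so subdomains of a blocked name are caught
    # while names that merely end with the same characters are not.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))

def filtered_resolve(domain: str, upstream: dict[str, str]) -> str | None:
    """Return None (standing in for a blocked answer) for listed names;
    otherwise look the name up in a stand-in upstream table."""
    if is_blocked(domain):
        return None
    return upstream.get(domain.lower().rstrip("."))
```

Matching on whole labels means a `malware.example` entry also blocks `cdn.malware.example` but leaves `notmalware.example` alone.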
Discussion Highlights:
Steve and Leo cover a series of recent security-related news:
Reddit vs. Anthropic: Reddit has initiated legal action against Anthropic for scraping and utilizing Reddit comments to train its Claude AI chatbot.
Twitter's X Chat Security Flaws: A researcher criticized Twitter's encrypted X Chat messaging platform, suggesting vulnerabilities that allow Twitter to intercept private keys and metadata, rendering the encryption ineffective.
Servicing Issues:
Erlang OTP Vulnerability ([92:33] – [95:07]): A critical CVSS score of 10.0 was assigned to a vulnerability in Erlang OTP's SSH library, allowing unauthenticated remote code execution. Users are advised to disable SSH servers or implement firewall rules as a temporary workaround.
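After disabling or firewalling an affected SSH daemon, it's worth confirming from another machine that the port really no longer accepts connections. Here is a minimal Python reachability check; the function name is illustrative, and the port an Erlang/OTP SSH server listens on is chosen by the application, so it may not be 22. This only tests exposure; it says nothing about whether the installed OTP version is patched.

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass), and unreachable hosts.
        return False
```

For example, `port_is_open("192.0.2.20", 2222)` (a placeholder host and port) returning False when run from outside the firewall is the result you'd hope to see.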
Notable Quotes:
"A malicious actor could... gain unauthorized access to affected systems and execute arbitrary commands without valid credentials."
— Steve Gibson [93:51]
"They [Meta and Yandex] are thinking, they are not your friends, amoral."
— Steve Gibson [58:16]
Discussion Highlights:
Telegram Message Interception: Reports indicate that Russia's FSB has the capability to intercept messages sent to certain Ukrainian Telegram channels, potentially leading to treason charges against Russian citizens.
EU's DNS Service Performance: While the EU-based DNS service performs excellently within Europe, it suffers from high latency for users outside the EU, making it impractical for global use.
Erlang OTP SSH Vulnerability: The critical vulnerability poses significant risks for systems utilizing Erlang OTP for SSH services, emphasizing the need for immediate mitigation measures.
A significant portion of the episode is dedicated to discussing Apple's recently released research paper titled "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity."
Key Insights:
Research Objective: Apple aims to assess whether current large reasoning models (LRMs) genuinely exhibit reasoning capabilities or merely perform advanced pattern matching.
Methodology: The study utilized controlled puzzle environments, such as the Towers of Hanoi, with adjustable complexity to evaluate both the final answers and the internal reasoning processes of various models, including OpenAI's o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking.
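Part of the appeal of a puzzle like the Tower of Hanoi is that a simple simulator can check every intermediate move, not just the final answer, while the disk count n serves as the complexity knob. The Python sketch below illustrates that idea; it is not the paper's evaluation harness, and the peg numbering and function names are assumptions for illustration.

```python
from __future__ import annotations

def solve_hanoi(n: int, src: int = 0, aux: int = 1, dst: int = 2) -> list[tuple[int, int]]:
    """Return the optimal list of (from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    return (solve_hanoi(n - 1, src, dst, aux)    # park n-1 disks on the spare peg
            + [(src, dst)]                        # move the largest disk
            + solve_hanoi(n - 1, aux, src, dst))  # restack the n-1 disks on top of it

def validate_moves(n: int, moves: list[tuple[int, int]]) -> tuple[bool, int | None]:
    """Simulate the moves from the start position.
    Returns (solved, index_of_first_illegal_move or None)."""
    pegs = [list(range(n, 0, -1)), [], []]  # peg 0 holds disks n..1, largest at bottom
    for i, (a, b) in enumerate(moves):
        if a == b or a not in (0, 1, 2) or b not in (0, 1, 2):
            return False, i
        # Illegal: taking from an empty peg, or placing a larger disk on a smaller one.
        if not pegs[a] or (pegs[b] and pegs[b][-1] < pegs[a][-1]):
            return False, i
        pegs[b].append(pegs[a].pop())
    # Every move was legal; report whether all disks ended up on the target peg.
    return pegs[2] == list(range(n, 0, -1)), None
```

An evaluator built this way can report exactly where a proposed sequence first goes wrong: `validate_moves(3, [(0, 1), (0, 1)])` flags the second move, since it would place disk 2 on top of disk 1.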
Findings:
Three Performance Regimes:
Reasoning Effort: Surprisingly, as problem complexity increases beyond a certain point, LRMs reduce their reasoning tokens despite having sufficient resources, suggesting a scaling limitation.
Reasoning Traces: Analysis revealed that LRMs often explore incorrect solutions even after identifying the correct one, leading to inefficient problem-solving and eventual failure in high-complexity scenarios.
Notable Quotes:
"Despite sophisticated self-reflection mechanisms, these models fail to develop generalizable reasoning capabilities beyond certain complexity thresholds."
— Steve Gibson [135:07]
"This suggests LRMs possess limited self-correction capabilities that, while valuable, reveal fundamental inefficiencies and clear scaling limitations."
— Steve Gibson [135:12]
Discussion Highlights:
Towers of Hanoi Analysis: Both standard LLMs and LRMs performed flawlessly with up to three disks. However, as the number of disks increased, LRMs outperformed standard LLMs up to eight disks, after which both models failed to solve the puzzle reliably.
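The reason each added disk raises the bar so sharply follows from the standard recurrence for the minimum number of moves (move n-1 disks aside, move the largest disk, then move the n-1 disks back on top of it):

$$M(1) = 1, \qquad M(n) = 2\,M(n-1) + 1 \quad\Longrightarrow\quad M(n) = 2^{n} - 1$$

So 3 disks take 7 moves, 8 disks take 255, and 10 disks take 1,023: every extra disk roughly doubles the length of the move sequence a model must produce without a single error.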
Concept of "Illusion of Thinking": Apple's research underscores that while LRMs appear to reason through complex problems, their capabilities are limited by inherent scaling issues and reliance on pattern matching rather than genuine understanding.
Implications for AI Development: The study raises critical questions about the true reasoning abilities of current AI models and highlights the need for advancements beyond merely increasing computational power or training data.
Conclusion: Apple's research provides a sobering perspective on the limitations of today's AI models. While they perform impressively in controlled scenarios, their inability to generalize reasoning across varying complexities calls into question the notion that they are "thinking." This work encourages the tech community to reevaluate the metrics and benchmarks used to assess AI reasoning.
The hosts touch upon a just-breaking report that security researchers accessed the live feeds of 40,000 internet-connected cameras worldwide. These cameras were improperly secured, exposing sensitive environments like data centers, healthcare facilities, and factories.
Notable Quote:
"Security researchers have managed to access the live feeds of 40,000 Internet connected cameras worldwide. These are not cameras intentionally made public."
— Leo Laporte [162:19]
Discussion Highlights:
Scope of the Breach: The compromised cameras were inadvertently exposed due to security misconfigurations, highlighting the ongoing challenges in securing IoT devices.
Potential Risks: Unauthorized access to such cameras can lead to privacy invasions, espionage, and other security threats affecting both individuals and organizations.
Preventative Measures: The story underscores the importance of securing all internet-connected devices with robust authentication mechanisms and regular security audits.
Leo Laporte and Steve Gibson conclude the episode by promoting their respective platforms and encouraging listeners to engage with their content through various channels such as YouTube, Discord, and their websites. They also briefly mention upcoming topics and express their commitment to providing in-depth security analyses in future episodes.
Final Thoughts:
Episode 1029 of Security Now offers a comprehensive exploration of current cybersecurity threats, privacy invasions by major tech companies, and critical evaluations of AI reasoning models. The hosts adeptly navigate complex topics, providing listeners with valuable insights and actionable information to safeguard their digital lives.
For more detailed information, transcripts, and show notes, listeners are encouraged to visit the Security Now website.