Sponsor Voice
This episode is brought to you by the Electronic Frontier Foundation. The election is over. The United States and the world might be headed into uncharted territory. That means defending digital privacy is more important than ever. Consider that Google search data can tell the police if you looked up the address of an immigration attorney. Or the fact that online chat logs can reveal that you talked about an abortion. What was often benign data before could quickly become potentially criminal evidence now. For more than 30 years, the activists, technologists and attorneys at the Electronic Frontier Foundation have worked to protect your privacy and your ability to control your personal data. Data that can reveal where you go, who you talk to, who you love, and what information you seek online. EFF defends user privacy, free expression and innovation, regardless of the obstacles. In these uncertain times, their work is more crucial than ever. Learn more at eff.org/404media. 404 Media supports the Electronic Frontier Foundation because in a world where your data can be used against you, defending digital freedom is defending yourself. That's eff.org/404media.
Joseph
Hello, and welcome to the 404 Media podcast, where we bring you unparalleled access to hidden worlds, both online and IRL. 404 Media is a journalist-founded company and needs your support. To subscribe, go to 404media.co. As well as bonus content every single week, subscribers also get access to additional episodes where we respond to their best comments. Gain access to that content at 404media.co. I'm your host, Joseph, and with me are 404 Media co-founders Sam Cole. Hey, amazing timing on that. Emmanuel May. Hello. Hello. And Jason Clever will also be with us, but he's going to call in remotely for the second segment. That was like a dub haul. That was great.
Sam Cole
It was right in front of my apartment, literally, as you introduced me, and now it's gone. So that was nice.
Joseph
All right, let's keep going. These first couple of stories that Sam wrote are all about Bluesky and scraping and data and privacy. The one you've just published, the headline is "Your Bluesky Posts Are Probably in a Bunch of Datasets Now." Let's go back in time, though. Where does this start? It starts with a database of 1 million Bluesky posts, right?
Sam Cole
Yeah. Yeah. So last week, a machine learning librarian at Hugging Face, which is this platform that's for, like, open source, primarily machine learning data sets and research data and things like that, where people post these things, posted on Bluesky that he was releasing a data set made up of a million Bluesky posts. And they included everything that was basically attached to the post, like the ID of the person who posted, like their handle or whatever they call it on Bluesky, I don't even know. Their timestamp. Any kind of images that were included, it would say whether there was imagery and all of that. And of course the content of the post. And people freaked out. People really lost their shit. Like, I'm not exaggerating, people were like, this man should be thrown in jail. And worse, people were, I mean, quite literally comparing it to, like, rape. Yeah, it was really crazy.
Joseph
Yeah. I think just to add one other bit of context, you say they're from Hugging Face. What did they say this database of 1 million Bluesky posts they made was for, exactly?
Sam Cole
They said it was for machine learning research, which could mean a lot of things, right? Like data sets from social media are compiled all the time. It's not usually something that people compile in a non anonymized way. So that definitely made this different. And yeah, it was just kind of put on Hugging Face just to be like, hey, here's this data set that I made to play around with. You know, like you could make an LLM with it in theory. You could do like anthropological research on the users of Bluesky if you wanted to. Like you could make a bot. I mean, there's just lots of ways that you could use that kind of data, that people do, you know, use social media data for all the time. But I think partially because it was not anonymous and you could track people's IDs back to their posts, and because people were leaving Twitter in part because they were protesting the scraping of their data, their user data, to build LLMs.
Joseph
Right. Well, Elon Musk, or rather X, had come out and said, we reserve the right to scrape and to use whatever you post onto X. And we're going to use that for our own machine learning. They have their little chatbot called Grok, right? Which is supposed to be like an anti-woke ChatGPT or something stupid, but people don't want to contribute to that, so they go to Bluesky and then somebody else is scraping all of their posts as well. I mean, I guess we should say this up top just so it's obvious. Like we know public data is going to be scraped. Lots of Bluesky users know that as well. Why do you think people reacted to this so aggressively? Because we see the data set, you cover it, that brings more attention to it. And then people voice their opinions against this person who made the data set. Why do you think the reaction was like that?
Sam Cole
I mean, I think in general, when people find out, and we've seen this happen with other platforms too, when people find out that their data is being used in ways that they didn't realize or didn't consent to or didn't look into or didn't occur to them that it could be, they get pissed off. Like that's the case across pretty much any platform. So it's not some kind of special phenomenon that's happening with Bluesky that people are mad that their stuff is being used without their consent. I think it is a little bit special to Bluesky in that people, like I said, people are very anti this sort of scraping in general. They're by and large, at least in the worlds that I'm following, and I'm still somewhat new to Bluesky, people are very anti generative AI, anti AI generally, anti machine learning. I mean, it kind of reaches, and I wrote about this a little bit in my Behind the Blog last week, but it kind of reaches this point where people are just sort of dogpiling. It's like they see something that they can jump into and say, hey, fuck this, and they just, you know, type hey this you, and then hit reply and then move on. And that accumulated into like, I don't know, it was like 800 different replies, people saying that they did not like this. And then the guy eventually, I mean, not even eventually, it was within like a couple hours, took it down and said, you know, hey, I hadn't considered the ramifications of this. Whether those ramifications were getting into trouble or actually, you know, use of people's data without their consent, he didn't really elaborate on, but he took it down, which I think people were pleased to see. And he was like, okay, I get it, I'm standing down. You know, it's gone from Hugging Face, it's gone from his Bluesky page, all that.
Joseph
So yeah, well, he took his one down. But in the same way there was a response from the people who were angry or upset on Bluesky, there was a sort of other set of reactions, which is that now other people have made even larger data sets. I mean, what are some of the ones that have been made in response and what's sort of the trolling going on here? Because it's not just like, oh, hey, here's research. It's like some of these people are making data sets specifically to piss off Bluesky users.
Sam Cole
Yeah, yeah. So this reaction to this one guy's post and data set went so viral on Bluesky that it broke out of those usual containment zones where people are anti generative AI and bled into Twitter, basically, is kind of how I saw it happening, because a lot of people are still on both. It's not like an either or for most people. If you're on Twitter, you're probably also on Bluesky, vice versa, or you're likely to be. So, yeah, people saw this going viral on Bluesky, and on Twitter they started posting about it there too, which turned into more people making more data sets and even bigger data sets. There was a 2 million one, there was an 8. The 2 million one was super popular, by the way, and had a lot of downloads. There's an 8 million one. There's an almost 300 million one, I think is how much it was. Right.
Joseph
It's like, yeah, 298.
Sam Cole
It's like a crazy amount of posts, but this is still probably a fraction of the post total on Bluesky. And we know this because the firehose that Bluesky has set up to make public contains all of this data and it is public. So it's very easy to just put it all in a dataset, as a lot of people did. Like you said, they're trolling. I think the 300 million one had this weird rant in the description that was addressing people saying that they didn't want to be scraped and kind of saying, you know, if you don't want to be scraped, basically don't post to social media, period, log off or start a blog, which I thought was hilarious because we have started a blog and we still deal with scraping every day. Every day. And it's like, that's not really. It's not a solution and it's also wrong and dumb. But like it's.
Joseph
Well, it's almost like, yeah, it's almost like the complete opposite reaction. On one side you may have people who, and I don't think anybody really believes this, but it's sort of a straw man, on one side you may have people who are just like, what, you can scrape on the public Internet? That's outrageous, that sort of thing. And on the other side there's these people who are almost being defeatist about it, or even more than that, they're actually enabling it and performing that scraping themselves. And you seem to have both sides of that spectrum here. And you touched on this: because Bluesky is open, there is a firehose of data that people can access. You can't do that really on Twitter anymore. You have to pay for API access rights, an extortionate fee, to the point where researchers don't even do it anymore. I used to scrape Twitter for various purposes and it was pretty easy to spin up a Python script to be able to do that because the API was available. So that openness about Bluesky, you touch on it in the piece and you call it like a double edged sword. What do you mean by that?
Sam Cole
Yeah, I mean what makes Bluesky different from Twitter or Threads or Facebook or any of these others is that it's decentralized and it's built on this protocol that's open and you can port your content around. You own, like, your following. You own, like, what happens on there in a way that you might not on Twitter or Threads or some of these others. So that's what's appealing about Bluesky to people in a lot of ways. If you care about that sort of thing, that's the reason why a lot of people are on it, and that's also the reason why it's vulnerable to this kind of data set collection. And yeah, I mean, there definitely are two very passionate sides where, like you said, some people are just like, never ever, fuck no. And then other people are like, well, someone's going to do it eventually and it might as well be me. I think that is like the attitude that people have always had, not just in tech, but it's really prevalent in machine learning and AI, in those communities, where people are like, well, eventually. Like we saw this with deepfakes, we see this with all kinds of harms that have to do with generative AI. It's like someone's going to do it, someone's going to have this idea eventually, so I might as well do it.
Joseph
Do you see that mentality across different things? You mentioned deepfakes then and like you know this and Emmanuel knows this as well better than me, but was that part of the thinking? Not necessarily the person who first coined the term deepfakes and started doing that, but was the idea that, well, somebody's going to do this, it might as well be me? And I almost don't quite follow the logic, but was that part of the sentiment?
Sam Cole
It's not super logical. It's mostly kind of an excuse, in my opinion, to do the thing. Yeah, I mean, that's almost verbatim what a lot of people who have developed deepfake type tools have said to us in the past and keep saying. It's just this kind of, I think people are in like a free for all, grab as much as you can while the legal gray area is still gray. Whether or not these datasets are even legal under GDPR and copyright and things like that, it's all still being decided because these are problems that we hadn't had before, or, you know, not at this scale. These are problems that have always existed, but not with AI, not when we're talking about machine learning. So yeah, it's something that's still being litigated, literally, right now. So I think it's just kind of a mess. People just don't know what ground we're standing on ever when it comes to whether this stuff is, like, putting aside moral, ethical, safe. Like it's really not very safe to scrape non anonymous data into a data set like this. There's a lot of personal information out there that people are scraping in these data sets. But even legal, I think, is what everyone's kind of waiting on. And we see this in all kinds of industries, which is something I touched on in the story. But yeah, right now it's like a get it while you can kind of attitude in the industry.
Joseph
Yeah. And again these are independent individuals, for lack of a better way of putting it. And it's not BlueSky itself taking user data and training AI or doing machine learning or anything like that. Just to pull it into this conversation, what is BlueSky's stance on scraping and training AI with user data itself? What's their stance?
Sam Cole
So Bluesky has said in the past that they are not going to train AI on user data, which again sets them apart from Twitter or Threads or any of these others that are actively using user data to train their own LLMs and things like that. But they said they won't, but there's really nothing, and they've said this also publicly, not verbatim, but in a nutshell, there's nothing stopping anyone else from doing that because of the nature of the thing. And I think they're going to end up being forced to. And also they've said publicly that they are thinking about how to address this because people are so put off by it. They're thinking about how to work better consent tools, basically, into the platform so that this sort of thing doesn't happen if you don't want it to. You can't have a private account on Bluesky, which is a big problem for a lot of people. And that also opens you up to scraping if you don't have a private account. Yeah, they're working on it, is kind of their response. But I don't know what there is to do, you know.
Joseph
Yeah, I don't know what you would do because as you say, it's open and the only way you could stop somebody scraping is by blocking the specific account which is viewing your profile. But I mean, you don't know what account that's going to be. Presumably it's not going to be an account that says, hey, I'm a big scraper, scanning all of your posts. And also, presumably, I think you could probably actually scrape it without an account, potentially, as well. You know, you can view some of Bluesky while you're not logged in. So there isn't really anything users can do, is there? Is it more that people just have to wait to see what the company does, if anything?
Sam Cole
I guess. I'm not like, I'm. Again, it's like I'm not really sure if it's on the platform or users to do this. It's like it's. Again, it's like it's this problem of like, the people doing the bad need to stop doing the bad.
Joseph
Like, it's like they, yeah, cut out.
Sam Cole
They just, like, it's hard to build. You can only build so many walls against this kind of thing. It's just something that, like, we agree generally, I think, that taking people's content without their consent is usually bad. Not nice, at least. But it's not illegal. So, like, is it ethical, is it moral to do that? You know, that's kind of where the argument is. So, like, this guy who did it originally should have known better. The people who are doing it again are doing it for the trolling and to get people more pissed off, which is very classic Internet behavior. So I don't know how much you can tell them to cut it out. It's just like we need to, like, socially agree that this is shitty, if that's the way this is going to go, you know, or we agree that it's fine and whatever and, you know, take the data and go nuts. I don't know. I'm sure there's some kind of middle ground that we can figure out. And this is something that researchers have been doing for a long time too, is using data sets and anonymizing them and giving people the chance to opt out, which is part of the law in, like, the EU and UK. People should get a chance to say I don't want this, I don't want to be part of this data set. And I think it should be anonymized at the very least. Like, at the bare minimum, people's usernames shouldn't be attached to it because, again, sensitive information and also deleted posts are in these data sets. You know, you don't know what you're getting in there. It's like the 300 million one, there's like a thing in the description that's like, is it not safe for work? Yeah, for sure. There's probably like porn in there. There's definitely a hole in there. Like, you know, whatever.
Joseph
Have you ever been on Bluesky before? Like that's literally half the website.
Sam Cole
Yeah. And like I don't think it should be on the users to stop posting hole. That's kind of the ethos that we've always followed here.
Joseph
We've always followed. Yeah, we don't need to get into that. About our potential earlier names beyond 404 Media. Don't say it. Maybe we'll save that for subscribers. Maybe we've already mentioned it. All right, we will leave that there. When we come back, we'll hear from Jason about Redbox and I guess the Redbox removal team. We'll be right back after this.
Sponsor Voice
There are so many things to worry about when you're starting a business. One thing I worried about was how to get 404 Media set up as a legal entity. I shouldn't have worried. I found LegalZoom, which walked me through every step of the process so I could cross another thing off my list. LegalZoom helps business owners like you take the first step and every step after. From reliable business formation to experienced guidance in legal and tax. Setting up your business properly and remaining compliant are things you want to get right from the get go. And LegalZoom saves you from wasting hours making sense of the legal stuff. Launch, run and protect your business. To make it official today, go to legalzoom.com and use promo code 404Media to get 10% off any LegalZoom business formation product, excluding subscriptions and renewals. Expires 12/31/24. Get everything you need from setup to success at legalzoom.com and use promo code 404Media. legalzoom.com and use promo code 404Media. LegalZoom provides access to independent attorneys and self service tools. LegalZoom is not a law firm and does not provide legal advice except where authorized through its subsidiary law firm, LZ Legal Services, LLC. The holiday season is here, which means that if you have a business, you're probably pretty busy. I've handled an influx of new merch orders using Shopify. Simplify and empower your business and online store using Shopify's all in one solution to host your shop, make it look the way you want, manage inventory and fulfill orders. Shopify's simple but powerful backend gives you everything you need to start and manage your store so you can spend less time researching and more time selling. Upgrade your business and get the same checkout we use with Shopify. Sign up for your $1 per month trial period at shopify.com/media, all lowercase. Go to shopify.com/media to upgrade your selling today. shopify.com/media. Lumen is the world's first handheld metabolic coach.
It's a device that measures your metabolism through your breath and on the app it lets you know if you're burning fat or carbs and gives you tailored guidance to improve your nutrition, workouts, sleep and even stress management. All you have to do is breathe into your Lumen first thing in the morning and you'll know what's going on with your metabolism and whether you're burning mostly fat or mostly carbs. Then Lumen gives you a personalized nutrition plan for that day based on your measurements. I've been working out more lately and I've been trying to get better sleep. I've realized that my metabolism is my body's engine, which helps me perform well when I'm exercising and helps me feel alert and focused throughout the rest of the day. Lumen's recommendations have helped me learn more about my body and how it turns food into fuel. It helps me know when to eat and helps me know what to eat so I can make sure I'm ready for a workout and ready to face the day. So if you want to stay on track with your health this holiday season, go to lumen.me/404media to get 15% off your Lumen. That's L-U-M-E-N dot me slash 404media for 15% off your purchase. Lumen makes a great gift too. Thank you, Lumen, for sponsoring this episode. Let's face it, after a night with drinks, I don't bounce back the next day like I used to. I have to make a choice. I can either have a great night or a great next day. That is, until I found Pre-Alcohol. ZBiotics Pre-Alcohol Probiotic Drink is the world's first genetically engineered probiotic. It was invented by PhD scientists to tackle rough mornings after drinking. Here's how it works. When you drink, alcohol gets converted into a toxic byproduct in the gut. It's this byproduct, not dehydration, that's to blame for your rough next day. Pre-Alcohol produces an enzyme to break this byproduct down. Just remember to make Pre-Alcohol your first drink of the night. Drink responsibly and you'll feel your best tomorrow.
I've been trying Pre-Alcohol before drinks, and I definitely noticed a difference the next day. Last weekend, I took Pre-Alcohol, hung out with my friends, then woke up, surfed, walked my dog, and reorganized my living room. I had an actually productive day. With the holiday season upon us, I know I'm going to be consuming a bit more alcohol than usual. With Pre-Alcohol, I know I can stay on track and not let the season throw me off course. Go to zbiotics.com/404media to learn more and get 15% off your first order when you use 404Media at checkout. ZBiotics is backed with a 100% money back guarantee, so if you're unsatisfied for any reason, they'll refund your money, no questions asked. Remember to head to zbiotics.com/404media and use the code 404Media at checkout for 15% off.
Dina Temple Reston
Hackers and cyber criminals have always held this kind of special fascination.
Sam Cole
Obviously, I can't tell you too much about what I do.
Joseph
It's a game. Who's the best hacker? And I was like, well, this is child's play.
Dina Temple Reston
I'm Dina Temple Reston, and on the Click Here podcast, you'll meet them and the people trying to stop them.
Sam Cole
We're not afraid of the attack. We're afraid of the creativity and the intelligence of the human being behind it.
Dina Temple Reston
Click here. Stories about the people making and breaking our digital world.
Joseph
AI machines, satellite engine ignition.
Dina Temple Reston
Click here.
Sam Cole
And liftoff.
Dina Temple Reston
Click here every Tuesday and Friday, wherever you get your podcasts.
Joseph
All right, and we are back with Jason calling in remotely. Jason, can you hear me?
Jason Clever
I've been here the whole time, just listening.
Joseph
I appreciate it. Like this voice from nowhere. You've done a couple of stories, a few at this point, on Redbox. This latest one is called the Redbox Removal Team, where you went on a trip with some people liberating these boxes. Step back a little bit. And you know, because I wasn't familiar with this. Can you just explain for those who don't know what is Redbox and what happened to the company?
Jason Clever
Redbox are these DVD rental kiosks that popped up all over, like the outsides of grocery stores, Walgreens, CVS, Dollar General. And I feel like it replaced Blockbuster, kind of, as a physical location that you could go and rent a DVD from. And I think for a while they also did video games. The big thing about Redbox was that there was essentially no overhead. So it's like Netflix came and essentially killed Blockbuster. But Netflix was a mail service for a while and so it would take a day or two to get your Netflix rental in the mail. Whereas with Redbox it was at, you know, a convenience store around the corner from your house. So if you were deciding kind of last minute to rent a movie, you could just go there, rent it. I believe for a while it was just a dollar a day, which was a big selling point. And I remember this being quite popular. Like my family used them all the time. And in a kind of amazing turn of events about maybe like two or three years ago, I don't know the exact timeline, Redbox was sold to a company called Chicken Soup for the Soul Entertainment, which was the entertainment conglomerate arm of the Chicken Soup for the Soul self help books.
Joseph
Uh huh.
Jason Clever
Like, are you aware of Chicken Soup for the Soul, Joseph?
Sam Cole
I was a big fan.
Jason Clever
Or is it an American thing?
Joseph
I've had. I have no idea what you're talking about. And I'm. But I'm glad Sam does. I mean, Sam, what's. What do you mean you're a big fan? We're a big fan, dude.
Sam Cole
Chicken Soup for the Soul. I was a very wholesome teenager, so I had Chicken Soup for the Teenage Soul. There's like a Chicken Soup for every soul you can imagine. Chicken Soup for the Cat Lover's Soul. It's like it goes on.
Jason Clever
Yeah, I was gonna go to the Dog Lover's Soul. There are, sure, other types of souls. There's like hundreds of these books, Joseph. And they were like all over Borders and Barnes and Noble and perhaps sometimes grocery stores themselves. And rather than go to therapy, what we did in the United States was you bought one of these books for your, like, troubled teen. I never had one. Emmanuel, are you aware of this?
Emmanuel May
I have definitely heard the name, but I don't really know what it is other than like a series of books. I was going to say it's like what is like the. Not Miss America. Like there's like a series like American Girls, also. American Girl something. Yeah, I was going to compare it to that.
Jason Clever
The thing that I would compare it to is For Dummies, the For Dummies series. It's like Mental Health for Dummies tailored directly to you, because they all have the same cover. And there's like 500 different versions of them. So anyways, this company, Chicken Soup for the Soul Entertainment, purchased Redbox because they briefly got into trying to have a streaming service and trying to do movies. And shocker, this did not work. And so Chicken Soup for the Soul Entertainment went bankrupt earlier this year. And they didn't go Chapter 11 bankrupt, they went Chapter 7 bankrupt. Which is where. Is that the bad one? It's the bad one. It's where you go fully. You have no money and you basically go to court and you say like, hey, we're fucked, we're out, goodbye. And what happened was there's something like 20,000 of these kiosks all over the country. And you know, they were servicing these for a while, stocking them with DVDs, fixing them when they were broken. And they were paying rent to these convenience stores that had contracts with them, and they filed for bankruptcy and they were like, we're out, goodbye. Like, we're not paying you any money, we're just done. And so these like 600 or 700 pound devices, like these big red steel, they seem like they're made of steel to me, kiosks have just been totally abandoned. But the really interesting thing is that they still work. Like a lot of them still work. And so what people have been doing is they have been going and, quote unquote, renting DVDs and just keeping them. And then there was this whole community of people who decided, wait, why rent all the DVDs when I can just bring my pickup truck and an angle grinder, because they're all like bolted to the ground. And so they get permission from the store and they take them home and they've been reverse engineering them.
There's like two or three thousand people in a discord right now who are reverse engineering these things, figuring out how they work, figuring out the different problems with them and taking them home. And I wrote one article about them. But then there's this other part of it where there is a company called Junkluggers and there's a few other ones that are essentially taking these giant kiosks and taking them to recycling centers and trashing them, more or less.
Joseph
Right. Just before we get to them and sort of your trip out in the field getting one of these boxes, you said that this community who were reverse engineering them, they go to the store, the Walgreens or whatever, they get permission and then they take the box away. But wasn't there some sort of tension around, was it Walgreens corporate learned of this and then stopped giving permission? Like what happened there?
Jason Clever
Yeah. So I have not gotten internal Walgreens communications with managers, but it seemed like in the first few weeks after Redbox went bankrupt, Walgreens was like, oh, we need to get rid of these. And so individual store managers were just telling whoever wanted them that they could take them home. And I believe that Walgreens corporate eventually learned about this and said, please don't do that. And I have to assume that the reason is because they're like 700 pound devices that are bolted to the ground and they're connected to not just like a regular power plug, but they're connected directly to outside power in a scary, like, you can get electrocuted to death kind of way. And the community of people who are doing this have figured out how to do this safely. But it's entirely possible that someone could hurt themselves doing this. And I assume that they don't want that risk. And so Walgreens, Walmart, Dollar General, random convenience stores that are regional that I hadn't heard of, like there's one in upstate New York, so on and so forth, have signed contracts with these junk removal companies. And so they are sending their teams out all over the country picking these things up. And then, you know, I don't think this is happening with Junkluggers, but I've seen some of them end up on Facebook Marketplace. I have seen people in the Discord say, I've got a junk removal guy that I've now bought five of these from, because they're essentially just selling them for scrap metal at this point and they're taking them to recycling centers. And we don't need to get into it, but it's better if someone takes this to their house and uses it for many years to come versus it going to a recycling center and being shredded down into, like, little bits. There's a lot of metal here. Not all of it can be reused. It's not a great outcome, more or less. It's like huge pieces of e-waste at this point.
Joseph
Yeah. So who did you tag along exactly? And sort of. What was the. What was the purpose of that?
Jason Clever
Yeah, it's very funny that you say I took a trip because it was like 20 minutes from my house. I, like, got in my car and drove to a Dollar General. But I interviewed the CEO of this Junk Luggers Corporation, which is like a franchise out. They do a lot of, like, hoarders. Like, they clean out the homes of hoarders is what the people told me. And I just said, hey, I want to see what this looks like when you, you know, remove one of these. And so I met them at a Dollar General in Southern California. They had a big truck, a big trailer, and there was three people there. And they basically, like, they disconnected the power, and then they started using an angle grinder, which is essentially like a big saw. And they started sawing through bolts that, you know, they couldn't access with their wrenches and things like that. And it was really funny because they were, like, unable to get this final bolt because you need to open the Redbox machine to remove that bolt. And they didn't have the key because Redbox has the key. And Redbox is bankrupt. And so they, like, started bashing it with a hammer. They tried to break the lock. They tried to pick the lock. They tried to angle grind the lock. They had a crowbar that they were, like, prying the machine open with. And they just, like, couldn't get into it for maybe 20 minutes. And then literally one of the guys just pushes it over because there was only one bolt left. He just, like, shoved it and it broke the bolt. And then they were able to clear it away, which was.
Joseph
Did that not break the machine at all, pushing it over?
Jason Clever
I mean, the machine is going to go get shredded into a million pieces anyways. And so they were not really. They were very professional. Like, this is not to talk negatively about these people, but they weren't taking care to make sure that the machine was going to be in one piece.
Joseph
Yeah, they weren't taking it home to then, you know, deliver DVDs to themselves whenever they wanted. I mean, just super briefly on that before I just ask my last question. What was the deal with Twister? Copies of Twister, the DVD, could not get taken out of the machine because of a software issue or something.
Jason Clever
Yeah, that's something that this community of, like, hackers and reverse engineers learned is that the machine would error out if you tried to rent Twister, the first movie that came out in, like, 1996, which was incidentally one of the first DVDs ever. And now it was like this last DVD that was stuck in this, you know, Redbox machine. And it was just a software error that had something to do with, you know, the film not being able to be rented out after a certain date. That was hard coded into it. And so eventually someone did solve this. Like in the last couple days, someone was able to look at the source code and they figured out what was wrong with it. And then they filmed a YouTube video of themselves renting Twister. And this was a big moment for the Redbox reverse engineering community on Discord. But it is funny that you mention it. Like, a lot of people are taking this home, they're putting it in their man cave or whatever. They're putting their own DVDs into it. And then when they say, oh, like, I want to rent a DVD or I want to watch a movie, they're going and renting it from themselves in, like, their garage.
Joseph
That's sick.
Jason Clever
It's sick. It's pretty cute.
Joseph
So there's all of that. But what does this show us? I was going to say about Lost Media, but I don't think that really applies here, I guess. What does this show us about electronics and recycling and the business around that? I mean, sort of. What's your takeaway of this whole episode?
Jason Clever
Yeah, I mean, I think the interesting thing is this happens to a lot of different devices. In this case, these devices were public and very large. And so the size and weight of them makes a big difference. But there are so many, like, smart devices and, like, Internet of Things devices that companies launch, the company goes bankrupt, the devices become bricked, and then they have to be destroyed. And I think it's just a very sort of public reminder about the fact that we just consume so much stuff, we consume so many devices. All of this has an ecological and environmental cost. You know, Redbox was around for something like 15 years, maybe longer than that. And so I'm not saying that this is some horrible disaster, but it does show that when companies just randomly go bankrupt, like, someone has to clean up that mess on the other side. And there is an ecosystem that does that. I mean, I think that e-waste and electronics recycling and the people who manage that and work in those fields are super interesting and I've written about them a handful of times and I've been to electronics recycling centers and they're incredibly fascinating places. I don't know if anyone will ever get a chance to go, any of our listeners, but if you do, please go. It's very, very interesting. Like some municipalities have days where they open up their recycling center and it's super fascinating just to see the end of life of all of these sorts of devices. And I guess last thing I'll say is if you work in any of these, hit me up because I'm super interested in sort of what happens to our stuff after we get rid of it.
Joseph
Yeah, for sure. Please do that if you're in a position to talk about that. And I do recommend that listeners go read Jason's piece because, you know, there's a lot of visual material in there and of course you can see what we're talking about. But we'll leave that there. If you're listening to the free version of the podcast, I'll now play us out. But if you're a paying 404 Media subscriber, we're going to talk about a bunch of action the US government just took against data brokers. There's location data, there's credit header stuff that cybercriminals use to dox and harass people. For some reason, it all landed on one day. I'm not really sure why. You can subscribe and gain access to that content at 404media.co. As a reminder, 404 Media is journalist-founded and supported by subscribers. If you wish to subscribe to 404 Media and directly support our work, please go to 404media.co. You'll get unlimited access to our articles and an ad-free version of this podcast. You'll also get to listen to the subscribers-only section where we talk about a bonus story each week. This podcast is made in partnership with Kaleidoscope. Another way to support us is by leaving a five-star rating and review for the podcast. Also, just tell a friend. Get them listening as well. That stuff really does help. This has been 404 Media. We will see you again next week.
The 404 Media Podcast Summary
Episode: "Your Bluesky Posts Are Probably Training AI"
Release Date: December 4, 2024
In this episode of The 404 Media Podcast, host Joseph, along with co-founders Sam Cole, Emmanuel May, and remote contributor Jason Clever, delve into pressing issues surrounding data privacy on BlueSky—a decentralized social media platform—and explore the aftermath of Redbox’s bankruptcy. The discussion offers a comprehensive look into how user data is being exploited for AI training and the broader implications for digital privacy and electronic waste management.
The conversation kicks off with Sam Cole detailing a significant data scraping incident on BlueSky. A machine learning librarian at Hugging Face released a dataset comprising 1 million BlueSky posts, including user handles, timestamps, images, and post content (02:37). This non-anonymized compilation sparked severe backlash within the BlueSky community.
Sam Cole remarked, “People really lost their shit. ... some were comparing it to rape” (03:45), highlighting the intensity of user reactions against unauthorized data usage.
Joseph adds context by comparing this incident to actions taken by platforms like Twitter (now X), where Elon Musk asserted the right to scrape user data for machine learning purposes. This led many users to migrate to BlueSky, hoping for a more privacy-conscious environment. However, the open nature of BlueSky made it a target for similar data scraping efforts.
Sam Cole explains, “When people find out that their data is being used in ways that they didn't realize or didn't consent to... they get pissed off” (05:03). This sentiment underscores a universal concern across social media platforms regarding data privacy and consent.
Following the initial dataset release, the backlash on BlueSky triggered a wave of additional data scraping, often driven by trolling. Sam notes the emergence of larger datasets, including 2 million, 8 million, and even 300 million BlueSky posts (08:39). These datasets were sometimes created with malicious intent, aiming to aggravate the already frustrated user base.
A notable moment occurred when a 300 million post dataset included a disparaging message urging users to either accept data scraping or cease using social media entirely—a stance Sam found both “hilarious” and “wrong” (10:50).
BlueSky’s decentralized and open protocol offers users greater control over their content, distinguishing it from platforms like Twitter and Threads. However, this same openness makes it vulnerable to extensive data scraping.
Sam Cole describes BlueSky as “a double-edged sword” (11:54). While users appreciate the ownership and portability of their data, it simultaneously exposes them to risks of unauthorized data harvesting.
The episode delves into the murky legal landscape surrounding data scraping. Sam emphasizes the ambiguity surrounding laws like GDPR and copyright protections, noting that the ethical considerations often lag behind legal frameworks.
“People just don’t know what ground we're standing on...” Sam states, highlighting the uncertainty and lack of clear regulations governing data usage (13:20). This legal gray area fosters a “get it while you can” mentality within the AI and machine learning communities, further complicating the issue.
BlueSky has publicly committed not to use user data for training AI, setting it apart from other platforms. However, Sam points out the inherent challenges in enforcing this stance given the platform’s open nature.
“They are thinking about how to work better consent tools basically into the platform...” Sam explains BlueSky’s potential response to enhance user consent and control over data scraping (15:44).
Despite these efforts, the lack of private accounts exacerbates the problem, leaving users with limited options to protect their data. Sam asserts that combating unauthorized scraping requires a collective ethical stance rather than solely relying on technical or platform-based solutions.
Jason Clever transitions the discussion to RedBox, the DVD rental kiosk company that once dominated physical media rentals across the United States. RedBox’s low-cost rental model made it a staple alongside Blockbuster and a competitor to Netflix’s mail service.
Approximately two to three years prior, RedBox was acquired by Chicken Soup for the Soul Entertainment, which ventured into streaming services—a move that ultimately failed. This led to RedBox filing for Chapter 7 bankruptcy earlier this year, resulting in the abrupt shutdown and abandonment of around 20,000 kiosks nationwide.
Despite their bankruptcy, many RedBox kiosks remain functional. A vibrant community on Discord has emerged, focused on reverse engineering these machines. Members creatively repurpose the kiosks, turning them into personal media servers or tinkering with their hardware.
Jason shares his experience observing the RedBox Removal Team, a group affiliated with Junk Luggers Corporation, a franchise specializing in hoarder cleanouts. During a site visit to a Dollar General in Southern California, he witnessed the dismantling of a RedBox kiosk. The team struggled with accessing the last bolt securing the machine, ultimately resorting to brute force to remove it (37:28).
Jason reflects on the broader implications of RedBox’s demise, highlighting the environmental and logistical challenges posed by electronic waste. The episode underscores the ecological and environmental costs of rapid technological obsolescence and the responsibilities of companies and communities in managing E-waste.
He emphasizes the fascination and importance of electronics recycling, encouraging listeners to explore and understand the lifecycle of their devices. Jason calls for increased awareness and engagement with recycling initiatives to mitigate the negative impacts of discarded technology (39:24).
The episode concludes with reflections on the interconnectedness of data privacy and electronic waste. While BlueSky’s open platform champions user control, it inadvertently fosters vulnerabilities to data exploitation. Similarly, RedBox’s physical kiosks illustrate the tangible consequences of technological advancements and corporate failures on everyday life and the environment.
Sam Cole encapsulates the ethical dilemma: “We just need to like, socially agree that this is shitty if that's the way this is going to go” (18:06). This call for a collective moral stance underscores the need for societal consensus in addressing the challenges posed by both digital and physical technological infrastructures.
This episode of The 404 Media Podcast sheds light on the intricate balance between technological openness and privacy, alongside the tangible impacts of corporate failures on communities and the environment. By examining BlueSky’s data scraping issues and the RedBox kiosk fallout, the podcast underscores the pressing need for ethical considerations and sustainable practices in both digital and physical realms.
For those interested in a deeper dive, including visual materials and extended interviews, subscribers can access bonus content at 404media.co.
Note: Times in parentheses correspond to the transcript timestamps for reference.