The 404 Media Podcast Summary
Episode: "Your Bluesky Posts Are Probably Training AI"
Release Date: December 4, 2024
Introduction
In this episode of The 404 Media Podcast, host Joseph, along with fellow co-founders Sam Cole, Emanuel Maiberg, and Jason Koebler, discusses data privacy on Bluesky, a decentralized social media platform, and the aftermath of Redbox’s bankruptcy. The discussion covers how public user posts are being collected for AI training, what that means for digital privacy, and how abandoned hardware feeds into the problem of electronic waste.
Bluesky Data Scraping and AI Training
The Initial Scraping Incident
The conversation kicks off with Sam Cole detailing a significant data scraping incident on Bluesky. A machine learning librarian at Hugging Face released a dataset comprising 1 million Bluesky posts, including user handles, timestamps, images, and post content (02:37). Because the compilation was not anonymized, it sparked severe backlash within the Bluesky community.
Sam Cole remarked, “People really lost their shit. ... some were comparing it to rape” (03:45), highlighting the intensity of user reactions against unauthorized data usage.
Public Reaction and Privacy Concerns
Joseph adds context by comparing this incident to actions taken by platforms like Twitter (now X), where Elon Musk asserted the right to use user data for machine learning purposes. This led many users to migrate to Bluesky, hoping for a more privacy-conscious environment. However, the open nature of Bluesky made it a target for similar data scraping efforts.
Sam Cole explains, “When people find out that their data is being used in ways that they didn't realize or didn't consent to... they get pissed off” (05:03). This sentiment underscores a universal concern across social media platforms regarding data privacy and consent.
Proliferation of Multiple Datasets and Trolling
Following the initial dataset release, the backlash on Bluesky triggered a wave of additional data scraping, often driven by trolling. Sam notes the emergence of larger datasets, including collections of 2 million, 8 million, and even 300 million Bluesky posts (08:39). These datasets were sometimes created with malicious intent, aiming to aggravate the already frustrated user base.
The 300-million-post dataset included a disparaging message urging users to either accept data scraping or stop using social media entirely, a stance Sam found both “hilarious” and “wrong” (10:50).
The Double-Edged Sword of Bluesky’s Openness
Bluesky’s decentralized and open protocol offers users greater control over their content, distinguishing it from platforms like Twitter and Threads. However, this same openness makes it vulnerable to extensive data scraping.
Sam Cole describes Bluesky as “a double-edged sword” (11:54). Users appreciate the ownership and portability of their data, but the same public availability exposes them to unauthorized data harvesting.
Legal and Ethical Implications
The episode delves into the murky legal landscape surrounding data scraping. Sam emphasizes the ambiguity surrounding laws like GDPR and copyright protections, noting that the ethical considerations often lag behind legal frameworks.
“People just don’t know what ground we're standing on...” Sam states, highlighting the uncertainty and lack of clear regulations governing data usage (13:20). This legal gray area fosters a “get it while you can” mentality within the AI and machine learning communities, further complicating the issue.
Bluesky’s Stance and Future Measures
Bluesky has publicly committed not to use user data for training AI, setting it apart from other platforms. However, Sam points out the inherent difficulty of holding third parties to that stance given the platform’s open nature.
“They are thinking about how to work better consent tools basically into the platform...” Sam says, describing how Bluesky might give users more say over whether their posts are scraped (15:44).
Despite these efforts, the lack of private accounts exacerbates the problem, leaving users with limited options to protect their data. Sam asserts that combating unauthorized scraping requires a collective ethical stance rather than solely relying on technical or platform-based solutions.
Redbox and the Removal Team
Background on Redbox
Jason Koebler transitions the discussion to Redbox, the DVD rental kiosk company that once dominated physical media rentals across the United States. Redbox’s low-cost rental model made it a staple alongside Blockbuster and a competitor to Netflix’s mail service.
Bankruptcy and Abandonment of Kiosks
Approximately two to three years before the episode, Redbox was acquired by Chicken Soup for the Soul Entertainment, which ventured into streaming services, a move that ultimately failed. Redbox then filed for Chapter 7 bankruptcy earlier in 2024, resulting in the abrupt shutdown and abandonment of around 20,000 kiosks nationwide.
Community Response and Reverse Engineering
Despite the bankruptcy, many Redbox kiosks remain functional. A vibrant community on Discord has emerged, focused on reverse engineering these machines. Members creatively repurpose the kiosks, turning them into personal media servers or tinkering with their hardware.
Jason shares his experience observing the Redbox Removal Team, a group affiliated with Junk Luggers Corporation, a franchise specializing in hoarder cleanouts. During a site visit to a Dollar General in Southern California, he witnessed the dismantling of a Redbox kiosk. The team struggled to reach the last bolt securing the machine and ultimately resorted to brute force to remove it (37:28).
Implications on E-Waste and Device Lifecycle
Jason reflects on the broader implications of Redbox’s demise, highlighting the environmental and logistical challenges posed by electronic waste. The episode underscores the environmental costs of rapid technological obsolescence and the responsibilities of companies and communities in managing e-waste.
He emphasizes the fascination and importance of electronics recycling, encouraging listeners to explore and understand the lifecycle of their devices. Jason calls for increased awareness and engagement with recycling initiatives to mitigate the negative impacts of discarded technology (39:24).
Insights and Conclusions
The episode concludes with reflections on the interconnectedness of data privacy and electronic waste. While Bluesky’s open platform champions user control, it also leaves users exposed to data exploitation. Similarly, Redbox’s physical kiosks illustrate the tangible consequences of technological change and corporate failure for everyday life and the environment.
Sam Cole encapsulates the ethical dilemma: “We just need to like, socially agree that this is shitty if that's the way this is going to go” (18:06). This call for a collective moral stance underscores the need for societal consensus in addressing the challenges posed by both digital and physical technological infrastructures.
Notable Quotes
- Sam Cole (03:45): “People really lost their shit. ... some were comparing it to rape.”
- Joseph (05:03): “It's like not some kind of like special phenomenon that's happening with Bluesky that people are mad that their stuff is being used without their consent.”
- Sam Cole (10:50): “If you don't want to be scraped, basically don't post to social media, period, log off or start a blog... that's not really a solution and it's also wrong and dumb.”
- Sam Cole (11:54): “Bluesky is a double-edged sword.”
- Sam Cole (13:20): “It's not super logical. It's mostly kind of an excuse in my opinion to do the thing.”
- Sam Cole (15:44): “They are thinking about how to work better consent tools basically into the platform.”
- Sam Cole (18:06): “We just need to like, socially agree that this is shitty if that's the way this is going to go.”
Final Thoughts
This episode of The 404 Media Podcast sheds light on the balance between technological openness and privacy, alongside the tangible impacts of corporate failures on communities and the environment. By examining Bluesky’s data scraping issues and the Redbox kiosk fallout, the podcast underscores the need for ethical consideration and sustainable practices in both digital and physical realms.
For those interested in a deeper dive, including visual materials and extended interviews, subscribers can access bonus content at 404media.co.
Note: Times in parentheses correspond to the transcript timestamps for reference.
