Wed. 11/20 – AI To Read The Books - Tech Brew Ride Home

Summary

Techmeme Ride Home: Wed. 11/20 – AI To Read The Books

Released on November 20, 2024 by Ride Home Media

1. AI Training on Authors’ Works Sparks Controversy

Overview: The episode opens with a pressing issue in the literary and tech worlds: the use of authors' books to train artificial intelligence (AI) models. Microsoft has secured a deal with HarperCollins to utilize nonfiction books for training an undisclosed AI model. This collaboration allows authors to opt out, but the arrangement has stirred significant debate.

Key Points:

Microsoft and HarperCollins Agreement:
- HarperCollins has partnered with an unnamed AI firm to use select nonfiction backlist titles for enhancing AI model quality.
- Authors retain the choice to participate, aiming to balance opportunities with the protection of their work's value and revenue streams.
Industry Context:
- Technology giants like Microsoft, OpenAI, and others are actively seeking high-quality textual data from various publishers to refine their AI capabilities.
- Previous agreements include News Corp's May deal with OpenAI, and OpenAI's licensing deals with publishers such as Axel Springer, The Atlantic, Vox Media, and Time magazine.

Notable Quotes:

Brian McCullough [00:04]: “Technology companies use an array of data from social media sites to news articles to train AI models, and companies like Microsoft are hunting for additional sources of high quality text that they can license...”

2. Author Daniel Kibblesmith Rejects AI Training Proposal

Overview: Daniel Kibblesmith, a comic author and writer for Stephen Colbert, publicly criticized the proposal to include authors' works in AI training datasets. His stance highlights the tension between authors and tech companies regarding data usage and compensation.

Key Points:

Kibblesmith’s Refusal:
- Received an offer of $2,500 to include his book in AI training for three years.
- Described the proposal as “abominable” and rejected it, emphasizing the devaluation of authors' works.
Broader Implications:
- Penguin Random House (PRH) has taken a strong stance by amending its copyright notice to explicitly prohibit AI training usage.
- This move reflects growing resistance among major publishers against the unregulated use of literary works for AI development.
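
In the episode, the host notes that $2,500 per book may sound cheap to an author but becomes very expensive for an AI developer buying training data at scale. The arithmetic can be sketched as below. This is only an illustration: the $2,500 fee and three-year term come from the episode, while the function name and the catalogue sizes are hypothetical and are not figures from any actual deal.

```python
# Rough cost arithmetic for flat-fee book licensing.
OFFER_PER_BOOK_USD = 2_500  # per-book fee quoted in the episode
TERM_YEARS = 3              # licence period quoted in the episode


def licensing_cost(num_books: int, per_book_usd: int = OFFER_PER_BOOK_USD) -> int:
    """Return the total flat-fee cost of licensing num_books titles for one term."""
    if num_books < 0:
        raise ValueError("num_books must be non-negative")
    return num_books * per_book_usd


# Hypothetical catalogue sizes, chosen only to show how the total grows.
for n in (100, 10_000, 1_000_000):
    total = licensing_cost(n)
    print(f"{n:>9,} books: ${total:,} per {TERM_YEARS}-year term "
          f"(about ${total / TERM_YEARS:,.0f} per year)")
```

Because the fee is charged per title, the total grows linearly with the size of the catalogue, which is the point the host is making about scale.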

Notable Quotes:

Daniel Kibblesmith [02:06]: “... the fear of robots replacing authors is a false binary. I see it as the beginning of two diverging markets...”
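HarperCollins Statement: “Part of our role is to present authors with opportunities for their consideration while simultaneously protecting the underlying value of their works and our shared revenue and royalty streams...”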

3. Google Accelerates Android 16 Development

Overview: Google has unveiled the first developer preview of Android 16, marking a significant shift in its development timeline to address fragmentation issues and streamline the rollout of the new operating system.

Key Points:

Early Release Strategy:
- Android 16 Developer Preview 1 was released in November, much earlier than in previous years, when the first preview arrived in February.
- Aims to give developers ample time to adapt to new APIs and behavior changes ahead of the full release, which Google plans for Q2 2025 (one unconfirmed leak points to June 3, 2025).
New Features:
- Introduction of an embedded photo picker, medical record support, and an updated privacy sandbox.
- Emphasis on platform stability, ensuring no changes to APIs or core behaviors post-platform stability milestone.

Notable Quotes:

Brian McCullough [02:07]: “Google is releasing the first developer preview of Android 16 today so that app developers can test the new APIs and behavior changes that will arrive in next year's big update.”

4. Sony Launches Cloud Streaming for PlayStation Portal

Overview: Sony has introduced a beta version of cloud streaming on the PlayStation Portal, enhancing the gaming experience for PlayStation Plus Premium subscribers by allowing them to stream select PS5 games.

Key Points:

Streaming Specifications:
- Available in select countries to PlayStation Plus Premium subscribers.
- Requires a minimum of 7Mbps for 720p streaming and 13Mbps for 1080p quality.
Limitations:
- Initial rollout excludes features like game trials, party voice chat, and streaming of PS4 and PS3 titles.
- Child accounts are unable to access the cloud streaming service.
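
The bandwidth minimums above amount to a simple threshold check, sketched below. This is not a Sony API: the two thresholds are the figures listed in this summary, and the function name and return values are hypothetical, for illustration only.

```python
from typing import Optional

# Minimum connection speeds listed in the summary above.
MIN_MBPS_720P = 7
MIN_MBPS_1080P = 13


def best_stream_quality(measured_mbps: float) -> Optional[str]:
    """Return the highest listed quality the connection meets, or None if below the minimum."""
    if measured_mbps >= MIN_MBPS_1080P:
        return "1080p"
    if measured_mbps >= MIN_MBPS_720P:
        return "720p"
    return None


# Example connection speeds in Mbps.
for speed in (5, 7, 10, 13, 50):
    print(speed, best_stream_quality(speed))
```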

Notable Quotes:

Brian McCullough [07:15]: “Sony has launched cloud streaming on the PlayStation portal in beta, letting PlayStation Plus Premium subscribers...”

5. Crypto Gains Momentum with Presidential Nomination

Overview: President-Elect Trump has nominated Howard Lutnick, CEO of Cantor Fitzgerald, as his Commerce Secretary. Lutnick's background in cryptocurrency, particularly his company's role with Tether, underscores the administration's interest in integrating crypto into national commerce strategies.

Key Points:

Lutnick’s Credentials:
- Cantor Fitzgerald serves as a custodian for Tether, managing substantial US treasuries backing the USDT stablecoin.
- Recently launched a Bitcoin financing division with an initial funding of $2 billion.
Stance on Cryptocurrencies:
- Lutnick advocates for Bitcoin’s recognition as a commodity to ensure favorable regulatory treatment.
- Expressed preference for Bitcoin over other cryptocurrencies, citing its decentralized and censorship-resistant nature.

Notable Quotes:

Brian McCullough [08:08]: “Howard Lutnick has been a custodian for stablecoin company Tether since 2021...”
Howard Lutnick (as quoted by Brian McCullough): “I am a fan of crypto, but let me be very specific. Bitcoin, just bitcoin.”

6. Bluesky App Surges, Narrowing Gap with Threads

Overview: Bluesky, a social media platform, has surpassed 20 million users and has held the top spot in the US App Store since November 13th. Its rapid growth is closing the user base gap with Instagram's Threads.

Key Points:

User Growth Metrics:
- Achieved 15 million users on November 13th, expanding to over 20 million shortly after.
- Daily active users (DAUs) and website visits are nearly matching Threads, though Threads still leads globally.
Competitive Landscape:
- Similarweb data indicates a narrowing gap in DAUs between Threads and Bluesky, particularly in the US.
- Meta's Adam Mosseri has disputed the accuracy of these metrics, highlighting the limited transparency in user data.

Notable Quotes:

Brian McCullough [13:21]: “Bluesky now has more than 20 million users after hitting 15 million users just on November 13th.”

7. Government Concerns Over Smartphone Tracking and National Security

Overview: A collaborative investigation by Wired, Bayerischer Rundfunk, and Netzpolitik.org has exposed how US data brokers are enabling the tracking of military and intelligence personnel abroad. This unregulated data collection poses significant risks to national security.

Key Points:

Investigation Findings:
- Tracked movement patterns of US military and intelligence workers in Germany using billions of location coordinates.
- Devices were traced to sensitive locations like NSA facilities and Air Force bases, and even to recreational spots, revealing personnel's daily routines and movements at high-security sites.
Security Implications:
- Location data can be exploited for blackmail, espionage, and identifying vulnerabilities within military operations.
- Low-level personnel with access to critical systems represent potential weak links in security protocols.
Official Reactions:
- US Senator Ron Wyden condemned the unregulated sale of location data, emphasizing the threat to national security.
- An internal Pentagon review acknowledged the pervasive nature of data collection and its unavoidable impact on service members' privacy.

Notable Quotes:

Brian McCullough [16:23]: “A collaborative analysis of billions of location coordinates obtained from a US data broker provides extraordinary insight into the daily routines of US service members.”
Daniel Kibblesmith [18:16]: “...location data can piece a lot of secrets together on their own...”
Vivek Chilukuri at CNAS: “A system is only as secure as its weakest link...”

Conclusion: This episode of Techmeme Ride Home delves into critical intersections between technology, privacy, and national security. From the ethical implications of AI training using authors' works to the vulnerabilities exposed by unregulated data brokers, the discussions underscore the need for balanced advancements and stringent protections in the digital age.

Transcript

Brian McCullough (0:04)

Welcome to the Techmeme Ride Home for Wednesday, November 20th, 2024. I'm Brian McCullough. Today: now authors are being approached about training AI on their books, and some are not pleased. The new Android development cadence is here. More signs crypto is ascendant. More signs that Bluesky has taken off. And a case in point for why governments and militaries are worried about smartphone tracking. Here's what you missed today in the world of tech.
Bloomberg says that Microsoft has signed a deal with News Corp's publisher HarperCollins to use nonfiction books to train an unannounced AI model. HarperCollins says authors can opt out of the scheme, but as you can imagine, this has proven controversial. In a statement to Bloomberg News, HarperCollins confirmed it reached an agreement with an unidentified AI technology company that would allow “limited use of select nonfiction backlist titles for training AI models to improve model quality and performance.” HarperCollins authors will have the option to participate or not, the company said. “Part of our role is to present authors with opportunities for their consideration while simultaneously protecting the underlying value of their works and our shared revenue and royalty streams,” HarperCollins said. “This agreement, with its limited scope and clear guardrails around model output that respects authors' rights, does that.” End quote.
Technology companies use an array of data, from social media sites to news articles, to train AI models, and companies like Microsoft are hunting for additional sources of high-quality text that they can license to make their programs more accurate, better able to answer questions or provide expertise on specific subjects. News Corp signed an agreement in May with OpenAI to let the company use content from more than a dozen of its publications, including the Wall Street Journal, Barron's and MarketWatch. OpenAI has also signed licensing deals with publishers including Axel Springer, The Atlantic, Vox Media, Meredith, Hearst Communications and Time magazine. Microsoft has worked on AI initiatives with Reuters, Hearst and Axel Springer, which publishes Business Insider and Politico. End quote.

Brian McCullough (2:07)

Now, I became aware of all this because my friend Daniel Kibblesmith, the comic author and Colbert writer and guest on my Rad History episode about Calvin and Hobbes, went viral when he posted screenshots from an email from his agent asking if he wanted to participate. Quoting from The A.V. Club: Kibblesmith was apparently offered a non-negotiable $2,500 to allow his book to be bundled in with other works for training, covering a three-year period of use. The posted email, which invokes the specter that these AI models may one day make us all obsolete, also mentions that several hundred authors have already agreed to the deal and emphasizes the stance that, hey, getting paid to have your work fed into an AI wood chipper is better than having it stolen for that same purpose. Kibblesmith did not agree, including in his post a screenshot of his rejection of the deal, which he called abominable. In a statement to The A.V. Club, Kibblesmith wrote that, quote, “It seems like they think they're cooked and they're chasing short money while they can. I disagree. The fear of robots replacing authors is a false binary. I see it as the beginning of two diverging markets: readers who want to connect with other humans across time and space, or readers who are satisfied with a customized on-demand content pellet fed to them by the big computer so they never have to be challenged again.”
One thing I do want to note is $2,500 for a single book. $2,500 a pop might be cheap-sounding from the author's perspective, but that is insanely expensive if you're thinking of it from the AI model perspective of trying to get new training data at scale. As Megan Fox said on Bluesky, “It isn't obvious yet, but this is where AI ends. The second you put any non-fixed monetary costs in the data, it costs too much. Might have worked at smaller models, but at the scale they're at now? LOL, no. Even if it's a fixed and not per-piece rate now, won't stay such.” End quote.
Meanwhile, Penguin Random House is going the other way. It has amended its copyright notice globally to prohibit the use of books for training AI. The notice will be included in all new titles and reprints. So I guess sort of like a robots.txt notice, but analog. Quoting The Bookseller: The new wording states, “No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems,” and will be included in all new titles and any backlist titles that are reprinted. The statement also, quote, expressly reserved the titles from the text and data mining exception in accordance with a European Parliament directive. The move specifically to ban the use of its titles by AI firms for the development of chatbots and other digital tools comes amid a slew of copyright infringement cases in the US and reports that large tranches of pirated books have already been used by tech companies to train AI tools. In 2024, several academic publishers, including Taylor & Francis, Wiley and Sage, have announced partnerships to license content to AI firms. PRH is believed to be the first of the Big Five Anglophone trade publishers to amend its copyright information to reflect the acceleration of AI systems and the alleged reliance by tech companies on using published work to train language models.
Google has shipped the first Android 16 developer preview far ahead of schedule compared to the past decade or so. They're hoping to lower fragmentation by giving OEMs more time to get things together for the new OS. This is something that I told you about recently. Quoting Android Police: Surprise! Google is releasing the first developer preview of Android 16 today so that app developers can test the new APIs and behavior changes that will arrive in next year's big update. Android 16 Developer Preview 1 is going live with new features like an embedded photo picker, medical record support, and an updated Privacy Sandbox.
First of all, it might surprise you to hear that Google is releasing Android 16 DP1 today. After all, the first developer preview of Android 15 launched in February of 2024. February is also when Google released the first developer previews of the last four major Android releases, making November quite early for the first preview of the next major Android release. However, Google announced at the end of last month that it's accelerating Android's release schedule. The company confirmed it plans to release Android 16 sometime in Q2 of next year. One leak points towards a June 3, 2025 release date for Android 16, but that hasn't been confirmed yet. Android 16 Beta 3 in March 2025 will mark the operating system's platform stability milestone. When Android 16 reaches platform stability, Google says subsequent updates to Android 16 won't change any existing APIs, add new APIs, or modify app-facing system behaviors. The platform stability milestone is also when developers will be allowed to release updates to their apps on Google Play to make them target Android 16. For reference, Android 15 reached its platform stability milestone with its third beta in June 2024, so again, Android 16 will be ready for users and developers much earlier than usual.
