
Summary of "SED News: Data Land Grabs, Copyright Fights, and the Great AI Talent War"

Podcast Information:

Title: Software Engineering Daily
Host/Author: Software Engineering Daily
Description: Technical interviews about software topics.
Episode: SED News: Data Land Grabs, Copyright Fights, and the Great AI Talent War
Release Date: July 8, 2025

Introduction

In this episode of SED News, hosts Gregor Vand and Shawn Faulconer delve into the latest happenings in the tech and software engineering world. They discuss major headlines from recent weeks, focusing on significant developments in artificial intelligence (AI), corporate strategies, and industry competitions. The conversation is enriched with insights, notable quotes, and thoughtful analysis, providing listeners with a comprehensive overview of current trends and events.

Main Topics

1. Meta's Copyright Lawsuit

Discussion Overview: Gregor and Shawn begin by addressing the ongoing lawsuit involving Meta and copyright issues related to AI models trained on books. The crux of the dispute centers around whether the use of large text corpora, including copyrighted material, by AI models constitutes fair use.

Key Points:

Legal Arguments: Meta successfully argued that their use of book texts falls under fair use, similar to Google's defense during the Google Books case a decade ago.
Market Impact: The judge found no meaningful evidence of market dilution; in other words, no sign that models like Meta's would stop people from buying the books.
Content Limitations: While AI can generate summaries or excerpts, it cannot reproduce entire books verbatim, maintaining the necessity for consumers to purchase original works.

Notable Quotes:

Gregor Vand [05:05]: "There was no meaningful evidence of market dilution."
Shawn Faulconer [03:33]: "There's some irony in it though that there's protections basically put in place from a technical control perspective."

2. Meta's Investment in Scale AI

Discussion Overview: The conversation shifts to Meta's significant investment of $14.3 billion in Scale AI, giving them a 49% stake in the company. This strategic move has sparked reactions from other tech giants.

Key Points:

Strategic Motives: Meta aims to secure access to high-quality training data, a critical component for developing competitive AI models.
Industry Reaction: Google, OpenAI, and Elon Musk's xAI have responded by pausing or winding down their projects with Scale AI.
Competitive Landscape: The investment is seen as a land grab for data ownership, giving Meta a substantial advantage in the AI space without crossing the acquisition threshold that would trigger antitrust scrutiny.

Notable Quotes:

Gregor Vand [08:48]: "It's not an acquisition... just 0.9%, lots of money in your bank account, but you can still supply other people if you want."
Shawn Faulconer [09:36]: "It's a similar structure as, I think Microsoft's investment in OpenAI."

3. The Great AI Talent War

Discussion Overview: Gregor and Shawn explore the fierce competition among tech giants to attract and retain top AI talent. Meta, in particular, has been offering substantial signing bonuses to lure experts from competitors.

Key Points:

Signing Bonuses: Reports suggest Meta has been offering up to $100 million in signing bonuses, making headlines and intensifying the talent war.
Talent Movement: Prominent researchers have moved from OpenAI to Meta, highlighting the shifting landscape of AI expertise.
Long-Term Commitments: Such high bonuses are likely structured over multiple years and performance milestones, ensuring talent retention.

Notable Quotes:

Shawn Faulconer [27:03]: "It's almost like you're in the space of professional sports or something like that where people are getting these huge contracts."
Gregor Vand [28:23]: "He's obviously trying to say look, we have the best engineers. This is what people are trying to pay for them. It's marketing effectively."

4. Salesforce vs. Glean

Discussion Overview: The hosts discuss Salesforce's recent tensions with Glean, a startup focused on enterprise search solutions. Salesforce, which owns Slack, has imposed strict rate limits on Slack's API, directly affecting Glean's functionality.

Key Points:

Competitive Tactics: Salesforce's rate limiting on Slack's API hampers Glean's ability to catalog and search Slack messages effectively.
Data Ownership: This move underscores the broader industry trend of companies striving to control data flows to maintain competitive advantages in AI development.
Impact on Startups: Smaller companies like Glean face significant challenges when incumbents restrict access to essential data sources.
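
To make concrete why a rate limit can hamper indexing without being a full block, here is a rough back-of-the-envelope sketch in Python. The function name `backfill_hours` and every number in it are hypothetical and chosen only for illustration; they are not Slack's published limits or Glean's actual workload.

```python
import math


def backfill_hours(total_messages: int, messages_per_request: int,
                   requests_per_minute: int) -> float:
    """Hours needed to page through a message history under a fixed rate limit."""
    # Each request returns at most messages_per_request messages.
    requests_needed = math.ceil(total_messages / messages_per_request)
    minutes = requests_needed / requests_per_minute
    return minutes / 60


# Illustrative numbers only (assumptions, not real API limits).
generous = backfill_hours(10_000_000, messages_per_request=1000, requests_per_minute=50)
strict = backfill_hours(10_000_000, messages_per_request=15, requests_per_minute=1)
print(f"generous limit: {generous:.1f} h, strict limit: {strict:.1f} h")
```

Under these assumed figures the same backfill goes from a few hours to more than a year, which is the sense in which a limit can be "not a full block" and still make cataloging impractical.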

Notable Quotes:

Shawn Faulconer [21:41]: "Salesforce is making a huge play around becoming a data cloud company."
Shawn Faulconer [09:36]: "It's all about data. That's what Scale AI does is it gets data ready for training."

5. OpenAI vs. Microsoft

Discussion Overview: The relationship between OpenAI and Microsoft comes under scrutiny, particularly concerning legacy agreements and the controversial "AGI clause."

Key Points:

AGI Clause: A provision in the partnership agreement states that achieving Artificial General Intelligence (AGI) would terminate the collaboration, presenting a paradoxical challenge.
Anti-Competitive Accusations: OpenAI has accused Microsoft of anti-competitive behavior multiple times, which the hosts compare to Microsoft's antitrust troubles in the browser-war era.
Strategic Shifts: OpenAI is striving to reduce dependency on Microsoft Azure, seeking to diversify partnerships and enterprise focus.

Notable Quotes:

Shawn Faulconer [16:33]: "There's not a clear definition of even what AGI is."
Shawn Faulconer [18:55]: "OpenAI acquired Windsurf and then Microsoft has GitHub and Copilot and all their IDEs."

Hacker News Highlights

1. Innovative Side Projects

My iPhone 8 Refuses to Die: A standout project featured on Hacker News involves an iPhone 8 repurposed as a solar-powered vision OCR server. This ingenious setup allows continuous image processing without relying on external power sources, demonstrating the versatility of older hardware when combined with creative engineering.

Notable Quotes:

Gregor Vand [34:13]: "It's all self powered. The iPhone 8 is remarkably stable and powerful for this purpose."
Shawn Faulconer [34:36]: "I loved how he talks about the mana over engineering that he little side project and stuff like that."

2. AI Agents vs. Traditional Queries

Agent vs. SQL Query: An article by Gunnar Morling proposes that many AI agent tasks could be more efficiently handled using traditional SQL queries within stream processors like Apache Flink. This challenges the necessity of deploying complex AI agents for straightforward data processing tasks.
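
As a small illustration of the point, the sketch below flags customers whose spending in a time window crosses a threshold using one declarative query rather than an agent loop. It uses Python's built-in `sqlite3` purely as a stand-in for a streaming SQL engine such as Flink; the `orders` table, its columns, and the `high_spenders` helper are all hypothetical and do not come from the article.

```python
import sqlite3

# Hypothetical event table; schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, minute INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 120.0, 1),
        ("alice", 950.0, 1),
        ("bob", 40.0, 1),
        ("alice", 30.0, 2),
        ("bob", 2000.0, 2),
    ],
)


def high_spenders(threshold: float) -> list:
    """Return (customer, minute, total) rows whose per-minute total exceeds threshold."""
    rows = conn.execute(
        """
        SELECT customer, minute, SUM(amount) AS total
        FROM orders
        GROUP BY customer, minute
        HAVING SUM(amount) > ?
        ORDER BY minute, customer
        """,
        (threshold,),
    )
    return rows.fetchall()


print(high_spenders(1000.0))
```

In a stream processor the grouping would typically be over a time window on a live stream rather than a stored table, but the shape of the query is the same: a filter and an aggregate, with no model call involved.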

Notable Quotes:

Shawn Faulconer [37:38]: "You don't have to do everything on them. You don't always need this massive power of this model to do things that have been kind of solved problems using lighter weight techniques."
Gregor Vand [05:54]: "You can't sort of prompt to get the whole book out."

3. Frequent Reauthentication Debate

Frequent Reauth Doesn't Make You More Secure: An insightful discussion on security highlights the inefficacy of frequent reauthentication and mandatory password changes in enhancing security. The conversation emphasizes that such practices can lead to user frustration and create additional security vulnerabilities.

Notable Quotes:

Gregor Vand [43:43]: "It's like people being concerned over going back to the cloud providers and AWS having access to my data."
Shawn Faulconer [44:22]: "It's the illusion of security."

Security Discussions

The hosts delve into the ongoing debates surrounding authentication practices and their actual impact on security.

Key Points:

Reauthentication Fatigue: Constantly prompting users to log in can lead to MFA fatigue, where users become desensitized to security prompts, potentially compromising safety.
Password Management: Mandatory password changes often result in weaker passwords or insecure storage practices, undermining the intended security benefits.
Alternative Security Measures: Emphasizing device possession and leveraging modern authentication tokens (like JWTs) can offer more robust security without the drawbacks of frequent logins.
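
The access-token-plus-refresh-token pattern mentioned above can be sketched roughly as follows. This is a simplified illustration using only the Python standard library, not a production JWT implementation: the signing key, the 15-minute lifetime, the in-memory `refresh_store`, and all function names are assumptions made for the example.

```python
import base64
import hashlib
import hmac
import json
import secrets
from typing import Optional, Tuple

SECRET = secrets.token_bytes(32)   # hypothetical server-side signing key
ACCESS_TTL = 15 * 60               # assumed access-token lifetime in seconds
refresh_store = {}                 # refresh token -> user id, kept server side


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _sign(signing_input: str) -> str:
    return _b64(hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest())


def issue_access_token(user: str, now: float) -> str:
    """Build a short-lived token in the header.payload.signature layout JWTs use."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"sub": user, "exp": now + ACCESS_TTL}).encode())
    return f"{header}.{payload}.{_sign(header + '.' + payload)}"


def verify_access_token(token: str, now: float) -> Optional[str]:
    """Return the user id if the signature checks out and the token is unexpired."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header, payload, sig = parts
    expected = _sign(header + "." + payload)
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["sub"] if claims["exp"] > now else None


def login(user: str, now: float) -> Tuple[str, str]:
    """One interactive login yields an access token and a long-lived refresh token."""
    refresh = secrets.token_urlsafe(32)
    refresh_store[refresh] = user
    return issue_access_token(user, now), refresh


def refresh_access(refresh: str, now: float) -> Optional[str]:
    """Swap a valid refresh token for a new access token without re-prompting."""
    user = refresh_store.get(refresh)
    return issue_access_token(user, now) if user else None
```

The design point is that the access token expires quickly, limiting the damage if it leaks, while the refresh token lets the client renew silently, so the user is not pushed through a login screen every few minutes. Revoking access then amounts to deleting the entry from `refresh_store`.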

Notable Quotes:

Shawn Faulconer [41:55]: "When you put all this friction behind using a product in the goal of increasing security, what it does is actually create additional security holes."
Gregor Vand [44:19]: "Any modern platform, A, they're most likely using some version or derivative of JWTs and most likely now they're also using JWTs with refresh tokens."

Closing Remarks and Predictions

As the episode wraps up, Gregor and Shawn reflect on the trends discussed and offer their predictions for the near future in the tech landscape.

Key Points:

Summer Slowdown: Shawn anticipates a cooling off period in tech announcements during July due to holidays and reduced conferencing activities.
Potential Developments: Gregor predicts possible challenges to Meta's investment in Scale AI or user frustrations with emerging AI platforms like Manus AI.

Notable Quotes:

Shawn Faulconer [45:06]: "I guess my prediction is there's not going to be a major, major announcement in July."
Gregor Vand [45:35]: "I'm going to go out on a limb there and just say something about Manus or if it's about something we talked about today."

Conclusion

This episode of SED News offers a thorough exploration of critical developments in the software engineering and AI sectors. From legal battles over copyright and strategic investments to fierce competitions for AI talent and data, Gregor Vand and Shawn Faulconer provide valuable insights into the forces shaping the future of technology. The inclusion of Hacker News highlights adds an extra layer of depth, showcasing innovative projects and pivotal debates within the community. Listeners are left with a clear understanding of the current landscape and thoughtful predictions for what's to come.


Transcript

[00:13]
Gregor Vand
Hello and welcome to SED News. I'm Gregor Vand.
[00:16]
Shawn Faulconer
And I'm Shawn Faulconer.
[00:17]
Gregor Vand
And this is a slightly different format of the SE Daily podcast where we basically take a spin across the last few weeks in terms of big headlines. We have a main topic in the middle, we look at Hacker News highlights, and we just kind of give our little thoughts on what's been going on in the tech and predominantly software world. So how's your few weeks been, Sean? What's been going on?
[00:39]
Shawn Faulconer
It's been good. I mean, I think last time I chatted I was in the thick of coming off of Snowflake's big conference and Databricks' big conference. And then the challenge with going to back-to-back conferences is that then I come back and I have like a mountain of work to actually catch up on, because it's hard to do the work while you're at the conferences. So I think the last chunk of time has been essentially just trying to recover from a work perspective on all those things that were piling up and then getting ready for a personal trip to Hawaii soon, so.
[01:10]
Gregor Vand
Oh, awesome. Nice.
[01:11]
Shawn Faulconer
Yeah, like Hawaii, what have you been up to?
[01:13]
Gregor Vand
Yeah, it sounds like we've had sort of opposite schedules then. I was sort of heads down for a while and then, yeah, the last couple of weeks have been quite busy from a sort of events perspective. We had SuperAI, which is like this big conference that got put on over here in Singapore. It was a bit weird, I'm not going to lie. It was sort of like going into a nightclub for like two days straight. So I had to sort of take breaks, but it was interesting. Just a lot of agentic stuff. I would say it was more sort of corporate leaning perhaps, but good keynotes. There was Dwarkesh Patel, who has another AI podcast that's very popular these days, and so I had a quick chat with him. And Edward Snowden was also a keynote speaker, but by video link for obvious reasons.
[01:55]
Shawn Faulconer
Yeah, I haven't heard Edward Snowden, and it's been a while since his name was kind of quite...
[01:59]
Gregor Vand
Yeah, so yeah, it was a very interesting event and lots of people had flown in from all corners of the earth for this thing, so it's always nice just to meet a bunch of new people from around the world as well.
[02:10]
Shawn Faulconer
Yeah. And Flink Forward Asia is happening in Singapore, I think next week, which is coming up very soon. I guess Singapore is like a hotbed for tech conferences. So.
[02:18]
Gregor Vand
Yeah, in this part of the world for sure. It's kind of the place. Yeah, the facilities here are kind of... you can just go down to this place called Marina Bay Sands. I believe it's actually owned by Sands of Las Vegas. And yeah, you just go down there any day of the week and there's something going on down there. So it's kind of fun. But yeah, thanks to the Singapore government for the ticket. I did not pay a thousand bucks for that ticket. So I am a real startup founder. I don't just blow a thousand bucks on these things. Anyway, let's get onto the headlines. The big one we're going to talk about to begin with is around Meta. There's a lot to talk about with Meta this week, but this one is Meta and copyright, basically. They've been in a lawsuit, and at least in the first sort of round of that, they effectively won. This is all around books. So it's the idea that publishers are not very happy that basically chunks of text come out in responses. Obviously this means the model has been trained on these books. And yes, the publishers are not very happy about this. But it seems like Meta has won on the basis that, I think the TLDR here was, simply that the plaintiffs' argument wasn't strong enough yet. And Meta, I imagine, had gone to town on their lawyer side. But yeah, what was your take on this one, Sean?
[03:33]
Shawn Faulconer
Yeah, I mean, my understanding was they were kind of able to argue it was under sort of fair use, which was the same kind of argument that Google made over a decade ago about Google Books, which was also determined fair use. And I guess, look, it's a very complex issue. I'm certainly not a lawyer. Some of the fair use arguments kind of make sense to me. If I can't prompt the model essentially to regurgitate the entire novel word for word, then are you really getting anything more than perhaps you would be able to get from Wikipedia or the CliffsNotes version? That also presumably falls under fair use, for example. But then at the same time, if I'm an online content creator, if I own Reddit, for example, I can block third parties from crawling and ingesting that content and training models on it. So it's kind of strange in some ways that if you're in the digital world and your thing exists digitally, I can prevent essentially model providers from essentially getting that data and using it for training. But then if I am an author of a book, I don't have the same easy control over it. And I have to, even if they didn't make the right argument, they have to essentially take this thing to court in order to be able to make the argument. So it's very complex. I don't pretend to be the person who has expertise on it. I do find it kind of... There's some irony in it though that there's protections basically put in place from a technical control perspective. But if you're not in that world, suddenly you don't have the same control to protect your IP.
[05:05]
Gregor Vand
Yeah, I think this is the point. A lot of us in tech... technically, I do have a law background. I'm maybe unusual, but I don't ever sort of pretend like I'm more knowledgeable in law these days than anybody else in tech who's not like an in-house counsel. So I think the point is that we're not lawyers, but we still have to understand which side is sort of right or wrong in some ways. I believe what was said here was that there was no meaningful evidence of market dilution. That's like a fancy way of saying, so this is the judge saying, "I don't believe an LLM is going to stop people from buying the book." It's kind of like a translation of that, I think.
[05:40]
Shawn Faulconer
Right. Yeah. If you want to read Harry Potter, are you going to ChatGPT and saying, "I'm already paying for this, I want to save some money, so I'm just going to have ChatGPT tell me the story"? I certainly am not. But I can't speak for everyone. I mean, I think that's a fair argument.
[05:54]
Gregor Vand
Right. Because I think, to what you mentioned just before that, you can't sort of prompt to get the whole book out. Exactly. You can't just say, oh, I want to read chapter one right now, and then chapter one pops out. It doesn't work like that, to my understanding. And I think, yeah, CliffsNotes, for those that remember CliffsNotes.
[06:10]
Shawn Faulconer
Yeah, it's a dated reference.
[06:13]
Gregor Vand
So CliffsNotes were kind of like cheat sheets for when you were studying literature and things.
[06:18]
Shawn Faulconer
Yeah. If you couldn't understand, I don't know, the Shakespeare play that you're supposed to read in high school English class, you could get the CliffsNotes version where it would explain it to you in plain English.
[06:28]
Gregor Vand
Exactly. I definitely used those. Yeah. So here we are, we're going to see how this plays out. And obviously we're touching on it here because this does affect all of us in software. If suddenly, oh, the model has to stop referencing any books tomorrow, well, a lot of products just don't have that kind of concept, that that might not come out. So then what are the cascading effects on people's platforms or products where suddenly this sort of content that was just assumed to be available is gone, for example?
[06:59]
Shawn Faulconer
Yeah, and I think this kind of talk is one thing that points to a larger sort of theme. And some of these things we're going to be touching on through the course of our conversation. But there's such a land grab right now around data. Like the big competitive advantage for people building models, or even those building applications on the models, is really what data do I have access to? What data can I either use for training purposes, or if I'm building applications on these models, like, how do I get the right contextual data into the prompt in order to get something relevant for my business application? And the people who are the most successful at that are the ones that are going to kind of win the market. That's really where the competitive advantage is. It's less, at least currently, around new, truly innovative techniques in terms of how these models are constructed. It's a lot to do with essentially the data and how organized it is for training purposes or prompt assembly purposes.
[07:58]
Gregor Vand
Yeah, exactly. And we've been seeing a few things like this pop up, especially in the last few weeks. We're going to get onto that, as you mentioned, Sean, in the main topic around basically walls that are going up. So we'll get to that shortly. The other kind of main, I guess, announcement headline that made mainstream news as much as tech news was Meta again, making a $14.3 billion investment in Scale AI, which gives them a 49% stake in that business. And that's always a great number. If you see 49 or 51, you know, it's effectively saying this is effectively equal ownership. It's just that someone's decided there's a reason to take one off one side and put it on the other.
[08:40]
Shawn Faulconer
Is that the same for Microsoft and OpenAI? Is it 49%?
[08:43]
Gregor Vand
That's a good question. I'm not super sure.
[08:46]
Shawn Faulconer
I don't know. It sounds familiar, but don't quote me on it.
[08:48]
Gregor Vand
Yeah, exactly. It could be that kind of similar. So in this case, Scale AI, they have high quality training data and they are a vendor to all the big players, to my understanding. And this is just a blatant land grab by Meta, but crucially, because they haven't gone over that 50%, it's not an acquisition. So again, let's just put the legal hat on for five seconds. It's the idea that this is not going to be scrutinized from antitrust and I'm sure just a whole bunch of time and effort that would be needed to fully acquire or majority acquire a business. And this is, no, no, we're just 0.9%, lots of money in your bank account, but you can still supply other people if you want. So, yeah, I mean, you're kind of on the ground over on that side. Sean, what do you make of this?
[09:36]
Shawn Faulconer
It's a similar structure as, I think Microsoft's investment in OpenAI. Whether that's 49% or not, I can't remember the exact percentage breakdown. Or Amazon has a stake in Anthropic as well. And a lot of these giant tech companies are investing in these companies to access the AI capabilities without necessarily triggering any sort of antitrust reviews. It's a bit of a hack around that system. But as a consequence of the move that Meta made, Google, I believe, paused their Scale AI projects within hours of the announcement. OpenAI is also winding down their relationship. Elon Musk's xAI project also halted some of their projects as well. So a lot of these companies are pulling out of this. And this kind of goes back to even the earlier conversation that we were talking about. It's all about data. That's what Scale AI does is it gets data ready for training. These massive models, they have all kinds of people deployed around the world that are involved in the cleanup process and the labeling process. There's a lot of human labor that goes into preparing the data. And Scale AI has been able to address that. And they were working with all the biggest companies in the world to help prepare the data for training. And now, of course, Meta gets to strategically kind of own that data funnel.
[10:55]
Gregor Vand
This is, though, maybe still the bit that maybe some of the audience as well are like scratching their heads on a little bit. I'm still just trying to fully understand. Okay, Meta owns, let's just say effectively owns, Scale AI, and Scale AI is providing the data. So what changes? Is it around how they're going to structure the data? Is it going to be very skewed towards, say, Llama models versus something else? Why is there such immediate pullout from these companies? I'm still trying to get that concern.
[11:26]
Shawn Faulconer
Yeah, I'm not 100% sure there either, like why the immediate reaction was to punish Scale AI in some fashion.
[11:33]
Gregor Vand
Because who are the alternatives at this point?
[11:36]
Shawn Faulconer
Yeah, I don't know who else is in that. I mean, there's a couple other companies like Labelbox and stuff like that. There's a bunch of data labeling companies, but as far as I know there's no one sort of operating at the scale of Scale AI, but there's a bunch of companies kind of focused on that problem set. So I don't know if the plan for the Googles of the world is to go and leverage those competitors in the space, or maybe they're going to build out some of this themselves because they realize that they don't want to be dependent on a third-party vendor to provide this. I don't fully understand what the impact would be, given all those companies were already using Scale AI. But presumably when I give data to, say, AWS and Amazon, it's not like just because it's running on an Amazon server, anybody at Amazon can go and just look at that data. So I'm not sure why they felt that compulsion to pull out.
[12:27]
Gregor Vand
Yeah, could also be, well, it's a pure economic thing. You know, that thing effectively is your competitor's. And then you say, well, these big multimillion dollar contracts, potentially even billions at this point, no, they're gone. It could be a kind of power play where they're trying to maybe have Meta double check their decision on that one, but who knows?
[12:46]
Shawn Faulconer
Yeah, I mean it could be also just because the space is so competitive and presumably all those companies using scale AI have some proprietary data that's part of that process. They do have concerns of something nefarious going on. We talked last time about the corporate sort of spying and espionage. So it's that just because you sign a contract doesn't mean people aren't going to necessarily break it if the right incentives are put in place, especially if they feel like they can get away with it. So maybe there's just enough potential risk there that they want to go and seek somebody else to do that job for them.
[13:21]
Gregor Vand
Yeah. And I think this is maybe a good time to move kind of onto our main topic, which is just the idea that, for big tech, the walls are going up. And why is this sort of significant? Well, I think it's fair to say maybe pre-AI or pre-GPT, we just didn't maybe see so much of this where big tech was really going at each other. They didn't really have a specific piece of land, so to speak, to fight over. They were kind of like, well, Google has its ad thing and so does Meta, but Meta also has this other thing, and we're kind of all doing our dances around different things.
[13:56]
Shawn Faulconer
Whereas this is, I think cloud is maybe the closest, between the major cloud providers. But all the biggest companies are kind of multi-cloud anyway, so there's lots to go around there. Certainly competitive. But I agree, I don't think we've seen this kind of level of competition since maybe the early days of social, when Facebook/Meta became really big. There was a real existential threat to Google's business and they really put a lot of resources and time and effort behind, like, Google+ and the other failed Google social projects and stuff like that. I forget what the Circles one was, maybe that was Google+, but Google Buzz was another one. They had a whole bunch of different social experiments and none of them really took off. And then eventually they kind of gave up on that as a business and went after other things. But since then I don't think I've seen... I think AI, at least in my lifetime, since I've been working in the industry, is probably the thing that I feel like has the highest competition, and companies are just throwing crazy amounts of money at it. They're trying to steal each other's talent by offering massive amounts of money, incentives for talented people to come over. It's like this land grab where I think that they see this as the future. There's going to be winners and losers and they want to make sure that they're on the winning side.
[15:17]
Gregor Vand
Yeah, for sure. So we're going to dive into a few of these specific battles going on. So this is sort of in the realms of the walls are going up, like who have we got against each other and on what grounds? So the one that's also touched the big headlines, main headlines very recently, is OpenAI versus Microsoft. And this is around legacy agreements that they had around Microsoft, as you've touched on, Sean, owning a significant chunk of OpenAI for a long time. However, there was this interesting clause in there called the AGI clause. And this is around at what stage the technology gets to that point. And the thing is, it's a sort of ironic problem because achieving AGI would automatically terminate the partnership. But surely you want to reach that stage if the idea is just to advance technology. And there's been something mentioned where many executives at Microsoft back in 2019 thought this clause was nonsense. But again, reportedly Satya Nadella was like, no, we're too far behind on this, I imagine Transformer, et cetera. We just need to do this deal. We'll figure it out later. And well, here we are later, six years later, almost. And yeah. So, yeah, how does this look?
[16:33]
Shawn Faulconer
I mean, I think the challenge just on the AGI clause thing is there's not a clear definition of even what AGI is.
[16:41]
Gregor Vand
Let's just make sure. AGI is artificial general intelligence.
[16:45]
Shawn Faulconer
So essentially we've reached the place where we have human-level intelligence. And some people argue like, hey, we can have a chatbot pass the Turing test, which was sort of the original idea of a test created by Alan Turing many, many years ago, where essentially the idea is you have somebody that's interacting behind a closed door asking questions to either a human or some sort of computer. And if it's a computer and the human can't tell the difference between the answers coming from another human or from a computer, then essentially the computer passes the Turing test. And for certain types of question answers, certainly you could argue that something like ChatGPT could pass the Turing test now. And I think people have kind of tried to prove that. But at the same time, there's things where these models are incredibly stupid. There's ways of tricking the model that would never ever trick a human.
[17:37]
Gregor Vand
And even just basic arithmetic and this kind of thing you would expect to ask a human. Most humans, hey, what's one plus one? And get the right answer and there's enough evidence to show you might not always get that answer from a model at the moment.
[17:50]
Shawn Faulconer
Yeah. So there's these types of challenges. So how do you even prove that you've reached AGI? It would probably become some sort of legal thing. Again, how do you enforce that? There's no clear set mathematical definition of what that is. So I think that's a challenge. But the relationship between OpenAI and Microsoft has continued to get, I think, more and more contentious over the last couple years or last 18 months. Certainly OpenAI has accused Microsoft of anti-competitive behavior multiple times. In a lot of ways, it kind of reminds me of the old browser war days where there was a lot of accusations against Microsoft in terms of like forcing OEMs to have Internet Explorer installed. And Microsoft certainly has a history of anti-competitive behavior. Famously, they got brought up in the early 2000s on antitrust charges where they tried to essentially break apart Microsoft as a company. And eventually those charges went away. But it really damaged Microsoft's reputation during that time.
[18:50]
Gregor Vand
And we've seen the Windsurf acquisition, of course, which sort of plays into this whole thing as well.
[18:55]
Shawn Faulconer
Yeah. So OpenAI acquired Windsurf and then Microsoft has GitHub and Copilot and all their IDEs, and suddenly you have this competition that's happening between these two companies where there's a significant amount at stake from Microsoft's perspective in sort of investing in OpenAI, and OpenAI has been running on Azure cloud. I think they've been trying to pull back some of their dependencies there to be less vendor dependent. So there's all these things that are happening sort of behind the scenes. And then I think also OpenAI over the last year or so has started to really pay attention to the enterprise and be less just about a consumer-facing application. I think enterprise is where Microsoft historically has really thrived as well, and that's where a lot of the dollars are. So that of course creates more tension between the two companies.
[19:43]
Gregor Vand
Yeah. This episode of Software Engineering Daily is brought to you by Capital One. How does Capital One stack? It starts with applied research and leveraging data to build AI models. Their engineering teams use the power of the cloud and platform standardization and automation to embed AI solutions throughout the business. Real time data at scale enables these proprietary AI solutions to help Capital One improve the financial lives of its customers. That's technology at Capital One. Learn more about how Capital One's modern tech stack, data ecosystem, and application of AI/ML are central to the business by visiting capitalone.com/tech. So we're going to move on to the next battle. This is Salesforce versus Glean. Now, what does this even look like? So Salesforce owns Slack. That's kind of where we're going to go with this one. Glean is, let's say this is more of a startup. So this is interesting. We're actually seeing a kind of incumbent go against one of the more early guys. We did have an episode on Glean back in April with yourself, Sean, and this was funny because I remember listening to that episode. I happened to be in London listening to this episode. I went straight into a meeting with a friend who works at a very large PE firm and we got straight onto the topic of AI, and they said they'd just rolled out this white-labelled search across all their internal information. And I said, oh, who is it? Oh, it's Glean. So suddenly it all kind of made sense. That's the context here. We're talking, we don't need to name the name here, but the largest PE firm in the world effectively is running on Glean. And here comes Salesforce or Slack saying, oh, you're not going to actually be able to, with any sort of practical means now, catalog Slack messages via the API. They've put in a pretty onerous rate limit on that. So it's not like a full block, but it is incredibly hampering. And it seems like Glean was the main target for this. Yeah. What do you make of that?
[21:41]
Shawn Faulconer
Yeah, I think it's unfortunate because Glean, I think, you know, was born out of Google. Google had created internally a product called Moma, which was to try to solve this kind of heterogeneous search problem of internal documentation across Google. Like Google's been a company over 20 years, 100,000 plus employees. There's stuff everywhere, like how do you make it findable essentially? And then a bunch of smart engineers from Google left and started Glean to take that idea and build a product around it, and it solves a real pain point for enterprise business. Anybody who's worked in a large company can identify with this challenge of like, oh, someone said something to me, like where the heck is it? Was it in a doc? An email? Slack message? The enterprise search problem is really challenging for most businesses and Glean did a really good job of addressing it, where it can index essentially all these different disparate data sources and give you one interface to search into it. And then they've done a lot of stuff since then. Now they have a bunch of AI tools. You can chat to it, so I can ask questions and it'll go use sort of a RAG-based application behind the scenes to pull in the context and provide a proper response so I don't have to necessarily just search it and click on links. And then they also have the agent support now where I can build my own workflow and use my own internal docs and stuff like that. So really, really cool stuff, cool company. And I think it's kind of unfortunate from a competition standpoint that Salesforce is penalizing their ability to do that. But it kind of all goes back to the data problem. Like all these companies want to own the data because if they own the data then that's like where the AI serving is going to have to be dependent on. And Salesforce is making a huge play around becoming a data cloud company. They're going after the Snowflakes of the world and they just bought Informatica.
They're trying to get more and more data sort of in their gravitational pull as a company. And most of these companies are not interested in that data flowing out. They're only really interested in the data coming in. And for their products to work, they kind of need to own the whole world to make it work. So I think that's unfortunate. But hopefully there's some resolution to that where Glean figures out a workaround, or the other products in the same space do. I think Notion is also looking to go out.
[23:53]
Gregor Vand
Notion? Yeah. Notion are trying to make big inroads over here. I believe their APAC offices are in Australia, but they put on this very on brand event over here, rented out this very nice space in the National Gallery and it was called Cafe Notion and it had jazz music and fancy breakfast canapes, which I thought was just a very nice spin on this kind of event where you're not pushing it to the evening with drinks and things. You actually push it to the morning and just make it very nice. So all on brand. But I think the big standout for me with them during the whole presentation was A, we're going for enterprise, and they make a big case about OpenAI in theory all running on Notion now. And then B, there's data integration. This was their big push. It was, oh, you can integrate all these platforms and these ones are coming soon, but you can already integrate Slack and so on. And I'm just wondering where do they go when their whole offering is saying, look, we know you've got disparate data, you might not be all in Notion yet, but what can we do about that? Well, we can integrate. Well now one of the biggest providers of the integration that would probably help you because I think most companies, if they're using Notion, it's not like they use Notion instead of Slack. That's just not a thing. It's actually Slack that was trying to kind of recreate Notion-y bits inside Slack.
[25:08]
Shawn Faulconer
Yeah, yeah.
[25:08]
Gregor Vand
I don't think it really worked very well, but. So what does a Notion do in this case as well?
[25:12]
Shawn Faulconer
I'm assuming they have the same throttling challenges. Right. All the companies that are integrating there besides Salesforce, which probably has a workaround through some internal API, I'm not sure what will happen. Or you have to get into a place where you have to end up with more of a strategic partnership with Salesforce where they unblock you. You're not just using sort of a public facing API endpoint. There's an API that has higher limits or it's a different API and stuff like that.
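To put the throttling point in rough numbers, here is a minimal back-of-the-envelope sketch. Every figure in it (workspace size, page sizes, requests per minute) is a made-up illustration rather than Slack's actual limits, and `hours_to_index` is a hypothetical helper, not part of any real API.

```python
import math


def hours_to_index(total_messages: int,
                   messages_per_request: int,
                   requests_per_minute: int) -> float:
    """Lower bound on wall-clock hours needed to page through a message history."""
    requests_needed = math.ceil(total_messages / messages_per_request)
    minutes = requests_needed / requests_per_minute
    return minutes / 60


# Hypothetical workspace with 10 million messages.
WORKSPACE_MESSAGES = 10_000_000

# Illustrative "partner tier": large pages, many requests per minute.
generous = hours_to_index(WORKSPACE_MESSAGES,
                          messages_per_request=1000,
                          requests_per_minute=50)

# Illustrative "public tier": small pages, one request per minute.
throttled = hours_to_index(WORKSPACE_MESSAGES,
                           messages_per_request=15,
                           requests_per_minute=1)

print(f"generous tier:  {generous:.1f} hours")
print(f"throttled tier: {throttled / 24:.0f} days")
```

Under these assumed numbers the same crawl goes from a few hours to more than a year, which is why a rate limit can be "not a full block" and still make indexing impractical without a higher-limit arrangement.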
[25:39]
Gregor Vand
It's interesting just on the basis that Notion roll in, they're pushing very much. We can clean up these 10 SaaS platforms that you use and even if you need to keep paying for them, you won't actually interface with them. All the data will just be pulled into us. And as we're going through right now. Data seems to just be the thing that actually companies used to be kind of more open to exchanging because it seemed like a fair trade. Like, well, this data for this purpose and you need it for that purpose. Sure, let's have APIs. That's the name of the game. But as companies like Salesforce clearly are, I guess looking at this and saying, why can't we be the people to give you the best context from our own data? And this context is the value here seems to be. It's what we're being told. So much data context is the value. So the next one we're going to move to, it's more of a rumor, but it has been strongly, I guess reported. And you touched on it earlier, Sean. Meta vs OpenAI and this is in the sense of people. So in theory, Sam Altman came out in public saying that Meta had been offering $100 million signing bonuses, which is obviously, I don't think we've ever seen a number like that in terms of at least publicly stated for signing bonuses. Meta haven't commented on this and I don't know if that sometimes means it might be true, but I don't think we have seen this kind of level for a while, at least in terms of competition.
[27:04]
Shawn Faulconer
Yeah, I mean, even if it's not 100 million, I'm sure it's a lot to poach talent from some of these other companies. And certainly I think three fairly prominent researchers have moved from OpenAI to Meta recently. And I'm sure there's movement all over the place. It's almost like you're in the space of professional sports or something like that where people are getting these huge contracts. I wonder if there's a multi year contingency to that. It's like $100 million but over 10 years to join or something like that, or based on performance and things like that. So you get into a place where you're not just acquiring companies, you're acquiring the talent to build the company that you want.
[27:40]
Gregor Vand
Yeah, exactly.
[27:41]
Shawn Faulconer
And it really goes back to this arms race that's happening in AI right now. Whether it's for data or it's for essentially the talent to make things happen.
[27:49]
Gregor Vand
Yeah. And as you call out, it's 100 million. Okay, sure. That's like a headline. These things are never structured that way. It's not just like, oh, you join and then on a Monday and on Tuesday, 100 million lands in your account. There's many ways they sort of structure this in terms of options and as you say, like kind of over a certain amount of years and performance basis and so on and so forth. But obviously it's not unusual to see someone like Sam Altman just take the number just to kind of make a splash. And he's obviously trying to say look, we have the best engineers. This is what people are trying to pay for them. It's marketing effectively.
[28:23]
Shawn Faulconer
Yeah, that's another angle. Right. In terms of the competition, the competition's I think going all the way up the stack, essentially, all the way to the hardware level. If you look at the hardware level, Nvidia and, to some extent, AMD are essentially providing all the chips. But a lot of the cloud providers and also big model companies are now investing essentially in their own chip designs because they don't want to have this vendor dependency. And a lot of AI startups are going multi cloud because they don't want to have too much dependency on a single vendor. It's really about protective moves. They're trying to diversify their stock portfolio to some extent so that they're not beholden to any one company.
[29:06]
Gregor Vand
And you have also come across the fact that Google has donated A2A to the Linux Foundation. So how does this sort of play into the walls going up or are these actually walls coming down or like what is this?
[29:19]
Shawn Faulconer
Yeah, it's kind of the opposite in some sense, at least from a surface level. We talked about Google Agent2Agent a couple times on here; it came out a handful of months ago. It's kind of a similar idea to Anthropic's MCP but focused on inter-agent communication. And just recently at the Open Source Summit North America, the Linux Foundation announced the formation of the Agent2Agent project, which involves like AWS, Cisco, Google, Microsoft, Salesforce, SAP and ServiceNow, perhaps others. I think it's good in that we're kind of moving towards consolidation; even Cisco, which had a somewhat competitive product, is integrating A2A support into their AGNTCY core. So I think that's good from a standards perspective. Overall it's good for companies that are looking to invest in using something like Agent2Agent: now you have neutral governance, it remains vendor agnostic, it can be a community driven project. But strategically for Google there's probably multiple reasons I'm sure to do this. It's not necessarily something they're going to directly make money from, and more adoption of Agent2Agent and other such standards overall is better for the companies that are already succeeding in the agent market because it essentially becomes another thing that people can build against. Less barriers to entry. You want to reduce as much of that as possible, make it as easy as possible for people to build stuff. And if you own the GPUs, you own the cloud serving, or you own the models and the tokens being generated, then you want more people essentially building.
[30:53]
Gregor Vand
Yeah, absolutely. And yeah, sort of in the same vein, we've got an episode coming up in the future around MCP security, actually. And actually during that episode we talked a lot about how there is no sort of one place at the moment to go for validated MCP servers, for example. And maybe this is sort of the same thing where. Not the same thing, but if Google are kind of handing this off to the Linux Foundation, they maybe hope they're going to be a good steward of that protocol, for example, more so than, say, Google. Google doesn't look neutral, unfortunately; Linux kind of does. So.
[31:28]
Shawn Faulconer
Yeah, yeah, exactly. I think one of the challenges with a lot of the MCP servers right now too is that the majority of them aren't managed experiences from like a vendor that you trust. It's source code that's available from the vendor that you have to run yourself, essentially. So you're taking on the burden of doing it. Or you go to one of these MCP aggregators that are running that on your behalf and then you have to know, do I trust this aggregator or not? How are they running this, all that type of stuff. So I think those are just signs of it being such a new thing. It's not even a year old, so it's going to take a little while. It's surprisingly few companies that have an actual managed MCP server that you can just use.
[32:13]
Gregor Vand
Yeah, that's precisely what we touched on in that episode where a lot of it's around: do you know who provided this server, effectively? So, yeah, look out for that in the future. So I think we've hit the high notes on walls in tech. Obviously we could have covered at least double that, but I think we've kind of covered what the last three to four weeks have shown us in terms of what's going on. No doubt in next month's news episode we'll see some developments here as well as. Yeah, who knows? Let's move on to. I think it's almost my favorite part of SED News, where we just get to look at Hacker News throughout the weeks. We just sort of have a scratch pad of things that we've maybe seen and kind of interest us. So I might kick off with one that. I just love these little projects that people do for no other reason than it's cool and fun but actually they have interesting uses at the end of the day. And this one was called "My iPhone 8 Refuses to Die: Now It's a Solar-Powered Vision OCR Server" and I just, it was like I have to click this and see what this is. So thank you to the user that submitted that. This is basically someone who has rigged up, I think it's an iPhone 8. And their argument was this thing was sitting doing nothing. But it's actually incredibly powerful, especially with Vision OCR from Apple, which is an on-device vision model. They've rigged this thing up to a power unit which is powered by solar. So this whole thing runs effectively from its own power. And this person says they've got a lot of image processing needs. They don't explain what that is, but they say thousands of images I think at least per week that need to be analyzed and labeled. And this person claims it's doing it fantastically well. It's all self powered. The iPhone 8 is remarkably stable and powerful for this purpose. And I also like. He says it's also a great conversation starter. This thing just sits on his windowsill and people are like what is this thing?
So yeah, very fun, very fun.
[34:14]
Shawn Faulconer
Yeah. I also love to get a kick out of these little side projects and stuff like that. And a lot of this side project stuff is kind of like how I started my interest in computer science and software engineering and stuff like that many, many years ago. I have less time unfortunately to do those things now. But I loved how he talks about the over-engineering of his little side project and stuff like that. So it's fun.
[34:37]
Gregor Vand
Yeah, super fun. I almost got it confused that he was using it for like bird watching because of the fact that the phone was like on the windowsill and I thought, oh, is this like you're using the camera as well? But it's not that I don't think it's like he just happens to place it on the windowsill. He's clearly feeding images to it from. I think he's got like a microserver also sitting there which is also powered by the solar.
[34:57]
Shawn Faulconer
I think it's like a laptop or something like that. I think it's a laptop he's using. It's not connected to anything. It's all like one internal network. There's actually bird feeders now that have.
[35:06]
Gregor Vand
Like cameras in them.
[35:07]
Shawn Faulconer
That will tell you which bird.
[35:08]
Gregor Vand
That's why I was thinking about it because I was like, well, unfortunately I'm a bit of a closet bird watcher. So I was sort of looking at this thing going, oh, could I do this? Could I set up my. I mean, I imagine you could, but to be clear, I don't think he was using the camera piece of this. It was actually just using this as a pure piece of hardware that's surprisingly good at this purpose. And as he points out, he's feeding thousands of images for assessment and he doesn't actually want these going to a cloud provider. So this is all on device. So yeah, pretty cool use case of that. He points out the fact that the Vision OCR side of things was pushed by Apple pretty quietly. So maybe we'll see more with that. And obviously Apple's not having a great time of it from the AI standpoint right now. People don't really associate them with any sort of strong AI offerings at the moment. So it's kind of interesting that there's maybe some little hidden gems right now in terms of what's actually possible on device for Apple. Sean, what did you find in your travails of Hacker News?
[36:09]
Shawn Faulconer
I want to talk about this article or highlight this article. It's actually written by a colleague of mine, Gunnar Morling, who is pretty well known from the Billion Row Challenge from a couple years ago and he's been involved in a number of open source projects. Now he's a principal technologist at Confluent and he wrote an article called "This AI Agent Should Have Been a SQL Query", which was like the number one article on Hacker News for some period of time. It was inspired by a talk by Seth Wiesman titled "That Microservice Should Have Been a SQL Query", where he made the case for implementing microservices as SQL queries on top of stream processors. And then in Gunnar's article he kind of explores that similar idea but for AI agents. And he goes through this use case of processing research documents and being able to do agentic workflows all using Apache Flink and Flink SQL running on top of the stream processor. And he highlights a bunch of the various open source FLIPs that have been contributed to Apache Flink over the last year or so, or year plus, that bring in AI functionality, including one called FLIP-531, which is about Flink agents, which I actually co-authored. It's really interesting. I think he did a really, really good job of just kind of explaining where maybe you don't want to do all agents this way. But there's a lot of agents that are really just like, hey, give me input. I'm going to go do my AI magic box and spit out an output. So why do I need to stand up a whole bunch of infrastructure to do that if I can just run that essentially as a stream job?
[37:39]
Gregor Vand
Yeah, and I'm sure there's many examples of that where people are running things now through LLMs and unfortunately still regex or just an SQL query could have done it at least as fast, if not certainly more efficiently.
[37:51]
Shawn Faulconer
Or even traditional models. You see a lot of people doing things like basic classification or sentiment analysis. There's other models that existed that can perform those tasks quite effectively. And I work in AI, I love building stuff on foundation models, but you don't have to do everything on them. You don't always need this massive power of this model to do things that have been kind of solved problems using lighter weight techniques.
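As a small illustration of the "a regex could have done it" point, the sketch below pulls order IDs out of free text and flags likely complaints with a keyword list, with no model call at all. The `ORD-` ID format, the keyword set, and both function names are assumptions invented for this example.

```python
import re
from typing import List

# Hypothetical ID format: "ORD-" followed by exactly six digits.
ORDER_ID = re.compile(r"\bORD-(\d{6})\b")

# Hypothetical keyword list standing in for a lightweight classifier.
COMPLAINT_WORDS = {"refund", "broken", "late", "damaged"}


def extract_order_ids(text: str) -> List[str]:
    """Return the six-digit part of every well-formed order ID in the text."""
    return ORDER_ID.findall(text)


def looks_like_complaint(text: str) -> bool:
    """Crude keyword check: true if any complaint word appears as a whole word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not words.isdisjoint(COMPLAINT_WORDS)


message = "Please refund ORD-123456 and ORD-000042, not ORD-12."
print(extract_order_ids(message))
print(looks_like_complaint(message))
```

A keyword check like this is obviously far less capable than a trained sentiment model, let alone an LLM; the point is only that extraction and coarse routing tasks often have a cheap, deterministic baseline worth trying first.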
[38:14]
Gregor Vand
Yeah, that's a great call out. Yeah, we've got an episode coming up in the future with JigsawStack, which is a small model specialty company. So. Yeah, that's exactly what you've just mentioned, Sean. These are small models trained for very specific purposes. Faster, more accurate for those purposes. Look out for that. That's a fun one. Very, very young company. So great to sort of have a chat with someone thick in the trenches with that one. Okay, so this is obviously security leaning. I always like to try and bring something security in if I can. So this was ultimately on the Tailscale blog. It was submitted by user ingve and this is an article by, I believe it's the CEO, Avery Pennarun, and it's called "Frequent reauth doesn't make you more secure". And I had to go into this one because I've worked a lot in this space and there's a product that we have under Mailpass, which is pretty much exactly what he's talking about, which is we sort of took a look at credential management and were like, why are we constantly asking people to log in, bouncing them out of a service, changing passwords more frequently than is needed? And it also highlights the fact around device. Devices now can do a lot of effectively indirect checking. This is super interesting because he's just talking about how frequent logins are the wrong answer. You shouldn't need to. He says it's kind of from a bygone era of Internet cafes, which I think is a pretty astute way of looking at it. We used to often back then use shared computers in a school or in an Internet cafe, if you can remember about that far back. I think maybe schools and universities are maybe the more likely case there where you're logging into your Gmail whilst on the school network or something. And of course everyone just has a laptop now.
[40:00]
Shawn Faulconer
Yeah, I think the last time I was in an Internet cafe was when I traveled to Europe for a conference in graduate school and the computer I brought, the Wi-Fi wouldn't work for some reason, and so I had to go to an Internet cafe in southern France and paid, you know, whatever it was to be able to use it for 20 minutes to check my email.
[40:21]
Gregor Vand
Yeah, I think for me it's also Europe, it's Croatia. It was just out of high school interrailing and arriving in a town and not having anywhere to stay and using it to go on one of these room sharer websites and trying to find someone at 6am in Croatia who will accept me in their home. Which worked. That's a whole other story. So anyway, let's go back to the reauth, which is the fact that device possession, which is kind of what we're talking about here, that you actually tend to now just always have the device, it's your device, you've probably unlocked it yourself. So why are you now also re unlocking a platform? Why are you logging back into it? These very short session timings, like a day or even somewhere like 30 minutes. Avery points out okay for banking, sure. There's probably a bunch of reasons that's a nice extra safeguard to have, but for most platforms this doesn't make sense. And the other point he makes is something around passwords. Being asked to change your passwords on a specific schedule, usually by enterprise, that's kind of at least the classic case of that. It's something I've been absolutely against from the early days. I remember when we were running our startup and my co founder was like, oh, we need to change our passwords on. Back then it was LastPass and I was like, but this is a really, really, really good password and it hasn't been breached. And if I'm just being forced to change it, I'm going to forget the password or it's going to become less secure. And that's his point. As soon as you ask someone to change their password every couple weeks. So yeah, I don't know what's your kind of like experience with all this reauth nonsense?
[41:56]
Shawn Faulconer
I mean I think that when you put all this friction behind using a product in the goal of increasing security, what it does is actually create additional security holes because people figure out ways of working around it because they have to get work done or they need to access whatever it is in maybe in certain circumstances where it's not mission critical that they get access to it, they just give up with it and use something else that doesn't have that friction. And you end up creating a situation where I think people are. They run out of passwords, like they're good passwords. They start using passwords that are just easy to remember, or they just start writing them down and putting them in places where maybe they shouldn't. And I think the other point that he makes in the article too is about how attacks aren't due to physical access to devices. For the most part, attacks of getting in your email happen remotely. So logging you out on your laptop isn't necessarily helping prevent an attack that's happening remotely. It's kind of like people being concerned over going back to the cloud providers and AWS having access to my data. Breaches don't happen because somebody in a data center walks up to the computer that has the hard disk with the actual physical data in it and grabs it. They happen because someone left an API key in their GitHub repo and someone found it and they use that to get access to the data. Or they find someone gets access to a server that has log files in it that haven't been encrypted and they grab the log files and the log files happen to have a bunch of passwords dumped in it and stuff like that. So it's more of these remote use cases and these kind of password protection stuff of forcing people to log out re log in really doesn't solve the root of the problem.
[43:44]
Gregor Vand
Exactly. And any modern platform, A, they're most likely using some version or derivative of JWTs, and most likely now they're also using JWTs with refresh tokens, which basically means that behind the scenes this token is being refreshed. So if someone is lurking around on your system, every hour that thing's being changed anyway. So this bizarre idea that logging you out and logging you back in has any effect on that is kind of nonsense, quite frankly. So, yeah, I think it was just a pretty short article. I think he makes his points in record time, so. Really liked that one from Avery.
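To make the refresh-token idea concrete, here is a minimal sketch of a short-lived signed access token paired with a single-use refresh token that is rotated on every refresh. It is a simplified stand-in built on the standard library, not a real JWT implementation (there is no header or algorithm negotiation), and the one-hour lifetime, the in-memory store, and all function names are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import secrets
from typing import Dict, Optional, Tuple

SECRET = secrets.token_bytes(32)   # hypothetical server-side signing key
ACCESS_TTL = 3600                  # assumed one-hour access token lifetime

_refresh_tokens: Dict[str, str] = {}  # refresh token -> user (in-memory only)


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())


def issue_access_token(user: str, now: float) -> str:
    """Build a 'payload.signature' token that expires ACCESS_TTL seconds after now."""
    payload = _b64(json.dumps({"sub": user, "exp": now + ACCESS_TTL}).encode())
    return f"{payload}.{_sign(payload)}"


def verify_access_token(token: str, now: float) -> Optional[str]:
    """Return the user if the signature is valid and the token is unexpired."""
    payload, _, signature = token.partition(".")
    if not hmac.compare_digest(signature, _sign(payload)):
        return None
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["sub"] if now < claims["exp"] else None


def login(user: str, now: float) -> Tuple[str, str]:
    """Issue a fresh access token plus a new single-use refresh token."""
    refresh = secrets.token_urlsafe(32)
    _refresh_tokens[refresh] = user
    return issue_access_token(user, now), refresh


def refresh_session(refresh: str, now: float) -> Optional[Tuple[str, str]]:
    """Swap a refresh token for a new token pair; the old refresh token dies."""
    user = _refresh_tokens.pop(refresh, None)
    if user is None:
        return None
    return login(user, now)


access, refresh = login("alice", now=0.0)
renewed = refresh_session(refresh, now=3600.0)
```

In this sketch the user never sees a login prompt, yet a copied access token stops working after an hour and a replayed refresh token is rejected outright, which is the behaviour being described: the credential rotates in the background regardless of how often the person is forced to sign in again.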
[44:20]
Shawn Faulconer
Yeah, it's the illusion of security.
[44:22]
Gregor Vand
Yeah, we didn't touch on this, but yeah, he also mentions MFA fatigue. And that's like an attack in itself now, where basically they do something to make your MFA pop up and eventually you just accept it because you're like, I don't know why this thing's popping up, but I guess I should just like use my Touch ID because that's what I'm being told to do. And then bing. They've got the access into the platform that they're looking for. So, yeah, it's a very good one to look at. Anything else from your side on Hacker News, Sean?
[44:48]
Shawn Faulconer
No, that's all cool.
[44:50]
Gregor Vand
All right. So that was a nice little tour around Hacker News in terms of looking ahead. We're going to obviously be back next month. This is a purely hypothetical. Do we have any, like, predictions for what we might see playing out through July?
[45:06]
Shawn Faulconer
I guess I actually think things are going to cool off in July because in the US, and in Canada as well, the first week of July is kind of gone due to national holidays, then a lot of people are on vacation and stuff. Like, there's not a lot of conferencing going on in July, so you have a sort of slowdown in the announcement cycles and stuff like that. So I guess my prediction is there's not going to be a major, major announcement in July and we'll see if I. I'm probably going to be 100% wrong on that, but that's what I'm going to say.
[45:36]
Gregor Vand
What's my prediction? So interesting. The company Manus AI has been really blowing up, I would say, like over here. And I do feel it's just something, though, where users are finding that they're spending too much money on Manus and they don't know what they're spending the money for. So I'm going to make a prediction that something comes out about Manus that is in that realm, like user frustration with Manus. I think it's an interesting product. I'm not trying to knock Manus here, but it just seemed like an underlying theme where people are not kind of clear on how they're spending their credits there. I'm going to go out on a limb there and just say something about Manus, or if it's about something we talked about today, again, I'm just going to go out on a limb and say maybe this Scale AI pseudo-acquisition hits a roadblock. So let's see about that. Awesome as always. Great to catch up, Sean. I hope we've also given the audience just a nice little tour around what's been going on over the last three weeks. There was a lot to cover. So again, I hope we've hit some main areas for everyone and some fun things as well. So hope to see everyone next time on SED News next month.