Building Open Infrastructure for AI with Illia Polosukhin - Software Engineering Daily

Summary7 min read

Podcast Summary: Software Engineering Daily - Building Open Infrastructure for AI with Illia Polosukhin

Episode Information:

Title: Building Open Infrastructure for AI with Illia Polosukhin
Host: Kevin Ball, Vice President of Engineering at Mento
Guest: Illia Polosukhin, AI Researcher and Co-Author of the Transformer Paper
Release Date: July 17, 2025

Introduction

In this episode of Software Engineering Daily, host Kevin Ball engages in an in-depth conversation with Illia Polosukhin, a renowned AI researcher and one of the original authors of the groundbreaking Transformer paper, "Attention is All You Need". Polosukhin brings over a decade of experience at the intersection of artificial intelligence and decentralized technologies. Currently spearheading Near AI, his focus lies in developing open-source infrastructure tools and products for agentic, privacy-preserving AI systems.

Illia Polosukhin’s Journey and Background

Polosukhin begins by sharing his lifelong passion for technology, highlighting his early ventures into video game development and his subsequent fascination with machine learning at the age of 14. He recounts building his first neural networks in Pascal and securing a remote job with a San Diego-based machine learning company, which eventually led him to move to the United States.

"[01:30] Ilya Polosukhin: ... I moved to the US which was exciting. And then I saw the Transformer paper..."

His decision to focus on natural language processing (NLP) stemmed from the belief that language better captures human intelligence compared to image recognition, which was the prevalent focus in AI research.

Origins of the Transformer Model

Polosukhin delves into the inception of the Transformer model at Google Research, driven by the need for faster and more efficient neural networks capable of handling large-scale data processing.

"[04:22] Kevin Ball: Yeah, sometimes being early is as bad as being wrong, right?"

He explains the limitations of the then-existing models, which processed text sequentially and were too slow for practical applications like Google’s question-answering systems. This led to the innovative idea of processing entire texts in parallel, leveraging hardware accelerators to understand relationships through multiple layers—a foundational concept that birthed the Transformer architecture.

Despite the initial success, Polosukhin notes that scaling these models required substantial effort and collaboration, eventually inspiring his venture into automating coding processes through wipe coding, aiming to reduce manual developer work.

Pivot to Blockchain and Near Protocol

Recognizing the challenges in scaling AI models, particularly in crowdsourcing training data and handling global payments, Polosukhin and his team pivoted towards blockchain technology in 2018. This shift aimed to solve practical problems related to coordinating and compensating contributors worldwide, overcoming issues like monetary restrictions in various countries.

"[04:25] Ilya Polosukhin: ... we started looking at crypto as actually just like solving our own practical problem."

This pivot led to the development of Near Protocol, which focuses on creating a scalable, user-friendly blockchain platform. Today, Near boasts 50 million monthly active users and ranks among the top blockchains by active users and transaction volumes. Near Protocol supports a diverse range of applications, including remittances, payments, loyalty points, and financial instruments.

Vision for User-Owned AI

Polosukhin articulates a compelling vision for user-owned AI, emphasizing the importance of returning control and benefits to users rather than centralizing power within large corporations. He highlights several critical issues in current AI models, such as:

Bias and Data Poisoning: Models may inadvertently reflect biases present in training data or be manipulated through data poisoning techniques.
Governance and Safety: There exists a precarious balance where AI models could potentially hack into other systems if not properly governed.
Data Privacy: Maximizing model utility often requires extensive data, raising significant privacy concerns if that data is compromised.

To address these challenges, user-owned AI focuses on:

Privacy Preservation: Ensuring data remains private and secure within user-controlled environments.
Transparent Governance: Allowing users to understand and analyze the data and biases within their AI models.
Decentralized Incentives: Leveraging blockchain to create incentive structures that prioritize user success and well-being.

"[09:59] Ilya Polosukhin: ... user-owned AI where how do we bring the focus back on the user?"

Secure Computing and Trusted Execution Environments

A significant portion of the discussion revolves around Trusted Execution Environments (TEEs), specialized hardware components that securely execute code, ensuring data remains encrypted and inaccessible to unauthorized parties.

Polosukhin explains how Near AI utilizes TEEs to secure AI model inference and fine-tuning processes. By running models within secure enclaves, they prevent external access to sensitive data and model weights, thereby safeguarding intellectual property and user information.

"[24:05] Ilya Polosukhin: ... secure enclave where effectively now all of the information is streamed directly into the server that's encrypted end to end."

He also touches upon the importance of formal verification in ensuring that the code executed within TEEs adheres strictly to predefined security criteria, thus mitigating risks associated with vulnerabilities and malicious code behaviors.

Implications for Developers

Kevin Ball probes into what Near AI’s infrastructure means for software developers. Polosukhin outlines a layered approach to integrating their secure AI services:

OpenAI Endpoint for GPU Inference: Developers can send data encrypted via TLS, which is decrypted within TEEs for processing.
Agent Hosting: Developers can upload Docker containers as agents that operate within secure enclaves, ensuring that user data remains private even from the developers themselves.
Agent Hub: A repository of pre-built agents that developers can utilize or customize for their applications.
Agentic Protocols: Smart contracts written in Rust or JavaScript that interact with these secure agents, enabling verified and secure operations within blockchain environments.

"[19:46] Ilya Polosukhin: ... package Docker container and upload it as we call it an agent into the system."

Addressing Observability and Debugging

Polosukhin acknowledges the inherent trade-off between ensuring data privacy and maintaining system observability for debugging purposes. To balance this, Near AI is developing tools that allow developers to specify their desired levels of privacy versus observability. This enables:

Full Observability: Complete access to all data, useful in non-sensitive applications.
Partial Observability: Summarized logs and failure reports without exposing user queries.
No Observability: Maximum privacy with minimal developer insight into user interactions.

"[22:24] Ilya Polosukhin: ... specify privacy versus observability threshold."

Future Outlook and Bootstrapping User-Owned AI

When discussing the future, Polosukhin envisions Near AI as a catalyst for democratizing AI development by making it more secure, private, and user-centric. He outlines the steps required to realize this vision:

Infrastructure Development: Enhancing secure computing capabilities to support encrypted AI models and decentralized compute networks.
Community Building: Encouraging open-source contributions and creating financial incentives for data sharing and model training within secure environments.
Formal Verification: Investing in mathematical proofs to guarantee that AI models adhere strictly to security and privacy criteria.
Decentralized Compute Network: Leveraging underutilized global GPU resources to create a scalable and efficient AI compute infrastructure.

"[35:43] Ilya Polosukhin: ... we've got computation into secure enclave mode, join the network, or you can just run it on your own workloads."

Conclusion

As the conversation wraps up, Polosukhin reiterates the importance of community involvement and collaboration in advancing user-owned AI. He emphasizes that achieving a secure, decentralized AI infrastructure will require collective effort, innovation, and the establishment of robust incentive models to support open-source initiatives.

"[49:38] Ilya Polosukhin: ... think through how people can contribute. Right. Because at the end it's going to be an open source like community initiative."

Overall, this episode provides a comprehensive exploration of the intersection between AI and blockchain technologies, highlighting the potential for creating secure, user-centric AI systems that prioritize privacy and democratize access to advanced computational resources.

Notable Quotes:

"We wanted the model to be safe, but the people who build the model have an unsafe version." — Ilya Polosukhin [04:25]
"AI can give everyone personalized context, it can collect information from everyone, it can broadcast it in personalized way." — Ilya Polosukhin [16:16]
"It can deal with the scale, right? So one of the things like as a person imagine you have thousand reports, I mean you'll go crazy and you'll be really bad manager for them. But AI handling thousand reports is no problem." — Ilya Polosukhin [16:16]
"We're going to create this trust level at a mathematical kind of guarantees." — Ilya Polosukhin [24:05]
"We're thinking about how do you do type checks, how do you do unit tests, how do you do all of these different things for a long time." — Kevin Ball [28:43]

This summary encapsulates the key discussions, insights, and forward-looking statements made by Illia Polosukhin during the podcast, providing a comprehensive overview for those who have not listened to the episode.

Loading summary

Transcript56 lines

[00:00]
Host
Ilya Polosikin is a veteran AI researcher and one of the original authors of the landmark Transformer paper Attention is all youl need, which he co authored during his time at Google Research. He has a deep background in machine learning and natural language processing and has spent over a decade working at the intersection of AI and decentralized technologies. His current venture is called Near AI and he's focused on building open source infrastructure tools and products for agentic privacy preserving AI systems. He joins the podcast with Kevin Ball to discuss his journey, the origins of the Transformer model, the vision for user owned AI, document oriented development, and much more. Kevin Ball, or K. Ball, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co founded and served as CTO for two companies, founded the San Diego JavaScript Meetup and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow K. Ball on Twitter or LinkedIn or visit his website K Ball LLC.
[01:19]
Kevin Ball
Ilya, welcome to the show.
[01:21]
Ilya Polosikin
Thanks for having me.
[01:22]
Kevin Ball
Yeah, excited to get to talk with you. Let's maybe start with a little bit of intro about you, your background and what you're up to these days.
[01:30]
Ilya Polosikin
For sure. Yeah. Well I've been I guess tech geek since I was 10 years old. Been building a lot of video games back in the day and then got really excited about machine learning when I was like, well about AI in general and then started learning machine learning when I was like 14, was building my first neural networks in Pascal and got a job actually remotely. So I'm originally from Ukraine working for this machine learning company out of San Diego and they were happy with my work and so they offered me to move. I moved to us which was exciting. And then I saw the Cat Neuron paper that came out from Google from Andrew Yang and Jeff Dean and I was like okay, this is the thing. Like the unsupervised pre training learning about concepts in the world. You don't need supervision. And so I was like okay, I want to do that. And so I applied, I got into Google Research and I always thought that yes, images are cool, but there's thousands of species that can see, but there's only one, maybe some people argue, maybe two that can actually speak. And like languages affect the way we test intelligence, right? We ask questions. You ask a person to read the text and we ask questions if they understand it. And so that's why I wanted to focus on natural language. We were doing question answering, trying to build products into google.com where when you ask questions, it would give you a response. This is where your previous, I guess, CTO was my director back in Google. And one of the challenges we were facing actually that the models we were using, the three coral neural networks, were too slow. Right? They need to read one word at a time. And Google requires really fast response time. And you want to read multiple articles. So you cannot approach it as a human, you need to approach it as a machine. And so that's where this idea of like, hey, what if you consume the whole article, the whole text at the same time in parallel, using the hardware accelerators we have, and figure out the relationship through the number of layers and kind of steps of reasoning instead of trying to read one more time. And so that's what gave birth to Transformers. Kind of like coded up a first version. And obviously now it was not random, it was doing something. So obviously it took a lot of work by everyone to make it really work from there. And then I was excited of using this technology to apply it to actually coding because I always thought, hey, why are we doing so much manual work as developers? Can I just tell the machine? And it figures it out? And so now we call it wipe coding. Back then, we were just teaching machines to code. This was 2017 back then. I went and pitched this to VCs. Most of them thought we're delusional. Somewhere between science fiction and delusional. That was the.
[04:22]
Kevin Ball
Yeah, sometimes being early is as bad as being wrong, right?
[04:26]
Ilya Polosikin
Yes, exactly. So we were very early. We didn't have the capacity to scale the models to the level it needed, even though we kind of were doing, I mean, a lot of similar things. Obviously there's a lot of small details that matter that were done very right by OpenAI. But what we were doing was actually a lot of crowdsourcing. So we were trying to get a lot more training data with people. And we had this challenge, like we had computer science students effectively in Southeast Asia, China, Eastern Europe. And we had trouble paying them because it's actually really hard to pay in a lot of these countries. There's monetary restrictions. Chinese students don't have bank accounts. In Ukraine, you need to sell half of the dollars when they're having your bank account. And so we started looking at crypto as actually just like solving our own practical problem. How do you coordinate and pay people around the world? And so this 2018, nothing that would be like actually even to our medium sized use case, thousands of people would actually scale at that point. And so that's kind of where we're like, hey, we should solve this problem. This seems like a big problem to solve. So we've kind of pivoted our original NEAR AI into near protocol and really focused on solving scalable, easy to use, easy to build on blockchain. And so near now has 50 million monthly active users. It's top one, top two by active users, blockchain, top two, usually top three by number of transactions. It kind of everything from remittances, payments, loyalty points, financial instruments, kind of a whole variety of use cases including the payment for the crowdsourcing and data labeling kind of. We've had that application running since 2021, so kind of build it out. You know, there's ecosystem, it's lots of different people building different applications. But obviously we're on the back of our mind kind of always wanted to go back to AI. And so when we've seen all of the improvements, GPT3 and then ChatGPT, GPT4, one of the newly found, I would say through this blockchain journey understanding is that yes, there's technology, but also there's kind of the governance question of this. And the other part is, I mean one of the interesting things that happens as models evolve, right? There's some threshold at which it actually becomes game theoretic for any of the companies if they have a model that is able to hack into other systems, to actually use it to hack into other labs, to delete their models because if they don't do that, the other labs, when they cross that threshold will do it to them. And there's a safety claim always like hey, their models are unsafe so we need to make sure they don't do something bad. But there's a very interesting state that we can get to pretty quickly and it's really hard to determine where it is. There's also just like practically even now when you're asking these models, you have no idea if the responses coming from statistics and kind of data, if the data was biased in some way. Like even when I worked at Google, right? Like sometimes you just delete some training data because it contains some signal that you don't want to have in your results. And like in turn you're also biasing data in a very specific way, right? Like for example, Obama is born in Kenya was a very prevalent statement back in the day across all of the right wing news. And so like if your eval set has that question, removing all the right wing news actually improves your evaluation set, your evaluation. So there's like unclear biases in data, there can be data poisoning. This can be so called sleeper agents. So there's this concept where you can add into training data some specific modification that doesn't show up in normal evals. But if there's something in a context like a date or a specific statement, it actually changes how the model behaves. And so the way to effectively use it is think of in your cursor, your wipe coding and specifically in one specific case, it will change the import from import transformers to import transformers or misspelled request, which is actually a malicious library in pip. Right. So there's like all of those things that we just really don't know what's going on in these models now. So there's like a governance question which is yes, we wanted the model to be safe, but the people who build the model have an unsafe version. We as users have no idea how this is used. And there's a data privacy question which is to make these models extremely useful, you want to give them as much of your context as possible. There's this hardware device that listens to you at all times. But at the same time, if this data now goes anywhere or that company gets hacked and all the data gets leaked, that's a massive invasion of privacy. You have all this host of problems. And so the suggestion and this vision we formulated, we call user owned AI where how do we bring the focus back on the user? Where instead of trying to build a model that effectively benefits the company, we build model that the meta function is to optimize towards the user. Right. Which means it's private, which means its value loss function at least meta function is toward the user's success and well being. We know which data went in, so at least you know which biases the model has or at least anyone can analyze it and have reports, et cetera on that. And so that's kind of the conceptually user owned AI. And to do that you need all the blockchain methodologies that we have across coordinating people to build data sets and models have privacy technology that blockchain has been developing as well as kind of incentive layer and mechanism to really gear it toward the users.
[09:59]
Kevin Ball
All right, that is quite the background. I'm actually, if it's okay, I want to go back a little bit and just ask you some questions about different pieces along the way because you have a pretty unusual and unique story there actually going back to that paper at Google just really quickly because when I first started getting into I'm later come to machine learning than you are and when I started getting into this latest round, like attention is all you need. The Transformers kickoff paper was like foundational reading club material. Did you know at the time that you were doing that work and doing it how big this was going to get?
[10:31]
Ilya Polosikin
Not really. I think at the time the pace of innovation was very quick. Right. And there was a lot of different architectures and different structures. Right. I mean, in a way, if you think of it, Transformer is really removing things like we removed things from the other models we haven't added. I mean obviously it was like a very powerful architecture because it was so performant because it was showed that actually you don't need to have these recurrent relationships, you don't need to have even convolutional networks and you only need this self attention mechanism to really capture all of these relationships and have a sufficient reasoning capability. And I think the team, I actually was the first one to leave the team continued experimenting and they saw a lot of promise on images and on other contexts as well. So there was definitely promise that this is like a very generic architecture. But I don't think it was clear that this is going to be the last evolution. At the time it felt like new architectures are coming every few weeks. There was something new. There was like neural gpu, there was neural computer, there's this, that. So it wasn't clear that okay, this is it. And then everybody just builds now on top. Right. And figures out how to train it better, et cetera.
[11:42]
Kevin Ball
Yeah, moving on a little bit. So you pivoted fairly early to crypto. I didn't realize it was quite so early. And I think it's interesting because you're actually using it for one of the core use cases that feels like it has continued to be relevant. Right. How do you provide financial services for the unbanked across borders? All these different things. We had this huge boom in NFTs and all these other different tokens being in that space. What parts of that do you, you know, I know a lot of developers have become very skeptical of this. So what parts of crypto do you think are the enduring value and where is it? Just noise?
[12:16]
Ilya Polosikin
Yeah, I mean that's a really deep question, I think. And for context, because it was a very delusional idea in 2017 to build a machine that codes itself. Right. And for context, I tried to do that back when for my master's degree in university. Then it's completely. Nothing worked. So it's like a recurring theme for me. And so we gave ourselves a year and after a year we kind of like, okay, we had some papers, we made some progress, but it wasn't near the level we needed to really make it commercial. And blockchain clearly was like, hey, this is a use case that I'm being from Ukraine, very familiar with cross border payments and kind of money movements and complexity of that. And so I think I cluster the use cases of blockchain effectively into maybe four categories. So one is global identity. One of the real problems on Internet is how to create a global identity. Right now we're using DNS, we're using IP addresses, we're using all these methods which are actually really bad and have a lot of issues. Like DNS has literally a group of people who are proving stuff at the top, right? And from potentially spending ton of money on things they shouldn't. So it is a very clear Internet problem like how do you create a global registry that is open to everyone and has the same rules. So blockchain solves that and you can create it for identity, you can create it naming, service, etc. Second one is payments for sure, how to transfer value between and kind of in any asset, in any value. And it's definitely is we have right now 600 millisecond blocks, 1.2 second finality, right? So within 1.2 seconds you actually move value around the world. Billion dollars, no problem. And hundreds or even thousands of nodes are confirming that. Finally you have marketplaces, right? So one of the really big benefit that you have here is that you can create because of global registry and payments. If you bring them together, it becomes a marketplace. It's a global marketplace where you can sell anything, offer anything. And this is why it's used for speculation. Because the simplest thing to do on marketplace is speculate on assets that don't have any other value except for what people intrinsically assign to them. But you can think of, for example, if you want to buy, you know, 100 tons of steel and you want to get that delivered to you right now, you'll need to email a bunch of people, probably call someone, figure out, probably call Flexport to get the shipping going, warehousing, et cetera. Or you can imagine, and I mean we'll get to it, but you can effectively say, hey, I want this done on the marketplace. And then you have now other actors who are like, hey, I will do it for you for this much money. And there's a contract with money, with escrow, everything on chain, guaranteed execution when the factor is delivered and the value itself is tokenized right? So while it's in progress, you effectively can borrow against it because it has the escrow money locked in this, which is like trade financing. So there's kind of a lot of financial instruments you can build on this primitive of marketplace. Finally, the last piece is kind of this coordination. And I think this is where I think blockchain has failed. It had a lot of promise of like, hey, we'll have a new type of organizations that don't have traditional management, which is like, I think everybody agrees and in any good organization people try to go away from I'll tell you what to do, right? It's more I'll support you in what you're trying to do. But it still kind of creates this hierarchy and you need this hierarchy because people cannot scale the relationships. And so the idea was like, hey, we can create effectively game theory to coordinate people instead and use kind of on chain mechanisms to pay and do this. And I think that failed because people are messy and there's a lot of people things that needs to happen. And this is where I have a whole thesis about actually AI being in the middle of this coordination actually solves a lot of these problems because.
[16:12]
Kevin Ball
Because it can deal with messiness in a way that traditional code can't and.
[16:16]
Ilya Polosikin
It can deal with the scale, right? So one of the things like as a person imagine you have thousand reports, I mean you'll go crazy and you'll be really bad manager for them. But AI handling thousand reports is no problem, right? It can give everyone personalized context, it can collect information from everyone, it can broadcast it in personalized way, et cetera, right? So it actually scales with the organization. So to me this is like main 4 use case kind of core primitives that then everything else on top like hey, we want to bring whatever real world assets here is because of the marketplace, right? We want to issue equity as a token because of the marketplace. We want to figure out how to build new type of organizations because of this coordination mechanism. We want to coordinate payments, et cetera. So all of these pieces reinforce each other, but they are the use cases. And then everything else, like for example, privacy and other things, they kind of leverage some of this, right? If you want to have. So for example, we use this approach called Trusted Execution Environment. So this is a specialized hardware element that are available on Intel CPUs, AMD as well as on Nvidia GPUs and some of other accelerators. And the idea there is you can use it like Azure provides you this service as well. But you need to trust Azure. Azure tells you like, hey, we're running it in secure hardware.
[17:35]
Kevin Ball
There's so many things right now where we're just like Microsoft, Google, Amazon, we can probably trust them, right?
[17:41]
Ilya Polosikin
Yeah. So versus if you have this global registry now, the device can register directly and say hey, here is my certificates from intel and Nvidia and you can verify them on chain and now there's an IP address registered. So like when you go to them you have all of this cryptographic, routing and supply chain to verify directly without needing to trust extra cloud provider. And so you can now build a full cloud which is just from directly providers who self registered who can come in online. Which means you can also find a closer for example data center and provider for your AI inference to reduce latency. You can distribute the compute more evenly and not have all 100,000 GPUs all sitting in Memphis and using all electricity. You can actually have privacy because the data is fully inside secure enclave and not visible even to the hardware operator. And you know what model runs there, you don't need to be oh did I run 4.030 or like did they change it yesterday? I have no idea. Right. Like you can actually have guarantees around that. So it actually gives a lot of these guarantees because we have this blockchain layer for identity, coordination and payments. Right. Because you need to pay these people to use their hardware.
[18:54]
Host
This episode of Software Engineering Daily is brought to you by Capital One. How does Capital One stack? It starts with applied research and leveraging data to build AI models. Their engineering teams use the power of the cloud and platform standardization and automation to embed AI solutions throughout the business. Business Real Time Data at Scale enables these proprietary AI solutions to help Capital One improve the financial lives of its customers. That's technology at Capital One. Learn more about how Capital One's modern tech stack data ecosystem and application of AI ML are central to the business by visiting capitalone.comtech.
[19:34]
Kevin Ball
So I want to dig into that and from a few different angles. But since this is Software Engineering Daily, let's start from the software side. So if I'm a developer wanting to tap into to that, what does it end up actually looking like for me?
[19:47]
Ilya Polosikin
Yeah, So I mean depends on where you are in a stack of what you're trying to do as a developer. Right. So the simplest way we have for example just an OpenAI endpoint for GPU inference that runs inside secure enclave. Right. So everything you send there, it's TLS encrypted on your side. It's decrypted inside secure enclave. Nobody in the middle can actually access it. It runs on the model that you asked and you get back and you have a certificate again that you can check and verify that Nvidia and Intel signed effectively on that. Now if you want to build an agent for example that runs on behalf of a user and even you as developer don't have access to what users is asking for, which is super useful, right? As you go financial use cases, medical use cases, but also just daily life, right? Imagine you have fireflies or this recording of meetings bots, right? Right now their servers are getting all of your calls and all your data which is like now I need to think about are they going to get hacked? What did I say? Or if they were using our stack, they could have put the whole system into the secure enclave where effectively now all the information is streamed directly into the server that's encrypted end to end run there and then only you get back the result and then developer just uploads their code, right? So you effectively package Docker container and upload it as we call it an agent into the system. It uses private inference, but the agent itself, your general code runs in the secure enclave mode as well. So we have an agent hub where you can see a bunch of like we have about 1,000 agents who are running or can optionally run in this mode. Now if you're even lower level developer or you yourself want to build something that includes payments and other systems, that's where we have this idea of agentic protocols where you can effectively create a smart contract. So a contract written Rust or JavaScript that runs on blockchain that itself can call into this agents and get back the result kind of this verification. And so the examples we have now are mostly about trading kind of in financial use cases. That's the first thing people do. But again let's say somebody wants to build a naming service or something else. You can also have this kind of things where again the logic happens like maybe your pricing model or your loan evaluation scoring happens in this verifiable way and then the execution of actions happens through the blockchain. So it really depends on kind of on the level of the stack. You want to build your applications in.
[22:24]
Kevin Ball
A few different questions about that. So thinking about this model of I'm a developer, I want to build a secure agent or something like this, I just upload my docker container. Now for me as someone who ships a lot of applications, I immediately start saying okay, what about observability. How do I know if they run into a bug? How do I debug this thing? What does that end up looking like in this stack?
[22:45]
Ilya Polosikin
Yeah, so this is where things get interesting because now you have a trade off between privacy and observability, kind of on the different sides of the spectrum. So we are actually working on analytics and debugging system that sits underneath as you ship your docker to give you some of the observability, where you effectively specify privacy versus observability threshold, which, I mean, you kind of will inform the user as well where you want to sit. And so obviously you can have full observability, but then you have access to everything that users put, or you have none, or you can have somewhere in the middle where it actually summarizes stuff for you and maybe gives you the logs of failures and bugs, et cetera, but doesn't give you the exact queries that users sent. Right. So we actually have exact kind of sprint on building out the tooling, including like quality control, latencies, times like all of the stats that you actually need as a developer to understand how your agent is working.
[23:42]
Kevin Ball
That makes sense. Maybe. Also, can we go in a little bit on the trusted execution environment? And in particular, I'm thinking about things like, okay, I can know that if I'm a user or I'm a developer sending something off to a service, I can know my data is encrypted, I can get back stuff that it was encrypted. How do I know that your software isn't just posting that data somewhere else? Is the trusted environment locking down the network or how does that all work?
[24:05]
Ilya Polosikin
Yeah, so there are a few things that happen. So first of all, when you establish a session, you're effectively getting back the hash of a docker container that runs there, which is authorized by the hardware. So the signature you get effectively says this docker container runs on this CPU and this gpu and you can verify that. So if developer published this docker container, you can make sure what it is. Now, not everyone wants to open source everything they do. And so this is where a indeed the plan is to have a firewall system where you can indeed lock the access, because you may want it to go and access some APIs and some MCP servers or whatever. Right. The other piece is we're actually working with external team on agent security, so where you actually have an agent itself who runs Insight, who inspects the code of this docker, of this agent that use developer uploading and so it effectively gives you a security report based on like, hey, it looks like it's sending all the requests it received to some external IP address. Right. Or maybe it like parses all the API keys and leaks them. So effectively we can have scanners that are themselves are AI based that there's no person who looked at external developer code, but there's AI that looked at it and certified it in some way. Now that is cat and mouse to be clear. But with combination of this methods you can get some reasonable level. And then the longer term research we're actually investing in formal verification. So this is a bit more again fundamental. As I mentioned, I think there will be a threshold at which the models will start hacking into other systems. The thing is both people write code with vulnerabilities and AI now trained on the code with vulnerabilities writes code with vulnerabilities. There's this image obviously with a thin slice, everything is on top. We're kind of layering in more now as AI at a faster speed. And so the fundamental way to solve that is if we have a mathematical proof that the code that runs exactly satisfies your criteria. So usually right now when form verification is used because it's so expensive, like it's manual work, you only do it once for some set of criteria. The problem is the set of criteria itself can be wrong. And so what you want actually is when you're calling the service you want to provide you as a developer effectively calling into it want to provide set of things that you want to be guaranteed. For example, that none of this data is leaving this enclave and only this URL's are getting accessed in this way. And then the service actually responds back with a verification like certificates around the secure enclave and verification that indeed this criteria satisfied. And so this is actually what we're working on is really to build this trust level at a mathematical kind of guarantees. It's also very useful for blockchain where people getting money stolen all the time. Where this is like a very fundamental piece where if I'm putting in money, I want to guarantee that money will not I'll be able to withdraw at least as much money as I put in. And so it's a very short term applicable to blockchain and it like but long term we want it applicable to every service in the world because this is actually how we're going to stop kind of this sprawl of vulnerabilities in all systems.
[27:36]
Kevin Ball
Yeah, that's fascinating. Do you think that's going to Kind of limit the set of programming environments that is able to work in this space.
[27:45]
Ilya Polosikin
I mean, we're going to have either way kind of collapse of programming environment as coding models get better. Because the thing is like, I mean, AI really doesn't care. It can write in any language, right. And so it's actually, it's better to write a language that's more of written because more training data. Even now that part is getting solved because there's some companies where, you know, they just generate a lot more training data in the programming language of the target. And so you can train in that. So I think it will be really more important to have this kind of strong guarantees of security than having 50 different programming languages people can write in. I use this before we would write code once and read it many times. And so you wanted to make it. Now we write code once and read it never.
[28:31]
Kevin Ball
You know, this is interesting, right? Because it kind of taps into a few different pieces. One is with LLMs or anything that's sort of kind of probabilistically generated, the ability to validate rises in importance tremendously.
[28:43]
Ilya Polosikin
Yes.
[28:44]
Kevin Ball
And in fact, one of the reasons I think that coding is such a useful environment or something that's so amenable to these models is because we already have to think about validation, right? We've been thinking about how do you do type checks, how do you do unit tests, how do you do all of these different things for a long time. What do you think are the attributes that need to be there for a programming language to be a good LLM target? Right. So like, for example, I've seen LLMs do a much better job at generating strongly typed code, particularly because agents are able to use that as a part of their feedback loop. Whereas if you use a dynamic programming language, even one with a lot of training data in the corpus, JavaScript, like it's not as joyful of an experience working with LLM code.
[29:24]
Ilya Polosikin
Let's say that, I mean it's very practically speaking, right? It's like a lot of the types, especially in languages like Rust, they become very semantic, right? I mean, at least when I build, I try to make a semantic typing, even if it's the same underlying thing. But for example, for near smart contracts, we have an account and balance as the type, even though it's like U128 underneath and a string. But those semantic types allow to effectively, when you look at the function's specification, you can like, hey, this is amount in, amount out. This is from two accounts, right? So it gives You a lot more kind of context as a human. So I mean, AI is not that different. AI has, I would say at this point, lower ability to kind of disambiguate and like map some of the complex structures. Right. I mean this is also just practically speaking, the models have a limited amount of reasoning steps. They do, right? I mean you can run them for longer. This is where all this O style models and R style models come, where they literally run. Okay, we need more reasoning. Let's just push more tokens through the inference. But obviously it has its own limitations. So when you need to map like, okay, there's an argument coming in, I need to look at everywhere else where this function was called to understand what semantic meaning this argument has. Like it's obviously way harder. And then memorize that when next time I need to call a function to really disambiguate this. So yeah, I think strongly typed and then adding this form of method because this actually adds you additional semantic properties, right? So for example, for sorting, it will literally, I mean what we're designing will give you like, hey, actually the return will be such as that every element is larger or equal than the previous element. Now you have like semantic meaning of the whole function without needing to read the implementation and maintain that constantly. So it gives you like a lot more properties. So I think that is going to be the more useful environment for AI generated code. Because then indeed we don't need to go and read and validate it because again we have like engineering team who are using AI now on daily basis and you kind of cannot catch up anymore if you have five engineers who are pushing 10,000 lines every day of AI generated code. We actually starting to think how to manage the team, how to structure the organization to the code differently than you would do before. Because before you would usually want to have multiple people who know how the code works and review each other's pull requests, et cetera. And now I actually think it might be not like it actually would slow down things and maybe not very useful. Instead just give everyone their own subsystem to own and they just need to doc like the. There needs to be documentation that describes what the system does, which ideally should be enough to regenerate the whole system through LLM. And then there should be just a bunch of tests.
[32:31]
Kevin Ball
This is really interesting and relevant because everybody's trying to figure this out, right? Like okay, these tools dramatically accelerate our ability to write code. What does that mean for what we do and how we do it and what you're describing is actually very similar to what my team ends up doing where we call document oriented development. Right?
[32:48]
Ilya Polosikin
Yeah.
[32:49]
Kevin Ball
The core thing you're engineering is this specification or document that can be used to generate the code. The code itself is like. It's like a binary.
[32:58]
Ilya Polosikin
Yeah. And then the other interesting thought that, I mean, we've a little bit experimented but haven't fully implemented yet was if you depend on somebody else's system, you actually write tests for their system. So usually you expect them to write tests and then you just use it. But because they may regenerate all the code tomorrow completely, you want to declare your dependencies through tests.
[33:20]
Kevin Ball
Oh, that's fascinating. So you essentially are writing like, here are the guarantees that I'm depending on from your system so that if you regenerate it, it makes sure those continue to be valid, correct?
[33:31]
Ilya Polosikin
Yes. Then each system can be literally owned by one person. And if that person moves on to another, whatever, and somebody needs to come in, they need to read documentation and they can even regenerate the whole thing if needed. And other subsystems will tell if something is off.
[33:46]
Kevin Ball
Another piece of this that I'm curious if you have thoughts on is how do you indicate to the LLM what sets of context to pull in for any particular subsystem that it might be editing? Is it just that one document or there are links in different ways? Like, how do you think about that?
[34:01]
Ilya Polosikin
I mean, ideally that document has like as much context about that subsystem as possible, but you may need broader context somewhere. I think the courser has its rules, which are kind of a useful concept. I think, you know, some links and some kind of maybe again, hierarchy of dependencies is useful as well. But yeah, I haven't seen that like fully worked out yet. But this is definitely an interesting as well. Like. Yeah, what is the knowledge graph of the. The systems as well? Especially when we're talking about really big code bases, like hundreds of thousands of lines of code that becomes the mapping out the concepts. LLM needs to do that somehow. And so you kind of need to feed it enough of information to do that without also overwhelming its context. Even a million tokens is cool, but if we're talking about 100,000 lines of code, that's way more than million tokens.
[34:50]
Kevin Ball
Usually coming back a little bit to this privacy first AI that you're talking about, a thing I'd love to get your sense on is kind of around how to bootstrap this, right? Because looking at the industry right now, one we have models themselves are extremely expensive to train. And two, we're in what feels like a worldwide GPU shortage where there's literally not like I was talking to a couple different folks at AI companies and they're like, yeah, we just get throttled by the providers because they are out of GPUs. There is not enough GPU for all the inferences that are happening. So in the big corp world, they are all putting massive amounts of capital down to try to build out new data centers and all these things. If we're looking at a privacy distributed type of system, how do you actually get that built?
[35:44]
Ilya Polosikin
I'll start with the second part because it actually, it's a solution to this problem. So right now you say, hey, I'm going to, let's say Anthropic courser and it starts to throw on me. And the reason why this is happening is not because in the world there's no GPUs available right now. It's because Antropic doesn't have access to GPUs available and they don't want to get their model to be run on some GPUs. They don't know who runs. Right. They trust Azure, they trust Amazon, maybe they trust some other provider. They don't trust me having a box of eight GPUs to upload, you know, their whatever, 4.0 model. And it is a real challenge, like the model providers, because that is main ip, like it's a very valuable IP if they give it to somebody else to even, you know, there is actual providers like Fireworks and you know, together and others they're serving open source models. They could serve other models as well, but the model providers don't trust them. And so what we actually were solving that problem because we actually say, hey, we have this secure enclave where if you upload the model, neither the hardware provider nor the user can have access to it. It's effectively in sealed container, but now you can deploy it anywhere. There's a data center in Philippines that's underutilized. Cool, let's ship a model there and serve it from there. There is somebody has hardware in Tokyo and there's a bunch of requests coming from there. Cool, let's make it there. So it's actually solving this exact problem of right now. You kind of need to like everybody's building big data centers for themselves, but then there's also a lot of smaller like 10,000 H1 hundreds and H2 hundreds data centers built everywhere right now, which are actually underutilized. Like if you Go to there's GPU list and there's like SF Compute and a few others. They actually have a lot of inventory which is not underutilized because nobody wants to go and buy 4000h, 2000s or whatever for a year. Unless you're a big company, you don't need that much. I just need to run whatever that model that just published yesterday on 10 GPUs. And so that right now is a highly inefficient market. And so you remember we talked about blockchain being really good for markets. Well, this is where the solution comes in. And privacy is a very important component because of this like kind of IP needing to be moved around in an encrypted way. So this is part of our decentralized confidential machine learning cloud where you can actually encrypt your model. So it's kind of addressed in encrypted format. And then when somebody needs it, it gets decrypted inside secure enclave and gets used there. And you can run it across any place in this decentralized compute network and you get automatic rebalancing and validity from that. Now how do you bootstrap? This is an interesting question. Now this is also where blockchain has an approach. And the approach is effectively subsidizing initially compute while you're growing the network. So this is how Bitcoin grew. It was effectively subsidizing compute before it had any value. People were willing to bet that it will be valuable and started mining it. And then as value grew, it caught up. And so there is an opportunity here to have a very similar model where we effectively subsidizing people coming with compute while we growing the demand. And then again open it up for more model providers to actually serve their model. And imagine now Entropic is like hey, rate limiting. Or you can use this decentralized compute, which is verifiable. We verify that it's all the path is correct. Cool. We're going to upload our model. And now everybody can use including actually if you have your own GPUs, you can turn them on into this mode, join the network, or you can just run it on your own workloads. So you have them sitting under your desk or in your data center. Now you can use it for your own workloads, but you're still paying Entropic for using it. So that's important part. It's not like open source, free as a beer, but it's actually you're paying back the developer for using it, but you cannot get actual physical access to the model weights.
[39:58]
Kevin Ball
Yeah, that's fascinating. So in some ways, if I were to sort of replay your argument here. Each hosting provider is building for peak usage and it's inefficient assigning. Essentially people are saying, who do I trust? Well, if I'm anthropic, maybe I only trust the big three and that's the only people I'm going to use to host my model. And you're saying, okay, well there's all of this spare capacity out in the world where the gap is trust and coordination, human coordination. Right. Building those contracts or what have you. So if you can automate that layer, suddenly you have a much larger pool that can scale up and down.
[40:34]
Ilya Polosikin
Yeah. And it solves, I mean, latency and even electricity problem. Right. Because you're kind of distributing the workload right now. It's effectively like Amazon needs to build a big cluster with a gigawatt electricity station on it. Or you say, hey, we actually have a lot of smaller data centers with smaller power consumption around the world and so we can just distribute across them.
[40:53]
Kevin Ball
That's fascinating.
[40:55]
Ilya Polosikin
For context, Nvidia has had run this program where they effectively gave allocation of GPUs to the smaller data centers around the world. I mean, their strategy has been trying to counterweight some of the hyperscalers who have a lot of the GPUs to have a big small distribute like smaller 10, 20k clusters around the world. But those are underutilized because if you're sitting in Silicon Valley, you would go to Amazon, you wouldn't go and hunt for a data center in Japan or in somewhere in Norway.
[41:26]
Kevin Ball
So that in some ways solves the GPU coordination issue. But it doesn't necessarily solve some of the things you brought up before around like sleeper agents and unknown biases. If we're distributing anthropic models and OpenAI models and things like that. So what about the model bootstrapping process?
[41:45]
Ilya Polosikin
Yes, yes. So that is harder. It's step by step. First we need infrastructure where we distribute some models, including potentially obviously the easiest ones are open source that already exist, like deep seqs, quens, llamas, et cetera. But indeed, even though we call them open source, they're actually not open source. They open parameter models, we have no idea what went into them. And so how do we actually do a truly open source model? Well, we need to train it in this way where we know what inputs went in, but if you also release the weights, then you're not going to make Any money. So you actually can train the model inside the secure enclave where the outcome is not known to anyone. The outcome is always encrypted and only usable inside the secure enclave. So now you have a model that's not owned by anyone, it's not owned by any single company. You can have effectively token holders, community to come together, say hey, we're going to train this model. Here's the data set, here's the model training process, let's collect, let's say amount of dollars required to do this, we're going to launch it, it's going to train and now this model going to be used inside this network as well. And the revenue going to be coming back to people who put in the work and the money to train it. And the token is effectively now a method to distribute this value back and forth. And so with token you can now fundraise, you can go and say, hey, we're going to be training an open source model that is going to be encrypted weights, not open weights, encrypted weights that is actually going to generate revenue. And now you can invest and get return and maybe reinvest in the next model or cash out. And so now that's still like I skipped some hard parts which is getting the right training data, getting the training process. But this is kind of the scaffolding of how to do this is actually create a community on models where the community decides what goes in training data, et cetera, they can inspect, they can decide and then the training process happens inside this. Now I will caveat it that like the reason why this haven't happened yet is because the confidential computing tech is only catching up. So this whole thing has only been really possible for about a year. And so like the broader community we have this really great partner fala, who's been building a lot of this infrastructure for confidential computer compute. And only the black holes actually support a cluster level confidentiality. And so that's not yet available. So this is kind of like we're growing with the compute and hardware actually availability of that. But the idea is to have this kind of system ready as soon as hardware is available. Right now you can only like right now you do inference and fine tuning on the H1 hundreds, H2 hundreds in this way because you don't get the cluster level, you only get the machine level confidentiality.
[44:34]
Kevin Ball
It's a really interesting model because it essentially inverts what open weight models do today, right? Instead of saying hey, we have a set of training data which we might tell you about, but you don't know Exactly. And we have a training process as well. Like here's the software that's going to run, here's how we're doing reinforcement learning at the end or tuning or all these different things and then we publish the weights. You're saying, okay, let's take all that initial stuff, make that open, make that visible, make that public. But the outcome, we're going to hold onto that in an encrypted way so that we can actually recoup some of the investment.
[45:06]
Ilya Polosikin
Correct.
[45:07]
Kevin Ball
Interesting. So you mentioned the hardware is just getting there, all of these different pieces. Can you project out like what does the timeline in your head look like for how this is going to play out?
[45:21]
Ilya Polosikin
I mean we started talking about this about I would say eight months ago. Right. So in past eight months the hardware started to catch up. We have built out the initial things where you can run this inference. Now there's some first versions of fine tuning of the inconsidential way as well. So fine tuning is the first version where you can take like you can take an open source model like Deep Seq and then fine tune it on private data or in public data and then the weights are encrypted now but still monetizable. So that's kind of the first version of in the step by step process. I think the proliferation of Blackwells will be required for this really to turn the next step, which given Jensen's projections should be happening anytime now. So I think within next year we're actually going to start seeing this really working. And then year and a half, two years is when I think my hypothesis is the open source and kind of this ability to coordinate a lot more people contributing data, contributing research expertise is able to actually outrun the centralized labs if done well, if done in the right way. For example, we may need AI researcher agent that sits, that is able to get everybody's ideas, score them, maybe run some evaluations, et cetera, because you need to allocate compute on some of those things. So I think the goal is probably within two years to get to the speed of innovation that happens in this user owned way is faster than what's happening in closed source labs. But again confidentiality you can use now. So there's benefits from this now that people can already benefit. Again, I think any use case that touches medical, financial and those highly sensitive government areas is definitely can leverage this now and then there's also just a lot of enterprises who are uncomfortable with giving all of their data To a company. To another company. Right. So this is useful for them pretty much immediately.
[47:21]
Kevin Ball
Yeah. Well and as you highlight if for example Anthropic or OpenAI or someone like that wanted to be able to give guarantees and say, hey, we can't see your data, we literally cannot see it. They could also start running.
[47:34]
Ilya Polosikin
True. Exactly.
[47:35]
Kevin Ball
In this way.
[47:36]
Ilya Polosikin
Yeah. And like maybe for them it doesn't make sense immediately, but for next level of companies, Right. That don't have as big reputation, this actually really makes sense to do right now. Yeah.
[47:47]
Kevin Ball
Awesome. Well, we're getting close to the end of our time. Is there anything we haven't talked about here that you think would be important to discuss before we wrap up?
[47:55]
Ilya Polosikin
I mean I think given this principle, I encourage people to really think through like how people can contribute. Right. Because at the end it's going to be an open source like community initiative. Right. There's a financial incentives and model to reward people. Because I think one of the challenges is open source historically been how do you support it. Yeah. Unless you work for Google or Microsoft which kind of pays your salary. Right. It's a very thankless job. But I think the opportunity here is actually kind of create something that indeed can move quicker and has the wisdom of the crowd coming together as well as can use some of the private data that maybe you don't want to actually touch. But one of the ideas was again because you have a verifiable compute, you can have a pipeline where let's say you take people's private data but then you have a very specific cleaning process that everybody looked at, audited and agreed like remove Social Security numbers, phone numbers, addresses, names, et cetera, which not useful for training these models anyway. And everybody knows that they can contribute data and receive some reward and it will be cleaned in the right and expected way and then that data is never seen by anyone anyway and fed into model at this pre training steps so you can have this new ways of actually even gathering more training data or for research where let's say right now medical information is not able to be used but you can run inference on it in a private way. So there's just like so many new use cases and opportunities. So I just encourage everyone to kind of think through where they can really leverage that and reach out and connect with us and the team to leverage what already is available now and then contribute to building this forward.
[49:39]
Kevin Ball
Great, that seems like a good wrap.
[49:48]
Host
It.