
A
Welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and I'm joined by my co-host swyx, founder of Smol AI.
B
Hey. And this is a little bit of a Latent Space Discord reunion because we have Vasek in the house from E2B. Welcome.
C
Hey, good to see you guys. Thank you for having me.
B
Help me with your last name, because I realized I've never had to pronounce it. Mlejnsky?
C
No, no. Well, I guess close enough, but: Mlejnsky.
B
Mlejnsky.
C
Yeah. And my official legal first name is Václav, but everyone pronounces it as "Vaklav", which I hate. So it's just Vasek.
B
Okay, awesome. We're both invested in E2B in different ways, but you and I go back the furthest. I just realized it was three years ago, when you were working on DevBook and you were interested in that developer experience angle, and somehow you pivoted to E2B. Maybe you want to tell that story.
C
Yeah. So Thomas, my co-founder, who's our CTO, and I have been interested in dev tools for quite a long time, like six, eight years. Before DevBook there were a bunch of iterations, and there were different iterations and pivots of DevBook itself. We just stopped renaming things at some point and went with DevBook. The one you are talking about was interactive documentation for developers. The idea was that when you as a developer come to a tool's docs website, instead of reading about everything, then googling it and trying it in your code editor, you get an interactive experience in the browser, with pre-made interactive guides and playgrounds. You could try things right away, and the company, the owner of the docs, would prepare the experience for you. Because you would be trying everything in the browser, they could also see what you are doing and whether you get stuck anywhere, so they get very valuable onboarding analytics. We actually built an interactive playground for Prisma. I think it's still up; they were still using it the last time I checked a couple of months ago. It was interactive guides and a playground where you could try out Prisma right away, without having to manually set up all the databases and everything. And that was the very, very first version of the infrastructure that we are offering now.
B
Basically sandboxes.
C
It was sandboxes. It literally was sandboxes. The same technology but just completely unscalable.
B
So then did that somehow, in 2024-ish, turn into E2B?
C
2023, in March '23. We were pretty burned out, Thomas and I. We were working from Prague, in the Czech Republic, from my apartment. Nothing was really moving, no growth. And GPT-3.5 came out, really the first model that was kind of good-ish at codegen. So we said, let's take a 10-day break, a two-week break, from DevBook, because everyone was trying things with AI and it was very clear that this is where the future might go. We wanted to try things out, just out of curiosity. We wanted to build a Devin kind of thing. The first idea we had was, let's automate our own work, because with every project we were starting there was a set of tools you always want to integrate in your backend for a SaaS business: Stripe, analytics, Slack notifications, sending out emails. So we gave the agent tools to run code, and we needed some kind of sandbox. We were like, yeah, that's a good coincidence, we have a sandbox from DevBook. And we posted about it on Twitter. Basically the agent pulled a GitHub repository, wrote code, started the server, tested everything. And at that point I think we deployed it to Railway. Railway had the best DX, and it was the easiest to just plug into the agent. I tweeted about it and I think Greg Brockman retweeted it. I don't remember exactly what he said, I need to find the tweet. But it was around the time when the OpenAI co-founders were retweeting all the things people were doing, to show what you can do with GPT-3.5. It had like half a million views after a few days. So Thomas and I were like, we've got to do something, people are interested in this. So we just open-sourced the repository. The organization was named something like "the AI company". We had no name. I kind of wish we could have legally stuck with that. We just came up with a name.
It was E2B, because you take English and you convert it to bits. That's how it started. We started building a community around it, and like two days later we were like, let's focus on the sandbox part, not on the agent part. We actually had a bunch of hypotheses behind it, about why it might be more interesting than building the agent.
A
And then you had smol developer running on E2B, which...
C
That was a little bit later.
A
Yeah. What was the...
B
This is in 2023, when you first launched.
A
E2B was kind of like an AI agents cloud. What did you call it in '23?
C
The hypothesis was that agents, codegen agents especially, will need some kind of environment to run the code, the same way a developer needs a laptop or something. But we struggled a lot with what the product should actually look like and what the go-to-market is. For the first version, the high-level idea was that we will host your agent, everything from deploying it to then monitoring it, and it will also have this environment for running the code. One of the first test projects was taking Shawn's smol agent project, deploying it inside our sandbox, and giving the agent tools to pull GitHub repositories, work on them, make a PR, and then post the PR on GitHub. That was very, very popular. It was clear that there was something very interesting.
B
It's amazing how much people went with that. It was literally not meant to do that. It was meant to do Chrome extensions. Yeah.
C
And I think you had a good insight, that you let the agent plan the work in a Markdown file, basically, and write down the spec. Even now, when you look at deep research agents, that's oftentimes what they are doing. They plan everything.
B
Yeah. These days I would say they should do more structured output than Markdown, but that's an implementation detail.
C
So that was taking off. But it attracted a slightly different audience than we wanted, because it attracted a lot of people who nowadays would be using tools like Lovable, for example. They just wanted it to build projects for them. And of course it wasn't really working at the time. It worked for one simple website, but the moment you wanted something more complex, it was really hard.
B
I've also been reflecting on why I stopped working on it. When I built it, it was with Claude 3, the new Claude 3 launch. It was trying to utilize the 100k context of Claude 3. A month afterwards, I used the same project to try to repeat the demo that I had made for myself, and it wasn't anywhere near as smart. So Claude 3 got dumber, and it looked like I had made up the demo or something. But no, literally I just reran the same code and it just was not as capable. I think to some extent this is RNG hurting me, or it's a different month so the model is different or something. But I think also, basically, people had this vision of what they wanted, and then they tried to do it in reality and it couldn't do it, because the models were not ready.
C
So in a lot of feedback we got at some point, people thought smol agent was from us, because we had a website where you could deploy smol agent. But we were like, no, that's Shawn's work. You should go to his repository and submit an issue.
B
No, they could clone it. It was like.
C
Edit it however you want. For us it was more like a test of whether the environment, the sandbox, is useful for the agent, and whether it can scale and work, which worked well. But then it took us another, I would say, six months to actually find the right go-to-market strategy, which for us was code interpreting. So typically AI data analysis and data visualization inside a headless Jupyter-type notebook environment.
B
Specifically Jupyter?
C
It doesn't need to be Jupyter. The important part is that you don't need to explain to the model, and the model doesn't need to care about, how to keep the state of the program running. This mattered especially with earlier models. I think the models are smarter now, but they kept producing code snippets and assumed they could reference variables and function definitions from previous snippets. With plain code execution, where you run the code snippet and then the process finishes, that wouldn't work. So you need some kind of REPL environment. And especially early on, Python was probably the language the models were best at, or one of the first languages where the models were working very well. Looking back at it, there was a really strong pull from people just wanting to visualize data and talk with their data. Python is really good at that, and it mixes well with a Jupyter type of environment because you get charts out of the box. You can also support interactive charts. But Jupyter itself isn't the right environment to actually do it, because of all the technical problems you run into once you start doing it at scale. It gets slower and slower. So what we are coming to is building our own thing internally, our own runtime, just to support LLMs specifically.
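To make the point about state concrete, here is a minimal sketch, not E2B's actual runtime, of the difference between one-shot execution and a REPL-style session. The names `run_stateless` and `StatefulSession` are hypothetical and exist only for this illustration; the session simply keeps one Python namespace alive between snippets, which is roughly what a notebook kernel does.

```python
import contextlib
import io


def run_stateless(snippet: str) -> str:
    """Run a snippet in a fresh namespace, like one-shot code execution."""
    return _run(snippet, {})


class StatefulSession:
    """Hypothetical session that keeps one namespace alive across snippets."""

    def __init__(self) -> None:
        self.namespace: dict = {}

    def run(self, snippet: str) -> str:
        return _run(snippet, self.namespace)


def _run(snippet: str, namespace: dict) -> str:
    """Execute the snippet; return captured stdout, or the error on failure."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(snippet, namespace)
    except Exception as error:
        return f"{type(error).__name__}: {error}"
    return buffer.getvalue()


# Two snippets, as a model might emit them in consecutive turns.
first = "rows = [3, 5, 8]"
second = "print(sum(rows))"

# Stateless: the second snippet cannot see `rows` from the first.
run_stateless(first)
stateless_result = run_stateless(second)

# Stateful: the second snippet reuses the variable defined by the first.
session = StatefulSession()
session.run(first)
stateful_result = session.run(second)

print(stateless_result)
print(stateful_result)
```

In the stateless case the second snippet fails with a `NameError`, while the session prints the sum. `exec` here provides no isolation at all, so this only illustrates the state-keeping behavior, not the sandboxing.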
A
At what point did you start going from just code interpreter, run code, to expanding? Because now you have people doing RFT, you have computer use and all of those things. When were the models ready for people to start using it? You demoed at one of our first events as well, and I think there's always this lag between the infrastructure that you build and the capabilities of the model. When did you go from just code interpreter to saying, okay, now it's time to do computer use, now it's time to do RFT?
C
That was probably end of '24, start of '25. When you look at our data, in '24 we were growing, and growing well, but '25 is up and to the right. It feels like in '24 people were figuring out these agents, building them and trying them, and '25 is everyone moving them into production and finding more and more use cases. Around the end of '24, start of '25, we started seeing things like using the sandbox for reinforcement learning use cases, or using the sandbox for computer use, which was very interesting. When Anthropic launched their computer use, we had a desktop version of the sandbox that had been sitting in our GitHub repository for six months. We had been like, okay, this is probably interesting, but no model can actually use it. So when Anthropic announced it, it was like, we have something here we can show you. Lovable- and Bolt-type products also started using the sandbox for more than just running a code snippet for data analysis. And then deep research agents, that has been something really big in the last few months.
B
How do deep research agents use it? Are you referring to Manus or...
C
Yeah, Manus, for example, is one of the companies.
B
Deep research I typically think of as a web-search-heavy task that doesn't really use a code interpreter in any way. I have some idea that Manus uses E2B in an interesting code-interpreter-y way to do deep research. What's the difference?
C
Yeah, I think it's a good idea to stop thinking about the sandbox as just for code interpreting and more as a code runtime for the LLM or the agent. The use case for the sandbox is very horizontal, in the sense that it can cover everything the agent needs: create a file, make a to-do list. The browser is not inside the sandbox; that's separate. You have browser use being used for research on the web, but then you download the data somewhere, you need to transform the data, you want to do data analysis, you want to actually write a small app, you want to create an Excel sheet. So the same way a laptop is useful to you as a human, it is useful for the agent. You can think about it as more like a dev box. And at the same time, the agent that's using it is also a very, very good developer, good accountant, good slides creator, researcher. So you are basically giving it tools to let it do the job even better and faster.
A
Yeah, and when you say up and to the right, I just want to share some numbers from the investor updates. So March 2024, which you were talking about...
B
I don't know.
A
Yeah, you shared these publicly. In March '24 you were doing 40,000 sandboxes. You want to say how many you've done last month, March 2025?
C
15 million. Yeah, around that.
A
So in one year you've gone from 40,000 to 15 million. And yeah, I think you can kind of see the slope, especially from the Sonnet 3.7 release. I think this is an interesting model-versus-infrastructure question. There's this usual VC line of, oh, we should invest in the tools, the picks and shovels, instead of the application layer. But I think this is the first time where the infrastructure is lagging the applications.
C
That's a good point. I think '24 was all about the agent not being able to use the whole sandbox, and now we are sometimes actually catching up on features, because the LLMs need more than what we have at the moment.
B
Yeah. I guess what we're doing here is, you are another one of the LLM OS companies that we are talking to. We also did one with Browserbase, and I'm not sure who else will qualify under that term. Basically, you don't specifically use LLMs internally in your product, but you enable others to augment their LLMs with your infrastructure. Any reflections on the general LLM OS landscape? You have other competitors. How is this evolving? How do you position yourself in it?
C
A lot of people were saying in '23 that if you are a GPT wrapper, you are really badly positioned, because all the value will be captured by the AI labs. It turns out it's good to be a GPT wrapper, because you get all the advantages of a new model. You just switch it. I mean, the "just" is not so simple. You probably have evals, you need to change the prompt a little bit. But I would say it's increasingly easy to switch models. So we need to think about it the same way. Our users are switching models a lot, so we need to be agnostic to the LLMs. Oftentimes people want to deploy us in their cloud or on premise, so that's also something very important. I think a good analogy here, technologically, is that you want to be the Kubernetes for agents, but with much better DX and easier to use.
B
One thing I'm thinking about also is what is valuable real estate to occupy in the LLM OS. I have this spectrum, which I think I talked about on the Browserbase episode: either you can focus on browser emulation, or you can focus on the VM, or you can do a custom Python sandbox like Modal does. Would you say that you are the most general of all of them?
C
Yes, it's very general, but historically that's not really how you want to market it, because people don't know what to do with it. When we had our website in '24, it said something like "cloud computer for AI". People just didn't understand what to do with it, so we had to literally show them. First you change it to code interpreting, because that's something people knew from OpenAI. Then you show them a very, very specific use case, and you use that to get early traction and an early set of users. We spent a lot of time, and Teresa from our team especially spent a lot of time, on educating the market and developers. I think in the AI space you sometimes have to show developers what they might need. You kind of have to trust your gut, that this might go in this direction and this could make sense, and show them what they can build with it, because it's very hard to imagine what you can build with things that you don't even have. Why would you need forking sandboxes or checkpointing sandboxes? How is that useful? Well, it turns out that if you are building some Monte Carlo type of search for an agent, it's very useful. But to a developer who might not be working on that deep AI research type of thing, it's not obvious. So you want to be agnostic, you want to be general, but you also want to show people very clear use cases for how they can use you. Over time they get educated and realize, okay, there are more use cases. They understand the platform and start coming up with their own ideas. But onboarding people with general use cases is very hard. At least that's what we learned the hard way.
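As a rough illustration of why forking helps with search, here is a toy sketch under loud assumptions: `ToySandbox` is a hypothetical in-memory stand-in (just a dictionary of files), and the scoring rule is made up. It is not the E2B API. The point is only that each branch starts from the same checkpointed state and can be explored without redoing the shared setup or disturbing the original.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToySandbox:
    """Hypothetical in-memory stand-in for a sandbox: just a file map."""

    files: dict = field(default_factory=dict)

    def fork(self) -> "ToySandbox":
        # A fork is an independent copy of the current state.
        return ToySandbox(files=copy.deepcopy(self.files))

    def write(self, path: str, content: str) -> None:
        self.files[path] = content


def score(sandbox: ToySandbox) -> int:
    """Made-up scoring rule for the sketch: a longer answer scores higher."""
    return len(sandbox.files.get("answer.txt", ""))


# Shared setup happens once, before branching.
base = ToySandbox()
base.write("notes.txt", "shared setup work")

# Each candidate action is tried in its own fork of the base state.
candidates = ["short", "a longer attempt", "mid one"]
branches = []
for text in candidates:
    branch = base.fork()
    branch.write("answer.txt", text)
    branches.append(branch)

# Keep the best branch; the base sandbox is left untouched.
best = max(branches, key=score)
print(best.files["answer.txt"])
print("answer.txt" in base.files)
```

A real search would score branches by running tests or a reward model, and a real fork would snapshot a whole machine rather than a dictionary, but the branching structure is the same.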
A
Yeah, and you also don't really tailor to the more DevOps, infrastructure type of person. I think a lot of the other sandboxes are like, oh, we have a gVisor runtime, and they have all these different terms that you don't really know if you're the AI engineer type.
C
Yeah, that's exactly true. Our user isn't an infra engineer. Even an ML engineer usually isn't our type of user. It's the AI engineer. By the definition you have, which I think still remains valid-ish, these are web developers from the JavaScript world, the TypeScript world. We have a ton of usage from Python, but there are so many web developers, and it's easier and easier to use LLMs. I really think that you need to cater to this type of developer and make things simple for them. If they want to dive in, they can. Chances are they don't even want to. They want to focus on building the product. They are product developers, not infrastructure developers.
B
It's very interesting. Yeah, I mean, this whole GPT wrapper versus model lab thing: it's not that it's better to work on wrappers than at model labs, because the model lab people are making a lot of money. It's just that there is room for wrappers, and the wrappers also make money. I think people were not seeing that in 2023, and this is what we saw.
C
It's not binary, it's not either or.
B
Yeah, I think. And then just a quick check. Do you know the rough percentage between JavaScript and Python?
C
Slightly less JavaScript, more Python still. By number of downloads of our SDK per month, it's around 250,000 for JavaScript and close to half a million for Python. Yeah, something around that.
B
Two to one.
A
Yeah.
B
Yeah, interesting. I mean if the use case is really for code interpreting, generating charts and all these, then Python wins.
C
Yeah, exactly. Python wins for that. But once you go into, for example, building apps, generated apps, then it's JavaScript winning, right? Because you have all the frameworks, Svelte, Next.js, Vue.js, which is a JavaScript thing. It's interesting, because I would think that it doesn't really matter what code you want the LLM to produce, that it wouldn't dictate what kind of user is using us. But if you think about it, when a developer is building AI data analysis, it's typically a Python developer. When a developer is building a v0 type of use case, it's a web developer, a product developer. I don't think this is super obvious. I wouldn't have thought that would be the case.
B
There's all sorts. I mean, Bolt's argument is that you should want to use something like a WebContainer, where it runs in your own browser and it's all free and very fast. I guess the one more critical question: let's say I do know my infra. What is the point of a cloud for AI, a computer for AI? Basically, why can't I use existing infrastructure tools like Railway? Railway wants to go after your customers, right? So why does there need to be an AI-focused virtual machine or sandbox or execution environment, whatever you call it?
C
What we offer is sort of orthogonal to cloud. You don't know beforehand what kind of code you will be running in our cloud, so you can't really optimize for it.
A
Right.
C
So there's no build and deploy step per se. Everything happens ad hoc during runtime. You need to solve problems like installing dependencies very fast and pulling GitHub repositories very fast. Then the workloads we are running can go from five seconds to five hours, which changes even the pricing model a lot. How do you make sure that everything makes sense for you and for the user from the pricing point of view, from the unit economics, and also from the infrastructure point of view, in terms of where the sandboxes are placed inside your cluster? The security model is also different, which again comes down to not knowing beforehand what code you will run. By default it's untrusted code, and you need complete isolation between the sandboxes to make sure, first, that if something happens in one sandbox it doesn't affect other sandboxes. But you also want to know about security inside the sandbox. As the LLMs are getting better, you want to know what's happening inside the sandbox. There's a fun story from Hugging Face, from when they were using us, and they still are, for their Open R1 project. One of the developers shared it online, so I think it's okay to say. He lost access to their cluster because the LLM decided to change permissions. If that happens with us, we just kill the sandbox and get a new one. It takes, I don't know, 150 milliseconds. For them, they had to take down the whole cluster and set everything up again, and they didn't even know at first that this had happened. So the TL;DR is that the compute model and the security model are different from current cloud providers, and you need to think about it differently from the first days.
A
And then the difference that maybe people don't think about is that you can generate the code with the Python SDK, but you can still run Lua code or R code. The code doesn't always have to match the language of the SDK.
C
Yeah, it's a general machine, so whatever you can run on Linux, you can run inside a sandbox. We've had users running Python code, of course, but also C. We had someone running Fortran, which was like, why? They were working with a very old API for banking or something like that. You can also start a server inside a sandbox, and then you want it to be accessible from the Internet. It's a virtual machine, and the challenge is, how do you make everything fast, for example? How do you make everything secure? At the same time you need to make it accessible enough and controllable by the LLM, and observable for a human. So you're kind of building for two personas: the human developer, the AI engineer, and the LLM that is using the sandbox.
A
Yeah, the composability thing is something I had not thought about before. Like you mentioned, you might need to run Fortran to access one API, then in the next step you'll need to take that data and run it in a Python script, and then you're going to expose that through JavaScript to something else. And you guys can switch the runtime halfway.
C
Yeah, and in an ideal world you don't want to kill the sandbox and start a new one, or create a new sandbox template. You just want to keep using the computer. And because we are in the cloud, we can do it in a way where you can get more RAM, more CPU, or less CPU, so you are really paying only for what you need. As the LLM is doing more and more, you can have a very elastic sandbox and keep adding features. Where we think this is going is that the LLM decides what it wants to do and how it wants the sandbox configured. So it basically starts controlling the infrastructure itself and creating sandboxes itself.
B
While we're talking about the technical details, I just wanted to let you tell people about any other technical details. What is the box that we get, what kind of Linux, what size, what details matter here?
C
So you get an Ubuntu box. You can customize it, and anything Debian-based is going to work. You can even add a graphical interface. So if you want to, you can run a legit, not headless, Ubuntu computer on it.
B
So you can do this sort of operator-type experience, like...
C
You take over control. We have an SDK called the Desktop SDK that does this for you out of the box.
B
Yeah.
C
And it supports VNC, so you can also have a human-in-the-loop type of thing, control everything, stream it, and see what's happening. By default you get two CPUs and half a gig of RAM on the free tier, and you can customize it. On our landing page we say up to eight gigs, but if you tell us what your use case is, you can go to 64 gigs of RAM. We have users using such beefed-up sandboxes. You can go to 16 CPUs, I think, if I'm not mistaken. And storage is free. I think there are a lot of things that are free that we probably need to think about a little bit more. With dev tools, especially infra, I always see this pattern, and I had this naive idea as well, where the founder says, oh, this AWS pricing is terrible, or with GCP you have no idea what you are paying for, there are so many small add-ons you need to pay for. We will start a new infrastructure company. You're just going to pay $200 per month, and then you just pay for pure compute. That works until you start scaling and you figure out, okay, people are actually doing weird stuff that I didn't expect, like producing lots of traffic. We have a customer that produced a petabyte of data. It's not free to host a petabyte of data, and that's growing. So then you start introducing things: okay, they should probably pay some amount for ingress, for egress, for storage. And the pricing gets increasingly complex. I just think it's a very interesting phenomenon that you start with this brave idea that everything will be super simple.
B
And we only do code interpreting, so we only need to price for compute, right?
C
Yeah, exactly. And then you wake up in the real world. It's messy.
B
Yeah. So I know about this from my career in cloud. I call this the first principle of technology: everything can be broken down into some combo of compute, storage, and networking. If you fail to price one of them, you will get abused.
C
Because you're essentially offering free storage or compute or something, right?
B
Yeah. For those interested in this idea, there's a fourth one, which is basically the control plane, or the auth layer, the IAM policies and all that. Maybe you can call that security as well. It's the fourth layer that people pay for as its own independent thing. For those also interested, HashiCorp has more breakdowns of this, from people like David McJannet, which I think are very interesting. If you're in the business of running a cloud infrastructure company, you should know these things. Many hundreds of businesses have run into the exact same problems. You should not repeat them, and just learn whatever the best practice is.
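The compute, storage, and networking breakdown can be sketched as a small metering calculation. Everything here is an assumption made for illustration: the `Usage` fields, the rates, and the usage numbers are invented and are not E2B's or anyone else's pricing. The sketch only shows how a dimension with no rate silently becomes free, no matter how much of it a customer consumes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical monthly usage for one customer."""

    cpu_seconds: float
    storage_gb_months: float
    egress_gb: float


# Invented rates in dollars, chosen only to make the arithmetic easy to follow.
FULL_RATES = {"cpu_seconds": 0.00002, "storage_gb_months": 0.10, "egress_gb": 0.05}
COMPUTE_ONLY_RATES = {"cpu_seconds": 0.00002}


def bill(usage: Usage, rates: dict) -> float:
    """Sum each metered dimension times its rate; a missing rate means free."""
    total = 0.0
    for dimension, amount in vars(usage).items():
        total += amount * rates.get(dimension, 0.0)
    return round(total, 2)


# A light compute user who stores a petabyte (1,000,000 GB, decimal units).
heavy_storage_user = Usage(
    cpu_seconds=100_000,
    storage_gb_months=1_000_000,
    egress_gb=500,
)

compute_only_bill = bill(heavy_storage_user, COMPUTE_ONLY_RATES)
full_bill = bill(heavy_storage_user, FULL_RATES)

print(compute_only_bill)
print(full_bill)
```

With only compute priced, this customer pays about two dollars while the provider carries the storage cost. Note that `bill` can only charge for dimensions that were recorded in `Usage` in the first place, which is the instrumentation work the conversation keeps coming back to.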
C
Yeah, I think the billing model has been figured out many times, so the challenge is not figuring it out. The challenge, I think, is sometimes introducing it quickly enough. You actually need to make changes to your infrastructure to make sure you know about all the data that's moving one way or another. But yeah, I completely agree with you. We are not the first ones to have this problem.
B
Yeah.
A
Do you use one of the usage-based billing providers, like Orb?
C
We are talking with Orb right now. OpenMeter is another one, I think.
A
Yeah, Metronome is the other one.
C
We have been using Stripe's usage-based billing, which has sometimes been a little bit rough around the edges. This is the thing: we shouldn't spend so much engineering on it. I want to outsource it, because that's not our product. Someone else should be focusing on this full time. It's actually pretty non-trivial to make sure you have everything right, and you really don't want to make mistakes here.
B
Is there anything that you're really looking for, where you would say, okay, that's really what we want, that maybe Orb or Metronome haven't really adjusted to for AI yet?
C
I don't think this is an AI-specific problem. This is infrastructure as you know it. For us, one thing that didn't work when we looked at some of the providers was the cut they took from your revenue. I don't know what the latest pricing is, and honestly I don't remember which provider it was, but the pricing was basically that they take a small cut of the revenue each month, which is like what Stripe does when you are processing payments. It's just that the amount was really high. And then a lot of it is about how hard it would be to integrate, how much time you would spend on it. Because we know we don't want to build this in-house. It's more like, is the switch worth it?
A
Basically, I'm curious about your updated takes. I know you had the "why isn't usage-based billing a bigger category" take. I'm curious whether now, with AI and token-based pricing, you have updated thoughts.
B
So for people who don't have context: at Netlify we went from relatively flat, tier-based pricing to usage-based billing, because that's basically where all infrastructure companies should eventually go. You have some whales who use a lot of infrastructure and some who don't use that much, and you shouldn't charge the same for both of them. The fun insight was this: you would think that at an IaaS or PaaS company, revenue is the most important problem to work on, since it directly impacts the company's revenue and valuation and all that. No engineers wanted to do it, and I was like, why? We looked around for a long time. We tried to hire and we couldn't. So we ended up putting one of our most senior engineers on it, and she took a year to ship the whole billing project. That was presumably a board-level objective, which was like, hey, let's change from this pricing plan to this pricing plan. How hard can that be? Turns out, very hard, because you have to instrument everything, even the things where you're like, I don't know if we'll ever use this.
C
Yeah, that's exactly what I meant.
A
That's why storage is free.
C
The first thing is you need to know about everything that's happening inside your whole cluster.
B
How big are your logs?
C
Yeah, it's crazy how with pricing, it's not figuring out the business model but the integration and implementation of it that's actually a lot of engineering.
B
Soft limits, hard limits. Do you cut people off once they bust the limits? Probably not, because they get pissed at you. And then they also get pissed at you if you don't cut them off, because then you send them a big bill. So there's no winning.
C
Yeah, exactly. An additional problem is that people are telling us, I want to run the agent for five hours. You can do that, and you might be a very early startup. Our goal isn't to cash out on you. Our goal is that you can use as much of E2B as you can and just grow. But the longer you run the sandboxes, the larger your bill is going to be. So the bill doesn't really need to correlate with how much product-market fit you have or, for example, how many users you have, because these agents can work for a long time even if you are a pretty early-stage company with few users.
B
Sure. I mean, yeah, but there's still a question of soft limit, hard limit, that kind of stuff. So just to answer your question on billing, for those who are interested: we actually had the CTO of Orb send in a talk for a remote track of the New York Summit on what he thinks pricing for agents looks like. I think basically you are reselling tokens, right? The base layer comes from either your open-model cloud provider or your closed-model lab API, and you resell it. Some people have a lot of markup on them, like certain unnamed AI builders, and some people have negative markup. They are basically selling to you at a discount. You should buy as much as you want, because they are using VC money to subsidize your usage, and I think that seems fine. Simon Willison often asks for a bring-your-own-key solution, where I keep control over my relationships, my pricing, and my credits with my LLM providers, but I want to use your app, so I give you my keys and you use them on my behalf. That doesn't seem to be as popular as I would think, as I understand it. I don't know if you've had that request.
C
It's sort of like in crypto, use your own hardware wallet. Always going to be less popular than just going with the thing that's.
B
Yeah. So literally Alex from OpenRouter is the only person who's implemented this. And it's fine for individual use cases because there's a lot of free tiers for individuals. But yeah, I think pricing wise people are trying to move that discussion on like are you positive margin on your tokens or are you negative margin on your tokens? And there's some economic reality there, but they're trying to move that to the agent work, which is what is the value of the human labor you're replacing, which is a whole different thing. Right. So instead of comparing on cost of goods sold, you're comparing on value delivered and that is much higher.
A
Yeah, I feel like if we could get to a good market on bid ask of work being done and then you can kind of arbitrage how many tokens you need. But I think today the agents are so unreliable and unpredictable that it's hard to price ahead of time because you could price any software engineering task per task. It's like do a new button, it's like $500 and then I can arbitrage that. But today there's no certainty of that. And I'm curious.
B
Yeah, you would think. Like also this is why I was very excited about Replit when they first launched their marketplace credit thing. If Replit can't do it, then I don't know if anyone can.
A
Right. The other technical thing we talked about before is forking. You also mentioned it before. And sandbox checkpoints and things like that. Is this something you have today? And then what are people using that for? Our example was the Claude Plays Pokémon hackathon. Are there more enterprise use cases where you see people request forking and checkpoints?
C
Yeah, we don't have this publicly yet, but that's something we are working on to release somewhat soon. We have persistence, which is like the prerequisite. The core.
B
The amount of volume.
C
Yeah, and, well, amount of volume, but also memory persistence, which is very interesting. So you can basically pause the whole sandbox, even while code is running, with all the context of the code execution, and resume it later. You can come back to it, I don't know, two weeks later, and it's still going to be there.
B
But the continuous session time is limited to 24 hours, right?
C
No, that's limited in the beta to 30 days.
B
Okay.
C
Do you mean. Are you asking?
A
I don't know.
B
I was looking at your pricing pages.
C
Are you asking about when the sandbox is running? The sandbox can run for up to 24 hours, but when it's paused, it can stay paused for about a month. I personally think this is one of the cases where you kind of need to show the developers why it's useful. And this is something I think is going to be very useful as the agents and the LLMs are getting better, because you will be able to parallelize problem solving, essentially. So instead of having a single agent doing one thing, you might have multiple agents trying different paths. If you imagine a tree or a graph, every node is a snapshot of the sandbox, a checkpoint. Then from that node we fork the sandbox and go to the next state. Eventually you find the right path, right? It's a tree search. So forking and checkpointing solve the local state problem. They don't solve the remote state, because that's something you can't really control, but they solve the local state for the agent. The agent can then come back to a state, and you don't need to replay the whole session or try to force the LLM to do the same thing again. You have the chat history and you even have all the steps inside the sandbox that led to it.
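To make the tree-search idea above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `ToySandbox`, `checkpoint`, `fork`, and `tree_search` are hypothetical names for an in-memory stand-in, not the E2B SDK, and the "state" is just a dictionary rather than a real filesystem or memory image.

```python
import copy
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToySandbox:
    """Hypothetical in-memory stand-in for a sandbox; not the E2B SDK."""

    state: dict = field(default_factory=dict)

    def run(self, action: str) -> None:
        # Pretend each action mutates the sandbox's local state.
        self.state.setdefault("steps", []).append(action)

    def checkpoint(self) -> dict:
        # A checkpoint is an independent copy of the local state.
        return copy.deepcopy(self.state)

    @classmethod
    def fork(cls, snapshot: dict) -> "ToySandbox":
        # A fork starts a new sandbox from a snapshot; the parent is untouched.
        return cls(state=copy.deepcopy(snapshot))


def tree_search(actions, is_goal, max_depth=3):
    """Breadth-first search in which every node is a checkpoint."""
    root = ToySandbox().checkpoint()
    queue = deque([(root, 0)])
    while queue:
        snapshot, depth = queue.popleft()
        if is_goal(snapshot):
            return snapshot
        if depth == max_depth:
            continue
        for action in actions:
            # Fork from the checkpoint instead of replaying the whole session.
            child = ToySandbox.fork(snapshot)
            child.run(action)
            queue.append((child.checkpoint(), depth + 1))
    return None


result = tree_search(
    actions=["install", "build", "test"],
    is_goal=lambda s: s.get("steps") == ["install", "build", "test"],
)
print(result)
```

In a real system each branch would presumably be driven by an LLM choosing the next action, and the goal check would be a test or an evaluator rather than a fixed list of steps.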
A
Do you feel like you want to help with the forking and then the re merging? Because I think people understand the forking but then it's like okay, how do I monitor which of the leaves is successful and then how do I merge that back in the thing? Do you think that's something that you want to help people do? Like kind of spread out, parallelize and then find the winner? Or is this something people should do on their own?
C
I think this is sort of a framework discussion on top of E2B. We are looking at it. I think eventually we should go more high level, and I don't know if a framework is the right type of thing. In the team we like to think about it as a toolkit. Instead of building an open-ended framework for how to build agents, we give you a sort of wrapper around E2B that makes it really easy for the LLMs to, for example, merge these states or navigate this tree. So I think it will eventually move there. It's a good question to ask what it's going to look like. I still think building a framework is very hard in AI, as things are moving very, very fast.
B
Curious about frameworks, do you see any rise in popular frameworks that we should be keeping tabs on?
C
Well, I don't know if this is an unpopular opinion, but people keep telling me LangChain isn't popular, yet if you look at its stats it has 20 million downloads per month. How can you have an unpopular framework when it has 20 million downloads and it's growing? I think there's a slight bubble, I don't know if it's people in SF, of developers thinking LangChain isn't popular or used. There's one framework that's interesting. I think it's called Mastra, from one of the founders, Sam Bhagwat. Yeah, he's in the Discord. I like that one; it looks very promising. I also like that it's TypeScript first, which I thought was the right decision because, as we talked about, we're more bullish on the product developer than on the Python machine learning type of developer. I've seen a lot more toolkits, or I don't know what to call them. For example, Composio has been one that gives you all the tools in advance. There have been more tools like that, which is an interesting approach. I also think Browserbase's Stagehand is super exciting, because in my head it's not a framework. It doesn't necessarily dictate how the agent should behave; it just gives it good tools to navigate a website, and it's very elegant. I think they added three methods: act, extract, and something.
B
Yeah, there's three APIs. Yeah, observe.
C
Yeah, observe.
B
We talked about it in that episode. No, it's cool actually. I wasn't expecting that many names to come out, but these are good names for people to know. I would agree with most of them as they're in the conversation of tooling and a lot of people listen to us for like, oh, what's going on in sf, right?
C
Yeah, actually it's a good question, because when you ask it, I would expect there to be a bigger framework boom nowadays. It would make more sense to build a framework now instead of in '23, because things are a little bit more stable. The big problem with frameworks is that the idea of a framework should probably be that things aren't changing underneath your hands, and if you launched your framework in '23, they were. So that's not a fun place to be as a developer who's using the framework.
B
I think things are still changing.
C
Things are 100% still changing, but slightly less. I would say some things are clearer than in '23.
B
I don't think people realize, but I'll just say it out here: chat completions is dying. So any framework that was built in that era with no conception of real time, no conception of omnimodal or multimodal native things, will probably not age very well.
C
Yeah, probably. What's next after chat-based interaction with LLMs?
B
You need chat completions, and you also need reasoning with streaming. And the streaming interaction of agents is also not handled very well. But all agent frameworks will have to adjust to that.
C
Basically, a good question to ask when thinking about devtools with LLMs is: is my devtool more relevant as the LLMs are getting smarter and as people need less prompting? I'm, for example, really bad at prompting, but I can get more work done over the years because the LLMs are getting better. So that means there's less need for a prompt management type of thing.
B
The talk from Ramp at our New York conference was basically about this. How do you set up your architecture so that you benefit from 10,000x improvements in models, rather than having to throw out your existing workflows every time a model improves?
C
Yeah. That's the question we ask very often when thinking about the features and how to position ourselves in the ecosystem, essentially.
A
We have gone 51 minutes without talking about MCPs, which is tragic.
B
Yeah, how do you do that? How do you talk about AI infrastructure and no mcp?
A
Right. It's kind of crazy, but since you mentioned the prompts: we just had the episode with the MCP creators, and they were, I wouldn't say frustrated, but maybe they hope more people will use the prompts and resources in MCP servers instead of just the tool calls. I'm curious if you've seen any fun use cases with remote MCPs on E2B or any other things like that?
C
Yeah, MCP, we're watching it very closely, but we're still undecided. Not about what it is, but about what to do with it.
B
There's no MCP on your docs, bro.
C
If you go to GitHub, we have like MCP server.
B
Okay.
C
But yeah, we've seen people using E2B to host MCPs. But then I would say I don't think you need E2B for that. You can probably just run it elsewhere. E2B isn't even optimized for it, and there are probably better ways to do that. I think people are a little too focused on the protocol side of it, even calling it a protocol. I don't know, it's just a server and a client, and there's some agreement between them. So I guess in a sense it is a protocol. But I see a lot of people comparing it to email protocols and such, which seems a bit far-fetched to me at this moment, but maybe I'm missing something. I think it's a super interesting idea. I just haven't had the right insight about it yet. One last thing I wanted to add, because we have a bunch of users that added our E2B MCP server to their registries: I think, at least at the moment, what's more useful is higher-order MCPs. I was talking with Henry from Smithery about it, and he was telling me exactly this concept. It's unclear who's using the MCP at the moment. Is it a developer? Is it an end user? Or is it another agent? If it's a developer, then it might make sense to have sandbox creation in the MCP. If it's an end user, it's probably too low-level a primitive, so you want some kind of higher-order MCP. Instead of us offering a sandbox, we would be offering a way to build a codegen agent with MCP or something like that, or to run a codegen agent that's using the E2B MCP in the background. So I have more of these unanswered questions in my head about MCPs.
A
I mean, they are all running locally right now, mostly. I don't think there are many remote MCP servers yet. I know that people are pushing for it.
C
Looks like there are more remote ones, actually. I don't know. This is what I heard from people managing these registries of MCPs.
B
Well, they're incentivized to tell you that. Yeah, I fully agree with that confusion about what to do. I do think that every devtools company needs some kind of MCP strategy, for better or worse, as annoying as it is. I think it does start with having an API, though, rather than an SDK-first experience, because then people can just wrap it in whatever language they want. And for you particularly, you might want to have a distinction in your strategy for MCP clients versus MCP servers, because those might be different things. And particularly I like what you said about the higher-order MCP for the end user who doesn't really care about implementation detail. I think that is fantastic for E2B, where the agents can just spin up an E2B instance in the background and the user doesn't even know about it.
C
Yeah, in an ideal world we have like MCP E2B dev first. It needs to have authentication figured out, so the agent should just ask to have, I guess, an account created for it, or whatever it's going to be. Then you can just launch a sandbox, do the execution there, and come back to it a month later, and the state is still there. So that, I think, makes a ton of sense. Then you can start building higher-order things on top of that, which could be very interesting. I like what you said about having an API-first versus SDK-first approach. I think this is very, very important for the LLMs, and that's how we are building our whole new dashboard and the infrastructure. So everything needs to be API controllable through public APIs for users, because eventually the LLMs will want to control and get this data.
B
Okay, so that's a big shift for E2B because you're SDK first.
C
It has been API based underneath all the time, but now we will go more into it. It was more in terms of prioritization. We needed to start with humans to get to LLMs, so we first built for the human developer, and now we are building for an LLM developer first.
B
Yeah, I'll just call out that since we did the MCP episode, they announced their update to the spec that they added an AUTH component to the spec itself. It seems to be just based on OAuth 2.1 and I assume like, you know, that's the first easiest thing to do. But there's no effective distinction between an agent and an end user. We never had that.
C
I think this touches on a broader question. You have all the websites that have optimized everything for humans, but now you will have agents visiting those websites. What are the incentives there? As a website owner, you probably want to know it's an agent, because you spent so much time and money optimizing everything for humans. So I think the dynamics on the Internet might get really weird if you don't have a distinction: this is a human, this is an agent.
A
Yeah. We tried to record an episode with Matthew Prince, CEO of Cloudflare yesterday and then we had technical difficulties. But they're building a lot of.
B
You can see the stats, the stats of.
C
Yeah, yeah.
A
They mentioned, for example, that Google used to be at a 2 to 1 crawl-to-referral ratio. So for every two pages they read, they send you one visitor. He said OpenAI is 250 to 1. So they'll read 250 of your pages and send you one person. And Anthropic was like 6,000 to 1.
B
Wow.
A
So they'll read 6,000 pages before they send one person back to your website. So obviously we have Jeremy Howard, who's been working on llms.txt to kind of have a separate interface for that. I think today people don't really curate the LLM experience. I think a lot of the llms.txt files that people are making are automated: take a website and turn it into llms.txt. But that's not really what you want to do. It's like, how do you separate the two things completely? You know, if you're an e-commerce store, the llms.txt should have your own inventory in one thing; it shouldn't have a search button.
B
I feel like obviously llms.txt is a good movement that improves the legibility of these doc sites for LLMs. But I feel like it's kind of a halfway measure. I think we failed with agents if we are reshaping the human environment for agents. Like there must be two of everything: what's your human side and what's your agent side? No, agents should just use the human side. Why are we making any special dispensation for these things?
A
Yeah, I think the monetization is the only thing. Too many websites are ads driven. I think you need to change that. Once we figure that out, I think you can use the same interface. But today, with all these pop-ups, it's like, okay, Bitcoin solves this.
B
Yeah, I'm just kidding.
A
But anyway.
B
Yeah, I don't know if you have a take on all this.
C
I have a general rule that when something new comes out, people have a tendency to recreate the thing that already exists for the new thing. That's the "Internet for agents." But you already have all the infrastructure for the old thing, and it's probably easier to teach the new thing to use the old thing. I think it's very common, coming from developers, to believe it will be a perfect world where you have a nice clear distinction between these two. But actually the world is super messy, and nothing comes to my mind where we ended up with a clear distinction between two types of entities. We didn't create a new Internet just for mobile phones; you actually have both a desktop version and a mobile version of a website, right? The real world is much more complicated and messy. Having a nice clear distinction is something I think developers strive for just from working with code. But humans are more complicated than that. So I think everything ends up being sort of mixed, and you need to adapt to it for sure.
B
Yeah, I think we'll do this for 20 years and then we'll figure out the.
C
And probably it will be somewhere in between that you have like a world where agents are using human Internet but the Internet changed because you have agents or something like that.
B
This reminds me of some conversation, I don't know who was making this analogy. In the mobile era we had m.yourdomain.com and then we had www.yourdomain.com, and now we might just have llm.yourdomain.com, and that's just the LLM MCP. Oh yeah, way better. Way better. Consumes everything. Cool. We're just going to go through the rest of the use cases. We can refer people to your website, but I always want people to have a good mental map of when they should go to E2B and what other people are using E2B for, so that they don't miss out. Right. So it's AI data analysis, data visualization, coding agents, generative UI, codegen evals, and computer use. Do you think that's the sequence, most popular being data analysis, least popular computer use?
C
Most experimental is computer use. I would say it still gets a lot of traction when you share the demos of it.
B
It's just Manus. Who else is doing this? I basically don't see anyone else.
C
When you say computer use, I imagine a graphical interface. As far as I know, Manus isn't running that, so I would also call it computer use, but without a graphical interface, because you are using the full computer but more from a code point of view. But that's a different discussion. I would say computer use is very exciting but experimental, from what I've seen.
B
Yeah. Okay.
C
I think for real computer use, you really want to support more platforms than just Linux, such as Windows.
B
Right.
C
And then it might be more licensing battle than technical battle.
B
Yeah, lawyers always win. We'll probably have Eric from Pig at some point, to talk about his movements on Windows. So I was going to go into evals.
C
Right.
B
And also how that links to RFT. Yeah. Can you tell more stories about the Open R1 project, how you work with them, and any other academics working with E2B? What do you think could be possible for the research, model training, or fine-tuning use case?
C
Yeah. The way Hugging Face, who built the Open R1 project, is using us is during the codegen reinforcement learning step. The Open R1 model has a training step where they give it a code problem and the model needs to generate and run code somewhere. Then you have a reward function basically giving you 0 or 1, telling you if that was a good solution or a bad solution, and then you improve the model. So it's kind of a feedback loop. They are using E2B to run many hundreds, even thousands, of these sandboxes per training step, so they can achieve big parallelization. The sandboxes start very fast. You don't need to use your GPU cluster for that, which is very expensive to use for this type of workload. And that goes with the story that I mentioned in the beginning: you don't need to worry about the LLM actually changing permissions in your cluster so that you can't access it, because everything is isolated and the sandboxes are secure from each other. So I think that's a very interesting use case, because we've had a few other companies reaching out and starting to use us this way, building models, foundation models. When we started E2B, that wasn't the use case we had in mind. But it makes total sense. Also, if you think about the life cycle of an AI agent, it makes a lot of sense for us to be there from the earliest stage possible, and the earliest stage is probably model training. So this fits very, very nicely in that use case. We actually released a case study with Hugging Face that's on our website that people can read.
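The feedback loop described above (generate code, run it in isolation, score it 0 or 1, and do this for many candidates in parallel) can be sketched roughly as follows. This is a hedged illustration, not Hugging Face's or E2B's actual code: a local subprocess stands in for a remote sandbox and is not a security boundary, and `run_isolated`, `reward`, and `score_batch` are hypothetical names.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_isolated(code, timeout=10.0):
    """Run code in a separate Python process and return its stdout.

    A local subprocess is only a stand-in for a remote sandbox here;
    it does NOT provide real isolation. Returns None on crash or timeout.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout.strip()


def reward(code, expected):
    """Binary reward: 1 if the program prints the expected answer, else 0."""
    return 1 if run_isolated(code) == expected else 0


def score_batch(candidates, expected, max_workers=8):
    """Score many candidate solutions in parallel, one process per candidate."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: reward(c, expected), candidates))


# Three hypothetical model outputs for "print the sum of 1..10".
candidates = [
    "print(sum(range(1, 11)))",  # correct
    "print(sum(range(1, 10)))",  # off by one
    "raise SystemExit(1)",       # crashes
]
rewards = score_batch(candidates, expected="55")
print(rewards)
```

The rewards would then feed back into the training step; the point of the sketch is only that scoring is CPU-side work that parallelizes across many short-lived, isolated executions.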
A
Have you seen people also use that to evaluate agents they want to use? Or is it mostly people doing training?
C
We've seen people using E2B for evals.
B
Yeah, that's what I'm thinking. It should be very easy to run SWE-bench and all these on E2B.
C
Yeah, this is more foreshadowing. But we will be launching in the next couple of months like a startup and research program for people and universities and researchers doing exactly these type of things. Different use case. But we for example work with LM arena folks from Berkeley that are using us to compare models in AI app generation and we run the AI generated app.
B
He wrote that in the question, right? Yes.
C
I think that's only possible thanks to Alessio. And I'm not joking, because first he connected it and then he actually wrote it.
B
Man, the things you have to do to win deals these days.
A
Full stack value add.
C
He was already our investor. I think also shows it even in better light because you clearly see the person is not interested just to win the deal, but actually, you know, to.
B
Increase the value add.
C
Talk about value add. Like how many VCs are actually fixing your bug in your code base?
A
Let's make a YouTube short of this part so I can share it.
C
You should put it on our landing page.
A
Yeah, exactly. Amen. So let's talk about, since you mentioned VCs, a lot of VCs that passed on you before because you only do code execution. It's kind of like a small market. I would love for you to maybe also paint the picture. You just mentioned it's cheaper than the GPU cluster that Hugging Face has. Do you want to do GPUs in the future? You mentioned Railway. That was easy to do. Do you want to compete with Railway down the line? Where do you see E2B going?
C
So the GPU question is an interesting one. The GPU market is hard. You are competing on compute in general, and I think that's hard, because eventually you will have competitors, everyone will be pricing it a little lower, and no one is making any margin. But it opens new use cases for you. For example, even with data analysis, I think there's a recent update where if you just run Pandas code on a GPU, it's much faster. If you want to do really big AI data analysis, you need that. And also, if you want to have LLMs train small machine learning models, or maybe you want to build full games, you really want to offer the full computer, but in a cloud that's very elastic. So I think GPUs are on the roadmap. I wouldn't say it's the thing that we are immediately working on, but eventually it makes a lot of sense for us to offer this. And sorry, what was the other question?
A
Do you want to host the apps that you are building too?
C
We are very well positioned in that the LLM is doing all the development work with us. Then you need to deploy it somewhere, so it's a very natural next step. Eventually we want the LLMs to deploy these services and apps that they are building and have them manage it, with the developer more in the backseat, checking whether everything, or a swarm of agents, is working correctly. It also requires a slightly different infrastructure for deploying. But I think there's a big advantage in knowing what developers are building on your platform, because then you can see like what's the idle. Even from a technical point of view, you have all the insights that you need to actually deploy it effectively and kind of cover the full life cycle of building the app. But now it's not built by the human developer, it's built by AI. So TL;DR, yes, probably down the road somewhere. We want to build essentially the new AWS, but for LLMs.
B
So we're just going to move out. And just to wrap up: one of the interesting things that I saw you do was that you were originally Czech-based.
A
Based.
B
And I'll be very blunt, when I first invested in the early round that you did, I was like these guys know how to recruit in Czech Republic. It's like a competitive advantage. And then the next thing I know you're moving to sf. Your whole team, you show up to our office.
A
That's hilarious.
C
Actually I think you offered even your place like duly for.
B
So why move to SF? Do you think that everybody in a similar situation should? Any pros and cons that you're experiencing?
C
I think you can definitely build a devtool company from Europe. It's a lot about the question of how easy you want it to be in the early days, especially whether you are building in red ocean versus blue ocean waters. If you are building in a field that already exists and has large competitors, and you are building something that's 10 times or 100 times better, you probably don't need to be in SF. Eventually you will probably need some kind of US base for your sales and customers. But you can very well build this from Europe. I know great companies doing that, because the knowledge is already among all the developers. But I would maybe argue that it might be harder to find people who are comfortable with a fast iteration loop and changing things early on, when you are almost pivoting every week, every month. But the main motivation for us was that we just wanted to be very close to our users. It was clear after a few weeks that SF was becoming this AI hub. What we used to do, and we still do it sometimes, but slightly less because of not having that much time and resources, is that we just met with the users that had problems and implemented E2B for them. Sitting next to them, we made a PR.
B
I call this the Collison installation.
C
By the way, I don't know if this is known thing but do you know how many times they did that?
B
200 twice.
C
3 times.
B
Oh lol.
C
I asked about it like 3 years ago, when I had a chance to ask Patrick Collison a question: how many times did you do the Collison installation? And he was like, three times. After that you probably don't want to do that, because you want to automate things and focus on other stuff. But it's exactly like, do the things that don't scale three times.
A
Exactly three.
C
But my whole point was that how many such users I can meet in Prague in a week versus in San Francisco in a week in Prague it's going to be probably one. And then I can't meet anyone else for the next half of the year because I just don't have users in Prague. But all of my users, our users were here. So we could just keep repeating doing that again and again. I would even argue if we did it too much, we could have automated slightly faster. But it's very useful feedback that you can get and you can then start moving much, much, much faster. So that was important. And I think also eventually you are in the B2B business even from the early days and it's good to have good relationship with people and just like meeting it in person is just better than meeting over zoom.
B
Well, I mean, that's why we do this in person. But yeah, obviously I run a conference. I'm very sympathetic to people meeting in person, right? But I also want there to be some hope for people who are never going to come to SF that they can still get involved. That's partially why we do this podcast: to get them involved in the community.
C
And I mean we started a new office in Prague.
B
So you're hiring again in Prague?
C
Yeah, and I think there's really good talent in Europe, in Prague for example. I can't speak for other countries, but I can imagine it's very similar. Once you have a clear idea of what your product looks like, then you can find a really good expert on a certain part of your infrastructure, on databases and things like that, and get top talent for that. The reason we didn't want to do it early on was that we kind of didn't know ourselves what we were building, and we had to figure it out in person with users here. But now we feel much stronger about knowing the roadmap for the company and for the product. So it's much easier to hire people that have an eight hour difference from us and explain to them what they are building, even remotely, sometimes when you are not there and you're just communicating through Slack.
A
Just to wrap, what are the roles that you're hiring for?
C
So we are hiring distributed systems engineers, platform engineers, and AI engineers. I'm also hiring an account manager and a customer success engineer. So kind of all over the place. We see a lot of market pull and momentum being built up, so we want to double down and move even faster, because we see all the potential of what we can do. You want to pour gas on the fire, on the spark, and that's how I feel about where we are now at E2B.
A
Awesome man. Thank you so much for coming on.
C
Yeah, thank you for having me.
B
It's great.
Date: April 24, 2025
Host(s): Alessio (CTO, Decibel), Sean (Founder, Small AI)
Guest: Vasek Mlansky (Co-founder, E2B)
This episode dives deep into the fundamental role that open-source cloud sandboxes play in powering modern AI agents and code-generating systems. The conversation chronicles the evolution of E2B—from a developer experience project to a core piece of AI infrastructure—and explores the unique technical, business, and ecosystem challenges of creating cloud-based, general-purpose execution environments for AI agents. Discussion spans from building infrastructure for LLM-based code interpreters, to sandbox technical design, horizontal versus vertical go-to-market strategies, and reflections on AI software market shifts in 2024–2025. The hosts and Vasek candidly discuss the “wrapper vs. infra” debate, emerging use cases (including data visualization, research agents, and reinforcement learning), pricing complexities, the state of agent development frameworks, and global talent/relocation.
“It was sandboxes. It literally was sandboxes. The same technology but just completely unscalable.” — Vasek [02:28]
“People had this vision of what they wanted...and then they tried to do it in reality and [the models] couldn't do it because the models are not ready.” — Sean [07:07]
Massive Growth:
“In one year, you’ve gone from 40,000 to 15 million sandboxes per month.” — Alessio [14:20]
Lag Between LLMs and Infra: 2024 was the year infra lagged behind new models and applications; in 2025 infra is scaling up.
Cloud-First Unique Aspects:
Quote:
“It’s not just dynamic compute, it’s dynamic security and dynamic pricing models too.” — Vasek [24:08]
“You’re kind of building for two Personas: for human developers and for the LLM who’s using the sandbox.” — Vasek [24:23]
“The billing model’s been figured out many times. It’s not the model, it’s introducing it early enough and instrumenting your infra.” — Vasek [29:43]
“Instead of a single agent, you have multiple agents forking, exploring different paths; each node is a snapshot, a checkpoint…” — Vasek [37:35]
“Is my devtool more relevant as the LLMs get smarter? There’s less need for prompt management tools.” — Vasek [43:08]
“People are focused on the ‘protocol,’ but right now—it’s just a server and a client. Comparing it to email protocols is a stretch.” — Vasek [44:44]
“The way Hugging Face… is using us is…during the codegen RL step; the model needs to generate and run code somewhere. We run hundreds—thousands—of sandboxes per training step.” — Vasek [55:03]
“I could meet one user in Prague a week…versus in SF, I could meet all of our users, every week. The speed of feedback is just different.” — Vasek [64:00]
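The forking idea in the 37:35 quote (several agents branching from a shared checkpoint, with each node being a snapshot) can be pictured as a tree of copied states. The following is only a minimal in-memory sketch of that data structure, not E2B's implementation or API: the `Checkpoint` class, its `fork` and `path` methods, and the dictionary standing in for sandbox state are all hypothetical names chosen for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass(eq=False)
class Checkpoint:
    """One node in the exploration tree: a frozen copy of (pretend) sandbox state."""
    state: dict
    label: str = "root"
    parent: "Checkpoint | None" = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)

    def fork(self, label, action):
        """Copy this snapshot, apply one agent action to the copy, record it as a child."""
        new_state = copy.deepcopy(self.state)  # the parent snapshot stays untouched
        action(new_state)
        child = Checkpoint(state=new_state, label=label, parent=self)
        self.children.append(child)
        return child

    def path(self):
        """Labels from the root down to this node, i.e. the route one agent took."""
        node, labels = self, []
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return labels[::-1]


# Two hypothetical agents explore different approaches from the same starting point.
root = Checkpoint(state={"files": {}})
a = root.fork("try-pandas", lambda s: s["files"].update({"plot.py": "import pandas"}))
b = root.fork("try-polars", lambda s: s["files"].update({"plot.py": "import polars"}))
a2 = a.fork("add-tests", lambda s: s["files"].update({"test_plot.py": "assert True"}))

print(a2.path())   # ['root', 'try-pandas', 'add-tests']
print(root.state)  # {'files': {}}
```

Because every fork works on a deep copy, a failed branch can simply be abandoned and another agent can restart from any earlier node. A real sandbox would snapshot filesystem and process state rather than a Python dictionary.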
| Timestamp | Topic |
|-----------|-------|
| 00:56–05:19 | E2B’s origin story and early iterations (DevBook ➔ sandboxes) |
| 05:27–07:28 | First agent deployment experiments, community traction |
| 07:28–09:06 | Discovering agent/product-market fit, and initial codegen limits |
| 10:35–11:04 | Shift from code-interpreter to platform for broader agent tasks |
| 14:02–14:20 | Explosive growth: 40k to 15M sandboxes/month |
| 21:23–26:16 | Technical sandboxing: arbitrary language/runtime control |
| 28:37–33:19 | Pricing, storage, and the challenge of usage-based infra billing |
| 34:08–36:00 | Value-based pricing and “bring your own key” for agent infra |
| 36:34–40:04 | Forking, checkpointing, and agent parallelism features |
| 41:08–42:51 | Agent frameworks landscape, toolkit vs. framework trends |
| 44:01–47:47 | MCP protocol: uncertain value, protocol skepticism |
| 49:23–53:00 | Internet for agents: should we adapt or build anew? |
| 55:03–57:31 | OpenR1/Hugging Face RLHF training with E2B (large-scale parallel sandboxing) |
| 60:58–64:53 | Why relocate to SF, benefits of hub, team structure |
On infra vs. app tradeoff:
“The infrastructure is lagging the applications. ’24 was all about like the agent couldn't use the whole sandbox and now sometimes we are actually catching up with some features for the LLMs that they need more than what we have at the moment.” — Vasek [14:46]
On horizontal/vertical product marketing:
“It’s very general, but that’s not how you want to market it…We had to show them code interpreting, very specific use cases, to get traction. Over time, devs realize there’s more they can do.” — Vasek [16:56]
On securing LLM-executed code:
“You don’t know beforehand what code you will run. By default it’s untrusted code, and you need complete isolation between sandboxes.” — Vasek [23:00]
On the broader purpose:
“Eventually, we want the LLMs to deploy these apps and services they are building and have them manage it…We want to build essentially the new AWS but for LLMs.” — Vasek [59:47]
For more on breaking AI infra trends and in-depth views from the builders pushing the space forward, listen to full episodes at Latent Space or read the show notes at latent.space.