Podcast Summary: Latent Space — "Why Every Agent needs Open Source Cloud Sandboxes"
Date: April 24, 2025
Host(s): Alessio (CTO, Decibel), Sean (Founder, Smol AI)
Guest: Vasek Mlejnsky (Co-founder, E2B)
Episode Overview
This episode dives deep into the fundamental role that open-source cloud sandboxes play in powering modern AI agents and code-generating systems. The conversation chronicles the evolution of E2B from a developer experience project to a core piece of AI infrastructure, and explores the unique technical, business, and ecosystem challenges of creating cloud-based, general-purpose execution environments for AI agents. Discussion spans building infrastructure for LLM-based code interpreters, sandbox technical design, horizontal versus vertical go-to-market strategies, and reflections on AI software market shifts in 2024–2025. The hosts and Vasek candidly discuss the “wrapper vs. infra” debate, emerging use cases (including data visualization, research agents, and reinforcement learning), pricing complexities, the state of agent development frameworks, and global talent and relocation.
Major Topics, Insights, and Memorable Moments
1. Origin Story: E2B’s Pivot from DevBook to Sandboxes
- Interactive Dev Tools: E2B’s roots go back to DevBook, an interactive documentation and code playground environment for developers.
- Transition Point: The move to E2B was catalyzed by the release of GPT-3.5 and the founders’ burnout, which pushed them to change direction. They open-sourced their sandbox infrastructure, drawing major attention after OpenAI’s Greg Brockman retweeted their demo.
- Quote:
“It was sandboxes. It literally was sandboxes. The same technology but just completely unscalable.” — Vasek [02:28]
2. Early Agent Infrastructure Experiments
- From “AI Agent Cloud” to Sandboxes: Initially, the pitch was hosting everything for agents—deploying, monitoring, and providing a runtime.
- Unintended Audience: The first surge of users wanted quick, simple site builders; the infra wasn't ready for complex, robust agents yet.
- Model Capabilities Lagged Ambition:
“People had this vision of what they wanted...and then they tried to do it in reality and [the models] couldn't do it because the models are not ready.” — Sean [07:07]
3. Product Evolution: Data Analysis and Code Interpretation
- Jupyter-Like for LLMs: Key insight: AI agents need stateful, code-executing environments, especially for Python-based tasks like data analysis and visualization (see the sketch below).
- Evolving Beyond Just Code: Infrastructure needed to support not only code snippets but also complex, interactive sessions, with features such as persistent state and support for non-Python tasks.
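A minimal sketch of what this Jupyter-like statefulness looks like in practice, assuming the `e2b_code_interpreter` Python SDK's `Sandbox` and `run_code` interface and an `E2B_API_KEY` in the environment (check E2B's current docs for exact signatures):

```python
# Stateful, notebook-style execution: variables persist across run_code calls,
# which is what lets an agent iterate on a dataset over multiple turns.
from e2b_code_interpreter import Sandbox  # assumed SDK import path

sandbox = Sandbox()  # spins up an isolated cloud sandbox
try:
    sandbox.run_code("import pandas as pd; df = pd.DataFrame({'x': [1, 2, 3]})")
    result = sandbox.run_code("df['x'].sum()")  # `df` survives from the prior call
    print(result.text)  # -> 6
finally:
    sandbox.kill()  # release the sandbox when the session ends
```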
4. Cloud Infrastructure as Core AI Enablement
- Massive Growth:
“In one year, you’ve gone from 40,000 to 15 million sandboxes per month.” — Alessio [14:20]
- Lag Between LLMs and Infra: 2024 was the year infra lagged behind new models and applications; in 2025 infra is scaling up.
- Cloud-First Unique Aspects:
  - Ad hoc, on-the-fly execution with unknown workloads
  - Need for high isolation/security (prevent LLMs from “breaking out”)
  - Flexibility: supports everything from 5-second tasks to 5-hour compute jobs
  - Security anecdote: Hugging Face once had an LLM change cluster permissions, highlighting the value of robust sandboxing [23:00]
- Quote:
“It’s not just dynamic compute, it’s dynamic security and dynamic pricing models too.” — Vasek [24:08]
5. AI Infra vs. App Layer: “Wrapper” Debate
- On LLM “Wrappers” vs. Platform: No shame in being a wrapper if you provide real value and are able to swap models quickly; value shifts between infra and app layers as capabilities migrate.
- Kubernetes Analogy: Aim is to be like “Kubernetes for agents” — infra-agnostic and highly developer friendly.
6. The Unique Technical Demands of Agent Sandboxes
- Generalized Runtime: Anything runnable on Linux is fair game: Python, C, Fortran, and even launching servers or GUIs (see the sketch after this list).
- Supporting Elastic Composability: Sandboxes can be kept alive, expanded, or shrunk as needed, and LLMs will control their own infra over time.
- User Personas: Building for both human AI engineers and LLM agents as “first-class users.”
- Quote:
“You’re kind of building for two Personas: for human developers and for the LLM who’s using the sandbox.” — Vasek [24:23]
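As a sketch of that generality, the flow below compiles and runs a C program inside a sandbox. It assumes the E2B Python SDK's `files.write` and `commands.run` helpers; treat the exact names as assumptions and check the current docs:

```python
# Sandboxes are general Linux boxes, not Python-only interpreters:
# write a C source file, compile it, and run the binary.
from e2b_code_interpreter import Sandbox  # assumed SDK import path

sandbox = Sandbox()
try:
    sandbox.files.write(
        "/home/user/hello.c",
        '#include <stdio.h>\nint main(void) { puts("hi from C"); return 0; }',
    )
    result = sandbox.commands.run(
        "cc /home/user/hello.c -o /home/user/hello && /home/user/hello"
    )
    print(result.stdout)  # -> hi from C
finally:
    sandbox.kill()
```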
7. Pricing, Billing, and Usage
- Pain Points: Complex, unpredictable usage patterns make pricing hard; need to track compute, storage, network, and (eventually) higher-layer value delivered.
- Third-Party Billing Providers: Orb, Meter, Metronome—all being considered, but integration and per-usage fees are concerns.
- Quote:
“The billing model’s been figured out many times. It’s not the model, it’s introducing it early enough and instrumenting your infra.” — Vasek [29:43]
- Approach: Usage-based, with awareness that the infrastructure for tracking tokens, memory, and billing can itself be a significant engineering effort (a minimal metering sketch follows).
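To make the instrumentation point concrete, here is a hypothetical sketch of the kind of usage-event recording Vasek describes; every name in it is illustrative, not a real E2B or billing-provider API:

```python
# Record raw usage events per sandbox as they happen, so any billing model
# (compute, storage, network, or value-based) can be applied to them later.
import time
from dataclasses import dataclass, field

@dataclass
class UsageMeter:
    events: list = field(default_factory=list)

    def record(self, sandbox_id: str, metric: str, amount: float, unit: str):
        # e.g. metric="compute", amount=12.5, unit="vcpu_seconds"
        self.events.append({
            "sandbox_id": sandbox_id, "metric": metric,
            "amount": amount, "unit": unit, "ts": time.time(),
        })

    def total(self, metric: str) -> float:
        return sum(e["amount"] for e in self.events if e["metric"] == metric)

meter = UsageMeter()
meter.record("sbx_123", "compute", 12.5, "vcpu_seconds")
meter.record("sbx_123", "storage", 0.5, "gb_hours")
print(meter.total("compute"))  # -> 12.5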
8. Advanced Features: Forking, Checkpoints, and Persistent State
- Persistence & Forking: Soon to be launched—pause/resume entire sandbox state, crucial for agent parallelization, experimentation, and tree search methods.
- Use Cases: Monte Carlo searches, multi-agent experiments, research workloads (see the sketch after this list).
- Quote:
“Instead of a single agent, you have multiple agents forking, exploring different paths; each node is a snapshot, a checkpoint…” — Vasek [37:35]
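A hypothetical sketch of the checkpoint-and-fork pattern Vasek describes; since pause/resume had not shipped at recording time, `sandbox.pause()` and `Sandbox.resume(...)` are assumptions about the eventual API shape, not confirmed signatures:

```python
# Each checkpoint is a node in the search tree; each resume is a fork that
# explores a different action from the same saved state.
from e2b_code_interpreter import Sandbox  # assumed SDK import path

root = Sandbox()
root.run_code("state = 0")          # build up some agent state
checkpoint_id = root.pause()        # hypothetical: snapshot the whole sandbox

branch_results = []
for delta in (1, 2, 3):             # explore three branches from one checkpoint
    branch = Sandbox.resume(checkpoint_id)  # hypothetical: fork from snapshot
    execution = branch.run_code(f"state += {delta}; state")
    branch_results.append(execution.text)
    branch.kill()

print(branch_results)  # every branch started from the same `state = 0`
```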
9. Frameworks, Toolkits, and the State of the Agent Ecosystem
- LangChain Still Dominant: Despite developer complaints, stats show wide adoption.
- New Entrants: Mastra (from Sam Bhagwat), Composio, Browserbase Stagehand—tools focused on modern agent needs, often TypeScript-first.
- Trend: Fewer rigid “frameworks,” more flexible toolkits/add-ons, in line with rapid model evolution.
- Quote:
“Is my devtool more relevant as the LLMs get smarter? There’s less need for prompt management tools.” — Vasek [43:08]
10. MCP (Model Context Protocol): Skepticism and Confusion
- Not Yet Core to E2B: Some support for running MCP servers on E2B, but limited clarity on the actual protocol and on how it differs from a simple API client (see the minimal server sketch below).
- Distinctions Emerging:
- Developer vs. agent vs. end-user facing MCP components
- More value in “higher order” MCPs (e.g., launching a whole agent, not just a server)
- Quote:
“People are focused on the ‘protocol,’ but right now—it’s just a server and a client. Comparing it to email protocols is a stretch.” — Vasek [44:44]
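To ground the “just a server and a client” point, here is a minimal MCP server, assuming the official MCP Python SDK's `FastMCP` helper (verify against the current `mcp` package docs):

```python
# A complete MCP server: one tool, served over stdio. At this level it is
# indeed just a server exposing callable functions to a client.
from mcp.server.fastmcp import FastMCP  # assumed import path in the `mcp` SDK

mcp = FastMCP("demo-tools")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```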
11. API vs. SDK First and Building for Agents (Not Just Humans)
- Shift Underway: Moving from SDK-centric design to API-first to better enable agent and automation use-cases.
- Implication: LLMs will increasingly want to control infrastructure directly, not just consume SDK abstractions.
12. AI Agents as Cloud Resource Consumers and Creators
- Cloud Infrastructure for AI: E2B aims to be the “new AWS but for LLMs,” providing the elastic, configurable backdrop for agent-generated applications and experiences.
13. Use Cases and Evolving Demand
- Most Common: Data analysis, data visualization, code evaluation (evals), generative coding.
- Emerging: Real computer use (with GUIs), reinforcement learning, deep research agents, running entire agent architectures, agentic app deployment.
- Hugging Face & RLHF: Hugging Face used E2B sandboxes massively in OpenR1 for RLHF codegen tasks, isolating unsafe code generation and allowing thousands of parallel tasks (see the sketch after the quote). [55:03]
- Quote:
“The way Hugging Face… is using us is…during the codegen RL step; the model needs to generate and run code somewhere. We run hundreds—thousands—of sandboxes per training step.” — Vasek [55:03]
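An illustrative sketch of that pattern (not Hugging Face's actual OpenR1 code): for each training step, run every model-generated program in its own isolated sandbox in parallel and collect pass/fail rewards. It assumes the E2B `Sandbox.run_code` interface:

```python
# One fresh, isolated sandbox per generated program; parallel fan-out per
# training step; a simple binary "did it run" reward per sample.
from concurrent.futures import ThreadPoolExecutor
from e2b_code_interpreter import Sandbox  # assumed SDK import path

def score_one(generated_code: str) -> float:
    sandbox = Sandbox()  # untrusted code never shares an environment
    try:
        execution = sandbox.run_code(generated_code)
        return 0.0 if execution.error else 1.0
    finally:
        sandbox.kill()

def score_batch(samples: list[str]) -> list[float]:
    with ThreadPoolExecutor(max_workers=64) as pool:
        return list(pool.map(score_one, samples))

# rewards = score_batch(model_generated_programs)  # one reward per sample
```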
14. Scaling, Talent, and the Global AI Hub Shift
- Move to San Francisco: Initially based in Prague, E2B relocated for proximity to AI builders and rapid iteration.
- Quote:
“I could meet one user in Prague a week…versus in SF, I could meet all of our users, every week. The speed of feedback is just different.” — Vasek [64:00]
- Still Hiring in Europe: Reopened Prague office for specialized hires; remote work viable once direction is set.
Timeline of Key Segments
| Timestamp | Topic |
|-----------|-------|
| 00:56–05:19 | E2B’s origin story and early iterations (DevBook ➔ sandboxes) |
| 05:27–07:28 | First agent deployment experiments, community traction |
| 07:28–09:06 | Discovering agent/product-market fit, and initial codegen limits |
| 10:35–11:04 | Shift from code-interpreter to platform for broader agent tasks |
| 14:02–14:20 | Explosive growth: 40k to 15M sandboxes/month |
| 21:23–26:16 | Technical sandboxing: arbitrary language/runtime control |
| 28:37–33:19 | Pricing, storage, and the challenge of usage-based infra billing |
| 34:08–36:00 | Value-based pricing and “bring your own key” for agent infra |
| 36:34–40:04 | Forking, checkpointing, and agent parallelism features |
| 41:08–42:51 | Agent frameworks landscape, toolkit vs. framework trends |
| 44:01–47:47 | MCP protocol: uncertain value, protocol skepticism |
| 49:23–53:00 | Internet for agents—should we adapt or build anew? |
| 55:03–57:31 | OpenR1/Hugging Face RLHF training with E2B (large-scale parallel sandboxing) |
| 60:58–64:53 | Why relocate to SF, benefits of hub, team structure |
Notable Quotes
- On infra vs. app tradeoff:
“The infrastructure is lagging the applications. 24 was all about like the agent couldn't use the whole sandbox and now sometimes we are actually catching up with some features for the LLMs that they need more than what we have at the moment.” — Vasek [14:46]
- On horizontal/vertical product marketing:
“It’s very general, but that’s not how you want to market it…We had to show them code interpreting, very specific use cases, to get traction. Over time, devs realize there’s more they can do.” — Vasek [16:56]
- On securing LLM-executed code:
“You don’t know beforehand what code you will run. By default it’s untrusted code, and you need complete isolation between sandboxes.” — Vasek [23:00]
- On the broader purpose:
“Eventually, we want the LLMs to deploy these apps and services they are building and have them manage it…We want to build essentially the new AWS but for LLMs.” — Vasek [59:47]
Takeaways for AI Engineers
- Open-source cloud sandboxes are quickly becoming must-have infrastructure for AI agents and codegen workloads—a place where LLMs can safely reason, experiment, evaluate, and build applications.
- Market and technical challenges: General-purpose infra must be shaped into specific use cases to gain traction before developers learn to exploit its generality.
- Security, elasticity, and pricing are core differentiators; infra needs to match the unpredictable, dynamic workloads of LLM agents.
- Emerging demand for persistence, forking, and stateful agent sessions will expand agent capabilities far beyond current “code-interpreter” paradigms.
- API-first infra is the future: Direct agent-controlled infrastructure is coming.
- In-person access to talent and users (e.g., in San Francisco) accelerates innovation, but global talent remains key as direction matures.
Further Resources
- E2B Documentation & Case Studies
- Latent Space Podcast Archive
- Related episodes: Browserbase, MCP Creators, OpenR1 case study
For more on breaking AI infra trends and in-depth views from the builders pushing the space forward, listen to full episodes at Latent Space or read the show notes at latent.space.
