Software Engineering Daily: Modal and Scaling AI Inference with Erik Bernhardsson

Release Date: July 31, 2025
Host: Sean Falconer
Guest: Erik Bernhardsson, Founder and CEO of Modal

1. Introduction to Modal and Erik’s Background

The episode kicks off with Sean Falconer introducing Erik Bernhardsson, the founder and CEO of Modal. Erik brings a wealth of experience from his seven-year tenure at Spotify, where he built the music recommendation system and the Luigi workflow scheduler. That work exposed significant gaps in AI and machine learning (ML) tooling, which ultimately inspired him to create Modal, a serverless compute platform tailored for AI workloads.

2. Motivation Behind Founding Modal

Erik Bernhardsson elaborates on his motivation for founding Modal:

“I realized there's kind of a general gap in the tooling. I ended up building a vector database called Annoy, no one uses it today. And also a workflow scheduler called Luigi that very few people use today.” (01:26)

Erik's experiences at Spotify highlighted the lack of robust tools for data and AI engineering. This realization drove him to create Modal, aiming to build the exact tools he needed but couldn't find in the market.

3. Developer Productivity and Feedback Loops

Sean initiates a discussion on the lag in ML engineering tooling compared to traditional application development. Erik emphasizes the importance of developer productivity, linking it to the speed of feedback loops:

“How fast are your feedback loops? ... we've taken a step backwards in that [AI and ML].” (02:50)

He contrasts this with frontend development, where immediate feedback enhances productivity. In AI and ML, the reliance on cloud infrastructure introduces significant friction, slowing down the iteration process.

4. Modal’s Technical Architecture

Sean probes deeper into Modal's deployment capabilities. Erik Bernhardsson describes Modal as an SDK that turns any Python function into a function executed in the cloud, akin to AWS Lambda:

“The easiest mental model to think about Modal is function as a service. So similar to AWS Lambda, if you're familiar.” (10:07)
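To make the function-as-a-service mental model concrete, here is a minimal local sketch of the calling convention. Modal's actual SDK registers functions with a decorator (`@app.function()`) and invokes them with `.remote()` or fans them out with `.map()`; the `function` decorator and `RemoteFunction` class below are hypothetical stand-ins that run everything in a local thread pool, so they illustrate only the shape of the API, not how Modal ships code to containers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Iterator


class RemoteFunction:
    """Hypothetical local stand-in for a serverless function handle."""

    def __init__(self, fn: Callable[..., Any], max_workers: int = 8) -> None:
        self._fn = fn
        self._max_workers = max_workers

    def local(self, *args: Any, **kwargs: Any) -> Any:
        # Run in the current process, exactly like a plain Python call.
        return self._fn(*args, **kwargs)

    def remote(self, *args: Any, **kwargs: Any) -> Any:
        # A real platform would run this in a cloud container; here we
        # submit it to a worker thread and block until the result is ready.
        with ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(self._fn, *args, **kwargs).result()

    def map(self, inputs: Iterable[Any]) -> Iterator[Any]:
        # Fan out one call per input; results are yielded in input order.
        with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
            yield from pool.map(self._fn, inputs)


def function(fn: Callable[..., Any]) -> RemoteFunction:
    """Hypothetical decorator that wraps a Python function in a handle."""
    return RemoteFunction(fn)


@function
def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    print(square.remote(7))            # 49
    print(list(square.map(range(5))))  # [0, 1, 4, 9, 16]
```

The point of the pattern is that the function body stays ordinary Python; only the call site decides whether it runs locally or elsewhere.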

Modal uses a multi-tenant architecture, pooling compute resources across customers to enable rapid scaling and efficient utilization. This design lets Modal offer usage-based pricing, charging only for the compute time actually used.
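Why usage-based pricing matters for bursty AI workloads is easiest to see with a little arithmetic. The sketch below compares paying per GPU-second actually used against reserving enough GPUs for the largest burst for a whole day. The hourly rate and the workload are made-up numbers for illustration, not Modal's prices.

```python
from dataclasses import dataclass


@dataclass
class Burst:
    """One burst of work: how many GPUs it needed and for how long."""
    gpus: int
    seconds: int


# Hypothetical rate for illustration only; not an actual price list.
GPU_RATE_PER_HOUR = 3.60
SECONDS_PER_DAY = 24 * 60 * 60


def usage_based_cost(bursts: list[Burst], rate_per_hour: float) -> float:
    # Pay only for the GPU-seconds the bursts actually consume.
    gpu_seconds = sum(b.gpus * b.seconds for b in bursts)
    return gpu_seconds * rate_per_hour / 3600


def reserved_cost(bursts: list[Burst], rate_per_hour: float) -> float:
    # Reserve enough GPUs for the biggest burst, for the entire day.
    peak_gpus = max(b.gpus for b in bursts)
    return peak_gpus * SECONDS_PER_DAY * rate_per_hour / 3600


if __name__ == "__main__":
    day = [Burst(gpus=8, seconds=600), Burst(gpus=2, seconds=1800), Burst(gpus=8, seconds=300)]
    print(f"usage-based: ${usage_based_cost(day, GPU_RATE_PER_HOUR):.2f}")  # $10.80
    print(f"reserved:    ${reserved_cost(day, GPU_RATE_PER_HOUR):.2f}")     # $691.20
```

With these invented numbers the bursty day costs $10.80 when billed by usage versus $691.20 reserved; the gap narrows as the reserved GPUs are kept busier.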

5. Multi-Tenant Model and Scalability

Erik discusses the advantages of Modal's multi-tenant model:

“Because we can pool a lot of people's very bursty workloads and run an underlying shared compute pool.” (08:23)

This approach keeps GPU availability high, letting users access substantial computational power almost instantly. Modal's infrastructure is also built for fast container cold starts, which are critical to the quick feedback loops that drive developer productivity.
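The pooling argument can be checked with a toy calculation: when tenants' bursts do not line up, the peak of their combined demand is far below the sum of their individual peaks, so a shared pool needs fewer GPUs than giving each tenant its own peak capacity. The tenants and demand series below are invented for illustration.

```python
# GPUs requested per time slot (e.g. per minute) by three hypothetical tenants.
DEMAND = {
    "tenant_a": [0, 8, 8, 0, 0, 0, 0, 0],
    "tenant_b": [0, 0, 0, 6, 6, 0, 0, 0],
    "tenant_c": [2, 0, 0, 0, 0, 0, 8, 8],
}


def dedicated_capacity(demand: dict[str, list[int]]) -> int:
    # Each tenant provisions for its own peak.
    return sum(max(series) for series in demand.values())


def pooled_capacity(demand: dict[str, list[int]]) -> int:
    # A shared pool only needs to cover the peak of the combined demand.
    combined = [sum(slot) for slot in zip(*demand.values())]
    return max(combined)


def utilization(demand: dict[str, list[int]], capacity: int) -> float:
    # Fraction of provisioned GPU-slots that actually do work.
    used = sum(sum(series) for series in demand.values())
    slots = len(next(iter(demand.values())))
    return used / (capacity * slots)


if __name__ == "__main__":
    ded = dedicated_capacity(DEMAND)
    pooled = pooled_capacity(DEMAND)
    print(ded, pooled)  # 22 8
    print(f"{utilization(DEMAND, ded):.0%} vs {utilization(DEMAND, pooled):.0%}")  # 26% vs 72%
```

The saving only materializes if containers can start quickly enough to follow each burst, which is why cold-start time matters so much in a pooled design.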

6. Use Cases and Applications

Modal caters to a diverse range of AI applications. Erik highlights several key use cases:

Generative AI (GenAI): Companies like Suno utilize Modal for AI-generated music, running proprietary models at large scales.
Computational Biotech: Applications include protein folding and medical imaging processing.
Geospatial Analysis and Physics Simulations: Modal supports compute-intensive tasks such as turbulence modeling.
Developer Utilities: Smaller-scale applications like web scrapers and web servers benefit from Modal's ease of use.

“[With] Modal, the goal was always to build a fairly general-purpose platform. ... we've seen people use Modal for, like, [a] chess engine.” (17:11)

7. Future Directions and Challenges

Looking ahead, Erik outlines Modal's plans to enhance performance, particularly targeting low-latency applications like real-time audio and video streaming. Achieving this requires:

Decentralizing the Control Plane: To reduce latencies associated with global state management.
Smarter Routing: Implementing edge computing strategies to execute functions closer to users.

“Another thing we want to do in 2025 is decentralizing the control plane so that we can use smarter ways.” (20:00)
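The "smarter routing" idea can be sketched as: send each call to the lowest-latency region that still has spare capacity. In a decentralized design, a decision like this could be made close to the caller rather than in a single global control plane. The region names, latencies, and capacity model below are hypothetical; the episode does not describe how Modal actually routes requests.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    rtt_ms: float     # measured round-trip time from the caller (hypothetical)
    free_slots: int   # containers that could take the call right now


def pick_region(regions: list[Region]) -> Region:
    """Choose the lowest-latency region that has free capacity."""
    candidates = [r for r in regions if r.free_slots > 0]
    if not candidates:
        raise RuntimeError("no region has free capacity")
    return min(candidates, key=lambda r: r.rtt_ms)


if __name__ == "__main__":
    regions = [
        Region("us-east", rtt_ms=12.0, free_slots=0),
        Region("us-west", rtt_ms=70.0, free_slots=5),
        Region("eu-west", rtt_ms=85.0, free_slots=9),
    ]
    # us-east is closest but full, so the next-closest region wins.
    print(pick_region(regions).name)  # us-west
```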

8. Security and Multi-Tenancy

Sean and Erik delve into the security implications of a multi-tenant architecture. Erik stresses that Modal prioritizes stringent security measures:

“We think a lot about how do we encrypt all the storage, how do we encrypt all the data in transit ... impossible to break out of containers.” (22:26)

He draws parallels with companies like Snowflake, noting the gradual industry shift towards embracing multi-tenant models with robust security frameworks.

9. AI Infrastructure Evolution

The conversation shifts to the broader landscape of AI infrastructure. Erik shares his thoughts on the role of vector databases in AI applications, acknowledging that they are necessary today while questioning whether they will remain a separate abstraction in the long term:

“It's too early to say. ... we're still kind of early with vectors and I don't know.” (28:13)

He speculates on future developments, such as integrating vector capabilities directly into traditional databases like PostgreSQL or evolving towards even more abstracted storage solutions.
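For readers less familiar with what a vector database does, its core query is nearest-neighbor search over embeddings. The sketch below does exact, brute-force cosine-similarity search by scanning every vector; libraries such as Annoy and dedicated vector databases use approximate indexes to avoid that full scan at scale. The document IDs and three-dimensional vectors are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Exact nearest neighbors by cosine similarity (full scan)."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    index = {
        "doc_rock": [0.9, 0.1, 0.0],
        "doc_jazz": [0.7, 0.6, 0.1],
        "doc_news": [0.0, 0.2, 0.9],
    }
    print([doc for doc, _ in top_k([1.0, 0.2, 0.0], index)])  # ['doc_rock', 'doc_jazz']
```

Whether this query lives in a dedicated vector store or in a general-purpose database is exactly the open question Erik raises; the operation itself stays the same.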

10. Technical Optimizations and Achievements

Erik highlights significant technical optimizations that set Modal apart:

Custom File System: Developed to efficiently handle container data, reducing redundancy and speeding up container starts.

“We built user-space file systems that cache all the data under the hood.” (37:59)

This innovation enables Modal to launch containers swiftly, a cornerstone of its promise of rapid feedback loops.
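A minimal sketch of the idea behind such a file system, under stated assumptions: each container image is described by a manifest mapping file paths to content hashes, file contents are fetched lazily on first read, and identical files shared across images are stored once in a content-addressed cache. The `ImageFS` class and the `fetch` callback are hypothetical; the episode does not detail Modal's implementation, and a real file system would sit behind the operating system's file API rather than being a Python object.

```python
import hashlib
from typing import Callable


def digest(data: bytes) -> str:
    # Content address: the SHA-256 of the file's bytes.
    return hashlib.sha256(data).hexdigest()


class ImageFS:
    """Hypothetical lazy, content-addressed view of a container image."""

    def __init__(
        self,
        manifest: dict[str, str],            # path -> content hash
        cache: dict[str, bytes],             # content hash -> bytes, shared across images
        fetch: Callable[[str], bytes],       # downloads a blob by hash (hypothetical)
    ) -> None:
        self.manifest = manifest
        self.cache = cache
        self.fetch = fetch

    def read(self, path: str) -> bytes:
        key = self.manifest[path]
        if key not in self.cache:
            # Only files that are actually read ever get downloaded.
            data = self.fetch(key)
            if digest(data) != key:
                raise ValueError(f"corrupt blob for {path}")
            self.cache[key] = data
        return self.cache[key]


if __name__ == "__main__":
    # A fake remote blob store, keyed by content hash.
    remote = {digest(b): b for b in (b"libtorch", b"app_a", b"app_b")}
    downloads: list[str] = []

    def fetch(key: str) -> bytes:
        downloads.append(key)  # record each simulated network fetch
        return remote[key]

    shared: dict[str, bytes] = {}
    img_a = ImageFS({"/lib/torch.so": digest(b"libtorch"), "/app/main.py": digest(b"app_a")}, shared, fetch)
    img_b = ImageFS({"/lib/torch.so": digest(b"libtorch"), "/app/main.py": digest(b"app_b")}, shared, fetch)
    img_a.read("/lib/torch.so")
    img_b.read("/lib/torch.so")  # same hash, so it is served from the shared cache
    print(len(downloads))  # 1
```

Laziness means a container can start before its whole image is present, and content addressing means a large shared dependency is fetched once rather than once per image.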

11. Conclusion and Future Plans

As the episode wraps up, Erik shares Modal's upcoming initiatives:

Distributed Training: Enhancing Modal's capabilities to support distributed model training across multiple GPUs.
Enterprise Focus: Investing in security compliance, single sign-on (SSO), and custom telemetry integrations to cater to enterprise customers.

“... was always like the core value of Modal. Now you can also hopefully get that up to 100 GPUs.” (38:57)
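Distributed training across many GPUs most commonly starts with data parallelism: each worker computes gradients on its own shard of the batch, the gradients are averaged (an all-reduce), and every worker applies the same update. The pure-Python sketch below simulates that averaging step for a one-parameter linear model and checks it against the full-batch gradient. It is a generic illustration of the technique, not a description of how Modal will implement distributed training, which the episode does not detail.

```python
def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x).
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def shard(data: list[float], n_workers: int) -> list[list[float]]:
    # Equal contiguous shards (assumes len(data) is divisible by n_workers);
    # equal sizes are what make the mean of shard gradients equal the full mean.
    size = len(data) // n_workers
    return [data[i * size:(i + 1) * size] for i in range(n_workers)]


def data_parallel_grad(w: float, xs: list[float], ys: list[float], n_workers: int) -> float:
    local = [grad_mse(w, x_s, y_s) for x_s, y_s in zip(shard(xs, n_workers), shard(ys, n_workers))]
    # Simulated all-reduce: every worker ends up with the mean of the local gradients.
    return sum(local) / n_workers


def train(xs: list[float], ys: list[float], n_workers: int, lr: float = 0.01, steps: int = 200) -> float:
    w = 0.0
    for _ in range(steps):
        w -= lr * data_parallel_grad(w, xs, ys, n_workers)
    return w


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [3.0 * x for x in xs]  # true weight is 3
    print(grad_mse(0.0, xs, ys), data_parallel_grad(0.0, xs, ys, n_workers=4))  # -153.0 -153.0
    print(round(train(xs, ys, n_workers=4), 4))  # 3.0
```

Because the averaged gradient matches the full-batch one, adding workers changes how fast each step is computed, not what the model learns.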

Erik emphasizes Modal's commitment to building superior developer tools and infrastructure to meet the growing demands of AI and ML engineering.

Key Takeaways

Modal addresses a critical gap in AI and ML tooling by providing a serverless compute platform that emphasizes speed and scalability.
Developer productivity is at the forefront, achieved through fast feedback loops and efficient resource pooling.
Multi-tenant architecture allows Modal to offer high GPU availability and usage-based pricing, optimizing cost and compute utilization.
Modal supports a wide range of applications, from GenAI and biotech to developer utilities, showcasing its versatility.
Security and scalability are paramount, with ongoing efforts to enhance low-latency capabilities and enterprise-ready features.

This episode provides an in-depth look into Modal's mission to revolutionize AI inference deployment, the challenges faced, and the innovative solutions being developed to empower AI teams worldwide.
