Railway: The Agent-Native Cloud — Jake Cooper - Latent Space: The AI Engineer Podcast

Latent Space: The AI Engineer Podcast

Episode: Railway – The Agent-Native Cloud (Jake Cooper)
Date: May 20, 2026
Host(s): Alessio (ChronoLabs), swyx (Latent Space editor)
Guest: Jake Cooper (Founder, Railway)

Episode Overview

This episode dives deep into the evolution of cloud infrastructure for the agentic era, as told by Jake Cooper, founder of Railway. Jake shares Railway’s philosophy and journey—from bootstrap startup to a core platform powering the next generation of AI-driven, agent-native applications. The discussion covers everything from agent-native primitives, feature flagging, data centers and infra economics, to agent deployment, developer tools, and a candid reflection on founder life outside the YC mold.

Main Discussion Themes

1. Railway’s Mission and Agentic Evolution

Railway’s Value Prop: “Railway is the easiest way to ship anything. You just go to the canvas or you talk with Claude and you say, deploy postgres instance, deploy my GitHub repository, run this code, et cetera…you'll just be up and away to the races.” (C, 01:09)
Motivation: Making software deployment and evolution trivially easy; removing friction from building in the “physical world” by making creation in the virtual world accessible to all.
Agent-Native Vision: Jake describes the coming wave—moving from coding in assembly, then C, then JavaScript, and now: “words.” The future is natural language/software development powered by agents.
Agent Use as Core Direction: Over the last six months, “we've probably deeply prioritized agentic as a mechanism to go and build and deploy things…” (C, 11:24)

Notable Quote:

“We've moved from assembly to C to JavaScript to now like words, right? And you're going to need to be able to close that. But that's where it goes.”
– Jake Cooper (C), 12:13

2. Scaling, Growth, and Inflection Points

Railway’s growth took years of “slow grind” before recent hockey stick adoption, with user numbers now adding “100,000 users a week” (C, 09:38).
Inflection points: early user acquisition was highly hands-on via Discord; periods of expansion (free tier explosion) and contraction (focus on business sustainability).
Managing User Base: Handling the influx of less desirable free-tier users (bots, crypto miners): “You build an open product on the Internet…the Internet is a horrible place…” (C, 08:12)
Lean Team Philosophy: Despite scale—3 million users—Railway maintains a headcount of only 35 people.

“We don't want to just like add headcount for the sake of headcount...We want to build, like, systems.” (C, 09:38)

3. Infrastructure Philosophy: Building the Agent-Native Cloud

Primitives & Bare Metal

Custom Primitives: Network, compute, storage, and orchestration—built for agentic usage, not just human users.
Avoiding Kubernetes: “We don't really use kube because we want the higher order of control...” (C, 13:29)
Controlling Cost and Performance: Railway runs its own data centers. Measured against renting the same capacity in the cloud, “our payback period is about three months...That’s four years worth of depreciated hardware.” (C, 15:42)
Data Centers Worldwide: Two data centers in every region except Singapore, which gets its second in Q3: “Two in every other region now. Singapore, we're adding a second one in Q3...” (C, 14:52)
Scaling and Bursting: When demand spikes, Railway bursts onto public clouds; most workloads reside on their own infra.
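
The payback claim above is simple arithmetic, and a short sketch makes it concrete. Every dollar figure below is hypothetical, chosen only so the result matches the roughly three-month payback and four-year hardware life mentioned in the episode; Railway's real costs are not given.

```python
def payback_months(hardware_capex: float,
                   monthly_cloud_cost: float,
                   monthly_metal_opex: float) -> float:
    """Months until buying hardware beats renting equivalent cloud capacity."""
    monthly_savings = monthly_cloud_cost - monthly_metal_opex
    if monthly_savings <= 0:
        raise ValueError("owning never pays back if it costs more per month than renting")
    return hardware_capex / monthly_savings


def lifetime_savings(hardware_capex: float,
                     monthly_cloud_cost: float,
                     monthly_metal_opex: float,
                     lifetime_months: int = 48) -> float:
    """Net savings over the hardware's depreciation window (four years by default)."""
    monthly_savings = monthly_cloud_cost - monthly_metal_opex
    return monthly_savings * lifetime_months - hardware_capex


# Hypothetical inputs (assumptions, not figures from the episode).
CAPEX = 300_000.0        # servers, racks, networking
CLOUD = 120_000.0        # monthly rent for the same capacity in a public cloud
METAL_OPEX = 20_000.0    # monthly power, cage rental, bandwidth

if __name__ == "__main__":
    print(payback_months(CAPEX, CLOUD, METAL_OPEX))    # months to break even
    print(lifetime_savings(CAPEX, CLOUD, METAL_OPEX))  # net savings over 48 months
```

With these inputs the hardware pays for itself in three months, and the remaining 45 months of its depreciation window are savings, which is the shape of the argument Jake makes for going to metal.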

Debt Financing

“Infra startups raising debt is a tool that people don't utilize enough…” (B, 20:19)
Hardware and capital costs often financed with debt, secured against servers, to optimize operational leverage versus expensive venture equity.

Notable Quote:

“If you look at just token use right now or compute use...those things are blowing up massively over time...You can get a lot of almost back of the napkin, balance sheet margin, whatever you want to call it, to kind of make those experiences solid by building your own metal.”
– Jake Cooper (C), 13:42

4. Agent-Native Workloads: What Agents “Want” and Infrastructure Implications

Requirements for Agents

Agents need all the same organizational primitives as humans—version control, feature flags, observability, I/O, orchestration—just at “a thousand X scale.” (C, 25:36)
CLI design: Provide as many flags and args as possible; “you hand it to an agent...‘this is excellent’...so many handles…” (C, 28:36)
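
As a rough illustration of the "many handles" idea, here is a minimal sketch of an agent-friendly command line. The `deployctl` name and every flag are hypothetical and are not Railway's CLI; the point is only that explicit flags, a dry-run mode, and machine-readable output give an agent more to hold on to than an interactive prompt does.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical deploy CLI in which every option is an explicit flag."""
    parser = argparse.ArgumentParser(prog="deployctl",
                                     description="Hypothetical deploy CLI (illustration only)")
    parser.add_argument("--service", required=True, help="name of the service to deploy")
    parser.add_argument("--environment", choices=["staging", "production"], default="staging")
    parser.add_argument("--replicas", type=int, default=1, help="number of instances")
    parser.add_argument("--dry-run", action="store_true", help="show the plan without applying it")
    parser.add_argument("--output", choices=["text", "json"], default="text",
                        help="json is easier for an agent to parse")
    return parser


def plan(argv: list[str]) -> str:
    """Parse the arguments and return the deployment plan as text or JSON."""
    args = build_parser().parse_args(argv)
    result = {
        "service": args.service,
        "environment": args.environment,
        "replicas": args.replicas,
        "dry_run": args.dry_run,
    }
    if args.output == "json":
        return json.dumps(result, sort_keys=True)
    suffix = " (dry run)" if args.dry_run else ""
    return f"deploy {args.service} to {args.environment} x{args.replicas}{suffix}"


if __name__ == "__main__":
    print(plan(["--service", "api", "--replicas", "3", "--dry-run", "--output", "json"]))
```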

The “Canvas” and CLI Paradigm

The web-based Railway “canvas” was designed for humans—now the CLI (with deep hooks) is more important for agent workflows (B, 31:43).
Canvas as Output, not Input: Moving toward serving as an observability/approval layer and “anchor for your context” in large, agent-driven orgs.

Progressive Rollouts & Testing

Emphasis on making safe, progressive changes (e.g., “non-deterministic version control” and “shadow traffic”) core to the platform.
Cites Meta running “10,000 versions of meta” as aspirational level of sophistication.
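
One way to picture a progressive rollout is the ordering Jake describes in the episode: changes reach the users with the smallest impact zone first and roll up from there. The sketch below assumes a hypothetical per-tenant `impact` score and a caller-supplied `apply_and_check` callback; neither is a Railway API.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Tenant:
    name: str
    impact: int  # hypothetical "blast radius" score; higher means costlier to break


def progressive_rollout(tenants: Iterable[Tenant],
                        apply_and_check: Callable[[Tenant], bool],
                        stages: tuple[int, ...] = (10, 50, 100)) -> list[str]:
    """Apply a change in percentage stages, lowest-impact tenants first.

    `apply_and_check` applies the change to one tenant and returns True if
    the tenant is still healthy afterwards. The rollout halts at the first
    failure and returns the names patched successfully up to that point.
    """
    ordered = sorted(tenants, key=lambda t: t.impact)
    patched: list[str] = []
    for percent in stages:
        # Number of tenants that should have the change once this stage completes.
        cutoff = math.ceil(len(ordered) * percent / 100)
        for tenant in ordered[len(patched):cutoff]:
            if not apply_and_check(tenant):
                return patched  # stop before higher-impact tenants are touched
            patched.append(tenant.name)
    return patched


if __name__ == "__main__":
    fleet = [Tenant("bank", 100), Tenant("hobby-project", 1), Tenant("startup", 5)]
    # Simulate a patch that breaks the mid-sized tenant.
    print(progressive_rollout(fleet, lambda t: t.name != "startup"))
```

In the simulated run the hobby project receives the broken patch, the failure is caught on the startup, and the highest-impact tenant is never touched.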

Notable Moment:

“You need the primitives and the workflows and the experience...so that you can fork any point at any service at any point in time.”
– Jake Cooper (C), 42:39

5. Incident Response and Internal Tooling

Central Station: Railway’s in-house context aggregator, clustering all user feedback and incidents.

“Every piece of feedback, every piece of customer support, every single thing like that gets aggregated into what we call, like, clusters.” (C, 35:15)

Transparency and Over-Disclosure: “What's the honorable thing to go in and do? It's like, well, you notify people, you know, to the widest degree at which they may have been, you know, affected or there was an issue…” (C, 39:45)
Internal Testing: Progressive rollouts and extensive internal dogfooding precede public features.

6. The “Agentic” Software Development Lifecycle

Code Review and SDLC Transformation: Anticipates pull requests being replaced by prompt requests (“the PR is dying…code review is also kind of dying”) (B, 70:57).
AISRE (AI Site Reliability Engineering): Jake is skeptical of indiscriminate auto-remediation; agents need proven safe primitives first, or “it's going to nuke your production database” (C, 45:10).

The “Holy Trinity”: Spec, Code, Tests

Future development: clear specs, tests, and code in continual balance and fuzzily reconciled by agents and tools (C, 46:36).

7. Product Evolution & Differentiation

Serverless Redefined: Railway blurs lines between classic serverless and full VMs: “The ability to run stateful, long-running...workflows, executions, whatever.” (C, 51:31)
Heroku as Precedent: With Heroku sunsetting, Railway is absorbing migrations, but it diverges ambitiously in its core primitives and focuses on agentic infrastructure rather than being “just the new Heroku.”

8. Technical Deep Dives: Temporal, Language Ecosystem, and Build Systems

Temporal/Cadence: Powerful, but brittle at scale; demands full-system modeling in the developer’s head, and mistakes cause non-determinism.
“When it works really, really well, it works like super, super well. Right. But then you run into a spot where…you could put yourself in a spot where you would cause issues…” (C, 60:28)
Language Choices: TypeScript, Rust, Go; C for eBPF/kernel extensions.
Railpack/Nixpacks: Railway's dependency analysis/build engines; regrets around Nix and challenges with content-addressed, versioned binary stacks at fleet scale.

9. Founder Journey: Going Against the Consensus

Jake eschewed the accelerator/co-founder path: “You just have to think about all these things and be obsessed with all of these things… every layer of the stack.”
Advocates deep focus (“obsession”), relentless writing and planning, and adapting to the moment—sometimes disconnect for clarity, sometimes “work sun up to sundown.”
“Most advice is to be digested and to be thrown out the window. And if it's helpful, it'll come back.” (C, 83:46)

Memorable Quotes & Moments

On Building Infra:

“Anything is figureoutable. Right. Like you'll just, you'll just figure it out, you know.” – C, 03:57

On Scaling Organizations:

“You expand your company...then you almost compact it, smooth out those things. So the experience is like, really, really stellar.” – C, 74:11

On Software Evolution:

“If you’re writing code by hand, you’re doing this wrong.” – C, 66:10

On Agentic Infrastructure:

“You need something massively better than what currently existed, right?...Because you need to do thousands of these things, what assumptions change?” – C, 25:36

Key Timestamps

01:09 – What is Railway? Company mission, per Jake Cooper
06:40 – Growth inflection stories and business transitions
11:24 – Prioritizing AI and agentic infrastructure
13:29 – Railway’s infra architecture; avoiding Kubernetes
15:42 – Data center build out and payback math
20:19 – Debt/financing for infra startups
25:36 – What do agents “want” from infra? Scaling primitives
28:36 – CLI surface area for agents
31:43 – CLI vs Canvas, interface design for agent-centric workflows
35:15 – Central Station: Internal feedback and incident management
45:10 – SRE, incident response, agentic auto-remediation
51:31 – Modern serverless, long-running computing, “fluid compute”
53:34 – Heroku’s decline, migration to Railway
58:42 – Technical: Uber’s Cadence/Temporal, workflow complexity
68:12 – Roadmap acceleration: Agent-enabled speed
70:57 – The death of the PR, rise of prompt requests & agent-native SDLC
74:39 – Feature flagging and safe rollouts
76:15 – Cattle vs pets, stateful infra, and new snapshotting paradigms
79:42 – Solo founder psychology and company-building against consensus
83:46 – Writing, advice, and nonstandard routines

Conclusion & Takeaways

Jake Cooper’s vision for Railway is deeply infused with an “agentic” mental model—building for a world where bots and humans alike need instant, safe, contextual deployments at hyper-scale. The company culture and infra philosophy are notably pragmatic, hands-on, and grounded in technical depth, relentless focus, and willingness to do “whatever is required” at any layer of the stack. Listeners will leave with a sense of the fast-evolving agent-native cloud, and the kinds of infrastructure tooling, leadership, and culture needed to stay ahead.

Links and Resources

For listeners building the future of AI and infrastructure, this episode provides a definitive, insider view on what it takes to engineer and operate the coming wave of agent-native cloud platforms — and the ethos required to survive, thrive, and lead.

Transcript

[00:04]
A
Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, founder of ChronoLabs. I'm joined by swyx, editor of Latent Space.
[00:11]
B
Hey, hey, hey. And today we're in the studio with Jake Cooper, conductor of Railway.
[00:16]
C
At Railway. Choo choo. Choo choo.
[00:17]
B
Do you actually have that, like, anywhere on, like, your business?
[00:20]
C
Well, we, like, we roughly call, like, people. Well, I don't have a business card. We're not. They're not that big yet. At some point, I will. I got handed a nice business card from the Super Micro folks, and I was like, damn, that's actually like, pretty official.
[00:30]
B
They're coming back, business cards.
[00:32]
C
Yeah, they're cool, they're hip, they're jiggy. But yeah, the whole conductor thing, like, we call some of our volunteer moderators conductors, you know. Yeah. So it's a good one. It's a good one. Like, we're trying to figure out what we want to call each other internally, and there's, like, varying levels of thought. Some people are like, oh, it's super cringe. Like, just don't, like, you don't need a name for, like, you know, people internally. And some people are like, oh, yeah, we want to call each other, like, this thing or whatever. I was like, we still don't have a really good one. You know, we've got, like, new railcruits. We've got like, trainiacs. We've got like, nothing's like, really?
[01:00]
B
I like trainiac training.
[01:01]
A
Sounds good.
[01:02]
C
Sounds. Yeah.
[01:02]
B
Railwayient. Okay, so. Well, for those who don't know, what is Railway? Let's give people a crisp definition up front.
[01:09]
C
Yeah, Railway is the easiest way to ship anything. You just go to the canvas or you talk with Claude and you say, deploy Postgres instance, deploy my GitHub repository, run this code, et cetera. Right. And you'll just be up and away to the races. Right?
[01:22]
B
Yeah. You got a nice animation on the landing page.
[01:24]
C
Oh, well, thank you. None of my work, by the way. They don't let me touch any of the design stuff anymore. But yeah, we want to make it trivially easy not just for you to deploy things, but for you to almost evolve applications over time. We believe that most of the tooling right now is kind of like, stacked up, like you're stacking entropy on top of entropy on top of entropy. Right. So you have like Docker and Kube and then like, Ansible scripts and all of these other things. Right. And if we can kind of like, version all of your software for you and keep track of all of the changes, then we can make it actually trivial for you to clone environments, you know, fork into a parallel universe, get copies of like production data, get copies of like any of your services, make those changes, validate those changes, collapse it in without kind of having to just like reproduce everything across a, you know, a staging environment or all of those other things. Right? Yeah.
[02:08]
B
Amazing. One thing, I was looking at your background, right, Like Bloomberg, Uber, there's nothing immediately that stands out to me as like, okay, this guy's going to found like the next great platform as a service. What prepared you for Railway?
[02:21]
C
It's almost like a curiosity to just like ever go deeper, right? And so like, you know, started out on like front end stuff, you know, like working on the like Wolfram, like web Mathematica and porting it over there and then you know, briefly moving to Bloomberg and then moving towards Uber and like distributed systems and kind of like taking all the JUMP bikes kind of systems and moving them over to a distributed system built on top of Cadence. Like Temporal. Yeah, the pre-Temporal Temporal, which by the way, I'm
[02:47]
B
happy to talk about pros and cons.
[02:49]
C
Yeah. I think like, it's like, let's do the Railway story. And so like, it's just been a continual step of like, I, I want this experience, whether it is like walking up to like a bike and just unlocking it and like having it be like frictionless to like work or whatever, and then like necessitating the like, depth required to go in and make that happen, right? Like a lot of the work that I do and a lot of the team does is like, it's all in service of that experience, right? And like, we fundamentally don't care, like how deep we have to go, whatever, like we will swim to the bottom of the swimming pool to go and get the experience. Right? And I think that's what a lot of, you know, kind of the trajectory was, right? And so it's not like I have a physics PhD or whatever. I did like an EECS degree, you know, it's just, it's always been about just trying to figure out that next step of like, how do we get there, right? And that's like what's led to, you know, starting Railway for that experience and then like moving all the way to bare metal data centers, right. Like, you know, I was adding patches to the kernel this week, right, Just to like get the experience there because I'm like, I see it and like, how much better it can be. Right.
[03:50]
B
You added patches to the Linux kernel this week.
[03:51]
C
Yeah, well, not upstream railpack.
[03:55]
B
No, this is different. This is the OS on top of railpack.
[03:57]
C
Yeah, no, this is like, this is that actual kernel like patches. But it's, it's always literally just what do we have to do to get that experience and just figure it out? Right. Like anything is figureoutable. Right. Like you'll just, you'll just figure it out, you know.
[04:10]
A
So would you send the patch upstream or is it just because like it doesn't fit?
[04:14]
C
Maybe. It's like, we have to work out the experience for us internally. It has to do a lot with the, like, storage layer that we're building for some of the agentic stuff. So maybe it'll be useful to people upstream, but it's deeply useful for us internally.
[04:29]
A
I mean, you mentioned open source before, so I'm just kind of curious about how you think about starting from open source and then coding agents let you do a lot more from forks of it.
[04:39]
C
I think it's funny because I think GitHub's original sin is that it's almost a series of broken pointers. It's like you have essentially this thing and then you clone it and then, okay, great, I've just lost that whole upstream. Right. How do we make it trivial for people to modify really, really small pieces of it? Right. And you think of Git almost in this discrete sense of I've either made a change and I've merged upstream or I haven't. Right. What would it look like if it was like percentage based or a little bit more non-deterministic or anything else like that? More of like a stream of changes that you kind of like traversed as a user, more as kind of like a percentage of this is rolled out in general and it's been rolled all the way up. Right. You know, we have the open source, like, kickback program and allowing you to deploy those templates because we almost want to make it trivial for people to like go and version these shards. Over time it solves like a really, really large problem in terms of authentication, authorization, security. Like you know, NPM has that thing where you can almost define, hey, don't take any new packages or whatever. Like the ideal end state is actually like you should roll out progressively to the users who have the minimum impact zone for any of these things and just continually roll up. Right. Like JPMorgan or something else like that should probably be the last one on the patch line for that. Right. For all of our sakes. Right. Like because we have all of our, you know, money, all of those other things. It's okay if like, Johnny Vibe coder gets like a broken patch or something else like that, because ultimately there's so much entropy in the system that you do have to. You do have to roll. Like rubber has to meet road at some point. Like, you have to test at varying levels. Right. So, yeah, a little diversion from wherever we started.
[06:13]
B
But, you know, so I just wanted to, like, pull up this glorious chart you say, which is basically your usage
[06:21]
C
or number of daily signups, I think. Daily signups, yeah.
[06:25]
A
Yeah.
[06:25]
B
So you started six years ago and. Yeah, like a slow grind.
[06:30]
C
Slow grind, yeah.
[06:31]
B
And now, now obviously you're on a rocket ship. You say, don't doubt your fight and don't quit. But, like, maybe if you want to pick out, like, certain points that were, like, sort of key inflections of the company, that might be fun.
[06:40]
C
Oh, yeah, yeah. Well, I mean, at the start, it's basically like, how do you get your first hundred users? Like hell or high water, right? And so, like, starting in, you know, we had a website and we had a support link, and the support link was the Discord Channel. And you just showed up there. And I had notifications on. I had two monitors. I had the monitor I was working on and then I had the other monitor. And if anybody came in, I was like, oh, hey, how's it going? Like, you know, it's like. And it was like, super rare or whatever, so trying to get those initial, like, first hundred users to, like, actually kind of come back to it. Um, and that's, I think, where you can kind of like, see the really, like in between January 2021 and 2022, like, probably the middle, like, they're kind of. Right. And that's like the, the start. Um, and then you ultimately end up building a consultancy factory of, like, users wanted all of these things in general. And so you kind of have to go back to the board a little bit and be like, well, what is the actual product offering that I want to build on top of these? And I think, like, incidentally, it's funny, like, I think VCs really want, like, charts that, like, always look like this or whatever. Right. But I think in reality you actually don't want charts that look like that. Most companies, I think, or at least for us, there's been periods of, like, expansion of, like, okay, we're going to go and add these features to, like, going into these use cases. And then there's been periods of, like, compaction where we're saying, like, okay, how do we have. If the experience we have is really, really good, how do we make it significantly better? Right. Like, maybe we're even stripping out features that don't fit our ICP anymore. How do we go in and do that? And I think throughout this whole chart you can see a lot of those things. 
The boom in 2022-2023 is, like, we had a free tier and everybody under the sun was using it and all those other things.
[08:10]
B
A lot of Reddit bots and stuff, discord bots.
[08:13]
C
And I think there's a thing that's really, really tough to teach people or tell people about is like when you build an open product on the Internet where anybody can sign up. The Internet is a horrible place that has like so many things. Like if I told you about my PC. Yeah. Like we got, yeah, Crypto miners. You got like all these other things, right. And so you kind of go through these periods of like, well, how do I reach as many people as possible? And then like, how do I fit in? Exactly. The use case for the people who are really, really going to matter and are going to be really, really excited about specifically this thing. Right. And we go back and forth internally and then there's like a, what is that? A two year period of like making the actual business work in general. Right? So like free tier era, losing what I think half a million dollars a month and like, you know, we're making
[09:00]
B
on like a 20 million bank account.
[09:02]
C
Yeah, yeah, like a 20 million bank account with like, I don't know, like maybe $50,000 a month in revenue or something else like that is horrible business. I don't know why, but anyways, you have to kind of go through and be like, cool. Like we have an experience that people love in general, but like, the business has to work. Right? And I think there's, there's like, I guess two schools of thoughts is you can, you can continually run the horrible business all the way up in general and have bad margins, or you can actually go and go back and kind of make it work. Right. And for us, you know, we've always really wanted to have like a super lean team, right? So we're 35 people right now. You know, it's very, very small. We have like what, 3 million 200.
[09:38]
B
3 million.
[09:39]
C
Yeah, yeah, because we're adding like 100,000 users a week right now. Right? So it's like, it's growing really fast. But we've always wanted to have a really, really lean team. Like, we don't want to just like add headcount for the sake of headcount, just like throw bodies at these problems. We want to build, like, systems. Right. It's really, really hard to build systems when you're, you're kind of in that expansion phase because you're just adding stuff to the, to the system in general because people are asking for it or things are breaking in general. Right. We basically were like, all right, like, you know, we're going to, we're going to cut it for now. Like, we're just, we can't support this. Like these free users that, like, we want, like we want to reach as many people as possible because we believe that, you know, software is this really, really important thing where if you can kind of like create something, it's become really difficult to create things in a physical world. So it's really important to make it really easy for people to build things in a virtual world so that people have access to creation. Right. And so we want to reach as many people as possible, but there's kind of like legs on that journey. So we basically had to kind of close off the free, free kind of users for a little while, rebuild the business, make sure that it worked in general. Right. And then I think you can kind of like see the building of that in general. Right. And then I think you see kind of some divots in those charts. Right. Like, if you actually follow between, I think 2025 and 2026, it's either summer or winter. That's basically it.
[10:47]
B
Right.
[10:47]
C
Like, either people go on holidays with their family or they go on.
[10:50]
B
Oh, it affects that much.
[10:51]
C
Yeah, yeah, yeah. Well, because it's like, it's kind of B2C. It's kind of B2B in general. Right. And so you have a lot of these users where, like, they're shipping constantly and then, you know, they'll kind of like stop or whatever. Right. And so maybe for summer or like, maybe like our, our activation curve is like, now we see a lot of people like activating in the weekday. Right, right. Because we have a lot more like business users in general, so that gets a lot less sheer, so to speak. Right. And it kind of like smooths out over time, you know.
[11:17]
B
Yeah. Is there any point at which you started prioritizing AI developments or agent development?
[11:25]
C
I think like, we've. So we've prioritized almost like agentic as like a top of funnel thing. And probably over the last like six months, we've probably deeply prioritized like agentic as a, as a mechanism to go and build and deploy things just because we believe fundamentally like the, the curve is so sheer and like that is the way that people are going to go and build and deploy software. And it almost like fundamentally doesn't matter if, like, this is dot-com or not because we're all on the Internet now anyways, right? And so if agents are going to go and deploy a bunch of things and we hit an inference wall at some point, then like at some point we will go in and fix those problems. But like that will be kind of the dominant species over the next like 10 years. We've moved from assembly to C to JavaScript to now like words, right? And you're going to need to be able to close that. But that's, that's where it goes, you know.
[12:13]
B
So when you say this is dot-com, do you mean like buying the domain or.
[12:17]
C
No, no, no, no, no. I mean like actually just like, you know, they had a bunch of run up in the dot com era for companies because they were like the Internet is really, really important. And then you hit kind of like bottlenecks, fundamental laws of physics, math didn't work, all of those other things and everybody kind of like, you know, went back down to the earth, right? But at the end of the day it didn't matter because the Internet is like so, so impactful for our lives that if you operate on a long enough time horizon that you should be like, you should just build these things anyways because you can see where that's going, right? And that's where I fundamentally believe a lot of the agent stuff is.
[12:46]
B
Right?
[12:47]
C
And we can talk about a little bit of it later, but you're going to get to a point where you're running thousands of these agents in parallel, right? Like one, what's the inference cost for that? What's the compute cost? How do you make that efficient? All of those other things. But like two, how do you go and coordinate all this stuff? Like we had it, we have, we have issues coordinating humans in general, right. We don't even have good tooling for that. And now we're starting to figure out, it's like, oh, how do you get agents to coordinate? How do you go in and get them to be able to safely version changes or for them to know when to put their hand up to get somebody to intervene, Otherwise it just becomes an interrupt factory. That's crazy.
[13:21]
B
Maybe we'll go right on the technical side of things. What are the core infrastructure or architectural beliefs of railway that allow you to do what you do.
[13:30]
C
Yeah. I think the primitives matter a lot for us. A lot. We need to be able to do network, compute, and storage, and orchestration all kind of around it. You kind of need control over a lot of those things. Like we've talked a lot about, like how we don't really use Kube, like Kubernetes, because we want the higher order of control to be able to like go in and place workloads in very, very specific places. Right. The reason for that is like, you know, it's kind of the thing we talked about previously. But like you have to be very, very efficient with these agents, like memory reuse, all of those other things or you're going to massively, massively blow up your cost structure. Right. I think also incidentally, being able to rack and stack your own servers and build your own metal, it unlocks a level of performance, one, but two, cost, where you can say, oh, those experiences that you want to offer where you're running a thousand agents in parallel are not massively cost prohibitive. Right. Because if you look at just token use right now or compute use or anything else like that, those things are blowing up massively over time. Those things are going to have to get a lot and a lot more efficient. You can get a lot of almost back of the napkin, balance sheet margin, whatever you want to call it, to kind of make those experiences solid by building your own metal. Right. So kind of to the earlier point of like, we've always tried to go a little bit deeper every time to make that experience. It's all in the service of offering that differentiated experience to as many people as like humanly possible, you know? Yeah.
[14:52]
B
You have a data center in Singapore?
[14:53]
C
Yeah. So we have two in every other region now. Singapore, we're adding a second one in Q3, so. Yep.
[14:59]
B
So like, what's it like? I mean, I've never built a data center.
[15:02]
C
Yeah, well, we'll have to like go to one or whatever.
[15:04]
B
Go to like Equinox and say, hey, I want some.
[15:06]
C
Yeah. So. So, yeah, I mean, I can run Equinix. Equinix? Yeah, Equinox.
[15:10]
B
I mean, Equinox.
[15:13]
C
I mean you can put a. You can put a data center in the steam room and get nice and hot or whatever. But yeah, yeah, you basically just go and you say, hey, listen, I want power and I want a cage. And they're like, great, here, this is what it's going to be. And then you rent the cage for a period of time and then you have to fill the cage with racks, servers and then hook up Internet to it. Right, that's realistically.
[15:36]
B
And then you handle everything else right?
[15:38]
C
Yeah, you just handle everything else right.
[15:39]
B
And like, what's the math versus obviously the clouds.
[15:42]
C
Yeah. Our payback period when we go to metal, if we rented in the cloud, our payback period is about three months. It's crazy. It's nuts. Yeah. And that's like four years worth of like depreciated hardware. Right. And so I think it's like you're going to see a lot of this almost like compute crunch, so to speak, because a lot of the hyperscalers are buying up a lot of stuff. Like we're working directly with OEMs and like resellers and like directly with people who are like building these machines, like Supermicro, Dell, all of those other things to go in and get these things working. But you know, upstream, there's like a bunch of supply stuff. It was funny because when we raised our last round in between basically deploying the capital for the servers and actually I think even now the amount of money that we've raised is less than the amount of money that we have in the bank. Plus what the value of the servers are because the servers have actually appreciated in value because RAM has gone up. In general, it's nuts just in terms of how valuable hardware and all of this stuff is. Right. If you look at especially a lot of like hyperscalers, like what they deployed like $80 billion of like capital expenditures like this year and like into next, it's going to be like more in general, right. There's these massive, massive scale like infrastructure buildouts. And you can look at that by like, wow, that's crazy that they're spending like way more than the Manhattan Project. But like again, if you go back to every person is going to run, you know, dozens, hundreds, whatever of agents in parallel. Like you should spend more than you have. You have like, you have no conceptual idea of like how much compute is required to go in and make that experience happen. Even if you're deeply efficient, even if you're sharing resources, even if you're doing all of these things correctly. And that doesn't even count inference.
[17:23]
A
How do you plan the build-out? I mean, the growth chart is so vertical that, you know, are you usually at 100% utilization as soon as you're live with these racks, or how far ahead are you?
[17:33]
C
Yeah, so we still maintain cloud presence for bursting, essentially. And so what we can do is, we work with AWS and GCP and a few of those other clouds, we can just rent. And then the moment we get space or power or whatever, you almost just compact those off the cloud. Because we started on the clouds and then we built a system to allow us to migrate to our own metal. And so there's nothing that says you can't just continually do that again, which is exactly what we do right now. Right. And so we never want to be in a spot where essentially we are compute constrained. Right. And at the start of the year, we actually got to a point where we were compute constrained because the one upstream provider that we were working with wasn't able to give us quota at the rate that we needed. And the hardware was slower. Right. And so we had to do a bunch of different stuff. I spent a weekend rebuilding our entire network overlay, essentially, so that we could straddle five different clouds. Right, yeah. Oracle, AWS, ourselves, GCP, and one other one. Right. And we can do more than that now. Right. But, you know, we got into a spot where we were just trying to pack instances tight because we couldn't get the amount of compute that we needed. And it was really unfortunate because as a result we had a few reliability things, which are now past us. But it was all a result of this. There was a tweet that I made where I got in trouble because I was trying to point it out, but I accidentally caught the Supabase folks in the crossfire. But the tweet was about how it's really, really difficult, and it's going to become more and more difficult, to acquire compute at the rate that these models need to acquire compute. Right. And we got bit by it, which is, you know, fair and reasonable in the karma scheme of me, you know, trying to point it out.
[19:16]
A
So, yeah, how do you think about pricing knowing that you might not have your own metal available at all times? Like, are you pricing assuming that you'll need to, like, pay yourself extra margins if you had to end up going in the cloud?
[19:27]
C
Because we've built out our metal data centers, our margins on metal are quite high, like 70%. And so we can actually deeply subsidize the cloud business if we want to scale at a reasonable rate. And so we have a few different, like, it's actually very fun from an operations perspective because you have a few different levers on how you can go and scale it. You have the metal, which actually makes your margins. You have the cloud burst, et cetera. You have debt you can use to buy servers in general. So it's a very interesting operational problem to basically say, okay, we have this much cash. Oh, and then you have obviously venture capital you can raise on top of it. You have this much cash. How much money should we raise? How quickly can we go and deploy it, et cetera. If we can scale revenues basically as quickly as we can scale compute, provided we continue to make it trivially easy for people to go and build and deploy, then the faster you can close this loop and the more operationally excellent you are with the capital, just the faster your business grows. It's just basically a straight linear deployment rate on some of that stuff.
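The two levers described here, bursting to rented cloud when metal is full and compacting back onto metal when new capacity lands, can be sketched as a toy placement model in Python. This is only an illustration under stated assumptions: the `Pool` class, the capacity units, and the negative cloud margin are all hypothetical, and only the 70% metal margin comes from the conversation. The blended margin is weighted by compute units, which assumes every unit sells at the same price.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    name: str
    capacity: int    # abstract compute units
    margin: float    # gross margin earned on work served from this pool
    workloads: dict = field(default_factory=dict)  # workload name -> size

    @property
    def used(self) -> int:
        return sum(self.workloads.values())

    @property
    def free(self) -> int:
        return self.capacity - self.used


def place(workload: str, size: int, pools: list) -> str:
    """Put a workload on the first pool, in preference order, that has room."""
    for pool in pools:
        if pool.free >= size:
            pool.workloads[workload] = size
            return pool.name
    raise RuntimeError("compute constrained: no pool has room")


def compact(metal: Pool, clouds: list) -> list:
    """Move rented workloads onto owned metal wherever they now fit."""
    moved = []
    for cloud in clouds:
        for workload, size in list(cloud.workloads.items()):
            if metal.free >= size:
                metal.workloads[workload] = cloud.workloads.pop(workload)
                moved.append(workload)
    return moved


def blended_margin(pools: list) -> float:
    """Usage-weighted margin across all pools."""
    total = sum(p.used for p in pools)
    return sum(p.used * p.margin for p in pools) / total if total else 0.0


metal = Pool("metal", capacity=10, margin=0.70)          # 70% is from the interview
burst = Pool("cloud-burst", capacity=100, margin=-0.20)  # assumed: bursting loses money
pools = [metal, burst]                                   # metal preferred, cloud is overflow

first = place("api", 6, pools)       # fits on metal
second = place("worker", 6, pools)   # metal is full, so it bursts to the cloud
before = blended_margin(pools)

metal.capacity += 10                 # new racks come online
moved = compact(metal, [burst])      # rented workloads migrate back
after = blended_margin(pools)

print(first, second, moved, round(before, 2), round(after, 2))
```

In this toy run the blended margin recovers once the burst workload is compacted onto metal, which is the sense in which metal margins can subsidize the cloud footprint while capacity catches up.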
[20:20]
B
I think infra startups raising debt is a tool that people don't utilize enough or know enough about.
[20:27]
C
Oh my God.
[20:28]
B
What can you tell us about that? Yeah, I mean, is it secured against your CPUs or what?
[20:33]
C
Yeah, it's just, it's just secured against, against our like hardware.
[20:36]
B
Yeah.
[20:36]
C
Right.
[20:38]
B
And what rates do you get? Like, who are the lenders?
[20:40]
C
We just pay like prime at whatever it is. Plus like, oh, like we, we can refinance any of the debt as it goes down. Like the terms are pretty good from that perspective. I think like the, the unfortunate thing is like Twitter has no nuance or whatever. So they're like venture debt bad or whatever. It's like, well, no, like as with all things, like, it's not venture debt. Yeah, yeah. Or, or yeah, it's data center debt. Right. But like, yeah, I think there's, there's specific tools in specific areas where you can be very, very deliberate about not just using one specific tool as a hammer. Like venture capital is a hammer for everything. You just have to kind of like go out and explore it and figure out how it kind of like, yeah,
[21:13]
B
VC is the most expensive financing you can get.
[21:15]
C
Yeah, yeah, yeah. I think incidentally, I think also people think about VC completely wrong from a raising capital perspective.
[21:20]
B
Okay, tell us how this is wrong.
[21:22]
C
Yeah, well, I think most people are like, okay, how do I raise as much money as possible from, like, whoever is probably the best I can get at that point in time. And I think that's kind of close to right. But I think what you should be doing, or at least what we've tried to go in and do, is try and figure out what almost unfair advantage you can buy with that equity, because it's the cheapest equity, or it's the most expensive kind of equity, you're going to give away at that point in time, assuming your company's going to get better and better and better. And how do you use that to, like, go in and work with somebody who is stellar and who's going to go in and complement you? Right. Like, you know. Yeah, Like Series eight.
[21:51]
B
Blocky.
[21:52]
C
Yeah. Right. Like, you know, great. I've never started a company race. Race Malaki. He's got good advice. I can text him all the time, he's really fast, et cetera. Like, awesome. Right. Then you kind of move on and you, you know, worked with John and Jordan at Unusual. Right. And they were like, yeah, you roughly know what you're doing in building a product. Like, we're just going to mostly leave you alone and be totally available for advice. Amazing. Awesome. Get to Series A, business is a total, you know, operational tire fire because we just don't know how to scale a business. Go and work with Erica, and Jordan's over at Redpoint. So bonus, we get to work with them continually. And then now, moving into, raised from TQ and FPV, we're moving into the enterprise now and feeding into air. So every step of the way, we've kind of moved towards who can we partner with at this specific time, who's going to help us unlock that next section of the journey. Because guess what? I don't know enterprise sales. I can roughly eyeball it and be like, yeah, as an engineer, I think these are the kind of features that we're roughly going to need. And we have some wonderful people who are going to help us internally. But you really want to work with those people who, at the boardroom dynamic level, are going to be like, oh, yeah, we're all aligned. And that's obviously what we want to go in and do. And we can spend our time basically saying, how do we win this, versus, like, bickering about strategy. Right?
[23:03]
B
No, I just had to pull up some beautiful data center charts.
[23:06]
C
Yeah.
[23:07]
B
I feel like you've done others. I couldn't. I just couldn't find them.
[23:10]
C
Well, these are good. I mean, like, they all kind of look the same. Like the servers in a rack box. Yeah, exactly. This is our box. Like, do you want to see more racks? It's like, oh, yeah. It's like, you know, I want the
[23:20]
B
J. Cooper signature edition.
[23:21]
C
Yeah, it's. We have. We actually have. We have plans internally. Yeah. So it'll be fun. We've got a few different promos that we're going to do and like stunts for. For the year, so those will be fun.
[23:31]
B
Yeah.
[23:32]
A
You had a tweet about data centers in space just before we wrap this section. Yes.
[23:36]
B
Why no data centers in space? Why you hate so much?
[23:40]
C
Okay, so. So it's not no data centers in space. Because actually, I think, like, like, my hot take is like, I think this is solvable. I've just never seen anybody solve it. Right, because you need to like.
[23:49]
B
No, no, no. You said, how are you going to dissipate that much heat in a vacuum? You're making a physics claim.
[23:56]
C
Yeah, yeah. Well, because. Because I haven't seen anybody like, prove how you're gonna go and dissipate that much heat in a vacuum. Right. Like, it doesn't mean that it's not possible. It just means that, like, nobody's kind of. Pardon.
[24:06]
B
Astrophage. The Martian thing. Okay, you're very lost.
[24:10]
C
Yeah, yeah, that's fair. But, yeah, I don't. I mean, it could work in general. Right. But I think a lot of people, and I think incidentally, this is probably what you have to sort of do is like, they're putting almost the cart before the horses. Like, oh, yeah, we're going to put data centers in space. It's like, okay, but how? It's like, well, we have some period of time to basically figure it out, right? It's like, it's like, you know, in the Martian where they're like, oh, how are we going to, like, intercept?
[24:29]
B
Yeah.
[24:30]
C
Oh, okay, right. It's like, how are we going to do that? It's like, well, we'll figure it out. We have so however long to go in and figure that out, you know, so.
[24:37]
B
Yeah, yeah. Making a bet on human invention is weird because you just have to blind trust that it can be solved.
[24:43]
C
But 100%, right.
[24:44]
B
I feel like physics and there are some first principles, bounds that you can put on. Maybe not.
[24:50]
C
Yeah, I know, right.
[24:51]
B
Maybe you're asking to travel time here or break some fundamental thermodynamic law.
[24:56]
C
Yeah. And I don't know how VCs do this incidentally, too, because it's like, how do you know what's basically not possible and is a grift versus is possible, but sounds completely insane. Right? And you're like, oh, cool, we're going to put data centers in space. It's like, okay, coin flip as to whether that's one or the other. You just don't know, I guess. And I guess you'll know in like 10 years. Cool. That's one cycle.
[25:22]
B
Okay. Okay. Yeah. So moving back to agents.
[25:24]
C
Yeah.
[25:25]
B
I think the branching that you do, the fast spin-up and orchestration, it's kind of like the pre-work that happens to be exactly what agents want. What do agents want differently than humans?
[25:37]
C
What do agents want differently than humans? I think they want the ability to version things. So it's not actually that different. There's just almost slight deviations in terms of how it materializes. So agents want a way to be able to go in and test changes incrementally. We have feature flags as like engineers or whatever, right? Like is there any reason why they can't just use feature flags, right? I don't think so. Like, I think there's ways that you can just go in and do that, right? They want version control. Is there ways we can use git or not git? I think that one is like realistically completely up in the air, right. And I do think that something ultimately outside git will emerge in terms of how we're going to go and version a lot of these things over time. They need observability. You need to be able to go in and essentially query what happened at what point in time, which steps failed, traces, logs, metrics, all of those other things they need like network, compute and storage. They need the ability to write files, save files, iterate on files, snapshots, file system, all of those other things, right? And so I think a lot of the stuff that we roughly needed is like very, very kind of in line with a lot of the stuff that agents also need, right? And so like the branching and forking stuff, like it's not different, like we're just moving a thousand times quicker than we used to. And so some of these things look like you really need something massively, massively different, but it's just you need something massively better than what currently existed, right? You need orchestration, you need something massively better than Kube, right? You need networking, you need something probably better than Envoy. And it just goes all the way down the stack essentially in terms of well, if the workload profile doesn't change so much as it gets like massively, massively compressed. Because you need to do thousands of these things, what assumptions change, right? 
Like etcd is going to melt, right? Like you know, you need to replace it with something, right? And then I think you can go all the way down the stack and basically say, okay, well that part has to change and that part has to change and that part has to change. And the interesting thing about the kind of super-exponential curve is that you have to build your systems in such a way where you can rip out those parts at any point in time, because a new bottleneck might emerge because, you know, you start getting really, really good at parallel agents. Right? And then that's kind of where the new bottleneck is. Right. And that breaks a different part of your system. Right. So I think it's very much similar kind of stuff that the humans have needed. You just need it at a thousand-X scale. Right. So like, how do you do code review in the age of the agents? Right. I guess this is more of a.
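The observability requirement in this answer ("query what happened at what point in time, which steps failed") can be sketched as a tiny query over step events. This is a minimal Python illustration, not Railway's API: the `StepEvent` shape, the step names, and the `failed_steps` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepEvent:
    run_id: str   # one agent run
    step: str     # e.g. "build", "migrate", "deploy"
    ts: float     # seconds since the run started
    ok: bool      # did the step succeed?


def failed_steps(events, run_id, since=0.0):
    """Steps of one run that failed at or after `since`, oldest first."""
    matching = (e for e in events
                if e.run_id == run_id and e.ts >= since and not e.ok)
    return [e.step for e in sorted(matching, key=lambda e: e.ts)]


EVENTS = [
    StepEvent("run-1", "build", 1.0, True),
    StepEvent("run-1", "migrate", 4.0, False),
    StepEvent("run-2", "build", 2.0, False),
    StepEvent("run-1", "deploy", 9.0, False),
]

print(failed_steps(EVENTS, "run-1"))             # ['migrate', 'deploy']
print(failed_steps(EVENTS, "run-1", since=5.0))  # ['deploy']
```

The query itself is the same one a human on call would run; the difference argued for in the conversation is that thousands of agents issue it at once, so the store behind it is what has to change.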
[28:00]
B
You do more agents.
[28:01]
A
I don't.
[28:02]
C
Yeah, right. But then, like, who reviews things for CVEs and all of those other things? You can. Yeah, right, okay. And then that's how we hit the inference wall at some point. Right. And you can continually throw agents and agents and agents at that problem. Right. But, you know, I think there's a limit to the amount of agents you can kind of throw at a problem.
[28:26]
A
You started, though, you already had a CLI before it was cool, I guess.
[28:28]
C
How is cool? CLIs have always been cool, by the way.
[28:32]
A
How has the shape of what you're exposing changed, if at all?
[28:36]
C
Yeah. So I think the CLI changes because the way that we think about this is like, how do you give Claude or Codex or Chat or whatever, any of these models, almost like handholds. And a CLI is a single command when you think about it, right? It's like, okay, well you're going to do deploy or whatever, right? You're going to get logs, whatever. Right? Like things that were prohibitively annoying to humans are not actually prohibitively annoying to agents. They're really, really nice. Right? And so if I wanted to hand you a CLI and I said, hey, guess what? The CLI has 40 arguments and 600 flags, you'd be like, wow, that's crazy. Like, I'm never going to use all those things in general. Right? But you hand it to an agent and you say, hey, there's 40 arguments and 600 flags. It'd be like, oh, yeah, this is excellent. You know, like, I have so many handles that I can go in and kind of work on with this. Right? And so I think, incidentally, if you're going to go in and try and expose things for agents over that mechanism, you want to just basically have as many handles as possible where they can get information, query additional dynamic information, and then see how it can close that loop as quickly as possible. Most of the problems right now are actually just how do you close the loop as quickly as possible, where does the agent get stuck, and how can you go and remove that? That's why, incidentally, telemetry is very, very important. Because if you can tell where the agent gets stuck from the CLI and you say, hey, listen, 12% of people are actually getting deviated from the happy path because of this thing, and now I go and add this arg, and that drives it down to 2%, you've massively increased the rate of the loop closing for a lot of people in general, right? So that's kind of the way that we think about not just the CLI, but every point in the dashboard, right? 
Like it is a user journey from I hear about Railway, I go and get something deployed, I get my first green build, whatever, aha moment, I see an endpoint, I see some logs, I see whatever. And then I go in and iterate, right? And that go-in-and-iterate loop is indefinite and infinite until the end of time, right? It's basically like user wants to deploy a new thing, user wants to deploy a new Postgres instance, user wants to change their code, user wants to iterate over time, right? And so you just focus on a lot of those iteration loops and figuring out what's blocking that loop from closing as quickly as possible. One of the things we talk about internally is you never ever, ever want to be waiting on compute anymore. You always want to be waiting on intelligence, right? And if you're waiting on compute, there's a bottleneck that needs to be destroyed there. Because at some point that bottleneck will be so, so, so large that some other workflow will kind of emerge to go in and change a lot of that stuff. And I think incidentally, we've built a really, really awesome product where you can push code and then you build the code and all those other things. That push, pull, whatever kind of loop, I just fundamentally believe it's going to go away. We're going to get to a point where you make a small change in production that changes the version across your entire infrastructure. You're working alongside copy-on-write versions of your database, all of your infrastructure, and then you merge it in and instantaneously it's live, right? Because that's like the holy grail of loops, right? But that push, pull, rebuild thing is a point of friction that we are removing entirely from our loops.
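The telemetry idea in this answer (find the step where sessions fall off the happy path, fix it, watch the rate drop) can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the happy-path steps are made up and are not real Railway CLI commands, and the session data is invented.

```python
from collections import Counter
from typing import Optional

# Hypothetical happy path; these are not real Railway CLI commands.
HAPPY_PATH = ["login", "init", "deploy", "logs"]


def first_stuck_step(session: list) -> Optional[str]:
    """First happy-path step the session never completed, or None."""
    done = set(session)
    for step in HAPPY_PATH:
        if step not in done:
            return step
    return None


def stuck_report(sessions: list) -> dict:
    """Share of all sessions that stalled at each happy-path step."""
    stalled = Counter(first_stuck_step(s) for s in sessions)
    return {step: stalled[step] / len(sessions)
            for step in HAPPY_PATH if stalled[step]}


SESSIONS = [
    ["login", "init", "deploy", "logs"],
    ["login", "init"],
    ["login", "init", "deploy", "logs"],
    ["login"],
]

print(stuck_report(SESSIONS))  # {'init': 0.25, 'deploy': 0.25}
```

Running the same report before and after adding an argument or flag is one way to get the kind of "12% down to 2%" comparison described above.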
[31:43]
B
Yeah, it's incredibly fast, so if anyone hasn't tried it. Yes, that fast feedback is great. My hot take is that Railway was kind of famous for its canvas, which sort of visualizes your infrastructure and lets you manipulate it visually. But that was for humans, and actually now, for the next phase in growth, really, CLI is more important than canvas, which is what you were famous for.
[32:06]
C
Yeah. So I think the canvas is funny because it's actually just a mechanism to show you changes over time. But I think you're totally right in the sense that we have previously used it a lot as an input, and its goal moving forward is actually a lot more like an output. What I mean by that is you would go to the canvas and you'd make some changes and all these other things, whatever, and you see them and your agents or your infrastructure would evolve over time. Now you just have a bunch of agents that have access to the CLI and they can go in and make those changes in general. And so the canvas actually, instead of becoming this input thing where you're like, oh, cool, how do I go in and make this happen? It's actually just more of an output thing. It basically says, what information. Yeah, what information does the human need at this point in time to make suitable decisions about control requests of, do I approve this? Do I not approve this? That's realistically all the canvas becomes at that point in general. Right. And also a way. And I think this is important, and I think this is lost on a lot of people who are building some of these canvas experiences. It has to be almost like an anchor for your context. It has to be like a port in the storm. You have to think basically about it as layers and like a file system almost, to get to the next spot. Right? And so you have all your infrastructure and, like, this is why the canvas starts as, like, it's just a project, right? And then you have a drill-down chart, right? Like, it's like, I'm breaking these services or this section that just is like a function or code or anything else like that, because you want to actually be able to represent the entire thing, not just in your head, but in this canvas, so that other people can also get that representation so that they can think on the same wavelength as you, so that they can move as 
quickly. Right. I think a lot of orgs, especially as they scale, they get in trouble because all that context lives in somebody's head, basically. And then it's like, oh, how does this microservice work? It's like, I have no idea. Go ask this specific person. And then you have entire categories and classes of products that are built around how do you do context discovery and all these things. And I think a lot of that stuff just gets melted in terms of if you can have a really, really solid hierarchy and you can infinitely nest services, infinitely nest code, infinitely nest context, infinitely nest all these things all the way down, that's what allows you to build these kind of structures up over time. And I think it's also what's going to allow us to, like, build. I've written a bit about this. Like, these hyperstructures, like things that are way, way bigger. And, you know, you look at the Golden Gate Bridge and you're like, how did we build that? Like, there's that whole meme of, oh, how do we build this? We lost the technology. We don't know how anymore. Right? It's like, well, yeah, I mean, to some extent, yes, because a lot of the coordination that built those things has evolved, right? And has changed, and there are new things that we've lost. Almost like some of the art of building that structure, as we've just jammed everything into Slack, right? And we're just like, everything happens through
[34:52]
B
Slack and it's not anything in Discord. So.
[34:53]
C
Yeah, well, it's the same point. It doesn't. It doesn't really matter. It's just like message passing and interrupts. Message passing and interrupts, Message passing and interrupts. Right, Like.
[35:01]
B
So you're arguing that there should be something better, more structured than Slack? Yeah, yeah.
[35:06]
C
Oh, for sure. I think Slack, I think. And incidentally, I think Discord is awful too.
[35:10]
B
This is the equivalent of my mom test, right? Like, what have you done that has your solution to this?
[35:15]
C
So internally, we built a tool called Central Station that allows us to go in and aggregate all the context from all of our users. So every piece of feedback, every piece of customer support, every single thing like that gets aggregated into what we call clusters. If you have an incident brewing or anything else like that, now we can go and determine how many users are affected, all of those other things, et cetera, and then we can actually break off a discussion based on that. And I think a lot of that is actually a lot more helpful and more correct, instead of having just these long-running channels where you're just like, which channel should I put this thing in? If you can dynamically aggregate that information and dynamically route it to the right person based on the context we know internally, these four people are pretty close on networking. So if we see, okay, we've got a networking thing, you can roughly drill it down to those four people. And if you're saying, oh, okay, cool, it's actually with this part, you can just go and look at the commits. And this is no longer a manual process internally. This is the whole point of why we built it. If you go to, like, station or help.railway.com, there's a whole reason we built this thing, right? It's because we wanted to figure out how we're going to go in and scale with a massive, massive, massive amount of leverage to go and aggregate all this feedback.
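The aggregate-then-route idea described here (group incoming feedback into clusters, count affected users, send each cluster to the few people closest to that area) can be sketched as follows. This is a toy Python illustration and not how Central Station works: the keyword lists, the owner names, and the naive substring matching are all assumptions.

```python
from collections import defaultdict

# Hypothetical keyword lists and owner names, for illustration only.
TOPIC_KEYWORDS = {
    "networking": ("dns", "timeout", "tls"),
    "builds": ("build", "dockerfile"),
}
OWNERS = {
    "networking": ["ana", "bo", "cy", "dee"],
    "builds": ["eli"],
}


def topic_of(text: str) -> str:
    """Naive keyword match; a real system would use something smarter."""
    lowered = text.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return topic
    return "unsorted"


def cluster(feedback: list) -> dict:
    """Group (user_id, text) reports by topic, tracking distinct users."""
    clusters = defaultdict(set)
    for user_id, text in feedback:
        clusters[topic_of(text)].add(user_id)
    return dict(clusters)


def route(clusters: dict) -> list:
    """Largest clusters first: (topic, affected_users, owners)."""
    ranked = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(topic, len(users), OWNERS.get(topic, [])) for topic, users in ranked]


FEEDBACK = [
    ("u1", "DNS lookups failing since noon"),
    ("u2", "requests hit a timeout on the private network"),
    ("u1", "still seeing TLS errors"),
    ("u3", "my Dockerfile build hangs"),
]

print(route(cluster(FEEDBACK)))
```

Counting distinct users per cluster, instead of raw messages, is what lets one noisy reporter be distinguished from an incident that is actually spreading.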
[36:26]
B
You know, this is built in house.
[36:28]
C
Yep.
[36:29]
B
Okay. So. And then I remember helping out on this one with Angelo in 2023. Yeah, you scale a lot with. With a very small team.
[36:38]
C
Yeah, yeah, yeah. So we're like 10 times bigger now.
[36:40]
B
Oh, my God. You have your full developer account here.
[36:43]
C
Yeah.
[36:44]
B
Okay.
[36:44]
C
All right. Yeah. If you go to our.
[36:46]
B
I can just like, like cron this and then just.
[36:48]
C
You don't even have to cron it. We expose this as like a pub/sub-able thing. So go to railway.com/stats.
[36:53]
B
Oh, there you go.
[36:53]
C
Yeah, that's like all real time metrics for all of this stuff. There's a way to get this as like a JSON too, somewhere. If you care or anything else like that.
[37:01]
B
We'll look it up.
[37:02]
C
Yeah. But yeah, yeah, we're big on trying to build everything in public. Talk about a lot of the stuff we're working on. You know, like, we've had some issues or whatever in the past and we're like, hey, cool, here's how we're fixing these things. Like, you know, we've got both compliments as well as some flak for incident reports, and we're always trying to make them better over time, just to talk with people.
[37:21]
B
Yeah. Obviously you had a big one recently. I like that it was only scoped to 3,000. You presumably use Central Station, talking through what happens and I guess, how do you address it internally as a team?
[37:38]
C
Yeah, so internally, this one, like, really, really sucked. You know, it was to do with an upstream provider that didn't. They didn't do the behavior that they said they were documenting, which is unfortunate given they wrote the RFC on how the behavior should work. But we rolled those things out and then Central Station kind of caught that initially, where we had a couple users being like, oh, caches aren't invalidating for some of this stuff. Right. And so turn it off immediately, etc. Right. But when you go and roll out to that large user base of like 3 million people. Right. You know, you have a lot of different disparate behaviors that can kind of come up. Right. And so try as we will, we tested those things in, you know, staging. We have tests for them, all this other stuff, you know, and unfortunately we hit kind of an edge case there. Right. And we've incidentally gone and hardened a lot of those systems and now we can make a lot of that stuff better. But yeah, it was a tough one, unfortunately.
[38:39]
B
Yeah. I always wonder how the private disclosures are supposed to work. If people find an issue, are they supposed to contact you first? When you run a platform, these things are going to happen. And what channels should people pursue to quietly resolve it before it becomes a much bigger incident?
[38:59]
C
Yeah. So I think there's responsible disclosure. We kind of err on the side of the, like, we'd rather over disclose and know that, you know, that something is wrong versus almost like having your provider gaslight you. And so, yeah, we've erred on the side of like sharing those things kind of more publicly, even if they go and impact a small subset of those users. Right. And that's kind of just a decision that we've made internally. It's under, like, we have four values. One of them is honor. And so what's the honorable thing to go in and do? It's like, well, you notify people, you know, to the widest degree at which they may have been, you know, affected or there was an issue or whatever. And then we kind of confront that head on and be like, why did that happen? What can we do better in the future? All of those things kind of like that, you know, so.
[39:45]
B
Yeah, not the whole user base.
[39:47]
C
No.
[39:47]
B
And that's because of like incremental rollouts
[39:50]
C
and progressive rollouts and stuff like that. Right. So interesting.
[39:55]
B
Yeah. I feel like that should just be the norm at all large platforms, right?
[39:58]
C
Oh, it totally should. And a variety.
[40:00]
B
Which you did this.
[40:01]
C
Yeah. And at a variety of companies, it totally is. Right. There's a whole quote of, like, Meta runs like 10,000 different versions of Meta in general. And to our earlier point about agents, right, they need the same thing. They need to be able to shadow traffic. They need to build all these other. I think we've built so much ceremony around, like, production is sacred, all of these other things, that we need to get to a point where it's just trivially easy to test different behaviors in a safe environment, because then you can make those mistakes in an environment that's safe in general.
[40:29]
A
Right, you mentioned somebody brought it up. Do you see a world in which these things get automatically caught not necessarily by your agent, but your customer agent? You know what I mean? The cache invalidation thing seems like a pretty easy thing to check if you know to look for it.
[40:44]
C
It's hard because then you almost need. Well, for us to determine it, we'd have to hook in with your observability infrastructure in general. This is why we almost have the template loop on the platform, to be able to roll those things out progressively. Where you say, hey listen, I can roll this out to Johnny Vibecoder initially, or I can push a shard and you can almost consume that at your own leisure and be like, oh okay, I'm going to update to this specific version, or have this rollout over a period of weeks where you're pushing a new version and then it goes to, you know, 0.1% of people, 1% of people early, like whatever, and then rolls out all the way there. Right. That's the kind of non-deterministic version control that we talked about earlier. So yeah, 100%. Right. And I do believe that that's where most things should go towards, because I think ultimately most companies end up building that staged rollout system in house. Right. And it's just the same thing built again and again and again at every single one of these different companies. So there's a massive opportunity to consolidate a lot of, like, the developer stack.
[41:45]
A
You should have a free tier, like the model providers give you free tokens if you let them use the data. Like, we'll give you free compute if you're the number one shard that goes out and you let us plug into your observability.
[41:55]
C
Yeah, like incidentally we do that. Right. And that's why, you know, we talked about the impact of that on like 3,000 people or whatever. We start with the kind of lower-impact accounts, and the, you know, larger companies, et cetera, on the platform, right, they're the last ultimately that should receive those kinds of rollouts, so that they have a version of the platform that's deeply, deeply stable, right?
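The progressive rollout described over the last few turns (a new version reaches 0.1% of accounts, then 1%, and so on, with the largest customers held until last) is commonly implemented with deterministic hashing. The Python sketch below shows that general technique under stated assumptions: the later stage percentages, the `enterprise` tier name, and the function names are hypothetical, and nothing here is claimed to be Railway's actual mechanism.

```python
import hashlib

# 0.1% and 1% come from the conversation; the later stages are assumed.
STAGES = [0.1, 1.0, 10.0, 100.0]
HOLD_BACK_TIERS = {"enterprise"}   # hypothetical tier name


def bucket(account_id: str, release: str) -> int:
    """Stable bucket in [0, 10000) for this account and release."""
    digest = hashlib.sha256(f"{release}:{account_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def receives(account_id: str, tier: str, release: str, percent: float) -> bool:
    """Does this account get `release` at the given rollout percentage?"""
    if tier in HOLD_BACK_TIERS and percent < 100.0:
        return False                       # most stable accounts go last
    return bucket(account_id, release) < round(percent * 100)


accounts = [(f"acct-{i}", "enterprise" if i % 10 == 0 else "hobby")
            for i in range(1000)]
for percent in STAGES:
    exposed = sum(receives(a, t, "v2", percent) for a, t in accounts)
    print(f"{percent:>5}% stage -> {exposed} of {len(accounts)} accounts")
```

Because the bucket depends only on the account and the release, an account that receives a version at 1% keeps it at 10%, and a different release reshuffles who goes first.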
[42:16]
A
I have three services, so I'm sure I get the first roll. You can nuke my thing at any time, man. I guess my other question is like there's all these like SRE agent companies. There's like the observability people also want to have agents that fix your upstream problems. How do you kind of, you have your own agent in the canvas now that you can chat with. How do you kind of see that play out?
[42:39]
C
It's almost like the stacking entropy thing in general, right? Like I think if you don't have the primitives to make iterating in production safe, it becomes very, very difficult. And so if you're an observability provider and you're like, oh, here's this fix to this error, assume 80% of those, they're probably actually good, they're going to make sense, et cetera. But then the last 20%, that long tail of complex issues in general, ultimately rolling those changes out, if you just let somebody say, oh cool, this looks good, and just stamp it, there's an opportunity for you to have an issue or an incident or anything else like that. And I think that's why it's really, really important to have those kind of forked environments in general. And people have staging, etc. But it always ends up deviating from prod, right? And so you need the primitives and the workflows and the experience built in, in our mind, as a first-party thing on the platform so that you can fork any service at any point in time. I consider the canvas almost as like a little sheet of transparency paper, and the agent is kind of like this little guy that you push up, and it should be able to pop up in the canvas and it should be like, oh cool, well, I need to copy that service. I need to copy that service so I can test these two things. Right? That's my hypothesis as an agent or whatever. Okay, cool, I can go in and do that. Looks good for all this stuff. Ideally I get a read-only copy of production. Anything that's PII, et cetera, is kind of marked as a transform when we automatically clone that database or go for a copy-on-write version of it or read from it, and then it just makes those changes. It says, does this actually work? Right, like as close to production as possible. Right. Because ultimately that's how close you have to be.
Or you just have a massive amount of drift where, oh, I've changed this thing. And then it just kind of gets out of sorts. Right. The system gets a lot more unstable. And I think that that's what you see with a lot of these almost massive systems that these companies built on top of, like, Docker for local and then Kube for production and this specific thing for whatever. Right. It's like all of that complexity ends up getting to a point where it slows down the developers. Yes. But it just gets to a point where it's so unstable at scale that it becomes hard for people to go and iterate and make those changes. And so we want to compress a lot of that stuff way down and just say as close to prod as you could possibly be. That's where we want to be.
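The idea of cloning production data with PII "marked as a transform" can be sketched in a few lines. This is a minimal illustration, not Railway's implementation: the column names, the `PII_TRANSFORMS` policy, and the `fork_rows` helper are all hypothetical, and a real system would work at the database or volume level rather than on Python dictionaries.

```python
import hashlib
from copy import deepcopy


def mask_email(value: str) -> str:
    """Replace an email with a stable, non-reversible placeholder."""
    digest = hashlib.sha256(value.encode()).hexdigest()[:12]
    return f"user-{digest}@example.invalid"


# Hypothetical policy: which columns count as PII and how each is transformed.
PII_TRANSFORMS = {
    "email": mask_email,
    "full_name": lambda _value: "REDACTED",
}


def fork_rows(production_rows):
    """Return a detached copy of the rows with PII columns transformed."""
    forked = []
    for row in production_rows:
        clone = deepcopy(row)  # changes to the fork never touch production
        for column, transform in PII_TRANSFORMS.items():
            if column in clone:
                clone[column] = transform(clone[column])
        forked.append(clone)
    return forked


production = [
    {"id": 1, "email": "ada@example.com", "full_name": "Ada L.", "plan": "pro"},
    {"id": 2, "email": "bob@example.com", "full_name": "Bob K.", "plan": "hobby"},
]
fork = fork_rows(production)
fork[0]["plan"] = "enterprise"  # the agent experiments on the fork only
```

The fork keeps the shape of production (same ids, same non-PII values) so tests stay realistic, while the sensitive columns never leave the source.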
[45:01]
B
I was texting Erica for questions and she says actually you were originally not a believer in AI SRE.
[45:08]
C
Oh, yeah. I mean, I've kind of.
[45:10]
B
Have you come around on it?
[45:11]
C
Yeah, well, I flipped. I'm actually still not a believer in the AI SRE because I believe that you need the primitives to make those things safe. And if you just unleash an AI SRE on your production infrastructure and you don't have safe primitives for copying volumes, making sure that this is fine, it's going to nuke your production database. It's not a matter of if, it's a matter of when it's going to nuke that database. Right. I'm a big believer in making those kind of loops safe in general. I think I was a pretty deep, almost, I don't want to say skeptic, until like 2023, and then 2024, I kind of was like, okay, maybe I can make this thing roughly do it, et cetera. 2025, I was like, okay, now I can hold this, etc. And then over the whole Christmas break, I guess winter break, I think you just saw, like, massive... Everybody came back, they're like, oh, my God, it's almost impossible.
[46:02]
B
Here's you on the cloud document. Yeah, Clawdbot.
[46:05]
C
Well, OpenClaw, but it's gotten to a point where it's almost like it's harder to hold it wrong than it is to hold it right, you know, and it's like, you know, there's that scene in, like, Avengers or whatever where Vision's like, it's terribly well balanced, you know, like when he picks up Thor's hammer or whatever. Like damn. Like this, this thing just kind of like self balances and like works quite well from that perspective. So yeah, I'm a deep believer at this point in terms of that will be the dominant species. Right again. Assembly, C, JavaScript, words.
[46:35]
B
Yeah, it feels like a big jump.
[46:37]
C
Yeah, it feels like a big jump. And it is too. Right. And I think it's not like you abandon CPU-based discrete logic in general and just move straight to fuzzy logic. You need both. Right. So your skills should call code or applications or whatever, some sort of static structure, and you can use the skills to kind of distill what the procedure should be or how the code should act. Right. I'm kind of coming to this thesis, which is you need three points essentially: you need a clear spec of what defines the system, you need the code, and then you need the tests. Right. And I think when you say this thesis out loud, if you've been in engineering for any amount of time, you're like, well, yeah, of course: that's an RFC, like a request for comments, that's tests, and that's your code. Right. But they all matter a lot. And having them all be actually together so that they can reinforce each other and say, well, the spec and the tests match but the code doesn't. Let me reconcile that. Oh, okay, now the tests and the spec match. Let me go and reconcile this other thing. Right? And you can kind of move through that period of basically saying, well, this is fuzzy and these two are either discrete in the case of tests or slightly fuzzy, slightly discrete in the case of code. Right. And that's kind of your iteration loop. I think that's also incidentally why you're seeing a lot of people be like, software factories, and I want to write this doc and have it go and reconcile and all this other stuff, which I think is a bit of architectural astronomy if you don't actually go in and implement it. But I do think generally that loop is kind of where most things are going to ultimately end up.
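The spec, code, and tests loop described here amounts to a majority vote among three artifacts: when two agree and the third does not, the third is the one to reconcile. A minimal sketch, assuming some other process (a reviewer or an agent) has already produced the three pairwise agreement checks; the function name and its return values are hypothetical.

```python
def reconcile_target(spec_vs_tests: bool, spec_vs_code: bool, code_vs_tests: bool):
    """Name the artifact that disagrees with the other two.

    Returns None when all three agree, and "unclear" when no single
    artifact is the odd one out (so a human has to decide).
    """
    if spec_vs_tests and spec_vs_code and code_vs_tests:
        return None
    if spec_vs_tests and not spec_vs_code and not code_vs_tests:
        return "code"   # spec and tests match, code does not
    if spec_vs_code and not spec_vs_tests and not code_vs_tests:
        return "tests"  # spec and code match, tests do not
    if code_vs_tests and not spec_vs_tests and not spec_vs_code:
        return "spec"   # code and tests match, spec is stale
    return "unclear"


# "The spec and the tests match but the code doesn't. Let me reconcile that."
first_pass = reconcile_target(spec_vs_tests=True, spec_vs_code=False, code_vs_tests=False)
# After the code is fixed, everything lines up and the loop can stop.
second_pass = reconcile_target(True, True, True)
```

Keeping all three artifacts in one place is what makes the vote possible: with only code and tests, a disagreement says nothing about which side is wrong.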
[48:08]
B
Yeah, for listeners: we've been talking about this on the pod for three years. The holy trinity of specs and tests. Itamar Friedman from Qodo is the reference for people who want to look it up.
[48:18]
C
Nice.
[48:19]
B
One thing I do want to mention, just on the OpenClaw thing, is also the idea that you can self-modify, which is kind of interesting. I don't know how exactly Railway would support it, but I do have my OpenClaw and I just tell it that it has the Railway CLI. You can do whatever, and in theory, whatever capabilities and new infra you need, you can just call the Railway CLI, provision it and add it to itself. And so the agent can modify its own infra.
[48:44]
C
Which I think is, yeah, it's nuts. We have a loop that I've kind of set up, which is you put the Railway CLI on top of something that runs on top of Railway, and so you're essentially authenticated as whatever the current box is in general. And you can make any sort of changes to it. And then you just call railway deploy and it deploys itself, right? Like it's just like, oh cool, I need to go and spin up this instance of this environment. I already exist in this environment. Excellent. I've got access to a Postgres instance now, right? And this is kind of where we want to go with a lot of the agentic, almost like self-replicating infrastructure: that's your loop. You iterate in production, that's your loop, right? You're going to just continue to make some sort of change and either it will work and you're going to want to go in and merge it and say cool, that's great, put it into our upstream, or it will not work and you can just kind of throw it away, etc. Right. How do you go in and make those throwaway copies as trivial as possible to spin up, run super cheap, etc.? I think the era of like, I have an AWS instance and I'm going to, you know, get four vCPU and 16 gigs of RAM, it's going to get completely destroyed, right? Because if you do that for agents or anything else like that, you now need a thousand of those machines, right? It's so prohibitively expensive versus, you know, we've spent a ton of time trying to figure out how do we go in and make these deploys, whatever you want to call them. You know, Cloudflare's got the isolates, everybody's called it a sandbox. Whatever that atomic unit of deploy is: only pay for what you use, spin up instantaneously, close the loop as quickly as possible.
Because if the system can self replicate the system and it can do so safely and say this is my environment, I'm making these changes, et cetera, it can come back with hey, does this look good? This is a new state of infrastructure. Given this prompt, I think I've solved this problem. And then you can go back to the agent and say, actually looks a little bit different, goes and does the loop again and you're like, cool, excellent. Apply.
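The loop described here (copy the environment, apply a change, check it, then merge or throw the copy away) can be modeled in memory. This sketch deliberately does not call the Railway CLI or any real API; the environment dictionary, `try_change`, and the example changes are hypothetical stand-ins for whatever an agent would actually provision.

```python
from copy import deepcopy


def try_change(upstream, change, check):
    """Apply `change` to a throwaway fork of `upstream`.

    Returns (new_state, merged). If `check` fails, the fork is discarded
    and the untouched upstream is returned.
    """
    fork = deepcopy(upstream)
    change(fork)
    if check(fork):
        return fork, True
    return upstream, False


def add_postgres(env):
    """A change that provisions a database and wires it into the API service."""
    env["services"]["postgres"] = {"image": "postgres:16"}
    env["services"]["api"]["env"]["DATABASE_URL"] = "postgres://postgres:5432/app"


def break_api(env):
    """A bad change: the agent accidentally removes the API service."""
    del env["services"]["api"]


def api_has_database(env):
    """The acceptance check the agent must pass before merging."""
    api = env["services"].get("api")
    return (
        api is not None
        and "DATABASE_URL" in api["env"]
        and "postgres" in env["services"]
    )


prod = {"services": {"api": {"image": "api:1", "env": {}}}}
after_bad, merged_bad = try_change(prod, break_api, api_has_database)
after_good, merged_good = try_change(prod, add_postgres, api_has_database)
```

The cost argument in the conversation is about how cheap that `deepcopy` step is in a real platform: the loop only works if spinning up and discarding a fork is close to free.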
[50:39]
B
Yeah, I think that's retroactively obvious. Retroactively, kind of like the most useful kind. I don't know. Any other comments on just like agent deployment on Railway?
[50:51]
C
No, I mean, it's getting better every day and I'm on X or Twitter or whatever you want to call it. And you can always yell at me about the experience not working as well as it should, because there's plenty of things that should work way, way better.
[51:04]
B
I was going to say, I think right under this stage in the juncture, when people want the massively or embarrassingly parallel compute, they usually talk serverless. And I feel like there's a new serverless that has emerged compared to the previous five years of serverless. You're kind of like in that new bucket. I don't know if you have comparisons or philosophical differences that you want to call out.
[51:31]
C
No, I think it's like, it's, as you kind of mentioned, it's somewhere in between. Right. It's like the ability to run stateful long running, like you want to call them workflows, you want to call them executions, you want to call them whatever.
[51:43]
B
Which like Vercel has Fluid Compute and then Cloudflare has some container thing.
[51:48]
C
Yeah.
[51:49]
B
Google has always had the App Runner.
[51:51]
C
App Runner and the new one. Yeah, I forget a bunch of them. Yeah, yeah. I think that's kind of where everything, roughly, and this is why we've been working on it for the last like six years, is like, we just believe you do need access to a computer. You need, like, a box that speaks Linux. Right. So that you can deploy the things that you want to go in and deploy on it. Right. Like, other things are going to, I mean, they're going to change the almost like surface area of what you can kind of go in and build. And for us, we're always like, no, users need a computer and they need to be able to deploy anything that they truly want. Right. And that's why we focused for a long time on those primitives of network, compute, and storage. Right. Because if we can give you those things and we can expose them to you and allow you to run these things indefinitely, that's of course where we believe that it's going to go in general. And so I think you're seeing right now where, again, the whole Twitter has no nuance, where everybody's servers versus serverless, it's like, no, it's always somewhere in the middle. It's always some sort of convergence of, well, I want to run it for a long time, but also I don't want to provision this resource statically or pay for just things that I'm not using or anything else like that. And that's always been our thesis from day one. It's like pay only for what you use, run it indefinitely. It is just full Linux basically.
[53:13]
B
I think that's why I like Vercel naming it Fluid. It's like, well, it's fluid, it's flexible. Another milestone. And then I wanted to ask one more technical question, which is the Heroku official deprecation, or what do they basically, you know. You are one of the presumptive new Herokus. "New Heroku" has been a category for as long as I've been in developer tooling.
[53:34]
C
Yeah, right.
[53:35]
B
It's finally happening.
[53:36]
C
Yeah.
[53:36]
B
What was that like when you know, standing behind the scenes of like, well, this is the moment.
[53:42]
C
Yeah. I mean you just have, you have so many people just like, you're just like, like you were running stuff on here. Like you as this company. Like it's crazy that like you whatever like name that you would know is running this thing and then you're coming to us be like, yeah, we kind of like want to like move a lot of this stuff off or whatever. Like okay, cool. But yeah, it's kind of just nuts
[53:59]
B
like I think any like behind the scenes. What's. What is what. Why does Salesforce let Heroku kind of just stagnate?
[54:06]
C
Well, I mean, I can only guess, I guess. Right. I think it's just hard when it's not your business. The business of Salesforce is to build a really, really good CRM, you know. Right. And that's their focus, right. They should be really, really focused on building a really, really great CRM. And then you acquire this business as a compute business that's kind of an offshoot of your business in general. Right. And I think, you know, a lot of the early Meta folks have talked a lot about focus. Right. And I think Boz has a whole write-up that he's done basically where he talks about, in the early days of Meta, we had no money and we were forced to get focused. Right. And then we basically turned on the money. This is all, you know, me, not verbatim, rephrasing or whatever: we turned on the money tree and then we had no reason to have focus because we just had infinite money where we could go and split all of our focus, right? But that ends up diluting your product. It ends up making these things where you kind of have these offshoots where you're just like, is that the focus of the business? Right? And it ultimately ends up not being if it's not the core of your business, right? And so to me, it's kind of no wonder that it languished in general, right? Because it just wasn't the core focus of the business. And I think that a lot of companies get in trouble with this when they kind of split out their focus in general because it means that you're almost like fighting a multi-front war trying to compete with all these things and not just compete with them externally, but compete with them internally for alignment. And where are we going? What are we doing? What is our purpose here, right? If you're really, really Salesforce-pilled, then you're like, hey, listen, I love Salesforce and I really want to work on all those things.
Like, you know, and you're mission driven, which is like the aspiration for a company in general of like, why do people work on things, right? It's like they want to work on something interesting, right? Like Heroku is off to the side. It's like it's not the core of the business, right? And so to get those resourcing, you know, like budget or focus or alignment or whatever, internally, it's just pushed away. Right? So it was, it was literally just a matter of time for it to happen in, in our mind, right? Yeah.
[56:07]
B
I think kudos for them to like actually call it out instead of just letting it be unknown or.
[56:12]
C
Yeah, well, their whole, their whole release was a little bit odd because they like, you know, they, they kind of called it out.
[56:17]
B
They did the. Our incredible journey.
[56:19]
C
Yeah.
[56:20]
B
They didn't say they were like shutting it down, but they're like, yeah, yeah, yeah.
[56:24]
C
So yeah. And then, you know, behind the scenes, I think they issued some, some stuff to people being like, hey, yeah, you should like close these accounts down. Like we are going to go in and deprecate this and like remove it over time. So. Yeah, I mean it's just like. And it's crazy because like some of my first deployment experiences were like on Heroku. It's like a foundational thing where I
[56:44]
B
had a freaking alias in my bash for like Heroku deployments.
[56:47]
C
Yeah, right. Like, you, you start with like dragging stuff into an FTP server and then like, you move on to like trying to get a deploy working. Like, how do I go in and make this happen? And it's like Heroku. Right.
[56:56]
B
Did you know about Heroku packs and.
[56:57]
C
Yeah, exactly. Right. Like, and you learn about all this and it was the on ramp for us. Right. You know, but that the wheel turns regardless. Right. Like, there's, there's new stuff that's emerging and like, we're very, very happy to like, almost like continue to like, you know, carry the torch on for a lot of that stuff. But we, we don't want to be the new Heroku. We want to be the way in which people are building and deploying software and ultimately the way that people monetize software over time. Right? Yeah.
[57:20]
B
So, I mean, still, it's a big crown to be a new Heroku. Like, there's like 50 companies that fought for this.
[57:24]
C
Oh, yeah. Everybody's kind of like, you know, holding some portion of this, being like, ah, you know. But yeah, I think for us, we're just happy to go in and support people, companies, et cetera. The platform works a bit differently, so it's obviously almost the similar kind of game loop cycle. Exactly. But we've been quite dogmatic in terms of where we believe these things are going to go in terms of primitives, the agents kind of fan off all of those other things. And so some things will fit and then some things will, you know, you have to change a few of their workloads, et cetera. Like, we don't have. And what's that feature that people really love? Pipelines? Heroku. Yeah. Right. Like we have some approximation of it with the environment system in general. Right. But yeah, so it's, it's been super exciting. We've got a ton of people that we're able to go and support, so. And it's growing a lot. So.
[58:12]
B
Yeah. Any other technical? I have one more: Temporal. Okay. So Temporal, I have sold my shares. You are a power user. You're one of our earliest customers. I think I met you through Temporal or something. You're a big Temporal business. Like, your business is built on Temporal. You have complaints. I think this is the most neutral, most informed conversation that anyone will ever hear about Temporal without someone working at the company. Yeah, it's the two of us.
[58:42]
C
Yeah, yeah. No, I think that's fair. I have used Temporal for almost like 10 years now. Right. Because like Cadence, Uber, all of those other things like that.
[58:51]
B
Just give people a scale of what Cadence is at Uber. People don't know.
[58:57]
C
Yeah. So Cadence was the precursor to Temporal. And it powers all of the trip actions, the rides, the like, you know, when you like rent a Jump bike or scooter or like anything else like that or a car, it's like you're running these workflows for a period of time and you're basically saying this ride will run for an indefinite period until it like finishes. Right. And you can go and attach information whether it's like, oh, you paused it in this zone and so you know, you need to add this dollar charge to like the, the bill or anything else like that. And then when you end the trip, like your workflow is done. Right. That whole experience behind the scenes, I don't know about today in general, but it was like powered by, by Cadence at that point in time. And so it's a really, really like.
[59:34]
B
And I used to say, like, it's like, imagine if you could program the entire user journey top down as one function.
[59:39]
C
Yeah, right. Yeah. And it's, it's such a, it's such a powerful idea and it's so, so important. It's also incidentally, so important for the next phase of the agentic journey, where you want an agent to do a specific task and then you want it to be complete or incomplete on that task and then move on to the next thing. You need a way to be able to go in and manage these workflows. You need a way to be able to go and manage these workflows dynamically. And I think for me, Temporal was always really, really, really great in theory, and it was really, really great when you got it working the way that you wanted to in production. It's just, it required you to model that entire journey in your head. And if you didn't have the entire journey in your head, you could put yourself in a spot where you would cause issues where replaying the state of the entire workflow causes a non-determinism issue.
[60:26]
B
Because it works on deterministic workflow history.
[60:28]
C
Yeah, exactly. Right. And so it's very, very easy. The way that I kind of described it is, well, it's a jet engine, right. If you know how to go in and operate, if you know how to go in and run it, all of those other things. But you can't hand it to people who are trying to build things that end up being complicated but don't have that whole kind of state in their head. So if you have a large. We run our whole deployment pipeline on top of it. And so that's like a reasonably complicated workflow, right? There's pre commit hooks, there's signaling, there's queuing, there's all of this other stuff in general. And we kind of ran into the same thing at Uber where as you tried to express this large workflow, as you mentioned, like going all the way down got more and more complicated and it got more and more states in the state machine that you had to like map the state machine back to like the world. Yes, right, yeah, exactly. Yeah. And so at Uber we built a system for, you know, doing the state machine and like testing the state machine and all other stuff. And we've started to like go and build some of those things here because like it's, it's grown, you know, quite heavily. Right. But it's like, it's such a, like, you know, I don't want to say love, hate relationship because that's like too broad in general. Like it's. When it works really, really well, it works like super, super well. Right. But then you run into a spot where you just like somebody who hasn't interacted with the system or doesn't have the full context of the system goes and puts something in the system that invalidates some of the state or causes a non determinism issue or spins off a ton of activities or anything else like that. And then you have to kind of keep track of almost underlying SRE knobs of like, oh, we have the amount of activity slots in this thing. Right. It's like, well, these should just scale with memory, vcpu, all of those other things in general. Right. 
So it ends up becoming a bit of a bear to kind of scale out in general.
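The non-determinism issue mentioned above comes from replay: the engine re-runs workflow code against the recorded history, and the code must issue the same commands in the same order. The toy replayer below illustrates that failure mode only. It is not the Temporal SDK or its API; the `run`, `replay`, and `deploy_*` names are hypothetical.

```python
class NonDeterminismError(Exception):
    """Raised when replayed code diverges from the recorded history."""


def run(workflow):
    """First execution: record every command the workflow issues."""
    return list(workflow())


def replay(workflow, history):
    """Re-run the workflow and require it to reproduce the recorded commands.

    Commands issued after the end of the history are allowed, since that
    is how a resumed workflow makes further progress.
    """
    commands = list(workflow())
    for position, recorded in enumerate(history):
        issued = commands[position] if position < len(commands) else None
        if issued != recorded:
            raise NonDeterminismError(
                f"event {position}: history has {recorded!r}, code issued {issued!r}"
            )
    return commands


def deploy_v1():
    yield "build_image"
    yield "start_container"


def deploy_v2():
    # Someone adds a step at the front without thinking about in-flight runs.
    yield "run_precommit_hook"
    yield "build_image"
    yield "start_container"


history = run(deploy_v1)  # a deployment that started before the code change
```

Replaying `deploy_v1` against its own history succeeds, while replaying `deploy_v2` against the same history raises `NonDeterminismError` at event 0. That is the situation where someone without the full workflow in their head breaks runs that are already in flight.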
[62:11]
B
Yeah. So you need like a very capable sysadmin running things behind the scenes for you.
[62:15]
C
Yeah, yeah.
[62:16]
B
If you were to move off, what would you do?
[62:19]
C
I think we would build our own workflow engine. We have a few internally that we've kind of like worked on. So. Yeah, because it's like.
[62:27]
B
Yeah, this is one of those things where like, you know, this is one of those classes of things where like you, you typically wouldn't vibe code it. But I'm wondering if.
[62:34]
C
Well, I don't think you should vibe code it still. Like you still want to run like Jepsen tests and stuff like that, like to make sure that like, I mean,
[62:42]
B
it's not like Temporal had to invent that from scratch either. Right. So there's libraries for those things that you can run. And on top of that, it's just a state machine that you have to really map out. But ultimately you define those abstractions that you want and you run into a state machine and that's it.
[63:00]
C
Yeah, it's very, very doable. I think the workflow stuff is very, very interesting. There's a few really cool companies. I think Restate's doing some neat stuff here.
[63:10]
B
So you're very tied into JavaScript. You're like a JavaScript maxi.
[63:13]
C
Internally we have JavaScript, we have. Or we have TypeScript, we have Rust and we have Go. Those are three languages. Right. We don't add any more stuff. Actually that's not true. We have a little bit of C because we write BPF code and like, and it's hooks and stuff like that. So but those are the kind of.
[63:28]
B
Is this for, like, the side container things? Sidecar stuff?
[63:33]
C
No. Well, so this is for the networking stack as well as the volumes and stuff like that. So yeah, but it's like. Yeah, we use the TypeScript stuff a lot because it's like what powers the dashboard. But we're going to move a lot of the kind of workflow stuff off of the kind of dashboard stack into actually the infrastructure stack where recently.
[63:55]
B
Yeah. Don't power things on the front end, guys. Even though it's free compute.
[64:00]
C
Yep.
[64:00]
B
Yeah, yeah.
[64:01]
C
Cool.
[64:01]
B
Any other technical infrastructure. Cool stuff. Railpacks. I don't know if that's still.
[64:07]
C
Yeah, yeah. I mean we built an engine for determining dependencies based on your source code, which is super cool. It's called Railpack. We built the first version, called Nixpacks, which is on top of Nix. And then, yeah, we moved.
[64:17]
B
People have been trying to get me to adopt Nix and NixOS for like four years.
[64:21]
C
Yeah.
[64:22]
B
Is it ever going to be a thing?
[64:23]
C
I don't, I don't know. Like we were super excited about it in general, but it has a bunch of different kind of pain points in general because if you just think of it, it's a stack of versioned source code or it's a stack of versioned binaries at specific slices in time. Right. And so if you want version X and version Y, you end up bloating a lot of your kind of package space. Right. Which blows up the size of your images and makes it really, really difficult for real-world workloads.
[64:53]
B
I think if you content address it and you cache it, there's a lot of optimizations that in theory you should be able to do.
[65:00]
C
In theory, yes. Right. And what happens ultimately is you have a large enough user base and you have a disparate enough set of machines that you kind of run into the problem. There's a paper that Meta released on XFaaS, their internal kind of serverless system. It ends up being very, very difficult to go in and do that at scale unless you break out specific runtimes, basically, which we did not want to go in and do. Right. Because we wanted to truly allow you to deploy anything, which was our initial kind of thing with Nix. But we've moved towards some interesting stuff that I think we'll be able to talk about a little bit later that we've built for doing content-addressable file systems to be able to lazy load anything from any point and then just page that into memory.
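Content addressing plus lazy loading, as discussed in this exchange, can be illustrated with a small in-memory model. This is a sketch of the general technique under simplifying assumptions, not a description of what Railway built: `ContentStore`, `LazyImage`, and the file paths are hypothetical, and a real system would page blocks from remote storage rather than from a dictionary.

```python
import hashlib


class ContentStore:
    """Blobs keyed by the SHA-256 of their bytes, so identical content is stored once."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # a second put of the same bytes is a no-op
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)


class LazyImage:
    """A file tree mapping paths to content keys; bytes are fetched on first read."""

    def __init__(self, store, manifest):
        self._store = store
        self._manifest = manifest
        self._loaded = {}

    def read(self, path: str) -> bytes:
        if path not in self._loaded:
            self._loaded[path] = self._store.get(self._manifest[path])
        return self._loaded[path]

    @property
    def loaded_paths(self):
        return sorted(self._loaded)


store = ContentStore()
libc = b"shared libc bytes"
image_a = LazyImage(store, {"/lib/libc.so": store.put(libc), "/app/a": store.put(b"app a")})
image_b = LazyImage(store, {"/lib/libc.so": store.put(libc), "/app/b": store.put(b"app b")})
first_read = image_a.read("/app/a")  # only this file is pulled into memory
```

Two images that share a library store it once, which addresses the "version X and version Y bloat" concern, and nothing is loaded until a path is actually read.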
[65:48]
B
Amazing.
[65:49]
C
Okay, that's going to be fun. The whole future is very, very bright. It's. It's crazy. It's going to be nuts.
[65:55]
B
Okay. Founder journey stuff.
[65:56]
A
Yeah. And your Claude usage, you tweeted you're going to spend 300k this month.
[66:01]
C
Yeah, I think we got.
[66:02]
A
I think we got two coding agents across the company.
[66:05]
C
Yeah.
[66:06]
A
You only have 35 people, so I'm sure they're not all spending 10k a month. What's kind of the distribution?
[66:11]
C
I think I'm at about 25 in general. And then we have some power users kind of all the way down. We came back from the winter break and I was basically like, if you're writing code by hand, you are doing this wrong. The tools are good enough at this point that you move extremely, extremely quickly. And yes, there are issues and pain points and all these other things, but you should be reviewing the code that you are writing instead of trying to go in and write it by hand. All of those architectural patterns, all of those other things, you're not going to throw them in the garbage or whatever. Actually, they matter more now than any other time. But you just shouldn't spend your time generating code that you would write. If you know how to go in and write it, just ask the agent to go in and write it and then reconcile it until it looks like you would have written it yourself. And I think, incidentally, people misconstrue my propensity to push people towards agents for, hey, we're growing really, really fast and we've had some kind of bumps in reliability. They're not necessarily related in terms of that, but I think people should really, really understand the tools are good enough for you to be able to move extremely, extremely quickly to build things way, way larger than you could have possibly built before, right? And so to our point way earlier about how do you cool data centers in space, it's like, well, I don't know actually. Right, but you're at a point now with software where you can actually be like, well, how would I build block storage from scratch? How would I go in and do these things? I have ideas because I've got history, I've read all these papers in general, right? Let me go in and work them out in general and let me build massive test benches with thousands of tests, right? Because they're free to author right now, right? To go in and make sure that this system can now be built, right?
And I think that if you're not using the kind of AI systems to almost like speedrun your roadmap, to go in and figure out where you need to go in and be to reconcile your existing system onto the future, then you're kind of missing a large point of what is currently happening right now, right? Because you can just template out anything and validate it on the side for free, right?
[68:12]
A
What's the path to spend 3 million a month? Is it bound by ideas and things that the customers can absorb?
[68:19]
C
I think for most companies it's actually bound by deployment at this point in time. And I think that's why we've seen a massive boon in terms of users trying, not just users, companies, like Fortune 50s and below, et cetera, going and being like, how do we get our developers to go in and move quicker? I think you're probably going to hit your CFO before you hit any of these limits in general because they're going to look at this and be like, there's an eye-watering amount of money being spent on these tokens. I think, I don't know which, I think it was Uber: CC blew our token budget for the entire year or whatever, right? And so inference costs have to come down, but also, you know, we're inference constrained at this point in time, right? And so you're going to almost get this price discovery of what makes sense for an org to go in and adopt. And I think what you're going to end up with is actually you're going to almost end up with the F1 driver concept, which is if you have somebody who's really, really adept at these things, it makes sense to go and put them into a $3 million car or whatever. Right. But if you're not, then it probably doesn't actually make sense for you to go in and do that. And we're going to take a few of these people and say, you drive the F1 car. We need to go in this general direction, figure out if this works, and almost go ahead and prototype it. Right. And so we've done a few of those things. Like, we've vastly accelerated our roadmap in terms of, oh, we thought we were going to be able to go in and ship this thing in the next few years, but actually we can probably ship it in the next few months now. Right. Because we're saying, oh, validated it out, it works. Don't have to even build it incrementally. We can now skip steps to go and just move towards where our vision is for a lot of this stuff.
And I think that that's kind of where you end up with a lot of it.
[69:58]
A
Yeah, I think a lot of people are realizing the roadmap doesn't always have a business impact. And so it's like, oh, it's too expensive to run these tokens. But if your roadmap was actually built to make more money, by the time you built the whole thing, you would have some sort of token pricing for it the same way you do with sales.
[70:14]
C
Yeah.
[70:14]
A
Like you will spend a billion dollars in sales if you knew you would get $2 billion of revenue.
[70:19]
C
Exactly right. And I think the, the, the really naive way to go in and measure this is almost like your percentage of tokens that end up in production.
[70:27]
A
Right.
[70:28]
C
And so if you can measure that, you are getting this level of impact because those tokens are ending up in production. That's awesome. But I think the kind of burden of proof is now going to kind of arise. And you see it internally too, on our stuff, we have a growing number of pull requests that haven't yet been merged. And you're just like, okay, how do you get this into production? And so it's really about how quickly you can go and kind of build and deploy that software. Right. Which is exciting because our whole software, we build and deploy software,
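The "percentage of tokens that end up in production" metric is simple arithmetic once token spend is attributed to pull requests. A minimal sketch, assuming each pull request carries a token count and a merged flag; the `PullRequest` record and the numbers are hypothetical, and merged is used here as a stand-in for "in production".

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    tokens_spent: int
    merged: bool


def production_token_ratio(pull_requests) -> float:
    """Fraction of all tokens spent that belong to merged pull requests."""
    total = sum(pr.tokens_spent for pr in pull_requests)
    if total == 0:
        return 0.0
    shipped = sum(pr.tokens_spent for pr in pull_requests if pr.merged)
    return shipped / total


prs = [
    PullRequest(tokens_spent=600_000, merged=True),
    PullRequest(tokens_spent=300_000, merged=False),  # still waiting on review
    PullRequest(tokens_spent=100_000, merged=False),
]
ratio = production_token_ratio(prs)  # 600k of 1M tokens shipped
```

A growing backlog of unmerged pull requests shows up directly as a falling ratio, which is the pressure on deployment speed described above.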
[70:57]
B
the SDLC is changing. And it's something that both of us are super interested in exploring as well. One of my theses, or it's not my thesis, is that the pull request is dying. It's going to be the prompt request. And then beyond that, code review is also kind of dying. Because do you really need to, if you have all the other systems in place? What else is changing about the SDLC?
[71:19]
C
What else is different? Well, I think the AI SRE. AI SRE, the tools to make. So the AI SRE is like one of those things where it's like, you know, it's a pie in the sky. Aspirational. What does it take to get an AI SRE?
[71:32]
B
And by the way, you should expose your tooling to your customers at some point.
[71:35]
C
Yeah, well, which tooling?
[71:37]
B
Central command center.
[71:39]
C
Oh, Central Station. Central Station. So we have it for template maintainers. Right. So template maintainers can like deploy and maintain templates and they get feedback on a lot of that stuff. Right. And so we're 100% like going to go in and explore those things, like incrementally.
[71:51]
B
Yeah, but like, you know, clustering around incidents, everyone has a version of that. But like, like I don't think anyone's solved it.
[71:56]
C
Yeah, yeah. Right. And I don't say we've solved it internally, but it's gotten so good that like now we can see those incidents forming like pretty quickly.
[72:05]
B
Yeah. Real time and AI clusters.
[72:07]
C
Yeah. So at some point those will be things that either somebody else goes and builds or we go in and build. But we've always built stuff that like was purpose built for us and if it made sense and there was a way to go in and make it useful for users or monetize it or make sure that that loop becomes like a, a profit center instead of a cost center, like we want to go in and do that at some point.
[72:27]
B
Right.
[72:27]
C
So. But yeah, the PR is definitely dying.
[72:30]
B
Do you do first party feature flagging and incremental rollout type stuff as well?
[72:34]
C
So we have a feature flagging engine that we built internally that at some point we will roll out.
[72:38]
B
I don't see it as a user.
[72:40]
C
Yeah, yeah, yeah. So like that, that would be, that's good. Right?
[72:43]
B
How come you don't give us what you have?
[72:45]
C
Because we have to beta test it. Like, we actually care a lot, a lot, a lot about the quality of the things. There's plenty of stuff that we've used internally and then we've got it to a point where it doesn't make its way entirely through the journey because it fails. It's like this holds for one service, but it doesn't hold for multiple services. So we'd have to go and build these things for multiple services to go in and make this work. And we know for a fact that if we release this thing, we'd have to go and rebuild this thing again and again and again. And some things are worth doing to go in and do that, but a lot of them are basically, that just informs our roadmap of, okay, well, for us to go and make that actually a bit easier, we can do a few of these things first, and then we get to that experience. We don't want to dilute the experience by basically saying, oh, yeah, this works, but only for this service. Unless it's a very, very core initiative, which is over the next few months, we're going to roll out a few things that are like, okay, it works for a single service, and then it works for multiple services, and then it works for multiple services across the environment. But you have to be very, very deliberate about those things. Otherwise you end up with a bunch of broken, disparate experiences which ultimately end up creating a ton of support load because people are like, how do I use this feature? How do I go in and do this other stuff? Right. So it's kind of the thing earlier about, like, you expand your company in general to get those, like, features, and then you almost compact it, smooth out those things. So the experience is like, really, really stellar. Like, we were talking in the hallway earlier where you're like, oh, my God, it's gotten so much better. And I'm like, oh, man. Just internally, we're like, damn, this part really sucks. We got to make this significantly, significantly better.
[74:11]
B
No, I can attest to that, you know, over the last three years that I've watched you build Railway. But yeah, no, I would call out to listeners, if you're not aware, the importance of feature flagging. It's a very big part of Uber culture. So much so that they have too many feature flags and then they have another thing to remove feature flags.
[74:29]
C
Yep. 100%.
[74:30]
B
What was it called?
[74:31]
A
There's a.
[74:31]
C
There's a paper about this flagger, and there's been another one.
[74:34]
B
There's a thing that, like, looks.
[74:35]
C
Facebook has Gatekeeper. Yeah. So they're really important and agents
[74:40]
B
are going to need this. That's like the fundamental thing behind just incremental rollouts. OpenAI acquired Statsig and basically GPT-5 is just routing and flagging through different models.
[74:54]
C
And it's super important. Right. Because if you assume the software development lifecycle is 100% going to go in and change, but it's going to change because we're trying to do things a thousand times faster and 1000 times more concurrent than we were currently doing. Right.
[75:08]
B
This is routing.
[75:09]
C
Yeah. Right. And so what ends up becoming important at scale? You know, before I even started Railway, I actually built a feature flagging product. I tried to go in and sell it to people. Okay. Because I was like, oh, it's like an easier version of LaunchDarkly or whatever. Right. And I ran into this situation, which is: anybody who's small enough to adopt your technology doesn't care about feature flags. Right. And then anybody who's large enough to actually need feature flags needs so much scale that you have to build out all the existing infrastructure. So I ended up scrapping that. But what is old is new again, because now companies are trying to move really, really quickly. But you can't just YOLO this, like, vibe-coded thing straight into production. You need to basically say, hey, here's my blast radius, here's my impact, here's my, like, whatever. I want to shadow it for these users. Right? Feature flags, right. Like, you're going to need those tools that ultimately those larger companies ended up having to go in and build to maintain their structures. Everything's just going to get compressed by like a thousand X so that everybody can go and do that and everybody can build those structures really, really quickly. Right. And that's exactly where we're at right now: you're compressing the software development life cycle and then we're going to expand it and add way more new things to it, you know?
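To make the "blast radius" point concrete, here is a minimal sketch of a percentage-based feature flag. It assumes a common technique (hash the flag name and user ID into a stable bucket) rather than anything Railway, LaunchDarkly, or Statsig specifically does; the flag name and user IDs are hypothetical.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Enable the flag for roughly rollout_percent of users.

    The same user always lands in the same bucket, so raising the
    percentage only ever adds users and never flips existing ones off.
    """
    return bucket(flag, user_id) < rollout_percent


# Hypothetical flag guarding a freshly generated code path.
FLAG = "new-deploy-path"
users = [f"user-{i}" for i in range(1000)]

exposed_at_5 = {u for u in users if is_enabled(FLAG, u, 5)}
exposed_at_25 = {u for u in users if is_enabled(FLAG, u, 25)}
print(f"5% rollout exposes {len(exposed_at_5)} of {len(users)} users")
print(f"25% rollout exposes {len(exposed_at_25)} of {len(users)} users")
```

Because the bucket depends only on the flag name and the user ID, the blast radius is bounded by the rollout percentage and widens predictably as that number is raised.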
[76:15]
B
Yeah. And then the other term that comes to mind when this kind of discussion happens, for newer developers who haven't heard this term: cattle, not pets. Yeah, right. Because, like, your prod, people treat it like a pet, like it has a name, you know, I have to keep it alive. But when it's cattle, you can just mass farm and you can roll out and you can, you know, portion out parts of them and kill them or whatever.
[76:38]
C
Yeah, yeah, exactly. I actually think that maybe that's the hot take, but I think that that's actually going to change and I think you can move towards having pets so long as you have a, and this is going to be a jump, so long as you have a cloning machine for your pets.
[76:52]
B
Yeah, yeah.
[76:53]
C
If you can snapshot every single thing at every frame, then it actually doesn't matter if, you know, that thing gets obliterated because you have some sort of snapshot of it. Right. All of the things that we have built right now are to essentially block out any sort of changes or alterations or whatever from that, like, hermetically sealed DevOps line or whatever. It's like, okay, well, you have to write a Dockerfile because I only need this specific instance, like only this specific cut of the file system, et cetera. Right. What if you just had the whole file system? What if you just snapshot it? What if you lazily load the entirety of the file system? Right. Then you can get around this problem entirely. You don't need the ceremony of, you know, having a Dockerfile or having an Ansible script or having all of these other things. You can just iterate on that loop and then snapshot it. It's like, is this the right loop? Is this the right thing at this point in time? Okay, cool. Like now I'm going to go and merge it in production. Like, go merge the file system?
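The "cloning machine for your pets" idea can be illustrated with a toy content-addressed snapshot store. This is a sketch under loud assumptions: the `SnapshotStore` class, the in-memory dict standing in for a file system, and the file paths are all hypothetical, and a real system would work at the block or file-system layer rather than in Python.

```python
import hashlib


class SnapshotStore:
    """Toy content-addressed store: unchanged files are shared across snapshots."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}               # content hash -> file bytes
        self._snapshots: dict[str, dict[str, str]] = {}  # snapshot id -> {path: hash}

    def snapshot(self, files: dict[str, bytes]) -> str:
        """Record the current state of every file and return a snapshot id."""
        manifest: dict[str, str] = {}
        for path, data in files.items():
            content_hash = hashlib.sha256(data).hexdigest()
            self._blobs.setdefault(content_hash, data)  # store each blob only once
            manifest[path] = content_hash
        canonical = repr(sorted(manifest.items())).encode()
        snap_id = hashlib.sha256(canonical).hexdigest()[:12]
        self._snapshots[snap_id] = manifest
        return snap_id

    def restore(self, snap_id: str) -> dict[str, bytes]:
        """Rebuild the full file set exactly as it was at snapshot time."""
        manifest = self._snapshots[snap_id]
        return {path: self._blobs[h] for path, h in manifest.items()}


# A hypothetical "pet" machine, modelled as a dict of path -> contents.
vm_files = {
    "/etc/app.conf": b"port=8080\n",
    "/srv/app.py": b"print('v1')\n",
}
store = SnapshotStore()
snap1 = store.snapshot(vm_files)

vm_files["/srv/app.py"] = b"print('v2')\n"  # iterate directly on the machine
snap2 = store.snapshot(vm_files)

vm_files.clear()                            # the pet gets obliterated
vm_files = store.restore(snap1)             # and is cloned back from a snapshot
```

Only the changed file produces a new blob in the second snapshot, which is why snapshotting at every step can stay cheap in principle.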
[77:46]
B
Yeah, why not?
[77:46]
C
It's going to be really fun.
[77:47]
B
Yeah. This is like a whole other can of worms, but I think the number of things that are stateful in a VM, if you just kind of catalog them and develop dedicated solutions for solving each of them, you can actually cut this problem down a lot. And it's surprising that people weren't really trying until now.
[78:04]
C
Yeah, well, so it's surprising. I mean, it's always been surprising to me because these are the things that we work on because they're just like. I'm like, it's so obvious.
[78:12]
B
Principles, you need them. Everyone in theory needs them. And then, like, the big clouds don't do them. So you're like, it's impossible or something. I don't know.
[78:18]
C
Yeah, exactly, right. You're like, oh, well, Meta has all the people who write eBPF code and they're doing something with them. But you need that kind of stuff to solve these problems. Right. And we talked about it earlier. It's like whatever is required, however deep we have to go in to solve those problems. All the way down to the kernel's TCP/IP stack. Right. We're going to go and figure that out. Is there something that we need to go in and modify to make that work for the mental model that we have for the universe moving forward? Yeah, 100% we're going to go in and do it. We'll just keep going all the way down. It's super fun. It's so much fun. I have to literally peel myself away from the fun, interesting problems that we have to make sure that we can scale the company in a way that works. And there's so many different fun, interesting problems. Whether it is how do you get the information from the customer to support to the person who built the thing internally, or it's like, how do you do safe iteration, or how do you get context from the dashboard to users, or how do you drill down all the way to the infrastructure layer? How do you manage orchestration as a real-time operating system versus a feedback control system? Right. Like, it's just so fun, you know?
[79:30]
B
Yeah. I mean, speaking of, maybe you talk about the founder side. You're famously, like, you know, the YC, the SF consensus is you go to YC, you get a co-founder, you do all these things, and you've done none of that.
[79:42]
C
Yeah, I've like done a lot of different things in general.
[79:45]
B
Right. In the elevator you were like, actually, a co-founder kind of makes sense if one person is the tech person and the other person is the biz dev person. Yep. But you have to contain all those multitudes yourself.
[79:57]
C
Yeah.
[79:58]
B
How do you do it?
[79:58]
C
Okay, I was gonna ask, is there a question in there? Yeah, the question is what the hell?
[80:03]
B
The question is how. How are you alive right now?
[80:05]
C
Yeah, well, I mean, yeah, I mean, just try to get eight hours of sleep, you know, like, is there like
[80:12]
B
a balance that you. Ideally, like 50, 50, 30, 30, 30. Like, what's the mental model that you use as a solo?
[80:18]
C
There's no balance. You just have to think about all these things and be obsessed with all of these things. Like, whether it is being obsessed with how people think about your product from a go-to-market perspective, or being obsessed from a perspective of, well, if I can make this change at the kernel level, then I can make it so that the user's SSH connection never drops. Right. Because that's what I want. I want a universe in which I can go and snapshot all these things and it looks exactly like you would just kind of iterate on a VM. Right. And I think you just have to be obsessed with all those things at every layer of the stack. And I think that's what makes it easier for me. I think some people, they're obsessed with different portions of the kind of journey, the company, whatever. And I think that that's when you can get really, really good, almost like cohesion by segmenting out these things. Right. And so in the elevator, I was talking about, you have a technical kind of person, et cetera. And then you have the customer kind of person in general. Right? And I think if you can segment those lines out really, really well, and you can be very, very clear about what your areas of ownership are for yourself or your company or just where you're going to operate, you're going to have a good time, right? If you can't be clear about those things, right, and this is why I was saying two is the worst number of co-founders, it's because you have no tie break. Right. You basically are like, well, I disagree on this thing. And I disagree on this thing. Right? It's like, well, how do you resolve that? Right.
[81:39]
B
Well, usually someone's CEO.
[81:40]
C
Right, Right, Exactly. Right.
[81:41]
B
Then you're like, okay, you have to tie break.
[81:43]
C
Yeah, totally. I mean, listen, it's hard every single way you cut it, right? It's hard if you get help, it's hard if you do it yourself. It's just hard to run things, roughly speaking. Right. But it's so rewarding. It's so fun, you know.
[81:56]
B
What have you found useful? Like a coach? Any advice that has been really helpful.
[82:01]
C
I like to write a lot. I got in trouble. I get in trouble a lot for my Twitter. There's a pattern.
[82:07]
A
Who do you get in trouble with?
[82:08]
C
The people on Twitter. I was talking about it and I was like, hey, if you, you know, if you're working weekends, you're kind of messing up your planning. Roughly. Right. And I've gone kind of back and forth on that. Right? Because I think actually right now we're kind of at an extenuating time in general where it actually makes sense to, like, work more. Right. Because the goals are pretty clear in my mind. Right. And so if you have the vision and you know where you're going, you should work a little bit harder to distill that vision and go and do those things. But if you don't have that, if you're like, I think we should be going in this general direction, I'm not 100% certain, I want to get a little bit of clarity, I think what you need to do is you need to, like, disconnect and you need to take your weekends very, very seriously. You need to write about where you are, what you want to do, where you want to go, what problems you are trying to go and solve. And, like, think about a lot of these things. Right? So, you know, writing is important, sitting down. Like, I don't like the word meditation or whatever, but whatever gets you into the state of, like, your mental clarity. That's the thing that's really, really important when you're trying to go on these journeys of saying, well, we're here and we really need to be here in general, or we're here, and I think we need to be roughly in this kind of space for this to work. So those are the things. And then disconnect. Hang out with the people that you love. And then work super, super hard when you're working. Like, I try and work sunup to sundown, Monday to Friday, all out in general, and then I try and disconnect on Saturday, and then I come back to work on Sunday afternoon, and then I do my writing plan for the week, all those other things, and it works really, really well for me. But another hot take is, like, most advice is to be digested and to be thrown out the window.
And if it's helpful, it'll come back.
[83:46]
B
Right?
[83:47]
C
If it's helpful, you'll have kind of, like, learned it over time through experience or anything else like that. But yeah, you mentioned, like, the kind of standard, you know, YC advice, all of those other things. Like, we've made failure as a society very, very expensive, and it makes it difficult for people to kind of tread off the path. Right.
[84:03]
B
So, yeah, makes sense.
[84:05]
A
Any other soapboxes you want to get on? Like, anything that you have not tweeted and gotten in trouble for that you want to preview to the world?
[84:13]
C
No, I think the agent stuff is, like, it's crazy. It's going to be the dominant way in which people are doing pretty much everything. Right. Provided we can, of course, get the amount of inference required for that to go and happen. But over the next 10 years, you'll see a fundamental shift in terms of how people are thinking about even just authoring the logic that's in their head. Right.
[84:38]
B
Maybe one way of phrasing this is if Allbirds can become a GPU provider, so can Railway.
[84:44]
C
Yeah, I think there's a lot of arb in us actually not becoming a GPU provider. I think you're defined almost more by the things that you don't do than the things that you do. Because it's really, really easy for you to just say yes to a bunch of different things. Right. And I think it's gonna be very, very interesting to watch. I think Anthropic is like an amazing company and super, super stellar, and they're moving into a variety of different zones. Right. They're moving into, like, the Figma kind of stuff that they're after today.
[85:09]
B
Yes, as of recording.
[85:11]
C
They've got Claude, they've got Mike Krieger
[85:13]
B
was on Figma's board and then they removed him like Monday and then they launched this today.
[85:16]
C
Yeah, yeah. So, I mean, things move very, very fast right now, but yeah, it's just going to be the way in which people are.
[85:25]
B
Okay, so your answer is focus. No GPUs for now.
[85:28]
C
Yeah, focus.
[85:28]
B
Never say never.
[85:29]
C
Yeah. Right. Like, I can tell you for a fact that we will not be doing GPUs now, but we 100% will be doing GPUs at some point in the future. And that's not like me leaking our roadmap because we don't have plans to go and do GPUs. It's just a function of at some point you need flops.
[85:45]
B
Right.
[85:45]
C
Like at some point you want, like if you're fully vertically integrated and you want to make it really, really trivial for people to go and iterate and build and deploy things, you need access to this core piece of fundamental logic. Right. So. So, yeah, yeah.
[85:58]
B
And then like at some point, presumably your own data center traffic is like a minority of your workload right now. But is there like a majority or, you know, you just kind of completely turn off?
[86:10]
C
Oh, at some point we got to 100% data center. Like, our own data centers. Yeah. And right now it's the vast majority of the stuff that exists on our bare-metal data centers. Okay, so.
[86:21]
B
So you're already there. Like, vast majority.
[86:23]
C
Yeah, yeah, yeah. Right.
[86:25]
B
I didn't know the extent of the transition.
[86:26]
C
Yeah, totally. It was completed at some point. And then we grew so fast that we had to basically, like, go and scale back on.
[86:34]
A
Take us back.
[86:35]
C
Yeah, sorry.
[86:35]
A
Google Cloud.
[86:36]
C
Yeah, it was funny. On the Datadog dashboard, it got to 100% and then it dived back down into the 90s or whatever because we're, like, adding capacity.
[86:45]
B
Yeah, it's interesting. You're literally building a new cloud that's independent, and people assume that could never happen post-AWS.
[86:54]
C
Yeah. And it's hard. Right. We're going to figure out a bunch of different things to make sure that the platform is deeply, deeply reliable. But you have to break ground on a lot of new things when you basically decide you're going to build a cloud from scratch but not copy the hyperscalers. We've been very, very deliberate to invent our own infrastructure from scratch based on reading a ton of papers in general, but almost promising to ourselves that we wouldn't copy somebody else's homework because we were saying, hey, listen, if we copy somebody else, we lose. You're just going to become them over time. And so you have to have a core thesis about why this business needs to go and exist at this point in time. And for us, it's always been that the activation energy to get something deployed in production at any of the hyperscalers as of right now is far too high. And we believe that it should be instantaneous. We believe that there should be no friction in between what your thought is and the reality that comes out that you can share with your friends. And so that's what we're kind of building toward, again, at every layer of the stack. Like, if we got to go down to energy, we'll go down to energy at some point, right? It just matters a lot for us from the experience of giving people access to this tooling, because it's gated. Like, it's not even just gated for these citizen developers that are now vibe coding. It's like you have multiple layers. You have the citizen developer, you have a front-end developer, you have a backend developer, you have a DevOps person, you have all of these layers, right? And they all need to go in and disappear. So people can just, like, ship like that.
[88:20]
B
Amazing. All right, that's the future. Yeah.
[88:23]
C
Thank you. Thank you for having me. It's been wonderful.