Static Analysis for Ruby with Jake Zimmerman - Software Engineering Daily

Podcast Summary: Static Analysis for Ruby with Jake Zimmerman

Podcast: Software Engineering Daily
Episode: Static Analysis for Ruby with Jake Zimmerman
Date: October 14, 2025
Host: Josh Goldberg
Guest: Jake Zimmerman, Lead Developer on Sorbet at Stripe

Main Theme and Overview

This episode delves into Sorbet, Stripe’s open-source static type checker for Ruby, with lead developer Jake Zimmerman. Hosted by Josh Goldberg (TypeScript/JS tooling expert), the conversation explores Sorbet’s origin, technical and architectural challenges of static typing in a dynamic language, key features, performance, collaboration with the Ruby ecosystem, and both present and future of typing in Ruby. The episode offers both a beginner-friendly intro to Sorbet and plenty of deep technical dives for advanced listeners.

Key Discussion Points & Insights

1. Jake’s Background and Early Motivation

How Jake Got Into Tech:
Jake traces his interest in software back to a seventh-grade computer science elective ([02:03]).
Teaching Experience and Onboarding:
Motivation for Sorbet and onboarding new engineers at Stripe borrows from his experience as a college teaching assistant:

"If you can use some of the whimsy that you might have developed as a teaching assistant...maybe the people remember the onboarding material a little bit better."
— Jake ([02:36])

2. Introducing Sorbet to New Engineers

First Day Pitch:
Sorbet checks your Ruby code for type errors, flags them, helps with navigation/autocomplete, and acts as a productivity tool ([03:29]).
Deeper Pitch:
For experienced engineers, Sorbet’s type system is a robust tool for catching bugs and reasoning about code structure ([03:29]).

3. The Need for Static Analysis in Ruby

Dynamic Languages’ Flexibility and Issue:
Ruby’s flexibility leads to code that’s difficult to reason about, hence the need for a type checker ([04:51]).
Example Motivation:
Not knowing if a function argument is an object or an ID is a key pain point solved by types ([04:51]).
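
The ID-versus-object confusion can be sketched in plain Ruby. The `Merchant` struct, the `MERCHANTS` table, and both method names below are hypothetical and exist only for this illustration; the `sig` lines appear as comments to suggest how Sorbet annotations might document the difference, and they are not taken from Stripe's code.

```ruby
# Hypothetical model and in-memory "database", used only for this sketch.
Merchant = Struct.new(:id, :name)
MERCHANTS = { "merch_1" => Merchant.new("merch_1", "Acme Co") }.freeze

# Expects a merchant ID (a String).
# A Sorbet signature for this might read:
#   sig { params(merchant: String).returns(Merchant) }
def load_merchant(merchant)
  MERCHANTS.fetch(merchant)
end

# Expects a merchant object, even though the parameter has the same name.
#   sig { params(merchant: Merchant).returns(String) }
def receipt_header(merchant)
  "Receipt from #{merchant.name}"
end

header = receipt_header(load_merchant("merch_1"))

# Passing the ID where the object is expected is only discovered at runtime,
# because String has no `name` method.
mistake = begin
  receipt_header("merch_1")
rescue NoMethodError => e
  e.class
end
```

Without annotations, nothing in the two method headers tells a reader which form of `merchant` is expected; that is the gap a type checker is meant to close.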

4. Metaprogramming and Dynamic Patterns in Ruby

The Challenge:
Ruby’s culture of extreme metaprogramming makes static analysis hard. Sorbet offers partial support but can’t cover all cases, and adding a type checker curtails certain dynamic patterns, which may be a beneficial tradeoff for large teams ([06:21]):

"If the reason why you really like Ruby is because of all this flexibility...adding a type checker might just be a strict loss for you...But, the selling point...is that those sorts of programming patterns tend to be extremely confusing and hard to teach."
— Jake ([06:21])
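
As a rough illustration of the pattern discussed here, the sketch below uses a helper whose only job is to define other methods at runtime. The `Settings` class and its `flag` helper are hypothetical; the point is that `dark_mode` and `beta?` never appear as `def` statements, so a tool reading the source cannot see them without extra help.

```ruby
class Settings
  # Hypothetical helper: defines a reader and a predicate for each flag.
  def self.flag(name, default:)
    define_method(name) { @flags.fetch(name, default) }
    define_method("#{name}?") { !!public_send(name) }
  end

  flag :dark_mode, default: false
  flag :beta, default: true

  def initialize(flags = {})
    @flags = flags
  end
end

defaults = Settings.new
custom = Settings.new(dark_mode: true)

# The methods exist at runtime even though no `def dark_mode` was written.
defined_methods = Settings.instance_methods(false).sort
```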

5. Sorbet Project History & Lessons

Origins:
Born out of strong internal demand at Stripe in 2017, Sorbet was developed due to the lack of suitable alternatives ([08:27]).

6. Advanced Type System Features

Modeling Metaprogramming:
Sorbet supports Ruby’s classes-as-values paradigm via clever use of generics ([09:50]).
Retrofit to Ruby’s VM:
Challenge in modeling Ruby’s runtime semantics statically, e.g., 'private' methods aren’t truly private ([11:46]):

"No one really ever thought about the static semantics of these language features..."
— Jake ([11:46])
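
Both points in this section can be seen in plain Ruby. The sketch below is an illustration with hypothetical class names, not Sorbet code: `build` treats a class as an ordinary value, and the `Account` classes show that a `private` method is still callable from a subclass and is only rejected when called on an explicit receiver other than `self`.

```ruby
class House
  def describe
    "a house"
  end
end

class Car
  def describe
    "a car"
  end
end

# Classes are ordinary values: accept one and instantiate whatever was passed.
def build(klass)
  klass.new
end

class Account
  def balance_report
    "balance: #{balance}" # implicit receiver: allowed
  end

  def compare(other)
    other.balance # explicit receiver other than self: rejected at runtime
  end

  private

  def balance
    100
  end
end

class SavingsAccount < Account
  def summary
    "savings: #{balance}" # subclasses can call the private method too
  end
end

built = [build(House).describe, build(Car).describe]
report = Account.new.balance_report
inherited = SavingsAccount.new.summary
rejected = begin
  Account.new.compare(Account.new)
rescue NoMethodError => e
  e.class
end
bypassed = Account.new.send(:balance) # `send` ignores visibility entirely
```

A static checker has to decide how to model each of these runtime behaviors, which is the retrofitting work described above.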

7. Sorbet’s Architecture

Core Pipeline:
Sorbet is written in C++ for performance. It parses Ruby code into an AST (abstract syntax tree), then lowers the AST into a control flow graph (CFG) and type checks the CFG, which is how it implements flow-sensitive typing ([12:39], [14:30]).
Why C++ and not Ruby?:
Chosen for performance, fine-grained control over memory allocation, and the team's existing experience, given the size of Stripe’s code base ([15:43]).

"The actual computational complexity is not...what drowns a type checker. It’s whether you have too many spurious allocations..."
— Jake ([15:43])
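
The idea of lowering many syntactic forms into one branching construct, then narrowing types along each edge, can be sketched with a toy model. Everything below is hypothetical (the mini-AST shape, `Branch`, `lower`, `narrow`, and the labels) and is far simpler than a real control flow graph; it only shows why the narrowing logic needs to be written once.

```ruby
# One construct for every conditional:
# "if `var` is truthy go to `on_true`, otherwise go to `on_false`".
Branch = Struct.new(:var, :on_true, :on_false)

# Hypothetical mini-AST nodes: [:if, var, then_label, else_label]
# and [:unless, var, then_label, else_label].
def lower(node)
  kind, var, then_label, else_label = node
  case kind
  when :if     then Branch.new(var, then_label, else_label)
  when :unless then Branch.new(var, else_label, then_label)
  else raise ArgumentError, "unsupported node: #{kind}"
  end
end

# In Ruby only nil and false are falsy.
FALSY = [:NilClass, :FalseClass].freeze

# env maps variable names to the list of classes the variable may hold.
# Returns the refined environment for each successor label.
def narrow(env, branch)
  possible = env.fetch(branch.var)
  {
    branch.on_true  => env.merge(branch.var => possible - FALSY),
    branch.on_false => env.merge(branch.var => possible & FALSY),
  }
end

env = { merchant: [:Merchant, :NilClass] }
from_if     = narrow(env, lower([:if, :merchant, :charge, :bail]))
from_unless = narrow(env, lower([:unless, :merchant, :bail, :charge]))
```

Because `if` and `unless` both become a `Branch`, `narrow` never needs to know which keyword the programmer wrote.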

8. Performance and Scaling Challenges

Previous Strategy:
Heavy micro-optimizations combined with reading the entire codebase into memory (roughly 15 to 20 GB for Stripe's code), an approach that is now hitting scaling limits ([20:28]).
Ongoing Re-architecture:
Moving towards analyzing only dependency-based code subsets at a time ([20:28]).

"We fundamentally cannot read 100% of the code base anymore. We need to figure out a way to make it only read a subset..."
— Jake ([22:55])
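
One way to picture "only read a subset" is a reachability walk over a file dependency graph. This is a minimal sketch under the assumption that such a graph is available as a hash; the `files_to_read` helper and the file names are hypothetical and do not describe Sorbet's actual design.

```ruby
require "set"

# deps maps a file to the files it depends on directly.
# Returns every file reachable from the roots, i.e. the subset to read.
def files_to_read(deps, roots)
  seen = Set.new
  queue = roots.dup
  until queue.empty?
    file = queue.shift
    next unless seen.add?(file) # add? returns nil if the file was already seen
    queue.concat(deps.fetch(file, []))
  end
  seen
end

deps = {
  "checkout.rb" => ["merchant.rb"],
  "merchant.rb" => ["money.rb"],
  "reports.rb"  => ["checkout.rb"],
}

subset = files_to_read(deps, ["checkout.rb"])
```

Here `reports.rb` is never read, because nothing reachable from `checkout.rb` depends on it.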

9. Explicit Dependencies & Community Practices

Combatting Ruby’s Implicitness:
Making dependencies explicit is crucial as Ruby allows implicit/hidden dependency flows ([23:17]).
Ruby Features Stripe Disallows:
For example, Stripe bans dynamic constant lookup to avoid untraceable dependencies ([24:30]).
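
To make the contrast concrete, the sketch below compares a dynamic constant lookup with an explicit table. The summary does not say which specific APIs Stripe restricts, so the use of `const_get` here is an assumption, and the `Handlers` module and both helper methods are hypothetical.

```ruby
module Handlers
  class Refund
    def call
      "refunded"
    end
  end

  class Payout
    def call
      "paid out"
    end
  end
end

# Dynamic lookup: the class name is assembled from a string at runtime, so
# nothing in this method's source mentions Refund or Payout.
def dynamic_handler(kind)
  Handlers.const_get(kind.capitalize).new
end

# Explicit alternative: every dependency appears as a constant in the source.
HANDLERS = {
  "refund" => Handlers::Refund,
  "payout" => Handlers::Payout,
}.freeze

def explicit_handler(kind)
  HANDLERS.fetch(kind).new
end

dynamic_result = dynamic_handler("refund").call
explicit_result = explicit_handler("refund").call
```

Both versions behave the same at runtime, but only the second lets a tool see which classes this code depends on.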

10. Collaboration & Ecosystem Impact

Working with Ruby Core Team:
Meetings every few months to advance the typing ecosystem (esp. RBS, the Ruby standard for type annotations) ([25:43]).
Shopify Collaboration:
Sharing innovations and ensuring type system improvements suit large codebases ([25:43]).

11. Upcoming and Desired Features

Ecosystem and Syntax Improvements:
Big push for more palatable type annotation syntax ([26:56]).
Closing Type System Gaps:
Fixing poor interaction between structs/records and interfaces (finally being addressed after years, affects thousands of files at Stripe) ([29:27]).
Shapes and Tuples:
Bringing richer type support akin to TypeScript object types ([29:27], [32:34]).

12. Community and Ecosystem Support

Gaps Acknowledged:
Lack of support for publishing gem type definitions outside Stripe, e.g., issues declaring generics in RBI (Ruby Interface) files ([33:39]).
Desire for Increased Community Focus:
Jake aims to address these now that core scaling work is stabilizing ([33:39]).

13. Measuring Code Quality and Type Coverage

Custom Metrics:
Code quality and productivity metrics are best tailored per codebase, not standardized ([35:50]):

"Each individual code base can craft a metric for what productivity looks like...which is something I believe is basically the only way to do it."
— Jake ([35:50])

Memorable Quotes and Notable Moments

On Type Safety vs. Flexibility:
"If the reason why you really like Ruby is because all of this flexibility...adding a type checker might just be a strict loss for you..."
— Jake Zimmerman ([06:21])
On Choosing C++:
"The actual computational complexity is not...what drowns a type checker. It’s whether you have too many spurious allocations..."
— Jake Zimmerman ([15:43])
On Scaling Problems:
"We fundamentally cannot read 100% of the code base anymore. We need to figure out a way to make it only read a subset..."
— Jake Zimmerman ([22:55])
On Custom Quality Metrics:
"...the way to measure these things is hyperlocal and hyper specific to individual projects..."
— Jake Zimmerman ([35:50])

Technical Deep Dives

Sorbet’s Type System Features (Lightning Round)

Union Types:
Useful for expressing a method that can return one of several possible values (especially for errors/failure conditions), prevents over-reliance on untracked exceptions ([38:13], [39:06]).
Branded/Opaque Types:
Concealing implementation details; public interface only exposes limited operations (e.g., UNIX file pointers) ([41:15]).
Interfaces and Abstract Methods:
Underused in Ruby, not just for multiple implementations but for data hiding/information encapsulation even with a single implementation ([42:26]).
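
The union-type point can be illustrated with a method that returns one of two result values instead of raising. The `Charge` and `CardDeclined` structs and both methods are hypothetical; the commented `sig` shows how the return type might be written with Sorbet's `T.any`, while the code itself is plain Ruby.

```ruby
Charge       = Struct.new(:amount_cents)
CardDeclined = Struct.new(:reason)

# Returns either a Charge or a CardDeclined instead of raising.
# With Sorbet the return type could be written as a union, for example:
#   sig { params(amount_cents: Integer).returns(T.any(Charge, CardDeclined)) }
def create_charge(amount_cents)
  return CardDeclined.new("amount must be positive") unless amount_cents.positive?

  Charge.new(amount_cents)
end

# The caller handles each member of the union explicitly.
def describe(result)
  case result
  when Charge       then "charged #{result.amount_cents} cents"
  when CardDeclined then "declined: #{result.reason}"
  end
end

success = describe(create_charge(500))
failure = describe(create_charge(0))
```

The failure case is part of the method's declared result, so callers cannot overlook it the way they can overlook an undocumented exception.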

Community, Usability, and Future Work

Promised Improvements:
- Palatable annotation syntax
- Closing long-standing feature gaps (structs/records vs interfaces, shapes, tuples)
- Better external gem type declaration workflows
Call to Action:
Join the Sorbet community Slack for discussion and support ([46:15]).

Personal Note/Outro

Jake’s STP Bike Ride Experience:
Seattle-to-Portland (207 miles); found the ride left him blissful and mentally refreshed—no work thoughts at all ([44:07]).

Timestamps for Key Segments

Jake's Background & Onboarding Philosophy: [02:03] – [03:10]
What is Sorbet (Beginner & Advanced): [03:29] – [04:37]
Ruby’s Dynamic Patterns & Sorbet's Limitations: [05:58] – [08:20]
Sorbet’s Origins and Stripe’s Decision: [08:27] – [09:33]
Sorbet Type System Details: [09:50] – [11:46]
Sorbet Architecture: C++, AST, CFG: [12:30] – [15:19]
Performance & Scaling: [20:28] – [24:20]
Working with Ruby Core/Shopify: [25:43] – [26:43]
Upcoming Features & Ecosystem Needs: [29:27] – [35:38]
Measuring Code Quality: [35:50] – [37:28]
Lightning Round—Type System Features: [38:09] – [43:28]
Personal Note—STP Ride: [44:07] – [45:42]
Community Call to Action: [46:15]

For More Info

Sorbet: sorbet.org
Join the Community: sorbet.org/slack
Follow Jake: Through Sorbet channels

This episode is essential listening for Rubyists considering adding static typing, for those scaling dynamic codebases, and for anyone curious about type checker implementation and ecosystem integration.

Transcript

[00:01]
Narrator/Advertiser
Dynamic languages like Ruby, Python, and JavaScript determine the types of variables at runtime rather than at compile time. This flexibility allows for rapid development and concise code, but it also makes it harder to catch certain classes of bugs before execution. Type checkers for dynamic languages add structure and safety without compromising their expressive power. Sorbet is a static type checker developed by the Stripe team and designed specifically for Ruby. The motivation behind Sorbet stemmed from the growing complexity of production Ruby applications, where developers needed stronger guarantees and more scalable code quality tools than dynamic typing alone could offer. Jake Zimmerman is a software engineer at Stripe and leads development on Sorbet. He joins the podcast with Josh Goldberg to discuss his background, the challenges of typing in Ruby, the motivation behind Sorbet, its architecture, performance optimizations, and more. This episode is hosted by Josh Goldberg, an independent full-time open source developer. Josh works on projects in the TypeScript ecosystem, most notably typescript-eslint, a powerful static analysis toolset for JavaScript and TypeScript. He is also the author of the O'Reilly book Learning TypeScript, a Microsoft MVP for Developer Technologies, and a co-founder of SquiggleConf, a conference for excellent web developer tooling. Find Josh on Bluesky, Fosstodon, and .com as JoshuaKGoldberg.
[01:46]
Josh Goldberg
With me today is Jake Zimmerman, developer at Stripe on Sorbet and Ruby infrastructure. Jake, welcome to Software Engineering Daily.
[01:52]
Jake Zimmerman
Awesome. Thanks for having me, Josh.
[01:53]
Josh Goldberg
We're really excited. You do a lot of really interesting work and you have a long history of very fascinating blog posts. But before we get into Sorbet and type checking and Ruby, can you tell us how you got into tech?
[02:04]
Jake Zimmerman
Oh sure. It's a long story. It goes back all the way to seventh grade. My school had an elective for computer science and an elective for, like, mechanical engineering. I took both of them and I liked the software side of computer science more. So yeah, I've been working on tech stuff basically since middle school, and then all through college, so it's kind of been a very long love of mine.
[02:27]
Josh Goldberg
And in university, based on your website, you were involved in student groups and you took quite a few interesting classes. Was there anything that jumped out to you then that's now relevant to your work on Sorbet today?
[02:36]
Jake Zimmerman
Yeah, I think the thing that's relevant today is how much being a teaching assistant in college feels like helping out with the onboarding program in industry. There's a lot of overlap in terms of just trying to come up with exciting ways to keep people engaged, because the people who just started at your company are similarly kind of checked out in the first week, where there's just so much going on, so much to learn, and you kind of want to give them all the tools that they need for their job, but you recognize that they're not going to remember everything that you can say. And so if you can kind of use some of the whimsy that you might have developed as a teaching assistant in school, maybe the people remember the onboarding material a little bit better.
[03:10]
Josh Goldberg
That's excellent advice. Let's put it into practice. I'd like to give you two prompts. The first prompt is going to be, suppose I'm an overloaded engineer on my first day. How would you introduce Sorbet to me? And the second prompt will be, I'm a much more relaxed engineer on my second week. How would you introduce Sorbet to me? Can you get started on the first?
[03:29]
Jake Zimmerman
Sure. I think the way that we introduce Sorbet to people on their first day is that you're going to have a lot of tools that help you write good code at the company. And one of these tools is Sorbet. Sorbet is a type checker for Ruby, which means it's kind of constantly running in the background looking for type errors in your code. And when it finds them, it will flag them with little red squiggles. But it can do a lot more than that. It will also help you understand the relationships between your code, let you jump to definitions and find references really quickly, show you autocomplete suggestions. And so it's this tool that's kind of helping you get your job done faster and hopefully make fewer mistakes. And then maybe the longer winded example is, you know, maybe after a few weeks of writing code, you've seen all the things that it can do, but you've probably only scratched the surface. And type systems are this very powerful tool that if you really lean into it, can kind of help shape the larger design of your programs. And Sorbet itself has some fairly interesting type system features that are somewhat unique but also somewhat shared with other type systems. So the more that you know about what's possible to express and how to use the type system, the better you're going to be able to use those features I mentioned before about, like, catching errors and navigating through the code base and stuff like that.
[04:38]
Josh Goldberg
That sounds exciting. Let's say I know very little about types or type systems. I've only ever programmed in, say, Python or JavaScript or Ruby without types. What would you describe as some of the entry-level or intro-level features of a type checker like Sorbet?
[04:52]
Jake Zimmerman
Yeah, so one of the biggest features that you get when you're kind of just stepping into the type system and one of the things that was a key motivating example of why we built Sorbet in the first place was you have a lot of code where you've got a function and it says I accept some parameter called merchant. And you don't really know what that merchant parameter is. It sometimes refers to like some database ID representing that you could load that ID from the database and get back a like real merchant object. And sometimes it is the merchant object itself. And whether it's a string or an actual database model is something that's really important to know because they're going to support different operations. And so if you can go through your code and annotate, yes, this merchant parameter is an ID and this merchant parameter is an object, you get a lot of things for free, like you get those type checking errors that I mentioned before. You get the ability to query what methods are available because maybe a string is only going to have make this uppercase or make this into two strings or get the first character or whatever. A database model is going to be able to actually get you the fields that are on a merchant object. So that's the kind of pitch for why you might want type checking. It just makes it easier to understand the relationships in your code.
[05:59]
Josh Goldberg
Sure, all that sounds lovely. However, Ruby has a bit of a reputation for being a bit of a wild west as a language, where there's a lot of wacky stuff you can do. Ruby makes JavaScript look like C. How do you represent all those wild and wacky overrides and added dynamism in Ruby in a static type system, the way Sorbet has things set up?
[06:21]
Jake Zimmerman
Yeah, I think that's definitely the case. I think especially when you compare JavaScript and Ruby: they are both dynamically typed languages, and so you can kind of do the same things. But languages are not just the syntax and just the features that they have, but they're also the communities that spring up around them, and those communities develop different patterns around how to use the features of the language. And I think that you're absolutely right that in Ruby people have tended to really, really lean into the dynamism. What that means in practice is that you will find Ruby code where it uses a lot of metaprogramming. And by that we mean it's kind of dynamically defining methods at runtime, so you'll have a bunch of logic kind of factored out into these helper functions whose sole job is to define other methods. And this can be done in JavaScript, but my impression is that a lot of the times when you're dealing with methods in JavaScript, they just kind of show up syntactically at the top level of your class, and they don't really get hidden so much, such that they're finding their way onto an object or a class at runtime. And so Sorbet has to deal with the fact that there's all of these definitions hidden from the static system. It's not perfect. I think that Sorbet has mechanisms to deal with this in certain cases, but that is definitely a limitation. If the reason why you really like Ruby is because of all of this flexibility that it provides you with metaprogramming and runtime introspection and stuff like that, in some sense adding a type checker might just be a strict loss for you because it removes the ability to use these things. The selling point for why this might actually be a blessing in disguise is that those sorts of programming patterns, especially in large code bases, tend to be extremely confusing and hard to teach people about if they weren't actually the one to write that metaprogramming facility in the first place.
So the number of developers at Stripe is in the thousands at this point. I don't even know what the number is, but when you have that many people working on one code base, you really benefit from having kind of guardrails to say: if you're considering implementing new forms of metaprogramming, maybe don't, because it will be harder for people to understand and also harder for the type checker to provide this intelligence for you.
[08:21]
Josh Goldberg
Sure. You've been working on Sorbet since 2017. When was it first started as a development project?
[08:27]
Jake Zimmerman
Sure, yeah. I started at Stripe in 2017. It also started in 2017, and then I joined the project one year later, in 2018. So it has a pretty long history. And the kind of interesting part that people ask us about this history is like, how did you get started? How did you convince the company to start this type checking project? And the funny thing is that it was the opposite: when we were going about our job just maintaining Ruby infrastructure at Stripe, we continually asked people, okay, we built this thing for Ruby. What's the next thing that you really want us to work on? What's the next thing that would really make you more productive in your job? And at some point, the overwhelming answer to this question was that it would really help if we had a type checker that provided all these benefits that we've already been talking about. So in 2017, they started evaluating some of the options for how to proceed. Whether that might mean picking an off-the-shelf Ruby type checker, which didn't really exist at the time (it existed in some sort of hobby projects and research projects), or whether we might want to rewrite into a different language, which had its obvious downsides of the complexity involved in that, or whether we want to hire a team and build a team internally to actually create a new type checker from scratch.
[09:34]
Josh Goldberg
So, Jake, that means you've been giving similar or very introductory overviews of Sorbet and its features for seven or eight years now. But as someone who works deeply on it, there have got to be quite a few architectural nuances or interesting type system features that you're excited about. Is there anything in particular you'd like to bring up?
[09:50]
Jake Zimmerman
Yeah, I'm so glad you asked this question, because this is how I wish that every podcast would go: we just dive straight into the super advanced details and architectural stuff. There's so much that I could talk about. I think, for the sake of keeping the podcast short, one of the things that I'll focus on, one of my personal favorite features, is how Sorbet models certain kinds of metaprogramming that you can do with classes as objects. So in Ruby, and most dynamically typed programming languages really, classes themselves are expressions. You can pass them around as values and stuff like that. Which also means that you can accept a class object and then dynamically instantiate whatever the user happened to give you. So maybe if you pass in a class that creates houses, then you can instantiate it and get back a house. And if you pass in a class that creates cars, you can do the same thing. In a statically typed language like Java, this ends up being really clunky because you have all of these weird factory patterns. And in Ruby it's super natural because you just pass the classes around themselves, and technically you're operating on the factory pattern, but it's super transparent because it's just operating on the runtime class values themselves. And Sorbet also has support for this. It can know that when you pass a class object and then dynamically instantiate whatever you happen to be given, you get back an instance of that class. And it makes it kind of easy to model the types of Ruby programs that you see in practice. And I think that the way that it works is super cool. It uses kind of generics, like class-based generics, in a way that you might not have expected it to, and it falls out really nicely. And maybe you could think of the theory of Sorbet. I think it's just a really cool feature that also makes it possible to write some really cool code.
[11:24]
Josh Goldberg
That was an explanation that very clearly described not just the Ruby and Sorbet features, but also what it means for a language to have first class, say classes or first class functions. Ruby has a lot of features that other programming languages don't have. Do you ever feel that there's kind of an added difficulty in describing them in the type system compared to some of the more traditional languages?
[11:47]
Jake Zimmerman
Yeah, absolutely. I think that a lot of the features that it has are just what's possible to do at runtime. So Ruby has this concept of private methods, but these private methods aren't actually private methods like you would have expected them to be in a statically typed system. They're this really weird hybrid of actually protected methods and some kind of visibility modifiers that only allow you to call things on self. And so there are examples where, because the language and all of its features were designed around what was possible to build into the VM, no one really ever thought about the static semantics of these language features. So yeah, there's absolutely places like that where trying to retrofit some type system feature to model what the VM makes it possible to do is basically the entire job.
[12:30]
Josh Goldberg
I'd like to dive down a little bit more into areas like the VM now. Could you give an overview of how Sorbet actually works? The programming product Sorbet?
[12:39]
Jake Zimmerman
Yep, Sorbet is a C++ program that basically parses your code: it reads all of the source files off of disk, parses them into some abstract syntax tree, and then type checks them. So the type checking itself is somewhat different from what you might expect a typical static analysis pass to do, because most by-the-book compilers courses will tell you that you should parse to an AST and then type check that AST. Sorbet goes one step further because the Sorbet type system has this notion of control flow sensitive typing, where if you branch in some if condition or case statement or something like that, Sorbet will know that in one branch, if a certain condition is truthy, a certain assertion holds about different variables' types. So to be able to model this control flow sensitive typing you can do one of two things. But what Sorbet does is it actually compiles the AST into a control flow graph where these control flow branches are explicit in the representation of the program, and then it does type checking on that. So that's kind of Sorbet's high level algorithm: it parses all of your code, builds some explicit representation of the control flow in the program, and then type checks that.
[13:49]
Narrator/Advertiser
Feeling the AI anxiety? From questions about job security to cybersecurity and everything in between, it's easy to feel overwhelmed with the rate of AI innovation. Enter Airia, the enterprise AI orchestration and security platform built to boost your confidence. With Airia, you don't have to compromise between speed and innovation or security and governance. Quickly deploy AI without cutting corners on compliance. Give your teams the confidence to adopt AI with Airia. Ready to eliminate your AI anxiety? Visit Airia.com to get started for free today. That's A-I-R-I-A.com.
[14:27]
Josh Goldberg
When you say AST, is that for abstract syntax tree?
[14:30]
Jake Zimmerman
Yep. Sorry about that. Abstract syntax tree.
[14:33]
Josh Goldberg
Interesting. Why do you need to do it this way? Is it because of how dynamic the types are in Ruby?
[14:38]
Jake Zimmerman
I think that it's actually for convenience of implementation. So when you take an abstract syntax tree, you're going to have anywhere from like dozens to hundreds of different syntactic nodes for every different language feature that you have. But if you have one construct that represents conditional branches, then you only have to implement the control flow sensitive logic once. You don't have to implement it once for if nodes, once for unless nodes, and once for while nodes, and once for rescue nodes, and once for break nodes and continue. Like you just model all control flow as this one node in your control flow tree. And then you implement a very kind of standardized algorithm for modeling that control flow. So it's less because of Ruby makes it difficult and more because this model makes it easier for Sorbet.
[15:20]
Josh Goldberg
That's fascinating. Now that you mention it, there are quite a few different constructs in every programming language for control flow. But what you're describing, if I'm understanding right, is that you've sort of unified and abstracted away the language-specific details of if, rescue, and so on, and just turned it into "this is how the code might branch." Sort of a trick.
[15:37]
Jake Zimmerman
Exactly. Yep, exactly.
[15:39]
Josh Goldberg
That's great. Tangent: how come C++? Why not, say, Ruby?
[15:44]
Jake Zimmerman
Yeah, that's a great question. We have a whole internal design doc that I think would be really fun to publish one day. But it was a very explicit choice early on in the project. The choice of Ruby was definitely considered, and I wasn't a part of this decision. But I know that the original team actually sat down and chatted with the team building mypy, which is a static type checker for Python that actually is written in Python. And there were some people on the early team who were very excited about that because it means that you might be able to attract the wider Python community to work on the Python type checker or the wider Ruby community to work on the Ruby type checker. The mypy team had explicitly advised the early members of the Sorbet team not to take that approach, because what they found from implementing this in mypy was that the performance of it was very, very hard to tune. And given that, we knew that the whole reason why we wanted to build a type checker was to be able to give people fast feedback about their code locally and in CI and in their editors. And given the scale of Stripe's code base at the time, building a type checker that was fast and in Ruby was going to be a Herculean task, basically. So that left all of these maybe compiled languages like C++ or Rust or Go or whatever. I think even OCaml was mentioned in the design doc because Flow is implemented in OCaml and there was a history of other type checkers being implemented in these functional programming languages. The decision to go with C++ was actually just a very practical one. It was that three people on the team at the time had tons of experience working on large C++ code bases from previous employers and previous experience. Rust at the time was kind of up and coming and not necessarily the sort of technology that it was obvious to bet on. And definitely no one on the team had experience using it in a large code base or a large type checker project.
Go, I think, was not considered because of the founding members of the team's experience and understanding that the key operation that you need to control in a type checker is minimizing allocations. The actual computational complexity is not actually the thing that drowns a type checker. It's whether you have too many spurious allocations that really dominates the performance characteristics. So in Go, you don't really have nearly as many controls over whether you're allowed to allocate in a certain spot. But in C++, and obviously Rust, you have very, very fine-grained controls to guarantee that an allocation isn't happening in a hot path. So it ended up being just a checklist of various conditions, and C++ ended up checking the most boxes.
[18:12]
Josh Goldberg
It's fascinating how the positioning of the project's people, the year it's released, and the company around it can so drastically influence a major project in the ecosystem. For reference, TypeScript was originally written in TypeScript upon release and just recently announced a very large effort to rewrite in Go. Because now, almost a decade later, the characteristics of the Go ecosystem have changed and the needs of TypeScript, like you said, have evolved from less of "we need to bootstrap and experiment" and more "we need performance and a better memory profile," and a lower level language like Go and C and so on can do that with a lot less engineering effort than something like Ruby or JavaScript.
[18:50]
Jake Zimmerman
Yeah, I'm also kind of very excited about the TypeScript Go rewrite. I've been following it a little bit, mostly because working on Ruby infrastructure at Stripe, I also sit very closely next to the people who work on JavaScript infrastructure at Stripe, and the TypeScript type checking job inside of Stripe CI is, I want to say, like dozens of minutes long because of the performance architecture of TypeScript. To my understanding, TypeScript, the one written in TypeScript, is single threaded, and in a JavaScript code base the size of Stripe's, you have to go really out of your way to break up your code into separate packages to be able to get any sort of parallel type checking. So I'm very excited about the Go rewrite just because I think it will unblock just tons and tons of performance optimizations for them. And also, kind of to the point, I think one of the reasons why TypeScript ended up choosing Go was because their plan for how to port it was to actually do this, like, almost verbatim port, where you have a function called foo in TypeScript and you port it over to a function called foo in Go. And that's the thing about TypeScript and Go: they're very similar. They're both implementing this kind of structural duck typing type system where classes just happen to implement interfaces. As long as all of the methods have been implemented, you don't have to explicitly include an interface or explicitly declare implements to get that type relationship to show up. So it makes a ton of sense for them to pick Go because it's the easiest to implement that porting strategy. You don't have to fundamentally rethink the core types and objects and functions in your type checker.
[20:19]
Josh Goldberg
Speaking of performance, you've talked recently about the different performance strategies that you've taken with Sorbet. Are there any particular initiatives you'd like to spotlight that have happened recently or are ongoing?
[20:28]
Jake Zimmerman
There's always performance work ongoing. That is basically my whole job. This is why I lit up so much when you asked me to talk about type system features: because I spend so much of my time in the performance mines, I sometimes don't get to talk about type system features anymore. The main performance work that we're doing right now is a somewhat fundamental re-architecture of Sorbet's internals. What I mean by this is that the approach to performance that Sorbet has always taken is that we are going to squeeze as hard as we can to micro-optimize various parts of Sorbet. We're going to make all the internal data structures use crazy bit-flipping hacks to squeeze every last ounce of performance that we can get out of the assembly. And that works really well for a long time. But at some point you just run into scaling laws where you've basically optimized it as much as you can, and the only way to get further performance is to rethink the algorithms and operations that you're doing at a larger level. So that's the work that we're doing right now. The model Sorbet has always taken until now is that it will read every file in your code base. It will do some sort of global fixed-point analysis to figure out where all of the definitions are, where all of the classes are, and what inherits from what, and it'll do this all at once. It'll read everything and then do this global analysis to figure out where everything is defined. Then, having built up this global model of all the classes and methods and types, it will type check each file in parallel. Which means that at some point 100% of the code base is in memory. And given how big Stripe's code base is, keeping 100% of the code base in memory takes anywhere from 15 to 20 gigabytes right now. And if you have a 64 gigabyte development machine, that's something that you can do.
But if the code base doubles one more time, now you're talking about over half of the memory doing nothing but just type checking the code. So what we're doing is trying to figure out how we can type check only a subset of the code base at any given time and then move on to the next subset. So you figure out that this particular bit of code is entangled and all needs to be type checked at once. We're going to read that small subset, type check it, and then kind of page it out back to the file system, move on, and read the next thing. And this requires having a much better idea of explicit dependencies from one piece of code to another, which involves some other internal tooling that Stripe has built for modeling these dependencies. But the core constraint is that we fundamentally cannot read 100% of the code base anymore. We need to figure out a way to make it only read a subset, which is a fun time, basically, because it means rethinking a lot of the core assumptions. It almost feels like a greenfield project, even though the project is seven years old.
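The "type check one entangled subset at a time" idea can be sketched with Ruby's standard tsort library. This is not Sorbet's implementation: the file names, the dependency map, and the check_batch helper are all invented for illustration, and the sketch assumes the dependencies between files are already known explicitly. Files that depend on each other in a cycle land in the same strongly connected component and so are processed together, while everything else is processed in small, independent batches.

```ruby
require 'tsort'

# Hypothetical dependency graph: each file maps to the files it depends on.
class DependencyGraph
  include TSort

  def initialize(deps)
    @deps = deps
  end

  # TSort calls these two hooks to walk the graph.
  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @deps.fetch(node, []).each(&block)
  end
end

# Invented file names; the first two form a cycle and must be checked together.
deps = {
  'models/charge.rb'   => ['models/customer.rb'],
  'models/customer.rb' => ['models/charge.rb'],
  'api/charges.rb'     => ['models/charge.rb'],
  'lib/money.rb'       => []
}

# Stand-in for "read this subset, type check it, then page it out".
def check_batch(files, log)
  log << files.sort
end

checked = []
# Each strongly connected component is one set of mutually entangled files.
# TSort yields a component only after the components it depends on.
DependencyGraph.new(deps).each_strongly_connected_component do |component|
  check_batch(component, checked)
end

checked.each { |batch| puts batch.join(', ') }
```

Only one batch needs to be held in memory at a time in a scheme like this, which is the property the re-architecture described above is after. A real type checker would also need summaries of already-checked batches, which this sketch leaves out.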
[23:04]
Josh Goldberg
How easy or how doable is it to feel confident that you're reading a subset of the code base and you understand fully what other parts of a code base it relates to? Do you ever have magic hidden implicit dependencies that make this difficult?
[23:17]
Jake Zimmerman
Those are exactly the problems that we're running into: realizing all of the places where our dependencies are getting circumvented. And it kind of ties back to the fact that the language itself was untyped, right? So you didn't have to be explicit about these dependencies. People just wrote code and it just kind of happened to work. As long as it loaded in the right order at runtime, it didn't matter that those static dependencies weren't explicitly written down somewhere. So the good thing is that the system that we have for tracking these explicit imports of one piece of code into another piece of code is pretty robust, and we think that it's mostly correct. It's mostly possible to capture all of these relationships. Because no one was ever using it at the kind of fidelity that we need for building this explicit traversal of the dependencies of the code base inside of Sorbet, no, it wasn't 100% faithful. But we think that it is expressive enough that we can just go find the places where the relationships weren't being written down and write them down. And then once it's powering Sorbet, that will kind of be a check that it remains correct going into the future.
[24:21]
Josh Goldberg
With a code base the size of Stripe's, I imagine you have a pretty good representation of most to all of the wild and wacky stuff people might be doing in the wild.
[24:30]
Jake Zimmerman
It's actually kind of a little bit of both. We definitely have a lot of wild and wacky stuff if it's possible to write it inside of a method body. So every kind of combination of control flow, or language feature one plus language feature two, we probably have. But there's a lot of stuff where Stripe has decided we really do not want this feature in use anywhere in the code base. So for example, Ruby lets you dynamically get a constant based on a string. You can say there's a constant somewhere in the code base called Foo, and I'm just going to ask the VM to give me that thing, even though I don't have it in scope. This is really nice if you're doing some sort of metaprogramming like we talked about earlier, but it's also incredibly hard to understand if that Foo is not a string literal, but rather a variable that was passed through 12 levels of indirection. You don't actually know what constant you're dynamically accessing at that spot. So that's an example of something that Stripe's code base explicitly disallows. You're not allowed to call the const_get method from the Ruby standard library.
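A small sketch of the pattern being described, using only core Ruby. The class names, the processor_for helper, and the lookup table are hypothetical; only Object.const_get is a real core method. The lookup table at the end is one possible statically visible alternative, not necessarily what Stripe uses.

```ruby
# Hypothetical classes, invented for this example.
class CardProcessor; end
class BankProcessor; end

# With a literal name, a reader or a static tool can see which constant is meant.
literal = Object.const_get('CardProcessor')

# With a computed name, the target depends on runtime data that may have
# travelled through many layers of calls before reaching this line.
def processor_for(kind)
  Object.const_get("#{kind.capitalize}Processor")
end

dynamic = processor_for('bank')

# One statically visible alternative: an explicit table of the allowed classes.
processors = { 'card' => CardProcessor, 'bank' => BankProcessor }
explicit = processors.fetch('bank')

puts literal, dynamic, explicit
```

With the table, every class that can be reached is named in the source, so the dependency is written down rather than implied by a string.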
[25:34]
Josh Goldberg
Are there any collaborations ongoing with the Ruby Core team or the Ruby language itself that might be useful for Sorbet or even working with you in Sorbet?
[25:44]
Jake Zimmerman
Yeah, we actually meet with members of the Ruby core team every few months and talk about improvements that we can make to the wider typing ecosystem. The way that the Ruby community more largely has decided to come up with type annotations is to have these type annotation files that live alongside your code, called RBS files. They're very similar to TypeScript's .d.ts files, and that's the main interface that the Ruby community has right now for type-checker-agnostic type annotations. And so we've been working with them a lot on trying to come up with the best way to move that specification forward. Another big collaboration that we have is with Shopify, who has a similarly large Ruby code base. We work with them a lot on trying to make sure that we can build type system features and type annotation features that make it easier to interoperate with the wider Ruby ecosystem, especially the part of the Ruby ecosystem that is enthusiastic to adopt typing.
[26:44]
Josh Goldberg
That brings up a couple of follow up questions. I'd like to start with that section. Are there things that are happening or going to be added to Sorbet and or Ruby soon that you think the people who are excited about typing will be excited about?
[26:57]
Jake Zimmerman
So one of the things that I think people might be excited about is just the seemingly renewed interest in coming up with better syntax for defining types. Early in the development of Sorbet, we didn't really lean too heavily into "let's come up with a super, super pretty syntax," because the things people were asking for were all solved by having a type system at all, not necessarily by having a type system that had a great type annotation syntax. And as Sorbet and typing in Ruby have become more popular, for the kind of people who are on the margin, who might or might not want to adopt the type checker, maybe their last deciding factor is whether the syntax is tolerable or palatable. And for a lot of people it's a tough pill to swallow to deal with Sorbet's type annotation syntax. And that's where I think a lot of the energy is in the community right now: trying to come up with better type annotation syntaxes. So for the people who are on the fringe, I think the thing to pay the most attention to is how much people are working on designing and implementing better type annotation syntaxes.
[29:02]
Josh Goldberg
I'd meant for the second question to be, how about stuff for the people who are on the fence? But let's switch a little bit to the people who are very excited. Let's say that I'm already a Sorbet user, at a company equivalently large to Stripe or Shopify, or maybe just with a large code base. I know there are performance improvements coming down. I've started making my dependencies explicit. Very exciting stuff. What else do you have in store that you think I will be juiced about?
[29:28]
Jake Zimmerman
So there's a couple of long-standing big type system feature gaps that we've never really had a chance to invest in, and that I think we're going to get the chance to invest in somewhat soon. One of these we actually have an intern on the Sorbet team working on right now. Sorbet has interfaces, where you can have these abstract methods and then classes that implement them. But it also has syntax for declaring records or structs, where you have some simple syntax for declaring getter and setter fields with a certain type. And these two features really do not play well together at all. If you try to implement an interface with a record or a struct, the override checking kind of vanishes. You don't actually get the interface override checking that you thought you were getting. And I think that's probably the most exciting feature that we're working on right now: finally getting that feature gap closed. I remember the last time I had an intern was 2019, and I was looking at the list of intern projects and it was also on that list. So it's been this long-standing problem, and it's really satisfying to finally be able to find the time to go fix it. Part of the reason why it was tricky to fix is just how widely used these two features are. I think the last time we checked, when we actually implemented the code to check that a struct correctly implements an interface, it caused 30,000 type errors on Stripe's code base. So we had kept punting the problem down the road, because we knew that if we were going to build this feature, we were going to have to burn down that list of violations. That's really the hardest part of my job: dealing with the fact that we have so much code, and if there was ever a bug in the code base, probably someone was implicitly relying on it.
So it's super satisfying to find that we're finally getting a strategy for burning these down and then being able to turn that feature on for the rest of the community. There's also just a handful of other features. I think one of the things that people have been asking for for a while is better support for shapes and tuples. Sorbet doesn't really have nearly as good support for shapes and tuples as, for example, TypeScript does with its object types, where you can just have a type that says, I happen to have this key with this value and this other key with this other value. Sorbet technically has syntax for describing this, but the type-system-level support for it is almost nonexistent. And so I think that relatively soon we might actually be able to really focus 100% of our time on fixing this problem and building a solution that actually works.
[31:55]
Josh Goldberg
This is going to be very joyous for you now to be able to focus on type system features.
[32:00]
Jake Zimmerman
Absolutely. Yeah, exactly. I think that's one of the nice things about this performance architecture that we've chosen: it should scale as the code base scales. And that frees us up a lot to be able to focus on the features themselves, because it's no longer the case that every six or 12 months we're getting pulled back to work on performance. We can finally think a little bit more long term about type system features or editor features or Ruby ecosystem features and stuff like that.
[32:22]
Josh Goldberg
In terms of bang for buck, where something might take not that much work to do, but you are very excited about it, what is the most bang for buck feature that is now or will soon be worked on for the type system?
[32:35]
Jake Zimmerman
Yeah, I think it's hard to say that shapes and tuples is the best bang for buck. I think that the bang will be there, but the buck will also be similarly high. I think that it will be a very impactful feature, but also a very expensive feature to figure out a solution for. I don't actually know what the highest ROI feature would be for us to build, because for a lot of the really impactful features that we would like to build, the reason why we haven't built them is that we know that once we implement them, it will involve doing a lot of work internally in Stripe's code base to get them adopted. It's a great question. I actually don't know that I have a list of features that are super impactful but also quick to build, because I think in a lot of cases we've already built them.
[33:18]
Josh Goldberg
That's a good sign for your team, that you've been prioritizing the things that get the most impact out soonest. You mentioned that you're going to be working on ecosystem or community areas as well. I understand that there are at least one or two different ways that, say, the community can write type definitions or types for packages that aren't written in Sorbet. How is that area of the type system in Sorbet shaping up these days?
[33:40]
Jake Zimmerman
So I think that's a huge gap, and I think part of it is because Stripe's code base doesn't routinely publish open source gems that then get consumed internally. And so we don't exercise any of these flows. We're not necessarily dogfooding the features that you would need in Sorbet to have really great ecosystem-level support. And I notice this most acutely when people leave Stripe, go start a startup, and then complain to me about all of these features that I had never heard someone complaining about when they were at Stripe, where you just want to interoperate with a gem, or you just want to interoperate with Rails, or you want to publish a gem and have people consume the types in the gem that you published. These are workflows that basically don't get used inside of Stripe. And there are ways to address it. There are ways to just go through and say, okay, this is a problem, there's an easy fix for it; this is another problem, there's an easy fix for it. So I guess that's the thing I'm most excited about: the chance to finally have space to go do that. One example of a feature like this that's missing: say you have a generic class, maybe it's a container that's generic in some element type, and in your source file you declare that generic element type. If you then also write an RBI file to declare the types for people who consume your library, and in it you declare the class's name and the generic element type, Sorbet says that on its own is a conflict, because you've redundantly declared the generic type. So simply putting a generic type inside of an RBI file means that you get this spurious error. That's not a complicated fix, it's just something where we have to go think through what the ergonomics are. What's the meaning of having this type declared in one spot and redundantly declared in another spot?
It's the sort of thing that blocks you from being able to publish types for the generic classes in your gems that people using your gem could use. And it makes it tricky to have this ecosystem of typing tooling spring up around Sorbet.
[35:39]
Josh Goldberg
Heading towards the end of the interview and I have a few lightning round questions for you at the end. But before we go there, are there any other areas such as the community work or upcoming initiatives that you're excited to talk about?
[35:50]
Jake Zimmerman
I don't really know. I think that the stuff that we've been talking about so far is the stuff that I'm excited to be talking about. I think that the other pitch that I might give is that Sorbet ties really closely into code quality tooling. I think that there's been a big push recently, at least inside of Stripe, but also in the larger community, to try and figure out how we can measure productivity and how we can measure code quality and stuff like that. I think that a lot of the ways that we measure this right now are kind of just vibes. And I think that one of the things that we've seen success with at Stripe, in terms of deciding what good code quality looks like, is figuring out what's important inside of the code base. Sometimes what's important is a very standard metric, like maybe test coverage. But sometimes the thing that's important about code quality is just whether you're using a specific library that is old and legacy and that you're not supposed to use anymore, and how many people are still using that old legacy library. Or, you know, you started in an untyped Ruby code base and you're trying to migrate to a typed Ruby code base. How many files are not typed yet? So one thing that I'm also excited about is just how each individual code base can craft a metric for what productivity looks like in that code base and what code quality looks like in that code base, which I believe is basically the only way to do it. These metrics don't necessarily transfer from one code base to another. But I think once you realize that the way to measure these things is hyperlocal and hyperspecific to individual projects, it kind of unblocks the ability to make progress on actually measuring the thing, because it'll be what matters for you.
[37:29]
Josh Goldberg
I really like the way you're phrasing that. There have been a lot of efforts to make these one-size-fits-all metrics, for example, test or typing coverage as you described. But A, those might change in each code base, and B, each code base's value or weighting for each of those could be completely different. And you also mentioned some harder-to-measure ones, like being on a very outdated gem or dependency. How do you measure the quality impact of that? That's really interesting, but let's enter the lightning round. Jake, I'd like to give you two features of the type system that I know you like, and then I will ask you for one or two of your own that you know are very useful and theoretically interesting, but you wish people used more, and I'm going to ask you to explain them. Are you ready?
[38:10]
Jake Zimmerman
Oh, boy. All right, let's hear it.
[38:11]
Josh Goldberg
Union types. What is that?
[38:13]
Jake Zimmerman
Yeah, union types is basically, you can say, I have a value of this type or a value of this type. It's kind of this disjunction, this either-or choice.
[38:20]
Josh Goldberg
And why is that useful or good? Or how could I even work with that if I don't know what type my value is?
[38:26]
Jake Zimmerman
The most common case where this is really good is for representing the possibility of failure. You say, either my method returns a successful result or it failed to produce a result with one of these known classes of problems. Every time that I work in a language without union types or without sum types, I always notice that it's hard to represent this failure condition and propagate it through the code base. And what ends up happening is people tend to get lazy and they'll just raise an exception, and that exception won't necessarily be tracked in the type system. And you end up with this situation where the happy path works really well, but the error conditions have not been thought through and the software ends up not being very robust.
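A minimal sketch of returning failure as a value, in plain Ruby so that it runs without the sorbet-runtime gem. The result classes, the thresholds, and the charge and describe methods are all invented for illustration. In Sorbet, the return type of charge could be annotated as the union T.any(Charged, CardDeclined, RateLimited); the sig is left out here and shown only in a comment.

```ruby
# Hypothetical result classes; the names are invented for illustration.
Charged      = Struct.new(:amount_cents)
CardDeclined = Struct.new(:reason)
RateLimited  = Struct.new(:retry_after_seconds)

# With Sorbet, the return type could be written as the union
#   T.any(Charged, CardDeclined, RateLimited)
# so every caller sees the failure cases in the signature.
def charge(amount_cents)
  return RateLimited.new(30) if amount_cents > 1_000_000
  return CardDeclined.new('insufficient funds') if amount_cents > 50_000

  Charged.new(amount_cents)
end

# Callers branch on the class of the result instead of rescuing exceptions.
def describe(result)
  case result
  when Charged      then "charged #{result.amount_cents} cents"
  when CardDeclined then "declined: #{result.reason}"
  when RateLimited  then "retry in #{result.retry_after_seconds}s"
  end
end

puts describe(charge(100))
puts describe(charge(60_000))
```

Because the failures are ordinary return values, they show up in the method's type rather than in an exception that the type system does not track.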
[39:06]
Josh Goldberg
Not being able to track exceptions or thrown errors in the type system is, I think, one of the biggest sources of issues in modern code bases, or previous modern code bases. It's just prevalent. It's everywhere if it's not addressed.
[39:20]
Jake Zimmerman
Yeah. And I think specifically because you mentioned union types, there's a very minor nuance here between what you might call union types and sum types. The difference is that a sum type traditionally is this closed union, where you know explicitly that you only have either one thing or another thing. When you talk about union types, you can bucket more things and make these ad hoc unions. So you can say in my method it's either X or Y, and in the method one level up from that it's either X or Y or Z. And you don't necessarily have to declare a new type to capture that third alternative. And I think that one of the nice things about languages like Sorbet and TypeScript that have this ad hoc union type is that it's really easy to just add one more thing to your error stack, and you don't necessarily have to define up front, at the top of your whole code base, that you have one of 10 possible failures. You get this very fine-grained tracking.
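The "X or Y here, X or Y or Z one level up" point can be sketched as follows. The classes and methods are hypothetical, and the Sorbet types appear only in comments so the code runs as plain Ruby. The outer method adds a third alternative without anyone declaring a new named type for the combination.

```ruby
# Hypothetical error classes, invented for illustration.
ParseError = Struct.new(:message)
NotFound   = Struct.new(:id)

# Inner method: Integer or ParseError.
# In Sorbet this could be annotated as T.any(Integer, ParseError).
def parse_id(raw)
  Integer(raw, exception: false) || ParseError.new("not a number: #{raw}")
end

# Outer method: String, ParseError, or NotFound.
# In Sorbet: T.any(String, ParseError, NotFound). No new type is declared
# to hold the extra alternative; the union is simply widened in place.
def lookup_name(raw, names)
  id = parse_id(raw)
  return id if id.is_a?(ParseError)

  names.fetch(id) { NotFound.new(id) }
end

names = { 7 => 'Ada' }
p lookup_name('7', names)
p lookup_name('x', names)
p lookup_name('8', names)
```

With a closed sum type, the third case would usually require defining or extending a named type up front, which is the extra ceremony being contrasted here.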
[40:16]
Josh Goldberg
This touches on what you were describing earlier about the pedagogy, the teaching approach of it, or the onboarding where you can build all these fascinating useful features into the type system, but you also need to make it approachable. You need to make it so that people enjoy having this stuff added on. It's not an added chore for them.
[40:32]
Jake Zimmerman
Exactly. Yeah. I often think that it would be super useful, at least inside of Stripe, to give a whole undergraduate curriculum about how to use the type system. You know, we start with this Stripe 101 session in people's first or second week at the company, and it covers really surface-level things. It doesn't really cover all of the really fancy things that you can do if you lean into every aspect of a type system. I think it would be really cool to think through what those super useful type system features are and relate them to specific programming patterns and specific programming problems that the type system helps alleviate.
[41:06]
Josh Goldberg
Well, let's dig in then. Here's the second of the two prompted type system features. Can you describe or explain what is sometimes called branded or opaque types?
[41:15]
Jake Zimmerman
Sure. So an opaque type is when all that you know about the type is its name. You don't know necessarily how it's implemented under the hood. The classic example of an opaque type is the UNIX file pointer. So when you open a file that's backed by some C struct, but the operating system only gives you a pointer to it, it doesn't tell you what that C struct's members are. It doesn't tell you that it happens to have an inode pointer in there. It happens to have a pointer to a character string of its file path or something like that. The only thing that it gives you is just this is opaquely a file, which gives you the ability to then craft your own set of explicit operations that you're allowing to happen on this opaque type. You say explicitly, there are two public functions that accept a value of this opaque type. And maybe one of them is get the inode and maybe one of them is get the file name. But if there were any other fields in there, you don't expose functions that let you access that field. So this is a way to get kind of information hiding and you get to control your public interface against your private implementation.
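As a rough illustration of the same idea in Ruby, the sketch below models a handle whose internals are private and whose public surface is exactly two operations. The FileHandle class and its fields are hypothetical and only loosely follow the file example above. Ruby has no true opaque types: reflection such as instance_variable_get can still reach the fields, so this shows the design intent rather than a hard guarantee.

```ruby
# Hypothetical handle type; callers are meant to know only its name.
class FileHandle
  def initialize(path, inode)
    @path = path
    @inode = inode
    @buffer = +'' # internal detail with no public accessor
  end

  # The only two operations exposed to callers.
  def inode
    @inode
  end

  def file_name
    File.basename(@path)
  end
end

handle = FileHandle.new('/tmp/report.txt', 42)
puts handle.inode
puts handle.file_name
```

Because nothing else is exposed, the internal representation (the full path, the buffer) can change without affecting any caller.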
[42:16]
Josh Goldberg
Great, thank you. That's an excellent explanation. No notes. But now I require from you one last piece of technical content. What's a type system feature you wish people knew more about or used more?
[42:26]
Jake Zimmerman
I think a lot of people know about abstract methods and interfaces, but I wish people used them more. I think that specifically in Ruby and Sorbet, people have this stigma around, oh, I should only have an interface if I'm going to implement it multiple times. That an interface is a way to say, you know, we have the test mode implementation of some interface and the production implementation of the interface, or maybe an implementation that's based on binary trees and an implementation that's based on arrays or something like that. People assume that an interface is only useful if you have multiple implementations backing the interface. But I think that interfaces also make it really easy to get some of the aspects we were just talking about with opaque types, where you can essentially only expose certain things in the interface and hide all of the other things that the class that implements that interface would have needed to get its job done. Even if there's only one implementation of that interface, the data hiding and implementation hiding aspects of interfaces are something people tend to overlook. So I think people should use interfaces and abstract methods more frequently.
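A sketch of an interface with a single implementation, used purely for hiding. This is a plain-Ruby stand-in so that it runs without any gems: in Sorbet the module would instead be declared as an interface with abstract method signatures, and a value typed as the interface could only be used through the interface's methods. RateSource, CachedRateSource, and convert are invented names.

```ruby
# Plain-Ruby stand-in for an interface with one abstract method.
module RateSource
  def rate_for(currency)
    raise NotImplementedError, "#{self.class} must implement rate_for"
  end
end

# The only implementation. It has helpers the interface never mentions.
class CachedRateSource
  include RateSource

  def initialize(rates)
    @rates = rates
  end

  def rate_for(currency)
    @rates.fetch(normalize(currency))
  end

  # Public on the class, but not part of the RateSource interface.
  def normalize(currency)
    currency.to_s.upcase
  end
end

# Callers are written against RateSource and use nothing beyond rate_for.
def convert(amount, currency, source)
  amount * source.rate_for(currency)
end

puts convert(10, :eur, CachedRateSource.new('EUR' => 2))
```

Even with one implementation, code written against RateSource does not depend on normalize or on how rates are stored, so those details stay free to change.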
[43:28]
Josh Goldberg
I really appreciate that you, for the sake of the audience, didn't dive into some extremely difficult convoluted topic and actually just brought up something that most developers who are experienced in these areas already know and understand, just a way to use it more effectively.
[43:44]
Jake Zimmerman
I probably go into the example of using more abstract methods and interfaces every week. So I think, like, it's an answer that I give all the time and I think that people still underuse them.
[43:54]
Josh Goldberg
Well, you're doing good work, Jake. I have one last question for you. I'd like to end every episode on something non-technical. Can you tell us, A, what does STP stand for? B, what is STP? And C, what is it like to partake in or to undergo STP?
[44:08]
Jake Zimmerman
That's awesome. Yeah, so STP stands for Seattle to Portland. It's a bike ride where you start in Seattle and you finish in Portland. The total distance is something like 207 miles. And yeah, it's just an organized ride that I think the Cascade Cycling Group puts on every summer. And it's a really great event. If you're ever thinking about doing it, I strongly recommend it. I recently did it. I think it might have been last weekend, if I'm not mistaken; maybe it was the weekend before. And yeah, it's a super fun event. I was really excited to get the chance to challenge myself, because it was the first time that I had ever ridden anywhere near as far as that. The longest ride that I'd ever done before this was maybe 100 miles. So doing 207, it felt like a real accomplishment to finish it, and I had a pretty nice time. But I also think that the speed that I finished at was largely a function of the tailwind that I had the whole time. So it's always nice when you get a tailwind. Yeah.
[45:03]
Josh Goldberg
Did you think on work at all during the trip? Did you come up with any features or bug fixes while riding the bike?
[45:09]
Jake Zimmerman
So I've noticed this: I do a lot of outdoorsy hobbies, so cycling, hiking, skiing, and stuff like that. I noticed that when I'm hiking I have tons of work-related thoughts. I'm kind of letting my brain wander and think about, you know, Sorbet in the background or something like that. But for some reason when I'm on the bike, I may as well have no recollection of the last hour or however long I was on the bike. I feel like I just go a little bit numb and I'm just riding my bike, and it's kind of blissful. So I did not think about work at all for basically the entire ride.
[45:38]
Josh Goldberg
So when you're, say, hiking, you're more in a Sorbet state, but when you're biking, you're more in a flow state.
[45:43]
Jake Zimmerman
Yeah. Yeah.
[45:44]
Josh Goldberg
Great. Well, Jake, thank you so much for hanging out and talking about Sorbet. You work on an awesome project that's doing a lot of really interesting work, and very beneficial stuff for Ruby developers. We talked about the start of Sorbet, how it became such a powerful and big type checker, some of the architectural improvements y'all are making in it, why it's in C++, some of the upcoming type system features, and of course lots of great type system features for folks to use today. Jake, is there anywhere that you would direct people to find out more about you or the project or the stuff that you work on?
[46:16]
Jake Zimmerman
Sure, yeah. You can go to sorbet.org to learn more about Sorbet, and specifically I would encourage you to go to sorbet.org/slack if you want to join the Slack community, where people who are using Sorbet discuss and share their experiences. I think that it's pretty vibrant and we always love new people showing up there.
[46:36]
Josh Goldberg
Great. Well, for Software Engineering Daily, this has been Jake Zimmerman and Josh Goldberg. Thanks for listening, everyone. Have a good day. Cheers.
[46:42]
Jake Zimmerman
Thanks, Josh.