Podcast Summary: CoRecursive – Tech Talk: Python to Scala Rewrite at Duolingo with Andre Kenji Horie
Date: January 7, 2018
Host: Adam Gordon Bell
Guest: Andre Kenji Horie, Senior Software Engineer at Duolingo
Overview
This episode explores Duolingo’s ambitious project to rewrite its “session generator” (the system that creates adaptive language lessons) from a Python monolith into a new, Scala-based, functionally architected system. Host Adam Gordon Bell interviews senior engineer Andre Kenji Horie about the motivations behind the rewrite, the challenges and benefits of moving to Scala, the technical and business impacts of the architectural changes, and thoughts on Duolingo’s unique learning-centric company culture.
Key Discussion Points & Insights
1. What is Duolingo & the Session Generator?
- Duolingo is a language learning app with over 200 million users (as of the episode date).
- Core to the experience is the "session generator," which assembles personalized, bite-sized lessons for users on demand.
- Courses are created via crowdsourced input from volunteers in Duolingo’s Incubator tool.
- The generator picks exercises based on user knowledge, memory models, and engagement.
Notable Quote:
"A session is, well, simply speaking, is a collection of exercises."
– Andre Kenji Horie [06:09]
2. Why a Rewrite? The Limitations of the Python Monolith
- The original session generator was built quickly in Python as part of a large, monolithic codebase—a common startup pattern.
- Over time, the monolith accrued dependencies, performance issues, and instability.
- Python’s dynamic typing hindered confidence in making changes, leading to more runtime errors and making refactoring risky.
- Unit testing helped, but parts of the monolith were hard to test or mock, especially as testing wasn’t prioritized early.
Notable Quote:
"It all starts with a monolith, I guess. And all of the nightmarish stuff starts with a monolith."
– Andre [09:04]
3. Motivation for Scala: Why Pick Scala Over Other Languages? [14:05]
- Infrastructure fit: Duolingo’s backend runs on AWS, which supports Python, JavaScript, JVM languages, and Go.
- Type safety: Unlike Python (and JavaScript), Scala provides static typing and compiler checks – addressing the main pain point encountered.
- Developer productivity: Java was ruled out for verbosity. Scala offered a modern, expressive language with mature backend and big data ecosystem support.
Notable Quote:
"JavaScript has the same problem as Python, it’s weakly typed and this is something we wanted to avoid... Java is a bit slow and verbose... Scala is a very modern programming language, mature in the backend... seemed to be a very good fit."
– Andre [14:05]
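To make the type-safety point concrete, here is a minimal Scala sketch of the kind of check a compiler can perform before the code ever runs. The `ExerciseType`, `Exercise`, and `label` names are hypothetical and chosen only for illustration; nothing here is taken from Duolingo's codebase.

```scala
object TypeSafetySketch {
  // Hypothetical closed set of exercise kinds (illustrative only).
  sealed trait ExerciseType
  case object Translate extends ExerciseType
  case object Listen extends ExerciseType

  final case class Exercise(id: String, kind: ExerciseType)

  // Because ExerciseType is sealed, the compiler can warn when a
  // case is missing from this match.
  def label(e: Exercise): String = e.kind match {
    case Translate => "translate"
    case Listen    => "listen"
  }

  // label("not an exercise") // would not compile: a String is not an Exercise
}
```

In a dynamically typed language, the commented-out call would fail only when that line executes; here it is rejected at compile time.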
4. Functional Programming: Immutability and Referential Transparency
Referential Transparency [17:06]
- Functions always yield the same output for the same input and don’t change external state.
- Eases testing and debugging.
- Example: Given exercises A and B, a pure function decides which to present, guaranteed free of side effects.
Notable Quote:
"When we have referential transparency, we have a method that... only calculates the output and doesn't change any state anywhere. So it's very easy to unit test."
– Andre [17:06]
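The "exercise A or B" example above could be sketched as a pure function like the following. The `Exercise` fields and the selection rule (pick the exercise whose difficulty is closest to the user's skill) are assumptions made for illustration, not the generator's actual logic.

```scala
object ExerciseChoice {
  // Hypothetical model: these fields are illustrative, not Duolingo's schema.
  final case class Exercise(id: String, difficulty: Double)

  // Pure function: the result depends only on the arguments, and nothing
  // outside the function is read or modified.
  def pick(a: Exercise, b: Exercise, userSkill: Double): Exercise =
    if (math.abs(a.difficulty - userSkill) <= math.abs(b.difficulty - userSkill)) a
    else b
}
```

A unit test for `pick` needs no mocks or setup: it passes in two exercises and a skill value and compares the result.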
Immutability [19:53]
- Data structures are not modified; each transformation yields a new one.
- Makes reasoning about state across complex pipelines much easier.
- Scala's standard library makes immutable collections and transformations the default.
Notable Quote:
"When data is immutable, you have some nice properties, like you don’t need to think if something is changing your data from somewhere else."
– Andre [19:53]
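As a small illustration of this style, the sketch below "updates" a session by building a new list instead of modifying the existing one. The `Exercise` case class and `markDone` helper are hypothetical names introduced for this example.

```scala
object ImmutableSession {
  // Hypothetical model (illustrative only).
  final case class Exercise(id: String, done: Boolean)

  // Returns a new list; the input list and its elements are left untouched.
  def markDone(session: List[Exercise], id: String): List[Exercise] =
    session.map(e => if (e.id == id) e.copy(done = true) else e)
}
```

Any code still holding the original list continues to see the original values, which is the property the quote describes.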
Impact on Rewrite [26:16]
- The vast majority—over 99%—of the new session generator is written in this functional, immutable style.
- A few complex algorithms retain imperative style for practical reasons.
5. Developer Experience: Testing, Refactoring, and Readability
Easier Testing and Debugging [27:31]
- The rewrite uses Finatra (Twitter's Scala framework for building HTTP services) and Guice (Google's dependency injection library), which make mocking and testing easier.
- The team now reaches roughly 70% test coverage, and the code is significantly more maintainable.
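The reason dependency injection helps with mocking can be shown without any framework. The sketch below uses plain constructor injection rather than Guice itself, and `ExerciseStore`, `SessionGenerator`, and `StubStore` are hypothetical names, so treat it as an illustration of the idea rather than the team's setup.

```scala
object InjectionSketch {
  // Hypothetical dependency: where exercises come from.
  trait ExerciseStore {
    def exercisesFor(course: String): List[String]
  }

  // The generator receives its dependency through the constructor,
  // which is the shape a DI framework would wire up.
  final class SessionGenerator(store: ExerciseStore) {
    def generate(course: String, size: Int): List[String] =
      store.exercisesFor(course).take(size)
  }

  // A hand-written stub that replaces the real store in a unit test.
  final class StubStore(data: Map[String, List[String]]) extends ExerciseStore {
    def exercisesFor(course: String): List[String] = data.getOrElse(course, Nil)
  }
}
```

In a test, `new SessionGenerator(new StubStore(...))` exercises the generator logic with no network or database involved.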
Refactoring and Confidence [41:37]
- Strong static typing means IDEs and compilers support safe refactoring.
- Developers report far fewer runtime bugs; a refactoring that would have taken Andre a long time in Python took less than a minute in Scala.
Notable Quote:
"First time I had to refactor some stuff in Scala... in Python it would take me…a lot of time. In Scala I finished in less than one minute and I was just so surprised."
– Andre [42:14]
6. Language Features of Scala: Pros & Cons
- Conciseness: Scala is as succinct as Python (or more so) thanks to type inference, for-comprehensions (Scala's counterpart to Python's list comprehensions), and the omission of boilerplate.
- Implicits: The team uses implicit parameters (not implicit conversions) to streamline code, but remains cautious to avoid confusing transformations.
- Pain points: Most issues stem from under-documented libraries or missing utility functions rather than the language itself.
- Delightful surprises: Functional syntax, readability, and compile-time assurances made development enjoyable compared to both Python and Java.
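For readers unfamiliar with implicit parameters, the following sketch shows how they can remove repetitive argument passing. It uses Scala 2 `implicit` syntax, and the `RequestContext` type and both functions are hypothetical examples, not code from the rewrite.

```scala
object ImplicitParams {
  // Hypothetical request context threaded through the call chain.
  final case class RequestContext(userId: Long, uiLanguage: String)

  // The context is supplied by the compiler from the enclosing scope.
  def greeting(implicit ctx: RequestContext): String =
    if (ctx.uiLanguage == "pt") "Bom dia" else "Hello"

  // `greeting` is called without arguments; `ctx` is passed along implicitly.
  def header(title: String)(implicit ctx: RequestContext): String =
    s"${greeting}, user ${ctx.userId}: $title"
}
```

A caller declares one `implicit val ctx` and then calls `header("Lesson 1")` directly. Implicit conversions, by contrast, silently change a value's type, which is the kind of confusing transformation the team avoids.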
7. Architecture Changes: Microservices & Data Pipelines
- Now designed as microservices, the session generator fetches pre-processed data from S3 (AWS) and caches it in memory for speed and resilience.
- Reduced external dependencies; almost all online work is in-memory, enabling very low latency.
Performance Impact:
- Latency dropped from 700-800ms to tens of milliseconds.
- Server count to serve the same traffic dropped by an order of magnitude (~10x reduction).
Notable Quote:
"So from rearchitecting and using the in-memory cache and S3, we decreased latency... to the order of tens of milliseconds... server count decreased by maybe ten times or so."
– Andre [44:22]
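The in-memory caching pattern described in this section could look roughly like the sketch below. The `load` function is a hypothetical stand-in for "download and parse the pre-processed file"; it is not an AWS SDK call, and the data shape (`Map` of course to exercise IDs) is assumed for illustration.

```scala
import java.util.concurrent.atomic.AtomicReference

object CacheSketch {
  // Assumed shape of the pre-processed data: course -> exercise IDs.
  type CourseData = Map[String, List[String]]

  // `load` is a hypothetical hook standing in for the S3 fetch.
  final class InMemoryCache(load: () => CourseData) {
    // Loaded once at construction, before any request is served.
    private val snapshot = new AtomicReference[CourseData](load())

    // Swap in a freshly loaded snapshot; readers see either the old
    // or the new snapshot, never a partially updated one.
    def refresh(): Unit = snapshot.set(load())

    // Request path: an in-memory lookup with no network call.
    def exercisesFor(course: String): List[String] =
      snapshot.get().getOrElse(course, Nil)
  }
}
```

Because requests only read from memory, the slow fetch happens off the request path, which is consistent with the latency drop reported above.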
8. Concurrency and Scalability
- The JVM's multi-threading support is stronger than Python's.
- The team used Scala futures for asynchronous processing; they considered the actor model (Akka) but did not adopt it for the online generator, to keep the design simple.
- The new system handles concurrent requests better; Python offered no out-of-the-box equivalent.
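A minimal sketch of asynchronous processing with Scala futures follows. The two lookups (`userSkill`, `candidates`) and their return values are hypothetical placeholders; the point is only the pattern of starting independent work concurrently and combining the results.

```scala
import scala.concurrent.{ExecutionContext, Future}

object FutureSketch {
  // Hypothetical lookups that can run independently of each other.
  def userSkill(userId: Long)(implicit ec: ExecutionContext): Future[Double] =
    Future(0.7)

  def candidates(course: String)(implicit ec: ExecutionContext): Future[List[String]] =
    Future(List("e1", "e2", "e3"))

  // Both futures are started before the for-comprehension, so they can
  // run concurrently; the comprehension then combines their results.
  def session(userId: Long, course: String)(implicit ec: ExecutionContext): Future[List[String]] = {
    val skillF = userSkill(userId)
    val candidatesF = candidates(course)
    for {
      skill <- skillF
      all   <- candidatesF
    } yield all.take(if (skill > 0.5) 3 else 2)
  }
}
```

A caller needs an `ExecutionContext` in implicit scope, for example `scala.concurrent.ExecutionContext.Implicits.global`, and would normally compose the returned `Future` further rather than block on it.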
9. Business and Developer Benefits
- Business: Big savings in cloud infrastructure costs; serving is now cheaper and more robust.
- Developer: Higher confidence, less “painful” work, code is easier to reason about, fewer runtime incidents.
- Rewrite viewed as a strong success—faster, cheaper, more maintainable, more fun to work on.
Notable Quote:
"It's also very good because... feedback from the other developers is that it's just less painful."
– Andre [49:10]
10. Learning Culture at Duolingo
- The company is filled with language learners and people passionate about cultures and learning.
- Andre himself is a polyglot—native Portuguese speaker, fluent in English, speaks Japanese and some Spanish.
- The international, learning-focused environment shapes both the product and the workplace culture.
Notable Quote:
"It's very fun to, you know, be surrounded with people who have the same interests and for learning. There are some people who are learning like a ton of languages... It’s very interesting because they have all this view of the world that you don’t usually see."
– Andre [51:30]
Memorable Moments & Quotes (with Timestamps)
- [09:04] "All of the nightmarish stuff starts with a monolith."
- [14:05] "We wanted a more modern programming language, which is why we thought that Scala was a good, might be a good choice."
- [17:06] "When we have referential transparency... it’s very easy to unit test and to just logically debug."
- [19:53] "When data is immutable... you don’t need to think if something is changing your data from somewhere else."
- [44:22] "From rearchitecting... we decreased latency... to the order of tens of milliseconds. Number of servers decreased... by maybe ten times."
- [49:10] "Feedback from other developers is that it's just less painful."
- [51:30] "...people who want to learn new cultures... have a very broad horizon, I'd say."
Timestamps for Major Sections
- [02:00] What makes Duolingo engaging—gamification in language learning
- [04:38] Function and importance of the session generator
- [09:04] Reasons for the rewrite & downsides of Python/dynamic typing
- [14:05] Choosing Scala: motivations and trade-offs
- [17:06] Functional programming: referential transparency and immutability
- [27:31] Testing, coverage, and improved developer experience
- [44:22] Performance improvements and cost reductions
- [51:30] Company culture and the value of learning
Conclusion
This episode offers a detailed, honest look into what it takes to re-architect a critical system at scale—from language and technical decisions to human and business outcomes. The migration to Scala and functional programming not only “decreased pain” for engineers but provided tangible improvements in performance, reliability, and developer happiness at Duolingo—a company as serious about software as it is about learning.
