
Adam
Welcome to CoRecursive, where we bring you discussions with thought leaders in the world of software development. I am Adam, your host.
Narrator
Duolingo is a language learning platform with over 200 million users. On a daily basis, millions of users receive customized language lessons targeted specifically to them. These lessons are generated by a system called the session generator. Andre Kenji Horier is a senior engineer at Duolingo. He wrote about the process of rewriting the session generator, as well as moving from Python to Scala and changing the architecture, all at the same time. In this episode, I talk with him about the reasons for the rewrite, what drove them to move to Scala, and the experience of moving from one technology stack to another.
Adam
It's a great interview and I think you'll enjoy it. Andre Kenji Horier is a senior software engineer at Duolingo.
Andre Kenji Horier
Oh, hello. Thanks for inviting me.
Adam
Yeah. So you have a great sort of case study about the rewrite work that's been done at Duolingo. But before we get into that, I was wondering if you could explain what Duolingo is.
Andre Kenji Horier
Yes. So Duolingo is a language learning app. We have around 200 million users as of today. In the app, people can learn a new language by doing exercises and translating sentences from their native language to the language they want to learn. And then we have other exercises that practice listening skills, some reading skills and some grammar skills.
Adam
It tries to make it sort of a game to learn a new language. Is that correct?
Andre Kenji Horier
Yes, that's correct. We have all of the gamification mechanisms, such as hearts, levels, experience, all of that, to make the language learning experience more engaging. Because learning a language is a process that takes people several years before they become fluent. By gamifying it, we make it easier for users to keep learning and stay engaged so that they can achieve this very long-term goal.
Adam
How big is the engineering team at Duolingo?
Andre Kenji Horier
Engineering team should be around 40 to 50 people, maybe.
Adam
How is that organized?
Andre Kenji Horier
We have several teams. We have some that we call product teams. For example, we have the learning team, which is the team that focuses on improving our learning metrics and improving the learning experience for users. We have, for example, our growth team, which is concerned with expanding our user base and keeping people engaged. We also have some more foundational teams, like, for example, QA or operations.
Adam
Which team are you on?
Andre Kenji Horier
I'm in a team called the Architecture team. It's basically guaranteeing that we have a healthy code base or a healthy system in terms of the application level.
Adam
Would your tasks be around setting standards for what languages are used, or what frameworks, or things like that?
Andre Kenji Horier
That also, so the languages used. Also maintaining common frameworks and libraries for the rest of the company to use, and making sure that the architecture as a whole is healthy and each microservice is doing the correct things and not abusing infrastructure.
Adam
You were involved in a rewrite of the session generator. Could you explain what the session generator is?
Andre Kenji Horier
Yes, absolutely. When a user is learning on Duolingo, we give them exercises, and we have a pool of exercises which is pretty large. Actually, let me take two steps back. We have a product called the Incubator, and in that product we have volunteers making the courses that you see. So it's like a crowdsourced approach. The volunteers input all of the data, make the course structure and write all of the course content. The session generator is then responsible for pulling all of that data, picking the exercises that will be the most interesting for the user at that point, and giving the user a session that is bite-sized and small enough to do, let's say, on a bus or wherever they are.
Adam
So a session is like a unit of learning and the session generator creates that unit.
Andre Kenji Horier
Yes, that's correct. A session is, well, simply speaking, is a collection of exercises.
Adam
Okay, so is this session generated on the fly or is it created ahead of time?
Andre Kenji Horier
Part of it is on the fly. We have a lot of pre processing going on because there's so much data involved in each session that if we do every single step on the fly, then that's very time consuming.
Adam
Why couldn't all the say Spanish sessions be created ahead of time? Is there?
Andre Kenji Horier
Yes. So we also want the sessions to be adaptive for each user. Right? So for example, let's say a user knows 10 words and some other user knows 50 words. So these sessions for these two users will be different. And also we take into consideration how likely they are to remember a word and other things like that. So it's very important for these sessions to have this online component.
Adam
How do you figure out when somebody's going to remember a word?
Andre Kenji Horier
So we have a model for that. There is this thing called the forgetting curve for a word. It's basically the probability that a user will remember a word after a period of time since they learned it. Let's say that one day later they have a probability x of remembering it; two days later, the probability will decrease. So we model that as a curve, and then we do some sort of regression to estimate how well the user knows each of the words.
Adam
So if I learn today that garcon is man, then you figure out, based on a certain probability, that I'll probably know it tomorrow, but not next week. And then you reintroduce it.
Andre Kenji Horier
Yes, that's correct.
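As a rough illustration of the forgetting-curve idea discussed above, the sketch below models recall probability as exponential decay with a per-word half-life. The formula, the `halfLifeDays` parameter and the 0.5 review threshold are assumptions made for this example; the conversation only says that a curve is fitted with some form of regression.

```scala
object ForgettingCurve {
  // Assumed model: the probability of recalling a word halves
  // every `halfLifeDays` days since it was last practiced.
  def recallProbability(daysSinceSeen: Double, halfLifeDays: Double): Double =
    math.pow(2.0, -daysSinceSeen / halfLifeDays)

  // A word is due for review once predicted recall drops below the threshold.
  def needsReview(daysSinceSeen: Double,
                  halfLifeDays: Double,
                  threshold: Double = 0.5): Boolean =
    recallProbability(daysSinceSeen, halfLifeDays) < threshold

  def main(args: Array[String]): Unit = {
    // Hypothetical half-life; in a real system it would be estimated
    // per user and per word from practice history.
    val halfLife = 3.0
    for (day <- Seq(1.0, 3.0, 7.0))
      println(s"day $day: p = ${recallProbability(day, halfLife)}")
  }
}
```

With a half-life of three days, the word would still be considered known after one day but due for reintroduction after a week, which matches the "tomorrow, but not next week" intuition.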
Adam
So why did you decide that a rewrite was required? So you have this session generator, it takes into account these kind of forgetting curves and builds a lesson on the fly, I guess. So why did you decide you needed to rewrite it?
Andre Kenji Horier
Yes. So it all starts with a monolith, I guess. All of the nightmarish stuff starts with a monolith. So we have the monolith, and it's written in Python. The thing is that in the first years of a startup, we are all thinking, let's move fast, let's ship quickly. After some years we end up adding a lot of dependencies, like data stores and other services and whatnot. In the end, for the entire system, the performance is not so great anymore, and the site's not as stable either. So one of the reasons is to rearchitect the whole system so that it is more robust. The second reason is that we have the code in Python. Python is a great language for writing things quickly and having prototypes ready very fast. But for systems that are very large, that have very complex data structures and complex algorithms, Python is not so great. Let me give an example. Something as simple as dynamic typing becomes a nightmare, because you don't have all of the niceties that you have in a strongly typed language. The compiler won't do as many checks, for example, so the developers lose confidence in writing code, and then they have to spend a lot of time testing and testing to see if all of the possible corner cases are being caught and nothing is going to break in production.
Adam
So let's see if I understand it. We have this giant Python monolith, and as you add new features, it keeps on growing. You're saying the problem with Python is that when you want to add something new, your level of confidence is low because of dynamic typing. Do you have an example, maybe, to explain that?
Andre Kenji Horier
An example? Sure. Okay, I'll give a very simple example. Let's say we want to rename a function. I mean, you could grep your entire code base and start changing stuff, right? But it's also possible that in some part of the code you pass that function as a lambda and it has a different name somewhere. And then your code breaks in production. Or, for example, if you want to add another argument to a function, you have to guarantee that nothing is going to break, but it's very hard to find all of the occurrences. Or even simple things, like not knowing what data type you're getting into a function. You can assume you're getting something, but in the end you're getting something else. These are all things that a simple compiler check could figure out and tell you that you're doing it wrong. But since Python doesn't have that, it's very easy to break stuff.
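To make the contrast concrete, here is a small Scala sketch of the same situation: a function that has been renamed and given an extra argument, and is passed around as a value. The names `Exercise`, `rank` and `best` are hypothetical and not taken from Duolingo's code. The commented-out lines show call sites that the compiler would reject instead of letting them fail in production.

```scala
object RenameSafety {
  final case class Exercise(id: String, difficulty: Double)

  // Suppose this function used to be called `score` and took one argument.
  def rank(e: Exercise, userLevel: Double): Double =
    -math.abs(e.difficulty - userLevel)

  // The scoring function is passed in as a value, the case described above.
  def best(pool: List[Exercise], scorer: Exercise => Double): Exercise =
    pool.maxBy(scorer)

  def main(args: Array[String]): Unit = {
    val pool = List(Exercise("a", 0.2), Exercise("b", 0.8))
    // best(pool, score)  // does not compile: `score` no longer exists
    // best(pool, rank)   // does not compile: `rank` now needs two arguments
    println(best(pool, e => rank(e, userLevel = 0.7)).id)
  }
}
```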
Adam
Did unit tests help with this process at all or were there tests?
Andre Kenji Horier
So we do have them for part of the Python code. Yeah, unit tests definitely help, but in the starting years of a startup, I guess people are more concerned about shipping things than writing unit tests. And as it stands, parts of our code base, parts of the monolith, are very difficult to mock in a unit test. After a while we learned from our mistakes and realized that we can architect things in a different way that makes unit tests easier. But in the monolith there are still many places where it's very hard to write unit tests and very hard to mock things.
Adam
Especially if you didn't have unit testing in mind. Sometimes you make decisions that make it hard to add them after the fact.
Andre Kenji Horier
Yes, that's very true.
Adam
So when you decided to do this rewrite, what made you choose Scala?
Andre Kenji Horier
So there are some reasons. The first is that our infrastructure is built on top of AWS, so we needed to think of the languages that are natively supported by Amazon. They are, well, Python, JavaScript in Node.js, JVM-based languages such as Java, Clojure and Scala, and I think Go is also one of the supported languages. JavaScript has the same problem as Python: it's weakly typed, and this is something that we wanted to avoid for the particular case of the session generator. And then Java is well known, but it's slow to develop in and it's very verbose. So we wanted a more modern programming language, which is why we thought that Scala might be a good choice. Also, Scala is very mature in the backend, and it's used by many applications in the big data domain. And big data is kind of similar to what we're dealing with in the session generator, right? There's also a lot of data, complex data structures, complex algorithms, and it seemed to be a very good fit.
Adam
Was there any concerns about the learning curve of a language like Scala? I mean, it has a reputation for being somewhat complex.
Andre Kenji Horier
Yeah, so we had concerns. Right now our entire engineering team is small, and the number of people dealing with Scala is also small. So the population of people who have learned Scala here in the company is very small, and I don't know how well we can talk about how easy it is. But up until now, I personally thought it would be harder for me and for other engineers to learn Scala and get going, and it was a lot easier than we anticipated.
Adam
Let's discuss some of the features of Scala that you found useful in your rewrite.
Andre Kenji Horier
So.
Adam
Could you describe referential transparency?
Andre Kenji Horier
When we have referential transparency, we have a method where the only thing it does is calculate the output, and it doesn't change any state anywhere. So it's very easy to unit test and also to just logically debug what's going on. Because you know that once you have some input, the output will always be the same one, always what you're expecting. So it makes things very easy to test and to reason about. That's one thing that's very good once you have very complex data structures.
Adam
It's helpful for me, and probably for the listeners, if I can tie it back to an example. So referential transparency means you have a function with some number of inputs, let's say the words that I know in Spanish, and then there is the output. Do you have an example of where that could be used?
Andre Kenji Horier
Okay, yeah, sure. So let me think. Let's say in some part of the algorithm we are deciding whether we should choose an exercise A or an exercise B, and we pass both into the function. What happens with referential transparency is that, first, there are not going to be any side effects. So we won't write things to the database, for example, because that would be adding a side effect. And we won't be changing something inside one of the objects, because that doesn't contribute to the output, right? The idea is that we will just do some calculation, and that calculation will be used only for the output. So if we wanted to pick between two exercises, the output is simple: it's one of them. But we can guarantee that nothing else...
Adam
...is going on, which is what makes it easy to reason about.
Andre Kenji Horier
Yes, correct.
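The exercise-picking example could look roughly like the following Scala sketch. The `Exercise` fields and the selection rule (prefer the exercise the user is less likely to remember) are assumptions for illustration only. The point is that `pick` reads its two arguments and returns one of them, with no database write and no mutation.

```scala
object PickExercise {
  final case class Exercise(id: String, recallProbability: Double)

  // Referentially transparent: the result depends only on `a` and `b`,
  // and calling it changes nothing outside the function.
  def pick(a: Exercise, b: Exercise): Exercise =
    if (a.recallProbability <= b.recallProbability) a else b

  def main(args: Array[String]): Unit = {
    val a = Exercise("translate-gato", 0.9)
    val b = Exercise("listen-perro", 0.4)
    // Calling it twice with the same inputs gives the same answer.
    println(pick(a, b) == pick(a, b))
  }
}
```

Because the function has no hidden inputs or outputs, a unit test only needs to construct two values and compare the result.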
Adam
You also mentioned, and I think it ties into that example about immutability. So how does immutability tie into this?
Andre Kenji Horier
So, well, immutability. When data is immutable, you have some nice properties, like you don't need to think about whether something is changing your data from somewhere else. I think many people have seen this: you pass an object to a function, that function calls another function, and that one calls another function, and somewhere along the way someone changes one of the properties. Then you don't know who changed it or what it changed to, and you just need to add print statements all over the place to figure out what is going on, because something changed part of your data and you don't know why. When you have immutability, you're guaranteed that things won't change. So in each step of your algorithm, the state will be very clear to you. What this means in the context of Scala, for example, is that each step of your algorithm only transforms data. Let's say you have a list and you change the elements of the list by, let's say, adding one to some property. Then you know that in your next step your new list will have all of the elements with plus one, and you're guaranteed that it's going to stay that way forever. So if you use that variable for something else, that value will always stay the same.
Adam
So, but you did say the list, we're adding an element to it. So how can something be immutable when we're adding to it?
Andre Kenji Horier
Yes, that's a good question. So what we do is we actually copy the list. So we have the original list, we have the new list with plus one in all the elements, and then we keep just transforming data as we go. And that's a way that functional programming happens, is that you will always generate new stuff, but you won't be changing what you had, because if you change what you had, then it becomes hard to understand what's going on once you have a giant system.
Adam
So this is like Scala's immutable collections. And you're saying that immutable data structures mean that when you write your session generator, nothing gets changed. If a number of lessons come in as the input and we need to select one, then that one comes out the other side.
Andre Kenji Horier
Yes. Another example would be, let's say you have a set of exercises, and in one part of the algorithm you want to filter and consider, let's say, half of these exercises because they're better in that context. So you filter your pool to half, and then you do whatever processing you have to do. But then in your next step, you want to use the actual full set of exercises, and the data is there because you have not touched it. You created a new, smaller set in the previous step, which you can use or not, but the idea is that your variables always stay the same. So if you want that entire pool for the next step, you're guaranteed that nobody changed it and nobody removed things without you realizing.
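Both examples from this part of the conversation, adding one to every element and filtering a pool down to half, can be sketched with Scala's immutable `List`. The `Exercise` type and the "easiest half" rule are made up for illustration. Each step returns a new collection and the original pool is still intact afterwards.

```scala
object ImmutablePool {
  final case class Exercise(id: String, difficulty: Int)

  // Returns a new list with every difficulty increased by one;
  // the list passed in is left untouched.
  def bumpDifficulty(pool: List[Exercise]): List[Exercise] =
    pool.map(e => e.copy(difficulty = e.difficulty + 1))

  // Returns a new, smaller list; the full pool remains available.
  def easiestHalf(pool: List[Exercise]): List[Exercise] =
    pool.sortBy(_.difficulty).take(pool.size / 2)

  def main(args: Array[String]): Unit = {
    val pool = List(Exercise("a", 3), Exercise("b", 1), Exercise("c", 2), Exercise("d", 4))
    val half = easiestHalf(pool)
    val harder = bumpDifficulty(pool)
    println(half.map(_.id))
    println(harder.map(_.difficulty))
    println(pool.size) // the original pool still has all four exercises
  }
}
```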
Adam
To make this rewrite work, it's quite a different model of operations. So did you have to change the architecture of the session generator so that it could work more by transforming inputs rather than mutating them?
Andre Kenji Horier
Yes. So we did change many of the algorithms; we had to just rewrite them in an immutable way. Sometimes the algorithm was way too hard to rewrite and it was risky to introduce errors. For those cases, Scala has this nice thing (I don't know if it's nice or not, people might disagree): you can port Java code to Scala and just run Java libraries in Scala, because both of them run on top of the JVM. What this means is that Scala also supports the things from the Java world, like mutable types and for loops and while loops that are not very functional in the functional programming sense. So if you are rewriting something and you realize that it will be very hard to rewrite without adding complexity or making risky changes, then you can write in a more Java-like idiom of Scala.
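A minimal sketch of the two idioms side by side, using a made-up "sum the scores" task rather than any actual session generator algorithm. The second function uses a mutable variable, a while loop and a collection class taken directly from the Java standard library, all of which are valid Scala.

```scala
object TwoIdioms {
  // Functional style: no mutation; the result is built by folding.
  def totalFunctional(scores: List[Int]): Int =
    scores.foldLeft(0)(_ + _)

  // Java-like style: a Java collection, mutable variables and a while loop.
  def totalImperative(scores: java.util.ArrayList[Int]): Int = {
    var total = 0
    var i = 0
    while (i < scores.size()) {
      total += scores.get(i)
      i += 1
    }
    total
  }

  def main(args: Array[String]): Unit = {
    val javaList = new java.util.ArrayList[Int]()
    javaList.add(1); javaList.add(2); javaList.add(3)
    println(totalFunctional(List(1, 2, 3)) == totalImperative(javaList))
  }
}
```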
Adam
What percentage would you say is more Java written in Scala? And for what percentage did you go with this functional transformation style?
Andre Kenji Horier
So in our code base, I'd say that more than 99% is functional. When you're using immutable collections and referential transparency, things are much easier to debug and much easier to maintain. So we try to make things immutable, referentially transparent and functional in general in most of our code base. For maybe one or two algorithms, we thought it would be better to just use the non-functional version.
Adam
Wow, 99% is quite a. It's very functional.
Andre Kenji Horier
Yes. When you're writing a code base from scratch, you can also do unit tests that you couldn't do in your monolith. So things are much easier to just test that your algorithms are working as expected.
Adam
How is unit testing in Scala?
Andre Kenji Horier
So, yeah. Or rather than in Scala, I should say in our whole framework. We use Finatra as our HTTP server. Finatra is the HTTP server written by Twitter, and it uses Guice, which is the Google library. So for unit testing in this context, we have dependency injection out of the box, we have mocking out of the box, and it's very easy to do everything. I'd say that it was just easy to write unit tests, and we ended up writing a lot of them. I think our coverage right now is somewhere around 70%, maybe.
Adam
Nice. So back to the architecture. You mentioned microservices. Is the session generator something that calls out to a bunch of services and combines them together, or how does it interact with the rest of Duolingo's infrastructure?
Andre Kenji Horier
So we have to pull data from a lot of places, and we have a data pipeline for that. Right now the pipeline is just a task that runs offline, so that we don't make real online traffic depend on our data stores. We have this task that runs daily and hits all of the services and whatnot that we need to hit in order to fetch the data. Then this task preprocesses everything and serializes all of this preprocessed data into S3, which is a file server by Amazon AWS. Then, when we are online and serving actual requests for users, what we do is just fetch from the file server, cache it in memory and serve it. When we do that, we have an architecture that's very robust to failures, because your only real dependency is your file server and, well, the network, of course. And it's also very fast because everything is cached in memory.
Adam
So the only real external dependency you have is S3. And then even that is kind of insulated by your cache.
Andre Kenji Horier
Yes, yes.
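The serving side of this design might be sketched as follows. Everything here is hypothetical: `BlobStore` stands in for the file server (no real AWS API is used), and the data is simplified to a map from lesson name to exercise ids. The sketch shows the two properties mentioned above: requests are answered from memory, and a failed refresh leaves the previous snapshot in place.

```scala
import java.util.concurrent.atomic.AtomicReference

object SessionData {
  // Hypothetical stand-in for the file server that holds preprocessed data.
  trait BlobStore {
    def fetch(key: String): Map[String, List[String]]
  }

  final class CachedCourseData(store: BlobStore, key: String) {
    // The whole preprocessed snapshot is held in memory as an immutable Map.
    private val snapshot =
      new AtomicReference[Map[String, List[String]]](store.fetch(key))

    // Meant to be called periodically, e.g. after the daily offline task.
    // If the fetch fails, the previous snapshot keeps serving requests.
    def refresh(): Unit =
      try snapshot.set(store.fetch(key))
      catch { case _: Exception => () }

    // Request path: a pure in-memory lookup, no network call.
    def exercisesFor(lesson: String): List[String] =
      snapshot.get().getOrElse(lesson, Nil)
  }
}
```

Swapping a reference to an immutable map keeps readers safe without locks: a request sees either the old snapshot or the new one, never a partially updated structure.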
Adam
So with all this functional idiomatic code you're writing, does that mean the session generator is sort of like. It takes as an input, like a user, and then all of the possible lessons Ever. And then it spits out what they should learn next. Is that.
Andre Kenji Horier
A little bit. So it actually takes as input. So the online part of the session generator takes as input the lesson the user wants to learn and some other user settings and outputs the session to the user. So the collection of exercises.
Adam
Okay, there's an idea that statically typed languages are more verbose and dynamic languages are more succinct. So I've actually found in my experience that Scala is a very succinct language, maybe even more so than Python in some cases. What did you find in terms of verbosity moving from one language to the other?
Andre Kenji Horier
Yes. So I think verbosity depends a lot on the language itself. If you're thinking of Python or JavaScript as your go-to dynamically typed language and Java as your statically typed language, then I would agree, yes, because Java is super verbose. But Scala is a language that is concerned about avoiding a lot of typing. Scala tries to infer types whenever it can. Sometimes it can't, and it makes some errors here and there, but usually it's able to infer types, and sometimes other things as well. Like in Python, you can do list comprehensions and write one-liners, whereas in Java you need to write three or five lines of code just to do a for loop. So, yeah, Scala is very succinct, and there's not as much typing as in a language like Java. And compared to Python, I'd say that in some cases Scala is less verbose. For example, when you're defining a class, if you don't need a body, you don't need to write a body in Scala.
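The three points made here (type inference, comprehension-style one-liners, and classes without a body) fit in a few lines of Scala. The `Word` type and sample data are invented for the example.

```scala
object Succinct {
  // A complete class definition with no body. For a case class the compiler
  // generates the constructor, accessors, equals, hashCode and toString.
  final case class Word(text: String, strength: Double)

  // No type annotation needed: the compiler infers List[Word].
  val words = List(Word("hola", 0.9), Word("gato", 0.3), Word("perro", 0.4))

  // A for comprehension, comparable to a Python list comprehension.
  def weak(ws: List[Word]): List[String] =
    for (w <- ws if w.strength < 0.5) yield w.text

  def main(args: Array[String]): Unit =
    println(weak(words))
}
```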
Adam
Are you using implicits within your code base?
Andre Kenji Horier
We are using them. So in Scala there are two kinds of implicits: the implicit parameters and the implicit conversions.
Adam
And.
Andre Kenji Horier
Well, so implicit parameters are when you define some of the parameters of your method as implicit. Then, as long as you have within the scope of the caller a variable of that type that is also declared as implicit, you're able to just not write the code to pass that value into the function. So it's basically to avoid typing. For those, we do use implicits. We don't use implicit conversions, because we usually find them a little bit scary, since you won't see when things are being converted. So generally no implicit conversions, but we do use implicit parameters.
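A short sketch of an implicit parameter in Scala 2 syntax, which Scala 3 also accepts. The `UserSettings` type and `greeting` function are hypothetical. The caller declares one implicit value, and the compiler supplies it to any call whose implicit parameter has that type.

```scala
object ImplicitParams {
  final case class UserSettings(language: String)

  // `settings` is declared implicit, so callers may leave it out.
  def greeting(name: String)(implicit settings: UserSettings): String =
    if (settings.language == "es") s"Hola, $name" else s"Hello, $name"

  def main(args: Array[String]): Unit = {
    // An implicit value of the right type in the caller's scope...
    implicit val current: UserSettings = UserSettings("es")
    // ...is passed by the compiler without being written out.
    println(greeting("Ana"))
    // The argument can still be passed explicitly when needed.
    println(greeting("Ana")(UserSettings("en")))
  }
}
```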
Adam
And now what's an implicit conversion?
Andre Kenji Horier
Yeah, so an implicit conversion is when. Well, so you have an object of type A and then let's say that you want to convert it to an object of type B. There is a mechanism in Scala that you can define your conversion from type A to type B. And if you put that conversion in your scope, you're able to convert it from A to B without writing code to convert things.
Adam
So it could be very handy and save on typing. But also maybe you don't know what's changing to what.
Andre Kenji Horier
Yes, yes, yes.
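For comparison, this is roughly what an implicit conversion looks like in Scala 2 syntax. The `Feet` and `Meters` types are invented for the example and stand for the "type A" and "type B" in the explanation above. Nothing at the call site indicates that a conversion takes place, which is the readability concern raised in the conversation.

```scala
import scala.language.implicitConversions

object ImplicitConversions {
  final case class Feet(value: Double)   // plays the role of type A
  final case class Meters(value: Double) // plays the role of type B

  // With this in scope, a Feet is silently converted wherever
  // a Meters is expected.
  implicit def feetToMeters(f: Feet): Meters = Meters(f.value * 0.3048)

  def lengthInMeters(m: Meters): Double = m.value

  def main(args: Array[String]): Unit = {
    // A Feet is passed where Meters is required; the compiler
    // inserts the call to feetToMeters.
    println(lengthInMeters(Feet(10)))
  }
}
```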
Adam
In my experience, Scala code can be less prone to runtime bugs. I think you mentioned you had some issues with runtime bugs getting to production in Python. So how did this change now that you've rewritten?
Andre Kenji Horier
Yeah, so in Scala we do have a lot fewer runtime bugs. Part of it is because your compiler will just catch most of the errors. So as long as compilation passes, you pretty much don't have programming errors. You have application logic bugs here and there, but that's another problem. The compiler does a lot of stuff for you. The unit testing framework is also very user friendly, so we can write a lot of tests that make sure that your code doesn't have the most common application logic errors that you know about. So in the end it's very hard to have these runtime bugs going on in Scala.
Adam
What did you find to be the pain points of moving to Scala as a language?
Andre Kenji Horier
Pain points? That's a good question. I was actually more fascinated that Scala, as a modern language, has so many nice things that we don't have in Python and Java, and that kind of outweighs the pain points. There are some pain points, but I would say that they're not really about the language itself. There are some small things here and there in the language, but it's not a big deal. Most of our pain points were, let's say, that we have a library in Scala that is under-documented, or there is a function that's available in Python by default or in one of the common libraries and it's not in Scala.
Adam
So.
Andre Kenji Horier
So there are some small things, but I guess that's true whenever you change a code base from one language to the other.
Adam
Nice. You mentioned there are some things about the language that fascinated you. Such as?
Andre Kenji Horier
Okay, let me see, what are the things? I think one of the things is, well, functional programming in general and how readable it makes your code. Also, since the language is not too verbose, the end result is that your code is very explicit about your application logic instead of having a lot of boilerplate just for controlling your loops or things like that. So I find the code in Scala very readable, and that's nice. It's also very easy to just look at the code and see if there is a problem, because it's all immutable, and it's easy to debug even without running any code. And also, I originally started my life as a programmer in Java, and when you're in Java and you move to Python, the first thing you think to yourself is that this is a lot less verbose and it's much faster to write stuff. Then there's a similar difference here: in Scala it's very fast to write code. So all of that extra time that I would spend either just typing in Java or testing things in Python, I use to write unit tests in Scala. In the end, it's that confidence thing. You can write code and be confident that things will work the way you expect.
Adam
So you're back on the jvm. So you got tired of the JVM and you left and now you're back. But the language is more fun.
Andre Kenji Horier
Yes, yes, exactly.
Adam
I think you mentioned this, but do you find it faster to write Scala, or slower but worth the trade-offs, compared to Python?
Andre Kenji Horier
Compared to Java... I'd say that it's a bit slower than Python, but there are caveats to that. It's slower to write a piece of code, but the amount of effort you need to maintain that code and to test it is a lot lower in Scala. So for me, in the long run, Scala is just a faster language than Python.
Adam
So not faster to develop initially, but faster in terms of the all-in time.
Andre Kenji Horier
Yes, yes.
Adam
So how about maintenance? So maybe you don't know, maybe this is so new that you haven't had to do much maintenance of it.
Andre Kenji Horier
Well, yeah, we haven't done that much maintenance because the system is very new. But whenever we have to refactor something, it's very easy, because your IDE basically does it for you. You don't even need to think about it; you just click two buttons and that's it.
Adam
The compiler gives you confidence, I would imagine, around these refactorings too, because you get some sort of error checking.
Andre Kenji Horier
Yes, exactly. I remember the first time I had to refactor some stuff in Scala. I was just surprised, because in Python it would take me, I don't know, an hour, or maybe not an hour, but a lot of time. And in Scala I finished in less than one minute, and I was just so surprised. After working with Python for a long time, you become so used to that, that whenever you have something nice, you're like, oh, that's nice.
Adam
So Scala being on the JVM should in a lot of cases be much faster than Python. Do you have any sort of numbers about that?
Andre Kenji Horier
I wouldn't say so. We haven't actually done any benchmarks that you would be able to trust, comparing the exact same code in one environment and in the other. The one thing that we have going on is that we have rewritten the whole system, the whole session generator, in Scala, and we have also rearchitected things. Some of the performance gains that we saw came from rearchitecting: by using the in-memory cache and S3, we decreased latency from, I don't remember exactly, maybe 700 or 800 milliseconds to the order of tens of milliseconds. So it was more than 10 times faster, which was very good. Also, the number of servers that we need to serve the same amount of traffic decreased by, how much was it, something on the order of maybe ten times.
Adam
Wow. So that's a big cost savings on the bottom line, I guess.
Andre Kenji Horier
Yes, yes. And also there is just the fact that Scala, and the JVM in general, does better in multi-threaded environments and multi-processing, and is able to run a lot of stuff at the same time. It's nice. One thing that Scala has that Python doesn't, for example, is called futures. A future is basically a unit of asynchronous computation. Whenever you do a request, you get the response as a future, but the value is not there yet. You're waiting for the value in another thread. So what happens is that you don't block your original thread, and because of that you're able to do a better job with concurrency. That's one thing that we see in Scala: our servers are able to handle a lot more concurrent requests than in Python. In Python you have uWSGI and whatnot, and whenever you have a request and you're waiting for IO, that thread is completely blocked. If you have, I don't know, let's say 20 threads in your server, then you have one less to serve traffic.
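A minimal sketch of Scala futures using only the standard library. `fetchLesson` is a hypothetical stand-in for a slow IO call, with `Thread.sleep` simulating the wait. Both requests are started before either result is needed, so they run concurrently while the calling thread stays free.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  // Stand-in for a slow IO call; the name and delay are hypothetical.
  // The body runs on a thread from the global execution context.
  def fetchLesson(id: String): Future[String] = Future {
    Thread.sleep(100)
    s"lesson-$id"
  }

  // Both futures are created first, so the two calls overlap in time.
  // The caller gets a Future back immediately and is not blocked.
  def fetchBoth(a: String, b: String): Future[List[String]] = {
    val fa = fetchLesson(a)
    val fb = fetchLesson(b)
    for (x <- fa; y <- fb) yield List(x, y)
  }

  def main(args: Array[String]): Unit = {
    // Await is used only so this demo can print a result; a server would
    // normally hand the Future back to its framework instead of blocking.
    println(Await.result(fetchBoth("1", "2"), 2.seconds))
  }
}
```

The imported `global` execution context is itself passed to `Future { ... }` as an implicit parameter, the same mechanism described earlier in the conversation.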
Adam
Is there a Python way to deal with this, or is there just not?
Andre Kenji Horier
Not that I know of. Well, not out of the box, maybe. There might be some libraries that do. For example, there is the actor model, which is the thing that languages like Elixir and Erlang have out of the box, so they just handle concurrency better. I think there is something for Python as well. There's Akka, which is a library for Scala, and I think if you use that, you might be better off. But not out of the box, not if you're using something like Flask or Pyramid.
Adam
Are you using the actor model in your session generator?
Andre Kenji Horier
Not in the session generator, no. The interactions with IO are very simple in our session generator, and there is also the overhead of getting it set up first, because nobody in the company had that kind of know-how. So we chose to first take the more common approach. But it's something that, after you start reading about it, you become interested in.
Adam
Okay. Are you using actors at all? I'm just curious because you mentioned it.
Andre Kenji Horier
So I was thinking of implementing that for the offline part of the session generator, because we have a lot of data and in that situation it would make more sense to have a data pipeline that uses actors. For now, we still haven't had the opportunity to do it because we're still working on some other things and that's not the highest priority yet.
Adam
Makes sense. So now that your rewrite is over, what were the business benefits of the rewrite? Was it a success?
Andre Kenji Horier
Yes. So for now I think it's a success, but it's not completely over yet because we still have some features to port, but it's mostly done. I would say it's a success because we were able to have those cost benefits. It's just a lot cheaper to run or to serve traffic with the rewritten code and the rewritten architecture than the original one. Also, in terms of developer productivity, it's also very good. It's a bit weird if I say it, but the feedback from the other developers is that it's just less painful. And I think painful was the word they actually used.
Adam
So what makes it weird?
Andre Kenji Horier
Well, I started the whole process of moving to Scala, so I might be very biased towards the new system.
Adam
So you don't find it weird, but other people do, is that what you're saying?
Andre Kenji Horier
No, it would be weird for me to say because I'm biased. But I also got feedback from other people that it's less painful. It's that thing that I talked about, like you can write code with confidence, and I think that's very important.
Adam
Okay, I understand. Would the rewrite have been successful, or as successful, if you had made the architectural changes but not the language change?
Andre Kenji Horier
I think partially. So with the architectural change, we would see improvements in latency. I think in Python they wouldn't be as large. One reason is because Python is a bit slower than Scala. The other reason is that having a thread-safe cache in Python is not as trivial as it should be. So I think that would be one problem. But also generally we would lose all of the benefits of developer confidence of not pushing breaking changes, because it's all dynamically typed and it would be much harder to make larger changes or just to know what the data structures are.
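As an editorial illustration of the thread-safe in-memory cache mentioned above, here is a minimal Scala sketch built on the JVM's `ConcurrentHashMap`. It is an assumption about what such a cache could look like, not Duolingo's implementation: `CourseCache`, `CacheSketch`, the `"es-en"` key, and the loader are all hypothetical.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical cache keyed by course ID; all names are illustrative only.
final class CourseCache(load: String => Vector[String]) {
  private val entries = new ConcurrentHashMap[String, Vector[String]]()

  // computeIfAbsent runs the loader at most once per missing key,
  // even when several threads ask for the same key at the same time.
  def get(courseId: String): Vector[String] =
    entries.computeIfAbsent(courseId, (id: String) => load(id))
}

object CacheSketch {
  // Counts how often the slow loading path actually runs.
  val loads = new AtomicInteger(0)

  val cache = new CourseCache(id => {
    loads.incrementAndGet()
    Vector(s"$id-exercise-1", s"$id-exercise-2")
  })

  def main(args: Array[String]): Unit = {
    // Eight threads request the same key concurrently.
    val threads = (1 to 8).map(_ => new Thread(() => { cache.get("es-en"); () }))
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(s"loader ran ${loads.get} time(s)")
  }
}
```

Because the cached values are immutable `Vector`s, readers on different threads can share them without extra locking once they are stored.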
Adam
What is it like working at a company with such a focus on learning? I think. Are you a language learner yourself? Does the company have a learning perspective based on what it does?
Andre Kenji Horier
Yeah. So I think it's very interesting to work here. So I am a language learner. My native language is actually Portuguese. I was born in Brazil. I'm a Japanese Brazilian, so my native language is Portuguese. Second language is English. I've learned Japanese and I kind of know Spanish. And it's very fun to, you know, be surrounded with people who have like the same interests and for learning. There are some people who are learning like a ton of languages and they know a lot, a lot of languages. And even with our community, sometimes when we meet some members of our community, it's very interesting because they have all this view of the world that you don't usually see in your daily life. People who want to learn new cultures, learn new languages and who have, like, a very broad horizon, I'd say.
Adam
Yeah, I could see why that would be refreshing, you know, talking to people who have a global perspective. Well, it's been great to talk to you about this rewrite and I'm glad it's been a success. Thank you so much for your time.
Andre Kenji Horier
Yes, thank you. Thank you for inviting me. It was great talking to you.
Adam
All right.
Date: January 7, 2018
Host: Adam Gordon Bell
Guest: Andre Kenji Horie, Senior Software Engineer at Duolingo
This episode explores Duolingo’s ambitious project to rewrite its “session generator” (the system that creates adaptive language lessons) from a Python monolith into a new, Scala-based, functionally architected system. Host Adam Gordon Bell interviews senior engineer Andre Kenji Horie about the motivations behind the rewrite, the challenges and benefits of moving to Scala, the technical and business impacts of the architectural changes, and thoughts on Duolingo’s unique learning-centric company culture.
Notable Quote:
"A session is, well, simply speaking, is a collection of exercises."
– Andre Kenji Horie [06:09]
Notable Quote:
"It all starts with a monolith, I guess. And all of the nightmarish stuff starts with a monolith."
– Andre [09:04]
Notable Quote:
"JavaScript has the same problem as Python, it’s weakly typed and this is something we wanted to avoid... Java is a bit slow and verbose... Scala is a very modern programming language, mature in the backend... seemed to be a very good fit."
– Andre [14:05]
Notable Quote:
"When we have referential transparency, we have a method that... only calculates the output and doesn't change any state anywhere. So it's very easy to unit test."
– Andre [17:06]
Notable Quote:
"When data is immutable, you have some nice properties, like you don’t need to think if something is changing your data from somewhere else."
– Andre [19:53]
Notable Quote:
"First time I had to refactor some stuff in Scala... in Python it would take me…a lot of time. In Scala I finished in less than one minute and I was just so surprised."
– Andre [42:14]
Performance Impact:
Notable Quote:
"So from rearchitecting and using the in-memory cache and S3, we decreased latency... to the order of tens of milliseconds... server count decreased by maybe ten times or so."
– Andre [44:22]
Notable Quote:
"It's also very good because... feedback from the other developers is that it's just less painful."
– Andre [49:10]
Notable Quote:
"It's very fun to, you know, be surrounded with people who have the same interests and for learning. There are some people who are learning like a ton of languages... It’s very interesting because they have all this view of the world that you don’t usually see."
– Andre [51:30]
This episode offers a detailed, honest look into what it takes to re-architect a critical system at scale, from language and technical decisions to human and business outcomes. The migration to Scala and functional programming not only “decreased pain” for engineers but also provided tangible improvements in performance, reliability, and developer happiness at Duolingo, a company as serious about software as it is about learning.