
A
Welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and I'm joined by my co-host, swyx, founder of Smol AI.
B
Hello, hello. We're back in the studio with Andrew Hsu of Speak. Welcome.
C
Thank you for having me.
B
I have to start this off. I didn't prep you on this at all. But you were a Thiel Fellow in 2011.
C
First class.
B
First class, yeah. Is that the one with SBF?
C
No, he was, I think, several years later, actually. Yeah. Yeah.
B
What was it like? Just talk about the.
C
That's a good question. Haven't been asked that one in a while. It was a really crazy idea at the time and very controversial. And I think the first few years of the fellowship were definitely, let's just find 20 people under 20 and give them a hundred thousand dollars to drop out of college. And it was no holds barred. You could do anything. You could be doing some crazy research idea, a startup, anything. I actually met my current co-founder at Speak there. He was in the second year of the fellowship, and I made many very close friends from the first few years. But I mean, for me it was life changing. I had a very unusual path where I actually did finish college. I was in grad school at the time because I went to school really early.
B
Yeah, I was like, aren't you too old? You know, Thiel Fellows are young.
C
I was 19 at the time and in grad school. It was a very accelerated path. But I think I knew at the time that I was going to leave grad school and do startups anyway, and the timing lined up really well. Yeah, yeah, yeah. Vitalik, I think he was also in a later year.
B
Ah, yeah, damn. Okay.
C
Anyway, the first two years had, I mean, there are some crazy successes, you know, Dylan from Figma, I mean, yeah, a lot of people.
B
Awesome. Well, feel free to bring in those stories as and when, because obviously you know those kinds of people. You are now CTO and co-founder of Speak. I would say, from a very early stage, one of the most successful and prominent OpenAI partners that anyone would know is doing well. And teaching English to Koreans was your rough remit at the time. How did that all come about?
C
It's funny that you say that, because despite our current revenue scale and, objectively, how successful we are, we've always operated in a market, at least initially, on the other side of the world. We've been much, much more popular in the Eastern world and a bunch of Asian markets, and relatively unknown in the West. So it hasn't really felt like we've had that sort of awareness until the past few years. The brief story is that my co-founder and I, back in 2016, were fascinated by the promise of AI, and we spent a year-long sabbatical basically learning everything we could. We talked to Karpathy back then, actually, when he was just finishing grad school, and did a lot of self-study research. And we were just so convinced, fundamentally, that speech models were going like this, language models were going like this, and in a five to ten year span they would become superhuman. We were utterly convinced of this future. And we saw that the way people learn things, and specifically learn languages, which was a very human-based thing if you really care about fluency, would completely change, and we'd be able to build language tutors that were pure software, pure AI. So that was kind of the genesis story of Speak. It took much, much longer than we expected to build a great product and find good PMF. The first few years were very painful, and I think without this really compelling vision of the future, we would have quit. We actually never pivoted. Last year we brought the entire company to Taipei. We do this company trip every year, and we played our original YC application video on screen. And it was really funny, because the things we were saying in that video were the exact same things that I still say today about the long-term vision and what we're building towards. So that was really cool to see.
B
Can you summarize the long term vision again?
C
It was that as speech models and language models become superhuman, that would let us create an AI language tutor that would help you become fluent faster than any human could. And I think 80 to 90% of the tech is here now.
A
And you have this big focus on like speaking. Obviously it's in the name of the company.
C
Yeah, that's right.
A
And I think the speech models were maybe a little delayed compared to the text models. Did you ever think about, okay, maybe speech is just not going to work for this use case or like what were kind of like the valleys of discomfort and then what were maybe some of the pivotal releases and models that you were like, okay, it's going to work. It might take a little longer, but it's going to work.
C
So we've always done custom speech stuff. The first act of the company, if you will, was before LLMs, right, before 2022, when Whisper came out, when ChatGPT came out. In the years before that, roughly two to three years, is when we feel like we found PMF in South Korea and then started growing, still only in that market, still only teaching English. We developed custom speech recognition models, and users were speaking into the app all day. So we had a ton of this non-native English speaker data, and we would use that to fine-tune models and understand our users better. We still do that today, and it's important for us that the core recording loop in many of our lessons is extremely fast. So we're very latency sensitive. There are many other product surfaces within the app today that are more LLM powered, where it's more open-ended, real tutoring, where we actually give you feedback on what you said, on the semantics and so on. So that stuff is more Whisper powered, more LLM powered, but we've always had a very fast core ASR loop that's been fully custom.
A
I just onboarded to the app earlier today. Unlike other apps, there's this tutor conversation that you do for onboarding. I'm guessing that it's mostly LLM based, and then you're judging the person's response. So I selected Spanish, and the conversation was in Spanish via text to start, and then from there it started to create lessons for me.
C
Yeah.
A
Was that all unlocked from LLMs where now you can kind of have these conversations and then bring people into the speech flow?
C
Yeah. So we call that magic onboarding, and it was a new thing we built that was more conversational. We wanted it to feel more like you were talking with a tutor who was learning things about you, and we would use that later to personalize the experience. Before that we had a much more traditional app onboarding. There are still a lot of open, interesting questions around what the proper onboarding UX is, because a lot of people start using Speak when they're not in a situation where they can actually speak out loud. So we have fallback options and so on. But it's something we're super actively experimenting with.
B
Is there a structured output behind that? You know, anything that you found implementing magic onboarding? I think people always want to improve onboarding. What's the uplift or was there one? We still don't know yet.
C
The interesting thing is that in general, because it's speaking based, which is a much higher barrier than just tapping a multiple choice button, what we see is that the install-to-sign-up rate is a decent amount lower, but the trial start rate is higher. We still have an active experiment running, and we're trying to be super agile about testing many different formats of this. I don't think I have the final answer yet.
B
Yeah.
C
But I think the intent, the vision that we're going for here, is that as soon as you download the app from the App Store, maybe after you see it in an ad, the first interaction when you have a fresh open of the app should feel pretty futuristic. It should feel like, okay, this is the new AI-native, next-gen way of learning a language to fluency. And that's always been our ambition. We want to build something that wasn't possible before without LLMs and AI technology.
B
Yeah. I wanted to go back to the onboarding. There's a general idea that when you replace a form with a voice bot, you need to have some kind of state machine under the hood to drive the thing. Like, what else don't I know about you? Let me proactively ask that. And I'm just wondering if you had any insights there, or is it literally just a state machine?
C
We tried both actually.
B
Yeah.
C
Right now I think probably what you saw is a state machine, but I think that.
B
Trust the AGI.
C
Yeah. Right. I think that things should move in a direction where it's much more of a natural conversation. There is a general sense of a goal in the prompt that you can specify. And part of the hard thing here is all the guardrails. Right. Once you start talking about what you had for breakfast yesterday, and trying to be antagonistic to the system, then things start really going off the rails. So for a bunch of these experiences, we're pretty careful about the fallbacks and we have a lot of evals around that. But I think where it should end up is just feeling like you have a quick 3 to 5 minute conversation with your tutor and then it knows a lot about you. And then you create your account, et cetera.
B
And you create memories?
C
Yeah, so we store what you're saying, and we summarize it in the experience. The way it works is the tutor will ask you some sort of question, like what are your goals around learning English, or the language? And then we will basically use a separate LLM prompt to summarize. So it's not the full transcript of what you said that you see, it's more of an abstracted, okay, here's what you care about. And we think that's a better product experience.
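The state-machine style of onboarding discussed above, combined with summarizing each answer rather than storing the full transcript, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Speak's implementation: the slot names, the questions, and `naive_summarize` (a stand-in for the separate LLM summarization prompt) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical slots an onboarding tutor might want to fill (not Speak's schema).
SLOTS: List[str] = ["goal", "level", "schedule"]

# Hypothetical question asked for each slot.
QUESTIONS: Dict[str, str] = {
    "goal": "What are your goals around learning the language?",
    "level": "How would you describe your current level?",
    "schedule": "How much time can you practice each day?",
}


def naive_summarize(slot: str, transcript: str) -> str:
    """Stand-in for a separate LLM summarization prompt (assumption)."""
    cleaned = " ".join(transcript.split())
    return f"{slot}: {cleaned[:60]}"


@dataclass
class OnboardingStateMachine:
    summarize: Callable[[str, str], str] = naive_summarize
    memory: Dict[str, str] = field(default_factory=dict)

    def next_slot(self) -> Optional[str]:
        """Return the first slot we still know nothing about, if any."""
        for slot in SLOTS:
            if slot not in self.memory:
                return slot
        return None

    def next_question(self) -> Optional[str]:
        slot = self.next_slot()
        return QUESTIONS[slot] if slot else None

    def answer(self, transcript: str) -> None:
        """Store a summary of the answer, not the raw transcript."""
        slot = self.next_slot()
        if slot is None:
            raise RuntimeError("onboarding already complete")
        if not transcript.strip():
            return  # simple fallback: empty answer, so the same question is asked again
        self.memory[slot] = self.summarize(slot, transcript)

    @property
    def done(self) -> bool:
        return self.next_slot() is None


if __name__ == "__main__":
    sm = OnboardingStateMachine()
    sm.answer("I want to order food when I travel to Madrid")
    print(sm.memory, "| next:", sm.next_question())
```

A freer, prompt-driven conversation would replace the fixed slot order with a goal stated in the prompt, which is where the guardrails and evals mentioned above become more important.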
A
What were some of the other key tenets of the product? Obviously, language learning is one of those consumer markets where dozens of companies always try to get started, and you have these older companies like Babbel, and you've got Duolingo. So with Speak, the act of speaking was a big part of it. I think this memory stuff is great. If you've tried some of the other apps, they always try to re-ask you the same things that you got wrong before, but you're not really learning. Is there anything else in the design of the app and the product, maybe not as obvious from the outside, that you think is really different?
C
I would say from a macro level, this is actually a pretty new product category: AI-powered language learning. And all these apps that you mentioned, Duolingo, Babbel, et cetera, they're more like the Gen 2 of language learning. So if you think about it, Gen 1 was Rosetta Stone, if you remember, right, CD-ROMs in airports, and then Gen 2 was basically mobile. So you have these very casual, massively popular mobile apps like Duolingo, where I think the comp is probably closer to a mobile game. Something that feels productive, something that's very engaging, very gamified.
B
And Duolingo is really leaning into the gamification.
C
And they've done an amazing job of that, to be clear.
B
Yeah, they might be the world's best people at it.
C
Yeah. And our view is that LLMs and AI now enable Gen 3 of language learning, which is something that is very AI native, very focused on functional fluency, which is why we do all these role plays and let you practice Spanish by talking to your Uber driver. We don't teach vocabulary and grammar, we teach sentence patterns, and we try to get you to just repeat and drill and drill and drill, almost like you're in a gym, until it's automatic, because that's what speaking is, right? It has to be spontaneous and automatic. In terms of the other aspects of the design, though, we went through many, many iterations over the first few years of the company. This is what I was mentioning about it being really painful in the first four or five years. In fact, the current version of the Speak app is not the first thing that we launched. We had something that we call internally the red app, which had a red app icon, still a similar logo. It was more around packs of content instead of courses, where you could choose any topic that you wanted to learn. It was for learning many different languages. It was essentially not a very directed experience, and it didn't really work. It was free. It was a very basic thing. But in 2018 we tore everything down and realized that we had to fully change what we were doing. And that's when we decided to focus on South Korea, specifically on teaching English. We built a bunch of new lesson types and we created our courses so that the experience was much more on rails. We realized people don't want to choose. They're already using some of their motivation on a daily basis just to open the app. They don't want to make another choice after that. Right, just tell me what to do. Give me a big button, and then I can tap it and just start a video lesson or whatever. We also, pretty critically, I think, abandoned the free version and just went straight premium.
And we kind of sidestepped the motivation question that way, because we knew that there were a ton of users that really wanted to learn English and were already really motivated. So we wanted to basically filter for these users. I wouldn't say there was one silver bullet. It was the combination of many learnings over three or four years. And then that started really growing in South Korea. And from there, I guess phase two was really 2022, when LLMs came out and Whisper came out. That allowed us to go from this more supplemental speaking practice tool to more full-featured language tutoring, where we could use LLMs, like GPT-3.5 Turbo back then, to give you direct feedback on your wording: that was kind of a weird thing to say, a native speaker would say it this way, or use a different word, or whatever.
B
I always do a poor job of doing this. But can we get some headline numbers, like, just to get a sense of scale? Because I think maybe some audiences don't know. Where are you at now in terms of your reach?
C
So we're now the biggest English app in South Korea. Yeah, we do billboards, big celebrity campaigns, that sort of scale. We're very popular there. I think about 6% of the Korean population has tried us. We're well on the way in a bunch of other Asian markets, like Japan and Taiwan. So the Asian markets are currently our mainstay. We also teach English in 40 more countries. We're coming to the US as well. I mean, we have Spanish and French live, and several more languages are coming this year. That's a huge focus of the company right now. In terms of revenue scale, well over 50 million ARR. It's a pretty simple business model. It's mostly consumer.
B
Yeah.
C
The B2B stuff is super, super exciting and that's also growing really fast and I think it'll be a really meaningful part of the business.
B
When did you start B2B?
C
About a year ago. It would.
B
Okay.
C
It was very much a side bet slash experiment at first, and then it just started working.
B
Of course it's going to work.
A
Yeah.
C
And now it's like, okay, you know, this is part of the future, right? This is a real thing. Yeah. So that's exciting.
A
What's the B2B race between learning a language and real-time AI translation? At Google I/O there was one of those Google Beam things for conferencing, where they do real-time translation.
C
Yeah, yeah, yeah. So people always ask this, right? They're always like, what happens when the Babel fish comes? When the real-time translation comes?
B
And the Babel fish is from The Hitchhiker's Guide, right?
C
Yes, exactly. The counterexample that I always have, that I think is quite illustrative, is that in German the verb is at the end of the sentence, right? So if you're trying to do real-time translation from German to English, as an example, you can't actually make any progress on the English until you hear the whole German sentence and you know what the verb is at the end. So the minimum latency there is the full sentence, and that's an example of a technical blocker for why it'll never be truly, truly perfect. But besides that, if you talk to all of our users in Asia, they don't want a translator. The reason that they are trying to learn English is to make themselves a better person, to connect with other people. They want to be able to look you in the eye and speak English, speak the same language as you. So it's actually a very different thing. I think what will end up happening is that we will build a real-time translation feature into Speak and have it integrated into the learning experience.
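The latency argument can be made concrete with a toy calculation. The sketch below is illustrative only: the example sentence, its English rendering, and the hand-written `depends_on` alignment are assumptions chosen for the illustration, not output from any translation system. Each English word lists the index of the last German word it needs, and because output must be produced in order, a word cannot be emitted before the word in front of it.

```python
from typing import List

# Hypothetical example: a German perfect-tense sentence with the participle last.
source: List[str] = ["Ich", "habe", "gestern", "einen", "Apfel", "gegessen"]
target: List[str] = ["I", "ate", "an", "apple", "yesterday"]

# Hand-written alignment (assumption): for each English word, the index of the
# last German word that must be heard before it can be produced.
# "ate" needs both "habe" (1) and "gegessen" (5), so it depends on index 5.
depends_on: List[int] = [0, 5, 3, 4, 2]


def emit_schedule(deps: List[int]) -> List[int]:
    """For each target word, the source index at which it can first be emitted.

    Output is monotone: a word cannot be spoken before the word preceding it,
    so each entry is the running maximum of the dependencies so far.
    """
    schedule: List[int] = []
    latest = 0
    for dep in deps:
        latest = max(latest, dep)
        schedule.append(latest)
    return schedule


if __name__ == "__main__":
    for word, when in zip(target, emit_schedule(depends_on)):
        print(f"{word!r} can be emitted after hearing {source[when]!r}")
```

In this toy case everything after the first English word has to wait for the final German word, which is the full-sentence latency floor described above.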
B
And also like there's always that human side, right? Like I'm dating a Romanian woman.
A
Yeah.
B
His wife is trying to learn Italian. Like there's always that.
C
Yeah, absolutely.
B
That's going to keep happening. I want to double-click on Korea. I think it's a very insightful, smart decision. Maybe people only know Korea through K-pop, and actually I think a lot of Americans learn Korean because of K-pop. That's a side thing. But you could have done Taiwan, you could have done China. I remember seeing a documentary about how China was crazy about English, or mad about English. I think that was the title of the documentary. Was it obvious? Were you sure when you went into Korea, or was it just a test?
C
We visited a bunch of Asian countries when we were thinking about how to relaunch things and how to focus in. We almost chose Taiwan, actually, but I think it was a little bit serendipitous. Our first employee is Korean and was my co-founder's college roommate, actually. When my co-founder visited Seoul to check out the market, he asked SJ to come along as essentially a translator and to facilitate. And I think that just went really well. It was very obvious from being on the ground in the market that Korea is pretty obsessed with learning English, and there is every human-based solution possible, right? English academies, classes, skyscrapers full of classrooms, stuff like that. And our logic was basically, if we can really make headway and win this market that is chock full of these human competitor products and all these people that fundamentally care about fluency, then we probably have something pretty real, and strong PMF that we could win other markets with. So that was the original logic. And so far it's been working.
B
It's retroactively obvious, which is the best kind of obvious. But like, it's so counterintuitive that you would be the team to do this and not a Korean team. Right? They would be, they would know because they had personal experience of like, I started in Korean, I learned English. Here's how you do it.
C
Yeah, in hindsight, super weird, right? We were sitting in an office here in San Francisco, operating with users in a market all the way on the other side of the world. It would not have worked without Sunjay. I have to give him a lot of credit here, because we paid a lot of attention to the specific wording of button text in the app and the localized strings. We had a lot of reports from users pretty early on that they were shocked that it was an American company, because you can usually tell: there's always some weird wording or whatever. But there wasn't in Speak, and I think that probably had a large intangible effect.
B
Yeah. Focus, attention to detail. On the tech stack: this was 2018. What were you rolling? You just did ASR and there was no LLM. So BERT?
C
We actually had really no LLM component. So all of the content, oh yeah, another thing we did that I forgot to mention was we decided we needed to fully own all the content. So the way that we teach is all in house, all thought through from first principles. We built this thing called the Speak method, which is basically a pedagogical philosophy around teaching sentence patterns that you drill and then combine into higher-order patterns. And all of that was in house with our content team and our teachers. We built a lot of internal tooling to make this possible. There is just a lot of operational overhead. I would say this is something we've struggled with in scaling to many more languages, and that's a big research effort within the company right now. We were building a mobile product, right? My co-founder and I have always just loved apps and been big iPhone users. So we cared a lot about the app being native, feeling great, being high performance. The DNA of the company was always consumer. Frankly, my co-founder and I had never worked in a real company. I dropped out of grad school, had a few failed startups, and then eventually started Speak. And he had never worked in a real company either. He had just done startups in the past. So we didn't know anything about enterprise, enterprise workflows, or what sort of software real companies used. So I think, frankly, consumer was the only path. I don't think we could have done anything else. We just didn't know enough. And I think that has served us well, though, in terms of really caring about the craft of it and wanting to build something that felt not 90 to 95% but 95 to 100% in terms of polish.
A
Was it hard to build an engineering team that did that at the time? Because ML engineering was very academia driven back then. And then you have the more consumer stuff, which was maybe more nascent, and it's mobile.
C
I'm now realizing that our story is very weird.
B
You only just realized?
C
In addition to the market on the other side of the world, our first iOS engineer, whom we hired through a YC referral, was in Slovenia. If you don't know where Slovenia is, look it up on Google Maps. But it's a pretty obscure little country.
B
Yeah.
C
And then we needed to hire a backend engineer. One of his best friends was a great backend engineer, and we hired him. And then this happened four more times, all in the same city. And then we were like, okay, we should probably just open a physical office in Slovenia. So for several years we had an engineering office in Slovenia.
B
What?
C
And then a few people here in San Francisco, and we still do. Now we have 90% of our core product development team in San Francisco.
B
Here.
C
Our office is in FiDi and we're really only hiring here, but for the first several years that was another very interesting cultural aspect of the company, I guess.
B
I think a lot of early stage founders have to do that. That's the only people they can afford or whatever.
C
Yeah.
B
What are your tips to make that remote stage work?
C
For us, it wasn't really a price thing. I think we legitimately thought he was the best person that we interviewed. And then it just kind of happened that way. When you roll it out.
B
Yeah, it's not a price, it's more about remote work. Right. Like distributed team, early stage. Like a lot of people say, like, no, you have to. You have to move everyone to SF or your startup would die.
C
Yeah. I don't think that we were good at remote work. I don't think that my personality or my co founder's personality is inherently very good at async, just to be perfectly frank. I actually think that almost like, in spite of it, we made it work. It was a little bit brute force. Like, I would just sync with them every single day.
B
Right.
C
And there was pain because of the time zone overlap. It was exactly the most inconvenient. But for several years we did that. We got really good at the cadence of it. I think they were excellent engineers as well. So it worked out. But if I had to do it over again, I probably wouldn't do it. It's hard to say. Yeah.
B
Shall we move to phase two on the LLM side? That's when OpenAI started opening up. And when did they invest?
C
This was 2022. So that was also when Whisper dropped. And Whisper was a really exciting moment for us. Ever since we started the company and made that prediction of, okay, in five or ten years, speech models and language models will become superhuman, Whisper was really that magic moment for us where we were like, oh, shit. I think what we predicted is here. And I pretty distinctly remember this moment in the office when we got access to the model, and we were testing it on an audio clip of a very beginner English learner in Korea saying something. If you closed your eyes as a human, you'd have no idea what they were saying. There were four of us in the room. We all closed our eyes and none of us had any idea. And the model got it right. So, I mean, superhuman. I think that was the moment that we had been waiting on. And at the same time LLMs were in the ascendancy. ChatGPT would come out, I think, on Thanksgiving of 2022, and 3.5 Turbo came out. And we realized very quickly that all the pieces were clicking now. Right. We have what we need at our fingertips now to go from something that was listen and repeat, where the user would see something on screen, hear a reference of the teacher saying the thing, and then they would just repeat the thing. Right. It was very simple. Still a great product, by the way. It still grew to several million in ARR in South Korea. So clearly there is a big market for that.
B
Pre whisper.
C
Yes. Wow. This is from like 2019 through 2022.
B
Yeah. And then that's the grind.
C
Yeah.
B
You needed to hang in there. Yeah.
C
And again, there were many moments when things weren't working, from 2017 through 2019. We were looking in the mirror and we were like, why are we doing this? This is crazy. But I think we were so convinced about the vision, we just couldn't believe that the vision would not come true. So we stuck with it. Fast forward to 2022, and the pieces started coming together. We realized that we could start building something that felt more like a language tutor, that could give you feedback, that could start explaining to you why you did something wrong. And that was Act 2 of Speak: a true English tutor.
A
This is something that a lot of founders struggle with today. It's like I'm kind of building something, hoping that the models get better later.
C
Yeah.
A
How did you feel once the models got better? Did you feel like, okay, I am ahead of the curve because I built all this history of building product and like doing all this work, or did you almost feel like, okay, we spent all this money and time building these models and now we're just going to use Whisper?
C
It was purely positive for us. We still kept using our custom ASR system because it was streaming, real time, really fast, and really well fine-tuned. Whisper wasn't streaming, so it was a different use case. We used it for the more spontaneous stuff. And I think in almost every way we were just really excited, because pretty directly, as the frontier of model intelligence improved, it would just unlock things on our roadmap that were locked before, if that makes sense. And we still really operate in that mode today, where we take a model and then we try to think about, okay, how do we saturate model capability by building product on top of it? And then it happens again. Right. And then we build and saturate the model capability again. I think that's a really cool paradigm to think about. But all the LLM stuff basically allowed us to build a tutor for English, and we still didn't have real-time voice, for example. But the barriers are coming down now. Obviously it's a really hot topic. We're actively building out a real-time voice platform that we can build a lot of more verticalized, specific lesson experiences on top of, which I'm super, super excited about. I don't think they're going to replace our current lessons. They're going to be more immersive, just a different thing, probably for more advanced learners.
B
Still language learning though, not broadening out from language.
C
Yeah. So I think that language learning is interesting because it is so universal. 99% of people, you know, have certainly tried to learn a language and it's so hard.
A
Right.
C
Becoming fluent just has a huge failure rate and it's something people are willing to pay for. So I think that has been just like a pretty amazing beachhead for us. And I think we'll be doing language learning for a long time. There's a huge, huge, huge company to be built here. But our even longer term ambition is really this idea that even beyond language, we think AI will reinvent how people learn anything. Right. It already has for me. Right. I use ChatGPT to learn things every 10 minutes and I think I'm just naturally like a very curious person. So whenever I'm thinking about something, I want to know more about it and then I'll naturally go to ChatGPT and then I'll learn about it. It's unlocked this like entirely new dimension of learning and I'm spending way more time learning as well as an adult, which is really cool. And I want to bring that in a more sort of structured, systematic way to everyone. So I think that that's like the vision beyond language.
B
I'm curious to sort of double click onto just the tech side. We talked a little bit about the content that you own and develop in house and we talked a little bit about the onboarding memory. I assume that you have conversational memory as you go. Right. And any other major pieces of the puzzle that really unlocked it for you.
C
So there are a few things I can talk about. One thing is that in order to go from teaching English to teaching a bunch more languages, we needed to really figure out more direct AI content generation, because the old way is hard to scale. We have our little studio in LA where we shoot a lot of the video lessons, and all of the scripts were written manually before by our content team. But we want a hundred x more content, right, and 10 x more languages. Eventually 100 x more language pairs, which is how we think about it: what's your native language, and then what language are you learning? And really the only way to do that is to make it more AI generated. And very much as an AI-native company, we want to be on the frontier here. We want to keep a small team and to have as much leverage as possible through these types of tools. So that's a big active area where we're building out. People overuse the word agent, but we have a tutor agent, we have a curriculum writing agent, and we have a giant LLM-based pipeline that creates curriculum, scaffolds it in the right way, and writes the lessons themselves. That's a big active area that will basically help us scale to a lot more markets and a lot more languages. So that's one big thing. Another big thing is that we care a lot about fluency, obviously. Specifically, we want to be able to quantify how fluent you are. So if you're learning Spanish, it's like, okay, what does it mean to be fluent? Right.
B
And some real world tests for that.
C
We care about real-world fluency: your ability to go to Mexico City, go to a street taco stand, and actually order. Right. That's very functional fluency in one aspect. You might be really good at that but be completely unable to talk about your family. So the frontier of fluency is very jagged, but we're very pragmatic, and we care a lot about meeting user goals and helping them become fluent at what they care about. And we're thinking a lot about, okay, how do you quantify that? How do you actually store a knowledge graph of everything you know about Spanish, in terms of the vocabulary you know or don't know, the patterns that you know or don't know, and the mistakes you made using Speak over the last month, clustered?
B
You said the magic words, knowledge graphs. Is that live? Is that experimental?
C
There are aspects of it that are live, and it's a very multidimensional system where we think of it as there being many aspects of fluency. Right. There are many subscores, and we have a few of them that are currently live, and we're actively developing other aspects of it. And then all those will fold up into a more holistic fluency score. The idea is that eventually, once we have a complete enough picture, everything will fold up into a number that we call the Speak score. That is a very holistic measure of just, like, how good are you at Spanish? Right. And obviously 54 is kind of meaningless by itself, but it does give you a general sense. Right. Like being at 54 versus being at 5 is very different. Right. And I think everyone can kind of intuitively understand that.
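A minimal sketch of how several subscores could fold up into one holistic number like the Speak score described above. This assumes a simple weighted average; the subscore names, the 0 to 100 scale, and the weights are all hypothetical, and the conversation does not say how Speak actually combines them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubScore:
    name: str      # hypothetical fluency dimension, e.g. "vocabulary"
    value: float   # assumed to be on a 0-100 scale
    weight: float  # assumed relative importance of this dimension


def holistic_score(subscores: list[SubScore]) -> float:
    """Fold subscores into one number using a weighted average."""
    total_weight = sum(s.weight for s in subscores)
    if total_weight <= 0:
        raise ValueError("need at least one subscore with positive weight")
    return sum(s.value * s.weight for s in subscores) / total_weight


# Two equally weighted, made-up subscores.
example = [SubScore("vocabulary", 60.0, 1.0), SubScore("patterns", 48.0, 1.0)]
score = holistic_score(example)
```

A weighted average is only one possible choice; it keeps the score on the same scale as its inputs, which is what makes "54 versus 5" comparable across learners.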
B
It's surprising, like I would have grounded it more in real world. Like, we will get you to pass this exam. That is a standard. That is like the ESL standard or whatever.
C
So the way that we think about that is we don't really teach for the test. I think it's possible that in the future we'll do a test prep product. But in general we care about real world proficiency in various functional situations. So the way that we think about it is: if you're at this level, then these are the things you can do. Right. So it is exactly that.
A
We have that a lot in Italy. I grew up in Italy, so English is my second language. And there's a lot of people that pass a lot of tests and get high grades in all the classes, and then they travel to the US or the UK and it's hard to speak. You know, I feel like the hard part is being in the conversation. When I started, my reading was at a much higher level than my conversation.
C
Yeah.
A
Which like doesn't really help you if you're like traveling.
C
So that's me for Chinese because my, my parents spoke Mandarin to me growing up. So I can understand like a non trivial amount. But I'm very bad at speaking. Yeah.
B
I heard there's a good language learning product.
A
I have one question on the course generation.
C
Yeah.
A
How do you eval that product? Like when you're asking the AI to generate courses, how do you figure out the courses are going to be good?
C
We rely very heavily on our content team, and we are trying to build out an eval suite. It's really hard. The illustrative example here is that as we try to hire and train new content writers on our content team, it's so nuanced. There are many different aspects of training them in the Speak method and how to write the right types of lessons, and articulating why this form of lesson, which is subtly different from this other form of lesson, is better. Right. So we try as hard as we can to articulate that. So I think forming a sense of evals using model-graded evals, that's one piece of it. And I also think in the future a really good curriculum or lesson writer agent will probably be reinforcement fine-tuned on a lot of our internal data as well. That's something we're experimenting with, but it's still pretty early.
A
This seems like a great example of like you know, AI removing jobs which is like, oh, you're creating the courses with AI, you don't have a person. But it's actually like instead of one person creating two courses, like reviewing 50 courses that the AI generates, that's kind of how you're seeing the content team.
C
The way that we see it really not just for, for our content team members, but also I think it's perfectly applicable to engineering is that it's, it's leverage. It just allows you to do a hundred x in the same amount of time. We still need human review of the syllabus, the curriculum, the specific lines, et cetera. But the hope is that this will allow us to launch a hundred x more courses.
B
A lot of language is colloquial. I think the way that you put it on one of our episodes one time was: the Italian that is taught in school is not the Italian that Italians speak.
C
Yeah.
B
How much of that do you adjust for, informal versus formal?
C
Entirely. That's one of our fundamental tenets, which is that we don't teach textbook English or textbook language. Like, do we try very hard to teach Gen Z slang? We don't go quite that far ("slay, kings"), but we try to teach very casual conversational language that is actually what real people use. And like you said, that's usually very, very different. If you pick up a typical English textbook in Korea, it's all really traditional and weird formulations, and it's not how people actually speak.
A
Yeah, I know you're going to release Italian soon, so I can give you a hand on that. I know in the US there's not that many dialects. There are accents, but most of the language, like the words that people use, is similar. Whereas I know, for example, the Spanish spoken in Argentina is very different from the Spanish spoken in Mexico. How do you adjust for that? Or maybe you don't, but.
C
So I would say that, for example, currently we teach American English, Standard American English. We don't really teach other accents or other dialects for now. Given how small we are, we just have to be pragmatic and teach in the direction that most people want and most of our users know. So we've made those decisions on the content team side for American Spanish, for every language that we're teaching. But I do expect that in the future we're going to get a lot more sharply differentiated. Like, if you want to learn British English, then we'll teach you British English, we'll teach you how to pronounce it, et cetera. I think all of that feels like something that a superhuman language tutor should be able to do.
B
I just think it'd be very funny if all the Koreans had, like, a very distinct Southern accent. It'd be great and make that happen. Yeah, I do think about this because obviously there's a moving of the goalpost now that we have this. Now we want the next thing. And obviously people who are English as a second language always have an accent. A lot of people think, I don't have an accent, but if you know any Singaporeans, you know, I'm Singaporean. How much accent training is important, right? I think actually that does help a lot for people. And you cannot tokenize accents yet.
C
Yes, that's right. So I have two main thoughts on this. I think the first one is that communication, your ability to speak spontaneously and get a concept or an idea across, is almost fully orthogonal to pronunciation. You can be really bad at pronunciation but still communicate effectively. So a lot of the current core product experience is about: just speak as much as possible, make mistakes, don't worry about screwing something up on the accent or the pronunciation side. The important thing is that you literally move your mouth and you make the sounds. Right? And it turns out there's a really key psychological barrier there, where people are just not willing to do this in front of a human, even if it's a human teacher that you're paying. Right. So a lot of the core message of our marketing campaigns in many of our biggest markets is along the lines of: you can make mistakes in this private space with Speak. And I think psychologically that's extremely powerful. And then you can go and get it right more confidently in the real world after you practice with Speak. Now, having said that, people do care about their pronunciation and their accent, right. So we have, for English only right now, a pronunciation coach that is basically a fine-tuned version of wav2vec 2.0, which is a Meta model; we fine-tune it on a bunch of our own phonetic transcripts as fine-tuning data. It works pretty well. It's currently for single words; we're going to expand it to full sentences, to more languages, et cetera. But I think that if you look at the pure market opportunity, our sense is that we really want to push people to just speak very freely as much as possible. You know, just get that volume up. Yeah, yeah.
B
In terms of immersing language learning in the real world, one of the more interesting approaches that people keep trying is to have let's say like a Chrome extension or something on top of a page. I think Toucan was doing this.
C
There's a bunch of those.
B
Yeah, yeah. And then there was another one I saw recently which is like watch a YouTube video and it'll transcribe for you, but randomly mask out.
C
I saw that too. Yeah, yeah, yeah.
B
That was, like, a Show Hacker News.
C
Yeah.
B
Do those work?
C
There's kind of the question of is, you know, is that the right product? Right? Yeah, I don't think so.
B
Basically the difference is your content or real world content.
C
Right.
B
Obviously you want real world content.
C
I think that for work, right, so for Speak for Business, for the B2B product, another part of the vision is really: what should a superhuman language tutor be able to do? It should probably be able to handle kids as well as a Samsung employee that wants to transfer to the US office and wants to use Speak for work. Right. So our view there is that it's the same product; it's a different distribution mechanism.
A
Right.
C
Consumer versus B2B. And I think that we will eventually build something like a Mac app. Maybe it'll be integrated with the browser in some way. We're not really sure yet. But obviously, in order to apply it to your day to day, there needs to be some way to hook into your actual work documents, whatever. That's a whole can of worms. We are actively thinking about it. But my sense is that it's not clear to me that any of these products have really taken off. And I think that there are many other approaches that are possible. I don't have the answer. But another example, a very hypothetical future world, is maybe OpenAI, you know, the new Jony Ive thing, will come out with some hardware that will be listening to you all day. And then we can give you some sort of very deep analysis that is integrated with the Speak app at the end of the day or, you know, the end of the week, whatever. I don't know.
B
Okay, one more time since you brought that up. I'm sure you don't. I should. They haven't told you anything, but. What?
C
I don't know anything.
B
What's it going to be?
C
I don't know anything.
B
It's like a very, like. It's the number one topic in all the parties I go to now.
C
Really? Yeah. What's the most compelling idea you've heard?
B
Okay, so there's people that say Jony hates wearables.
C
Yeah, I've heard that too.
B
And I'm like, if it's not a wearable, then you've just made a second phone. In that case. Just make a phone.
C
Yeah.
A
I thought they said it was. I mean, didn't Sam say that he wanted to do a phone in the past?
C
That was in the far, that was like in the far past.
B
He says a lot of things.
C
He does say a lot of things.
B
Yes. Okay. Anyway, I think wearable makes sense. I think the race is to capture context.
C
I mean, I have a wearable one.
B
Yeah, we have a wearable here, too. Transcribe everything.
C
That transcribes everything. Yeah, that's cool. Yeah.
B
It's a previous episode of ours with. I can hook you up if you want. But yeah, I think it's something that a lot of people are interested in, obviously, because it's a huge bet by them and. Yeah, curious. Okay. You mentioned video. I just wanted to double click on that a little bit. I'm sure engagement is very high for video because people love to watch video. I thought that Speak would be one of those places where you just kind of leave it in your pocket. You take a walk, learn to speak. Probably that's not true.
C
What we've done so far is that part of the course experience is a teacher video. We've tested other, more audio-forward types as well. We found that, of course, like you said, video is very engaging, but at the same time, we have a lot of users that do want to be able to walk around with the phone locked in their pocket. So doing something that is more like voice mode with optional, you know, visuals I think is really good. I think there's a huge opportunity for a better way to learn things like listening comprehension. So I took German in grad school for two years and I thought I was getting somewhere, but anytime I listen to a native German speaker, it's so fast. It's completely on a different level. And I think you can imagine a plethora of really cool experiences that feel kind of like you're listening to a podcast, but it's all AI generated. It's fully controllable, it's integrated with the app. You know, there's something there for sure. Yeah.
A
Don't want to do AI podcast, man.
C
We're cooked.
B
It's okay. We'll document the own ending. I mean, I think when that happens, we just end the show.
C
Like, why not? To zoom out a little bit: in the pretty near future, multimodal models will cross the threshold where they will be able to generate images a lot faster than they currently do. Maybe somewhat close to real time, even. Right. And audio at the same time, text at the same time. And you can imagine a very powerful multimodal tutor that can kind of do it all at once, where there's an audio track, and then, if the teacher is teaching you something, with the right timing it chooses: okay, at this point I'm about to introduce a new concept, so I'm going to show the word on screen so that the user can see how it's spelled. Right. There's a lot there. You can do generative UI. There's a lot of nuance there, where it's easy to do it badly, but to do it well requires a fair amount of reasoning and mental modeling of what the user knows. Yeah. Which feeds into what you need to show at what time. So that's probably gonna have to be a pretty parallel set of systems.
A
Have you spent any time looking at this, you know, like Veo 3, where you do video plus audio at the same time, and at how you can tweak the audio part versus the video part? Because I can imagine you might work on a video part and then you want to change the audio generation model. I don't actually know how the model works inside, like how much you can tweak.
C
We haven't really looked at the video stuff much. We basically think that we're very bandwidth constrained. Right. So we're just scaling and trying to hire as fast as possible like everyone else is. And as a result, we're really focusing on just like the most in reach, highest impact things. I do think that the barriers are coming down very fast for all of this sort of stuff. I'm just so excited about multimodality and where things are going here. Because imagine if you're learning Spanish, being able to look at an image that the model generates for you and then doing Q and A on it. Right. Like a beach scene. And then the model will ask you, like, how many people are running on the beach? And then you have to sort of respond in the target language that you're learning. Very traditional language learning exercise, but you can imagine it being fully generative, which is really cool.
B
Awesome.
C
Lots of stuff like that.
B
The engineer in me worries about inference costs, but I think you can just kind of sweep that under the rug, see if it works first and then you can worry about costs.
C
Yes.
B
You mentioned the real-time voice platform. I just want to give you the platform to talk more about that. You mentioned, for example, that you're a very heavy user of the Realtime API from OpenAI and you built a bunch of tooling around it.
C
Yeah. So last year we had early access to the Realtime API, and there's a very obvious use case for language learning. I think one common theme that has just been pretty awesome since LLMs came out is that language learning as an application is a really good fit for LLMs, for all these model types, in almost every way, which has been just really great for Speak. Specifically for Realtime, I think the audio piece promises to really infuse almost every surface in the app you can imagine. This is the primary way that you talk to your tutor. Right. And an additional complication is that it needs to be multilingual and there needs to be code switching. So that's a pretty frontier problem right now.
B
Right.
C
So, like, if I'm learning Spanish, I should be able to speak both English and Spanish, and vice versa from the model. That's a pretty hard TTS problem today. Only a few models are actually able to speak two languages in the same sentence and then pronounce them properly.
B
Sorry, you can't have a router model, like a tiny little router model. Guess which language first and then route.
C
Well, the problem is that there's. You could have a subword in, in a single sentence in a different language. Yeah. So you can't just concatenate either. Yeah. Because it won't sound right. Yeah.
B
Right.
C
It won't sound natural. That's not how humans do it. So this needs to be a very native, controllable audio, you know, function. Yeah. But we are in the process of building a variety of experiences on top of the Realtime API. I want to clarify that actually nothing is in production yet, mostly for price reasons. Frankly, the pricing model of the Realtime API makes more sense for something like a customer support agent, where you're very directly replacing somebody that you would pay hourly otherwise. And that's how you're seeing the price model for a lot of these initial agents work out. For us, we want our users to be able to do these real-time role plays and have these conversations for many hours a day, right, if they want. Getting cost under control is definitely a pretty key consideration right now, but we are pretty close. Maybe even by the time that this episode is released, we'll have something live. But we have what I think is a really cool application of the Realtime API, which is basically a new instructional lesson where it's the model actually teaching you something, like a new language concept. And it's intended to augment, slash play the same role as, our current video lessons, which are the instructional lesson type. And it's interactive. Obviously, at certain points in the three to five minute lesson you're interacting with the Realtime API. It's semi on guardrails. There was a lot of scaffolding we needed to build to basically, number one, switch between the interactive and non-interactive portions of this lesson properly, if that makes sense. Right, there are some portions where you're just listening or looking, and then some portions where you're actively in a short conversation, and we kind of swap back and forth, and we have a bunch of custom architecture and infra around that. And then there's also making the cost make sense, or at least semi make sense. And then there's a bunch of WebRTC infrastructure. We're at, you know, not huge but non-trivial scale either. So it'll definitely cost us millions of dollars if we do something wrong. Yeah, yeah.
B
Do you do inference in Korea because of the, you know, latency and all that?
C
It's something that we have been increasingly paying attention to for all the real time paths. Like I would say two or three years ago when real time stuff was still quite nascent, users didn't really care as much. But I think now the standards have risen, right? Like latency has to be low, everyone cares.
B
Do you have a hard latency budget for responses or do you just kind.
C
Of work it out?
B
So for example, you have a knowledge graph that you're accessing, you have content that you're retrieving. There's a lot of stuff there and then maybe you're using a reasoning model, probably not. But that all eats into the budget.
C
I will say that from the real-time engineering side, everyone talks about, okay, the time from submitting the user request to getting the agent's audio response, the first audio bytes. What's that latency? And then we try to get that as low as possible. I would argue that's actually a vanity metric, because what you don't take into account is how the VAD works: how do you do turn detection to detect when the user is finished speaking? Right. Because that can easily add another second if you do it badly. And nobody talks about that for some reason. Right. What you need to measure is actually the time from when the user stops talking to when the model's first audio comes. And usually that number is much larger. That is a very domain-specific problem. You can use the semantic VAD on the Realtime API for regular English conversation, and that will basically classify at every token how likely it is that you're done speaking as a normal conversational English speaker. In this conversation, that's fine. But it doesn't work at all for language learners. Right. If I am trying to respond in a language that I'm learning, I'm going to be hesitating halfway through for 10 seconds or more. Right. So it probably needs to be fully custom. This is something that we're also actively working on, but that is actually the dominating factor in perceived latency.
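A minimal sketch of the distinction drawn above between the commonly quoted latency (request submitted to first audio bytes) and the latency the user actually perceives (user stops talking to first audio bytes). The field names and timestamps are hypothetical; this only illustrates the arithmetic and is not tied to any particular real-time API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnTimestamps:
    """Timestamps for one conversational turn, in seconds on a shared clock."""
    user_stopped_speaking: float  # when the user actually finished talking
    turn_end_detected: float      # when VAD / turn detection fired and the request went out
    first_audio_byte: float       # when the first agent audio bytes arrived


def request_latency(t: TurnTimestamps) -> float:
    """The commonly quoted number: request submitted to first audio bytes."""
    return t.first_audio_byte - t.turn_end_detected


def turn_detection_delay(t: TurnTimestamps) -> float:
    """Extra wait added by deciding that the user has finished speaking."""
    return t.turn_end_detected - t.user_stopped_speaking


def perceived_latency(t: TurnTimestamps) -> float:
    """What the user experiences: stop talking to first audio bytes."""
    return t.first_audio_byte - t.user_stopped_speaking


# Made-up turn: a one-second turn-detection delay dominates a 0.5 s model response.
turn = TurnTimestamps(user_stopped_speaking=10.0, turn_end_detected=11.0, first_audio_byte=11.5)
```

Logging all three timestamps per turn makes it possible to see whether the model or the turn detection accounts for most of the wait.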
B
On coding: do you use Cursor, Windsurf, and other autonomous agents?
C
It's kind of all of the above, yeah. So I think, like, as the cto, I view it as part of my responsibility to really set expectations, push everyone on the team, show them what's possible. We've been trying everything. Yep. And I think we tried to basically set the expectation that the frontier is moving so fast it's deeply non intuitive. If you've tried coding tools six months ago and they weren't that great. Especially if it's not Typescript or Python. Right.
B
It's like mode collapse onto the most popular languages. Like, that's all it is.
C
We try to set a culture in the engineering team where using these tools as much as possible, as the default path, is the expectation. And in hiring, we are now explicitly asking about this a lot, thinking about what types of people are going to be higher agency at trying these types of tools.
A
It's so important. Before we zoom out, anything we missed about Speak that you really want to highlight, or something that people underrate about it?
C
One thing that I've always been really excited about is that I feel like a lot of the foundational pieces that we're building, around the knowledge graph for example, a lot of these concepts should be applicable to not just learning language but also other things in the future. We're already starting to see the very beginnings of this on the B2B side, where a lot of it is more like management skills and hospitality skills, communication skills, more like true L&D for enterprise, less like core pure English proficiency. So that's obviously the immediate neighborhood, but you can imagine many academic subjects, math, biology, et cetera, you know, schoolwork. Super excited about that.
B
If I knew my employer was giving me a language tool, but then he was evaluating me on my management skills while learning language, I might use it less. Just, you know, fair. You want to separate that out?
C
Yeah, very fair.
B
I agree overall that the knowledge graph problem is very important. We have a whole track on it for the conference, and I think that the amount of data can be so high. And actually you want to generate relevant triplets. I assume you use the normal subject-predicate-object type.
C
It's a bit more custom than that because it's a bit more domain specific around the way that we conceptualize the vocabulary, you know, and the sentence patterns and so on. So it's more specifically around like language learning concepts, if you will.
B
But what I think we can extract from Speak, generalized as a framework, is what I've been calling sort of the Bloom's 2 sigma problem type thing, like the level-adjusting tutor: where are you at? Let me adjust my thing to where you're at, and then I'll push you up to the next level. And I think the knowledge graph is a part of it, but I don't know if that's all of it. I've never seen a working example.
C
We are approaching that problem from a few different angles. I think part of it is the knowledge graph. Part of it is being very careful in how we structure the curriculum so that you're placed at the right level, and so is the learning path itself, which has a foundational backbone, because beginner to intermediate English learners actually all need to know a bunch of similar concepts. It isn't really until you get to intermediate and more advanced levels that it starts to more sharply diverge. From A0 through B1, I would say there's a pretty well defined, sort of linear path. Actually, a lot of the deep thinking that we've done around how we structure the pedagogy is also super useful in terms of just matching people to the right level. And then you can take this backbone and basically modify it based on the knowledge graph, on your system's knowledge of what the user is bad at versus good at.
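A minimal sketch of the idea above: take a fixed, linear backbone of lessons and modify it using what the system knows about the learner. Everything here is a hypothetical illustration, including the lesson names, the concept-to-mastery dictionary standing in for a knowledge graph, and the 0.8 threshold; the conversation does not describe Speak's actual data model.

```python
# A backbone is an ordered list of (lesson name, concepts the lesson teaches).
Backbone = list[tuple[str, list[str]]]


def personalize_path(
    backbone: Backbone,
    mastery: dict[str, float],
    mastered_at: float = 0.8,
) -> list[str]:
    """Keep backbone order, but skip lessons whose concepts are all mastered.

    `mastery` maps a concept to an estimated mastery in [0, 1];
    concepts the system has never observed default to 0.0.
    """
    path = []
    for lesson, concepts in backbone:
        if any(mastery.get(concept, 0.0) < mastered_at for concept in concepts):
            path.append(lesson)
    return path


# Made-up Spanish backbone and learner state.
backbone: Backbone = [
    ("greetings", ["hola"]),
    ("ordering food", ["quisiera", "la cuenta"]),
    ("family", ["hermano"]),
]
mastery = {"hola": 0.95, "quisiera": 0.9, "la cuenta": 0.4}
path = personalize_path(backbone, mastery)
```

Here the learner skips "greetings" but still gets "ordering food" (one weak concept) and "family" (never observed), which mirrors the jagged fluency frontier discussed earlier.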
B
I think a lot of startups or especially ed tech like that is the core engine. Like, you know, once you do that, you can kind of teach anything.
C
Totally. Yeah.
A
We have a few more broader fun questions. Yeah. So speak.com. great domain. I looked it up. Voice.com got bought for 30 million.
B
Oh my.
C
When?
A
2019.
B
Okay.
A
So I don't know if you want to share how much you paid for it, but it was a lot less. I figured it would be a lot less, but I'm curious. My estimate was 100k.
C
It was more than that.
B
More than that.
C
Wow. Okay. I'm not gonna say any more about the numbers.
A
So what's the. Yeah, what was the story? Was it easy? Did you use a broker? Like, we had, you know, Dharmesh Shah from HubSpot, who sold chat.com to OpenAI, and he has a lot of very.
C
That was like 100 million deal or something, right?
A
That was. That was very big.
C
Oh, wait, no, that was AI.com? chat.com. Okay. Yeah, we bought it several years ago. It felt very expensive for us at the time. It was a little bit of a crazy move. But I think we were very convinced that we needed a super strong consumer brand that was scalable globally, and that was just always our ambition. Like, we want to be the way the next billion people learn languages, and we need speak.com. So we don't regret it.
B
It's such a, such a great word.
C
Makes for great swag.
B
Very nice decision. You had a couple other fun questions. Any fun Korean celebrity stories, because you work with so many influencers?
C
We have a bunch baking right now. But I think there's something more general that has just been so fun on the journey. So we would visit Seoul every year. Yeah. And seeing Speak go from nothing, to the first time we saw somebody on the street using Speak, to now, when our main teacher in the app is like a mini celebrity. People come up to her on the street as she's just walking around Seoul and recognize her from the app, which is really cool. Now we do a lot of advertising. We do billboards, TV commercials, we work with big influencers and so on. So just seeing the scale of that has me kind of in awe. It's really cool just to see something that used to be nothing.
B
I wanted you to name drop like blackpink or I don't know.
C
Look, there's some stuff baking right now.
B
Okay. All right.
A
We talked about the Thiel Fellowship. On your LinkedIn you kind of have this hole between 2012 and 2016, which you talked about. You did some startups. Any of them that you want to share? Like ideas that you worked on that you thought were cool?
B
Maybe it was just early, but yeah.
A
What people should revisit, Try again.
C
I've always been interested in learning and education. One of the other failed startups that I did in that time, and it feels silly to even talk about this because it amounted to nothing, was called Bloom. You know, Bloom's 2 sigma problem. It was actually named after that. And we were trying to basically build a better adult learning platform and have really cool interactive JavaScript widgets for various concepts that you could learn. Didn't find PMF. I was young and didn't really know anything about business at the time either. But I think that the common thread through everything that I've been interested in since leaving grad school has been: how do we build software, build tools, that help people learn things more effectively and better and faster? And now I feel very lucky to be in this position, because obviously AI is the ultimate version of that. Right. And it's been completely transformative for me personally, because I just get a lot of inherent fun and pleasure out of being able to think of a concept and then, oh, now I can talk to this omniscient LLM that can tell me more about it. And I'm really good at asking the right follow-up questions that I want to know. So that's been completely transformative for me.
B
Do you get a lot of like people using Speak for therapy? Like, you know, because it's not meant to be that, but since you have inference they will use it.
C
In 2023, when we first launched our AI roleplays using GPT-4, people were way more concerned about safety. Right. And obviously the models now are much better at refusals, and the line is sharper between what's appropriate and not. But we did see a lot of our first users start to put in pretty questionable custom scenarios, as you can probably guess. Yeah, and you know, this was something we expected, but I think seeing the logs in person is very different.
B
Got it.
C
Some shocking stuff in there.
B
Last couple questions. One on Andrej, you talked to him in your Machine learning journey.
C
That was a long time ago. Yeah.
B
He's also working on edtech now. I don't know if you've ever had conversations with him.
C
No, I haven't.
B
He's also interested in language learning, by the way.
C
You know, one thing that I think we didn't really realize early on or like fully internalize at least was just like how deep the market is.
B
Say more.
C
It was so universal where we really struggled to do some of the basic startup stuff around. Define your like ideal customer profile and like, you know, segment your users. Because our users were everyone. Like we, we had parents using it with their kids, we had really old people using it, we had people using it for work. So that was kind of like mind boggling.
B
You still did customer segmentation or are you saying it doesn't matter?
C
I'm saying it was hard to do. Like we tried and we have a sweet spot in Korea. It's like 25 to 45, more professional, more white collar. But it's very, it's like, like very long tail on either side. Yeah, I think it's, you know, it's a huge market and I think it's a very special moment in time right now where it's obvious that a lot of the tech is here. I think it's really good for humanity if we make a lot of progress here. So I'm really excited for his company too.
A
We started asking about the Thiel Fellowship. So maybe we can wrap with one of Thiel's favorite questions, which is, what's something you believe in today that most people will not agree with you on?
C
I think that people, if you recall, expected the world to kind of explode when GPT-4 came out and, you know, everything would change. And I think if you go to another state outside of the Bay Area, probably even in California outside of the Bay Area, and you ask somebody how much their life has materially changed, it's pretty close to zero. Real world inertia is enormous. Obviously, AI is probably the most transformative technology we've ever built, but I think in a very real sense the world hasn't changed that much either. And that's a really weird thing. Right? So I think we need more builders, we need more people building applications. It's weird to me that there are actually not that many net-new consumer AI-native applications at scale like Speak. There should be way more. I would love for there to be way more. Consumer is hard.
B
Yeah, I'm intimidated. But like, you know, it was just.
C
Like there was never any alternative for us.
B
Yeah, like I said, you didn't have a choice. But also, you're very smart. But also, maybe you have some growth hack things that you can advise people on, that people could learn. But yeah, I agree. I think the general take actually is this is what we want, which is slow takeoff, short timeline.
C
That's fair, right?
B
This is the two by two that everyone always talks about in AI safety. You've seen slow takeoff and maybe don't complain. We have a heads up. Or Dario's right, and half of us lose our jobs in the next two years.
C
Yeah. It's so hard to predict. Yeah. Sometimes I get AI anxiety and then I just.
B
You get anxiety?
C
Yeah. Okay. And I just focus on our users.
B
It's a perfect place to wrap. Thank you so much for taking the time.
C
Yeah, thank you both so much. This is great.
Latent Space: The AI Engineer Podcast
Date: July 11, 2025
Host(s): Alessio (A), swyx (B)
Guest: Andrew Hsu (C), CTO & Co-founder of Speak
This episode explores how foundation AI models are revolutionizing language learning, featuring Andrew Hsu of Speak, one of the leading AI-powered language education companies. In a lively and candid conversation, Andrew shares the story of building Speak, its technical and product evolution, and broader reflections on the future of human learning with AI.
For detailed notes and more episodes: Latent Space podcast