
A
The idea of increasing human productivity, letting humans do more, increasing supply, driving costs of things down, letting humans do more and accomplish more. That's what inspires me much more than building God or saving the world from AI. I want to save the world with AI.
B
Hi, I'm Matt Turck from FirstMark. Welcome back to the MAD Podcast. Today my guest is Aidan Gomez, the CEO of Cohere. In 2017, Aidan was a third-year undergrad student who cold emailed Google Brain, landed an internship, and ended up a co-author of "Attention Is All You Need," the seminal Transformers paper that helped usher in the entire generative AI revolution. Fast forward to 2025, and Aidan is now steering Cohere, a 500-person enterprise AI company that has raised over $1 billion in venture capital and powers Oracle, Notion, and many enterprise customers with a full-stack multilingual platform for AI agents. In this episode we unpack the inside story of the Transformers paper, including the mis-hire that put Aidan in the room.
A
And I had to be like oh no, I need to finish third year undergrad. We don't hire undergrad students. I got in through an administrative mistake.
B
The state of AI research, including agents, reasoning, compute, synthetic data, and why Transformers is still the dominant architecture eight years after the paper was originally published.
A
So one of the big shocks is how little things have changed over the past eight years. It's really surprising to me that the Transformers we train today look so similar to what we had back then. We've just scratched the surface.
B
And why Cohere bet the farm on enterprise AI in the first place instead of joining the AGI ego fest.
A
I never liked the vibes of the whole AGI. It felt like cosplay. It felt like people were larping a new religion. Frankly, Enterprise maybe has like a rap of being boring, but I think it's way more important.
B
This is a fantastic episode packed with fun stories and insights to understand the past, present, and future of AI. Please enjoy this great conversation with Aidan Gomez. Aidan, welcome.
A
Thank you, thank you. Thanks for having me.
B
So to get started, I'd love for you to tell the story of the Transformers paper. You're famously one of the eight co-authors of it. And of course Transformers and "Attention Is All You Need" is the seminal paper that led to everything we're experiencing today in generative AI. What was your journey to it? How did you become part of the eight?
A
I was a student at the University of Toronto, and I had been working on deep learning because a lot of the early work in the field was obviously done by Geoff Hinton and others at the school. So I'd been getting quite close to it, and I was reading up on papers, and I just kept seeing Google Brain repeatedly across these papers, again and again and again. Researchers from Brain. And so eventually I ended up reaching out to some of the researchers there and saying, hey, listen, I read your paper, I have this idea of how to extend it. And one thing led to another, and I got an offer to join them.
B
You just cold emailed them? You found their email address and just cold emailed them? That's awesome.
A
Yeah, well, on papers you have the email address under the author name. I could just reach out to them and say, hey, super cool experiments. Did you try this? What if we did this next? And they replied and said, hey, why don't you come down to Mountain View and do some work with us? So I got hired as an intern on Lukasz Kaiser's team, and I was sat next to Noam Shazeer. That was sort of the start of it. And the end of it, if I skip all the way to the end: I was leaving Google after the internship had finished up, and they were throwing a little goodbye-Aidan party, some sweets and stuff. And my manager, Lukasz, is like, okay, everyone, Aidan's going back to his PhD. Aidan, how many more years have you got left? And I had to be like, oh no, I need to finish third year undergrad. And the reaction was like, what? We don't hire undergrad students. I think I got in through an administrative mistake, because my manager thought I was a PhD student. So that's sort of how I got there. The process of the Transformer, I mean, it was incredible. The velocity that that came together with, I haven't seen anything like it since. So I showed up, and the project I was supposed to work on is a paper that is out, released at the same time, called "One Model To Learn Them All." It was an omni-model: you could feed in text, audio, images, everything, and it could output the same. So super multimodal. Again, this was eight years ago, and so primitive compared to what we have today, but certainly prescient about where things were going. But as we were working on that, Lukasz and I built this framework called Tensor2Tensor. It was used for doing big training jobs, distributing over a bunch of different GPUs, making things super efficient. And because we were sitting next to Noam, we convinced Noam to use it.
So Noam joined Tensor2Tensor, and then Noam was in conversation with folks over at Google Translate, which was the second group of people working on these projects. And we realized folks were kind of working on the same thing. We were all looking at text-based autoregressive models that heavily leveraged attention, or were much more purely attention, as opposed to the previous RNN and LSTM models, which were quite complicated and, in some ways, ugly. So we wanted to strip all that back and just create the most simple, efficient attention-based language model. We decided to team up and join forces, and that happened about a month into my internship.
B
So that was purely organic: Noam happened to be around, and then you had some conversations with other folks. Was that how Google Brain operated? Meaning, did Google Brain allow people to just organically form groups and teams?
A
Yeah, totally. That was it. It was a group of people who were researchers with full academic freedom to do whatever interested them. And you would sort of congeal around projects or ideas. And so that's what happened. We were just chatting to folks, saw a good idea and then teamed up to take it on.
B
It seems that it's a defining characteristic of successful research organizations. Recently we were chatting with Douwe Kiela, now of Contextual, and he was talking about FAIR at the time, and it sounded like it was a little bit like that as well. Do you think that was a moment in time when all those labs were allowed to operate that way and given free rein to explore anything? Is that still true today, or has it changed?
A
I don't know. I've been out of Google for long enough that I'm not sure how the culture has shifted. I would say that the economic relevance of this work is very different from when I was interning eight years ago. So I imagine things would have to shift out of necessity, especially because of the product implications and the amount of resources being thrown at these projects and these models. It's much more consolidated, I would imagine and expect. Certainly the way we run things at Cohere, it's much more like a product organization. You have very clear work streams; there's scope to experiment and try to find new alpha, but towards the ends of the product. So it's focused and more narrow. But back then it was very greenfield, and you could work on whatever you were excited about. And yeah, I do think that is a crucial component of successful research organizations. And it worked: it produced incredible technology.
B
So going back to the story, the eight of you got together, and then how long was the process of writing the paper?
A
Super, super fast. Probably about a month in, we decided to consolidate and all work on the Transformer together, and it just became a mad dash towards the NeurIPS conference deadline. NeurIPS is the biggest AI conference for academics, where you submit your papers. So we were all-out sprinting, and honestly, it was a lot of throwing shit at the wall and seeing what sticks. So many different things were tried, so many little bugs hackily patched. One example: with a pure attention architecture, the model can't tell the difference between the positions of elements the way an LSTM could, because an LSTM consumes them one by one. And Noam just came up with this idea, and I remember the day I was sitting next to him and he was talking to me about it, of throwing these sinusoids into the embeddings and having that represent the position. And it stuck. I think we've moved on a little bit, but shockingly we're still quite close to that strategy today. So that's how stuff worked. It was just fixing bugs one by one as quickly as possible, and then whatever we were left with at the last moment, that's what we submitted to NeurIPS. And one of the big shocks is how little things have changed over the past eight years. It's really surprising to me that the Transformers we train today look so similar to what we had back then.
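[Editor's note: for readers curious what those sinusoids look like, the encoding from the paper can be sketched in a few lines of Python. This is a minimal illustration of the published formula, not anyone's production code.]

```python
import math

def sinusoidal_positions(seq_len, d_model):
    """Build a seq_len x d_model table of sinusoidal position encodings
    as in "Attention Is All You Need": even dimensions get sines, odd
    dimensions get cosines, at geometrically spaced wavelengths."""
    table = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # Wavelength grows with dimension index i.
            angle = pos / (10000 ** (i / d_model))
            table[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                table[pos][i + 1] = math.cos(angle)
    return table
```

Each position gets a unique pattern, and these values are simply added to the token embeddings before the first attention layer, which is how the otherwise order-blind attention mechanism learns where each token sits.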
B
And when you guys submitted the paper and got accepted into NeurIPS, what was the general reception? Was it clear to folks that this was going to be a big deal, or not?
A
Yeah, I think folks noticed it and were pretty excited about it. But it was still the folks who were in NLP or translation, a subset of a subset of the AI community. So it was fairly limited in terms of reaction, but the folks who knew were quite excited about it.
B
And then the much debated question of why did Google not immediately jump on this? And eventually, famously, OpenAI is the company that's leveraged the transformer architecture faster. What's your sort of insider perspective on that?
A
No, they jumped all over it. It went into production inside of Search, inside of Translate, the existing product suite. To say that they didn't adopt the Transformer architecture would not be correct. To say they didn't lean hard enough into language modeling, just pure sequence modeling of text on the Internet, that I think is the accurate statement. That's what OpenAI did early and uniquely well. But certainly it was everywhere across Google quite quickly. BERT, and the Search folks, figured out how to make use of the Transformer super, super fast.
B
And to the point you were making a second ago, why do you think the Transformer has had so much staying power? Is it because it's the gift that keeps on giving, and the more data and compute you feed it the better it performs, and we haven't reached the moment when things get less exciting? Or is there something in research where, for whatever reason, people are not as productive in terms of new ideas?
A
I don't think people are unproductive in terms of new ideas, but I do think it's a reinforcing loop, a self-fulfilling prophecy: the community got super excited about the Transformer, and they built so much infrastructure specialized to the Transformer. It's like we've dug ourselves into this well. We now have chips that are being optimized explicitly for that architecture. And so moving architectures requires so much effort, energy, and lift to rewrite everything and start from scratch that a new architecture needs to present something extraordinarily compelling, a very good reason to move. And we just haven't found that architecture yet.
B
Yeah, so the bar is super high. And I don't know if this is an unfair question, because you're now a CEO, so you're presumably focused on building a fantastic company pretty much all day, every day, and I don't know how much time you spend looking at research papers. But is there anything you find exciting? Yann LeCun has proposed alternative architectures to LLMs, there was discussion about state space models, all that stuff. Is there something emerging on your radar as a post-Transformer architecture, possibly even one that doesn't work today but sounds promising?
A
Yeah. For a long time I believed in, and kind of hoped for, a replacement to the Transformer. And I think most of the Transformer paper authors feel the same; we're researchers, we want to see our stuff surpassed. It would be terrible if the best we could do is this paper from eight years ago. That's just bad for humanity, right? We want to see progress. So I've been hoping for that, so much so that when we opened the New York office for Cohere, I named one of our meeting rooms SSM, because I was like, this is it, it's going to get replaced. And it hasn't. It turns out the Transformer is a great artist: it copies all the good ideas that it sees out there, and so it just hasn't been replaced. When SSMs came out, the good ideas from that got ported over to the Transformer, and we kept going with Transformers. Now there are these discrete diffusion models, which take diffusion, which has been super popular for image understanding and image generation, and apply that same process to language models. But I still don't see that replacing the Transformer. So I'm waiting like everyone else is. But I am hopeful that we'll get something.
B
And the whole reasoning, test-time compute paradigm, which at least for the non-AI-researchers among us seems to have come out of nowhere in the last five or six months in particular: from your perspective, was that a somewhat obvious idea, where it was just a matter of time until it was implemented, good but not completely groundbreaking? Or is it a major development?
A
Well, it's been worked on for a very long time, so we've known it was coming for years, for the past three years. And it is kind of obvious, because if you think of the pre-reasoning world, the input space to a language model is everything. It's all of language. You can ask it very simple questions, like one plus one, or extremely complex ones, like go cure cancer. Those are two strings, two requests, that you can ask it, and you really don't expect it to spend the same amount of energy and time on those two different problems. One it should respond to immediately; the other should probably take years of thinking and trying to accomplish. But we didn't have that reality before reasoning. We had an input and then an immediate response, and both of those got the same energy and effort. So it had to come at some point, this notion of different amounts of energy or time being spent on different problems: test-time compute. I think the effectiveness of it was surprising. It was really quite incredible to see how much gets unlocked. These models actually can, with very little supervision, very little data from humans saying this is how you think through problems, sort of figure it out for themselves. So that's been incredible to watch. The other thing that's been incredible is that it's actually really easy to do. It's easy to create a reasoning model. It's dramatically cheaper than pre-training, and so it's accessible. So there's this huge intelligence uplift that comes for really quite little effort.
B
Is there more juice to squeeze there? Like, should we expect more progress specifically under that paradigm?
A
Totally, totally. We've just scratched the surface at the moment. It's mostly focused on math problems and that sort of thing. There's a whole world of applications where we need to make it work: medicine, the pure sciences, physics, chemistry. All the interesting problems require reasoning, and so there's a lot of white space for us to go after.
B
And maybe to unpack this for people: the reason for that is that each time you work on test-time compute, there's an element of bringing a domain into the effort. So people have done that for math, but they haven't done it for other areas. Is that the right way to think about it?
A
Yeah. They've spent a lot of time teaching the model to reason in the domain of mathematics, in the domain of computer science and coding. But that effort hasn't been applied to biology yet, or some of the other pure sciences. And it hasn't been applied to automation and the enterprise world: using the tools that enterprises use to create value in our world, to get things done.
B
So after the Transformers paper, what was your journey to starting Cohere?
A
So I bounced around Google for a while. I went back to Toronto and started working at Google for Geoff Hinton. There I met my co-founders Nick and Ivan. Then I started my PhD in England, and I was still working at Google. I was flying back and forth between London and Berlin, because London was sort of exclusively DeepMind, which was another arm of Google that did AI research, and so Brain didn't really exist there. But one of the Transformer co-authors, Jakob, had opened a Brain office in Berlin, and I would bounce back and forth to go see him. And we were doing work with Jeff Dean on scaling up infrastructure, training on networks of TPU pods. So instead of having one supercomputer, you chain together a bunch of supercomputers and you can train something dramatically larger. And that's where we started to see the first instances of scaling up large language models.
B
And what year was that?
A
That would have been 2019 or 2020? No, 2019. It would have been just before GPT-2. That was the first time computers started writing in a way that was compelling, almost like a human. You read it and you're like... that was the moment I felt shock reading what I was reading.
B
That's amazing. Just to unpack that, that was a surprise to you? That's one of the things I've found most fascinating about this whole thing: even people who are super deep in the field would be shocked. Because there's a whole train of thought that says, well, the rest of us are impressed because effectively the computer does what the computer always did, but now we can emotionally relate to it because it uses language. So I find it fascinating when somebody of your caliber says, no, no, it was a shock to me as well.
A
Yeah. I think before you press go on one of these experiments, you have some belief, you think it'll get somewhere, but when you actually get there, it takes you a while to get used to it. I had the same feeling when some of these voice models came out that are able to inject emotion. The experience of interacting with a machine where you hear it inhale before saying something, you hear its lips smacking as it talks to you, you hear it hum or haw... it tickles a part deep down in your brain. The experience is just so incredible, you can't prepare yourself for it when it's actually in front of you. You just can't wipe the smile off your face, because it's so crazy. I had the same thing reading the first outputs of these models. It's so delightful, it's shocking, it's so creative. The first sample I got was sent to me by Lukasz, my manager, and I've told this story a bunch before. It was an email, and the subject was, Aidan, look at this. And in the body was a Wikipedia article titled "The Transformer," about this Japanese punk rock band that had gotten together. I was reading through the story of this, and at the end Lukasz said, I just wrote "The Transformer." The machine wrote the rest. And I was just like, what the fuck? What do you mean? This reads like it was completely written by a human. So those moments, you know, they're not that frequent, but they're pretty frequent. It's like once a year. Same with reasoning. When you're reading through what one of these models is thinking, you're like, holy shit. It has a monologue. It's talking to itself. It's thinking through stuff. It's like, oh, I messed up. You know what? Let me try this. And it figures it out. It's so beautiful to see. It's very cool.
B
So you were in England working with Jeff Dean and others, and then at some point you decided to leave and start the company. What was the next hop?
A
Yeah. So when computers started to get quite compelling at language using these language models, I called up Nick and Ivan and said, guys, we need to do something here. There's a very interesting direction of travel. Let's see if we can raise some money and build a company that builds models of the web, because that's what this project really was. That's what these language models were: models of the content on the Internet. And so that's what we started doing. Very soon after that, we wanted to explore enterprise applications of it, stuff like customer support and chatbots, that type of thing. So we quickly became an enterprise company, and the rest is history. We're now, I think, approaching 500 people, with offices in SF, Toronto, New York, London, Tokyo, and Seoul. It's been really incredible to scale the company.
B
Why did you decide early in the company to build an enterprise company? Certainly given your credentials, you could have been one of those AGI labs. Was there a specific reason you decided not to do that and to focus on the enterprise instead?
A
Yeah. I never liked the vibes of the whole AGI, effective-altruist ecosystem. It never resonated with me. It felt like cosplay. It felt like people were LARPing a new religion, all of this create-God stuff. I just didn't like that ethos. And frankly, enterprise maybe has a rap of being boring, but I think it's way more important. The idea of increasing human productivity, increasing supply, driving the costs of things down, letting humans do more and accomplish more: that's what inspires me much more than building God or saving the world from AI. I want to save the world with AI. I want to put it to work to actually make healthcare better. I want doctors to spend less time writing up notes and filling out paperwork. I want them to be with the patient, thinking through problems. I want to help them solve those problems, give them agents that can help them do research on something they've never seen before. So I want to put the technology to work in the global economy. And that's really what inspired Nick, Ivan, and me.
B
Do you spend any time thinking about AGI and ASI, bearing in mind what you just said? Do you think that's a tractable problem and we're getting closer, or does nobody know?
A
I mean, the goalposts keep moving constantly. We get to the place where we thought AGI was, and we say, oh, actually, I guess it's a little bit harder than that; no, it's out here now. To your first question, can I avoid it, do I spend time thinking about it? I can't avoid it. I do spend time thinking about it, of course, and many regulators and policymakers ask me questions about it. So I have to think about it.
B
That sounds like fun, getting asked about AGI by regulators. I'm sure that's your favorite activity.
A
It's a joy. Yeah, I love it. I'm glad people are thinking about it. I don't think it should be the center of the discourse the way it historically has been. I think there's been a positive shift to more of a practical focus. But certainly early on in the language modeling game, it was doomsday: we're going to save the world; this is so risky we have to shut everything down. I think that era has passed us, which is good. And I am glad that people think about that stuff. The long-tail risks are important, and they're worthy of academic inquiry. But I'm very relieved we're past the point where that's the only thing people seem to talk about.
B
But doomerism aside, this concept of AI continuing to be on some kind of exponential curve towards, whether you call it human-level intelligence or superhuman intelligence: is that something where you see progress? Or, effectively, we've made a lot of progress, but nobody really knows when it's going to stop or whether it's going to accelerate? What's your expert take on it?
A
Listen, the models are going to continue to get better, and they're going to do some incredible things. There will be specialist models that emerge to help in things like pharma, for the creation of new drugs, and in material sciences, for advanced materials. The models will be able to be incredibly helpful to us. The definitions of ASI and AGI are so hazy and ill-defined that it's hard for me to give a concrete answer to what you're asking, aside from: it must be a continuum, not a discrete bit flip where suddenly it's ASI or AGI. I think we already have AGI to a large extent. If right now you have some symptoms and the only options are Aidan prescribes me drugs, or Aidan's model, Cohere's model Command, prescribes me drugs, the logical and actually correct answer is to have the model do it. I promise you, it knows more than I do. Now, it shouldn't be prescribing anyone drugs, but it's smarter than me at that thing and many other things. Most things, actually.
B
I personally thought we had AGI when Google Photos was able to recognize my family in thousands of photos. My bar is low.
A
Okay, yeah. So you're already there. And then the ASI thing: better than humans. Of course it will get better than humans at some things. Like I just said, it's better than me at this point. Is it better than the world's best doctor at prescribing drugs? Probably not. Will it get there? Probably. So I think betting against progress is bad. The question is, what does that progress mean? What are the tangible effects? Anyone who's selling you doom and gloom, I think, is wrong.
B
One thing that's really interesting about you as a founding team is that, I believe, all three of you are researchers; you all met at Google Brain, as you were describing. It's something I've been thinking about: the concept of AI researchers as founders and entrepreneurs. It seems there was a whole wave of people doing this, but equally there was a wave of people going back to the labs, sometimes directly through employment, sometimes by selling companies, whether that's Adept or Inflection or Character. What was your personal evolution, going from a world-class researcher writing one of the most important papers of all time to suddenly having to worry about HR and fundraising and making customers happy?
A
Yeah, HR is the worst. By far the worst. What was the transition? I would say it was gradual, not immediate. I was still doing loads of research for the first year and a half, two years of the company, but I've slowly been weaned, or pushed, off of research. I'm probably more annoying now to the modeling team than I am helpful.
B
"Guys, I can still do it!"
A
Yeah: guys, listen to this idea I had. No, it was gradual, I would say. I love my job. I think being a CEO is such a privilege. You get to see so many different parts of the world, and not just geographically: you get to see so much of what happens out there. Different sectors, different types of people, private sector, public sector. So it's been a huge privilege, but certainly a journey, and I've learned a lot. I've been lucky to have really good mentors. I have an incredible board. Mike Volpi and Jordan Jacobs have been on my board since the very beginning and have really taught me everything I know.
B
Great folks. Yes.
A
And then I've met great founder-CEOs like Jensen and many others. So it's been a lot of learning very fast, and lots of failures which I've had to adapt and react to. But it's been a privilege.
B
So let's get into Cohere in more detail. You have built this enterprise AI platform, and I'll let you correct me if those are not the right words, but interestingly, it's very vertically integrated. You do the model part, which is Command, I believe, and then on top of that you build other products, the most recent of which, I believe, is North, an agentic platform. In your own words, how do you describe the company and what it does, and then that product architecture, where all the pieces fit in?
A
Yeah. For North, the fast way to describe it is that it's an AI agent platform where you can build agents, plug those agents into all the software and data that the humans inside your organization have access to, and then ask them to go do things. So they use tools: sales software, HR software, your email, your docs, whatever, and they can just go out and accomplish tasks for you. That's North. But I think it's more interesting to build up from the base, which is that first and foremost, Cohere builds models. We have our Command model, which is our generative model; it competes with GPT-4o, Llama, et cetera. We also have another set of models, our search models: Embed v4 and Rerank 3.5. These are search models that can see, sift through data, and surface information for you. Those two form the backbone of North. They're what manages the model's ability to interact with the user, think through problems, use tools, and then find the data it needs to accomplish a task.
B
How are those trained? Command and the reranker?
A
Yeah. For the generative model, we actually released a very detailed paper on Command A which describes exactly what we did. But similar to others, we do a large pre-training phase, which involves data from the web and synthetic data that we generate, and we do a big training run. Then we start doing SFT and RLHF. This is the part where you have a model that knows a bunch, and now we want to start making it capable of using that knowledge, using that intelligence towards some task, whether that's using tools or performing stuff like RAG, where you need to search databases, pull back information, and respond with that information. So that's how we build that side. On the search side, it's similar. We have to search over really complex data. In particular, enterprise data isn't usually out on the web, so you can't find it out there. We need to create data that looks like it synthetically and train on that instead. Those models are fully multimodal. And I think I can say, and most people would agree, that our reranker and embedding models are the best out there.
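[Editor's note: the RAG loop Aidan mentions can be sketched in a few lines of Python. The `retrieve` and `generate` callables are hypothetical stand-ins for a search model and a generative model, not Cohere's actual API.]

```python
def answer_with_rag(query, retrieve, generate, top_k=3):
    """Minimal retrieval-augmented generation loop: search for supporting
    passages, then ground the model's answer in what was retrieved."""
    passages = retrieve(query)[:top_k]   # search step: keep the best few
    context = "\n".join(passages)        # assemble the grounding context
    prompt = (
        "Answer using only this context:\n"
        f"{context}\n\nQuestion: {query}"
    )
    return generate(prompt)              # generation step
```

In a production system the retrieve step would typically be an embedding search followed by a rerank pass, and the response would cite which passages it drew from.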
B
And the reranker, just to keep this educational: that's part of the search architecture, and basically it enables the customer to decide which results should be given priority over others. Is that fair?
A
Exactly, exactly. You give it a bunch of stuff, and there's some needle inside the haystack, and the reranker surfaces it, pulls it up to the front, so that you can just grab that piece and pass it along.
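[Editor's note: that needle-in-a-haystack behavior can be sketched like this. The word-overlap scorer is a toy stand-in for a learned relevance model such as Rerank, which scores each query-document pair far more intelligently.]

```python
def rerank(query, documents, score_fn):
    """Score every (query, document) pair and return documents best-first."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]

def overlap_score(query, doc):
    """Toy relevance score: count words the query and document share."""
    return len(set(query.lower().split()) & set(doc.lower().split()))
```

A typical pipeline first does a fast embedding search to fetch candidates, then applies the (slower, more accurate) reranker to just those candidates, so only the most relevant passages get passed along to the generative model.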
B
What's the deal with synthetic data? A year or two ago, everybody was saying it doesn't work, it doesn't let you train as well. I may be wrong, but that's what I would hear, and tell me if that's not correct. And this year, people seem to be viewing synthetic data as completely ready for primetime, something they're increasingly using in just about everything AI. So one, is that fair or not? And two, if it's fair, what happened?
A
Yeah, there was a period where a lot of people were invoking — I forget the word — the snake eating its own tail, the ouroboros or whatever, like a human centipede of data. And yeah, I think it just got decisively proven wrong. Synthetic data is incredibly effective. It's now the majority of the data that we train on for creating something like Command A. And in many instances, it's actually more useful to the model than human data. The most obvious example of that is stylistic. Humans, if you ask them to respond to a question — we're lazy. If I ask, what's nine plus two, a human is just going to say 11. But what the human actually wants to see is, "Aidan, that's a great question. That's so interesting."
B
This is the first time I'm being asked this question.
A
Yeah, yeah, you're incredible. So stylistically, humans actually prefer models' answers — which are incredibly empathetic, positive, patient — to human answers. And so what we have to do is get the models to rewrite the human answers, because the humans are lazy and they're kind of like, "It's 11." It makes you feel like shit. You're like, okay, fine, thanks. So that's the most concrete place where synthetic data is just way better. But obviously we're applying it all over the place.
B
Yeah, because one of the many interesting things that you guys do is that you seem to have a development model where you work very closely with some key design partners per industry. I believe you had RBC for banking — so I guess that's North for banking, the agentic platform — and then I believe you have a customer for telecom, and so on and so forth. Is part of the idea that you can train the model working with these partners, but ultimately — obviously their data is their data — if you want to generalize what you're doing to other financial customers, then you need to create synthetic data that looks like that data? Is that part of the idea?
A
Yeah, that's exactly correct. So sometimes our customers either can't or don't want to train a model on their data — either they don't have permission to, or they're just not comfortable with it. And so in that case, synthetic is the only option. But given a few examples of the ground-truth data, synthetic data works extremely well. You can create huge quantities of fake data that is very, very faithful to the ground truth. And so that's being used all over the place for us. We're lucky in that most customers do trust us because of our deployment model — we can deploy completely privately, like on-premise, and we can air-gap it. So it's just way more secure than some of the other fine-tuning options that are out there. But if they still don't have comfort, we have this option where we can develop a bunch of synthetic data and show uplift in performance without having to actually train on real user data or real patient data.
B
But ultimately you have one big model, right? So what you learn in finance, what you learn in healthcare, what you learn in telecom — all that goes into the same model, versus having specialized versions of Command?
A
No, we do have specialized versions — we can create custom versions for an enterprise. Now, of course, we want to be helpful to an industry, and if we know that a particular industry cares about a use case, we're going to make sure that our general model performs well on that use case. And yeah, synthetic data will be a huge part of that. But for a particular customer, oftentimes they want a dedicated model for them, for their use case, for whatever it is they care about.
B
And they want this to happen at the model level, not just at the agent or interface level? Okay, okay, interesting. So that's Command and Rerank. I read somewhere about the Aya models — is that on the side? Is that part of the research arm?
A
So it's based off of our Command models.
B
Okay.
A
But it's extended and trained for many more languages — over 100 different languages. So it's our multilingual effort. That effort's being run by Sara Hooker. She leads Cohere Labs, which is our nonprofit research arm.
B
And that used to be called Cohere for good. Is that what it is?
A
Cohere For AI? Yeah.
B
Okay. And before that it had another name — just FOR.ai? Just FOR.ai.
A
Yeah. Ivan and I started FOR.ai when we were in undergrad, just because we wanted to do more AI research and we needed some money for GPUs, so we built a little organization.
B
And that's still around now in the form of that organization. Okay, great. So that's research — is that the research part of Cohere?
A
I mean, yes, it is, but so is Cohere proper. Everyone does research. Cohere collaborates with Cohere Labs very, very closely — a lot of the stuff that you see in Cohere Labs papers, you'll see pop up in the next version of the Command model or other models. For example, there's an Aya Vision model that we released today. Command A isn't currently multimodal, but you can imagine very soon it will be. So Cohere Labs usually runs a little bit out in front of Cohere proper, but in deep collaboration with the org itself.
B
Interesting. Do you see enterprise demand for multimodal, or is that more anticipation of what may come?
A
Totally — no, there's lots of demand. Some of the use cases — I mean, there's all the OCR-type stuff, but multimodal is also essential for understanding enterprise data, like PDF documents where there are graphs and that type of thing, or understanding slide decks. A lot of the modalities that enterprises work in are visual, so it's sort of table stakes. The other thing that vision is crucial for is computer use — the ability to control a computer. A computer is a GUI; using it is a visual experience. You could try surfing the web by looking at raw HTML if you want — it's a horrible experience, for language models too. And so the ability to see is essential to being able to navigate it.
B
Multilingual seems to be something that you guys care about a lot as well. I read that you apply some of the same design principles — the same go-to-market with some key partners — like Fujitsu for Japanese and, I believe, LG for Korean. So walk us through the thinking, how it came about, and what's special and perhaps difficult about doing this.
A
Yeah. I mean, the markets there are extremely underserved. They have populations that don't speak that much English, and the current technology that exists doesn't serve their needs, especially in the enterprise world. Maybe you can get models to speak good-enough Japanese for chit-chat, but for actual enterprise documents, everything breaks down immediately. So we focused on teaming up with regional champions. Fujitsu is the largest SI in Japan; LG CNS is one of the largest SIs in Korea, and obviously part of one of the large chaebols. We team up with them to create models specifically for the market — native in Japanese and Korean, focused on the enterprise use cases that that economy cares about, whether it's manufacturing, finance, whatever is relevant in that space. That's what we've been doing, and it's been extremely successful. It gives that market overnight access to something it would otherwise have had to wait and wait for, or develop internally at massive cost.
B
All right, so that's the model layer. On top of that, do you have an API layer? It seems that part of the business is to provide the models to companies like Oracle or Notion. Is that the models themselves, or is it a separate product?
A
No, we provide the whole platform. So serving of the models, optimization on different hardware like AMD — we're up and running on Cerebras, Groq, and of course Nvidia. All of that we provide, and we can deploy anywhere. And that's been quite unique. I think one thing that's different about agents and AI compared to other SaaS is that usually you're trying to do something that a human is doing in the organization, and to do that, you need the same context that human has. And so that means you need to give very broad visibility. Our North platform, to be fully useful, needs to see all of your internal communications, all of your emails, all of your documents, your customer records, et cetera, et cetera. And that is a huge security risk — a very unique one compared to, like, CRM software or HR software, that type of thing. And so our security posture — the fact that we don't say, "Hey, send your data over to us, hit our API, trust us, we're SOC 2" — the fact that we don't say that, and instead we say we're going to ship our models directly onto your hardware, whether it's in your VPC on a cloud or, for regulated industries, in your data center — that's been a huge unlock. People get comfortable plugging in much more, and so they can actually do more with the product.
B
Interesting. So that's one benefit of vertical integration — it's not just that you control the whole experience, it's also that you can make the claim with a straight face, because obviously it's true that you're not sending data anywhere. As opposed to some direct or indirect competitors to Cohere, who would say, "Hey, we can connect to all your sources," but ultimately it's powered by Gemini, OpenAI or Claude —
A
And that's on Bedrock or Vertex or model-as-a-service, whereas here you —
B
— control the whole thing. Okay. Are you seeing a lot of demand for on-prem and VPC deployments versus cloud? Is the overlap 100% with regulated versus non-regulated industries, or do you see some nuances there?
A
I would say the most interesting work we do requires in-VPC or on-prem deployment. It touches the most critical data to the business, and so that's where we're having the most significant impact — and that's regardless.
B
Of industry, right — the regulated ones kind of have to do it. But are you seeing non-regulated industries that still decide to use you guys on-prem or in-VPC because their data is so sensitive? I'm just curious, because there's this whole theme of cloud repatriation that could be accelerated by AI. I'm just curious what you see in your daily reality.
A
Yeah. I mean, the startups that use us, the ones who care about on-prem and in-VPC — they are serving regulated customers. That's what they're doing. And so it comes from a security standpoint. And I do think on-prem and in-VPC resonates the most with the folks who have the most restrictions on what they can do with their data and where it can live.
B
So talking about North — what are some of the capabilities it has and doesn't have yet? Perhaps expressed in terms of use cases: what can it do, and what can it not do?
A
Yeah. So there are a few different modalities in North. We're still in early access — we opened that up in January.
B
Right, right. It's brand new.
A
Yeah. Hopefully we'll GA that product quite soon, and I'll be able to say more about it. But you can imagine it as a fully private version of your favorite consumer chatbot, with thinking — or, sorry, reasoning — natively supported, but much more customizable. So you can plug in literally anything. If Accenture or Deloitte built your supply chain team a custom piece of software, you can integrate that into North, and the model can start using that software to solve problems. And it does a lot of interesting work on deep research and that sort of thing — spending time thinking, researching things, sifting through data, taking 5, 10, 20, 30 minutes to accomplish a task. It does that extremely well because of all the data it has access to. It's not just web search — it's sifting through your entire organization.
B
Is part of the idea that you're going to evolve towards multi-agent kinds of workflows, which seems to be the cool thing to talk about on Twitter in 2025? And if so, what would be some examples of what an agentic workflow looks like? We have to take a shot, by the way, each time we say the word "agentic" — that's the rule of podcasts.
A
I'll be asleep by the end of it. So for use cases — right now, the coolest stuff is usually in time-sensitive areas. There's one really cool use case which I think is going to have a huge, huge impact on the economy, which is doing research for folks in finance — for example, wealth managers — when an event breaks. What happens today is, if I'm a wealth manager, I'm managing like 10 to 15 clients. A war breaks out somewhere, or there's some announcement about a new tariff. I get calls from all those clients saying, "Oh my God, what are we going to do?" And I have to spend maybe a week, maybe four weeks researching and coming up with a hedge proposal. So I come up with a portfolio hedge, and then I need to implement that across my client base. At that point — a week, four weeks later — the world has changed: the market has dropped 20%, the conflict is resolved, whatever. And so the velocity, the importance of being able to act with speed in that space, is crucial. And that's what these agents can do. They can help do that research dramatically faster. They can come up with a proposal way faster than a human ever could, because they can read 100,000 times faster than a human. So we can go out, read analysis, read articles, and come back with a concrete proposal of what to do. And then the human can take over, make whatever edits they want to make, and take that forward. So we can take something that used to be a month and bring it down to four hours, eight hours.
B
That's a fascinating example. The beauty and the curse of this whole agentic, horizontal possibility — the fact that even in the enterprise, agents can do all those things — feels like both a blessing and a curse, meaning that the risk is that customers could just be overwhelmed by the possibilities and not know where to start. How does that work in practice? Do you find yourself effectively doing consulting and sitting down with customers to help them come up with use cases, or do some people know exactly what they want to do? What's the reality of that part?
A
It used to be much more like that, where you'd come in and they'd say, "Hey, this AI stuff is super cool. My board is giving me a bunch of pressure. What am I going to do about GenAI? What should I do?" And we would have to help them think through the opportunity space — we'd have to learn about their business and try to help them identify the opportunity. But it's actually changed dramatically, and the competency of the customer is much higher. They know exactly what they want to do, they know their business, they know what will count, and they just want to go implement it. And so they need a partner to help them accomplish the roadmap that they've decided on. That phase of, like, POCs or figuring things out — it feels like it's passed us by now. Most organizations know the opportunity, they know what they want to do, and they really just need help to go execute on it.
B
The $2 billion we collectively spent on Accenture did pay off — now our customers are ready to truly move forward. Okay, that's fantastic news for the industry. It's particularly interesting because, again, the agentic aspect of this actually in some ways makes things more complex, because you have more possibilities. So it's fascinating to hear that, despite the scope being widened, people actually have a more precise idea of what it is that they want to do. Okay, great.
A
Yeah. It's also not like they come with only one idea, so the scope is still broad. We've had multiple large enterprises come to us and say, "We have 700 use cases that we've identified. Can you help us accomplish them?" It's like, okay, yeah, we can, but let's start on the most important five and then expand from there. But with a company like Oracle, which has all of this workplace software in Fusion apps and NetSuite — they've implemented hundreds of use cases themselves using Cohere's models. And so the progress for the early adopters is pretty staggering. It's incredible how far this technology is reaching. And of course, it's in the hands of hundreds of millions — potentially over a billion now — of consumers. But for workers, for employees, it's very, very rapidly approaching similar scales.
B
And are you starting to see a breakout between the companies that adopted this early, did the work — joke aside, brought in Accenture — and are starting to deploy this at scale, and the laggards following the typical adoption cycle? Because that's something that we as an industry have been collectively "warning," quote unquote, the Global 2000 would happen. Are you seeing it happen?
A
I definitely think so. I think there's advantage being conferred to those who have adopted early and given their employees this augmentation, and whose employees are, by the day, getting more and more competent at integrating models and agents and AI into their work. The organizations that have the workforce best capable of doing that — they're going to win.
B
What are agents not quite ready for? What would you advise customers to not do?
A
Well, there are all the sensitive use cases — medicine and these sorts of places, and lots in finance. For those, you want a human in the loop; you don't want to just hand it over to an agent and let the model go crazy. So there are places where it's not ready simply because we need good oversight, and it may never be ready — there's a large swath of things that we'll always want a human in the loop for. Then in terms of technological limitations — where the model is not yet smart enough to accomplish something — I think one of the things that's said quite often now, and is a really good example of how far the bar has been raised, is that these models haven't discovered new science; they haven't solved some Millennium Problem or something like that. That is somewhere the models are not yet super helpful. If you're a postdoc, they might be useful to your productivity — reading papers, preparing talks, that type of thing — but actually helping you discover a new compound, I'm not sure how useful they are yet. But I'm very confident they will be useful very soon.
B
It's been an awesome conversation. Maybe to close, zooming out: what keeps you up at night? Progress, or lack of progress, in AI research? The world moving in a specific direction? Or the micro-problems that you deal with every day at Cohere?
A
As a CEO — I think for me, what keeps me up at night is politics. It's the fracturing that we're seeing around the world. And of course those things have implications for me and the business, but I'm mostly concerned about our societies, liberal democracies. I'm afraid for liberalism and the progress that's been made over the past century. So that's mostly what keeps me up at night, which is not a technical answer. But I'm really optimistic about AI. I think it can actually be a big force for good in making sure the good guys win, and in addressing some of the economic issues the world has been facing over the past 15 years — slow productivity growth, stagnation, wealth not reaching the population evenly. I'm optimistic that AI can play a role in helping to resolve some of that.
B
And success for you guys over the next three years, five years — what does that look like?
A
I want to see GDP-impacting productivity gains. I want to see this technology integrated across the globe, and I want it to become a part of everybody's workday — not just their fun time or, like, searching up stuff. I want it to help people accomplish more, and I want this technology to make stuff much cheaper, much more abundant.
B
Aidan, terrific. Thank you so much for doing this.
A
Thanks so much.
B
Hi, it's Matt Turck again. Thanks for listening to this episode of the MAD Podcast. If you enjoyed it, we'd be very grateful if you would consider subscribing, if you haven't already, or leaving a positive review or comment on whichever platform you're watching or listening to this episode from. This really helps us build the podcast and get great guests. Thanks, and see you at the next episode.
Episode Title: "Inside the Paper That Changed AI Forever — Cohere CEO Aidan Gomez on 2025 Agents"
Date: June 5, 2025
Host: Matt Turck
Guest: Aidan Gomez, CEO of Cohere
In this in-depth episode, Matt Turck interviews Aidan Gomez, co-author of the seminal "Attention is All You Need" paper and current CEO of Cohere. The conversation explores the origins and impact of the transformer architecture, the evolution of AI research, the founding and strategy behind Cohere, the current landscape of enterprise AI agents, and Aidan’s vision for the future of AI. Listeners get a rare inside look at the history, ethos, and rapid progress in AI, directly from one of the field’s key contributors.
“I think I got in through an administrative mistake because my manager thought I was a PhD student.” (Aidan, 03:14)
“It was a group of people who were researchers with full academic freedom… You would congeal around projects or ideas.” (Aidan, 06:44)
“One of the big shocks is how over the past eight years, how little things have changed.” (Aidan, 08:54 & repeated at 01:18)
“To say they didn’t lean hard enough into language modeling…that’s, I think, the accurate statement.” (Aidan, 11:35)
“We now have chips that are being optimized explicitly to that architecture… we just haven’t found that architecture yet.” (Aidan, 12:45)
“You really don’t expect it to spend the same amount of energy and time on those two different problems… but we didn’t have that reality before reasoning.” (Aidan, 16:18)
“It’s actually really easy to do. It’s dramatically cheaper than pretraining and so it’s accessible.” (Aidan, 17:37)
“Very soon after that, we wanted to explore enterprise applications…We quickly became an enterprise company.” (Aidan, 24:14)
“I never liked the vibes of the whole AGI. It felt like cosplay. It felt like people were larping a new religion.” (Aidan, 25:34) “I want to save the world with AI, right? I want to put it to work to actually make healthcare better.” (Aidan, 25:54)
“It’s like an AI agent platform where you can build agents, plug those agents into all the software and data… and then ask them to go do things.” (Aidan, 34:08)
“Synthetic data is incredibly effective. It’s now the majority of the data that we train on for creating something like Command A.” (Aidan, 37:51)
“The markets there are extremely underserved. ...The current technology that exists doesn’t serve their needs, especially in the enterprise world.” (Aidan, 45:40)
“That’s been a huge unlock. People get comfortable plugging in much more and so they can actually do more with the product.” (Aidan, 47:17)
“We can take something that used to be a month and bring it down to four hours, eight hours.” (Aidan, 54:21)
“...the competency of the customer is much higher. They know exactly what they want to do, they know their business, they know what will count…” (Aidan, 55:12)
“I think there’s advantage being conferred to those who have access, who have adopted early and given their employees this augmentation…” (Aidan, 58:15)
“There’s places where it’s not ready just because we need good oversight and it may never be ready.” (Aidan, 58:51)
“I think betting against progress is bad. … Anyone who’s selling you doom and gloom, I think is wrong.” (Aidan, 30:29)
“As a CEO… what keeps me up at night is politics. … I’m really optimistic about AI. I think it can actually be a big force for good in making sure the good guys win…” (Aidan, 60:27)
“I want to see GDP impacting productivity gains… I want this technology to make stuff much cheaper, much more abundant.” (Aidan, 61:35)
Aidan Gomez brings a refreshingly practical, grounded, and optimistic view to the AI discourse. From accidentally joining Google Brain as an undergrad, to helping lay the technical foundation of the AI revolution, to leading one of the field’s most important product companies, his story is both unique and emblematic of the rapid progress of the past decade. This episode stands out for its candor, colorful anecdotes, clear explanations of complex technical concepts, and thoughtful insights into the future of AI—focused less on speculative AGI and more on real-world, positive impact.
Listen to the full episode for much more on these topics and to hear from one of AI’s most influential voices.