The MAD Podcast with Matt Turck
Episode Title: "Inside the Paper That Changed AI Forever — Cohere CEO Aidan Gomez on 2025 Agents"
Date: June 5, 2025
Host: Matt Turck
Guest: Aidan Gomez, CEO of Cohere
Overview
In this in-depth episode, Matt Turck interviews Aidan Gomez, co-author of the seminal "Attention is All You Need" paper and current CEO of Cohere. The conversation explores the origins and impact of the transformer architecture, the evolution of AI research, the founding and strategy behind Cohere, the current landscape of enterprise AI agents, and Aidan’s vision for the future of AI. Listeners get a rare inside look at the history, ethos, and rapid progress in AI, directly from one of the field’s key contributors.
Key Themes & Discussion Points
1. Genesis of the “Attention is All You Need” Paper
- Aidan’s Path to Google Brain & the Transformers Paper
- As an undergrad at University of Toronto, Aidan cold-emailed researchers at Google Brain, leading to an internship via an administrative mistake.
“I think I got in through an administrative mistake because my manager thought I was a PhD student.” (Aidan, 03:14)
- Initially slated to join a different project, he and colleagues consolidated efforts with others at Google Brain to create the Transformer architecture.
- Collaborative, Organic Research at Google Brain
- The openness and academic freedom present at Google Brain during this time enabled the formation of high-velocity, impact-driven teams.
“It was a group of people who were researchers with full academic freedom… You would congeal around projects or ideas.” (Aidan, 06:44)
- Rushed yet Impactful Development
- The model was built “super, super fast”—a sprint to the NeurIPS deadline, with rapid experimentation and bug patching characterizing the process.
- Transformers’ Staying Power
- Despite eight years of progress, the architecture remains fundamentally the same:
“One of the big shocks is how over the past eight years, how little things have changed.” (Aidan, 08:54 & repeated at 01:18)
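The "fundamentally unchanged" core is scaled dot-product attention, the operation introduced in the paper. A minimal NumPy sketch of that single operation (toy shapes, no masking or multi-head projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as in "Attention is All You Need"."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_q, seq_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                         # weighted sum of value vectors

# Toy example: 3 query positions, 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

Everything since — bigger models, better data, reasoning traces — has largely been built around this same primitive rather than replacing it.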
2. Transformers’ Impact and Industry Adoption
- Adoption within Google and Beyond
- Contrary to some industry lore, Google implemented transformers across their products (Search, Translate) quickly, but missed focusing on pure language modeling at Internet scale, which OpenAI pursued early and aggressively.
“To say they didn’t lean hard enough into language modeling…that’s, I think, the accurate statement.” (Aidan, 11:35)
- Why No Post-Transformer Breakthrough?
- The community’s infrastructure and hardware investments cemented transformer dominance; any architectural replacement would require overwhelming benefits.
“We now have chips that are being optimized explicitly to that architecture… we just haven’t found that architecture yet.” (Aidan, 12:45)
3. Frontiers in AI Research: Reasoning & Compute
- Test-Time Compute and Reasoning
- The rise of models that can dynamically allocate “effort” based on problem complexity (e.g. simple math vs. complex tasks):
“You really don’t expect it to spend the same amount of energy and time on those two different problems… but we didn’t have that reality before reasoning.” (Aidan, 16:18)
- Surprisingly, implementing reasoning is relatively easy and cost-effective.
“It’s actually really easy to do. It’s dramatically cheaper than pretraining and so it’s accessible.” (Aidan, 17:37)
- Room for Progress in Reasoning
- Most progress has been in math; vast opportunity for expansion into science, medicine, and enterprise tasks.
- Synthetic data and tool use are pillars of next-gen agent capability.
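The idea of spending variable "effort" per problem can be sketched as a routing heuristic: estimate difficulty, then grant a matching reasoning-token budget. `estimate_difficulty` and the budget tiers below are illustrative assumptions, not how any production model actually implements test-time compute:

```python
def estimate_difficulty(prompt: str) -> float:
    """Crude stand-in heuristic: longer, math-heavy prompts score higher."""
    math_tokens = sum(prompt.count(c) for c in "+-*/=^∫Σ")
    return min(1.0, len(prompt) / 500 + math_tokens / 10)

def reasoning_budget(prompt: str) -> int:
    """Map estimated difficulty to a maximum number of reasoning tokens."""
    d = estimate_difficulty(prompt)
    if d < 0.2:
        return 0        # trivial: answer directly, no chain of thought
    if d < 0.6:
        return 1024     # moderate: short reasoning trace
    return 8192         # hard: long deliberation before answering

print(reasoning_budget("What is 2 + 2?"))  # → 0: trivial prompt, answer directly
```

Real reasoning models learn this allocation during training rather than using a hand-written gate, but the payoff is the same: simple questions stay cheap while hard ones get more compute.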
4. Founding and Building Cohere
- From Research to Startup
- After returning to Toronto, Aidan continued working with Google alongside co-founders Nick Frosst and Ivan Zhang. Recognizing the potential of large language models, they founded Cohere to build web-scale models, then pivoted to enterprise AI.
“Very soon after that, we wanted to explore enterprise applications…We quickly became an enterprise company.” (Aidan, 24:14)
- Enterprise over AGI / “AI God” Labs
- Aidan found the culture around AGI and “effective altruism” off-putting, favoring practical productivity enhancements:
“I never liked the vibes of the whole AGI. It felt like cosplay. It felt like people were larping a new religion.” (Aidan, 25:34)
“I want to save the world with AI, right? I want to put it to work to actually make healthcare better.” (Aidan, 25:54)
5. Cohere Today: Tech Stack and Product Philosophy
- Vertically Integrated AI for the Enterprise
- Offers its own models (Command, search models) and a platform (North) for building agents that can integrate deeply and securely within organizations.
- North: Private, customizable agentic platform capable of integrating with any enterprise system.
“It’s like an AI agent platform where you can build agents, plug those agents into all the software and data… and then ask them to go do things.” (Aidan, 34:08)
- Synthetic Data: From Skepticism to Ubiquity
- Synthetic data is now the backbone of model training, especially key for stylistic improvements and in regulated/permission-sensitive settings.
“Synthetic data is incredibly effective. It’s now the majority of the data that we train on for creating something like Command A.” (Aidan, 37:51)
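A synthetic-data pipeline of this general shape can be sketched as teacher generation plus filtering, with only verified pairs kept for fine-tuning. `teacher_answer` and `passes_check` below are hypothetical stand-ins, not Cohere's actual process:

```python
def teacher_answer(prompt: str) -> str:
    # Stand-in for a call to a strong "teacher" model.
    return f"Answer to: {prompt}"

def passes_check(prompt: str, answer: str) -> bool:
    # Stand-in for a verifier: unit tests, a reward model, or human spot checks.
    return answer.startswith("Answer")

def build_synthetic_set(prompts):
    """Generate candidate answers and keep only the pairs that pass the check."""
    dataset = []
    for p in prompts:
        a = teacher_answer(p)
        if passes_check(p, a):
            dataset.append({"prompt": p, "completion": a})
    return dataset

data = build_synthetic_set(["Summarize this contract.", "Translate to Korean."])
print(len(data))  # → 2
```

Filtering is the step that matters in practice: it lets synthetic data improve style and coverage without amplifying the teacher's mistakes, and it works even where real customer data cannot be used.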
- Multilingual and Multimodal Capabilities
- Strong focus on underserved regions (Japan, Korea) via partnerships with local champions (Fujitsu, LG CNS).
“The markets there are extremely underserved. ...The current technology that exists doesn’t serve their needs, especially in the enterprise world.” (Aidan, 45:40)
- On-Premise/VPC Deployments for Security
- A key differentiator: customers can deploy the full stack on their infrastructure for sensitive data and regulatory compliance.
“That’s been a huge unlock. People get comfortable plugging in much more and so they can actually do more with the product.” (Aidan, 47:17)
6. Practical AI Agents in the Enterprise
- Use Cases & Agent Workflows
- Agentic platforms’ impact spans everything from customer support to rapid, complex financial research for wealth managers:
“We can take something that used to be a month and bring it down to four hours, eight hours.” (Aidan, 54:21)
- Enterprise Adoption: From Experimentation to Execution
- Early stage required heavy consulting. Now, enterprise customers know what they want and pursue precise, high-value use cases.
“...the competency of the customer is much higher. They know exactly what they want to do, they know their business, they know what will count…” (Aidan, 55:12)
- Adoption Curve and Laggards
- Early adopters are already seeing productivity gains and developing internal competence integrating AI agents:
“I think there’s advantage being conferred to those who have access, who have adopted early and given their employees this augmentation…” (Aidan, 58:15)
- Agent Limitations & Responsible Use
- Not ready for full autonomy in sensitive domains; human-in-the-loop required for medicine, high-stakes finance, and breakthrough scientific discovery.
“There’s places where it’s not ready just because we need good oversight and it may never be ready.” (Aidan, 58:51)
7. Outlook on AGI, Regulation, and Societal Impact
- AGI and ASI: Fluid Definitions
- Aidan is pragmatic, skeptical of hype, and focused on measurable gains:
“I think betting against progress is bad. … Anyone who’s selling you doom and gloom, I think is wrong.” (Aidan, 30:29)
- What Keeps Him Up at Night
- Less about technical progress, more concern over global politics, societal stability, and the capacity of AI to address stagnation and inequality.
“As a CEO… what keeps me up at night is politics. … I’m really optimistic about AI. I think it can actually be a big force for good in making sure the good guys win…” (Aidan, 60:27)
- Vision for Success
- Envisions AI reducing costs, increasing abundance, and being woven into the fabric of everyday work for all:
“I want to see GDP impacting productivity gains… I want this technology to make stuff much cheaper, much more abundant.” (Aidan, 61:35)
Notable Quotes (with Timestamps)
- On the purpose of AI:
“I want to save the world with AI, right? … I want to put it to work to actually make healthcare better.” — Aidan Gomez (25:54)
- On research culture at Google Brain:
“It was a group of people who were researchers with full academic freedom to do whatever interested them... that is a crucial component of successful research organizations.” — Aidan Gomez (06:44)
- On transformer persistence:
“One of the big shocks is how over the past eight years, how little things have changed.” — Aidan Gomez (08:54; also in the intro at 01:18)
- On AGI discourse:
“I never liked the vibes of the whole AGI. It felt like cosplay. It felt like people were larping a new religion.” — Aidan Gomez (25:34)
- On synthetic data:
“Synthetic data is incredibly effective. It’s now the majority of the data that we train on for creating something like Command A.” — Aidan Gomez (37:51)
- On agent productivity:
“We can take something that used to be a month and bring it down to four hours, eight hours.” — Aidan Gomez (54:21)
- On what keeps him up at night:
“As a CEO… it’s politics. I’m really optimistic about AI. I think it can actually be a big force for good…” — Aidan Gomez (60:27)
Timestamps for Important Segments
- 00:00 — Aidan’s motivation: saving the world with AI, not building God or fighting AGI risk
- 03:02–07:00 — Story behind the Transformer paper & Google Brain’s research culture
- 08:54–10:39 — The surprisingly slow evolution of the transformer architecture
- 11:35–12:16 — Why Google didn’t “win” with transformers; OpenAI’s bet on language modeling
- 16:18–18:09 — Explanation and significance of reasoning/test-time compute in modern LLMs
- 25:34–26:55 — Aidan’s critique of AGI culture and preference for practical, productivity-focused AI
- 34:08–35:22 — Overview of Cohere product architecture
- 37:51–39:31 — Synthetic data’s role and new best practices in AI model training
- 47:17–48:48 — Vertical integration and the importance of on-premise/VPC deployments in enterprise AI
- 54:21 — Concrete example of agentic workflows in finance
- 58:15–58:42 — Emergent differences between early adopters and laggards in enterprise AI adoption
- 60:27 — Aidan’s closing thoughts: optimism for AI, concern for global politics, and broad vision for AI-driven societal progress
Closing Reflections
Aidan Gomez brings a refreshingly practical, grounded, and optimistic view to the AI discourse. From accidentally joining Google Brain as an undergrad, to helping lay the technical foundation of the AI revolution, to leading one of the field’s most important product companies, his story is both unique and emblematic of the rapid progress of the past decade. This episode stands out for its candor, colorful anecdotes, clear explanations of complex technical concepts, and thoughtful insights into the future of AI—focused less on speculative AGI and more on real-world, positive impact.
Listen to the full episode for much more on these topics and to hear from one of AI’s most influential voices.
