Reiner Pope of MatX on accelerating AI with transformer-optimized chips

Cheeky Pint Podcast: Reiner Pope of MatX on Accelerating AI with Transformer-Optimized Chips

Date: February 26, 2026
Host: John Collison (Stripe)
Guest: Reiner Pope (Co-founder & CEO, MatX)

Episode Overview

In this episode, Stripe's John Collison ("Host") sits down over a pint with Reiner Pope, CEO and co-founder of MatX, to discuss AI-optimized chips. Pope, a former Google TPU architect and software/hardware engineer, gives an insider's view of how transformer hardware has evolved, the economics and constraints of AI chip manufacturing, and MatX's effort to build the best possible chips for large language models (LLMs). The conversation ranges from detailed chip architecture to supply chain dynamics and predictions for where AI models are headed.

Key Discussion Points & Insights

1. Google’s AI Hardware Legacy & The TPU

Laying the Foundation: Google’s early work in AI hardware, especially with TPUs, positioned it strongly for the transformer revolution. The original TPUs were elegant and purpose-built for neural nets rather than graphics applications, a key competitive advantage (00:52).
"They started with the research, right? The Transformers came from there... TPUs are pretty good."
— Reiner Pope (00:52)
TPU Genesis: The first TPU (TPUv1, 2016) was a minimal but highly effective product designed by a small team, catalyzing a wave of AI chip startups (01:37).

2. Transformers, Parallelization, and 'Mechanical Sympathy'

Transformers and TPUs both exploit parallelism: the hardware is inherently parallel, and software must be written to take advantage of it (03:36).
"Hardware is massively parallel... you have to take advantage of that."
— Pope (03:36)
The idea of ‘mechanical sympathy’—designing software with the ‘wants’ of the hardware in mind—originated in other industries but is now critical in AI (04:32).

3. CPUs vs. GPUs vs. Custom AI Chips

GPUs outperform CPUs by shifting architectural emphasis: less "driver" (control), more "payload" (math), and support for wide vector instructions (06:35).
CPUs are like motorcycles (lots of steering and control); GPUs are like trucks (big payloads driven in a straight line) (07:29).
"CPUs... fine grained changing what you want to do. On a GPU, you're just going to go straight line for a really long time."
— Pope (07:42)

4. MatX: A Startup’s Chip for LLMs

Why MatX? Pope and co-founder Mike (also ex-Google) were frustrated by the constraints of incumbent hardware and by big companies' limited appetite for risk, so they founded MatX to build chips focused squarely on LLMs (08:03).
Product Philosophy: MatX targets two key metrics, throughput (tokens per second per dollar) and latency, and aims to excel at both, whereas most chips trade one off against the other (09:58).
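One way to see why throughput and latency usually pull against each other is a simple roofline-style model of a single decode step. The sketch below is illustrative only: the function names and every number (model size, memory bandwidth, peak FLOPs, chip price) are assumptions chosen for round arithmetic, not figures from the episode or from MatX.

```python
def decode_step_time_s(batch, weight_bytes, mem_bw_bytes_per_s,
                       flops_per_token, peak_flops_per_s):
    """Time for one decode step under a simple roofline model.

    Every step reads all weights once (memory term) and performs
    `flops_per_token` of math for each sequence in the batch (compute term).
    The slower of the two terms sets the step time.
    """
    memory_time = weight_bytes / mem_bw_bytes_per_s
    compute_time = batch * flops_per_token / peak_flops_per_s
    return max(memory_time, compute_time)


def tokens_per_second(batch, step_time_s):
    """Each step emits one token per sequence in the batch."""
    return batch / step_time_s


# Hypothetical chip and model, chosen only to make the arithmetic easy.
WEIGHT_BYTES = 70e9         # 70B parameters stored at 1 byte each
MEM_BW = 3.5e12             # bytes per second
FLOPS_PER_TOKEN = 2 * 70e9  # about 2 FLOPs per parameter per token
PEAK_FLOPS = 1.4e15         # FLOPs per second
CHIP_PRICE_USD = 20_000     # assumed price, for tokens/s per dollar

for batch in (1, 100, 200, 400):
    step = decode_step_time_s(batch, WEIGHT_BYTES, MEM_BW,
                              FLOPS_PER_TOKEN, PEAK_FLOPS)
    tps = tokens_per_second(batch, step)
    print(f"batch={batch:4d}  latency/token={step * 1e3:6.1f} ms  "
          f"throughput={tps:8.0f} tok/s  "
          f"tok/s per $={tps / CHIP_PRICE_USD:.4f}")
```

Under these assumed numbers, throughput rises with batch size while per-token latency stays flat, until the compute term takes over at a batch of 200; past that point latency grows with no further throughput gain.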

5. Training vs. Inference Products

While MatX's core chip will serve both inference and training workloads, the first sales are likely to be for inference, which is a lower-risk purchase for buyers (11:15).

6. Raising & Scaling: The Go-To-Market Journey

MatX recently closed a $500M Series B/E led by Jane Street and Leopold Aschenbrenner's Situational Awareness, crucial for manufacturing ramp and competing against hyperscalers (11:45).

7. Measuring Progress: From Flops to Tokens/Second

Chip-level metrics (flops) are necessary, but end-users increasingly care about application-level throughput (tokens/second) and usable compute (13:31).
"If I can make this 20% more efficient, then it can save that 20% of the build out."
— Pope (18:31)
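The arithmetic behind an efficiency gain can be made concrete with a small sketch. The demand and per-chip figures and the function name below are hypothetical. The size of the saving depends on how "20% more efficient" is read: with 20% more tokens per chip, a fixed workload needs about 17% fewer chips, while a fixed fleet serves 20% more tokens.

```python
import math


def chips_needed(demand_tokens_per_s, tokens_per_s_per_chip):
    """Smallest whole number of chips that covers the demand."""
    return math.ceil(demand_tokens_per_s / tokens_per_s_per_chip)


# Hypothetical workload and per-chip throughput.
DEMAND = 1_000_000   # tokens per second across the fleet
BASELINE = 1_000     # tokens per second per chip
IMPROVED = 1_200     # 20% higher throughput per chip

baseline_chips = chips_needed(DEMAND, BASELINE)
improved_chips = chips_needed(DEMAND, IMPROVED)

# Fixed demand: fraction of the build-out that is no longer needed.
saved_fraction = 1 - improved_chips / baseline_chips

# Fixed fleet: extra tokens served by the same number of chips.
extra_tokens_fraction = (baseline_chips * IMPROVED) / (baseline_chips * BASELINE) - 1

print(f"chips: {baseline_chips} -> {improved_chips} "
      f"({saved_fraction:.1%} fewer for the same demand)")
print(f"same fleet serves {extra_tokens_fraction:.1%} more tokens")
```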

8. Supply Chain Bottlenecks

As AI demand surges, supply tightens across the board: TSMC capacity for logic dies, HBM (memory), and rack components, down to cables and connectors, are all potential limiting factors (18:23).
"We're going to have crunches on all of the supply chain, really."
— Pope (18:30)

9. MatX Technical Architecture & Innovations

Main Hardware Innovations:
- Combined SRAM (for weights, low latency) + HBM (for inference data, high throughput)
- Extremely large, split systolic arrays for efficient matrix multiply (core to transformer LLMs)
- Novel low-precision arithmetic, with 4-bit numerics and beyond for efficient computation (21:54)
"Pick your memory system... take the HBM, take the SRAM, put them together in the same chip... We've done that work."
— Pope (21:57)
"We actually have an ML team who... research different forms of numerics."
— Pope (24:41)
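To make the two compute ideas above concrete, here is a minimal sketch of symmetric 4-bit quantization feeding a cycle-by-cycle model of a generic output-stationary systolic array. It is a textbook-style illustration, not MatX's design: the function names are made up for this example, the array is a single unsplit grid, and the quantization scheme is one common choice among many.

```python
def quantize_int4(values):
    """Symmetric 4-bit quantization: map floats to integers in [-8, 7].

    Returns the integer codes and the scale needed to recover approximate
    floats (value ~= code * scale).
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def quantize_matrix(matrix):
    """Quantize a row-major matrix with one shared scale."""
    cols = len(matrix[0])
    codes, scale = quantize_int4([v for row in matrix for v in row])
    rows = [codes[r * cols:(r + 1) * cols] for r in range(len(matrix))]
    return rows, scale


def systolic_matmul(a, b):
    """Model an output-stationary systolic array computing a @ b.

    Processing element (i, j) holds the accumulator for output (i, j).
    Inputs are fed in skewed order, so at cycle t that element sees
    a[i][k] and b[k][j] with k = t - i - j, when k is in range.
    Returns the result and the number of cycles until the last element
    finishes.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]
    cycles = m + n + k_dim - 2
    for t in range(cycles):
        for i in range(m):
            for j in range(n):
                k = t - i - j
                if 0 <= k < k_dim:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, cycles


# Hypothetical inputs: quantize both operands, multiply in integers,
# then rescale the integer result back to floats.
activations = [[7.0, -3.0], [1.0, 0.0]]
weights = [[3.5, 0.0], [-1.5, 1.0]]

q_act, act_scale = quantize_matrix(activations)
q_wts, wts_scale = quantize_matrix(weights)
int_result, cycles = systolic_matmul(q_act, q_wts)
approx = [[v * act_scale * wts_scale for v in row] for row in int_result]
print(f"integer result {int_result} in {cycles} cycles, rescaled to {approx}")
```

The integer multiply-accumulate is the part a systolic array implements in hardware; the scales are applied once per output, which is why narrow integer formats can keep the inner loop cheap.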

10. Chip Design: How the Sausage is Made

The design is written in Verilog and simulated with custom performance models before the slow, expensive EDA compile (25:42–26:22).
Chip development is waterfall-like: most decisions are locked in early at the architecture stage, little can change late in the process, and bugs are costly (a first-chip "tape out" runs about $30M) (31:20).
"The ideal is that your first tape out is your production thing."
— Pope (31:20)

11. Product Development & Iteration

Most architectural optimization happens "in your head": estimating performance and area costs from rough numbers and established "gate costs" (58:55, 61:06).
The team also iterates quickly using ML experiments, performance simulators, and small models (60:56).
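As an illustration of this kind of back-of-the-envelope estimate, the sketch below uses a textbook rule of thumb rather than MatX's actual gate costs: an n-by-n-bit array multiplier needs on the order of n squared one-bit cells, so its area grows roughly quadratically with operand width. The function names and the "cells" unit are assumptions made for this example.

```python
def multiplier_cells(bits):
    """Rough cell count for an n x n-bit array multiplier (about n^2 cells)."""
    return bits * bits


def relative_area(bits, reference_bits=16):
    """Estimated multiplier area relative to a reference operand width."""
    return multiplier_cells(bits) / multiplier_cells(reference_bits)


for bits in (16, 8, 4):
    print(f"{bits:2d}-bit multiplier: ~{multiplier_cells(bits):3d} cells, "
          f"{relative_area(bits):.4f}x the 16-bit area")
```

Under this assumption, halving the operand width cuts multiplier area by about four times. Real designs also pay for accumulators, wiring, and control logic, which this estimate ignores.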

12. TSMC and the Global Fab Landscape

TSMC dominates at leading-edge nodes due to technical capability and business conservatism, without extracting monopoly rents (37:13).
"They don't charge a lot as well... That's a big aspect of why they're so durable."
— Pope (37:21)

13. Vertical Integration and Specialization

Labs face a tradeoff between vertical integration (designing their own chips) and leaving chip R&D concentrated in specialist suppliers. Google is the exception in designing its own chips; OpenAI is considering it (40:29).

14. AI & Chip Design: Using AI for Hardware

Using LLMs to write and test Verilog is promising, especially for logic design; physical design and manufacturing remain bottlenecks (47:44–48:45).
Training LLMs for hardware still faces feedback challenges: what makes a "good" chip? (45:13–46:39).

15. AI Model Architecture Predictions

Expect context windows (memory) to remain roughly stable but parameter counts to grow, with more interventions (like compaction) to manage large contexts (54:18).
"Parameter count should grow much faster than context length... just because of the underlying physics."
— Pope (54:44)
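One common way to reason about this trade-off, which the summary itself does not spell out, is to compare memory for weights with memory for the attention key/value (KV) cache. Weights are shared by every sequence being served, while the KV cache is stored per sequence and grows linearly with context length. The model shape and sizes below are hypothetical, and the function names are illustrative.

```python
def weight_bytes(param_count, bytes_per_param):
    """Memory for model weights, shared across all sequences in a batch."""
    return param_count * bytes_per_param


def kv_cache_bytes(layers, kv_heads, head_dim, context_len, bytes_per_elem):
    """Per-sequence KV cache: one key and one value vector
    per layer, per KV head, per token of context."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
PARAMS = 70_000_000_000
GIB = 1024 ** 3

weights = weight_bytes(PARAMS, bytes_per_param=1)
print(f"weights: {weights / GIB:.1f} GiB (paid once per chip group)")

for context_len in (8_192, 131_072):
    per_seq = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM,
                             context_len, bytes_per_elem=2)
    for batch in (1, 64):
        total = per_seq * batch
        print(f"context={context_len:7d} batch={batch:3d}: "
              f"KV cache {total / GIB:8.1f} GiB")
```

In this sketch, growing the parameter count adds a one-time cost, whereas growing the context multiplies a cost that is paid again for every concurrent sequence.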

16. Team Building & Culture at MatX

MatX prioritizes co-design: HW, SW, and ML researchers working together, not just tuning existing models but informing the overall chip architecture (56:14–56:57).
"Our ML team is actual real ML research... It's really powerful, and it's really interesting that we can make some sloppy choices in these cases."
— Pope (56:57)

17. Why Rust?

Rust wins over Haskell or Go because of its memory safety, performance, and flexible type system, which lets MatX fine-tune hardware data types during development (62:59–64:10).

18. Optimization Mindset

Pope has a career-long fascination with squeezing every drop of efficiency out of code and hardware, whether in hash tables, allocators, or pushing the limits of CPUs (65:25–68:44).
"I just really like dealing with the details. Like you give me a puzzle and I'll be like, let me solve every single piece of it."
— Pope (65:13)

19. Industry Opportunities & Future Entrepreneurial Potential

There is still room for new labs, for bold bets on novel model architectures, and for loosening traditional constraints (71:21–72:59).

Memorable Quotes & Moments (with Timestamps)

On Google's secret:
"TPUs are pretty good... they at least had the opportunity to design the TPUs for neural nets... a lot of really good decisions." — Pope (00:52)
On parallelization:
"Hardware is massively parallel... you have to take advantage of that." — Pope (03:36)
On memory architecture:
"Take the HBM, you take the SRAM, put them together... weights in SRAM, inference data in HBM; that's what we're doing." — Pope (15:22)
On product iteration:
"What is the most extreme form of fast iteration? It's doing it in your head." — Pope (58:55)
On startup hardware risk:
"A startup is more of the right place to make a big bet... if you fail, it's fine, another startup will succeed." — Pope (08:03)
On TSMC’s market dominance:
"They don't charge a lot... that's a big aspect of why they're so durable." — Pope (37:21)
On the impact of optimization:
"A 20% higher throughput chip means 20% more AI is happening... you actually are meaningfully increasing the amount of intelligence in the world." — Host (62:26)
On future AI products:
"Tape out in under a year and then chips available in... 2027 I will be seeing very high performing chats as a result." — Pope/Host (55:45–55:57)

Timestamps for Important Segments

| Topic/Segment | Timestamp |
|---------------|-----------|
| Google's TPU foundations | 00:52–01:37 |
| What makes AI chips fundamentally different | 03:36–05:30 |
| GPUs vs. CPUs for AI: the truck vs. motorcycle analogy | 06:35–07:54 |
| Why MatX exists: niche hardware for LLMs | 08:03–09:36 |
| Core MatX product metrics: throughput & latency | 09:58–11:00 |
| Series B/E funding, supply chain strategy | 11:45–12:49 |
| Application-level chip metrics (tokens/sec) | 13:31–13:56 |
| Current/potential supply chain constraints | 18:23–19:29 |
| Details of the MatX chip architecture | 21:54–25:13 |
| Chip design methodology (Verilog, iteration) | 25:42–29:42 |
| The economics and risks of 'tape out' | 31:20–32:14 |
| TSMC, global fab context | 37:13–40:13 |
| Labs making their own chips? | 40:29–41:22 |
| AI in chip design; feedback cycles | 47:44–48:45 |
| Model architecture predictions | 54:18–55:12 |
| Team-building and co-design at MatX | 56:07–56:57 |
| Iteration and performance estimation | 58:55–60:56 |
| The “Brooks’ Law” of hardware: waterfall vs. agile | 27:41–27:48 |

Additional Highlights

Supply Chain Crunches: Expect ongoing HBM, rack/cabling, and power/grid constraints to pace AI scaling. "Great time to be a supplier in this space." (18:23)
Iterative Testing Culture: MatX encourages everyone to know gate-level costs "in their head," aiming to shift more architectural iteration to mental math before heavy simulation (61:06).
Co-Design Culture: ML, hardware, and software are not siloed; team structure maximizes cross-pollination and innovation.
Enduring Interest in Hash Tables: Pope’s long-running fascination with optimization spans both software (hash tables, allocators) and hardware.
Trends in AI Model/Hardware Interface: Expect more creative splits between training and inference, loosening traditional assumptions for performance (71:55).
Vision for AI's Next Few Years: Anticipate faster, cheaper, and smarter models, iterative improvements in chip design, and new approaches to manage memory/context.

Final Takeaways

MatX’s story exemplifies the intersection of deep compute systems engineering, the evolving economics of AI, and the entrepreneurial courage to innovate under constraints. Reiner Pope provides both a technical masterclass and a candid look at the challenges and rewards of building at the edge of AI hardware.

[End of summary]
